tcpdump mailing list archives
Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1
From: Jim Lloyd <jlloyd () silvertailsystems com>
Date: Mon, 23 Aug 2010 15:54:36 -0700
On Sun, Aug 22, 2010 at 11:44 PM, Guy Harris <guy () alum mit edu> wrote:
> On Aug 21, 2010, at 3:30 PM, Jim Lloyd wrote:
>
>> I have tested with the above logic while sniffing traffic on a GigE
>> ethernet NIC (eth0) and on the loopback device (lo). The test machine is
>> an 8-core Opteron with 32Gb of RAM running CentOS 5.5 with kernel 2.6.18.
>> The traffic generator program is a small program using libcurl to
>> repeatedly download a mix of static content from apache 2.2, with 4
>> concurrent connections. The test results are:
>>
>>            pps    Mbps    avg packets/dispatch
>>    eth0    30K    850     3.009
>>    lo      23K    1700    3.5
>>
>> The total throughput here is excellent, so I'm not complaining. But why
>> is the packets per dispatch so small? I was under the impression that at
>> these data rates pcap_dispatch should process the requested 1000 packets
>> per call instead of just ~3.
>
> I know of no mechanism with PF_PACKET sockets - either when reading from
> them or when looking at a memory-mapped buffer - to delay wakeups until a
> certain number of packets, or a certain amount of packet data, is queued
> up on the socket buffer or in the memory-mapped ring buffer, so, if the
> buffer in question is currently empty, and a packet is added to it, any
> process reading from the buffer, or blocked in a select() on the socket,
> will be woken up. If, for example, two additional packets are delivered
> into the buffer while the process is being woken up and processing the
> first packet, and then the process handles the remaining two packets
> before any additional packets are added to the buffer, and thus goes to
> sleep again, you'll get 3 packets in that dispatch.
Ahh, thank you. That makes perfect sense.
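(For reference, the packets-per-dispatch number comes from averaging the pcap_dispatch() return values. Below is a simplified sketch of that kind of loop, with a placeholder handler and made-up snaplen/timeout values; it is not the actual Thunk code.)

    #include <pcap/pcap.h>
    #include <stdio.h>

    static void handle_packet(u_char *user, const struct pcap_pkthdr *h,
                              const u_char *bytes)
    {
        /* per-packet work goes here */
        (void)user; (void)h; (void)bytes;
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_create("eth0", errbuf);
        if (p == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

        pcap_set_snaplen(p, 65535);               /* assumed snaplen */
        pcap_set_promisc(p, 1);
        pcap_set_timeout(p, 100);                 /* assumed timeout, ms */
        pcap_set_buffer_size(p, 512 * 1024 * 1024); /* mmap ring size */
        if (pcap_activate(p) < 0) {
            fprintf(stderr, "%s\n", pcap_geterr(p));
            return 1;
        }

        long dispatches = 0, packets = 0;
        for (;;) {
            /* ask for up to 1000 packets per call */
            int n = pcap_dispatch(p, 1000, handle_packet, NULL);
            if (n < 0)
                break;                 /* error or pcap_breakloop() */
            if (n > 0) { dispatches++; packets += n; }
        }
        printf("avg packets/dispatch: %.3f\n",
               dispatches ? (double)packets / dispatches : 0.0);
        pcap_close(p);
        return 0;
    }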
> How much work does Thunk do? If you're getting 30K packets per second,
> that's 1 packet every 1/30 of a millisecond, or 1 packet every .0333...ms,
> or 1 packet every 33 or so microseconds. A batch of 3 packets, at 1/30 of
> a millisecond per packet, takes 1/10 of a millisecond, so if the sum of
> "time to wake up the process" and "3*time to process a packet" is under
> 100 microseconds, a batch of 3 packets can be processed in less time than
> it takes for those 3 packets to arrive.
Yes, now the 3 or 3.5 packets per call makes perfect sense. Thunk basically just makes a copy of the packet and then puts the copy on a queue for a worker thread to process asynchronously. There are multiple worker threads and each thread must see a shard of connections, so Thunk also hashes the connection 4-tuple and computes the shard from the hash.
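(Roughly, the sharding step looks like the sketch below; the hash constants and NUM_WORKERS value are illustrative only, not the real Thunk code.)

    #include <stdint.h>

    #define NUM_WORKERS 4   /* hypothetical worker count */

    /* Hash the connection 4-tuple to pick a worker shard.  Folding the two
     * endpoints together with XOR makes the hash symmetric, so both
     * directions of a connection land on the same worker, which the stream
     * reconstruction needs.  The mixing constant is arbitrary. */
    static unsigned shard_for_flow(uint32_t src_ip, uint16_t src_port,
                                   uint32_t dst_ip, uint16_t dst_port)
    {
        uint32_t h = (src_ip ^ dst_ip)
                   ^ ((uint32_t)(src_port ^ dst_port) << 16);
        h ^= h >> 16;
        h *= 0x45d9f3b;      /* arbitrary avalanche constant */
        h ^= h >> 16;
        return h % NUM_WORKERS;
    }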
>> Does this mean the 512Mb memory buffer is huge overkill?
>
> For this application, it might be.

Ok, my understanding is improving rapidly. One gap in that understanding
remains: what is the relationship between the socket receive buffer and the
mmap buffer? Does the mmap buffer replace the socket receive buffer, or are
both buffers used? If so, when are packets transferred from the socket
receive buffer to the mmap buffer? I currently have my primary testing
machine configured with:

    net.core.rmem_default = 4194304
    net.core.rmem_max = 16777216

Do we expect the rmem_default setting to be significant or not?
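(In case it helps answer that: a small sketch of how I could dump the capture socket's actual SO_RCVBUF to compare against the rmem sysctls. I'm assuming pcap_fileno() returns the PF_PACKET socket's fd on Linux.)

    #include <pcap/pcap.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /* Print the kernel's idea of the receive buffer size for the capture
     * socket, to compare against net.core.rmem_default / rmem_max. */
    static void print_rcvbuf(pcap_t *p)
    {
        int fd = pcap_fileno(p);
        int rcvbuf = 0;
        socklen_t len = sizeof(rcvbuf);
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0)
            printf("SO_RCVBUF on capture socket: %d bytes\n", rcvbuf);
    }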
>> Also, note that pcap_stats is not reporting any dropped packets, but I
>> have a little bit of evidence that some packet loss may be occurring
>> when sniffing ethernet.
>
> You have a recent version of libpcap, and a recent kernel, so pcap_stats()
> should be getting the dropped-packet statistics by calling
> getsockopt(PF_PACKET socket, SOL_PACKET, PACKET_STATISTICS, &statistics
> buffer, ...). The PF_PACKET socket code should increment the count of
> dropped packets any time it fails to put a packet into the buffer because
> the buffer is full.

So the only time the drop count will increment is when the buffer is full?
I rarely see drops, but I've seen enough that this doesn't feel right. I did recently increase the size of the mmap buffer from 16Mb to 512Mb, so perhaps 16Mb was simply too small.
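(The drop check is just a periodic pcap_stats() call, trimmed down to a sketch below; I believe ps_ifdrop isn't filled in on Linux, so only ps_drop is interesting here.)

    #include <pcap/pcap.h>
    #include <stdio.h>

    /* Periodically log the counters the kernel reports for this socket
     * (PACKET_STATISTICS under the hood on Linux). */
    static void log_drops(pcap_t *p)
    {
        struct pcap_stat st;
        if (pcap_stats(p, &st) == 0)
            printf("recv=%u drop=%u ifdrop=%u\n",
                   st.ps_recv, st.ps_drop, st.ps_ifdrop);
    }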
>> The evidence is that my application occasionally fails to reconstruct a
>> TCP stream when sniffing ethernet, but never fails to reconstruct any TCP
>> streams when sniffing loopback. However, I wouldn't be surprised if this
>> is due to my TCP reconstruction code failing to handle some rare corner
>> case that happens with real TCP packets but does not happen with
>> loopback.
>
> You might want to see why the failure is happening - presumably your
> reconstruction code can handle out-of-order delivery of TCP segments, as
> well as overlapping segments; if a packet that was sent by the remote host
> gets dropped on the network before it reaches the machine doing the
> capturing, then there will eventually be a retransmission of some or all
> of that data, but, at least with SACK, the remote host might be told "I'm
> missing data in this range" when it's gotten data on both sides of that
> range, so you might get a retransmission that includes data that comes
> before data you've already seen.
I have logic to handle out of order segments, but no, I don't yet handle SACK. So far this application generally runs on traffic entirely within one datacenter, sniffing http traffic between load balancers and web servers, where I expect that selective ACKs are rare. But perhaps I am being naive. I added instrumentation today to find out if this is a bad assumption, and in my test lab I haven't been able to make it happen. Can you recommend any traffic generators that can provide test traffic with SACKs?
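(The instrumentation I mentioned amounts to scanning the TCP options for SACK; below is a simplified sketch of that kind of check, not the exact code.)

    #include <stdint.h>
    #include <stddef.h>

    /* Scan the TCP options area for SACK-related options.  opts points just
     * past the fixed 20-byte TCP header; optlen is (data offset * 4) - 20.
     * Returns the number of SACK blocks seen (option kind 5) and sets
     * *sack_permitted if the SACK-permitted option (kind 4) is present. */
    static int count_sack_blocks(const uint8_t *opts, size_t optlen,
                                 int *sack_permitted)
    {
        size_t i = 0;
        int blocks = 0;
        *sack_permitted = 0;
        while (i < optlen) {
            uint8_t kind = opts[i];
            if (kind == 0)                      /* end of option list */
                break;
            if (kind == 1) { i++; continue; }   /* NOP */
            if (i + 1 >= optlen)
                break;
            uint8_t len = opts[i + 1];
            if (len < 2 || i + len > optlen)    /* malformed option */
                break;
            if (kind == 4)                      /* SACK permitted (SYN only) */
                *sack_permitted = 1;
            else if (kind == 5)                 /* SACK: 8-byte blocks */
                blocks += (len - 2) / 8;
            i += len;
        }
        return blocks;
    }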
> In addition, even if the packet *does* arrive at the Ethernet adapter, it
> might get dropped if, for example, the adapter driver's ring buffer is
> full; from the point of view of the networking stack, including both TCP
> and PF_PACKET sockets, that's indistinguishable from a packet lost on the
> network, except that the driver can increment a "packets dropped" count -
> but that count isn't available through a PACKET_STATISTICS call, so it
> won't show up in the results of pcap_stats(). That ring buffer is separate
> from the ring buffer used on PF_PACKET sockets in memory-mapped mode, so
> you can make the latter ring buffer as big as you want, and it won't
> prevent packet drops at the driver level.
Is there any way to detect these kinds of drops on Linux? It looks like the receive drops column in /proc/net/dev might be this count. Can you confirm?
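(If that column is the right counter, something along the lines of the sketch below would let me poll it. This is just a sketch; I gather "ethtool -S eth0" exposes similar per-driver counters, though I haven't verified that.)

    #include <stdio.h>
    #include <string.h>

    /* Pull the receive "drop" counter for one interface out of
     * /proc/net/dev; it is the 4th receive field after the colon. */
    static long rx_drops(const char *ifname)
    {
        FILE *f = fopen("/proc/net/dev", "r");
        char line[512];
        long drops = -1;
        if (f == NULL)
            return -1;
        while (fgets(line, sizeof(line), f) != NULL) {
            char *p = strchr(line, ':');
            if (p == NULL)              /* skips the two header lines */
                continue;
            *p = '\0';
            char *name = line;
            while (*name == ' ')        /* interface name may be padded */
                name++;
            if (strcmp(name, ifname) != 0)
                continue;
            long rx_bytes, rx_packets, rx_errs;
            if (sscanf(p + 1, "%ld %ld %ld %ld",
                       &rx_bytes, &rx_packets, &rx_errs, &drops) != 4)
                drops = -1;
            break;
        }
        fclose(f);
        return drops;
    }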