tcpdump mailing list archives
Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1
From: Guy Harris <guy () alum mit edu>
Date: Sun, 22 Aug 2010 23:44:35 -0700
On Aug 21, 2010, at 3:30 PM, Jim Lloyd wrote:
I have tested with the above logic while sniffing traffic on a GigE ethernet NIC (eth0) and on the loopback device (lo). The test machine is an 8-core Opteron with 32 GB of RAM running CentOS 5.5 with kernel 2.6.18. The traffic generator program is a small program using libcurl to repeatedly download a mix of static content from apache 2.2, with 4 concurrent connections. The test results are:

          pps    Mbps    avg packets/dispatch
    eth0  30K     850    3.009
    lo    23K    1700    3.5

The total throughput here is excellent, so I'm not complaining. But why is the packets per dispatch so small? I was under the impression that at these data rates pcap_dispatch should process the requested 1000 packets per call instead of just ~3.
I know of no mechanism with PF_PACKET sockets - either when reading from them or when looking at a memory-mapped buffer - to delay wakeups until a certain number of packets, or a certain amount of packet data, is queued up on the socket buffer or in the memory-mapped ring buffer. So, if the buffer in question is currently empty, and a packet is added to it, any process reading from the buffer, or blocked in a select() on the socket, will be woken up. If, for example, two additional packets are delivered into the buffer while the process is being woken up and processing the first packet, and the process then handles those two packets before any more arrive, and thus goes to sleep again, you'll get 3 packets in that dispatch.

How much work does Thunk do? If you're getting 30K packets per second, that's 1 packet every 1/30 of a millisecond, i.e. 1 packet every 33 or so microseconds. A batch of 3 packets, at 1/30 of a millisecond per packet, takes 1/10 of a millisecond to arrive, so if the sum of "time to wake up the process" and "3 * time to process a packet" is under 100 microseconds, a batch of 3 packets can be processed in less time than it takes for those 3 packets to arrive.
Does this mean the 512Mb memory buffer is huge overkill?
For this application, it might be.
Also, note that pcap_stats is not reporting any dropped packets, but I have a little bit of evidence that some packet loss may be occurring when sniffing ethernet.
You have a recent version of libpcap, and a recent kernel, so pcap_stats() should be getting the dropped-packet statistics by calling getsockopt(PF_PACKET socket, SOL_PACKET, PACKET_STATISTICS, &statistics buffer, ...). The PF_PACKET socket code should increment the count of dropped packets any time it fails to put a packet into the buffer because the buffer is full.
The evidence is that my application occasionally fails to reconstruct a TCP stream when sniffing ethernet, but never fails to reconstruct any TCP streams when sniffing loopback. However, I wouldn't be surprised if this is due to my TCP reconstruction code failing to handle some rare corner case that happens with real TCP packets but does not happen with loopback.
You might want to see why the failure is happening - presumably your reconstruction code can handle out-of-order delivery of TCP segments, as well as overlapping segments. If a packet that was sent by the remote host gets dropped on the network before it reaches the machine doing the capturing, then there will eventually be a retransmission of some or all of the data, but, at least with SACK, the remote host might be told "I'm missing data in this range" when it's gotten data on both sides of that range, so you might get a retransmission that includes data that comes before data you've already seen.

In addition, even if the packet *does* arrive at the Ethernet adapter, it might get dropped if, for example, the adapter driver's ring buffer is full; from the point of view of the networking stack, including both TCP and PF_PACKET sockets, that's indistinguishable from a packet lost on the network, except that the driver can increment a "packets dropped" count - but that count isn't available through a PACKET_STATISTICS call, so it won't show up in the results of pcap_stats(). That ring buffer is separate from the ring buffer used on PF_PACKET sockets in memory-mapped mode, so you can make the latter ring buffer as big as you want, and it won't prevent packet drops at the driver level.

- This is the tcpdump-workers list. Visit https://cod.sandelman.ca/ to unsubscribe.
Current thread:
- pcap_dispatch on linux 2.6 with libpcap 1.1.1 Jim Lloyd (Aug 21)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Guy Harris (Aug 22)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Guy Harris (Aug 22)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Jim Lloyd (Aug 23)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Guy Harris (Aug 25)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Jim Lloyd (Aug 25)
- Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1 Guy Harris (Aug 22)