tcpdump mailing list archives

Re: pcap_dispatch on linux 2.6 with libpcap 1.1.1


From: Guy Harris <guy () alum mit edu>
Date: Sun, 22 Aug 2010 23:44:35 -0700


On Aug 21, 2010, at 3:30 PM, Jim Lloyd wrote:

> I have tested with the above logic while sniffing traffic on a GigE Ethernet
> NIC (eth0) and on the loopback device (lo). The test machine is an 8-core
> Opteron with 32 GB of RAM running CentOS 5.5 with kernel 2.6.18. The traffic
> generator is a small program that uses libcurl to repeatedly download a
> mix of static content from Apache 2.2 over 4 concurrent connections. The
> test results are:
>
>          pps     Mbps     avg packets/dispatch
> eth0      30K     850      3.009
> lo        23K    1700      3.5
>
> The total throughput here is excellent, so I'm not complaining. But why is
> the number of packets per dispatch so small? I was under the impression that
> at these data rates pcap_dispatch would process the requested 1000 packets
> per call instead of just ~3.

I know of no mechanism with PF_PACKET sockets - either when reading from them or when looking at a memory-mapped buffer
- to delay wakeups until a certain number of packets, or a certain amount of packet data, has been queued up on the
socket buffer or in the memory-mapped ring buffer. So, if the buffer in question is currently empty, and a packet is
added to it, any process reading from the buffer, or blocked in a select() on the socket, will be woken up.  If, for
example, two additional packets are delivered into the buffer while the process is waking up and processing the first
packet, and the process then handles those two packets before any additional packets arrive, and thus goes back to
sleep, you'll get 3 packets in that dispatch.
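
To make that concrete, here's a minimal sketch of a loop that measures the batch size per wakeup - "thunk" here just
stands in for your callback, and the pcap_t setup is omitted:

    #include <pcap.h>
    #include <stdio.h>

    static void thunk(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes)
    {
        (void)user; (void)h; (void)bytes;   /* per-packet work goes here */
    }

    /* "handle" is an already-activated pcap_t */
    static void capture_loop(pcap_t *handle)
    {
        unsigned long total = 0, dispatches = 0;
        int n;

        /* ask for up to 1000 packets per call; pcap_dispatch() returns
           as soon as the packets currently in the buffer have been
           processed, so n is the size of this wakeup's batch */
        while ((n = pcap_dispatch(handle, 1000, thunk, NULL)) >= 0) {
            if (n > 0) {
                total += n;
                dispatches++;
            }
        }
        if (dispatches > 0)
            printf("avg packets/dispatch: %.3f\n",
                   (double)total / (double)dispatches);
    }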

How much work does Thunk do?  If you're getting 30K packets per second, that's 1 packet every 1/30 of a millisecond,
i.e. 1 packet every 33 or so microseconds.  A batch of 3 packets, at 1/30 of a millisecond per packet, takes 1/10 of a
millisecond, so if the sum of "time to wake up the process" and 3*"time to process a packet" is under 100 microseconds,
a batch of 3 packets can be processed in less time than it takes for the next 3 packets to arrive.

> Does this mean the 512MB memory buffer is huge overkill?

For this application, it might be.

> Also, note that pcap_stats is not reporting any dropped packets, but I have
> a little bit of evidence that some packet loss may be occurring when
> sniffing Ethernet.

You have a recent version of libpcap and a recent kernel, so pcap_stats() should be getting the dropped-packet
statistics by calling getsockopt() on the PF_PACKET socket with SOL_PACKET and PACKET_STATISTICS.  The PF_PACKET socket
code increments the count of dropped packets any time it fails to put a packet into the buffer because the buffer is
full.
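
In other words, something along these lines is happening under the covers (a sketch, not libpcap's actual code;
pcap_fileno() will give you the socket, and note that the kernel resets these counters each time they're read):

    #include <linux/if_packet.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /* "fd" is the PF_PACKET socket underlying the pcap_t */
    static void print_kernel_stats(int fd)
    {
        struct tpacket_stats st;
        socklen_t len = sizeof st;

        if (getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &st, &len) == 0)
            printf("seen by filter: %u, dropped (buffer full): %u\n",
                   st.tp_packets, st.tp_drops);
    }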

> The evidence is that my application occasionally fails to reconstruct a TCP
> stream when sniffing Ethernet, but never fails to reconstruct any TCP
> streams when sniffing loopback. However, I wouldn't be surprised if this is
> due to my TCP reconstruction code failing to handle some rare corner case
> that happens with real TCP packets but does not happen with loopback.

You might want to see why the failure is happening - presumably your reconstruction code can handle out-of-order
delivery of TCP segments, as well as overlapping segments.  If a packet sent by the remote host gets dropped on the
network before it reaches the machine doing the capturing, there will eventually be a retransmission of some or all of
the data; but, at least with SACK, the remote host might be told "I'm missing data in this range" when it's already
gotten data on both sides of that range, so you might get a retransmission that includes data that comes before data
you've already seen.
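
Coping with that mostly comes down to trimming the already-delivered prefix off a retransmitted segment - roughly like
this (a sketch with names of my own invention, not your code; it leaves out the out-of-order queue and assumes the
usual modulo-2^32 sequence-number arithmetic):

    #include <stdint.h>
    #include <stddef.h>

    /* deliver only bytes at or beyond next_seq, trimming any prefix
       that overlaps data already handed to the application */
    static void handle_segment(uint32_t seq, const uint8_t *data, size_t len,
                               uint32_t *next_seq,
                               void (*deliver)(const uint8_t *, size_t))
    {
        uint32_t end = seq + (uint32_t)len;

        if ((int32_t)(end - *next_seq) <= 0)
            return;                          /* pure overlap: all old data */
        if ((int32_t)(*next_seq - seq) > 0) {
            uint32_t skip = *next_seq - seq; /* partial overlap: trim prefix */
            data += skip;
            len  -= skip;
            seq   = *next_seq;
        }
        if (seq == *next_seq) {              /* in order: deliver, advance */
            deliver(data, len);
            *next_seq += (uint32_t)len;
        }
        /* else seq is beyond next_seq: a gap - queue the segment until
           the missing data arrives */
    }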

In addition, even if the packet *does* arrive at the Ethernet adapter, it might get dropped if, for example, the
adapter driver's ring buffer is full.  From the point of view of the networking stack, including both TCP and PF_PACKET
sockets, that's indistinguishable from a packet lost on the network, except that the driver can increment a "packets
dropped" count - but that count isn't available through a PACKET_STATISTICS call, so it won't show up in the results of
pcap_stats().  That ring buffer is separate from the ring buffer used on PF_PACKET sockets in memory-mapped mode, so
you can make the latter ring buffer as big as you want and it won't prevent packet drops at the driver level.
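
You can see driver-level drops in the interface statistics instead - for example, by reading the counters the kernel
exports under /sys/class/net (exactly which counters a given driver maintains varies; "ethtool -S eth0" dumps the
driver-specific set):

    #include <stdio.h>

    /* read one interface statistic from sysfs, e.g.
       read_counter("eth0", "rx_dropped") or
       read_counter("eth0", "rx_fifo_errors"); returns 0 if unreadable */
    static unsigned long long read_counter(const char *ifname,
                                           const char *stat)
    {
        char path[128];
        unsigned long long v = 0;
        FILE *f;

        snprintf(path, sizeof path,
                 "/sys/class/net/%s/statistics/%s", ifname, stat);
        if ((f = fopen(path, "r")) != NULL) {
            if (fscanf(f, "%llu", &v) != 1)
                v = 0;
            fclose(f);
        }
        return v;
    }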

