tcpdump mailing list archives

Re: Libpcap on VMWare


From: "Mark Bednarczyk" <voytechs () yahoo com>
Date: Tue, 12 Jan 2010 20:59:57 -0500

Hi,

I have been working with Vikram on this issue, so let me comment as well.
We are running the tests both under jNetPcap and under a native C
application with no jNetPcap involved at all. My jNetPcap-based tests don't
actually interact with Java at all during capture; the Java part is only used
to kick off the test, which then runs entirely in native code with an empty
callback function that does no work except keep a few statistics about the
packet rate and some cumulative information from the pcap packet headers
delivered.
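
For reference, the native test boils down to roughly this pattern (a
simplified sketch of the harness, not the exact code; the device name and
packet count are placeholders):

#include <pcap.h>
#include <stdio.h>

static unsigned long pkts = 0, octets = 0;

/* Empty handler: no per-packet work beyond updating a few counters
 * taken from the pcap packet header. */
static void handler(u_char *user, const struct pcap_pkthdr *h,
                    const u_char *data)
{
    (void)user; (void)data;
    pkts++;
    octets += h->caplen;
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf); /* placeholder device */
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    pcap_loop(p, 100000, handler, NULL);        /* capture 100K packets */

    struct pcap_stat st;
    if (pcap_stats(p, &st) == 0)
        printf("recv=%u drop=%u (handled %lu packets, %lu bytes)\n",
               st.ps_recv, st.ps_drop, pkts, octets);
    pcap_close(p);
    return 0;
}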

More comments inline...


-----Original Message-----
From: tcpdump-workers-owner () lists tcpdump org
[mailto:tcpdump-workers-owner () lists tcpdump org] On Behalf Of
Guy Harris
               This is similar in nature to the posting at
http://article.gmane.org/gmane.network.tcpdump.devel/4256
(which unfortunately remains unsolved). We are using jNetPcap, which is a
wrapper over libpcap. Mark Bednarczyk posted the original
query (4256).

--------------------------------------

We are experiencing massive packet drops in libpcap while
working with non-Windows guests on VMware ESXi Server. The same thing
happens on VMware Player (host OS: Windows). We have tested on Ubuntu
8.04, FC11 and Debian, and the library seems to drop packets everywhere.
The load is not heavy, but it is constant (TCP packets of
1200-1500 bytes, consistently).

The packet drops DO NOT occur on Windows guest OSs (both
via ESXi and VMware Player). They only happen when we are working with
non-Windows guests.

Do they happen if you're running with Linux on bare hardware,
rather than under VMware?


No drops on non-VMware platforms.



I.e., is there any reason to believe that this is a problem
with libpcap on VMware, rather than, for example, libpcap on Linux?


Yes, I think there is. Serious packet drops occur on VMware-hosted Linux
platforms, while there are no drops for the same traffic loads (up to the
96 Kpps I've tested) on non-VMware Linux platforms.



Libpcap version from Ubuntu:

Libpcap (by dpkg): ii  libpcap0.8  0.9.8-2  System interface for
user-level packet capture.

That means you're using a version of libpcap based on the
0.9.8 release.

As a temporary measure, we thought we might need to increase
the socket receive buffer size, as someone did here:

http://www.winpcap.org/pipermail/winpcap-users/2006-October/001521.html

We tried the configuration given in that link and it reduced packet drops
substantially, to about 2% from over 20% earlier, but still not to zero.

Being new to libpcap (and Linux), we are still struggling with some
basic understanding and would be grateful if someone could
set us on track.

1. What we did with these commands

sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.rmem_default=4194304

was to increase the Linux socket buffer size so that when libpcap opens a
socket to the BPF device

There are no BPF devices on Linux.  libpcap opens a PF_PACKET
socket and later binds it to a *networking* device.

it uses this size (of 4M here). Is this understanding correct?

From a quick look at the Linux 2.6.29 kernel, rmem_default
will be used as the default receive buffer size when any
socket is created; this includes PF_PACKET sockets, as well
as PF_INET sockets, and....


So changing the socket buffer size seems to have alleviated the problem a
bit. Is there a correlation there?
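
My (possibly wrong) reading of the mechanism: rmem_default just seeds the
initial SO_RCVBUF of every new socket, including the PF_PACKET socket
libpcap creates, and rmem_max only caps what a process could request
explicitly. In plain socket terms it amounts to something like the sketch
below (an illustration only; as far as I can tell libpcap 0.9.8 does not
issue this setsockopt itself):

#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

int main(void)
{
    /* PF_PACKET/SOCK_RAW is what libpcap uses on Linux for live capture
     * (needs root or CAP_NET_RAW). */
    int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    /* With no setsockopt, the receive buffer starts at net.core.rmem_default.
     * A process can ask for more, but the kernel caps the request at
     * net.core.rmem_max. */
    int rcvbuf = 4 * 1024 * 1024;                   /* 4 MB, matching the sysctl */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf) < 0)
        perror("setsockopt(SO_RCVBUF)");

    socklen_t len = sizeof rcvbuf;
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
    printf("effective SO_RCVBUF: %d bytes\n", rcvbuf); /* Linux reports twice the request */
    return 0;
}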


If so, how do we configure it from outside so that we can increase
its size as well?

...it's irrelevant to the problem you're having.  The problem
is probably that libpcap, and your program, aren't reading
packets fast enough, so, given that the socket buffer has a
finite size, that buffer can eventually fill up, at which
point any more packets that arrive will be dropped.  Making
the socket buffer bigger will help there *IF* the
program+libpcap is capable, on average, of reading and
processing packets as fast as, or faster than, they arrive -
the buffer only helps if the inability to process packets at
full speed is temporary (program gets temporarily slowed down
by, for example, having to write the packets to a file, or a
short burst of packets arrives too fast) and the program can
later catch up.

Our test applications do not do any work with the received data; it's all
native handler processing.
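
A back-of-the-envelope check on the headroom that buffer gives us (assuming
the 4 MB setting actually applies to the capture socket, and very roughly
2 KB of kernel skb accounting per 1500-byte frame):

  4 MB / ~2 KB per packet   ~= 2,000 packets of slack in the socket buffer
  2,000 packets / 96,000 pps ~= 20 ms of catch-up time before drops begin

So any stall longer than a few tens of milliseconds would show up as drops.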

I can comment on the way pcap_dispatch vs. pcap_loop seem to behave.
I have tried my tests with both functions and both drop packets. What is
surprising is that even at high packet rates pcap_dispatch does not seem to
deliver more than 1 or 2 packets before the call returns. I would expect
that at a higher packet rate libpcap would buffer as many packets as the
ring buffer allows before pcap_dispatch returns. This means that in our
test application, the outer loop around pcap_dispatch has to call it very
frequently, since very few packets are processed per loop iteration.

When testing with pcap_loop, the behaviour is a bit closer to what I expect.
I have mapped out where libpcap's pcap_stats indicates packet drops in a
loop around the pcap_loop call (that is: capture 10K packets, 10 times).
Consistently, the packet drops appear (as reported by pcap_stats) within the
first 1K packets captured by each new pcap_loop call. Once the first 1000
packets have been dispatched there don't appear to be any more drops until
the pcap_loop call returns and is called again for another 10K packets. The
same test using pcap_dispatch shows packets being dropped all over. It
seems that the drops occur shortly after, or in between, consecutive
pcap_dispatch/pcap_loop calls. At least in the pcap_loop case there seems to
be some catching up going on, and then things stabilize. Since pcap_dispatch
returns much more frequently than pcap_loop, the drops appear more
frequently throughout the test loops.
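
For clarity, the pcap_loop variant of that test has roughly this shape
(simplified; the device name and counts are placeholders):

/* Simplified shape of the pcap_loop test: 10 batches of 10K packets,
 * sampling pcap_stats() between batches to see where the drops land. */
#include <pcap.h>
#include <stdio.h>

static void handler(u_char *u, const struct pcap_pkthdr *h, const u_char *d)
{
    (void)u; (void)h; (void)d;                  /* empty: no per-packet work */
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf); /* placeholder device */
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    struct pcap_stat st;
    unsigned prev_drop = 0;
    for (int batch = 0; batch < 10; batch++) {
        pcap_loop(p, 10000, handler, NULL);     /* capture 10K packets */
        if (pcap_stats(p, &st) == 0) {
            printf("batch %d: recv=%u drop=%u (+%u this batch)\n",
                   batch, st.ps_recv, st.ps_drop, st.ps_drop - prev_drop);
            prev_drop = st.ps_drop;
        }
    }
    pcap_close(p);
    return 0;
}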

Let me reiterate that on non-VMware platforms I see no packet drops during
capture, even at close to 100 Kpps.


The buffer in libpcap only has to be big enough for the chunk
of packets libpcap reads - and, in versions of libpcap prior
to 1.0.0, it does a recvfrom() on a PF_PACKET socket, and
gets one packet at a time, so the buffer in libpcap only
needs to be big enough for one packet.

We got this link, http://public.lanl.gov/cpw/README.ring.html, which talks
about various environment variables (PCAP_FRAMES, to be precise) that can
be used to configure libpcap, but I am not sure whether this gentleman
compiled his own libpcap version or whether this applies to the standard
distro as well.

It's his own version, so those environment variables don't
apply to the standard version.

*HOWEVER*, the main thing that his version of libpcap does is
support Linux's zero-copy (memory-mapped) capture mechanism.
Using that mechanism (or the zero-copy mechanism in FreeBSD
8.0 and later) means that there is a buffer that's in both
the kernel's address space and the application's address
space, so that data doesn't need to be copied from a
kernel-mode buffer to a user-mode buffer.  Packets *are*
still copied from the skbuff (Linux) or mbuf (FreeBSD) into
the shared buffer, so it's really more like "one-copy", but
that's still one fewer copy, so that could reduce the CPU
time required to receive captured packets.

In addition, on Linux, that means that, at least in theory,
when the application wakes up as packets arrive, it might be
able to receive more than one packet per wakeup - libpcap
will take packets from the shared buffer as long as there are
packets available.  Processing more than one packet per
wakeup can also speed up packet processing, so that the
application might drop fewer packets.  (With BPF - except on
AIX - even *without* the zero-copy capture mechanism, more
than one packet can be delivered per wakeup, so, whilst the
zero-copy mechanism in FreeBSD 8.0 and later will avoid one
copy, it shouldn't increase the number of packets delivered
per wakeup.  In addition, the capture mechanism WinPcap
provides on Windows also delivers more than one packet per wakeup.)

Libpcap 1.0.0 and later also support Linux's (and FreeBSD 8.0
and later's) zero-copy capture mechanism, so if you were
using libpcap 1.0.0 or later, rather than libpcap 0.9.6, you
might drop fewer packets.  (As per Dustin Spicuzza's e-mail,
"later" is better than "1.0.0"; "later" currently means "top
of Git tree".)

May we also know what this ring buffer is that people keep
talking about?

There's the ring buffer provided by newer versions of the
standard Linux kernel; that's what Phil Wood is referring to
in the link you mention above.

There's also Luca Deri's PF_RING:

      http://www.ntop.org/PF_RING.html

which requires modifications to libpcap to use.

Does the standard libpcap distro have a ring buffer (related to the
question above)?

Versions of libpcap before 1.0.0 don't support the Linux
zero-copy capture mechanism; libpcap 1.0.0 and later do.

And can the PCAP_MEMORY or PCAP_FRAMES environment variables
help increase it (as in the link above, and here:
http://seclists.org/snort/2009/q1/209)?

Only Phil Wood's libpcap supports those environment variables.

However, libpcap 1.0.0 and later have an API that lets an
application set the buffer size, on platforms where the
buffer size can be set; tcpdump 4.0.0 and later support that
API with the "-B" flag.  I don't know whether jnetpcap
supports the new APIs yet, however.

This is good background information for me. jNetPcap as a wrapper also
supports zero-copy up to the Java stack. We avoid any packet data copies by
wrapping the native memory location and reading packet data directly out of
it. There are packet copy functions as well for users who have to keep a
packet around longer, such as on a queue, and can't process it immediately
in the packet handler/Java callback function, but what to do with the
packets once they are received is the programmer's decision.

As to the set_buffer method, it is currently supported by jNetPcap on win32
platforms; I haven't added it to the more general API for the remaining
platforms yet. However, all the newer functions (i.e. pcap_setdirection,
etc.) will be added soon and fully supported by the jNetPcap API, with the
newer libpcap version prerequisites noted for each platform.
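
For reference, the native sequence I'd be wrapping for the general
set_buffer support is roughly the following (a sketch against the libpcap
1.0.x create/activate API; the device name and sizes are placeholders):

#include <pcap.h>
#include <stdio.h>

/* libpcap 1.0.0+ style: create, tune, then activate.
 * The buffer size has to be set before pcap_activate(). */
int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_create("eth0", errbuf);    /* placeholder device */
    if (p == NULL) {
        fprintf(stderr, "pcap_create: %s\n", errbuf);
        return 1;
    }

    pcap_set_snaplen(p, 65535);
    pcap_set_promisc(p, 1);
    pcap_set_timeout(p, 1000);
    pcap_set_buffer_size(p, 4 * 1024 * 1024);   /* 4 MB capture buffer */

    if (pcap_activate(p) < 0) {                 /* negative = hard error */
        fprintf(stderr, "pcap_activate: %s\n", pcap_geterr(p));
        pcap_close(p);
        return 1;
    }

    /* ... pcap_loop()/pcap_dispatch() as usual ... */
    pcap_close(p);
    return 0;
}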

Lastly,
  I'd be happy to provide access to my build lab, which has various
VMware-based platforms, for any troubleshooting.

Cheers,
mark...
http://jnetpcap.com


-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.

