tcpdump mailing list archives

Re: advice for heavy traffic capturing


From: "Fulvio Risso" <fulvio.risso () polito it>
Date: Mon, 9 Aug 2004 13:13:00 +0200

Hi Darren.

-----Original Message-----
From: Darren Reed [mailto:darrenr () reed wattle id au]
Sent: Monday, 9 August 2004 12:21
To: Fulvio Risso
Cc: tcpdump-workers () lists tcpdump org
Subject: Re: [tcpdump-workers] advice for heavy traffic capturing


Hi Fulvio,

Fulvio Risso, Loris Degioanni, "An Architecture for High Performance
Network Analysis", Proceedings of the 6th IEEE Symposium on Computers and
Communications (ISCC 2001), pp. 686-693, Hammamet, Tunisia, July 2001.

Is there any way you can get this (and the other date info.) into those
PDFs ?  It really helps put them in perspective.

No, because these papers are exact copies of the published ones.
You can find the date information on my homepage:
   http://netgroup.polito.it/fulvio.risso/pubs/index.htm


WinPcap appears, by design, to be the same as BPF.  If you reduced the
number of buffers in the ring used by NPF to 2, I suspect it would behave
the same as BPF ?

No, they are two different architectural choices.
The ring does not contain discrete buffers; it is just space for packets,
and each packet occupies exactly its own size.

Ah, so you're using the buffers that the data is read into, off the NIC,
to put into the ring ?  Or to put in BSD terms, the ring is made up of
mbuf pointers ?

No, data is copied into the ring buffer. This is what is usually called the
"first copy".
The "second copy" happens later, from the ring buffer (also called the
"kernel buffer") into the application's address space.

We're forced to do this by the Win32 driver model, which specifies that
protocol drivers (such as npf.sys) must copy the packets they are
interested in.
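
For what it's worth, a minimal sketch of that first copy might look like the
C fragment below. This is not the actual npf.sys code and all the names are
illustrative; it only shows the idea that each record in the kernel ring
consumes exactly a small header plus the captured bytes, so occupancy tracks
real packet sizes rather than fixed-size slots:

    /* Illustrative sketch only -- not the real npf.sys ring. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    struct cap_hdr {            /* per-packet record header */
        uint64_t ts_usec;       /* timestamp */
        uint32_t caplen;        /* bytes actually stored */
        uint32_t len;           /* original length on the wire */
    };

    struct ring {
        uint8_t *buf;           /* contiguous kernel buffer */
        size_t   size;          /* total size in bytes */
        size_t   head;          /* producer offset (capture path) */
        size_t   tail;          /* consumer offset (read path) */
    };

    static size_t ring_free(const struct ring *r)
    {
        return r->size - (r->head - r->tail);  /* offsets grow monotonically */
    }

    /* "First copy": called from the receive indication with the packet data.
     * The "second copy" happens later, when a read drains [tail, head) into
     * the application's buffer. */
    static int ring_put(struct ring *r, const void *pkt, uint32_t caplen,
                        uint32_t wirelen, uint64_t ts)
    {
        size_t need = sizeof(struct cap_hdr) + caplen;
        if (ring_free(r) < need)
            return -1;                       /* ring full: drop and count it */

        struct cap_hdr h = { ts, caplen, wirelen };
        size_t off = r->head % r->size;
        /* Simplification: assume the record does not wrap around the end of
         * the buffer; real code wraps the copy or pads to the end. */
        memcpy(r->buf + off, &h, sizeof h);
        memcpy(r->buf + off + sizeof h, pkt, caplen);
        r->head += need;
        return 0;
    }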


You would have to be careful to not hold on to the buffers for too long
(or too many of them) or else surely you would run out ?

That would make direct access to the buffers from user space (using mmap
or similar) more involved.

Interestingly, there are a few large areas for improvement: timestamp
(1800 -> 270), tap processing (830 -> 560) and filtering (585 -> 109).

... and the NIC driver and operating system overhead which, as you can see,
account for more or less 50% of the total cost.

Yup.

The Intel 100 ProS has 128K for receive, as I recall, the same as
the 1000MX card.  There wasn't much between these two, that I was able
to observe, except that the 100ProS was slightly better.

The amount of memory you have on the NIC is not very significant.
I cannot give you numbers right now, but this is not the parameter that
changes your life.

Why not ?  Well I suppose your results (if the 3com really does only have
16 or 32k of buffer) would support this.

Packets are transferred into host memory through bus mastering.
The memory on the NIC is used just to hold packets in case the transfer
cannot be initiated immediately.

Usually, cards transfer packets as soon as they are received.
Otherwise the latency for getting a packet can be unacceptably high, which
means that timestamping is not precise, and users complain that the latency
of a PC in receiving packets from the network is too high.
This is why "interrupt mitigation" and similar technologies usually must be
turned on explicitly: they increase the latency in delivering packets to
the applications.

So, having 10KB or 100KB does not (usually) matter.
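
As a rough back-of-the-envelope check (the figures below are hypothetical,
not measurements): since bus mastering normally starts as soon as a packet
arrives, the on-card memory only has to ride out short stalls, and even a
small FIFO covers far more time than a typical bus arbitration delay:

    /* How long can a NIC FIFO absorb line-rate traffic if DMA stalls?
     * Illustrative sizes and rates only. */
    #include <stdio.h>

    int main(void)
    {
        const double fifo_bytes[] = { 16 * 1024.0, 128 * 1024.0 };
        const double rate_bps[]   = { 100e6, 1e9 };

        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                double ms = fifo_bytes[i] * 8.0 / rate_bps[j] * 1000.0;
                printf("%4.0f KB FIFO at %4.0f Mb/s lasts ~%.2f ms\n",
                       fifo_bytes[i] / 1024.0, rate_bps[j] / 1e6, ms);
            }
        return 0;
    }

Even 16 KB lasts on the order of a millisecond at 100 Mb/s, which is far
longer than the card should ever have to wait for the bus, so the difference
between a small and a large on-board buffer rarely shows up in capture
results.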


But maybe buffering is more important for BPF, where you have interrupts
masked out for longer while the data is copied ?

From this point of view, Win32 and BSD (ehm... older BSD, without device
polling) are mostly the same.
Win32 also masks interrupts for a while.


=========================================
A valuable result of this study is the quantitative conclusion that,
contrary to common belief, filtering and buffering are not the most
critical factors in determining packet capture performance. Optimization
of these two components, which has received most of the attention so far,
is shown to bring little improvement to the overall packet capture cost,
particularly in the case of short packets (or when small snapshot lengths
are needed). The profiling done on a real system shows that the most
important bottlenecks lie in hidden places, such as the device driver, the
interaction between application and OS, and the interaction between OS and
hardware.
=========================================

Hmmm, the testing I did would disagree with that, or at least go so far as
to say that there is a "sweet spot" for buffer sizes and data rates (at
least with BPF).  The hardware does make some difference - one of our
other test cards was a Netgear (FA-311?) and it was shit.

My recollection was that with the data sample we were using, with 1MB
captures enabled for BPF, at full speed, most reads were between 64k
and 256k, at a time.

There were other changes to BPF, unrelated to what you've changed, that
reduced packet loss from X% to 0%.  I copied these, this year, from
FreeBSD to NetBSD but I don't recall their vintage on FreeBSD.

It would be very helpful to have these results published somewhere.
They would be very useful to the scientific community (at least, to us
;-) )


And, I would like to say, you need a global picture of where the
bottlenecks are before doing optimizations.

Oh, sure.  And one of those limiting factors is PCI.

Yes.


For instance, we're now working to decrease the 50% of the time spent by
each packet in the operating system.

You're still working with Windows ?

We're implementing a prototype on Linux, just because we need some things at
the kernel level which were not available on FreeBSD (which, incidentally,
is my favourite system).
Then we're planning to implement everything on Windows as well.


In the NetBSD emails, I think I ponder making changes to the buffering
so that it is more ring-buffer like (similar to what exists within NPF
if I understand the diagram right.)

Eh, what you're saying is good but... the double buffering in BPF has an
advantage: it is much simpler, and if you're not interested in memory
occupancy, it is a very good choice.
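
To see just how simple it is, here is a minimal sketch of the classic
BPF-style rotation (simplified, not the real bpf(4) source; the names are
illustrative): the capture path fills a "store" buffer, and when a record no
longer fits, the store and hold buffers are swapped and the sleeping reader
is woken up:

    /* Simplified BPF-style double buffering: capture fills sbuf,
     * read() drains hbuf. */
    #include <stddef.h>

    struct dbuf {
        char  *sbuf, *hbuf;     /* store (being filled) / hold (being read) */
        size_t slen, hlen;      /* bytes currently in each */
    };

    /* Called from the capture path when the next record will not fit. */
    static int rotate_buffers(struct dbuf *d)
    {
        if (d->hlen != 0)
            return -1;          /* hold buffer still unread: drop the packet */

        char *tmp = d->hbuf;    /* swap the two buffers... */
        d->hbuf = d->sbuf;
        d->hlen = d->slen;
        d->sbuf = tmp;
        d->slen = 0;
        /* ...and wake up the reader blocked in read() (wakeup() in-kernel). */
        return 0;
    }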

Yes.

We didn't realize it in 2001; now, we can see less black and white in the
choice between a double buffer and a ring buffer...

What have you found that makes you say this ?
The simplicity in CPU cycle cost ?

1. simplicity
2. swappable buffers are very helpful if you plan to compute statistics, not
only capture packets (see the sketch below).
For instance, think about a system (like a NetFlow probe or something
similar) that collects statistics and then returns data to the user every N
minutes. If you have two buffers you can accumulate statistics in the first
one while you read data from the second one, and swap the buffers every N
minutes.
If you have a ring buffer and your application wants to read data, you have
to stop collecting stats, lock the ring, copy its contents into another
buffer, unlock the ring, read data from that other buffer, and restart
computing statistics.

So, depending on what you're planning to do, swappable buffers may be
better.
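
To make the statistics case concrete, here is a hedged sketch of the
two-swappable-buffers scheme in C. The names, the bucket layout and the
per-packet locking are purely illustrative, not taken from any real probe;
the point is only that the exporter can swap buffers and read the idle one
while collection continues:

    #include <pthread.h>
    #include <string.h>

    #define NBUCKETS 4096

    struct stats {
        unsigned long packets[NBUCKETS];
        unsigned long bytes[NBUCKETS];
    };

    static struct stats bufs[2];
    static int active = 0;      /* buffer currently written by the capture path */
    static pthread_mutex_t swap_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Capture path: account every packet into the active buffer. */
    void account_packet(unsigned bucket, unsigned long len)
    {
        pthread_mutex_lock(&swap_lock);     /* only contended at swap time */
        bufs[active].packets[bucket % NBUCKETS]++;
        bufs[active].bytes[bucket % NBUCKETS] += len;
        pthread_mutex_unlock(&swap_lock);
    }

    /* Exporter, run every N minutes: swap, then read the idle buffer at
     * leisure while capture keeps filling the other one.  With a single
     * ring the exporter would instead have to lock the ring, copy its
     * contents out and unlock it before capture could resume. */
    struct stats *swap_buffers(void)
    {
        pthread_mutex_lock(&swap_lock);
        int idle = active;
        active ^= 1;
        pthread_mutex_unlock(&swap_lock);
        return &bufs[idle];     /* caller exports it, then memset()s it */
    }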


Is the JIT code easily ported to other platforms ?

Yes, as long as the platform is Intel ;-)

That's fine with me :)
Do you have a URL for this ?

http://winpcap.polito.it
You'll find everything in the source pack.
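
In case it helps other readers: the reason a filter JIT is tied to one CPU
is simply that it emits native machine code. The fragment below is a
minimal, hypothetical illustration (POSIX, x86/x86-64, on a system that
allows a writable and executable mapping) and has nothing to do with the
actual WinPcap JIT: it "compiles" a trivial accept-everything filter into an
executable page and calls it, much as a BPF JIT calls the compiled filter
once per packet:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* x86 machine code for: mov eax, 1 ; ret  (i.e. "accept packet") */
        unsigned char code[] = { 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 };

        void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return 1;

        memcpy(mem, code, sizeof code);
        unsigned int (*filter)(void) = (unsigned int (*)(void))mem;

        printf("jitted filter returned %u\n", filter());

        munmap(mem, sizeof code);
        return 0;
    }

Port it to another CPU and those six bytes (and the register conventions
they rely on) have to be re-emitted for that architecture, which is exactly
the porting work a real JIT needs.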
Cheers,

        fulvio

-
This is the tcpdump-workers list.
Visit https://lists.sandelman.ca/ to unsubscribe.

