tcpdump mailing list archives

Re: Portable way to "block" on pcap_next_ex()


From: Guy Harris <guy () alum mit edu>
Date: Mon, 16 Jan 2012 12:25:00 -0800


On Jan 16, 2012, at 6:58 AM, Fernando Gont wrote:

On 01/15/2012 08:56 PM, Guy Harris wrote:
For my current app, it's probably just "annoying" (although no big 
deal). However, I was mostly concerned about performance problems in
other applications. Put another way, if there's nothing that an app
can do without a packet being read, there's no reason for the app
to be awakened.

Well, presumably, yes, although I assume the folks at LBL had some
rationale for starting the timer when the read() is done rather than
when a packet arrives (starting the timer on a read() also
[...]

I guess that's the "logical" place to put a timeout? (Although in the pcap
case, the timeout is used for a different purpose (performance) rather
than "I don't want this call to block forever if there's nothing to read").

Well, more accurately, the *buffer* is used for performance (so that you don't get one wakeup and one read() call per 
packet, but per *batch* of packets), and the timeout is used to keep from blocking indefinitely waiting for the buffer 
to fill (as is the case if the timeout is set to 0 on a BPF device).  The issue is that, as the timer is started if a 
read is done with an empty buffer, if no packets arrive before the timeout expires, there will still be no packets in 
the buffer, and the read will return 0 bytes.
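
One way to get "block until a packet arrives" behavior regardless of when the platform starts the timer is to treat a 
0 return as "try again".  A minimal sketch, assuming a fetch callback with the same return convention as 
pcap_next_ex() (1 for a packet, 0 for an expired timeout, negative for an error or end of input); next_fn, 
block_until_packet, struct countdown and countdown_fetch are illustrative names, not part of libpcap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fetch callback: same return convention as pcap_next_ex()
 * (1 = packet, 0 = timeout expired with no packet, < 0 = error or end). */
typedef int (*next_fn)(void *ctx);

/* Call fetch until it yields a packet or fails; timeouts are retried.
 * If wakeups is non-NULL, it receives the number of empty wakeups. */
int block_until_packet(next_fn fetch, void *ctx, unsigned *wakeups)
{
    unsigned empty = 0;
    int rc;

    assert(fetch != NULL);
    while ((rc = fetch(ctx)) == 0)
        empty++;                /* timer expired with an empty buffer */
    if (wakeups != NULL)
        *wakeups = empty;
    return rc;
}

/* Stand-in source used only to exercise the loop: reports "timeout"
 * `remaining` times, then reports one packet. */
struct countdown { int remaining; };

int countdown_fetch(void *ctx)
{
    struct countdown *c = ctx;

    if (c->remaining > 0) {
        c->remaining--;
        return 0;
    }
    return 1;
}
```

In a real capture loop the callback would presumably wrap pcap_next_ex() on an open pcap_t; the empty wakeups still 
happen, the loop just hides them from the rest of the application.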

select() and poll() do not work correctly on BPF
devices; pcap_get_selectable_fd() will return a file descriptor on
most of those versions (the exceptions being FreeBSD 4.3 and
4.4), but a simple select() or poll() will not indicate that the
descriptor is readable until a full buffer’s worth of packets is
received, even if the read timeout expires before then.  To work
around this, an application that uses select() or poll() to
wait for packets to arrive must put the pcap_t in non‐blocking mode,
and must arrange that the select() or poll() have a timeout
less than or equal to the read timeout, and must try to read packets
after that timeout expires, regardless of whether select() or
poll() indicated that the file descriptor for the pcap_t is ready to
be read or not.

Sorry, what's the point of calling select() in this case,

Multiplexing operations such as, say, socket I/O and packet capture - the usual purpose of select().

and what's the rationale for the timeout value used with select() in this case?

Working around the fact that, on the OSes in question, doing a select()/poll()/etc. on a BPF descriptor doesn't start a 
timer, so that the select()/etc. will wait for the BPF "store buffer" to fill up before marking the BPF descriptor as 
readable, even though, with a timeout, a read on the descriptor will block until the "store buffer" fills up *or* the 
timer expires.
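
The workaround can be sketched without tying it to libpcap.  The following is only an illustration under stated 
assumptions: try_read_fn, wait_then_read, make_nonblocking, struct fd_reader and read_available are hypothetical names; 
in a capture application the descriptor would come from pcap_get_selectable_fd(), the pcap_t would be put in 
non-blocking mode with pcap_setnonblock(), and the callback would wrap something like pcap_dispatch().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback that attempts a non-blocking read. */
typedef int (*try_read_fn)(void *ctx);

/* Wait up to timeout_ms (chosen <= the read timeout) for fd, then attempt
 * a read whether or not select() reported fd as readable.  Returns -1 if
 * select() fails, otherwise whatever try_read returns. */
int wait_then_read(int fd, int timeout_ms, try_read_fn try_read, void *ctx)
{
    fd_set rfds;
    struct timeval tv;

    assert(try_read != NULL && fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;              /* e.g. EINTR; the caller may retry */

    /* Deliberately no FD_ISSET() check: on the affected BPF systems the
     * descriptor may not be reported readable although data is buffered. */
    return try_read(ctx);
}

/* Plain-descriptor analogue of putting the pcap_t in non-blocking mode. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Stand-in reader used only to exercise wait_then_read(). */
struct fd_reader { int fd; int attempts; };

int read_available(void *ctx)
{
    struct fd_reader *r = ctx;
    char buf[256];
    ssize_t n;

    r->attempts++;
    n = read(r->fd, buf, sizeof buf);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    return (int)n;              /* bytes read; 0 also means end of file */
}
```

The point of the unconditional read attempt is that the select() timeout, not the readability indication, is what 
bounds how long buffered packets can sit unread.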

If you have a system where select() works as it should, i.e.:
[....]
select() will block until either

1) a bufferful of packets arrives

or

2) the timer, started when the select() is done, expires, regardless
of whether any packets are available to read.

This doesn't seem to agree with my tests.

I've just checked this on FreeBSD-8.2-release and on a current Ubuntu
system,

Sorry, I didn't make it clear enough that, when I said that, I was speaking only of systems using BPF, so it wouldn't 
apply to Ubuntu (or any other Linux distribution), for example.

and in both cases select() returns "readable" only for each
packet that is received.

What do you mean by "only for each packet that is received"?  Do you mean that it doesn't return "readable" if there 
are no packets to read?

In Solaris's case, that would depend on whether my app is actually
run before "to_ms" have elapsed since the reception of the first
packet, right?

Your app can't be run before "to_ms" have elapsed since the reception
of the first packet, because "the first packet" means "the first
packet received after the getmsg() is done on the DLPI descriptor" -
i.e., it's not the first packet received ever, it's the first packet
received in a packet batch.

Sorry, do you mean:

1) I call select(), and it blocks
2) select() returns "readable"
3) I call pcap_next_ex(), and *this* triggers the "to_ms" timer -->
hence this call will probably block for about "to_ms", too.

No, I mean that should *NOT* happen.  For Solaris with DLPI, the bufmod manual says

"To ensure that messages do not languish forever in an accumulating chunk, bufmod maintains a read timeout. Whenever 
this timeout expires, the module closes off the current chunk and passes it upward. The module restarts the timeout 
period when it receives a read side data message and a timeout is not currently active. These two rules insure that 
bufmod minimizes the number of chunks it produces during periods of intense message activity and that it periodically 
disposes of all messages during slack intervals, but avoids any timeout overhead when there is no activity."

With DLPI, you're reading from a STREAMS device; if there are any data messages available at the stream head, the read 
(getmsg()) will not block but will return the contents of the first data message.  A select() or poll() will indicate 
that the descriptor is readable if there are data messages available at the stream head, and will otherwise wait until 
a message is made available at the stream head, any timeout specified with the select() or poll() expires, or some 
other descriptor is readable.

The bufmod module receives individual data messages from what's below it on the stream, and accumulates them in a 
buffer.  When the buffer fills up, it's sent upstream as a single data message.  If a data message is received from 
below and there's no timeout in effect, it starts a timeout; when the timeout expires, the buffer is sent upstream with 
whatever messages it has in it (there will be at least one, as the timeout isn't started until a data message arrives; 
there might be more).
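
The two rules can be written down as a toy model.  This is a sketch of the behavior as described above, not the 
Solaris implementation: struct chunk_model, model_message and model_tick are made-up names, time is an abstract 
millisecond counter supplied by the caller, and the model assumes that sending a full chunk upstream also cancels the 
pending timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the buffering rules; all names here are illustrative. */
struct chunk_model {
    int capacity;       /* messages that fit in one chunk */
    int timeout_ms;     /* read timeout */
    int count;          /* messages in the current chunk */
    bool timer_active;
    long deadline_ms;   /* meaningful only while timer_active */
};

/* A data message arrives from below at time now_ms.  Returns the number
 * of messages passed upstream (0 while the chunk is still accumulating).
 * The caller is expected to call model_tick() as time advances. */
int model_message(struct chunk_model *m, long now_ms)
{
    int sent;

    assert(m != NULL && m->capacity > 0);
    if (!m->timer_active) {     /* timer starts on arrival, not on read */
        m->timer_active = true;
        m->deadline_ms = now_ms + m->timeout_ms;
    }
    if (++m->count < m->capacity)
        return 0;
    sent = m->count;            /* chunk is full: pass it upstream */
    m->count = 0;
    m->timer_active = false;    /* assumption: a full chunk cancels the timer */
    return sent;
}

/* The clock has advanced to now_ms.  Returns the number of messages
 * passed upstream because the timeout expired (0 if it has not). */
int model_tick(struct chunk_model *m, long now_ms)
{
    int sent;

    assert(m != NULL);
    if (!m->timer_active || now_ms < m->deadline_ms)
        return 0;               /* no timer running, or not yet expired */
    sent = m->count;            /* at least 1: the timer needs a message */
    m->count = 0;
    m->timer_active = false;
    return sent;
}
```

With no traffic, model_tick() never reports anything, which corresponds to a reader that is never woken up with an 
empty result; a single message is delivered one timeout after it arrives.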

For Solaris 11 with BPF, it will probably work the same way BPF works on other OSes.