tcpdump mailing list archives

Re: Libpcap reentrancy and PF_RING patch


From: Luca Deri <deri () ntop org>
Date: Wed, 2 Jan 2008 23:07:54 +0100

Gregor,
many thanks for your comments. Please see my comments inline.

On Dec 31, 2007, at 3:59 PM, Gregor Maier wrote:

Hi Luca,
Hi all,

I'm afraid that your patch has a race condition (at least under Linux). In Linux two syscalls are required to read a packet. First you read the packet, then you read the timestamp value with an ioctl.
This is correct. However, when
   if (ioctl(handle->fd, SIOCGSTAMP, &pcap_header.ts) == -1) {
...
}

is called, the Linux kernel calls (net/core/sock.c):

int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
{
        if (!sock_flag(sk, SOCK_TIMESTAMP))
                sock_enable_timestamp(sk);
        if (sk->sk_stamp.tv_sec == -1)
                return -ENOENT;
        if (sk->sk_stamp.tv_sec == 0)
                do_gettimeofday(&sk->sk_stamp);
        return copy_to_user(userstamp, &sk->sk_stamp, sizeof(struct timeval)) ?
                -EFAULT : 0;
}

which basically returns either the timestamp of the packet when it was captured, or the time of day when the ioctl is called.

As libpcap does not call this ioctl immediately (e.g., the BPF filter might be evaluated first), I suppose that even if the ioctl call is not in sync (e.g., another thread might use this timestamp), the loss of precision would be minimal. In general, for high-performance packet handling, two system calls for reading one packet are too many, and this is yet another reason why pcap on Linux could be optimized. What do you think?
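Just to make the two-syscall path concrete, here is a simplified sketch (not the actual pcap-linux.c code; the function and variable names are illustrative) of what reading one packet from an already-open PF_PACKET socket looks like:

#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <linux/sockios.h>   /* SIOCGSTAMP */

/* Simplified sketch: read one packet from socket 'fd' into 'buf',
 * then fetch its timestamp into 'ts'. */
ssize_t read_one_packet(int fd, unsigned char *buf, size_t buflen,
                        struct timeval *ts)
{
    /* syscall #1: read the packet data */
    ssize_t len = recv(fd, buf, buflen, 0);
    if (len < 0)
        return -1;

    /* syscall #2: ask the kernel for the timestamp of the last packet.
     * If another thread reads from the same socket in between, the
     * timestamp may no longer match the packet in 'buf'. */
    if (ioctl(fd, SIOCGSTAMP, ts) == -1)
        return -1;

    return len;
}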



If one thread gets scheduled just after the packet data is read, the
timestamp value will become bogus. Furthermore, the pcap_read_* functions
also read and modify fields in the pcap handle (p->breakloop, p->cc, etc.).
I don't know whether they are meaningful for pcap_next_pkt, but if they
are, they are another race condition.

You're right, this can happen (IMHO not as a race condition but as interference between pcap_next_pkt() and pcap_loop()), but this is not the purpose of the patch. IMHO an application that calls pcap_next_pkt() won't also call pcap_loop() from another thread. Either you stay with pcap_loop() or you move to pcap_next_pkt().



I think you won't be able to achieve reentrancy when reading from one
pcap handle without locking! Note, however, that reading from different
pcap handles (e.g., one handle per thread) is already thread safe, since
no global data structures are used.

Correct, but pcap/Linux does not offer packet load sharing across pcap sockets (e.g., see the socket clustering support in PF_RING), so opening several sockets simultaneously won't solve the problem either.
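To illustrate what I mean: the "one handle per thread" approach is thread safe, but without something like PF_RING clustering every handle on the same interface sees every packet, so the load is duplicated rather than shared. A rough sketch (names are illustrative; spawn one such thread per core with pthread_create()):

#include <pcap.h>
#include <pthread.h>
#include <stdio.h>

static void per_thread_handler(u_char *user, const struct pcap_pkthdr *h,
                               const u_char *bytes)
{
    /* process h->ts, h->caplen, bytes ... */
}

/* Each thread opens its own pcap handle, so no pcap state is shared and
 * no locking is needed. But every handle receives a copy of every packet:
 * the threads duplicate work instead of splitting it. */
static void *capture_thread(void *arg)
{
    const char *dev = (const char *)arg;
    char errbuf[PCAP_ERRBUF_SIZE];

    pcap_t *p = pcap_open_live(dev, 1518, 1, 500, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return NULL;
    }

    pcap_loop(p, -1, per_thread_handler, NULL);
    pcap_close(p);
    return NULL;
}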



In general I'm not sure whether a reentrant libpcap would have benefits.
When two threads read from the same pcap handle they will receive
packets in an arbitrary order. So either the application doesn't care
about inter-packet relations at all, which seems unlikely. Or the
This is very likely in my world instead. I do network monitoring, NetFlow, etc., and this is not a problem at all, given that packets have reasonably correct timestamps (see your comment above).

threads must be synchronized by the application anyway.

However, your patch can help when developing multi-threaded
applications. Being able to specify the buffer to which the packet is
written has the advantage that it might save a memcpy operation to copy
the packet to a dedicated storage area. When I use libpcap I almost
always do a memcpy after the pcap_next call....
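For illustration, this is roughly the pattern you describe today (the names and the buffer layout are just examples); with a caller-supplied buffer, as in the patch, the extra copy goes away:

#include <pcap.h>
#include <string.h>

#define SNAPLEN 1518

/* Example storage area owned by the application. */
struct stored_pkt {
    struct pcap_pkthdr hdr;
    u_char data[SNAPLEN];
};

int grab_packet(pcap_t *p, struct stored_pkt *dst)
{
    const u_char *pkt = pcap_next(p, &dst->hdr);
    if (pkt == NULL)
        return -1;                      /* timeout, error or EOF */

    /* pcap_next() returns a pointer into libpcap's own buffer, so the
     * packet has to be copied out before the next call overwrites it. */
    memcpy(dst->data, pkt,
           dst->hdr.caplen > SNAPLEN ? SNAPLEN : dst->hdr.caplen);
    return 0;
}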


So in short, I think the patch is a nice idea, but it does not provide
reentrancy and thus should not be labeled 'reentrant'.


I think that if the community is interested, something has to be done in order to address reentrancy and efficiency in libpcap. If for some reason we still need locking, then it must happen inside pcap, not outside. In fact the cost of mutexes is high, and spinlocks can't be used for locking pcap_XXX calls as their granularity is too coarse.
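Just to make the locking cost concrete, locking outside pcap would look roughly like this (a sketch; names are illustrative). The critical section contains a system call and a copy, so a spinlock is out of the question and a mutex is paid on every packet:

#include <pcap.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t pcap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Application-level serialization of a shared pcap handle. The copy must
 * also happen inside the lock, because the pointer returned by pcap_next()
 * refers to libpcap's internal buffer. */
int locked_next(pcap_t *p, struct pcap_pkthdr *hdr,
                u_char *buf, size_t buflen)
{
    const u_char *pkt;
    int ret = -1;

    pthread_mutex_lock(&pcap_lock);
    pkt = pcap_next(p, hdr);             /* coarse critical section */
    if (pkt != NULL) {
        memcpy(buf, pkt, hdr->caplen > buflen ? buflen : hdr->caplen);
        ret = 0;
    }
    pthread_mutex_unlock(&pcap_lock);

    return ret;
}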

Regards, Luca



Just my 2ct.
Happy new year,
Gregor



