tcpdump mailing list archives
Re: [PATCH] enable memory mapped access to ethernet device for linux
From: "Paolo Abeni" <paolo.abeni () telecomitalia it>
Date: Fri, 07 Dec 2007 09:51:02 +0100
Hello,

First, thanks for the detailed review. It seems that some of the relevant points have been addressed by Guy Harris, so I'll try to cover the others...

On Thu, 2007-12-06 at 12:54 -0500, Alexander Dupuy wrote:
Rounding the ring size to nearest power of two wastes quite a bit of memory for full capture on standard Ethernet (2048/1514 = 26% wasted) and even more for typical jumbo frames (16384/9000 = 45% wasted). How exactly does this simplify ring navigation?
There are a couple of constraints to take care of. The ring blocks must be page aligned, but they are allocated internally with a size that is a power of 2, so choosing a non-power-of-2 size for the ring's blocks will waste memory. The ring frames can't run across a block boundary, so if the block size is not a multiple of the frame size, some memory is wasted at the end of each block. This gap can be reduced by choosing a (possibly big) block size that matches the frame size, but this leads to the allocation of possibly big chunks of kernel memory, which is strongly discouraged by the Linux kernel developers. Making the frame size a power of two solves the above issue and simplifies walking the ring, because there is no need to handle the end of each ring block in a special way; otherwise we need to keep the block size in the pcap handle (which would require adding a field to the handle structure) and check for the end of the block after processing each frame (to skip the gap at the block's end).
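To make the trade-off concrete, here is a minimal sketch of the two ways of locating a frame in the ring. The `ring_layout` structure and all function names are hypothetical and exist only for this illustration; they are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical ring description, used only for this sketch. */
struct ring_layout {
    size_t block_size;  /* bytes per block (page aligned) */
    size_t frame_size;  /* bytes per frame slot */
    size_t block_nr;    /* number of blocks in the ring */
};

/* Number of whole frames that fit in one block; a frame never
 * straddles a block boundary. */
size_t frames_per_block(const struct ring_layout *r)
{
    return r->block_size / r->frame_size;
}

/* Bytes left unused at the end of every block. */
size_t gap_per_block(const struct ring_layout *r)
{
    return r->block_size % r->frame_size;
}

/* General case: the block size must be known, and the gap at the end
 * of each block has to be skipped when mapping an index to an offset. */
size_t frame_offset(const struct ring_layout *r, size_t idx)
{
    size_t fpb = frames_per_block(r);

    return (idx / fpb) * r->block_size + (idx % fpb) * r->frame_size;
}

/* Power-of-two frame size (not larger than the block): frames tile
 * each block exactly, so the offset is linear in the frame index. */
size_t frame_offset_pow2(const struct ring_layout *r, size_t idx)
{
    return idx * r->frame_size;
}
```

With a 4096-byte block, 1520-byte frames leave a 1056-byte gap per block and need the general mapping, while 2048-byte frames leave no gap and both mappings agree.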
You also have to be much more careful about multiple calls to poll() within the loop, due to interrupts, interface down, and handle pcap_breakloop() correctly.
Honestly, I miss the point here. Currently, if the poll() call is interrupted by a signal, the call is invoked again, as is done on other platforms. An interface going down will cause the read call to return with an error, and I suppose this is the standard behavior. Finally, I explicitly check for break_loop after each non-fatal termination of poll(). Obviously there could be many bugs in my code, so if you can pinpoint something in it, that will help a lot!
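The behavior described above can be sketched roughly as follows. This is not the code from the patch: `wait_for_frame`, the `wait_result` values and the `break_loop` pointer are hypothetical names chosen for the illustration, and only the standard poll() call is assumed.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>

/* Hypothetical result codes for this sketch. */
enum wait_result {
    WAIT_READY   = 1,   /* descriptor is readable */
    WAIT_TIMEOUT = 0,   /* poll() timed out */
    WAIT_ERROR   = -1,  /* fatal poll() failure or error condition on fd */
    WAIT_BREAK   = -2   /* a break was requested by the application */
};

int wait_for_frame(int fd, int timeout_ms,
                   const volatile sig_atomic_t *break_loop)
{
    struct pollfd pfd;

    for (;;) {
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        int ret = poll(&pfd, 1, timeout_ms);
        if (ret < 0 && errno != EINTR)
            return WAIT_ERROR;      /* fatal termination of poll() */

        /* Non-fatal termination (data, timeout or signal): honor a
         * pending break request before doing anything else. */
        if (*break_loop)
            return WAIT_BREAK;

        if (ret < 0)
            continue;               /* EINTR: invoke poll() again
                                     * (with the full timeout) */
        if (ret == 0)
            return WAIT_TIMEOUT;
        if (pfd.revents & POLLIN)
            return WAIT_READY;
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
            return WAIT_ERROR;      /* e.g. the interface went down */
    }
}
```

In this sketch the caller is expected to clear the break flag itself after seeing `WAIT_BREAK`.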
I wonder if your "power-of-two" approach is just covering up some memory overflow problems. I also notice that you are limiting the number of ring slots to 128K (MAX_BLOCK_NR). While this is correct for 32-bit i386 Linux 2.4 (and earlier) kernels, the values are different on other architectures, and the kmalloc limit no longer applies for 2.6 kernels (there are other limits, though).
Please note that in both versions of the patch I submitted, the effective ring size is selected using a binary search approach, starting from said limit. With the MAX_BLOCK_NR block limit, on a 64-bit platform the ring will hold by default 16K jumbo frames (or 32K standard Ethernet frames), which in my experience is more than enough to handle at least a Gb Ethernet link, even if it's filled with very small packets (but most NICs are simply unable to deliver such a load to the host). On 32-bit platforms the maximum number of ring frames is doubled. Anyway, the starting point for the binary search can be changed (i.e. increased) if necessary.
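As a rough illustration of the idea (not the actual code from the patch), a binary search for the largest ring the kernel accepts could look like the sketch below. The `ring_try_fn` callback, `find_ring_size` and `fake_limit_try` are hypothetical names; a real callback would attempt to set up the ring and report whether it succeeded. The sketch assumes that if a given size works, every smaller size works too.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if a ring with `frames`
 * slots can be set up. */
typedef int (*ring_try_fn)(size_t frames, void *ctx);

/* Returns the largest frame count in [1, max_frames] accepted by
 * try_ring, or 0 if no size works. */
size_t find_ring_size(size_t max_frames, ring_try_fn try_ring, void *ctx)
{
    size_t lo = 0;  /* largest count known to work (0 = none yet) */
    size_t hi;      /* largest count not yet ruled out */

    if (max_frames == 0)
        return 0;
    if (try_ring(max_frames, ctx))
        return max_frames;  /* the starting limit works as is */

    hi = max_frames - 1;
    while (lo < hi) {
        /* Upper midpoint, so the loop always makes progress. */
        size_t mid = lo + (hi - lo + 1) / 2;

        if (try_ring(mid, ctx))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Stand-in for a real allocation attempt, for demonstration only:
 * succeeds when `frames` does not exceed the limit stored in ctx. */
int fake_limit_try(size_t frames, void *ctx)
{
    return frames <= *(const size_t *)ctx;
}
```

Raising the starting point of the search only changes the `max_frames` argument; the number of attempts grows logarithmically with it.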
There's also an issue that with the ringbuffer, the initial contents can be quite substantial in the fraction of a second between the pcap_open and application call to pcap_setfilter; for some reason this is not so much an issue for the socket read() interface, although buffering takes place there as well, perhaps the kernel (re-)filters the socket buffer when the filter is changed? Anyhow, I've found it necessary to apply user-level filtering to the contents of the ring buffer from startup until the ring is empty the first time. There's also a (smaller) window between the packet socket() and bind() calls where packets from *any* interface may be queued in the ringbuffer; I also filter these out if the pcap_open was not for the "any" interface. (This one seems to apply in the socket read case as well, and I think I stole that code from there.)
The ring is created only after the 'bind' of the socket to the requested interface, and packets are delivered to the ring only after its creation, so the second issue should not arise. On the other hand, it seems that the ring buffer isn't flushed by the kernel when a pcap filter is attached, so the first issue must be handled. An alternative, very simple solution would be to manually flush the ring buffer after setting the filter. It will cause the loss of some frames, but the same happens right now with standard, non-memory-mapped access.

In conclusion, I'll try to repost a modified version of the patch that handles the filter issue as soon as possible. If some other change is required (e.g. a bigger default ring size), please let me know.

ciao,
Paolo