tcpdump mailing list archives
Re: some questions about TPACKET3
From: Mario Rugiero via tcpdump-workers <tcpdump-workers () lists tcpdump org>
Date: Sun, 28 Jun 2020 16:23:23 -0300
--- Begin Message ---
From: Mario Rugiero <mrugiero () gmail com>
Date: Sun, 28 Jun 2020 16:23:23 -0300
On Sat, Jun 27, 2020 at 23:56, Michael Richardson (<mcr () sandelman ca>) wrote:

> Mario, can you confirm my understanding here.

Hi Michael.

> In TPACKET3 mode, there are (tp_block_nr) pools of memory. Each block is tp_block_size bytes in size, which can be a large number like 4M (2^22 in the kernel documentation example). (We, however, seem to pick a block size which is only just big enough to hold the maximum snaplen.) Each one has a linked list of tp3_hdr, which are interleaved with the packet data itself. The "next" pointer is the tp_next_offset.
>
> It seems from my reading of the code that the kernel returns an entire chain of tp3_hdr to us, controlled by a *single* block_status bit. That is, we get entire chains of tp3_hdr from the kernel, and we return them to the kernel in single blocks.
>
> I think that this was not the case with tp2, in that packets were passed to/from the kernel one at a time, each one with its own TP_STATUS_KERNEL bit.

AFAIK all of this is correct.

> For a contract, I am trying to improve the write performance by using async I/O. (I also need to associate requests and responses, which makes the ordering of operations non-sequential.) I therefore do not want to give the blocks back to the kernel until the write has concluded, and for this I'm working on a variation of linux_mmap_v3(), which will call back with groups of packets, through a pipeline of "processors", each of which may steal the packet, and then return it later.
>
> I am realizing that I have to keep track of the blocks, not just the packets. I guess my original conceptual thinking was too heavily influenced by V2, and I was thinking that V3 had changed things by splitting the hdr from the packet, putting the constant-sized hdrs into a fixed-size ring, while the packet content was allocated as needed. I see that I am mistaken, but I'd sure love confirmation.

I believe you may be thinking of AF_XDP.
As you probably know, libpcap doesn't have support for it (yet), but I don't think you'll have trouble using it directly. I worked briefly with the RX side of it, so I may be able to help you with that.

As you said, it splits headers from packets, sort of. The packet contents are stored in blocks of a buffer called UMEM. Unlike PACKET_MMAP, you work with two queues per path, both containing descriptors used to find the data in UMEM. These descriptors fill the role of the headers.

For the RX side you have the FILL queue, where you store descriptors to tell the kernel that a given block is free to use, and the RX queue, where the kernel gives these blocks back when a packet passes the filter[0].

The TX side has a TX queue, where you store descriptors pointing to the data you want to send in the UMEM buffer, and a COMPLETION queue, where the kernel gives you the blocks back for reuse after the data has been sent. IIRC, AF_XDP allows queuing packets to send later in a burst on request, but since I didn't work with that path, I'm not 100% certain.

Since the UMEM blocks are fixed size and one block is used for each packet, they consume more memory, but they are much simpler to use for this and allow out-of-order release of resources.

[0]: AF_XDP requires eBPF filters to be installed in the kernel.

> I am also considering rewriting packet_mmap.txt :-)
>
> --
> ] Never tell me the odds! | ipv6 mesh networks [
> ] Michael Richardson, Sandelman Software Works | IoT architect [
> ] mcr () sandelman ca http://www.sandelman.ca/ | ruby on rails [
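
To make the TPACKET_V3 point concrete (one block_status word per block, packets chained by tp_next_offset, and the block going back to the kernel only once every packet in it has been released), here is a minimal sketch. It assumes a Linux system with <linux/if_packet.h>. The names block_ref, pkt_cb, block_walk, block_put, block_release and count_snaplen are hypothetical and are not part of libpcap or the kernel API. The counter is not atomic, so completions arriving from other threads would need extra care.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>  /* tpacket_block_desc, tpacket3_hdr, TP_STATUS_* */

/* Hypothetical bookkeeping record, one per ring block. */
struct block_ref {
    struct tpacket_block_desc *desc;  /* start of the block in the mmap()ed ring */
    uint32_t outstanding;             /* packets not yet released by a processor */
};

/* Hypothetical per-packet callback; it may keep hdr and call block_put() later. */
typedef void (*pkt_cb)(struct block_ref *ref, struct tpacket3_hdr *hdr, void *arg);

/* Hand the whole block back to the kernel through its single status word. */
static void block_release(struct block_ref *ref)
{
    __sync_synchronize();  /* GCC/Clang builtin: complete our accesses first */
    ref->desc->hdr.bh1.block_status = TP_STATUS_KERNEL;
}

/*
 * Visit every packet of a block that the kernel has passed to user space.
 * Returns the number of packets visited, or -1 if the kernel still owns
 * the block. An empty block is handed back immediately.
 */
static int block_walk(struct block_ref *ref, pkt_cb cb, void *arg)
{
    struct tpacket_hdr_v1 *h1 = &ref->desc->hdr.bh1;
    uint32_t n, i;
    uint8_t *p;

    if (!(h1->block_status & TP_STATUS_USER))
        return -1;

    n = h1->num_pkts;
    ref->outstanding = n;
    if (n == 0) {
        block_release(ref);
        return 0;
    }

    p = (uint8_t *)ref->desc + h1->offset_to_first_pkt;
    for (i = 0; i < n; i++) {
        struct tpacket3_hdr *hdr = (struct tpacket3_hdr *)p;
        /* Read the link before the callback, which might release the block. */
        uint32_t next = hdr->tp_next_offset;

        cb(ref, hdr, arg);
        p += next;
    }
    return (int)n;
}

/*
 * Called once per packet when a processor is done with it. Only the last
 * release returns the block to the kernel. Returns 1 in that case, else 0.
 */
static int block_put(struct block_ref *ref)
{
    assert(ref->outstanding > 0);
    if (--ref->outstanding > 0)
        return 0;
    block_release(ref);
    return 1;
}

/* Example callback: add up captured lengths without releasing anything. */
static void count_snaplen(struct block_ref *ref, struct tpacket3_hdr *hdr, void *arg)
{
    (void)ref;
    *(uint64_t *)arg += hdr->tp_snaplen;
}
```

A processor that "steals" a packet would keep the block_ref pointer alongside the tpacket3_hdr and call block_put() when its write completes. The per-block counter is what allows packets to finish out of order while the block is still returned as a unit.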
--- End Message ---
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers () lists tcpdump org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers