tcpdump mailing list archives
Re: endianness of portable BPF bytecode
From: Denis Ovsienko via tcpdump-workers <tcpdump-workers () lists tcpdump org>
Date: Fri, 10 Jun 2022 23:37:41 +0100
--- Begin Message ---
From: Denis Ovsienko <denis () ovsienko info>
On Fri, 10 Jun 2022 14:26:34 -0700
Guy Harris <gharris () sonic net> wrote:

> On Jun 10, 2022, at 1:59 PM, Denis Ovsienko via tcpdump-workers
> <tcpdump-workers () lists tcpdump org> wrote:
>
> > Below is a draft of such a file format. It addresses the following
> > needs:
> > * There is a header with a signature string to avoid false positive
> >   detection as some other file type that begins exactly with
> >   particular bytecode (ran into this during disassembly experiments).
> > * There are version fields to address possible future changes to the
> >   encoding (either backward-compatible or not).
>
> Is the idea that a change that's backward-compatible (so that code
> that handles the new format needs no changes to handle the old
> format, but code that handles only the old format can't handle the
> new format) would involve a change to the minor version number, but a
> change that's not backward-compatible (so that to handle both
> versions would require two code paths for the two versions) would
> involve a change to the major version number?

Yes, more or less. The draft format had a couple more fields not long
ago, with those the version semantics seemed more apparent (for a while
I thought Linux kernel cBPF is a superset of libpcap cBPF, but upon a
closer inspection of the header files they seem identical). In any
case, it is very convenient to be able to cycle a major version and to
redefine everything beyond the signature and the version fields, that's
the idea. Forward- and backward-compatibility between minor versions
can be considered now.

> > File format:
> >
> >  0                   1                   2                   3
> >  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |      'c'      |      'B'      |      'P'      |      'F'      |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Is the 'c' part of the retronym "cBPF" for the "classic BPF"
> instruction set, as opposed to the eBPF instruction set?
>
> (I didn't find any file format for saving eBPF programs, so this
> format could be used for that as well, with the magic number 'e' 'B'
> 'P' 'F'.)

Yes, it is. In online documentation "eBPF" seems to clash with "BPF" a
lot, so it seems better to avoid the confusion early.

As it turned out after some research, the nominal binary format for
eBPF is ELF. This is one of the most useful online documents I found:

https://www.man7.org/linux/man-pages/man8/tc-bpf.8.html

As you can see there and in the references into Linux kernel
documentation, ELF eBPF seems to cover different bit widths, relocation
types, debug information, lookup maps, multiple executable sections and
whatnot. However, most of these features significantly overshoot the
packet capture problem space on one hand, and don't seem to address
simple practical needs of capturing parameters and context of a cBPF
compilation and reproducing it later.

So I figured it would be better to leave eBPF solution space alone and
to use a separate purpose-designed file format for cBPF. Most of the
meta-data TLVs below are purposed to help a developer to understand the
context and reproduce the compilation.

> > Type=0x02 (LINKTYPE_ID)
> > Length=4
> > Value=<integer, link-layer header type>
>
> This could be 2 bytes long - pcapng limits link-layer types to 16
> bits, and pcap now can use the upper 16 bits of the link-layer type
> field for other purposes.

Fine.

> > Type=0x03 (LINKTYPE_NAME)
> > Length is variable
> > Value=<ASCII string, the link-layer header type name>
>
> E.g. either its LINKTYPE_xxx name or its DLT_xxx name?

Yes. The intent is to capture the input to pcap_datalink_name_to_val()
if the latter was involved.

> > Type=0x04 (COMMENT)
> > Length is variable
> > Value=<UTF-8 string, comment or the generating software description>
>
> "Generating software description" as in the code that generated the
> BPF program?

"libpcap x.y.z", "my script v1.0" or something like that.

> > Type=0x05 (TIMESTAMP)
> > Length=8
> > Value=<integer, Unix timestamp>
>
> Is this the time the code was generated?

Yes.

> Is it a 64-bit time_t, or a 32-bit time_t and a 32-bit
> microseconds/nanoseconds value? I'd recommend the former, unless we
> expect classic BPF to be dead by 2038.

It is the 64-bit integer time.

-- 
Denis Ovsienko
--- End Message ---
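
For readers following the thread, here is a rough Python sketch of how a
reader of such a file might check the 'cBPF' signature, apply the
major/minor version rule discussed in the message, and walk the
meta-data TLVs. Only the signature and the TLV type codes 0x02 to 0x05
come from the message. Everything else is an assumption made for
illustration and is not taken from the draft: one byte each for the
major and minor version directly after the signature, a 1-byte TLV type
followed by a 2-byte length, big-endian integers throughout, and a
buffer that holds nothing after the TLVs. The names parse_header,
parse_tlvs and make_tlv are hypothetical.

```python
import struct

MAGIC = b"cBPF"        # signature bytes from the quoted draft
SUPPORTED_MAJOR = 1    # assumption: hypothetical version this reader knows
SUPPORTED_MINOR = 0    # assumption: hypothetical

# TLV type codes as quoted in the message.
TLV_LINKTYPE_ID = 0x02
TLV_LINKTYPE_NAME = 0x03
TLV_COMMENT = 0x04
TLV_TIMESTAMP = 0x05


def parse_header(buf):
    """Return (major, minor, offset of the first TLV).

    Assumed layout: 4 signature bytes, then 1 byte major, 1 byte minor.
    """
    if len(buf) < 6 or buf[:4] != MAGIC:
        raise ValueError("bad signature, not a cBPF file")
    major, minor = buf[4], buf[5]
    # A different major version may redefine everything after the
    # version fields, so refuse to guess.
    if major != SUPPORTED_MAJOR:
        raise ValueError("unsupported major version %d" % major)
    # One reading of the minor rule: a newer reader handles older minor
    # versions, but an older reader cannot handle a newer minor version.
    if minor > SUPPORTED_MINOR:
        raise ValueError("minor version %d is newer than supported" % minor)
    return major, minor, 6


def parse_tlvs(buf, offset):
    """Walk TLVs (assumed: 1-byte type, 2-byte big-endian length)."""
    meta = {}
    while offset < len(buf):
        if offset + 3 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from(">BH", buf, offset)
        offset += 3
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        offset += length
        if tlv_type == TLV_LINKTYPE_ID:
            # Works for the 4-byte length in the draft and for the
            # 2-byte length suggested in the reply.
            meta["linktype_id"] = int.from_bytes(value, "big")
        elif tlv_type == TLV_LINKTYPE_NAME:
            meta["linktype_name"] = value.decode("ascii")
        elif tlv_type == TLV_COMMENT:
            meta["comment"] = value.decode("utf-8")
        elif tlv_type == TLV_TIMESTAMP:
            if length != 8:
                raise ValueError("TIMESTAMP must be 8 bytes")
            meta["timestamp"] = struct.unpack(">q", value)[0]
        # Unknown TLV types are skipped.
    return meta


def make_tlv(tlv_type, value):
    """Encode one TLV using the same assumed layout."""
    return struct.pack(">BH", tlv_type, len(value)) + value


if __name__ == "__main__":
    sample = (
        MAGIC + bytes([1, 0])
        + make_tlv(TLV_LINKTYPE_ID, (1).to_bytes(2, "big"))
        + make_tlv(TLV_LINKTYPE_NAME, b"EN10MB")
        + make_tlv(TLV_COMMENT, b"my script v1.0")
        + make_tlv(TLV_TIMESTAMP, struct.pack(">q", 1654900661))
    )
    major, minor, first_tlv = parse_header(sample)
    print(major, minor, parse_tlvs(sample, first_tlv))
```

Skipping unknown TLV types is one way a later minor version could add
meta-data without breaking older readers, though the message leaves the
exact compatibility rules between minor versions open.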