tcpdump mailing list archives
Re: BPF changes from bsdi (candidate 3.5 release)
From: Guy Harris <guy () alum mit edu>
Date: Thu, 11 Sep 2003 01:38:59 -0700
On Wed, Dec 29, 1999 at 11:57:30AM +0900, itojun () iijlab net wrote:
I've got a copy of bsdi41. bsdi41 includes better IPv6 support in kernel bpf, and as discussed at face-to-face meeting in Washington DC we are thinking of using it. The code itself is BSD-licensed, and I got permission from the author (prb () bsdi com) to integrate it into.
Well, now that Wind River are apparently stopping development of BSD/OS:

    http://alan.clegg.com/WINDRIVER.TXT

should we integrate that code (and try to get its enhancements added to the various BSDs, and publish a spec for the Linux BPF engine developers to use to implement it)? Presumably the permission still applies.

Also, could that let us redo the protochain support in libpcap? A couple of years ago, you said

On Thu, Oct 25, 2001 at 08:32:13AM +0900, itojun () iijlab net wrote:
There is one tricky bit - currently, the code in libpcap to handle platforms with BPF assumes that there's a global variable, set by the BPF code generator, to indicate code generated to chase protocol chains in IPv6 is present; the comment in "pcap-bpf.c" says

more correctly, a global variable to indicate whether BPF optimization code works okay or not. BPF optimization code make some assumption on the BPF code prior to the optimization, and BPF code generated by gen_protochain() does not conform to the assumption.
So what's the assumption it makes? Your message from 1999 at

    http://www.tcpdump.org/lists/workers/1999/msg00034.html

says that the BPF code for protochain makes backward jumps; is the optimizer assuming no backward jumps?

One of the things the BSD/OS people added was a tweak to make a 128-bit BPF_LDX do what appears to be IPv6 protocol chain chasing. Is that feature sufficient to implement the protochain stuff?

If so, and if there aren't any BPF kernel implementations that support backward jumps (or, at least, not any standard ones), perhaps we should merge the BSD/OS changes into the libpcap BPF interpreter, have the code generator use BPF_LDX+BPF_128+BPF_MSH/BPF_LDX+BPF_128+BPF_MSHX for protocol chain chasing, and, if any of the BSDs get that added to their in-kernel BPF interpreters, have the BPF "pcap_setfilter()" do a "uname()" and check whether the OS should support them or not - if so, put the filter into the kernel, otherwise do the filtering in userland. (A similar thing could be done on Linux.)

The 2002 BSDCon paper at

    http://www.usenix.org/events/bsdcon02/full_papers/lidl/lidl_html/index.html

on the BSD/OS packet filter says:

    In order to support IPv6, several other new enhancements were made to the BPF pseudo-machine. Triple length instructions were added. A ``classic'' BPF instruction is normally 64 bits in size: 16 bits of opcode, two 8 bit jump fields, and a 32 bit immediate field. A triple length instruction has 128 bits of additional immediate data (the length of an IPv6 address). A new register, A128, was also added. The load, store, and jump instructions now have 128 bit versions. The scratch memory locations have been expanded to 128 bits, though traditional programs only use the lower 32 bits of each location. An instruction to zero out a scratch memory location (ZMEM) was added.
    Because BPF was not extended to handle 128 bit arithmetic, a new jump instruction was created that allowed for the comparison of the A register to a network address, subject to a netmask. The netmask must be specified as a CIDR style netmask, specifically a count of the number of significant bits in the netmask. ROM locations only have 32 bit values and it is in the ROM that a new destination routing address is passed. Currently it is not possible to use the next-hop routing capability with IPv6.

which describes some of their extensions, although not the 128-bit BPF_LDX.

Here are some notes I have on their extensions, based on looking at the code and the paper.

BPF_TRIPLE (Q instructions are triple-length, and include an extra 128 bits for IPv6 addresses). To quote the paper:

    Triple length instructions were added. A ``classic'' BPF instruction is normally 64 bits in size: 16 bits of opcode, two 8 bit jump fields, and a 32 bit immediate field. A triple length instruction has 128 bits of additional immediate data (the length of an IPv6 address).

In addition to size codes of BPF_B (1 byte), BPF_H (2 bytes), and BPF_W (4 bytes), we have BPF_128 (16 bytes, presumably for IPv6 addresses); these are used with LD, LDX, ST, and STX. To quote the paper:

    The load, store, and jump instructions now have 128 bit versions.

They also added a new A128 register.

Addressing modes for those have, in addition, BPF_ROM and BPF_MSHX.

BPF_ROM loads from or stores into a "PROM" array; that array is passed into "bpf_filter()" as an argument, but it's always NULL, with a length of 0, in "bpf_tap()" and "bpf_mtap()". It's used by their IPFW stuff. To quote the paper:

    The new memory type is called ROM and is an additional memory area to the original BPF memory spaces. The original memory spaces included the packet contents as well as the scratch memory arena.
    While the first implementation did in fact store read only information, the term ROM is now a misnomer as the ROM locations can be modified by the filter. This space, called ``prom'' in the source code, is used to pass ancillary information in and out of the BPF filter.

BPF_MSHX is like BPF_MSH, only it adds the X register. I.e.:

    BPF_LDX+BPF_B+BPF_MSH     X <- 4*(P[k:1]&0xf)
    BPF_LDX+BPF_B+BPF_MSHX    X <- 4*(P[k:1]&0xf) + X

(Hey, does this let us do variable-length link-layer headers more easily? NOT. There's the spill/fill problem....)

If it's BPF_LDX+BPF_128+BPF_MSH{X}, it does IPv6 hackery:

    MSHX sets "k" to "pc->k + 6 + X", and MSH sets it to "pc->k + 6". If "pc->k" (or "pc->k + X") takes us to the beginning of the IPv6 header, then that takes "k" to the "next header" field of the header.

    It then clears the X register, and then fetches the next header type into the A register and:

        if the X register is 0, bumps "k" by 34 and "X" by 40;

        otherwise, gets the header length from the byte after the "next header" field, turns it to bytes, and advances "k" and "X" by that.

    If X is 0, this presumably means the IPv6 header; if it's not 0, it presumably means an extension header.

    If the A register equals the "jt" value, it stops.

    If the A register is IPPROTO_{HOPOPTS, ROUTING, DSTOPTS, FRAGMENT, AH}, it continues.

    Otherwise, it stops.

The net result is that "A" contains the protocol type on which we stopped, and "X" points past the IPv6 header and all the extension headers.

BPF_ABS now applies to stores?

A new jump instruction is BPF_JNET. If not a Q instruction:

    BPF_JMP|BPF_JNET|BPF_K tests whether the AC, when anded with a netmask generated by taking bits 0x00001F00 of "pc->code", shifting them right, and using that as the n in /n, equals pc->k.

    BPF_JMP|BPF_JNET|BPF_X tests whether the AC, when anded with a netmask generated by taking bits 0x00001F00 of "pc->code", shifting them right, and using that as the n in /n, equals the X register when anded with the same mask.
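To make the non-Q BPF_JNET test concrete, here is a minimal C sketch of the comparison as described in the notes above. It is not the BSD/OS code: the names JNET_LEN_MASK, JNET_LEN_SHIFT, prefix_to_mask(), jnet_k() and jnet_x() are made up for illustration, values are treated as host-order 32-bit integers, and a prefix length of 0 is assumed to mean an all-zero mask (the notes don't say what the real code does there, and five bits can't express /32).

```c
#include <stdint.h>

/* Hypothetical names; the notes only say that bits 0x00001F00 of
 * pc->code carry the prefix length n of the /n netmask. */
#define JNET_LEN_MASK  0x1F00u
#define JNET_LEN_SHIFT 8

/* Build a 32-bit netmask with the top n bits set (n is 0..31 here).
 * Assumption: n == 0 yields an empty mask, so everything matches. */
static uint32_t prefix_to_mask(unsigned n)
{
    return n == 0 ? 0 : 0xFFFFFFFFu << (32 - n);
}

/* Sketch of BPF_JMP|BPF_JNET|BPF_K: true if (A & mask) == k. */
static int jnet_k(uint16_t code, uint32_t a, uint32_t k)
{
    uint32_t mask = prefix_to_mask((code & JNET_LEN_MASK) >> JNET_LEN_SHIFT);

    return (a & mask) == k;
}

/* Sketch of BPF_JMP|BPF_JNET|BPF_X: true if (A & mask) == (X & mask). */
static int jnet_x(uint16_t code, uint32_t a, uint32_t x)
{
    uint32_t mask = prefix_to_mask((code & JNET_LEN_MASK) >> JNET_LEN_SHIFT);

    return (a & mask) == (x & mask);
}
```

With a /24 encoded as 24 << 8 in the opcode, jnet_k() on 192.168.1.66 (0xC0A80142) against 192.168.1.0 (0xC0A80100) is true, and against 192.168.2.0 it is false.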
(pc->code) is part of the opcode. Is this some optimization, to let you do the netmask ANDing along with the testing in one instruction?

It does similar things when a Q instruction, with IPv6 addresses. To quote the paper:

    Because BPF was not extended to handle 128 bit arithmetic, a new jump instruction was created that allowed for the comparison of the A register to a network address, subject to a netmask. The netmask must be specified as a CIDR style netmask, specifically a count of the number of significant bits in the netmask.

In addition to the miscellaneous ops TAX and TXA, we have CCC and ZMEM.

BPF_MISC|BPF_CCC does:

    #if defined(IPFW) && defined(KERNEL)
            ipfw_t *fp;

            k = pc->k;
            if (k >= 0 && k < ipfw_nfilters &&
                (fp = ipfw_filters[k]) != NULL)
                    A = fp->filter(fp, (struct mbuf *)p, IPFW_CALL, NULL);
            else
    #endif
                    A = 0;
            continue;
    }

To quote the paper:

    The new BPF instruction, CCC, enables the calling of a filter on the ``call filter chain.'' While it might seem that the acronym stands for ``Call Call Chain,'' it was actually derived from ``Call Circuit Cache.'' The circuit cache was the reason for the creation of the call chain. The CCC instruction returns the result of the call in the A register.

BPF_MISC|BPF_ZMEM does:

    bzero(&mem[pc->k], sizeof(mem[pc->k]));

which zeroes out all *128* bits of a memory word (yes, they're now 128 bits). To quote the paper:

    An instruction to zero out a scratch memory location (ZMEM) was added.

-
This is the TCPDUMP workers list. It is archived at
    http://www.tcpdump.org/lists/workers/index.html
To unsubscribe use mailto:tcpdump-workers-request () tcpdump org?body=unsubscribe
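Going back to BPF_LDX+BPF_128+BPF_MSH{X}: the header walk described earlier in this message can be sketched in C roughly as follows. This is only my reading of the behavior, not the BSD/OS code. The names chase_ipv6_chain(), struct chase_result and the SK_* constants are hypothetical, the bounds checks are an addition, and "turns it to bytes" is assumed to mean (len + 1) * 8, which is how most IPv6 extension headers encode their length; AH and fragment headers encode it differently, and the real interpreter may or may not special-case them.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the headers the walk continues through. */
enum { SK_HOPOPTS = 0, SK_ROUTING = 43, SK_FRAGMENT = 44,
       SK_AH = 51, SK_DSTOPTS = 60 };

/* Hypothetical result type: a and x stand in for the A and X registers. */
struct chase_result {
    uint32_t a;   /* protocol type on which the walk stopped */
    uint32_t x;   /* offset past the IPv6 header and extension headers */
    int ok;       /* 0 if the walk ran off the end of the packet */
};

static int is_chained_header(uint32_t proto)
{
    return proto == SK_HOPOPTS || proto == SK_ROUTING ||
           proto == SK_DSTOPTS || proto == SK_FRAGMENT || proto == SK_AH;
}

/* ip6_off plays the role of pc->k (or pc->k + X for MSHX);
 * stop_proto plays the role of the "jt" value. */
static struct chase_result
chase_ipv6_chain(const uint8_t *p, size_t len, size_t ip6_off,
                 uint32_t stop_proto)
{
    struct chase_result r = { 0, 0, 0 };   /* X is cleared before the walk */
    size_t k = ip6_off + 6;                /* "next header" field */

    for (;;) {
        if (k + 1 >= len)
            return r;                      /* truncated packet */
        r.a = p[k];                        /* next header type into A */
        if (r.x == 0) {
            /* Fixed IPv6 header: skip the rest of its 40 bytes. */
            k += 34;
            r.x += 40;
        } else {
            /* Extension header: the length byte follows "next header".
             * Assumption: length is in 8-byte units, not counting the
             * first 8 bytes. */
            size_t hlen = ((size_t)p[k + 1] + 1) * 8;

            k += hlen;
            r.x += (uint32_t)hlen;
        }
        if (r.a == stop_proto || !is_chained_header(r.a)) {
            r.ok = 1;
            return r;
        }
    }
}
```

For a packet with a hop-by-hop options header of 8 bytes followed by TCP, this stops with a = 6 and x = 48; with no extension headers it stops after the first step with x = 40.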