
Re: BPF changes from bsdi (candidate 3.5 release)


From: Guy Harris <guy () alum mit edu>
Date: Thu, 11 Sep 2003 01:38:59 -0700

On Wed, Dec 29, 1999 at 11:57:30AM +0900, itojun () iijlab net wrote:
      I've got a copy of bsdi41.  bsdi41 includes better IPv6 support in
      kernel bpf, and, as discussed at the face-to-face meeting in
      Washington DC, we are thinking of using it.
      The code itself is BSD-licensed, and I got permission from the author
      (prb () bsdi com) to integrate it.

Well, now that Wind River are apparently stopping development of BSD/OS:

        http://alan.clegg.com/WINDRIVER.TXT

should we integrate that code (and try to get its enhancements added to
the various BSDs, and publish a spec for the Linux BPF engine developers
to use to implement it)?

Presumably the permission still applies.

Also, could that let us redo the protochain support in libpcap?  A
couple of years ago, you said

On Thu, Oct 25, 2001 at 08:32:13AM +0900, itojun () iijlab net wrote,
in reply to my note that there's one tricky bit - the code in libpcap
to handle platforms with BPF assumes that there's a global variable,
set by the BPF code generator, to indicate that code generated to
chase protocol chains in IPv6 is present - and to the comment in
"pcap-bpf.c" about it:

      more correctly, a global variable to indicate whether the BPF
      optimization code works okay or not.  The BPF optimization code makes
      some assumptions about the BPF code prior to the optimization, and
      BPF code generated by gen_protochain() does not conform to those
      assumptions.

So what are the assumptions it makes?

Your message from 1999 at

        http://www.tcpdump.org/lists/workers/1999/msg00034.html

says that the BPF code for protochain makes backward jumps; is the
optimizer assuming no backward jumps?

One of the things the BSD/OS people added was a tweak to make a 128-bit
BPF_LDX do what appears to be IPv6 protocol chain chasing.  Is that
feature sufficient to implement the protochain stuff?  If so, and if
there aren't any BPF kernel implementations that support backward jumps
(or, at least, no standard ones), perhaps we should merge the BSD/OS
changes into the libpcap BPF interpreter and have the code generator
use BPF_LDX+BPF_128+BPF_MSH or BPF_LDX+BPF_128+BPF_MSHX for protocol
chain chasing.  Then, if any of the BSDs get that added to their
in-kernel BPF interpreters, the BPF version of "pcap_setfilter()" could
do a "uname()" and check whether the OS supports those instructions -
if so, put the filter into the kernel, otherwise do the filtering in
userland.  (A similar thing could be done on Linux.)
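
For instance, something like this (a minimal sketch; the OS name test
and the assumption that BSD/OS 4.1 and later have the extensions are
mine, not anything BSD/OS publishes as an interface):

        #include <string.h>
        #include <sys/utsname.h>

        /*
         * Guess whether the running kernel's BPF interpreter has the
         * BSD/OS 128-bit extensions.  The "BSD/OS 4.1 and later" test
         * is an assumption, and comparing release strings with
         * strcmp() is crude - this is a sketch, not production code.
         */
        static int
        kernel_has_bpf128(void)
        {
                struct utsname un;

                if (uname(&un) < 0)
                        return (0);     /* play it safe: filter in userland */
                return (strcmp(un.sysname, "BSD/OS") == 0 &&
                    strcmp(un.release, "4.1") >= 0);
        }

If that returns 0, "pcap_setfilter()" would keep the filter in userland
instead of handing it to the kernel.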

The 2002 BSDCon paper at

        http://www.usenix.org/events/bsdcon02/full_papers/lidl/lidl_html/index.html

on the BSD/OS packet filter says:

        In order to support IPv6, several other new enhancements were
        made to the BPF pseudo-machine.  Triple length instructions were
        added.  A ``classic'' BPF instruction is normally 64 bits in
        size: 16 bits of opcode, two 8 bit jump fields, and a 32 bit
        immediate field.  A triple length instruction has 128 bits of
        additional immediate data (the length of an IPv6 address).  A
        new register, A128, was also added.  The load, store, and jump
        instructions now have 128 bit versions.  The scratch memory
        locations have been expanded to 128 bits, though traditional
        programs only use the lower 32 bits of each location.  An
        instruction to zero out a scratch memory location (ZMEM) was
        added.  Because BPF was not extended to handle 128 bit
        arithmetic, a new jump instruction was created that allowed for
        the comparison of the A register to a network address, subject
        to a netmask.  The netmask must be specified as a CIDR style
        netmask, specifically a count of the number of significant bits
        in the netmask.

        ROM locations only have 32 bit values and it is in the ROM that
        a new destination routing address is passed.  Currently it is
        not possible to use the next-hop routing capability with IPv6.

which describes some of their extensions, although not the 128-bit
BPF_LDX.

Here are some notes I have on their extensions, based on looking at the
code and the paper.

BPF_TRIPLE (Q instructions are triple-length, and include an extra 128
bits for IPv6 addresses).  To quote the paper:

        Triple length instructions were added.  A ``classic'' BPF
        instruction is normally 64 bits in size: 16 bits of opcode, two
        8 bit jump fields, and a 32 bit immediate field.  A triple
        length instruction has 128 bits of additional immediate data
        (the length of an IPv6 address).
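
In other words, something like this (a guess at the layout from the
paper's description; the "k128" name is mine, not theirs):

        #include <stdint.h>

        /* A classic BPF instruction: 64 bits. */
        struct bpf_insn {
                uint16_t code;          /* opcode */
                uint8_t  jt;            /* jump-if-true offset */
                uint8_t  jf;            /* jump-if-false offset */
                uint32_t k;             /* generic immediate field */
        };

        /*
         * A guess at a triple-length ("Q") instruction: a classic
         * instruction followed by 128 bits of additional immediate
         * data - enough for an IPv6 address.
         */
        struct bpf_insn_q {
                struct bpf_insn insn;
                uint32_t        k128[4];        /* extra immediate data */
        };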

In addition to size codes of BPF_B (1 byte), BPF_H (2 bytes), and BPF_W
(4 bytes), we have BPF_128 (16 bytes, presumably for IPv6 addresses);
these are used with LD, LDX, ST, and STX.  To quote the paper:

        The load, store, and jump instructions now have 128 bit
        versions.

They also added a new A128 register.
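
So presumably the interpreter state grows along these lines, with a
128-bit load doing roughly the following (all names are mine, and I'm
ignoring whatever byte-order handling their code does):

        #include <stdint.h>
        #include <string.h>

        /* 128 bits as four 32-bit words - the layout is my guess. */
        typedef struct {
                uint32_t w[4];
        } bpf_u_int128;

        static uint32_t     A;          /* classic 32-bit accumulator */
        static uint32_t     X;          /* classic index register */
        static bpf_u_int128 A128;       /* the new 128-bit accumulator */

        /* BPF_LD|BPF_128|BPF_ABS, roughly: A128 <- P[k:16]. */
        static void
        ld128_abs(const uint8_t *p, uint32_t k)
        {
                memcpy(&A128, p + k, sizeof(A128));     /* bounds check omitted */
        }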

In addition to the classic addressing modes, those instructions have
BPF_ROM and BPF_MSHX.

    BPF_ROM loads from or stores into a "PROM" array; that array is
    passed into "bpf_filter()" as an argument, but it's always NULL,
    with a length of 0, in "bpf_tap()" and "bpf_mtap()".  It's used by
    their IPFW stuff.  To quote the paper:

        The new memory type is called ROM and is an additional memory
        area to the original BPF memory spaces.  The original memory
        spaces included the packet contents as well as the scratch
        memory arena.  While the first implementation did in fact store
        read only information, the term ROM is now a misnomer as the ROM
        locations can be modified by the filter.  This space, called
        ``prom'' in the source code, is used to pass ancillary information
        in and out of the BPF filter.

    BPF_MSHX is like BPF_MSH, only it adds the X register.  I.e.:

        BPF_LDX+BPF_B+BPF_MSH   X <- 4*(P[k:1]&0xf)

        BPF_LDX+BPF_B+BPF_MSHX  X <- 4*(P[k:1]&0xf) + X

    (Hey, does this let us do variable-length link-layer headers more
    easily?  NOT.  There's the spill/fill problem....)

    If it's BPF_LDX+BPF_128+BPF_MSH{X}, it does IPv6 hackery:

        MSHX sets "k" to "pc->k + 6 + X", and MSH sets it to "pc->k +
        6".  If "pc->k" (or "pc->k + X") takes us to the beginning of
        the IPv6 header, then that takes "k" to the "next header" field
        of the header.  It then clears the X register, and then fetches
        the next header type into the A register and:

            if the X register is 0, bumps "k" by 34 and "X" by 40;

            otherwise, gets the header length from the byte after
            the "next header" field, turns it to bytes, and advances
            "k" and "X" by that.

        If X is 0, this presumably means the IPv6 header; if it's not 0,
        it presumably means an extension header.

        If the A register equals the "jt" value, it stops.

        If the A register is IPPROTO_{HOPOPTS, ROUTING, DSTOPTS,
        FRAGMENT, AH}, it continues.

        Otherwise, it stops.

        The net result is that "A" contains the protocol type on which
        we stopped, and "X" points past the IPv6 header and all the
        extension headers.
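
    If I'm reading that correctly, the equivalent C is something like
    this (a sketch of my reading, not their code; bounds checks are
    omitted, and the "(len + 1) * 8" conversion is the usual IPv6
    extension-header rule, which I'm assuming is what they do):

        #include <stdint.h>
        #include <netinet/in.h>         /* IPPROTO_* */

        /*
         * Walk the IPv6 extension-header chain in packet p[].  "k"
         * starts at the IPv6 header's "next header" field (pc->k + 6,
         * or pc->k + 6 + X for MSHX); "jt" is the protocol to stop
         * on.  On return, *a_out is the protocol we stopped on and
         * *x_out is the offset past all the headers walked.
         */
        static void
        chase_protochain(const uint8_t *p, uint32_t k, uint32_t jt,
            uint32_t *a_out, uint32_t *x_out)
        {
                uint32_t A, X = 0;

                for (;;) {
                        A = p[k];       /* next header type */
                        if (X == 0) {
                                /* the fixed 40-byte IPv6 header */
                                k += 34;        /* to the first extension
                                                   header's "next header" */
                                X += 40;
                        } else {
                                /* extension header: length byte follows
                                   the "next header" byte */
                                uint32_t len = ((uint32_t)p[k + 1] + 1) * 8;

                                k += len;
                                X += len;
                        }
                        if (A == jt)
                                break;  /* found what we wanted */
                        if (A != IPPROTO_HOPOPTS && A != IPPROTO_ROUTING &&
                            A != IPPROTO_DSTOPTS && A != IPPROTO_FRAGMENT &&
                            A != IPPROTO_AH)
                                break;  /* not an extension header */
                }
                *a_out = A;
                *x_out = X;
        }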

BPF_ABS now applies to stores?

A new jump instruction is BPF_JNET.

    If not a Q instruction:

        BPF_JMP|BPF_JNET|BPF_K tests whether the A register, when ANDed
        with a netmask generated by taking bits 0x00001F00 of
        "pc->code", shifting them right, and using the result as the n
        in /n, equals "pc->k".

        BPF_JMP|BPF_JNET|BPF_X tests whether the A register, when ANDed
        with that same netmask, equals the X register ANDed with the
        same mask.

        ("pc->code" is the opcode word itself, so the prefix length is
        encoded directly in the opcode.  Is this some optimization, to
        let you do the netmask ANDing along with the testing in one
        instruction?)

    It does similar things when a Q instruction, with IPv6 addresses.

    To quote the paper:

        Because BPF was not extended to handle 128 bit arithmetic, a new
        jump instruction was created that allowed for the comparison of
        the A register to a network address, subject to a netmask.  The
        netmask must be specified as a CIDR style netmask, specifically
        a count of the number of significant bits in the netmask.
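
    In C, my reading of the non-Q BPF_K case is something like this (a
    sketch; the ">> 8" and the mask construction are my guesses from
    the description above):

        #include <stdint.h>

        /*
         * BPF_JMP|BPF_JNET|BPF_K, roughly: the CIDR prefix length n
         * lives in bits 0x00001F00 of the opcode word.  Whether n == 0
         * means /0 or /32, I can't tell from the description.
         */
        static int
        jnet_match(uint16_t code, uint32_t A, uint32_t k)
        {
                uint32_t n = (code & 0x00001F00) >> 8;  /* 0..31 */
                uint32_t mask = (n == 0) ? 0 : (0xffffffffU << (32 - n));

                return ((A & mask) == (k & mask));
        }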

In addition to the miscellaneous ops TAX and TXA, we have CCC and ZMEM.

    BPF_MISC|BPF_CCC does (this is the body of the corresponding case
    in their interpreter's switch statement):

#if     defined(IPFW) && defined(KERNEL)
                        ipfw_t *fp;

                        k = pc->k;
                        if (k >= 0 && k < ipfw_nfilters &&
                            (fp = ipfw_filters[k]) != NULL)
                                A = fp->filter(fp, (struct mbuf *)p,
                                    IPFW_CALL, NULL);
                        else
#endif
                                A = 0;
                        continue;

    To quote the paper:

        The new BPF instruction, CCC, enables the calling of a filter on
        the ``call filter chain.'' While it might seem that the acronym
        stands for ``Call Call Chain,'' it was actually derived from
        ``Call Circuit Cache.'' The circuit cache was the reason for the
        creation of the call chain.  The CCC instruction returns the
        result of the call in the A register.

    BPF_MISC|BPF_ZMEM does:

        bzero(&mem[pc->k], sizeof(mem[pc->k]));

    which zeroes out all *128* bits of a memory word (yes, they're now
    128 bits).  To quote the paper:

        An instruction to zero out a scratch memory location (ZMEM) was
        added. 
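
    I.e., the scratch memory presumably went from an array of 32-bit
    words to something like this (the layout is my guess):

        #include <stdint.h>
        #include <string.h>

        #define BPF_MEMWORDS 16         /* as in classic BPF */

        /* Scratch memory widened to 128 bits per word. */
        static struct {
                uint32_t w[4];
        } mem[BPF_MEMWORDS];

        /* BPF_MISC|BPF_ZMEM: zero all 128 bits of scratch word k. */
        static void
        zmem(uint32_t k)
        {
                memset(&mem[k], 0, sizeof(mem[k]));
        }
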
-
This is the TCPDUMP workers list. It is archived at
http://www.tcpdump.org/lists/workers/index.html
To unsubscribe use mailto:tcpdump-workers-request () tcpdump org?body=unsubscribe

