Wireshark mailing list archives

Re: RFD: New language to write dissectors


From: Guy Harris <guy () alum mit edu>
Date: Sat, 14 Jul 2012 15:31:06 -0700


On Jul 14, 2012, at 8:26 AM, Jakub Zawadzki wrote:

It'd be great if we had an abstract, pure (no inline C/assembly) language for writing dissectors.

Or "to describe protocols and the way packets for those protocols are displayed" - the languages in question wouldn't 
be as procedural as C/Lua/etc, they'd be more descriptive.

We could invent yet another protocol description language,

...but, as you suggest, we probably shouldn't.

but I was thinking of basing the grammar on netmon NPL [1] or wsgd [2].

Those are probably the two best choices.

I'm not sure it has to be a choice, though - we could implement both, resources permitting, of course.  (And, of 
course, given that there are many already-existing languages that describe protocols - ASN.1, {OSF IDL/MIDL/PIDL} for 
DCE RPC, rpcgen for ONC RPC, CORBA IDL, xcb for X11 - we will probably never have the One True Protocol Description 
Language.)

I'm a bigger fan of NPL (sorry Olivier); the nmparsers project has a large collection of dissectors[3] 
which we could use (LLTD - bug #6071, Windows USB Port packets - bug #6520, netsh - bug #6694),
but there might exist some legal (patents for grammar/implementation?!) issues.

That would be one concern - even having "our own" language, such as wsgd, runs the risk of infringing a patent, but, 
well, *writing software of just about any sort* runs the risk of infringing a patent; however, we're dealing with a 
large corporation in the case of NPL, so there's probably a greater risk that some or all of it is covered by patents.  
Were Microsoft to explicitly state that there are no patents on NPL-the-language or that they're granting a 
royalty-free license for all implementations (perhaps with a "mutual assured destruction" clause, so that were we to 
patent some feature of Wireshark and sue Microsoft for violating that patent, our license for their patents would 
terminate), and the same applied to any patents they hold on their implementation of NPL that would block independent 
useful implementations, that might help.

With wsgd we could reuse some of the plugin's existing code.

...and we also have more freedom to extend the language, e.g. to support preferences for a protocol - Paul Long's blog 
post says

A common problem: “No silly, we do HTTP traffic on port 8888, not 80 or 8080!”
 
While changing port mappings for protocols could be something revealed in the user interface, we haven’t gotten that 
far in Network Monitor 3.0 yet.  I expect we should address this specific problem on different fronts, i.e. a UI for 
each protocol, and some way to handle dynamic port allocations.  And there are also some heuristics we can use to 
identify protocols as well.  But today, there is a fairly simple way to modify the NPL script for protocols on 
non-standard ports.

I don't know whether, as of 3.4, they support "a UI for each protocol, and some way to handle dynamic port 
allocations", but we already have the infrastructure for that.

NPL also, for strings, offers 3 encodings - to quote the help manual:

This data type extracts a specified number of characters from a sequence of bytes. The characters can be UTF-16, 
UTF-8, or ASCII, depending on the encoding specified.

There's no mention of the Extended Binary-Coded Decimal Interchange Code there, but we have several dissectors using 
ENC_EBCDIC, so that would be another place where we might want to extend NPL were we to use it.
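
As a rough illustration of what such an extension amounts to, here is a minimal Python sketch of a string-field extractor parameterized by encoding. The keyword names ("ascii", "utf8", "utf16le", "ebcdic") and the function extract_string are made up for this example and are not NPL or Wireshark syntax. It also assumes the length is given in bytes rather than characters, and it uses Python's cp037 codec, which is only one of several EBCDIC code pages.

```python
# Hypothetical mapping from description-language encoding keywords to Python
# codec names. The keywords are invented for this sketch; "cp037" is one
# EBCDIC code page among several, chosen here purely as an example.
CODECS = {
    "ascii": "ascii",
    "utf8": "utf-8",
    "utf16le": "utf-16-le",
    "ebcdic": "cp037",
}


def extract_string(data, offset, nbytes, encoding):
    """Decode nbytes of data starting at offset using the named encoding.

    Assumption: the field length is a byte count, not a character count.
    """
    raw = data[offset:offset + nbytes]
    return raw.decode(CODECS[encoding])


if __name__ == "__main__":
    ebcdic_hello = "HELLO".encode("cp037")
    print(extract_string(ebcdic_hello, 0, 5, "ebcdic"))
    print(extract_string(b"xxHELLO", 2, 5, "ascii"))
```

Adding an encoding is then a one-line change to the table, which is the kind of extension a fixed three-encoding language would not allow without changing the language itself.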

Were there an "Open NPL Consortium" of some sort where multiple implementers of NPL could propose extensions, and 
perhaps a way an implementation could offer private extensions without worrying about colliding with other 
implementations or future standards, that might help.

Note, by the way, that having a language of this sort could allow something such as this.

Consider a protocol with the following description (in a C-like protocol description language that I'm making up on the 
fly):

        enum message_type {
                Login = 0,
                Logout = 1,
                Request = 2,
                Response = 3
        };

        struct login {
                ascii string username[16];
                ascii string password[16];
        };

        struct request {
                uint32 bigendian requested_item;
        };

        struct response {
                uint32 bigendian value_size;
                uint8 value[value_size];
        };

        protocol foo {
                uint32 bigendian enum message_type type;
                switch (type) {

                case Login:
                        struct login login;

                case Logout:
                        /* logout message has only a type */

                case Request:
                        struct request request;

                case Response:
                        struct response response;
                }
                uint32 bigendian message_id;
        };

which might translate to (in a pseudo-machine language I'm also making up on the fly):

        uint32 bigendian foo.type saveas x
        switch x:
                0       Login
                1       Logout
                2       Request
                3       Response
        Login:
                ascii string 16 foo.login.username
                ascii string 16 foo.login.password
                goto end
        Logout:
                goto end
        Request:
                uint32 bigendian foo.request.requested_item
                goto end
        Response:
                uint32 bigendian foo.response.value_size saveas y
                uint8 array y foo.response.value
                goto end
        end:
                uint32 bigendian foo.message_id
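
To make the idea of executing that pseudo-machine language concrete, here is a minimal interpreter sketch in Python. Everything about the encoding is an assumption made for illustration: the tuple layout, the opcode names ("u32be", "ascii", "bytes", "switch", "goto", "label"), and the choice to strip trailing NULs from fixed-length strings. It does no bounds checking and simply raises on an unknown message type.

```python
import struct

# Hypothetical tuple encoding of the listing above; opcode names are invented.
PROGRAM = [
    ("u32be", "foo.type", "x"),                    # read field, save in x
    ("switch", "x", {0: "Login", 1: "Logout", 2: "Request", 3: "Response"}),
    ("label", "Login"),
    ("ascii", 16, "foo.login.username"),
    ("ascii", 16, "foo.login.password"),
    ("goto", "end"),
    ("label", "Logout"),
    ("goto", "end"),
    ("label", "Request"),
    ("u32be", "foo.request.requested_item", None),
    ("goto", "end"),
    ("label", "Response"),
    ("u32be", "foo.response.value_size", "y"),
    ("bytes", "y", "foo.response.value"),          # length comes from y
    ("goto", "end"),
    ("label", "end"),
    ("u32be", "foo.message_id", None),
]


def run(program, data):
    """Interpret program over data; return a list of (field, value) pairs."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    regs, fields, offset, pc = {}, [], 0, 0
    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        pc += 1
        if op == "u32be":
            _, name, save = ins
            (value,) = struct.unpack_from(">I", data, offset)
            offset += 4
            fields.append((name, value))
            if save is not None:
                regs[save] = value
        elif op == "ascii":
            _, length, name = ins
            raw = data[offset:offset + length]
            offset += length
            # Assumption: fixed-length strings are NUL-padded.
            fields.append((name, raw.rstrip(b"\0").decode("ascii")))
        elif op == "bytes":
            _, reg, name = ins
            length = regs[reg]
            fields.append((name, data[offset:offset + length]))
            offset += length
        elif op == "switch":
            _, reg, table = ins
            pc = labels[table[regs[reg]]]   # KeyError on unknown type
        elif op == "goto":
            pc = labels[ins[1]]
        elif op != "label":
            raise ValueError("unknown opcode: %r" % (op,))
    return fields


if __name__ == "__main__":
    # A Response message: type 3, a 2-byte value, then the message ID.
    msg = struct.pack(">II", 3, 2) + b"\xaa\xbb" + struct.pack(">I", 0x4073)
    for name, value in run(PROGRAM, msg):
        print(name, value)
```

The interpreter loop is small, which is the property that matters for the interpretation-versus-translation trade-off discussed further down.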

Now consider a dissection pass being done for a display filter "foo.message_id == 0x4073".  That full "compiled" 
program is overkill; that dissection pass might optimize it into

        uint32 bigendian foo.type saveas x
        switch x:
                0       Login
                1       Logout
                2       Request
                3       Response
        Login:
                skipbytes 32
                goto end
        Logout:
                goto end
        Request:
                skipbytes 4
                goto end
        Response:
                uint32 bigendian foo.response.value_size saveas y
                skipbytes y
                goto end
        end:
                uint32 bigendian foo.message_id

and, for that dissection pass, run that optimized version of the dissection "machine code" for the foo protocol, and 
similarly optimized versions of the dissection code for the other protocols involved.  The optimized versions of the dissection "machine code" might be 
generated as needed (rather than generating optimized versions for every protocol, just generate them from the base 
code the first time we try to run the code) and cached with the cache key being the set of fields in which the 
dissection in question was interested (whether because they're being used in a filter or for a column or in "-e 
{field}" in TShark or...).
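
The pruning and caching described above might be sketched as follows in Python. This is only an illustration under stated assumptions: the tuple encoding of the "machine code", the opcode names, the "skip" instruction, and the functions prune and optimized are all invented here. The pass keeps any instruction whose field is wanted or whose value is saved for later use, turns everything else into a skip, and merges adjacent constant skips. The cache key is the protocol name plus the frozen set of wanted fields.

```python
# Hypothetical tuple encoding of the unoptimized foo program; opcode names
# and layout are invented for this sketch.
FOO = [
    ("u32be", "foo.type", "x"),
    ("switch", "x", {0: "Login", 1: "Logout", 2: "Request", 3: "Response"}),
    ("label", "Login"),
    ("ascii", 16, "foo.login.username"),
    ("ascii", 16, "foo.login.password"),
    ("goto", "end"),
    ("label", "Logout"),
    ("goto", "end"),
    ("label", "Request"),
    ("u32be", "foo.request.requested_item", None),
    ("goto", "end"),
    ("label", "Response"),
    ("u32be", "foo.response.value_size", "y"),
    ("bytes", "y", "foo.response.value"),
    ("goto", "end"),
    ("label", "end"),
    ("u32be", "foo.message_id", None),
]


def prune(program, wanted):
    """Replace reads of unwanted fields with skips; merge constant skips."""
    out = []
    for ins in program:
        op = ins[0]
        if op == "u32be" and ins[1] not in wanted and ins[2] is None:
            ins = ("skip", 4)            # value is neither shown nor saved
        elif op == "ascii" and ins[2] not in wanted:
            ins = ("skip", ins[1])       # constant length
        elif op == "bytes" and ins[2] not in wanted:
            ins = ("skip", ins[1])       # length held in a register
        if (ins[0] == "skip" and isinstance(ins[1], int) and out
                and out[-1][0] == "skip" and isinstance(out[-1][1], int)):
            out[-1] = ("skip", out[-1][1] + ins[1])
        else:
            out.append(ins)
    return out


_cache = {}


def optimized(proto_name, program, wanted):
    """Return the pruned program, generating it only on first use."""
    key = (proto_name, frozenset(wanted))
    if key not in _cache:
        _cache[key] = prune(program, wanted)
    return _cache[key]


if __name__ == "__main__":
    for ins in optimized("foo", FOO, {"foo.message_id"}):
        print(ins)
```

A real pass would also need to work out which saved values are actually used later; this sketch conservatively keeps every instruction that saves a value.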

This would allow us to get some of the effect of

        if (tree) {
                ...
        }

without leaving it up to humans to get it right (which humans often don't), and allow us to do more such optimization 
as well (as it's not just "do I need a protocol tree?", it's "do I need anything other than these few fields and 
whatever fields are necessary to get at those fields").

(It also raises the question of whether interpreted execution of that "machine code" or translation to C or machine 
language will be faster - interpreted execution *could* result in a smaller cache footprint if the interpreter is small 
enough and the code "high-level" enough to be fairly dense, although it does involve difficult-at-best-to-predict 
branches in the interpretive loop.)

Of course, this would allow people to extend Wireshark without needing any C developer tools, and would reduce the need 
for stability in the dissector core code.  Translating to a "machine code" of the sort shown above might also 
significantly reduce compile time (maybe with support for the CORBA IDL, building Parlay support won't dim the lights 
:-)), and if those are all loaded at startup time, it might make it easier to build configurations of Wireshark that 
don't have Every Single Protocol Known To Man and that thus start up more quickly.

On the other hand, it might also allow protocol descriptions to be shipped either in source form or binary form with 
restrictions on redistribution, providing a way to "get around the GPL" for protocols.  Some might consider that a 
feature (I seem to remember many years ago Cisco raised this issue about some protocols) and others might consider it a 
bug.  If we end up with a consensus of "it's a bug", we might be able to extend the protections of the GPL to dissector 
descriptions fed to the interpreter, so that if you make a "compiled" protocol description available, you must also 
make the source available to recipients and must give recipients the right to redistribute the source or binaries.

