tcpdump mailing list archives

Re: Proposed new pcap format


From: Christian Kreibich <christian () whoop org>
Date: Wed, 14 Apr 2004 12:21:40 -0700

On Wed, 2004-04-14 at 00:06, Jefferson Ogata wrote:

> I'm suggesting the pcap storage format be XML. A raw capture, without using
> protocol dissectors, would just be a sequence of base64-encoded (perhaps) frames
> and metadata.

But once you're storing raw base64-encoded data (or whatever encoding),
you lose the benefit of XML-enabled apps being able to understand what's
contained.
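
To make that concrete, here is roughly how I read the raw format -- a
sketch, with invented element and attribute names:

import base64
import xml.etree.ElementTree as ET

def frame_to_xml(raw, ts_sec, ts_usec, linktype):
    # Wrap one captured frame in an XML element. The payload is
    # base64-encoded, so to a generic XML tool it is just an opaque
    # text blob -- nothing in it to query or transform.
    el = ET.Element("frame", {
        "ts": "%d.%06d" % (ts_sec, ts_usec),
        "linktype": str(linktype),    # 1 = Ethernet (DLT_EN10MB)
        "caplen": str(len(raw)),
    })
    el.text = base64.b64encode(raw).decode("ascii")
    return el

pkt = bytes.fromhex("ffffffffffff00102233445508060001")
print(ET.tostring(frame_to_xml(pkt, 1081970500, 0, 1)).decode())

An XML-enabled app can read that element, but it can't tell an ARP
request from line noise without decoding the blob first.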

> Tools like the tcpdump protocol dissectors and tethereal could then just be XML
> filters that take a raw XML input frame and annotate it with protocol elements,
> as in the rough example I posted. Existing XML tools, e.g. xsltproc, could
> generate reports from the annotated XML using XSLT. The reports could as easily
> be HTML output as plain text or more XML.

I really doubt that a feature like HTML output is what the majority of
pcap users need ...
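
Granted, the transform machinery itself is simple enough. A sketch of
such a report stylesheet -- invented element names, and lxml standing
in for xsltproc:

from lxml import etree

# A minimal stylesheet that turns annotated frames into a
# one-line-per-packet text report.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="frame">
    <xsl:value-of select="eth/@src"/> -> <xsl:value-of select="eth/@dst"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.XML(XSL))
capture = etree.XML(b'<capture><frame><eth src="00:10:22:33:44:55" '
                    b'dst="ff:ff:ff:ff:ff:ff"/></frame></capture>')
print(str(transform(capture)), end="")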

> Additional protocol dissectors for protocols unknown to tcpdump/tethereal could
> be written in any language with XML support (preferably event-based). In fact,
> many protocol analyzers could be written directly in XSLT/XPath and processed
> using xsltproc. Among other things, this provides many means to eliminate the
> continuing problem of buffer overflows. tcpdump could have a plugin architecture
> with an XML filter for each protocol/frame type.

Well, I think to be consistent you'd have to make those pcap plugins
(since pcap is the component writing out the trace files). And if at
any point you want a plugin to convert base64-encoded raw data into
structured XML, I don't see how that prevents buffer overflows. Sure,
as long as you stay within the XML world, that problem is reduced.
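
That conversion step is ordinary binary parsing whether the output is
XML or not -- something like this, again with invented element names:

import base64, struct
import xml.etree.ElementTree as ET

def annotate_ethernet(frame_el):
    # Decode the opaque payload and attach a structured <eth> child.
    # This is exactly the place that needs the same bounds checking
    # as any binary dissector.
    raw = base64.b64decode(frame_el.text or "")
    if len(raw) < 14:           # Ethernet header is 14 bytes
        return                  # truncated frame: leave it unannotated
    ethertype = struct.unpack("!H", raw[12:14])[0]
    ET.SubElement(frame_el, "eth", {
        "dst": raw[0:6].hex(":"),
        "src": raw[6:12].hex(":"),
        "type": "0x%04x" % ethertype,
    })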

> I'm suggesting that we use XML as the capture file format so that tcpdump
> becomes an extensible XML filter.

I also believe that the performance hit of parsing the packet data into
XML before writing the packets out will be too high for applications
that want to get packets to disk as quickly as possible. And if, in
that case, you turn off the analyzers and output raw base64, you lose
all the benefits anyway.
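
A quick, unscientific sketch of the two write paths (not a benchmark of
any real tool; the exact ratio will vary):

import base64, struct, time
import xml.etree.ElementTree as ET

pkt = bytes(1500)                  # one full-size Ethernet payload
N = 100_000

t0 = time.perf_counter()
for _ in range(N):                 # binary path: fixed header + raw bytes
    rec = struct.pack("<IIII", 0, 0, len(pkt), len(pkt)) + pkt
t1 = time.perf_counter()
for _ in range(N):                 # XML path: base64 plus serialization
    el = ET.Element("frame", {"caplen": str(len(pkt))})
    el.text = base64.b64encode(pkt).decode("ascii")
    rec = ET.tostring(el)
t2 = time.perf_counter()

print("binary %.2fs, xml %.2fs" % (t1 - t0, t2 - t1))

On top of the CPU cost, base64 alone inflates the stored payload by a
third before any tags are added.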

> Or you can throw all that musing away. Just pay attention to the discussion for
> a little while --

Hey I am :)

> it revolves around timestamp and metadata formats, sizes of
> fields, and other esoterica that are sounding a bit archaic in today's computing
> environment. I think we should take a hard look at whether it's really
> appropriate to define yet another hard binary file format when XML can provide
> the same functionality with modest storage overhead, and has many added benefits.

Trust me, I am not one of the default hardcore XML haters, but I don't
see why a tagged binary format isn't enough in the case at hand. If
somebody finds that some hash-value bitfield isn't large enough, they
can define a new tag type -- I don't see a problem there; you can
always encode length values in the tag headers to keep things flexible
in the first place.
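
For illustration, the kind of tagged layout I mean -- type, length,
value, with the field widths invented here:

import struct

def write_block(f, block_type, payload):
    # type (u32) | length (u32) | payload. A reader that doesn't know
    # a type just seeks past 'length' bytes, so new block types don't
    # break old tools.
    f.write(struct.pack("<II", block_type, len(payload)))
    f.write(payload)

def read_blocks(f):
    while True:
        hdr = f.read(8)
        if len(hdr) < 8:
            return                 # clean EOF (or a truncated trailer)
        block_type, length = struct.unpack("<II", hdr)
        yield block_type, f.read(length)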

I don't like the idea of XML as the lowest common denominator for a
capture format -- as a processing-stage output it sounds great to me.

Regards,
Christian.
-- 
________________________________________________________________________
                                          http://www.cl.cam.ac.uk/~cpk25
                                                    http://www.whoop.org

