Firewall Wizards mailing list archives

Re: Recording slow scans


From: "Marcus J. Ranum" <mjr () nfr net>
Date: Wed, 07 Oct 1998 13:47:21 -0400

Stephen P. Berry wrote:
The major moving part in my roll-your-own IDS is tcpdump(8).  I suspect
this is probably true of most home-grown IDS solutions.  It also
appears to be true of, for example, the SHADOW project.

Lots of folks use tcpdump. Depending on the platform you're
running it on, take its results with a grain or two of salt.
We've observed busy networks where tcpdump reports zero packets
lost, yet network analyzers and NFRs see more traffic than
tcpdump did. Hmmmm.... :)  Just an FYI. Solaris in particular
was not so hot in this regard.
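
For what it's worth, that "0 packets dropped" line is just whatever
the platform's capture layer admits to via pcap_stats(); packets
lost before that layer ever counts them don't show up at all. If
you're rolling your own on top of libpcap, it's worth pulling the
counters yourself rather than trusting the exit summary. Roughly
(just a sketch):

/* Sketch: ask libpcap for its own counters instead of trusting the
 * summary a tool prints at exit.  ps_drop is only what the kernel
 * capture layer *knows* it dropped; on some platforms that is an
 * undercount. */
#include <pcap.h>
#include <stdio.h>

void report_capture_stats(pcap_t *p)
{
    struct pcap_stat st;

    if (pcap_stats(p, &st) < 0) {
        fprintf(stderr, "pcap_stats: %s\n", pcap_geterr(p));
        return;
    }
    printf("received by filter: %u\n", st.ps_recv);
    printf("dropped by kernel:  %u\n", st.ps_drop);
}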

This leads to the first big performance bottleneck:  filtering.
If you want to test the datastream against n filters, you've got
to run your data through tcpdump n times[2].  This is where my IDS
spends most of its cycles.

Well, obviously, that's the design flaw: you're using a piece of
filtering software that's built to evaluate only a single
condition efficiently! My experience is that tcpdump is great
for limited stuff, but then it hits a wall pretty fast, design-wise.
Of course, when I design stuff, I tend to overkill on conceptual
openness. :)

What I want to do is twiddle libpcap
to take n filters on the command line (foo, bar, baz...) and spit the
results of each filter into a separate file (foo.dump, bar.dump,
baz.dump...), but that's not going to happen until I get a few
spare cycles myself.  Anyone else know of anything already out there
that'll do this sort of thing, preferably available under GPL (or
something similar)?

By the time you've done that, you'll have wound up writing your
own NFR, or Bro, or argus, or NNstat.
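
(The splitting step itself isn't much code; against a current
libpcap it comes out to roughly the sketch below, where the filter
expressions, file names, and lack of error recovery are all just
placeholders. It's everything you bolt on after that step that
grows into an NFR or a Bro.)

/* Sketch: read one savefile, test each packet against N compiled
 * BPF filters in a single pass, and write each match to that
 * filter's own dump file (foo.dump, bar.dump, baz.dump...). */
#include <pcap.h>
#include <stdio.h>

#define NFILT 3

int main(void)
{
    /* Hypothetical filter expressions and output names. */
    const char *expr[NFILT] = { "tcp port 25", "udp port 53", "icmp" };
    const char *name[NFILT] = { "foo.dump", "bar.dump", "baz.dump" };
    char errbuf[PCAP_ERRBUF_SIZE];
    struct bpf_program prog[NFILT];
    pcap_dumper_t *dump[NFILT];
    struct pcap_pkthdr *hdr;
    const u_char *pkt;
    pcap_t *p;
    int i;

    if ((p = pcap_open_offline("trace.pcap", errbuf)) == NULL) {
        fprintf(stderr, "pcap_open_offline: %s\n", errbuf);
        return 1;
    }
    for (i = 0; i < NFILT; i++) {
        /* Compile each expression once, up front. */
        if (pcap_compile(p, &prog[i], expr[i], 1, 0) < 0) {
            fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
            return 1;
        }
        if ((dump[i] = pcap_dump_open(p, name[i])) == NULL) {
            fprintf(stderr, "pcap_dump_open: %s\n", pcap_geterr(p));
            return 1;
        }
    }

    /* One pass over the data: every packet is tested against every
     * filter, and matches go to that filter's dump file. */
    while (pcap_next_ex(p, &hdr, &pkt) == 1)
        for (i = 0; i < NFILT; i++)
            if (pcap_offline_filter(&prog[i], hdr, pkt))
                pcap_dump((u_char *)dump[i], hdr, pkt);

    for (i = 0; i < NFILT; i++)
        pcap_dump_close(dump[i]);
    pcap_close(p);
    return 0;
}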

Of course I could get around this if I wanted to boil my data
down prior to the initial filtering.

Right. That's why just about every IDS/monitoring system (including
NFR, Argus, Bro, etc.) has a tiered approach internally, wherein
the traffic is manipulated into an accessible form and then
multiple rules are applied against that form.
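
In other words: pay the decode cost once, then make every rule a
cheap test against the decoded record. Something like the following,
with the record fields and the two example rules made up purely for
illustration:

/* Sketch of the tiering: decode each packet once into a neutral
 * record, then apply any number of rules to that record. */
#include <stddef.h>
#include <stdio.h>

struct pkt_rec {                 /* the "accessible form" */
    unsigned long  src, dst;     /* addresses, host byte order */
    unsigned short sport, dport;
    unsigned char  proto, tcp_flags;
};

typedef int (*rule_fn)(const struct pkt_rec *);

static int rule_telnet(const struct pkt_rec *r)
{
    return r->proto == 6 && r->dport == 23;
}

static int rule_null_scan(const struct pkt_rec *r)
{
    return r->proto == 6 && r->tcp_flags == 0;
}

static rule_fn rules[] = { rule_telnet, rule_null_scan };

void apply_rules(const struct pkt_rec *r)
{
    size_t i;

    /* The expensive work (header parsing) happened once, upstream;
     * each rule is now just a cheap test against the record. */
    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (rules[i](r))
            printf("rule %lu matched\n", (unsigned long)i);
}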

The second major bottleneck I encounter is at the other end of the
analysis cycle, in long-term analysis in the database itself.

*THIS* is the really interesting open problem!

Meaningful statistical analysis takes huge amounts of crunch time
for tables of the size I tend to see.  There might be a simple
heuristic for determining the optimal period over which analysis
should be done, but I sure as hell don't know one.  As a result,
analysis generally covers everything from yesterday (or whenever
data was last imported into the database) and back as far as
storage will allow.  So it's slow.

Right. What you really need to be able to do is preload the
kind of analysis you want to do, so that as much of it as possible
is done in realtime as the data comes in. I haven't worked with
databases for a long time, but what it amounts to is pushing the
first stage of query optimization into the data-gathering
loop -- then, if that first stage turns up a hit, you run the rest
of the query.
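
To make that concrete (and this is just a sketch, with the counter
table, the threshold, and the trigger_full_query() hook all made
up): keep a cheap running tally as packets arrive, and only when a
source crosses the threshold do you go spend database time on it.

/* Sketch: do the cheap first stage of the "query" in realtime and
 * only run the expensive long-term database query on candidates.
 * NBUCKET, THRESHOLD, and trigger_full_query() are placeholders;
 * hash collisions are ignored for brevity. */
#include <stdio.h>
#include <string.h>

#define NBUCKET   4096
#define THRESHOLD 32            /* made-up suspicion threshold */

static unsigned int hits[NBUCKET];

/* Hypothetical hook that hands a candidate off to slow analysis. */
static void trigger_full_query(unsigned long src)
{
    printf("source %lu: worth running the full query\n", src);
}

/* Called per packet, in the data-gathering loop. */
void note_packet(unsigned long src)
{
    unsigned int b = (unsigned int)(src % NBUCKET);

    if (++hits[b] == THRESHOLD)   /* fires once per interval */
        trigger_full_query(src);
}

/* Called periodically; the realtime stage only keeps counts. */
void reset_interval(void)
{
    memset(hits, 0, sizeof(hits));
}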

mjr.
--
Marcus J. Ranum, CEO, Network Flight Recorder, Inc.
work - http://www.nfr.net
home - http://www.clark.net/pub/mjr


