Firewall Wizards mailing list archives

Re: Recording slow scans


From: "Stephen P. Berry" <spb () incyte com>
Date: Mon, 19 Oct 1998 19:39:36 -0700

-----BEGIN PGP SIGNED MESSAGE-----


Darren Reed <darrenr () reed wattle id au> wrote:

How many times do people need to reinvent the wheel ? 

Until there's a GPL'd wheel out there that I happen to like.  Because
sometimes all I want is the wheel, and most vendors are car salesmen.

Well, why don't you write one for us ?  I'm sure we'd all appreciate
your time and effort spent on such a project.

I did code up the changes I mentioned previously (twiddling libpcap(3)
and tcpdump(8) to accept multiple filters and output the results to multiple
files).  When I have a few more spare cycles, I'll document the changes
and submit them to the libpcap/tcpdump maintainers.  I have no idea
how receptive said maintainers are to unsolicited changes, so I have
no idea whether or not the changes are likely to make it into future
releases.

Anyway, this twiddle and a couple like it really clear up the
bottleneck in first-order filtering---the major bottleneck which seems
to be characteristic of most tcpdump-based IDSen.
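
To give the flavour of the multiple-filter change, here's a rough sketch of
the general approach in plain, stock libpcap terms.  This is not the actual
diff---the interface name, filter expressions and savefile names below are
placeholders---but it shows the shape of the thing:  compile several filter
expressions up front, then in a single capture loop test each packet against
every compiled programme and dump it to whichever savefiles it matches.

/*
 * Rough sketch of the multiple-filter/multiple-savefile idea using
 * stock libpcap calls.  Not the actual patch; the interface name,
 * filter expressions and savefile names are invented.
 */
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>

#define NFILTERS 2

int
main(int argc, char **argv)
{
        char errbuf[PCAP_ERRBUF_SIZE];
        const char *dev = (argc > 1) ? argv[1] : "le0";
        const char *exprs[NFILTERS] = { "tcp[13] & 2 != 0",   /* SYNs */
                                        "icmp" };
        const char *files[NFILTERS] = { "syn.pcap", "icmp.pcap" };
        struct bpf_program progs[NFILTERS];
        pcap_dumper_t *dumps[NFILTERS];
        pcap_t *p;
        int i;

        if ((p = pcap_open_live(dev, 68, 1, 1000, errbuf)) == NULL) {
                fprintf(stderr, "pcap_open_live: %s\n", errbuf);
                exit(1);
        }
        for (i = 0; i < NFILTERS; i++) {
                if (pcap_compile(p, &progs[i], exprs[i], 1, 0) < 0) {
                        fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
                        exit(1);
                }
                if ((dumps[i] = pcap_dump_open(p, files[i])) == NULL) {
                        fprintf(stderr, "pcap_dump_open: %s\n",
                            pcap_geterr(p));
                        exit(1);
                }
        }

        /*
         * One pass over the wire:  each packet is tested against every
         * compiled filter and written to each savefile it matches.
         */
        for (;;) {
                struct pcap_pkthdr h;
                const u_char *pkt = pcap_next(p, &h);

                if (pkt == NULL)
                        continue;
                for (i = 0; i < NFILTERS; i++)
                        if (bpf_filter(progs[i].bf_insns, pkt,
                            h.len, h.caplen))
                                pcap_dump((u_char *)dumps[i], &h, pkt);
        }
        /* NOTREACHED */
}

The point being that the expensive part---getting the packets off the
interface---is paid exactly once, and the split into separate capture files
happens in the same pass.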

One of the other tweaks I've tried is dyking out the native tcpdump
output routines and inserting some that do just what I want.  This
saves having to run the data through a perl (or awk, or whatever)
script to turn the somewhat balky tcpdump output into a more
standardised, simply-delimited format.  The proper way to handle
this is of course to use a tool which is configurable to output 
in reasonably arbitrary formats.  Whether or not tcpdump is
a good base from which to build such a tool is of course an open
question.
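
To make `simply-delimited' concrete, here's a sketch of what such an output
routine might look like:  a pcap callback that prints one pipe-delimited
record per packet.  Again, this isn't the actual modified tcpdump---it
assumes IPv4 over Ethernet and BSD-flavoured netinet headers, skips the
option/fragment handling a real version would need, and the choice of fields
and delimiter is arbitrary.

/*
 * Sketch of a "simply-delimited" output routine:  one record per
 * packet, fields separated by '|'.  Illustration only; transport
 * header length checks and non-IPv4 traffic are ignored for brevity.
 */
#include <pcap.h>
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <arpa/inet.h>

#define ETHER_HDRLEN 14

static void
delim_print(u_char *user, const struct pcap_pkthdr *h, const u_char *pkt)
{
        const struct ip *ip;
        const u_char *transport;
        int sport = 0, dport = 0;
        char src[16], dst[16];

        (void)user;
        if (h->caplen < ETHER_HDRLEN + sizeof(struct ip))
                return;         /* too short to be an IPv4 packet */
        ip = (const struct ip *)(pkt + ETHER_HDRLEN);
        transport = (const u_char *)ip + (ip->ip_hl << 2);

        /* inet_ntoa() reuses a static buffer, so copy each result out */
        snprintf(src, sizeof(src), "%s", inet_ntoa(ip->ip_src));
        snprintf(dst, sizeof(dst), "%s", inet_ntoa(ip->ip_dst));

        if (ip->ip_p == IPPROTO_TCP) {
                const struct tcphdr *tcp = (const struct tcphdr *)transport;
                sport = ntohs(tcp->th_sport);
                dport = ntohs(tcp->th_dport);
        } else if (ip->ip_p == IPPROTO_UDP) {
                const struct udphdr *udp = (const struct udphdr *)transport;
                sport = ntohs(udp->uh_sport);
                dport = ntohs(udp->uh_dport);
        }

        /* time|proto|src|sport|dst|dport|length */
        printf("%ld.%06ld|%d|%s|%d|%s|%d|%u\n",
            (long)h->ts.tv_sec, (long)h->ts.tv_usec, ip->ip_p,
            src, sport, dst, dport, h->len);
}

Hooked in with pcap_loop(p, -1, delim_print, NULL) in place of the usual
print path, the output can go straight into whatever bulk loader or script
comes next, with no tcpdump-parsing pass in between.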

But the reason why I'm mentioning all this here is to point out
that with comparatively minor coding one can produce fairly
usable tools with materials which are already freely available.

I won't argue that the sort of IDS you get with a roll-your-own
solution is as good as NFR, say.  I do contend, however, that for
many applications such solutions are sufficient.  And that having
such tools and the knowledge generally available is overall a Good
Thing.



In responding I observed that I generally
see two bottlenecks:  one in my first-order filtering;  and one during
the analysis in the database itself.

I'm not sure that there is a bottleneck problem at the database so much as
just working with the data at an appropriate speed to get it there.

Unless we're talking at cross purposes, I'm pretty sure there's
a bottleneck in the database.  If one is interested in analysing
data over fairly long periods of time (in terms of months rather
than hours, say), even reasonably narrow pipes are going to be
producing enough data to bog down most analysis engines.  It's
possible that there are arguments against attempting this sort of
long-term analysis, or methods for greatly streamlining it;  I'm
open to suggestions.



One alternative might be to (say) have a box with 5 ethernet ports on it,
1 being the data "tap" and the other 4 for syphoning data off to boxes
for processing.  For example, you might have one box dedicated to doing
TCP processing, another for UDP, another for ICMP/IGMP and another for
the remainder.

I've considered something similar, but discarded the idea.  When I'm
doing long-term analysis, I'm very interested in correlations.
Distributing analysis by protocol seems to be a way to maximise the
difficulty of establishing the sorts of correlations of interest.  It's
possible that some scheme for parallelising each of a number of
steps in a serial analysis might be the optimal solution (it is for
many problems), but I can't think of any such approach that would
work here.



The obvious answer (or at least the one which seems obvious to me) is
to determine a baseline and then look for anomalies.  What remains
an open question, however, is how to best set about doing this.

Start collecting your data!
As for how to determine the baseline and do real-time IDS with pattern
based matching, etc, a good dose of AI might be appropriate :)

Or scrap the AI and use an expert system if you want it working this
millennium.

Anyway, I think your general sentiment is right.  One of the weaknesses
I see in the analysis end of almost all current IDS systems is that they
are signature-based.  I suspect that the long-term best strategy will
be to use some sort of automated procedure to characterise traffic
on the hoof, and then generate anomaly warnings when the traffic
varies from the observed baseline.  By way of analogy, I'd compare
current IDS analysis engines to chess programmes with a modest
opening book and an alpha-beta search for playing outside it.  Even
as we beef up the book, we still lose when we can't make good decisions
without it.  Hopefully a useful engine for characterising network
traffic won't have to be as brawny as Deep Blue.
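
To make the characterise-then-flag-deviations idea concrete, here's a toy
sketch:  keep a running mean and variance of some per-interval measure
(new connections per minute, say) and complain when an interval lands more
than a few standard deviations out.  The threshold, warm-up length and
sample numbers are all invented, and real traffic is far burstier than
this---separate baselines per time-of-day and per service would be the bare
minimum---but it shows the shape of the approach.

/*
 * Toy baseline-then-anomaly detector:  maintain a running mean and
 * variance (Welford's method) of a per-interval count and warn when
 * a new observation falls more than THRESHOLD standard deviations
 * from the mean.
 */
#include <stdio.h>
#include <math.h>

#define THRESHOLD 3.0           /* standard deviations              */
#define WARMUP    10.0          /* observations before we judge     */

struct baseline {
        double n;               /* observations seen                */
        double mean;            /* running mean                     */
        double m2;              /* sum of squared deviations        */
};

/* Fold x into the baseline; return nonzero if it looked anomalous. */
static int
baseline_update(struct baseline *b, double x)
{
        double stddev, delta;

        if (b->n > WARMUP) {
                stddev = sqrt(b->m2 / (b->n - 1.0));
                if (stddev > 0.0 && fabs(x - b->mean) > THRESHOLD * stddev)
                        return 1;       /* don't fold the outlier in */
        }
        b->n += 1.0;
        delta = x - b->mean;
        b->mean += delta / b->n;
        b->m2 += delta * (x - b->mean);
        return 0;
}

int
main(void)
{
        /* e.g. new-connections-per-minute counts fed from the sniffer */
        double counts[] = { 40, 38, 45, 41, 39, 44, 42, 40, 43, 41,
                            39, 42, 40, 44, 300 /* sudden sweep? */ };
        struct baseline b = { 0.0, 0.0, 0.0 };
        int i, n = sizeof(counts) / sizeof(counts[0]);

        for (i = 0; i < n; i++)
                if (baseline_update(&b, counts[i]))
                        printf("interval %d: count %.0f deviates from "
                            "baseline mean %.1f\n", i, counts[i], b.mean);
        return 0;
}

Whether a flagged interval should be folded back into the baseline, and how
many separate baselines you'd need to keep, are exactly the sort of open
questions I mean.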

That being said, I'll hasten to add that there's nothing necessarily wrong
with playing out of the book---indeed, that appears to be the best
and only way to play the game at the moment.







- -Steve


-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNiv3xCrw2ePTkM9BAQHLFQQAgOorjTrtVTtg3yK25RwLLZ7hbBagoJM8
WawYztETLgYiilrCgLm/nMRvCkA0L1WTr4v2VsQsI34kIQb5OUxsl9/KjT+8cQiS
6tPz/ANjztttL9Eyy4MZ/Y0lttOVUxe2AiPRkZlUKqPuO/uCIlrqzCQQm08ThqvE
pkeHeld98ZI=
=ypQQ
-----END PGP SIGNATURE-----


