Firewall Wizards mailing list archives

Re: Recording slow scans


From: "Stephen P. Berry" <spb () incyte com>
Date: Tue, 06 Oct 1998 18:41:04 -0700

-----BEGIN PGP SIGNED MESSAGE-----


"Paul D. Robertson" <proberts () clark net> wrote:

On Sun, 4 Oct 1998, Darren Reed wrote:
Is anyone pumping IDS or NFR data into a real database
(Oracle, etc.) for later analysis?

I've just got Sybase for Linux (what a download!) on a home system, and I 
expect to upgrade my FreeBSD box next to it so that I can start playing 
with it.  If anyone's gotten further, I'm interested in hearing about it.

I haven't twiddled around with NFR, but I do use a home-grown IDS
to get packets from the wire and into a database.



There's one other important issue to this and that is to keep track of all
IP and port pairs which communicate, regardless of TCP flags, etc.  Whether
or not your paranoia requires that level of effort is another thing...

I think that if you can aggregate the data sufficiently to not run out of 
space, it's worth having around.  I'm hoping that an NFR/DB combination 
will give me that.

A record of just ports, IPs and protocols compresses down quite nicely[1],
even if it's being stored as a gzip'd flat ASCII file.  Surprisingly,
I generally haven't found storage to be the biggest bottleneck in
IDS performance.
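
As a rough illustration of that kind of reduction, the sketch below collapses
packets to (protocol, source, source port, destination, destination port)
tuples, counts the repeats, and stores the result as a gzip'd flat ASCII file.
The record layout, the function names and the tab-separated format are all
assumptions made for the example; this is not a description of any particular
IDS.

```python
import gzip
from collections import Counter


def summarize(packets):
    """Count packets per (proto, src, sport, dst, dport) tuple."""
    return Counter(packets)


def write_summary(counts, path):
    """Write one tab-separated line per tuple, plus its count, gzip-compressed."""
    with gzip.open(path, "wt", encoding="ascii") as out:
        for (proto, src, sport, dst, dport), n in sorted(counts.items()):
            out.write(f"{proto}\t{src}\t{sport}\t{dst}\t{dport}\t{n}\n")


def read_summary(path):
    """Read a file produced by write_summary back into a Counter."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="ascii") as inp:
        for line in inp:
            proto, src, sport, dst, dport, n = line.rstrip("\n").split("\t")
            counts[(proto, src, int(sport), dst, int(dport))] = int(n)
    return counts
```

Calling write_summary(summarize(records), "summary.gz") on a day's worth of
tuples gives one compressed file that read_summary() can load again later.
How well it compresses depends entirely on how repetitive the traffic is.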


The major moving part in my roll-your-own IDS is tcpdump(8).  I suspect
this is probably true of most home-grown IDS solutions.  It also
appears to be true of, for example, the SHADOW project.

This leads to the first big performance bottleneck:  filtering.
If you want to test the datastream against n filters, you've got
to run your data through tcpdump n times[2].  This is where my IDS
spends most of its cycles.  What I want to do is twiddle libpcap
to take n filters on the command line (foo, bar, baz...) and spit the
results of each filter into a separate file (foo.dump, bar.dump,
baz.dump...), but that's not going to happen until I get a few
spare cycles myself.  Anyone else know of anything already out there
that'll do this sort of thing, preferably available under GPL (or
something similar)?
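
The shape of the single-pass approach can be sketched without touching
libpcap at all.  In the example below the "filters" are ordinary Python
predicates standing in for compiled tcpdump filter expressions, and the
packets are already-parsed (proto, src, sport, dst, dport) tuples; both are
assumptions for illustration, as are the names split_by_filters and FILTERS.
A libpcap-based version would presumably compile each expression once and
test every captured packet against each compiled filter in the same loop.

```python
import os
from contextlib import ExitStack


def split_by_filters(packets, filters, directory="."):
    """Read packets once; write each to <name>.dump for every filter it matches.

    filters maps a name to a predicate taking one packet record.
    Returns the number of packets written per filter.
    """
    counts = {name: 0 for name in filters}
    with ExitStack() as stack:
        # Open all n output files up front so the input is traversed only once.
        outputs = {
            name: stack.enter_context(
                open(os.path.join(directory, name + ".dump"), "w")
            )
            for name in filters
        }
        for pkt in packets:
            line = "\t".join(str(field) for field in pkt) + "\n"
            for name, matches in filters.items():
                if matches(pkt):
                    outputs[name].write(line)
                    counts[name] += 1
    return counts


# Hypothetical filters over (proto, src, sport, dst, dport) records.
FILTERS = {
    "telnet": lambda p: p[0] == "tcp" and p[4] == 23,
    "dns": lambda p: p[0] == "udp" and 53 in (p[2], p[4]),
}
```

The cost is still n predicate evaluations per packet, but the data is read
and decoded only once, which is the part that repeated tcpdump(8) runs pay
for n times over.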

Of course I could get around this if I wanted to boil my data
down prior to the initial filtering.  That is, reduce the data to
IPs, ports and protocols, and then run my filters on -that- data rather
than on the raw tcpdump(8) packet stream.  The reason why I don't
do this is because most of my decisions about what merits further
investigation are the result of this first-order filtering.  Since
this tends to be when I'm looking most at the fine detail (i.e.,
what is a given packet doing) rather than the big picture (i.e.,
how does a given packet fit into the larger pattern[3]), this is
when I'm most likely to want more information than just the IPs,
ports and protocols involved.  Running the filters on the raw packet
stream and generating dumps containing just the `interesting' packets
appears to be a net win in terms of amount of effort involved in
completing a meaningful first-level analysis.  


The second major bottleneck I encounter is at the other end of the
analysis cycle, in long-term analysis in the database itself.
Meaningful statistical analysis takes huge amounts of crunch time
for the size tables I tend to see.  There might be a simple
heuristic for determining the optimal period over which analysis
should be done, but I sure as hell don't know one.  As a result,
analysis generally covers everything from yesterday (or whenever
data was last imported into the database) and back as far as
storage will allow.  So it's slow.
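
To make the window problem concrete, here is a minimal sketch using SQLite
as a stand-in for whatever database is actually in use.  The conn_log table,
its columns, the function names and the threshold are all hypothetical.  The
query lists sources that touched at least a given number of distinct
(destination, port) pairs since a given day; a slow scan only crosses the
threshold when the window reaches back far enough.

```python
import sqlite3


def load(conn, rows):
    """Insert (day, proto, src, dst, dport) rows, creating the table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conn_log "
        "(day TEXT, proto TEXT, src TEXT, dst TEXT, dport INTEGER)"
    )
    conn.executemany("INSERT INTO conn_log VALUES (?, ?, ?, ?, ?)", rows)


def wide_sources(conn, since, min_targets):
    """Sources with at least min_targets distinct (dst, dport) pairs since a day.

    Days are ISO-8601 strings (YYYY-MM-DD), so string comparison orders them.
    """
    query = """
        SELECT src, COUNT(*) AS targets
        FROM (SELECT DISTINCT src, dst, dport FROM conn_log WHERE day >= ?)
        GROUP BY src
        HAVING COUNT(*) >= ?
        ORDER BY targets DESC, src
    """
    return conn.execute(query, (since, min_targets)).fetchall()
```

With three probes from one source spread over five weeks, a call such as
wide_sources(conn, "1998-09-01", 3) reports that source, while the same call
with "1998-10-01" does not.  That is the trade-off: the longer window catches
more, and costs more to scan.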

In general, this is okay, as the sorts of things one tends to
learn from long-term analysis aren't the sorts of things that need
attention this hour more than the next.  Granted, if something
suspicious is happening you want to know about it as soon as
possible---but if you're learning about it from trend analysis, chances
are you're either safe or you're looking at doing damage control
instead of preventative maintenance anyway.







- -Steve

- -----
1     `Quite nicely' here means `to between two and three orders of
      magnitude smaller than the raw tcpdump(8) output'.
2     Presuming you want the results of each filter separately.
3     Or rather the larger patterns---there's never only one.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNhrGoyrw2ePTkM9BAQGPiQP/fne4614KE1kzmz1kfZCQ6XHVXQW99pgp
I2uX9jk/+1taLtmMFSMJMkQi6dpeff9+fKQJPZPv+VACp8cHp6PhN+kNuUaHmiKl
wYhPXI+6Sw7lXc4A8CQLPUP26qtMxNGsK/bcT7Zca5Bp8V5xqmyqfrth41U0lodr
ipxbnFeEbgo=
=AyF/
-----END PGP SIGNATURE-----


