Firewall Wizards mailing list archives

Re: RE: IDS (was: FW appliance comparison)


From: "Marcus J. Ranum" <mjr () tenablesecurity com>
Date: Fri, 27 Jan 2006 14:15:17 -0500

Brian Loe wrote:
> To run the transactions they have a VERY large mainframe. To collect
> logging I'm lucky to have gotten (since they got it for free
> themselves) a pseries running linux. Slight difference.

Of course there is a "slight" difference. Forgive me for not using my
psychic powers to read your mind and understand your systems'
configuration better from here. ;)

Joking aside, I think you got my point. It's certainly possible to
handle the kind of logging load you're describing. I wouldn't go
quite so far as to say it's "easy" - but your original post made it
sound like you felt it was a near impossibility, or something like that.

> But you don't have to convince me

It _sounded_ like I do. After all, you were the one who said:
"I'm not sure how we could do it"
That's a whole different situation from if you'd said "I have to
convince my management not to be stupid" in which case I
would have entirely ignored your posting. After all, "I have to
convince my stupid managers" is the de facto mating call
of the security practitioner -- and it's always a political/management
problem, not a technical problem, and I suck at politics due to
a certain lack of political correctness and tact that results from
a childhood injury I suffered when I was bitten by a
radioactive slug. Some people get super powers; all I got was
brain damage. Oh, well, those are the breaks.

> They're running debug on every device they own
> right now, they're just not logging it, tracking it, analyzing it...
> or anything else with it - until there's a problem.

So, they obviously understand at least some of the value of
collecting such information - otherwise they would have it
completely turned off "for performance reasons" or something
like that.

> You're stating that
> they have to spend money - at least for disk space. I'd be laughed
> at... unless IBM or Cisco can do it with a "device".

So if you're working for a company so stupid that they'd rather
spend $10,000 for a "device" from Cisco (or whoever) or $100,000
for something from IBM (or whatever) than let you solve the problem
with the gear you've already got, then none of us can help
you and your best strategy is to crawl back under a rock and go
back to hoping the sky doesn't fall on your head. There's nothing
any of us on this list can do to help you; you should be talking
to a Cisco sales-rep, instead.

But - back to your assertion that I'm saying that "they have to
spend money" - wait a minute: didn't you already say you had
some hardware? Presumably a computer with a hard disk, right?
Maybe you don't have a terabyte in the darned thing but if you
managed to do something useful and demonstrate some value
to it, you'd probably find that they could afford a hard disk
upgrade. But what do I know?

> It's not a hardware problem...

But - wait - you said "database"? Please tell me you weren't
trying to stick that much data into a SQL database with indexes
on your tables and an interpreted query/optimizer engine on top
of all that? If so, I'm not surprised it didn't work -- but that's
not a "logging is hard" problem, that's a "using a relational
database for a write-heavy application is the wrong tool" problem.

> I didn't realize what I was getting into, firstly. Secondly, what good
> does the data do if you can't "do" anything with it?

So did you make the intellectual leap of faith that having data
in a database somehow lets you "do" something with it? That's
a hell of a leap, when you consider that, to be useful, your
database needs to be structured to facilitate whatever it
was you were planning to do in the first place. In other words,
databases don't inherently make data useful - they facilitate
your performing queries that you already know are useful.

In order to "do" anything with your log data you must first
look at it, think about it, and decide what you are interested
in. And, no, I am sorry, you can't buy thinking as a "device"
from Cisco. That's the "analysis" part of "log analysis" and
the primary (only) tool for that stage of the process is the
good old Mark IV human brain.

So, it sounds to me like you jumped into the problem
without actually thinking about it, first, and failed. That
should have come as no surprise. But it appears that
you generalized that failure into a theory that "it can't
be done."  Um, no.

> Without a system
> to at least *help* you analyze it you're simply swimming in quicksand,
> flailing in fact.

On the last log analysis project I dug into (Hi Paul!) we used
really advanced tools like "grep" and "more" and "guinness stout"
and figured out what the data looked like, then figured out what
we wanted to do with it. Then I wrote a few carefully tailored
500-to-2000-line utilities in C that did the job, and
let them run for a day and *poof* we found stuff! By the way,
since I had about 40 gigs of bzipped log data that I was trying to find
a single unknown event in, I had to take into account the speed
of the various tools and do some back-of-the-envelope math,
first. If I'd used a database, we'd still be sitting waiting for
results - and the project was 2 years ago.
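
(To make that throughput math concrete, with made-up but plausible
numbers: 40 gigs of bzipped logs is on the order of 400 gigs raw.
A tool that chews through data at 50 megabytes a second finishes a
full pass in a bit over two hours; one that dribbles along at 1
megabyte a second takes four and a half days per pass. Multiply by
however many passes you'll make before you get your analysis right,
and the choice of tool stops being a matter of taste.)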

> If you know of a better way of doing this that doesn't
> cost money, I'm all ears

There's no silver bullet, because every organization's
log data is different (in quantity and type), and what they
want/need to do with it is different. The "generic" approaches
all come with high hardware and software costs because
the vendors that offer those solutions are trying to over-spec
their systems to be able to handle a wide range of problems.
That's always a more expensive strategy than sitting down
and thinking about stuff and then deriving a solution that
works for you.

> As for IDS, I
> personally think it's a mostly useless tool - especially the way they
> have it implemented here.

But you're the guy who said:
"But, on the bright side, our 2k IDS system did
eventually begin blocking it from all but one customer site."
comparing your "$250k" log analysis system to your $2k IDS -
which certainly doesn't make it sound like you think your
IDS is useless. Make up your mind, would you?

By the way, if you think you need to spend $250K on a log
analysis system, you're off by a very wide margin. Although
if your management is stupid enough to spend that much
I'd be happy to solve your log analysis problems for a
mere $200k. ;) I'll even epoxy an IBM sticker on it.

> What did you use to pore through it?

I wrote a little doodad that ran through and picked out the
log structures I was looking for, parsed the date/time fields
and sorted them into files, while keeping count of certain
values from the transaction fields. It did a lot of error
checking on things like field lengths, sizes, and "normal"
characters in the fields. One of the important things that
the tool did was eject (into a separate file) copies of any
message that didn't parse 100% correctly. That was based
on the hypothesis that something which caused the
application to screw up might cause the messages it logged
to also be screwy. Turned out that was a pretty interesting
hypothesis - I found about 4,000 malformed lines that pointed
to a code flaw in the web server (it appeared to have a wild
pointer someplace).
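
For flavor, here's roughly the shape such a doodad can take. The
"date time host txn-code value" line format and the sanity rules
below are hypothetical stand-ins for whatever your logs actually
contain - this is not the format I was parsing:

/* eject.c - one-pass parse-and-eject, a hypothetical sketch.
 * Assumed line format: "YYYY-MM-DD HH:MM:SS host txn-code value" */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* every buffer is as long as a whole line - see the footnote
     * below on why that makes the sscanf() calls safe */
    char line[8192], date[8192], stamp[8192], host[8192], code[8192];
    long value, good = 0, bad = 0;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        if (sscanf(line, "%s %s %s %s %ld",
                   date, stamp, host, code, &value) != 5
            || strlen(date) != 10 || strlen(stamp) != 8) {
            fputs(line, stderr);   /* didn't parse cleanly: eject it */
            bad++;
            continue;
        }
        good++;
        /* ...keep counts of interesting field values here... */
    }
    fprintf(stderr, "parsed %ld, ejected %ld\n", good, bad);
    return 0;
}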

Another thing to remember is to count stuff as you're
making your pass through your logs. The first law of
log analysis (and IDS) reads:
        The number of times an uninteresting thing happens is an interesting thing
So, as a simple example, if you do nothing more than
count log entries and keep a graph of that, you might learn
something interesting.
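
As a sketch of that - with the caveat that the leading YYYY-MM-DD
date field is purely an invented format - here's a counter that
prints per-day totals and flags any day that lands far off the
running average, which is exactly the kind of graph worth keeping:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[8192], day[11] = "", cur[11];
    unsigned long n = 0, total = 0, days = 0;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        if (strlen(line) < 10)
            continue;                 /* too short to hold a date */
        memcpy(cur, line, 10);
        cur[10] = '\0';
        if (day[0] != '\0' && strcmp(cur, day) != 0) {
            /* day rolled over: report it, flag big deviations */
            double avg = days ? (double)total / days : 0.0;
            printf("%s %lu%s\n", day, n,
                   (days && (n > 2 * avg || n < avg / 2))
                       ? "   <-- interesting" : "");
            total += n;
            days++;
            n = 0;
        }
        strcpy(day, cur);
        n++;
    }
    if (day[0] != '\0')
        printf("%s %lu\n", day, n);
    return 0;
}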

The big mistake everyone makes going into this stuff is
assuming that they know what they are looking for, already.
You don't. So you have to approach it with the zen mind of
a child and treat it as a process of discovery. Look and see
what is there then start asking yourself, "should I count the
values of this field?" "should I count how many times this
happens?" "should I keep track of every time I see a new
value appear in this field?"   Once you ask a bunch of
questions like that then you've got a specification for a simple
single-pass log analysis routine. It need not be complicated.
The last one I wrote (Hi Ron!) was 12 lines of C, consisting
mostly of calls to sscanf()... (*)
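
Something in that spirit - with an invented "skip two fields, grab
a numeric code" format, since I'm obviously not reproducing Ron's
data or my actual code here - looks like:

#include <stdio.h>

int main(void)
{
    char line[8192];
    int status, count[1000] = {0};

    /* single pass: pull the third field as a number, tally it */
    while (fgets(line, sizeof(line), stdin) != NULL)
        if (sscanf(line, "%*s %*s %d", &status) == 1 &&
            status >= 0 && status < 1000)
            count[status]++;
    for (status = 0; status < 1000; status++)
        if (count[status])
            printf("%3d %d\n", status, count[status]);
    return 0;
}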

But, anyhow... the point is that I sat down and spent some time
thinking about what the thing I was looking for MIGHT look like, then
hypothesized a few ways that it might be detected, given
my assumptions.

> You have to be able to load that
> 40 gigs of data

Yeah, and log data compresses nicely (about 90% or more) so it
was actually quite a bit more than that. I recall I had to buy an
80 gig hard drive for the project. $125. Wow. I never actually
decompressed the stuff so I could look at the whole thing
in one big wad on a hard disk (why bother? that's what gzcat
is for!) but it was "a lot" of data. So you want to design your
processing to make a single pass that does everything.
Avoiding an interpreted language like perl is good advice
for that kind of problem.
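
(In practice that's a pipeline along the lines of
"gzcat logs/*.gz | ./eject > counts 2> weird" - the names being
the invented ones from the sketch above. The data gets decompressed
in the pipe, touched exactly once, and never has to exist
uncompressed on disk.)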

Remember, though, we're talking about my crunching through
more than 100 times (decompressed) as much data as you were
complaining about, in less time than it was taking you to
collect the amount you were complaining about. That tells
me your problem is probably highly solvable.

> or break it up into something semi-coherent

I do not know what this means.

> and then
> you have to be able to scan it quickly enough to get it done within
> the year but not so quick you miss something...

Yeah, and this is hard why?

> Tell me d(&#$#!!! The how is what I'm obviously missing...

I'm trying to!!! First, you have to overcome your assumption
that it can't be done and your desire to use the wrong tools
for the job. Once you've done that, start asking yourself
what you're interested in within the data. Then ask yourself
what you are absolutely NOT interested in within the data.
Then put something in place that buckets the stuff you
are NOT interested in, but counts it (knowing you tossed
20,000 firewall permit log messages today is interesting
if you only tossed 2,000 firewall permit log messages
yesterday!) Then skin out the fields you care about. Define
the data that SHOULD be in those fields and decide on
an algorithm for kicking out anything that is in those
fields that doesn't match your idea of what should be
there. Stuff the counts into something that keeps long-term
statistics. Do structural analysis on the record formats.
Set up an artificial ignorance filter or use a bayesian filter.
Those are all techniques that may or may not work for
you, depending on your data and your needs. 
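
To make "artificial ignorance" concrete, here's a toy version of
the idea - my own from-scratch sketch, not NBS: squash the variable
bits out of each line so it becomes a pattern, print only lines
whose pattern has never been seen before, and count everything you
suppress:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABSZ 65536

struct pat { char *text; struct pat *next; };
static struct pat *tab[TABSZ];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABSZ;
}

int main(void)
{
    char line[8192], norm[8192];
    unsigned long total = 0, novel = 0;
    size_t i, j;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        total++;
        /* normalize: collapse digit runs to '#' so "pid 1234"
         * and "pid 99" become the same pattern */
        for (i = j = 0; line[i] != '\0' && j < sizeof(norm) - 1; i++) {
            if (line[i] >= '0' && line[i] <= '9') {
                if (j == 0 || norm[j - 1] != '#')
                    norm[j++] = '#';
            } else
                norm[j++] = line[i];
        }
        norm[j] = '\0';

        unsigned h = hash(norm);
        struct pat *p;
        for (p = tab[h]; p != NULL; p = p->next)
            if (strcmp(p->text, norm) == 0)
                break;
        if (p == NULL) {          /* never seen before: that's news */
            novel++;
            fputs(line, stdout);
            p = malloc(sizeof(*p));
            p->text = strdup(norm);
            p->next = tab[h];
            tab[h] = p;
        }
    }
    fprintf(stderr, "%lu lines in, %lu novel patterns out\n",
            total, novel);
    return 0;
}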

As far as tools for this stuff, some of the things I've used
at various times can be downloaded from my code page
http://www.ranum.com/security/computer_security/code
but it's probably easier in most cases to do your own
thing rather than trying to understand mine. Take a close
look at NBS and ask yourself what ideas you can take
from its structural analysis mode. Take a look at the
idea of artificial ignorance
http://www.ranum.com/security/computer_security/papers/ai/
and steal ideas from that. I am quite sure that if you
wrapped some utilities around an interface that wrote
lex scripts and shoved them through a compiler, you
could build an artificial ignorance processor capable of
handling truly ginormous amounts of log data very
quickly. Etc.

> I don't want to be stupid about it, but outside of this list, you
> don't hear anything but the marketing buzz on the latest "device" to
> make the world a safer, happier place (and NSA compliant).

Logging, in particular, is one of those problems that
does not admit of a cookie-cutter solution. Not for
large volumes or interesting data, anyhow. On the other
hand, it's not rocket science or even anything close
to it. It's just data and extracting meaning from data
is a straightforward, though personal, process.

I leave you with my favorite log analysis haiku:
        my log compressed and
        compressed in a while loop
        hmm... disk usage zero

mjr.
---
* Before someone says it - if all the buffers you are scanning into are
individually longer than the entire line you've read in, there is no chance
of a buffer overrun. 

_______________________________________________
firewall-wizards mailing list
firewall-wizards () honor icsalabs com
http://honor.icsalabs.com/mailman/listinfo/firewall-wizards

