Firewall Wizards mailing list archives
Re: RE: IDS (was: FW appliance comparison)
From: "Marcus J. Ranum" <mjr () tenablesecurity com>
Date: Fri, 27 Jan 2006 14:15:17 -0500
Brian Loe wrote:
To run the transactions they have a VERY large mainframe. To collect logging I'm lucky to have gotten (since they got it for free themselves) a pseries running linux. Slight difference.
Of course there is a "slight" difference. Forgive me for not using my psychic powers to read your mind and understand your systems' configuration better from here. ;) Joking aside, I think you got my point. It's certainly possible to handle the kind of logging load you're describing. I wouldn't go quite so far as to say it's "easy" - but your original post made it sound like you felt it was a near impossibility, or something like that.
But you don't have to convince me
It _sounded_ like I do. After all, you were the one who said: "I'm not sure how we could do it." That's a whole different situation from if you'd said "I have to convince my management not to be stupid," in which case I would have entirely ignored your posting. After all, "I have to convince my stupid managers" is the de facto mating call of the security practitioner -- and it's always a political/management problem, not a technical problem, and I suck at politics due to a certain lack of political correctness and tact that results from a childhood injury I suffered when I was bitten by a radioactive slug. Some people get super powers; all I got was brain damage. Oh, well, those are the breaks.
They're running debug on every device they own right now, they're just not logging it, tracking it, analyzing it..or anything else with it - until there's a problem.
So, they obviously understand at least some of the value of collecting such information - otherwise they would have it completely turned off "for performance reasons" or something like that.
You're stating that they have to spend money - at least for disk space. I'd be laughed at...unless IBM or Cisco can do it with a "device".
So if you're working for a company so stupid that they'd rather spend $10,000 for a "device" from Cisco (or whoever) or $100,000 for something from IBM (or whatever) then none of us can help you and your best strategy is to crawl back under a rock and go back to hoping the sky doesn't fall on your head. There's nothing any of us on this list can do to help you; you should be talking to a Cisco sales-rep, instead. But - back to your assertion that I'm saying that "they have to spend money" - wait a minute: didn't you already say you had some hardware? Presumably a computer with a hard disk, right? Maybe you don't have a terabyte in the darned thing but if you managed to do something useful and demonstrate some value to it, you'd probably find that they could afford a hard disk upgrade. But what do I know?
It's not a hardware problem...

But - wait - you said "database"? Please tell me you weren't trying to stick that much data into a SQL database, with indexes on your tables and an interpreted query/optimizer engine on top of all that? If so, I'm not surprised it didn't work - but that's not a "logging is hard" problem, that's a "using a relational database for a write-heavy application is the wrong tool" problem.

I didn't realize what I was getting into, firstly. Secondly, what good does the data do if you can't "do" anything with it?
So did you make the intellectual leap of faith that having data in a database somehow lets you "do" something with it? That's a hell of a leap, when you consider that, to be useful, your database needs to be structured to facilitate whatever it was you were planning to do in the first place. In other words, databases don't inherently make data useful - they facilitate your performing queries that you already know are useful. In order to "do" anything with your log data you must first look at it, think about it, and decide what you are interested in. And, no, I am sorry, you can't buy thinking as a "device" from Cisco. That's the "analysis" part of "log analysis," and the primary (only) tool for that stage of the process is the good old Mark IV human brain. So, it sounds to me like you jumped into the problem without actually thinking about it first, and failed. That should have come as no surprise. But it appears that you generalized that failure into a theory that "it can't be done." Um, no.
Without a system to at least *help* you analyze it you're simply swimming in quicksand, flailing in fact.
On the last log analysis project I dug into (Hi Paul!) we used really advanced tools like "grep" and "more" and "guinness stout" and figured out what the data looked like, then figured out what we wanted to do with it. Then I wrote a few carefully tailored 500-to-2000-line utilities in C that did the job, and let them run for a day and *poof* we found stuff! By the way, since I had about 40 gigs of bzipped log data that I was trying to find a single unknown event in, I had to take into account the speed of the various tools and do some back-of-the-envelope math first. If I'd used a database, we'd still be sitting waiting for results - and the project was 2 years ago.
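That back-of-the-envelope math matters because tool choice changes the answer by orders of magnitude. A trivial sketch of the arithmetic; the throughput figures in the usage note below are round numbers I've assumed for illustration, not measurements from this project:

```c
#include <stdio.h>

/* Rough estimate of one full pass over the data:
 * data_gb of (decompressed) log data at rate_mb_per_s. */
double pass_hours(double data_gb, double rate_mb_per_s)
{
    return (data_gb * 1024.0) / rate_mb_per_s / 3600.0;
}
```

For example, 400 GB of raw data (40 GB bzipped at roughly 10:1) at an assumed 50 MB/s for a tight C scanner is pass_hours(400, 50), about 2.3 hours per pass; at an assumed 1 MB/s effective rate for indexed database inserts it's well over a hundred hours, which is why the database approach never finishes.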
If you know of a better way of doing this that doesn't cost money, I'm all ears
There's no silver bullet, because every organization's log data is different (in quantity and type), and what they want/need to do with it is different. The "generic" approaches all come with high hardware and software costs because the vendors that offer those solutions are trying to over-spec their systems to be able to handle a wide range of problems. That's always a more expensive strategy than sitting down and thinking about stuff and then deriving a solution that works for you.
As for IDS, I personally think it's a mostly useless tool - especially the way they have it implemented here.
But you're the guy who said: "But, on the bright side, our 2k IDS system did eventually begin blocking it from all but one customer site." comparing your "$250k" log analysis system to your $2k IDS - which certainly doesn't make it sound like you think your IDS is useless. Make up your mind, would you? By the way, if you think you need to spend $250K on a log analysis system, you're off by a very wide margin. Although if your management is stupid enough to spend that much I'd be happy to solve your log analysis problems for a mere $200k. ;) I'll even epoxy an IBM sticker on it.
What did you use to pore through it?
I wrote a little doodad that ran through and picked out the log structures I was looking for, parsed the date/time fields and sorted them into files, while keeping count of certain values from the transaction fields. It did a lot of error checking on things like field lengths, sizes, and "normal" characters in the fields. One of the important things the tool did was eject (into a separate file) copies of any message that didn't parse 100% correctly. That was based on the hypothesis that something which caused the application to screw up might cause the messages it logged to also be screwy. It turned out to be a pretty interesting hypothesis - I found about 4,000 malformed lines that pointed to a code flaw in the web server (it appeared to have a wild pointer someplace).

Another thing to remember is to count stuff as you're making your pass through your logs. The first law of log analysis (and IDS) reads:

    The number of times an uninteresting thing happens is an interesting thing.

So, as a simple example, if you do nothing more than count log entries and keep a graph of that, you might learn something interesting.

The big mistake everyone makes going into this stuff is assuming that they already know what they are looking for. You don't. So you have to approach it with the zen mind of a child and treat it as a process of discovery. Look and see what is there, then start asking yourself: "should I count the values of this field?" "should I count how many times this happens?" "should I keep track of every time I see a new value appear in this field?" Once you've asked a bunch of questions like that, you've got a specification for a simple single-pass log analysis routine. It need not be complicated. The last one I wrote (Hi Ron!) was 12 lines of C, consisting mostly of calls to sscanf()... (*)

But, anyhow...
the point is that I sat down and spent some time thinking about what the thing I was looking for MIGHT look like, then hypothesized a few ways that it might be detected, given my assumptions.
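A sketch of what that kind of single-pass checking looks like in C. The log format here ("YYYY-MM-DD HH:MM:SS STATUS NBYTES") and the specific sanity checks are invented for illustration - the point is the shape of the thing: parse with sscanf(), validate the structure of each field, and return a verdict so the caller can count the clean lines and eject the weird ones into a separate file.

```c
#include <stdio.h>
#include <string.h>

/* Parse one log line of the (hypothetical) form
 *   "2006-01-27 14:15:17 STATUS 512"
 * Returns 1 if the line parses 100% cleanly, 0 if it should be
 * ejected into a "weird" file for a human to look at. The caller's
 * status buffer must hold at least 16 bytes. Per the footnote:
 * every buffer scanned into is longer than the field width given
 * to sscanf(), so there is no overrun risk. */
int parse_line(const char *line, char *status, long *nbytes)
{
    char date[16], tstamp[16];

    if (sscanf(line, "%15s %15s %15s %ld", date, tstamp, status, nbytes) != 4)
        return 0;               /* wrong number of fields: eject */
    if (strlen(date) != 10 || date[4] != '-' || date[7] != '-')
        return 0;               /* date is structurally wrong */
    if (strlen(tstamp) != 8 || tstamp[2] != ':' || tstamp[5] != ':')
        return 0;               /* timestamp is structurally wrong */
    if (*nbytes < 0)
        return 0;               /* sizes must be sane */
    return 1;                   /* clean: safe to count */
}
```

A loop around that - read a line, call parse_line(), bump a counter keyed on the status field, fprintf() the failures to a reject file - is essentially the whole program.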
You have to be able to load that 40 gigs of data
Yeah, and log data compresses nicely (about 90% or more), so it was actually quite a bit more than that. I recall I had to buy an 80 gig hard drive for the project. $125. Wow. I never actually decompressed the stuff so I could look at the whole thing in one big wad on a hard disk (why bother? that's what gzcat is for!) but it was "a lot" of data. So you want to design your processing to make a single pass that does everything. Avoiding an interpreted language like perl is a good idea for that kind of problem. Remember, though, we're talking about my crunching through more than 100 times as much (decompressed) data as you were complaining about, in less time than it was taking you to collect the amount you were complaining about. That tells me your problem is probably highly solvable.
or break it up into something semi-coherent
I do not know what this means
and then you have to be able to scan it quickly enough to get it done within the year but not so quick you miss something...
Yeah, and this is hard why?
Tell me d(&#$#!!! The how is what I'm obviously missing...
I'm trying to!!! First, you have to overcome your assumption that it can't be done and your desire to use the wrong tools for the job. Once you've done that, start asking yourself what you're interested in within the data. Then ask yourself what you are absolutely NOT interested in within the data. Then put something in place that buckets the stuff you are NOT interested in, but counts it (knowing you tossed 20,000 firewall permit log messages today is interesting if you only tossed 2,000 firewall permit log messages yesterday!)

Then skin out the fields you care about. Define the data that SHOULD be in those fields and decide on an algorithm for kicking out anything in those fields that doesn't match your idea of what should be there. Stuff the counts into something that keeps long-term statistics. Do structural analysis on the record formats. Set up an artificial ignorance filter or use a bayesian filter. Those are all techniques that may or may not work for you, depending on your data and your needs.

As far as tools for this stuff, some of the things I've used at various times can be downloaded from my code page
http://www.ranum.com/security/computer_security/code
but it's probably easier in most cases to do your own thing rather than trying to understand mine. Take a close look at NBS and ask yourself what ideas you can take from its structural analysis mode. Take a look at the idea of artificial ignorance
http://www.ranum.com/security/computer_security/papers/ai/
and steal ideas from that. I am quite sure that if someone wrapped some utilities around an interface that wrote lex scripts and shoved them through a compiler, you could write an artificial ignorance processor capable of handling truly ginormous amounts of log data very quickly. Etc.
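To make the artificial ignorance idea concrete, here is one toy version of the normalization step, in C. A real filter masks more than digits (hostnames, PIDs, usernames); collapsing numeric runs is just the simplest illustration of turning millions of distinct lines into a handful of countable patterns:

```c
#include <ctype.h>
#include <string.h>

/* Collapse every run of digits in a log line into a single '#' so
 * that structurally identical messages become the same pattern
 * string. Feed the results through a counter (or sort | uniq -c);
 * any pattern you have never seen before is, by definition,
 * interesting. */
void mask_pattern(const char *line, char *out, size_t outsz)
{
    size_t o = 0;
    int in_digits = 0;

    for (const char *p = line; *p != '\0' && o + 1 < outsz; p++) {
        if (isdigit((unsigned char)*p)) {
            if (!in_digits)
                out[o++] = '#';     /* one marker per digit run */
            in_digits = 1;
        } else {
            out[o++] = *p;
            in_digits = 0;
        }
    }
    out[o] = '\0';
}
```

With that, "connection from 10.1.2.3 port 5432" and "connection from 172.16.9.1 port 6667" both reduce to "connection from #.#.#.# port #" - one bucket to count, and one line for a human to read, instead of a message per connection.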
I don't want to be stupid about it, but outside of this list, you don't hear anything but the marketing buzz on the latest "device" to make the world a safer, happier place (and NSA compliant).
Logging, in particular, is one of those problems that does not admit of a cookie-cutter solution. Not for large volumes or interesting data, anyhow. On the other hand, it's not rocket science or even anything close to it. It's just data, and extracting meaning from data is a straightforward, though personal, process.

I leave you with my favorite log analysis haiku:

my log compressed
and compressed in a while loop
hmm... disk usage zero

mjr.

---
(*) Before someone says it - if all the buffers you are scanning into are individually longer than the entire line you've read in, there is no chance of a buffer overrun.

_______________________________________________
firewall-wizards mailing list
firewall-wizards () honor icsalabs com
http://honor.icsalabs.com/mailman/listinfo/firewall-wizards