Firewall Wizards mailing list archives
Re: Handling large log files
From: david () lang hm
Date: Wed, 6 May 2009 10:39:57 -0700 (PDT)
On Wed, 6 May 2009, Nate Hausrath wrote:
First, thanks for the great responses! Aside from the fact that we need a beefier system (2x P3 1.4 GHz, 3 GB RAM, RAID-5... ouch), it looks like I have a lot of work to do.
raid 5 is not necessarily a problem. one surprise I ran into when configuring my splunk systems is that for read-only situations, raid 5/6 can be as fast as raid 0; the big overhead of raid 5/6 is when you are writing data.
so what I do is have the incoming logs written to one disk (a pair of mirrored drives) and indexed there. once all the work is done, the data gets copied to the raid6 array, and that array is otherwise read-only.
Also, thanks for providing some idea of the specs I will need to use for a central log server. I believe our goal is to have around 300 servers sending logs (most of them should be less chatty than the current ones). If you don't mind me asking, roughly how many servers should I expect to have generate 1 GB of logs? I realize there really isn't an accurate answer here, but I'm trying to get a rough ballpark figure.
this depends so much on your systems that any answer is pretty meaningless.
in the absence of other information, I would just extrapolate from your current systems.
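As a rough illustration of that extrapolation, here is a minimal Python sketch. It assumes log volume scales linearly with server count, which is only a first guess. The function name, the `chattiness` factor, and all of the numbers are hypothetical; none of them come from the thread.

```python
def extrapolate_daily_volume(current_servers, current_gb_per_day,
                             planned_servers, chattiness=1.0):
    """Scale today's measured log volume linearly to a planned server count.

    chattiness is a hypothetical fudge factor: use a value below 1.0 if the
    new servers are expected to log less than the current ones.
    """
    if current_servers <= 0:
        raise ValueError("current_servers must be positive")
    gb_per_server = current_gb_per_day / current_servers
    return gb_per_server * planned_servers * chattiness


# Made-up inputs for illustration: 20 servers producing 2 GB/day today,
# 300 servers planned, each assumed to be half as chatty.
estimate = extrapolate_daily_volume(20, 2.0, 300, chattiness=0.5)
print(f"estimated volume: {estimate:.1f} GB/day")
```

Measuring a few representative days on the current systems and plugging in those numbers should give a better ballpark than any generic per-server figure.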
Marcin wrote:
- see if the architecture can be improved. Can you use multiple log servers? Is there a logical way of segmenting the log traffic - OS to box 1, db transactions to box 2, etc.? Post to the project's mailing list, there should be people who use it for larger installations, and willing/able to provide specific suggestions.
I'll see if this is an option. Along these lines, I'd eventually like to be able to turn log messages into events and be able to correlate them with other messages, IDS alerts, etc. I think that once I compress the duplicates, and get rid of a lot of noise, I could forward the results to an OSSIM box and use it for correlation, alerts, etc.
this gets a lot harder than you think, but you don't necessarily need to pre-filter the logs; the correlation engines are going to be doing regex matching on the logs themselves.
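To make the regex-matching point concrete, here is a small Python sketch of one kind of correlation rule: raise an event when the same source produces several matching lines within a time window. It is not how OSSIM or any particular engine is implemented. The pattern, the threshold, and the window are hypothetical, and input is assumed to be already parsed into (timestamp, line) pairs.

```python
import re
from collections import defaultdict, deque

# Hypothetical rule: OpenSSH-style failed-password lines, capturing the source.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+)"
)


def correlate(events, threshold=3, window=60):
    """Yield (timestamp, source) when `threshold` matching lines from the
    same source arrive within `window` seconds.

    events: iterable of (timestamp_in_seconds, raw_log_line), in time order.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent matches
    for ts, line in events:
        match = FAILED_LOGIN.search(line)
        if match is None:
            continue  # non-matching lines are ignored, not pre-filtered
        source = match.group(2)
        times = recent[source]
        times.append(ts)
        # Drop matches that have fallen out of the sliding window.
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield ts, source
            times.clear()  # start counting again after raising the event


sample = [
    (0, "sshd[1]: Failed password for root from 10.0.0.5 port 22 ssh2"),
    (10, "sshd[1]: Failed password for invalid user bob from 10.0.0.5 port 22 ssh2"),
    (15, "sshd[2]: Accepted password for alice from 10.0.0.9 port 22 ssh2"),
    (20, "sshd[1]: Failed password for root from 10.0.0.5 port 22 ssh2"),
    (200, "sshd[3]: Failed password for root from 10.0.0.7 port 22 ssh2"),
]
alerts = list(correlate(sample))
print(alerts)
```

The noise lines simply fail to match and are skipped, which is why pre-filtering is optional; it mainly saves the engine some work.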
David wrote:
the key is to categorize and summarize the data. I have not found a good commercial tool to do this job (there are good tools for drilling down and querying the logs), the task of summarizing the data is just too site specific. I currently get 40-80G of logs per day and have a nightly process that summarizes them.
This is good to know as well. I'd like to avoid commercial tools if possible to save money (although Splunk seems pretty darn useful).
you can do everything with free tools, it's just a matter of manpower ;-)
for nightly reports, you can use the plan I listed.
for alert generation and event correlation, look at SEC (Simple Event Correlator).
the part that is hard to do on the cheap is being able to search the logs efficiently.
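One common way to approach the summarizing step with free tools is to replace the variable parts of each message with placeholders and count the resulting patterns. The Python sketch below shows that idea only; it is not the nightly process described in this thread, and the placeholder rules are hypothetical and would need tuning per site.

```python
import re
from collections import Counter

# Hypothetical normalization rules: mask IPv4 addresses first, then any
# remaining runs of digits (PIDs, ports, counters).
_IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_NUMBER = re.compile(r"\d+")


def normalize(line):
    """Collapse a log line to its message pattern."""
    line = _IPV4.sub("<ip>", line)
    return _NUMBER.sub("<n>", line)


def summarize(lines, top=10):
    """Return the `top` most frequent message patterns with their counts."""
    counts = Counter(normalize(line.rstrip("\n")) for line in lines)
    return counts.most_common(top)


sample_lines = [
    "sshd[101]: Failed password for root from 10.0.0.5 port 4022",
    "sshd[102]: Failed password for root from 10.0.0.6 port 4023",
    "kernel: eth0 link up",
]
for pattern, count in summarize(sample_lines):
    print(count, pattern)
```

In a nightly job the same function could be fed an open log file, since it only iterates over lines. Patterns that show up for the first time, or whose counts change sharply from the previous night, are usually the ones worth reading.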
if you have an idea of what you are looking for ahead of time, you can split the logs into different files for different types of events, then just search the subset of items, but if you don't anticipate things, you end up needing to do a full-text search through your logs. Postgres does have good full-text indexing capabilities, but as you grow you will get to the point where it takes more than one machine to get an answer back in a reasonable amount of time (just due to the fact that you have so much index data to search through to find where to go for the real data), and at that point you need some sort of clustered datastore. those aren't cheap, (even for the commercial version of postgres), and if you haven't already figured out how to do this, there is a lot of value in buying one of the commercial solutions that have that stuff more-or-less figured out for you.
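The split-by-event-type approach can be sketched in a few lines of Python. The category names and patterns below are hypothetical placeholders, and anything that matches no rule lands in a catch-all file, which is the part that still needs a full-text search later.

```python
import re
from pathlib import Path

# Hypothetical categories; the first matching rule wins.
RULES = [
    ("auth", re.compile(r"sshd|sudo|login")),
    ("firewall", re.compile(r"iptables|DROP|DENY")),
]


def categorize(line):
    """Return the name of the first rule that matches, else 'other'."""
    for name, pattern in RULES:
        if pattern.search(line):
            return name
    return "other"


def split_log(lines, out_dir):
    """Append each line to <out_dir>/<category>.log.

    Returns the sorted list of categories that received at least one line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    try:
        for line in lines:
            name = categorize(line)
            if name not in handles:
                handles[name] = open(out_dir / f"{name}.log", "a",
                                     encoding="utf-8")
            handles[name].write(line if line.endswith("\n") else line + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Searching for an authentication event then means reading only `auth.log` instead of the full day's logs, which is what keeps this workable on modest hardware as long as the categories were anticipated.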
David Lang
_______________________________________________
firewall-wizards mailing list
firewall-wizards () listserv icsalabs com
https://listserv.icsalabs.com/mailman/listinfo/firewall-wizards