Firewall Wizards mailing list archives

Re: Handling large log files


From: david () lang hm
Date: Wed, 6 May 2009 05:30:01 -0700 (PDT)

On Tue, 5 May 2009, Nate Hausrath wrote:

Hello everyone,

I have a central log server set up in our environment that would
receive around 200-300 MB of messages per day from various devices
(switches, routers, firewalls, etc).  With this volume, logcheck was
able to effectively parse the files and send out a nice email.  Now,
however, the volume has increased to around 3-5 GB per day and will
continue growing as we add more systems.  Unfortunately, the old
logcheck solution now spends hours trying to parse the logs, and even
if it finishes, it will generate an email that is too big to send.

I'm somewhat new to log management, and I've done quite a bit of
googling for solutions.  However, my problem is that I just don't have
enough experience to know what I need.  Should I try to work with
logcheck/logsentry in hopes that I can improve its efficiency more?
Should I use filters on syslog-ng to cut out some of the messages I
don't want to see as they reach the box?

I have also thought that it would be useful to cut out all the
duplicate messages and just simply report on the number of times per
day I see each message.  After this, it seems likely that logcheck
would be able to effectively parse through the remaining logs and
report the items that I need to see (as well as new messages that
could be interesting).

Are there other solutions that would be better suited to log volumes
like this?  Should I look at commercial products?

I don't like the idea of filtering out messages completely; the number of times that an otherwise 'uninteresting' message shows up can be significant (if the number of requests for a web image per day suddenly jumps to 100 times what it was before, that's a significant thing to know).

The key is to categorize and summarize the data. I have not found a good commercial tool to do this job (there are good tools for drilling down into and querying the logs); the task of summarizing the data is just too site-specific. I currently get 40-80G of logs per day and have a nightly process that summarizes them.
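The nightly run can be as simple as a cron entry; the script name and paths below are illustrative, not the actual setup:

    # kick off the summarization after the day's logs have been rotated
    30 1 * * *  /usr/local/bin/summarize-logs.sh >> /var/log/summarize-logs.out 2>&1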

I first have a process (a perl script) that goes through the logs and splits them into separate files based on the program name in the logs. Internally it does a lookup of the program name to a bucket name and then outputs the message to that bucket (this lets me combine all the mail logs into one file, no matter which OS they come from or how the mail software identifies itself). For things that I haven't defined a specific bucket for, I have a bucket called 'other'.
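Something along these lines captures the idea (a simplified sketch, not the real script; the bucket map and paths are made up, it's written as a shell/awk filter rather than perl for brevity, and it assumes standard syslog-style lines where the program name is the field after the timestamp and hostname):

    #!/bin/sh
    # splitlogs sketch: route each syslog line to a bucket file based on the
    # program name, with anything unrecognized going to the 'other' bucket.
    mkdir -p buckets
    awk '
      BEGIN {
        # program name -> bucket name (illustrative entries only)
        bucket["sendmail"]      = "mail"
        bucket["postfix/smtpd"] = "mail"
        bucket["sshd"]          = "auth"
        bucket["sudo"]          = "auth"
      }
      {
        prog = $5                      # "program[pid]:" in "MMM DD HH:MM:SS host ..."
        sub(/\[[0-9]+\]/, "", prog)    # drop the pid, if any
        sub(/:$/, "", prog)            # drop the trailing colon
        b = (prog in bucket) ? bucket[prog] : "other"
        print > ("buckets/" b ".log")
        count[b]++
      }
      END {
        # per-bucket message counts -- the "how many messages per bucket" report
        for (b in count)
          printf "%10d  %s\n", count[b], b | "sort -rn"
      }
    ' "$@"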

I then run separate processes against each of these buckets to create summary reports of the information in that bucket. Some of these processes are home-grown scripts, some are log summary scripts that came with specific programs.

One of the reports is how many log messages there are in each bucket (this report is generated by my splitlogs program).

For the 'other' bucket, I have a sed line from hell that filters out 'uninteresting' details in the log messages (timestamps, port numbers, etc.) and then runs them through a sort | uniq -c | sort -rn to produce a report that shows how many times each distinct message pattern shows up (the sed line works hard to collapse similar messages together).
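A much-shortened sketch of that pipeline (the real sed expression is far longer and site-specific; the patterns and file names here are only examples):

    # normalize the variable parts (timestamp, pids, port numbers, IP addresses)
    # so that 'the same' message collapses onto one line, then count occurrences
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]\{8\} //' \
        -e 's/\[[0-9]\{1,\}\]/[PID]/g' \
        -e 's/port [0-9]\{1,\}/port N/g' \
        -e 's/[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}/IPADDR/g' \
        buckets/other.log \
      | sort | uniq -c | sort -rn > reports/other.txt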

I then have a handful of scripts that assemble e-mails from these reports (different e-mails reporting on different things going to different groups). For a lot of the summaries I don't put the entire report in the e-mail, but instead just do a head -X (X=20-50 in many cases) to show the most common items.

For example, I have a report that shows all the websites that were hit by people on the desktop network. I have another report that shows the hits by desktop -> website. I generate an e-mail showing the top 50 entries in each of these reports and send it to the folks looking for unusual activity on the desktop network (it's amazing how accurately a simple report like this can pinpoint a problem desktop machine).
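One of those mails boils down to something like this (the report file names, recipient address, and mail command are illustrative):

    # assemble a nightly mail from the top 50 lines of each desktop report
    {
      echo "== top 50 websites hit from the desktop network =="
      head -50 reports/desktop-sites.txt
      echo
      echo "== top 50 desktop -> website pairs =="
      head -50 reports/desktop-site-pairs.txt
    } | mail -s "desktop web activity summary $(date +%F)" desktop-team@example.com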

Getting this set up takes a bit of time and tuning, but with a bit of effort you can quickly knock out a LOT of your messages, and then you start finding interesting things (machines that are misconfigured and generating errors on a regular basis, etc.). As you fix some of these problems, the 'other' report goes from an overwhelming tens of thousands of lines to a much smaller report. Just concentrate on killing the big items and don't try to deal with the entire report at once (the nightly e-mail to me shows the top several hundred lines of this report so that I can work on tuning it; when I can keep up on the tuning, it's not unusual for this to be the entire report).

With this approach (and a reasonably beefy log reporting machine), it takes about 3-6 hours to generate the reports (6 hours being the 80G days).

I have other tools watching the logs in real time for known bad things (to generate alerts), and I am installing Splunk to let me go searching through the logs when I find something in the reports that I want to investigate further (with this sort of log volume, just doing a grep through the logs can take days).

hope this helps.

David Lang
_______________________________________________
firewall-wizards mailing list
firewall-wizards () listserv icsalabs com
https://listserv.icsalabs.com/mailman/listinfo/firewall-wizards

