Firewall Wizards mailing list archives

Re: Handling large log files


From: david () lang hm
Date: Wed, 6 May 2009 05:30:01 -0700 (PDT)

On Tue, 5 May 2009, Nate Hausrath wrote:

Hello everyone,

I have a central log server set up in our environment that would
receive around 200-300 MB of messages per day from various devices
(switches, routers, firewalls, etc).  With this volume, logcheck was
able to effectively parse the files and send out a nice email.  Now,
however, the volume has increased to around 3-5 GB per day and will
continue growing as we add more systems.  Unfortunately, the old
logcheck solution now spends hours trying to parse the logs, and even
if it finishes, it will generate an email that is too big to send.

I'm somewhat new to log management, and I've done quite a bit of
googling for solutions.  However, my problem is that I just don't have
enough experience to know what I need.  Should I try to work with
logcheck/logsentry in hopes that I can improve its efficiency more?
Should I use filters on syslog-ng to cut out some of the messages I
don't want to see as they reach the box?

I have also thought that it would be useful to cut out all the
duplicate messages and just simply report on the number of times per
day I see each message.  After this, it seems likely that logcheck
would be able to effectively parse through the remaining logs and
report the items that I need to see (as well as new messages that
could be interesting).

Are there other solutions that would be better suited to log volumes
like this?  Should I look at commercial products?

I don't like the idea of filtering out messages completely; the number of times that an otherwise 'uninteresting' message shows up can be significant (if the number of requests for a web image per day suddenly jumps to 100 times what it was before, that's a significant thing to know).

The key is to categorize and summarize the data. I have not found a good commercial tool to do this job (there are good tools for drilling down into and querying the logs); the task of summarizing the data is just too site-specific. I currently get 40-80G of logs per day and have a nightly process that summarizes them.
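The nightly run can be as simple as a cron entry; the script name and paths below are illustrative, not the actual setup:

    # kick off the summarization after the day's logs have been rotated
    30 1 * * *  /usr/local/bin/summarize-logs.sh >> /var/log/summarize-logs.out 2>&1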

I first have a process (a perl script) that goes through the logs and splits them into separate files based on the program name in the logs. Internally it does a lookup of the program name to a bucket name and then outputs the message to that bucket (this lets me combine all the mail logs into one file, no matter which OS they come from or how the mail software identifies itself). For things that I haven't defined a specific bucket for, I have a bucket called 'other'.
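Something along these lines captures the idea (a simplified sketch, not the real script; the bucket map and paths are made up, it's written as a shell/awk filter rather than perl for brevity, and it assumes standard syslog-style lines where the program name is the field after the timestamp and hostname):

    #!/bin/sh
    # splitlogs sketch: route each syslog line to a bucket file based on the
    # program name, with anything unrecognized going to the 'other' bucket.
    mkdir -p buckets
    awk '
      BEGIN {
        # program name -> bucket name (illustrative entries only)
        bucket["sendmail"]      = "mail"
        bucket["postfix/smtpd"] = "mail"
        bucket["sshd"]          = "auth"
        bucket["sudo"]          = "auth"
      }
      {
        prog = $5                      # "program[pid]:" in "MMM DD HH:MM:SS host ..."
        sub(/\[[0-9]+\]/, "", prog)    # drop the pid, if any
        sub(/:$/, "", prog)            # drop the trailing colon
        b = (prog in bucket) ? bucket[prog] : "other"
        print > ("buckets/" b ".log")
        count[b]++
      }
      END {
        # per-bucket message counts -- the "how many messages per bucket" report
        for (b in count)
          printf "%10d  %s\n", count[b], b | "sort -rn"
      }
    ' "$@"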

I then run separate processes against each of these buckets to create summary reports of the information in that bucket. Some of these processes are home-grown scripts, some are log summary scripts that came with specific programs.

One of the reports is how many log messages there are in each bucket (this report is generated by my splitlogs program).

For the 'other' bucket, I have a sed line from hell that filters out 'uninteresting' details in the log messages (timestamps, port numbers, etc.) and then runs them through a sort | uniq -c | sort -rn to produce a report that shows how many times each distinct message pattern shows up (the sed line works hard to collapse similar messages together).
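A much-shortened sketch of that pipeline (the real sed expression is far longer and site-specific; the patterns and file names here are only examples):

    # normalize the variable parts (timestamp, pids, port numbers, IP addresses)
    # so that 'the same' message collapses onto one line, then count occurrences
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]\{8\} //' \
        -e 's/\[[0-9]\{1,\}\]/[PID]/g' \
        -e 's/port [0-9]\{1,\}/port N/g' \
        -e 's/[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}/IPADDR/g' \
        buckets/other.log \
      | sort | uniq -c | sort -rn > reports/other.txt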

I then have a handful of scripts that assemble e-mails from these reports (different e-mails reporting on different things going to different groups). For a lot of the summaries I don't put the entire report in the e-mail, but instead just do a head -X (X=20-50 in many cases) to show the most common items.

For example, I have a report that shows all the websites that were hit by people on the desktop network. I have another report that shows the hits by desktop -> website. I generate an e-mail showing the top 50 entries in each of these reports and send it to the folks looking for unusual activity on the desktop network (it's amazing how accurately a simple report like this can pinpoint a problem desktop machine).
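One of those mails boils down to something like this (the report file names, recipient address, and mail command are illustrative):

    # assemble a nightly mail from the top 50 lines of each desktop report
    {
      echo "== top 50 websites hit from the desktop network =="
      head -50 reports/desktop-sites.txt
      echo
      echo "== top 50 desktop -> website pairs =="
      head -50 reports/desktop-site-pairs.txt
    } | mail -s "desktop web activity summary $(date +%F)" desktop-team@example.com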

Getting this set up takes a bit of time and tuning, but with a bit of effort you can quickly knock out a LOT of your messages, and then you start finding interesting things (machines that are misconfigured and generating errors on a regular basis, etc.). As you fix some of these problems, the 'other' report goes from an overwhelming tens of thousands of lines to a much smaller report. Just concentrate on killing the big items and don't try to deal with the entire report at once (the nightly e-mail to me shows the top several hundred lines of this report so that I can work on tuning it; when I can keep up on the tuning, it's not unusual for this to be the entire report).

With this approach (and a reasonably beefy log reporting machine), it takes about 3-6 hours to generate the reports (6 hours being the 80G days).

I have other tools watching the logs in real time for known bad things (to generate alerts), and I am installing Splunk to let me go searching through the logs when I find something in the reports that I want to investigate further (with this sort of log volume, just doing a grep through the logs can take days).

hope this helps.

David Lang
_______________________________________________
firewall-wizards mailing list
firewall-wizards () listserv icsalabs com
https://listserv.icsalabs.com/mailman/listinfo/firewall-wizards

