Firewall Wizards mailing list archives
Final sct: parsing logs ultra-fast inline
From: "Marcus J. Ranum" <mjr () ranum com>
Date: Wed, 01 Feb 2006 19:22:24 -0500
Lastly, to comment on the issue of detecting new forms of log messages I use a simple hack that I call "structural analysis" for lack of a better term. The idea is simple: try to knock the variant fields of the message out and create a template that represents the format of the message. Abe asked me 3 years ago (that long? holy crap!) if this process could be automated and we tried and the answer appears to be "not really" - but - there are still useful outcomes from the technique.

You take a log message and run it through a routine that offers up a candidate template. I.e.:

    mjr@lyra-> head -1 /var/log/messages
    Feb 1 04:00:01 lyra newsyslog[12809]: logfile turned over

So that's our input. Then we run it into X4, which creates the template with knocked-out fields and frequencies for the things in the fields. I.e.:

    BEGINREC
    FIELDS=11
    UPDATES=1
    EXAMPLE=Feb 1 04:00:01 lyra newsyslog[12809]: logfile turned over
    TEMPLATE=%s %d %d:%d:%d %s %s[%d]: %s %s %s
    FIELD 1 COUNT 1 1=Feb
    FIELD 2 COUNT 1 1=1
    FIELD 3 COUNT 1 1=04
    FIELD 4 COUNT 1 1=00
    FIELD 5 COUNT 1 1=01
    FIELD 6 COUNT 1 1=lyra
    FIELD 7 COUNT 1 1=newsyslog
    FIELD 8 COUNT 1 1=12809
    FIELD 9 COUNT 1 1=logfile
    FIELD 10 COUNT 1 1=turned
    FIELD 11 COUNT 1 1=over
    ENDREC

OK, that's not particularly interesting except for a few things. We might write a post-processor that runs through the resulting templates and takes any record where there are more than 100 instances and turns all the fields that never changed in 100 instances into static strings.
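The knock-out step and the post-processing pass described above can be sketched roughly as follows, in Python. This is not X4's code. The token rule (`TOKEN`), the record layout, and the names `templatize`, `update` and `collapse_static` are all assumptions made for illustration, and the threshold of 100 is taken from the text.

```python
import re
from collections import Counter

# Assumption: a "field" is a run of letters, digits, and a few joining
# characters; everything else (spaces, brackets, colons) stays literal.
TOKEN = re.compile(r"[A-Za-z0-9_.@-]+")
PLACEHOLDER = re.compile(r"%[sd]")


def templatize(line):
    """Knock out each field; return (template, list of field values)."""
    fields = []

    def knock_out(match):
        token = match.group(0)
        fields.append(token)
        return "%d" if token.isdigit() else "%s"

    return TOKEN.sub(knock_out, line.strip()), fields


def update(records, line):
    """Fold one log line into the per-template records; return its template."""
    template, fields = templatize(line)
    rec = records.setdefault(
        template,
        {"updates": 0, "example": line, "counts": [Counter() for _ in fields]},
    )
    rec["updates"] += 1
    for counter, value in zip(rec["counts"], fields):
        counter[value] += 1
    return template


def collapse_static(template, rec, min_updates=100):
    """Turn fields that never varied into static text, once the template
    has been seen at least min_updates times."""
    if rec["updates"] < min_updates:
        return template
    counters = iter(rec["counts"])

    def fill(match):
        counter = next(counters)
        # Exactly one distinct value seen: make it a literal string.
        return next(iter(counter)) if len(counter) == 1 else match.group(0)

    return PLACEHOLDER.sub(fill, template)


line = "Feb 1 04:00:01 lyra newsyslog[12809]: logfile turned over"
print(templatize(line)[0])
# prints: %s %d %d:%d:%d %s %s[%d]: %s %s %s
```

A naive token rule like this one is probably where the collisions come from: unrelated messages with the same shape land in the same template, and a single message type with a variable number of words lands in several.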
So in this case our template would transform from:

    TEMPLATE=%s %d %d:%d:%d %s %s[%d]: %s %s %s

to

    TEMPLATE=%s %d %d:%d:%d %s %s[%d]: logfile turned over

As it happened, writing the post-processor was difficult and there were a lot of collisions in the dataset we were using - but the results got us to a point where we probably could have developed a complete log-analysis template codex in a predictable amount of person-power (about 4 person-years, but it'd parallelize and outsource well) - within the grasp of a well-capitalized SIM startup...

Anyhow, I built a bastardized version of X4's templatizing routine into NBS so that NBS would kick out the first instance of a new template structure going into a log file. That turned out to be quite useful, especially at detecting things where the log message is really weird, like:

    named[34]: parse error in IPH!)*OAIUFYJA JEI(*!&*AAAJAFJA AAA blargh

X4 is at present undocumented but functional and somewhat interesting. If anyone wants the sources to play with, email me. NBS is somewhat documented and is on my website under the computer security->code link.

An interesting variant of X4 might be to have it kick out records where a new value in a variant field of a particular template was discovered. That would result in a "detector" for interesting things like new email senders, new email recipients, etc.

Need I add that X4 is fast? It ate 100 gigs of log data in 23 hours.

mjr.
_______________________________________________
firewall-wizards mailing list
firewall-wizards () honor icsalabs com
http://honor.icsalabs.com/mailman/listinfo/firewall-wizards
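For readers who want to experiment with the two detectors described in the message (first instance of a new template, and a new value in a variant field of a known template), here is a minimal Python sketch. It is neither NBS nor X4. The token rule, the `NoveltyDetector` class, its `watch` and `feed` methods, and the "mailer" log lines are all hypothetical and exist only for illustration.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_.@-]+")  # assumed definition of a field


def templatize(line):
    """Knock out each field; return (template, list of field values)."""
    fields = []

    def knock_out(match):
        fields.append(match.group(0))
        return "%d" if match.group(0).isdigit() else "%s"

    return TOKEN.sub(knock_out, line.strip()), fields


class NoveltyDetector:
    """Reports first-seen templates and first-seen values in watched fields."""

    def __init__(self):
        self.seen_templates = set()
        self.watched = {}  # template -> {field index: set of seen values}

    def watch(self, template, index):
        """Track distinct values of one field (0-based) of one template."""
        self.watched.setdefault(template, {}).setdefault(index, set())

    def feed(self, line):
        """Return the list of novelty events triggered by this line."""
        template, fields = templatize(line)
        events = []
        if template not in self.seen_templates:
            self.seen_templates.add(template)
            events.append(("new-template", line))
        for index, seen in self.watched.get(template, {}).items():
            if index < len(fields) and fields[index] not in seen:
                seen.add(fields[index])
                events.append(("new-value", index, fields[index]))
        return events


# Hypothetical log lines from a made-up "mailer" program.
detector = NoveltyDetector()
first = "Feb 1 04:00:01 lyra mailer[101]: from=alice@example.com"
template, fields = templatize(first)
detector.watch(template, fields.index("alice@example.com"))  # the sender field

for entry in (
    first,
    "Feb 1 04:00:09 lyra mailer[102]: from=alice@example.com",
    "Feb 1 04:01:30 lyra mailer[103]: from=bob@example.com",
):
    print(detector.feed(entry))
```

In this run the first line reports both a new template and a new sender, the second reports nothing, and the third reports only the new sender. Keeping a set of seen values per watched field is the simplest choice; at the volumes mentioned above it would likely need a size cap or an on-disk store.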