Firewall Wizards mailing list archives

Final sct: parsing logs ultra-fast inline


From: "Marcus J. Ranum" <mjr () ranum com>
Date: Wed, 01 Feb 2006 19:22:24 -0500

Lastly, to comment on the issue of detecting new forms of
log messages: I use a simple hack that I call "structural analysis"
for lack of a better term. The idea is simple: knock the
variant fields out of the message and create a template that
represents the message's format.

Abe asked me 3 years ago (that long? holy crap!) if this process
could be automated and we tried and the answer appears to be
"not really" - but - there are still useful outcomes from the
technique.

You take a log message and run it through a routine that
offers up a candidate template. For example:
mjr@lyra-> head -1  /var/log/messages
Feb  1 04:00:01 lyra newsyslog[12809]: logfile turned over
So that's our input. Then we run it through X4, which creates the
template with the variant fields knocked out, plus frequency counts
for the values seen in each field:
BEGINREC
FIELDS=11
UPDATES=1
EXAMPLE=Feb  1 04:00:01 lyra newsyslog[12809]: logfile turned over
TEMPLATE=%s %d %d:%d:%d %s %s[%d]: %s %s %s
FIELD 1 COUNT 1
1=Feb
FIELD 2 COUNT 1
1=1
FIELD 3 COUNT 1
1=04
FIELD 4 COUNT 1
1=00
FIELD 5 COUNT 1
1=01
FIELD 6 COUNT 1
1=lyra
FIELD 7 COUNT 1
1=newsyslog
FIELD 8 COUNT 1
1=12809
FIELD 9 COUNT 1
1=logfile
FIELD 10 COUNT 1
1=turned
FIELD 11 COUNT 1
1=over
ENDREC
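
For anyone who wants to experiment without the X4 sources, here is a
minimal Python sketch of the templatizing idea. It is not X4's code:
the tokenizing rules (digit runs become %d, words become %s, runs of
whitespace collapse to one space, everything else is kept literally)
are my assumptions, chosen to match the record above, and the names
templatize, update and store are hypothetical.

```python
import re
from collections import Counter

# One token per match: a digit run, a word, a whitespace run,
# or any other single character (kept as literal text).
TOKEN = re.compile(
    r"(?P<num>\d+)|(?P<word>[A-Za-z_][A-Za-z0-9_]*)|(?P<ws>\s+)|(?P<lit>.)",
    re.ASCII | re.DOTALL,
)

def templatize(line):
    """Return (template, values) for one log line."""
    parts, values = [], []
    for m in TOKEN.finditer(line.strip()):
        kind, text = m.lastgroup, m.group()
        if kind == "num":
            parts.append("%d")
            values.append(text)
        elif kind == "word":
            parts.append("%s")
            values.append(text)
        elif kind == "ws":
            parts.append(" ")          # collapse whitespace runs
        else:
            # Escape a literal percent sign so it cannot be
            # mistaken for a knocked-out field.
            parts.append("%%" if text == "%" else text)
    return "".join(parts), values

def update(store, line):
    """Fold one line into store, a dict keyed by template (assumed layout)."""
    template, values = templatize(line)
    if template not in store:
        store[template] = {
            "updates": 0,
            "example": line.rstrip("\n"),
            "fields": [Counter() for _ in values],
        }
    rec = store[template]
    rec["updates"] += 1
    for counter, value in zip(rec["fields"], values):
        counter[value] += 1            # per-field value frequencies
    return template
```

Fed the newsyslog line above, templatize() gives back the same
TEMPLATE string and the same 11 field values as the record shown.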

OK, that's not particularly interesting by itself, but it opens up a few
possibilities. We might write a post-processor that runs through the
resulting templates, takes any record with more than 100 instances,
and turns every field that never changed across those instances into
a static string. So in this case our template would transform from:
TEMPLATE=%s %d %d:%d:%d %s %s[%d]: %s %s %s
to
TEMPLATE=%s %d %d:%d:%d %s %s[%d]: logfile turned over
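
The freezing step can be sketched roughly as follows, in Python. This
is an illustration of the rule as stated, not the post-processor we
actually wrote; freeze_static_fields and its arguments are hypothetical
names, and field_counts is assumed to be one Counter of observed
values per field, in template order. Note that a literal reading of
the rule also freezes the hostname or program name if they never
varied, which may or may not be what you want.

```python
import re
from collections import Counter

# Matches an escaped percent sign or a knocked-out field.
PLACEHOLDER = re.compile(r"%%|%[sd]")

def freeze_static_fields(template, field_counts, updates, min_updates=100):
    """Turn fields with a single observed value into static text.

    Only applies once the record has more than min_updates instances.
    """
    if updates <= min_updates:
        return template
    index = 0

    def replace(match):
        nonlocal index
        if match.group() == "%%":
            return "%%"                # literal percent, not a field
        counts = field_counts[index]
        index += 1
        if len(counts) == 1:           # value never changed
            value = next(iter(counts))
            return value.replace("%", "%%")
        return match.group()           # still a variant field

    return PLACEHOLDER.sub(replace, template)
```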

As it happened, writing the post-processor was difficult and there
were a lot of collisions in the dataset we were using - but the
results got us to the point where we probably could have
developed a complete log-analysis template codex with a predictable
amount of person-power (about 4 person-years, but it'd parallelize
and outsource well) - within the grasp of a well-capitalized SIM
startup...

Anyhow, I built a bastardized version of X4's templatizing
routine into NBS, so that NBS kicks out the first
instance of each new template structure going into a log file. That
turned out to be quite useful, especially for detecting things where
the log message is really weird, like:
named[34]: parse error in IPH!)*OAIUFYJA JEI(*!&*AAAJAFJA AAA blargh
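
The "first instance of a new structure" idea is small enough to sketch
in a few lines of Python. This is not NBS's code: the in-memory set,
the shape() tokenizing rules and the function names are all
assumptions made for illustration.

```python
import re

# Digit runs, words and whitespace runs; anything else stays literal.
_TOKEN = re.compile(r"\d+|[A-Za-z_][A-Za-z0-9_]*|\s+", re.ASCII)

def shape(line):
    """Reduce a log line to its structure (used only as a lookup key)."""
    def repl(m):
        text = m.group()
        if text[0].isdigit():
            return "%d"
        if text.isspace():
            return " "
        return "%s"
    return _TOKEN.sub(repl, line.strip())

def first_sightings(lines):
    """Yield each line whose structure has not been seen before."""
    seen = set()
    for line in lines:
        key = shape(line)
        if key not in seen:
            seen.add(key)
            yield line.rstrip("\n")
```

Run over a log file, this stays quiet for the thousandth newsyslog
rollover but emits a line like the named parse error above the first
time its structure appears.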

X4 is at present undocumented but functional and somewhat
interesting. If anyone wants the sources to play with, email me. NBS
is somewhat documented and is on my website under the
computer security->code link. An interesting variant of X4
might be to have it kick out records whenever a new value shows up
in a variant field of a particular template. That would
result in a "detector" for interesting things like new email senders,
new email recipients, etc. Need I add that X4 is fast? It ate 100 gigs
of log data in 23 hours.
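
A rough Python sketch of that variant, assuming the lines have
already been split into (template, values) pairs by some templatizing
routine. Nothing like this exists in X4 today; new_field_values is a
hypothetical name, and fields are numbered from 1 to match the FIELD
lines in the record format above.

```python
def new_field_values(records, template, field):
    """Yield each value not seen before in one field of one template.

    records:  iterable of (template, values) pairs
    template: the template string to watch
    field:    1-based field number within that template
    """
    seen = set()
    for rec_template, values in records:
        if rec_template != template:
            continue                   # some other message format
        value = values[field - 1]
        if value not in seen:
            seen.add(value)
            yield value
```

Pointed at the sender field of a mail log template, this would report
each sender the first time it shows up. For anything long-running,
the seen set would presumably need to persist between runs.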

mjr.

_______________________________________________
firewall-wizards mailing list
firewall-wizards () honor icsalabs com
http://honor.icsalabs.com/mailman/listinfo/firewall-wizards

