IDS mailing list archives

RE: parsing very large tcpdump files


From: Michael Miller <michael.miller () state co us>
Date: Mon, 22 Nov 2004 10:13:35 -0700

I've found you have to pick your battles wisely. If you assume mail and HTTP
are going to be logged elsewhere, and you decide you can ignore ARP and ping
traffic, I think you'll find you can really reduce the amount of data
processed.

Now, you can still _gather_ all the information; just run the capture through
tcpdump again and strip out the mail/web/ARP/ping-related traffic.

If you find someone doing something 'inappropriate', you can go back to the
full logs and extract just the data with that host as a source or destination.
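The re-filtering step described above can be sketched in pure Python. This is a minimal sketch, not the poster's actual workflow: it assumes classic little-endian pcap files with Ethernet/IPv4 framing, and the file names and the set of "noise" ports are illustrative placeholders.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
NOISE_PORTS = {25, 80, 110, 143, 443}     # mail and web: assumed logged elsewhere


def keep(pkt):
    """True unless the frame is non-IPv4 (e.g. ARP), ICMP, or TCP on a noise port."""
    if len(pkt) < 34:
        return False
    if struct.unpack('!H', pkt[12:14])[0] != ETHERTYPE_IPV4:
        return False                       # drops ARP (0x0806), among others
    ihl = (pkt[14] & 0x0F) * 4             # IPv4 header length in bytes
    proto = pkt[23]                        # IPv4 protocol field
    if proto == 1:                         # ICMP (ping)
        return False
    if proto == 6 and len(pkt) >= 14 + ihl + 4:   # TCP: check both ports
        sport, dport = struct.unpack('!HH', pkt[14 + ihl:14 + ihl + 4])
        if sport in NOISE_PORTS or dport in NOISE_PORTS:
            return False
    return True


def filter_pcap(src_path, dst_path):
    """Copy src_path to dst_path, dropping every packet keep() rejects."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        dst.write(src.read(24))            # pcap global header, copied as-is
        while True:
            rec = src.read(16)             # per-packet record header
            if len(rec) < 16:
                break
            incl_len = struct.unpack('<IIII', rec)[2]
            pkt = src.read(incl_len)
            if keep(pkt):
                dst.write(rec + pkt)
```

In practice the same reduction is usually done with tcpdump itself, e.g. `tcpdump -r full.pcap -w reduced.pcap 'not arp and not icmp and not port 25 and not port 80'`, which will be far faster on multi-gigabyte captures.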



-----Original Message-----
From: Carlos Henrique P C Chaves [mailto:cae () lac inpe br] 
Sent: Friday, November 19, 2004 10:59 AM
To: Tom
Cc: focus-ids () securityfocus com
Subject: Re: parsing very large tcpdump files

Hi Tom,

I'm a Brazilian MSc candidate, and my research is about detecting backdoors
and covert channels (tunnels) by analysing network traffic. We have some dump
files of over 700 MB, and the TCP/IP reconstruction tool takes a long time to
do the job. When I say a long time, I'm talking about more than one day, and
that dump file corresponds to one hour of our institution's traffic, using
only the header information. Imagine a full-packet reconstruction.
We are focusing on analysing each class C network's traffic separately, and
just some protocols.

The point is: I don't think it is feasible to reconstruct that amount of
traffic. I don't know of an open-source/free tool that would do this. Maybe
someone on this list can clear my mind.

A good first step is to remove the noise from the dump to reduce its size.

You can use a flow analysis tool, such as Argus, but it won't give you all
the information you want.

Best regards,

Carlos Henrique. 

On Thu, Nov 18, 2004 at 06:29:32PM -0500, Tom wrote:

moderator: sorry if this is vague.  My requirements are not fixed yet and
will probably change from case to case, so for now I am just looking for
generic info.

I was wondering if anyone on this list can recommend some tools
(open-source or commercial) to automate the parsing of very large (many GB)
tcpdump files.  I am trying to put together a generic toolset, but in general
some things I'd like to do are:

1. Filter out traffic to/from a specific IP address or range.
2. Reconstruct all reconstructable sessions in an easy-to-parse way:
emails, web sites visited (and content uploaded/downloaded), VoIP, anything
else imaginable.
3. Be able to search all of this data for keywords.
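For item 1, the per-packet logic can be sketched with Python's standard-library `ipaddress` module. This is only a sketch (the Ethernet/IPv4-only framing assumption and the example addresses are mine); on many-gigabyte captures a compiled BPF filter, e.g. `tcpdump -r full.pcap -w subset.pcap 'net 10.0.0.0/8'`, will be far faster.

```python
import ipaddress
import struct


def involves_net(pkt, cidr):
    """True if an Ethernet/IPv4 frame's source or destination falls in cidr."""
    if len(pkt) < 34 or struct.unpack('!H', pkt[12:14])[0] != 0x0800:
        return False                               # not an IPv4 frame
    net = ipaddress.ip_network(cidr)
    src = ipaddress.ip_address(pkt[26:30])         # bytes 12-15 of the IP header
    dst = ipaddress.ip_address(pkt[30:34])         # bytes 16-19 of the IP header
    return src in net or dst in net
```

The same predicate can then be dropped into any loop over pcap records to keep or discard packets by address range.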

This may seem like a tall order.  I know of a few tools that do individual
tasks on a small scale, such as mailsnarf, vomit, Ethereal, etc., but it's
not practical to parse these files by hand in Ethereal. I've tried
chaosreader.pl, but it bogs down on files as small as 200 MB.

I'd appreciate any input.

Thanks.




--------------------------------------------------------------------------
Test Your IDS

Is your IDS deployed correctly?
Find out quickly and easily by testing it with real-world attacks from 
CORE IMPACT.
Go to http://www.securityfocus.com/sponsor/CoreSecurity_focus-ids_040708 
to learn more.
--------------------------------------------------------------------------


