IDS mailing list archives

Re: Processing time and IDS traffic


From: "SecurIT Informatique Inc." <securit () iquebec com>
Date: Fri, 15 Aug 2003 12:41:14 -0400

At 05:10 PM 11/08/2003, Eric Knight wrote:

Greetings,

Hello Eric.

I've been working on a 'universal framework' application for collecting,
analyzing, charting, managing and controlling logs, and so on, for "anything
goes" (forensics, anti-virus, IDS, firewalls, etc.) in a client/multi-tiered
server environment.  At the moment, it's all for Microsoft Windows.  The
project has gone wonderfully, and I've been working on expanding the
horizons of my programs to include the majority of popular tools, as
intended.

I've been working on a similar project for the past few months (LogIDS 1.0/LogAgent 4.0/ComLog 1.05), with the main difference being that instead of generating XML trees, I present the data on a graphical representation of a network map. Other than that, the goal is the same: a universal way to parse and handle log files from various security tools, from (ideally) all the hosts on a network. My solution is also Windows-only. You can download it from my page at http://securit.iquebec.com.

One of the external applications I've been integrating is Snort, mostly
because its reviews were outstanding and it was readily available to work
with.  I created a test environment using Snort that generates about one
alert every second, and I've let it collect 75,000 reported elements
(roughly 20 megabytes of logs).

Regarding your question, I think I'm missing something. You say Snort generates one alert per second, but that your tool tries to take on 75,000 records at a time. My point here is that if you fetch your records from the log files as they are being written, then your program should have no problem reading, analysing and handling one record per second; if it does, the performance leaves something to be desired. If your program instead takes on a batch of records and then generates reports for that batch, then the fact that Snort reports one record per second becomes irrelevant. In that case, I'm not sure whether 10 minutes of processing is good or not for 75,000 records; it can probably be improved in some ways, but you'll never obtain "instant" results with such a volume.
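To illustrate what I mean by fetching records as they are written, here is a rough sketch in Java (I don't do Java myself, so take it strictly as an illustration; the "alert.log" file name and the one-second poll interval are placeholders, not anybody's actual setup): follow the alert file and handle each new line the moment it appears.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Rough sketch only: poll the alert file for new lines and handle each
    // record as soon as it is appended. File name and poll interval are
    // placeholder assumptions.
    public class TailReader {
        public static void main(String[] args) throws IOException, InterruptedException {
            try (BufferedReader in = new BufferedReader(new FileReader("alert.log"))) {
                while (true) {
                    String line = in.readLine();
                    if (line == null) {
                        Thread.sleep(1000);   // nothing new yet, wait and poll again
                    } else {
                        handle(line);         // at ~1 record/second this easily keeps up
                    }
                }
            }
        }

        private static void handle(String record) {
            // stand-in for the real parsing/analysis of one record
            System.out.println("got: " + record);
        }
    }

At one alert per second, the work done per record would have to be enormous before a loop like this fell behind.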

What I did was parse the logs into XML records and arrange them into a nice,
pleasant tree sorted by error type, origin, destination, protocol, port,
etc., and collect totals by severity, time, total attacks, traffic, etc.
Then I displayed them in a tree structure that's easy to search through and
make digested reports with.  Not sure if it's the best arrangement for all
uses, but it certainly seems friendlier than the flat lists I normally
see.
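
For what it's worth, here's a stripped-down sketch of the kind of per-type grouping I mean, using the standard Java DOM API (this is not my actual code, and the element and attribute names are made up for the example):

    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // Illustrative sketch only: group parsed alerts under one node per error
    // type and keep a running count per group for the digested reports.
    public class AlertTree {
        private final Document doc;
        private final Element root;

        public AlertTree() throws Exception {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            root = doc.createElement("alerts");
            doc.appendChild(root);
        }

        // errorType, src and dst are fields already parsed out of one record
        public void add(String errorType, String src, String dst) {
            Element group = findOrCreateGroup(errorType);
            Element alert = doc.createElement("alert");
            alert.setAttribute("src", src);
            alert.setAttribute("dst", dst);
            group.appendChild(alert);
            int count = Integer.parseInt(group.getAttribute("count")) + 1;
            group.setAttribute("count", String.valueOf(count));
        }

        private Element findOrCreateGroup(String errorType) {
            NodeList groups = root.getElementsByTagName("group");
            for (int i = 0; i < groups.getLength(); i++) {
                Element g = (Element) groups.item(i);
                if (g.getAttribute("name").equals(errorType)) return g;
            }
            Element g = doc.createElement("group");
            g.setAttribute("name", errorType);
            g.setAttribute("count", "0");
            root.appendChild(g);
            return g;
        }
    }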

The problem is, 75,000 records takes about 10 minutes for my test computer
to parse, sort and process.  It isn't a fast box (Duron 750, 256 MB RAM) and
it's mostly overburdened anyway, running Snort plus a development environment
in debug, but it raised my eyebrow because the code is fairly optimized (for
Java).  However, I'm disappointed that it isn't next-to-instant (because,
well, I'm -always- disappointed when something isn't next to instant.
*grins*)  I'm already considering re-doing the whole process in C++, but I'm
wondering what processing times other people see for similar calculations,
how many records people usually get per day from a typical, strategically
placed IDS, and what people get from an IDS located on an exposed
workstation (a personal firewall, say).  I really have no idea what
performance I'm targeting.

Here again, part of my answer depends on how you fetch your records (as they are being written, or in batch). For batch processing, you can indeed improve your results with a better/faster CPU and loads of memory, in which case you don't have much control over this part. If you fetch records as they are being written to disk, then you have a different hardware bottleneck: the hard disk. Bear in mind that the hard disk is probably the slowest component in a PC, and that for each record being written to disk, another request is sent to the disk controller to read it back, which doubles the I/O requests to the hard drive. Count a third request if you write the record back to a directory in your application's folder tree. So the limitation is not the sheer number of records being processed, but how many per second. Here again, better hardware will give better performance, but algorithm optimisation can do a lot to help and compensate for the bottleneck.

All this writing and reading of records from disk also takes CPU cycles, and this impacts the number of records processed per second. So when you reach your performance threshold, you still take care of 100% of the records, but the analysis lags behind, which means your analyzer may still be processing seconds or even minutes after the reporting application goes silent. Unfortunately, I can't give you hard numbers.
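A little sketch of that last point (again in Java, and purely illustrative; the record count and the 1 ms of fake work per record are made up): the reader keeps up with the source and hands records to a queue, while the analyzer drains the queue at its own pace, so nothing is lost, the analysis just finishes later.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustrative sketch: no record is dropped when the analyzer cannot keep
    // up; the backlog simply grows and the analysis finishes after the
    // reporting side has gone quiet. All numbers here are arbitrary.
    public class LagDemo {
        static final int RECORDS = 75_000;

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> backlog = new ArrayBlockingQueue<>(RECORDS);

            Thread reader = new Thread(() -> {
                for (int i = 0; i < RECORDS; i++) {
                    backlog.offer("record " + i);   // reading keeps up with the source
                }
                System.out.println("reader done, backlog = " + backlog.size());
            });

            Thread analyzer = new Thread(() -> {
                try {
                    for (int i = 0; i < RECORDS; i++) {
                        backlog.take();
                        Thread.sleep(1);            // stand-in for parse/sort/report work
                    }
                    System.out.println("analyzer done");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            reader.start();
            analyzer.start();
            reader.join();     // returns almost immediately
            analyzer.join();   // returns a minute or more later on this toy example
        }
    }

The point is simply that falling behind is not the same as losing data; it just means the reports come out later than the events.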

Other than that, I don't know much about XML, so I couldn't tell you whether it is a CPU-hungry task or not. I don't do Java either, but here I think I can drop my 2¢. I am not sure that Java is the language of choice for such a task if performance is your goal. On my laptop, when the Java virtual machine loads because I just browsed to a page containing Java, loading it alone takes forever (over 30 seconds; I don't have a single app that takes nearly that long to load). I agree that once loaded, the Java machine can run at reasonable speed for most web-based tasks, but bear in mind that Java is an interpreted language, which is always slower than compiled code.

I wrote my tools in Perl, which is also an interpreted language, but its interpreter is much lighter on the CPU than Java's. I get fairly decent performance from it, but I also use perl2exe to compile my programs into binaries for easier distribution. I'm not sure how perl2exe really works, but chances are it's simply a Perl wrapper bundled with the program's source code, which would mean the code is still interpreted in some way rather than truly compiled. Even if that is the case, I made some tests (loop several tens of thousands of times and check how many seconds it took) and gained roughly 10% in processing time (a 6-second gain on a 58-second run). So, all that being said, if you can re-implement your idea in C/C++, which gives you true compiled code, you will instantly get a noticeable performance gain. Here again, I can't give you figures, although you can expect to do better than my 10% with Perl.
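If you want to repeat that kind of test on the Java side before and after a rewrite, something as crude as this will do (the iteration count and the dummy work are arbitrary, and it's only a rough wall-clock measurement, nothing scientific):

    // Crude timing test: run a big loop and print how many seconds it took.
    // Run the same measurement before and after a change (or against the C++
    // rewrite) and compare. Iteration count and dummy work are arbitrary.
    public class LoopTimer {
        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;   // dummy work standing in for parsing one record
            }
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            System.out.println("processed in " + seconds + " s (checksum " + sum + ")");
        }
    }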

As for my own program, I too have some performance issues to deal with, which I am currently working on. Still, it is a good proof of concept of a tool for real-time, universal log analysis. I am still waiting to receive some feedback from the community about it, although downloads are going up, which must be a good thing :-)

Hope this helps, and glad to see some new designs circulating around
Floydman

Thanks for your time,

Eric Knight


