IDS mailing list archives

Re: Processing time and IDS traffic


From: "SecurIT Informatique Inc." <securit () iquebec com>
Date: Fri, 15 Aug 2003 12:41:14 -0400

At 05:10 PM 11/08/2003, Eric Knight wrote:

Greetings,

Hello Eric.

I've been working on a 'universal framework' application for collecting,
analyzing, charting, managing and controlling logs, and so on, for "anything
goes" (forensics, anti-virus, IDS, firewalls, etc.) in a client/multi-tiered
server environment.  At the moment, it's all for Microsoft Windows.  The
project has gone wonderfully, and I've been working on expanding the
horizons of my programs to include the majority of popular tools, as
intended.

I've been working on a similar project for the past few months (LogIDS 1.0/LogAgent 4.0/ComLog 1.05), with the main difference being that instead of generating XML trees, I present the data on a graphical representation of a network map. Other than that, the goal is the same: a universal way to parse and handle log files from various security tools, from (ideally) all the hosts on a network. My solution is also Windows-only. You can download it from my page at http://securit.iquebec.com.

One of the external applications I've been integrating is Snort, mostly
because its reviews were outstanding and it was readily available to work
with.  I created a test environment using Snort that generates about one
alert every second, and I've let it collect 75,000 reported elements
(roughly 20 megabytes of logs).

Regarding your question, I think I'm missing something. You say Snort generates one alert per second, but that your tool tries to take on 75,000 records at a time. My point here is that if you fetch your records from the log files as they are being written, then your program should have no problem reading, analysing and handling one record per second; if it does, the performance leaves something to be desired. If your program instead takes on a batch of records and then generates reports for that batch, then the fact that Snort reports one record per second becomes irrelevant. In that case, I'm not sure whether 10 minutes of processing is good or not for 75,000 records; it can probably be improved in some ways, but you'll never obtain "instant" results with such a volume.
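To illustrate what I mean by fetching records as they are written, here is a rough sketch in Java (I don't do Java myself, so take it strictly as an illustration; the "alert.log" file name and the one-second poll interval are placeholders, not anybody's actual setup): follow the alert file and handle each new line the moment it appears.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Rough sketch only: poll the alert file for new lines and handle each
    // record as soon as it is appended. File name and poll interval are
    // placeholder assumptions.
    public class TailReader {
        public static void main(String[] args) throws IOException, InterruptedException {
            try (BufferedReader in = new BufferedReader(new FileReader("alert.log"))) {
                while (true) {
                    String line = in.readLine();
                    if (line == null) {
                        Thread.sleep(1000);   // nothing new yet, wait and poll again
                    } else {
                        handle(line);         // at ~1 record/second this easily keeps up
                    }
                }
            }
        }

        private static void handle(String record) {
            // stand-in for the real parsing/analysis of one record
            System.out.println("got: " + record);
        }
    }

At one alert per second, the work done per record would have to be enormous before a loop like this fell behind.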

What I did was parse the logs into XML records and arrange them into a nice,
pleasant tree sorted by error type, origin, destination, protocol, port,
etc., and collect totals by severity, time, total attacks, traffic, etc.
Then I displayed them in a tree structure that's easy to search through and
make digested reports with.  Not sure if it's the best arrangement for all
uses, but it certainly seems friendlier than the flat lists I normally
see.
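
For what it's worth, here's a stripped-down sketch of the kind of per-type grouping I mean, using the standard Java DOM API (this is not my actual code, and the element and attribute names are made up for the example):

    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // Illustrative sketch only: group parsed alerts under one node per error
    // type and keep a running count per group for the digested reports.
    public class AlertTree {
        private final Document doc;
        private final Element root;

        public AlertTree() throws Exception {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            root = doc.createElement("alerts");
            doc.appendChild(root);
        }

        // errorType, src and dst are fields already parsed out of one record
        public void add(String errorType, String src, String dst) {
            Element group = findOrCreateGroup(errorType);
            Element alert = doc.createElement("alert");
            alert.setAttribute("src", src);
            alert.setAttribute("dst", dst);
            group.appendChild(alert);
            int count = Integer.parseInt(group.getAttribute("count")) + 1;
            group.setAttribute("count", String.valueOf(count));
        }

        private Element findOrCreateGroup(String errorType) {
            NodeList groups = root.getElementsByTagName("group");
            for (int i = 0; i < groups.getLength(); i++) {
                Element g = (Element) groups.item(i);
                if (g.getAttribute("name").equals(errorType)) return g;
            }
            Element g = doc.createElement("group");
            g.setAttribute("name", errorType);
            g.setAttribute("count", "0");
            root.appendChild(g);
            return g;
        }
    }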

The problem is, 75,000 records takes about 10 minutes for my test computer
to parse, sort and process.  It isn't a fast box (Duron 750, 256 MB RAM) and
it's mostly overburdened anyway, running Snort plus a development environment
in debug, but it raised my eyebrow because the code is fairly optimized (for
Java).  However, I'm disappointed that it isn't next-to-instant (because,
well, I'm -always- disappointed when something isn't next to instant.
*grins*)  I'm already considering re-doing the whole process in C++, but I'm
wondering what processing times other people see for similar calculations,
how many records people usually get per day from a typical, strategically
placed IDS, and what people get from an IDS located on an exposed
workstation (a personal firewall, say).  I really have no idea what
performance I'm targeting.

Here again, part of my answer depends on how you fetch your records (as they are being written, or in batch). For batch processing, you can indeed improve your results with a better/faster CPU and loads of memory, in which case you don't have much control over this part. If you fetch records as they are being written to disk, then you have a different hardware bottleneck: the hard disk. Bear in mind that the hard disk is probably the slowest component in a PC, and that for each record being written to disk, another request is sent to the disk controller to read it back, which doubles the I/O requests to the hard drive. Count a third request if you write the record back to a directory in your application's folder tree. So the limitation is not the sheer number of records being processed, but how many per second. Here again, better hardware will give better performance, but algorithm optimisation can do a lot to help and compensate for the bottleneck.

All this writing and reading of records from disk also takes CPU cycles, and this impacts the number of records processed per second. So when you reach your performance threshold, you still take care of 100% of the records, but the analysis lags behind, which means your analyzer may still be processing seconds or even minutes after the reporting application goes silent. Unfortunately, I can't give you hard numbers.
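A little sketch of that last point (again in Java, and purely illustrative; the record count and the 1 ms of fake work per record are made up): the reader keeps up with the source and hands records to a queue, while the analyzer drains the queue at its own pace, so nothing is lost, the analysis just finishes later.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustrative sketch: no record is dropped when the analyzer cannot keep
    // up; the backlog simply grows and the analysis finishes after the
    // reporting side has gone quiet. All numbers here are arbitrary.
    public class LagDemo {
        static final int RECORDS = 75_000;

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> backlog = new ArrayBlockingQueue<>(RECORDS);

            Thread reader = new Thread(() -> {
                for (int i = 0; i < RECORDS; i++) {
                    backlog.offer("record " + i);   // reading keeps up with the source
                }
                System.out.println("reader done, backlog = " + backlog.size());
            });

            Thread analyzer = new Thread(() -> {
                try {
                    for (int i = 0; i < RECORDS; i++) {
                        backlog.take();
                        Thread.sleep(1);            // stand-in for parse/sort/report work
                    }
                    System.out.println("analyzer done");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            reader.start();
            analyzer.start();
            reader.join();     // returns almost immediately
            analyzer.join();   // returns a minute or more later on this toy example
        }
    }

The point is simply that falling behind is not the same as losing data; it just means the reports come out later than the events.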

Other than that, I don't know much about XML, so I couldn't tell you whether it is a CPU-hungry task or not. I don't do Java either, but here I think I can drop my 2¢. I am not sure that Java is the language of choice for such a task if performance is your goal. On my laptop, when the Java virtual machine loads because I just browsed to a page containing Java, loading it alone takes forever (over 30 seconds; I don't have a single app that takes nearly that long to load). I agree that once loaded, the Java machine can run at reasonable speed for most web-based tasks, but bear in mind that Java is an interpreted language, which is always slower than compiled code.

I wrote my tools in Perl, which is also an interpreted language, but its interpreter is much lighter on the CPU than Java's. I get fairly decent performance from it, but I also use perl2exe to compile my programs into binaries for easier distribution. I'm not sure how perl2exe really works, but chances are it's simply a Perl wrapper bundled with the program's source code, which would mean the code is still interpreted in some way rather than truly compiled. Even if that is the case, I made some tests (loop several tens of thousands of times and check how many seconds it took) and gained roughly 10% in processing time (a 6-second gain on a 58-second run). So, all that being said, if you can re-implement your idea in C/C++, which gives you true compiled code, you will instantly get a noticeable performance gain. Here again, I can't give you figures, although you can expect to do better than my 10% with Perl.
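If you want to repeat that kind of test on the Java side before and after a rewrite, something as crude as this will do (the iteration count and the dummy work are arbitrary, and it's only a rough wall-clock measurement, nothing scientific):

    // Crude timing test: run a big loop and print how many seconds it took.
    // Run the same measurement before and after a change (or against the C++
    // rewrite) and compare. Iteration count and dummy work are arbitrary.
    public class LoopTimer {
        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;   // dummy work standing in for parsing one record
            }
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            System.out.println("processed in " + seconds + " s (checksum " + sum + ")");
        }
    }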

As for my own program, I too have some performance issues to deal with, which I am currently working on. Still, it is a good proof of concept of a tool for real-time, universal log analysis. I am still waiting to receive some feedback from the community about it, although downloads are going up, which must be a good thing :-)

Hope this helps, and glad to see some new designs circulating around
Floydman

Thanks for your time,

Eric Knight


