IDS mailing list archives
Re: Content Inspection - Statistical methods
From: Jamie Riden <jamie.riden () gmail com>
Date: Wed, 12 Aug 2009 21:26:59 +0100
2009/8/11 Richard Bejtlich <taosecurity () gmail com>:
> On Sat, Aug 8, 2009 at 1:45 PM, Glenn Wilkinson <glenn.wilkinson () gmail com> wrote:
>> Hello IDS folks,
>>
>> I'm currently doing a mini-project involving applying machine learning techniques to the identification of hostile network traffic. My focus is on TCP traffic, and I'm looking at header and content based inspection. I'm wrapping up my feature extraction code now, whereby I've imported all TCP sessions from the DARPA training sets into a DB and have tagged the hostile sessions.
>>
>> My question is, does anyone have any bright ideas of some useful, simple content analysis attributes? As it's a statistical/ML approach I'm trying to come up with as generic as possible ideas. So far I'm calculating things like session data entropy, most frequent character, counts of certain characters.
>>
>> I'm brand new to this field, but am really excited about this project. Any feedback/advice would be greatly appreciated.
>>
>> Thanks!
>> G
>
> Hi Glenn,
>
> How about NOT using the DARPA data sets? Maybe something more modern?
>
> http://taosecurity.blogspot.com/2009/08/2009-cdx-data-sets-posted.html
Agreed - I think I remember using those for some coursework in 2001. They were a bit limited in the features extracted from the packet and the eventual winning solution - a combination of bagged/boosted decision trees - I don't think would work very well in the real world. This is all going from memory, so could be absolute rubbish.

The real problem in ML seems to be finding good, accurately labelled training data :(

cheers,
Jamie
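For readers following the thread, the content attributes Glenn lists in the quoted message (session data entropy, most frequent character, counts of certain characters) might be computed roughly as below. This is only a minimal sketch and not code from the thread: the function name `content_features`, the default `tracked` byte set, and the returned dictionary keys are all assumptions made for illustration, and the payload is assumed to be the reassembled bytes of one TCP session.

```python
import math
from collections import Counter


def content_features(payload: bytes, tracked: bytes = b"%/\\\x00\x90") -> dict:
    """Compute simple per-session content features (hypothetical helper).

    `tracked` is an illustrative set of bytes to count; which bytes are
    actually worth counting is an open question in the thread.
    """
    n = len(payload)
    if n == 0:
        # Empty session: no byte distribution to measure.
        return {
            "entropy": 0.0,
            "most_frequent_byte": None,
            "most_frequent_ratio": 0.0,
            "counts": {b: 0 for b in tracked},
        }

    freq = Counter(payload)  # iterating bytes yields ints in 0..255

    # Shannon entropy in bits per byte: sum of p * log2(1/p).
    entropy = sum((c / n) * math.log2(n / c) for c in freq.values())

    byte, count = freq.most_common(1)[0]

    return {
        "entropy": entropy,
        "most_frequent_byte": byte,
        "most_frequent_ratio": count / n,
        "counts": {b: freq.get(b, 0) for b in tracked},
    }


if __name__ == "__main__":
    print(content_features(b"GET /index.html HTTP/1.0\r\n\r\n"))
```

Entropy here ranges from 0 bits per byte (a single repeated byte) to 8 (all 256 byte values equally likely), so it can be fed to a learner alongside the counts without further scaling assumptions beyond that range.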