IDS mailing list archives

Re: Machine Learning for IDS: which dataset?


From: "J.A." <centurion () phreaker net>
Date: Thu, 08 Jun 2006 11:13:29 +0200

trantichphuoc () yahoo com wrote:
Hi there,
I am interested in applying machine learning algorithms to detecting network intrusions. I have read many papers and realized that KDD-99 is the most well-known dataset used in the field. However, this dataset was provided by MIT in 1999 and is obviously pretty old. As we all know, defensive technologies evolve fast, and so do hacking techniques. Clearly, the KDD-99 dataset does not give a true representation of a network today. So, could anyone please tell me which dataset is more up to date and specialized for machine learning research in IDS?
Thanks
Patrick


Hi, Patrick.

I am using the KDD-99 dataset in my research work. Though it is the most well-known dataset, it has several drawbacks that limit what you can do with it. For example, and as you note, the distribution of normal data and attack data does not represent a real network.
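
To see that imbalance for yourself, a minimal Python sketch along these lines tallies the label column. It assumes the layout of the published kddcup.data files, where the label is the last comma-separated field and ends with a period. The function name label_distribution and the shortened sample records are made up for illustration; real records carry 41 features before the label.

```python
from collections import Counter
import csv
import io


def label_distribution(lines):
    """Return {label: fraction} for KDD-99 style records.

    Assumption: the label is the last comma-separated field and may end
    with a trailing '.', as in the published kddcup.data files.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        counts[row[-1].strip().rstrip(".")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Made-up, shortened records for illustration only (not real KDD-99 rows).
SAMPLE = """0,tcp,http,SF,181,5450,normal.
0,icmp,ecr_i,SF,1032,0,smurf.
0,icmp,ecr_i,SF,1032,0,smurf.
0,tcp,private,S0,0,0,neptune.
"""

if __name__ == "__main__":
    dist = label_distribution(io.StringIO(SAMPLE))
    # Print labels from most to least frequent.
    for label, frac in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{label:10s} {frac:.2%}")
```

Pointing the same function at an open file handle for the real data, instead of the sample string, should give the per-label shares to compare against what you expect on your own network.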

I think that a better dataset is the original data used to generate the KDD-99 dataset, the DARPA intrusion detection evaluation data from MIT Lincoln Laboratory. It can be obtained from www.ll.mit.edu.

Cheers


Juan A. Suárez-Romero
