IDS mailing list archives

Re: Machine Learning for IDS: which dataset?


From: Stefano Zanero <zanero () elet polimi it>
Date: Mon, 19 Jun 2006 20:15:39 +0200

J.A. wrote:

> I am using the KDD-99 dataset in my research work. Though it is one of the
> most well-known datasets, it has several drawbacks that limit what you can
> do with it. For example, as you note, the distribution of normal data and
> attack data does not represent a real network.

You may also add that some of the header fields have regularities and
markers introduced by the generation mechanism (the dataset is
artificial), and that the attack types were limited even in '99.

All in all, that dataset is of very limited use nowadays.

> I think that a better dataset is the original one used to generate the
> KDD-99 dataset. It can be obtained from www.ll.mit.edu.

What good would this be? Anomalies, artifacts and aged attacks are
present in the original dataset as well.

The only way to go is to generate new datasets with repeatable, scientific
approaches and start from there.

Stefano

