IDS mailing list archives
Re: Machine Learning for IDS: which dataset?
From: Stefano Zanero <zanero () elet polimi it>
Date: Mon, 19 Jun 2006 20:15:39 +0200
J.A. wrote:
I am using the KDD-99 dataset in my research work. Though it is the most well-known dataset, it has several drawbacks that limit what you can do with it. For example, as you note, the distribution of normal and attack data does not represent a real network.
You may also add that some of the header fields have regularities and markers introduced by the generation mechanism (the dataset is artificial), and that the attack types were limited even in '99. All in all, that dataset is of very limited use nowadays.
I think that a better dataset is the original used to generate the KDD-99 dataset. It can be obtained from www.ll.mit.edu.
What good would this be? Anomalies, artifacts and aged attacks are present in the original dataset as well. The only way to go is to generate new datasets with repeatable, scientific approaches and start from there.
Stefano