IDS mailing list archives

Re: Re: Intrusion Detection Evaluation Datasets


From: zubair.shafiq () yahoo com
Date: 10 Mar 2009 08:55:29 -0000

An ideal IDS dataset would be fully diverse (in terms of attack types) and completely free of artifacts (introduced 
during creation and pre-processing). However, ideal scenarios do not hold in real life -- if they did, they would 
not be real...

I agree that it is very hard to obtain datasets with payloads due to privacy constraints. Good anonymization procedures 
mostly retain the relative statistics of the data. For example, you may consult the following work by people at ICSI. 

http://www.icir.org/enterprise-tracing/devil-ccr-jan06.pdf
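
To make "retain the relative statistics" concrete, here is a minimal sketch of one simple idea: replacing each 
address with a keyed-hash pseudonym, so that per-host counts survive while the real addresses do not. This is my own 
illustration, not the procedure from the paper above; the toy flows, the key and the function names are all 
hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Map an IP address to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return "host-" + digest[:12]


def anonymize_flows(flows, key):
    """Replace source and destination addresses; leave the port untouched."""
    return [
        (pseudonymize_ip(src, key), pseudonymize_ip(dst, key), dport)
        for src, dst, dport in flows
    ]


# Hypothetical toy trace: (source IP, destination IP, destination port).
flows = [
    ("10.0.0.1", "10.0.0.9", 80),
    ("10.0.0.1", "10.0.0.9", 443),
    ("10.0.0.2", "10.0.0.9", 80),
    ("10.0.0.1", "10.0.0.7", 25),
]
key = b"example-secret-key"  # assumption: kept private by the data owner
anon = anonymize_flows(flows, key)

# The same host always gets the same pseudonym, so the distribution of
# flows per source host is identical before and after anonymization.
original_counts = sorted(Counter(src for src, _, _ in flows).values())
anonymized_counts = sorted(Counter(src for src, _, _ in anon).values())
```

Note that a mapping like this keeps per-host statistics but throws away subnet structure; prefix-preserving schemes 
exist for cases where that structure matters.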

An overwhelming majority of network-based IDSs use only the spatial information present in packet headers. The 
datasets that I mentioned in my earlier post can be used to evaluate such IDSs. Moreover, you can find details of the 
endpoint worm propagation dataset in the following papers:

http://www.nexginrc.org/papers/tr15-zubair.pdf
http://www.nexginrc.org/papers/gecco08-zubair.pdf

In my view, there are two directions to take dataset labeling further:

1. Improving injection procedures to minimize artifacts. This is more feasible if you know all parameters and 
environmental conditions during trace collection -- Know Thy Data. 

2. Using "semi-automated" ~ "semi-manual" procedures. 

@Stefano: You have probably missed this point. Semi-automated procedures still require manual intervention; however, 
they help to reduce the amount of manual work significantly. So, we are not exactly developing a typical anomaly 
detection system. 
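
A rough sketch of what I mean by semi-automated labeling: an automated score labels the clear-cut cases and only the 
uncertain ones go to a human. The scoring heuristic, the thresholds and the record fields below are hypothetical 
placeholders, not taken from any of the datasets above.

```python
def triage(flows, score, low=0.2, high=0.8):
    """Auto-label confident cases; queue the uncertain ones for manual review."""
    auto_labeled, review_queue = [], []
    for flow in flows:
        s = score(flow)
        if s >= high:
            auto_labeled.append((flow, "malicious"))
        elif s <= low:
            auto_labeled.append((flow, "benign"))
        else:
            review_queue.append(flow)  # a human analyst decides these
    return auto_labeled, review_queue


def toy_score(flow):
    """Hypothetical heuristic: fraction of failed connection attempts."""
    return flow["failed"] / flow["attempts"]


# Hypothetical per-host summaries from a trace.
flows = [
    {"host": "a", "attempts": 100, "failed": 95},
    {"host": "b", "attempts": 100, "failed": 5},
    {"host": "c", "attempts": 100, "failed": 50},
    {"host": "d", "attempts": 100, "failed": 1},
]
auto_labeled, review_queue = triage(flows, toy_score)

# Share of records that still need manual labeling.
manual_fraction = len(review_queue) / len(flows)
```

The manual step does not go away, but it shrinks to the records between the two thresholds. Moving the thresholds 
trades manual effort against the risk of wrong automatic labels.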

Let me know what you think.


