IDS mailing list archives

Re: Intrusion Detection Evaluation Datasets


From: Paul Palmer <paul_palmer () us ibm com>
Date: Thu, 12 Mar 2009 16:43:07 -0400

Terry,

Stefano,

An overwhelming majority of network-based IDSs use only spatial
information present in packet headers.

"spatial" information ? if you mean "IP addresses", then

I took "spatial" information to mean connection or packet header data
-- more than just IP addresses, but lacking the unstructured data
portions.
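
To make the distinction concrete, here is a minimal Python sketch of
the header fields such a detector might keep (the field selection is
an assumption for illustration; note that the payload is simply
discarded):

    # Parse the fixed 20-byte IPv4 header; everything after it -- the
    # unstructured payload -- is ignored, as the detectors discussed
    # here do.
    import socket
    import struct

    def header_features(packet: bytes) -> dict:
        (ver_ihl, tos, total_len, ident, flags_frag,
         ttl, proto, checksum, src, dst) = struct.unpack(
            "!BBHHHBBH4s4s", packet[:20])
        return {
            "src": socket.inet_ntoa(src),   # source IP address
            "dst": socket.inet_ntoa(dst),   # destination IP address
            "protocol": proto,              # e.g. 6 = TCP, 17 = UDP
            "ttl": ttl,
            "total_length": total_len,
        }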

1) your statement is definitely not true and

Actually, I think it is: the majority of unique NIDSs that I am
familiar with were built to use the KDD Cup '99 dataset. I pray none
of those systems are actually used in production anywhere. Let's face
it, only a handful of signature-based network intrusion detectors were
ever built. After Marty released Snort to the community, there really
hasn't been a need to build another. Sure, a couple have been built so
that they wouldn't be "encumbered" by the open source license, but
there really haven't been any major changes to signature-based
detection in the past decade (just thousands of tweaks). Most anomaly
or machine

If you extend your familiarity with the NIDS/NIPS industry, I think
you will ultimately find that Stefano is correct. I think you will
find that the majority of the top NIDS/NIPS products use "signature"
engines that are not based upon Snort technology. I think you will
also ultimately find that while Snort is very good at what it was
designed to do, it is not a universal solution. Trade-offs were made.
The product has strengths and weaknesses. Snort is not the sine qua
non of NIDS. Mercifully, Marty has left some of the market for the
rest of us :)

learning-based detectors work only with structured data, so they
limit themselves to the header portions of packets or to connection
records.
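
As a rough sketch of that style of detector (the feature columns and
the data are synthetic stand-ins for KDD-style connection records,
chosen only for illustration):

    # Anomaly detection over structured connection-record features
    # only; the model never sees packet payloads. Data is synthetic.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # columns: duration, src_bytes, dst_bytes, connection count
    normal = rng.normal(loc=[2.0, 500, 1500, 10],
                        scale=[1.0, 100, 300, 3], size=(1000, 4))

    model = IsolationForest(contamination=0.01, random_state=0)
    model.fit(normal)

    # a flow whose header-level statistics are wildly atypical
    suspect = np.array([[300.0, 50000, 0, 500]])
    print(model.predict(suspect))  # -1 means "outlier"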

2) such IDSs "work" only because of the artifacts in the evaluation 
datasets

We can't really say that conclusively. At this point we can only say
that any successes demonstrated by those systems may have been due to
flaws in the evaluation datasets. For lack of good evaluation
datasets, we have no idea how those systems might perform in
real-world environments. More importantly, for any system which
requires training data we must question how portable it is across
different networks; if it requires unique training data for a given
network, is it realistic that such training data will ever be
available?
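
To illustrate why artifacts are so damning: a toy "detector" keyed to
an incidental regularity of the simulation (the oft-cited TTL
regularities in the DARPA/KDD data are the classic example; the
specific values below are illustrative assumptions) can score well on
the flawed dataset while saying nothing about real traffic:

    # A "detector" that keys on a dataset artifact rather than on any
    # attack behavior. TTL values here are illustrative only.
    ARTIFACT_TTLS = {126, 253}

    def looks_like_attack(record: dict) -> bool:
        return record["ttl"] in ARTIFACT_TTLS

    print(looks_like_attack({"ttl": 253}))  # True  -> "attack"
    print(looks_like_attack({"ttl": 64}))   # False -> "normal"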

I see a lot of people saying (correctly) that advanced
(non-signature-based) NIDS can't be researched until we have good
evaluation datasets, and I see a lot of people ignoring them and
doing it anyway. Is anyone (else) actually working on fixing the data
problem?

Cheers,
Terry



Paul


