
Re: TippingPoint Releases Open Source Code for First Intrusion Prevention Test Tool, Tomahawk


From: Greg Shipley <gshipley () neohapsis com>
Date: Thu, 4 Nov 2004 00:41:10 -0600 (CST)


On Tue, 2 Nov 2004, Martin Roesch wrote:

> Measuring latency, throughput, etc. is also best done in an environment
> where you can set up repeatable test environments or at least where you
> can set up repeatable baseline environments to transmit your pcaps over
> the top of.  Tcpreplay doesn't meet this requirement particularly well
> all by itself, nor will the TippingPoint software.
>
> Greg Shipley and the Neohapsis guys can comment on this stuff better
> than I, but one thing that I've learned from building Sourcefire for
> the past ~4 years is that testing gigabit IDS/IPS systems requires
> considerable expertise and infrastructure if you want to do anything
> more than just test basic detection capability.

To expand a bit on what Marty states above (and for the record, I agree
with everything stated): pcap replay is going to get you a limited amount
of functionality, testing-wise.  In fact, the problems associated with
replay-based testing led Neohapsis to move away from the model sometime
around 1999; we realized there were far too many issues to address.  Now,
that's not to say that it isn't useful in some scenarios, but I would
caution anyone against using replay as the sole method of
testing/verifying a product's performance.

My main concern (as a public tester) with replay-based testing is that it
is often signature/vendor dependent.  One vendor's replayed benign traffic
may be another vendor's false positive, and vice versa... which can be
used to manipulate the tester's experience if you haven't verified the
replayed traffic.  Because signature writing is still such an art form (and
probably will be for some time), there are cases where a slight mutation
in the attack method will result in the IDS missing the attack.  Clearly
this is more than just a testing problem, but this isn't just theory,
either; we've witnessed it first-hand.
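
To make that concrete with a toy sketch (purely illustrative - this is not
any vendor's actual detection logic, and the request strings and pattern
are made up for the example), a naive content match falls over the moment
an attacker URL-encodes a single byte:

    # Toy illustration: a naive content-match "signature" vs. a trivially
    # mutated attack request.  Hypothetical example, not real vendor logic.
    from urllib.parse import unquote

    SIGNATURE = b"cmd.exe"   # naive byte pattern the "IDS" looks for

    def naive_match(payload: bytes) -> bool:
        """Flag the payload only if the raw bytes contain the pattern."""
        return SIGNATURE in payload

    original = b"GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0\r\n"
    mutated  = b"GET /scripts/..%255c../winnt/system32/%63md.exe?/c+dir HTTP/1.0\r\n"

    print(naive_match(original))   # True  - detected
    print(naive_match(mutated))    # False - missed on the raw bytes
    print(SIGNATURE in unquote(mutated.decode()).encode())  # True once URL-decoded

An engine that normalizes the URI before matching catches both; one that
doesn't, doesn't - and replaying a single canned pcap will never tell you
which kind you have.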

For example, I've watched our lab team find errors in IDS signatures while
products went through the IDS rounds of OSEC testing.  Now, OSEC was never
meant to be a signature quality verification effort (that's certainly NOT
part of the test criteria), but it wound up serving a minor QA role for
parts of the market: even with the 20-some sigs we looked at/for, we
found (and helped correct) misfires and "false negatives" (no detection)
with almost all of the products.

An interesting side effect of this exercise was how little convincing we
had to do of the vendor dev teams as a result of the "live" (read: no
replay) testing method.  It's one thing to argue that someone's pcap file
is bad (and in turn, that it's not the IDS sig's fault); it's quite another
to debate that root shell prompt that's staring you in the face!  Or put
another way, if I launch a remote attack against a victim box and
successfully exploit/compromise it, guess what?  That was a real attack -
there's little to debate if the IDS didn't flag it.  pcap replay, however,
opens an entire angle for debate that real exploitation eliminates.

Now, this is clearly an issue we public testers have to worry about, but
it affects closed-door tests, too.  For example, suppose VendorX supplies
you with a pcap capture that doesn't trigger ANY false positives (which is
kind of a funny concept in itself if you think about it - they are all
kinda "false," but we can debate the metaphysical aspects some other
time... :) but somehow it triggers false positives with VendorY's product?
Or suppose VendorX's pcap-based testing tool generates something VendorY's
IDS doesn't detect because of a particular mutation of attack Z?  But
suppose VendorX can't detect mutation Q of that same attack?  Or worse,
suppose the pcap capture is crap, and VendorY's box doesn't flag the
attack BECAUSE IT'S NOT A REAL/VALID ATTACK?

Would you know?  Would you verify the packet capture files?  Do you make a
product selection based on these observations?
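
If nothing else, even a rough sanity check on the captures is better than
taking them on faith.  Here's a minimal sketch (assuming scapy is
available; nothing vendor-specific about it) that just confirms the TCP
sessions in a pcap actually completed a three-way handshake before you
treat the capture as a "real" attack:

    # Minimal pcap sanity check (a sketch, assuming scapy is installed).
    # It only verifies that TCP sessions complete a three-way handshake;
    # it says nothing about whether the attack payload itself is valid.
    from scapy.all import rdpcap, IP, TCP

    SYN, ACK = 0x02, 0x10

    def handshake_sessions(pcap_path):
        seen = {}   # (src, sport, dst, dport) -> handshake steps observed
        for pkt in rdpcap(pcap_path):
            if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
                continue
            ip, tcp = pkt[IP], pkt[TCP]
            flags = int(tcp.flags)
            key  = (ip.src, tcp.sport, ip.dst, tcp.dport)
            rkey = (ip.dst, tcp.dport, ip.src, tcp.sport)
            if flags & SYN and not flags & ACK:
                seen.setdefault(key, set()).add("syn")
            elif flags & SYN and flags & ACK:
                seen.setdefault(rkey, set()).add("synack")
            elif flags & ACK and "synack" in seen.get(key, set()):
                seen[key].add("ack")
        return [k for k, v in seen.items() if v == {"syn", "synack", "ack"}]

    if __name__ == "__main__":
        import sys
        print(len(handshake_sessions(sys.argv[1])), "session(s) with a complete handshake")

It won't tell you whether the exploit in the payload is valid - only real
exploitation settles that - but it catches the most obviously broken
captures.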

----------

Replay aside, we've also seen IDS products fail under varying traffic
conditions using varying protocols.  Good engine flexing requires you to
keep all other variables consistent/steady while tweaking only one or two
at a time.  This is what ultimately drove us to using Spirent's
WebAvalanche suite: granular control over variables (traffic levels,
packet sizes, session counts, address ranges, etc.).  Replay CAN play a
role if you simply want to determine engine thresholds.  (Use a replay
injection over independent background traffic volume X, then X+1, then
X+2, etc., until the device stops flagging the injected traffic.  But make
sure your surrounding infrastructure can handle that X+n load.  Heh - more
complications...)
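
The probe loop itself is trivial; the hard part is the load generation and
alert checking wrapped around it.  A rough sketch of the idea follows -
the three helper functions are hypothetical stubs, not real tools, and
you'd wire them to whatever generator, replay rig, and sensor alert
log/API you actually have:

    # Sketch of a threshold probe: ramp background load, replay the same
    # verified attack at each step, and note where the sensor stops alerting.
    # start_background_load(), replay_attack(), and sensor_alerted() are
    # hypothetical stubs for your own gear - they are not real tools.
    import time

    def start_background_load(mbps: int) -> None:
        raise NotImplementedError("drive your traffic generator here")

    def replay_attack() -> None:
        raise NotImplementedError("replay the verified attack pcap here")

    def sensor_alerted(since: float) -> bool:
        raise NotImplementedError("check the sensor's alert log/API here")

    def find_threshold(start_mbps=100, step_mbps=100, max_mbps=1000):
        for load in range(start_mbps, max_mbps + step_mbps, step_mbps):
            start_background_load(load)
            t0 = time.time()
            replay_attack()
            time.sleep(5)        # give the sensor time to log the event
            if not sensor_alerted(since=t0):
                return load      # first load level where detection dropped
        return None              # still detecting at the maximum load

    if __name__ == "__main__":
        threshold = find_threshold()
        print("No drop-off observed" if threshold is None
              else "Detection dropped around %d Mbps of background load" % threshold)

And of course the result is only meaningful if the surrounding switches,
taps, and generators were themselves verified to sustain that load first.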

Finally, in regards to vendors sharing pcap files: taking this point one
step further, if VendorA builds their sig around traffic/pcap set A, and
VendorB builds their sig around pcap set B (which may be slightly
different), studying both can give one a leg up on R&D - or, alternatively,
give an attacker a leg up on circumvention methods.

Of course a lot of this depends HEAVILY on the protocol the attack occurs
over, the vulnerability specifics, etc., and I'm sure there are signature
writers on this list that can go into way more detail than I can...

[And before we get hit with the "VendorX codes all of its signatures
against the vulnerabilities and not the exploits" marketspeak, please spare
us - vendors have been claiming this for years, and every vendor on this
planet has cut a signature corner at one point or another.  And for better
or worse, we've run into it head-on during our testing... ]

-----------------------------

If I don't shut up, though, you guys will wind up with a book of an email
in your inbox... so I'll end it here.  In short, I'm encouraged to see
TippingPoint release code as open source and I certainly don't want to
discourage that behavior - thank you, TippingPoint!  But I've also seen the
vendor finger-pointing contests - and the fallout - when people don't
understand the variables at play or the limitations of their test tools.
Replay can be good for making sure some base functionality is present, but
know its limitations - there are many.

Greetings from Chicago,

-Greg




