IDS mailing list archives

Re: NIPS solutions


From: nick black <dank () suburbanjihad net>
Date: Sat, 1 May 2004 23:54:07 +0000 (UTC)

On 2004-04-20, Andreas Hess <hess () tkn tu-berlin de> wrote:

Disclaimer:  I am the lead developer of the Reflex Interceptor as
discussed below.  I speak for myself here, not Reflex Security.

In particular, I wonder whether single-processor or multiprocessor
machines are used.

We employ both UP and SMP Xeon-based boxen for the Interceptor.  Our
2-way Xeon product is our best-performing offering, according to our
most recent testing across high loads.  On low loads it is not
necessarily a gain, but these loads are a) handled without problems in
any case and b) not the ones this box would be sold for.  Certain
pedantic cases can also be slower under our particular parallelized
approach; upon entering a lossy state, utilization is measured against
parallelization cost and SMP methodology can be disabled (this is a
standard feedback-based scheduling technique, necessary when tasked
with handling maliciously-crafted input).
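
The feedback loop described above can be sketched roughly as follows.
This is a minimal illustration, not the Interceptor's actual code: the
class name, the measured quantities, and the threshold are all
hypothetical, and re-enabling SMP once loss clears is an assumption the
paragraph above does not state.

```python
from dataclasses import dataclass


@dataclass
class ParallelismGovernor:
    """Hypothetical controller that turns the parallel path off when,
    in a lossy state, synchronization cost outweighs useful helper work."""

    overhead_ratio_limit: float = 1.0  # assumed threshold, not from the post
    smp_enabled: bool = True

    def update(self, dropped_packets: int,
               helper_busy_s: float, sync_cost_s: float) -> bool:
        """Feed in one measurement interval; return whether SMP stays on."""
        lossy = dropped_packets > 0
        if lossy and self.smp_enabled:
            # Only reconsider parallelism once packets are being lost.
            if sync_cost_s >= helper_busy_s * self.overhead_ratio_limit:
                self.smp_enabled = False
        elif not lossy and not self.smp_enabled:
            # Assumption: fall back to the parallel path when loss clears.
            self.smp_enabled = True
        return self.smp_enabled
```

A caller would invoke update() once per sampling interval with counters
it already collects; a real system would likely add hysteresis so the
mode does not flap between intervals.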


I will just explain my point of view. I implemented a simple NIPS that 
runs on a Linux machine. The intrusion prevention system runs 
as a thread in kernel space. So, each packet that arrives at the 

Mmm, while you likely understand what you mean, the terminology here is
a bit loose - Linux has "kthreads" in addition to the standard
process/thread model.  More importantly, I would surely hope no one's
developing their analysis logic directly within the kernel.  It'd be a
nightmare to debug, and I've yet to see a design which can provably beat
proper use of a mmap()ed packet socket.

network interface triggers a hardware interrupt that is instantly 
processed by the Linux OS. Consequently the intrusion prevention thread 

Not necessarily.  Any IPS running on Linux failing to make use of
NAPI-supported cards (and interrupt binding) isn't worth the box it's
sold in.
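
For readers unfamiliar with interrupt binding: on Linux, an IRQ is
steered to specific CPUs by writing a hexadecimal CPU bitmask to
/proc/irq/<N>/smp_affinity. The sketch below shows the mask arithmetic
and the write. The function names and the proc_root parameter are
illustrative, the write needs root, and the single-word mask format
assumes a machine with at most 32 CPUs.

```python
import os


def cpu_mask(cpus):
    """Return the hex bitmask (no 0x prefix) selecting the given CPU ids."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU ids must be non-negative")
        mask |= 1 << cpu  # bit n set means CPU n may service the IRQ
    return format(mask, "x")


def bind_irq(irq, cpus, proc_root="/proc"):
    """Write the affinity mask for one IRQ; returns the path written.

    proc_root is a hypothetical parameter so the path can be redirected
    when experimenting without touching the real /proc.
    """
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(cpu_mask(cpus))
    return path
```

For example, cpu_mask([1]) yields "2", so bind_irq(24, [1]) would pin a
hypothetical IRQ 24 to the second processor of a 2-way box.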

An IPS solution that is running on a dual or multiple processor machine 
would not suffer under this limitation. But it is a real hassle to get 
useful information from manufacturers.

This is far too broad a generalization to address effectively.  Issues
of cache bouncing, lock contention, etc. demand very precise analysis to
measure.  Despite having designed the system from the ground up to be
SMP-aware, we found only speed losses until I'd ripped out much of the
complex synchronization necessitated by a highly parallelizable model.
Currently, we only export truly processor-bound, heavyweight computation
to helper threads (of which there is a current maximum of one).  In many
cases, there's little of this to be done per packet, and our binding
of interrupts must take this and particular hardware performance
(as carefully measured via oprofile) into account.

In short, SMP has indeed increased our maximum throughput, but only
after extensive rigorizing of our approach.

-- 
nick black <dank () reflexsecurity com>
"np:  nondeterministic polynomial-time
the class of dashed hopes and idle dreams." - the complexity zoo

