
Re: Current state of Anomaly-based Intrusion Detection


From: Chris Keladis <chris () cmc optus net au>
Date: Sat, 05 Mar 2005 15:14:50 +1100

Hi Göran, Adam, Jose, all.

Extending the concept in a slightly different direction...

I fully agree: NetFlow has its place, even if it only logs the metadata.

A tool such as Argus can peek into an arbitrary amount of the payload if configured to do so, though at the cost of placing additional equipment inline, which NetFlow does not demand.

I was thinking that an excellent tool for "anomaly detection" in network flows would be something like Microsoft's LogParser, which accepts SQL-like statements, extended to parse NetFlow (or Argus) network traces.

LogParser already has a rich set of output formats, ranging from graphs down to CSV. I also believe it is possible to use the new "COM input format" of LogParser 2.2 to extend the tool to do this.
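As a rough illustration of the kind of SQL-style slicing I have in mind, here's a Python stand-in for what the LogParser plugin would do (the CSV column names are just my assumption of what a flow export contains):

import csv
from collections import defaultdict

# roughly: SELECT dstport, SUM(bytes) FROM flows GROUP BY dstport
# assumes a CSV export of flow records with dstport and bytes columns
bytes_per_port = defaultdict(int)

with open("flows.csv", newline="") as f:
    for row in csv.DictReader(f):
        bytes_per_port[int(row["dstport"])] += int(row["bytes"])

# top talkers by destination port
for port, nbytes in sorted(bytes_per_port.items(),
                           key=lambda kv: -kv[1])[:10]:
    print(f"port {port}: {nbytes} bytes")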

I'd write it myself, but unfortunately I'm not familiar with writing COM objects.

All sorts of data could be trended, sliced and diced, any which way the analyst chooses.

One idea would be outputting (summarized) trend data from the flows into a SQL DB (LogParser has SQL output) for longer-term trend analysis, long after the NetFlow logs have cycled out of existence.

Maybe feed the SQL DB trend data back into LogParser and produce graphs of activity over a longer term, quickly and easily.
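A minimal sketch of that round trip, with sqlite as a stand-in DB (the table layout is just an assumption):

import sqlite3

# daily per-port byte totals, e.g. as produced by the query sketch above
daily_summary = {80: 123456789, 25: 9876543}

conn = sqlite3.connect("flow_trends.db")
conn.execute("""CREATE TABLE IF NOT EXISTS daily_trend
                (day TEXT, dstport INTEGER, total_bytes INTEGER)""")
for port, nbytes in daily_summary.items():
    conn.execute("INSERT INTO daily_trend VALUES (date('now'), ?, ?)",
                 (port, nbytes))
conn.commit()

# later: pull the long-term view back out, ready for graphing,
# long after the raw flow logs are gone
for day, total in conn.execute(
        "SELECT day, total_bytes FROM daily_trend "
        "WHERE dstport = 80 ORDER BY day"):
    print(day, total)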

Or, draw crude graphs of traffic after a DDoS, keyed on (one or more) identifying features in the packet data (ip, port, tos, ip_id, etc).
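Even a crude ASCII bar chart gets you a long way here. A sketch keyed on source address (the flow tuples are invented):

from collections import Counter

# flow records as (srcaddr, dstport, bytes) tuples pulled from a trace
flows = [("10.0.0.1", 80, 4000), ("10.0.0.2", 80, 52000),
         ("10.0.0.2", 80, 61000), ("10.0.0.3", 53, 900)]

bytes_by_src = Counter()
for src, port, nbytes in flows:
    bytes_by_src[src] += nbytes

scale = max(bytes_by_src.values()) / 50   # widest bar = 50 columns
for src, total in bytes_by_src.most_common():
    print(f"{src:>15} | {'#' * int(total / scale)} {total}")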

Sample output formats can be found here:

http://www.microsoft.com/technet/scriptcenter/tools/logparser/lpexamples.mspx

Extending support to pcap could aid Honeypot/Malware research; the sky's the limit.

This can all be done with scripts today, but it's useful to be able to quickly pull summary information out of your network traces, from various input formats into various output formats.





Regards,

Chris.


Adam Powers wrote:

Well said. A few additions...

The "anomaly detection" technology that you find in successful products such
as those offered by Arbor and Lancope relies less on packet payload and more
on behavior-based "flow analysis". This denotes a heavy reliance on
statistics, learned traffic thresholds, and pattern recognition.

But why is flow analysis important to anomaly detection research and
development?

Well, NetFlow is a good source of flow data. NetFlow doesn't provide packet
payload, only metadata about the conversation (sorta like a phone bill). If
you're gonna derive security value from NetFlow data, you have to use
statistics, thresholds, and patterns, not payload.
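For reference, here's roughly what a v5-style flow record carries (a Python sketch; the field names are approximate):

from typing import NamedTuple

class FlowRecord(NamedTuple):
    """Roughly the fields a NetFlow v5 record carries -- metadata only."""
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    bytes: int
    first_seen: int   # router sysuptime at first packet of the flow
    last_seen: int    # router sysuptime at last packet of the flow
    tcp_flags: int    # OR of all TCP flags seen on the flow
    tos: int
    # note: no payload anywhere -- just who talked to whom, when, how much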

So why is NetFlow important?

Simply put: flow-based anomaly detection is practical, affordable, and
"lightweight". Instead of deploying a physical piece of hardware to each and
every remote location, switch closet, and datacenter cabinet, turn on
NetFlow exports from your Cisco router/switch. Presto. Instant flow-based
anomaly detection technology anywhere you have NetFlow capable
infrastructure.




On 2/28/05 11:35 AM, "Jose Nazario" <jose () monkey org> wrote:


there are several methods that can all be called anomaly detection
techniques; you named only a statistical method.

statistical based methods: you mentioned hardcoded threshold values (ie
200 MBps) and also learned average values for traffic. it's a bit more
complicated than that, and more fine-grained with respect to services and
endpoints, but you get the general idea. basically what you're doing is
monitoring traffic rates, either in bulk or per service and/or endpoint,
and alerting when some value is overshot, either once or for a sustained
period of time. statistical methods rely on a strong baseline of traffic
to accurately alert. characterization, such as "it's all TCP SYNs that are
responsible for this upsurge in bandwidth usage", can also be performed.
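a minimal sketch of that kind of check, in python (every number, threshold, and window size here is made up):

from statistics import mean

def sustained_overshoot(rates, baseline, factor=1.5, sustain=3):
    """alert only if traffic stays above factor * baseline for
    `sustain` consecutive samples -- one lone spike is not enough"""
    run = 0
    for r in rates:
        run = run + 1 if r > factor * baseline else 0
        if run >= sustain:
            return True
    return False

# baseline learned from a week of per-service samples (made-up numbers)
smtp_baseline = mean([100, 110, 95, 105, 102])   # Mbit/s, say
print(sustained_overshoot([120, 180, 190, 200], smtp_baseline))  # True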

the second kind that i'll list here is specification based, and again you
mentioned it briefly. this can include protocol specifications (ie "a
valid SMTP greeting is no longer than NN bytes long"), as is done in some
products. traffic is monitored, examined by the application
or protocol that is in use, and the data is compared against a
specification. if, in this example, an SMTP greeting longer than NN bytes
is passed, an alert is thrown. this requires detailed understanding of the
network protocols and application protocols in use, their standards, and
their implementations. not a trivial thing to do.
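the hardcoded version of that smtp check is almost trivial in code (a sketch; 512 stands in for "NN", borrowing the reply-line cap from RFC 2821):

MAX_GREETING = 512  # "NN" -- the RFC 2821 limit on a single reply line

def greeting_violates_spec(greeting: bytes) -> bool:
    # a legitimate greeting is a 220 reply that fits in one reply line
    return len(greeting) > MAX_GREETING or not greeting.startswith(b"220")

if greeting_violates_spec(b"220 mail.example.com ESMTP " + b"A" * 600):
    print("alert: SMTP greeting violates the specification")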

the third kind would be relational, which is what a few companies are
doing. in this scenario you examine inter-host
relationships (ie "host A is an SMTP server to hosts B, C and D") and when
that relationship is violated (ie "host A is suddenly a web server for
host E") an alert is thrown.

the fourth kind would be behavioral, where some metric of host or network
behaviors is modeled and constantly examined. these behaviors can include
an application's file usage, a host's network usage, or the like.

the example i gave above for a specification based anomaly detection
system can be hardcoded, as i discussed, or even learned, using a tool
like PI to group the observations. this learning can be unsupervised (ie
wholly trusted from the training period's observations) or supervised
(where some editing of the observed data is done to ensure
trustworthiness). and finally, this learning period can be a one time deal
or continuous, allowing for dynamic network behaviors and the normal
change over time. in this case, alerting can be done because the model was
violated or some statistical confidence measure can come into play, as
well (ie 3 observations, std deviation of 10%, but you overshot traffic
rates by 12%, would you alert?).
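to make that last question concrete (a sketch; with only 3 observations the estimate is shaky, which is the point):

from statistics import mean, stdev

obs = [100.0, 110.0, 90.0]          # 3 observed traffic rates
mu, sigma = mean(obs), stdev(obs)   # mu = 100, sigma = 10 (the "10%" case)
new = 112.0                         # overshoots the mean by 12%

z = (new - mu) / sigma
print(f"z = {z:.1f}")   # 1.2 -- well under a 3-sigma bar, so a
                        # confidence-based detector would stay quiet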

is this ready for prime time? sure, it's been in real-world use for years
now. arbor networks' peakflow DoS and SP systems have been doing this for
several years, using traffic rates over time to detect and characterize
attacks. this system is a mix of learned or profiled traffic rates per
service and network block endpoint as well as some informed decisions (ie
SYN packet rates). and peakflow X is a relational anomaly detection
system that's been seeing real world deployments and success (see some
recent news reports for example customers). arbor networks is one of
several companies finding success in this field.

AD systems are significantly more complex and widely available than you
seem to have acknowledged. go digging around and you'll see there
are some real systems out there seeing real use, protecting real networks.

notes and links:

PI: http://www.baselineresearch.net/PI/

________
jose nazario, ph.d.   jose () monkey org
http://monkey.org/~jose/   http://infosecdaily.net/






