Snort mailing list archives

Re: Pattern Matching

From: Todd Wease <twease () sourcefire com>
Date: Fri, 17 Oct 2008 08:09:18 -0400

Hi Rayne,

Answers inline.


Rayne wrote:

Hi,

1) So AC is run for the fast pattern matcher to narrow down potential
rules, then Boyer-Moore is run for the actual content matching. If
there are no content modifiers, each pattern in the potential rule
should be matched from the beginning of the payload, right? Also, if I
only have one content option per rule, Boyer-Moore will not be
employed? And what do you mean by "if the content (or pcre) is relative"?


Please take a look at the Snort Manual:
http://www.snort.org/docs/snort_htmanuals/htmanual_283/, in particular
http://www.snort.org/docs/snort_htmanuals/htmanual_283/node227.html. 
This should give you a better understanding of the content rule option
and how to use it and its modifiers.  Boyer-Moore is still used because
the modifiers offset, depth, within, distance and nocase need to be applied.
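
As a rough illustration of why a second, per-content search is still needed, here is a minimal Python sketch of how those modifiers narrow the search window. It is not Snort's code: find_content and its parameters are hypothetical names, bytes.find stands in for Boyer-Moore, and the window arithmetic follows my reading of the manual (offset/depth count from the start of the payload, distance/within from the end of the previous match).

```python
from typing import Optional

def find_content(payload: bytes, pattern: bytes, *, offset: int = 0,
                 depth: Optional[int] = None, nocase: bool = False,
                 relative_to: Optional[int] = None, distance: int = 0,
                 within: Optional[int] = None) -> Optional[int]:
    """Return the index just past the match, or None if nothing matches.

    Hypothetical helper: relative_to is the end of the previous content
    match; when it is given, distance/within apply instead of offset/depth.
    """
    if relative_to is not None:
        start = relative_to + distance
        end = len(payload) if within is None else start + within
    else:
        start = offset
        end = len(payload) if depth is None else start + depth
    start = max(start, 0)
    end = min(end, len(payload))
    hay, needle = payload, pattern
    if nocase:
        hay, needle = payload.lower(), pattern.lower()
    # bytes.find only reports matches that lie entirely inside [start, end).
    idx = hay.find(needle, start, end)
    return None if idx < 0 else idx + len(pattern)
```

With no modifiers the whole payload is searched from the beginning; each modifier only shrinks the window, so a fast pattern hit somewhere in the packet does not by itself prove that the rule's content option is satisfied.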


2) I'm thinking of the situation where the longer pattern "CDEFG" is
only significant if the shorter pattern "AB" is found, that is, CDEFG
means nothing when it appears in the payload alone. Also, there is the
speed consideration. For example, the payload may contain CDEFG but
not AB. Because CDEFG is the longer pattern, it will be put in the
fast pattern matcher and trigger a match. If AB is put together with
CDEFG under the same rule, I would have wasted time trying to find AB
in the payload. If I could have the rule for matching CDEFG trigger
only when AB is found, I could have saved that time.


The "fast_pattern" modifier to a content rule option lets you specify
which content in the rule you want to put in the pattern matcher.  The
content "AB" is probably more likely to occur in any random packet than
"CDEFG".  More time is likely to be gained by ensuring the content
"CDEFG" is in the packet before checking "AB".

Here are two rules, one using "AB" as the fast pattern content and one
using "CDEFG" for the fast pattern content ("fast_pattern" modifier not
really necessary since "CDEFG" will go into the pattern matcher by default).

alert tcp any any -> any any (msg:"AB fast pattern"; content:"AB";
fast_pattern; content:"CDEFG"; sid:1;)
alert tcp any any -> any any (msg:"CDEFG fast pattern"; content:"AB";
content:"CDEFG"; fast_pattern; sid:2;)

Here are some rule profiling statistics for each rule when run against a
large and diverse pcap (note there is a bug in the calculations of
avg/match and avg/nonmatch that just came up yesterday on snort users so
I'm leaving them out so the stats are less likely to line wrap):

Rule Profile Statistics (all rules)
==================================================
Num SID GID Checks Matches Alerts  uSecs Avg/Check
=== === === ====== ======= ======  ===== =========
  1   2   1    127       7      7    599       4.7
  2   1   1 196838       7      7 696622       3.5

The first rule (sid:1), which ensures that "AB" is in the packet before
evaluating the rule, is checked 196838 times. The second rule (sid:2),
which ensures that "CDEFG" is in the packet before evaluating, is only
checked 127 times.  The difference between the amount of time spent on
each rule is enormous (599 microseconds with "CDEFG" used as the pattern
for the fast pattern matcher compared to 696622 with "AB" used).
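
To put numbers on that difference, the short Python snippet below recomputes the ratios from the table above. The figures are copied from the profile output; the variable and function names are just for illustration.

```python
# Figures copied from the rule profile table above.
cdefg = {"sid": 2, "checks": 127, "usecs": 599}       # "CDEFG" as fast pattern
ab = {"sid": 1, "checks": 196838, "usecs": 696622}    # "AB" as fast pattern

def avg_per_check(row):
    """Average microseconds spent per rule evaluation."""
    return row["usecs"] / row["checks"]

check_ratio = ab["checks"] / cdefg["checks"]
time_ratio = ab["usecs"] / cdefg["usecs"]

print(f"avg/check: sid 2 = {avg_per_check(cdefg):.1f}, "
      f"sid 1 = {avg_per_check(ab):.1f}")
print(f"sid 1 is checked about {check_ratio:.0f}x as often "
      f"and costs about {time_ratio:.0f}x the total time")
```

The per-check cost is similar for both rules; the total cost differs because the "AB" rule is evaluated roughly 1550 times as often.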

Also note that the longest pattern isn't always the least common
pattern, which is why there is the fast_pattern option.

alert tcp any any -> any any (msg:"HTTP fast pattern"; content:"ZZZ";
content:"HTTP"; nocase; fast_pattern; sid:3;)
alert tcp any any -> any any (msg:"ZZZ fast pattern"; content:"ZZZ";
fast_pattern; content:"HTTP"; nocase; sid:4;)

Rule Profile Statistics (all rules)
=================================================
Num SID GID Checks Matches Alerts uSecs Avg/Check
=== === === ====== ======= ====== ===== =========
  1   4   1   4883       5      5 14356       2.9
  2   3   1  35744       5      5 64732       1.8


5) You said that stream5 "will flush the segments it has gathered,
reassembling into a pseudo-packet and sending to the preprocessors and
detection engine (only if timeout is reached)". This happens even if
the segments it gathered are incomplete when timeout occurred?


TCP is a streaming protocol.  There is no real beginning or end to the
data as TCP sees it.  Boundaries are determined by the application layer
protocol.


Thank you.

Regards,
Rayne

--- On Thu, 10/16/08, Todd Wease <twease () sourcefire com> wrote:

    From: Todd Wease <twease () sourcefire com>
    Subject: Re: [Snort-users] Pattern Matching
    To: hjazz6 () ymail com
    Cc: snort-users () lists sourceforge net
    Date: Thursday, October 16, 2008, 12:49 PM


    Hi Rayne,

    Answers inline...

    Rayne wrote:
    > Hi all,
    >
    > I have a few questions regarding the pattern matching aspect of Snort.
    >
    > 1) If I have the following rule option (content:"ABC",
    > content:"DEFGH"), am I right to say that the string "DEFGH" will be
    > compared first to see if there is a match, and if there is, then "ABC"
    > is compared, because "DEFGH" is the longer string?
    >

    The fast pattern matcher (ac-bnfa, lowmem, etc.) is used to find the
    rules that have a chance at matching. Only the longest content of each
    rule is put in the pattern matcher. After the pattern matcher is
    compiled, each match state points to a tree of rule options with each
    path in the tree from root to leaf containing the rule options for a
    unique rule (besides the descriptive options such as msg, sid, etc.).
    Note that a match state can contain more than one pattern, e.g.
    "ABCD", "BCD", "CD". A tree is used because many rule options are
    present in the same place in different rules and using a tree
    eliminates the need to evaluate these options more than once. Each
    path in the tree will contain a content rule option containing one of
    the patterns in the match state. The contents are evaluated again
    using boyer-moore because of the usual modifiers to the content
    indicating the relativity, depth and case. So in your rule above, the
    content "DEFGH" will be put in the fast pattern matcher. If that
    content is found in the payload, a tree will be traversed (essentially
    a linked list here), starting with content:"ABC". That content will be
    evaluated using boyer-moore. If that succeeds, then content:"DEFGH"
    will be evaluated using boyer-moore. Note that the results of each
    rule option evaluated (each node in the tree) get cached for each
    packet, so if "DEFGH" occurs multiple times in the payload, the
    previous results for the rule options will be used, with the caveat
    that if the content (or pcre) is relative, it will need to be
    evaluated again.
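
The two-stage flow described above can be sketched in a few lines of Python. This is a simplified model, not Snort's implementation: Rule and evaluate are hypothetical names, a plain substring test stands in for both the multi-pattern matcher and Boyer-Moore, and the option tree, result caching and content modifiers are left out.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    sid: int
    contents: list              # content patterns, in rule order
    fast_pattern: bytes = b""   # filled in below when left empty

    def __post_init__(self):
        if not self.fast_pattern:
            # Default described in the thread: the longest content.
            self.fast_pattern = max(self.contents, key=len)

def evaluate(rules, payload):
    """Return (sids that alert, number of rules fully evaluated)."""
    alerts, checks = [], 0
    for rule in rules:
        # Stage 1: prefilter on the fast pattern only.
        if rule.fast_pattern not in payload:
            continue
        checks += 1
        # Stage 2: evaluate every content of the candidate rule,
        # including the one that was used as the fast pattern.
        if all(content in payload for content in rule.contents):
            alerts.append(rule.sid)
    return alerts, checks
```

For a payload that contains "AB" but not "CDEFG", a rule whose fast pattern is "AB" still reaches stage 2 and is counted as a check, while the same rule with the default fast pattern is skipped entirely.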

    >
    > 2) Is it possible to have one rule activate another rule within the
    > same packet, i.e. when a content match with "AB" is found, it will
    > trigger another rule that consists of a content match with a longer
    > string, e.g. "CDEFG". This would be something similar to
    > activate/dynamic, except from what I understand, dynamic only logs a
    > certain number of subsequent packets that match the first rule after
    > being activated, which is not exactly what I want to do. If this is
    > possible, does the second content match start from the beginning of
    > the payload, or from where "AB" was matched?
    >

    Not sure why you wouldn't put these contents in the same rule. Can you
    give an example of a couple of rules?

    >
    > 3) Say I have 5 rules each with one content match. All the rule
    > headers are the same, i.e. the 5 OTNs are under the same RTN, and they
    > contain only the content match. Using the AC search method, does Snort
    > build just one DFA that contains all 5 strings so each packet can be
    > searched through only once for all 5 strings at a time, or is a DFA
    > built for every OTN/string, resulting in searching through each packet
    > 5 times? What if one of the rules has 3 content matches while the
    > other 4 have only one content match each? How is the DFA built then?
    >

    All contents are searched for simultaneously in one state machine.
    Only the longest content in a rule is used in the state machine.
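
For readers unfamiliar with how one state machine can search for many patterns in a single pass, here is a compact textbook Aho-Corasick sketch in Python. It is not taken from acsmx.c or ac-bnfa; build_automaton and search are illustrative names, and the tables are plain Python lists and dicts rather than anything memory-efficient.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for a set of byte patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(pat)
    # Breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # A state also reports every pattern that is a suffix of its path.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(automaton, payload):
    """Scan the payload once and return the set of patterns found."""
    goto, fail, out = automaton
    found, state = set(), 0
    for byte in payload:
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        found |= out[state]
    return found
```

Built from the patterns "ABCD", "BCD" and "CD", the state reached after reading "ABCD" reports all three, which is the situation described earlier where one match state contains more than one pattern.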

    >
    > 4) Does the pattern matching algorithm return the position within the
    > payload where the pattern is found? For example, if I'm matching for
    > the string "GET" and the payload is "kas sdfGETjkdn", will I get
    > something like "Pattern "GET" matched at position 8"? Also, in
    > acsmx.c, it is mentioned that the AC algorithm "finds all occurrences
    > of all patterns within a body of text". If there are, say, 5
    > occurrences of a pattern string, do I get one alert/log per occurrence,
    > one alert/log per pattern matched (if there are multiple content
    > strings in the rule option) or one alert/log per rule (regardless of
    > the number of content strings in the rule option)?
    >

    The index or position in the payload where the pattern matched is not
    used for evaluating rules. Generally, it's not important since content
    modifiers specify where in the payload a content should be. As I said
    above, the fast pattern matcher is just a way to pick out a much
    smaller subset of rules to evaluate.

    Snort alerts per rule matched, not per pattern matched.

    >
    > 5) How long does Snort hold fragments for reassembly in Frag3 and
    > Stream5 before discarding the packets if they are incomplete?
    >

    For frag3, fragments are discarded if a timeout is reached (default of
    60 seconds). frag3 does not constantly go through the current fragment
    trackers looking for timed out ones (a performance hit) but makes this
    decision if it gets a fragment for one of the trackers. If the memory
    cap is reached in trying to create a new tracker, it will purge the
    least recently used trackers to get enough memory to create the new
    one. stream5 does essentially the same thing, but will flush the
    segments it has gathered, reassembling into a pseudo-packet and
    sending to the preprocessors and detection engine (only if timeout is
    reached, not memcap).
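
The lazy timeout check and least-recently-used pruning described above can be modelled roughly as follows. This is a hypothetical Python sketch, not frag3's code: TrackerTable and its methods are invented names, the memory cap is approximated by a maximum tracker count instead of bytes, and the stream5 flush-on-timeout behaviour is not modelled.

```python
from collections import OrderedDict

class TrackerTable:
    """Toy fragment-tracker table with lazy timeouts and LRU pruning."""

    def __init__(self, timeout=60.0, max_trackers=4):
        self.timeout = timeout
        self.max_trackers = max_trackers   # stand-in for a byte-based memcap
        self.trackers = OrderedDict()      # key -> (last_seen, fragments)

    def add_fragment(self, key, fragment, now):
        entry = self.trackers.get(key)
        if entry is not None and now - entry[0] > self.timeout:
            # The timeout is only noticed when a fragment arrives for
            # this tracker; there is no background sweep.
            del self.trackers[key]
            entry = None
        if entry is None:
            # Creating a new tracker: prune least recently used ones first.
            while len(self.trackers) >= self.max_trackers:
                self.trackers.popitem(last=False)
            fragments = []
        else:
            fragments = entry[1]
        fragments.append(fragment)
        self.trackers[key] = (now, fragments)
        self.trackers.move_to_end(key)     # mark as most recently used
        return fragments
```

The trade-off is the one mentioned above: a stale tracker can linger until a fragment for it arrives or memory pressure evicts it, in exchange for not scanning every tracker on a timer.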

    >
    > Thank you.
    >
    > Regards,
    > Rayne


