Nmap Development mailing list archives

Automatic sorting of nmap-service-probes


From: David Fifield <david () bamsoftware com>
Date: Thu, 2 Feb 2012 21:33:05 -0800

This is a description of a project, the work to achieve which exceeds
its benefits, but which could be fun for a hacker with some CS chops.
The main idea is to identify when two different regular expressions can
be matched by the same string, and use this to sort nmap-service-probes
in a way that preserves the semantics of its ordering.

I often want nmap-service-probes to be sorted broadly by server. For
example, under the GetRequest probe, there is a big block of Apache
patterns, a block of Xerox printer patterns, and so on. But many similar
patterns are spread out in the file, because they just get added in the
order they are submitted. Sometimes when I'm doing submissions I'll
separate out a block once it's big enough, but I would really like that
to happen mostly automatically (just by sorting on the p// field, for
example).

What prevents this from being easy is things like this:

# Needs to go before the Apache match lines -Doug
match http-proxy m|^HTTP/1\.[01] \d\d\d .*\r\nServer: Apache\r\n.*X-orenosp-filt:|s p/Orenosp reverse http proxy/
...
match http m|^HTTP/1\.[01] \d\d\d.*\r\nDate: .*\r\nServer: Apache\r\n| p/Apache httpd/ cpe:/a:apache:http_server/

When the same string can match two different patterns, the order
matters. (Nmap takes the first match.) Here, this HTTP proxy claims to
be Apache, but has a distinctive header field. If the order were
reversed, the later, very generic Apache match would prevent the HTTP
proxy from ever matching.

When there is no string that can match two different patterns, they can
me moved around without restrictions. So the main goal is to write an
algorithm that decides whether there is a string that can match both of
two given regular expressions. Then we'll run it against every pair of
matchlines within a service probe; the output of the algorithm will be a
list of pairs of lines that interfere with each other.

Any regular expression is equivalent to an NFA. A pair of regular
expressions is also equivalent to an NFA, one with two disjoint sets of
states: the accept states of the joint NFA are those that represent a
set of states that contain an accept state from both of the sub-NFAs.
Any NFA is in turn equivalent to a DFA, which you can look at as a
directed graph; then finding a string that satisfies both regular
expressions is the same as finding a path through the graph from the
start state to any accept state. If no such path exists, then the
regular expresions cannot be matched by the same string.

Finding a path could take exponential time in the worst case because an
equivalent DFA may be exponentially large in the size of an NFA. But
I'll bet that doesn't happen with the typical regular expressions in our
database. You don't have to explicitly build the full DFA, only do it
incrementally as you search for a path through the graph.

You should read this good article:
http://swtch.com/~rsc/regexp/regexp1.html

Maybe there will be too many conflicts to be useful. It might not be
obvious that the following two patterns can be matched by a single
string:

match http m|^HTTP/1\.0 200 OK\r\n.*Server: Apache\r\n|s
match http m|^HTTP/1\.0 200 OK\r\n.*Server: IIS\r\n|s

I would not have qualms about reordering those, because it's unlikely
that a server responds with "Server: Apache\r\nServer: IIS\r\n". But if
such cases are few enough, perhaps we can deal with them manually.

David Fifield
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/


Current thread: