Nmap Development mailing list archives

Analysis of using CPE for version detection

From: David Fifield <david () bamsoftware com>
Date: Mon, 9 Aug 2010 15:53:40 -0600

This is a continuation of my research into using Common Platform
Enumeration (CPE) in Nmap. The first part, on using CPE in OS
signatures, is at http://seclists.org/nmap-dev/2010/q3/278.

CPE is a naming system for hardware, operating systems, and
applications. A CPE name looks like

cpe:/{part}:{vendor}:{product}:{version}:{update}:{edition}:{language}
(http://cpe.mitre.org/specification/diagram.html)

The CPE specification and dictionary (list of registered names) are at

http://cpe.mitre.org/specification/index.html
http://cpe.mitre.org/dictionary/index.html

As I suspected, adding CPE to version detection will be somewhat more
difficult than adding it to OS detection.

The {part} of a CPE name can be "a" for application, "h" for hardware,
or "o" for operating system. For version detection we are mostly
interested in "a", but we can use "h" and "o" as well. Samples of CPE
application names are

cpe:/a:acme:thttpd:2.00
cpe:/a:adobe:acrobat:9.1.3
cpe:/a:hp:openview_network_node_manager:7.51:-:linux
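
To make the component layout concrete, here is a minimal Python sketch
that splits a CPE name of the form shown above into its seven
components. The function name parse_cpe and the choice to treat omitted
trailing components as empty strings are my own assumptions for
illustration; this is not code from Nmap or from the CPE project.

```python
from urllib.parse import unquote

# Component order as given in the cpe:/ template above.
CPE_FIELDS = ("part", "vendor", "product", "version",
              "update", "edition", "language")

def parse_cpe(name):
    """Split a cpe:/ URI into a dict of its seven components."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError("not a CPE URI: %r" % name)
    pieces = name[len(prefix):].split(":")
    if len(pieces) > len(CPE_FIELDS):
        raise ValueError("too many components: %r" % name)
    # Trailing components may be left off; treat them as empty.
    pieces += [""] * (len(CPE_FIELDS) - len(pieces))
    # Undo percent-encoding inside each component.
    return {field: unquote(piece) for field, piece in zip(CPE_FIELDS, pieces)}

if __name__ == "__main__":
    print(parse_cpe("cpe:/a:hp:openview_network_node_manager:7.51:-:linux"))
```

For the third sample name, this puts "7.51" in version, "-" in update,
and "linux" in edition, and leaves language empty.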

Version detection provides six named fields to describe a service,
listed at http://nmap.org/book/vscan-fileformat.html#vscan-versioninfo.
These are

p/vendorproductname/
v/version/
i/info/
h/hostname/
o/operatingsystem/
d/devicetype/

p//, v//, and i// map more or less to CPE components. p// is a
combination of {vendor} and {product}, v// goes directly to {version}
(with caveats explained below), and i// is where we put any extra
information, only a small part of which (such as language) fits into
CPE. o// can also be expressed with CPE. Usually, not all of these
fields will be present in a service fingerprint. An example of how they
appear in nmap-service-probes is shown below. If you need a refresher on the
nmap-service-probes file format, see
http://nmap.org/book/vscan-fileformat.html. All the signatures in this
message are wrapped to 80 columns but in the original file they are each
one long line.

match http m|^HTTP/1\.1 \d\d\d .*\r\nDate: .*\r\nServer: Apache (\d+\.\d+\.[-.\w
]+)\r\nX-Powered-By: ([^\r\n]+)\r\n| p/Apache httpd/ v/$1/ i/$2/

For convenience I'm going to introduce a new notation for including CPE
in a match line. The line above, augmented with CPE, would look like

match http m|^HTTP/1\.1 \d\d\d .*\r\nDate: .*\r\nServer: Apache (\d+\.\d+\.[-.\w
]+)\r\nX-Powered-By: ([^\r\n]+)\r\n| p/Apache httpd/ v/$1/ i/$2/ cpe:/a:apache:h
ttp_server:$1/

I'm using "cpe:" as if it were another field name like "p" or "v". The
trailing slash isn't part of a CPE name but it makes the syntax more
uniform. I prefer putting the entire CPE URI in the match line, as
opposed to adding lots of new fields like cpevendor// and cpeproduct//
because it looks better and it's easier to copy and paste. The only
thing to watch out for is that match substitutions like $1 need to be
percent-encoded before being substituted into cpe://, so that a captured
string containing a colon or other reserved character doesn't break the
syntax. See section 5.4 of the specification for percent encoding.
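
As a rough illustration of the substitution step, the following Python
sketch fills $1, $2, ... in a cpe: template with percent-encoded
captures. The function name fill_cpe_template is hypothetical, and I am
using urllib's quote() as a stand-in encoder; I have not checked that
its set of escaped characters matches section 5.4 of the specification
exactly.

```python
import re
from urllib.parse import quote

def fill_cpe_template(template, groups):
    """Replace $1..$9 in a cpe: template with percent-encoded captures.

    groups is the list of captured strings, so $1 is groups[0].
    """
    def replace(match):
        index = int(match.group(1))
        # safe="" makes quote() encode ":" and "/" as well, so a capture
        # cannot introduce a new component or end the field early.
        return quote(groups[index - 1], safe="")
    return re.sub(r"\$(\d)", replace, template)

if __name__ == "__main__":
    print(fill_cpe_template("cpe:/a:apache:http_server:$1", ["2.2.3"]))
    print(fill_cpe_template("cpe:/a:acme:thttpd:$1", ["2.25b:beta/1"]))
```

An ordinary version string such as "2.2.3" passes through unchanged,
while the colon and slash in the second (made-up) capture are escaped
as %3A and %2F.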

The CPE dictionary contains more applications than nmap-service-probes
(16,057 "a" names versus 6,594 match lines), but that is mostly because
it has multiple entries for each application, one per known version. If
we look at only unique {vendor}:{product} pairs and p// strings, the CPE
dictionary has 2,615 and nmap-service-probes has something like 4,700.
(That's not exact because the p// delimiters may be different; I used
"egrep -o ' p/[^/]*/' nmap-service-probes | sort | uniq | wc -l".) The
disparity is greater when you consider that some fraction of CPE
dictionary entries aren't candidates for our database, because they name
client software like Adobe Acrobat. We will be in the same situation as
OS detection, with many new names to be submitted to the dictionary. I
worked through 10 service submissions this morning, which would have
called for 8 CPE names, and of these only 1 was already in the
dictionary.
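
For anyone who wants to repeat the p// count without egrep, here is a
Python sketch of the same pipeline. It has the same limitation as the
shell command: it only recognizes p// fields delimited by "/". The
function name unique_product_strings is my own.

```python
import re

# Same pattern as the egrep command: " p/" followed by anything up to "/".
P_FIELD = re.compile(r" p/([^/]*)/")

def unique_product_strings(lines):
    """Return the set of distinct p// strings found in the given lines."""
    products = set()
    for line in lines:
        products.update(P_FIELD.findall(line))
    return products

if __name__ == "__main__":
    import sys
    # Usage: python count_products.py /path/to/nmap-service-probes
    with open(sys.argv[1], encoding="latin-1") as f:
        print(len(unique_product_strings(f)))
```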

For instance, I was surprised that Postfix isn't present in version 2.2
of the official dictionary. But an advisory at
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2936 uses the
form cpe:/a:postfix:postfix:2.3.0, which is how I would have expressed
it.

Many of our match lines don't have a specific program named because it
isn't known. Instead they use something like p/Netgear FVS318 router
http config/. (Whenever possible, we put the actual server, such as
micro_httpd, in p//, and put hardware model numbers in i//, but the
actual server isn't always known.) This case is like "embedded" for OS
signatures, so I propose we handle them the same way: use an "h" CPE
name like cpe:/h:netgear:fvs318.

What about when the application and the hardware are known? And the OS?
We might be able to get away with the distinction between "a", "h", and
"o" names: "a" is always the service version, "h" is always the hardware
type, and "o" is always the operating system. So for example I have
considered the following:

# Alice Box PM203 (v1) (Pirelli Broadband Solutions)
match upnp m|^HTTP/1\.0 414 Request-URI Too Long\r\nServer: Linux/([\w._-]+) UPn
P/([\w._-]+) fbxigdd/([\w._-]+)\r\nConnection: close\r\n\r\n$| p/AliceBox PM203 
UPnP/ o/Linux $1/ i/UPnP $2; fbxigdd $3/ d/WAP/ cpe:/h:pirelli:alicebox_pm203/ c
pe:/o:linux:kernel:2.6/
# Roam About Switch (RAS), Version: 7.0.7.3 REL. Model: RBT-8200
match http m|^HTTP/1\.1 302 OK\r\nDate: \w\w\w \d\d, \d\d:\d\d:\d\d\.\d\d\d\r\nS
erver: TreeNeWS/([\w._-]+)\r\nMime-Version: 1\.0\r\nLocation: https://index\.htm
l\r\nContent-Length: 67\r\nContent-Type: text/html\r\n\r\n<HTML><HEAD><TITLE>Red
irect</TITLE></HEAD>\n<BODY></BODY></HTML>\r\r\n\n$| p/TreeNeWS httpd/ v/$1/ i/E
nterasys RBT-8200 switch http config/ d/switch/ cpe:/a::treenews:$1/ cpe:/h:ente
rasys:rbt-8200/

There's one aspect of CPE names that I don't know how to handle. Some of
the dictionary entries split the version number into {version} and
{update}. For example, we are supposed to represent thttpd 2.25b as
cpe:/a:acme:thttpd:2.25:b. This is going to be difficult with our
pattern matching system. Currently we match like this:

match http m|Server: thttpd/([\w._-]+)| p/thttpd/ v/$1/

The [\w._-]+ pattern is designed to match many different version
strings. We could extract the final "b" using a pair of matches:

match http m|Server: thttpd/([\d.]+)(\w+)| p/thttpd/ v/$1/ cpe:/a:acme:thttpd:$1
:$2/
match http m|Server: thttpd/([\d.]+)| p/thttpd/ v/$1/ cpe:/a:acme:thttpd:$1/

This is ugly and hard to maintain. Making matters worse, there are other
suffixes that belong in {update}, for example cpe:/a:isc:bind:9.4.0:rc1.
We could handle this with a long regular expression designed to catch
all such variations, and copy it everywhere we match a version number,
or at least everywhere a version number is likely to have such a suffix.
On top of that, the dictionary is not consistent on this point. I
found these examples while browsing it:

cpe:/a:mysql:mysql:3.20.32a
cpe:/a:mysql:mysql:3.23.0:alpha
cpe:/a:mysql:mysql:3.23.20:beta
cpe:/a:mysql:mysql:3.23.53a
cpe:/a:mysqldumper:mysqldumper:1.21_b6
cpe:/a:mysqldumper:mysqldumper:1.22
cpe:/a:mysqldumper:mysqldumper:1.23_pre-release
cpe:/a:samba:samba:3.0.25:pre1

I think we will have to keep the human-readable text descriptions. In an
example above, for "Netgear FVS318 router http config", the best CPE can
do is cpe:/h:netgear:fvs318. The important information that this is HTTP
and a router would be lost. Although the CPE dictionary has a
human-readable string associated with each name, it's not descriptive
enough. For the name above it says only "NetGear FVS318". Of our
existing fields, only v// and o// can potentially be completely replaced
by CPE names. A new CPE field would coexist with the others.

We need to consider how to present CPE names in output. Adding them to
XML output will be super easy. Should they be shown by default in normal
output? On the same line as the rest of the version information? A
portrule NSE script could print the CPE name, letting users choose
whether to see it (but also polluting the XML output with duplicate
information).

I think the main cost of adding CPE to the version database would be an
increase in the ongoing maintenance of the database. During periodic
integration the integrator would have to ensure that the entries are not
only internally consistent but also consistent with the CPE dictionary.
This could be mitigated with an automated tool that checks for unknown
names and suggests possibilities through fuzzy string matching (like a
CPE spell checker). As with OS detection, we would want a tool to
extract all unmatched names so they can be submitted to the dictionary
maintainers.
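
The "CPE spell checker" idea could be prototyped with nothing more than
Python's standard difflib module. The sketch below is hypothetical: the
function name suggest_cpe_names, the cutoff value, and the tiny
stand-in dictionary are assumptions for illustration, and a real tool
would load the names from the official dictionary file.

```python
import difflib

def suggest_cpe_names(candidate, dictionary, n=3, cutoff=0.6):
    """Suggest known CPE names that are close to an unknown candidate.

    Returns an empty list when the candidate is already in the
    dictionary, otherwise up to n close matches, best first.
    """
    if candidate in dictionary:
        return []
    return difflib.get_close_matches(candidate, dictionary, n=n, cutoff=cutoff)

if __name__ == "__main__":
    # Stand-in for the real dictionary, using names from this message.
    known = [
        "cpe:/a:apache:http_server",
        "cpe:/a:acme:thttpd",
        "cpe:/a:isc:bind",
    ]
    print(suggest_cpe_names("cpe:/a:apache:httpd_server", known, n=1))
```

Names for which this returns nothing useful are the ones to collect for
submission to the dictionary maintainers.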

Something that mitigates the difficulty of adding CPE names to the
version database is that it doesn't have to be done all at once. The
cpe:// field would be optional. We could add it only for new signatures
or for prominent older ones, or do the whole database a piece at a time.
Users could be instructed to submit new names to the online form when a
match lacks a CPE name, though I'm not sure that would be more efficient
than someone just working through the database because of the large
amount of error correction and canonicalization that would have to be
done. On the other hand, it would ensure that the most common matches
get a CPE name.

David Fifield