Nmap Development mailing list archives
Analysis of using CPE for version detection
From: David Fifield <david () bamsoftware com>
Date: Mon, 9 Aug 2010 15:53:40 -0600
This is a continuation of my research into using Common Platform Enumeration (CPE) in Nmap. The first part, on using CPE in OS signatures, is at http://seclists.org/nmap-dev/2010/q3/278.

CPE is a naming system for hardware, operating systems, and applications. A CPE name looks like

    cpe:/{part}:{vendor}:{product}:{version}:{update}:{edition}:{language}

(http://cpe.mitre.org/specification/diagram.html). The CPE specification and dictionary (the list of registered names) are at

    http://cpe.mitre.org/specification/index.html
    http://cpe.mitre.org/dictionary/index.html

As I suspected, adding CPE to version detection will be somewhat more difficult than adding it to OS detection.

The {part} of a CPE name can be "a" for application, "h" for hardware, or "o" for operating system. For version detection we are mostly interested in "a", but we can use "h" and "o" as well. Some sample CPE application names are

    cpe:/a:acme:thttpd:2.00
    cpe:/a:adobe:acrobat:9.1.3
    cpe:/a:hp:openview_network_node_manager:7.51:-:linux

Version detection provides six fields to describe a service, listed at http://nmap.org/book/vscan-fileformat.html#vscan-versioninfo. These are

    p/vendorproductname/
    v/version/
    i/info/
    h/hostname/
    o/operatingsystem/
    d/devicetype/

p//, v//, and i// map more or less to CPE components. p// is a combination of {vendor} and {product}. v// goes directly to {version}, with caveats explained below. i// is where we put any extra information, and only a small part of it (such as language) fits into CPE. o// can also be expressed with CPE. Usually not all of these fields are present in a service fingerprint. If you need a refresher on the nmap-service-probes file format, see http://nmap.org/book/vscan-fileformat.html. In nmap-service-probes each signature is one long line. Here is an example of how the fields appear there:
    match http m|^HTTP/1\.1 \d\d\d .*\r\nDate: .*\r\nServer: Apache (\d+\.\d+\.[-.\w]+)\r\nX-Powered-By: ([^\r\n]+)\r\n| p/Apache httpd/ v/$1/ i/$2/

For convenience I'm going to introduce a new notation for including CPE in a match line. The line above, augmented with CPE, would look like

    match http m|^HTTP/1\.1 \d\d\d .*\r\nDate: .*\r\nServer: Apache (\d+\.\d+\.[-.\w]+)\r\nX-Powered-By: ([^\r\n]+)\r\n| p/Apache httpd/ v/$1/ i/$2/ cpe:/a:apache:http_server:$1/

I'm using "cpe:" as if it were another field name like "p" or "v". The trailing slash isn't part of a CPE name, but it makes the syntax more uniform. I prefer putting the entire CPE URI in the match line, rather than adding lots of new fields like cpevendor// and cpeproduct//, because it looks better and it's easier to copy and paste. The only thing to watch out for is that match substitutions like $1 need to be percent-encoded before being substituted into cpe://. That way a product name containing a colon or a similar character doesn't break the syntax. See section 5.4 of the specification for percent encoding.

The CPE dictionary contains more applications than nmap-service-probes (16,057 "a" names versus 6,594 match lines). That is mostly because the dictionary has multiple entries for each application, one per known version. If we look only at unique {vendor}:{product} pairs and p// strings, the CPE dictionary has 2,615 and nmap-service-probes has something like 4,700. (That's not exact because the p// delimiters may vary; I used "egrep -o ' p/[^/]*/' nmap-service-probes | sort | uniq | wc -l".) The disparity is greater when you consider that some fraction of CPE dictionary entries aren't candidates for our database, because they name client software like Adobe Acrobat. We will be in the same situation as with OS detection, with many new names to be submitted to the dictionary.
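To illustrate the percent-encoding point above, here is a minimal Python sketch of the substitution step. It assumes that everything other than ASCII letters, digits, "-", ".", "_" and "~" should be percent-encoded; check section 5.4 of the specification for the exact character set. The function name fill_cpe_template is hypothetical, and it handles only plain $1 to $9 references.

```python
import re
from urllib.parse import quote

# References such as $1 ... $9 inside a cpe:/ template.
GROUP_REF = re.compile(r"\$(\d)")


def fill_cpe_template(template: str, match: re.Match) -> str:
    """Substitute match groups into a cpe:/ template, percent-encoding each one.

    quote(..., safe="") leaves only letters, digits and "_.-~" unescaped, so a
    colon in a captured string becomes %3A and cannot split a CPE component.
    """

    def substitute(ref: re.Match) -> str:
        value = match.group(int(ref.group(1)))
        # An optional group that did not take part in the match becomes "".
        return quote(value or "", safe="")

    return GROUP_REF.sub(substitute, template)


if __name__ == "__main__":
    probe = re.compile(r"Server: thttpd/([\w._-]+)")
    banner = "HTTP/1.0 200 OK\r\nServer: thttpd/2.25b\r\n"
    m = probe.search(banner)
    print(fill_cpe_template("cpe:/a:acme:thttpd:$1", m))
```

Only the captured text is encoded; the colons written literally in the template stay as component separators, which is the distinction the encoding rule exists to preserve.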
I worked through 10 service submissions this morning. They would have needed 8 CPE names, and of these only 1 was already in the dictionary. For instance, I was surprised that Postfix isn't present in version 2.2 of the official dictionary, although an advisory at http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2936 uses the form cpe:/a:postfix:postfix:2.3.0, which is how I would have expressed it.

Many of our match lines don't name a specific program because it isn't known. Instead they use something like p/Netgear FVS318 router http config/. (Whenever possible we put the actual server, such as micro_httpd, in p// and put hardware model numbers in i//, but the actual server isn't always known.) This case is like "embedded" for OS signatures, so I propose we handle it the same way: use an "h" CPE name like cpe:/h:netgear:fvs318.

What about when both the application and the hardware are known? And the OS? We might be able to get away with the distinction between "a", "h", and "o" names: "a" is always the service version, "h" is always the hardware type, and "o" is always the operating system. For example, I have considered the following:

    # Alice Box PM203 (v1) (Pirelli Broadband Solutions)
    match upnp m|^HTTP/1\.0 414 Request-URI Too Long\r\nServer: Linux/([\w._-]+) UPnP/([\w._-]+) fbxigdd/([\w._-]+)\r\nConnection: close\r\n\r\n$| p/AliceBox PM203 UPnP/ o/Linux $1/ i/UPnP $2; fbxigdd $3/ d/WAP/ cpe:/h:pirelli:alicebox_pm203/ cpe:/o:linux:kernel:2.6/

    # Roam About Switch (RAS), Version: 7.0.7.3 REL. Model: RBT-8200
    match http m|^HTTP/1\.1 302 OK\r\nDate: \w\w\w \d\d, \d\d:\d\d:\d\d\.\d\d\d\r\nServer: TreeNeWS/([\w._-]+)\r\nMime-Version: 1\.0\r\nLocation: https://index\.html\r\nContent-Length: 67\r\nContent-Type: text/html\r\n\r\n<HTML><HEAD><TITLE>Redirect</TITLE></HEAD>\n<BODY></BODY></HTML>\r\r\n\n$| p/TreeNeWS httpd/ v/$1/ i/Enterasys RBT-8200 switch http config/ d/switch/ cpe:/a::treenews:$1/ cpe:/h:enterasys:rbt-8200/

There's one aspect of CPE names that I don't know how to handle. Some dictionary entries split the version number into {version} and {update}. For example, we are supposed to represent thttpd 2.25b as cpe:/a:acme:thttpd:2.25:b. This is going to be difficult with our pattern-matching system. Currently we match like this:

    match http m|Server: thttpd/([\w._-]+)| p/thttpd/ v/$1/

The [\w._-]+ pattern is designed to match many different version strings. We could extract the final "b" using a pair of matches:

    match http m|Server: thttpd/([\d.]+)(\w+)| p/thttpd/ v/$1/ cpe:/a:acme:thttpd:$1:$2/
    match http m|Server: thttpd/([\d.]+)| p/thttpd/ v/$1/ cpe:/a:acme:thttpd:$1/

This is ugly and hard to maintain. Making matters worse, there are other suffixes that belong in {update}, for example cpe:/a:isc:bind:9.4.0:rc1. We could handle this with a long regular expression designed to catch all such variations. We would then copy it everywhere we match a version number, or at least everywhere a version number is likely to have such a suffix.

Worse still, the dictionary is not consistent on this point. I found these examples while browsing it:

    cpe:/a:mysql:mysql:3.20.32a
    cpe:/a:mysql:mysql:3.23.0:alpha
    cpe:/a:mysql:mysql:3.23.20:beta
    cpe:/a:mysql:mysql:3.23.53a
    cpe:/a:mysqldumper:mysqldumper:1.21_b6
    cpe:/a:mysqldumper:mysqldumper:1.22
    cpe:/a:mysqldumper:mysqldumper:1.23_pre-release
    cpe:/a:samba:samba:3.0.25:pre1

I think we will have to keep the human-readable text descriptions.
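The {version}/{update} problem can be made concrete with a rough Python sketch of the "long regular expression" approach. The pattern and the names split_version and version_cpe are hypothetical. The sketch deliberately shows the limitation: it splits off every trailing letter suffix. That agrees with the dictionary for thttpd 2.25b and for a version string like 9.4.0rc1. It disagrees for MySQL 3.20.32a and mysqldumper 1.21_b6, which the dictionary keeps as a single {version}.

```python
import re

# Hypothetical heuristic: a dotted numeric release, then an optional separator
# and a suffix that starts with a letter ("b", "rc1", "alpha", "pre1", ...).
VERSION_UPDATE = re.compile(r"^(\d+(?:\.\d+)*)[-_.]?([A-Za-z][\w.-]*)?$")


def split_version(raw: str) -> tuple[str, str]:
    """Split a raw version string into ({version}, {update})."""
    m = VERSION_UPDATE.match(raw)
    if m is None:
        # Not recognizably numeric; keep the whole string as {version}.
        return raw, ""
    return m.group(1), m.group(2) or ""


def version_cpe(vendor: str, product: str, raw: str) -> str:
    """Build a cpe:/a name, adding an {update} component only when present."""
    version, update = split_version(raw)
    name = f"cpe:/a:{vendor}:{product}:{version}"
    return f"{name}:{update}" if update else name


if __name__ == "__main__":
    for vendor, product, raw in [
        ("acme", "thttpd", "2.25b"),                # dictionary: 2.25:b
        ("isc", "bind", "9.4.0rc1"),                # dictionary: 9.4.0:rc1
        ("mysql", "mysql", "3.20.32a"),             # dictionary: 3.20.32a
        ("mysqldumper", "mysqldumper", "1.21_b6"),  # dictionary: 1.21_b6
    ]:
        print(version_cpe(vendor, product, raw))
```

The last two cases show that no single pattern reproduces the dictionary's choices. Any such rule would need a per-product exception list on top of it.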
In an example above, for "Netgear FVS318 router http config", the best CPE can do is cpe:/h:netgear:fvs318. The important information that this is an HTTP service on a router would be lost. The CPE dictionary does have a human-readable string associated with each name, but it's not descriptive enough; for the name above it says only "NetGear FVS318". Of our existing fields, only v// and o// can potentially be completely replaced by CPE names. A new CPE field would coexist with the others.

We need to consider how to present CPE names in output. Adding them to XML output will be super easy. Should they be shown by default in normal output? On the same line as the rest of the version information? A portrule NSE script could print the CPE name, letting users choose whether to see it (but also polluting the XML output with duplicate information).

I think the main cost of adding CPE to the version database would be an increase in the ongoing maintenance of the database. During periodic integration the integrator would have to ensure that the entries are not only internally consistent but also consistent with the CPE dictionary. This could be mitigated with an automated tool that checks for unknown names and suggests possibilities through fuzzy string matching (like a CPE spell checker). As with OS detection, we would want a tool to extract all unmatched names so they can be submitted to the dictionary maintainers.

One thing that reduces the difficulty of adding CPE names to the version database is that it doesn't have to be done all at once. The cpe:// field would be optional. We could add it only for new signatures or for prominent older ones, or do the whole database a piece at a time. Users could be instructed to submit new names through the online form when a match lacks a CPE name. I'm not sure that would be more efficient than someone just working through the database, though, because of the large amount of error correction and canonicalization that would have to be done.
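The automated checker suggested above could start as small as the following Python sketch. It is hypothetical throughout and makes three assumptions. First, match lines use the proposed cpe:/.../ notation. Second, only the cpe:/{part}:{vendor}:{product} prefix is compared, because versions mostly come from $1 substitutions. Third, the fuzzy matching is done with the standard library's difflib.get_close_matches. Loading the real dictionary file is left out; a hard-coded list stands in for it. The unknown prefixes it reports would double as the list of names to submit to the dictionary maintainers.

```python
import difflib
import re

# A cpe: field in the proposed notation, e.g. "cpe:/a:apache:http_server:$1/".
# The trailing slash is the field delimiter, not part of the name.
CPE_FIELD = re.compile(r"(cpe:/[aho]:[^/\s]*)/")


def product_prefix(name: str) -> str:
    """Reduce a CPE name to cpe:/{part}:{vendor}:{product}."""
    return ":".join(name.split(":")[:4])


def check_cpe_names(probe_lines, known_prefixes, cutoff=0.8):
    """Return (unknown prefix, suggestions) for each name not in the dictionary."""
    known = set(known_prefixes)
    candidates = sorted(known)
    report = []
    for line in probe_lines:
        if not line.startswith("match "):
            continue
        for name in CPE_FIELD.findall(line):
            prefix = product_prefix(name)
            if prefix not in known:
                hints = difflib.get_close_matches(prefix, candidates, n=3, cutoff=cutoff)
                report.append((prefix, hints))
    return report


if __name__ == "__main__":
    # Stand-in for names parsed from the official CPE dictionary.
    dictionary = ["cpe:/a:acme:thttpd", "cpe:/a:apache:http_server", "cpe:/a:postfix:postfix"]
    probes = [
        r"match http m|Server: thttpd/([\w._-]+)| p/thttpd/ v/$1/ cpe:/a:acme:thttpd:$1/",
        r"match http m|^Server: Apache/([\d.]+)| p/Apache httpd/ v/$1/ cpe:/a:apache:http_sever:$1/",
    ]
    for prefix, hints in check_cpe_names(probes, dictionary):
        print(f"unknown {prefix}; did you mean {', '.join(hints) or 'nothing close'}?")
```

The cutoff of 0.8 is an arbitrary starting point. A lower value gives more suggestions at the cost of more noise.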
On the other hand, user submissions would ensure that the most common matches get a CPE name.

David Fifield

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/