Nmap Development mailing list archives

Re: Inconsistency in nmap XML output


From: Dual Mobius <dualmobius () comcast net>
Date: Wed, 10 Nov 2004 20:09:38 -0700

Matt wrote:

> How many people interested in this thread and getting the host down
> added to the XML output are using windows to try and figure this stuff
> out (keep reading i'm not just windows bashing, windows can do it all
> too)?


I am currently using XML output on Linux and not on Windows.

> Seriously, if you're using linux why would you spend all the time
> building XML parsers when you can just run 'awk'.  I do nmap scans
> regularly and have yet to use the XML output.  Just -oN and -oG for
> me, thx.


Basically because when you need to keep very close records of exactly what was done, when, and how, those details are a lot easier to extract from the XML output (especially when using version detection). Run

$ nmap -vv -sV 127.0.0.1 -oA nmap_comparison

and compare for yourself. My comments below will focus on comparing the .xml output with the .gnmap output.

For example, in a non-version detecting scan:

XML format for open port (newlines inserted to avoid badly placed line wraps)
------------------------
<port protocol="tcp" portid="22">
  <state state="open" />
  <service name="ssh" method="table" conf="3" />
</port>

Grepable format for open port
------------------------------
22/open/tcp//ssh///

In this case, I will gladly admit that the grepable output is easier to handle with sed/awk or your choice of scripting language -- and I frequently do just that.
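For the simple case, splitting a grepable port entry takes only a few lines. The following is a minimal Python sketch, not anything shipped with nmap. The field labels in FIELDS and the function name parse_gnmap_port are my own choices for illustration; the field order is assumed from the sample entry above.

```python
# Labels for the slash-separated fields of a grepable port entry.
# The labels are illustrative; the order is assumed from the sample
# entry "22/open/tcp//ssh///".
FIELDS = ("port", "state", "protocol", "owner", "service", "rpc_info", "version")


def parse_gnmap_port(entry):
    """Split one grepable port entry into a dict keyed by FIELDS."""
    parts = entry.split("/")
    # zip() stops at the shorter sequence, so the empty string after the
    # trailing slash is ignored.
    return dict(zip(FIELDS, parts))


port = parse_gnmap_port("22/open/tcp//ssh///")
print(port["port"], port["state"], port["protocol"], port["service"])
```

Note that the whole version string ends up in one field, which is where the trouble starts once version detection is turned on.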


Now on to a version detection scan:

XML format for open port (newlines inserted to avoid badly placed line wraps)
------------------------
<port protocol="tcp" portid="22">
  <state state="open" />
  <service name="ssh" product="OpenSSH" version="3.8.1p1"
   extrainfo="protocol 2.0" method="probed" conf="10" />
</port>

Grepable format for open port
------------------------------
22/open/tcp//ssh//OpenSSH 3.8.1p1 (protocol 2.0)/


In this case, the XML output is a godsend for parsing the detected product, version, etc. into a spreadsheet or database. Nmap has the protocol knowledge embedded in it to give the correct values for the correct parts -- so why not make use of that instead of figuring out how to reliably split the various formats of product names and strings that are encountered into product/version/extra tuples (as illustrated below)?

  "OpenSSH 3.8.1p1 (protocol 2.0)"
  "Samba smbd 3.X (workgroup: XXXX)"
  "CUPS 1.1"
  "OpenLDAP 2.1.X"
  "Squid webproxy 2.5.STABLE7"
  "Apache httpd"

Some have version numbers, some don't. Some products are multiple words, some are single words, etc. Quite a while back, I spent an afternoon trying to reliably split these strings, and just when I thought I had it, I found a new service enabled somewhere that messed it up again. I switched to parsing the XML format and have had a lot fewer problems since then.
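To show what reading the XML looks like in practice, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample is the <port> element from the version detection scan above; the function name service_record and the dictionary keys are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# The <port> element from the version detection scan shown earlier.
SAMPLE = """<port protocol="tcp" portid="22">
  <state state="open" />
  <service name="ssh" product="OpenSSH" version="3.8.1p1"
   extrainfo="protocol 2.0" method="probed" conf="10" />
</port>"""


def service_record(port_elem):
    """Turn one <port> element into a flat dict suitable for a CSV or DB row."""
    service = port_elem.find("service")
    # A port may have no <service> child; missing attributes become None.
    attrs = service.attrib if service is not None else {}
    return {
        "port": int(port_elem.get("portid")),
        "protocol": port_elem.get("protocol"),
        "state": port_elem.find("state").get("state"),
        "name": attrs.get("name"),
        "product": attrs.get("product"),
        "version": attrs.get("version"),
        "extrainfo": attrs.get("extrainfo"),
    }


record = service_record(ET.fromstring(SAMPLE))
print(record["product"], record["version"], record["extrainfo"])
```

For a complete output file, the same function could be applied to each element returned by ET.parse(path).iter("port"), where path is the .xml file written by -oA. No string splitting is involved; product, version, and extra info arrive already separated.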


Another bonus of the XML format is when you need to log the command line used as well as the start and finish times for the scan run.

In the grepable output, you have to parse this out of the first and last comment lines in the output.

# nmap 3.75 scan initiated Wed Nov 10 19:11:16 2004 as: nmap -vv -sV -oA nmap_comparison 127.0.0.1
...
# Nmap run completed at Wed Nov 10 19:11:37 2004 -- 1 IP address (1 host up) scanned in 21.263 seconds
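For comparison, this is roughly what pulling those values out of the comment lines looks like in Python. It is only a sketch: the two regular expressions are written against the exact wording of the nmap 3.75 sample lines above, and I am assuming that wording stays the same, which may not hold for other versions. The names START_RE, END_RE, and parse_gnmap_comments are made up for this example.

```python
import re

# Patterns written against the two sample comment lines above (nmap 3.75).
# Other versions may word these lines differently.
START_RE = re.compile(
    r"^# nmap (?P<version>\S+) scan initiated (?P<started>.+?) as: (?P<args>.+)$"
)
END_RE = re.compile(
    r"^# Nmap run completed at (?P<finished>.+?) -- .* scanned in "
    r"(?P<seconds>[\d.]+) seconds$"
)


def parse_gnmap_comments(lines):
    """Collect version, command line, and timestamps from the comment lines."""
    info = {}
    for line in lines:
        for pattern in (START_RE, END_RE):
            match = pattern.match(line.strip())
            if match:
                info.update(match.groupdict())
    return info


sample = [
    "# nmap 3.75 scan initiated Wed Nov 10 19:11:16 2004 as: "
    "nmap -vv -sV -oA nmap_comparison 127.0.0.1",
    "# Nmap run completed at Wed Nov 10 19:11:37 2004 -- "
    "1 IP address (1 host up) scanned in 21.263 seconds",
]
info = parse_gnmap_comments(sample)
print(info["args"], info["started"], info["finished"], info["seconds"])
```

The timestamps come back as free-form date strings that still need a second parsing step, and the patterns break as soon as the comment wording changes.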

While this is again doable with sed/awk, I find it easier with an XML parser. You just ask for the element tags to get exactly the data you want (most current scripting languages come with very simple XML parsers).

<nmaprun scanner="nmap" args="nmap -vv -sV -oA nmap_comparison 127.0.0.1"
 start="1100139076" version="3.75" xmloutputversion="1.01">
...
<runstats>
  <finished time="1100139097" />
  <hosts up="1" down="0" total="1" />
</runstats></nmaprun>
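Reading the same information from the XML is a matter of asking for attributes. A minimal Python sketch follows, again with xml.etree.ElementTree. The sample is the excerpt above with the elided "..." lines left out, and the function name scan_metadata and its dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The excerpt above, minus the elided lines between the opening tag
# and <runstats>.
SAMPLE = """<nmaprun scanner="nmap" args="nmap -vv -sV -oA nmap_comparison 127.0.0.1"
 start="1100139076" version="3.75" xmloutputversion="1.01">
<runstats>
  <finished time="1100139097" />
  <hosts up="1" down="0" total="1" />
</runstats></nmaprun>"""


def scan_metadata(root):
    """Extract the command line, version, and timing from an <nmaprun> root."""
    hosts = root.find("runstats/hosts")
    start = int(root.get("start"))
    end = int(root.find("runstats/finished").get("time"))
    return {
        "args": root.get("args"),
        "version": root.get("version"),
        # start/time are Unix timestamps; convert to timezone-aware UTC.
        "started": datetime.fromtimestamp(start, tz=timezone.utc),
        "finished": datetime.fromtimestamp(end, tz=timezone.utc),
        "duration_s": end - start,
        "hosts_up": int(hosts.get("up")),
        "hosts_down": int(hosts.get("down")),
        "hosts_total": int(hosts.get("total")),
    }


meta = scan_metadata(ET.fromstring(SAMPLE))
print(meta["args"], meta["started"].isoformat(), meta["duration_s"])
```

The timestamps are already integers, so there is no date string format to guess at, and the whole-second duration (21) agrees with the 21.263 seconds reported in the grepable footer.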


I'm NOT saying this data can't be extracted with combinations of sed, awk, cut, grep, and friends; just about all of it can. It's just that, when it is all put together, it is often easier to just parse the XML.

Not to mention speed issues. (Focusing on just line-oriented data sets for the moment.) I've run across multiple instances where shell scripts using sed, awk, cut, and grep take over 10 minutes to process a block of data, while an equivalent Perl/Python/Ruby script does the exact same job in about 20 seconds -- basically just from the overhead of spawning new processes and piping data between sed, awk, cut, etc.

> So who needs XML?

I do.

> I don't consider nmap to be an end all be all to
> build a report from; it's just a middle step.


I completely agree. However, there is nothing wrong with making things easier for the downstream work as long as it doesn't mess up the tool. Why make 100 people implement almost the same thing downstream, if it is comparatively simple to add to the upstream data source?

> So I'm interested in
> the output not making a report.  And i can search through the -oN much
> quicker with awk than going through the XML any other way.  Maybe i've
> got a very limited view of nmap, but it has served me well for what
> i've been using it for.

---------------------------------------------------------------------
For help using this (nmap-dev) mailing list, send a blank email to nmap-dev-help () insecure org . List archive: http://seclists.org


