Nmap Development mailing list archives

Re: A week of NSE structured output updates - and how to help

From: Paulino Calderon <paulino () calderonpale com>
Date: Sat, 6 Sep 2014 13:14:15 -0500

Hey list,

I started looking into making the NSE vulns library support structured XML output. Have any of you started working on
it already?

Cheers.


On Sep 5, 2014, at 11:25 PM, Daniel Miller <bonsaiviking () gmail com> wrote:

Hi, List!

Back in Nmap 6.20BETA1, Nmap introduced a feature that I had worked long
and hard on: structured XML output for NSE scripts [1]. Scripts that use
this feature produce not only human-readable text output, but also
machine-parseable XML within Nmap's XML output. You can see what this looks
like in the @xmloutput section of the NSEdoc in scripts that support it.

I've tried to enforce structured output for new scripts that I help get
committed, but there were already over 400 scripts in Nmap at the time the
feature was added. That's a lot of changes! I've gone back and updated a
few in the past, but as of last week there were only 47 scripts with
@xmloutput sections, out of 484 total.

This week, I took a deep breath and started converting scripts. I've done
18, which seems piddly now but represents a good deal of work. I wanted to
write this message to point folks who want to start contributing to Nmap
toward a useful project that isn't too scary.

Here's the blow-by-blow:

smb-enum-shares (r33654) - When converting this script, I noticed that
information was being duplicated: Each share had the name of the current
user being reported when listing permissions. I moved this to its own key
at the top of the output. Similarly, when handling the condition for
NT_STATUS_OBJECT_NAME_NOT_FOUND, the string "<not a file share>" was
appended to the permissions. In this case, I chose to report this
information in the "Type" key. Besides this, the only other modification
was including some code from smb-security-mode to format the
domain\username when reporting which account was used to check permissions
(previously, domain was not reported).
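A minimal sketch of the restructuring described above, in plain Lua so it runs outside Nmap. The helper names (account_name, restructure), the key names, and the share data are hypothetical illustrations, not the actual smb-enum-shares or smb-security-mode code; a real script would build its tables with stdnse.output_table() to keep key order.

```lua
-- Hypothetical helper: join domain and user as domain\user when a domain
-- is known; otherwise report just the user.
local function account_name(domain, user)
  if domain and domain ~= "" then
    return domain .. "\\" .. user
  end
  return user
end

-- Report the account once at the top, then one entry per share keyed by
-- share name, instead of repeating the user inside every share.
local function restructure(domain, user, shares)
  local out = { account_used = account_name(domain, user) }
  for _, s in ipairs(shares) do
    if s.not_file_share then
      -- Record the condition in "Type" rather than appending it to permissions.
      out[s.name] = { Type = "Not a file share" }
    else
      out[s.name] = { Type = s.type, ["Current user access"] = s.access }
    end
  end
  return out
end

-- Illustrative input only.
local result = restructure("WORKGROUP", "guest", {
  { name = "IPC$", type = "STYPE_IPC_HIDDEN", access = "READ" },
  { name = "PRINTER", not_file_share = true },
})
print(result.account_used)  --> WORKGROUP\guest
```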

smb-enum-groups (r33653) - This script was an example of one whose original
text output is good and would take significant massaging to reproduce by
formatting a table. In this case, I preserved the original output and
returned it as the second return value. This let me use a more natural
tree-like table structure for the first return value without worrying about
formatting. One advantage of this is the ability to report more information
than would fit or feel natural in the text output, namely the list of SIDs
which are members of each group. I also took the opportunity to update the
@output section, since it was incorrectly missing the "(RID: 123)" portion
of the output.
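The two-value return can be sketched roughly as follows: the first value is a tree-like table for structured output (free to carry extra data such as member SIDs), and the second is the preformatted text for normal output. The group names, RIDs, SIDs, and the exact text format here are made up for illustration; action_sketch is a hypothetical name, not the script's real action function.

```lua
-- Illustrative group data; names, RIDs, and SIDs are invented.
local groups = {
  { name = "Administrators", rid = 544, members = { "Administrator" },
    sids = { "S-1-5-21-1000-2000-3000-500" } },
  { name = "Users", rid = 545, members = { "alice", "bob" },
    sids = { "S-1-5-21-1000-2000-3000-1001", "S-1-5-21-1000-2000-3000-1002" } },
}

-- Returns (structured_table, text). The table holds more than the text
-- shows (the SIDs); the text keeps the original human-readable layout.
local function action_sketch()
  local structured, lines = {}, {}
  for _, g in ipairs(groups) do
    structured[g.name] = { RID = g.rid, members = g.members, sids = g.sids }
    lines[#lines + 1] = string.format("%s (RID: %d): %s",
      g.name, g.rid, table.concat(g.members, ", "))
  end
  return structured, table.concat(lines, "\n")
end

local tree, text = action_sketch()
print(text)
-- Administrators (RID: 544): Administrator
-- Users (RID: 545): alice, bob
```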

dhcp-discover and broadcast-dhcp-discover (r33650) - These scripts were
examples of a very common theme: output consists of "key: value" pairs as
formatted strings, so we instead do output_table[key] = value, which
results in the same output. I repeatedly found myself using variations on
this command in vim:

:s/table.insert( *\(\w*\),[^"]*"\([^:]*\):[^"]*", *\([^)]*\)))/\1["\2"] = \3
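In Lua terms, the substitution turns code of the first form below into the second. The field names and values are illustrative; the sketch uses a plain table so it runs outside Nmap, whereas a real script would start from stdnse.output_table() so the keys print in insertion order.

```lua
-- Before: each field is formatted into a "Key: value" string and appended
-- to a list, so the XML output only sees opaque strings.
local before = {}
table.insert(before, string.format("Server Identifier: %s", "192.168.1.1"))
table.insert(before, string.format("Subnet Mask: %s", "255.255.255.0"))

-- After: the same fields stored as key/value pairs, which can be emitted
-- as keyed XML elements while still printing as "Key: value".
local after = {}
after["Server Identifier"] = "192.168.1.1"
after["Subnet Mask"] = "255.255.255.0"
```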

The other notable part here is the use of a __tostring metamethod to format
some results differently. Usually, a list-style table (with numeric
indices) will be formatted with one element per line. By setting the
__tostring metamethod, we override this behavior and format as a
single-line comma-separated list.
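A minimal sketch of the __tostring override on a list-style table, assuming a plain Lua list (not a stdnse.output_table(), whose metatable must not be replaced). The helper name comma_list and the addresses are hypothetical.

```lua
-- Attach a __tostring metamethod so a list renders as one comma-separated
-- line; the elements themselves stay a normal list for structured output.
local function comma_list(t)
  return setmetatable(t, {
    __tostring = function(self)
      return table.concat(self, ", ")
    end,
  })
end

-- Illustrative values.
local dns_servers = comma_list({ "192.168.1.1", "8.8.8.8" })
print(tostring(dns_servers))  --> 192.168.1.1, 8.8.8.8
print(dns_servers[2])         --> 8.8.8.8
```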

nat-pmp-info, sip-methods (r33646) - These two were one-line changes,
swapping a string output for its equivalent dict- or list-style table.
Instead of resorting to the metamethod approach to get a comma-separated
list for sip-methods, I left the string output as-is and split it on commas
to get the structured output.
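The sip-methods approach (keep the string, derive a list from it) might look roughly like this in plain Lua. The header value is hypothetical, and split_commas is an illustrative helper rather than an NSE library function.

```lua
-- Split a comma-separated string into a list, trimming surrounding spaces.
local function split_commas(s)
  local items = {}
  for item in s:gmatch("[^,]+") do
    items[#items + 1] = item:match("^%s*(.-)%s*$")
  end
  return items
end

-- Hypothetical Allow-header value as received from a server.
local allow = "INVITE, ACK, CANCEL, OPTIONS, BYE"
local methods = split_commas(allow)

-- One plausible shape for the action's return: the list for structured
-- output and the untouched string for normal output, i.e.
--   return methods, allow
```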

smb-security-mode (r33646) - This was pretty straightforward until I
realized that some of the "(dangerous)" warnings didn't really fit with
machine-parseable output. I hacked together a new formatting method to
annotate certain keys with extra information, then called it from the
__tostring metamethod, which was a closure over the list of annotations.
This also tripped me up because I forgot that stdnse.output_table() uses
metamethods to do its magic, so simply using setmetatable on an
output_table will destroy that magic. Instead, you must grab the table with
getmetatable, add the __tostring key, and then setmetatable again.
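The pitfall and the fix can be sketched in plain Lua with a hypothetical stand-in for stdnse.output_table() that keeps insertion order through __index/__newindex. The stand-in, the __keys field, and annotated_tostring are assumptions for illustration; the real library's internals may differ. The annotation closure mirrors the "(dangerous)" idea from the text.

```lua
-- Hypothetical stand-in for stdnse.output_table(): values live in a hidden
-- store and key order is tracked by metamethods.
local function ordered_table()
  local keys, store = {}, {}
  return setmetatable({}, {
    __index = store,
    __newindex = function(_, k, v)
      if store[k] == nil then keys[#keys + 1] = k end
      store[k] = v
    end,
    __keys = keys, -- custom field so the formatter can walk keys in order
  })
end

-- Returns a __tostring function closing over a table of annotations.
local function annotated_tostring(notes)
  return function(self)
    local lines = {}
    for _, k in ipairs(getmetatable(self).__keys) do
      local line = k .. ": " .. tostring(self[k])
      if notes[k] then line = line .. " (" .. notes[k] .. ")" end
      lines[#lines + 1] = line
    end
    return table.concat(lines, "\n")
  end
end

-- Pitfall: replacing the metatable discards __index, so values vanish.
local broken = ordered_table()
broken.account_used = "guest"
setmetatable(broken, { __tostring = function() return "?" end })
print(broken.account_used)  --> nil

-- Fix: grab the existing metatable, add __tostring, and set it again.
-- (mt is the same table object, so the final setmetatable is harmless.)
local out = ordered_table()
out.account_used = "guest"
out.message_signing = "disabled"
local mt = getmetatable(out)
mt.__tostring = annotated_tostring({ message_signing = "dangerous, but default" })
setmetatable(out, mt)
print(tostring(out))
-- account_used: guest
-- message_signing: disabled (dangerous, but default)
```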

hadoop-namenode-info and hadoop-tasktracker-info (r33645) - Sometimes,
scripts want to embed a table of data within a tree structure. This is
tricky because (for now) tab.lua "tables" do not produce useful structured
output, and their string format can't be indented to match the rest of the
tree. For hadoop-namenode-info, I did a similar trick to the 2-value return
that NSE allows, but for just one portion of the output. I created a
tab.lua table with two empty columns (producing one 2-space indent each)
and dumped its string output into a variable. Then, I put the data into a
dict-style table and set its __tostring metamethod to return the tabular
output. Then this table was inserted into the rest of the output table. I'm
pretty sure I have a branch somewhere that contains initial work on making
tab.lua produce nice structured output, but I'll have to dig it up.
hadoop-tasktracker-info was unremarkable.
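The embedded-table trick can be sketched as follows. The tab.lua rendering is only simulated with string.format (two 2-space indents, then the columns), and the key name Datanodes and the rows are invented for illustration; they are not taken from hadoop-namenode-info.

```lua
-- Illustrative rows.
local rows = {
  { "datanode1:50010", "Live" },
  { "datanode2:50010", "Dead" },
}

-- Simulated tabular text: two leading 2-space indents so it lines up
-- under its parent key in the tree.
local lines = {}
for _, r in ipairs(rows) do
  lines[#lines + 1] = string.format("    %-16s %s", r[1], r[2])
end
local rendered = table.concat(lines, "\n")

-- Dict-style table for structured output whose text form is the
-- pre-rendered tabular output.
local datanodes = setmetatable({}, {
  __tostring = function() return rendered end,
})
for _, r in ipairs(rows) do datanodes[r[1]] = r[2] end

-- Inserted into the rest of the (hypothetical) output tree.
local output = { Datanodes = datanodes }
```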

ms-sql-info (r33644) - This script was pretty straightforward, but
illustrates well how to modify a script that currently uses
stdnse.format_output() to use structured output instead. In addition to the
key-value changes mentioned with dhcp-discover above, the "name" index of
each table is intended to be a label, so it usually needs to be set as the
key in the upper-level table instead. So this:

{
 {
   name = "Thing 1",
   "data"
 },
 {
   name = "Thing 2",
   "stuff",
   "nonsense"
 }
}

becomes this:

{
 ["Thing 1"] = {
   "data"
 },
 ["Thing 2"] = {
   "stuff",
   "nonsense"
 }
}
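To make the mapping concrete, here is a small hypothetical helper that performs the same rewrite at runtime. In a real conversion you would usually build the keyed table directly (with stdnse.output_table() to keep order); this sketch handles only one level, and nested labeled tables would need recursion.

```lua
-- Move each sub-table's "name" label up to be its key in the parent table.
local function name_to_key(list)
  local out = {}
  for _, item in ipairs(list) do
    local entry = {}
    for i, v in ipairs(item) do
      entry[i] = v -- copy positional values; "name" is not copied
    end
    out[item.name] = entry
  end
  return out
end

local before = {
  { name = "Thing 1", "data" },
  { name = "Thing 2", "stuff", "nonsense" },
}
local after = name_to_key(before)
print(after["Thing 2"][2])  --> nonsense
```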

Also, converting booleans to "Yes" and "No" is unnecessary, since they will
be stringified as "true" and "false".

hadoop-jobtracker-info and hbase-master-info (r33643 and r33642) - These
look like big changes, but they are mostly whitespace. I chose to alter the
control flow a bit to avoid excessive indentation. Most of the time, I try
to avoid non-output-related changes, but this one was rather minor. The
rest of the conversion was straightforward.

epmd-info (r33641) - Adding structured output here meant parsing it out of
strings that were previously dumped to output straight from the packet.
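Parsing structured data out of such strings might look like this, assuming each line of the reply has the form "name <node> at port <port>" (an assumption about the reply format; the node names and ports are illustrative).

```lua
-- Parse "name <node> at port <port>" lines into a node -> port table.
local function parse_names(reply)
  local nodes = {}
  for name, port in reply:gmatch("name (%S+) at port (%d+)") do
    nodes[name] = tonumber(port)
  end
  return nodes
end

-- Illustrative reply text.
local reply = "name rabbit at port 25672\nname ejabberd at port 41234\n"
local nodes = parse_names(reply)
print(nodes.rabbit)  --> 25672
```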

snmp-win32-* (r33640) - Mostly uninteresting, but snmp-win32-software has a
neat trick using the __index metamethod to translate the index names that
stdnse.format_timestamp expects to find into the numerical indices that
were provided. Part of converting to structured output is normalizing data
formats; stdnse.format_timestamp and stdnse.format_time are two useful
functions for this.
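The __index translation trick can be sketched as below, assuming the consumer looks up os.time-style field names (year, month, day, hour, min, sec) while the values arrive in numeric positions. FIELDS and as_date are hypothetical names; a real script would pass the table to stdnse.format_timestamp, while string.format is used here so the sketch runs outside Nmap.

```lua
-- Map field names onto the numeric positions the values were provided in.
local FIELDS = { year = 1, month = 2, day = 3, hour = 4, min = 5, sec = 6 }

local function as_date(parts)
  return setmetatable(parts, {
    -- Called only for missing keys such as "year"; numeric keys are raw.
    __index = function(t, k)
      local i = FIELDS[k]
      if i then return rawget(t, i) end
    end,
  })
end

-- Illustrative values, e.g. decoded from a date field.
local installed = as_date({ 2014, 9, 5, 23, 25, 0 })

local stamp = string.format("%04d-%02d-%02dT%02d:%02d:%02d",
  installed.year, installed.month, installed.day,
  installed.hour, installed.min, installed.sec)
print(stamp)  --> 2014-09-05T23:25:00
```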

netbus-info (r33639) - This one had a lot to change, but none of the
changes really involved any difficult thinking. When I began this project,
I started by only working on "default" category scripts that already used
stdnse.format_output. These represent a good intersection between
commonly-used scripts and those that are easy to convert.

Well, I hope this hasn't been too much of a tl;dr. Please, consider taking
the time to convert a script today!

Dan

[1] http://nmap.org/book/nse-api.html#nse-structured-output
_______________________________________________
Sent through the dev mailing list
http://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/


Current thread:

A week of NSE structured output updates - and how to help Daniel Miller (Sep 05)
- Re: A week of NSE structured output updates - and how to help Paulino Calderon (Sep 06)
  - Re: A week of NSE structured output updates - and how to help Daniel Miller (Sep 07)