Nmap Development mailing list archives
Re: A week of NSE structured output updates - and how to help
From: Paulino Calderon <paulino () calderonpale com>
Date: Sat, 6 Sep 2014 13:14:15 -0500
Hey list,

I started looking into making the NSE library vulns support structured XML output. Have any of you started working on it already?

Cheers.

On Sep 5, 2014, at 11:25 PM, Daniel Miller <bonsaiviking () gmail com> wrote:
Hi, List!

Back in Nmap 6.20BETA1, Nmap introduced a feature that I had worked long and hard on: structured XML output for NSE scripts [1]. Scripts that use this feature produce not only human-readable text output, but also machine-parseable XML within Nmap's XML output. You can see what this looks like in the @xmloutput section of the NSEdoc in scripts that support it.

I've tried to enforce structured output for new scripts that I help get committed, but there were already over 400 scripts in Nmap at the time the feature was added. That's a lot of changes! I've gone back and updated a few in the past, but as of last week there were only 47 scripts with @xmloutput sections, out of 484 total.

This week, I took a deep breath and started converting scripts. I've done 18, which seems piddly now, but represents a good deal of work. I wanted to write this message to encourage folks who want to start contributing to Nmap with a useful project that isn't too scary. Here's the blow-by-blow:

smb-enum-shares (r33654) - When converting this script, I noticed that information was being duplicated: each share had the name of the current user being reported when listing permissions. I moved this to its own key at the top of the output. Similarly, when handling the condition for NT_STATUS_OBJECT_NAME_NOT_FOUND, the string "<not a file share>" was appended to the permissions. In this case, I chose to report this information in the "Type" key. Besides this, the only other modification was including some code from smb-security-mode to format the domain\username when reporting which account was used to check permissions (previously, domain was not reported).

smb-enum-groups (r33653) - This script was an example of one whose original output is good, and would take significant massaging to create by formatting a table. In this case, I preserved the original output and returned it as the second return value.
This let me use a more natural tree-like table structure for the first return value without worrying about formatting. One advantage of this is the ability to report more information than would fit or feel natural in the text output, namely the list of SIDs which are members of each group. I also took the opportunity to update the @output section, since it was incorrectly missing the "(RID: 123)" portion of the output.

dhcp-discover and broadcast-dhcp-discover (r33650) - These scripts were examples of a very common theme: output consists of "key: value" pairs as formatted strings, so we instead do output_table[key] = value, which results in the same output. I repeatedly found myself using variations on this command in vim:

  :s/table.insert( *\(\w*\),[^"]*"\([^:]*\):[^"]*", *\([^)]*\)))/\1["\2"] = \3

The other notable part here is the use of a __tostring metamethod to format some results differently. Usually, a list-style table (with numeric indices) will be formatted with one element per line. By setting the __tostring metamethod, we override this behavior and format as a single-line comma-separated list.

nat-pmp-info, sip-methods (r33646) - These two were one-line changes, swapping a string output for its equivalent dict- or list-style table. Instead of resorting to the metamethod approach to get a comma-separated list for sip-methods, I left the string output as-is and split it on commas to get the structured output.

smb-security-mode (r33646) - This was pretty straightforward until I realized that some of the "(dangerous)" warnings didn't really fit with machine-parseable output. I hacked together a new formatting method to annotate certain keys with extra information, then called it from the __tostring metamethod, which was a closure over the list of annotations. This also tripped me up because I forgot that stdnse.output_table() uses metamethods to do its magic, so simply using setmetatable on an output_table will destroy that magic.
Instead, you must grab the table with getmetatable, add the __tostring key, and then setmetatable again.

hadoop-namenode-info and hadoop-tasktracker-info (r33645) - Sometimes, scripts want to embed a table of data within a tree structure. This is tricky because (for now) tab.lua "tables" do not produce useful structured output, and their string format can't be indented to match the rest of the tree. For hadoop-namenode-info, I did a similar trick to the 2-value return that NSE allows, but for just one portion of the output. I created a tab.lua table with two empty columns (producing one 2-space indent each) and dumped its string output into a variable. Then, I put the data into a dict-style table and set its __tostring metamethod to return the tabular output. Then this table was inserted into the rest of the output table. I'm pretty sure I have a branch somewhere that contains initial work on making tab.lua produce nice structured output, but I'll have to dig it up. hadoop-tasktracker-info was unremarkable.

ms-sql-info (r33644) - This script was pretty straightforward, but illustrates well how to modify a script that currently uses stdnse.format_output() to use structured output instead. In addition to the key-value changes mentioned with dhcp-discover above, the "name" index of each table is intended to be a label, so it usually needs to be set as the key in the upper-level table instead. So this:

  {
    { name = "Thing 1", "data" },
    { name = "Thing 2", "stuff", "nonsense" },
  }

becomes this:

  {
    ["Thing 1"] = { "data" },
    ["Thing 2"] = { "stuff", "nonsense" },
  }

Also, converting booleans to "Yes" and "No" is unnecessary, since they will be stringified as "true" and "false".

hadoop-jobtracker-info and hbase-master-info (r33643 and r33642) - These look like big changes, but they are mostly whitespace. I chose to alter the control flow a bit to avoid excessive indentation. Most of the time, I try to avoid non-output-related changes, but this one was rather minor.
The rest of the conversion was straightforward.

epmd-info (r33641) - Adding structured output here meant parsing it out of strings that were previously dumped to output straight from the packet.

snmp-win32-* (r33640) - Mostly uninteresting, but snmp-win32-software has a neat trick using the __index metamethod to translate the index names that stdnse.format_timestamp expects to find into the numerical indices that were provided. Part of converting to structured output is normalizing data formats; stdnse.format_timestamp and stdnse.format_time are two useful functions for this.

netbus-info (r33639) - This one had a lot to change, but none of the changes really involved any difficult thinking.

When I began this project, I started by only working on "default" category scripts that already used stdnse.format_output. These represent a good intersection between commonly-used scripts and those that are easy to convert.

Well, I hope this hasn't been too much of a tl;dr. Please, consider taking the time to convert a script today!

Dan

[1] http://nmap.org/book/nse-api.html#nse-structured-output
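Three of the techniques described in the message above (a __tostring metamethod for a comma-separated list, adding __tostring to a table that already has a metatable, and turning "name" labels into keys) might look roughly like the following in plain Lua. This is only a sketch that runs outside Nmap: ordered_table is a hypothetical stand-in for a metatable-backed table such as the one stdnse.output_table() returns, not its real implementation, and comma_list, add_tostring, names_to_keys, and the sample data are illustrative names that do not come from any NSE library or from the commits mentioned.

```lua
-- Plain Lua only: nothing below calls into Nmap or the stdnse library.

-- 1. A list-style table that stringifies as one comma-separated line
--    instead of one element per line.
local function comma_list(items)
  return setmetatable(items, {
    __tostring = function(t) return table.concat(t, ", ") end,
  })
end

-- 2. Hypothetical stand-in for a table that already relies on a metatable
--    (here, __newindex records the order in which keys are added).
local function ordered_table()
  local order = {}
  return setmetatable({}, {
    __newindex = function(t, k, v)
      order[#order + 1] = k
      rawset(t, k, v)
    end,
    __call = function() return order end, -- calling the table lists its keys
  })
end

-- Extend the existing metatable rather than replacing it, so the
-- behavior it already provides keeps working.
local function add_tostring(t, fn)
  local mt = getmetatable(t) or {}
  mt.__tostring = fn
  return setmetatable(t, mt)
end

-- 3. Turn format_output-style entries ({ name = "label", ... }) into a
--    table keyed by label.
local function names_to_keys(list)
  local out = {}
  for _, entry in ipairs(list) do
    local values = {}
    for i, v in ipairs(entry) do values[i] = v end
    out[entry.name] = values
  end
  return out
end

-- Usage
print(tostring(comma_list({ "INVITE", "ACK", "BYE" }))) --> INVITE, ACK, BYE

local annotations = { message_signing = "dangerous" } -- made-up sample data
local mode = ordered_table()
mode.account_used = "guest"
mode.message_signing = "disabled"
add_tostring(mode, function(self)
  local lines = {}
  for _, key in ipairs(self()) do
    local note = annotations[key] and (" (" .. annotations[key] .. ")") or ""
    lines[#lines + 1] = key .. ": " .. tostring(self[key]) .. note
  end
  return table.concat(lines, "\n")
end)
print(tostring(mode))

local converted = names_to_keys({
  { name = "Thing 1", "data" },
  { name = "Thing 2", "stuff", "nonsense" },
})
print(converted["Thing 2"][2]) --> nonsense
```

Because add_tostring reuses the metatable it finds, keys assigned after the call are still recorded in order; calling setmetatable with a fresh table at that point would silently drop the __newindex and __call handlers.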
_______________________________________________ Sent through the dev mailing list http://nmap.org/mailman/listinfo/dev Archived at http://seclists.org/nmap-dev/
Current thread:
- A week of NSE structured output updates - and how to help Daniel Miller (Sep 05)
- Re: A week of NSE structured output updates - and how to help Paulino Calderon (Sep 06)
- Re: A week of NSE structured output updates - and how to help Daniel Miller (Sep 07)
- Re: A week of NSE structured output updates - and how to help Paulino Calderon (Sep 06)