Nmap Development mailing list archives

Re: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)


From: Daniel Miller <bonsaiviking () gmail com>
Date: Fri, 29 Jun 2012 17:00:40 -0500


Rob,

Thanks so much for looking at the proposals and giving feedback. I'll try to address some of your concerns inline below, and I encourage you to take a look at my current implementation on GitHub (https://github.com/bonsaiviking/Nmap-script-XML), which may clear up some of the more confusing points.

On 06/29/2012 04:04 PM, Rob Nicholls wrote:
I spotted proposal gamma on the wiki yesterday; there seem to be some
formatting issues on the wiki that I haven't looked into too deeply (table
tags aren't encoded and it's outside of the pre tags?). It's certainly
cleaner, but it looks like some structure/information might have been lost
(e.g. the "subject" and "issuer" bit is lost in the example output for
ssl-cert), so I prefer proposal beta.

I fixed these formatting issues just now. The block had been using the leading-spaces method of avoiding wiki formatting, but the <table> tags were throwing it off. <pre> tags to the rescue!

I'm slowly warming to proposal beta. I dislike seeing so many dict and elem
tags (as it makes it harder to read, and would make XPath queries slightly
longer), but having thought about it I suspect they're required if (and that
could be a very big "if") we're trying to keep things generic. I presume one
good reason for using dict, elem etc. is to keep the output simple to avoid
having to update the DTD file (https://svn.nmap.org/nmap/docs/nmap.dtd)
every time there's a new script that adds something new (e.g. if ssl-cert
created something like <subject><commonName>
secwiki.org</commonName></subject> instead of those dict and elem keys). The
alternative would be to allow anything within the script tags without
strictly defining it in the DTD, but I suspect that'd be a very bad idea and
would make it very difficult to impose stricter definitions in the future.

This was debated earlier (http://seclists.org/nmap-dev/2011/q2/149), and the result at the time was the "XML elements representing YAML structure" proposal on the wiki page. In short, we are not willing to sacrifice DTD-validity for this kind of output.

Is the intention that all scripts get their output automatically converted
(consistently) to XML, or will scripts need special treatment (you mention
the XML structure could be "opt-in", and I think that's what's been coded so
far)? I've spotted that this is one of the outstanding questions listed on
the wiki. My vote would probably be for a single representation, which
automatically generates XML for all scripts (in a consistent manner, so we
don't have to worry too much about making it "backwards-compatible", and
preventing any opt-in or opt-out problems),

I think this is what I'm going for right now. The "single representation" is an arbitrary Lua table, and the conversion to XML happens recursively in C++ code in nse_main.cc. The "backwards-compatibility"/"opt-in" part comes from the current API of returning a string instead of a table. Scripts that do this are considered "old-style" and do not have any structured/XML output.
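
A rough sketch of that recursive conversion, for readers who want something concrete. It is written in Python rather than the C++ in nse_main.cc, a nested dict stands in for the Lua table, and the names append_value and script_to_xml are made up for illustration. The dict/elem element names follow proposal beta; this is not the code from the patch.

```python
import xml.etree.ElementTree as ET


def append_value(parent, value, key=None):
    """Recursively append `value` under `parent` as <dict>/<elem> nodes."""
    attrs = {} if key is None else {"key": str(key)}
    if isinstance(value, dict):
        # A nested table becomes a <dict> whose children are converted in turn.
        node = ET.SubElement(parent, "dict", attrs)
        for child_key, child_value in value.items():
            append_value(node, child_value, child_key)
    else:
        # Anything else is treated as a scalar and stored as text.
        ET.SubElement(parent, "elem", attrs).text = str(value)


def script_to_xml(script_id, result):
    """Build a <script> element from a script's return value."""
    script = ET.Element("script", {"id": script_id})
    if isinstance(result, str):
        # "Old-style" script: a plain string, so no structured children.
        script.set("output", result)
    else:
        # Table return: the text for @output would come from the table's
        # formatter, which is left out of this sketch.
        for key, value in result.items():
            append_value(script, value, key)
    return script


element = script_to_xml("ssl-cert", {"subject": {"commonName": "secwiki.org"}})
print(ET.tostring(element, encoding="unicode"))
```

A string return goes through the same function but produces only the output attribute, which is the "old-style" case described above.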

  and that XML and normal output
contains exactly the same information (normal output is stored as a value in
the script's existing "output" attribute, and is identically/additionally
stored in a structured format). I know Nmap contains some additional
information in the XML file that's not displayed on screen, but that's
generally an exception rather than the norm.

The discussion on IRC was leaning towards author-definable tostring methods (or at least a choice of several: indented, tabular, inline, etc.), and this is what I have implemented. This means it is up to the script author to hide any extraneous information, but there is no way (other than deleting it from the return table) to prevent extra information from getting into the XML output. For example, the tab.lua library already adds a __tostring metamethod to its tables, so these could be returned from a script and result in the exact same output. However, tab-library tables have an extra data field, current_row, which is not used for text output but nevertheless ends up in the XML output.
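
The leak can be illustrated with a small Python analogy. It assumes a dict subclass whose __str__ plays the role of the __tostring metamethod; TabTable, xml_keys, and the "rows" field are hypothetical names, not the real layout of a tab.lua table.

```python
class TabTable(dict):
    """Hypothetical stand-in for a tab-library table: data plus bookkeeping."""

    def __str__(self):
        # Text output only looks at the rows; current_row is never printed.
        return "\n".join("  ".join(row) for row in self["rows"])


def xml_keys(table):
    """Keys a generic structured serializer would emit: every key it finds."""
    return sorted(table.keys())


table = TabTable(rows=[["PORT", "STATE"], ["22", "open"]], current_row=2)
print(str(table))       # text form, without the bookkeeping field
print(xml_keys(table))  # structured form still sees current_row

# The only remedy described above: delete the field before returning.
del table["current_row"]
print(xml_keys(table))
```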

I think I'd prefer if everything was converted into the structured XML
format without having to opt-in (I might regret that once I see the output
from some scripts). The ssl-cert output shown at the wiki seems to magically
convert the string "Not valid before" to "notBefore" (I haven't looked at
the code yet to see how this is done, but I assume something's hardcoded in
an updated ssl-cert script). I presume that means other scripts (e.g.
smb-os-discovery) wouldn't automatically produce nice structured XML output
using the current code until someone adds the same sort of opt-in
information (e.g. "Computer name" to "computerName")? Is it possible to
automatically create the keys (e.g. camelCase)?

That particular example was a mock-up, and not actually output by any existing patch. David favors the camelCase key format, or at least one that results in a valid identifier in Lua. I don't think it's much of an issue either way, since nearly *anything* (any value other than nil or NaN) can be used as a table key in Lua: {["Like this"]="for example"}
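
On generating keys automatically: a mechanical conversion is easy to sketch, though it cannot reproduce hand-picked names. The Python helper below, to_camel_case, is hypothetical and only shows the idea. It turns "Computer name" into "computerName", but "Not valid before" becomes "notValidBefore" rather than the mock-up's "notBefore".

```python
import re


def to_camel_case(label):
    """Turn a display label such as "Computer name" into "computerName"."""
    # Keep runs of letters and digits; spaces and punctuation act as separators.
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[:1].upper() + w[1:].lower() for w in rest)


print(to_camel_case("Computer name"))
print(to_camel_case("Not valid before"))
```

A label that starts with a digit would still not give a valid Lua identifier, so a real implementation would need a rule for that case.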

The only time I'd consider having different output in the XML file is if it
holds additional information that is known but isn't displayed on screen
(e.g. the TTL is in the XML output for a port, but not normal output); but
scripts currently tailor what's returned based on verbosity settings, and
anything known locally by the script that's not returned is presumably lost
forever, so we'd probably need to reconsider how NSE produces output (and
rewrite a lot of scripts). It might be possible to do that while the
number of scripts is still low enough that it's not impractical.

We'll be rewriting a lot of scripts as it is. The most common structure of output is the stdnse.format_output style table, and it doesn't have as much structure (key-value, for instance) as we would like.

If we do that, I'd also like to see a nice/consistent way of reporting
errors. Most scripts return something like "ERROR: Something bad happened."
as part of the normal output, but if we had a specific way of returning an
error message as an error then we could also store that information within
an error tag in the structured XML output, making it easy to count or
identify when an error occurred. It might also be possible to modify Nmap to
only display errors in the normal output when the verbosity is raised? I
believe smtp-brute (and maybe some other scripts) will return an error
message if it can't connect no matter what the verbosity is, so users will
always see the error message. This might save developers from having to
check the verbosity before deciding if an error message should be returned
as part of the normal output.

In a previous patch attempt (proposal alpha), I had handled this by considering table index 0 to be an error message (since Lua tables are 1-indexed). Something like this could be implemented easily enough. Making it a standard in the Lua table representation would mean that the "standard" tostring formatters could handle verbosity checking without making the script author worry about it. This is definitely one of the things we'll have to figure out sooner rather than later.
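
A minimal sketch of the index-0 convention, assuming a Python dict in place of the Lua table. The function names format_result and split_error, the "host" key, and the verbosity threshold are all hypothetical. The point is only that a shared formatter can do the verbosity check while the structured side always keeps the error.

```python
def format_result(table, verbosity=0):
    """Render a result table as text, showing the index-0 error only when verbose."""
    lines = [f"{key}: {value}" for key, value in table.items() if key != 0]
    error = table.get(0)
    if error is not None and verbosity > 0:
        lines.append(f"ERROR: {error}")
    return "\n".join(lines)


def split_error(table):
    """Separate the index-0 error from the data for structured output."""
    data = {key: value for key, value in table.items() if key != 0}
    return data, table.get(0)


result = {0: "Failed to connect to SMTP server.", "host": "example.com"}
print(format_result(result))               # error hidden at default verbosity
print(format_result(result, verbosity=1))  # error shown when verbose
print(split_error(result))                 # error always available for XML
```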

Although this could delay the whole structured XML output, is it worth
creating a better API for returning script output that helps with creating
structured XML? I'd like to see a script return something like:

  - Output (normal output, what's displayed is based on verbosity)
  - Details (all of the information that the script can determine, no matter
what verbosity the user selected)
  - Errors (this will probably be blank most of the time)

With my current attempt at implementation, everything is returned in one value, with the __tostring metamethod covering the normal output case. As you pointed out earlier, we could use a better way of displaying errors. So to put it in a similar way, scripts can return:

- Data (corresponding to Details in your description, which gets translated to XML in a standardized fashion)
- Formatting instructions (as the __tostring metamethod of the value being returned)

or

- Output (a string, which is used as normal output and the @output attribute of the <script> tag)

The Output section could then include whatever's currently generated by the
scripts (they may require a small tweak), and any scripts that return error
messages could be modified to return the error in the Errors section. The
Output section would be the normal/usual block of text that we see (for
scripts using the vulnerability library, possibly without vuln.extra_info,
unless verbosity is raised?). Details would be all of the information known
(e.g. vuln.extra_info) and returned by the script, converted to a
nice structure (e.g. proposal beta). For a script using the vulnerability
library, we presumably might see something like (apologies if I've made any
mistakes, I've done this by hand):

<script id="something-vuln-cve2012-nnnn" output=" VULNERABLE:
   Authentication bypass in something... (etc.)">
   <dict key="details">
     <elem key="title">Title</elem>
     <elem key="state">VULNERABLE</elem>
     <elem key="description">Some big long description</elem>
     <dict key="IDS">
       <elem key="CVE">CVE-2012-nnn1</elem>
       <elem key="CVE">CVE-2012-nnn2</elem>
     </dict>
     <dict key="dates">
       <dict key="disclosure">
         <elem key="year">2012</elem>
         <elem key="month">nn</elem>
         <elem key="day">n</elem>
       </dict>
     </dict>
     <dict key="references">
       <elem key="reference">http://example.com</elem>
       <elem key="reference">http://anotherexample.com</elem>
     </dict>
     <dict key="extra_info">
       <!-- proposal beta/gamma magic goes here as appropriate -->
     </dict>
   </dict>
</script>

Other scripts might do something like (then we can more easily determine
which scripts have errors and which scripts return useful results):

<script id="an-error" output="">
   <dict key="errors">
     <elem key="error">ERROR: Failed to connect to SMTP server.</elem>
   </dict>
</script>
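
With errors stored this way, finding the scripts that failed becomes a short query for a consumer of the XML. The Python sketch below assumes the hypothetical dict key="errors" / elem key="error" layout from the example above. The <ports> wrapper, the second script, and the function name scripts_with_errors are invented for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<ports>
  <script id="an-error" output="">
    <dict key="errors">
      <elem key="error">ERROR: Failed to connect to SMTP server.</elem>
    </dict>
  </script>
  <script id="no-error" output="some text">
    <dict key="details">
      <elem key="state">VULNERABLE</elem>
    </dict>
  </script>
</ports>
"""


def scripts_with_errors(xml_text):
    """Map each script id to its error messages, skipping scripts without any."""
    root = ET.fromstring(xml_text.strip())
    failed = {}
    for script in root.iter("script"):
        found = script.findall("dict[@key='errors']/elem[@key='error']")
        if found:
            failed[script.get("id")] = [elem.text for elem in found]
    return failed


print(scripts_with_errors(SAMPLE))
```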

Obviously, if we go with proposal gamma, we'd have table instead of dict
tags in my examples above. I'd started thinking about this before I posted
proposal gamma or caught up with all of the emails on the list. Apologies if
I've covered something that has already been discussed in the other emails.
I've literally just spotted in a much earlier email that you suggested a
"WARNINGS" (similar to my Errors) section that's only displayed if debugging
is enabled, so it sounds like we're both coming to similar conclusions.

Warnings could be handled in a similar way to errors, once we figure that out. A quick grep through the scripts shows that not many use the WARNING feature of stdnse.format_output, so we could discuss whether that feature is necessary.

Sorry for the lengthy email, I hope that was useful. You did (foolishly) ask
for feedback! :)

Rob



I'm very glad to hear input from anyone and everyone. This is a rather large change, so it's bound to affect many people. I would like to hear more from potential consumers of the structured data, too.

Dan

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/

