Nmap Development mailing list archives

RE: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)

From: "Rob Nicholls" <robert () robnicholls co uk>
Date: Fri, 29 Jun 2012 22:04:23 +0100

-----Original Message-----
From: nmap-dev-bounces () insecure org [mailto:nmap-dev-bounces () insecure org] On Behalf Of Daniel Miller
Sent: 14 June 2012 12:43
To: Daniel Miller; Nmap Dev
Subject: Re: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)

<snip>

I'll try to use the wiki page to expand on these ideas, but I may not get to
much of it today. This has been a long time coming, so I feel it's more
important to get it right than to rush into it. To the nmap-dev list at large:
Please join the discussion if you have opinions or suggestions. The outcome
will affect everyone, and I know there are smarter people than me reading
this.


Hi Dan, your recent Tweet prompted me to finish my draft email with my
thoughts!

If I'm completely honest, I'm still not entirely happy with the proposals
listed so far on the wiki:

https://secwiki.org/w/Nmap/Structured_Script_Output

I spotted proposal gamma on the wiki yesterday. There seem to be some
formatting issues on the wiki that I haven't looked into too deeply (table
tags aren't encoded and it's outside of the pre tags?). It's certainly
cleaner, but it looks like some structure/information might have been lost
(e.g. the "subject" and "issuer" bit is lost in the example output for
ssl-cert), so I prefer proposal beta.

I'm slowly warming to proposal beta. I dislike seeing so many dict and elem
tags (as it makes it harder to read, and would make XPath queries slightly
longer), but having thought about it I suspect they're required if (and that
could be a very big "if") we're trying to keep things generic. I presume one
good reason for using dict, elem etc. is to keep the output simple to avoid
having to update the DTD file (https://svn.nmap.org/nmap/docs/nmap.dtd)
every time there's a new script that adds something new (e.g. if ssl-cert
created something like <subject><commonName>
secwiki.org</commonName></subject> instead of those dict and elem keys). The
alternative would be to allow anything within the script tags without
strictly defining it in the DTD, but I suspect that'd be a very bad idea and
would be very difficult to impose stricter definitions in the future.
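
To make the XPath trade-off concrete, here is a rough Python sketch using the
standard library's xml.etree.ElementTree. Both XML fragments are hand-written
assumptions modelled on the ssl-cert example (the exact layout of proposal beta
on the wiki may differ), and the script-specific variant is the hypothetical
form that would need new DTD entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical generic form (proposal beta style): only dict/elem tags.
GENERIC = """
<script id="ssl-cert">
  <dict key="subject">
    <elem key="commonName">secwiki.org</elem>
  </dict>
</script>
"""

# Hypothetical script-specific form: new tag names per script,
# each of which would have to be declared in nmap.dtd.
SPECIFIC = """
<script id="ssl-cert">
  <subject><commonName>secwiki.org</commonName></subject>
</script>
"""

generic_root = ET.fromstring(GENERIC)
specific_root = ET.fromstring(SPECIFIC)

# Generic form: longer query, but the tag vocabulary never changes.
generic_cn = generic_root.find("dict[@key='subject']/elem[@key='commonName']").text

# Specific form: shorter query, but tied to script-specific tags.
specific_cn = specific_root.find("subject/commonName").text

print(generic_cn, specific_cn)
```

Both queries return the same value; the difference is only in query length and
in whether the DTD has to know about each script's tags.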

Is the intention that all scripts get their output automatically converted
(consistently) to XML, or will scripts need special treatment (you mention
the XML structure could be "opt-in", and I think that's what's been coded so
far)? I've spotted that this is one of the outstanding questions listed on
the wiki. My vote would probably be for a single representation that
automatically generates XML for all scripts (in a consistent manner, so we
don't have to worry too much about making it "backwards-compatible", and so
we avoid any opt-in or opt-out problems), with the XML and normal output
containing exactly the same information (the normal output is stored as the
value of the script's existing "output" attribute, and the same information
is additionally stored in a structured format). I know Nmap contains some additional
information in the XML file that's not displayed on screen, but that's
generally an exception rather than the norm.

I think I'd prefer if everything was converted into the structured XML
format without having to opt-in (I might regret that once I see the output
from some scripts). The ssl-cert output shown at the wiki seems to magically
convert the string "Not valid before" to "notBefore" (I haven't looked at
the code yet to see how this is done, but I assume something's hardcoded in
an updated ssl-cert script). I presume that means other scripts (e.g.
smb-os-discovery) wouldn't automatically produce nice structured XML output
using the current code until someone adds the same sort of opt-in
information (e.g. "Computer name" to "computerName")? Is it possible to
automatically create the keys (e.g. in camelCase)?
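
As a rough illustration of what automatic key generation could look like, here
is a small Python sketch. The function name camel_case_key is made up for this
example and is not part of NSE or the patch; real scripts are written in Lua,
so this only shows the idea.

```python
import re

def camel_case_key(label: str) -> str:
    """Turn a human-readable label into a camelCase key (illustrative only)."""
    # Split on anything that is not a letter or digit.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    # Lowercase the first word, capitalise the first letter of the others.
    return first + "".join(w[:1].upper() + w[1:].lower() for w in rest)

print(camel_case_key("Computer name"))     # computerName
print(camel_case_key("Not valid before"))  # notValidBefore
```

Note that a mechanical conversion gives "notValidBefore" rather than the
"notBefore" shown on the wiki, which suggests that key was chosen by hand.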

The only time I'd consider having different output in the XML file is if it
holds additional information that is known but isn't displayed on screen
(e.g. the TTL is in the XML output for a port, but not normal output); but
scripts currently tailor what's returned based on verbosity settings, and
anything known locally by the script that's not returned is presumably lost
forever, so we'd probably need to reconsider how NSE produces output (and
rewrite a lot of scripts). That might be feasible if we do it while the
number of scripts is still low enough for a rewrite to be practical.

If we do that, I'd also like to see a nice/consistent way of reporting
errors. Most scripts return something like "ERROR: Something bad happened."
as part of the normal output, but if we had a specific way of returning an
error message as an error then we could also store that information within
an error tag in the structured XML output, making it easy to count or
identify when an error occurred. It might also be possible to modify Nmap to
only display errors in the normal output when the verbosity is raised? I
believe smtp-brute (and maybe some other scripts) will return an error
message if it can't connect no matter what the verbosity is, so users will
always see the error message. This might save developers from having to
check the verbosity before deciding if an error message should be returned
as part of the normal output.

Although this could delay the whole structured XML output, is it worth
creating a better API for returning script output that helps with creating
structured XML? I'd like to see a script return something like:

 - Output (normal output, what's displayed is based on verbosity)
 - Details (all of the information that the script can determine, no matter
what verbosity the user selected)
 - Errors (this will probably be blank most of the time)
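
A minimal sketch of that three-part return value, written in Python purely for
illustration (NSE scripts are Lua, and the names ScriptResult and
render_normal are hypothetical, not an existing Nmap API). It also shows the
idea of only displaying errors in normal output when verbosity is raised.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ScriptResult:
    """Hypothetical three-part return value for a script."""
    output: str = ""  # normal output, already tailored to verbosity
    details: Dict[str, Any] = field(default_factory=dict)  # everything the script learned
    errors: List[str] = field(default_factory=list)  # usually empty

def render_normal(result: ScriptResult, verbosity: int = 0) -> str:
    """Build the on-screen text; errors appear only when verbosity is raised."""
    lines = [result.output] if result.output else []
    if verbosity > 0:
        lines.extend("ERROR: " + e for e in result.errors)
    return "\n".join(lines)

failed = ScriptResult(errors=["Failed to connect to SMTP server."])
print(repr(render_normal(failed)))         # ''
print(render_normal(failed, verbosity=1))  # ERROR: Failed to connect to SMTP server.
```

With something like this, the engine rather than each script decides whether
an error is shown, so scripts would not need to check verbosity themselves.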

The Output section could then include whatever's currently generated by the
scripts (they may require a small tweak), and any scripts that return error
messages could be modified to return the error in the Errors section. The
Output section would be the normal/usual block of text that we see (for
scripts using the vulnerability library, possibly without vuln.extra_info,
unless verbosity is raised?). Details would be all of the information known
to and returned by the script (e.g. vuln.extra_info), converted to a nice
structure (e.g. proposal beta). For a script using the vulnerability
library, we presumably might see something like (apologies if I've made any
mistakes, I've done this by hand):

<script id="something-vuln-cve2012-nnnn" output=" VULNERABLE:
  Authentication bypass in something... (etc.)">
  <dict key="details">
    <elem key="title">Title</elem>
    <elem key="state">VULNERABLE</elem>
    <elem key="description">Some big long description</elem>
    <dict key="IDS">
      <elem key="CVE">CVE-2012-nnn1</elem>
      <elem key="CVE">CVE-2012-nnn2</elem>
    </dict>
    <dict key="dates">
      <dict key="disclosure">
        <elem key="year">2012</elem>
        <elem key="month">nn</elem>
        <elem key="day">n</elem>
      </dict>
    </dict>
    <dict key="references">
      <elem key="reference">http://example.com</elem>
      <elem key="reference">http://anotherexample.com</elem>
    </dict>
    <dict key="extra_info">
      <!-- proposal beta/gamma magic goes here as appropriate -->
    </dict>
  </dict>
</script>
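
For what it's worth, generating this kind of structure from a nested table of
details looks mechanical. The following Python sketch (standard library only)
shows one way it could work; the helper names add_value and script_element are
invented for this example, and the dict/elem layout is my reading of proposal
beta, not the patch's actual code.

```python
import xml.etree.ElementTree as ET

def add_value(parent, key, value):
    """Append value under parent as <dict>/<elem> nodes (assumed beta-style layout)."""
    if isinstance(value, dict):
        node = ET.SubElement(parent, "dict", key=key)
        for child_key, child_value in value.items():
            add_value(node, child_key, child_value)
    elif isinstance(value, (list, tuple)):
        # Repeated values become sibling elements sharing one key,
        # as with the two CVE entries in the hand-written example.
        for item in value:
            add_value(parent, key, item)
    else:
        ET.SubElement(parent, "elem", key=key).text = str(value)

def script_element(script_id, output, details):
    """Build a <script> element carrying both the text output and the details."""
    script = ET.Element("script", id=script_id, output=output)
    add_value(script, "details", details)
    return script

details = {
    "state": "VULNERABLE",
    "IDS": {"CVE": ["CVE-2012-nnn1", "CVE-2012-nnn2"]},
    "dates": {"disclosure": {"year": 2012}},
}
script = script_element("something-vuln-cve2012-nnnn", "VULNERABLE: ...", details)
print(ET.tostring(script, encoding="unicode"))
```

Because the conversion only ever emits dict and elem tags, the DTD would not
need to change when a script adds a new key.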

Other scripts might do something like (then we can more easily determine
which scripts have errors and which scripts return useful results):

<script id="an-error" output="">
  <dict key="errors">
    <elem key="error">ERROR: Failed to connect to SMTP server.</elem>
  </dict>
</script>
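
With errors stored like this, picking out the scripts that failed becomes a
short query. A rough Python sketch follows, assuming the element layout from my
examples above; the surrounding host element and the second script are made up
to give the query something to filter.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the layout follows the examples in this email.
SAMPLE = """
<host>
  <script id="an-error" output="">
    <dict key="errors">
      <elem key="error">ERROR: Failed to connect to SMTP server.</elem>
    </dict>
  </script>
  <script id="a-success" output="some result">
    <dict key="details">
      <elem key="state">VULNERABLE</elem>
    </dict>
  </script>
</host>
"""

def scripts_with_errors(root):
    """Map script id -> list of error strings, for scripts that reported any."""
    found = {}
    for script in root.iter("script"):
        errors = [e.text for e in
                  script.findall("dict[@key='errors']/elem[@key='error']")]
        if errors:
            found[script.get("id")] = errors
    return found

errors_by_script = scripts_with_errors(ET.fromstring(SAMPLE))
print(len(errors_by_script), errors_by_script)
```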

Obviously, if we go with proposal gamma, we'd have table instead of dict
tags in my examples above. I'd started thinking about this before I spotted
proposal gamma or caught up with all of the emails on the list. Apologies if
I've covered something that has already been discussed in the other emails.
I've literally just spotted in a much earlier email that you suggested a
"WARNINGS" (similar to my Errors) section that's only displayed if debugging
is enabled, so it sounds like we're both coming to similar conclusions.

Sorry for the lengthy email, I hope that was useful. You did (foolishly) ask
for feedback! :)

Rob

