Nmap Development mailing list archives

Re: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)

From: Daniel Miller <bonsaiviking () gmail com>
Date: Thu, 14 Jun 2012 06:43:26 -0500

David,

A few comments inlined below...

On Thu, Jun 14, 2012 at 12:43 AM, David Fifield <david () bamsoftware com> wrote:


I think that using stdnse.format_output as the conduit for XML output is
the wrong idea. On the one hand, it's nice because you're already
getting a semi-structured table. On the other hand, as you've seen,
scripts use it as a text formatting function (which it is), and many
scripts are going to be uselessly outputting <elem>unstructured
string</elem> until they are rewritten to add even more structure to
their structured output.


To clarify, the purpose of stdnse.format_output in the current state
of the patch is to validate the table passed to it. Very little
processing is done, so a script could just return the table itself. My
motivation for doing it this way was to avoid breaking any scripts;
the XML structure would be sort of "opt-in" at first. I realize that
this has the major downside of encouraging laziness, which would drag
out the move to more structured output over a long time and frustrate
programmers who want to parse it.

Also, it bothers me somewhat that the machine-readable keys are
prose-looking strings like "Not valid before", which are subject to
typos, capitalization changes, and localization tweaks. However, I think
we could live with these problems.

format_output's input isn't particularly rich. It is good at formatting
text output, but I don't think we want it to limit what we can do with
XML output, or force us to extend it in weird ways. For example, I want there
to be <error> and <vuln> elements outside of the usual output table, and
I don't want an API that makes that difficult.


If we are open to the idea of a new API (breaking
backwards-compatibility), then quite a few constraints I was working
under are removed.

Really, what we want from XML script output is some reversible
representation of a Lua table. For example, take quake3-info. I think
that a nice Lua representation of the output would be

{
       players = {
               { name = "cyberix", frags = "20", ping = "4" },
       },
       options = {
               capturelimit = "8",
               dmflags = "0",
               elimflags = "0",
               fraglimit = "20",
               gamename = "baseoa",
       }
}

And this in turn might look like this in XML:

<script id="quake3-info">
 <dict>
   <list key="players">
     <dict>
       <elem key="name">cyberix</elem>
       <elem key="frags">20</elem>
       <elem key="ping">4</elem>
     </dict>
   </list>
   <dict key="options">
     <elem key="capturelimit">8</elem>
     <elem key="dmflags">0</elem>
     <elem key="elimflags">0</elem>
     <elem key="fraglimit">20</elem>
     <elem key="gamename">baseoa</elem>
   </dict>
 </dict>
</script>
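A minimal Python sketch of that mapping, for illustration only (NSE scripts are Lua; here a Python dict stands in for a Lua table with string keys and a Python list for an array-style table). The names table_to_xml, script_xml and xml_to_table are hypothetical, not part of Nmap. Any non-table value becomes an <elem>, and a key="..." attribute is added whenever the value sits inside a dict. The reverse function shows that the representation is reversible, at least while the leaf values are strings:

```python
import xml.etree.ElementTree as ET


def table_to_xml(value, parent, key=None):
    """Append the XML form of `value` under `parent` and return the new node.

    dict -> <dict>, list -> <list>, anything else -> <elem> holding str(value).
    `key` is set as a key="..." attribute when the value sits inside a dict.
    """
    if isinstance(value, dict):
        node = ET.SubElement(parent, "dict")
        for k, v in value.items():
            table_to_xml(v, node, key=k)
    elif isinstance(value, list):
        node = ET.SubElement(parent, "list")
        for v in value:
            table_to_xml(v, node)
    else:
        node = ET.SubElement(parent, "elem")
        node.text = str(value)
    if key is not None:
        node.set("key", str(key))
    return node


def script_xml(script_id, output):
    """Wrap a script's table in <script id="...">, as in the example above."""
    script = ET.Element("script", id=script_id)
    table_to_xml(output, script)
    return script


def xml_to_table(node):
    """Reverse mapping: rebuild the dict/list structure from the XML."""
    if node.tag == "dict":
        return {child.get("key"): xml_to_table(child) for child in node}
    if node.tag == "list":
        return [xml_to_table(child) for child in node]
    return node.text or ""  # an empty <elem/> comes back as ""
```

Because str() is applied to leaves, a number such as 20 would come back as "20"; that matches the example above, where every value is already a string.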

Then, if I wanted to find all the servers on which cyberix is playing, I
could use a crazy xmlstarlet command like this:

xmlstarlet sel \
 -t -m '//port/script[@id="quake3-info"]//list[@key="players"]/dict[elem[@key="name"]="cyberix"]' \
 -v '../../../../../../address[@addrtype="ipv4"]/@addr' -n quake.xml
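For comparison, the same query sketched with Python's standard xml.etree.ElementTree. ElementTree implements only a subset of XPath: it has no nested predicates like dict[elem[@key="name"]="cyberix"] and cannot step above the element a search started from, so this walks down from each <host> instead of climbing back up with "..". It assumes the host/ports/port/script nesting that the six "../" steps above rely on; hosts_with_player is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def hosts_with_player(nmap_xml, player):
    """Return IPv4 addresses of hosts whose quake3-info output lists `player`."""
    root = ET.fromstring(nmap_xml)
    found = []
    for host in root.iter("host"):
        addr = host.find("address[@addrtype='ipv4']")
        if addr is None:
            continue
        for script in host.iterfind("ports/port/script[@id='quake3-info']"):
            # Collect every player name listed by this script instance.
            names = {e.text for e in script.iterfind(
                ".//list[@key='players']/dict/elem[@key='name']")}
            if player in names:
                found.append(addr.get("addr"))
                break  # one match per host is enough
    return found
```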

Notice that the XML output doesn't have to correspond exactly to the
text output. What I'm thinking is that we start allowing scripts to
return a table, not just a string. Tables will be pretty-printed and
indented to be copied to normal output, and turned into XML as shown
above. Scripts that return a string will not have any structured XML
output written at all. But: I think there should be a way to specify a
human-readable string and a machine-readable table/XML blob at once.
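One way that dispatch could work, sketched in Python under stated assumptions: a plain string stands for today's string return, a dict or list for a returned table, and a 2-tuple for a string-plus-table pair. The indented "key: value" layout produced by pretty() is only an assumed format, and pretty and resolve_output are hypothetical names. Dict entries are printed in whatever order the dict iterates (insertion order in Python), which sidesteps the ordering question.

```python
def pretty(value, indent=0):
    """Render a nested dict/list as indented text lines (assumed layout)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for k, v in value.items():
            if isinstance(v, (dict, list)):
                lines.append(f"{pad}{k}:")
                lines.extend(pretty(v, indent + 1))
            else:
                lines.append(f"{pad}{k}: {v}")
    elif isinstance(value, list):
        for i, v in enumerate(value, 1):
            if isinstance(v, (dict, list)):
                lines.append(f"{pad}{i}:")  # 1-based, like Lua arrays
                lines.extend(pretty(v, indent + 1))
            else:
                lines.append(f"{pad}{v}")
    else:
        lines.append(f"{pad}{value}")
    return lines


def resolve_output(result):
    """Return (normal_text, table_for_xml_or_None) for a script result.

    str           -> text as-is, no structured XML output
    dict or list  -> text synthesized by pretty(), table kept for XML
    (str, table)  -> both supplied by the script
    """
    if isinstance(result, str):
        return result, None
    if isinstance(result, tuple) and len(result) == 2:
        text, table = result
        return text, table
    if isinstance(result, (dict, list)):
        return "\n".join(pretty(result)), result
    raise TypeError(f"unsupported script result: {type(result).__name__}")
```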


Fortunately, the patch in its current state handles return values of
all types (but requires a modified version of the
stdnse.format_output-style table).

Suppose, for the moment, that we allow a script to return a {string,
table} pair. Then we show the string in normal output, and write the
table to XML. Scripts that don't care very much can return just a string
or just a table--we'll synthesize text output by pretty printing if we
get just a table. Maybe that will catch on and people will prefer their
normal output to look like that. But cases where we want normal output
and XML output to look different include nfs-ls, whose normal output
looks like this:


I was trying to get this flexibility in text/normal output with the
ScriptDisplay_t enumeration, which I declared to include a TABLE
display type but did not implement. The other missing piece would be an API
for script authors to indicate which display format to use.


|   NFS Export: /mnt/nfs/files
|   NFS Access: Read Lookup NoModify NoExtend NoDelete NoExecute
|     PERMISSION  UID   GID   SIZE     MODIFICATION TIME  FILENAME
|     drwxr-xr-x  1000  100   4096     2010-06-17 12:28   /mnt/nfs/files
|     drwxr--r--  1000  1002  4096     2010-05-14 12:58   sources
|     -rw-------  1000  1002  23606    2010-06-17 12:28   notes

As a Lua table it might look like this:

{
       {
               export = "/mnt/nfs/files",
               access = {"Read", "Lookup", "NoModify", "NoExtend", "NoDelete", "NoExecute"},
               files = {
                       {perm = "1755", uid = "1000", gid = "100", size = "4096", mtime = "2010-06-17 12:28", name = "/mnt/nfs/files"},
                       {perm = "1744", uid = "1000", gid = "1002", size = "4096", mtime = "2010-05-14 12:58", name = "sources"},
                       {perm = "0600", uid = "1000", gid = "1002", size = "23606", mtime = "2010-06-17 12:28", name = "notes"},
               }
       }
}

Which would lead to XML like this:

<script id="nfs-ls">
 <list>
   <dict>
     <elem key="export">/mnt/nfs/files</elem>
     <list key="access">
       <elem>Read</elem><elem>Lookup</elem><elem>NoModify</elem>...
     </list>
     <list key="files">
       <dict>
         <elem key="perm">1755</elem><elem key="uid">1000</elem>...
       </dict>
       <dict>
         <elem key="perm">1744</elem><elem key="uid">1000</elem>...
       </dict>
       <dict>
         <elem key="perm">0600</elem><elem key="uid">1000</elem>...
       </dict>
     </list>
   </dict>
 </list>
</script>

Here I think it is very important both to (1) isolate individual datums
like the uid in the XML output, and (2) preserve a compact normal output
that looks like the output of ls.

So my idea is basically this: Scripts that don't have complex output can
continue to return a string, or else return a table that will be
formatted in a reasonable fashion. Scripts with specialized output needs
can build up a string and a table output simultaneously, and return them
both. In many cases, like in nfs-ls, the string can be derived from the
table by the script in one postprocessing step. (Sort of like how
ssl-cert.nse builds up a text output from the cert table. In processing
XML, I want something closer to the cert table than to the text output.)
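A sketch of that postprocessing step for nfs-ls, in Python, taking the list-of-exports table shown above as input. format_nfs_ls and COLUMNS are hypothetical names. Column widths are computed from the data so the rows line up like ls output. The perm field is printed verbatim; turning it back into an ls-style string such as drwxr-xr-x would also need the file type, which the table above does not carry as a separate field.

```python
# Column headings paired with the (hypothetical) field names of each file entry.
COLUMNS = [("PERMISSION", "perm"), ("UID", "uid"), ("GID", "gid"),
           ("SIZE", "size"), ("MODIFICATION TIME", "mtime"), ("FILENAME", "name")]


def format_nfs_ls(exports):
    """Build nfs-ls style normal output from the structured table."""
    lines = []
    for export in exports:
        lines.append(f"NFS Export: {export['export']}")
        lines.append("NFS Access: " + " ".join(export["access"]))
        # Header row plus one row per file, all as strings.
        rows = [[heading for heading, _ in COLUMNS]]
        rows += [[f[field] for _, field in COLUMNS] for f in export["files"]]
        # Pad each column to its widest cell so the rows line up.
        widths = [max(len(row[i]) for row in rows) for i in range(len(COLUMNS))]
        for row in rows:
            cells = [cell.ljust(width) for cell, width in zip(row, widths)]
            lines.append("  " + "  ".join(cells).rstrip())
    return "\n".join(lines)
```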

One downside is that dictionary tables don't preserve ordering of
elements. Scripts that just return a table won't be able to control the
ordering of their output. I propose that we ignore this for simplicity.
The alternative of making an array containing tiny name-value tables,
while reasonable, is so cumbersome that I can't see people actually
doing it. I'm going to call this "proposal beta" on the wiki page.
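The two options can be compared in a short Python sketch (the helper names are hypothetical). In Lua, the iteration order of pairs() over a dictionary-style table is unspecified, so a script that wants stable, diffable text has to sort the keys itself (table.sort in Lua, sorted() here). The array of name-value tables instead lets the script fix the order explicitly. Python dicts keep insertion order, so the sorting here only models what a Lua script would need to do.

```python
def lines_sorted(table):
    """Dictionary-style table: sort the keys so every run prints the same order."""
    return [f"{key}: {table[key]}" for key in sorted(table)]


def lines_ordered(pairs):
    """Array of tiny name-value tables: the script chooses the order itself."""
    return [f"{p['name']}: {p['value']}" for p in pairs]


options = {"fraglimit": "20", "capturelimit": "8", "dmflags": "0"}
print(lines_sorted(options))
# ['capturelimit: 8', 'dmflags: 0', 'fraglimit: 20']
print(lines_ordered([{"name": "fraglimit", "value": "20"},
                     {"name": "capturelimit", "value": "8"}]))
# ['fraglimit: 20', 'capturelimit: 8']
```

Sorting gives stable output for free but puts keys in alphabetical rather than meaningful order; the name-value array gives full control at the cost of the extra boilerplate mentioned above.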


The array-of-dicts is how the current patch expects this to be done,
for this very reason. I don't think ordering matters for XML, but I
think normal output should be required to list things in the same
order on every run (so that output can be diffed).

David Fifield


I'll try to use the wiki page to expand on these ideas, but I may not
get to much of it today. This has been a long time coming, so I feel
it's more important to get it right than to rush into it. To the
nmap-dev list at large: Please join the discussion if you have
opinions or suggestions. The outcome will affect everyone, and I know
there are smarter people than me reading this.

Dan
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
