Nmap Development mailing list archives

Re: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)


From: David Fifield <david () bamsoftware com>
Date: Wed, 13 Jun 2012 22:43:59 -0700

On Tue, May 29, 2012 at 03:30:25PM -0500, Daniel Miller wrote:
I'm attaching an update to this patch, since the Lua 5.2 update
changed a few things. Also, this update includes modifications to
the XSL stylesheet so that the output should look the same as it did
before.

I think that using stdnse.format_output as the conduit for XML output is
the wrong idea. On the one hand, it's nice because you're already
getting a semi-structured table. On the other hand, as you've seen,
scripts use it as a text formatting function (which it is), and many
scripts are going to be uselessly outputting <elem>unstructured
string</elem> until they are rewritten to add even more structure to
their structured output.

Also, it bothers me somewhat that the machine-readable keys are
prose-looking strings like "Not valid before", which are subject to
typos, capitalization changes, and localization tweaks. However, I think
we could live with these problems.

format_output's input isn't particularly rich. It is good at formatting
text output, but I don't think we want it to limit what we can do with
XML output or have to extend it in weird ways. For example, I want there
to be <error> and <vuln> elements outside of the usual output table, and
I don't want an API that makes that difficult.

Really, what we want from XML script output is some reversible
representation of a Lua table. For example, take quake3-info. I think
that a nice Lua representation of the output would be

{
        players = {
                { name = "cyberix", frags = "20", ping = "4" },
        },
        options = {
                capturelimit = "8",
                dmflags = "0",
                elimflags = "0",
                fraglimit = "20",
                gamename = "baseoa",
        }
}

And this in turn might look like this in XML:

<script id="quake3-info">
  <dict>
    <list key="players">
      <dict>
        <elem key="name">cyberix</elem>
        <elem key="frags">20</elem>
        <elem key="ping">4</elem>
      </dict>
    </list>
    <dict key="options">
      <elem key="capturelimit">8</elem>
      <elem key="dmflags">0</elem>
      <elem key="elimflags">0</elem>
      <elem key="fraglimit">20</elem>
      <elem key="gamename">baseoa</elem>
    </dict>
  </dict>
</script>
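For illustration, here is a minimal Lua sketch of the table-to-XML mapping above. It is not an existing NSE function: the names `to_xml` and `is_list`, the rule that a table whose keys are exactly 1..#t becomes a <list>, and the sorting of dict keys (so the output is deterministic) are all assumptions made for this example.

```lua
-- Sketch only: serialize a Lua table into the <dict>/<list>/<elem> form
-- described above. to_xml and is_list are hypothetical names.

-- Escape the characters that are special in XML text and attributes.
local function xml_escape(s)
  return (tostring(s):gsub('[&<>"]', {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
  }))
end

-- Assumed rule: a table is a list when its keys are exactly 1..#t.
local function is_list(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n == #t
end

-- Append XML lines for `value` to `out`; `key` becomes a key="..." attribute.
local function to_xml(value, key, indent, out)
  indent = indent or ""
  out = out or {}
  local attr = key ~= nil and string.format(' key="%s"', xml_escape(key)) or ""
  if type(value) ~= "table" then
    out[#out + 1] = string.format("%s<elem%s>%s</elem>", indent, attr, xml_escape(value))
    return out
  end
  local tag = is_list(value) and "list" or "dict"
  out[#out + 1] = string.format("%s<%s%s>", indent, tag, attr)
  if tag == "list" then
    for _, v in ipairs(value) do
      to_xml(v, nil, indent .. "  ", out)
    end
  else
    -- Sorted keys are an assumption; pairs() itself guarantees no order.
    local keys = {}
    for k in pairs(value) do keys[#keys + 1] = k end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    for _, k in ipairs(keys) do
      to_xml(value[k], k, indent .. "  ", out)
    end
  end
  out[#out + 1] = string.format("%s</%s>", indent, tag)
  return out
end

local output = {
  players = { { name = "cyberix", frags = "20", ping = "4" } },
  options = { capturelimit = "8", fraglimit = "20", gamename = "baseoa" },
}
print(table.concat(to_xml(output), "\n"))
```

Because the mapping only distinguishes scalars, sequences, and everything else, the XML can be read back into an equivalent table, which is the "reversible" property asked for above.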

Then, if I wanted to find all the servers on which cyberix is playing, I
could use a crazy xmlstarlet command like this:

xmlstarlet sel \
  -t -m '//port/script[@id="quake3-info"]//list[@key="players"]/dict[elem[@key="name"]="cyberix"]' \
  -v '../../../../../../address[@addrtype="ipv4"]/@addr' -n quake.xml

Notice that the XML output doesn't have to correspond exactly to the
text output. What I'm thinking is that we start allowing scripts to
return a table, not just a string. Tables will be pretty-printed and
indented to be copied to normal output, and turned into XML as shown
above. Scripts that return a string will not have any structured XML
output written at all. But: I think there should be a way to specify a
human-readable string and a machine-readable table/XML blob at once.
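A rough sketch of the pretty-printing step, assuming a "key: value" layout with two spaces of indentation per nesting level. The function name `pretty` and the exact layout are hypothetical; only the idea of indenting nested tables for normal output comes from the proposal.

```lua
-- Sketch only: pretty-print a table for normal (text) output.
-- The "key: value" layout and two-space indent are assumptions.

local function is_list(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n == #t
end

local function sorted_keys(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  return keys
end

-- Append indented text lines for `value` to `out` and return `out`.
local function pretty(value, indent, out)
  indent = indent or ""
  out = out or {}
  if type(value) ~= "table" then
    out[#out + 1] = indent .. tostring(value)
  elseif is_list(value) then
    for _, v in ipairs(value) do
      if type(v) == "table" then
        -- Nested tables inside a list are indented one more level.
        pretty(v, indent .. "  ", out)
      else
        out[#out + 1] = indent .. tostring(v)
      end
    end
  else
    for _, k in ipairs(sorted_keys(value)) do
      local v = value[k]
      if type(v) == "table" then
        out[#out + 1] = indent .. tostring(k) .. ":"
        pretty(v, indent .. "  ", out)
      else
        out[#out + 1] = indent .. tostring(k) .. ": " .. tostring(v)
      end
    end
  end
  return out
end

print(table.concat(pretty({
  players = { { name = "cyberix", frags = "20", ping = "4" } },
  options = { capturelimit = "8", fraglimit = "20" },
}), "\n"))
```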

Suppose, for the moment, that we allow a script to return a {string,
table} pair. Then we show the string in normal output, and write the
table to XML. Scripts that don't care very much can return just a string
or just a table--we'll synthesize text output by pretty printing if we
get just a table. Maybe that will catch on and people will prefer their
normal output to look like that. But cases where we want normal output
and XML output to look different include nfs-ls, whose normal output
looks like this:

|   NFS Export: /mnt/nfs/files
|   NFS Access: Read Lookup NoModify NoExtend NoDelete NoExecute
|     PERMISSION  UID   GID   SIZE     MODIFICATION TIME  FILENAME
|     drwxr-xr-x  1000  100   4096     2010-06-17 12:28   /mnt/nfs/files
|     drwxr--r--  1000  1002  4096     2010-05-14 12:58   sources
|     -rw-------  1000  1002  23606    2010-06-17 12:28   notes

As a Lua table it might look like this:

{
        {
                export = "/mnt/nfs/files",
                access = {"Read", "Lookup", "NoModify", "NoExtend", "NoDelete", "NoExecute"},
                files = {
                        {perm = "1755", uid = "1000", gid = "100", size = "4096", mtime = "2010-06-17 12:28", name = "/mnt/nfs/files"},
                        {perm = "1744", uid = "1000", gid = "1002", size = "4096", mtime = "2010-05-14 12:58", name = "sources"},
                        {perm = "0600", uid = "1000", gid = "1002", size = "23606", mtime = "2010-06-17 12:28", name = "notes"},
                }
        }
}

Which would lead to XML like this:

<script id="nfs-ls">
  <list>
    <dict>
      <elem key="export">/mnt/nfs/files</elem>
      <list key="access">
        <elem>Read</elem><elem>Lookup</elem><elem>NoModify</elem>...
      </list>
      <list key="files">
        <dict>
          <elem key="perm">1755</elem><elem key="uid">1000</elem>...
        </dict>
        <dict>
          <elem key="perm">1744</elem><elem key="uid">1000</elem>...
        </dict>
        <dict>
          <elem key="perm">0600</elem><elem key="uid">1000</elem>...
        </dict>
      </list>
    </dict>
  </list>
</script>

Here I think it is very important both to (1) isolate individual datums
like the uid in the XML output, and (2) preserve a compact normal output
that looks like the output of ls.
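As a sketch of how nfs-ls might derive its compact text from the structured table in a single postprocessing step: the helper `nfs_ls_text` is hypothetical, the field names follow the table above, and the column widths are chosen to reproduce the sample output (without the leading `|` margin). Permissions are printed as stored rather than converted to rwx notation.

```lua
-- Hypothetical helper (not part of nfs-ls): render the structured table
-- as compact, ls-like text. Column widths mirror the sample output.
local ROW = "  %-10s  %-5s %-5s %-8s %-18s %s"

local function nfs_ls_text(exports)
  local lines = {}
  for _, entry in ipairs(exports) do
    lines[#lines + 1] = "NFS Export: " .. entry.export
    lines[#lines + 1] = "NFS Access: " .. table.concat(entry.access, " ")
    lines[#lines + 1] = string.format(ROW, "PERMISSION", "UID", "GID",
      "SIZE", "MODIFICATION TIME", "FILENAME")
    for _, f in ipairs(entry.files) do
      -- perm is printed as stored; converting to rwx form is left out.
      lines[#lines + 1] = string.format(ROW, f.perm, f.uid, f.gid,
        f.size, f.mtime, f.name)
    end
  end
  return table.concat(lines, "\n")
end

print(nfs_ls_text({
  {
    export = "/mnt/nfs/files",
    access = { "Read", "Lookup", "NoModify" },
    files = {
      { perm = "1744", uid = "1000", gid = "1002", size = "4096",
        mtime = "2010-05-14 12:58", name = "sources" },
      { perm = "0600", uid = "1000", gid = "1002", size = "23606",
        mtime = "2010-06-17 12:28", name = "notes" },
    },
  },
}))
```

The XML side gets every field as its own element, while the text side stays one line per file.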

So my idea is basically this: Scripts that don't have complex output can
continue to return a string, or else return a table that will be
formatted in a reasonable fashion. Scripts with specialized output needs
can build up a string and a table output simultaneously, and return them
both. In many cases, like in nfs-ls, the string can be derived from the
table by the script in one postprocessing step. (Sort of like how
ssl-cert.nse builds up a text output from the cert table. In processing
XML, I want something closer to the cert table than to the text output.)
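The three cases can be sketched as a small normalization function on the engine side. This is an illustration with assumptions: the pair is modelled as two return values (text, table) rather than a literal {string, table} table, which avoids confusing a pair with a list that happens to start with a string, and `pretty` is a stand-in for whatever table pretty-printer is used.

```lua
-- Sketch only: how the engine might interpret a script's return values.
-- Assumption: the pair arrives as two return values (text, structured).
-- Returns the text for normal output and the table for XML (or nil).
local function normalize(pretty, result, structured)
  if type(result) == "table" then
    -- Only a table: write it to XML and synthesize the text from it.
    return pretty(result), result
  end
  -- A plain string gets no structured XML (structured stays nil);
  -- a (string, table) pair keeps the script's own text and table.
  return result, structured
end

-- Trivial stand-in pretty-printer, just for this example.
local function flat(t)
  local parts = {}
  for _, v in ipairs(t) do parts[#parts + 1] = tostring(v) end
  return table.concat(parts, ", ")
end

local text, xml_table = normalize(flat, { "Read", "Lookup" })
print(text) --> Read, Lookup
```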

One downside is that dictionary tables don't preserve ordering of
elements. Scripts that just return a table won't be able to control the
ordering of their output. I propose that we ignore this for simplicity.
The alternative of making an array containing tiny name-value tables,
while reasonable, is so cumbersome that I can't see people actually
doing it. I'm going to call this "proposal beta" on the wiki page.
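To make the trade-off concrete, here is part of the quake3-info options table written both ways, using values from the example above. pairs() visits dictionary keys in an unspecified order, so only the array form lets a script control ordering; sorting keys, as `render_sorted` does here, is an assumed fallback that gives deterministic but not author-chosen order. Both function names are hypothetical.

```lua
-- Dictionary form: concise, but pairs() order is unspecified.
local options = { gamename = "baseoa", fraglimit = "20", capturelimit = "8" }

-- The ordered alternative: an array of tiny name-value tables.
local ordered_options = {
  { name = "gamename", value = "baseoa" },
  { name = "fraglimit", value = "20" },
  { name = "capturelimit", value = "8" },
}

-- Order is preserved only because ipairs walks the array in sequence.
local function render_ordered(list)
  local lines = {}
  for _, pair in ipairs(list) do
    lines[#lines + 1] = pair.name .. ": " .. pair.value
  end
  return table.concat(lines, "\n")
end

-- Assumed fallback for dictionaries: sort keys so output is at least stable.
local function render_sorted(dict)
  local keys = {}
  for k in pairs(dict) do keys[#keys + 1] = k end
  table.sort(keys)
  local lines = {}
  for _, k in ipairs(keys) do
    lines[#lines + 1] = k .. ": " .. dict[k]
  end
  return table.concat(lines, "\n")
end

print(render_ordered(ordered_options))
print(render_sorted(options))
```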

David Fifield
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/

