Nmap Development mailing list archives
Re: [RFC][patch] XML structured script output (evaluation of nse-structured3 patch)
From: Daniel Miller <bonsaiviking () gmail com>
Date: Thu, 14 Jun 2012 06:43:26 -0500
David,

A few comments inlined below...

On Thu, Jun 14, 2012 at 12:43 AM, David Fifield <david () bamsoftware com> wrote:
I think that using stdnse.format_output as the conduit for XML output is the wrong idea. On the one hand, it's nice because you're already getting a semi-structured table. On the other hand, as you've seen, scripts use it as a text formatting function (which it is), and many scripts are going to be uselessly outputting <elem>unstructured string</elem> until they are rewritten to add even more structure to their structured output.
To clarify, the purpose of stdnse.format_output in the current state of the patch is to validate the table passed to it. Very little processing is done, so a script could just return the table itself. My motivation for doing it this way was to avoid breaking any scripts; the XML structure would be sort of "opt-in" at first. I realize that this has the major downside of encouraging laziness that would spread the changes to more structured output over a long time, frustrating programmers who want to parse it.
Also, it bothers me somewhat that the machine-readable keys are prose-looking strings like "Not valid before", which are subject to typos, capitalization changes, and localization tweaks. However, I think we could live with these problems. format_output's input isn't particularly rich. It is good at formatting text output, but I don't think we want it to limit what we can do with XML output, or to have to extend it in weird ways. For example, I want there to be <error> and <vuln> elements outside of the usual output table, and I don't want an API that makes that difficult.
If we are open to the idea of a new API (breaking backwards-compatibility), then quite a few constraints I was working under are removed.
Really, what we want from XML script output is some reversible representation of a Lua table. For example, take quake3-info. I think that a nice Lua representation of the output would be

    {
      players = {
        { name = "cyberix", frags = "20", ping = "4" },
      },
      options = {
        capturelimit = "8",
        dmflags = "0",
        elimflags = "0",
        fraglimit = "20",
        gamename = "baseoa",
      }
    }

And this in turn might look like this in XML:

    <script id="quake3-info">
      <dict>
        <list key="players">
          <dict>
            <elem key="name">cyberix</elem>
            <elem key="frags">20</elem>
            <elem key="ping">4</elem>
          </dict>
        </list>
        <dict key="options">
          <elem key="capturelimit">8</elem>
          <elem key="dmflags">0</elem>
          <elem key="elimflags">0</elem>
          <elem key="fraglimit">20</elem>
          <elem key="gamename">baseoa</elem>
        </dict>
      </dict>
    </script>

Then, if I wanted to find all the servers on which cyberix is playing, I could use a crazy xmlstarlet command like this:

    xmlstarlet sel \
      -t -m '//port/script[@id="quake3-info"]//list[@key="players"]/dict[elem[@key="name"]="cyberix"]' \
      -v '../../../../../../address[@addrtype="ipv4"]/@addr' -n quake.xml

Notice that the XML output doesn't have to correspond exactly to the text output.

What I'm thinking is that we start allowing scripts to return a table, not just a string. Tables will be pretty-printed and indented to be copied to normal output, and turned into XML as shown above. Scripts that return a string will not have any structured XML output written at all.

But: I think there should be a way to specify a human-readable string and a machine-readable table/XML blob at once.
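To make the "reversible representation" idea concrete, here is a minimal sketch of the table-to-XML mapping and its inverse. It is written in Python rather than Lua purely for illustration and is not NSE code: the function names to_xml and from_xml are hypothetical, Python dicts and lists stand in for Lua dictionary and array tables, and every leaf is assumed to be a string (as in the quake3-info example), so numbers would not survive the round trip with their type.

```python
import xml.etree.ElementTree as ET


def to_xml(value, key=None):
    """Map a nested dict/list/string onto <dict>/<list>/<elem> elements."""
    attrib = {} if key is None else {"key": key}
    if isinstance(value, dict):
        node = ET.Element("dict", attrib)
        for child_key, child in value.items():
            node.append(to_xml(child, child_key))
    elif isinstance(value, list):
        node = ET.Element("list", attrib)
        for child in value:
            node.append(to_xml(child))
    else:
        # Leaves are stored as text; type information is not preserved.
        node = ET.Element("elem", attrib)
        node.text = str(value)
    return node


def from_xml(node):
    """Inverse of to_xml for trees built from dicts, lists and strings."""
    if node.tag == "dict":
        return {child.get("key"): from_xml(child) for child in node}
    if node.tag == "list":
        return [from_xml(child) for child in node]
    return node.text or ""


quake3_info = {
    "players": [{"name": "cyberix", "frags": "20", "ping": "4"}],
    "options": {
        "capturelimit": "8",
        "dmflags": "0",
        "elimflags": "0",
        "fraglimit": "20",
        "gamename": "baseoa",
    },
}

script = ET.Element("script", {"id": "quake3-info"})
script.append(to_xml(quake3_info))
xml_text = ET.tostring(script, encoding="unicode")

# Parse the serialized XML again and rebuild the table from the top <dict>.
round_tripped = from_xml(ET.fromstring(xml_text)[0])
```

Here round_tripped compares equal to quake3_info, which is the property the proposal asks for. One thing the sketch glosses over is that Lua has a single table type, so a real implementation would need some rule for deciding whether a given table is emitted as a <dict> or a <list>.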
Fortunately, the patch in its current state handles return values of all types (but requires a modified version of the stdnse.format_output-style table).
Suppose, for the moment, that we allow a script to return a {string, table} pair. Then we show the string in normal output, and write the table to XML. Scripts that don't care very much can return just a string or just a table--we'll synthesize text output by pretty printing if we get just a table. Maybe that will catch on and people will prefer their normal output to look like that. But cases where we want normal output and XML output to look different include nfs-ls, whose normal output looks like this:
I was trying to get this flexibility in text/normal output with the ScriptDisplay_t enumeration, which I declared to include a TABLE display type but did not implement. The other missing piece would be an API for script authors to indicate which display format to use.
| NFS Export: /mnt/nfs/files
| NFS Access: Read Lookup NoModify NoExtend NoDelete NoExecute
|   PERMISSION  UID   GID   SIZE   MODIFICATION TIME  FILENAME
|   drwxr-xr-x  1000  100   4096   2010-06-17 12:28   /mnt/nfs/files
|   drwxr--r--  1000  1002  4096   2010-05-14 12:58   sources
|   -rw-------  1000  1002  23606  2010-06-17 12:28   notes

As a Lua table it might look like this:

    {
      {
        export = "/mnt/nfs/files",
        access = {"Read", "Lookup", "NoModify", "NoExtend", "NoDelete", "NoExecute"},
        files = {
          {perm = "1755", uid = "1000", gid = "100", size = "4096", mtime = "2010-06-17 12:28", name = "/mnt/nfs/files"},
          {perm = "1744", uid = "1000", gid = "1002", size = "4096", mtime = "2010-05-14 12:58", name = "sources"},
          {perm = "0600", uid = "1000", gid = "1002", size = "23606", mtime = "2010-06-17 12:28", name = "notes"},
        }
      }
    }

Which would lead to XML like this:

    <script id="nfs-ls">
      <list>
        <dict>
          <elem key="export">/mnt/nfs/files</elem>
          <list key="access">
            <elem>Read</elem><elem>Lookup</elem><elem>NoModify</elem>...
          </list>
          <list key="files">
            <dict>
              <elem key="perm">1755</elem><elem key="uid">1000</elem>...
            </dict>
            <dict>
              <elem key="perm">1744</elem><elem key="uid">1000</elem>...
            </dict>
            <dict>
              <elem key="perm">0600</elem><elem key="uid">1000</elem>...
            </dict>
          </list>
        </dict>
      </list>
    </script>

Here I think it is very important both to (1) isolate individual datums like the uid in the XML output, and (2) preserve a compact normal output that looks like the output of ls.

So my idea is basically this: Scripts that don't have complex output can continue to return a string, or else return a table that will be formatted in a reasonable fashion. Scripts with specialized output needs can build up a string and a table output simultaneously, and return them both. In many cases, like in nfs-ls, the string can be derived from the table by the script in one postprocessing step. (Sort of like how ssl-cert.nse builds up a text output from the cert table. In processing XML, I want something closer to the cert table than to the text output.)

One downside is that dictionary tables don't preserve ordering of elements. Scripts that just return a table won't be able to control the ordering of their output. I propose that we ignore this for simplicity. The alternative of making an array containing tiny name-value tables, while reasonable, is so cumbersome that I can't see people actually doing it.

I'm going to call this "proposal beta" on the wiki page.
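The return convention in this proposal (a string, a table, or a string and a table together) could be handled roughly as sketched below. This is an illustration in Python, not NSE code: split_result, pretty_print and nfs_ls_text are hypothetical names, a Python tuple stands in for the {string, table} pair, and the pretty-printing layout is only one plausible choice.

```python
def pretty_print(value, indent=0):
    """Render a nested dict/list as indented lines (fallback normal output)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(pretty_print(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            lines.extend(pretty_print(item, indent + 1))
    else:
        lines.append(f"{pad}{value}")
    return lines


def split_result(result):
    """Return (text, table) for a string, a table, or a (string, table) pair."""
    if isinstance(result, str):
        return result, None  # string only: no structured XML is written
    if isinstance(result, tuple) and len(result) == 2:
        return result  # script supplied both representations itself
    return "\n".join(pretty_print(result)), result  # table only: synthesize text


def nfs_ls_text(exports):
    """Derive ls-like text from the nfs-ls table in one postprocessing step."""
    lines = []
    for export in exports:
        lines.append(f"NFS Export: {export['export']}")
        lines.append("NFS Access: " + " ".join(export["access"]))
        lines.append("PERM  UID  GID  SIZE  MODIFICATION TIME  FILENAME")
        for entry in export["files"]:
            lines.append("  ".join(
                entry[k] for k in ("perm", "uid", "gid", "size", "mtime", "name")
            ))
    return "\n".join(lines)


exports = [{
    "export": "/mnt/nfs/files",
    "access": ["Read", "Lookup", "NoModify", "NoExtend", "NoDelete", "NoExecute"],
    "files": [
        {"perm": "0600", "uid": "1000", "gid": "1002", "size": "23606",
         "mtime": "2010-06-17 12:28", "name": "notes"},
    ],
}]

# A script with specialized output returns both; simpler scripts return one.
text, table = split_result((nfs_ls_text(exports), exports))
```

The sketch prints the perm field exactly as stored in the table; turning it back into the drwxr-xr-x form of the real nfs-ls output would need file-type information that the example table does not carry.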
The array-of-dicts is how the current patch expects this to be done, for this very reason. I don't think ordering matters for XML, but I think a requirement for normal output should be identical ordering for each run (so that output can be diffed).
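The two ways of getting repeatable ordering can be compared with a small sketch. It is in Python for illustration only, with hypothetical function names. Note that Python dicts keep insertion order while Lua dictionary tables do not, so the sorted-key variant here merely stands in for any fixed rule a formatter might apply to an unordered table.

```python
def render_pairs(pairs):
    """Array of tiny name-value tables: the script controls the order."""
    return [f"{name}: {value}" for pair in pairs for name, value in pair.items()]


def render_sorted(table):
    """Plain dictionary table: the formatter imposes an order (sorted keys)."""
    return [f"{name}: {table[name]}" for name in sorted(table)]


# Script-chosen order, at the cost of one small table per field.
pairs = [{"gamename": "baseoa"}, {"fraglimit": "20"}, {"capturelimit": "8"}]

# Same fields as one dictionary; output order no longer depends on the script.
options = {"gamename": "baseoa", "fraglimit": "20", "capturelimit": "8"}

ordered_by_script = render_pairs(pairs)
ordered_by_formatter = render_sorted(options)
```

Either approach yields the same lines on every run, which is what diffing needs. The difference is only in who picks the order: the script author with the array form, or the output formatter with the dictionary form.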
David Fifield
I'll try to use the wiki page to expand on these ideas, but I may not get to much of it today. This has been a long time coming, so I feel it's more important to get it right than to rush into it.

To the nmap-dev list at large: Please join the discussion if you have opinions or suggestions. The outcome will affect everyone, and I know there are smarter people than me reading this.

Dan
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/