Nmap Development mailing list archives

[NSE] generic file parsing for datafiles.lua


From: jah <jah () zadkiel plus com>
Date: Sun, 24 Aug 2008 03:06:40 +0100

Hello,

Kris once suggested [1] that the file parsing code from whois.nse might
be placed in datafiles.lua and this sparked the idea of making the
datafiles library a generic file parser.  The attached represents my
efforts to this end so far.
I've come up with a basic scheme for parsing any file line-by-line and
presenting a table containing the captured information, much like is
done already.  I'd like to bounce it around before I go much further
with it so I shall make an attempt to describe its usage.

First thing to say is that I haven't removed the existing cleverly-named
functions: parse_protocols(), parse_rpc(), and parse_services([proto]). 
These now wrap around the generic parse_file( filename [, ...] ).  They
could be removed in the future.

The idea is that a filename or path (relative to the directory
containing nmap's data files) is passed to the function along with a
table which describes the table desired in return.  The table passed
should contain patterns (with captures) which will be applied to each
line of a file using string.match().
This is best illustrated:

.
status, t = datafiles.parse_file( "nmap-services", {"^%s*([^%s#]+)%s+%d+"} )

The one table passed will result in the return of one table.  The one
pattern captures service names and will return an array of captured info
- each member from a single line.
t[1] = service name from the first matching line.
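
To make the mechanics concrete, here is a small stand-alone sketch of what
a single pattern does.  It is not the attached code: the sample_lines table
and the capture_array helper are made up for illustration, and the sample
lines only imitate the nmap-services layout.

```lua
-- Hypothetical sample lines imitating the nmap-services layout.
local sample_lines = {
  "# a comment line",
  "domain  53/udp  0.2",
  "http    80/tcp  0.5",
}

-- Illustrative helper (not part of datafiles.lua): apply one pattern to
-- every line and collect the first capture of each line that matches.
local function capture_array( lines, pattern )
  local ret = {}
  for _, line in ipairs( lines ) do
    local captured = line:match( pattern )
    if captured then ret[#ret+1] = captured end
  end
  return ret
end

local names = capture_array( sample_lines, "^%s*([^%s#]+)%s+%d+" )
-- names[1] is "domain" and names[2] is "http"; the comment line is skipped
-- because the anchored pattern cannot match a line starting with "#".
print( names[1], names[2] )
```

Lines which do not match contribute nothing, so the positions in the
returned array do not correspond to line numbers in the file.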

If the table passed contains an array of patterns then the returned
table will contain an array for each.  The table passed to the function
in the above example is the same as { [1] = "^%s*([^%s#]+)%s+%d+" } i.e.
the key is numerical.  A numerical key will always return an array. 
Similarly, if the key for a pattern is a literal string it's ignored
(because it's not useful) and an array is returned. A pattern without
captures is treated as a literal string.
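
As a rough sketch of how "pattern with captures" can be told apart from
"literal string": the attached code simply looks for an opening
parenthesis.  The helper name below is hypothetical and the function is a
stand-alone illustration of that test, not a copy of the module.

```lua
-- Illustrative helper: a string is treated as a capture pattern when it
-- contains a "(" anywhere.  The plain find (4th argument true) avoids
-- having to escape the parenthesis.
local function looks_like_capture_pattern( s )
  return type( s ) == "string" and s:find( "(", 1, true ) ~= nil
end

print( looks_like_capture_pattern( "^%s*([^%s#]+)%s+%d+" ) )
print( looks_like_capture_pattern( "tcp" ) )
```

One consequence of such a simple test is that a pattern containing only an
escaped parenthesis, such as "%(", would also be taken for a capture.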

.
status, t = datafiles.parse_file( "nmap-services",
{["^%s*[^%s#]+%s+(%d+)"] = "^%s*([^%s#]+)%s+%d+"} )

Here the table has a key-value pair of patterns (with captures) and from
this would be returned an associative array - in this case service names
indexed by port numbers.
t["80"] = http
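
A stand-alone sketch of the key-value case follows.  The capture_assoc
helper and the sample lines are assumptions for illustration only; the
point to notice is that a key captured by a pattern is a string, so the
table is indexed by "80" and not by the number 80.

```lua
-- Hypothetical sample lines imitating the nmap-services layout.
local sample_lines = {
  "# a comment line",
  "domain  53/udp  0.2",
  "http    80/tcp  0.5",
}

-- Illustrative helper (not part of datafiles.lua): one pattern supplies
-- the index, the other supplies the value stored under that index.
local function capture_assoc( lines, key_pattern, value_pattern )
  local ret = {}
  for _, line in ipairs( lines ) do
    local key = line:match( key_pattern )
    if key then ret[key] = line:match( value_pattern ) end
  end
  return ret
end

local by_port = capture_assoc( sample_lines,
  "^%s*[^%s#]+%s+(%d+)",    -- key: the port number, as a string
  "^%s*([^%s#]+)%s+%d+" )   -- value: the service name

-- by_port["80"] is "http", but by_port[80] is nil (string key, not number).
print( by_port["80"], by_port[80] )
```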

.
parse_file( "nmap-services", {[function(ln) return tonumber( ln:match(
"^%s*[^%s#]+%s+(%d+)/tcp" ) ) end] = "^%s*([^%s#]+)%s+%d+/tcp"} )

The key or value may be a function which takes a line as its argument
and returns a captured value (only one value is accepted).
t[80] = http
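
The following sketch shows one way a key or value could be handled
whether it is a pattern or a function.  The capture helper is hypothetical
and written for this example; only the idea (call a function, otherwise
use string.match, keep a single value) comes from the description above.

```lua
-- Illustrative helper: obtain one value from a line using either a
-- function or a pattern.  The extra parentheses discard all but the
-- first returned value.
local function capture( line, spec )
  if type( spec ) == "function" then
    return ( spec( line ) )
  end
  return ( line:match( spec ) )
end

-- A function key converts the captured port to a number.
local port_of = function( ln )
  return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)/tcp" ) )
end

local line  = "http    80/tcp  0.5"   -- made-up sample line
local key   = capture( line, port_of )
local value = capture( line, "^%s*([^%s#]+)%s+%d+/tcp" )

-- key is the number 80 and value is "http".
print( type( key ), key, value )
```

Because the function returns nil for a line that does not match (here, any
non-tcp line), such lines can simply be skipped by the caller.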

.
We can pass a table containing a literal key when the corresponding
value is a table, which itself would return an array - associative or
otherwise.  So we can pass a table that results in:
t.tcp[80] = http
t.udp[53] = domain
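
A sketch of how such a nested description might be walked is given here.
It is deliberately simplified (every inner key is assumed to be a
function) and the names parse, port_key and desc are invented for the
example; the attached parse_lines handles more cases than this.

```lua
-- Hypothetical sample lines imitating the nmap-services layout.
local sample_lines = {
  "domain  53/udp  0.2",
  "http    80/tcp  0.5",
}

-- Simplified recursive walk: a table value means "recurse and store the
-- result under this literal key"; otherwise the key is a function giving
-- the index and the value is a pattern giving the stored value.
local function parse( lines, desc )
  local ret = {}
  for key, value in pairs( desc ) do
    if type( value ) == "table" then
      ret[key] = parse( lines, value )
    else
      for _, line in ipairs( lines ) do
        local index = key( line )
        if index then ret[index] = line:match( value ) end
      end
    end
  end
  return ret
end

-- Builds a key function for one protocol ("tcp" or "udp").
local function port_key( proto )
  return function( ln )
    return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)/" .. proto ) )
  end
end

local desc = {
  tcp = { [port_key( "tcp" )] = "^%s*([^%s#]+)%s+%d+/tcp" },
  udp = { [port_key( "udp" )] = "^%s*([^%s#]+)%s+%d+/udp" },
}

local t = parse( sample_lines, desc )
-- t.tcp[80] is "http" and t.udp[53] is "domain".
print( t.tcp[80], t.udp[53] )
```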

.
It's possible to pass more than one table and a table is returned for
each.  Only a single filename may be specified, so the usefulness of
that is perhaps limited.
If you want, you can pass an empty table and it's assumed that you want
an array of complete lines.
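
The empty-table case can be sketched as a substitution of a catch-all
pattern.  The normalise helper is a made-up name for this example; the
pattern "^(.+)$" is the one the attached code uses.

```lua
-- Illustrative helper: an empty description is replaced by a single
-- pattern which captures the whole line.
local function normalise( desc )
  if not next( desc ) then
    return { "^(.+)$" }
  end
  return desc
end

local desc  = normalise( {} )
local whole = ( "http    80/tcp  0.5" ):match( desc[1] )
local blank = ( "" ):match( desc[1] )

-- whole is the complete line; blank is nil because ".+" needs at least
-- one character, so empty lines would not appear in the returned array.
print( whole, blank )
```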

.
The script defines common_files which contains tables for nmap data
files, each indexed by their filename.  This allows us to pass just the
filename which is looked-up:

parse_file( "nmap-services" )
t.tcp[80] = http

.
The second parameter in this case may be a key of the
common_files[filename] table:

status, t = parse_file( "nmap-services", "udp" )
t[53] = domain
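
The look-up itself might be sketched like this.  The table contents are
placeholder strings standing in for the real tables of captures, and the
pick function is hypothetical; it only illustrates the choice between the
whole entry for a file and one named member of it.

```lua
-- Placeholder stand-in for common_files: the real entries are tables of
-- patterns and functions, replaced here by strings to keep it short.
local common = {
  ["nmap-services"] = { tcp = "tcp description", udp = "udp description" },
}

-- Illustrative helper: return the description for a known file, narrowed
-- to one member when a valid key is given as the second parameter.
local function pick( filename, key )
  local entry = common[filename]
  if not entry then return nil end
  if type( key ) == "string" and entry[key] then
    return entry[key]
  end
  return entry
end

print( pick( "nmap-services", "udp" ) )
print( pick( "nmap-services" ) == common["nmap-services"] )
print( pick( "some-other-file" ) )
```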

.
Alles klar?

This is not polished code and I'm posting it so that folks can have a
fiddle - do expect there to be a bug or three.  I'm interested to know
if anybody has any thoughts on this approach (like, is it a bit
complicated) or ideas for a better one.

Cheers,

jah

[1] - http://seclists.org/nmap-dev/2008/q2/0898.html
--- The datafiles module provides functions for reading and parsing Nmap's
-- data files. For example nmap-protocols, nmap-rpc, etc. These functions'
-- return values are set up for use with exception handling via nmap.new_try().
-- @author Kris Katterjohn 03/2008
-- @author jah 08/2008

module(... or "datafiles", package.seeall)

local stdnse = require "stdnse"


---
-- Holds tables containing captures for common data files, indexed by filename.
-- @type table
-- @name common_files
local common_files = {
    ["nmap-rpc"]       = { [function(ln) return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)" ) ) end] = 
"^%s*([^%s#]+)%s+%d+" },
    ["nmap-protocols"] = { [function(ln) return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)" ) ) end] = 
"^%s*([^%s#]+)%s+%d+" },
    ["nmap-services"]  = { ["tcp"] = { [function(ln) return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)/tcp" ) ) end] = 
"^%s*([^%s#]+)%s+%d+/tcp" },
                           ["udp"] = { [function(ln) return tonumber( ln:match( "^%s*[^%s#]+%s+(%d+)/udp" ) ) end] = 
"^%s*([^%s#]+)%s+%d+/udp" }
    }

}


---
-- This function reads and parses Nmap's nmap-protocols file.
-- bool is a Boolean value indicating success. If bool is true, then the
-- second returned value is a table with protocol numbers indexing the
-- protocol names. If bool is false, an error message is returned as the
-- second value instead of the table.
-- @return bool, table|err
-- @see parse_file
parse_protocols = function()
  local status, protocols_table = parse_file("nmap-protocols")
  if not status then
    return false, "Error parsing nmap-protocols"
  end

  return true, protocols_table
end


---
-- This function reads and parses Nmap's nmap-rpc file. bool is a
-- Boolean value indicating success. If bool is true, then the second
-- returned value is a table with RPC numbers indexing the RPC names.
-- If bool is false, an error message is returned as the second value
-- instead of the table.
-- @return bool, table|err
-- @see parse_file
parse_rpc = function()
  local status, rpc_table = parse_file("nmap-rpc")
  if not status then
    return false, "Error parsing nmap-rpc"
  end

  return true, rpc_table
end


---
-- This function reads and parses Nmap's nmap-services file.
-- bool is a Boolean value indicating success. If bool is true,
-- then the second returned value is a table containing two other
-- tables: tcp{} and udp{}. tcp{} contains services indexed by TCP port
-- numbers. udp{} is the same, but for UDP. You can pass "tcp" or "udp"
-- as an argument to parse_services() to only get the corresponding table.
-- If bool is false, an error message is returned as the second value instead
-- of the table.
-- @param protocol The protocol table to return.
-- @return bool, table|err
-- @see parse_file
parse_services = function(protocol)
  if protocol and protocol ~= "tcp" and protocol ~= "udp" then
    return false, "Bad protocol for nmap-services: use tcp or udp"
  end

  local status, services_table = parse_file("nmap-services", protocol)
  if not status then
    return false, "Error parsing nmap-services"
  end

  return true, services_table
end


---
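-- Generic parsing of datafiles.  By supplying this function with a table
-- containing captures to be applied to each line of a datafile a table will
-- be returned which mirrors the structure of the supplied table and which
-- contains any captured values.  A capture will be applied to each line using
-- string.match() and may also be enclosed within a table or a function.
-- A function must accept a line as its parameter and should return one value
-- derived from that line.
-- @param filename  String: name of the file, relative to the directory holding Nmap's data files.
-- @param ...       Tables of captures or, for a file listed in common_files, an optional key of its entry.
-- @return          Boolean true on success, false on error.
-- @return          One table per supplied table of captures, or an error message in case of an error.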

function parse_file( filename, ... )

  local data_struct
  local arg = { ... } -- collect the varargs explicitly instead of relying on the implicit arg table

  -- must have a filename
  if type( filename ) ~= "string" or filename == "" then
    return false, "Error in datafiles.parse_file: No file to parse."
  end

  -- is filename a member of common_files? is second parameter a key in common_files or is it a table?
  if common_files[filename] then
    if type( arg[1] ) == "string" and common_files[filename][arg[1]] then
      data_struct = { common_files[filename][arg[1]] }
    elseif type( arg[1] ) == "table" then
      data_struct = { arg[1] }
    else
      data_struct = { common_files[filename] }
    end
  end

  if type( data_struct ) ~= "table" then
    local t = {}
    for _, a in ipairs( arg ) do
      if type( a ) == "table" then
        if not next( a ) then a = { "^(.+)$" } end -- empty table? no problem, you'll get the whole line
        t[#t+1] = a
      end
    end
    if #t == 0 then
      return false, "Error in datafiles.parse_file: I've no idea how you want your data."
    end
    data_struct = t
  end

  -- get path to file (if it's with nmap datafiles) - no checking done here
  local status, filepath = get_filepath( filename )
  if not status then
    return false, ( "Error in datafiles.parse_file: %s." ):format( filepath ) -- error from get_filepath
  end

  -- get a table of lines
  local status, lines = read_from_file( filepath )
  if not status then
    return false, ( "Error in datafiles.parse_file: %s could not be read: %s." ):format( filepath, lines )
  end

  -- do the actual parsing
  local ret = {}
  for _, ds in ipairs( data_struct ) do
    status, ret[#ret+1] = parse_lines( lines, ds )
  end

  return true, unpack( ret )

end


---
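-- Generic parsing of an array of strings.  By supplying this function with a
-- table containing captures to be applied to each value of an array-like table
-- of strings a table will be returned which mirrors the structure of the
-- supplied table and which contains any captured values.  A capture will be
-- applied to each array member using string.match() and may also be enclosed
-- within a table or a function.
-- A function must accept an array member as its parameter and should return
-- one value derived from that member.
-- @param lines        Table (array-style) of strings to parse.
-- @param data_struct  Table of captures describing the table desired in return.
-- @return             Boolean true on success, false on error.
-- @return             Table of captured values, or an error message in case of an error.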

function parse_lines( lines, data_struct  )

  if type( lines ) ~= "table" or #lines < 1 then
    return false, "Error in datafiles.parse_lines: No lines to parse."
  end

  if type( data_struct ) ~= "table" or not next( data_struct ) then
    return false, "Error in datafiles.parse_lines: No patterns for data capture."
  end

  local ret = {}

  -- return an array-like table of values captured from each line
  local function get_array( v_pattern )
    local ret = {}
    for _, line in ipairs( lines ) do
      -- only process strings
      if type( line ) == "string" then
        local captured
        if type( v_pattern ) == "function" then
          captured = v_pattern( line )
        else
          captured = line:match( v_pattern )
        end
        ret[#ret+1] = captured
      end
    end
    return ret
  end

  -- return an associative array table of index-value pairs captured from each line
  local function get_assoc_array( i_pattern, v_pattern )
    local ret = {}
    for _, line in ipairs(lines) do
      -- only process strings
      if type( line ) == "string" then
        local index
        if type(i_pattern) == "function" then
          index = i_pattern(line)
        else
          index = line:match(i_pattern)
        end
        if index and type(v_pattern) == "function" then
          ret[index] = v_pattern(line)
        elseif index then
          ret[index] = line:match(v_pattern)
        end
      end
    end
    return ret
  end


  -- traverse data_struct and enforce sensible index-value pairs.  Call functions to process the members of lines.
  for index, value in pairs( data_struct ) do
    if index == nil then return false, "Error in datafiles.parse_lines: Invalid index." end
    if type(index) == "number" or ( type(index) == "string" and not index:match("%(") ) then
      if type(value) == "number" or ( type(value) == "string" and not value:match("%(") ) then
        return false, "Error in datafiles.parse_lines: No patterns for data capture."
      elseif type(value) == "string" or type(value) == "function" then
        ret = get_array( value )
      elseif type(value) == "table" then
        local _
        _, ret[index] = parse_lines( lines, value )
      else
        -- TEMP
        print(type(index), "unexpected value", type(value))
      end
    elseif type(index) == "string" or type(index) == "function"  then
      ret = get_assoc_array( index, value )
    else
      -- TEMP
      print("unexpected index", type(index), type(value))
    end
  end


  return true, ret

end


---
-- Reads a file, line by line, into a table.
-- @param file  String representing a filepath.
-- @return      Boolean True on success, False on error
-- @return      Table (array-style) of lines read from the file or error message in case of an error.

function read_from_file( file )

  if type( file ) ~= "string" or file == "" then
    return false, "Error in datafiles.read_from_file: Expected file as a string."
  end

  local f, err, _ = io.open( file, "r" )
  if not f then
    return false, ( "Error in datafiles.read_from_file: Cannot open %s for reading: %s" ):format( file, err )
  end

  local line, ret = nil, {}
  while true do
    line = f:read()
    if not line then break end
    ret[#ret+1] = line
  end

  f:close()

  return true, ret

end


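---
-- Gets the path to filename, assuming it lives in the same directory as Nmap's data files.
-- The directory is derived from the first of nmap-rpc, nmap-services or nmap-protocols
-- that nmap.fetchfile() can locate; no check is made that filename itself exists there.
-- @param filename  String: name of the file to locate.
-- @return          Boolean true on success, false on error.
-- @return          String: path to the file, or an error message in case of an error.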
function get_filepath( filename )
  local ff = { "nmap-rpc", "nmap-services", "nmap-protocols" }
  for _, f in pairs( ff ) do
    local path = nmap.fetchfile( f )
    if path then
      return true, ( path:sub( 1, #path - #f ) .. filename )
    end
  end
  return false, "Error in datafiles.get_filepath: Can't find nmap datafiles" -- ?
end
