Nmap Development mailing list archives

Lua bugfixes and a new buffering feature


From: doug () hcsw org
Date: Sat, 23 Jun 2007 05:19:12 -0700

Hi nmap-dev!

I just found 2 showstopper bugs in the PCRE-Lua interface, fixed them,
and committed the fixes to SVN. It seems to work fine now, although the
documentation is still hopelessly insufficient for anybody who
doesn't know how to read the C source code. :)

The REAL required interface is:

my_regex = pcre.new("my PCRE pattern", 0, "C")

my_regex:exec(string_to_match_against, 0, 0)

I am caching the compiled PCRE regexps into the NSE registry using
a fairly straightforward scheme:


init = function()
  -- Start of MOTD, we'll take the server name from here
  nmap.registry.ircserverinfo_375 = nmap.registry.ircserverinfo_375
    or pcre.new("^:([\\w-_.]+) 375", 0, "C")

  -- NICK already in use
  nmap.registry.ircserverinfo_433 = nmap.registry.ircserverinfo_433
    or pcre.new("^:[\\w-_.]+ 433", 0, "C")

...


Then I'm having the action() function (NOT the portrule function) call
init() so that these regexps are compiled at most once per Nmap
invocation, and only if the script's action() function is actually
called.

Perhaps it would be useful to look for an init function which is called
only once per script per nmap invocation and only right before action()
is called? Another solution we should consider is passing a table to the
action function that scripts can use for cross-invocation persistent data
structures. This would avoid any possible registry conflict problems
(every script would have its own table if it wanted it). I don't know if
better registry naming is required or not.
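
A rough sketch of the "run init at most once, keep per-script state" idea
in plain Lua follows. The names once, get_state and compile_count are
hypothetical and not part of NSE, and the returned table only stands in
for compiled regexps (a real script would call pcre.new there).

```lua
-- Hypothetical helper: wrap an init function so its body runs at most once.
-- Not part of NSE; just a sketch of the "init called once" idea.
local function once(init)
  local done, state = false, nil
  return function()
    if not done then
      state = init()   -- the script's private table of persistent data
      done = true
    end
    return state
  end
end

local compile_count = 0

-- Stand-in for compiling regexps; a real script would call pcre.new here.
local get_state = once(function()
  compile_count = compile_count + 1
  return { motd_start = "375", nick_in_use = "433" }
end)

-- Simulate action() being invoked against several targets.
local function action()
  local state = get_state()
  return state.motd_start
end

for _ = 1, 3 do action() end
print(compile_count)  -- 1
```

Because the state lives in the closure rather than in nmap.registry, two
scripts using this pattern could not collide on a registry key.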


IMPORTANT NOTE FOR NSE SCRIPT WRITERS: Don't use the function
receive_lines() unless you plan on doing your own line parsing.
This function WILL RETURN MORE THAN JUST THE FIRST LINE OF DATA
IF MORE IS AVAILABLE.

This can be a problem in many scenarios. Most often with NSE you will
just miss pieces of data that you don't care about anyways. But sometimes
you will miss important lines or you will actually PROCESS AN INCOMPLETE
LINE that just happened to be delivered with another line and/or crossed
a read() boundary.

Consider an application that executes this code to send data to
your NSE script:

write(sd, "hello\nworld\n", 12);

Since write knows nothing about newlines, this will be bundled up in one
packet and both lines will probably be delivered in the same read() call
(which also knows nothing about newlines) by your OS. This means if you
are looping for the output in, say, a while loop...

  while true do
    status, my_line = sd:receive_lines(1)
    -- my_line will probably be "hello\nworld\n" NOT "hello\n".
    -- If we process just hello we would miss world!
  end

Or even more insidiously, if the packet got split in the middle and you
had "hello\nwo" delivered. Unless you store that "wo" for the next call
you will be working with incomplete or wrong data.
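
To make the split-packet case concrete, here is a small simulation in
plain Lua. The chunks table is an assumption standing in for what two
successive read() calls might return; no socket is involved.

```lua
-- Simulated read() results: the second line is split across two reads.
local chunks = { "hello\nwo", "rld\n" }

-- Naive approach: treat whatever each read returns as complete lines.
local naive = {}
for _, chunk in ipairs(chunks) do
  for line in string.gmatch(chunk, "[^\n]+") do
    naive[#naive + 1] = line
  end
end

-- Carry-over approach: keep the unterminated tail for the next read.
local lines, pending = {}, ""
for _, chunk in ipairs(chunks) do
  pending = pending .. chunk
  while true do
    local i = string.find(pending, "\n", 1, true)  -- plain-text search
    if not i then break end
    lines[#lines + 1] = string.sub(pending, 1, i - 1)
    pending = string.sub(pending, i + 1)
  end
end

print(table.concat(naive, "|"))  -- hello|wo|rld
print(table.concat(lines, "|"))  -- hello|world
```

The naive loop reports "wo" and "rld" as if they were two lines; the
second loop only emits a line once its newline has actually arrived.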


The way some NSE scripts deal with this (see showHTTPVersion.nse) is
by keeping a string "response" and appending all data to the end of
that and then running regexps on the response at every step to see if
any match. This method will work fine for some tasks.
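
The accumulate-and-match approach might look roughly like this. It is an
illustration only, not the actual code from showHTTPVersion.nse: the
chunks are made up, and a Lua pattern is used in place of a PCRE regexp
so the sketch runs without NSE.

```lua
-- Simulated reads of an HTTP response whose Server header is split in two.
local chunks = { "HTTP/1.1 200 OK\r\nSer", "ver: Apache\r\n", "\r\n" }

local response, server = "", nil
for _, chunk in ipairs(chunks) do
  response = response .. chunk
  -- Re-run the pattern on everything received so far. Requiring the
  -- trailing newline ensures the header value has arrived completely.
  server = string.match(response, "\nServer: ([^\r\n]+)\r?\n")
  if server then break end
end

print(server)  -- Apache
```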

But if you want to reliably process data line-by-line as it arrives you
need to use something called a "buffer". The most straightforward way to
implement this in modern languages is by using a closure. Although I
personally find Lua syntax very cumbersome and verbose, Lua does offer a
powerful set of primitives that are, in my opinion, vital to and sufficient
for productive programming: lexical closures, tail-call optimisation, and
dynamic typing.

If the concept of closures frightens you, you can probably get away with
thinking about them like objects: a closure is sort of an object with exactly
one method: "apply". ;)
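
For anyone new to closures, a minimal example in plain Lua (make_counter
is a made-up name for illustration):

```lua
-- make_counter returns a closure; `count` is private state that only the
-- returned function can see, much like a field on a one-method object.
local function make_counter()
  local count = 0
  return function()
    count = count + 1
    return count
  end
end

-- Each call to make_counter creates an independent `count`.
local a, b = make_counter(), make_counter()
a(); a()
print(a(), b())  -- 3   1
```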

I'm including a fairly general closure-based buffer implementation that I
am using in my IRC script to process data on a line-by-line basis. Assuming
you have a socket sd you use it like so:

my_buffer = make_buffer(sd, "[\r\n]+")

and then

status, value = my_buffer()

status and value are the same as for receive_lines(1) (except see the comments).

As you can see, it is useful for much more than just lines (anything separated
by something you can write a Lua pattern for). Barring any so-far unnoticed bugs,
this should be a very safe, reliable way to parse line-based protocols and
I suggest we put it (or something like it) into the NSE standard library.
Empty lines currently aren't returned, which could be a problem for some protocols
(like HTTP), but this is a tiny tweak.

Best,

Doug

PS. It has just come to me that maybe the best pattern to use for regular
newlines might be "\r?\n" instead of "[\r\n]+"! Oh well. :)
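
The difference between the two patterns can be checked with string.find
alone. The split helper below is hypothetical, written just for this
comparison, and the sample data is made up.

```lua
local data = "a\r\n\r\nb\r\n"

-- "[\r\n]+" swallows the whole run of CR/LF characters in one match...
print(string.find(data, "[\r\n]+"))  -- 2   5
-- ...while "\r?\n" matches a single line ending at a time.
print(string.find(data, "\r?\n"))    -- 2   3

-- Hypothetical helper: collect every field that is terminated by sep.
local function split(s, sep)
  local out, pos = {}, 1
  while true do
    local i, j = string.find(s, sep, pos)
    if not i then break end
    out[#out + 1] = string.sub(s, pos, i - 1)
    pos = j + 1
  end
  return out
end

print(#split(data, "[\r\n]+"))  -- 2 (the empty line is lost)
print(#split(data, "\r?\n"))    -- 3 (the empty line survives)
```

So "\r?\n" is the pattern to prefer whenever a blank line carries
meaning, as it does at the end of HTTP headers.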



-- Generic buffer implementation using lexical closures
--
-- Pass make_buffer a socket and a separator lua pattern [1]
--
-- Returns a function bound to your provided socket with behaviour identical
-- to receive_lines() except it will return AT LEAST ONE [2] and AT MOST ONE "line".
-- The data is returned WITHOUT the pattern/newline on the end.
-- Empty "lines" ARE NOT RETURNED.
--
-- [1] Use the pattern "[\r\n]+" for regular newlines
-- [2] Except where there is trailing "left over" data not terminated by a pattern
--     (in which case you get the data anyways)
--
-- -Doug, June, 2007

make_buffer = function(sd, sep)
  local self, result
  local buf = ""

  self = function()
    local i, j, status, value

    i, j = string.find(buf, sep)

    if i then
      if i == 1 then  -- empty line
        buf = string.sub(buf, j+1, -1)
        return self() -- tail
      else
        value = string.sub(buf, 1, i-1)
        buf = string.sub(buf, j+1, -1)
        return true, value
      end
    end

    if result then
      if string.len(buf) > 0 then  -- left over data with no terminating pattern
        value = buf
        buf = ""
        return true, value
      end
      return nil, result
    end

    status, value = sd:receive()

    if status then
      buf = buf .. value
    else
      result = value
    end

    return self() -- tail
  end

  return self
end
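
One way to exercise this kind of buffer without a live connection is a
mock socket. The sketch below is self-contained, so it carries its own
compact variant of the buffer, one that keeps empty lines and is used
with "\r?\n". The names make_mock_socket and make_line_buffer, the canned
chunks and the "EOF" error string are all assumptions for illustration;
only the sd:receive() call shape is taken from the code above.

```lua
-- Hypothetical stand-in for an NSE socket: hands out canned chunks, then
-- reports an error string the way a closed connection might.
local function make_mock_socket(chunks)
  local n = 0
  return {
    receive = function(self)
      n = n + 1
      if chunks[n] then return true, chunks[n] end
      return nil, "EOF"
    end
  }
end

-- Variant of the buffer that keeps empty lines (no i == 1 special case).
-- sep must never match the empty string.
local function make_line_buffer(sd, sep)
  local buf, err = "", nil
  local function next_line()
    local i, j = string.find(buf, sep)
    if i then
      local value = string.sub(buf, 1, i - 1)
      buf = string.sub(buf, j + 1)
      return true, value
    end
    if err then
      if #buf > 0 then            -- unterminated trailing data
        local value = buf
        buf = ""
        return true, value
      end
      return nil, err
    end
    local status, value = sd:receive()
    if status then buf = buf .. value else err = value end
    return next_line()            -- tail call
  end
  return next_line
end

-- The first line ending is split across two reads, and so is the tail.
local sd = make_mock_socket({ "GET / HTTP/1.0\r", "\nHost: x\r\n\r\nbo", "dy" })
local next_line = make_line_buffer(sd, "\r?\n")
local got = {}
while true do
  local status, value = next_line()
  if not status then break end
  got[#got + 1] = value
end
print(table.concat(got, "|"))  -- GET / HTTP/1.0|Host: x||body
```

The empty field between "Host: x" and "body" is the blank line that a
"[\r\n]+" separator would have swallowed.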


