Nmap Development mailing list archives
[NSE] Child Coroutine Patch with Explanation
From: Patrick Donnelly <batrick () batbytes com>
Date: Thu, 4 Jun 2009 17:46:53 -0600
This thread attempts to answer a problem that came up in the NSE IRC meeting. We are attempting to solve the issue of a script using coroutines that make blocking calls on network objects. This e-mail attempts to give a detailed overview of the problem, some potential solutions, and example use cases for a proposed solution.

====

First some terminology:

o A thread (lowercase) is a Lua coroutine that runs script code.

o A Thread (uppercase) is a "class" representing a Script thread and all its data in the script engine (nse_main.lua).

o I refer to the "base thread" as the thread which NSE assigns to the Script when run. This is the coroutine we make when we call Script:new_thread(...).

When we run the base thread, remember that it may create its own coroutines that it may then run. If one of these child coroutines were to make an nsock function call or a blocking call on a mutex, the yield, initiated by NSE, would incorrectly go to the base coroutine and not to NSE itself. Observe here (nse_main.lua):

    for co, thread in pairs(running) do
      current, running[co] = thread, nil;
      cnse.startTimeOutClock(thread.host);
      local s, result = resume(co, unpack(thread.args, 1, thread.args.n));

We resume (start) the coroutine 'co'. We are blocked until this coroutine, the base thread, yields or returns normally.

Now, one wonders why NSE must regain control when the base thread's child coroutine is yielded by NSE. Perhaps we should allow the Script to continue running and resume the child coroutine later. We do want to encourage parallelism, right? Well, we run into the problem of a script not knowing when to resume the child again; that is, when will NSE finish whatever operation caused the child coroutine to yield? The script could unknowingly just resume the child coroutine, only to erroneously cause the completion of whatever NSE function the child coroutine called.
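The misdirected yield can be reproduced in plain Lua without NSE. In this minimal sketch, blocking_call is a hypothetical stand-in for an nsock call, and the top-level code plays the role of NSE's scheduler loop; neither is real NSE code.

```lua
-- Hypothetical stand-in for an NSE blocking call such as socket:connect():
-- it yields and expects the scheduler to resume it with the real results.
local function blocking_call()
  return coroutine.yield("NSE_WAIT")
end

local log = {}

-- The "base thread": it runs blocking_call inside a child coroutine.
local base = coroutine.create(function()
  local child = coroutine.create(function()
    return blocking_call()
  end)
  local ok, value = coroutine.resume(child)
  -- The yield meant for the scheduler stops here, in the base thread.
  log[#log + 1] = "base saw: " .. tostring(value)
  return "base finished"
end)

-- The "scheduler" (NSE's role): it resumes the base thread and expects
-- to regain control as soon as a blocking call yields.
local ok, value = coroutine.resume(base)
log[#log + 1] = "scheduler saw: " .. tostring(value)

for _, line in ipairs(log) do print(line) end
```

The scheduler only gets control back once the base thread has run to completion, and it never sees the "NSE_WAIT" value; the child is left suspended in the middle of its blocking call.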
Consider a slightly modified version of the script I had made in Section III of my original nse-lua post:

<file = "cotest.nse">
author = "patrick"
description = "coroutine test!"
categories = {"default"}

require "shortport"

portrule = shortport.port_or_service(22, "ssh")

function a (host, port)
  local try = nmap.new_try();
  local socket = nmap.new_socket();
  try(socket:connect(host.ip, port.number));
  return "connected!";
end

function action (host, port)
  local co = coroutine.create(a);
  print(coroutine.resume(co, host, port));
  print(coroutine.resume(co, false, "some random error"));
  return "done";
end
</file>

Notice that the action function resumes the child coroutine twice. With the current implementation, we would get this output:

    batrick@batbytes:~/nmap/svn/nmap$ ./nmap --script cotest localhost

    Starting Nmap 4.85BETA9 ( http://nmap.org ) at 2009-06-03 05:31 MDT
    true
    false   some random error
    Interesting ports on localhost (127.0.0.1):
    Not shown: 998 closed ports
    PORT   STATE SERVICE
    22/tcp open  ssh
    |_ cotest: done
    80/tcp open  http

    Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds

As I noted in my original post about this, the problem is very subtle. Note that a C function such as nsock:connect() yields AND returns:

    return lua_yield(L, 0);

What does this mean to us? (Please be familiar with the coroutine resume and yield functions in the Lua manual.) Well, the function will yield its results, which are retrieved by the coroutine which initiated the resume (usually, NSE). Note that in this case, we yield 0 values. The base thread, in the action function, thus prints simply "true" (the first return value of coroutine.resume is the status). Normally, NSE will resume the thread with the return values of nsock:connect(); resuming the thread causes the values passed to resume _to be the return values of nsock:connect()_. However, here the script resumes the child coroutine itself. You'll notice that the script passes two values to the child coroutine, false and "some random error".
This means that these two values will be the return values for nsock:connect() in function 'a'. This will cause the try function to raise an error; the false return from nsock:connect will be interpreted as a failure, and "some random error" will be the error message. Thus you see "false some random error" as the printed return values from coroutine.resume.

So I hope I've illustrated the problem. We have two (AFAICT) options to consider:

o NSE adds any coroutines that it yields to the waiting queue and runs them later. We can return some special value indicating to the script that NSE has yielded its worker coroutine.

o Propagate the yield up to the base thread. NSE may then resume operation and add the base thread to the waiting queue. NSE will later resume the base thread. This base thread will resume the entire chain of threads until the thread calling the NSE function is resumed. See lines 161 to 297 of my patch.

The question is: should NSE just add anything that yields to the waiting queue? If so, any of these coroutines that are yielded will eventually be resumed by NSE, just like any base thread. However, how will a script _know_ when to resume the child coroutine? There are a couple of approaches to doing this; here are two that come to mind: (1) the base thread could use nmap.sleep and check periodically, or (2) the base thread could use a condition variable (currently unimplemented) and wait until the child coroutine signals completion. Both of these come with a "deadlock" problem: if the child coroutine ends (in error) for some reason, the base thread waits indefinitely for some predicate to become true, which never does. Ultimately, allowing child coroutines to be resumed in the background appears as magic from the point of view of the script writer and places a heavy burden on the script writer to manage these child coroutines. I feel this is definitely NOT the way to go.
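For reference, the second option above (propagating the yield up to the base thread) might look roughly like this in plain Lua. This is a sketch only, not the code from the patch: NSE_YIELD, blocking_call and propagating_resume are hypothetical names, and the scheduler is simulated by two direct resume calls.

```lua
local unpack = table.unpack or unpack  -- works on Lua 5.1 and later

-- Hypothetical marker that tags a yield as belonging to the scheduler.
local NSE_YIELD = {}

-- Hypothetical stand-in for a blocking NSE call such as socket:connect().
local function blocking_call()
  return coroutine.yield(NSE_YIELD)
end

-- Collect varargs together with their count so nil values survive.
local function pack(...)
  return {n = select("#", ...), ...}
end

-- Hypothetical replacement for coroutine.resume inside scripts. While the
-- child yields the marker, pass the yield up to our own resumer, then hand
-- whatever we are resumed with back down to the child.
local function propagating_resume(co, ...)
  local r = pack(coroutine.resume(co, ...))
  while r[1] and r[2] == NSE_YIELD do
    r = pack(coroutine.resume(co, coroutine.yield(NSE_YIELD)))
  end
  return unpack(r, 1, r.n)
end

-- The base thread runs its blocking call inside a child coroutine.
local base = coroutine.create(function()
  local child = coroutine.create(function()
    local status, err = blocking_call()
    return status and "connected!" or err
  end)
  local ok, result = propagating_resume(child)
  return result
end)

-- Simulated scheduler: the first resume comes back with the marker ...
local ok, marker = coroutine.resume(base)
local status_after_yield = coroutine.status(base)
-- ... and once the operation completes, its results go to the base thread,
-- which forwards them down to the child.
local ok2, result = coroutine.resume(base, true)
print(status_after_yield, result)
```

The base thread stays suspended while the simulated operation is pending, and the value passed to the second resume reaches blocking_call in the child rather than being consumed by the base thread.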
Instead, I prefer the second option, as it allows NSE to properly assume control once again, as it does generally. Script writers need to make no changes to their scripts, and there is no "checking" (overhead) involved; scripts never notice the yielding of control when making socket operations, and this would still be the case.

Now for an example of why I think propagating the NSE-initiated yield is a good thing to add to NSE. Consider the html spider NSE script. We want to parse a tree of documents and files hosted on a web server. Usually, we will recursively descend as we find more html documents, possibly with links to more files. Here is a model of such a function:

    function parse_html (file_string)
      for link in links(file_string) do
        if link_is_html_file(link) then
          parse_html(http.get(host, port, link));
        else
          -- add the link to the class of objects we are gathering
          -- information on, e.g. jpegs
        end
      end
    end

Here's the problem for a flexible NSE script: how do we suddenly stop parsing once we have determined we are done? We have a complicated chain of function calls to go through. We want to jump back to the start and return our results to NSE (like a longjmp in C). We can accomplish this by using a coroutine to parse the chain of html files. When parse_html, at the end of a long chain of such calls, has determined we are done working on the web server, we yield back to the caller with some arbitrary return status. The action function, or base thread, would then return some engineered results for the user. With the current NSE system, we cannot do this because this script's coroutine cannot make calls on sockets. I believe this is a sufficiently good example of a use case (one I intend to use later when working on the spider script with Joao).

I also thought of a second use case. We have the function socket:receive_buf [1].
This function currently has a complex C implementation that manages the state of the buffer and iteration in data structures kept in the Lua registry. The overhead of tracking the buffer makes the code overly complex. Similar to the producer and consumer problem, which is a classic use case for coroutines, this could be solved in a remarkably simple way in Lua. (This is for illustrative purposes and is untested. It does not handle the case where the delimiter is a function, nor does it allow the buffer to be used in the other socket functions, although this is not too hard to fix.):

    function receive_buf (socket, delimiter, keeppattern)
      local socket_env = debug.getfenv(socket); -- get the socket's environment table
      if not socket_env.receive_buf_reader then
        local function iterate (buf)
          local status, data = socket:receive_bytes(64);
          assert(status, data); -- on failure, data holds the error message
          buf = buf .. data;
          while true do
            local i, j = string.find(buf, delimiter);
            if not i then return iterate(buf) end -- need more data
            if keeppattern then
              coroutine.yield(string.sub(buf, 1, j));
            else
              coroutine.yield(string.sub(buf, 1, i-1));
            end
            buf = string.sub(buf, j+1);
          end
        end
        socket_env.receive_buf_reader = coroutine.create(iterate);
      end
      return coroutine.resume(socket_env.receive_buf_reader, "");
    end

While this might not necessarily be preferable to our current implementation, you should note that it illustrates how these coroutines can be used as producers/consumers by our scripts.

I hope it is now clear why this coroutine fix should be applied.

====

Now for the separate problem of allowing scripts to do multiple socket operations at once. This would be useful to brute-force scripts that need to utilize multiple sockets simultaneously to finish in any reasonable amount of time. I did create a system for this in nse-lua that I believed to be sufficient:

== nse.new_thread(func, ...) ==

A coroutine run manually by a script thread will propagate the yield through the parent back to NSE. To avoid this, you may use nse.new_thread to create a thread which is autonomous from the parent.
The function launches a new thread (coroutine) that is managed by NSE. The extra parameters to new_thread are passed to the function 'func'. All errors are ignored and return values discarded by NSE. (<-- Very important so the new thread is not interpreted as an actual script.)

Here was the implementation, which is very short and clear:

    function _G.nse.new_thread (func, ...)
      assert(type(func) == "function", "function expected");
      local co = create(function(...) func(...) end); -- do not return results
      print_debug(2, "%s spawning new thread (%s).",
          current.parent.info, tostring(co));
      total, pending[co] = total + 1, setmetatable({
        co = co,
        args = {n = select("#", ...), ...},
        host = current.host,
        port = current.port,
        parent = current.parent,
        d = function() end, -- output no debug information
      }, {__index = Thread});
      return co;
    end

This function adds to pending (which is a table of threads ready to be moved to running) a new base thread that NSE will run. The main difference between this new thread and any normal script thread is that all return values (such as the string indicating a result to be placed in the host/port script results) are discarded and errors are ignored (the thread:d() function is a NOP function).

I believe this is the optimal way to allow parallel worker threads to do work with multiple sockets.

====

[1] http://nmap.org/nsedoc/modules/nmap.html#receive_buf

--
-Patrick Donnelly

"Let all men know thee, but no man know thee thoroughly: Men freely ford that see the shallows."

- Benjamin Franklin

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org