Nmap Development mailing list archives

[NSE] Patch to NSE Nsock library binding to eliminate dependency on GC for freeing of socket locks


From: Patrick Donnelly <batrick () batbytes com>
Date: Mon, 8 Jun 2009 05:00:48 -0600

The NSE Nsock library binding allocates --max-parallelism or 10 (whichever
is higher) socket "locks" for threads. A thread that obtains a socket
lock may connect as many sockets as it needs. This patch replaces the
CPU-hungry implementation that currently enforces this limit.

A little background: currently NSE uses the Lua garbage collector to
track the number of connected sockets that have been allocated. We use
the garbage collector with weak tables to monitor the number of
sockets each thread currently has open. Specifically, two tables
associate sockets and threads with a proxy userdata. The comment
documenting these tables should prove enlightening:

/* Some constants used for enforcing a limit on the number of open sockets
 * in use by threads. The maximum value between MAX_PARALLELISM and
 * o.maxparallelism is the max # of threads that can have connected sockets
 * (open). THREAD_PROXY, SOCKET_PROXY, and CONNECT_WAITING are tables in the
 * nsock C functions' environment, at LUA_ENVIRONINDEX, that hold sockets and
 * threads used to enforce this. THREAD_PROXY has <Thread, Userdata> pairs
 * that associate a thread to a proxy userdata. This table has weak keys and
 * values so threads and the proxy itself can be collected. SOCKET_PROXY
 * has <Socket, Userdata> pairs that associate a socket to a proxy userdata.
 * SOCKET_PROXY has weak keys (to allow the collection of sockets) and strong
 * values, so the proxies are not collected when an associated socket is open.
 *
 * All the sockets used by a thread have the same Proxy Userdata. When all
 * sockets in use by a thread are closed or collected, the entry in the
 * THREAD_PROXY table is cleared, freeing up a slot for another thread
 * to make connections. When a slot is freed, proxy_gc is called, via the
 * userdata's __gc metamethod, which will add a thread in WAITING to running.
 */
#define MAX_PARALLELISM   10
#define THREAD_PROXY       1           /* <Thread, Userdata> */
#define SOCKET_PROXY       2           /* <Socket, Userdata> */
#define CONNECT_WAITING    3           /* Threads waiting to lock */
#define PROXY_META         4           /* Proxy userdata's metatable */

Perhaps the only point of confusion is that the size of the
THREAD_PROXY table (the number of elements in the table, not the size
of its array part) indicates the number of socket locks currently
allocated.
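
To make the current mechanism concrete, here is a rough Lua 5.1 sketch
of the scheme the comment describes. This is illustrative pseudo-code of
mine, not the actual C binding: newproxy stands in for the proxy
userdata allocation, and lock_socket for the bookkeeping done on connect.

-- Illustrative sketch of the current GC-based scheme (not the real C code).
THREAD_PROXY = setmetatable({}, {__mode = "kv"}) -- weak keys and values
SOCKET_PROXY = setmetatable({}, {__mode = "k"})  -- weak keys, strong values

local function proxy_gc (proxy)
  -- Runs when a thread's proxy is collected, i.e. only after all of its
  -- sockets are closed or collected; wakes a thread in CONNECT_WAITING.
end

local function lock_socket (thread, socket)
  local proxy = THREAD_PROXY[thread]
  if proxy == nil then
    proxy = newproxy(true) -- Lua 5.1: zero-size userdata with a metatable
    getmetatable(proxy).__gc = proxy_gc
    THREAD_PROXY[thread] = proxy -- one proxy shared by all the thread's sockets
  end
  SOCKET_PROXY[socket] = proxy -- strong value pins the proxy while the socket lives
end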

This approach has a significant problem: it requires NSE to constantly
run full garbage-collection cycles to release these socket locks in a
timely manner. The association between a thread and its proxy userdata
in THREAD_PROXY persists for at least one garbage-collection cycle even
after all of the thread's socket proxy associations in SOCKET_PROXY
have been cleared. The result is an exceptionally large amount of CPU
time consumed by garbage collection. I have written a small bash script
(attached as lots_scripts.sh) which creates one thousand copies of the
banner.nse script in a specified directory. You can reproduce the
problem as follows:

svn co --username guest --password "" \
    svn://svn.insecure.org/nmap@13614 nse-nsock
cd nse-nsock
./configure && make
mkdir lots_scripts
# copy lots_scripts.sh to $PWD
chmod u+x lots_scripts.sh
./lots_scripts.sh lots_scripts/
time ./nmap -v -d --datadir . --max-parallelism 50 -PN -n \
    --script lots_scripts scanme.insecure.org | tee OUTPUT
# wait about 4 minutes for completion...
# [...] lots of output

real    3m57.841s
user    1m28.352s
sys     0m0.606s

Note that about one minute and 30 seconds of CPU time was used in the scan.

Now apply the patch, compile, and test again:

# copy patch to $PWD
patch < nsock_minus_gc.patch
make
time ./nmap -v -d --datadir . --max-parallelism 50 -PN -n \
    --script lots_scripts scanme.insecure.org | tee OUTPUT2
# wait about 4 minutes for completion...
# [...]  lots of output

real    3m41.764s
user    0m2.708s
sys     0m0.400s

Note that the user time has been reduced to about 3 seconds. This is a
significant decrease!

Now a little bit about the patch itself. We no longer use a proxy
userdata to monitor the sockets a thread has open. Instead, we
associate a table of sockets with each thread. This new comment should
describe it adequately:

/* Some constants used for enforcing a limit on the number of open sockets
 * in use by threads. The maximum value between MAX_PARALLELISM and
 * o.maxparallelism is the max # of threads that can have connected sockets
 * (open).
 *
 * THREAD_SOCKETS is a weak keyed table of <Thread, Socket Table> pairs.
 * A socket table is a weak keyed table (socket keys with garbage values) of
 * sockets the Thread has allocated but not necessarily open. You may
 * test for an open socket by checking whether its nsiod field in the
 * socket userdata structure is not NULL.
 *
 * CONNECT_WAITING is a weak keyed table of <Thread, Garbage Value> pairs.
 * The table contains threads waiting to make a socket connection.
 */
#define MAX_PARALLELISM   10
#define THREAD_SOCKETS     1           /* <Thread, Table of Sockets (keys)> */
#define CONNECT_WAITING    2           /* Threads waiting to lock */

The THREAD_SOCKETS table is very similar to the old THREAD_PROXY table
in that its maximum size (total elements, not just the array part) is
the number of socket locks we allow. Here is an example of its
structure:

THREAD_SOCKETS = setmetatable({
  [thread_1] = setmetatable({
    [socket_1] = true,
    [socket_2] = true,
  }, {__mode = "k"}),
  [thread_2] = setmetatable({
    [socket_3] = true,
    [socket_4] = true,
    [socket_5] = true,
  }, {__mode = "k"}),
  -- ...
  [thread_n] = setmetatable({
    [socket_n] = true,
  }, {__mode = "k"}),
}, {__mode = "k"})

Here n in thread_n is the number of threads we allow to have open
sockets; as stated before, this value is the maximum of
--max-parallelism and 10. Note that sockets are stored as keys (with
throwaway values, per the comment above), and that both the outer table
and each socket table have weak keys, so we never prevent garbage
collection of threads or sockets that are no longer used.
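
As a quick standalone illustration of why those weak keys matter (this
snippet is mine, not from the patch), an entry disappears as soon as
its socket becomes garbage:

local t = setmetatable({}, {__mode = "k"}) -- weak keys, like a socket table
local socket = {}        -- stand-in for a socket userdata
t[socket] = true
socket = nil             -- drop the last strong reference to the socket
collectgarbage "collect" -- a full cycle clears weak entries
assert(next(t) == nil)   -- the socket's entry vanished with it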

During nsock_loop, which is called before the threads in the running
queue are run, we now check the THREAD_SOCKETS table for threads that
no longer have open sockets. If a thread is found to have no open
sockets, we remove it from the THREAD_SOCKETS table and restore a
thread from the waiting queue, allowing it to take the freed slot. Here
is (short!) example Lua code that implements this:

function nsock_loop ()
  collectgarbage "stop"; -- do not collect threads or sockets during traversal
  for thread, socket_table in pairs(THREAD_SOCKETS) do
    local open = 0; -- number of sockets this thread has open
    for socket in pairs(socket_table) do
      if socket_is_open(socket) then open = open + 1 end
    end
    if open == 0 then
      THREAD_SOCKETS[thread] = nil; -- free this thread's socket lock
      local new_thread = next(CONNECT_WAITING); -- next waiting thread, if any
      if new_thread then
        nse_restore(new_thread); -- wake it (it will be queued to run)
        CONNECT_WAITING[new_thread] = nil; -- remove from the waiting queue
      end
    end
  end
  collectgarbage "restart";
end
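
For reference, socket_is_open above is a stand-in for the C-side test
described in the new comment: a socket is open if and only if its nsiod
field is not NULL. Rendered in Lua (with the field access as a
hypothetical stand-in for the C structure member), it is simply:

-- Hypothetical Lua rendering of the C-side check described above.
local function socket_is_open (socket)
  return socket.nsiod ~= nil -- nsiod is NULL (nil here) until connected
end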

So note that we now manually check whether a thread has open sockets.
(nsock_loop performs other unrelated tasks as well.) As the benchmarks
above show, the GC-based method of tracking socket allocation accounted
for almost all of the difference in user time. With this new approach,
we should see a large reduction in Nmap's CPU usage during NSE scans.


Some Technical Notes:

o This patch changes l_nsock_connect to immediately test whether the
connection type is valid. The current implementation sets the socket id
(socket->nsiod) before a connection-type error can be raised, which
would make it difficult to know that the socket is not actually open
when checking for open sockets in nsock_loop.

o We no longer need to "unlock" the socket lock when a connection
fails. Because the socket id will be NULL, the next call to nsock_loop
will unlock the socket lock for us, with no difference in timeliness.

o If Nmap is compiled without OpenSSL and an SSL connection is
requested, nsock now immediately returns false. Previously we allowed
the nsiod to be set for a connection before checking that SSL is
available, which would prevent nsock_loop from seeing that the socket
is not actually open.

o Threads in the pending queue are moved to the running queue before
the threads in the running queue are run. This allows threads just
moved to pending by nsock_loop to be run immediately (see the sketch
below).
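
Here is a rough sketch of that ordering. The queue handling is
simplified, and these are stand-ins of mine (the real scheduler lives
in C and nse_main), not NSE's actual code:

-- Illustrative stand-ins, mirroring the names used above.
local pending, running = {}, {}

local function nse_restore (thread)
  pending[#pending + 1] = thread -- woken threads land in the pending queue
end

local function scheduler_step ()
  nsock_loop() -- may call nse_restore for threads waiting on socket locks
  -- Pending drains into running *before* running threads are resumed,
  -- so threads just woken by nsock_loop run in this same round.
  for _, thread in ipairs(pending) do
    running[#running + 1] = thread
  end
  pending = {}
  for _, thread in ipairs(running) do
    coroutine.resume(thread)
  end
  running = {}
end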


Cheers,

-- 
-Patrick Donnelly

"Let all men know thee, but no man know thee thoroughly: Men freely
ford that see the shallows."

- Benjamin Franklin

Attachment: lots_scripts.sh
Attachment: nsock_minus_gc.patch


_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org