Nmap Development mailing list archives
[NSE] Patch to NSE Nsock library binding to eliminate dependency on GC for freeing of socket locks
From: Patrick Donnelly <batrick () batbytes com>
Date: Mon, 8 Jun 2009 05:00:48 -0600
The NSE Nsock library binding allocates --max-parallelism or 10 (whichever is higher) socket "locks" for threads. A thread that obtains a socket lock is able to connect as many sockets as it needs. This patch replaces the CPU-hungry implementation that enforces this limit.

A little background: NSE currently uses the Lua garbage collector, together with weak tables, to track the number of connected sockets each thread has open. Specifically, two tables associate sockets and threads with a proxy userdata. The comment documenting these tables should prove enlightening:

/* Some constants used for enforcing a limit on the number of open sockets
 * in use by threads. The maximum value between MAX_PARALLELISM and
 * o.maxparallelism is the max # of threads that can have connected sockets
 * (open). THREAD_PROXY, SOCKET_PROXY, and CONNECT_WAITING are tables in the
 * nsock C functions' environment, at LUA_ENVIRONINDEX, that hold sockets and
 * threads used to enforce this. THREAD_PROXY has <Thread, Userdata> pairs
 * that associate a thread to a proxy userdata. This table has weak keys and
 * values so threads and the proxy itself can be collected. SOCKET_PROXY
 * has <Socket, Userdata> pairs that associate a socket to a proxy userdata.
 * SOCKET_PROXY has weak keys (to allow the collection of sockets) and strong
 * values, so the proxies are not collected when an associated socket is open.
 *
 * All the sockets used by a thread have the same Proxy Userdata. When all
 * sockets in use by a thread are closed or collected, the entry in the
 * THREAD_PROXY table is cleared, freeing up a slot for another thread
 * to make connections. When a slot is freed, proxy_gc is called, via the
 * userdata's __gc metamethod, which will add a thread in WAITING to running.
 */
#define MAX_PARALLELISM 10

#define THREAD_PROXY    1 /* <Thread, Userdata> */
#define SOCKET_PROXY    2 /* <Socket, Userdata> */
#define CONNECT_WAITING 3 /* Threads waiting to lock */
#define PROXY_META      4 /* Proxy userdata's metatable */

Perhaps the only point of confusion is that the size of the THREAD_PROXY table (the number of elements in the table, not the size of its array part) indicates the number of socket locks allocated.

This approach has a significant problem: it requires NSE to constantly run full garbage collections in order to release these socket locks in a timely manner. The association between a thread and its proxy userdata in THREAD_PROXY persists for at least one garbage collection cycle even after all of the thread's socket/proxy associations in SOCKET_PROXY have been removed. The result is an exceptionally large amount of CPU time consumed by garbage collection.

I have written a small bash script which creates one thousand copies of the banner.nse script in a specified directory. You can run the following to observe the problem:

svn co --username guest --password "" svn://svn.insecure.org/nmap@13614 nse-nsock
cd nse-nsock
./configure && make
mkdir lots_scripts
# copy lots_scripts.sh to $PWD
chmod u+x lots_scripts.sh
./lots_scripts.sh lots_scripts/
time ./nmap -v -d --datadir . --max-parallelism 50 -PN -n --script lots_scripts scanme.insecure.org | tee OUTPUT
# wait about 4 minutes for completion...
# [...] lots of output

real    3m57.841s
user    1m28.352s
sys     0m0.606s

Note that about one minute and 30 seconds of CPU time was used in the scan. Now apply the patch, compile, and test again:

# copy patch to $PWD
patch < nsock_minus_gc.patch
make
time ./nmap -v -d --datadir . --max-parallelism 50 -PN -n --script lots_scripts scanme.insecure.org | tee OUTPUT2
# wait about 4 minutes for completion...
# [...] lots of output

real    3m41.764s
user    0m2.708s
sys     0m0.400s

Note that the user time has been reduced to about 3 seconds.
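For readers unfamiliar with finalizer-driven bookkeeping, here is a hedged, pure-Lua sketch of the mechanism described above. The real code is C using the Lua C API; the names here (new_gc_sentinel, associate) are illustrative, newproxy is the Lua 5.1 trick for getting a __gc-capable value (tables support __gc natively from 5.2 on), and the commented-out nse_restore stands in for NSE's scheduler call:

```lua
-- Weak tables mirroring the C environment tables described above.
local THREAD_PROXY    = setmetatable({}, {__mode = "kv"}) -- <Thread, Proxy>
local SOCKET_PROXY    = setmetatable({}, {__mode = "k"})  -- <Socket, Proxy>, strong values
local CONNECT_WAITING = setmetatable({}, {__mode = "k"})  -- threads parked waiting for a slot

-- Create a value whose __gc metamethod fires once it becomes garbage.
local function new_gc_sentinel (finalizer)
  if newproxy then -- Lua 5.1
    local p = newproxy(true)
    getmetatable(p).__gc = finalizer
    return p
  else             -- Lua 5.2+: tables may have finalizers
    return setmetatable({}, {__gc = finalizer})
  end
end

-- proxy_gc: a slot was freed; wake one thread parked in CONNECT_WAITING.
local function proxy_gc ()
  local thread = next(CONNECT_WAITING)
  if thread then
    CONNECT_WAITING[thread] = nil
    -- nse_restore(thread) -- in NSE: move it back to the running queue
  end
end

-- Called when `thread` connects `socket`.  All of a thread's sockets share
-- one proxy.  The proxy dies (freeing the slot) only after the last socket
-- is collected AND a full GC cycle has run -- which is why NSE must run
-- full collections constantly under this scheme.
local function associate (thread, socket)
  local proxy = THREAD_PROXY[thread] or new_gc_sentinel(proxy_gc)
  THREAD_PROXY[thread] = proxy
  SOCKET_PROXY[socket] = proxy
end

return { associate = associate,
         THREAD_PROXY = THREAD_PROXY,
         SOCKET_PROXY = SOCKET_PROXY,
         CONNECT_WAITING = CONNECT_WAITING }
```

Note the delayed release: even after all of a thread's sockets are gone, the waiting thread is not woken until the collector actually runs the proxy's finalizer.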
This is a significant decrease!

Now I'll talk a little about the patch. We no longer use a proxy userdata to monitor the sockets a thread has open. Instead, we associate a table of sockets with each thread. The new comment should describe it adequately:

/* Some constants used for enforcing a limit on the number of open sockets
 * in use by threads. The maximum value between MAX_PARALLELISM and
 * o.maxparallelism is the max # of threads that can have connected sockets
 * (open).
 *
 * THREAD_SOCKETS is a weak keyed table of <Thread, Socket Table> pairs.
 * A socket table is a weak keyed table (socket keys with garbage values) of
 * sockets the Thread has allocated but not necessarily open. You may
 * test for an open socket by checking whether its nsiod field in the
 * socket userdata structure is not NULL.
 *
 * CONNECT_WAITING is a weak keyed table of <Thread, Garbage Value> pairs.
 * The table contains threads waiting to make a socket connection.
 */
#define MAX_PARALLELISM 10

#define THREAD_SOCKETS  1 /* <Thread, Table of Sockets (keys)> */
#define CONNECT_WAITING 2 /* Threads waiting to lock */

The THREAD_SOCKETS table is very similar to the old THREAD_PROXY table in that its maximum size (total elements, not just the array part) is the number of socket locks we allow. Here is an example of its structure (note that sockets are the *keys* of each socket table, as the comment says, so the weak-key mode can collect them):

THREAD_SOCKETS = setmetatable({
  [thread_1] = setmetatable({
    [socket_1] = true,
    [socket_2] = true,
  }, {__mode = "k"}),
  [thread_2] = setmetatable({
    [socket_3] = true,
    [socket_4] = true,
    [socket_5] = true,
  }, {__mode = "k"}),
  -- ...
  [thread_n] = setmetatable({
    [socket_n] = true,
  }, {__mode = "k"}),
}, {__mode = "k"})

where n in thread_n is the number of threads we allow to have sockets open. As stated before, this value is the maximum of --max-parallelism and 10. Observe that both sockets and threads may still be collected. This is necessary so we do not prevent garbage collection of threads or sockets no longer in use.
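To make the locking protocol concrete, here is a hedged Lua sketch of how a thread might obtain a socket lock under the new scheme. The real code is C; acquire_lock, table_size, and the boolean return convention are stand-ins I've invented for illustration (in NSE, a false result would correspond to parking the thread and yielding until nsock_loop restores it):

```lua
local MAX_PARALLELISM = 10
local THREAD_SOCKETS  = setmetatable({}, {__mode = "k"}) -- <Thread, Socket Table>
local CONNECT_WAITING = setmetatable({}, {__mode = "k"}) -- parked threads

-- Count total elements in a table, not just the array part.
local function table_size (t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n
end

-- Called on connect: grant the thread a slot (or reuse the one it already
-- holds); otherwise park it in CONNECT_WAITING until a slot frees up.
local function acquire_lock (thread, socket, max_parallelism)
  local max = math.max(MAX_PARALLELISM, max_parallelism or 0)
  local sockets = THREAD_SOCKETS[thread]
  if sockets == nil then
    if table_size(THREAD_SOCKETS) >= max then
      CONNECT_WAITING[thread] = true
      return false -- caller must yield; nsock_loop will restore it later
    end
    sockets = setmetatable({}, {__mode = "k"}) -- socket keys, garbage values
    THREAD_SOCKETS[thread] = sockets
  end
  sockets[socket] = true
  return true
end

return { acquire_lock = acquire_lock,
         THREAD_SOCKETS = THREAD_SOCKETS,
         CONNECT_WAITING = CONNECT_WAITING }
```

A thread that already holds a slot may open any number of additional sockets without consuming further slots, matching the behavior described above.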
During nsock_loop, which is called before the threads in the running queue are run, we now check the THREAD_SOCKETS table for threads that no longer have any open sockets. If such a thread is found, we remove it from THREAD_SOCKETS and restore a thread from the waiting queue, allowing it to take the freed slot. Here is (short!) example Lua code implementing this logic (with a guard added for the case where no thread is waiting):

function nsock_loop (n)
  collectgarbage "stop"; -- do not collect threads during traversal
  for thread, socket_table in pairs(THREAD_SOCKETS) do
    local open = 0; -- number of open sockets
    for socket in pairs(socket_table) do
      if socket_is_open(socket) then open = open + 1 end
    end
    if open == 0 then
      THREAD_SOCKETS[thread] = nil;
      local new_thread = next(CONNECT_WAITING, nil); -- get next waiting thread
      if new_thread then
        nse_restore(new_thread); -- add it to the running queue
        CONNECT_WAITING[new_thread] = nil; -- remove it from the waiting queue
      end
    end
  end
  collectgarbage "restart";
end

So you'll note we now check manually whether a thread has open sockets. (Also know that nsock_loop performs other unrelated tasks as well.) As the benchmarks above showed, the GC-based method of tracking socket allocation accounted for almost the entire difference in user time. With this new approach, we should see a large reduction in Nmap's CPU usage during NSE scans.

Some technical notes:

o This patch changes l_nsock_connect to test immediately whether the connection type is valid. The current implementation sets the socket id (socket->nsiod) before a connection-type error can be raised, which would make it difficult to tell, when checking for open sockets in nsock_loop, that the socket is not actually open.

o We no longer need to "unlock" the socket lock when a connection fails. Because the socket id will be NULL, the next call to nsock_loop will unlock it for us, with no difference in timeliness.
o If Nmap is compiled without OpenSSL and an SSL connection is attempted, nsock now immediately returns false. Before, we allowed the nsiod to be set for such a connection before checking that SSL is available, which would cause nsock_loop to mistake the socket for an open one.

o Threads in the pending queue are moved to the running queue before the threads in the running queue are run. This allows threads just moved to pending by nsock_loop to be run immediately.

Cheers,

-- 
-Patrick Donnelly

"Let all men know thee, but no man know thee thoroughly:
Men freely ford that see the shallows."
- Benjamin Franklin
Attachment:
lots_scripts.sh
Description:
Attachment:
nsock_minus_gc.patch
Description:
_______________________________________________ Sent through the nmap-dev mailing list http://cgi.insecure.org/mailman/listinfo/nmap-dev Archived at http://SecLists.Org