Nmap Development mailing list archives

Re: [BUG] NSE/Nsock filehandle exhaustion


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Thu, 30 Aug 2007 20:42:49 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stoiko,

Thanks for your reply.   Your patch worked.  So did Majek's.  I'll follow
up with Majek in a few minutes.  I have included information and comments
below in case you are still interested.

On Thu, 30 Aug 2007 20:35:10 +0200
Stoiko Ivanov <stoiko () xover htu tuwien ac at> wrote:

For both cases a backtrack would be a great help for chasing down the bug
I personally use gdb for debugging, so I can only provide you with
instructions for gdb:


A full stack trace dumps too much information for nmap_main.  Here is a
normal stack trace and an abbreviated full:

(gdb) bt
#0  0x00002ac196fc75e6 in raise () from /lib/libc.so.6
#1  0x00002ac196fc86e0 in abort () from /lib/libc.so.6
#2  0x00002ac196fc129a in __assert_fail () from /lib/libc.so.6
#3  0x000000000047618c in gh_list_remove_elem (list=<value optimized out>,
    elem=<value optimized out>) at gh_list.c:346
#4  0x00000000004734fd in nsock_loop (nsp=0xbeb780, msec_timeout=50)
    at nsock_core.c:913
#5  0x0000000000463d9a in process_mainloop (l=0xbfb620) at nse_main.cc:250
#6  0x0000000000464665 in script_scan (targets=@0x7fff1466fcb0)
    at nse_main.cc:213
#7  0x000000000041a63f in nmap_main (argc=21, argv=0x7fff14672f68)
    at nmap.cc:1755
#8  0x00000000004162bd in main (argc=21, argv=0x7fff14672f68) at main.cc:227


And the cut down full:

(gdb) bt full
#0  0x00002ac196fc75e6 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x00002ac196fc86e0 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00002ac196fc129a in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#3  0x000000000047618c in gh_list_remove_elem (list=<value optimized out>,
    elem=<value optimized out>) at gh_list.c:346
        __PRETTY_FUNCTION__ = "gh_list_remove_elem"
#4  0x00000000004734fd in nsock_loop (nsp=0xbeb780, msec_timeout=50)
    at nsock_core.c:913
        ms = <value optimized out>
        msecs_left = 50
        loopnum = 0
        quitstatus = <value optimized out>
#5  0x0000000000463d9a in process_mainloop (l=0xbfb620) at nse_main.cc:250
        state = <value optimized out>
        unfinished = 2048
        ssr = {id = 0x7fff1466d570 "@µð", output = 0xf4bcf0 "\220µð"}
        progress = {begin = {tv_sec = 1188501298, tv_usec = 625161},
  last_print_test = {tv_sec = 1188501298, tv_usec = 625161}, last_print = {
    tv_sec = 1188501328, tv_usec = 674693},
  scantypestr = 0xef9290 "SCRIPT ENGINE", last_est = {tv_sec = 1186353815,
    tv_usec = -22307}}
        total = 2048
        done = <value optimized out>
#6  0x0000000000464665 in script_scan (targets=@0x7fff1466fcb0)
    at nse_main.cc:213
        key = {static npos = 18446744073709551615,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>>
= {<No data fields>}, <No data fields>}, _M_p = 0xf0aa78 "a.b.55.109"}}
status = 0 torun_threads =
{<std::_List_base<thread_record,std::allocator<thread_record> >> =
{ _M_impl = {<std::allocator<std::_List_node<thread_record> >> =
{<__gnu_cxx::new_allocator<std::_List_node<thread_record> >> = {<No data
fields>}, <No data fields>}, _M_node = {_M_next = 0xb52c40, _M_prev =
fields>0xf0b500}}}, <No data fields>} l = (lua_State *) 0xbfb620


Both of these were generated with:

$ sudo ./nmap --datadir . -sC --script ./scripts/ripeQuery.nse -v -d -n -P
A135,139,445,3389 -p 3389 -T5 --min-hostgroup 2048 -oA nse_sock_crash
a.b.0.0/16


When I run with more debugging, this is the output before the crash.
Run with -d2 -vv --script-trace:

SCRIPT ENGINE: TCP a.b.1.162:57889 > 193.0.0.135:43 | CONNECT
NSOCK (155.9340s) msevent_delete (IOD #1079) (EID #8632)
NSOCK (155.9380s) msevent_new (IOD #1080) (EID #8640)
NSOCK (155.9380s) TCP connection requested to 193.0.0.135:43 (IOD #1080)
EID 8640 NSOCK (155.9390s) nsock_loop() started (timeout=50ms). 1 events
pending NSOCK (155.9390s) wait_for_events
NSOCK (155.9390s) PCAP read_on_nonselect
NSOCK (155.9390s) PCAP END read_on_nonselect
NSOCK (155.9390s) Callback: CONNECT SUCCESS for EID 8640 [193.0.0.135:43]
SCRIPT ENGINE: TCP a.b.1.162:57890 > 193.0.0.135:43 | CONNECT
NSOCK (155.9390s) msevent_delete (IOD #1080) (EID #8640)
nmap: gh_list.c:346: gh_list_remove_elem: Assertion `list->count != 0 ||
(list->first == ((void *)0) && list->last == ((void *)0))' failed. Aborted
(core dumped)


Maybe a solution to this problem would be to put the scripts which
already got their network-event done in the beginning of the list.
Through this scripts, which already started running would be prefered over
those which had no chance to run at all, and would thus finish execution
(and closing their sockets) sooner.
I've tried this and it seems to work better (although I couldn't reproduce
the assertion failure from your second try) - I'll commit the patch in a
second.

This patch fixes the problem.  It is hard to get an exact feel for how many
concurrent NSOCK sockets are being used but watching the total on the
system with:

$ watch -n 1 'sudo sysctl -q fs.file-nr'

suggests that with your patch, up to about 48 NSOCK handles are open at
once.  It used to be that as soon as the NSE loop started, about 150 new
sockets would get opened a second until the assert() failed and crashed.
Now when the NSE loop kicks in, it fluctuates between 20 and 50 total.

Thanks for looking into this and providing a fix.  I'll follow up with
Majek in a minute.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG1yvJqaGPzAsl94IRAsvOAKCFOFH4RvsHVBhZMwPWdunPLiX45ACePfC1
jpAaOb+6+A2dVqZ5zinRCr4=
=X3h6
-----END PGP SIGNATURE-----

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

Current thread: