Nmap Development mailing list archives

SAX versus DOM: The umit nmap xml parsing benchmark


From: "Adriano Monteiro" <py.adriano () gmail com>
Date: Wed, 7 Jun 2006 09:13:45 -0300

Hi folks,

Yesterday I finished Umit's new parser. Umit has been using DOM
to parse the nmap XML output since the beginning. But, as most of you
might already know, DOM may not work well with large XML files, as it
loads the entire file into memory to manipulate it. As Umit is intended
to serve network administrators who frequently scan loads of hosts,
this parsing must be fast. The answer to this problem is (I hope ;-)
SAX parsing. SAX doesn't load the entire XML file into memory to
manipulate it. Instead, it reads the tags as it goes and fires events
that handlers respond to. The difference is shown in the following
benchmark.
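
To make the two approaches concrete, here is a minimal sketch of both
styles using Python's standard library. It is not Umit's actual parser:
the sample document, the HostHandler class and the parse_with_sax /
parse_with_dom helpers are made up for illustration, and the XML is a
heavily trimmed stand-in for real nmap output.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, heavily trimmed stand-in for nmap XML output.
SAMPLE = b"""<?xml version="1.0"?>
<nmaprun>
  <host><address addr="192.0.2.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/></port>
    </ports>
  </host>
  <host><address addr="192.0.2.2" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="80"><state state="open"/></port>
    </ports>
  </host>
</nmaprun>"""


class HostHandler(xml.sax.ContentHandler):
    """Collects hosts one event at a time; no full tree is ever built."""

    def __init__(self):
        super().__init__()
        self.hosts = []
        self._current = None  # host being assembled, if any
        self._port = None     # portid of the <port> we are inside, if any

    def startElement(self, name, attrs):
        if name == "host":
            self._current = {"addr": None, "open_ports": []}
        elif name == "address" and self._current is not None:
            self._current["addr"] = attrs.get("addr")
        elif name == "port":
            self._port = attrs.get("portid")
        elif name == "state" and self._port is not None:
            if attrs.get("state") == "open":
                self._current["open_ports"].append(int(self._port))

    def endElement(self, name):
        if name == "port":
            self._port = None
        elif name == "host":
            self.hosts.append(self._current)
            self._current = None


def parse_with_sax(data):
    handler = HostHandler()
    xml.sax.parseString(data, handler)
    return handler.hosts


def parse_with_dom(data):
    # minidom builds the whole document tree in memory before we query it.
    doc = minidom.parseString(data)
    hosts = []
    for host in doc.getElementsByTagName("host"):
        addr = host.getElementsByTagName("address")[0].getAttribute("addr")
        open_ports = []
        for port in host.getElementsByTagName("port"):
            state = port.getElementsByTagName("state")[0]
            if state.getAttribute("state") == "open":
                open_ports.append(int(port.getAttribute("portid")))
        hosts.append({"addr": addr, "open_ports": open_ports})
    return hosts


sax_hosts = parse_with_sax(SAMPLE)
dom_hosts = parse_with_dom(SAMPLE)
```

Both functions produce the same list of hosts. The SAX version only
keeps the host it is currently assembling, while the DOM version holds
the whole document tree before extracting anything.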

I tested an nmap XML output file with 4000 hosts.
The nmap options used for this scan: "-A -sV -v -v -v -d -d -p80,22"
The XML file size: 5.0M

The machine (from /proc/cpuinfo):

vendor_id       : GenuineIntel
model name      : Mobile Intel(R) Celeron(R) CPU 1.80GHz
cpu MHz         : 1794.364
cache size      : 256 KB
bogomips        : 3595.62

Memory (from "free -m"):

             total       used       free     shared    buffers     cached
Mem:           503        227        275          0          3         77


The benchmark:
I used Python's "timeit" module to measure the execution time.
Each parser was run only once, so the times shown below are for a
single parse of the given file with each parsing method.
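
For anyone who wants to try a similar single-run measurement, here is
a rough sketch with "timeit". It is an assumption-laden stand-in, not
the script used for the numbers below: make_sample, HostCounter,
count_hosts_sax, count_hosts_dom and time_once are hypothetical names,
and the synthetic XML is far simpler than real "-A -sV" output, so
the timings will not match.

```python
import timeit
import xml.sax
from xml.dom import minidom


def make_sample(n_hosts):
    # Synthetic input: n_hosts identical, minimal <host> elements.
    host = '<host><address addr="192.0.2.1" addrtype="ipv4"/></host>'
    return ("<nmaprun>" + host * n_hosts + "</nmaprun>").encode("ascii")


class HostCounter(xml.sax.ContentHandler):
    """Counts <host> start tags as the SAX parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "host":
            self.count += 1


def count_hosts_sax(data):
    handler = HostCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_hosts_dom(data):
    # Builds the full tree first, then searches it.
    return len(minidom.parseString(data).getElementsByTagName("host"))


def time_once(func, data):
    # number=1 mirrors the "each parser was run only once" setup.
    return timeit.timeit(lambda: func(data), number=1)


data = make_sample(4000)
sax_seconds = time_once(count_hosts_sax, data)
dom_seconds = time_once(count_hosts_dom, data)
print("SAX: %.4f seconds" % sax_seconds)
print("DOM: %.4f seconds" % dom_seconds)
```

A single run is noisy, so repeating the measurement a few times (for
example with timeit.repeat) would give a steadier comparison.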

Result:
SAX: 10.8011291027 seconds
DOM: 61.6646518707 seconds

Quite a difference, isn't it? SAX was about 5.7 times faster in this run.
Feel free to send any comments, suggestions, questions, etc. I'm
committing this version to the repository right now, and by Monday
there should (hopefully) be a testing version of UMIT available with
this new parser and some changes to the nmap output display.


Cheeeeeers!

-- 

Adriano Monteiro Marques
http://www.globalred.com.br
http://umit.sourceforge.net
py.adriano () gmail com

"Free software is a matter of liberty not price."
(PYTHON powered)


_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev

