Nmap Development mailing list archives

Re: [GSOC] Nmap exclude list implementation


From: sefa saygın <sefa_ts61_ () hotmail com>
Date: Thu, 5 Mar 2015 17:21:31 +0200

Please don't sent e-mail !! Please

iPhone'umdan gönderildi

5 Mar 2015 tarihinde 07:22 saatinde, Daniel Miller <bonsaiviking () gmail com> şunları yazdı:

Harshil,

While it is likely that a trie would be a suitable data structure, we need to be able to match IP addresses and 
ranges. The current approach treats an address as a bit vector, which allows this granularity (think about matching 
192.0.2.128/25 for instance, or 172.17.2-7.123). We also need to manage space complexity; in your test example, 
ideally the 2^16 contiguous addresses you entered in the exclude list would collapse into a single node representing 
all addresses with the shared prefix of 172.16.

One final note: please credit the original authors of code you use 
(http://programminggeeks.com/c-code-for-radix-tree-or-trie/); I understand that this was a trial and that you would 
write your own data structure for an actual implementation, but proper code attribution and licensing is a serious 
issue.

Dan

On Wed, Mar 4, 2015 at 6:49 PM, Harshil Lodhi <lodhi.harshil () gmail com> wrote:
Just to convince myself that a simple TRIE would be good enough for the implementation of addrset. I hacked the 
addrset code. I have attached the diff and the modified files. The current pre-prototype implementation works only 
for exclude list in which IPs are mentioned in x.x.x.x format. I have created a TRIE of strings( IP address in form 
of strings) . The code can be compiled sucessfully without any modifications to any makefile. 

I got the time down from 23 seconds to 0.2 seconds = 100x FAST . (This involves no scanning)

Waiting for your comments. 

On Thu, Mar 5, 2015 at 4:45 AM Harshil Lodhi <lodhi.harshil () gmail com> wrote:
Some of my findings while profiling nmap. 

I created a exclude.list which contained all the 172.16.0.0/16 IPs i.e 256*256 IPs . 

I started nmap -T4 --excludefile exclude.list 172.16.-.-
It took around 23 seconds to complete and no targets were found.
I profiled the code using callgrind for 172.16.27.- . You can see the attached output using text editor or 
Kcachegrind (GUI tool) . 
Most of the time( 90% ) is spent in addrset_contains function in nbase_addrset.c file. It is a linear time lookup.

On Thu, Mar 5, 2015 at 3:26 AM Harshil Lodhi <lodhi.harshil () gmail com> wrote:
I am starting to profile nmap for exclude list implementation. I wanted to know does the order in which the 
addresses are stored in the exlude list file ,matter in the lookup. I mean if they are sorted, does it speed up 
the process. Does anyone know after what size of the exclude list, does it start to slow down.

On Wed, Mar 4, 2015 at 3:27 AM Harshil Lodhi <lodhi.harshil () gmail com> wrote:
Hi Jacek, 
Definitely. Performance measurement is a must before optimizing random parts. I always remember Donald Knuth's 
quote  "Premature optimization is the root of all evil". From the experience that I have in programming in C/C++, 
these advanced data structures are useful only when the size of data becomes big. A normal array/ linkedlist 
would be fast for a small sized list but when the size becomes big, the asymptotic nature starts to dominate. I 
first plan to profile the nmap using gprof to see the exact part of the code that is taking time and then can 
decide on the optimizations.
The problem is mentioned in the gsoc page of Nmap (http://nmap.org/soc/) . I discussed its implementation with 
"bonsaiviking" on irc and came to this conclusion. I first plan to profile nmap to see.
Also I would love to work on more optimizations for Nmap apart from this if need exists. 



On Wed, Mar 4, 2015 at 3:08 AM Jacek Wielemborek <d33tah () gmail com> wrote:
W dniu 03.03.2015 o 20:00, Harshil Lodhi pisze:
Hi everyone,

I was going through the GSOC ideas "Performance/Optimization Specialist"
one. In their its mentioned that current implementation of exclude list is
not a good one. It takes order of the size of the list. David particularly
mentioned about implementing the lookup using BDD.


The link for the paper doesn't seem to work for me. I searched for the
topic and read about it. One of the alternatives that we can consider is
Patricia/Radix trees. They seemed to be more widely used than BDDs.
Following is a highly cited research paper for the same.
http://ece.ut.ac.ir/classpages/F83/Advanced%20Computer%20Networks/PAPERS/LOOKUP/routing.pdf

I am very much interested in this position and in the script developer
position. I have been using the proxy module of Nmap for the past 3 years
to find working proxies inside my university campus during nighttime when
the main internet is shut down.

Waiting for your feedback on this.



_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/


Hello,

I like this idea, but just to make sure - how did you pick this
particular problem? I'm asking to make sure you plan to make some
performance measurements before you start optimising random parts of
Nmap functionality.

Cheers,
Jacek

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/
_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/
_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

Current thread: