Nmap Development mailing list archives

Re: Review: Angry IP Scanner


From: David Fifield <david () bamsoftware com>
Date: Wed, 20 Aug 2008 12:50:09 -0600

On Wed, Aug 20, 2008 at 05:20:24PM +0000, Brandon Enright wrote:
Hi David.  I think you forgot to CC nmap-dev on this so I'll respond to
the list and keep the whole message intact.  Reply below.

Thanks, you're right.

On Wed, 20 Aug 2008 10:41:14 -0600
David Fifield <david () bamsoftware com> wrote:

On Fri, Jun 06, 2008 at 02:05:52AM +0000, Brandon Enright wrote:
One thing I do like about it is the ability to narrow down the
random IP generation to a given base IP and netmask.  This may
not be easy to implement in Nmap (since a user may or may not
want a reserved IP, and the random IP generation code would have
to be changed to allow for this), and it's probably not worth it.  For
example, if you wanted random IPs in the range of 192.*.*.*, should
the reserved 192.168.0.0/16 addresses be chosen or not?  It's still
pretty cool, though.

This *is* nice and I've thought about implementing it for Nmap, but
every time I give it serious thought I decide that outputting all
the IPs to a list with -sL and then picking a random selection from
that list is a better way to do it.

I'm not aware of any generic algorithm, method, or technique that
could generate numbers from an arbitrary set of ranges without
duplicates while being both fast and memory-efficient.  Creative use
of a Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter) would
work, but that starts to get pretty damn time-consuming to do
right...
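
For illustration, a minimal two-hash Bloom filter for remembering
generated IPs might look like the sketch below (the size and hash
constants are made-up placeholders).  The catch is that false
positives throw away some fresh addresses too, which is part of what
makes doing it right hard:

    #include <stdint.h>

    #define BLOOM_BITS (1UL << 24) /* 2 MB of bits; sizing is a guess */

    static uint8_t bloom[BLOOM_BITS / 8];

    static void bloom_set(uint32_t h) { bloom[(h % BLOOM_BITS) / 8] |= 1 << (h % 8); }
    static int  bloom_get(uint32_t h) { return bloom[(h % BLOOM_BITS) / 8] & (1 << (h % 8)); }

    /* Two cheap hashes of an IPv4 address; real code would want better. */
    static uint32_t h1(uint32_t ip) { return ip * 2654435761u; }
    static uint32_t h2(uint32_t ip) { return (ip ^ 0xdeadbeefu) * 40503u; }

    /* Returns 1 if ip was (probably) generated before, else records it
       so the caller can re-roll and try another random address. */
    static int seen_before(uint32_t ip)
    {
        if (bloom_get(h1(ip)) && bloom_get(h2(ip)))
            return 1;
        bloom_set(h1(ip));
        bloom_set(h2(ip));
        return 0;
    }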

While researching something else I stumbled upon an algorithm for
this. It's Algorithm S in section 3.4.2 of The Art of Computer
Programming, Volume 2 (page 142 in the third edition). To select n
addresses
out of a pool of N, you simply iterate through the addresses and
select each one with probability

    (n - m) / (N - t),

where m is the number of addresses selected so far and t is the number
of addresses examined so far (both initialized to 0).
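
In C the whole thing is just one pass over the pool. A rough sketch
(the printf stands in for whatever you'd do with an accepted address):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Knuth's Algorithm S: select n items from a pool of N, visiting
       each item once and deciding accept/reject on the spot. */
    static void select_sample(unsigned long n, unsigned long N)
    {
        unsigned long m = 0; /* items selected so far */
        unsigned long t;     /* items examined so far */

        for (t = 0; t < N && m < n; t++) {
            /* Accept item t with probability (n - m) / (N - t). */
            double u = rand() / ((double) RAND_MAX + 1.0);
            if (u * (N - t) < (double) (n - m)) {
                printf("%lu\n", t); /* accepted */
                m++;
            }
        }
    }

    int main(void)
    {
        srand((unsigned) time(NULL));
        select_sample(10, 1000); /* 10 addresses out of a pool of 1000 */
        return 0;
    }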

This requires knowing the size of the address pool in advance, which
is no problem for us. Clearly no collisions are possible (unless the
user lists an address more than once). It requires only a constant
amount of memory. A limitation is that while it gives you a random
selection, the addresses will still be in order relative to each
other. This can be mitigated with --randomize-hosts.

If we want to restrict random addresses to a given list of addresses,
this is the way to go. However, just having a good algorithm doesn't
mean the feature should be implemented. Does someone have a
compelling use case? Otherwise I'll just let this post stand in case
the topic comes up again.

David Fifield

Thanks for the follow-up on this.  I love Knuth but the above algorithm
is somewhat simple-minded.  It requires linear memory, which is what we
had with some of our other ideas.  I think we decided that we'd need to
do better than linear memory if we were to actually implement this
feature.

No, it requires constant memory. All you do is call get_next_host the
way Nmap does now and accept or reject each host. The algorithm is O(N)
in time but O(1) in memory. You don't need to enumerate all N possible
addresses, or even all n selected addresses. You do have to know the
number N in advance, but you don't need to allocate N memory locations.

Knuth says: "Usually N is very large, so that it is impossible to
contain all the data in memory at once; and the individual records
themselves are often very large, so that we can't even hold n records in
memory. Therefore we seek an efficient procedure for selecting n records
by deciding either to accept or to reject each record as it comes along,
writing the accepted records onto an output file."

So the algorithm is designed for cases like this. For us, "write the
accepted records onto an output file" would be "queue the accepted
addresses into the next host group".

I did come up with a way to do this but the "algorithm" was somewhat
complicated.  I didn't send it to the list because I'm notoriously
poor at explaining the random crap that pops into my head.

I'll give it a shot anyways:

Our goal here is to be able to use -iR but select IPs out of our own
range.  For example, an organization might want to pick 1000 IPs out of
"10.0.0.0/16".  Someone else might want to try to sample 10,000 router
addresses on the Internet in the range "*.*.*.1,65,129,193".

Knuth suggests that we list out all the possible IPs and then just
select them with the probability described above.  There is a much more
memory-efficient way to do it, though, by exploiting a linear
congruential generator (LCG).

Before I get to that, though, one must be able to map an index number
into the desired range.  That is, pretend that you made an array of
this range: "*.*.*.1,65,129,193".

Then index 0 in that array would be "0.0.0.1".  Index 7 would be
"0.0.1.193".  In general, we can easily compute the size of the
imaginary array and we can easily compute the value of a single element
in that array given an index.  We can do both of these things without
pre-computing the entire array.

So what this does is map the elements of our range to integers starting
at 0 and going up to one less than the size of the range.
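
Here's a rough sketch of that mapping for "*.*.*.1,65,129,193" (the
function names are just for illustration):

    #include <stdio.h>

    static const unsigned char last_octet[] = { 1, 65, 129, 193 };
    #define NUM_LAST (sizeof(last_octet) / sizeof(last_octet[0]))

    /* Size of the imaginary array: 256 * 256 * 256 * 4 entries. */
    static unsigned long range_size(void)
    {
        return 256UL * 256UL * 256UL * NUM_LAST;
    }

    /* Compute element i of the imaginary array without building it.
       The last octet varies fastest, matching the indexes above. */
    static void index_to_addr(unsigned long i, unsigned char addr[4])
    {
        addr[3] = last_octet[i % NUM_LAST]; i /= NUM_LAST;
        addr[2] = i % 256;                  i /= 256;
        addr[1] = i % 256;                  i /= 256;
        addr[0] = i % 256;
    }

    int main(void)
    {
        unsigned char a[4];
        index_to_addr(7, a); /* prints 0.0.1.193 */
        printf("%u.%u.%u.%u\n", a[0], a[1], a[2], a[3]);
        return 0;
    }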

What we need to do now is select integers out of that range without
duplicates.  That turns out to be easy using an LCG
(http://en.wikipedia.org/wiki/Linear_congruential_generator).  The
trouble is that it could be hard to pick constants that will generate a
maximal-period LCG for an arbitrarily sized range.
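
One way around that (just a sketch of the idea, not the only way) is
to round the modulus up to a power of two, where the Hull-Dobell
full-period conditions are trivial to satisfy (c odd, a - 1 divisible
by 4), and throw away outputs that land past the end of the range.
Since the modulus is less than twice the range size, at most about
half the outputs are wasted:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t next_pow2(uint64_t v)
    {
        uint64_t p = 1;
        while (p < v)
            p <<= 1;
        return p;
    }

    /* Visit every index in [0, size) exactly once, in LCG order. */
    static void lcg_visit_all(uint64_t size)
    {
        uint64_t m = next_pow2(size); /* modulus: power of two >= size */
        uint64_t a = 1664525;         /* a % 4 == 1, so full period */
        uint64_t c = 1013904223;      /* c odd */
        uint64_t x = 0, i;

        for (i = 0; i < m; i++) {
            x = (a * x + c) & (m - 1); /* mod m; m is a power of two */
            if (x < size)
                printf("%llu\n", (unsigned long long) x);
        }
    }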

That's a pretty groovy solution. It's good in that the hosts come out in
pseudorandom order. If there are many or complicated expressions, then
each iteration of the algorithm will be slower as it finds the address
corresponding to an index. But you only have to iterate about 2n times
on average, not N times as with Algorithm S. My gut is that S is faster
if n is big (close to N) and your algorithm is faster if n is small (far
from N). I think there are optimizations that can be applied to
Algorithm S such that it skips over chunks of records at a time when the
probability is low.
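
One such optimization is essentially Vitter's Algorithm A: instead of
testing every record, draw the size of the gap until the next selected
record directly. A rough sketch (mine, not Nmap code):

    #include <stdlib.h>

    static double uniform01(void)
    {
        return rand() / ((double) RAND_MAX + 1.0);
    }

    /* How many records to skip before selecting the next one, given
       that n_left selections remain out of N_left records. */
    static unsigned long next_skip(unsigned long n_left, unsigned long N_left)
    {
        double v = uniform01();
        double quot = (double) (N_left - n_left) / (double) N_left;
        unsigned long s = 0;

        while (quot > v) {
            s++;
            quot *= (double) (N_left - n_left - s) / (double) (N_left - s);
        }
        return s; /* skip s records, then select the (s + 1)-th */
    }

When n is small relative to N the gaps are long, so this does far
fewer probability tests than one test per record.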

David Fifield

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

