Nmap Development mailing list archives
Re: Review: Angry IP Scanner
From: Brandon Enright <bmenrigh () ucsd edu>
Date: Wed, 20 Aug 2008 17:20:24 +0000
Hi David. I think you forgot to CC nmap-dev on this, so I'll respond to the list and keep the whole message intact. Reply below.

On Wed, 20 Aug 2008 10:41:14 -0600 David Fifield <david () bamsoftware com> wrote:
> On Fri, Jun 06, 2008 at 02:05:52AM +0000, Brandon Enright wrote:
> > One thing I do like about it is the ability to narrow down the random IP generation to a given base IP and netmask. This may not be easy to implement in Nmap (since a user may or may not want a reserved IP, and the random IP generation code would have to be changed to allow for this), and probably not worth it. For example, if you wanted random IPs in the range of 192.*.*.*, should 192.168 be chosen or not? It's still pretty cool, though.
> >
> > This *is* nice and I've thought about implementing it for Nmap, but every time I give it serious thought I decide that outputting all the IPs to a list with -sL and then taking a random selection from that list is a better way to do it. I'm not aware of any generic algorithm, method, or technique that could generate numbers in some arbitrary set of ranges without duplicates that is both fast and memory efficient. Creative use of a Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter) would work, but that starts to get pretty damn time-consuming to do right...
>
> While researching something else I stumbled upon an algorithm for this. It's Algorithm S in section 3.4.2 of The Art of Computer Programming (page 142 in the third edition). To select n addresses out of a pool of N, you simply iterate through the addresses and select each one with probability (n - m) / (N - t), where m is the number of addresses selected so far and t is the number of records iterated over (both initialized to 0). This requires knowing the size of the address pool in advance, which is no problem for us. Clearly no collisions are possible (unless the user lists an address more than once). It requires only a constant amount of memory.
>
> A limitation is that while it gives you a random selection, the addresses will still be in order relative to each other. This can be mitigated with --randomize-hosts. If we want to restrict random addresses to a given list of addresses, this is the way to go.
>
> However, just having a good algorithm doesn't mean the feature should be implemented. Does someone have a compelling use case? Otherwise I'll just let this post stand in case the topic comes up again.
>
> David Fifield
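For concreteness, here is a minimal sketch of the selection sampling described above (Knuth's Algorithm S): walk the pool once and keep each record with probability (n - m) / (N - t). This is not Nmap code; the pool size N, the sample size n, and the use of rand() are illustrative assumptions.

/* Minimal sketch of selection sampling (Knuth's Algorithm S), not Nmap code.
 * The "pool" is just the indexes 0..N-1; in practice each selected index
 * would be mapped to a target address. */
#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    const unsigned long N = 65536;  /* size of the address pool, e.g. a /16 */
    const unsigned long n = 1000;   /* how many addresses we want */
    unsigned long m = 0;            /* how many we have selected so far */

    srand((unsigned) time(NULL));
    for (unsigned long t = 0; t < N && m < n; t++) {
        /* Keep record t with probability (n - m) / (N - t). */
        double u = rand() / ((double) RAND_MAX + 1.0);
        if (u * (double) (N - t) < (double) (n - m)) {
            printf("selected index %lu\n", t);
            m++;
        }
    }
    return 0;
}

As David notes, the selected addresses come out in increasing order, which is why --randomize-hosts would still be wanted afterwards.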
Thanks for the followup on this. I love Knuth, but the above algorithm is somewhat simple-minded. It requires a linear pass over the entire address pool, which is about as costly as some of our other ideas. I think that we decided we'd need to do better than that if we were to actually implement this feature. I did come up with a way to do this, but the "algorithm" was somewhat complicated. I didn't send it to the list because I'm notoriously poor at explaining the random crap that pops into my head. I'll give it a shot anyways:

Our goal here is to be able to use -iR but select IPs out of our own range. For example, an organization might want to pick 1000 IPs out of "10.0.0.0/16". Someone else might want to try to sample 10,000 router addresses on the Internet in the range "*.*.*.1,65,129,193".

Knuth suggests that we list out all the possible IPs and then select each one with the probability described above. There is a much more memory-efficient way to do it, though, by exploiting a linear congruential generator (LCG). Before I get to that, one must be able to map an index number into the desired range. That is, pretend that you made an array of this range: "*.*.*.1,65,129,193". Then index 0 in that array would be "0.0.0.1". Index 7 would be "0.0.1.193". In general, we can easily compute the size of the imaginary array, and we can easily compute the value of a single element in that array given an index. We can do both of these things without pre-computing the entire array.

So what this does is map elements in our range onto integers starting at 0 and going up to the size of the range. What we need to do now is select integers out of that range without duplicates. That turns out to be easy using an LCG (http://en.wikipedia.org/wiki/Linear_congruential_generator). The trouble is that it could be hard to pick constants that will generate a maximal-period LCG for an arbitrarily sized range. We can get around that too: pre-compute 32 sets of constants, each with a power-of-two period. That is, the first pair would have period 2, the third period 8, the tenth 1024, and so on. So if the range of indexes you want to select from has size 3000, you'd pick the LCG constants with period 4096. Then seed the LCG randomly and pull out one index at a time. If you get an index beyond your range (say, 3200), just discard it and move on. Because we use power-of-two sized LCG periods, this is *at worst* 50% efficient (and on average 75%), which is still linear time. It also uses constant memory, which is a huge win over keeping the whole list around.

Like you said though, just because we have an algorithm or two that work doesn't mean we should actually implement this.

Brandon
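To make the two pieces above concrete, here is a rough, self-contained sketch (again, not Nmap code) for the hypothetical range "*.*.*.1,65,129,193": an index-to-address mapping over the imaginary array, plus an LCG whose power-of-two period is the smallest one covering the range, discarding any state that lands outside it. The constants 1664525 and 1013904223 are just one pair meeting the full-period conditions for a power-of-two modulus (multiplier congruent to 1 mod 4, odd increment); all names and sizes are illustrative assumptions.

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <stdint.h>

/* The example range "*.*.*.1,65,129,193": 256 choices for each of the first
 * three octets, 4 choices for the last. */
static const uint8_t last_octets[4] = { 1, 65, 129, 193 };
static const uint64_t range_size = 256ULL * 256 * 256 * 4;  /* size of the imaginary array */

/* Map an index in [0, range_size) to the address at that position in the
 * imaginary array: index 0 -> 0.0.0.1, index 7 -> 0.0.1.193, and so on. */
static uint32_t index_to_ip(uint64_t idx) {
    uint8_t oct_d = last_octets[idx % 4]; idx /= 4;
    uint8_t oct_c = (uint8_t) (idx % 256); idx /= 256;
    uint8_t oct_b = (uint8_t) (idx % 256); idx /= 256;
    uint8_t oct_a = (uint8_t) (idx % 256);
    return ((uint32_t) oct_a << 24) | ((uint32_t) oct_b << 16) |
           ((uint32_t) oct_c << 8) | oct_d;
}

int main() {
    const unsigned long wanted = 10;  /* how many duplicate-free random addresses to emit */

    /* Smallest power-of-two period that covers the range. */
    uint64_t period = 1;
    while (period < range_size)
        period <<= 1;
    const uint64_t mask = period - 1;

    /* An LCG modulo a power of two has full period when the increment is odd
     * and the multiplier is 1 mod 4, so one constant pair per power of two is
     * easy to pre-compute. */
    const uint64_t mult = 1664525;    /* 1664525 % 4 == 1 */
    const uint64_t inc = 1013904223;  /* odd */

    srand((unsigned) time(NULL));
    uint64_t x = (((uint64_t) rand() << 16) ^ (uint64_t) rand()) & mask;  /* random seed */

    unsigned long emitted = 0;
    while (emitted < wanted) {
        x = (mult * x + inc) & mask;  /* next LCG state */
        if (x >= range_size)          /* index falls outside the range: discard it */
            continue;
        uint32_t ip = index_to_ip(x);
        printf("%u.%u.%u.%u\n",
               (unsigned) ((ip >> 24) & 0xff), (unsigned) ((ip >> 16) & 0xff),
               (unsigned) ((ip >> 8) & 0xff), (unsigned) (ip & 0xff));
        emitted++;
    }
    return 0;
}

Within one full period the LCG never repeats a state, so no duplicates are emitted as long as you ask for no more addresses than the range holds. This particular range happens to have a power-of-two size (2^26), so the discard branch never fires here; it is the 3000-out-of-4096 kind of range, at the 50-75% efficiency mentioned above, where it earns its keep.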