Nmap Development mailing list archives

Re: [PATCH] Add the ability to generate quality random IPs without any duplicates


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Fri, 28 Aug 2009 07:03:21 +0000

I have finished my testing.  Dieharder is a much more difficult to use
(properly) than I expected.  Comments inline and below.

On Mon, 24 Aug 2009 06:18:59 +0000 or thereabouts Brandon Enright
<bmenrigh () ucsd edu> wrote:

Sorry for the top-post.  Still working off of my phone.

Regarding period: I designed it for 2^32 period with no holes but  
you're right, I should test it.  There is a slight chance the period  
is 2^32-1 or worse, 2^(32-1).

I have confirmed the period to be 2^32 with no holes.


I got to thinking about the tweak I designed and I think it provides  
about 6 dimensions of independance.

Testing suggests it is more than 6.

If I can find something that
lets me plot 3d points each with unique RGB color a pattern would
form.

Dieharder is good enough.

If so maybe I'll add s-boxes in between the two rounds.

This was a failure (see below).


I'll test the period and then run it through Diehard tests.

Brandon

Sent from my phone.  If you would like a digital signature for this  
email let me know and I will sign it later.


On Aug 24, 2009, at 5:48, Fyodor <fyodor () insecure org> wrote:

On Sat, Aug 22, 2009 at 01:31:23AM +0000, Brandon Enright wrote:

Hey all, it's Friday and this week has burnt me out on "work  
things" so
I decided to tackle something fun.

Neat!

Q: Why not just replace -iR with this new -iU option instead of  
having
both?

A: Sometimes I want *really* high quality IPs.  Sometimes I don't  
want
to be bothered with duplicates. This patch is small, I think
there is a
place for both.

I think supporting both for your initial patch is a great idea, as
it makes it easy to test the different strategies and compare.  But
Nmap itself only needs one random IP generation mechanism (whatever
one we think is best).

Okay, testing results below show that this patch should be the default.

If someone wants a different approach,
nothing stops them from generating random IPs however they like and
feeding them to Nmap with -iL.

Agreed.


So the next question is: which mechanism is best?  Avoidance of
duplicates is a useful property.  I often pass my IP lists to sort
and uniq for this very reason.  The number of duplicates aren't
material unless you are scanning a huge number of IPs (your example
of 10 million IPs shows about 0.15% duplicates).  But they are still
annoying.

Yeah I'm annoyed when I want to scan exactly a million hosts but get
slightly fewer.  I've also been getting complaints here and there from
some of the Internet scanning I do and I want to be able to tell the
complainer that for certain, they were only scanned once.


So I'm in favor of a no duplicate approach for Nmap *if* there
aren't any significant downsides.

I haven't found any.  Results below.

Does anyone see any problems in
using this approach?  The code is small and doesn't appear
computationally intensive.

It's actually slightly faster than the current RC4 method.

I haven't evaluated the quality of
randomness, though Brandon's graphs look promising.  Perfect
randomness isn't critical here anyway.

Really good randomness is a nice thing to have.  The world wouldn't end
if Nmap didn't have it though.

Brandon: have you tried
generating billions of these to verify that the period is as large
as you expect and that the full group doesn't have holes in it
(besides the expected ones from ip_is_reserved())?

I have now.

Every IP can be generated exactly once before any repeats.


Okay, now on to Dieharder, my testing, my tweaks, and why this took so
darn long to test.

First, unlike the Diehard battery of tests, Dieharder consumes a
ridiculous amount of random numbers.  Some of the tests use on the
order of 50 GB of random input.  I started by generating 100M IPs and
turning them into a random bit stream but Dieharder found so many
problems the test results were useless.  I didn't realize at the time
that the problem was with the small input and not the generator so I
tried strengthening it.  It wasn't until I tested RC4 with 100M IPs
that I realized I needed to provide more input.

Also, ip_is_reserved() can't be run on the IPs before being input to
Dieharder; It causes all the Dieharder tests to fail, even for RC4.

So these tests are on a version of Nmap that doesn't call
ip_is_reserved().

Because of some initial failures and misunderstandings on my part, I
tried strengthening the generator.  This actually made it weaker.  It
did give me more to test with though.

Finally, Dieharder is all about statistics.  Even good generators fail
a test once in a while.  If you flip a coin and it lands heads 10 times
in a row, either the coin is broken or you just hit the 1 in 1024
chance.  Dieharder assumes the coin is broken.

There are 4 possible outcomes for a test:

PASS: This means the generator did well on this test
WEAK: This means the generator didn't do so great but it could just be
bad luck.
POOR: The generator did really badly.  There is probably a pattern.
FAIL: The generator failed badly.  There is definitely a pattern.

=== RC4 ===
PASS: 102
WEAK: 5
POOR: 0
FAIL: 0

=== The LCG without any tweak ===
PASS: 48
WEAK: 4
POOR: 3
FAIL: 52

=== The LCG with my original tweak ===
PASS: 101
WEAK: 2
POOR: 1
FAIL: 3

=== The LCG with my first attempt at using S-boxes in my tweak ===
PASS: 83
WEAK: 5
POOR: 0
FAIL: 19

=== The LCG with my second attempt at using S-boxes ===
PASS: 92
POOR: 1
WEAK: 5
FAIL: 9

=== The LCG with my original tweak plus a third round (instead of 2) ===
PASS: 103
WEAK: 4
POOR: 0
FAIL: 0


I think this brings up a few questions.

1) Why did RC4 score WEAK on 5 tests?  I think this is bad luck.  I
think it would pass those tests and score weak on others if tested
again.

2) Why did the original patch fail a few tests?  The tests that did
poorly were the bit pair count tests (the output was too uniform), and
the lagged sums tests (there was some linear correlation in the
dimensions tested).  It also failed a couple different minimum distance
tests in 2 and 3 dimensions.  I think the output was too uniform.

3) Why did your first attempt at adding S-boxes to the tweak make it
so much worse? This took me a day to figure out but I didn't undo the
S-box transform properly.  I tried to fix this in a second s-box
attempt.

4) Why did the second attempt at using s-boxes also do poorly?  My best
explanation is that Dieharder was finding some pattern in my s-boxes
that would have gone away with more rounds.

5) Why when you got rid of s-boxes and just added an additional round
to the original tweak did it do better than RC4?  Adding the additional
round clearly helped improve the tweak.  It only beat RC4 by chance
though.  If the tests were to be run again RC4 would beat it once in a
while too.

6) Why did 3-round tweak version do so well when no-duplicates violates
obvious randomness properties?  First, Dieharder rarely treats the
output as 32 bit numbers and if it treats them as less there will be
duplicates.  Second, Dieharder implements tests that don't use a lot of
memory.  Figuring out that there aren't any duplicates can consume a
lot of memory.  Third, I think this is a weakness of Dieharder and I
have some ideas for how to fix it.  I plan on working on formalizing my
ideas and submitting them so that Dieharder can be improved.


So I think the answer is that we should switch -iR to use my tweak with
3 rounds.  I'm not attaching a patch here because my testing has made a
real mess of a bunch of code and it will take me a while to clean it
up.  If there are no objections I'll check something in tomorrow.  If
people want to do independent testing, let me know and I'll write up a
quick Nmap+Dieharder how-to.  Running Dieharder is anything but quick
though.

I have posted the Dieharder results for my suggested patch here:

http://noh.ucsd.edu/~bmenrigh/dielcgfix3r.txt

Brandon


_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org


Current thread: