Nmap Development mailing list archives

Re: Replacing passwords.lst


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Wed, 17 Mar 2010 01:16:29 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 16 Mar 2010 18:58:02 -0600
David Fifield <david () bamsoftware com> wrote:
[...]
I wrote a simple program to sum the counts from several password
files and output the top n passwords. Using the five lists above,
I regenerated our nselib/data/passwords.lst. The program
automatically does bz2 decompression based on filename so keeping
compressed lists isn't inconvenient.
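
As a rough illustration of the kind of program described above (not
the actual script), here is a minimal Python sketch. It assumes each
list has one "count password" pair per line, and the names
open_maybe_bz2, read_counts, and top_n are hypothetical.

```python
import bz2
from collections import Counter


def open_maybe_bz2(path):
    """Open a list for reading, decompressing when the name ends in .bz2."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")


def read_counts(path):
    """Read one list. Assumed format: '<count> <password>' per line."""
    counts = Counter()
    with open_maybe_bz2(path) as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            # Leading padding before the count is tolerated; everything
            # after the first space is taken as the password.
            count, _, password = line.lstrip().partition(" ")
            counts[password] += int(count)
    return counts


def top_n(paths, n):
    """Sum raw counts across all lists and return the n most common."""
    total = Counter()
    for path in paths:
        total.update(read_counts(path))
    return total.most_common(n)
```

Calling top_n(["a.txt", "b.txt.bz2"], 5000) would sum the raw counts,
so the largest list has the most influence on the result.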

Cool, it's good to handle the bz2 compression transparently.  I
think we can't just sum the lists though without normalizing them
to a degree.  Otherwise rockyou is weighted too strongly.

Ron and I chatted off-list about this a bit.  A simple linear weight
probably isn't the right choice because things that are only
duplicated a few times in phpbb or myspace would get scaled up too
much.

I don't understand. All of Ron's lists have counts, not just ranks. So
if a myspace password has a count of 1 or 2, it will still have a
count of 1 or 2 in the master list and end up way at the bottom.

Yeah I was referring to normalizing their counts.  More on that below.


To me, each password list is like a sample from a giant population.
That's not totally accurate because different sites have different
password policies, but the size of each sample shouldn't matter,
right?


Well, each is a pretty biased sample of a really huge password
population.  If our lists were truly random samples from that
population, then no amount of weighting for sample size would be better
than just summing up counts and ordering them.

Since we don't know how biased each list is, we should just treat them
equally.  If our goal is to sum the counts up while treating the lists
equally, we have to normalize those counts.

Put another way, if we had a list with 10 million passwords and 9M of
them were "password" that list would clearly be a very biased sample
from all passwords available out there.  If we wanted to combine that
list with our myspace list, we couldn't let 9M "password" be added to
the count of "password" for the myspace list.  The bias for our 10M
word list would just be too significant in the resulting list.

Since rockyou is so huge it dominates the other lists and I think we
need to weight them on some factor of their sample size so that our
resulting list doesn't just reflect the rockyou list biases.

Any weighting we come up with should be a NOP if the lists are unbiased
random samples.  I think this is pretty easy and natural to do.
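
One possible normalization, sketched here only to make the idea
concrete, is to convert each list's counts to relative frequencies
before summing, so every list carries the same total weight.  This is
an assumption about what "normalize" could mean, not the scheme that
was finally implemented; the function names and the tiny sample data
are made up for illustration.

```python
from collections import Counter


def normalize(counts):
    """Scale one list's counts so they sum to 1 (relative frequency)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pw: c / total for pw, c in counts.items()}


def combine_equal_weight(lists):
    """Sum relative frequencies so each list contributes equal weight."""
    combined = Counter()
    for counts in lists:
        for pw, freq in normalize(counts).items():
            combined[pw] += freq
    return combined


# Made-up data: a huge, heavily biased list and a small one.
big = {"password": 9_000_000, "iloveyou": 1_000_000}
small = {"123456": 70, "password": 20, "iloveyou": 10}

raw = Counter(big) + Counter(small)          # plain sum of counts
weighted = combine_equal_weight([big, small])

print([pw for pw, _ in raw.most_common()])
print([pw for pw, _ in weighted.most_common()])
```

With a plain sum, "123456" ends up last because the big list swamps
the small one; with equal weighting it moves ahead of "iloveyou".  If
both lists had the same proportions for every password, the two
orderings would agree, which is the NOP property wanted above.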

I'm pretty sure the code and sample results are going to speak louder
than words here.  I can probably start working on this and testing this
weekend.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAkugLXMACgkQqaGPzAsl94IlLQCgvHqggnTX8XLLKnqEFCv+wwLI
rxYAnAvjsGj0qYfZx+GBeumCs+2eK9dV
=0yOW
-----END PGP SIGNATURE-----
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/

