Nmap Development mailing list archives

RE: [BULK] Re: Replacing passwords.lst


From: "Norris Carden" <ncarden () ascendfcu org>
Date: Wed, 17 Mar 2010 09:22:23 -0500

Why not weight each password as a percentage of each list? If "password"
is (just pulling numbers out of a hat) 7% of the RockYou list and 5% of
another list, then an average of 6% across the two lists should handle
things pretty evenly. Of course, dump the obviously biased "rockyou"
entry as a password, but not necessarily from the count total used for
figuring the percentages.
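The averaging scheme described above could be sketched roughly like this (a hypothetical helper, assuming each list has already been reduced to a password-to-count mapping; a password absent from a list simply contributes 0% for that list):

```python
from collections import defaultdict

def averaged_shares(lists):
    """Rank passwords by their average share (fraction of entries)
    across several password count lists.

    `lists` is a sequence of dicts mapping password -> count.
    """
    shares = defaultdict(float)
    for counts in lists:
        total = sum(counts.values())
        for pw, n in counts.items():
            shares[pw] += n / total
    # Average each password's share over the number of lists and
    # sort, most common first.
    k = len(lists)
    return sorted(((pw, s / k) for pw, s in shares.items()),
                  key=lambda item: item[1], reverse=True)
```

With the hat-pulled numbers above, a password at 7% of one list and 5% of another comes out at 6% overall, regardless of how big each list is.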

Then again I never took sadistics as either an undergrad or grad
student.

Norris Carden, CISSP, CISA

On Wed, Mar 17, 2010 at 01:16:29AM +0000, Brandon Enright wrote:

> Well each is a pretty biased sample of a really huge password
> population.  If our lists were truly random samples from that
> population then no amount of weighting for sample size would be better
> than just summing up counts and ordering them.
>
> Since we don't know how biased each list is we should just treat them
> equally.  If our goal is to sum the counts up while keeping them equal
> we have to normalize those counts.

I agree with you that each list is a bit biased and that RockYou
is so huge that it dominates the other lists.  But as you note, "we
don't know how biased each list is", so I think treating them exactly
equally is completely arbitrary.  And it introduces its own biases.
It would mean that the seven people from the religious site
faithwriters who chose "godisgood" as their password could count as
much as passwords that hundreds or thousands of people chose on
Rockyou.  After all, Rockyou has almost 2,000 times as many passwords
as Faithwriters, so I think we'd be terribly discounting that huge and
valuable sample size if we treated it the same way as the cheesy
little lists.

I agree that we could probably make the lists a bit better now with
some weighting of the files.  But I'm definitely skeptical of the
approach, as it seems quite subjective.  Comparing with Brandon's
password list is a neat idea (and I like it), but it also has the risk
of just finding a solution which is closest to the biases in that
file.  After all, if that file was perfect we'd use it directly.
Another thing which might help, but I'm also a bit skeptical of, is
assigning counts to the files which don't have them.  For example, we
could look at the distribution of counts in the first 3,000 entries of
Rockyou or one of the others, and then assign counts to files like
john.txt in those proportions.  Of course that would also require us
to basically subjectively decide how much to weight the john.txt file,
so it is even more problematic than the issue of weighting the
individually counted files.
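The idea of borrowing Rockyou's count distribution for an uncounted file could look something like this (a sketch only; the function and its `weight` parameter are hypothetical, and `weight` is exactly the subjective knob being worried about above):

```python
def assign_counts(ordered_passwords, reference_counts, weight=1.0):
    """Give synthetic counts to an ordered-but-uncounted list
    (e.g. john.txt) by borrowing the count distribution of the
    top entries of a counted list such as RockYou.

    `reference_counts` holds the counts of the reference list's
    entries, most common first.  `weight` is a subjective
    multiplier controlling how much the uncounted file matters.
    """
    out = {}
    for rank, pw in enumerate(ordered_passwords):
        if rank >= len(reference_counts):
            break  # no reference data past this rank
        # Never assign less than 1, so every entry survives.
        out[pw] = max(1, int(reference_counts[rank] * weight))
    return out
```

The synthetic counts could then be merged into the overall tally like any genuinely counted list, which is what makes the choice of `weight` so consequential.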

I don't deny that a little bit of weighting to reduce the Rockyou
dominance would probably help, but it would be very subjective since
we don't really have a good way to decide how much weight to give each
list.  I do think any manipulations we do should try to be as simple
as possible.  And we may decide to (as we do now) just sum up the
counts and order them.  That also makes it very easy to add new lists,
which I hope we will be doing.  If RockYou's 14 million passwords are
overly dominant, let's fix that by finding some more password files.
Come on guys!  Get to hacking!  I'll send a free signed copy of Nmap
Network Scanning to whoever gets me the Facebook or Twitter password
list first :).  OK, that's a bad joke, but I do think we'll be able to
collect more password lists over time.  I even have a lead on a couple
now.  And I think that would be the best way to remove the biases.

BTW, we currently do a little bit of subjective massaging.  David's
script automatically takes out a handful of terribly biased results,
such as the "rockyou" password, which is found more than 20,000 times
in the rockyou DB.
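David's actual script isn't shown here, but a filter in that spirit might look like the following (the blocklist contents are illustrative, not the real ones):

```python
# Passwords that obviously reflect the site they leaked from rather
# than genuine popularity.  Illustrative only; the real script's
# list may differ.
SITE_BIASED = {"rockyou", "faithwriters"}

def strip_site_bias(counts):
    """Drop site-name passwords from a password -> count mapping,
    mirroring the kind of massaging described above."""
    return {pw: n for pw, n in counts.items()
            if pw.lower() not in SITE_BIASED}
```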

Cheers,
-F
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/


