
Show HN: I packed 161,000 most common passwords into a 1MB file - thaumaturgy
https://github.com/robsheldon/bad-passwords-index
======
thaumaturgy
A while back a dump of 10 million usernames and passwords was released and got
some comments on HN
([https://news.ycombinator.com/item?id=9024751](https://news.ycombinator.com/item?id=9024751)).

I thought it would be neat to use that dump to create a file that services
could use to improve their users' password security. This turned into a fun
side project that I chipped away at ten minutes at a time here and there.

Just do a case-insensitive string search against the file to filter out common
passwords during user signup.
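A minimal Python sketch of that check, assuming the index can be read as plain UTF-8 text and searched as one blob. The function names are hypothetical and this is not code from the repo. It lowercases both the file contents and the candidate, then does a plain substring test, so a candidate that only appears inside a longer entry will also be flagged.

```python
from pathlib import Path


def load_index(path: str) -> str:
    """Read the whole index once and lowercase it so lookups are case-insensitive."""
    return Path(path).read_text(encoding="utf-8", errors="replace").lower()


def is_common_password(candidate: str, index_text: str) -> bool:
    """Return True if the lowercased candidate occurs anywhere in the lowercased index.

    This is a plain substring search, so it also matches fragments of longer entries.
    """
    return candidate.lower() in index_text


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real index file (hypothetical contents).
    demo_index = "P@ssw0rd\n1qaz!qaz\n".lower()
    print(is_common_password("P@SSW0RD", demo_index))  # True
    print(is_common_password("correct horse battery staple", demo_index))  # False
```

At signup you would call `load_index()` once at startup and reject the password whenever `is_common_password()` returns `True`.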

~~~
minimaxir
Unfortunately, this is made mostly redundant by any sane modern password
rules.

~~~
Nadya
Password rules? Do you mind explaining? Specifically which ones you consider
to be "sane".

~~~
minimaxir
8 character minimum; at least 1 lowercase, at least 1 uppercase, at least 1
number, at least 1 symbol.
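For reference, those rules translate into a short check like this Python sketch. The rule doesn't say what counts as a "symbol", so treating ASCII punctuation (`string.punctuation`) as symbols is an assumption, and `meets_composition_rules` is a hypothetical name.

```python
import string


def meets_composition_rules(password: str, min_length: int = 8) -> bool:
    """At least `min_length` characters, plus one lowercase letter, one uppercase
    letter, one digit and one symbol (assumed here to mean ASCII punctuation)."""
    return (
        len(password) >= min_length
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


if __name__ == "__main__":
    print(meets_composition_rules("P@ssw0rd"))  # True
    print(meets_composition_rules("correct horse battery staple"))  # False
```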

The only systems that don't/can't follow that are legacy systems which likely
have other issues and where the list would not be that much help.

~~~
thaumaturgy
I think I make a fair point against those password rules in
[http://www.robsheldon.com/index-of-bad-passwords/](http://www.robsheldon.com/index-of-bad-passwords/) (linked from
the GitHub repo's README file) -- what users typically do is take their really
weak password and then add the minimum amount of effort necessary to meet the
password rule requirements. Meanwhile, there are lots of perfectly good
passwords that don't meet those requirements.

edit: To further make the case, I just ran your password requirements against
the input file. The most popular password that meets your requirements is some
variation of "p@ssw0rd" -- usually "P@ssw0rd", but "P@ssw0rd123",
"P@ssw0rd$", and so on are all popular too. Direct variations of "p@ssw0rd"
(where the user only changed the capitalization of one letter) occur 47 times
in the input file. For comparison, the index file contains every password that
occurs at least 5 times...
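The two counts in that edit can be sketched in Python as follows. How they were actually computed isn't described, so the matching rule ("same characters, capitalization differs in at most one position"), the exact-string counting for the index threshold, and the function names are all assumptions.

```python
from collections import Counter
from typing import Iterable, List


def count_case_variants(base: str, passwords: Iterable[str], max_case_changes: int = 1) -> int:
    """Count entries that match `base` except for the capitalization of at most
    `max_case_changes` characters (the unchanged base itself is included)."""
    base = base.lower()
    total = 0
    for pw in passwords:
        if len(pw) != len(base) or pw.lower() != base:
            continue  # different characters, not just different capitalization
        changed = sum(1 for a, b in zip(pw, base) if a != b)
        if changed <= max_case_changes:
            total += 1
    return total


def build_index(passwords: Iterable[str], min_count: int = 5) -> List[str]:
    """Keep every distinct password (exact string) seen at least `min_count` times."""
    counts = Counter(passwords)
    return sorted(pw for pw, n in counts.items() if n >= min_count)
```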

Other popular combinations that meet your requirements are l58jkdjp! (46
times), !qaz2wsx (39), 1qaz!qaz (37), 1qaz@wsx (18), !qazxsw2 (16), zaq!2wsx
(15), nick1234-rem936 (12), xxpa33bq.adna (11), !qaz1qaz (11), g00dpa$$w0rd
(11), jhon@ta2011 (10), nloq_010101 (9), 1qazzaq! (9), pa$$w0rd (8)...
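Ranking the popular passwords that pass the rules is a filter plus a frequency count. In this Python sketch the rule check is repeated so the block stands alone, again treating ASCII punctuation as a "symbol". The list above shows lowercase forms even for entries that would need an uppercase letter to pass, so the sketch assumes each entry is checked individually and then grouped under its lowercased spelling; that grouping is a guess, not a stated method.

```python
import string
from collections import Counter
from typing import Iterable, List, Tuple


def meets_composition_rules(password: str, min_length: int = 8) -> bool:
    """Length plus one lowercase, one uppercase, one digit and one ASCII symbol."""
    return (
        len(password) >= min_length
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


def most_popular_compliant(passwords: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count rule-compliant entries, grouping case variants under their lowercase form."""
    counts = Counter(pw.lower() for pw in passwords if meets_composition_rules(pw))
    return counts.most_common(top_n)
```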

Meanwhile a nice long passphrase, which doesn't occur at all in the input
file, is not allowed according to your requirements unless the user also
throws in a special character and a number and an uppercase letter and a
lowercase letter.

~~~
pdshrader
What makes "l58jkdjp!" such a popular password? Is that something in a
language other than English?

~~~
thaumaturgy
It appears to be a popular password for French users. Beyond that, I've no
idea.

I suspect that the input file I used from the security researcher's 10m dump
has a lot of duplicate entries. The researcher's site is down now, but IIRC,
he compiled it from a bunch of different publicly-available sources, so the
same users might have shown up several times in the file.
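If the same accounts appear in several of the merged sources, one way to keep them from inflating the counts is to count each password once per distinct username. A Python sketch, assuming the raw data can be read as (username, password) pairs and that usernames should be compared case-insensitively; both are assumptions, and this isn't necessarily how the merging will actually be handled.

```python
from collections import Counter
from typing import Iterable, Tuple


def counts_per_distinct_user(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count each password once per distinct (normalized username, password) pair,
    so a user repeated across merged dumps is only counted once."""
    unique_pairs = {(user.strip().lower(), password) for user, password in records}
    return Counter(password for _user, password in unique_pairs)
```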

Once I start mixing in the other data files I have, anomalies like that should
get sifted out.

