
300M Freely Downloadable Pwned Passwords - urahara
https://www.troyhunt.com/introducing-306-million-freely-downloadable-pwned-passwords/
======
oelmekki
I really would love for us to be done with passwords altogether. We're asking non
power users to make their password unique, and then make it complicated, and
then remember all of them in their head, not on a post-it. Nobody can do that,
not even us who are telling them to do that. And then, we explain to them
they're dumb if they didn't do that.

Currently, my way to generate a new password is this : `pwgen | md5sum`. And
then, I use "lost password" everywhere (but for my mailbox, obviously), that
is, the rare times my browser is not already prefilling the login form.

This makes me wonder why we don't just go with that : generate a random
password for the user in registration form, allow the browser to save it. On
the login form, check if fields are prefilled. If not, only display an email
field and send an auth link by email. A user clicking it (once, and fast enough)
is logged in.

You still have to remember your mailbox password, but that's the only one,
quite akin to the root password of a server.

~~~
belzebub
Email address is stupid, we should have randomly generated proxy email
addresses.

~~~
thephyber
A former coworker makes liberal use of American Express disposable credit card
numbers -- proxy credit card numbers that you can request to give away to less
than trustworthy merchants.

~~~
jacobwg
Privacy ([https://privacy.com/](https://privacy.com/)) offers a similar
service for those that want something that works with more than American
Express cards or other such offerings from other card issuers.

~~~
phs318u
Did not know about this! Checked it out, got really excited... and then
discovered it's US only :-(

------
devy
Can Troy or someone contact Google (bq-public-data@google.com) and push this
to GCP's BigQuery public datasets[1] for hosting, so that you can look up your
password via SQL in BigQuery rather than on some 3rd party site?

[1] [https://cloud.google.com/bigquery/public-data/](https://cloud.google.com/bigquery/public-data/)

~~~
cobookman
I work for Google cloud. Will ping internally about us hosting it.

~~~
dredmorbius
Thank you.

------
oliwarner
An interesting element to this is how resistant some people are to using
torrents for legitimate purposes, even as a backup mirror.

It's something we've come to embrace in the Linux world. _Much_ faster than a
single server and saves bandwidth at individual sites. Surprised this
pragmatism hasn't reached the rest of you yet.

~~~
jff
Having an http option makes it better for those on restrictive work networks--
I'm downloading the file because I want to experiment with it, but I certainly
wouldn't want a call from Networking asking why I'm torrenting stuff.

~~~
oliwarner
Yes, I understand the reason for keeping a slow legacy option available.

Even in your situation, if you're doing this _for work_ and IT calls you up,
you just tell them what you're doing. "I'm downloading a very large file over
Bittorrent because it's 50% faster than the HTTP download and I'd like to do
some work today. Is that all? Thanks. Bye."

In some places you'd get that call from IT just for downloading a large file.
Getting a call doesn't mean you're doing something wrong, they're just
checking it's you, not malware, and that it's for work. If they haven't
already blocked it, have at it.

------
peteretep
I wonder how we force change with individual companies? Today I had to sign up
for a UPS account. The password length was set to max 27 characters, and the
form had disabled paste in the password field.

Who do we lobby to get them to fail their next PCI-DSS compliance test?

~~~
thangngoc89
Someone made a Chrome extension to enable password pasting again.

Don't Fuck With Paste: [https://chrome.google.com/webstore/detail/dont-fuck-with-pas...](https://chrome.google.com/webstore/detail/dont-fuck-with-paste/nkgllhigpcljnhoakjkgaieabnkmgdkb?utm_source=chrome-app-launcher-info-dialog)

~~~
bigiain
And now we've moved the problem from "I used a weak password for $site because
I couldn't paste it from my password manager" to "I've got an extension in my
browser from some internet random that manipulates every form field including
the password field on every webpage I visit"...

(And yeah, the "internet random" here has a github repo with the code, and the
file that does this is an easily auditable 16 lines of javascript, so props to
him for that. But it's still got the recently exploited attack vector that he
or an attacker who takes over his account could push malicious updates to the
extension, like the webdev extension from earlier this week...)

------
jaclaz
I don't get it.

>Do not send any password you actively use to a third-party service - even this
one.

So I can only test passwords that I am not using (and by extension that I am
not going to use in the future).

>oh no - pwned!

>This password has previously appeared in a data breach and should never be
used. If you've ever used it anywhere before, change it immediately!

If I cannot (shouldn't) submit any password I am actively using, what does it
matter if I used it before? Now I already changed it.

~~~
model_m_warrior
I believe the idea is to ensure no one can use the listing to brute force.

~~~
thephyber
I think gp is complaining that the second you type your password into the
form, you've "used it", hence you should change it.

The gp makes a good point, but that's also why you can submit the
`sha1($your_password)` instead. The only question is why Troy allowed
unhashed passwords to be submitted.

~~~
jaclaz
Maybe - even better - if you could submit only - say - first 8 characters of
the SHA1 (and NOT the complete hash) and provide - still say - max 10 "whole"
hashes found with that 8 char beginning (if more than 10 ask for a ninth
char).
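
A rough sketch of that prefix idea, in Python: only the first few hex characters of the SHA-1 leave the client, the service answers with every full hash sharing that prefix (or asks for one more character if there are too many), and the final comparison happens locally. `KNOWN_HASHES`, `lookup_by_prefix` and `check_locally` are hypothetical names made up for illustration; nothing here describes how the actual site works.

```python
import hashlib

# Hypothetical stand-in for the server-side corpus of breached SHA-1 hashes.
KNOWN_HASHES = {
    hashlib.sha1(p.encode("utf-8")).hexdigest()
    for p in ("password", "123456", "pippo")
}


def lookup_by_prefix(prefix, max_results=10):
    """Server side: full hashes starting with `prefix`, or None if too many match."""
    matches = sorted(h for h in KNOWN_HASHES if h.startswith(prefix))
    return matches if len(matches) <= max_results else None


def check_locally(password, prefix_len=8):
    """Client side: send only a hash prefix, then compare full hashes locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    while prefix_len <= len(digest):
        candidates = lookup_by_prefix(digest[:prefix_len])
        if candidates is not None:
            return digest in candidates
        prefix_len += 1  # too many matches: reveal one more character
    return False


print(check_locally("pippo"))
print(check_locally("something nobody has used"))
```

The shorter the prefix, the more unrelated hashes share it, so the service learns less about which password was actually being checked.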

I mean, here is the SHA1 of my password (not really):

d012f68144ed0f121d3cc330a17eec528c2e7d59

This site:

[https://hashkiller.co.uk/sha1-decrypter.aspx](https://hashkiller.co.uk/sha1-decrypter.aspx)

>We have a total of just over 312.072 billion unique decrypted SHA1 hashes
since August 2007.

Took exactly 221 ms to reverse it to "pippo".

------
captn3m0
Going to generate a bloom-filter from this dataset tonight.

Troy mentions some arguments against torrents, but it is better to have an
authoritative torrent than none, imo.

~~~
developer2
Is a bloom filter worth it in this case? With the optimal "k" hash functions
of 10 and a "p" error rate of 0.1% (false positives of approximately 1 in
1000), a bloom filter for the 306,259,512 items will take 538 MB. Increasing
the error rate to 1% (1 in 100) is still 358 MB. That's a sizeable filter
to maintain in memory (then again... RAM is cheap).
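
For anyone who wants to redo that arithmetic, here is a small sketch using the textbook sizing formulas for a Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The function name `bloom_parameters` is made up for this example, and the results should land in the same ballpark as the sizes quoted above rather than match them to the megabyte.

```python
import math


def bloom_parameters(n_items, false_positive_rate):
    """Return (bits, hash_count) from the standard optimal-sizing formulas."""
    bits = math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hash_count = round(bits / n_items * math.log(2))
    return bits, hash_count


# 1 in 1000 and 1 in 100 false positives for the 306,259,512 hashes.
for p in (0.001, 0.01):
    bits, k = bloom_parameters(306_259_512, p)
    print(f"p={p}: k={k}, about {bits / 8 / 1e6:.0f} MB")
```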

I'd probably just shove the passwords into a database, limiting the index
prefix to the first X characters to reduce index size.

~~~
captn3m0
Distributing a 538 MB file (which can be compressed further) is much easier.

~~~
wongarsu
What are the actual use cases where this size difference matters?

If you're distributing to a general audience, 0.5GB and 10GB isn't that much of a
difference, and most people are more equipped for handling lists of strings
than for handling bloom filters.

------
colinbartlett
I do agree with Troy that this could be useful to send to relatives and
family:

> I'm envisaging more tech-savvy people using this service to demonstrate a
> point to friends, relatives and co-workers: "you see, this password has been
> breached before, don't use it!"

But I can't be the only one whose family would be baffled by the term "pwned".
I wish it said something like "Your password has been hacked!" which we all
know not to be technically correct but would resonate a lot more.

~~~
Piskvorrr
Too alarmist, and in wrong tense: people would start freaking out "by typing
my password into this box, my password has been hacked". Resonance is not
always a good thing, see Tacoma Narrows bridge ;)

------
excitom
>If a password is not found in the Pwned Passwords set, it'll result in a
response like this:

Wait, so I test my password to see if it's "good" and now you have a copy of a
password I will be using. Am I just being paranoid?

~~~
rrauenza
You can post the sha1sum instead.

    
    
      $ sha1sum
      SooperSekretPassw0rd^D
      SooperSekretPassw0rddc0d3504b259a92dce59b850969601d12c06a75f  -

~~~
AlexCoventry
sha1sum is giving different results.

    
    
        /tmp$ echo "p@55w0rd"  | sha1sum
        8633c4a8b38a8826132414d8861af7b6a8371976  -
    

This is a different value from the one given in the blog post:
"ce0b2b771f7d468c0141918daea704e0e5ad45db".

The python sha-1 hexdigest comes out right, though:

    
    
        In [13]: import hashlib
    
        In [14]: hashlib.sha1(b'p@55w0rd').hexdigest()
        Out[14]: 'ce0b2b771f7d468c0141918daea704e0e5ad45db'
    

In case anyone else has passwords they want to check, this will binary-search
them:
[https://gist.github.com/coventry/5df7885fb0d5caeabb39fcd0e2b...](https://gist.github.com/coventry/5df7885fb0d5caeabb39fcd0e2bf4864)

~~~
nsillik
You need to use `echo -n` in order to not have `echo` generate a newline.
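
The same effect can be reproduced in Python, as a minimal sketch: hashing the password with and without a trailing newline gives two unrelated digests, which is all that differs between `echo` and `echo -n` here. `sha1_hex` is just a helper name chosen for this example.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 of the UTF-8 encoding of `text`."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


password = "p@55w0rd"
print(sha1_hex(password))         # the bytes `echo -n` pipes to sha1sum
print(sha1_hex(password + "\n"))  # the bytes plain `echo` pipes to sha1sum
```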

~~~
AlexCoventry
Ah, thanks.

------
Deimorz
One really unfortunate aspect of the passwords being hashed is that there's no
info available about their lengths. Knowing the lengths could allow you to
reduce the size considerably when you enforce a minimum password length.

For example, if I have a site that requires passwords to be at least 10 chars
long, I don't need any of the data for breached passwords that are shorter
than 10 characters. People can't possibly use them anyway, so that's probably
a huge chunk of the data that's completely useless to be storing and checking.

~~~
royce
I've cracked* just under 99% of them so far (including the 14 million added in
Update 1). Statistics are here:

[https://gist.github.com/roycewilliams/b1de2afbfe5cb71bea16c9...](https://gist.github.com/roycewilliams/b1de2afbfe5cb71bea16c94042b9bbfc)

Regardless of composition, the top 12 lengths are:

    
    
        8: 32% (102260862)
       10: 14% (45084047)
        9: 13% (41525797)
        7: 10% (33632055)
        6: 06% (20211176)
       11: 05% (18275968)
       12: 04% (14052958)
       15: 02% (8291459)
       13: 02% (8042452)
       14: 01% (6321198)
       16: 01% (4201765)
        5: 00% (3054291)
    

In other words, requiring a minimum length of 12 would make 80% of the
passwords in the corpus inapplicable.

... and the top 12 masks are:

    
    
      ?l?l?l?l?l?l?l?l,47823614
      ?l?l?l?l?l?l?d?d,7005728
      ?d?d?d?d?d?d?d?d,6212778
      ?l?l?l?l?l?l?l?l?l?l,6023602
      ?l?l?l?l?l?l?l?l?l,5379482
      ?l?l?l?l?l?l?l?d?d,5169013
      ?l?l?l?l?l?l?l?l?d?d,5090400
      ?l?l?l?l?l?l?l,4998896
      ?d?d?d?d?d?d?d,4798329
      ?l?l?l?l?d?d?d?d,4798124
      ?d?d?d?d?d?d?d?d?d?d,4754401
      ?l?l?l?l?l?l?d?d?d?d,4377841
    

Almost 48 million of them are 8 lower-case characters.
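
For readers unfamiliar with the mask notation: `?l` is a lower-case letter and `?d` a digit in hashcat's built-in charsets (`?u` and `?s` cover upper-case and specials). A rough Python sketch of deriving such a mask from a recovered plaintext follows; it is only an illustration, not the tooling used to produce the statistics above, and `hashcat_mask` is a made-up name.

```python
import string


def hashcat_mask(password):
    """Map each character to a hashcat built-in charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        else:
            # Assumption: everything else is lumped under ?s (specials),
            # which is a simplification for non-ASCII input.
            parts.append("?s")
    return "".join(parts)


print(hashcat_mask("password"))
print(hashcat_mask("abcdef12"))
```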

* And to be clear, "cracked" is an overstatement. Many of his sources are public. Simply using those sources as wordlists makes "cracking" these like shooting fish in a barrel.

~~~
Deimorz
Oh, very cool, thanks for posting. I had actually written a really basic
cracker and started seeing if I could figure out how many of them belonged to
shorter passwords, but you're doing a much, much better job of it than I am (I
was just brute-force generating short passwords, no wordlists or anything).

Are you planning to make a blog post or anything "final" with the info you
find out, or will you just keep updating those gists?

~~~
royce
You're welcome! And "¿por que no los dos?" :) I'll keep the gists updated and
will also do a blog post, I think.

With a tool like hashcat, a modern GPU or two, and some publicly available
wordlists, you can get the vast majority of them without breaking a sweat.

In other words: there is almost no value in hashing them with SHA1.

------
aj7
If I test my passwords, aren't they also now pwned?

~~~
stephengillie
Not if you grep locally. How big is this data set? It can't be much bigger
than a AAA video game download.

~~~
ocdtrekkie
It's 11.9 GB of text (5.3 GB zipped). So smaller than quite a few video game
downloads.

~~~
sleepybrett
Fairly low compression rate for zipped ASCII, but I guess it is mostly a giant
pile of nearly random strings.

------
r_singh
Is it safe to test my password on this website? (because I just did)

~~~
pwdisswordfish
[http://inutile.club/estatis/password-security-checker/](http://inutile.club/estatis/password-security-checker/)

~~~
coldsmoke
Hehe, I was just about to build something similar. I like the extra touch of
it not even being served over https.

~~~
distances
Also the option to test another password is nice, and the unsatisfied
curiosity you're left with if you go down that route.

------
nicpottier
I used one password for a decade or so in the 90s and early naughts and though
I've since moved on to use LastPass and two factor for everything this is the
first time that password appears in one of these databases.

Guessing it was in MySpace..

Ironically I used another password for sites I trusted less and that one isn't
in there.

------
mbid
As others have pointed out, the use case for pasting plain text passwords is
not quite clear. Maybe it would be a good idea to allow searching for hashes
only, or at least hash the password in js on the client.

Also, I'm genuinely curious as to why SHA-1 is used and not SHA-256. Surely
the one-time additional cost of using SHA-256 would've been negligible for
Troy? If at some point somebody manages to do preimage attacks on SHA-1, I
have to assume my password is broken if I've submitted its hash to his API.
Although I guess you'd have to actually be able to enumerate preimages,
preferably from small to big. Still, I don't understand why Troy doesn't
account for the possibility by using a hash function widely considered to be
stronger.

~~~
tracker1
I believe the sources for the data breaches were mentioned in the article, so
if someone wanted to get those sources anyway, it wouldn't be a big deal.

~~~
bigiain
He stops well short of saying "here's a link to the pastebin dumps!", but yeah
- it wouldn't take too much google-fu to build your own version of this -
perhaps not a 300M entry one, but I doubt it'd take more than a weekend to get
halfway there if you wanted to.

------
r0f1
What would be the best data structure for using this in, say, a Python script?
I imagine just putting it into a dictionary (hash map etc.) won't work because
of the size.

~~~
fanf2
Bloom filter :-)
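
A minimal pure-Python sketch of what that could look like, assuming the bit count and number of hash functions are chosen up front. The `BloomFilter` class and its double-hashing scheme built on SHA-256 are illustrative choices, not something from the article, and a real deployment would probably use an existing library.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing); this is an assumption made for the sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter(num_bits=14_378, num_hashes=10)
for pw in ("password", "123456", "qwerty"):
    seen.add(pw)
print("password" in seen)
```

A hit only means "probably in the set", so anything the filter flags can still be confirmed against the full list if certainty matters.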

------
pjf
Someone should apply deep learning to this and check how it compares with
brute-forcing passwords. E.g.
[https://github.com/thoppe/5baa61e4c9b93f3f0682250b6cf8331b7e...](https://github.com/thoppe/5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8)

~~~
RomanPushkin
That's the point here. We know 300M passwords leaked, but we can't investigate
these passwords and learn from this data. It's sad that researchers have to
turn to the black market if they want to get plain passwords. I'm pretty sure
everyone who needs these passwords can find them.

------
w8rbt
Many of these passwords are one or two characters in length. I think the 300
million number is inflated for publicity. Who allows a password that only has
one character?

Go here and type the character 'a':

[https://haveibeenpwned.com/](https://haveibeenpwned.com/)

~~~
Freaky
There are 9120 printable ASCII strings of 1-2 characters in length. Cunning,
including that many to bulk out the list :)

~~~
w8rbt
;) OK. But there are a ton of 3 and 4 char passwords too. I mean, what sort of
site would allow that? It's just hard to believe these were actually
passwords.

BTW, I upvoted your comment. It made me laugh. Point granted.

~~~
bigiain
FWIW - I do that all the time.

If someone's website forces me to set up an account without me already being
convinced there's enough benefit to me in return for my personal information,
they're likely to get a signup for test@example.com with password "foo". (And
if they then respond with "please click the confirmation link in the email we
just sent", I'll sign up again with $sitename@$spare-domain-I-own.com and pick
up the link from the spam filtered catchall account).

I probably do this at least once a month when there are hints of something useful
in a web forum I want to read, but I'm not (yet or ever) convinced I'll ever
become a member of that forum's community...

------
Freaky
Quick script to binary-search for passwords locally:
[https://gist.github.com/Freaky/4cb7ce8c107c3da2e4a8210356e8d...](https://gist.github.com/Freaky/4cb7ce8c107c3da2e4a8210356e8da25)
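
For reference, the general technique looks roughly like this in Python. This is a sketch and not the contents of the gist: it assumes the file is sorted, holds one upper-case 40-character hex hash per line, and uses a single `\n` terminator (41 bytes per record; pass `record_len=42` if the lines end in `\r\n`). The names `find_hash` and `is_pwned` and the file name in the usage comment are hypothetical.

```python
import hashlib
import os


def find_hash(path, hex_digest, record_len=41):
    """Binary-search a sorted file of fixed-width hash lines."""
    target = hex_digest.upper().encode("ascii")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // record_len
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_len)          # jump straight to record `mid`
            candidate = f.read(len(target))
            if candidate == target:
                return True
            if candidate < target:
                lo = mid + 1
            else:
                hi = mid
    return False


def is_pwned(path, password, record_len=41):
    """Hash the password locally and look the digest up in the file."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return find_hash(path, digest, record_len)


# Usage (hypothetical file name):
# is_pwned("pwned-passwords.txt", "p@55w0rd")
```

With about 306 million records this needs roughly 28 seeks per lookup, and the password never leaves the machine.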

------
coldsmoke
Interesting - "correct horse battery staple"[0] is flagged as not being in the
data set. I was sure someone should have used that by now.

[0] [https://xkcd.com/936/](https://xkcd.com/936/)

~~~
captn3m0
It shows up if you remove the spaces.

~~~
danbruc
And with spaces and uppercase first letters.

------
danso
I'm enjoying typing in things that I've never used as passwords (my own name,
for example) to see if anyone else at some point thought these things would
make for good passwords.

------
nischalsamji
HIBP provides a REST API to check if a password has been found in a breach. Is
there a disadvantage to using it in applications to stop users from choosing a
breached password?

~~~
lancefisher
It's not ideal to send every new user's password to a 3rd party service.

~~~
edelans
you can still send the SHA1

~~~
wongarsu
Without salt, meaning the majority of passwords can be reversed with brute-
forcing or rainbow tables.

The second google result for _rainbow tables_ lets me download software and
tables to efficiently reverse any sha1 whose plaintext fits _[a-zA-Z0-9]{1,9}_
or _[a-z0-9]{1,10}_. That's likely the majority of passwords an attacker would
observe.

------
oxplot
Whoa, wait a second:

> Each of the 306 million passwords is being provided as a SHA1 hash.

That's it? Without any salting? This would make it trivial to recover the
plain text using rainbow tables.

~~~
bigiain
Troy explains where the majority of them came from - it'd be even more trivial
to Google for the dumps yourself.

It's not as if anyone curious about stuff like this doesn't already have the
RockYou dump, and there are easily enough links to
start your own list here:
[https://www.google.com/search?q=password+lists](https://www.google.com/search?q=password+lists)

------
j_s
Would it make sense to host the file on a cheap OVH/Scaleway VPS with
unlimited bandwidth? I guess CloudFlare doing it for free beats that though!

~~~
tomschlick
"Unlimited" as in, "as soon as you start becoming a problem we drop you".

~~~
pfg
OVH lets you use the advertised bandwidth 24/7, if you wish to do so.
Naturally, this doesn't replace a CDN, and their peering isn't the best, so
some routes might be congested and won't be that fast.

I have no experience with Scaleway on this, but based on what I've heard about
them in the past, I imagine their policy is roughly the same.

------
misticdeveloper
I'm confused. If a website salted their hashes, wouldn't it not matter if the
password alone was "pwned"?

~~~
eridius
If a website salts their hashes, that only helps protect their users if that
website is hacked and their password database stolen.

But 306 million passwords have already been exposed by data breaches at other
sites. And users have a tendency to reuse passwords across multiple sites.
Just because your website wasn't hacked doesn't mean an attacker can't go look
up one of your users in someone else's data breach and try the same password
on your site.
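
To make that concrete, here is a small Python sketch under some assumptions: PBKDF2-HMAC-SHA256 with a random per-user salt stands in for whatever a given site really uses, and `hash_password` and `verify` are names invented for the example. Two sites end up with different stored records for the same reused password, yet the plaintext leaked from one still logs in at the other.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) using a fresh random salt unless one is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify(password, salt, digest, iterations=100_000):
    """Recompute the salted hash and compare in constant time."""
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)


# The same reused password stored at two different sites.
salt_a, stored_a = hash_password("hunter2")
salt_b, stored_b = hash_password("hunter2")

print(stored_a != stored_b)                 # the stored records differ per site
print(verify("hunter2", salt_b, stored_b))  # plaintext leaked from A works at B
```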

