
"Pwned Passwords" V2 With Half a Billion Passwords - explodingcamera
https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/
======
ivanech
I think it would be interesting to do an art project with this data - some of
these passwords are funny and/or revealing. Some examples:

    
    
      pooplasagna - 3 times
      eggsarebad - 3 times
      eggsaregood - 25 times
      myhusbandcheats - 4 times
      icheatonmywife - 1 time
      ihatemyneighbors - 2 times
      iamanalcoholic - 6 times
      1yearsober - 31 times
      imissmykids - 51 times
      imissmyparents - 6 times

~~~
grinsekatze
6618 people love life, 1367 want to die.

893 people like turtles, 170 love turtles, 155 love everyone (363 people hate
everyone though and 428 hate us all).

4301 people love their dog, 3 fuck their dog, 3 killed their dog (only one
person killed their cat)

110 people are killers, 4 kill for money, 1 is a murderer. 68 want to kill, 24
kill for fun :-/

2781 love their wife, 552 love their husband. 68 people hate their wife, 38
hate their husband. 3823 people love their son, 17 hate their son. 212 love
their daughter, 3 hate their daughter. 11 siblings are having sex with one
another. 3 have sex with their dad, 5 have sex with their mom.

6 people save the cheerleader, 11 people save the world. 1 person is the
president. 4 people are obama, only one person is trump. The password
donaldtrump is almost 5 times more popular than barrackobama.

339 people love sun, 559 love rain.

114 people are healthy, 216 are sick. 108 people are old, 3 people are too
old. 15 people sleep well. 80 people run fast, 6 people run faster. 1 person
is the fastest (obviously).

All this is quite fascinating and fun... and partly disquieting.

jesus - 123279 times, lordjesus - 8187, satan - 11662 times, iamgod - 8834,
iamjesus - 345 times, jesussucks - 100 times

~~~
TallGuyShort
> 1 person is the president

Don't tell me he actually made that his password...

~~~
grinsekatze
I just checked: at least it's not his Twitter password.

------
groovecoder
do _not_ skip the section on "Cloudflare, Privacy and k-Anonymity" ... it is a
great summary of an elegant privacy solution.

And check out Cloudflare's detail post too:

[https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/](https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/)
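
The k-anonymity part boils down to a few lines. This is a minimal Python sketch, assuming the `range` endpoint that cmurphycode and deathanatos use further down this thread (`https://api.pwnedpasswords.com/range/<first 5 hex chars>`). Only the 5-character prefix of the uppercase SHA-1 leaves the client, and the 35-character suffix is matched locally. The bucket-size arithmetic uses the title's "half a billion" as a round figure, not an exact corpus size.

```python
import hashlib

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"

def k_anonymity_query(password: str) -> tuple[str, str]:
    """Return (url, suffix). The URL reveals only the 5-hex-char prefix;
    the 35-char suffix stays on the client and is compared to the response."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    return RANGE_URL.format(prefix=prefix), suffix

# 16**5 possible prefixes; with roughly half a billion hashes, each prefix
# is shared by several hundred entries on average, so the server can't tell
# which one you were checking.
BUCKETS = 16 ** 5
AVG_BUCKET_SIZE = 500_000_000 / BUCKETS

if __name__ == "__main__":
    url, suffix = k_anonymity_query("password")
    print(url)                     # https://api.pwnedpasswords.com/range/5BAA6
    print(suffix)                  # 1E4C9B93F3F0682250B6CF8331B7EE68FD8
    print(round(AVG_BUCKET_SIZE))  # 477
```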

~~~
skykooler
Why does 0000 have the largest number of hashes? Does SHA-1 not distribute
hash values evenly?

~~~
jobigoud
It's indeed weird that "00000" would be the hash prefix with the highest
number of entries. I think it must be a hidden variable. Like some sources put
an all-zeroed-out hash in the database for testing or in case of a
registration error or for deleted users, and these show up here.

~~~
cmurphycode
Great thought, but it doesn't seem to be the case - as the number of unique
suffixes is the large number here -- in fact, none of the values in the range
are simply all zeroes.

[https://api.pwnedpasswords.com/range/00000](https://api.pwnedpasswords.com/range/00000)

I wonder if the hidden variable is something to do with how the passwords are
leaked. First, let's suppose that a very commonly used broken password hash is
plain SHA-1 (I think that's a valid assumption-- unfortunately!). Then, let's
figure that amongst the many data dumps / extracts done by hackers, some of
them are only able to extract part of the database, or save part of the
database, or whatever....and they are fetched / saved / uploaded in lexical
order?

Can't think of anything else.

EDIT: Ooops. The other thing is, that these actually are sha-1 hashes of real
plaintext passwords. So it's definitely not a test-row in that sense.
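
skykooler's question can be checked directly. Here is a small Python sketch that hashes synthetic strings (the `candidate-{i}` inputs are arbitrary, not passwords) and counts the first hex digit of each SHA-1 digest. Each of the 16 buckets should land within a few percent of n/16. That is consistent with the suspicion above that the "00000" bump comes from how the dumps were collected, not from the hash itself.

```python
import hashlib
from collections import Counter

def first_hex_digit_histogram(n: int) -> Counter:
    """SHA-1 n distinct synthetic inputs and count each digest's first hex digit."""
    return Counter(
        hashlib.sha1(f"candidate-{i}".encode("utf-8")).hexdigest()[0]
        for i in range(n)
    )

if __name__ == "__main__":
    n = 160_000
    hist = first_hex_digit_histogram(n)
    expected = n / 16  # 10,000 per bucket if SHA-1 output is uniform
    for digit in sorted(hist):
        deviation = hist[digit] / expected - 1
        print(f"{digit}: {hist[digit]} ({deviation:+.2%})")
```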

------
kbenson
That moment when you test an old, but still highly valued and securely used,
password that you think isn't super obscure but not likely to be used much and
see a 4000+ count...

~~~
Arn_Thor
That moment when you test a very unique password you used to use and it's been
pwned once.. GULP! Glad I stopped using that one

~~~
filoeleven
Same experience here, except mine was current until shortly after I checked
it.

The weird part is that I only used it on internal systems at work. With an
overly paranoid security department. Either they’re paranoid in the wrong
ways, or I have an evil twin somewhere.

At least, I _hope_ they’re the evil twin...

~~~
Arn_Thor
time for some soul-searching

------
dandare
Am I the only one here who thinks typing your password into a stranger's
website is a risk? How do you know he does not log it? How do you know he was
not hacked and someone is not logging all passwords that are not on the list
YET?

~~~
maze-le
If you don't trust Troy Hunt / haveibeenpwned.com, you can always download the
data and analyze your password yourself. But if this is the case, you should
not trust any website with your password anywhere ever, and should not create
accounts anywhere. Troy Hunt has shown himself to be a responsible security
professional, and I trust him more to create a secure password query than some
other security organizations.

~~~
ajro
"But if this is the case, you should not trust any website with your password
anywhere ever".

That is why you should use a unique password for each site.

~~~
steferson
This is absurd and impossible to remember. You should instead have at least 3
levels of password strength: a high-strength one for base services that are
used to retrieve other accounts, like Facebook and e-mail, another for
important services, and another for crap.

~~~
idle_zealot
You're not expected to remember them all; you're expected to either write them
down or use a password manager. That way you only really need to remember one
very strong password.

------
dopamean
An old password (12 char numbers and letters) I've since stopped using (but
used to use everywhere) appears as pwned in this list (3 times!). I'd love to
know who exposed it. Any chance I can find out?

~~~
anitil
As a policy Troy Hunt won't reveal which breach he found your data in. I
considered setting up a series of 'canary' emails so that I could track who's
selling what but ... well never got round to it.

~~~
giarc
You used to be able to adjust your email address to check. For example, if your
email was bill@gmail.com, you could sign up for HN with
bill+hackernews@gmail.com. Gmail ignores the part after the + sign. Therefore,
if you noticed emails coming to that address, you would know that HN sold
their list. However, I've found that most forms now reject that as an invalid
email address.
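
The canary idea is easy to automate: generate a tagged address per site, then read the tag on incoming mail to see which signup it leaked from. This is a minimal sketch that assumes the provider delivers `local+tag@domain` to `local@domain`. Gmail does, as described above, but other providers may not. The function names are hypothetical.

```python
from typing import Optional

def tagged_address(address: str, tag: str) -> str:
    """'bill@gmail.com', 'hackernews' -> 'bill+hackernews@gmail.com'."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}+{tag}@{domain}"

def tag_of(address: str) -> Optional[str]:
    """Return the +tag of an address mail was sent to, or None if untagged."""
    local = address.rpartition("@")[0]
    _, plus, tag = local.partition("+")
    return tag if plus else None

if __name__ == "__main__":
    signup = tagged_address("bill@gmail.com", "hackernews")
    print(signup)          # bill+hackernews@gmail.com
    print(tag_of(signup))  # hackernews
```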

~~~
outworlder
Yes. Those forms are also ignoring relevant RFCs.

~~~
emmelaich
Can you be more specific?

~~~
loeg
Many sites reject valid email addresses. One character that forms commonly
reject is a "+" in the left-hand side of an email address. The email RFCs
allow this character, so rejecting it is bogus. Nevertheless, they do.

[https://tools.ietf.org/html/rfc2822#section-3.4](https://tools.ietf.org/html/rfc2822#section-3.4)

      atext           =       ALPHA / DIGIT / ; Any character except controls,
                              "!" / "#" /     ;  SP, and specials.
                              "$" / "%" /     ;  Used for atoms
                              "&" / "'" /
                              "*" / "+" /
                              "-" / "/" /
                              "=" / "?" /
                              "^" / "_" /
                              "`" / "{" /
                              "|" / "}" /
                              "~"

      atom            =       [CFWS] 1*atext [CFWS]

      dot-atom        =       [CFWS] dot-atom-text [CFWS]

      dot-atom-text   =       1*atext *("." 1*atext)

      ...

      addr-spec       =       local-part "@" domain

      local-part      =       dot-atom / quoted-string / obs-local-part
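
To make the grammar concrete, here is a minimal sketch of a validator for the dot-atom form of `local-part`, transcribed from the `atext` and `dot-atom-text` rules above. It is illustrative, not a full RFC 2822 parser: it ignores CFWS and does not handle the quoted-string or obs-local-part alternatives. It does show that "+" is an ordinary `atext` character.

```python
import re

# atext: letters, digits and the specials listed in the grammar above.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

def is_dot_atom_local_part(local: str) -> bool:
    """True if `local` matches the dot-atom-text form of local-part."""
    return DOT_ATOM_TEXT.fullmatch(local) is not None

if __name__ == "__main__":
    for candidate in ["bill+hackernews", "first.last", "first..last", ".bill"]:
        print(candidate, is_dot_atom_local_part(candidate))
```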

------
royce
The UX of a blacklist with a half billion entries would be so crippling that
it would cause a user revolt.

Most people's password-selection strategies are similar enough to other
people's (like kbenson's 4000+ hit) that they could spend _hours_ trying to
come up with a password that has never been leaked before.

I tried to encourage Troy to suggest to implementors that blacklisting all
passwords was a Bad Idea. Instead, he doubled down:

[https://twitter.com/TychoTithonus/status/966400790221930496](https://twitter.com/TychoTithonus/status/966400790221930496)

Please don't use the entire list for blacklisting _unless_ you actively also
guide the user in how to generate a random passphrase (if a human must
remember it) or a random password (if it will be stored in a password
manager).
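
For the "random passphrase or random password" guidance, the important part is using a cryptographically secure generator. This is a minimal sketch using Python's `secrets` module. The default length and alphabet are illustrative choices, not recommendations from this thread, and the caller supplies the word list. As a reference point, a 7,776-word diceware list gives about 12.9 bits per word.

```python
import math
import secrets
import string

def random_password(length: int = 20,
                    alphabet: str = string.ascii_letters + string.digits) -> str:
    """Random password for a password manager (nobody has to remember it)."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

def random_passphrase(wordlist: list[str], words: int = 6, sep: str = " ") -> str:
    """Random passphrase for a human to memorize; `wordlist` should be large."""
    return sep.join(secrets.choice(wordlist) for _ in range(words))

def entropy_bits(choices: int, picks: int) -> float:
    """Entropy of `picks` independent uniform draws from `choices` options."""
    return picks * math.log2(choices)

if __name__ == "__main__":
    print(random_password())
    print(f"{entropy_bits(62, 20):.1f} bits")    # 20 alphanumeric characters
    print(f"{entropy_bits(7776, 6):.1f} bits")   # 6 diceware words
```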

Instead:

1\. Use a high-level password-strength assessment widget like zxcvbn:

[https://github.com/dropbox/zxcvbn](https://github.com/dropbox/zxcvbn)

2\. Configure a blacklist with, say, 10K or 20K of the most common passwords.

3\. Hash passwords with bcrypt cost 12 (adjusted to your platform's hashrate
capabilities), scrypt, or the appropriate method from the Argon2 family.
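
Steps 2 and 3 can be sketched with the standard library alone. bcrypt and Argon2 need third-party packages, so this sketch uses `hashlib.scrypt`, the scrypt option from step 3. The cost parameters and function names are assumptions, and step 1 (zxcvbn) is left out.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost (about 16 MiB of memory per hash); tune it to your
# hardware, the same way you would tune a bcrypt cost factor.
SCRYPT_PARAMS = {"n": 2 ** 14, "r": 8, "p": 1}

def load_blacklist(lines) -> set[str]:
    """Build a set from a top-N common-password list, one password per line."""
    return {line.rstrip("\r\n") for line in lines if line.strip()}

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

def register(password: str, blacklist: set[str]) -> tuple[bytes, bytes]:
    """Reject blacklisted passwords, otherwise return (salt, digest) to store."""
    if password in blacklist:
        raise ValueError("password is on the common-password blacklist")
    return hash_password(password)
```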

But for all that is holy, please don't use Troy's entire corpus - or even the
first million - as a blacklist. To quote the old NANOG saw, I encourage all of
my competitors to use it. ;)

Edit: And since Troy's API at this writing does not support only querying the
top X passwords, there's no way to use the API while avoiding the UX
nightmare. So if you want to use this data in a professional manner, but don't
want to download the entire corpus, here are the first 20K from his list (of
which I've only personally cracked 19965 so far, interestingly; gist will be
updated once I get all 20K):

[https://gist.github.com/roycewilliams/281ce539915a947a23db17...](https://gist.github.com/roycewilliams/281ce539915a947a23db17137d91aeb7)

Edit 2: Preliminary results indicate that this data may be dirty. The 273rd
most common password, according to Troy, is '$HEX'. This is almost certainly
an import/conversion artifact, since the '$HEX' prefix is how most cracking
suites escape non-ASCII passwords or passwords that contain colons. I expect
that there will be more artifacts. Use the data with caution.
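
If you import cracked plaintexts yourself, normalize the hex-escaped form before building a blacklist. This is a minimal sketch that assumes the hashcat/John-style `$HEX[...]` wrapper, meaning hex bytes inside square brackets. Anything that starts with `$HEX` but doesn't fit that shape, like the bare '$HEX' entry above, is flagged as a probable artifact. Decoding as UTF-8 with replacement characters is a choice made here, not something the source specifies.

```python
import re

# '$HEX[' + an even number of hex digits + ']'
HEX_WRAPPED = re.compile(r"\$HEX\[((?:[0-9A-Fa-f]{2})*)\]")

def decode_plain(candidate: str) -> str:
    """Unwrap '$HEX[...]' into the original password; pass others through."""
    match = HEX_WRAPPED.fullmatch(candidate)
    if match is None:
        return candidate
    return bytes.fromhex(match.group(1)).decode("utf-8", errors="replace")

def looks_like_hex_artifact(candidate: str) -> bool:
    """'$HEX' prefix without a well-formed '[hexbytes]' payload."""
    return candidate.startswith("$HEX") and HEX_WRAPPED.fullmatch(candidate) is None

if __name__ == "__main__":
    print(decode_plain("$HEX[70613a7373]"))  # pa:ss
    print(looks_like_hex_artifact("$HEX"))   # True
```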

~~~
kbenson
> I tried to encourage Troy to suggest to implementors that blacklisting all
> passwords was a Bad Idea. Instead, he doubled down

> Please don't use the entire list for blacklisting unless you actively also
> guide the user in how to generate a random passphrase (if a human must
> remember it) or a random password (if it will be stored in a password
> manager).

I think he did the right thing, _and_ think you are correct as well. I think
we have the best of both worlds with this, in that it includes the count, so
API users can determine what the correct cut-off is for them. A count in the
thousands (or maybe fewer) might be a good indicator that your password is not
only _relatively_ common, but also likely to be on (and maybe even fairly high
on) many dictionary lists. More secure services that cater to
more technically savvy users (or security conscious companies) may decide to
blacklist any password on the list period, and that may be okay because those
sites either trust their users to deal with it or can dictate conditions for a
captive audience.

~~~
royce
As a corpus to download for password research, this is indeed useful. But for
providing a blacklist -- his stated purpose - it is not.

The crucial tell: his API _does not allow the implementor to specify a
frequency threshold_ (by top X in the list, or by Y number of unique uses of
the password or higher).

By both API and explicit language in the announcement, he is promulgating the
idea that checking the _entire_ blacklist is useful, and "the larger the
blacklist, the better." This is _exactly_ what I'm arguing against.

~~~
kbenson
> his API does not allow the implementor to specify a frequency threshold

Yes it does. The output contains the number of matching passwords. It's just
client side instead of server side. The reason for not doing so on the server
is also obvious taking into account his explanation of cost and caching, which
informed much of the API design itself.

> By both API and explicit language in the announcement, he is promulgating
> the idea that checking the entire blacklist is useful

Because the entire blacklist _is_ useful. He's given all the relevant
information to the client to do with as they may. It's up to them to choose
how to utilize it. I'm not sure why you seem to think some narrower use case
is necessarily better, using end use cases as arguments, given it's an API and
needs a client implementation before being usable anyway.
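
Here is kbenson's point in code: the range response already carries the count, so the cut-off can live in the client. This is a minimal sketch that assumes the body has already been fetched from `https://api.pwnedpasswords.com/range/<prefix>`, as in the bash script elsewhere in this thread. The threshold is whatever value your service picks, and the function names are hypothetical.

```python
import hashlib

def breach_count(password: str, range_body: str) -> int:
    """Find `password` in a range response fetched for its own 5-char SHA-1
    prefix. Each line is 'SUFFIX:COUNT'; a missing suffix means 0."""
    suffix = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()[5:]
    for line in range_body.splitlines():
        line_suffix, _, count = line.strip().partition(":")
        if line_suffix.upper() == suffix:
            return int(count)
    return 0

def too_common(password: str, range_body: str, threshold: int) -> bool:
    """Client-chosen cut-off: reject if seen at least `threshold` times."""
    return breach_count(password, range_body) >= threshold

if __name__ == "__main__":
    # One line of a response for prefix 5BAA6 (count taken from royce's top-10 list).
    body = "1E4C9B93F3F0682250B6CF8331B7EE68FD8:3303003\r\n"
    print(breach_count("password", body))                  # 3303003
    print(too_common("password", body, threshold=1000))    # True
```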

~~~
royce
It's a fair point that raw password count is available.

But that value is an _absolute_ number, without any in-API context of the
total size of the corpus. This makes expressing _relative_ rarity only
possible by hard-coding the total size of the corpus into a calculation.

Put another way: the 20,000th position has a frequency value of "7889". But
what does that _mean_? Where is that in the _distribution_ of password
frequency? It's impossible to tell, without manually constructed context _that
will change over time as the total number of passwords in his corpus expands_.

But more crucially, there is no way to tell _relative_ rank ("is this password
in the top 20k?") using the API that I can see. That would make using the top
X much easier. But with the K-anonymity "feature", there's no way to do that
that I can see.

~~~
jasonpeacock
I don't follow - how is the relative rarity better than absolute frequency?
What really matters is how common your password is - not how highly it's
ranked in a compromised password list, which has no relevance to how common it
may be.

You want to filter out users choosing a password that's been re-used more than
N times across all compromises.

Filtering users on choosing a password that ranks N of M on a list of
compromised passwords doesn't tell the user how _bad_ that password is.

In fact, once you get to the tail, isn't the ranking basically just sort
order, and therefore irrelevant?

~~~
royce
The ranking in Troy's list is based entirely on how common the passwords are.
Here are the top 10, with their occurrence counts:

    
    
      7c4a8d09ca3762af61e59520943dc26494f8941b:123456 (20760336)
      f7c3bc1d808e04732adf679965ccc34ca7ae3441:123456789 (7016669)
      b1b3773a05c0ed0176787a4f1574ff0075f7521e:qwerty (3599486)
      5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8:password (3303003)
      3d4f2bf07dc1be38b20cd6e46949a1071f9d0e3d:111111 (2900049)
      7c222fb2927d828af22f592134e8932480637c0d:12345678 (2680521)
      6367c48dd193d56ea7b0baad25b19455e529f5ee:abc123 (2670319)
      e38ad214943daad1d64c102faec29de4afe9da3d:password1 (2310111)
      20eabe5d64b0e216796e834f52d61fd0b70332fc:1234567 (2298084)
      8cb2237d0679ca88db6464eac60da96345513964:12345 (2088998)
    

So ... what is the "right" threshold for N?

    
    
      $ for topx in 1 100 1000 5000 10000 20000 50000 100000 200000 500000 1000000 2000000 10000000; do
          echo -n "${topx}: "; head -n "${topx}" pwned-passwords-2.0.txt | tail -n 1
        done

      1: 7C4A8D09CA3762AF61E59520943DC26494F8941B:20760336
      100: 482FA19D5C487CB69ACDA19EEE861CC69D82CC94:272371
      1000: 5B9FE558F673D63309BEB13BFA5DA6C30A3CA1BF:64912
      5000: FE648FC459A6F6EF6CD347BEE3D494766239BBB5:19860
      10000: 2682A3DBA7A1452EE7EE9980F195C6A768055DA6:11055
      20000: 53490A3C8567342B57B6A4FF24908DF73182B357:6309
      50000: 7517CD23A308BBCD05E5AD24AA6AD054237ED470:3153
      100000: BA6D6A41B9548C523833627A8B0E5170558BE1EA:1752
      200000: E50E6893264519636E90E95B6B1A85D0A691E0B1:931
      500000: AF8DF653177BBB3FEE2DA68D314B94CB5281B4F3:381
      1000000: BDD57A4CAA691A3441C1190C6F087B58B2EE3EF6:186
      2000000: C824AF24AA8F2FD99AD6842DC0E4B49100D96161:93
      10000000: 352DB7177AB7848DF1C102234401097FE40EB87D:22
    

The third field indicates how common the password is in the corpus (for
example, the single most common password, "123456", appears in the corpus
20,760,336 times).

So ... based on this data ... what is a reasonable value for that count, such
that if the value is exceeded, the user should be disallowed from using the
password? How much _real-world_ online or offline resistance is provided by
disallowing, say, passwords used at least 186 times in the corpus (roughly a
million passwords, though 5201 passwords are at the 186 mark)? (The answer
should be self-evident; if it isn't, I can provide more background).

Put another way ... if the corpus was only 1M in size, those right-hand values
would be much smaller. How could you determine the threshold then? What I'm
trying to illustrate here is that it's not the absolute value of that
commonality number that matters; it's the _relative rank_. But that relative
rank can't be determined via the API; you must analyze the entire corpus
directly - and then discard the vast majority of it for blacklisting purposes.
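
Royce's point can be made concrete. Turning an absolute count into a rank requires a table built from the full corpus, like the samples above. This is a minimal sketch that hard-codes those v2 (rank, count) pairs, which is exactly the part that would go stale as the corpus grows. The function name is hypothetical.

```python
# (rank, count) pairs copied from the pwned-passwords-2.0 output above.
V2_SAMPLES = [
    (1, 20760336), (100, 272371), (1000, 64912), (5000, 19860),
    (10000, 11055), (20000, 6309), (50000, 3153), (100000, 1752),
    (200000, 931), (500000, 381), (1000000, 186), (2000000, 93),
    (10000000, 22),
]

def rank_bound(count: int, samples=V2_SAMPLES):
    """Smallest sampled rank r such that a password seen `count` times must
    be within the top r. The list is sorted by count, so anything seen more
    often than the entry at rank r ranks above it. Returns None if `count`
    does not exceed any sampled count."""
    for rank, sample_count in samples:
        if count > sample_count:
            return rank
    return None

if __name__ == "__main__":
    print(rank_bound(7889))  # 20000: above the 20,000th entry, not the 10,000th
    print(rank_bound(5))     # None: below every sample
```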

I totally get that the threshold might vary per implementation. But it varies
much less once the hash is slow enough, and the authentication service is
suitably rate-limited. In other words, any system that would get real benefit
from a 1-million-word blacklist is one that _needs to be improved elsewhere
instead_.

But Troy didn't provide any guidance about that, or even how to judge for
yourself what the threshold might be. He just provided an API to blacklist a
corpus of passwords that is three orders of magnitude larger than a properly
designed system would ever need.

1\. [https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation/](https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation/)

------
valtism
[https://mostsecure.pw/](https://mostsecure.pw/)

No pwnage found. Still confirmed for most secure password.

~~~
chuckdries
...this is a joke, right?

------
jack6e
How do I submit a pull request on Github asking that "hunter2" be removed from
his list?

~~~
LeonM
For those who missed the hunter2 reference:

[http://bash.org/?244321](http://bash.org/?244321)

~~~
r3bl
It was used just 16,092 times?

I thought the number would be much bigger.

~~~
Nition
I got 4,882 times. You're searching ******* right?

------
orasis
Can someone please just provide the exact shell commands to generate a
compatible sha-1 of a password to grep against the database?

The article seems to ramble forever about how to perform online checks without
discussing the basic offline secure option.

~~~
ianlevesque
echo -n "password" | openssl sha1 | awk '{print toupper($NF)}'

~~~
mnutt
You may also want to consider running `unset HISTFILE` before that to ensure
that the line containing your password doesn't end up sitting around in your
bash history.

~~~
r1ch
Another way is to prefix the command with whitespace.

~~~
baliex
Only if..

    
    
        export HISTCONTROL=ignorespace
    

..is set (either by default or explicitly)

~~~
deathanatos

      python3 -c 'import getpass, hashlib; print(hashlib.sha1(getpass.getpass().encode("utf-8")).hexdigest().upper())'
    

Avoids history, doesn't echo to the terminal.

In fact, you should be able to just make a rudimentary CLI into Troy's API
simply with:

    
    
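      #!/bin/bash
      set -euo pipefail
      HASH="$(python3 -c 'import getpass, hashlib; print(hashlib.sha1(getpass.getpass().encode("utf-8")).hexdigest().upper())')"
      # -f: non-zero exit on HTTP errors; -S: still print the error. No output (exit 1) means not found.
      curl -fsS "https://api.pwnedpasswords.com/range/${HASH:0:5}" | grep "^${HASH:5}:"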
    

(It'll emit the line from the API response matching your pass; if it does,
then that password was compromised. Bash isn't real good at error handling
though, so my biggest concern would be what this might do if an HTTP/TCP error
happened. I've attempted to throw -S there to catch that, but use with your
head screwed on.)

------
odammit
Split brain your password storage.

Another table, another database or another storage system in general.

If an attacker SQL-injects your database, don’t go spilling every hashed or
unhashed password you’ve got.

I tend to store passwords in a separate keyvalue store from where my
authentication identifier is (email, “username”).

If someone gets into my network they need to get into my servers with the
email addresses and then get into a secondary system where the passwords are
stored.

I like my password systems to be a k/v store because there is no need to
“query” it. I usually store the password under a key that _isn’t_ the
identifier, using something like a database surrogate key instead.

Have a secondary system (microservice, private subnet) that simply returns a
boolean representing if the provided non-email (key) and password (value)
match.

Have that secondary system take the plain text password so it can do the
hashing without letting the dependent service know what algorithm, salt or
stretches you’re doing. This will also allow you to easily roll over to new
hashing algorithms over time without affecting the service that is doing the
authenticating.
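
Here is a minimal in-process sketch of the separate verifier described above, with a dict standing in for the key/value store. In practice it would be a separate service and store. Records are keyed by an opaque surrogate key, callers only ever get a boolean back, and each record carries an algorithm tag so old hashes are upgraded on the next successful login. The class name, the tags, and the PBKDF2/scrypt pairing are assumptions for illustration.

```python
import hashlib
import hmac
import os

CURRENT = "scrypt-v1"

def _derive(algo: str, password: str, salt: bytes) -> bytes:
    pw = password.encode("utf-8")
    if algo == "pbkdf2-v0":  # hypothetical legacy scheme
        return hashlib.pbkdf2_hmac("sha256", pw, salt, 200_000)
    if algo == "scrypt-v1":
        return hashlib.scrypt(pw, salt=salt, n=2 ** 14, r=8, p=1)
    raise ValueError(f"unknown algorithm {algo!r}")

class PasswordVerifier:
    """Stores (algorithm, salt, digest) under a surrogate key, never an email."""

    def __init__(self):
        self._store = {}  # stand-in for a separate key/value store

    def set_password(self, key: str, password: str, algo: str = CURRENT) -> None:
        salt = os.urandom(16)
        self._store[key] = (algo, salt, _derive(algo, password, salt))

    def verify(self, key: str, password: str) -> bool:
        record = self._store.get(key)
        if record is None:
            return False
        algo, salt, digest = record
        ok = hmac.compare_digest(_derive(algo, password, salt), digest)
        if ok and algo != CURRENT:
            # Rollover: rehash with the current algorithm while we have the plaintext.
            self.set_password(key, password)
        return ok

    def algorithm_of(self, key: str) -> str:
        return self._store[key][0]
```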

Edit: I’m not trying to be a know it all or a crabby old tinfoil hat a-hole.
But it’s passwords, man. When you leak them you ruin people’s days/year/life.
Building that system above takes a middle of the road engineer a day or two.
Put the effort in. Every password leak makes _all of our jobs harder_. It’s
your company’s responsibility to keep that safe. If you know that already, be
the annoying guy that brings it up in every stand up. Make that debt known.

~~~
brazzy
Sorry, but this is convoluted nonsense that can only achieve one thing: make
yourself more vulnerable.

You want your security system to be as simple as possible, and to involve as
little custom code as possible. Because you can and _will_ fuck it up if you
try to be clever.

Hash and salt your passwords using a library designed exactly for that purpose
(which means it will use a _slow_ hash). That's it, end of story.

~~~
odammit
^ above is pretty damn simple.

I also never said “write your own hashing algorithm” I said abstract it so
it’s not sitting around in your ecommerce app code.

That is a simple security system. It’s just not baked into your flagship
ecommerce, blog or whatever else you’re storing the credentials to protect.

------
scrollaway
[Pasting an old comment of mine on password managers, since I see people
talking about starting to use Keepass. I hope this helps someone]

\----

If you're just starting, here's some guidance on setting up a password
manager.

First of all: Don't be afraid of using one. It's not just more secure, it's
super convenient. Never again will you ask yourself: Did I make an account for
this website/service? What email did I use? Never again will you have to
remember a password. Using a password manager is a quality of life
improvement.

KeepassXC is what I recommend to people at this point. It's free and you own
your data (your passwords). They live wherever you want them to live. There
are plenty of online services that are supposedly more convenient but I have
to say I trust them less -- YMMV (1Password is the best I'm aware of).

[https://keepassxc.org](https://keepassxc.org)

If you do use keepassxc, you get the added benefit of being able to store 2FA
settings in it as well (if you store them in the same database as your
passwords, be aware that you lose the security benefit of a second factor;
however, it is still more secure than not having 2FA enabled, thanks to the
one-time password component).

Put every account you ever made and ever make into keepass. Enable 2FA
wherever you don't have it enabled. Add login URLs and notes. Generate your
passwords from keepass itself; the password generator is really powerful and
lets you very easily deal with site-specific shitty password limitations. I'm
telling you this because, seriously, it's incredibly convenient to have this
stuff as long as you're rigorous about maintaining it.

Oh, also, keepass has the full history of all your passwords. Need to look up
an old password? Go into details and look at "History". You can also attach
files to items (items don't have to be accounts at all, you can use keepassxc
as a simple encrypted storage db).

Mobile support: Keepass2Android. Best android client, with google drive
support. iOS I have no idea, suggestions welcome.

IMPORTANT: BE STUPIDLY PARANOID AND RIGOROUSLY CAREFUL ABOUT YOUR MASTER
PASSWORD. That thing, together with your keepass database, unlocks all your
accounts ever. Use a really long passphrase that you will never have to write
down (if you do decide to write it down because you don't trust yourself,
store it in a safety deposit box, don't put it in a bloody drawer). Make sure
the device you unlock the database on is malware-free.

PS: Wondering what's up with Keepass vs. KeepassX vs. KeepassXC? Keepass is
the original app, written in .NET but with poor multi-platform support.
KeepassX is a rewrite in Qt and is a fantastic password manager, but has gone
unmaintained recently. The open source community picked up the slack in the
KeepassXC fork (after countless attempts to upstream the patches)
and has implemented lots of powerful features. I've switched to it and at this
point I strongly believe it's the better client.

~~~
mseebach
> you get the added benefit of being able to store 2FA settings

Don't do this. If you use a password manager with all the benefits this
entails (long, random passwords, each only used for a single site), the only
benefit 2FA really gives you is if your password manager is compromised
somehow. If your second factor is _in_ your password manager, you're screwed.

I use Authy with a long, secure password printed on a piece of paper. Yes, it
is cloud and third party and everything, _but_ it's on a completely orthogonal
chain from my Keepass DB, so dual compromises are significantly more
difficult.

~~~
scrollaway
> _Don't do this._

On the other hand, _do_ do this, but be aware of the tradeoffs.

I hate telling people not to do something. Most people just end up not turning
2FA on at all. My approach has converted many people from "one password reused
everywhere, at best with variations" to KeepassXC unique passwords everywhere
+ 2FA and I classify that as a big win.

The biggest benefit of TOTP 2FA isn't the "second factor" part, it's the OTP
part. This removes many forms of phishing, keylogging and database leaks as a
threat to your account. You do not lose these benefits when you have it all in
one factor.

If you read my comment, you'll see I address this concern. If this is a real
threat for you, then you can always simply use a separate Keepass database for
your OTP settings.
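
Since the OTP part carries most of the benefit here, it helps to see how little is involved. The TOTP seed (the "2FA settings" you would store in KeepassXC or an authenticator app) plus the current time is all that is needed to produce a code. This is a minimal sketch of RFC 6238 TOTP on top of RFC 4226 HOTP, using SHA-1, 30-second steps, and 6 digits, the common defaults.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then
    dynamic truncation to `digits` decimal digits."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks 4 bytes
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP with counter = floor(unix_time / step)."""
    if now is None:
        now = time.time()
    return hotp(key, int(now // step), digits)

if __name__ == "__main__":
    # RFC test seed. Apps usually hand you a base32 string instead, which
    # base64.b32decode turns into raw bytes like these.
    seed = b"12345678901234567890"
    print(totp(seed, now=59, digits=8))  # 94287082 (RFC 6238 test vector)
```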

~~~
mseebach
I grant that it protects against phishing, but I would cautiously suggest that
sites that are smart enough to enable 2FA are smart enough to
salt/hash/bcrypt/whatever best practice their passwords, so leaks are
neutered. It doesn't not protect, so to speak, but the protection is likely to
be redundant.

But it emphatically does not protect against keylogging: anyone who can
install a keylogger on your computer can grab your password DB and your master
password. This is exactly the scenario where you need actual 2FA.

Anyway, broader point: yes, it's a tradeoff, but the kind of people who need
explaining why a password manager is a good idea do not understand enough to
make an informed decision about these tradeoffs. And so, the responsible
advice is to not use it.

I do know enough to understand these tradeoffs, and my conclusion is to keep
password management and 2FA strictly separate.

~~~
scrollaway
> _But it emphatically does not protect against keylogging_

1\. A keylogger on your password db is useless if it does not also upload the
db (at which point you're looking at a targeted attack, and you have far
bigger problems than that).

2\. Keyloggers are more and more often browser-based. KeepassXC is immune to
those.

3\. KeepassXC supports 2FA for the database encryption itself. If you're that
paranoid, use that. There's always more you can do.

> _And so, the responsible advice is to not use it._

No.

Just as you see in the article where Troy has to make the difficult decision
_not_ to include a "Do not put your password anywhere not even here"
disclaimer, the same holds in my message: I weigh the pros of someone turning
2FA on as far more important than the cons that come with the less-than-ideal
security 2FA adds.

Your advice keeps people from turning 2FA on. 2FA is a pain in the ass for
most people.

You are one of the lucky few who understands the tradeoffs involved, as you
yourself said. So use that knowledge of yours to actually get people to secure
their accounts.

My goal isn't to keep Edward Snowden's accounts secure. It's to keep the bored
HN user's account secure. The average HN user has medium-to-high technical
literacy and low-to-medium security literacy. A lot of people on here reuse
passwords, I'm sure. This is what I'm trying to fix, and I won't advise Ed to
keep his TOTP seeds in the same database.

------
hmexx
I just tried an 11-character password without special chars that I’ve used on
over 50 sites over the last decade. It’s my password for throwaway websites,
some of them pretty dodgy.

Not in the database.

Makes me feel pretty good about password security overall!

~~~
empath75
I have a password that’s a pair of words in two languages with some number
substitution that I use a lot on websites I don’t care about and it’s not in
the DB. And I’m _sure_ I’ve used it on sites that have been hacked, so I
dunno.

~~~
ianlevesque
Unless the site was also storing in plaintext they’d have to actually crack
the password hash too, which for some passwords is very hard to do.

------
CurtMonash
There's something odd about a website that urges you to test your security by
typing in your password.

~~~
smoyer
He specifically tells you that you shouldn't do that. But as the winning tool
shows, perhaps people who have passwords in HIBP will change them after
finding this out?

------
zaroth
I'm going to have to disagree with the premise that sites should stop users
from choosing a password which happens to have been cracked offline at some
point in the past -- to the tune of blacklisting half a billion potential
secrets.

What exactly is the end goal, and at what cost? Well, there are 3 ways to
steal a password. You can steal it from the user -- either by phishing or with
malware -- in which case it matters not a bit how complex the password is. You
could attempt to crack it _online_, that is, by attempting to log in as the
user through the front door. In this case, a simple counter should limit the
number of attempts before a second factor is required, such as clicking a link
in an email, and a list of half a billion candidates isn't going to help here
either. Finally, you can steal the password verifier database and attempt to
crack the password offline.
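
The "simple counter" for online guessing might look like this. It is a minimal in-memory sketch. The default of 5 failures and resetting the count on success are assumptions, and a real deployment would persist the counts and probably expire them.

```python
class LoginThrottle:
    """Count consecutive failed logins per account; once `max_failures` is
    reached, require a second factor (e.g. an emailed link) to continue."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self._failures: dict[str, int] = {}  # in-memory; persist in practice

    def needs_second_factor(self, account: str) -> bool:
        return self._failures.get(account, 0) >= self.max_failures

    def record_failure(self, account: str) -> None:
        self._failures[account] = self._failures.get(account, 0) + 1

    def record_success(self, account: str) -> None:
        self._failures.pop(account, None)  # reset the failure streak
```

With 5 allowed misses per account, an online attacker working through a half-billion-entry list gets only 5 guesses before the second factor is required.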

So, the theory must be that passwords in a known attacker's dictionary are
more likely to be used as candidates in an offline attack. This is likely
true. But once the verifier database is stolen, if an attacker is able to run
the password hashing function, then every password which is not raw entropy
already must be assumed to be cracked. _Regardless of how hostile your
password policy is._ So what exactly is this policy saving you?

On the flip side, it's reasonable to ask, what would such a policy _cost_ you?
It's hard to say without actual data, but I'd love to see some data on what
percentage of candidate passwords offered by a user trying to signup on their
mobile device would be rejected under this policy? How many attempts on
average would it take a user to find a password which was not rejected? And
what's the increase in bounce rate, and therefore lost signups, that would
result? How many increased password resets would be required from users
choosing passwords that they inevitably don't remember? How many additional
lock-outs which require customer support capital to resolve?

Password policies on average are pretty horrendous. But password policies
which are arbitrary black boxes to the end user are about the worst you can
find. Sitting on my mobile device, not knowing if a chosen password will be
accepted, having to type it twice each time, is a wretched user experience
which would need extremely lofty benefits to outweigh the cost. I fail to see
any benefit of this approach that couldn't be achieved with better hashing,
which wouldn't affect the user experience whatsoever.
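The "better hashing" here is the commenter's own product, BlindHash, which is not shown. As a generic stand-in only, this sketch uses PBKDF2 from Python's standard library to illustrate the basic idea. The site stores a salted, deliberately slow verifier, so every offline guess costs the attacker real work. The `ITERATIONS` value and the function names are assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

# Assumed work factor; tune so one hash costs a noticeable amount of CPU time.
ITERATIONS = 600_000


def make_verifier(password, iterations=ITERATIONS):
    """Return (salt, iterations, digest) to store instead of the password."""
    salt = os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def check_password(password, salt, iterations, digest):
    """Recompute the slow hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```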

I'll say one more thing on the idea of blacklists. Users just trivially work
around them. Password quality (entropy, guessability) does not generally
increase. Bad password policies often decrease password quality, particularly
in the case of password expiry. A frustrating, opaque blacklist could be
just as bad (I'm not aware of any studies on this).

An attacker who knows a particular blacklist was in place will use munging
rules to find derivatives from the master list which are not on the blacklist.
How much will their crack rate (percentage of clears recovered from an offline
attack of a given magnitude) be affected? More importantly, potentially
driving down the crack rate through user-hostile password policies is a game
that pays huge dividends at first (getting a password which can't be attacked
online) and very little after that point.
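This toy Python sketch illustrates the munging point. It applies a few hypothetical rules (capitalize, simple leetspeak, common suffixes) to one base word and lists the variants an exact-match blacklist would still let through. Real cracking tools ship far larger rule sets.

```python
def mangle(word):
    """Yield simple variants of a base word (illustrative rules only)."""
    leet = str.maketrans({"a": "4", "e": "3", "o": "0", "s": "5"})
    bases = {word, word.capitalize(), word.translate(leet)}
    for base in sorted(bases):
        yield base
        for suffix in ("1", "!", "123", "2018"):
            yield base + suffix


def unblocked_variants(word, blacklist):
    """Variants an attacker could still try because the blacklist lacks them."""
    return [v for v in mangle(word) if v not in blacklist]
```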

Disclaimer: Founder of BlindHash, which is the "better hashing" I refer to
above.

~~~
nextgens
> I'm going to have to disagree with the premise that sites should stop users
> from choosing a password which happens to have been cracked offline at some
> point in the past

Well, NIST, NCSC and Microsoft all seem to be on the same page:

[https://www.ncsc.gov.uk/guidance/password-guidance-simplifyi...](https://www.ncsc.gov.uk/guidance/password-guidance-simplifying-your-approach)

[https://pages.nist.gov/800-63-3/sp800-63b.html#5111-memorize...](https://pages.nist.gov/800-63-3/sp800-63b.html#5111-memorized-secret-authenticators)

[https://www.microsoft.com/en-us/research/wp-content/uploads/...](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/Microsoft_Password_Guidance-1.pdf)

> What exactly is the end goal

The end goal is to prevent password spraying attacks. If the attacker can get
in on the first try because the user has re-used credentials between two
services, account lockout policies don't help and neither does BlindHash (if
the compromised service wasn't using it). For BlindHash to bring any
significant security benefit you would need _everyone_ to use it (which will
obviously never happen).

~~~
zaroth
NIST has unfortunately been the source of a lot of bad advice which has
actively harmed password security over the last decade (e.g. [1]).

Cargo culting is generally a good thing in crypto because, you know, don’t
roll your own. But in this case we’re talking about policy, and this policy is
as user-hostile as (if not worse than) the prior NIST advice on password expiry.

If you want to stop password spraying, protect your hashes. There’s no proof
that blacklisting half a billion specific secrets will make cracking any more
difficult. As for making it nigh impossible for users to register with your
service: well, I guess if you have no users you have no passwords to lose.

But the point is a blacklist this extensive is just as likely to make
passwords easier to crack, not harder, and will come with a direct cost to the
company implementing it. I understand the goal well; I’m just entirely
unconvinced this helps achieve it.

I would be interested to hear Gosney’s (cracker extraordinaire) and Cormac’s
(Microsoft Research) take on this.

[1] - [https://www.engadget.com/amp/2017/08/08/nist-new-password-gu...](https://www.engadget.com/amp/2017/08/08/nist-new-password-guidelines/)

~~~
nextgens
> If you want to stop password spraying, protect your hashes.

Again, it's not about your hashes, it's about the attacker having access to
your users' credentials.

Users re-use credentials across services and you have no control over how
(in)securely they are stored there.

Blacklisting what is known to be widely used across services (I don't have an
opinion on how big the blacklist should be) sounds sensible... and there is
definitely an argument to be made for blacklisting what is known to be
widely available/effective for attackers.

------
amatecha
Hmm this is a pretty great list to use for any service that has user signups
-- disallow signup when using a password that is known to have been "pwned"!
:)

------
lzybkr
Here is a quick PowerShell script - it supports the pipeline so you can
automate, e.g. if you use a command line password manager.

[https://gist.github.com/lzybkr/85b4dbd6536ea5351e8d8e492a432...](https://gist.github.com/lzybkr/85b4dbd6536ea5351e8d8e492a432030)

~~~
Satchelmouth
Thank you

------
jakobegger
I love that simple API! Here's a bash one-liner that checks if 'hello' is
compromised:

    
    
        curl -s https://api.pwnedpasswords.com/range/$(echo -n hello | shasum | cut -b 1-5) | grep $(echo -n hello | shasum | cut -b 6-40 | tr /a-f/ /A-F/)
    

Edit: Improved one-liner that only requires typing the password once and
avoids storing it in the bash history:

    
    
        (echo -n "Password: "; read -r pw; curl -s https://api.pwnedpasswords.com/range/$(echo -n "$pw" | shasum | cut -b 1-5) | grep $(echo -n "$pw" | shasum | cut -b 6-40 | tr /a-f/ /A-F/))
    

~~~
espadrine
You can use `read -s` to avoid risking having someone behind you read your
password on your screen as you type it.

    
    
        (echo -n "Password: "; read -rs pw; curl -s https://api.pwnedpasswords.com/range/$(echo -n "$pw" | shasum | cut -b 1-5) | grep $(echo -n "$pw" | shasum | cut -b 6-40 | tr a-f A-F))
    

------
schmich
A quick Ruby script to check if a password has been compromised using the
Pwned Passwords V2 API:
[https://gist.github.com/schmich/aeaffac922271a11b70e9a79a5fe...](https://gist.github.com/schmich/aeaffac922271a11b70e9a79a5fee19c)

------
DINKDINK
Guess my password is safe, it seems it was skipped in the list:

hunter1 - 3 times

******* - 28 times

hunter3 - 2 times

/s

------
JimWestergren
This is really great and I will use this API.

Wrote a simple method in PHP using 10 lines:
[https://gist.github.com/JimWestergren/a4baf4716bfad6da989417...](https://gist.github.com/JimWestergren/a4baf4716bfad6da989417a10e1ccc5f)

Feel free to use.

------
ShakataGaNai
A quick python script to hit the API, for those that don't want to use the
webform (rightly so):

[https://gist.github.com/ShakataGaNai/cb786a2c64abc83d4dbe0db...](https://gist.github.com/ShakataGaNai/cb786a2c64abc83d4dbe0dbf6b60a5e3)

~~~
kbenson
Or, for those that don't want to use Python (in case it isn't installed, or
requires a non-core module, I dunno) but have access to a Linux box:

    
    
        # echo -n "password" | sha1sum
        5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8  -
    

Take the first 5 characters, in this case "5baa6", and append them to the
API endpoint in your browser, e.g.

    
    
        https://api.pwnedpasswords.com/range/5baa6
    

Then take the rest of the hash after the first 5 characters, in this case
"1e4c9b93f3f0682250b6cf8331b7ee68fd8", and Ctrl-F search the results page for
it.
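The same manual steps can be sketched in Python. Only the 5-character prefix is sent over the network. The suffix comparison happens locally against the `SUFFIX:COUNT` lines that the range endpoint returns. The function names and the `User-Agent` string are placeholders, and error handling is omitted.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """SHA-1 the password; return (5-char prefix, 35-char suffix), uppercase."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, body):
    """Find suffix in a range response of 'SUFFIX:COUNT' lines; 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


def pwned_count(password):
    """Query the range endpoint; only the 5-char prefix leaves the machine."""
    prefix, suffix = split_hash(password)
    # The User-Agent value is an arbitrary placeholder.
    req = urllib.request.Request(RANGE_URL + prefix, headers={"User-Agent": "range-check-sketch"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
    return count_in_range(suffix, body)
```

`pwned_count("password")` should return a large nonzero count. The exact figure depends on the dataset version, so none is shown here.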

~~~
wuunderbar
For ease:

    
    
        # echo -n "password" | sha1sum | cut -b 1-5
        5baa6

~~~
kbenson
Yeah, but you would need two commands then, because you need to search for
bytes 6-40 in the resulting output. I made a one-liner farther down the
comments.[1]

1:
[https://news.ycombinator.com/item?id=16434244](https://news.ycombinator.com/item?id=16434244)

------
StapleHorse

      f--cktrump - 1 
      f--ckbush  - 945 
      f--ckclinton - 0
      f--ckobama - 128
    

I guess you can draw some conclusions there... maybe:

- pwned accounts were created between 2000-2016
- Democrats do worse at secure passwords?
- In 2000-2008 people chose worse passwords?

------
odammit
If you're going to spend time building a system to search those half billion
passwords when your users are signing up, you should focus on building a login
rate-limiting system so it’s not possible to _brute force_ someone’s password.

~~~
rphlx
Though I agree rate limiting should be done (and done carefully), it is not
effective in every case. As just one example, a determined attacker who
wants to pop _any_ account can make 3 attempts on each of hundreds of
thousands of accounts, using a unique IPv4 address per account, thanks to
Windows and IoT botnets.

~~~
odammit
WAF, IP blacklists, naive bot detection[0]... and why would a decent
thresholding system allow a single IP to fail on multiple accounts in a short
time period?

If you hit two valid accounts [1] with bad passwords in a few
ACCEPTABLE_UNIT_OF_TIME, it’s captcha time.

Thresholding isn’t just counting actions per IP; it’s being smart about how
people are going to attack your system. It requires thought and upkeep.

[0] Previous thoughts on bot detection:
[https://news.ycombinator.com/item?id=16182405](https://news.ycombinator.com/item?id=16182405)

[1] Also, if your login identifier and your public “display names” (usernames)
are the same thing, that is a disservice to your users’ security.
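The "two valid accounts with bad passwords in a few ACCEPTABLE_UNIT_OF_TIME" rule can be sketched in Python as a per-IP sliding window. `WINDOW_SECONDS`, `MAX_DISTINCT_ACCOUNTS`, and the class name are assumptions. The caller is assumed to record failures only for accounts that actually exist.

```python
import time
from collections import defaultdict, deque

# Stand-in for ACCEPTABLE_UNIT_OF_TIME; both values are assumptions.
WINDOW_SECONDS = 300
MAX_DISTINCT_ACCOUNTS = 2


class IpThrottle:
    def __init__(self, window=WINDOW_SECONDS, limit=MAX_DISTINCT_ACCOUNTS):
        self.window = window
        self.limit = limit
        # ip -> deque of (timestamp, account) for failed logins on valid accounts
        self.events = defaultdict(deque)

    def _prune(self, ip, now):
        """Drop events older than the window and return what remains."""
        q = self.events[ip]
        while q and now - q[0][0] > self.window:
            q.popleft()
        return q

    def record_failure(self, ip, account, now=None):
        now = time.monotonic() if now is None else now
        self._prune(ip, now).append((now, account))

    def requires_captcha(self, ip, now=None):
        """True once this IP has failed on enough distinct accounts."""
        now = time.monotonic() if now is None else now
        distinct = {account for _, account in self._prune(ip, now)}
        return len(distinct) >= self.limit
```

rphlx's objection still applies. A botnet that uses one IP per account never trips a per-IP rule like this, which is why it only makes sense alongside WAF rules and bot detection.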

------
eof
This is a tangent, but I had a 'pwned' password that I'd used for years on
Steam that started getting hacked like 4-5x a week; I would just ignore the
two-factor attempts for several months.

I finally changed the password to a slight variation that is not in this list
(nor likely any others, 9 random alphanumerics), and within a week the
two-factor notifications started back up!

I was really surprised; admittedly the modification was trivial, but that is
pretty thorough for a Steam account I've spent like $100 on.

~~~
stordoff
Did you have any items in the account (TF2, DOTA2 etc.)? Some of them sell for
silly prices (and it isn't always immediately obvious which), which can result
in your account being targeted.

------
jrochkind1
> However, I got a lot of feedback from V1 along the lines of "simply blocking
> 320M passwords is a usability nightmare". Blocking half a billion, even more
> so.

What, why?

~~~
azinman2
Because normal humans will only be able to generate so many passwords on their
own before they give up.

He also later gives an example of why frequency matters: abc123 is
mathematically just as bad as mno678, except one is far more likely than the
other.
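The range API returns a prevalence count with each suffix, so a site can weigh frequency instead of treating every listed password as equally bad. Here is a minimal Python sketch with a hypothetical `reject_at` threshold. The three-way policy is illustrative, not a recommendation.

```python
def signup_decision(count, reject_at=100):
    """Turn a breach prevalence count into a signup decision.

    reject_at is a hypothetical policy knob, not a recommended value.
    """
    if count >= reject_at:
        return "reject"  # common enough to sit near the top of guess lists
    if count > 0:
        return "warn"    # seen in a breach, but rarely
    return "accept"      # not in the corpus at all
```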

------
danbruc
Now it is too late, but Base64 could save about a third of the bandwidth;
maybe for V3.

EDIT: Not if compression is enabled as mentioned in the article, so forget
about that.
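The "about a third" figure can be checked in Python on a full 20-byte SHA-1 digest. Hex needs 40 characters, while unpadded Base64 needs 27, roughly 32% fewer. The API actually returns 35-character suffixes, so the real ratio differs slightly. As the edit notes, compression narrows the gap anyway.

```python
import base64
import hashlib

digest = hashlib.sha1(b"password").digest()            # 20 raw bytes
hex_len = len(digest.hex())                            # 2 chars per byte
b64_len = len(base64.b64encode(digest).rstrip(b"="))   # ~4 chars per 3 bytes
saving = 1 - b64_len / hex_len

print(hex_len, b64_len, f"{saving:.1%}")
```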

------
hosh
This dataset would be a fun project to implement on IPFS -- content-addressed,
distributed database of leaked passwords using the same hash range protocol.

------
dansingerman
This is so useful I've knocked up a quick gem to wrap the range service (i.e.
it only transmits the first 5 chars of the SHA1 hash)

[https://github.com/dansingerman/pwned_passwords_v2](https://github.com/dansingerman/pwned_passwords_v2)

The code is left deliberately simple so eyeballing lets you know it's not
doing anything hinky with the passwords.

------
dom96
In case anyone here is interested, I just did a livestream where I wrote a
simple app in Nim to query this API.

On YouTube:
[https://www.youtube.com/watch?v=Di2O_lIPxb4](https://www.youtube.com/watch?v=Di2O_lIPxb4)

Source code: [https://github.com/dom96/pwned](https://github.com/dom96/pwned)

------
woolvalley
I know you can't search username+.*@gmail.com addresses on Have I Been Pwned,
but it would be pretty useful if we could.

------
brwsr
It would be great if the many password managers out there, like KeePass for
example, used this data to filter out any password that exists in the list. I
know a match would be very rare, but still, why not filter them out?

~~~
grinsekatze
What would be the point of filtering out passwords that are in the list when
using KeePassX or other password managers? Isn't the point of a password
manager that you don't have to choose or come up with passwords yourself?
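grinsekatze's point can be quantified in Python as a rough upper bound. The sketch assumes the generator draws 16 characters uniformly from a 62-symbol alphabet; the length and alphabet are assumptions about its settings. Such a password has about 95 bits of entropy. Even if all of the roughly half a billion listed passwords fell inside that space, the chance of generating one of them is on the order of 1e-20.

```python
import math

ALPHABET = 62            # a-z, A-Z, 0-9 (assumed generator settings)
LENGTH = 16              # assumed generated password length
LIST_SIZE = 500_000_000  # "half a billion", rounded

bits = LENGTH * math.log2(ALPHABET)
# Upper bound: pretend every listed password is a possible generator output.
p_collision = LIST_SIZE / ALPHABET ** LENGTH

print(f"{bits:.1f} bits, collision probability <= {p_collision:.1e}")
```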

------
StapleHorse

      troyhunt - 9 times
    

Most secure name for a password possible. :D

------
default-kramer
I thought it was funny that toepoke thought "People won't know what 'pwned'
means." So instead they inform the user that they have a "Pawned password."

------
tazard
I feel like this could come in handy while looking for a date at
[https://wordsofheart.com/](https://wordsofheart.com/)

------
eni
How trustworthy is haveibeenpwned.com? Is there a chance the passwords people
enter there for checking will end up in the databases?

~~~
maaark
There is almost zero chance Troy Hunt would torpedo his career doing
something as monumentally stupid as that.

It's possible, sure. But I'd trust him with my password sooner than I'd trust
[INSERT SV COMPANY HERE].

------
iask
Wouldn’t reversing each password (reading right to left) produce a new list of
“Half a Billion” passwords to use?

------
caf
Might be fun to create a PAM module that warns you when you login if your
password is in the list.

------
u801e
What would be nice is if we could improve the process of generating per-device
client-side certificates that can be associated with a user account. Then we
could just use certificate-based authentication (and add password-based
authentication if we want a second authentication factor).

------
quickthrower2
Oh the shame! An important password of mine is pwned. Just had to change it.

------
6t6t6t6
"Write your password in this input field to see if it has been pwned"

Me: ¬_¬

------
pcunite
Is there a large file dump of plaintext passwords out there?

------
tanu057
We have more love than hate. Good to see that in secret passwords.

    
    
       iloveyou - 1,462,146
       iloveu - 179,992
       ihateyou - 58,656

------
NoGravitas
swordfish: 74,878 times

I guess the password _is_ always "swordfish".

------
capex
username 8340

------
roymurdock
Bit off topic, but I was searching for a better way to manage passwords a few
weeks ago (rather than have 1 or 2 master passwords across all websites).

I found KeePass through an old Ask HN thread. It's a great little free, open
source key/password storage app that works across all my devices (iOS, macOS,
Windows). [https://keepass.info/](https://keepass.info/)

I'd be interested to hear any suggestions for similar apps I could recommend
to my parents, who expressed concerns about their online passwords. KeePass
would be the ideal solution, but I don't think it would hold their hand
through download, setup, and password generation enough to be 100% ideal. Any
suggestions?

~~~
sachleen
Long-time LastPass user, recently switched to Bitwarden. I find its UI to be
cleaner/easier to use.

~~~
ibdf
My biggest frustration with lastpass is that it doesn't seem to know the
difference between subdomains, so it suggests several passwords for the same
domain even though I am on different subdomains.

~~~
y4mi
There was an option somewhere with which you could disable that one url at a
time. I never bothered with it though and recently switched to keepass, so
can't verify anymore.

------
yarwelp_
On the topic of passwords, have a look at my command-line passphrase
generation program.

GitHub: [https://github.com/ctsrc/pgen](https://github.com/ctsrc/pgen)

It's written in Rust. Install the Rust toolchain installer from
[https://rustup.rs/](https://rustup.rs/)

    
    
        curl https://sh.rustup.rs -sSf | sh
    

And remember to add ~/.cargo/bin to your PATH.

Then install my command-line utility

    
    
        cargo install pgen
    

Usage is described in detail in the README on GitHub. Additionally you can ask
the program itself for a brief help summary.

    
    
        pgen --help
    

Eventually pgen will be available in some package manager repos so that you
can use your preferred package manager to install pgen but until then it must
be built from source following the steps above.
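pgen itself is written in Rust. This Python sketch illustrates the underlying idea only and is not pgen's actual code. It picks words with the `secrets` CSPRNG rather than `random`. The eight-word list is a hypothetical stand-in. Real generators use large curated lists such as the EFF's 7,776-word list, and the list size is what makes the entropy meaningful.

```python
import math
import secrets

# Hypothetical stand-in; a real generator would load a large curated list.
WORDS = ["apple", "banjo", "cactus", "dynamo", "ember", "falcon", "glacier", "harbor"]


def passphrase(n_words=6, words=WORDS, sep=" "):
    """Pick words uniformly at random using a CSPRNG (secrets, not random)."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


def entropy_bits(n_words, list_size):
    """Entropy of n words drawn independently and uniformly from the list."""
    return n_words * math.log2(list_size)
```

`entropy_bits(6, 7776)` comes to about 77.5 bits. Six words from the toy list give only 18 bits.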

~~~
dmitrygr
A personal question. Do people really install megabytes of dependencies to run
what would be a one line shell script, were it written in shell?

~~~
funkymike
A bigger issue is blindly executing "curl ... | sh -" for something you are
going to use to generate passwords (though it's bad in general).

~~~
yarwelp_
That is the official way that you install the Rust toolchain.
[https://www.rust-lang.org/en-US/install.html](https://www.rust-lang.org/en-US/install.html)

Rust is still undergoing changes frequently enough that most package manager
repos have a very old version of the Rust toolchain in terms of what it is
capable of doing.

For example the version of rustc that you get from Ubuntu default repositories
was too old to compile my pgen when I checked some weeks ago.

And exactly because the tool is for generating passwords I don't want to
distribute pre-compiled binaries of my tool myself, and therefore until I get
pgen itself into package manager repos I tell people to download the Rust
toolchain and to build my tool from source themselves as I did above.

