
Pwned Passwords in Practice: Real World Examples of Blocking the Worst Passwords - robin_reala
https://www.troyhunt.com/pwned-passwords-in-practice-real-world-examples-of-blocking-the-worst-passwords/
======
programbreeding
I'm a huge fan of Troy Hunt and HIBP, but reading this I assumed it was
basically an advertisement to get people to sign up for the Pwned Passwords
API -- I admit I didn't make it down to the "And Finally..." section where he
explains that it's free because there were a bunch of pictures at the bottom
and I stopped reading.

But reading through the API docs [0], they show that the API has no rate
limit. Impressive.

[0]
[https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByP...](https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByPassword)

~~~
pit2
Even if he's not charging for the service, I am pretty sure he's getting more
consultancy work from these "PR" (notice the quotes) posts. Even the timing on
which he releases them makes sense in a way to prevent people from getting too
tired of this service.

Not saying it's a bad thing, but don't assume something is done out of pure
altruism, because not many things are.

~~~
Arnt
Have you ever been paid to do work that you really want to happen? Been paid
to improve the world in a small, but significant way? It's a lovely feeling.

~~~
PietdeVries
But what is the value-add of this service?

Complex passwords are quite useful if the server gets hacked and someone walks
away with the (salted) password hashes. Against brute-forcing passwords at the
login screen of an application they don't add much value, other than making it
quite hard for the user to remember what the password for this particular site
could be...

Theoretically, if you block a user ID after, say, 5 or so invalid logins, even
a bad password from the Have I Been Pwned list will keep you from being
hacked. The chance that an attacker picks exactly that password from the
1-million-or-so list within those 5 tries is quite minimal.

So with that in mind, wouldn't this service be something for website owners
that don't know how to properly secure the information they control?

~~~
snowwolf
Because chances are, if you are using a password that is in the list, it's
because either it's an exceedingly common password (and you really shouldn't
be using it) or you've used it before multiple times and are probably the
reason it is in the list (because it was breached on another site).

From experience, most attacks we see now are credential stuffing attacks
(using something like Sentry MBA) rather than pure brute force attacks, and
they come from a huge number of IP addresses (the last attack we saw was using
over 6 million IP addresses). So throttling sign-in attempts at the IP level is
almost useless, as is throttling at the email level, because the attacker can
attempt at least 6 million known email/password combinations to see if those
accounts exist on your site.

The only real defence against that is all your users using 2-factor, or
creating a pseudo second factor (email them if the attempt is from an
unrecognised IP).

Edit: Of course the other helpful defence is to ensure your users aren't
reusing passwords, which is where Pwned Passwords comes in.

~~~
jsmeaton
I can attest to this. Credential stuffing was the number 1 reason we decided
to add the pwnedpassword validation to our signup flows. We were seeing
thousands of IP addresses and hundreds of thousands of requests over a few
days. Rate limiting slows it down but doesn’t help all that much. Rate
limiting on a specific username will prevent brute forcing but exposes you to
DOS. Rate limiting by IP becomes less effective when thousands are involved
and most requests end up succeeding.

Disclaimer: work for Kogan who is mentioned in TFA.

~~~
snowwrestler
> Rate limiting by IP becomes less effective when thousands are involved and
> most requests end up succeeding.

What do you mean by "end up succeeding"? Most requests successfully
authenticated? On the first try? Second try? Tenth try? Hundredth try?

(I'm not trying to doubt the utility of pwnedpassword validation; just hoping
you can help me understand the threat you're facing and why IP rate limiting
didn't help much. Thanks.)

~~~
snowwolf
Perhaps an example will help.

Let's say you have IP throttling/rate limiting, and you have it set to an
extremely conservative limit - 1 sign-in attempt every hour. This is great for
the brute force threat - only 24 passwords a day can be attempted by 1 IP,
which is infeasible for any brute forcing.

But now let's say the attacker has access to a botnet with 6 million unique IP
addresses (not theoretical - see my comment above).

Now for each of those 6 million IPs they can try 24 passwords a day - i.e. 144
million attempts a day without ever triggering the throttle.
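
The arithmetic in this example can be written out as a few lines of Python.
This is only a sketch of the numbers quoted in the comment (1 attempt per IP
per hour, 6 million IPs); the function name is made up for illustration and
nothing here is measured data.

```python
# Figures taken from the comment above; nothing here is measured.
ATTEMPTS_PER_IP_PER_HOUR = 1
HOURS_PER_DAY = 24
BOTNET_IPS = 6_000_000


def daily_attempts(ips: int, per_ip_per_hour: int = ATTEMPTS_PER_IP_PER_HOUR) -> int:
    """Total sign-in attempts per day that stay under a per-IP throttle."""
    return ips * per_ip_per_hour * HOURS_PER_DAY


if __name__ == "__main__":
    print(daily_attempts(1))           # a single IP
    print(daily_attempts(BOTNET_IPS))  # the whole botnet
```

The per-IP limit caps each address, not the total, so the total scales
linearly with the number of addresses the attacker controls.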

Bear in mind also that they aren't just trying random passwords for an account
- they have a compiled/combined breach list of known account/password
combinations from other breaches. So they can attempt 144 million known
combinations a day without hitting any throttles (this is what the parent
above means by "end up succeeding").

What percentage of your users reuse passwords and have been exposed to at
least one breach? I would suggest it's quite a high value. How long do you
think it will take a credential stuffing attack to identify those accounts on
your site when they can try 100's of millions of combinations a day?

This is the threat vector.

~~~
jessaustin
ISTM the next step would be to rate limit for a given account without regard
to IP. Sure that's a potential DOS, but we can wait until that's actually a
problem before worrying about it.

~~~
snowwolf
They aren’t trying the same account multiple times. Well, they may be
coincidentally (if the user uses a unique password per site and has been
breached from multiple sites so appears in the breach list multiple times with
different passwords) but not that frequently. What they are looking for is the
intersection of users who reuse passwords and have been exposed by a breach
and those users who have created an account on your site reusing the same
password. Which is perhaps not surprisingly a relatively large percentage of
your users.

------
sethgecko
I wrote this small Python function to check if a password is part of a breach
by transmitting only the first 5 hex characters of its hash:
[https://gist.github.com/mcdallas/d94ecd8b34a6bf57a162a7af0ce...](https://gist.github.com/mcdallas/d94ecd8b34a6bf57a162a7af0ce2a664)
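
For readers who don't want to follow the link, here is a minimal sketch of the
same idea. It is not the code from the gist. It assumes the range endpoint is
`https://api.pwnedpasswords.com/range/<prefix>` and that the response is a
list of `SUFFIX:COUNT` lines; the function names and the User-Agent string are
made up for illustration.

```python
import hashlib
import urllib.request
from typing import Tuple

# Assumption: the Pwned Passwords range endpoint and its SUFFIX:COUNT format.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest: 5 + 35 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Look for our suffix among 'SUFFIX:COUNT' lines; 0 means not listed."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-character prefix leaves the machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_response(suffix, response.read().decode("utf-8"))
```

Calling `pwned_count("some password")` returns how many times that password
appears in the corpus, or 0 if the suffix is not in the returned range. The
comparison against the returned suffixes happens locally.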

~~~
programbreeding
FYI: "Due to the massive popularity of the range search over searching by
complete password hash, the significantly improved performance and the
enhanced privacy controls, searching by hash will be discontinued on 1 June
2018."

Shown in API docs under "Pwned Passwords overview" and links to here:
[https://www.troyhunt.com/enhancing-pwned-passwords-privacy-b...](https://www.troyhunt.com/enhancing-pwned-passwords-privacy-by-exclusively-supporting-anonymity/)

~~~
sethgecko
I think what he means is that searching by full hash will be removed in favour
of using the /range endpoint (the one I am using)

------
minus7
Would be nice if they made a bloom filter for anyone to use. On second
thought, you can do that yourself based on the SHA-1 hashes of passwords they
offer for download.

Edit: On third thought, a bloom filter for 502M entries and a false positive
rate of 0.1% ends up as an 800MiB large filter. Binary-searching the whole dump
is surely faster.
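
A rough sketch of what binary-searching the dump could look like. It assumes
the downloaded hashes have first been converted into a file of raw 20-byte
SHA-1 digests in sorted order, with no counts and no separators; that
conversion step and the function names are assumptions made for this example,
not something the download provides in that form.

```python
import hashlib
import io
from typing import BinaryIO

RECORD_SIZE = 20  # bytes in a raw SHA-1 digest


def contains(sorted_hashes: BinaryIO, digest: bytes) -> bool:
    """Binary search a seekable stream of sorted, fixed-width 20-byte digests."""
    sorted_hashes.seek(0, io.SEEK_END)
    low, high = 0, sorted_hashes.tell() // RECORD_SIZE
    while low < high:
        mid = (low + high) // 2
        sorted_hashes.seek(mid * RECORD_SIZE)
        record = sorted_hashes.read(RECORD_SIZE)
        if record == digest:
            return True
        if record < digest:
            low = mid + 1   # target sorts after this record
        else:
            high = mid      # target sorts before this record
    return False


def sha1_bytes(password: str) -> bytes:
    """Raw SHA-1 digest of a password, matching the record format above."""
    return hashlib.sha1(password.encode("utf-8")).digest()
```

With a real file this would be used as
`contains(open(path, "rb"), sha1_bytes(candidate))`, where `path` points at
the converted file. Each loop iteration is one seek plus one 20-byte read.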

~~~
Freaky
> a bloom filter for 502M entries and a false positive rate of 0.1% ends up as
> an 800MiB large filter

With that sort of FP rate it's not really much use beyond filtering API calls.
I'd suggest 2GB[1] as a more sensible minimum. A compressed filter can get
this down somewhat.

> Binary-searching the whole dump is surely faster.

Not really. log2(500M) is ~29, k for a suitably sized bloom filter's only 23.
Interpolation search can get you a result in more like 10 seeks, but a
bucketed bloom filter can get your lookup down to a single read.
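
The sizes and probe counts being compared here follow from the standard Bloom
filter formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash
functions. A small sketch for checking the numbers; the function names are
made up for illustration.

```python
import math
from typing import Tuple


def bloom_parameters(n: int, p: float) -> Tuple[int, int]:
    """Optimal bit count m and hash count k for n entries at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = round(m / n * math.log(2))
    return m, k


def binary_search_probes(n: int) -> int:
    """Worst-case number of probes for binary search over n sorted entries."""
    return math.ceil(math.log2(n))


if __name__ == "__main__":
    bits, hashes = bloom_parameters(500_000_000, 1e-7)
    print(bits / 8, hashes, binary_search_probes(500_000_000))
```

For n = 500M and p = 1e-7 this gives k = 23 and roughly 2.1e9 bytes, and
binary search needs 29 probes, in line with the figures quoted above.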

Having spent a fair bit of time faffing about with this stuff I ended up
settling[2][3] on Golomb compressed sets[4], which can get the full list with
a 1-in-10 million FP rate into 1.5GB.

[1]:
[https://hur.st/bloomfilter/?n=500M&p=1.0E-7](https://hur.st/bloomfilter/?n=500M&p=1.0E-7)
[2]: [https://github.com/Freaky/gcstool](https://github.com/Freaky/gcstool)
[3]: [https://github.com/Freaky/ruby-gcs](https://github.com/Freaky/ruby-gcs)
[4]: [http://giovanni.bajo.it/post/47119962313/golomb-coded-sets-s...](http://giovanni.bajo.it/post/47119962313/golomb-coded-sets-smaller-than-bloom-filters)

~~~
minus7
Oh right, I was wrongly thinking you'd have to memcmp the whole size of the
filter. It's simply too warm today for thinking. Did you look into Cuckoo
filters as well?

~~~
Freaky
> Oh right, I was wrongly thinking you'd have to memcmp the whole size of the
> filter.

Yeah, it's just k single-bit lookups - ideally you do something to get them
into clusters, like dividing the database into sub-filters, so you're doing
random lookups into, say, a 32KB chunk instead of a whole 2GB filter.
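
A toy sketch of that sub-filter idea, sometimes called a blocked Bloom filter.
It is not how any of the tools linked in this thread are implemented: the
class name, the use of SHA-256 to derive positions and the double-hashing
scheme are all choices made for this example. The point is only that every
bit touched by one lookup lives inside a single 32KB block.

```python
import hashlib
from typing import List

BLOCK_BYTES = 32 * 1024        # 32KB sub-filter, as in the comment above
BLOCK_BITS = BLOCK_BYTES * 8


class BlockedBloomFilter:
    def __init__(self, num_blocks: int, k: int) -> None:
        self.num_blocks = num_blocks
        self.k = k
        self.bits = bytearray(num_blocks * BLOCK_BYTES)

    def _positions(self, item: bytes) -> List[int]:
        """Pick one block, then k bit positions that all fall inside it."""
        digest = hashlib.sha256(item).digest()
        block = int.from_bytes(digest[:8], "big") % self.num_blocks
        # Double hashing: offset_i = h1 + i * h2, with h2 forced to be odd.
        h1 = int.from_bytes(digest[8:16], "big")
        h2 = int.from_bytes(digest[16:24], "big") | 1
        base = block * BLOCK_BITS
        return [base + (h1 + i * h2) % BLOCK_BITS for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

On disk, the same layout means a lookup reads one 32KB chunk and does all k
bit tests in memory, instead of k seeks scattered across the whole filter.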

> Did you look into Cuckoo filters as well?

Cuckoo filters look like an interesting alternative and looking at them more
closely is on the to-do. I don't think they'd have any significant space
savings, though - they're similarly about 75% the size of the equivalent bloom
filter. Maybe they'd be faster for lookups?

I'd also be interested in playing with matrix filters[1], which supposedly get
close to the theoretical limits for these sorts of structures. Implementing
them seems rather more involved, sadly - particularly given the only reference
I can find is a fairly inscrutable CS paper. Show us the code damnit.

[1]: [https://arxiv.org/abs/0804.1845](https://arxiv.org/abs/0804.1845)

------
mynameismonkey
I'm trying to figure out if I need to do this if I'm requiring minimum sixteen
characters... after searching the docs and several of the blog posts, I can't
ascertain if the corpus contains any passwords/phrases >= 16 characters. I
don't want to be running this check on every passphrase create/modify if the
corpus contains none or very very few passes length 16 or greater. Does anyone
have any insight into the contents? Or, a way to query only the portion of the
corpus that has 16 or greater?

~~~
michaelbanfield
I just tried 'passwordpassword' and it was there. You can download the whole
dataset at the bottom of this page

[https://haveibeenpwned.com/Passwords](https://haveibeenpwned.com/Passwords)

It's a losing battle trying to add byzantine rules to prevent users doing
things like using their normal password * 2, so it's probably a reasonable
check to add.

~~~
mynameismonkey
Aha, I tried a few obvious 16s but not that one... thank you! I guess we'll
add it in. The last few dumps (other sources) I reviewed contained nothing
over 15 characters, but I'm imagining they will start creeping in as more
folks demand longer phrases.

Still, I'd bet the vast majority of the bad passes are <16. It seems a heck of
a waste of energy and bandwidth to check my users' passphrases against
(guesstimating) 0.05% of the corpus.

------
perl4ever
Soliciting people to enter their passwords seems like a bad approach both
because it's risky and because it helps train people to do risky things -
better would be a service that attempts to crack passwords and if it succeeds,
disables the account until a new password is set. Seems like a service that
everybody should use, if it existed. Much better than arbitrary rules about
"good" password character sets.

~~~
goodpass
There’s no one specific way to crack a password. It all depends on the
implementation. The most basic case is just storing the password in plaintext,
and plenty of companies are more than happy to do that. Passwords don’t exist
as some separate entity, they’re attached to a system. So cracking an Adobe
password might be easy, but cracking a Dropbox password incredibly hard.

~~~
perl4ever
I suppose. I was thinking of just hashing all known passwords and plausible
passwords and locking any accounts that matched.

------
mkirklions
The only thing I had a hard time understanding-

OP really cares about his privacy/security/password, but then he uses a
secondary system to store it?

Is this the best way to do this? Break into the secondary system and
everything is available. Keyloggers, eyes, stolen computers, all have the
possibility of everything available.

I considered other solutions, like written down and put in a lock box in a
bank, but that's really inaccessible.

~~~
rjacksonm1
It is the best way to do it for the average user.

Modern password management services are incredibly secure, with client-side
encryption of your secrets, among other protective mechanisms (to guard
against keyloggers, stolen computers, ...)

There is still a risk, because you're trusting third-party software (which in
some cases is closed-source – including 1Password), but for most people that
is a much lower risk profile than if they were managing passwords themselves.

For specifics on 1password's security, check out
[https://1password.com/security/](https://1password.com/security/)

~~~
willvarfar
> Modern password management services are incredibly secure

There are lots of password managers that have had gaping vulnerabilities. From
memory I'm sure I've seen LastPass vulnerabilities top HN just a while ago...
yeah probably this one:
[https://www.bankinfosecurity.com/lastpass-patches-password-m...](https://www.bankinfosecurity.com/lastpass-patches-password-manager-vulnerability-a-9299)

------
qrbLPHiKpiux
Great in theory, but I don’t think it will play out well because of the
average computer user. Ask any help desk employee.

~~~
teknopaul
Word. Password security should be proportional to chance of a dictionary
attack * risk of a successful attack. Mozilla wants a 12-character password to
file a bug report, when they really should publish an email address or an
anonymous HTML form and say please.

~~~
kikoreis
I know what you are feeling, but there's a balance to making bug filing easy.
For a large visible project, make it easy and the statistical noise drowns
everything out.

------
Animats
_they gave people the ability to check any individual password against the
online Pwned Passwords service_

What could possibly go wrong?

This works by locally hashing your password, then sending only the first 5 hex
characters of the hash to the server. The server sends back all matching
hashes of bad passwords it has on file. Typically this returns a few hundred
hits. The local client (probably some piece of Javascript) checks the full
local hash against the hashes returned. If there's a match, the password is in
the database of bad passwords. This supposedly protects the password if
communications with the checker are intercepted.

But does it? If an attacker can see those first 5 hex characters, they too can
get the list of hashes of matching passwords. There are only a few hundred of
them, and they're hashes, not the actual passwords. One of those is the hash
of the user's password. Now they know what hashes to try.

An attacker presumably has a big database of likely passwords to try. So they
can create a database of hashes locally. The hashing algorithm is known to the
client, after all. So now they have a few hundred passwords to try for a
break-in. Try those over the next few days, and they're in.

Is it really that bad, or am I misunderstanding something here?

~~~
jxcl
The point of a secure password is that it _won't_ be in the list of hashes by
the server. The service should prevent the user from using any of the
passwords that are matched by the password database.

------
willvarfar
The Okta chrome extension mentioned in the article seems very cool and useful.

But how do we trust the Okta chrome extension not to post all credentials to
the developer? Even if it's doing nothing shady now, what about some future
update?

~~~
amdavidson
With that security model, how do you trust any extension at all?

At least with this one, you can audit the source code and build it for
yourself[0]

0: [https://github.com/OktaSecurityLabs/passprotect-chrome](https://github.com/OktaSecurityLabs/passprotect-chrome)

~~~
willvarfar
I once made a webpage where HNers drag-dropped their entire iphone backups. It
got me thinking.
[http://markolson.github.io/js-sqlite-map-thing/](http://markolson.github.io/js-sqlite-map-thing/)

~~~
ReverseCold
Whenever there are tools like this I usually just visit the site then unplug
my ethernet/disconnect from WiFi. If it's actually client side JS it will
work.

I'm pretty sure this is safe, but if there's a way to defer sending an HTTP
request until after the page is closed...

~~~
phyzome
If you do it from a private browsing window, maybe that would be fine. Maybe.
(Otherwise the page could exfiltrate information via localstorage or
something.)

------
ransom1538
Stupid question here. But shouldn't websites/apps simply lock the account for
5 minutes after 3 bad password attempts? That would basically end the
usefulness of bad password lists, etc. Do sites actually let you run limitless
password attempts, which is what would make these password lists matter? If
your password was "foofoo", and there was a password lock after 3 attempts, I
don't see how you would crack it with a password file (in this lifetime).

~~~
elorant
I tried that once on a site and it was a maintenance nightmare. Users kept
bombarding me with emails asking to unlock their accounts. Turns out that 99
times out of 100 when a username/password combo has been mistyped, it's from
users jerking around or not remembering the password, rather than from hackers
trying to brute force their entrance.

Now imagine having a site with a few million accounts and 0.1% of them
mistyping the password every now and then.

~~~
scient
Sounds like a case of not handling the UX of the feature properly. What you
can do is either allow them to unlock the account via email, or have a time-
based unlock, or both. We do 72 hours or unlock it via email.
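
A minimal in-memory sketch of that combination: a lock that expires on its own
after 72 hours, plus an explicit unlock that an emailed link could trigger.
The failure threshold, the class name and the method names are hypothetical,
and a real system would persist this state rather than keep it in
dictionaries.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_FAILURES = 5               # hypothetical threshold
LOCK_SECONDS = 72 * 60 * 60    # the 72-hour window mentioned above


@dataclass
class LockoutTracker:
    clock: Callable[[], float] = time.time
    failures: Dict[str, int] = field(default_factory=dict)
    locked_until: Dict[str, float] = field(default_factory=dict)

    def is_locked(self, account: str) -> bool:
        until = self.locked_until.get(account)
        if until is not None and self.clock() >= until:
            self.unlock(account)   # time-based unlock
            return False
        return until is not None

    def record_failure(self, account: str) -> None:
        self.failures[account] = self.failures.get(account, 0) + 1
        if self.failures[account] >= MAX_FAILURES:
            self.locked_until[account] = self.clock() + LOCK_SECONDS

    def unlock(self, account: str) -> None:
        """Called when the timer expires or the user follows an emailed link."""
        self.failures.pop(account, None)
        self.locked_until.pop(account, None)
```

The injectable `clock` is only there so the expiry can be exercised without
waiting; in normal use the default `time.time` is enough.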

------
paulpauper
Reminder: never enter your email on one of those 'have I been pwned'
databases. Find a way to download the database and then check it offline.

~~~
forgotmypw
Not sure why this was voted down. By entering your email, you are providing
lots of valuable data to a potential attacker: that you are active, your IP
address and environment, all of which can help unlock your account on whatever
service was compromised.

~~~
moviuro
Who could possibly get this info on
[https://haveibeenpwned.com](https://haveibeenpwned.com) ? Troy? I bet he
doesn't even keep logs. Anyone listening to your HTTPS connections? Then you
have some more serious issues.

No seriously, Troy's site is awesome, and spreading FUD is not doing them any
service.

~~~
jsmeaton
It’s good advice _in general_ for most folks. For those that know of Troy and
this particular site it’s fine, but the general recommendation still stands.

