
Password checkup: from 0 to 650k users in 20 days - ebursztein
https://elie.net/blog/security/password-checkup-from-0-to-650-000-users-in-20-days/
======
moviuro
Troy Hunt already nailed the perfect service and API. Google's solution here
is not documented, and is clearly not as usable as Troy's [0] (anyone with
openssl(1) + curl(1) can check whether they've been pwned right from the CLI [1]).

[0]
[https://haveibeenpwned.com/API/v2#PwnedPasswords](https://haveibeenpwned.com/API/v2#PwnedPasswords)

[1] [https://gitlab.com/moviuro/pass-hibp/blob/master/hibp.bash](https://gitlab.com/moviuro/pass-hibp/blob/master/hibp.bash)
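For reference, the k-anonymity flow behind Troy's range API can be sketched in Python: hash the password with SHA-1, send only the first five hex characters to `https://api.pwnedpasswords.com/range/<prefix>`, and look for the remaining 35 characters in the `SUFFIX:COUNT` lines that come back. This is a minimal sketch, not moviuro's script; the helper names are hypothetical and the `User-Agent` value is an arbitrary placeholder.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"

def hibp_split(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]  # only the 5-char prefix leaves the machine

def count_in_range(suffix, body):
    """Scan a range response ('SUFFIX:COUNT' per line) for our suffix."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix absent: password not in the corpus

def pwned_count(password):
    """Network call; the server only ever sees the 5-char prefix."""
    prefix, suffix = hibp_split(password)
    req = urllib.request.Request(RANGE_URL + prefix,
                                 headers={"User-Agent": "hibp-range-sketch"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
    return count_in_range(suffix, body)
```

`pwned_count("password")` should come back nonzero; the matching of the full hash happens locally, which is the whole point of the prefix trick.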

------
kerng
Why not team up with haveibeenpwned, which has been around for years? Seems
like this is doing the same thing.

------
bradknowles
How does this compare to
[https://haveibeenpwned.com/Passwords](https://haveibeenpwned.com/Passwords) ?

~~~
vichu
It does seem very similar to the implementation of HIBP's Pwned Passwords by
Junade Ali [0]. I haven't delved into the nitty-gritty details or differences
between the two, but they do seem to use similar techniques to guarantee
k-anonymity.

A key difference, at a glance, is that usernames are paired with the leaked
passwords.

[0] Junade Ali's write-up [https://blog.cloudflare.com/validating-leaked-passwords-with...](https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/)

------
codeddesign
Wow... that was the longest article I have ever seen that has absolutely
nothing to do with the title.

------
zaroth
First they create a lookup table of encrypted (blinded) hashes of each
(username, password) pair they've found on the darknet, and index this table
by the first two bytes of the unencrypted hash of the username and password.

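    // H = Argon2(username + password); b = Google's secret key
    // H[0:1] = first two bytes of H; each bucket holds many blinded hashes
    Lookup[H[0:1]].add(H^b);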
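A toy Python version of that table build, under loud assumptions: PBKDF2 from `hashlib` stands in for Argon2, exponentiation modulo the Mersenne prime 2^127 - 1 stands in for the elliptic-curve blinding, and the salt and helper names are made up. The comment's `H[0:1]` (bytes 0 through 1) becomes `h[:2]` in Python slice notation. It shows the structure only and is not secure.

```python
import hashlib
import math
import secrets
from collections import defaultdict

# Assumptions, not Google's parameters: Z_P^* replaces the elliptic curve,
# PBKDF2 replaces Argon2.
P = 2**127 - 1                     # Mersenne prime modulus
SALT = b"hypothetical-fixed-salt"  # fixed so identical pairs hash identically

def slow_hash(username, password):
    """Deterministic slow hash of the (username, password) pair."""
    # NUL separator avoids ambiguity between ("ab", "c") and ("a", "bc")
    data = username.encode() + b"\x00" + password.encode()
    return hashlib.pbkdf2_hmac("sha256", data, SALT, 1_000)

def to_group(h):
    """Map a digest to a nonzero element of Z_P^*."""
    return int.from_bytes(h, "big") % (P - 1) + 1

def random_key():
    """Secret exponent invertible mod P - 1, so blinding can be undone."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def build_lookup(leaked, b):
    """Bucket blinded hashes H^b by the first two bytes of H."""
    lookup = defaultdict(set)
    for username, password in leaked:
        h = slow_hash(username, password)
        lookup[h[:2]].add(pow(to_group(h), b, P))
    return lookup
```

Using a set per bucket (rather than plain assignment) matters: with a 16-bit prefix, many leaked pairs share each bucket.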

The hash function is Argon2 with a time cost of 3 and a memory cost of 256MB.
They claim a 100M-record database took 1,200 compute days to process, and the
actual database is 4 billion records, so presumably they spent about 48,000
CPU-days initializing it (they don't state that explicitly). One run of
Argon2(3, 256MB) takes about 1 second [1].

H[0:1] is a 16-bit prefix of the (username, password) hash, and it is what's
used to query the dataset. Each query therefore selects roughly 1 in 2^16
records, so against a 4-billion-record set it should return about 60,000
results. They state the query actually returns ~1MB of data per lookup.

[Side note: I like that they chose to use just a 16-bit prefix, versus HIBP's
20-bit prefix, which to me is a little too selective if someone is
pre-screening candidates for an online attack.]
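The bucket-size arithmetic behind the ~60,000 figure, and, purely for comparison, what a 20-bit prefix would give on the same 4-billion-record set (HIBP's real corpus is a different size, so the second number is hypothetical):

$$\frac{4\times10^{9}}{2^{16}} = \frac{4\times10^{9}}{65{,}536} \approx 61{,}000 \qquad \frac{4\times10^{9}}{2^{20}} = \frac{4\times10^{9}}{1{,}048{,}576} \approx 3{,}800$$

Fewer candidates per bucket means a response narrows down the possible matches more, which is the selectivity concern above.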

Here's where it gets a little neat.

You send the first two bytes of the hash of your username and password, along
with an encryption of your full hash, call it 'H^a'. The server returns all
the H^b for your prefix, along with your (H^a)^b, which is H^ab.

When you get back your H^ab along with all the cracked H^b, you can unblind
H^ab back to H^b using your 'a' (which is random and ephemeral), because the
EC encryption is commutative. So cool. Now you have your hash (which Google may
never have seen before!) in the form of H^b, and Google never saw the
plaintext.

Essentially Google remotely encrypted your plaintext with their key, and you
never saw their key, and they never saw your plaintext. But this allows you to
now check if your hash (in the form of H^b) is in the set of ~60,000 H^b that
Google returned. If it is, then they have your username and password in their
dataset.
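The whole round trip can be sketched the same way, again swapping modular exponentiation for the EC operations and PBKDF2 for Argon2 (both assumptions; all names are hypothetical). Because `(x^a)^b = (x^b)^a` and `a` is invertible mod `P - 1`, the client can strip its own key from `H^ab` and compare the resulting `H^b` against the bucket without ever learning `b`, while the server sees only the prefix and `H^a`.

```python
import hashlib
import math
import secrets

# Toy commutative blinding (exponentiation mod a prime) standing in for
# the elliptic-curve version; NOT secure as written.
P = 2**127 - 1                     # Mersenne prime modulus
SALT = b"hypothetical-fixed-salt"

def slow_hash(username, password):
    data = username.encode() + b"\x00" + password.encode()
    return hashlib.pbkdf2_hmac("sha256", data, SALT, 1_000)

def to_group(h):
    return int.from_bytes(h, "big") % (P - 1) + 1  # nonzero element of Z_P^*

def random_key():
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:  # invertible exponent
            return k

class Server:
    def __init__(self, leaked):
        self.b = random_key()  # secret key, never leaves the server
        self.buckets = {}
        for user, pw in leaked:
            h = slow_hash(user, pw)
            self.buckets.setdefault(h[:2], set()).add(pow(to_group(h), self.b, P))

    def query(self, prefix, blinded):
        # Every H^b in the bucket, plus the client's (H^a)^b = H^ab
        return self.buckets.get(prefix, set()), pow(blinded, self.b, P)

def check(server, username, password):
    h = slow_hash(username, password)
    a = random_key()  # ephemeral client key
    bucket, h_ab = server.query(h[:2], pow(to_group(h), a, P))
    h_b = pow(h_ab, pow(a, -1, P - 1), P)  # (H^ab)^(1/a) = H^b
    return h_b in bucket
```

With a real curve, `a` would be inverted modulo the group order rather than `P - 1`; the toy group here leaks information and should not be used as-is.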

This is different from HIBP because Google is taking the risk of holding the
actual (username, password) tuples in the form of... essentially... a keyed
hash. Troy was explicitly not willing to take that risk.

This also lets anyone in the world essentially perform an online attack
against Google's 4-billion-record database, as fast as they can run Argon2 on
whatever candidate {username, password} values they might want to test, plus
or minus any additional rate limiting. The blog post says they rely on Argon2
for the rate limiting.
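Rough arithmetic on how much that throttles an attacker, taking the ~1 second and 256MB per Argon2 call quoted above at face value and assuming a hypothetical machine with 16GB of RAM:

$$\frac{86{,}400\ \text{s/day}}{1\ \text{s/hash}} \approx 86{,}400\ \text{guesses per core per day} \qquad \frac{16\ \text{GB}}{256\ \text{MB}} = 64\ \text{concurrent hashes}$$

So the memory cost, not just the time cost, caps how many guesses one box can run in parallel.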

But I was just able to use this to make 20 guesses against my _dad's_
password (based on his email address) before finding a match, meaning that
password has been leaked at some point and may still be in use. Of course it
had my brother's name in it.

If services like this become pervasive, that may be a valid argument for _not_
using per-service usernames, e.g. plus addressing (assuming canonicalization
doesn't strip it out), because they let attackers use the service to target
not just your (username, password) but effectively (site, username, password)
directly. However, their dataset is, after all, leaked/cracked passwords, so
your creds are already up for grabs at that point, and I'm not sure why
attackers would use Google's service rather than just building their own
database from the available sources.

[1] -
[https://gist.github.com/Indigo744/e92356282eb808b94d08d9cc6e...](https://gist.github.com/Indigo744/e92356282eb808b94d08d9cc6e37884c)

~~~
andreareina
What's the benefit of doing it this way instead of how Troy does it?

~~~
zaroth
Troy tells you that someone, somewhere, used that same password and it was
cracked.

Google tells you that your username and password combination _specifically_
was cracked.

One is a yellow caution flag. The other is a three-alarm fire. Both are useful!

------
iuguy
If anyone wants to do something like this offline in an AD environment at
work, Safepass[1] does some pretty cool things to get better password
coverage.

[1] - [https://safepass.me/](https://safepass.me/)

------
keyle
Question is, how do you turn this into a profitable outcome without alienating
users?

~~~
sokoloff
I think, in general, Google wins the better (more trustworthy and lower
friction) the internet experience of random users is. They can profit from
increased internet usage and trust driving additional traffic to paid-search
and AdSense sites.

Seems it could be an overall win for users and Google.

------
wheelerwj
This is a story that 1) has nothing to do with its title, and 2) is about how
a team that WORKS AT GOOGLE got 600k Chrome extension users in the first few
weeks, and how they “secure” the app against their own company learning your
password.

