
Better password protections in Chrome - hbcondo714
https://security.googleblog.com/2019/12/better-password-protections-in-chrome.html
======
zaroth
The cryptography behind this is done very well so that Google does not ever
see even your hashed password, or know if there was a breached password
detected. What Google does see is a hash-prefix of your username, to narrow
down the encrypted data set of compromised credentials being returned to your
Chrome instance.

Here’s what happens. Your instance of Chrome hashes your username and password
and encrypts that value with an ephemeral key, _Kc_, which doesn’t leave Chrome.

Google gets the encrypted cred-hash and applies a second round of encryption
with a key known only to Google, _Kg_. They return the doubly-encrypted
cred-hash, and also return 256 candidates encrypted with just _Kg_ for you to
compare against. Those 256 candidates are selected based on a clear-text
3-byte prefix that Chrome also sends them.

When Chrome gets the results back, it decrypts the cred-hash using _Kc_, which
leaves the cred-hash encrypted only with _Kg_. That is exactly the key used to
encrypt the candidates, allowing Chrome to do a simple byte-compare to see if
there’s a match between your cred-hash and the encrypted breach candidates.

The cool part is the encryption operation is basically commutative. You can
add your layer of encryption first, send the value to Google, they add their
encryption and you get the double-encrypted blob back, and then you remove
your encryption to end up with a plaintext encrypted just with Google’s key,
even though Google never saw the plaintext, and you never saw their key.

This trick allows your browser, and only your browser, to compare your cred-
hash against the breached cred-hashes while all the data you are comparing is
actually encrypted with a secret key held by Google that never leaves Google’s
server.
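
To make the layering concrete, here is a minimal Python sketch of a commutative cipher built from modular exponentiation. Everything in it is an assumption for illustration: the modulus `P`, the SHA-256 based `hash_to_group` helper, and the function names are hypothetical, and it is a toy stand-in, not the cipher Chrome actually uses.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1.
P = 2**127 - 1

def random_key() -> int:
    """Pick an exponent that is invertible modulo P - 1, so a layer can be removed."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(username: str, password: str) -> int:
    """Map a credential pair to a nonzero integer modulo P (the 'cred-hash')."""
    digest = hashlib.sha256(f"{username}:{password}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(value: int, key: int) -> int:
    """Add one layer: raise the value to the key, modulo P."""
    return pow(value, key, P)

def decrypt(value: int, key: int) -> int:
    """Remove one layer by raising to the inverse exponent modulo P - 1."""
    return pow(value, pow(key, -1, P - 1), P)

def is_breached(username: str, password: str, breached) -> bool:
    kc = random_key()  # client's ephemeral key, never leaves the client
    kg = random_key()  # server's key, never leaves the server

    # Server side: candidates encrypted with Kg only.
    candidates = {encrypt(hash_to_group(u, p), kg) for u, p in breached}

    # Client -> server: cred-hash encrypted with Kc.
    blinded = encrypt(hash_to_group(username, password), kc)
    # Server -> client: second layer added with Kg.
    double = encrypt(blinded, kg)
    # Client removes its own layer; what remains is encrypted with Kg only.
    unblinded = decrypt(double, kc)
    return unblinded in candidates
```

Because `(x**kc)**kg` and `(x**kg)**kc` are the same value modulo `P`, the order in which the two layers go on does not matter, which is what lets the client peel off its layer last.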

The only information leak is the 3-byte hash prefix to identify the slice of
the comparison set. This is definitely not nothing, but perhaps it is a
reasonable trade-off.

~~~
flashman
Is this an okay simplification?

• I want to send my password (44) to Google but don't want them to know it's
44. So I multiply it by 37 (Kc) and send 1628.

• I also tell Google that the password is between 40 and 50. (Equivalent to
sending three characters of cleartext.)

• Google multiplies 1628 by 78 (Kg) and sends me the result, 126984.

• Google also sends me every number between 40 and 50, multiplied by 78.

• I divide 126984 by Kc and get 3432. I see that 3432 is in the set of other
numbers Google sent me (44x78). So I can infer that my password is in Google's
set of breached passwords.
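
The same steps as a few lines of Python, using only the numbers above (the variable names are mine):

```python
import math
from functools import reduce

KC = 37        # my ephemeral key
KG = 78        # Google's key
password = 44

blinded = password * KC                        # 1628, sent to Google
double = blinded * KG                          # 126984, sent back to me
candidates = {n * KG for n in range(40, 51)}   # every number from 40 to 50, times Kg
unblinded = double // KC                       # 3432

print(unblinded in candidates)                 # True

# Caveat of the toy: plain multiplication leaks Google's key,
# because the gcd of the candidates is Kg itself.
leaked_key = reduce(math.gcd, candidates)
print(leaked_key)                              # 78
```

So the arithmetic works as an analogy, but a real scheme needs an operation that cannot be undone this easily.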

~~~
Reelin
> I also tell Google that the password is between 40 and 50. (Equivalent to
> sending three characters of cleartext.)

I believe the (first 3 bytes of the) hash of your username is used, as opposed
to your plaintext password. I realize you were simplifying, but the difference
between "plaintext password" and "hashed username" is rather essential in this
case.

IIUC, their breach data consists of pairs of values - a username hash and the
matching username-password hash (all encrypted ofc). This way they aren't
storing a bunch of breached plain text credentials on their servers. This
allows for a range search based on your username hash prefix, while the
encrypted packages you send and receive are username-password hashes.

Edit: I believe I was incorrect - they only retain the _prefix_ of the
username hash (as opposed to the entire thing). This has the benefit of
effectively rendering the username unrecoverable - you'd be forced to attack
the combined username-password hash. So it's not really a range based search,
but instead a hash table. They just return the entire (24-bit addressed)
bucket to you, which for 4 billion (!) sets of credentials amounts to ~240 on
average.
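
A rough sketch of that bucket lookup in Python, as I understand it. The SHA-256 hashing and all function names here are assumptions for illustration, and the _Kg_ encryption layer is left out to keep it short.

```python
import hashlib
from collections import defaultdict

PREFIX_BYTES = 3  # 24-bit bucket address

def username_prefix(username: str) -> bytes:
    """First 3 bytes of the username hash; the only part sent in the clear."""
    return hashlib.sha256(username.encode()).digest()[:PREFIX_BYTES]

def cred_hash(username: str, password: str) -> bytes:
    """Combined username-password hash (stand-in for the real, slower hash)."""
    return hashlib.sha256(f"{username}:{password}".encode()).digest()

def build_index(breached):
    """Server side: group cred-hashes by username-hash prefix."""
    index = defaultdict(set)
    for username, password in breached:
        index[username_prefix(username)].add(cred_hash(username, password))
    return index

def lookup(index, prefix: bytes):
    """Server side: return the whole bucket for a prefix, without filtering."""
    return index.get(prefix, set())
```

The server never needs the full username hash: it keys on the prefix and hands back everything in that bucket.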

------
dannyw
I don't like the direction of "popular websites are exempt, small websites are
restricted".

Chrome has blocked autoplaying videos with sound for most sites except a
small, hardcoded list that includes YouTube.

This means that if you create a YouTube competitor today, you are playing at a
technical disadvantage. Or if you're just hosting videos on your own personal
website!

What if Google's algorithms classify your new startup as "potential phishing",
because users are re-using their own passwords on your site? How can you
appeal? What recourse do you have against Big G's algorithm?

~~~
aawc
Disclosure: I'm the TL of this project on Chrome and I work very closely with
the Safe Browsing engineers regularly.

> What if Google's algorithms classify your new startup as "potential
> phishing", because users are re-using their own passwords on your site?

That's not how our phishing detection works. In fact, our internal studies
show that a lot of users reuse their passwords often and while that's not the
best password hygiene, it's the user's choice to make and we have to respect
that and build protections with this in mind.

> How can you appeal?

Right from your search console.

> What recourse do you have against Big G's algorithm?

Ultimately, Google/Safe Browsing has a lot more to lose if their users stop
trusting their product(s). I can tell you that we take false positives very
seriously and try hard to provide a fair and speedy resolution.

~~~
riquito
> I can tell you that we take false positives very seriously and try hard to
> provide a fair and speedy resolution.

How/Where does one start this process?

~~~
joking
He told you: in the Search Console. You have to register your site and
verify it before you can appeal anything.

~~~
tombrossman
This assumes that someone from every site has a Google account and consents to
their Privacy Policy and Terms. This is not a safe assumption, nor is it fair
to make this a requirement.

------
OliverJones
Seems to me Troy Hunt (the independent developer of
[https://HaveIbeenPwned.com/](https://HaveIbeenPwned.com/)) and Cloudflare's
Junade Ali deserve at least a mention in the Google announcement. They're
pioneers of this database of pwned passwords and the ability to look up
candidate passwords securely.

[https://HaveIbeenPwned.com/](https://HaveIbeenPwned.com/)

[https://haveibeenpwned.com/Passwords](https://haveibeenpwned.com/Passwords)

[https://www.troyhunt.com/ive-just-launched-pwned-passwords-v...](https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/#cloudflareprivacyandkanonymity)

Mr. Hunt personally paid to operate these services for quite a few years. Then
he got sponsorship from 1Password to defray the server costs.

~~~
dhdhebsb
I’m happy to see this comment. I guess I don’t really care if Google wants to
succumb to NIH syndrome and build this themselves, but they should at least
acknowledge the person whose idea they are repackaging.

~~~
greggman2
Why? Did Apple acknowledge all the PDAs that came before iPhone? Did Spotify
acknowledge subscription music services like Rhapsody from 2003? Did Flickr
mention Smugmug?

------
dannyw
Some of these changes, such as realtime sending of all non-popular URLs,
concern me a little.

I am happy for Google to use this information for aggregated security purposes
(e.g. bots analyse suspicious new pages).

I am not happy with this being associated to the Google account. However,
AFAIK, I have no way of knowing this, and in the absence of an explicit
statement, I have to assume the worst: that Google will use this to target me
ads.

If any Chrome team members are reading this comment, could we get some sort of
confirmation that these security features will not be used to collect more
data to target me ads?

~~~
aawc
Disclosure: I'm the TL of this project on Chrome.

Your concern is fair.

TL answer: I can tell you that this data is used only for improving user
security. Legal answer: Please read the Chrome Privacy Notice :)

~~~
tsukurimashou
So basically "you have our word, so trust us!"

~~~
delroth
Do you see any other alternative than trust for such a system?

\- Building a fully local system will not provide the same coverage and
doesn't allow sharing findings across users (which means that when a new
phishing page appears, it will appear as "new" for every single user, instead
of being new for N users and then known bad for the rest of users). It also
heavily reduces what kind of analysis can be done since you can't just store
large datasets on every single device and/or run expensive algorithms on every
single web page load on a mobile phone.

\- Making it open source would not help. You can't know whether the code you
can see is what is deployed remotely, so in the end you just end up trusting a
different assertion instead (if you can't trust a privacy policy, why could
you trust that the deployment is not backdoored?). It also has some
significant cons: malware / phishing / abuse detection is in essence a cat-
and-mouse game, and secrecy is unfortunately a key requirement in how everyone
is building anti-abuse systems across the industry (not necessarily because
they want to, but because nobody knows how it could work otherwise).

\- You could even go all the way and have e.g. remote attestations,
reproducible builds, etc. that allow proving that indeed the code running
remotely is the open source code you want and can audit. This is barely doable
with available technology these days, and even if someone was to do it there
would maybe be 1K people on this planet able to understand why this is
trustworthy. A prime example of this is looking at people in this very thread
not understanding the differential privacy scheme for detecting compromised
passwords.

Not trusting Google is a personal opinion, and I completely respect that. But
implying that there is an alternative to trust for this kind of system is IMO
misleading. Using DDG or Protonmail or any other service doesn't change the
fact that you have to trust someone, it's just a different someone. You might
personally believe their word more than Google's word, but if e.g. DDG started
logging your identity and requests and selling that to ad companies, you would
have very little way of learning about it either.

Disclaimer: I work for Google, not on Chrome, but I have worked on anti-abuse
systems in the past.

------
aquova
Reading through the article and the other sources linked, this seems to be a
different dataset and implementation than the "Have I been pwned" dataset that
Firefox and other password managers reference. However, I don't see any
information about where their data comes from, only information about how the
feature/extension works to not leak your password during verification.

~~~
tialaramex
To be fair, if you have money (which Google does) you can buy a LOT of this
data, every single day, forever.

There are at least two distinct outfits which pay grey hats to steal data from
black hats who in turn obtained it typically through cheesy script kiddie
attacks on web sites or phishing.

If you give them not very much money they'll "monitor" their stream of
supposedly fresh stolen data (a lot of it isn't fresh because crooks are also
often liars) for specific data items you tell them about. This is a bad deal,
but apparently some pretty big US companies pay for that service. It's fine
though because everybody reading this has unique strong passwords for every
account right? Right?

If you give them a LOT of money (or if you were say one of my former employers
and you owned the company that does this outright) you just get the raw data.
As like UTF-16 XML or CSV files with different types of quoting on each line,
or whatever other crazy and inadequately documented nonsense came to mind for
each such type of file delivered.

------
ggm
I like the secured hash check method. I think this is good.

Most of my current life is in unique-per-site passwords and 2FA, but the burden
of remembering which sites still use a shared password was more than my
motivation to fix them. This mechanism may get me to fix them faster.

------
TMWNN

    chrome://flags/#username-first-flow

is needed to see the

    Warn you if passwords are exposed in a data breach

option in

    Sync and Google services

------
hu3
The Verge article about it:
[https://www.theverge.com/2019/10/2/20892854/google-password-...](https://www.theverge.com/2019/10/2/20892854/google-password-checkup-hack-detection-now-available)

------
jdlyga
This is similar to what Firefox has been doing lately, right?

------
hongzi
I guess someone somewhere maintains a database of all leaked username/password
pairs and this feature compares a credential with this database? Is that
how this works?

~~~
snug
Possibly. Internally at Google, we had something similar that would look for
passwords found in the wild that you're using for internal Google logins.

------
robbya
Does this use Troy's have I been pwned dataset?

~~~
dhdhebsb
No they went and rolled their own in true Google fashion

~~~
dannyw
When you’re Google's size and investing engineering and cryptography resources
into building security features, applied at scale to a billion* users, you
probably want assurance that this list will be maintained for as long as you
want it to be maintained, AND for no third parties like HIBP to enumerate over
all Google emails.

* random number.

~~~
rimliu
Google does not have a very good record of maintaining their non-core
projects.

~~~
dannyw
Publicly, yeah. But you really think Google would deprecate some core
internal service actively used for account security, as well as internal
employee security?

------
quotha
Say no to chrome. Here is why:

[https://notochrome.org/why/](https://notochrome.org/why/)

------
jijji
This is a solution looking for a problem, enabled by default. I think most
people don't care that homomorphic encryption is being used; they don't want
their user/passwords being sent across the wire in any way, shape, or form,
especially enabled by default.

------
LockAndLol
I thought this would be about finally introducing a master password, but nope!

------
Shank
So let me get this straight. We currently consider it poor practice if
passwords are stored for authentication purposes in anything but heavily
salted PBKDF2, bcrypt, or sometimes scrypt. Yet it's okay to just send SHA256
unsalted hashes to web services for the purposes of verifying that those same
SHA256 hashes aren't already stored?

It's good that Google is encrypting these hashes, but come on, really? Surely
there has to be a better way than just shipping off unsalted password hashes
to a centralized location.

Edit: Okay, unsalted scrypt for the password. Thanks for the clarification :)

~~~
ReverseCold
I'm assuming it's similar to the haveibeenpwned pwned passwords API?

[https://haveibeenpwned.com/API/v3#PwnedPasswords](https://haveibeenpwned.com/API/v3#PwnedPasswords)

~~~
Shank
It...appears to be so? The infographic states that they "send a strongly
hashed and encrypted copy of your username and password to Google." They
explicitly mention that the username, not the password, is sent with a 3-byte
hash prefix. They make no mention of prefixing the password in the
infographic.

In the linked paper, where they discuss the tradeoffs and take blinding into
consideration, they note that in the protocol a client calls
CreateRequest(u, p), which creates a Req that is then sent explicitly to
Google. It appears to me that they consider the merits of sending a hash-
prefixed password, but do not make it explicitly clear that the final solution
sends a hash-prefixed password. I think that they would want to explicitly say
that only a partial hash is sent to Google, if that were the case?

~~~
geogriffin
> ... do not make it explicitly clear that the final solution sends a hash-
> prefixed password

I'm not sure if you're actually talking about something else, but the paper
says: "Post-canonicalization, the server calculates a computationally
expensive hash of both the canonical username and credential password... This
2-byte prefix—while leaking some bits of password material—provides the client
with k-anonymity over the universe of all username and password pairs."

IOW, the 3-byte hash prefix sent is of the username and password concatenated.
(Note that Google seems to have added another byte to the prefix versus the
paper).

~~~
Reelin
To add to this, hashed username-password material is leaked only by the first
variant described in the paper. The second variant described only leaks hashed
username material. They reportedly used the first variant during testing but
have now switched to the second variant.

They indeed appear to have increased the prefix from 2 to 3 bytes. This makes
logistical sense though - with 4 billion items, a 2 byte address yields ~61k
items per bucket (and thus sent to the client per request) while a 3 byte
address yields only ~240 on average.
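
The bucket arithmetic is easy to check. A quick Python sketch, assuming the 4 billion credentials spread evenly across buckets (the function name is mine):

```python
TOTAL_CREDENTIALS = 4_000_000_000

def average_bucket_size(total: int, prefix_bytes: int) -> float:
    """Average items per bucket when addressing by an n-byte hash prefix."""
    return total / 2 ** (8 * prefix_bytes)

for prefix_bytes in (2, 3):
    size = round(average_bucket_size(TOTAL_CREDENTIALS, prefix_bytes))
    print(prefix_bytes, size)
# 2 61035
# 3 238
```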

