> If you have a Google account, Google’s core sign-in system is designed not to know your password. How, then, can we verify your password when you sign in to your Google account again? The answer lies in a bit of cryptography: when you set your password, instead of remembering the exact characters of the password, we scramble it with a “hash function”, so it becomes something like “72i32hedgqw23328”, and that’s what we store with your username. Both are then also encrypted before being saved to disk. The next time you try to sign in, we again scramble your password the same way. If it matches the stored string then you must have typed the correct password, so your sign-in can proceed.
> The effectiveness of the hash function lies in its one-way nature: it is simple to scramble your password, but nearly impossible to unscramble it. So, if someone should obtain the scrambled password, they won’t be able to recover your real password. The downside of password hashing is that if you forget your password, we cannot show you what it was; there’s nothing we can do other than reset it to a temporary password (valid one time only) and then require you to pick a new one.
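The store-the-scramble-and-compare flow described in the quote can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual scheme; in particular, a bare SHA-256 with no salt is not a good password hash, it's used here only to show the "scramble and compare" shape:

```python
import hashlib

def scramble(password: str) -> str:
    # One-way "scramble": easy to compute, infeasible to reverse.
    return hashlib.sha256(password.encode()).hexdigest()

# At signup: store the scramble, never the password itself.
stored = {"username": "alice", "scrambled": scramble("correct horse")}

def sign_in(username: str, attempt: str) -> bool:
    # Scramble the attempt the same way and compare the results.
    return scramble(attempt) == stored["scrambled"]

print(sign_in("alice", "correct horse"))  # True
print(sign_in("alice", "wrong guess"))    # False
```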
When you scramble your locker combo, it really is random. When you scramble your password with a hash function, it may look random, but it's really not.
Also just for fun, here's a creative stab at my own explanation.
As every dog knows, each dog has a unique smell. If you have smelled the dog before, then you can use its smell to recognize who it is, but without that past experience, nothing in the smell itself tells you who it might be.
The hash function creates the smell from your password, which is the dog. A website doesn't save a photo of your dog, just the smell. So if a hacker breaks in, they might smell your password but they can't see it. And when you log in to the website, it makes sure your password smells right before it lets you in.
I know you said "just for fun", but I really think we need to do better than these types of explanations.
I remember many years ago, I was wondering what the heck hyper-threading was, and I came across a video that compared it to a bowl of cereal. Even though you only have one mouth, you'll be able to eat the cereal more quickly if you use both your hands, instead of just one.
I watched this video, and felt gratified, and closed the browser window... and then suddenly realized that I still had no idea what hyper-threading was, only that I should use both my hands if I'm ever in a cereal-eating competition.
A better explanation would have gone something like, "CPUs solve math problems more quickly than we can give them problems to solve. Hyper-threading queues up additional sets of problems that the CPU can switch to when it would otherwise be doing nothing."
Analogies are useful tools, but usually only after we have some understanding of the real version.
(not serious; hope it is not against the guidelines)
More seriously, analogies are better for conveying the bigger picture of a complex system, as they effectively describe modes of interaction. The dog/smell analogy was not bad, but you also have to add a role for the other actors in the system (here I would say: the third party, and the database).
The scrambling is modulated by a random salt which is stored together with the hash. So you are scrambling the original randomly; you then scramble the password-to-be-checked the same "way" (i.e. with the same salt) to compare hashes.
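That salt-then-compare cycle can be sketched with Python's standard library (PBKDF2 is used here as one common key-derivation choice, not as a claim about what any particular site does):

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    # Generate a fresh random salt and scramble the password with it.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(candidate: str, salt: bytes, digest: bytes) -> bool:
    # Scramble the candidate the same "way" (same salt) and compare.
    again = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 100_000)
    return hmac.compare_digest(again, digest)

salt, digest = hash_password("hunter2")
print(check_password("hunter2", salt, digest))  # True
print(check_password("letmein", salt, digest))  # False
```

Because the salt is random per password, two users with the same password get different hashes, and precomputed tables are useless.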
>"Hashing" a password is a method of scrambling it that will produce the same scramble every time you do so. However two different passwords will never produce the same scramble.
Could be better, but it emphasizes the deterministic nature of hashing while avoiding tangential details and gotchas like collisions, which aren't relevant here.
The explanation is clear that they scramble the password in the same way to get the same result.
A cryptographic hash function is an algorithm that:
Given an input, it returns a very large number.
Given a bit-for-bit identical input it returns exactly the same number.
If the input changes by one bit, it _should_ give an extremely different number, so different that it appears to be random.
The probability of two different inputs producing the same output is extremely low, and the better the hash function, the lower this probability is.
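The properties listed above are easy to see firsthand with any cryptographic hash, e.g. SHA-256 from Python's standard library (the "one character off" inputs here are just illustrative):

```python
import hashlib

def h(s: str) -> str:
    # SHA-256: the hex string encodes one very large number.
    return hashlib.sha256(s.encode()).hexdigest()

# Bit-for-bit identical input -> exactly the same output, every time.
assert h("password1") == h("password1")

# A one-character change produces an apparently unrelated output.
a, b = h("password1"), h("password2")
print(a)
print(b)
matching = sum(x == y for x, y in zip(a, b))
print(f"{matching}/64 hex digits agree")  # typically a handful, as chance predicts
```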
People aren't random even when they try to be.
It's trivial to unscramble hashed dictionary words with 123! or 2019! appended to the end.
Google should somehow note that distinction as it is important.
Being able to make millions (or hundreds of millions or even billions) of guesses each second makes successful guessing more feasible.
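The dictionary-plus-suffix attack mentioned above is trivial to sketch. This toy example assumes an unsalted SHA-256 hash has leaked; the wordlist and suffixes are made up, and real attackers use millions of words plus mangling rules at millions of guesses per second:

```python
import hashlib

def h(p: str) -> str:
    return hashlib.sha256(p.encode()).hexdigest()

# The stolen, unsalted hash of a "clever" password.
leaked = h("dragon2019!")

# A tiny wordlist plus common suffixes.
words = ["sunshine", "dragon", "letmein", "princess"]
suffixes = ["", "123!", "2019!"]

cracked = None
for word in words:
    for suffix in suffixes:
        if h(word + suffix) == leaked:
            cracked = word + suffix

print("cracked:", cracked)  # cracked: dragon2019!
```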
If you can prove P = NP, then you have proved that no secure hash functions exist. (This is the contrapositive.)
(For a fun angle not related to complexity, the algorithm itself could be so large that it wouldn't fit in the observable universe without breaking the Bekenstein bound.)
And it has been a long time since I did proof theory, but is there the possibility of a non-constructive proof with no actual examples, no matter how long you keep enumerating algorithms? Or a non-constructive proof but no constructive proof? In that case there could be a "perfect heuristic"...
Because without that knowledge what would the attacker run their guesses through? And wouldn't that information be hard to get unless you had the full source code along with all relevant configuration?
As an attacker you're not trying to guess the hashed password, and even if you had it you still would not be able to use it to log in. Hashing functions that are fit for real-world use are one-way: their input cannot be determined by looking at their output. And when each password is hashed with its own random salt, rainbow tables (precomputed lists of plaintext inputs and their corresponding hashed outputs) are rendered useless to attackers.
So knowing the hash function is useless on its own, and even if you also manage to access the database and get the hashed password, you still need to brute-force which input produces that output. If the original password is itself a long random string stored in Keychain or 1Password, you're looking at potentially trillions of attempts before you get it right. Password hashing functions are written to be slow and computationally expensive, so you're going to incur a lot of energy costs by the time you get it. Also, you'll be dead by then. When brute force works, it's because the login has no rate limiting and the password is something short and simple, like a mother's maiden name with a zip code at the end.
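The "written to be slow" point is easy to measure. This sketch compares one round of a fast general-purpose hash against a deliberately expensive key-derivation function; the iteration count and the resulting timings will vary by machine, so treat the numbers as illustrative:

```python
import hashlib
import os
import time

password = b"hunter2"
salt = os.urandom(16)

# One round of a fast general-purpose hash: far too cheap per guess.
t0 = time.perf_counter()
hashlib.sha256(salt + password).digest()
fast = time.perf_counter() - t0

# A deliberately expensive password hash: hundreds of thousands of
# iterations make each guess cost real time and energy.
t0 = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)
slow = time.perf_counter() - t0

print(f"fast: {fast:.6f}s  slow: {slow:.3f}s")
```

At a few hundred milliseconds per guess, even a billion guesses stops being an afternoon's work on stolen hardware.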
Those 2 assumptions right there get you what you want 90% of the time.
The issue here is meaningful, and it's useful to have a reminder that accidentally retaining plaintext passwords is a hazard of building customer identity features.
But I think it's at least equally useful to get the level set on what engineering at scale can reasonably promise today.
correction: in the world -> in the industry
We've got five affected accounts.
So far the G Suite experience has been underwhelming, to say the least: a crap interface, less fine-grained control over document access than even Google Drive offers for free, a Gmail UI that just plain sucks over IMAP, and so on.
And now this. As much as there is to like about this email in terms of transparency it is also very interesting for what it does not say: Apparently they can't determine with certainty whether or not the accounts were accessed.
"legacy functionality that enabled customer Domain Admins to view password"
That functionality should have never existed to begin with.
"primarily impacted system generated or admin generated passwords intended for one-time use"
Note the weasel word 'primarily': either it did or it did not potentially affect all passwords.
"We have reviewed the login information for the account(s) and have found no evidence that the unhashed passwords were misused."
No evidence does not mean it did not happen. So apparently there is a chance it did happen even though they found no evidence, and their audit trails for that log data are not such that they can guarantee nobody had access to it or viewed it.
"an internal system that logged account signup information for diagnostic purposes, also inadvertently logged the administrator’s account password in our encrypted systems in an unhashed format."
This suggests some pretty major process failures: this change was apparently found after it had already been pushed to production, either without review or with a review that did not catch this pretty basic mistake.
Having the best security team in the world is great but if you then have stuff like this happening you have to wonder about the processes around deployment, which are just as important as having a great security team to begin with.
Damned if you do and damned if you don't: a small company is better off relying on the likes of Google for the secure storage of mail and documents, but at the same time that's not perfect either, and apparently includes some random strangers potentially having access to all of it, which is something that never happened to us in the last decade or so, to the best of my knowledge.
It is possible that this team did not work with the security team even if it is a highly unlikely scenario. The likely scenario is that this team did work with a security team and they were aware they were supposed to hash the passwords but they made a mistake during the implementation.
I think what is being underappreciated here is that very very smart application developers can have little to no idea about security best practices. I can say this confidently from my direct experience of working with Googlers.
There is no reason why we can't have 2 rounds of hashing, one client-side and one server-side. That way, even if Google is malicious, it cannot know the actual password.
In the algorithm I’m most familiar with (SRP6a) the plaintext password never leaves the client machine either on initial signup or on later password-based authentication, and what does transit the wire can be used neither to recover the password nor to replay the login attempt. You cannot break into the account just by passively watching the messages go back and forth.
One nice thing that falls out of a PAKE algorithm, as opposed to a zero-knowledge proof, is that you get a cryptographic key on both sides (thus the "key exchange" part of the acronym) that the peers can use with a symmetric encryption algorithm to communicate with each other. This then lets you construct things like TLS-SRP, a variant of TLS (née SSL) that uses SRP to get the symmetric key instead of X.509 certificates and an entirely different key exchange algorithm.
See (shameful plug) http://thesybil.net
Yes, it is academic but it should be everywhere.
I improved it to perform client-side hashing and encryption, but have not had the time to update the docs.
But sure, there are good solutions to this, like SCRAM. Unfortunately, there is not much point when the authentication code is controlled by the server (eg. JS served by a server)
They said users created through the admin console are affected. That might explain my nearly zero count. I've used GAM to automate creation for a very long time. We are a high school, so we churn about 1/4 of our user count every year.
> due to legacy functionality that enabled customer Domain Admins to view passwords, some of your users’ passwords were stored in our encrypted systems in an unhashed format. This primarily impacted system generated or admin generated passwords intended for one-time use.
Sure enough, the accounts listed in the email as affected were never signed in and still needed a password reset.
The email: https://i.judge.sh/Twilight/iRcF0e7D.png
I was a little confused about the timeline of this, as they mentioned implementing some functionality back in 2005. If I understand this right, the specific issue of storing some users' passwords unhashed (unsure as to what percentage or quantity):
- started in January of this year
- is now fixed
- left passwords stored in this state for up to 14 days
- would only affect new users signing up (based on it being part of the signup flow)
1. All passwords set by admins for other accounts since 2005 were impacted.
2. Also, all passwords set starting January 2019 were stored unhashed for 14 days.
(Once again: this is speculation based on general firsthand knowledge about how Google operates, but I know nothing specific to this incident.)
Their audience is people who know those words and know that they want them, but might not know what they mean. Using technically accurate details when addressing people who don't understand the vocabulary is generally a recipe for misery.
Here we have google storing in plaintext by accident.
... for 14 years without noticing...
Assuming the salt is unique to the website, it also solves the problem of users reusing the same password on many different websites; that would no longer be an issue.
It would also mean as an end user we can have a look at our login POST request and confirm our password is not sent and therefore cannot be stored in plaintext.
EDIT: I see the other discussion and understand the hash now becomes the password so if a hacker obtains the hash they now can login. I'm advocating for it to be hashed server side also, protecting the user from cases just like this one.
I recently opened two of what I thought were new 'consumer' Google accounts, to be dedicated to managing Google Cloud Platform resources. On both accounts I've also set up Identity & Access Management with an organization and additional sub-user accounts.
Both of those parent accounts received this email warning about five hours ago...
Subject: [ACTION REQUIRED] Important Information about your G Suite Account
Google Customer Alert
Dear G Suite Administrator,
We are writing to inform you that between January 13, 2019 and May 9, 2019, an internal system that logged account signup information for diagnostic purposes, inadvertently stored one of your user account passwords in our encrypted systems in an unhashed format. This impacted the user account password provided during the initial account signup process. The log information was retained for 14 days following the signup process, and then was deleted according to our normal retention policies.
We have reviewed the login information for the account and have found no evidence that the unhashed password was misused.
The following is the user account impacted in your domain(s):
Google Planned Action: for your security, starting tomorrow Wednesday, May 22, 2019 PT we will force a password change unless it has already been changed prior to that time.
Our password update methodology is as follows:
• We will terminate the impacted user’s session and prompt the user to change their password at their next login.
• In addition, starting Wednesday, May 29, 2019 PT we will reset the password for the user if they have not yet selected a new password or have not had a password reset. This user will need to follow your organization’s password recovery process. However, Super Admins will not be impacted. For information on password recovery options please refer to the following [Help Center Article].
For further questions please contact [Google Support] and reference issue number ###.
The G Suite Team
Is this another way of saying they stored cleartext passwords? Interesting that they chose to use "unhashed" instead of "cleartext".
It's actually great that industry standards are being codified, with actual deterrents for failing to securely store passwords.
> If passwords are being used, they should not be accessible and should be securely stored to avoid them being visible to unauthorized individuals. Some sort of standard such as encryption or an equivalent should be used to protect the passwords.
quote from some random blog 
What would be better is for us folks to do this to ourselves first. If self regulation works well, the government will never need to step in. The GDPR seems to be pretty good, but it's not unimaginable that some assholes in office who are looking to get reelected respond to a high-profile data breach by making a shitty set of regulations to govern all software developers in the country for all time.
W.R.T. enforcing compliance, having a licensing board that oversees this would provide incentive to developers to not cut corners, and would also provide a way to resist management that is intent on releasing a shitty product quickly in the hope of making a quick buck before it all crashes down / padding their resume.
Shifting the onus on security away from the companies and onto the developers also seems like a bad idea. With GDPR, there is a financial incentive for companies to use good practices. With a licensing board, companies will care far less if Joe Q Developer might lose his license. Why would they care unless there is financial or legal incentive? I just don't see how this would work without getting back to government regulation. I'll definitely reconsider my view point if you have a solution to this hurdle.
I've never built a website that doesn't securely store passwords. I've definitely _used_ websites that do, though. Up until now my only choice has been to stop using any of these sites and just hope they magically fix their crap.
It's obvious which would be the better motivator for any of these sites: receiving an email stating "what you're doing is wrong and insecure", or one stating "what you're doing is wrong and insecure and, by the way, you could be fined €50,000".
Again, I'm not a lawyer or expert in this domain, so I could be wrong.
The encryption is most likely enough to be within GDPR compliance.
Why do you think that? Allowing staff to read plaintext passwords is contrary to standard security practice; companies are expected to make a reasonable effort to secure PII, and allowing staff to read your password doesn't appear to be a "reasonable effort" by even the most casual of readings.
I don't think the EU courts are that stupid.
FWIW I don't think there is a case here particularly, as it appears to be a genuine error and being fixed.
There should be a standard password box in HTML from which you can just call a function to pull out a salted hash of the password. The attributes of the box could include options for password restrictions, such as a minimum of 8 characters, etc.
edit: nevermind, I just realized this wouldn't work
* actually the server would send the challenge ahead of time along with serving the web page
Copying the "layman's terms" from that page for your convenience...
In layman's terms, during SRP (or any other PAKE protocol) authentication, one party (the "client" or "user") demonstrates to another party (the "server") that they know the password, without sending the password itself nor any other information from which the password can be derived. The password never leaves the client and is unknown to the server.
The other major benefit of server-side hashing, in addition to what you're correctly surmising, is that it prevents certain privilege escalations. A read-only database dump (from e.g. SQL injection into a SELECT query) can otherwise be used to gain read-write access by simply copying dumped passwords from the database back into the login form. The thing is, server-side hashing doesn't actually prevent this attack, it merely slows it down. It also doesn't help at all with weak passwords.
In any case, to address your concern you can also simply run the client-side hashed value (which could and should be produced by something like Argon2) through a single round of SHA2.
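A minimal sketch of that layering, using Python's standard library. Note the assumptions: `hashlib.scrypt` stands in for Argon2 here (Argon2 isn't in the stdlib), the parameters are illustrative, and the "client" and "server" sides are just two functions in one process:

```python
import hashlib
import hmac
import os

# Client side: a slow, salted KDF. scrypt stands in for Argon2 here.
def client_hash(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024)

# Server side: a single fast round of SHA-256 over the client's value, so
# a read-only database dump can't be replayed straight into the login form.
def server_store(client_value: bytes) -> bytes:
    return hashlib.sha256(client_value).digest()

salt = os.urandom(16)                      # per-user, served to the client
submitted = client_hash("hunter2", salt)   # the only thing on the wire
stored = server_store(submitted)           # the only thing in the database

# Login: hash the submitted value once more and compare in constant time.
print(hmac.compare_digest(server_store(submitted), stored))  # True
```

The plaintext never reaches the server, and the stored value is not the value the client submits, which addresses both concerns at once.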
There are ways to actually add security above SSL, but they require a per-session nonce. The most secure versions of these operate on some sort of out-of-band time-scrambling algorithm issued to users on an individual basis (usually a dongle), which is basically a nonce that's been pre-arranged a few years in advance. Done correctly, these can provide all the security of on-demand generated nonces and physical keys combined.
Fixed hashing algorithms add such a negligible amount of security that I'd prefer to avoid them: the tech debt their existence incurs outweighs their advantage.
They would only be able to screw users by allowing access to their own service if they were forced to provide a salt and only be able to get the hash.
<input type=password salt=687161>
But at the HTML level everything is possible, and this would be hardly possible to enforce, aside from informing the user that they are typing a password into an insecure password field (hard to do in a way that can't be faked, too).
It would still be hackable by bad actors using the salt of a service they want to get the password for, but that would at least be detectable client-side.
If the hash is dynamic, the password has to be stored in plaintext. Kerberos does something like this.
Neither of these are better than the current state of the art. Read up on SRP for a public-key based system that does something like what you want.
Third party data access should go through OAuth, not by sharing passwords to services.
Anything that would prevent the website's server from getting the plaintext password would be good. Salted hash is obviously not a plaintext password and can't be stuffed around to other services.
Now you've just got to call these two:
and if you really want to tweak things you can, but the simple route is moderately secure
The problem is instead that now the client-side hash is actually the password.
(edit: some people use a "pepper" at the application level and apply it to all passwords, which might be kind of what you're thinking of? But you don't need to do this with modern key derivation algorithms. You can if you want, it just doesn't really matter much.)
Edit: though perhaps it's worth noting that it won't force a server to store passwords securely, but it would facilitate it.