Hacker News
Yahoo discloses hack of 1B accounts (yahoo.tumblr.com)
1046 points by QUFB on Dec 14, 2016 | 569 comments

Fittingly, attempting to change my password to a 32-character random string generated by 1Password returns an error that the password "cannot contain my email or username", regardless of the contents of that random string (I tried several).

It does, however, _happily_ accept `passwordpassword` and cheerily move along to confirming that my recovery email account from 2003 is still valid.

Gonna guess that's a bad message for a password length violation or something else.

Not that it's much better. Is it so hard to allow 50 character passwords?

I'm guessing it detected an @ symbol?

has anyone tried password@password ?

if the password is stored properly, (i.e. bcrypt), the number of characters shouldn't matter at all, be it 50 or 5000.

It sort of does matter for bcrypt, surprisingly: http://security.stackexchange.com/questions/39849/does-bcryp...

In the interests of hewing closest to cryptographic reality, I design not to allow a password longer than the algorithm can usefully use.

I think it's best to allow longer passwords for those who use long phrases. It's easier to remember the full phrase than a truncated version. You could show a warning that the extra chars beyond 50-55 will be ignored.

Or you could SHA256 the original password and feed the hash to bcrypt. Remember to use the 64-byte hexadecimal hash, not the 32-byte binary because bcrypt chokes on null bytes.
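A minimal sketch of the pre-hash step suggested above, using only Python's standard `hashlib`. The helper name `prehash_for_bcrypt` is made up for illustration, and the bcrypt call itself is left out because it depends on whichever bcrypt library is in use.

```python
import hashlib

def prehash_for_bcrypt(password: str) -> bytes:
    """Collapse a password of any length into 64 ASCII hex characters.

    Hex output never contains a null byte, unlike the raw 32-byte digest,
    so it avoids the null-byte problem mentioned in the comment.
    """
    digest = hashlib.sha256(password.encode("utf-8"))
    return digest.hexdigest().encode("ascii")

# Two long passphrases that differ only in their final character
# still yield different inputs for the slow hash.
a = prehash_for_bcrypt("x" * 200 + "a")
b = prehash_for_bcrypt("x" * 200 + "b")
```

The result would then be passed to bcrypt in place of the original password, both when storing and when verifying.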

Everyone's been saying "just use bcrypt", but bcrypt has too many gotchas to be the default choice. We really need to work on getting scrypt and argon2 into the most popular programming languages and frameworks a.s.a.p.

> Everyone's been saying "just use bcrypt", but bcrypt has too many gotchas to be the default choice

This has got to be the underlying problem of modern security. By the time a best practice is well known, it's no longer best practice.

I think that's a good observation. The implication seems to be that we're not iterating fast enough, or not sufficiently fast in implementing changes/improvements.

On the flipside, isn't there a risk of moving too quickly? There's a certain culture of caution because there's something to be said for "if it ain't broke, don't fix it." And even if something is broke, how certain are we that the cool new encryption algorithm is better or safer?

Like nutrition!

You would probably want to use PBKDF2 as a key-stretching function rather than just naive SHA256. Otherwise you're clipping your bcrypt input from "56 arbitrary bytes" down to "56 hexadecimal characters".

I haven't looked deeply at this, but using "key stretching" that clips your output characters to such a small space smells very suspect to me.

Remember: there is only 32 bytes of actual output there, regardless of whether you represent it as hex or binary. And since bcrypt can't take more than 56 bytes of input, you are clipping that down to the equivalent of 23 bytes.

Is "just use scrypt" an acceptable answer then? I'm not a security expert and I don't know the advantages of one over the other.

Yes, scrypt is a perfectly fine password hash.

If you are currently using something else (say salted md5 or even just plain md5), you can migrate your passwords to scrypt(current_hash()) without having to change everyone's password and/or wait for everyone to log in.

See also this comment thread: https://news.ycombinator.com/item?id=12549110

Don't do that. You've essentially just turned the old hashes into plain-text passwords, and how sure are you that those hashes don't exist in backups anywhere?

No, not exactly. An adversary who has the old hash but not the plaintext it represents cannot log in, because scrypt(H(H(value))) != scrypt(H(value)). This is not considering the offline crackability of a compromised hash. But there are legitimate situations where upgrading the password backing to a modern slow hash is preferable to continuing to use the old hash or, worse, storing the old hash as a field for a long time, so that when a breach happens both the new and old hashes are available.

There are user experience battles when talking about forcing a million users to change their passwords in a real system. Hashing the hash may be vastly preferable to management nixing the security upgrade. A password-update scheme that changes the hash as users log in, and eventually locks the accounts of users who have not logged in for an extended period, can roll the hashes over without telling users to change their passwords.

Not if you mark the converted versions and try scrypt(oldhash()) on users authenticating with them.
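A rough sketch of the "mark the converted versions" idea, assuming the old table held unsalted MD5 hex digests and using `hashlib.scrypt` from the Python standard library. The record layout, the `"scrypt(md5)"` marker, and the cost parameters are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)  # illustrative cost settings

def wrap_legacy_hash(md5_hex: str) -> dict:
    """Convert a stored unsalted MD5 hex digest into a salted scrypt record."""
    salt = os.urandom(16)
    key = hashlib.scrypt(md5_hex.encode("ascii"), salt=salt, **SCRYPT_PARAMS)
    # The scheme marker tells the login code to apply MD5 first.
    return {"scheme": "scrypt(md5)", "salt": salt, "key": key}

def verify(password: str, record: dict) -> bool:
    """Check a login attempt against a record of either scheme."""
    data = password.encode("utf-8")
    if record["scheme"] == "scrypt(md5)":
        data = hashlib.md5(data).hexdigest().encode("ascii")
    candidate = hashlib.scrypt(data, salt=record["salt"], **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, record["key"])

legacy = hashlib.md5(b"passwordpassword").hexdigest()  # what the old table held
record = wrap_legacy_hash(legacy)
```

Submitting the stolen MD5 digest itself as the password fails here, because the login path hashes it once more before the scrypt step. That only holds if the plain MD5 column is actually deleted after conversion.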

Woah! Very good point!

scrypt is okay if you use it correctly. It's too easy to use it incorrectly, though, because scrypt is a low-level algorithm that wasn't specifically designed for password storage. [1]


In order to be able to tell people to "just use scrypt", we would need to have a sort of standard wrapper that uses the correct parameters by default and produces identical results in every common programming language.

> You could show a warning that the extra chars beyond 50-55 will be ignored.

Or you could use a better KDF, e.g. scrypt (even PBKDF2 is better on this metric). Artificial password-length restrictions are symptomatic of poor design.

This is surprising, do you know how Argon2 behaves compared to this?

Argon2 does the right thing. No silly upper bounds.

That depends on how silly you consider 2^32 - 1 bytes, if I recall correctly.

That's just a bug. Truncation invalidates the 'stored properly' part of the statement.

Could you expand on that? I did not think bcrypt was responsible for storing the resultant hash. The limit appears to be in calculating the hash.

The original phrasing was "stored properly, (i.e. bcrypt)". That's including the hashing as part of the 'storing'. Bcrypt has a size limit, but a size limit is not the same thing as truncating. It's broken code on the front end that truncates instead of doing something like sha512.

Bcrypt spits out a string, that the caller must store, somewhere. I presume the parent post means that Bcrypt "stores" in its output string a value that, for all practical purposes, varies reliably with the same salt but different plaintext.

If the password is stored properly, (i.e. bcrypt) then there does need to be some length limit or it becomes too easy to DoS a service by sending it hundreds of megabytes of password to bcrypt. There's no reason for that length limit to be less than 100 characters though.

You are going to be limited by the max http request size way before that.

To upload 100s or even more than a few megs you need a multipart message, a password form won't accept MP http requests.

On the back-end there usually is a naive POST handler which happily accepts anything it can parse, unless a mature framework with sane defaults is used.

Yep, people who've run marginally popular sites have dealt with this before. Give someone a text box and watch them try to stuff 4GB of content in it. There has to be a cutoff somewhere, but as you note, it should be well outside of the realm of reasonable password lengths (hundreds of characters).
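A small sketch of the cutoff being described, assuming the check runs on the raw request bytes before any hashing. The 1024-byte limit and the function name are arbitrary choices for illustration, not a recommendation from the thread.

```python
MAX_PASSWORD_BYTES = 1024  # hypothetical cap, far above any real passphrase

def accept_password(raw: bytes) -> str:
    """Reject oversized input before any expensive hashing is attempted."""
    if len(raw) > MAX_PASSWORD_BYTES:
        raise ValueError("password too long")
    return raw.decode("utf-8")
```

A 200-character passphrase passes untouched, while a multi-megabyte blob is refused without ever reaching the slow hash.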

GitHub is the only website I can think of off the top of my head that doesn't limit to an arbitrarily small number (aka <100). Can you name any other "major" websites that allow 100 character passwords?

Anything built with the popular rails gem devise allows 128 by default [1]


Hash the password locally (you are serving JavaScript over SSL right?) and only send the SHA256.

That would lock out anyone who chooses not to execute your JavaScript.

Requiring me to trust your code in order for you to decide whether or not to trust me is asking too much.

How would you know that the hash is of a password of sufficient entropy?

Never trust the client.

This isn't about trusting the client: it's about your endpoint being able to only accept a SHA256 hash sum from the client (thus: length limited) while allowing the user to input arbitrarily long passwords.

They hash in the browser: the only way they can mess with it is by producing silly outputs, but that only hurts them.

I can't think of any security implications of hashing on the client-side. What's your thinking?

Does salting work if you hash in the browser?

Well in this case the hash would be passed to Bcrypt or Scrypt, which have built in salt support, so client side salting wouldn't matter.

If the hashes are leaked, you could log in with them.

Well serverside you store them as plaintext equivalents - i.e. salt+hash the hash. So a leak doesn't leak the user-side.
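To make the two halves of this sub-thread concrete, here is a sketch in Python that simulates both sides. `client_side` stands in for the browser JavaScript, and the server treats the submitted digest as a plaintext-equivalent password: it checks the shape, then salts and slow-hashes it. The function names and scrypt parameters are assumptions for illustration.

```python
import hashlib
import hmac
import os
import re

HEX_SHA256 = re.compile(r"[0-9a-f]{64}")

def _slow_hash(submitted: str, salt: bytes) -> bytes:
    return hashlib.scrypt(submitted.encode("ascii"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

def client_side(password: str) -> str:
    """What the browser would send: a fixed-length SHA-256 hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def server_store(submitted: str) -> dict:
    """Validate the digest's shape, then salt and hash it like any password."""
    if not HEX_SHA256.fullmatch(submitted):
        raise ValueError("expected 64 lowercase hex characters")
    salt = os.urandom(16)
    return {"salt": salt, "key": _slow_hash(submitted, salt)}

def server_check(submitted: str, record: dict) -> bool:
    """Reject malformed input cheaply, then compare in constant time."""
    if not HEX_SHA256.fullmatch(submitted):
        return False
    return hmac.compare_digest(_slow_hash(submitted, record["salt"]), record["key"])
```

The endpoint never has to process more than 64 characters, and a leaked database row contains neither the password nor the digest the client sends.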

It's probably a naive substring detection check.

I hit this on the last Yahoo hack go-round, and it seemed that having a name in the form 'F Lastname' (for example) disallowed use of the letter F in the password.

I say "seemed" as I did not go through the exercise of testing with multiple combinations of name, initial, and password.

* Defending Against Hackers Took a Back Seat at Yahoo, Insiders Say - The New York Times || http://www.nytimes.com/2016/09/29/technology/yahoo-data-brea...

Time to update that article from September. Hooray for Yahoo, they made it 76 days without a 500M+ user security breach.

(No, I don't know the actual dates... just making a joke.)

Last time I tried changing my Yahoo password it took me days before it accepted something (and I had password generator scripts and my brain). Now it's back to something along the lines of `letmein`.

Just leave it at passwordpassword, it will be leaked eventually anyway

Strong passwords that need to be memorized shouldn't be wasted on security bozos

He doesn't need to memorize it. He mentioned he used 1Password to generate it. I'd assume he's storing it there too.

I've always wondered when 1Password is going to get hacked..

Won't do much; AFAIK everything is encrypted client-side with your master password. So a hacker could, in theory, get my encrypted database, but by the time they crack my strong password, I will, at the very least, have changed all those passwords.

That's not what a hack against 1Password, LastPass, or similar product will look like. When it happens, it will be because someone manages to commit to the VCS repository of one or more of their client applications (iOS, Android, desktop, etc.). All it takes is a few lines of code to dump the unencrypted contents on the device itself, and post them to some API endpoint or email address.

One commit to a VCS by a disgruntled employee, or an attacker who social engineers credentials to the VCS, and the client applications themselves - which must be trusted to decrypt the contents locally - will be compromised.

This is the problem with proprietary password managers, where the client applications are provided by the company. You cannot vet that software which is running on your device today, let alone all the app updates coming down the pipeline.

Thank you for writing this. I use a password manager, and whenever I see someone say "it's unhackable because of the encryption" I want to tell them this, exactly. All someone needs to do is to surreptitiously send your password to their own server and all your passwords are owned. It's not difficult.

I've often wondered about this. Is there a preferable alternative?

At the very least you need to memorize the 1pw password, but I do memorize some others as well

I can kind of understand that reasoning, but one of the nicer things about strong passwords is that there are a lot of them. In some sense that's what makes them strong passwords.

Why do you assume the password would be leaked eventually? Usually hashes are leaked (as in this case), not passwords.

leaking unsalted md5 passwords == leaking passwords

Not if it's my password. I use ~100 bits of entropy.

Hence "Strong passwords that need to be memorized" in OP's comment. Or else your memory is way better than mine (or I care way less, or probably both).

Whether they need to be memorized or not does not make the statement "it will be leaked eventually anyway" more true.

I use a password database so I don't memorize most of my passwords.

Well, it does, because memory puts some limitations on length and complexity...

It is possible to memorize a 100 bit password. I once had a 1000 word poem memorized, and could write it down flawlessly from memory.

I agree that it's not worth memorizing, you should instead use a password database. But I still maintain my original point that there's no reason to assume that your password will be leaked eventually if you use a strong password.

Could the attacker find an easier to find string that matches the same md5 hash?

The current best attack wrt matching an existing hash brings MD5's 128 bits of security down to 123. So no, that's not going to happen.

MD5 is terrible for human passwords because it's fast. But md5 is not actually broken for password storage purposes. If you use a long random password, md5 is enough.

Yes, if you add a (long - at least 32 bit) salt and something like at least 10^9 rounds of MD5 then, yeah, it's probably ok

No. I mean single unsalted MD5. You will not crack a 20-random-char password. You cannot process 2^120 guesses, and MD5 is not broken for this use.
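The arithmetic behind that claim can be checked quickly. This sketch assumes the 20 characters are drawn uniformly from the 62 letters and digits, and that the attacker manages a trillion MD5 guesses per second; both figures are assumptions, not numbers from the thread.

```python
import math

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy in a uniformly random string."""
    return length * math.log2(alphabet_size)

def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Time to try every candidate at a fixed guess rate."""
    seconds = 2.0 ** bits / guesses_per_second
    return seconds / (365.25 * 24 * 3600)

bits = entropy_bits(20, 62)           # 20 characters from a-z, A-Z, 0-9
years = years_to_exhaust(bits, 1e12)  # assumed rate: 10^12 guesses per second
```

Under those assumptions the password carries roughly 119 bits, and exhausting the space takes far longer than 10^15 years, which is the point being made: the speed of MD5 hurts weak passwords, not long random ones.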

Just follow NIST guidelines and never change it. That way when the servers in Utah crack your password, they don't have to recrack it later.

I've run into off-by-one issues in password length requirements in the past, so if 32 characters is the stated maximum it might only be capable of 31 on the validation side.

That asymmetry in length support strongly suggests they're storing passwords in plain text.

I tried to change it to a 64 character random 1Password string with numbers, characters and symbols. It complained it was too easy to guess. I submitted the exact same password and it accepted it.

> August 2013

> hashed passwords (using MD5)

I don't even know what to say.

> investigating the creation of forged cookies that could allow an intruder to access users' accounts without a password. Based on the ongoing investigation, we believe an unauthorized third party accessed our proprietary code to learn how to forge cookies

How is this possible? Aren't most auth cookies just a session ID that can be used to look up a server-side session? Did they not use random, unpredictable, non-sequential session IDs?

1) As Yahoo "upgraded" all password storage in UDB (where all login / registration details are stored) to be bcrypt before 2013, I'm curious how this was possible.

2) Yahoo doesn't use a centralized session storage. If you know a few values (not disclosing the exact ones) from the UDB, it's theoretically (guess not so theoretical now) possible to create forged cookies if you steal the signing keys. To my knowledge, the keys were supposed to only be on edit/login boxes (but it's been a while so I may be forgetting something), so this is a pretty big breach.
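For readers unfamiliar with stateless sessions, here is a generic sketch of a signed cookie using HMAC-SHA256 from the Python standard library. It is not Yahoo's actual cookie format, which the comment deliberately does not disclose; the payload layout and key are invented for illustration.

```python
import hashlib
import hmac

def sign_cookie(payload: str, key: bytes) -> str:
    """Append an HMAC-SHA256 tag so the server can verify the cookie statelessly."""
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"

def verify_cookie(cookie: str, key: bytes) -> bool:
    """Accept the cookie only if its tag matches the payload under this key."""
    payload, _, tag = cookie.rpartition(".")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("ascii"))

signing_key = b"server-side secret"                # hypothetical key
genuine = sign_cookie("user=alice", signing_key)
forged = sign_cookie("user=victim", signing_key)   # what a key thief can mint
guess = sign_cookie("user=victim", b"wrong key")   # what an outsider can mint
```

With no server-side session table to consult, anyone holding the signing key can mint a cookie that verifies, which is why theft of the key (plus whatever per-user values go into the payload) is enough to forge sessions.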

On a number of engagements I've come across password databases that have been migrated to bcrypt. In one case I checked CVS to see who made the code change, and found the MD5 passwords on his dev box. In another I tracked down a MySQL slave that had broken replication for over a year.

In both cases I tried to track down backups, but discovered neither company was keeping them. That is another possible vector.

1) I'd be flabbergasted beyond belief if there was ever a Yahoo! engineer who had user passwords on their laptop / Dev box. The technical hurdle for that would be a stretch, let alone the fact of the other ramifications of doing this.

2) there's no SQL database involved with Yahoo!'s storage of passwords. It's a custom built db system with proprietary access and replication protocols.

I wasn't saying either possibility was the cause of the Yahoo breach. Simply pointing out that there is always another way.

The NSA's MUSCULAR program for example decoded proprietary secret squirrel cross datacenter replication protocols designed by both Google and Yahoo, so that isn't much of a safeguard against state level actors.

Yet, somehow they did get out.

Apologies, I've heard the details at this point and I can't disclose them. The limit of what I can do is poke holes in the theories that are wrong.

Aren't the details "three years after we were hacked, law enforcement told us that we had been hacked, and we believe them?"

The press release explicitly says "We have not been able to identify the intrusion associated with this theft." I especially noticed that the "What are we doing to protect our users?" section doesn't mention anything about Yahoo fixing any security issues.

Presumably, then, as a Yahoo engineer, you know what your security practices are but you don't know what you did wrong or whether you've fixed it.

Do you honestly believe a press release covers every detail, especially ones with strong legal implications, and might not have rather been worded very carefully?

The contrast between your statements and the press statement is great enough to imply Yahoo is being dishonest.

"Dishonest", not in the slightest. From what I'm told, they really don't know how they got in. But that's only the part of the story discussed in the press release, what's not discussed is how the data existed in that format.

From my experience if Paranoids did know they would have locked it down at the expense of engineers or others. I know since I have made breaking changes to infrastructure which did lock out some engineers and cause plenty of headaches.

Every Yahoo I have ever known has cursed the Paranoids for getting in the way. Every Yahoo that has actually been in a situation has also blessed the Paranoids for the same reasons.

Simple fact is that Yahoo has a mega butt ton of code from several decades. There are going to be holes and when they are found they are fixed pretty damn quick. Last one I dealt with was solved in hours with all hands on deck. Sometimes it just sucks to be as old as Yahoo is.

If they do not know how the adversaries got in, how do they know the adversaries are not still in to some degree?

Good point. I don't know if they do know that for sure.

> the "What are we doing to protect our users?" section doesn't mention anything about Yahoo fixing any security issues.

"We continuously enhance our safeguards and systems that detect and prevent unauthorized access to user accounts."

At the end of the same paragraph. They're already continuously updating security, before they even knew they were hacked. Three years have passed, so for all they know something in those continuous updates covered this hack.

I am taking a WAG here but if they got code then they might be able to take educated guesses at the UDB values without actual access to UDB. Those guesses are more likely to be true with bot registered accounts where there is duplication of information.

This goes back to my theory that a good portion were junk accounts.

Not saying this is acceptable, just saying garbage in garbage out.

You can't guess the XX (anonymized for obvious reasons) key without access to the UDB.

I'm guessing by your handle I know who you are :). Ex-Yahoo super chat moderating guy here, which should let you know me.

Wouldn't the upgrade require the accounts to actually log in to migrate the password? Last I was at Yahoo there were at least 3B junk accounts in UDB. Without knowing details I am guessing that many of the "compromised" accounts fall into that bucket.

I get that membership can't just trash junk accounts but marketing was very aware of them. Paranoids also can't just say a compromised junk account is not a compromise, they are too paranoid for that.

This unfortunately sounds bad PR wise, with little knowledge of actual impact. On the flip side I'm pretty sure I am not on the radar of the state actor since they would more than likely be looking at their own.

Just to confirm, purple Yahoo! car in YEF spot ;)

As to your question, no, they didn't need to login due to how the hash "upgrade" was done (unlike how Tumblr did it around the same time). I was one of the people in the billion accounts and I definitely have logged in and also changed my password multiple times (also have very high entropy passwords and use TFA).

It wasn't me despite your DR Ycan't photos. :)

Tumblr was indeed what I was thinking about.

What's funny is that there's someone currently working at Yahoo with a name scarily similar to yours and I was pretty sure for a moment that you were some random ycombinator person faking being him.

Although...he IS cool.

bcrypt(md5(password)) allows the existing password hash to be reused.

No. They've stolen the hash, so if they crack it, you've just let them waltz back in.

The correct response is to force a password reset, and _delete_ weak hashes so that they cannot be stolen in a subsequent breach. At worst, store a bcrypted md5 password as you suggest, but only as a check for a password the user must not be allowed to use again; it _cannot_ be used to sign them in.

One of the attacks you're preventing is on _other_ sites, where the user has reused the passwords. Keeping around weak hashes even to let that user perform a reset is risking that hash being taken, cracked and used in a breach elsewhere.

When they did the bcrypt(md5(password)) there were no leaks of Yahoo!'s md5'd passwords. That's obviously changed now, and that's why the billion passwords were invalidated (I'm one of those folks btw, but I also had TFA on my account and my password had sufficient entropy that you won't brute force the md5).

Keeping around weak hashes even to let that user perform a reset is risking that hash being taken, cracked and used in a breach elsewhere.

We're currently working on PCI compliance. In pen testing, we got dinged for not preventing re-use of prior passwords, and that bothers me for exactly this reason (plus the new NIST standards say NOT to force periodic changing).

I believe that our hashes are strong (using scrypt, salt, etc.). But the belief that you're getting it right shouldn't let you be lax in other areas, hence security in depth.

So I really object to the requirement that we keep around those old hashes.

Good point. Thanks for pointing out my mistake.

Is the info about the Y and T cookies in this pdf [1][2] accurate?

[1] (EDIT: now with screenshots) http://imgur.com/a/g61VZ

[2] (Not affiliated with link, but the risk-averse may wish to open in a sandbox) ftp://hackbbs.org/milworm/270

Doing a google search for the link showed me the title of the document which I remember reading in the past. The overall coverage of Y&T cookies is more or less accurate at the time of writing back in like 2010/2011, but there's a bunch of mostly minor technical inaccuracies too. I don't want to comment on much without rereading it, but I remember the description of Sled ID made me laugh (which btw I'd guess less than 1% of current Yahoo employees knows what that is).

Also, the video that goes with the PDF is too funny! Just watched it on YouTube [0] again. Notice how he doesn't actually sign into Web Messenger, just goes to the login page? If he had, it would've failed. Same thing with him closing the browser before Yahoo Mail loaded. "Sensitive" reads and everything that did a write operation always (unless there was a bug) validated the cookie against the UDB. So even if you stole the signing key, without the values from the UDB, you would have very limited ability to do anything other than the trivial things shown in the video.

[0] https://m.youtube.com/watch?v=n2CNp_zmje8

It seems that Yahoo has a problem with moribund accounts- many people had a Yahoo ID 10-20 years ago, and then abandoned it.

If these accounts are not deleted (and there are a bunch of organisational reasons not to), then the MD5 hash has to be kept around somewhere, until the user re-enters a password and a better hash is generated.

> Yahoo doesn't use a centralized session storage. If you know a few values (not disclosing the exact ones) from the UDB, it's theoretically (guess not so theoretical now) possible to create forged cookies if you steal the signing keys. To my knowledge, the keys were supposed to only be on edit/login boxes (but it's been a while so I may be forgetting something), so this is a pretty big breach.

Isn't that highly confidential company information?

> 1) As Yahoo "upgraded" all password storage in UDB (where all login / registration details are stored) to be bcrypt before 2013, I'm curious how this was possible.

You check the plaintext password sent to the backend against the md5, on success you rehash it as bcrypt, insert it in the table.
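The rehash-on-login approach described here can be sketched as follows. The comment mentions bcrypt; this example swaps in `hashlib.scrypt` so that it runs on the Python standard library alone, and the user-record layout is an assumption.

```python
import hashlib
import hmac
import os

def scrypt_key(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

def login(user: dict, password: str) -> bool:
    """Verify a login; replace a legacy MD5 hash on the first success."""
    if user["scheme"] == "md5":
        attempt = hashlib.md5(password.encode("utf-8")).hexdigest()
        if not hmac.compare_digest(attempt, user["hash"]):
            return False
        # The plaintext is available only now, so rehash and drop the MD5 value.
        salt = os.urandom(16)
        user.update(scheme="scrypt", salt=salt, hash=scrypt_key(password, salt))
        return True
    return hmac.compare_digest(scrypt_key(password, user["salt"]), user["hash"])

user = {"scheme": "md5", "hash": hashlib.md5(b"letmein").hexdigest()}
```

The obvious limitation, raised elsewhere in the thread, is that accounts which never log in again keep their MD5 hash indefinitely.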

Web tokens, for example, don't necessarily include just a session ID. Some include the full session details within its payload. This can be quite useful, actually, because it offloads session-lookup onto the client.

How do you invalidate a JWT server-side without the user interacting with the server ?

My preferred method:

Add an "expires" field to the token, this should contain a date after which the token is no longer valid. Now all tokens auto-invalidate after a certain period.

Allow some or all tokens to "refresh" by calling a particular endpoint (call with valid token and get a token with expiry from now).

Optionally add some form of identifier to the token (user_id works great) so that you can push a message out to your servers that looks like this: "All tokens for x expiring before y are invalid". Once time y has passed your server can forget about the message. This will be a very small set (often 0) as very few people use the "log out my devices" features.

Logouts should be done client side by deleting the token.

If you are worried about your token being sniffed you are either not using HTTPS, or sticking it somewhere stupid.
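The scheme in this comment can be sketched with a hand-rolled HS256 token built from the Python standard library. This is a simplified illustration and not a complete JWT implementation: the secret, the in-memory `revoked_before` table, and the function names are all assumptions, and a real deployment would push the revocation message to every server.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"   # hypothetical signing key
revoked_before = {}       # user_id -> cutoff; tokens expiring before it are dead

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _sign(signing_input: str) -> str:
    mac = hmac.new(SECRET, signing_input.encode("utf-8"), hashlib.sha256)
    return _b64(mac.digest())

def issue(user_id: str, ttl: float, now: float) -> str:
    """Create a signed token that expires ttl seconds after now."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = _b64(json.dumps({"sub": user_id, "exp": now + ttl}).encode("utf-8"))
    return f"{header}.{claims}.{_sign(header + '.' + claims)}"

def validate(token: str, now: float):
    """Return the claims if the token is authentic, unexpired and not revoked."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, claims, sig = parts
    expected = _sign(header + "." + claims)
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        return None
    data = json.loads(base64.urlsafe_b64decode(claims + "=" * (-len(claims) % 4)))
    if now >= data["exp"]:
        return None  # past its expiry date
    if data["exp"] < revoked_before.get(data["sub"], 0):
        return None  # covered by a "log out my devices" message
    return data

def refresh(token: str, ttl: float, now: float):
    """Swap a still-valid token for one whose expiry counts from now."""
    data = validate(token, now)
    return issue(data["sub"], ttl, now) if data else None

def log_out_everywhere(user_id: str, cutoff: float) -> None:
    """Record 'all tokens for user_id expiring before cutoff are invalid'."""
    revoked_before[user_id] = cutoff
```

Setting the cutoff to the current time plus the maximum lifetime invalidates every token issued so far, and the entry can be forgotten once that time has passed, as the comment notes.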

> Add an "expires" field to the token, this should contain a date after which the token is no longer valid. Now all tokens auto-invalidate after a certain period.

Doesn't JWT already have this - "exp" is a reserved claim for expiration time?


4.1.4. "exp" (Expiration Time) Claim

The "exp" (expiration time) claim identifies the expiration time on or after which the JWT MUST NOT be accepted for processing. The processing of the "exp" claim requires that the current date/time MUST be before the expiration date/time listed in the "exp" claim.

Yes but that is more for standard idle time expiration.. The problem being addressed above is for actively invalidating an existing JWT for a user once they already have it (and before the default/original expiry is met).

> Now all tokens auto-invalidate after a certain period.

You need to make sure that there is some process that will refuse to keep on re-upping the cookie lifetime. Otherwise an attacker could indefinitely keep the stolen cookie alive.

If you see a suspicious usage pattern then force a login by invalidating the tokens. Allowing indefinite refreshing is a feature and a drawback of this method.

You can combine a session cookie with a JWT token that gets sent over a header

Which gives you the worst of both worlds

Tokens have in-built expiry dates (cryptographically signed by the server upon issuance). Once that date has passed the token becomes useless.

If you meant "how can you prematurely invalidate a specific user's JWT without needing a server side lookup", you can't.

I think the best you can do is issue different classes of JWT to a user based on what actions you wish to grant them. This lets you reduce load going to backend lookups to only a subset of JWTs where the ability to invalidate them earlier than planned on a per user basis is necessary/desired.

For JWTs that aren't tied to backend lookups the only solution if one or more users are accessing resources they no longer should be via one of these tokens is to invalidate all of them.

The client can hold onto the token indefinitely, the server doesn't care. But next time a request comes in with that token it will be expired. The server validates the timestamp which is part of the encrypted payload that only the server can decrypt; instant validation and no DB lookup.

This is possible if you support the 'jti' claim[1]. There's a discussion of an implementation of it here[2].

[1]: http://self-issued.info/docs/draft-ietf-oauth-json-web-token... [2]: https://auth0.com/blog/blacklist-json-web-token-api-keys/

Each JWT has an issued at date, so you just need to reject all tokens issued before that time. In addition to invalidating all tokens if there is a breach, each user account can have its own datefield to invalidate all the tokens for that account if a user changes their password or whatever.

I'm not too familiar with JWT, but I have some hands-on experience with Macaroons; the simplest way would be to have a custom caveat of validity set in the token, let's say, a validity GUID, which is an id of server-side record of validity (true/false), e.g. in some database table. Once you set that record of validity to false, the token bearing that GUID automatically becomes invalid.

Otherwise, without server-side changes (such as change of secret key used for signature generation), it is impossible.

With JSON web tokens (JWT), the client or server must know the secret key used to sign the token in order to validate it, but anyone can view its payload.
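The "anyone can view its payload" part is easy to demonstrate: the middle segment of a signed JWT is just base64url-encoded JSON. A short sketch using the Python standard library; the sample token is built inline with a placeholder signature, and `peek_claims` is a hypothetical helper name.

```python
import base64
import json

def peek_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying it. Never trust the result."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def _segment(obj: dict) -> str:
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

# No key is involved anywhere below; the signature is never inspected.
token = ".".join([
    _segment({"alg": "HS256", "typ": "JWT"}),
    _segment({"sub": "alice", "admin": False}),
    "signature-not-checked",
])
claims = peek_claims(token)
```

Reading the claims needs no secret at all; the key only matters for deciding whether those claims can be believed.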

Could do it if you knew the JWT token text in theory?

MD5 is still not too bad, if properly salted. And if you use multiple rounds of hashing, it can be as slow as Bcrypt. As far as I know, MD5 is still not generally broken, we only found some weaknesses.

To prove me wrong you can try and reverse this one (unsalted , just one round):


Even so, the fact that we have the knowledge to generate collisions in MD5 means you really shouldn't be relying on it when there are better alternatives.

Try and generate a collision with the hash I gave. You can't, as far as I'm aware.

We can only generate collisions of carefully crafted sources, not arbitrary ones.

So MD5 is fine, as long as you follow the standard procedure for storing password hashes:

1) Unique salts + long master salt (to prevent rainbow table lookups).

2) Enough rounds of hashing.

3) Don't allow the most common passwords.

4) Don't allow very short passwords.

I'm not saying MD5 is ideal, I use Bcrypt / Scrypt myself. But it's not MD5's fault Yahoo's engineers are lame.
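The four-point checklist above might look like this in code. The comment is about MD5; this sketch swaps in PBKDF2-HMAC-SHA256 from Python's `hashlib` for the "enough rounds" step, and the master salt, round count, minimum length, and common-password list are placeholder values chosen for illustration.

```python
import hashlib
import os

MASTER_SALT = b"site-wide secret kept outside the database"  # hypothetical value
ROUNDS = 200_000                                             # illustrative work factor
MIN_LENGTH = 10
COMMON_PASSWORDS = {"password", "passwordpassword", "letmein", "123456"}

def hash_new_password(password: str) -> tuple:
    """Apply the checklist and return (per-user salt, derived key)."""
    if len(password) < MIN_LENGTH:               # 4) no very short passwords
        raise ValueError("password too short")
    if password.lower() in COMMON_PASSWORDS:     # 3) no common passwords
        raise ValueError("password too common")
    user_salt = os.urandom(16)                   # 1) unique salt + master salt
    key = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        user_salt + MASTER_SALT,
        ROUNDS,                                  # 2) enough rounds of hashing
    )
    return user_salt, key
```

Two users who pick the same password end up with different stored keys because each call draws a fresh salt.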

I'm wondering if this is one of the reasons Alex Stamos left...

DO NOT delete your Yahoo account! In their disclaimer when you delete it, they state:

> "[...] we may allow other users to sign up for and use your current Yahoo! ID and profile names after your account has been deleted"

Bummer if you forget that it was the password reset email for your Facebook account, huh? Instead of deleting your account, purge it of all data: https://honeypot.net/purge-your-yahoo-account/

I just deleted all data from my account and set an automatic responder stating that, due to security concerns, I no longer use that account. I created my Y! account in 1998, it's a shame it has come to this. There were a lot of memories I had to purge along with my account (even though I had a different main account in the last decade). Shame!

This is a terrible policy. Do other email providers have a similar policy?

Probably not that terrible if they only do it for accounts that were created and never used. Like all the good GitHub usernames that seem to be abandoned.

GitHub usernames and emails are very different things. You don't get password reminders sent to your github profile, but you can get those via email.

BTW, no, most email providers never allow the reuse of closed account names.

Microsoft seems to, although I can't find a specific statement from them confirming it: http://windowsitpro.com/blog/recycled-email-addresses-and-ou...

If someone knows how to delete more than 100 emails at a time, let me know. I have more than 10k emails, 80% of which are probably spam!

... And the answer is, scroll to the very bottom, then delete. I was able to delete over 1000 that way.

The other way is to search before:"2016/12/15", and delete all the search results.

They used to automatically put an email address back into circulation if you failed to log in for 6 months.

"Separately, we previously disclosed that our outside forensic experts were investigating the creation of forged cookies that could allow an intruder to access users’ accounts without a password. Based on the ongoing investigation, we believe an unauthorized third party accessed our proprietary code to learn how to forge cookies."

So that exactly explains how my Yahoo account was used to send spam despite having a password that can't be reasonably brute forced (despite them using MD5). :-/

The forged cookie attack was used on a limited number of accounts, by a state sponsored actor. Going to this amount of effort and then sending spam would be on par with breaking into a bank just to steal the printer paper from the office.

Most likely either: 1) you were phished and didn't realize it, or 2) you logged in to your Yahoo account from a device that had malware on it.

> just to steal the printer paper from the office

Or stealing $6,000 with a $100,000 gun :)


I'm willing to accept that perhaps that was not how my account was compromised but the time frame when this happened was well in line for when this breach supposedly occurred.

Regardless, it was some sort of automated spam/phishing emails that were sent from Yahoo's network using my account to contacts on my list. I analyzed the headers of multiple bounced messages that were sent to email addresses no longer in use and confirmed the origin of the traffic.

I'm not going to fall for a phishing attack and I only access email from devices I personally control. Could one of them have had some sort of malware infection? I guess it is possible, but I am security conscious and it is highly unlikely. I also would expect a hacker who has compromised one of my devices to be far more interested in using my banking credentials than using my Yahoo account to send spam.

You reused the password on other websites, I'm guessing. Especially likely if it was a strong (i.e. hard to memorise) password.

The bulk hacking attacks that began around Spring 2010 hit all the big webmail providers. The source of the passwords was always, without fail, reversed hashes from breakins at other big websites:


Source: was a tech lead on the Google anti-hijacking team during this period.

Nope, not password re-use either. I learned that lesson the hard way over a decade ago.

Regardless, it's something that has always continued to eat at me since I can't say for certain how it happened.

Are you sure they actually logged in to your account to send spam (are the spam emails visible in your sent folder), or could it be that someone is just spoofing the SMTP MAIL FROM / email From: header?

As far as I can tell it wasn't someone spoofing my email address. Emails were sent to people on my contact list and the numerous bounce messages to contacts that no longer had valid email addresses confirmed the origin of the traffic.

It's possible that a contact of yours was compromised, and that contact had many contacts in common with you. And then they spoofed your address.

That's a good theory but in my case the sets of common contacts would be almost nil for that account.

I had the same issue; I could see the emails in my sent folder. This happened about a year ago and I was very surprised.

Given Yahoo's security policies, who's to say someone wasn't just sending it from Yahoo's SMTP servers without any access to users' email accounts?

What do you mean by a password that can't be reasonably brute forced?

EDIT: To clarify, I mean specifically with md5. I'm by no means an expert, just curious because I had considered md5 so broken that this comment caught my attention.

Rumours of MD5's death have been greatly exaggerated.

MD5's weakness is that it's (relatively) easy to produce two strings which have the same hash. However, given an MD5 hash, it's not easy to produce a string which also has that hash.

In principle, one could intentionally construct two passwords which have the same hash. It's hard to see how that could be exploited maliciously - any attacker knows both passwords to begin with. Even then, making colliding strings that would make acceptable passwords hasn't been done yet, AFAIK: the shortest colliding strings found so far are 64 bytes long and contain several unprintable characters.

OTOH, computers are fast enough now that brute-forcing MD5 is practical for short strings with a limited set of characters, which is what passwords tend to be. One should use algorithms like PBKDF2, scrypt, and bcrypt which can increase their complexity as the computation capacity of potential attackers increases. This isn't because of a particular weakness in MD5, though, and one should equally avoid storing passwords as SHA-512 hashes, say.

The thing you definitely shouldn't use MD5 for is digitally signing a file you didn't make, because it's possible that whoever did make it also made another file with the same MD5 hash, for which your signature would also be valid.

On a side note: you can use such crafted strings as a black-box testing tool to verify whether a site does in fact use MD5 or another weak algorithm to store passwords. This could perhaps be used in conjunction with other factors to craft an attack.

As a corollary, this can also be used by anyone as a testing tool against any third-party site to detect known vulnerabilities in its password storage.

Definitely check this episode of 'Hacked' out for a simple overview. I just started listening to this show. It's a shame there are so few episodes.


A preimage attack for MD5 has complexity of about 2^123. So, even if you get the MD5 hash for a password, it will be exceedingly hard to find a password that has the same hash (assuming the original password is long and random).

I don't think that's true.

This site from 2006 claims they could find collisions in an average of 45 minutes on a 1.6 GHz Pentium 4: http://www.bishopfox.com/resources/tools/other-free-tools/md...

If you account for speed increases over the last 10 years and assume the password thief has access to a botnet, then it wouldn't surprise me if they've found collisions for the entire list.

Edit: Never mind, the link finds two strings that hash to the same thing; it does not find a string that hashes to an existing hash.

The collision generator behind that link does not implement a preimage attack (given a string X, come up with another string Y with the same MD5 hash).

Instead, it implements the much easier collision attack (come up with two strings that have the same MD5 hash).

I thought the whole point of the MD5 vulnerability was that the limit was 2^128 and as such there are more inputs than possible output hashes, meaning more possible input collisions.

All hash functions have collisions. The point is that a good cryptographic hash function makes it very hard to find collisions.

The “preimage attack” on a cryptographic hash function tries to find a message that has a specific hash value. That is, you lock down a hash value (the MD5 hash for a password) and try to find a message that hashes to that value (the original password, or any other input that happens to have the same hash).

The best known preimage attack against MD5 has complexity 2^123. It's better than brute forcing, but still impractical. Thus, if I come up with a good password that is long and random, you will have a very hard time coming up with a string that has the same MD5 hash value.

The practical attacks against MD5 are collision attacks. A collision attack tries to find two messages with the same hash value. With MD5 in particular, there's a chosen prefix collision attack, where you choose two messages and append to them so that the hashes will match. This was particularly devastating with X.509 signatures and certificates, where the attacker could have the MD5 hash signed by a certificate authority, and then use the same signature with their other message that has the same MD5 hash.
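To make the collision/preimage distinction concrete, here is a toy sketch. It does not attack real MD5; it truncates MD5 to 16 bits (an assumption chosen so the loops finish quickly) and compares a generic birthday search for any colliding pair against a brute-force search for one fixed target. The input naming scheme and the target string are made up for the example.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size; real MD5 has 128 bits


def tiny_md5(data: bytes) -> int:
    """MD5 truncated to its first BITS bits (a deliberately weak toy hash)."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - BITS)


def find_collision():
    """Birthday search: any two distinct inputs with the same toy digest."""
    seen = {}
    for i in count():
        msg = b"c%d" % i
        h = tiny_md5(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_preimage(target: int):
    """Brute force: an input whose toy digest equals one fixed target."""
    for i in count():
        msg = b"p%d" % i
        if tiny_md5(msg) == target:
            return msg, i + 1


a, b, collision_tries = find_collision()
target = tiny_md5(b"somebody else's password")
m, preimage_tries = find_preimage(target)
print("collision after", collision_tries, "tries")
print("preimage after", preimage_tries, "tries")
```

For an n-bit hash with no structural weakness, the collision search is expected to take on the order of 2^(n/2) tries and the preimage search on the order of 2^n. The known MD5 attacks make collisions far cheaper than that generic bound, while the preimage side has only moved from 2^128 to roughly 2^123.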

What about Rainbow Tables? (https://en.wikipedia.org/wiki/Rainbow_table#Precomputed_hash...)

Instead of computing the MD5 of a huge number of passwords looking for a match, you simply store the precomputed password and hash pairs in a database table.

A rainbow table is just a precomputed table of hashes for a lot of passwords. Some tricks are used to make the table smaller, but you can think of it as just a lookup table. Only the passwords that were precomputed and put into the table will be found.

Rainbow tables are usually computed for short passwords (1-10 characters) and limited character set (say, alphanumerics). They are good for finding the bad passwords if you get your hands on a set of MD5 hashed passwords. But they are of no help if you need to reverse a good, long, random password.
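As a rough sketch of the "just a lookup table" view: the snippet below is a plain precomputed dictionary, not a real rainbow table (it has none of the chain tricks that shrink storage), and the candidate list is a tiny made-up sample.

```python
import hashlib

# Hypothetical candidate list; real tables cover billions of short strings.
candidates = ["123456", "password", "qwerty", "letmein", "passwordpassword"]

# Precompute hash -> password once, then every lookup is a dictionary hit.
table = {hashlib.md5(p.encode()).hexdigest(): p for p in candidates}


def lookup(md5_hex: str):
    """Return the precomputed password for this hash, or None if absent."""
    return table.get(md5_hex)


leaked_weak = hashlib.md5(b"password").hexdigest()
leaked_strong = hashlib.md5(b"kX9#vQ2!pLm7@zR4wT8&").hexdigest()
print(lookup(leaked_weak))    # password
print(lookup(leaked_strong))  # None
```

Only inputs that were hashed in advance can ever be found, which is why a long random password falls outside the table no matter how fast MD5 is.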

Every hash has a finite output length, and therefore a finite number of possible outputs. 2^128 is a very large finite number. It's not that large in the grand scheme of things (there are over 2^260 or so atoms in the universe), and it's definitely better to use a hash with 2^256 outputs now that there exist good 256-bit hashes that are faster than MD5, but 2^128 is still quite a large number. The internets are quoting me about 10 billion hashes per second on a good GPU from a few years ago, which comes out to about one sextillion years to find an input for every possible output. (It divides linearly if you have more GPUs, but that clearly won't help very much.)
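The back-of-the-envelope figure above can be checked with a few lines of arithmetic. The 10 billion hashes per second rate is the number quoted in the comment; the GPU counts are arbitrary assumptions.

```python
HASHES_PER_SECOND = 10e9  # the quoted ~10 billion MD5/s for one GPU
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(bits: int, gpus: int = 1) -> float:
    """Years needed to compute 2**bits hashes at the assumed rate."""
    return 2**bits / (HASHES_PER_SECOND * gpus) / SECONDS_PER_YEAR


print(f"{years_to_enumerate(128):.2e}")                  # 1.08e+21
print(f"{years_to_enumerate(128, gpus=1_000_000):.2e}")  # 1.08e+15
```

About 10^21 years is one sextillion, and even a million GPUs only bring that down to around 10^15 years.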

What's broken about MD5 is that, due to an algorithmic flaw, it's very easy to generate two inputs of your choice that have a matching output. That's great if you want to do things like spoof an SSL certificate (you generate two certificate signing requests, get one of them signed, apply the signature to the other), but not directly helpful for attacking a password hash where someone else chose the password.

What is conceptually broken is that such an algorithmic flaw exists, and also due to algorithmic flaws it takes a bit under 2^128 tries to find an input for a specific possible output. That worries mathematicians, because it's a sign the hash isn't behaving as randomly (speaking informally) as one would hope, and that people are starting to understand its structure. If that understanding continues, it might be broken more in the future, so you absolutely shouldn't build new systems on MD5 because we expect the research to happen at some point.

But, at least today, it's still true that you can have a password that can't be brute-forced despite the use of MD5. Maybe someone will present a paper tomorrow that disproves that.

This is a very clear explanation, thanks!

All hashing algorithms that I am aware of have more inputs than outputs. By the pigeonhole principle, there will always be collisions. MD5 is weak, but it still isn't trivial to find an input that hashes to the same thing as a high entropy password.

> that hashes to the same thing as a high entropy password.

To be clear, it's not the entropy of the original password that matters, except for the fact that all common low-entropy passwords already have their MD5s stored in public databases. (What hashes to 5f4dcc3b5aa765d61d8327deb882cf99? You can look it up with Google.)

You can come up with two plaintexts that hash to the same thing in MD5. You can't come up with something that hashes to a new MD5 value given to you, aside from finding it in one of those databases.

If the password is long and complex enough, it won't be in any rainbow table computable in reasonable time. While MD5 can be computed quickly, there is still a limit to how many you can compute -- and there are an infinite number of possible passwords if they aren't length limited.

Interestingly even if the password has infinite length, an MD5 hash has a fixed finite length. You can think of it as a glorified modulus operator, beyond some point the longer passwords will have hashes that match shorter ones.

True -- but assuming these passwords aren't stored the same (very, very wrong) way on another site, and they're no longer useful on Yahoo, what's important is finding the real password, not just a password that happens to match the given hash.

Rainbow tables are attacks against secure algorithms.

MD5 is recognised as an insecure algorithm: given a known hash, there are multiple possible passwords that would resolve to the same hash, therefore appearing to be the correct password.

With MD5, it's not necessary to compute an infinite number of possible passwords, and it is possible that, given a particular hash, a collision can be found within a reasonable time.

Either a) you don't have a clue about the complexity involved in finding a collision for a specific hash or b) your definition of "reasonable time" is longer than the age of the universe and/or using 100 trillion state of the art GPUs is realistic.

I'm leaning towards option a, you read a blog post once and think you're an expert on cryptography now.

  > the complexity involved in finding a collision for a specific hash
If it can be shown that a preimage collision can be computed in less time than an exhaustive search, the algorithm is generally regarded as having a weakness, even if the given "less time" is still a very very long time.

The theoretical complexity of MD5 is 2^128, but a preimage attack was discovered in 2009 which showed that a preimage can be found in 2^123.4. [1]

Collision attacks against MD5 have become more practical; there are even frameworks for it [2]. The complexity of 2^123.4 still makes a preimage attack against MD5 computationally infeasible, but given that it's been shown to be weaker than its theoretical 2^128, it's possible that MD5 has other weaknesses which would allow the complexity to be reduced to a level that is computationally feasible.

[1] https://www.iacr.org/archive/eurocrypt2009/54790136/54790136...

[2] https://marc-stevens.nl/p/hashclash/

To be fair, pretty much every MD5 discussion I've ever seen or been involved in (including with "security expert" former coworkers) has had someone making the same claim.

What you're describing is the same for every hashing algorithm in existence. All hashes can represent multiple (indeed, infinite) passwords. So they all have collisions. This is because all hashes are fixed-length, and so finite, while the possible inputs are infinite.

This isn't the reason that MD5 is weaker than other algorithms.

You are describing a first preimage attack. There have not been any computable first (or second) preimage attacks on md5.


There are collision attacks, but that is not relevant for password cracking.

From 2009: a preimage attack reduced the complexity from 2^128 to 2^123.4 [1].

It's still a big number, but it's less than the theoretical complexity.

[1] https://www.iacr.org/archive/eurocrypt2009/54790136/54790136...

What I meant by "computable" is something that can be computed with today's hardware.

Pretty much even if you choose a high entropy password like say:

the MD5 algorithm can be broken using various techniques like collisions, unsalted I believe means that their database would accept the hashes the third party has. End result is they should have migrated away from MD5 after it was declared unsafe.

No it can't.

Two principles here:

1. If your password is very very good (a Diceware password would suffice), then any method of storing passwords that is better than storing them in plaintext will stop someone from brute forcing it.

2. If your password is very bad, then even an excellent password hashing algorithm will not save you.

"Just use bcrypt" is meant to save people who are in the middle.

No, a collision attack would not give you the plaintext from a hash. A first preimage attack would do that, but no computable first (or second) preimage attacks against md5 have been found.


Nope, that doesn't explain it. Without Yahoo! UDB access to get a couple values unique to your login, you can't forge a cookie that allows you access to Yahoo! Mail.

Related: former Yahoo security engineer talks about a backdoor Yahoo installed for the NSA to read private emails...behind their security teams' backs...


In case you are looking for the important information, it seems to be MD5 hash without salt.

Bloody hell. Sloppy and incompetent.

I'm genuinely curious how the decision to use MD5 gets made. Who says, "hey, maybe we should use MD5." And then who responds, "that sounds like a great idea Bob." Seriously. I've known for years that MD5 is insufficient for hashing passwords and I'm just some random guy. This kind of thing really baffles me.

Yahoo has been a company for a long time. I imagine your conversation happened round about 1999 when using MD5 wasn't insane. And then they were just slow to upgrade.

It's still bad, I'm just saying the conversation about what hash algo to use didn't happen yesterday.

I'd like to believe that. However, I was recently asked to test a new website for an organization I volunteer for, and discovered their "forgot password" flow emailed me my plaintext password. I wrote an explanation of why this was bad, and how it could be fixed, to a non-technical friend of mine who works there; he passed my email to the (Bay Area based!) consulting shop that did their website. The shop sent this response:

"We do not store passwords as a plain text in database. We have functionality which encrypts and decrypts passwords. We have only ecnrypted passwords in the database.

Almost all other servers use one-way encryption. In this case, passwords cannot be decrypted from hashing."

Again, this is a Bay Area based shop. For code written in 2016.

I was shocked to receive this, but it (among other things) leads me to suspect that there are lot of people out there, in positions of power, who aren't just ignorant, but who actively cling to password-storage anti-patterns.

I'm at a loss for how to fix this.

Just for clarity, the "forgot password" flow emailed you the current password of the account (not a temporary one)?

That's insane...

Yes, the current password.

submit the website to http://plaintextoffenders.com/

Ironically, hosted on a Y! site.

But it's not as if we haven't had a pretty much continuous stream of major data leaks for the past 5 years. Surely Yahoo engineers occasionally open a newspaper...

From everything I've read, the engineers did. The problem was that the security team had to go head-to-head with the budget team. And unfortunately, the budget team won - since the upper levels didn't feel that the IT security salaries were a necessary expenditure. And beyond that, there was concern that making people actually change their passwords regularly and requiring anything like security in said passwords was going to discourage users from using Yahoo and send them over to GMail.

Unfortunately... that argument wasn't wrong.

> The problem was that the security team had to go head-to-head with the budget team. //

Wouldn't engineers at such a big corp whistle-blow such incompetent decision making?

Apparently [1] they had a $1.37B net income in 2013. Given using bcrypt with a Blowfish hash and salting was pretty much a de facto standard by that point (I think that's what Wordpress were doing, hardly revolutionary security work) it seems the relative cost for Yahoo was approximately zero.

All I can imagine is that those in control were asked to leave the system open for government snooping? Why else would engineers working there not [anonymously] bring this to press attention - "hey, Yahoo security amounts to a piece of sticky tape holding a bank-vault shut".

- - -

[1] http://www.marketwatch.com/investing/stock/yhoo/financials#

It's not that hard to implement something at the start. It's more work to retrofit it on top of an existing system in a way that doesn't reduce the total security.

But would it require users to change their password?

The way I would have implemented it, but would be keen to know how secure it is, is that you start with the md5 of the password (md5(password)). You then bcrypt or scrypt that md5 (bcrypt(md5(password))) and replace the md5 in your database with the bcrypt hash.

When a user logs in, all you need to do is to calculate the md5 first then check that md5 against the bcrypt hash you have stored.

I am not a crypto expert but intuitively it doesn't look like I would have weakened the security that way. You can't really attack bcrypt(md5(password)) much more than bcrypt(password). Can you?

The method I've used is to add a column for the new stronghash, then update the old column to stronghash(<oldhash>), where <oldhash> is dumbhash(password). On login you check against stronghash(dumbhash(password)). While you have the plaintext password in memory, you also generate plain stronghash(<password>), update the row to add that new hash (simple and interoperable, not dependent on dumbhash), and drop the stronghash(<oldhash>). After a <longtime> limit (chosen both to bound the maintenance overhead of the additional column and behavior, and to limit exposure to the minority of users who haven't logged in for <longtime>), you drop the stronghash(<oldhash>) for everyone and do a "we sent you a reset email" for anyone who tries to log in but has no <stronghash> password hash.
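A minimal sketch of that workflow, under several assumptions: records are plain dictionaries rather than database rows, PBKDF2-HMAC-SHA256 stands in for stronghash (bcrypt or scrypt would be the usual choices), the legacy column holds hex MD5, and the scheme labels and function names are invented for the example.

```python
import hashlib
import hmac
import os

ROUNDS = 100_000  # assumed work factor for the stand-in strong hash


def dumbhash(password: str) -> bytes:
    # Hex digest, as a legacy system would typically have stored it.
    return hashlib.md5(password.encode()).hexdigest().encode()


def stronghash(data: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", data, salt, ROUNDS)


def wrap_legacy(record: dict) -> None:
    """Offline step: replace the stored MD5 with stronghash(md5)."""
    salt = os.urandom(16)
    record.update(scheme="strong(md5)", salt=salt,
                  hash=stronghash(record["hash"], salt))


def login(record: dict, password: str) -> bool:
    if record["scheme"] == "strong(md5)":
        data = dumbhash(password)
    elif record["scheme"] == "strong":
        data = password.encode()
    else:
        return False  # unknown or dropped scheme: force a reset instead
    ok = hmac.compare_digest(stronghash(data, record["salt"]), record["hash"])
    if ok and record["scheme"] == "strong(md5)":
        # Plaintext is in memory now, so upgrade to stronghash(password).
        salt = os.urandom(16)
        record.update(scheme="strong", salt=salt,
                      hash=stronghash(password.encode(), salt))
    return ok


user = {"scheme": "md5", "hash": dumbhash("hunter2hunter2")}
wrap_legacy(user)                      # no plaintext needed for this step
print(login(user, "hunter2hunter2"))   # True, and the record is upgraded
print(user["scheme"])                  # strong
```

The explicit scheme tag is what tells the login path which check to run, and because the wrapped check always applies dumbhash to whatever was typed, submitting the old MD5 hex string as the password does not authenticate.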

This is fine workflow, but keep in mind

> and do a "we sent you a reset email" for anyone that's trying to log in but has no <stronghash> password hash.

Yahoo is an email provider so many of these users won't have an external provider to refer to.

This workflow is much better than the other proposals I've read up-thread.

It's one way to do it, which is okay sometimes.

The other way is to add a new empty column for bcrypt. The next time the user logs in, you save the bcrypt hash and you remove the MD5 hash.

Over time, the active users will be migrated to the new scheme. The only issue is the abandoned accounts, they'll keep the old weak scheme.

There are other migration techniques. If you know md5(password), you can create bcrypt(md5(password)).

That's what I do, though care should be taken that you can't then login against the old passwords by putting md5(password) in the password field.

Usually you do this by decorating the bcrypt(md5(p)) entries in some way so you can recognize which ones are tested with bcrypt() vs bcrypt(md5()).

I am not sure I agree. Your way will leave all the non active users exposed in the case of a leak. They may not be active on your website but are likely active on another website using the same password.

As I said, that's an option among others, it has drawbacks.

For a website like Yahoo with billions of abandoned accounts, that's a serious drawback ^^

The problem is in collisions. Md5(password) can yield the same result for many different values of password so simply bcrypting that result means that you start with a restricted possibility space. So less secure. Punts the question to how much less secure. Seems to me it would still be worth it to do and then all new passwords going forward are done correctly.

Agree, but a collision even for md5 is a relatively rare event. When brute-forcing the bcrypt hash, this would reduce the attempts you would need to try against a given hash, but only by a very small factor. With a reasonable work factor, I would assume it would still make a brute force attack impractical at scale.

I didn't do the test, but I'd expect that there wouldn't be more than a handful of collisions for the md5 of the 100m most common passwords.

[edit] Actually, I just did the test on this 10m password list and found no collisions.


I've done it before on a 1 billion word / password list and didn't get any collisions.

That being said, md5 does generate collisions. I was playing with the IMDB movie database that you can download. They use a combination of the title and the year as a primary key. I tried using an md5 instead to save space (but giving a reproducible ID instead of an identity column), and got many collisions. No collision with SHA256.

Wait, what? No MD5 collisions at all were publicly known until Xiaoyun Wang disclosed one in 2004 using a new cryptographic technique she invented (explained in Wang and Yu's "How to Break MD5 and Other Hash Functions").

MD5 has a 128-bit output so collisions that occur by chance should require about 2⁶⁴ inputs (18 exa-inputs). Surely your database didn't contain over 2⁶⁴ different movie records.

Could you take a look at what you were doing again? Your description doesn't really make sense mathematically.

You must be right. I can't reproduce it. I must have fucked something up then.

You likely goofed something up. No one has demonstrated two strings that are conceivably used as passwords that users type in -- and that includes the tuple {movie title:year} -- that have MD5 collisions.

The security problem with MD5 isn't collisions.

I think you are right, I can't reproduce it.

What you're describing is not possible given the database you tested. Are there more details that would clarify your post?

Oh, of course md5 has collisions. It's relatively easy (not computationally easy, but there are known methods) to find two random strings that hash to the same value, it's just very difficult to find a string that hashes to the value of a specific other string.

Not "relatively easy" by chance: it should require 2⁶⁴ entries in your database to see a single collision happen at random! It's only "relatively easy" following cryptographic research in the early 2000s that exploits structure in MD5 to produce collisions deliberately.

Yes, collisions are easier than preimages, but they still shouldn't occur by chance in real applications!

Realized my wording was way too ambiguous, clarified. Thanks!

Very nice. Thanks for that. So yes, this is likely the thing to do in this situation.

Unfortunately, this isn't an accurate description of the nature of the collision problem with MD5, which involves carefully crafted inputs using a sophisticated cryptographic attack -- not arbitrary user inputs that don't intend to collide with each other. See my and danielweber's comments about this down-thread.

(Yes, susceptibility to collisions was recognized as a problem with MD5 leading to a reason not to use it, but the collisions in question were constructed, not encountered accidentally. There isn't any evidence to date that the probability of a collision given two randomly chosen inputs is higher than the expected 1/2¹²⁸. You could test this yourself by hashing 2⁴⁰ random strings under MD5: you won't see a collision among the outputs!)
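A scaled-down version of that experiment is easy to run. The sketch below hashes 200,000 distinct strings (an arbitrary size, and the "movie-N" naming is invented) and compares the observed collision count with the birthday estimate for the expected number of colliding pairs, roughly N^2 / 2^(bits + 1). For contrast it also counts collisions when only 4 bytes of each digest are kept.

```python
import hashlib

N = 200_000  # far below 2**64, the scale at which chance collisions appear

digests = {hashlib.md5(b"movie-%d" % i).digest() for i in range(N)}
collisions = N - len(digests)

# Contrast: the same digests truncated to 32 bits.
truncated = {d[:4] for d in digests}
truncated_collisions = len(digests) - len(truncated)

# Birthday estimate of the expected number of colliding pairs.
expected_full = N**2 / 2**129
expected_32bit = N**2 / 2**33

print("full MD5:", collisions, "observed,", expected_full, "expected")
print("32 bits: ", truncated_collisions, "observed,", expected_32bit, "expected")
```

With the full 128-bit digest the expected count is around 10^-28, so any collision seen in a test of this size points to a bug or to non-crafted inputs being mangled somewhere, not to MD5 itself.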

>Md5(password) can yield the same result for many different values of password //

Not "many different" using the normal constraints of text/numbers/typographical-marks and with maximum password lengths of 32 or so (I'll bet Yahoo's was shorter than that in 2013).

Are there any MD5 collisions in [:graph:]{,32} ?

I really doubt it. When people demonstrate MD5 collisions, they use hex strings like

0e306561559aa787d00bc6f70bbdfe3404cf03659e70 4f8534c00ffb659c4c8740cc942feb2da115a3f4155c bb8607497386656d7d1f34a42059d78f5a8dd1ef

Yes, because MD5 digests are much shorter than 32 characters, even if it's just ASCII, so by the pigeonhole principle there must be. If you're asking if there are _known_ collisions between two messages with less than 32 printable ASCII characters -- the answer is likely yes, but they are not known to me and likely not publicly known at all yet.

I thought md5 were 32 characters. But you're right every md5 hash would be in that space, so there must be collisions.
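The sizes involved can be checked directly. In this small sketch the only assumption is reading [:graph:] as the 94 visible ASCII characters (0x21 through 0x7E).

```python
import hashlib
import math

d = hashlib.md5(b"anything")
print(d.digest_size)       # 16 bytes = 128 bits
print(len(d.hexdigest()))  # 32 hex characters

PRINTABLE = 94  # ASCII characters matched by [:graph:] (0x21 through 0x7E)
# Number of strings of length 0..32 over that alphabet.
inputs = sum(PRINTABLE**k for k in range(33))
print(math.log2(inputs) > 128)  # True
```

So the 32-character figure is the hex encoding of a 16-byte digest, and since there are far more than 2^128 candidate strings in that space, collisions must exist there even though none are publicly known.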

bcrypt(md5(password)) is what Yahoo! did when they switched.

Especially about it being a bad idea to make people regularly change their passwords!

And nobody ever seemed to say "hey, maybe we should be using something more secure". Yahoo's been around for how many decades, and the fact they were still using MD5 in 2013 is just shameful. Yeah if it was some legacy code from 1993 you can probably excuse it, but I just can't believe after 20 years nobody thought it was a problem.

I'm not really a software developer but I really can't imagine it being a huge change. Instead of md5(pass) you could probably just change that to secure_hash(md5(pass), salt), add another column in the database for the salt, and rehash all the passwords. Customers wouldn't notice. Rehashing the databases would take a while, but otherwise that's really not a huge amount of work.

Well, you can only rehash if you have the plaintext password. So you have to wait until they login again, or force a password reset for everyone. In the former case you're stuck with a bunch of md5 passwords hanging around for any account that's not very active, and for the latter you'll lose some percentage of active accounts whose reset process is for some reason no longer functional. You could mix-and-match the two methods (start with the former, force the latter on any stragglers after, say, a few weeks) to minimize the damage, but that's more work and a number that someone somewhere in the organization finds very important is still probably gonna go down.

(I've never had to do this myself, so these are just the most obvious options I came up with. Possibly there are others.)

  You can only rehash if you have the plaintext password
There are techniques to rehash, even without the plain-text password, and without the user having to login to trigger a rehash.

Drupal 7 used such a technique for upgrades from Drupal 6, migrating from MD5 to a salted sha512 hash, but it's not an uncommon technique.

The old passwords are stored as MD5 hashes in the databases. The MD5 hash is processed through the same techniques as new passwords: a salt and the new sha512 hash. Provide a way to identify whether the origin was a password, or an MD5 hash.

Either way, you end up with a hash. You can identify whether the origin was a password, or an MD5 hash, but you can neither determine the origin MD5 hash, nor the origin password, as the new hash is secure. So even if the original MD5 hash was insecure, the new hash is secure.

When someone attempts to login, you still need to determine which password-validation to use: hash = sha512(salt + password), or hash = sha512(salt + MD5(password)), but the security level is the same.

> hash = sha512(salt + MD5(password))

Passing the password through MD5 reduces the complexity to 128 bits, you can't get that back.

So the security level is not the same, though it may be resistant to some attacks on MD5.

And it's probably not important for most people, since there are fewer than 2^56 eight-character ASCII passwords.

  > "Passing the password through MD5 reduces the complexity to 128 bits, you can't get that back."
Assuming that the new hash is secure (and sha512 is generally agreed to be secure), then, given a specific sha512 hash, the original MD5 hash can only be determined via rainbow tables, which is a Big-O operation. Even though entropy is reduced, it's still a significant work to determine the original MD5 hash (significant in this instance being longer than the heat-death of the Sun, given current extrapolations of computing performance).

Attacks against MD5 are based around knowing the original MD5 hash. In this instance, the original MD5 hash is unknown, so there is no mathematical shortcut to finding a collision.

In this case an attacker isn't looking for a collision (which would mean creating two passwords with the same hash, and what hash that is doesn't matter).

The attacker needs a password with a specific hash, and the best reported attack for that is around 2^128.

Agreed, that the best reported rainbow-table attack on MD5 is 2^128 (i.e. the complete range of possible MD5 hashes).

Personally, I'm willing to chance that my password will be discovered via a brute-force attack within the next 0.65 billion billion years [1]

[1] http://bitcoin.stackexchange.com/questions/2847/how-long-wou...

I think it does make sense to be cautious.

A new preimage attack could be discovered - or might already have been, secretly.

> Passing the password through MD5 reduces the complexity to 128 bits

No, this is not the problem with MD5. You are not going to find two user-memorizable-and-typeable passwords with an MD5 collision.

If you are bringing a password with more than 128 bits of complexity to the party, any password storage scheme better than plaintext will have your password safe.

For passwords, there is no known problem with MD5, unless you know about a preimage attack.

Collisions are a problem for digital signatures, not for passwords.

But some people do want and use passwords with more than 128 bits of entropy, for whatever reason, and an MD5 intermediate stage limits that.

I was doing all kinds of mental gymnastics trying to figure out how this would work; thanks for explaining it so clearly.

I have been in this situation, and you're correct.

Somewhere in the organization, a product team is going to throw a fit about usability and churn over the decision to reset user passwords en masse, or to force users to change them when they first log in. This isn't a slight against product managers, but one of the clearest indications of a company's overall security culture "health" is how the security, engineering and product teams choose to compromise and "pick their battles." Risk accepting vulnerabilities has a legitimate place when you have to balance product development and usability, but so does pushing back on egregious issues.

I don't have privileged insight into Yahoo's organization, but in this case it's pretty clear the security team should have either been more diligent in conveying the ramifications or less kneecapped by the surrounding org units, depending on the circumstance. More importantly, Yahoo should have "migrated" their passwords in the manner a parallel comment explains in this thread. This is what Facebook and other companies did after maturing their security programs (see "Facebook Onion" on how Facebook transitioned away from MD5).

Also good to note - there is evidence Yahoo's security culture improved over the years. The decision to go with MD5 almost certainly happened in the 90s, and when Tumblr suffered a breach all users were forced to reset their passwords. The capability and awareness was clearly there.

x0's algorithm was secure_hash(md5(pass), salt), you already have md5(pass) so this can be done in one bulk update.

Does an insecure algorithm mean that you effectively have the plain text passwords?

Not necessarily, because of collisions.

The password "foo" may hash to "12345". If an attacker were to discover that the hash is "12345", they would look for a password that hashes to "12345", which could, hypothetically, be the password "bar". They don't know the original password "foo", they've simply discovered an alternative, which happens to match the algorithm enough to unlock access.

In general, rainbow tables are used for identifying and attacking common passwords, but that doesn't mean that the algorithm is insecure.

Insecure algorithms can be attacked through collisions, which don't necessarily give you the original password, they just provide an alternative password which is accepted by the algorithm. The distinction matters when it comes to password reuse, because if Site A uses MD5, but Site B uses sha512, finding a collision that grants access on Site A doesn't necessarily give you a password that will grant access on Site B.

I've worked with the kind of monolithic legacy codebase they likely have. It has gone through hundreds of developers who don't work for the company anymore and who left behind a bunch of spaghetti code, which means it's a huge effort to make sure that none of their other services break when they implement such changes. Also, management HATES when dev teams do this because it isn't "new stuff" that's immediately visible to their bosses or the end user.

If anything goes wrong with the password update, users get angry, lose faith in the services, stress, a few people get fired maybe, etc etc. On the other hand, if it stays old and crappy, everything stays just peachy, and nobody is the wiser that the entire system is a house of cards. Until the day someone hacks the database of course... which happened, so it's "now" a problem.

They're not going to begin to take security seriously even after this incident. They'll do what they need to right now but there's no auditing and their users don't normally care about this sort of thing, therefore the management won't care either.

There are likely to be a lot of identity systems using the password in the database, all of which have been coded to look for an MD5 hash, not a salted hash. This means code in a number of applications has to be updated at the same time.

The typical way around this is to create your new destination column (e.g. sha256 with salt), and progressively have applications reference this column rather than the MD5 unsalted column.

It's a huge amount of work, and if the applications were made in 1990's, the code is likely legacy. If Yahoo are doing regular code security reviews, this will likely have been put in the pile of "we need to fix, but it's too costly to do".

> It's a huge amount of work, and if the applications were made in 1990's, the code is likely legacy.

Which begs the question, can legacy code survive in an international network?

That's the right question to ask. The answer is no, because new security vulnerabilities are disclosed every hour.

A large organisation will implement layered security (otherwise known as layers of the onion) to prevent this type of attack. This means: more secure passwords to access the password database, fewer people with access, rotation of access passwords, auditing of backup storage and encryption, etc etc. Clearly Yahoo's layers of security were all broken to allow this type of theft.

>It's a huge amount of work //

Really? Moving from doing md5(password) to bcrypt(password,salt)? I see organisations make things hard and legacy code-base, yadda, yadda but surely if Yahoo couldn't do this then they couldn't manage scratching their own butt; it really seems like quite a small change in the scheme of things. Like one senior engineer, one afternoon of work (then testing, etc., OK, sure) ... ?

"It Takes 6 Days to Change 1 Line of Code" https://news.ycombinator.com/item?id=13119138

I'm going to go out on a limb and guess you've never worked as a software engineer in a large organisation.

Given MD5 hashes are currently stored, how do you propose users' passwords get converted to SHA256/512? Should Yahoo brute force the passwords, and then store them in the new algorithm? Or should they wait for the user to log on, verify their password, and store it in the new hash algorithm (given some users rarely log on, this could take over 12 months to complete 80% of users).

Yes it could take months or years to complete the process, but they've had at least a decade.

Even if it never completes (abandoned accounts), it would still have saved most active accounts from being breached.

100% agree. Yahoo should have started the process a long time ago.

I was just replying to the comment it could be completed in an afternoon.

You're right on the first count. It wasn't sarcasm, it was a question.

On the storing of hashes though the standard protocol has been to pass the hash in as if it were a password.

Hashing the hash isn't a good idea, you're reducing the domain of your secure_hash function to the range of md5. The way to do it is to have a "password hash algo version" column and when the user puts in their password, you verify against the hash[algo](password) and rehash with the later version, changing the algo column for that user.
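The version-column approach could look roughly like this. It's a sketch under assumptions: the user record is a plain dict, version 1 stands for unsalted MD5, version 2 for salted PBKDF2 from Python's standard library (standing in for whatever slow hash you'd really pick), and the iteration count is arbitrary.

```python
import hashlib
import hmac
import os

CURRENT_ALGO = 2  # hypothetical version numbers: 1 = unsalted MD5, 2 = PBKDF2


def hash_v1(password: str, salt: bytes) -> bytes:
    # Legacy scheme: plain MD5, the salt argument is ignored.
    return hashlib.md5(password.encode()).digest()


def hash_v2(password: str, salt: bytes) -> bytes:
    # Newer scheme: salted, deliberately slow key derivation.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


HASHERS = {1: hash_v1, 2: hash_v2}


def login(user: dict, password: str) -> bool:
    # Verify against whichever algorithm this user's row was stored with.
    candidate = HASHERS[user["algo"]](password, user["salt"])
    if not hmac.compare_digest(candidate, user["hash"]):
        return False
    if user["algo"] != CURRENT_ALGO:
        # We hold the clear-text password right now, so rehash and bump the version.
        user["salt"] = os.urandom(16)
        user["hash"] = HASHERS[CURRENT_ALGO](password, user["salt"])
        user["algo"] = CURRENT_ALGO
    return True
```

The catch is that rows only get upgraded when their owner logs in; accounts that never come back keep their version 1 hash.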

You could do both though. Give much more security in the short term and upgrade anyone else who logged in later.

I did ask about the hash of hash thing some time ago and ptacek claimed that's a reasonable thing to do.

> you're reducing the domain of your secure_hash function to the range of md5.

Oh no, only 128 bits. The NSA will be able to brute force one of those passwords in 80 years.

You need to do both. If you only do the latter, then stale accounts which never log in again will never have their passwords upgraded to the more secure hash. Hashing the hash allows you to replace the md5 hashes immediately, and then you can perform the upgrade if/when the user logs in again.

>I'm not really a software developer but...

If I had a nickel for every time I've heard this statement then I'd have enough to comfortably retire.

Yes, in theory, changing a column in a database (which in this case, happens to be a password) seems simple, but in practice, it's not.

You're assuming engineering is just sitting on their thumbs, reviewing their code once a week, thinking of ways to optimize it.

In reality, they're constantly under pressure to develop new features, fix reported bugs, move on to the next project, keep the site from falling over, etc etc.

And the ones who choose NOT to work hard aren't sitting around reviewing old code either.

For an IdP at the scale of Yahoo, they can adopt something as complicated as supporting versioned passwords and migrating credentials to the latest secure algorithm upon successful login. You have the clear text password at that point. You can store metadata such as the version (or algorithms) used to hash the credential.


It's easy as hell. Even PHP, so often flamed for "bad security" these days, supports EASY functions for this (and polyfills are available, if you're running PHP < 5.5, which you shouldn't do anyway):

- password_hash, which creates a salted hash (the returned value consists of a type/strength spec, the hash, and the salt)

- password_verify, which verifies a password with a hash in a timing-safe manner

- password_needs_rehash, which tells you if you should update the hash in the database

password_hash and password_needs_rehash take a parameter for the hash function (currently only bcrypt is supported, quite likely to keep people from using md5/sha1), and for the cost (the amount of hash function calls).

I believe any reasonable programming language these days has such functions.
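To illustrate the shape of that three-function API outside PHP, here is a rough Python stand-in. The function names mirror the PHP ones only for readability; this is not PHP's implementation, it uses PBKDF2 from the standard library instead of bcrypt, and the `$`-separated string format and iteration count are assumptions made up for the sketch.

```python
import base64
import hashlib
import hmac
import os

DEFAULT_ITERATIONS = 200_000  # assumed cost setting, tune for your hardware


def password_hash(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    # Algorithm, cost, salt and digest travel together in one string.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return "$".join(["pbkdf2_sha256", str(iterations),
                     base64.b64encode(salt).decode(),
                     base64.b64encode(digest).decode()])


def password_verify(password: str, encoded: str) -> bool:
    algo, iterations, salt_b64, digest_b64 = encoded.split("$")
    if algo != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    # Constant-time comparison to avoid leaking how many bytes matched.
    return hmac.compare_digest(actual, expected)


def password_needs_rehash(encoded: str, iterations: int = DEFAULT_ITERATIONS) -> bool:
    # True when the stored hash uses another algorithm or a lower cost.
    algo, stored_iterations, _, _ = encoded.split("$")
    return algo != "pbkdf2_sha256" or int(stored_iterations) < iterations
```

Typical use: call `password_verify` at login, and if it succeeds and `password_needs_rehash` returns true, store a fresh `password_hash` of the password the user just typed.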

What I am NOT so sure about is how the various LDAP server implementations, which many people use for SSO and "normal" account management (because it's easier to connect a new software to LDAP than to migrate existing user db's into LDAP), handle password storage. I mean, having an LDAP server for the credentials prevents any form of password leakage, but in case someone breaches both servers/the LDAP daemon is running on the same host as the webserver?

Nothing is "easy as hell" at scale.

Normally you'd =not= store the salt separately; the usual way is keeping the salt and the password together in the same 'blob'

Rehashing can be safely implemented as long as the auth. process can handle both md5 and some composite hash [i.e. shash(md5(pwd))]

It's really a trivial operation.

I doubt that decision was made in the last decade. It's surely just something that's been around for a long time and was never upgraded.

Still neglectful, but I sincerely doubt it was just a recent engineer's bad decision-making.

It gets/got made ~10-15 years ago. (I don't understand the "no salt" thing, though. That was common practice even ~20 years ago on Linux machines, so I'm mildly surprised that it wasn't implemented in this case.)

> I'm genuinely curious how the decision to use MD5 gets made.

You assume a formal decision was made? I think a manager just went "make them secure" and history was made. That's how it usually seems to happen if it's not a user-facing thing.

I think the organization as a whole is just indifferent. Does this breach really matter to Yahoo's bottom line? They were already sold to Verizon. Most of the active users probably won't read this news. It's sad to say, but I think Yahoo as a whole just doesn't care about their users.


No, sorry. They're borderline criminally negligent. When you have 1bn passwords stored in raw md5, a decade after the first rainbow tables were published, then you don't deserve anyone's business or your freedom.

Sure, it's borderline negligent.

But it's already a godsend compared to what many banks do, storing passwords in plaintext, sending reset passwords via plaintext email, requiring 4-8 character passwords that can only contain digits and a limited set of characters, etc.

I'd be more than happy if any bank would follow Yahoo!'s password standards.

Most banks don't have a billion customers. (There are probably a few that do, but not many.)

It's really not. Unsalted MD5 has been shameful for a long, long time.

As a data point: when I was a teenage code monkey in 2004 writing PHP I already understood that unsalted MD5 is unsafe.

According to Wikipedia:

* 2004 it became possible to find MD5 collisions at a rate of one per hour on a cluster

* 2005 it became possible to do this within "a few hours" on a consumer laptop

* 2006 it became possible to do this within one minute

* nowadays it's possible to do this "within seconds"

Plus, as others have mentioned, it's now possible to find collisions instantly by using widely available rainbow tables, e.g. https://md5db.net/decrypt

MD5 collisions are probably not important for passwords.

To put it in layman terms.

The MD5 collision attack that researchers usually perform: they want to generate 2 files with the same MD5 hash (they can put anything they want in these files).

This kind of attack doesn't affect passwords. The user picked one file (i.e. the password), you don't know it, you can't change it, you can't choose it.

Care to explain? The hashes are what is compared so it seems it's important.

The existence of crafted collisions -- being able to create a pair of M1 and M2 such that MD5(M1) = MD5(M2) -- is primarily relevant to situations where MD5 is being used as a signature algorithm, such as in certificate issuance. In these applications, being able to generate a pair of documents with the same hash is catastrophic.

Being able to generate a pair of passwords that are treated as equal, on the other hand, is useless from a security perspective. It's a neat party trick, but it's not dangerous.

Now, if there were a preimage attack -- being able to take MD5(M1) and come up with a M2 such that MD5(M2) = MD5(M1) -- that'd be a much bigger deal, and it'd break MD5 password hashing wide open. But nobody's done that yet.

I'm a total greenhorn when it comes to cryptography, but the difference between these two situations was totally lost on me until I read this comment. When I see, "It's easy to create MD5 collisions," my first thought is, "If you give me a hash, it's easy to find a string that results in an identical hash." If I'm understanding this right, that would be a "preimage attack," and would be bad for all the reasons being discussed in this thread.

However, it seems like "It's easy to create MD5 collisions," at least as it is true today, actually means something different: That, given a string, it's easy to find a second string that shares the same hash. If that's the case, I have two questions:

* I am totally lost as to how these are different scenarios. There's no difference I can see between "Here's string A" and "here's the hash of string A," if the goal is to find a "string B" that shares the hash. Are these "crafted collisions" generated by modifying string A and string B, until a collision pops out?

* If that's the case... what's everyone freaking out about? Why were people saying MD5 is unsafe 20 years ago, if even now, we can't achieve a preimage attack that can get you into an account based on the valid password's hash? Yahoo could have printed these hashes out and hung them up on posters in the mall and no one would have been able to get into accounts from it. There are dozens of comments lamenting how stupid this was, but... it seems like there's no actual problem?

> However, it seems like "It's easy to create MD5 collisions," at least as it is true today, actually means something different: That, given a string, it's easy to find a second string that shares the same hash.

Very early MD5 collision attacks were even weaker, actually: given nothing, it was possible to find a pair of arbitrary garbage strings which had the same hash as each other. It wasn't until later that it became possible to pick what the strings would "look like".

> Are these "crafted collisions" generated by modifying string A and string B, until a collision pops out?

Generally speaking, yes.

> If that's the case... what's everyone freaking out about?

The issue with using MD5 as a password hash function actually has nothing to do with collisions. That's a red herring. :) The real problem is that using any fast and/or unsalted hash function for passwords is unsafe!

A fast hash function is unsafe because it makes it easy to generate a bunch of potential passwords, calculate their hashes, and look for a match.

An unsalted hash function is unsafe because it makes it possible to build a "rainbow table" of all possible passwords and their hashes, and look up password hashes in that table.

As used in this situation, MD5 is both fast and unsalted.
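A toy illustration of the unsalted case, assuming a four-word wordlist. Strictly speaking this is a plain precomputed lookup table rather than a true rainbow table (which uses hash chains to save space), but the effect on unsalted hashes is the same. The helper names are made up for the example.

```python
import hashlib
import os

wordlist = ["123456", "password", "qwerty", "letmein"]  # tiny illustrative list

# Built once, reusable against every unsalted MD5 dump ever leaked.
lookup = {hashlib.md5(w.encode()).hexdigest(): w for w in wordlist}


def crack_unsalted(md5_hex: str):
    # Cracking is a single dictionary lookup.
    return lookup.get(md5_hex)


def salted_md5(password: str, salt: bytes) -> str:
    return hashlib.md5(salt + password.encode()).hexdigest()


def crack_salted(hash_hex: str, salt: bytes):
    # The precomputed table no longer helps: every guess has to be
    # rehashed for this particular user's salt.
    for word in wordlist:
        if salted_md5(word, salt) == hash_hex:
            return word
    return None
```

A salt only removes the precomputation advantage. With a fast hash like MD5, the per-user guessing loop in `crack_salted` still runs very quickly, which is why both properties matter.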

Most people here don't seem to understand the difference between collision and preimage attacks. So they're overreacting to the fact Yahoo used MD5.

Storing unsalted passwords, however, would be a huge mistake, if Yahoo did so as someone here claimed.

There are precomputed lookup tables for the unsalted hashes of many, many passwords (both MD5 and more secure hashes) and cracking unsalted passwords is simply a database lookup.

Ah ha! There's the weakness I was missing, thank you so much for responding. I hadn't even thought of it that way---I knew salts shook up the resulting hashes, but an actual benefit of it is that it makes it pretty much impossible to do any "homework" (rainbow tables) ahead of time.

Google(MD5(M1)) = MD5(M2) is more than enough for most users.

That website does not find collisions. It uses rainbow tables (or some other type of table) to crack passwords that it already knows.

Collisions are irrelevant for password cracking.

> Sure, SHA1, scrypt or bcrypt with salt were already common back then, but it's an entirely different story than if they had used it today.

Not an excuse, this is Yahoo, not a PHP shop in India doing some low budget contracting. They should have a top of the line security team enforcing the most recent secure practices. Furthermore I got no email from Yahoo telling me that my account may have been hacked. Both incompetent and irresponsible at the same time.

By the way I did some PHP dev back in 2011. bcrypt hashing was already common practice. How can you come up with that argument in good faith ?

> Furthermore I got no email from Yahoo telling me that my account may have been hacked

Then your account was most likely not on the list of accounts compromised.

> By the way I did some PHP dev back in 2011

Well, Yahoo is a tad bit older than that, by about 17 years. This is not an excuse, but really, comparing your 2011 coding to 1994.... Go ahead and boot up your old 486. I'll get back to you when this page loads up in an hour. :)

Yahoo's code base is old and huge, like billions of lines huge. Yahoo's engineers have modernized it at a massively rapid pace. I'm not sure of current state, but when I left Yahoo finance was written in something like 10 languages including serving pages in C, cause that's all they had back then.

Current tech is NodeJSish and others. They have their own hardened versions. But still, migrating millions of lines of C to something other than C isn't a walk in the park.

> How can you come up with that argument in good faith ?

Let's say I've seen far worse in 2016, from companies storing far more sensitive data.

Like a bank, with no 2FA support, emailing me my plaintext password after clicking "Password forgotten", in 2016.

This story is problematic, but I'd be grateful if that bank would implement even the same stuff as Yahoo.

Also malicious (allowing NSA to search through everyone's emails).

Current law seems to dictate that if the NSA wants that, it's what they're getting. Blame the government.

They actually fought in court about it, so I commend them for it

I'm speechless.

More and more are migrating to cloud these days, I expect more and more epidemic leakage will come.

I host everything myself except for email, which is always a headache but contains more private info than all others I manage combined. Maybe it is time to run a small email server again but it is easier said than done, gosh please give me something like a working PGP or whatever for safe emails (PGP is dying from what I read)...

"Based on further analysis of this data by the forensic experts, we believe an unauthorized third party, in August 2013, stole data associated with a broader set of user accounts, including yours. "

"The stolen user account information may have included names, email addresses, telephone numbers, dates of birth, hashed passwords (using MD5) and, in some cases, encrypted or unencrypted security questions and answers. "

I'm a paid premium member for Yahoo's service for many years, I would like to join somebody else to sue the hell out of Yahoo.

Suing companies for this sort of thing isn't as easy as you'd think. One of the issues is damages, as in, you need to prove you incurred some sort of tangible harm or damage. This is usually calculated in financial damage. Currently there is a big split in the legal community about whether having your password or other info stolen, without any thing else happening (such as leveraging that information to get inside bank accounts and stealing money) is enough harm to satisfy the damages requirement because there was no financial damage done. Not saying I agree, but it's an issue.

If you can prove financial or other harm resulted from this, then yes, you might have a case.

Another avenue you could take is breach of contract or some similar claim. As in, you paid them and formed a contract according to their ToS, and their ToS (I assume) states they use at least reasonable security. Yet they didn't, which would be a breach of contract.

The complexity isn't that much of a problem. Windows server + smartermail has a nice UI all the way. The problem is cost.

[edit] by the way I wonder how useful would be a tutorial "for dummies" of how to set up your own mail server from scratch. I assume that users who would be happy to pay for their own server but feel it is too complicated would likely be windows users, i.e. wouldn't mind having to pay for a license and would like to use an environment with a relatively exhaustive UI. I'll give it a try.

SmarterMail has both perpetual and monthly lease licensing options. Leasing of SmarterMail Pro 250 Mailboxes was as low as $15/mo. Overall SmarterMail has an easy installation and all the management is via the web interface.

if you are looking for a perpetual license, grab the 46% discount that's going to end by 31/12/2016 from https://www.tweakservers.com/mail-servers/smartermail/

Well, inbound email is really not a big issue in my experience. The issue (if any) is deliverability of outbound email. But that can be handled in any number of ways. (You can use someone else's SMTP if there are issues, or you can just follow best practices to keep a clean IP address.)

I have been doing both inbound and outbound for roughly 20 years on our own equipment. But even doing just inbound gives you better control and in a way you are able to lessen the attack surface of being a large vulnerable target.

I've heard that setting up an outbound email server on places like Linode or DO is tricky, because of how likely it is the IP block you're on will be considered spammy. To get around that, I rent a VPS from a local ISP here in Seattle. They have their own equipment, their own IP ranges, etc. It's a bit spendier than Linode but it's not breaking the bank.

good to know that, not sure if aws is better as far as IP-range-blocking is concerned.

Hey just to let you know I host ~300 domains on 1 aws instance. We only have issues when a client's password gets phished, but we also have a limit on the amount of emails per day they can send. So it's never a real issue.

~20,000 emails a day

Going on 4 years. AWS "blocks" are perfectly fine. If you are going to host your own just get yourself an Elastic IP and let your account manager know that you intend to send mail. As they (used to? I had to do this 4 years ago) have their own internal anti-spam system which you may hit.

On the contrary I also host my own mail on an instance I have over at [0] which is rock solid and I've had no issues that are not the fault of my own. I would recommend at minimum.

The only thing I can say is if you want to do email yourself, possibly use [1] for an easy-to-set-up system and make sure you get a box with a minimum of 512 MB of RAM, or around 1 GB, because ClamAV is fat.

Or go [2] for a hosted solution. Who are doing great things regarding encrypted mail.

[0] https://www.prometeus.net/billing/aff.php?aff=157

[1] http://redmail.com/

[2] https://protonmail.com

for [1] do you mean http://www.iredmail.org/ ? never heard about redmail.com though.

Me and a number of my friends in clubs at my university use it whenever we need to send secrets to each other, but everything else is done in clear text.

> PGP is dying from what I read...

Can you please provide some references for this? What are the alternatives?

Consider "Engineering Security", by Peter Gutmann: https://www.cs.auckland.ac.nz/~pgut001/pubs/book.pdf

Axigen's free e-mail server is pretty solid and easy to set up.

> More and more are migrating to cloud these days, I expect more and more epidemic leakage will come.

Why? Cloud isn't relevant to security.

If anything, it makes it easier to configure firewalls and rights, so it's easier to put security in place.

> Irrelevant. Cloud doesn't impact security.

I disagree. The larger the congregation of value by a single target, the higher value the target. Saying it doesn't impact security is like saying whether a building is a bank or a house doesn't impact security.

(It should also probably be noted that I assume the OP was referring to "cloud" as in centralized data services as opposed to "cloud" as in hosted servers/VMs)

If anyone else has screwed up and used MD5 for passwords and doesn't know a good way to migrate towards something secure: https://paragonie.com/blog/2016/02/how-safely-store-password...

Well on the upside, if you changed your password as a result of the hack from a few months ago, you should theoretically be safe against this one which happened in 2013.

Those security questions, on the other hand, are still fair targets.

I had a Yahoo account entirely to use a Yahoo email list; I used to have it for Yahoo chat, but I haven't used that in years.

So I ignored the hack a few months ago. I also never got notified that I was vulnerable.

Just now I tried to log in to see if my password had been invalidated. Nope. It was my old insecure "pattern-based" password (myprefixYAHOO) that I use nowhere any more. Probably short enough to have brute forced with MD5 in a few minutes at most.

And yet...no spam sent from my account. No spam in my account (except some kind of announcement from "Aabaco, the new name of Yahoo Small Business" from a year ago). Just some of the mail from the email list that petered out over two years ago as the list transitioned into a Meetup group.

So I guess Yahoo either has considerably more than 1B users, or there were simply so many compromised accounts that they didn't bother trying to use all of them to send spam.

Changed the password just now to something secure "just because", but it's hard to care.

It's more that there's more than 1B accounts out there - remember that this isn't just "yahoo.com" that got affected, it's Yahoo, YMail, RocketMail, yahoo.co.jp (a HUGE community btw), and several others which all fall under the "Yahoo accounts" umbrella. Not every account was hacked by any means; terrifyingly, the number of accounts isn't nearly what you'd expect as a percentage of "Yahoo accounts".

Yahoo! Japan is separate from Yahoo! "worldwide". They actually run separate parallel infrastructure for many things, so I highly doubt YJP was part of the one billion accounts.

Makes me wonder what service or other sub-section of Yahoo actually got hacked, then. Doesn't seem to say in the article.

I almost hope the data is made somewhat public so Troy / https://haveibeenpwned.com/ can get a hold of it and provide the public with reassurance.

By now I suspect you can simplify it down to just matching on the RHS for any domain registered to Yahoo.

A number of ISPs have used Yahoo to provide mail services in the past, so it's probably not quite as straightforward as that.

And not just small ISPs, but major ISPs, such as BT (with 32% of the broadband market-share in the UK).

That was my first thought. I used to have a Yahoo email and I'm assuming at this point (multiple hacks), it's out in the wild.

I used to as well but if you still can access it you should delete it: https://help.yahoo.com/kb/account/close-yahoo-account-sln204...

One day, this will be Google announcing they've had a breach of this size. Not looking forward to that day.

At the very least they probably wouldn't be using md5.

there's a couple of things that these major providers getting pwned teaches you:

1) their security isn't good just because of their scale/size (that begins to seem more and more like a false-assumption nowadays)

2) migrating your email to a new provider is quite difficult (consider that the average person will have just 1 - or 2 - email accounts and they link EVERYTHING to it)

3) the price of ads/convenience is no longer worth it. I'm assuming at least a sizable minority of internet users are using ad-blockers these days. They can't get your eyeballs, so they package and sell your data. Granted, you can probably now get the same (raw) data on the black market by paying a fraction in bitcoin and you'll get to see those billions of emails telling people someone attacked their farm in farmville from 2009

Lastly (and I really hope this happens), Yahoo implodes/collapses (cause the average Joe won't migrate willingly) and leaves a vacuum for their 500+ million email users. Hopefully the smaller providers (Proton, Migadu, Posteo, Tuta, etc.) get at least 10% of these users and the email-cartel is broken (somewhat).

If Yahoo goes down, I won't have email; or at best I'll maybe keep a Zoho. I hate Google's mail interface, I hate the way they make 'conversations' out of discrete emails, and I especially hate their lack of folders. I use GMail begrudgingly at work, and only when necessary, and every time, I look at it and go, "what dipshit ever thought this was a functional way to deal with email?" As a dedicated Windows user, I'm more likely to use iCloud than GMail if Yahoo goes down; but I doubt that.

I like Yahoo email as a user. Yes, they've made mistakes, and I accept that. I'd prefer their mistakes over Google's superiority complex.

> I especially hate their lack of folders

GMail supports labels as folders. When you create a new label it will ask you if you want to nest the label under another label and you can do this repeatedly to make a nested folder structure.

Crucially, this will show up as nested folders via IMAP.

They should use IMAP labels. Their IMAP implementation has always been terrible and broken. Anything you label gets put into an IMAP folder and therefore you download it multiple times per label.

I agree; fuck everything about gmail usability. It was a cool trick when it came out. Now it's just overly bloated AJAX, non-standards compliant garbage.

No, no - I understand that you can think this, and that they claim it, but from a UI angle, it's wrong. I hate the implementation of labels.

They don't actually disappear when I click on inbox. When I want my inbox, I want just that folder - all filtered content goes elsewhere and disappears until I want it. That's not GMail's way.

I have a giant folder hierarchy in my gmail, so I assure you this can work.

In your case it sounds like you're taking a message with the "Inbox" label and adding the "some/folder" label, which will indeed still show it in both places. If you move the message, which removes "Inbox" and adds "some/folder" it will no longer show up in the Inbox.

> I hate Google's mail interface

Their IMAP interface is both standards compliant and fully functional. I'm not terribly fond of the Gmail web/native apps either, so I just don't use them (though I do occasionally hop on the web app when my client isn't searching the email as effectively as google does).

With 2FA it can be a bit more work adding devices, but it's not a deal breaker for me.

Their IMAP implementation is anything but compliant or functional. They implement labels as folders. Every time you use a label, it downloads that message multiple times and puts it into folders.

Also when you try to write drafts in Thunderbird for Gmail, it stores them in such a way as each saved draft turns into part of the conversation (WTF?!) It makes conversations totally unreadable.

I quit Gmail years ago and do not miss their broken IMAP implementation.

Fair enough, I guess it's about differing usecases. The few labels that I do use on gmail are set up such that they are indistinguishable from folders (if they match a filter, they don't go in my inbox, and I don't have any overlapping labels). I also rarely have lingering draft emails, so I guess I've not noticed that particular issue (though I do use Mail.app, not thunderbird). Gmail's imap support has been adequate for me since they introduced it however long ago that was. YMMV

Edit: my point about standards compliance for gmail imap was purely about it actually working with third party clients, I've always known that it doesn't conceptually work the same way as a standard imap server.

I don't download my email. I have a webmail for a reason - I don't want a mail client with all its attendant files gumming up my PC. I moved off Netscape Communicator to webmail because it took up over 40% of my drive, and I've never regretted the decision.

And I don't like 2FA either. It's a hassle and never, ever, worth my time or energy. There was one gaming service (I think it was an MMO) that demanded 2FA or bust. I don't use that service and never will.

> I don't like 2FA either

That seems to be a rather dangerous position to hold these days. I personally dislike that Google's 2FA is SMS based (unless there's a way to use e.g. Authy with it that I'm unaware of), but it still seems that the only way to be reasonably safe is a strong password and 2FA.

I'll add that the Authy app on the Apple Watch has made 2FA for services that support it rather painless.

I use 1Password as my 2FA app for Google services. I only set it up relatively recently so maybe support for non-SMS is new. Or region-based?

It's been there for years. Since ~1yr ago they also support U2F (hardware token based, un-phishable)

Google also allows you to use google authenticator, but I don't believe they allow third party services.

You can use any TOTP app you like, they all use the same QR code format. You can even build your own, it's all well documented.
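
To illustrate the "build your own" point, here is a minimal sketch of TOTP as specified in RFC 6238 (HMAC-SHA1, 30-second steps), using only the Python standard library. The function name `totp` and its parameters are my own choices, not anything Google ships; the Base32 string is what the QR code's otpauth:// URI carries in its `secret` parameter.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Return the RFC 6238 TOTP code (HMAC-SHA1) for a Base32-encoded secret."""
    # Authenticator secrets are Base32; tolerate lowercase and missing padding.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Base32 of the ASCII test key "12345678901234567890" from RFC 6238.
    print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"))
```

The server runs the same computation with the shared secret and usually accepts the neighbouring time step as well, to allow for clock drift.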

Google does allow other apps (I think it is still the same as GAuth); SAASPASS and other authenticators work fine. But yes they don't allow other token providers like Yubikey or RSA fobs

Google is one of the major proponents of U2F, which is based on hardware tokens. Yubikeys support it, either the cheap U2F-only one or a Neo / 4. I use it and it works flawlessly. At the moment, only Chrome implements the required APIs as far as I know, but Mozilla is working on adding it to Firefox.

> But yes they don't allow other token providers like Yubikey or RSA fobs

I think this is incorrect, at least provided you're using Chrome. The implementation was buggy somewhere in the chain the last time I tried it, but it's there.

> If Yahoo goes down, I won't have email; or at best I'll maybe keep a Zoho.

Zoho needs a phone number verification for signup. Unless you're confident that Zoho will never get hacked like Yahoo has been (multiple times), your phone number could be one more piece of information that's exposed yet again whenever it gets hacked (this also depends on how you use email and if you include your phone number in emails).

I never gave them a phone number...

You don't like GMail, I understand that (I don't either), but why stick with Yahoo? There are more providers with "old-fashioned, boring" web interfaces...

You can disable 'conversation' mode. I did that 2 days after they made it the default.

Migrating e-mail is very difficult, especially if you're like me and decide to setup your own e-mail server. The biggest problem I had was my e-mail getting falsely classified as spam:


I've also occasionally found really old services that use my old e-mail account. Even though I have the password, they still require e-mail verification, which can't be done because the gmail account doesn't exist and it bounces.

Just use google apps, $5 a month gets you unlimited domains, an account with like 40+ aliases, email, and all the other applications. Setup SPF, and DKIM records, and away you go. There's no reason to deal with running your own email service at this point.

What irritates me is that there's really no safe place for email.

Even if you pay for a host they still have access to it all. You have to really trust them.

If you try to set it up at home you need a static IP and need to be prepared for it never working because of spam filters and stuff not trusting dinky self hosted services.

And no matter what you do - barring PGP which nobody uses - it's all sent over plain text anyway.

But that's still less bad than anyone having your entire email life with one hack. Sigh.

Just use your own domain. That gives you the ultimate power over your emails regardless of the mail provider you end up using.

haha I don't understand why this comment isn't more prevalent, especially here on Hacker News. Like really, why are so many using vanilla yahoo.com and gmail.com addresses?

> Lastly (and I really hope this happens), Yahoo implodes/collapses (cause the average Joe won't migrate willingly) and leaves a vacuum for their 500+ million email users.

You do realize this makes you sound both terrible and ignorant, right? I happen to have a Yahoo account, which I registered way back when the options were that, Hotmail, and maybe AOL. I have self-hosted mail now, and a redirectable primary address, but that Yahoo address lives on in various address books.

If you want to realize your email dream, you should try to turn the Yahoo addresses into an eternal forwarding service for current accounts. Maybe you could sell space on the signup page to smaller providers.

Wait, according to this,

""" Based on further analysis of this data by the forensic experts, we believe an unauthorized third party, in August 2013, stole data associated with more than one billion user accounts. We have not been able to identify the intrusion associated with this theft. We believe this incident is likely distinct from the incident we disclosed on September 22, 2016. """

this incident occurred in 2013 and 2016, or they needed three years to figure this hack out. How is this possible?

It baffles me that Yahoo continues to live an independent existence. It's like a Terminator that never had a clear mission and now just wanders around randomly banging into things.

When credit cards are compromised, the responsible party is usually responsible for providing identity theft protection. Why not tech firms that seek to store sensitive personal information? Maybe it'd scale back the desire for every firm to collect as much personal info as you'll provide them.

To be fair, that identity theft protection is lip service/worthless bullshit.

True, however, it does put SOME price on data collection, rather than leaving it in the realm of pure externality. The deterrent of cost is the benefit, not the protection itself.

On the other hand, adding a price may embolden deep pocketed organizations to 'pay to absolve' for losing data to hackers on an ad hoc basis as a cheaper alternative to strong security and limiting data collection scope. In that case, the impotence of ID theft protection hurts a lot more.

I generally like this idea, but wonder if it would result in even fewer disclosures.

Nobody in here mentioned it: phone numbers were leaked, too. Which I consider even worse.

I wanted to sign up for Flickr, but the Yahoo login requirement was a big turnoff, because it requires a phone number. This nagged me so much that I never did it.

Turns out: right decision. Because my 8 year old phone number isn't target of spam yet.

Phone number being a requirement for signup is bad. There are providers who mandate a valid phone, which they verify through SMS or call, even for paid accounts and services (not just for the payment processing step). Whenever possible, I avoid signing up for such services.

It's scary to think about the consequences if the only reason Yahoo knew they got hacked was that they are more, and not less, competent at security. Do you think the security team at {insert retailer, other nontech company with a login screen here} is somehow MORE competent?

I logged into my yahoo email in chrome in an incognito tab and it logged into someone else's account. This was probably in 2014 (it could have been in 2013). I wonder if this was related at all.

What's likely is that two people were logging in at once and they ended up with the same credential because someone didn't realize that a servlet is a singleton.

Thanks. I posted to reddit asking what happened but no one ever answered. I thought it had to do with someone leeching my wifi or something causing some "saved data" from IP or something crazy like that. The other user was also living in Japan (I sent him an email on his own account telling him what happened but didn't stick around to see what happened). No, I didn't read his emails but when I looked at "new" expecting mine I saw a few emails about something in Fukuoka Japan (probably where he is or close to where he is).

"We analyzed this data with the assistance of outside forensic experts and found that it appears to be Yahoo user data"

How the forensic experts could have analysed? based on the log data? my another question is, just assume if yahoo is trying to dump the experts, can it be possible? or else, still the experts be experts to make sense out of it?

Is this english?

Notice that this is yet ANOTHER hack, not the one HN was talking about a few months ago. Also notice they were still using MD5 passwords AND without salts ... None of these hacks have been disclosed directly to their users; I never got an email saying I may have been hacked and should reset my password. Irresponsible.

This "new" one happened before the previously disclosed hack.

What a hot mess. I am glad I mostly ignored their services over the years.

I hope they stopped depending upon those security questions if that is part of the leak. On a side note, this seems like a great time to be an abuser. One can collect so much information about users - they may actually have more data than any govt in the world.

Did you catch this?

"Separately, we previously disclosed that our outside forensic experts were investigating the creation of forged cookies that could allow an intruder to access users’ accounts without a password. Based on the ongoing investigation, we believe an unauthorized third party accessed our proprietary code to learn how to forge cookies."

While I agree with your sentiment towards security questions, they are irrelevant when something like that is done. A bit scary.

No, that part of the article wasn't there when I read it. Interestingly, the article ended with the word "Developing". I believe they meant that the article will be updated as and when they receive more information.

I hope everyone stops relying on security questions!

What is your mother's maiden name?


Of the Greenwich T3m92uGKhWMRV7Um0WVF50LKQNowpoe0FWwWryL2r9jkuAHyLTCY8QoY79iMiSjo6CHCZGWl's ?

I laughed too hard at this.

I hate it when I'm asked this in person at banks and shit.

"Your mother's maiden name has four numbers in it?"

"It's a password. You should never use real answers for security questions."

Which only works until you call in asking for a password reset and when they ask you the question you just say "I just hit the keyboard a bunch".

No, I pull up the answer out of 1Password and read it off to them.

I thought Klathmon was pointing out that an attacker could say that they just mashed on the keyboard and that would be good enough for the fallible human on the other end of the phone.

Anecdotally, I had a time where I couldn't remember my answer to a secret question except that it was a type of food. I called in and the human on the other end let me reset my password with just that explanation. Take that for what you will, but it seems like if someone knows you use passwords that are random strings, they can use that to break in.

Sorry, I meant to imply that the support person will hear the explanation and let you reset the password without the actual answer.

Fair enough, as I believe I've had that happen. Random string for one of my financial institutions, needed to reset something. Pull up 1PWD, with random string at the ready and...they asked me questions that could have been pulled from a copy of my credit report. I didn't ask, so I'm not entirely sure, but I wonder if they didn't look at the answer, said to themselves "fuck that" and went with Option #2.

Diceware is a decent option for security questions. They work fine over the phone.

"Charlie capital-echo lima peru capital-october..."

Wait until they introduce a real name policy...

But when the security question answers are leaked in plain text, they can still use it to get into your account.

It's a good strategy, but a pain when you have to tell them over the phone.

Side note: if your Apple account security questions are gibberish, your account now gets stuck in an infinite reset loop. I need to answer the security questions to reset my security questions or to reset my password. This occurs even if you have the current valid password, the questions are mandatory for all changes. The questions are also mandatory for phone support, so I’m locked out of my account even though I have the password. Great job everybody.

The answers to my security questions tend to look like "e74bd7eb-10c6-4b90-bde0-dde2ed64946e"

Your security answer is a software license key?

Easily cracked!

Or maybe a Windows CLSID.

Guys... let's just delete our Yahoo accounts. That company can't go bankrupt fast enough. It will sell our data for quarters.

You're right, but it goes beyond that.

Yahoo used to be a titan. I was a regular user of Yahooligans back in the day. Yahoo (at one time) had been my go to search engine. I can't say that it was ever my primary email account, but I used it. I used Yahoo Messenger. I was part of a community that centered around some Yahoo games. Yahoo used to be a titan that was a direct Google competitor in the realms of communication, search, news and entertainment.

Sure, I can delete an account for any service belonging to any company(!) if I want. But when you use a service, there's an explicit level of trust that they'll protect your information to the best of their ability. We assume that we can use their services as our primary driver without worrying about MD5 hashes without salt. We assume that they'll take more security procedures than a student making a toy app testing boundaries in a 400-level course.

Sure, we can delete our account, but this is an unnerving situation. It's not like Yahoo was thought to be some back alley operation where everyone nervously awaited news like this. Yahoo was a direct competitor and contender to some of the biggest digital companies on the Web. This level of incompetence is mind blowing.

At least it is to me.

This goes into the much bigger issues of there not being enough search engines. Back in the day if you couldn't find something, there was Lycos, Hotbot, AltaVista, MSN .. each had their own indexes (or they purchased access to a few major indexes).

Later we saw people ditch their indexes and just use a few big players. Now we have Google, Yandex ... and ... Bing? DuckDuckGo uses a combination of Yandex and others, Microsoft has been found parsing Google to build their index ...

I want more options, but the search space barrier to entry is very high.

Yahoo was my primary (as in only one not provided by my isp or school) e-mail address from when they bought Geocities until the early '00s. I've been disentangling as many services from them as possible over the past couple of years.

But that's where all my junk mail goes...

I recently decided that if I don't trust the company not to send me spam, I don't need to avail myself of their services. It might not be possible for everyone, but it's made my life a lot simpler.

I don't trust any company to not send me spam. Even if they don't do it today, they may get bought-out, hacked, or otherwise lose control of the data. Even if everything suggests that they won't spam me, I'm not entirely convinced. Even if they're the most upstanding company on the planet, I'm still not convinced. Based on past experience, you understand.

Well, I still have a spam filter algorithm. But I'm overall done balancing multiple personal email accounts because of the liability involved. If it gets hacked (and it's one I don't ever check), will I notice? It seems unwise to have extra email accounts open, potentially with my personal data and associated with my identity online.

It's also a matter of attention: I have a limited amount of it, and tracking multiple email accounts and managing a spam account isn't worthy of it.

Is it possible to extract all emails and contacts from Yahoo without paying? Furthermore, my PayPal account is linked to Yahoo.

You could use an IMAP client such as Thunderbird to copy your mail away from these clowns to another service that supports IMAP. Fastmail has an IMAP-based bulk importer that in my experience works well. I used it to hoover all of my mail out of GApps and into my FM account.

Download Mozilla Thunderbird, setup a new account with your Yahoo credentials and it will auto-configure it for IMAP. This does not require any payment. I did it recently and am still using it. You need to setup folder synchronization for offline downloads of all the messages and folders you have on your account.

Yes, you need to scrape their webpages. 10 years ago when I pulled everything from my Yahoo acct, there were several choices of open source scripts which could do it.

No, it's no longer necessary to resort to scraping the webmail pages. You can easily setup a client like Mozilla Thunderbird for IMAP with Yahoo and get all your mails and folders on to it for free. I have done this recently and it has been working. Only the ad-free webmail from Yahoo is a paid option.

How about... let's leave it alone? If you don't like it, don't use it. I like it.

Honest question: why did you need a Yahoo account?

There are many, many active communities and mailing lists still using Yahoo Groups.

And to clarify, some Yahoo Groups require a Yahoo account to participate. You cannot participate with a regular email address, you must sign up to a Yahoo account.

If you join a real-world social gathering which happens to use such a Yahoo group, you may find yourself excluded from online communication with that social network unless you agree to sign up to Yahoo.

But I'm afraid I have no sympathy for outraged users. No more than if it was gmail or hotmail. They didn't pay for the service, they got an email service for free. It's hard to complain when it is free. And they did enter into an agreement where they sell their privacy against a free service.

Knock-Knock... I pay for Y!Mail Plus. It's not just free users wanting a drop-box for spam.

I actually wasn't aware there was also a paid service.

Though I doubt this is a large share of the 1B accounts.

I'm sure you are right that it is a low percentage, but I bet there are a lot. When I used Yahoo, I paid, as did my wife. For the extra storage, no ads, and I could be misremembering, but I think paying got you access to imap or pop3, which wasn't generally available before the iPhone.

This is a time where a decent password manager comes in handy. I can look in my password history to see what my password was in August 2013, and see if that password is still in use anywhere else, then change the password on those sites.

honest question: if you're going to the extent of already using a password manager, why isn't every site getting a unique password?

In my case, it's because I still have some very old accounts in there. Accounts that predate not only this password manager (LastPass) but the previous one (KeePass) and which in fact go all the way back to something that started with "Yet Another (YA)" back on a Palm device.

I really ought to go through and do some janitorial work in there, but some of those are for sites that actually still exist and for which those logins are likely still valid. I don't care enough about them to go log in on each and change passwords, but I also don't want to simply delete them and leave yet another orphaned account.

Lastpass can report which sites are sharing the same passwords (and also which are not using a random password generated by it). For some sites it even automates the password changing for you. It doesn't work for all sites (including this one) but it saved me a load of time just recently.

I recently did this after I found a password manager that works with the devices I use. Took me a few hours to remember and track down all the accounts I use. Managed to get close to 70 accounts and even then I missed some. I think it was worth it though.

Back in 2013 I wasn't using the password generator feature for some reason. I was just using it as a vault for my memorized passwords. A bit silly I realize and now generate passwords.

honest answer: one of those sites is your recovery email, the one way back into your digital life if something goes deeply wrong. It's the password reset email if another site is hacked. It's the "bootstrap myself from some other computer" account if your machine dies or your house burns down.

IMHO you should memorize one very strong password for one somewhat-trustworthy site.

> IMHO you should memorize one very strong password for one somewhat-trustworthy site.

This would be necessary if one is using a password manager, which is something everyone should use for multiple reasons and benefits.

As an alternative, you could also invent a scheme for passwords. Have a prefix, body and suffix for every password. You decide which ones should be static and which ones should be something that's easy to derive just by looking at the website name (part of the name, few letters from specific positions). You can also have different static pieces based on the nature of the site - email vs. bank vs. online store. This may not be as good as using a unique password per site that's a random strong password generated by a password manager, but is easy to remember depending on how you construct it.

I've struggled with this, the issue to me is by their very nature I want those passwords that can be used for bootstrapping/resetting everything else to be very strong ones. I've settled with making a list, encrypting it with a memorized moderately-strong passphrase, and storing copies (flash drive or base64-on-paper) in a few physically secure locations that probably won't all get destroyed at once. Maybe this is overly paranoid but it wasn't a huge amount of effort, either. At least I am pretty confident the weak links are now the security of those services themselves (and my client computers) and not the passwords.

I hate having a secondary system just to get access to my accounts. I use a password algorithm. This ensures I have a unique, easy to remember (or derive), hard to guess password for every account:


It seems like you're getting downvoted (not sure why). I use "password algorithms" like you've blogged about (but very different in nature) to have memorable passwords for some sites. For many other sites in the last several years, I have started moving to creating unique passwords and using a password manager to store and use them.

Let's not forget that high ranking officials in the US govt. used Yahoo to send classified information to print at home.

security is important, but lets not forget the strides theyve made in making meaningful connections with their audience via collaborative relationships with powerful leaders such as Katie Couric

For the longest time my yahoo account (which I had not checked on in many years) reported at least a dozen open sessions originating from IPs in Russia and Eastern Europe, and unlike my legit sessions I was unable to kill them in the control panel (the site would bug out)

So yeah, Yahoo's been hacked. Duh...

Finance and Flickr are about all Yahoo is good for any more, and I think my portfolio page loads (instead of 404'ing) maybe 1/2 the time I request it...

(God I really hope they dont mess with flickr though...)

I'm really not sure how they could do flickr more damage than they already have.

I'm using Yahoo mail and when I logged in, they gave me a link to their security notice. About 'Hashed passwords', it says:

"At the time of the August 2013 incident, we used MD5 to hash passwords. We began upgrading our password protection to bcrypt in the summer of 2013. Bcrypt is a password hashing mechanism that incorporates security features, including salting and multiple rounds of computation, to provide advanced protection against password cracking."

WOW. So basically they did not even salt their passwords until 3 years ago! I knew about the importance of salting password hashes since I was like 17 years old and this mega billion-dollar corporation did not.

Also, they claim:

"Hashing is a one-way mathematical function that converts an original string of data into a seemingly random string of characters. As such, passwords that have been hashed can’t be reversed into the original plain text password."

Which in the case of MD5 is a deceptive claim; even a basic dictionary attack could probably reverse at least 50% of all their accounts' MD5-hashed passwords (assuming most people use one-word passwords with maybe a few digits at the end).
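
A rough sketch of why unsalted MD5 is "reversible" in practice: identical passwords always produce identical hashes, so an attacker can hash a wordlist once and look every leaked hash up against it. The wordlist and the leaked records below are made up for illustration; real attacks use far larger lists and GPU crackers.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(wordlist):
    """Precompute hash -> password once; reusable against every unsalted hash."""
    return {md5_hex(word): word for word in wordlist}


def crack(leaked, lookup):
    """Return {user: password} for every leaked hash found in the lookup table."""
    return {user: lookup[h] for user, h in leaked.items() if h in lookup}


# Hypothetical wordlist and leak, for illustration only.
wordlist = ["password", "letmein", "dragon2013"]
leaked = {
    "alice": md5_hex("password"),
    "bob": md5_hex("dragon2013"),
    "carol": md5_hex("T3m92uGKhWMRV7Um0WVF50LK"),  # random, not in the wordlist
}
recovered = crack(leaked, build_lookup(wordlist))
print(recovered)
```

A per-user salt breaks the shared table, since every hash then has to be attacked separately, and a deliberately slow hash such as bcrypt makes each individual guess expensive.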

It's one thing to know it and the other to deploy it. You don't know how messed up their system might be - it might actually be a very difficult change if it's tied to other components with some crazy kludges. Looks like they didn't prioritize them well enough. And if you're looking for an example of a company that's way too afraid of changing anything in their system because it's too much of a mess, consider PayPal an example...

Yahoo is so frustrating

I got the email this morning regarding the hack, I've not used Yahoo for a long, long, long time, so figured I would go and delete my account.

So I log in, password in 1password is incorrect, no big deal I go to reset it. They send me an email, I reset the password then go through the account deletion process. It tells me my account is "deactivated" and will be deleted in 90 days

...Once that was done I just so happened to look through my emails to see what Yahoo had sent me in the past and I saw that I had undergone the exact same procedure (deleting my Yahoo account, presumably after news about another hack) about 3 months ago but completely forgotten about it.

So what I must have done today was relogged into my 'deactivated' account that I 'deactivated' back in September, which caused it to become active again, then issued a 'deactivate' request again, so now I have to wait ANOTHER 90 days for it to be deleted.

I've made a note of this fact this time to avoid relogging into Yahoo again...

If your account name is related or identifiable to you in any way and could potentially be used to spoof your identity, you shouldn't delete it, because Yahoo will let a new user take your account's address.

Yes, that's worse than Facebook, which does the same thing, but for 30 days.

I thought "didn't they already announce this recently?" Nope, that was a different one. Boy oh boy.

My thoughts too. I was "yawn .... holy hell another billion accounts". It will be interesting to see where Verizon lands on this.

When are the multibillion dollar lawsuits coming that cause these idiots to get it together with security?

Unfortunately, lawsuits are rare unless a user can demonstrate that the hack led to measurable harm.

Likely obviated by their EULA

MD5 in 2016? I hope Yahoo can save itself and the tech community all this embarrassment by just going out of business once and for all. Folks at the helm of affairs at Yahoo are incompetent. And it is about time the government started to prosecute incompetent CEOs.

It occurred in 2013

Unsalted MD5 has been demonstrated to be vulnerable to collisions since 2005. Rainbow tables existed way before 2013. There's no excuse for a tech company of this size.

UNsalted anything has been phased out earlier in a lot of other places

..and it took them three years to find and report it?

More likely report than find. From what I've seen of their current disclosure policies, and what execs have written on Y!Answers and such, they find the problem, they figure out who did it and how, and then after they've figured out how to fix it, they alert the userbase and the public - in that order.

Also, please do remember that we're getting into a different leadership team now at Yahoo; previously they were absolutely convinced that disclosure and alarmism were one and the same - and that any perceived weakness in the Yahoo Mail product would drive people to GMail.

The intrusion happened more than a year after Marissa Mayer became CEO of Yahoo.

Maybe that can get Verizon another $1B discount.

I'm hoping Verizon kills the deal. It would send a powerful message (unintentional on Verizon's part, but irrelevant) that a major data breach + installing NSA's rootkit on your servers could one day cost you billions of dollars, as well as give you a forever tainted reputation.


By now it's probably easier if Yahoo just published the (short) list of services that weren't owned through-and-through right under their noses, and notify users unaffected by any breach (0 rows returned).

I'd forgotten my yahoo password but wanted to change it. They sent a code to my phone and I was able to do that.

Then I tried to set up 2 factor authentication but I am unable to do it. It keeps rejecting the same phone number as being either invalid or not recognised as a contact, no matter which format I choose to enter it. I've dropped the international prefix, added it, added and dropped the plus sign, added and dropped the 0 after the international prefix etc etc.

I'd dump yahoo altogether except it's the email for my paypal for over a decade and i can't change that.

I've changed my paypal email twice in the past already when I started getting too much spam.

Regarding 2FA at Yahoo!, I've also had issues... SMS stopped arriving altogether and I had to disable it.

Paypal says I can't change it because it's my primary email address. I prefer using authenticator anyway rather than sms 2fa and yahoo don't offer that it seems

Maybe you have to add a new email to your paypal account, then change it to be the primary and then delete the old one.

You're right! Thanks very much.

In the context of (unsalted) MD5 passwords: If they have a large legacy base of MD5 hashed ones, how would one "move" those to a stronger hash function?

I can imagine something like re-hashing the existing one with a better algorithm and some salt, and storing new ones solely using the new algorithm + salt. But that introduces some additional complexity because every hash needs information about how it was hashed (MD5 + X vs. just X).

Is there an established best practice for this?

Yes, there is.

The one I prefer, which you've mostly laid out, is: new passwords are entered as bcrypt(pw) and then stored as "B-$result", old passwords are re-hashed as bcrypt(hash) = bcrypt(md5(pw)) and stored as "M-$result", then your auth function works as follows:

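    import hashlib, hmac
    import bcrypt  # third-party package (pip install bcrypt)

    def auth(user, pw):
      stored = get_hash(user)  # stored string for this user, from your own DB layer
      md5_hex = hashlib.md5(pw.encode()).hexdigest()  # assumes legacy hashes are lowercase hex
      if stored.startswith("B-"):
        return bcrypt.checkpw(pw.encode(), stored[2:].encode())
      elif stored.startswith("M-"):
        return bcrypt.checkpw(md5_hex.encode(), stored[2:].encode())
      else:
        # remove this branch once you've rehashed your entire database
        return hmac.compare_digest(stored, md5_hex)

The naïve solution is to skip the "B-"/"M-"/"" annotation, but if you do that you've introduced a situation where attackers can log in to accounts with old passwords using MD5 hashes leaked from another source.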

The other solution (that Yahoo used) is to use bcrypt(md5(password)), which allows them to rehash all existing passwords without users logging in.
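
For anyone who wants to see both ideas together (bulk-wrapping the legacy hashes offline, then upgrading each record at the next successful login), here is a rough sketch. To keep it runnable with only the standard library, PBKDF2 stands in for bcrypt; the record layout ("M-" or "B-", then salt, "$", digest), the helper names, and the sample data are all my own assumptions, not anything Yahoo has described.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def slow_hash(secret: bytes, salt: bytes) -> str:
    """Stand-in for bcrypt: salted PBKDF2-HMAC-SHA256, hex encoded."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS).hex()

def md5_hex(pw: str) -> bytes:
    return hashlib.md5(pw.encode("utf-8")).hexdigest().encode("ascii")

def wrap_legacy(md5_digest: str) -> str:
    """Offline step: wrap a stored MD5 hex digest; no plaintext needed."""
    salt = os.urandom(16)
    return "M-" + salt.hex() + "$" + slow_hash(md5_digest.encode("ascii"), salt)

def new_record(pw: str) -> str:
    """Record for a password hashed directly with the slow hash."""
    salt = os.urandom(16)
    return "B-" + salt.hex() + "$" + slow_hash(pw.encode("utf-8"), salt)

def verify(record: str, pw: str) -> bool:
    tag, rest = record[:2], record[2:]
    if tag not in ("M-", "B-"):
        return False
    salt_hex, expected = rest.split("$")
    # The prefix decides whether the input is MD5'd first.
    secret = md5_hex(pw) if tag == "M-" else pw.encode("utf-8")
    return hmac.compare_digest(slow_hash(secret, bytes.fromhex(salt_hex)), expected)

def login(db: dict, user: str, pw: str) -> bool:
    """On a successful login against an M- record, upgrade it to B-."""
    record = db[user]
    if not verify(record, pw):
        return False
    if record.startswith("M-"):
        db[user] = new_record(pw)
    return True

# Legacy table of unsalted MD5 hex digests (hypothetical data).
legacy = {"alice": hashlib.md5(b"hunter2").hexdigest()}
db = {user: wrap_legacy(digest) for user, digest in legacy.items()}  # bulk rehash
```

Because the prefix decides how the input is processed, submitting a leaked MD5 digest as the password does not match either kind of record.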

I think that's what he described, plus adding a prefix in order to indicate it's been re-hashed.

That's exactly right, thank you!

If you collect user PII & get hacked, you should be obligated to pay for the damages. Specifically, covering the user for identity theft monitoring for 10-15 years.

I just attempted to log in to an old @yahoo.com account that I haven't used in probably five years or more.

On the login screen, there was a short notice about this breach (with a link to more details), and after logging in I was prompted to create a new password, and update recovery emails / phone numbers.

That doesn't negate any of this shit that happened, obviously, but maybe they're at least gonna try to make things better (we can hope, anyways).


Chrome says "The server presented a certificate that was not publicly disclosed using the Certificate Transparency policy. This is a requirement for some certificates, to ensure that they are trustworthy and protect against attackers."

Probably my Chrome version is too old I guess? (probably not, it's 53 which is only a little behind the latest).

Your Chrome version is indeed too old. There's an issue with Symantec certificates and Certificate Transparency in Chrome 53. Just update it.

Being a Fortune 500 CISO must be so easy. Corporate expectations are evidently low enough that you probably don't have to show up for work.

It's amazing that they're telling users to change their password/security questions 3 years after the hack.

So, the scuttlebutt last time was that they disclosed the hack due to a potential Verizon buyout forcing their hand. Seems as though this could be the same thing, generally speaking.

Can anyone enlighten me as to how Verizon compels Yahoo to disclose this information? Or rather, how does Verizon know about these intrusions, if they do?

It's most likely just part of the due-diligence required for the merger.

Yahoo brass probably decided that publicizing this information wasn't worth the PR hit, and so they buried it, but Verizon doesn't want to take on the risk of a potential lawsuit, so their lawyers required Yahoo to disclose it if they want the deal to go through.

I believe in this case Yahoo wasn't previously aware of the hack - they were notified by law enforcement, which presumably found a file of the information somewhere during an investigation. There was enough info to let them ID a time period, but nothing that lets them know the who/how of the breach.

Some states like CA have a legal requirement to notify in the event of a breach, so hiding this event is illegal.

The article is from Yahoo, it's a notification from Yahoo, announced by their head of security. They're not hiding the event.

Seems like they hid it for a few years.

When I was new to programming, I read some articles about why not to store passwords in clear text; one Google search was enough to teach me about the Blowfish algorithm and the benefits of higher-cost hashing! Well, I guess my first-ever program was more secure than Yahoo. Storing passwords in MD5, too bad, Yahoo...

Your life first program was about hashing passwords? Mine was about printing "hello world".

What I dislike the most about this situation is that I cannot even shut down my Yahoo email account, as it could be re-created by someone else, i.e. hijacked.

It's also terrible that such bad password policies are being pushed onto users, yet no guarantee of security is associated with them.

LOL this is rich:

"...identified data security issues concerning certain Yahoo user accounts."

Certain...more like all up to that point?

Not by several orders of magnitude.

Are you saying yahoo has... trillions of users?

Trillions of accounts, based on reputable sources. Users and accounts are different.

[citation needed]

1 billion accounts. I'm curious: Has there been a bigger data breach, in terms of user volume?

What good is requiring you to change your password on the next login? How do they know it's not just being re-compromised? There are a lot of accounts that are orphaned, but the contents are exposed and still a threat to the original owners.

Why not just lock the accounts?

Alternate (and to me, more believable) explanation: this is a great way to get all of Yahoo's inactive users to sign in, bump the "active in the last year" user count, and goose the company's valuation.

sigh this is really shitty news. In a time when governments are deciding more invasive surveillance is in everyone's best interest too, it's probably never been more profitable to be a hacker.

This just proves that Silicon Valley is full of geniuses. I mean, look at how cleverly Yahoo kept it a secret for so long! Well, at least the Valley's rapacious landlords got paid.

Everything will come to light. All our info is being stored somewhere. One day, people who know you will be able to easily search a database of all your information for specific things.

What happened that made them disclose this > 3 years later?

So when did Yahoo stop using MD5 as the password hash? 2014?

Have they stopped? ;-)

I didn't know Yahoo had 1B accounts. Most must not be active, otherwise how could they be so small financially compared to Facebook and Google?

FYI - the tumblr link for the notice redirected me to a "You have viruses installed on your computer" site. Hacker News just got phished.

Sales and marketing automation companies just got a huge boost in their capability to do SMTP validation.

Time to go check HANSA on the onion.

How does yahoo have a billion accounts???

I cannot tell from this disclosure -- have they updated their algorithm beyond MD5 at this point?

Maybe they're using double MD5 with a salt of "$uper$ecure".

I don't know why anyone would still be using Yahoo as their email provider at this point.

I don't use the account, but I was required to create one in order to get internet connectivity at home via AT&T...

What alternatives to Flickr do I have? I think I pay $25/US/year at the moment.

Yahoo - AOL email for grandparents, owned by Verizon, destined to be maintained by Taos.

This is the same Yahoo that wants us to switch to a LESS secure password-less Yahoo Key?

Security question: mother's maiden name? Answer: 1q&#*v83%?ghd53

Date of birth : 01/01/2011

Using a random answer doesn't help against an attack if the security questions are stored in plain text. I'm not saying storing security questions as a hash is any better practice, since these questions just need to go away. I am saying that most likely they aren't stored as hashes, so that a phone operator can query you; hence random is only as good as something like BarkBarkRuffRuff for a maiden name.

I think the point is not to reuse these common security question answers between sites

First thing I did was control-F "sorry", "apolo", "inconv".


Is this __another__ 1 B users or the previous one which was already posted?

I thought we knew about this already, is there more info than before?

This is a separate breach

What value does Yahoo have for Verizon now, the brand is so tainted?

I'm not sure Average Joe really associates these hacks with incompetence or negligence. Those nasty hackers are making victims of poor Yahoo.

That's why Yahoo made the point of blaming a "state-sponsored actor". You would expect a giant tech company to be able to defend itself against random hackers, but what if it was the government of Russia?? That's why Sony Pictures blamed North Korea for what was, in the opinions of security experts, the work of an insider.

OK so I'd like to invite the pure free market types to explain how this gets fixed without any government, including no lawsuits. Because I keep hearing from free market types that 100% of phishing victims are ignorant and basically deserve what happens to them, if they can't learn that they're being duped they deserve to be duped, they somehow think wholesale loss of trust ends up being focused only on specific companies rather than entire technologies. And so on.

So how are these externalities dealt with where there is no such thing as insurance for this type of breach? There's no way to put the toothpaste (my private information in the form of answers to personal "security questions") back into the tube (only my brain or nearby sphere of influence).

And this goes along with IoT devices that aren't having their known exploits patched by their manufacturers. Similar problem different details.

So without broad laws that say this is wrong and here is a mechanism to attach a tangible cost to this information so a proper risk assessment is done, I imagine we keep seeing this happen with essentially no punishment beyond what Yahoo already is getting punished for.

> including no lawsuits

Are there "free market types" who actually believe there shouldn't be any form of sanctions whatsoever for causing harm? I've talked to quite a few hardcore libertarians, and I've yet to encounter anyone who takes it that far.

Even anarcho-capitalists, the most hardcore libertarians, believe heavily in the court system.

So I'm not sure what the OP means by "without lawsuits", because lawsuits would most likely be their answer here. Also maybe competition from other email vendors who take your security seriously and don't leak 1 billion emails? Or pressure from investors not to create that type of liability?

Pretty obviously a strawman, it's far easier to win such an argument with silly caricatures of libertarians as an opponent... someone who believes that all companies should be able to do whatever they want, without any consequence!

Only the most extreme niche of the already niche group of anarcho-capitalists believe in private courts or private law enforcement, which does not at all reflect mainstream libertarian thought. Mainstream libertarians instead wish for a "minimal" state, which at a very minimum means centralized courts.

I've heard economists argue that economies and societies do not exist without some form of a legal system (chiefs, kings, courts, etc). It's the very core of human co-existence to be able to resolve disputes in a fair and just way.

Seems that anarchy is contrary to government, and out of necessity a government is needed to have a court. Those pure free market types I'm referring to self describe exactly as anarcho capitalists and say all disputes are resolved by insurance, exactly zero government. If there's a court, maybe that's a venue the insurance companies all agree upon. But if you don't have insurance or don't have good enough insurance you don't get as much representation or as much of a payout and that's your choice, sometimes life is unfair and you get screwed over.

And as it's described to me, I almost immediately start thinking of Gangs of New York and axes. It's such a total departure from anything remotely civil that I can only imagine this leading to a bunch of heads being chopped off. But hey, there's insurance for that too I guess.

Quoting Rothbard, who is the Marx of anarcho-capitalism:

> To be opposed to the state is then not necessarily to be opposed to services that have often been linked with it; to be opposed to the state does not necessarily imply that we must be opposed to police protection, courts, arbitration, the minting of money, postal service, or roads and highways. Some anarchists have indeed been opposed to police and to all physical coercion in defense of person and property, but this is not inherent in and is fundamentally irrelevant to the anarchist position, which is precisely marked by opposition to all physical coercion invasive of, or aggressing against, person and property.


> An important point to remember is that any society, be it statist or anarchist, has to have some way of resolving disputes that will gain a majority consensus in society. There would be no need for courts or arbitrators if everyone were omniscient and knew instantaneously which persons were guilty of any given crime or violation of contract. Since none of us is omniscient, there has to be some method of deciding who is the criminal or lawbreaker which will gain legitimacy; in short, whose decision will be accepted by the great majority of the public.


(Note: not defending this stuff, just pointing it out for sake of discussion).

Elsewhere someone pointed out the book "Anarchy, State, and Utopia" which has a better overview of what libertarians believe in. Which is a "night-watchman" state, a minimalist government which includes courts, police, and border control.


Yeah if anything lawsuits/courts are central to their argument, that the owners of property will use legal recourse to settle damages to them, and that mechanism serves most of the functions of gov't regulation.

No, but there are libertarians who believe that risk mitigation should never be required up front, no matter how unable the party at fault would be to restore whatever was lost. (Note: Some things cannot be restored at all; lives and disability being the obvious cases)

The point of government is to mitigate external and unacceptable risks, and we have grown this system based on experience over hundreds of years. Some super free market types seem to argue that we should throw all that away and then institute systems that, over time, will just reinvent the same things. My guess is they believe they will personally come out on top during the reset period through whatever strength/privilege they inhabit.

In Robert Nozick's Anarchy, State, and Utopia he argues the minimal state would be "limited to the narrow functions of protection against force, theft, fraud, enforcement of contracts, and so on." I'd imagine most libertarians would agree basic contract laws and some level of lawsuits should be acceptable to a functioning society.

There are many who view contracts as a holy right but would also allow you to sign away anything in a contract – so if you accepted a EULA which makes it hard to sue, it's your fault. There's a certain consistency to that position but it completely ignores the scale of the power differential.

I think "free market types" understand and accept that the world is a messy place, and that human organizations cannot generally be trusted to consistently do what they claim. ("Your information is safe with us", etc.) I believe that this is due to fundamental properties of human nature and group psychology.

I believe that there is no general way to create near-perfect accountability for the statements of people, and that the second-best option is to embrace the uncertainty and develop a finely-honed sense of risk assessment.

It sounds like this is the underlying concern -- if you are unable to appropriately assess risk, and you place your trust in an organization that then betrays your trust, you feel violated and want to work toward preventing that sense of violation in the future.

You could do that by working to hold organizations accountable through regulation or other means.

Or you could improve your ability to assess risk, and consciously and deliberately accept risks as they come. When the inevitable adverse event happens, you understand that you consciously accepted a risk in the past, and appreciate the opportunity to refine your own personal ability to evaluate risk.

So, I don't agree that this is something that needs to "get fixed". I fully expect that my basic personal information is poorly secured, and I consciously accept that in exchange for the benefits of participating in our current society and using the current services offered.

It's easy to say that "we need better information security", but every decision has a tradeoff. Increasing security fundamentally increases costs, slows the flow of information, and creates less nimble organizations. Many of the services you expect to be available -- Uber, cheap IoT devices, whatever -- may simply not exist in a world where a "proper risk assessment" and "tangible cost" of information breaches are applied.

You, of course, are free to spend your time educating the public about information security, or how corporations can't be trusted, or even lobbying for information security regulation. That's what makes Earth fun and interesting -- everyone's following their own passions!

You're building a straw-man. Before trying to argue against a position you should try to understand it.

On a similar note, if you're going to tell someone they're wrong about something, you owe it to them to explain why. Otherwise they won't (nor should they) take your feedback seriously.

I did. I said the person was building a straw-man and can solve it by understanding the topic more. That means research free market capitalism then present an argument.

IMO if you're going to tell someone they're attacking a strawman, you should point out specifically what they've got wrong about their opponent's argument. Otherwise any random person in the world can simply say, "Strawman, read more plz".

> there is no such thing as insurance for this type of breach

There is insurance. Some of the breached accounts likely contained credit card info, and any exploits are covered by existing insurance.

Some of the accounts might contain embarrassing emails, but few people have insurance against hacked disclosure of that sort.

Things like this happen because the public doesn't care. Most people who use Yahoo for email are computer illiterate and are more likely to do things that lead to a breach. They are also more likely to quit using the service if things like 2FA are required.

So in a sense, the invisible hand delivered the service that was demanded, and now the market can correct. Some percentage of the users impacted will learn from the experience and demand higher quality email hosting in the future.

> this gets fixed

This being what exactly? What threat model are you talking about? Because it sounds like you think that a typical user account used for porn, social sites and hobbies is somehow worth protecting to the extent of government involvement.

Devil's advocate, because I'm not a free market type: Can we sue companies for data breaches? This could solve the issue with large companies but not fly-by-night operations.

Well maybe you can tell me how you get phished without being retarded? In all my experiences phishing schemes are like bit.ly links that have a form that requires the user to enter in all their data. If you fall for something like that, i don't have much sympathy tbh

You're demanding accountability from insane people.

I learned long ago, never to wrestle with a pig. You get dirty, and besides, the pig likes it. Read more at: https://www.brainyquote.com/quotes/quotes/g/georgebern137450...

Ironically, you're quoting George Bernard Shaw who believed in eugenics, anti-vaccination, and "expressed admiration for both Mussolini and Stalin".

I'd be sufficiently curious to find a single person who fits the profile of the ideologue the OP is referring to. I'm afraid such a person doesn't exist.

Plain md5 again. Nice.

Oh no! All of those free offers could be stolen from me!

"passwords hashed with MD5" Jesus seriously?

Sorry, there's no shielding Marissa Mayer from this. Yes, she had only been there a year or so. But that's long enough that she should have been on top of security. Yes, she's just killing time until she leaves now anyway. But the symbolic statement is still important - she should resign.

> Yes, she had only been there a year or so

Uh, it's been 4 years... I know, time flies.

GP means that she had only been there a year or so when the leak happened.

I guess this is the final nail in the coffin...

Easier to list who was not affected??!!

This Yahoo company seems pretty cavalier.

Thank goodness I use gmail?

> MD5 hash


This occurred in 2013.

When did they know?

MD5 hash? Jesus...


They can have 17 years of junk mail as that's all I ever used yahoo for


We detached this subthread from https://news.ycombinator.com/item?id=13180911 and marked it off-topic.

Yes, we've had issues like this when hiring offshore workers located in the US.


Yes, 'offshore' is a race. Every critique is a racism.

Coming from said 'offshore' (at least in regards to the US), I see that 'quality' people work remotely for monies comparable to the onsite workers, launch startups, et cetera. If you outsource to the offshore for the costs, guess what, you get lesser quality for the said cost. Nothing racist in that, but I understand your position - a SJW to every household!

Yes. We pay an independent Indian contractor rates that are fully competitive with any American contractor's rates. If someone has the same skillset as the competition, their rate is going to be in the same ballpark, no matter where they live.

If your statement is accurate, why offshore the work?

He came recommended by someone who had used him before. His country of residence was not relevant to our hiring decision.

I wish I could upvote this thread 100x. Everyone should have to deal with global competition and everyone competitive should have commensurate compensation. Unfortunately there are barriers to that (informational and transactional).

I also think everyone uncompetitive should get basic income.

I've been hearing a lot about basic income lately. What I wonder is: how much should a person receive?

e.g. if a competent, competitive person receives X per year, should basic income be 0.1X, 0.25X, 0.90X, etc?

Also, would competent people working remotely from a different country be taxed to contribute to the basic income fund?

I'm sure they meant to say xenophobic.

Yeah, just keep throwing those out there...one is bound to stick, right?

If you're a priori judging the quality of someone's work based exclusively on their nation of origin, that falls within the dictionary definition of xenophobia.

You can choose to not like the connotations of the word, but that's just your choice. Definitions of words are real.

race is a bogus concept anyway, "racism" is the everyday term for xenophobia. give it a break.

Race is not a bogus concept. It may be cultural but it's still real.

Offshore is not a race. That's just xenophobia.

There is actually no such thing as different human races (in the biological sense of the word). People who believe that are... you guessed it... called racists.

I honestly doubt that many people on this forum hold that opinion. I really hope not, anyway.

I also haven't heard any good conversations about race that deny it. If you accept the existence of racism, you should accept the evident existence of race.

I think the other-ness of offshore labor has parallels to many things in the history of racism—namely, the exploitation of xenophobia and tribalism to justify cheap labor, explicit or implicit. There's also many different aspects: the xenophobia is nationalistic, not phenotypic; offshoring does not imply a poor wage or reduced quality of life for the local economy, just compared to the exploiting economy; we don't see (much) moralistic justification for offshoring.

My point being, this is a complex subject and you're not contributing much by denying race itself. There are easier ways to dispel notions you disagree with than alienating others in the conversation by denying them.

> If you accept the existence of racism, you should accept the evident existence of race.

This seems to be a non sequitur to me. I can accept that people exist who divide humanity into X number of races based on some perceived fundamental differences and treat those groups differently without accepting that their divisions are valid and in practice real.

And in practice, race doesn't exist. That is, humans don't fit nicely into the given "race" boxes. For example, North Africans don't look much like Middle Africans, so are they part of the "black" race? How about native Malays, are they Asian? Or Indians, are they black, or a separate race? What about Pacific Islanders, are they black or Asian or a fourth race? Are Mediterraneans as "white" as Scandinavians? How about Bangladeshis, are they Asian or black or part of the "Indian" race if that exists? Are Arabs white or a different race or black? What about people who have grandparents who are white, black, Asian, and Middle Eastern? What race are they?

In practice, at best you can divide into broad familial groups typically centered in countries, which results in hundreds of ethnic groupings which could hardly be called "race" in the way it is commonly used. And those ethnic groupings are the result of large amounts of intermarriage and continue to intermarry, because humanity is a big mess of DNA originating from the same source, the first of the human race. The only human race that exists is humanity itself.

No, race is just unscientific. When we talk about race we talk about it sociologically.

... "offshore workers" are not a race. You realize offshore workers could be the same race as the person posting right?

This is technically true, which is the best kind of true. Swap in nationalist, chauvinist. It's utterly clear what was meant by racism above, in spite of your semantics.

I want to write and name a macro after you! Anytime I want, I can just append -obe or -ist to any word in any document I'm writing...regardless of context or meaning! As a bonus, it will recognize file-extension-neutral files (all 822 of them!).

Yes we understand, racist is the liberal catch-all for anyone you disagree with. The problem is that it dilutes the value of the word, so when someone deserves to be called racist, people don't know if they really are racist, or are just opposed to outsourcing jobs.

Racist is the laziest argument ever and it's diluting the word. If people were more accurate about describing problems, their criticisms wouldn't be dismissed so easily.

A perfect example is Donald Trump. Every argument about Trump became "he's a racist!!11", which caused people to tune out or focus on the wrong problems with him! There were a lot of negatives to Trump, and yet everyone just said "Racist, boom I don't need to argue any more!"

People started ignoring completely legit arguments because the racist angle was so overplayed. I'm sick of hearing racist as the end-all be-all argument. It doesn't work anymore, the card is overplayed. The only place it still works is in liberal bubbles. (Note I'm socially liberal, fiscal conservative moderate, no dog in this race)

That's true. That's why there's also the umbrella term, "bigoted", when referring to bias stemming from ignorant prejudice. The watering down of "racist" to equal all forms of bigot is indeed a problem, and you have brought up a valid concern.

I don't know whether sqldba was suggesting something racist, but I would say it is not necessary to identify a particular race to hint at a racist viewpoint.

When Ronald Reagan said "welfare queen", did he mean to imply "black woman on welfare"? If I write, "Middle-eastern people are lazy" do I mean to imply "Arabs are lazy - but not necessarily Jews"?

I also wonder whether you can have a racist view of your own race. For example, a stereotype is that black people are inherently bad at math - are there any blacks that believe this?

Yes. You can be racist against your own race.

Given the context, you know damn well what he means - it's coded speech and I refuse to believe that you're that naive.

You claim to be able to accurately discern intent from plaintext on the internet? If true you're wasted anywhere outside the justice system.

While some offshore workers might not be MIT grads (/s) like you, branding them all as an incompetent group is neither fair nor correct. If offshoring didn't provide tangible value to the US IT industry it would've been shut down a while ago.

The tangible benefit is working for less money.

It is impossible for US corporations to do what they do without offshoring unless migration policy is reformed; you can't have it both ways, guys. And "offshore quality is bad" is BS 99% of the time: when you really analyze it, the work is deliberately set up to come out as bad quality.


Anyone up for trying to get a corporate death penalty law on the books?

We should at least be able to execute them in Texas.
