Scribd hacked – emails and hashed passwords leaked (scribd.com)
106 points by joosters on April 4, 2013 | 65 comments

Every single time this happens, I immediately wonder: "what was the hashing scheme?"

Like many others before it, Scribd disappoints by not addressing this question. Instead we get this:

Even though this information was accessed, the passwords stored by Scribd are encrypted (in technical terms, they are salted and hashed).

How long was the salt? AFAIK, MD5 hashes with an insufficiently long salt can be brute-forced with open-source CUDA setups.

Further, how did they determine the following?

Most of our users were therefore unaffected by this; however, our analysis shows that a small percentage may have had their passwords compromised.

We use scrypt for password hashing. It is a modern, hard-to-crack password hashing algorithm.

We do have database access logs, so it was pretty straightforward to identify which users were affected.

You should add this. Savvy people will be positively surprised to see a company actually caring about doing password authentication right.

That is awesome. You should feel comfortable telling people this; it puts you way ahead of the game.

Thanks for clarifying, good to see you're using a decent hashing algorithm :)

I'm still a little unsure of how you are able to know some users had their password compromised. Is it a simple case of finding successful login attempts from the same IP address as the attack?

Compromised != Hacked. To clarify: no accounts were accessed by the hackers, but a small number of account records had passwords encrypted with an outdated algorithm (basically SHA1 + salt), so we preemptively reset their passwords and sent out emails to all affected users.

This is how we define "compromised" - people whose passwords were hashed with the old algorithm, which is relatively easy to crack.

This seems to imply that many of (all?) the emails/encrypted passwords were leaked, but you don't consider most of them "compromised"...

I'd like to echo this concern -- were all emails/encrypted passwords leaked, but you only consider those protected by outdated hashing schemes to be compromised?

If so, I feel you have an obligation to alert ALL of your users.

Additional question: when did users first alert you to the hack?

For the future, I wonder how useful it would be to run old hashed passwords through a newer system such as scrypt. This way those users who haven't logged in in a while could also benefit from the safer hashed passwords.

    scrypt(hmac_sha1(password, salt), salt, cpumemargs)

In the future, you could even do it again with more cpu and memory requirements for scrypt, upgrading older users' hashes again with another run of scrypt.
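
A minimal Python sketch of that wrapping idea, under some loud assumptions: the legacy scheme is taken to be HMAC-SHA1 keyed with the salt (the one-liner above doesn't say which argument is the key), the function names are made up for illustration, and the scrypt cost parameters are placeholders rather than anything Scribd has stated.

```python
import hashlib
import hmac

# Illustrative scrypt cost parameters (assumptions, not Scribd's real settings).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}


def legacy_hash(password: bytes, salt: bytes) -> bytes:
    # Assumed shape of the old scheme: HMAC-SHA1 keyed with the salt.
    return hmac.new(salt, password, hashlib.sha1).digest()


def wrap_legacy_hash(old_hash: bytes, salt: bytes) -> bytes:
    # Upgrade step: needs only the stored legacy hash, never the plaintext,
    # so it can be run over accounts that have not logged in for years.
    return hashlib.scrypt(old_hash, salt=salt, **SCRYPT_PARAMS)


def verify_wrapped(password: bytes, salt: bytes, stored: bytes) -> bool:
    # At login, recompute the legacy hash first, then the scrypt wrapper.
    candidate = wrap_legacy_hash(legacy_hash(password, salt), salt)
    return hmac.compare_digest(candidate, stored)
```

The point of the design is that `wrap_legacy_hash` takes the old digest as input, so the whole table can be upgraded offline in one pass.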

that is a weird definition of compromised.

is it true that all, or greater than 1% of, emails and hashes were dumped?

I find it hard to believe you migrated 99% of passwords to a new scheme. I've never seen over 60%, and that is with a lot of prompting to users (and as a Scribd user I've never been prompted).

The migration can be transparent, since the app has your plaintext password when you log in.

Alternatively, stored passwords can be upgraded by applying the new scheme to the old hashed password and recording that this is how the password should be checked in the future.

Since not everyone was migrated, I'm assuming they went the first way.
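
For anyone wondering what the "first way" looks like, here is a rough Python sketch of upgrading on login. Everything in it is hypothetical: the `PasswordRecord` layout, the scheme labels, and the cost parameters are assumptions for illustration, not Scribd's actual code.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class PasswordRecord:
    scheme: str   # hypothetical labels: "sha1-salt" (legacy) or "scrypt"
    salt: bytes
    digest: bytes


def _sha1_salt(password: bytes, salt: bytes) -> bytes:
    return hashlib.sha1(salt + password).digest()


def _scrypt(password: bytes, salt: bytes) -> bytes:
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)


SCHEMES = {"sha1-salt": _sha1_salt, "scrypt": _scrypt}


def login(record: PasswordRecord, password: bytes) -> bool:
    # Check the password with whatever scheme the record says it uses.
    candidate = SCHEMES[record.scheme](password, record.salt)
    if not hmac.compare_digest(candidate, record.digest):
        return False
    if record.scheme != "scrypt":
        # The plaintext is in hand right now, so re-hash it with the new
        # scheme and a fresh salt, then update the record in place.
        record.salt = os.urandom(16)
        record.digest = _scrypt(password, record.salt)
        record.scheme = "scrypt"
    return True
```

Accounts that never log in never pass through `login`, which is why some records would still be on the old scheme.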

http://www.scribd.com/password/check Thank you for this. Now I can run a list of emails against it to see who has a Scribd account.

I just put in a bunch of fake email addresses and they all returned with "Good news - your password has not been compromised." I think the only confirmation that you'd get of an existing account is if the password was compromised.

They can modify it to simply say whether your account was compromised, regardless of whether you have an account (ie, if no account -> not compromised).
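
Roughly, in Python (a sketch only; the function name, the `accounts` mapping, and the wording of the "compromised" message are invented for illustration):

```python
from typing import Mapping

NOT_COMPROMISED = "Good news - your password has not been compromised."
COMPROMISED = "Your password was among those compromised."


def check_message(email: str, accounts: Mapping[str, bool]) -> str:
    """accounts maps a known email to True if its hash was in the weak set."""
    # Unknown emails default to False, so they get exactly the same answer
    # as existing accounts whose passwords were not compromised.
    if accounts.get(email.strip().lower(), False):
        return COMPROMISED
    return NOT_COMPROMISED
```

A "compromised" answer still confirms that the account exists, so this only closes the enumeration hole for the unaffected majority.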

...Which they ought to do. Offering the ability to enumerate user accounts is unlikely to be the immediate goal of this utility, but it's an effect nonetheless.

30 minutes later and it's fixed. Entering an invalid email also results in a "this email was not compromised" message.

That's what they're doing. "aijaspijasohisaho@asoihdshohdusudhs.com" gets a message saying that that account wasn't compromised.

That's good to hear. As a future suggestion to anyone else who finds themselves in this unfortunate situation - including some technical granularity in your press release can go miles in offering reassurance to your technical audience/users.

Why? Honestly asking: what difference does this have on the end result? Now that you know they are using scrypt, how will that impact your actions?

You could say that this has a bearing on whether you continue to use the service, but if that were the case, wouldn't it be better to suggest that all services provide this information up front?

You will not successfully maintain positive customer relationships by boiling all customer interactions down to questions like "how will that impact your actions [right now]?" Relationships are a string of positive and negative experiences that must be carefully curated.

The decision to remain in a relationship is rarely a singular event (related to a singular experience). You could think of it more as the cumulative result of all relationship experiences. Even the best relationships involve some negative experiences, but the important part is making sure those negative experiences are mitigated as best as possible. Customers will give more leeway to vendors with whom they have a strong NET positive relationship.

There are two important technical points that could have been included to great effect:

1) That they store the encryption scheme with the password record so that they can upgrade their crypto incrementally.

2) That their most recent auth algorithm uses scrypt.

So how do these two points directly impact the mitigation of what is otherwise a negative experience? First up we should look at users who will understand what points 1 & 2 mean. These users will respond positively to these items, because it changes the conversation from "Scribd just got h4x'd" to "Hey, at least they had good crypto in place."

The next tier of users will come along, read these comments, and feel more confident that the community of knowledgeable people around them is feeling OK about this, so they should too.

As to the question of, "wouldn't it be better to suggest that all services provide this information up front?" I would say yes, it would. This action is not mutually exclusive of including technical details in this communication though.

>Now that you know they are using scrypt, how will that impact your actions?

For one, I'm much less annoyed/pissed off at them now that I know they use scrypt. I'm not about to cancel my account and never use them again. And I'm not freaking out about whether my email and password have been added to a botnet cracking script running against every other website out there.

I've gotten so accustomed to hearing of companies using MD5 + salt and thinking that's secure, that it is a pleasant surprise to find one using bcrypt, and downright mindblowing to find one using scrypt. Yes, my expectations are low.

>wouldn't it be better to suggest that all services provide this information up front?

Yes, absolutely.

If I'm understanding kpumuk's comment elsewhere in the thread[1], if you got notified/test positive on their check page[2], then you are at risk if you've reused those credentials, since they were grandfathered hashes with weak protection.

> [...] but a small number of account records had passwords encrypted with an outdated algorithm (basically SHA1 + salt), so we preemptively reset their passwords and sent out emails to all affected users.

> This is how we define "compromised" - people whose passwords were hashed with the old algorithm, which is relatively easy to crack.

I came up positive on the check, which does make sense since I signed up a long time ago and rarely if ever sign in, so they wouldn't have had the opportunity to upgrade my hash after moving to better schemes.

Happily it was a one-time/throwaway password, but it's a bit scary that it's the first list (that I'm aware of) I'm actually on.

[1] https://news.ycombinator.com/item?id=5493536

[2] http://www.scribd.com/password/check

So what do you do past this point? I know you can probably rough out how much time it would take to find hash collisions and ask your users to change their passwords before that amount of time elapses, but past that point, can't you no longer assume that it's the actual user logging in to change their password?

We performed a forced password reset on the users with compromised hashes. The old password will not work on Scribd, and those users will need to go through the password reset flow to regain access.

Ah ok. I was wondering how you verified the users' identity if the password was compromised. That makes sense, thanks!

We have reset passwords for all affected users. Hashes that got leaked are not useful now.

Well, that assumes people aren't reusing those passwords.

Salts make cracking a list of N password hashes take roughly N times as long, but if a password is cracked anyway (because it's common and/or because the hash is not using very many rounds, or because an attacker only cares about one particular account), and the password is reused elsewhere, the fact that it was salted doesn't matter anymore.

GP is right; if owners of the leaked accounts [email, hash] pairs are reusing passwords, the leaked hashes are potentially useful even though scribd has reset them. They're simply not useful for logging in to scribd.

yeah if you simply know what a password is, of course it's compromised, but you're not supposed to easily break a salted hashed password.

Salts only really protect against rainbow tables; if the attacker is willing to use a dictionary or brute force attack against a single password, they're not of much use.

Can I suggest you also share what parameters you use with scrypt? Scrypt is parametric and you can choose weak or strong choices for parameters depending on how long you want to spend validating passwords.
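
To make "parametric" concrete, here is a small Python sketch. The memory estimate uses the usual rule of thumb that scrypt needs roughly 128 * N * r bytes; the helper names and the parameter values tried are my own picks for illustration, not anything Scribd has published.

```python
import hashlib
import time


def scrypt_memory_bytes(n: int, r: int) -> int:
    # Dominant memory cost of scrypt: roughly 128 * N * r bytes.
    return 128 * n * r


def time_scrypt(n: int, r: int = 8, p: int = 1) -> float:
    # Time a single scrypt evaluation with the given cost parameters.
    start = time.perf_counter()
    hashlib.scrypt(b"correct horse", salt=b"0123456789abcdef", n=n, r=r, p=p,
                   maxmem=2 * scrypt_memory_bytes(n, r) + 1024 * 1024, dklen=32)
    return time.perf_counter() - start


for log_n in (10, 12, 14):
    n = 2 ** log_n
    kib = scrypt_memory_bytes(n, r=8) // 1024
    print(f"N=2^{log_n}: ~{kib} KiB per hash, {time_scrypt(n) * 1000:.1f} ms")
```

Each step up in N multiplies both the memory and the time an attacker must spend per guess, which is why the chosen values matter as much as the algorithm name.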

Back when Evernote was hacked, I got the idea of creating a draft of the kind of response I would prefer in a case like this.

I also intended to write a simple website script that could generate a statement. Things came up, and the gist has gathered dust for a while.

The gist is available here with some example cases listed that companies can learn from, and people are free to provide feedback or spin it off:


Post your most important feedback in its comments, so other companies reading the gist see it as well.

Companies definitely need to be prepared for full disclosure in the event of a security breach.

»What hashing scheme do you use?« does not matter for most users.

Most users use weak passwords, and a substantial share of these passwords is easy to recover using a dictionary attack. It does not really matter if you use MD5, SHA1, SHA2, HMAC, PBKDF2, bcrypt, scrypt or whatever, nor does it matter if you use no salt, the same salt for all users or a unique salt per user. Even for PBKDF2, bcrypt and scrypt the cost factor will - for practical reasons - usually not be large enough to mitigate dictionary attacks using a few thousand of the most common passwords. Therefore weak passwords are compromised regardless of the hashing scheme used. And because users with weak passwords in particular tend to reuse the password for different accounts, many other accounts are compromised, too.

A user caring about security will not reuse passwords for different accounts and this alone reduces the impact of the event by a huge amount. Further a strong password alone makes it very unlikely that attackers will recover the password even if only unsalted MD5 is used for hashing. Therefore - unless the password is stored in plain text - it is highly unlikely that an attacker will be able to access an account protected by a strong password.

I definitely don't want to argue that using unsalted MD5 is okay - it is not - but for the average user the difference between a weak and a strong hashing scheme is not as large as one would naively expect. Strong hashing schemes will especially protect users using infrequent dictionary words or medium length hard passwords because the additional computation power required to perform a dictionary or brute force attack will force the attackers to use smaller dictionaries and shorter passwords.

Finally, storing passwords may benefit from security through obscurity. If the attacker is unable to figure out the hashing scheme used, he will be unable to perform a dictionary or brute force attack. This does not mean everyone should come up with their own hashing scheme - this would do MUCH more harm than good - but, for example, using an unknown random cost factor - 294,897 instead of 300,000 - and keeping it secret, or adding a second secret salt buried deep in the code to the salt stored together with the username and hash, will make it quite a bit harder for the attacker to perform an attack unless they got the information from an insider or were able to steal your code or binaries.

Using an in-code secret key (as opposed to the not-secret-by-necessity salt) is commonly referred to as a pepper. It improves security in the cases when an attacker has access to your database but not to your code or filesystem.
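
A minimal Python sketch of one way to combine the two, assuming the pepper is mixed in with HMAC-SHA256 before a salted scrypt. That construction, the `PEPPER` constant, and the cost parameters are illustrative choices of mine, not a standard and not what any particular site does.

```python
import hashlib
import hmac

# Hypothetical pepper. In practice it would live in code or configuration,
# outside the database that holds the salts and hashes.
PEPPER = b"example-pepper-not-a-real-secret"


def hash_password(password: bytes, salt: bytes, pepper: bytes = PEPPER) -> bytes:
    # Mix in the pepper with HMAC rather than ad hoc concatenation,
    # then apply the slow, salted hash to the result.
    peppered = hmac.new(pepper, password, hashlib.sha256).digest()
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)
```

An attacker holding only the database rows has the salt and the digest but cannot test guesses without also recovering the pepper.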

Figuring out the hashing scheme used for a given hash is frequently trivial. All an attacker needs to do is hijack his own hashed password and salt and then run combinations of common hashes with salting patterns until he gets a hit. This is going to be hundreds of combinations to test on the high end, and will generally yield results very easily.
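
As a rough illustration in Python, with a deliberately tiny, made-up list of algorithms and salting patterns (a real attempt would cover many more combinations):

```python
import hashlib

# A small, illustrative set of candidates.
ALGORITHMS = ("md5", "sha1", "sha256")
PATTERNS = {
    "salt+password": lambda pw, salt: salt + pw,
    "password+salt": lambda pw, salt: pw + salt,
    "password only": lambda pw, salt: pw,
}


def guess_scheme(password: bytes, salt: bytes, stored_hex: str):
    """Return every (algorithm, pattern) pair that reproduces the stored hash
    for an account whose password the attacker already knows (their own)."""
    matches = []
    for algo in ALGORITHMS:
        for name, combine in PATTERNS.items():
            digest = hashlib.new(algo, combine(password, salt)).hexdigest()
            if digest == stored_hex.lower():
                matches.append((algo, name))
    return matches
```

With three algorithms and three patterns that is nine hash computations, which shows how little an undisclosed but conventional scheme slows anyone down.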

> It does not really matter if you use ... a unique salt per user.

I agree with the other points that you make on this aspect, but I do not quite understand how this particular point (quoted above) can be true. If you use a strong unique salt for each user's password, then you are padding the length of the actual password hashed and thereby effectively reducing the possibility of a successful dictionary attack to virtually zero. If this is so, then how could one mount a successful dictionary attack?

It is common to store per-user salts together with the hashes, often even as a single string formed by concatenating the salt and the hash. Therefore getting hold of the hashes usually means getting hold of the salts, too.
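
A small Python sketch of what that looks like. The `salt$hash` layout and the salted SHA-1 are hypothetical stand-ins for whatever a given site actually stores.

```python
import hashlib


def encode_record(password: bytes, salt: bytes) -> str:
    # Store the salt and the hash side by side in one string.
    digest = hashlib.sha1(salt + password).hexdigest()
    return f"{salt.hex()}${digest}"


def split_record(record: str):
    # Anyone who obtains the record can trivially recover the salt.
    salt_hex, digest = record.split("$", 1)
    return bytes.fromhex(salt_hex), digest


def guess_matches(record: str, guess: bytes) -> bool:
    # With the salt recovered, a dictionary word is as easy to test as
    # it would be against an unsalted hash.
    salt, digest = split_record(record)
    return hashlib.sha1(salt + guess).hexdigest() == digest
```

So the salt adds nothing to the effective length of the password from the point of view of someone holding the leaked record.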

But the other case I mentioned - using the same salt buried deep in your code for all users (what is called a pepper as I learned recently) - will do what you describe until the attacker is able to figure out the pepper used by either stealing the code or brute forcing it.

Finally, note that using only a pepper is not a good idea, and even when combined with a salt it needs some careful thought. Using only a pepper will yield equal hashes for equal passwords, while using a unique per-user salt will avoid this. The other problem is that with a pepper you are reusing the same secret for each user. Therefore an attacker has thousands or even millions of samples and may be able to extract information if the scheme is not designed carefully. Combining password, salt and pepper must essentially avoid the same pitfalls as keyed hash functions when combining the key and the message. See for example the design principles behind HMAC [1].

[1] http://en.wikipedia.org/wiki/Hash-based_message_authenticati...

> How long was the salt? AFAIK, MD5 hashes with an insufficiently long salt can be brute-forced with open-source CUDA setups.

Beyond 16 bits or so, the length of the salt has little impact on security; below that, it's still feasible to generate rainbow tables for every possible salt.

If you're storing plain hashes, it doesn't really matter whether it's MD5, SHA-1 or SHA-256 - the work required for a brute-force attack is largely the same. The next step up would be using a key stretching algorithm like PBKDF2 or bcrypt.
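
To put a rough number on the difference, here is a Python sketch comparing a single salted SHA-256 with PBKDF2-HMAC-SHA256. The iteration count and helper names are arbitrary picks for illustration, and the timings will vary by machine.

```python
import hashlib
import time


def plain_hash(password: bytes, salt: bytes) -> bytes:
    # One pass of a fast hash: cheap for the site, cheap for the attacker.
    return hashlib.sha256(salt + password).digest()


def stretched_hash(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    # Key stretching: the same kind of primitive, iterated many times.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


def seconds_per_guess(fn, password: bytes, salt: bytes, repeats: int = 3) -> float:
    # Average wall-clock cost of testing one candidate password.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(password, salt)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    salt = b"0123456789abcdef"
    print("plain    :", seconds_per_guess(plain_hash, b"hunter2", salt))
    print("stretched:", seconds_per_guess(stretched_hash, b"hunter2", salt))
```

Swapping MD5 for SHA-256 in `plain_hash` barely changes the first number, while the iteration count scales the second one directly.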

Why do you want to know the hashing scheme? Isn't it better if nobody knows? :)


kpumuk and others, please note that http://www.scribd.com/password/check leaks information about whether or not a particular email address is in your system. It's hardly a critical weakness or anything, but just an FYI and something you may wish to reevaluate.

Edit: thanks for the quick fix! I would suggest a compromise: change the current message from the first version below to the second.

"Good news - your password was not among those compromised. You do not need to take any action at this time."


"Good news - if this is the email your Scribd account is associated with, your password was not among those compromised. You do not need to take any action at this time."

Small difference, but the lack of definiteness to the response will be enough to make me think of double-checking my email (in my case, it was an email address I haven't used in years - I'm one of the first Scribd users!)

Er, what? Every single site I can think of 'leaks' this information through their registration page.

Try and sign up for a new Scribd account with an existing email:

"That email address is already taken; please choose another one"

Try to sign up for a Facebook account with an existing email:

"Sorry, it looks like somebody@somewhere.com belongs to an existing account"

Doesn't really matter, the password reset page leaks the same info and has presumably been up forever:

There is no account registered to mail 'abc@smackfu.com'.

And that's a lot harder to fix.

Why is that harder to fix? Many sites handle it properly:

"If there was an account associated with that email address we have emailed it with instructions."

Since people log in with a username, not an email, they may not know which email address they associated with the account. Especially if they used a one-off to avoid spam. Personally, I prefer systems that do a reset to email after you provide a username, since usernames are often public / verifiable anyways.

Thank you for your suggestion. Not a weakness anymore :)

This seems to have become a common occurrence. Just curious about a couple of things: a) How easy/hard is it for the hashed passwords to be cracked by the hacker? b) How are hackers getting access to emails and password data so often? You would think these large sites have enough layers of security to prevent this from happening. Is it social engineering or common loopholes in these systems?

a) Depends on what hashing algorithm is used (and whether it incorporates a salt). http://hashcat.net/oclhashcat-plus/#performance

b) Because it's hard to secure a corporate infrastructure, (i.e., making a good set of usability/security trade-offs), and there's no such thing as perfect security. A web application that's been audited can still have a security flaw that enables disclosure of authentication data. Or an employee's machine might be compromised, leading to a compromise of corporate infrastructure or data that he/she is able to access.

Does the reference to similar incidents this year suggest this was a watering hole attack targeting a Java vulnerability? It would be nice to know, just generally, what vectors were used here to the extent it was anything novel.

>Even though this information was accessed, the passwords stored by Scribd are encrypted (in technical terms, they are salted and hashed). Most of our users were therefore unaffected by this; however, our analysis shows that a small percentage may have had their passwords compromised.

Could someone explain to me what this means: "our analysis shows that a small percentage may have had their passwords compromised"?

Do you think this refers to a statistical analysis showing that _potentially_ a relevant percentage of passwords could have been completely decrypted?

I mean, is this warning just the outcome of a statistical analysis on the possibility that passwords could have been decrypted?

Or are they just referring to the fact that a small percentage of hashed & salted passwords have leaked?

I have gone through the "check your email" form. And this was the result:

>We're very sorry to tell you that your Scribd password was among those compromised. If you have used this password on any other services, you should change it immediately.

Does "compromised" here mean leaked, or potentially decrypted based on some sort of statistical analysis they made?

Thankfully this doesn't read like the typical large corporation email: "...the attacker was able to gain access to all the passwords, which were stored as plaintext in the database." It's sad to have to say it, but kudos to Scribd for actually storing passwords the way they should be stored.

> (...) corporation email (...)

And even government agencies:

"UK intelligence agency stores passwords in plain text" http://www.zdnet.com/uk-intelligence-agency-stores-passwords...


Excuse me for my ignorance if incorrect, but if a unique salt is used for each user, and the salts were not compromised, would it then not be possible for the passwords to be cracked no matter what encryption is used?

My post was meant as a reply to the comment by psycr and I just moved it there.

My assumption, which is very often valid, is that unique per-user salts are stored together with the username and hash. Distributing this information across different systems will make it harder for attackers, but such schemes are not very common. There is also the risk that the weakness that enables an attacker to compromise one part of the information will also enable them to compromise the other part(s). Therefore it is probably a good idea to use systems as different as possible to store the different parts, for example two different database systems from different vendors.

Unique salts have to be stored somewhere. A common practice is to just use another piece of information associated with their account. The purpose of a salt is to make various bruteforcing attacks difficult. It doesn't do much if someone has full access to a system.

> Unique salts have to be stored somewhere. A common practice is to just use another piece of information associated with their account. The purpose of a salt is to make various bruteforcing attacks difficult.

All absolutely true.

> It doesn't do much if someone has full access to a system.

Not true. Without salt they can try passwords and if the hash matches ANY in the system they know the password for those accounts. With a big dictionary of likely passwords (or just normal words) many passwords will be discovered very quickly.

With salt you have to try the password dictionary against EACH user (actually each salt value but they should be unique). This makes discovering passwords harder by a factor of the number of users. Yes you can pick any user and run the dictionary and have a good chance of finding the password but you have to expend large computing resource for little reward (possibly worth it to break a bank account but not the average web app).
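
Here is a toy Python sketch that counts hash computations in both cases. It uses plain SHA-1 and made-up function names purely to show the scaling, and it skips early exits so the counts stay easy to read.

```python
import hashlib


def crack_unsalted(hashes: dict, dictionary: list):
    """hashes maps user -> SHA-1 hex digest of the bare password."""
    by_digest = {}
    for user, digest in hashes.items():
        by_digest.setdefault(digest, []).append(user)
    cracked, work = {}, 0
    for word in dictionary:
        work += 1  # one hash per word, checked against every user at once
        digest = hashlib.sha1(word.encode()).hexdigest()
        for user in by_digest.get(digest, []):
            cracked[user] = word
    return cracked, work


def crack_salted(records: dict, dictionary: list):
    """records maps user -> (salt bytes, SHA-1 hex digest of salt + password)."""
    cracked, work = {}, 0
    for user, (salt, digest) in records.items():
        for word in dictionary:
            work += 1  # every word must be re-hashed for every salt
            if hashlib.sha1(salt + word.encode()).hexdigest() == digest:
                cracked[user] = word
    return cracked, work
```

With U users and D dictionary words, the unsalted attack costs D hashes and the salted one costs U * D, yet any single targeted user still costs only D.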

The only real advantage you gain from a salt these days is that a precompiled database of hashes can't be used against you. Whether it be from someone's personal collection or one of those web-based hash sites.

Generating a rainbow table for each user isn't that much more difficult or time-consuming than having a single rainbow table. Processing power is cheap and easy to come by these days. A small botnet can be rented to generate rainbow tables at a faster rate than most supercomputers.

I received this email, but it landed in my spam folder as suspected phishing...

  Be careful with this message. Similar messages have been used to steal people's personal information. Unless you trust the sender, don't click on links or reply with personal information.

I thought this was the hackers that got my email off scribd trying to phish my password.

That was my immediate reaction upon seeing it as well, although in this case I noticed that it was sent to me+scribd@example.com, which was indeed the mail that I signed up with. Whilst it's mostly useful for filtering/labelling or figuring out which company sold you to a spam list, it's quite useful in this case as a basic 'yep, not just a scattershot phishing attempt' indicator.

Not foolproof, of course; the people who stole hashes & emails would be the obvious choice to attempt a quick phish, and they now have all the account emails. I wonder if I'll see an uptick in spam...

hehe, same here. My email address was completely randomized, and only used for scribd too (something like f9xl203js@mydomain.com), so I was very sure it was either scribd, or it fell into the wrong hands. When the email ended up in the spam folder and was marked as potential phishing as well, my gut feeling was that my email had indeed been leaked.

The irony is that the email itself was generated by scribd itself, and not by any spammers.

Props to Scribd, this is the first time I've seen a company include a tool that lets you check if your account might have been compromised. http://www.scribd.com/password/check

Hmm, Scribd uses Rails. Possible it was an unpatched Rails exploit?

Bad news, it's horrible. I think different websites should use different hashes with salt to avoid this situation.

Do you understand what salt and hash mean? The algorithm may be the same, but the result is going to be different. There are many hashing algorithms, and many ways of salting the hashes. It's not always the same.
