Like many others before it, Scribd disappoints by not addressing this question. Instead we get this:
Even though this information was accessed, the passwords stored by Scribd are encrypted (in technical terms, they are salted and hashed).
How long was the salt? AFAIK, MD5 hashes with an insufficiently long salt can be brute-forced with open-source CUDA setups.
Further, how did they determine the following?
Most of our users were therefore unaffected by this; however, our analysis shows that a small percentage may have had their passwords compromised.
We do have database access logs, so it was pretty straightforward to identify which users were affected.
I'm still a little unsure of how you are able to know some users had their passwords compromised. Is it a simple case of finding successful login attempts from the same IP address as the attack?
This is how we define "compromised" - people who had their passwords hashed with the old algorithm, which is relatively easy to crack.
If so, I feel you have an obligation to alert ALL of your users.
scrypt(hmac_sha1(password, salt), salt, cpumemargs)
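For readers who want to see that construction spelled out, here is a rough Python sketch. It is only an illustration: the key/message order inside the HMAC and the scrypt cost parameters (standing in for `cpumemargs`) are assumptions, not Scribd's actual settings.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # Inner step: HMAC-SHA1 over the password, keyed with the salt.
    # (Which argument is the key is an assumption; the thread does not say.)
    inner = hmac.new(salt, password.encode("utf-8"), hashlib.sha1).digest()
    # Outer step: scrypt. n, r, p are illustrative stand-ins for "cpumemargs".
    return hashlib.scrypt(inner, salt=salt, n=2**14, r=8, p=1, dklen=32)


salt = os.urandom(16)
stored = hash_password("correct horse", salt)
# Verification recomputes the hash and compares in constant time.
print(hmac.compare_digest(stored, hash_password("correct horse", salt)))
```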
Is it true that all, or greater than 1% of, emails and hashes were dumped?
I find it hard to believe you migrated 99% of passwords to a new scheme. I've never seen over 60%, and that is with a lot of prompting to users (and as a Scribd user I've never been prompted).
Alternatively, stored passwords can be upgraded by applying the new scheme to the old hashed password, and recording that that's how the password should be checked in the future.
Since not everyone was migrated, I'm assuming they went the first way.
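A minimal sketch of that second approach (wrapping the old hash in the new scheme, so no plaintext password is needed) might look like the following. The legacy layout `sha1(salt + password)`, the function names and the scrypt parameters are all assumptions made for illustration.

```python
import hashlib
import hmac
import os


def legacy_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for the old scheme ("basically SHA1 + salt"); exact layout assumed.
    return hashlib.sha1(salt + password.encode("utf-8")).digest()


def wrap_legacy(old_hash: bytes, new_salt: bytes) -> bytes:
    # Offline upgrade: feed the stored old hash into the new, slower scheme.
    return hashlib.scrypt(old_hash, salt=new_salt, n=2**14, r=8, p=1, dklen=32)


def verify_wrapped(password: str, old_salt: bytes, new_salt: bytes, stored: bytes) -> bool:
    # At login, recompute old hash first, then the wrapper, then compare.
    candidate = wrap_legacy(legacy_hash(password, old_salt), new_salt)
    return hmac.compare_digest(candidate, stored)


old_salt, new_salt = os.urandom(8), os.urandom(16)
old_record = legacy_hash("hunter2", old_salt)   # what the database held before
new_record = wrap_legacy(old_record, new_salt)  # what it holds after the upgrade
print(verify_wrapped("hunter2", old_salt, new_salt, new_record))
```

The point of this variant is that every record can be upgraded in one batch job, including accounts that never log in again.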
You could say that this has a bearing on whether you continue to use the service, but if that were the case, wouldn't it be better to suggest that all services provide this information up front?
The decision to remain in a relationship is rarely a singular event (related to a singular experience). You could think of it more as the cumulative result of all relationship experiences. Even the best relationships involve some negative experiences, but the important part is making sure those negative experiences are mitigated as best as possible. Customers will give more leeway to vendors with whom they have a strong NET positive relationship.
There are two important technical points that could have been included to great effect:
1) That they store the encryption scheme with the password record so that they can upgrade their crypto incrementally.
2) That their most recent auth algorithm uses scrypt.
So how do these two points directly impact the mitigation of what is otherwise a negative experience? First up we should look at users who will understand what points 1 & 2 mean. These users will respond positively to these items, because it changes the conversation from "Scribd just got h4x'd" to "Hey, at least they had good crypto in place."
The next tier of users will come along, read these comments, and feel more confident that the community of knowledgeable people around them is feeling OK about this, so they should too.
As to the question of, "wouldn't it be better to suggest that all services provide this information up front?" I would say yes, it would. This action is not mutually exclusive of including technical details in this communication though.
For one, I'm much less annoyed/pissed off at them now that I know they use scrypt. I'm not about to cancel my account and never use them again. And I'm not freaking out about whether my email and password have been added to a botnet cracking script running against every other website out there.
I've gotten so accustomed to hearing of companies using MD5 + salt and thinking that's secure, that it is a pleasant surprise to find one using bcrypt, and downright mindblowing to find one using scrypt. Yes, my expectations are low.
>wouldn't it be better to suggest that all services provide this information up front?
> [...] but small amount of account records have had passwords encrypted with outdated algorithm (basically SHA1 + salt), so we preemptively reset their passwords and sent out emails to all affected users.
> This is how we define "compromised" - people who had their passwords hashed with the old algorithm, which is relatively easy to crack.
I came up positive on the check, which does make sense since I signed up a long time ago and rarely, if ever, sign in, so they wouldn't have had the opportunity to upgrade my hash after moving to better schemes.
Happily it was a one-time/throwaway password, but it's a bit scary that it's the first list (that I'm aware of) I'm actually on.
GP is right; if owners of the leaked accounts' [email, hash] pairs are reusing passwords, the leaked hashes are potentially useful even though Scribd has reset them. They're simply not useful for logging in to Scribd.
I also intended to write a simple website script that could generate a statement. Things came up, and the gist has gathered dust for a while.
The gist is available here with some example cases listed that companies can learn from, and people are free to provide feedback or spin it off:
Post your most important feedback in its comments, so other companies reading the gist see it as well.
Companies definitely need to be prepared for full disclosure in the event of a security breach.
Most users use weak passwords and a substantial part of these passwords is easy to recover using a dictionary attack. It does not really matter if you use MD5, SHA1, SHA2, HMAC, PBKDF2, bcrypt, scrypt or whatever, nor does it matter if you use no salt, the same salt for all users or a unique salt per user. Even for PBKDF2, bcrypt and scrypt the cost factor will - for practical reasons - usually not be large enough to mitigate dictionary attacks using a few thousand of the most common passwords. Therefore weak passwords are compromised regardless of the hashing scheme used. And because users with weak passwords in particular tend to reuse the password for different accounts, many other accounts are compromised, too.
A user caring about security will not reuse passwords for different accounts and this alone reduces the impact of the event by a huge amount. Further, a strong password alone makes it very unlikely that attackers will recover the password even if only unsalted MD5 is used for hashing. Therefore - unless the password is stored in plain text - it is highly unlikely that an attacker will be able to access an account protected by a strong password.
I definitely don't want to argue that using unsalted MD5 is okay - it is not - but for the average user the difference between a weak and a strong hashing scheme is not as large as one would naively expect. Strong hashing schemes will especially protect users using infrequent dictionary words or medium length hard passwords because the additional computation power required to perform a dictionary or brute force attack will force the attackers to use smaller dictionaries and shorter passwords.
Finally, storing passwords may benefit from security through obscurity. If the attacker is unable to figure out the hashing scheme used, he will be unable to perform a dictionary or brute force attack. This does not mean everyone should come up with their own hashing scheme - this would do MUCH more harm than good - but, for example, using an unknown random cost factor (294,897 instead of 300,000) and keeping it secret, or adding a second secret salt buried deep in the code to the salt stored together with the username and hash, will make it quite a bit harder for the attacker to perform an attack unless they got the information from an insider or were able to steal your code or binaries.
Figuring out the hashing scheme used for a given hash is frequently trivial. All an attacker needs to do is hijack his own hashed password and salt and then run combinations of common hashes with salting patterns until he gets a hit. This is going to be hundreds of combinations to test on the high end, and will generally yield results very easily.
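As a rough illustration of how little effort that takes, the sketch below tries a handful of common digest and salting combinations against a hash whose password the attacker already knows (his own account). The list of layouts is a small assumed sample; a real attempt would cover more.

```python
import hashlib
import hmac


def identify_scheme(known_password: str, salt: bytes, target: bytes):
    """Return (algorithm, layout) reproducing `target`, or None if nothing matches."""
    pw = known_password.encode("utf-8")
    layouts = {
        "hash(password)": lambda h: h(pw).digest(),
        "hash(salt + password)": lambda h: h(salt + pw).digest(),
        "hash(password + salt)": lambda h: h(pw + salt).digest(),
        "hmac(key=salt, msg=password)": lambda h: hmac.new(salt, pw, h).digest(),
    }
    for algo in ("md5", "sha1", "sha256", "sha512"):
        constructor = getattr(hashlib, algo)
        for layout, compute in layouts.items():
            if hmac.compare_digest(compute(constructor), target):
                return algo, layout
    return None


# The attacker's own record: he knows the password, and reads salt and hash from the dump.
leaked = hashlib.sha1(b"s4lt" + b"hunter2").digest()
print(identify_scheme("hunter2", b"s4lt", leaked))
```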
I agree with the other points that you make on this aspect but I do not quite understand how this particular point (quoted above) can be true. If you use a strong unique salt for each user's password, then you are padding the length of the actual password hashed and thereby effectively reducing the possibility of a successful dictionary attack to virtually zero. If this is so, then how could one mount a successful dictionary attack?
But the other case I mentioned - using the same salt buried deep in your code for all users (what is called a pepper as I learned recently) - will do what you describe until the attacker is able to figure out the pepper used by either stealing the code or brute forcing it.
Finally, note that using only a pepper is not a good idea, and even when combined with a salt it needs some careful thought. Using only a pepper will yield equal hashes for equal passwords, while using a unique per-user salt will avoid this. The other problem is that with a pepper you are reusing the same secret for each user. Therefore an attacker has thousands or even millions of samples and may be able to extract information if the scheme is not designed carefully. Combining password, salt and pepper must essentially avoid the same pitfalls as keyed hash functions when combining the key and the message. See for example the design principles behind HMAC.
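One way to follow that advice, sketched under assumptions: use the pepper as an HMAC key instead of concatenating it by hand, then run the result through a salted slow hash. The `PEPPER` value, function name and scrypt parameters here are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical pepper: one secret for the whole application, kept out of the database.
PEPPER = b"example-secret-kept-outside-the-database"


def hash_with_pepper(password: str, salt: bytes) -> bytes:
    # HMAC handles combining key and message, avoiding ad-hoc concatenation pitfalls.
    peppered = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    # The per-user salt still ensures equal passwords get different stored hashes.
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1, dklen=32)


salt_a, salt_b = os.urandom(16), os.urandom(16)
same_password_differs = hash_with_pepper("letmein", salt_a) != hash_with_pepper("letmein", salt_b)
print(same_password_differs)
```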
The length of the salt has little impact on security beyond 16 bits or so, where it's still feasible to generate rainbow tables for all salts.
If you're storing plain hashes, it doesn't really matter whether it's MD5, SHA-1 or SHA-256 - the work required for a brute-force attack is largely the same. The next step up would be using a key stretching algorithm like PBKDF2 or bcrypt.
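A small sketch of the difference, using PBKDF2 from Python's standard library. The iteration count is an arbitrary illustrative value, not a recommendation.

```python
import hashlib
import os
import time

password = b"correct horse"
salt = os.urandom(16)
ITERATIONS = 100_000  # illustrative cost factor; each guess costs this many HMAC rounds

start = time.perf_counter()
plain = hashlib.sha256(salt + password).digest()
plain_seconds = time.perf_counter() - start

start = time.perf_counter()
stretched = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
stretched_seconds = time.perf_counter() - start

# The attacker pays the same per-guess cost, which is the point of key stretching.
print(f"plain: {plain_seconds:.6f}s  stretched: {stretched_seconds:.6f}s")
```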
Edit: thanks for the quick fix! I would advise that a good compromise would be to change the current message from
"Good news - your password was not among those compromised. You do not need to take any action at this time."
to
"Good news - if this is the email your Scribd account is associated with, your password was not among those compromised. You do not need to take any action at this time."
Small difference, but the lack of definiteness to the response will be enough to make me think of double-checking my email (in my case, it was an email address I haven't used in years - I'm one of the first Scribd users!)
Try and sign up for a new Scribd account with an existing email:
"That email address is already taken; please choose another one"
Try to sign up for a Facebook account with an existing email:
"Sorry, it looks like email@example.com belongs to an existing account"
There is no account registered to mail 'firstname.lastname@example.org'.
And that's a lot harder to fix.
"If there was an account associated with that email address we have emailed it with instructions."
b) Because it's hard to secure a corporate infrastructure, (i.e., making a good set of usability/security trade-offs), and there's no such thing as perfect security. A web application that's been audited can still have a security flaw that enables disclosure of authentication data. Or an employee's machine might be compromised, leading to a compromise of corporate infrastructure or data that he/she is able to access.
Could someone explain to me what this means: "our analysis shows that a small percentage may have had their passwords compromised"?
Do you think this refers to a statistical analysis they conducted, which showed that _potentially_ a relevant percentage of passwords could have been completely decrypted?
I mean, is this warning just the outcome of a statistical analysis of the possibility that passwords could have been decrypted?
Or are they just referring to the fact that a small percentage of hashed & salted passwords have leaked?
I have gone through the "check your email" form. And this was the result:
>We're very sorry to tell you that your Scribd password was among those compromised. If you have used this password on any other services, you should change it immediately.
Does "compromised" here mean leaked, or potentially decrypted based on some sort of statistical analysis they made?
And even government agencies:
"UK intelligence agency stores passwords in plain text"
My assumption - and a very often valid one - is that unique per-user salts are stored together with the username and hash. Distributing this information across different systems will make it harder for attackers, but such schemes are not very common. There is also the risk that the weakness that enables an attacker to compromise one part of the information will also enable them to compromise the other part(s). Therefore it is probably a good idea to use systems as different as possible to store the different parts, for example two different database systems from different vendors.
All absolutely true.
> It doesn't do much if someone has full access to a system.
Not true. Without salt they can try passwords and if the hash matches ANY in the system they know the password for those accounts. With a big dictionary of likely passwords (or just normal words) many passwords will be discovered very quickly.
With salt you have to try the password dictionary against EACH user (actually each salt value but they should be unique). This makes discovering passwords harder by a factor of the number of users. Yes you can pick any user and run the dictionary and have a good chance of finding the password but you have to expend large computing resource for little reward (possibly worth it to break a bank account but not the average web app).
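The arithmetic can be shown with a toy example. The user list, dictionary and the use of plain SHA-256 below are made up purely to count hash computations; this is not how passwords should be stored.

```python
import hashlib
import os


def h(password: str, salt: bytes = b"") -> bytes:
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


dictionary = ["123456", "password", "letmein", "qwerty"]
users = {"alice": "password", "bob": "letmein", "carol": "password"}

# Unsalted: hash the dictionary once, then look every stored hash up in it.
unsalted_db = {name: h(pw) for name, pw in users.items()}
lookup = {h(word): word for word in dictionary}
cracked_unsalted = {n: lookup[d] for n, d in unsalted_db.items() if d in lookup}
unsalted_work = len(dictionary)

# Salted: the dictionary has to be re-hashed for each user's salt.
salted_db = {}
for name, pw in users.items():
    user_salt = os.urandom(16)
    salted_db[name] = (user_salt, h(pw, user_salt))

cracked_salted, salted_work = {}, 0
for name, (user_salt, digest) in salted_db.items():
    for word in dictionary:
        salted_work += 1  # counts the full pass per user (no early exit)
        if h(word, user_salt) == digest:
            cracked_salted[name] = word

print(unsalted_work, salted_work)
```

Both attacks recover the same weak passwords here; the salt only multiplies the work by the number of users, which matches the argument above.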
Generating a rainbow table for each user isn't that much more difficult or time-consuming than having a single rainbow table. Processing power is cheap and easy to come by these days. A small botnet can be rented to generate rainbow tables at a faster rate than most supercomputers.
Be careful with this message. Similar messages have been used to steal people's personal information. Unless you trust the sender, don't click on links or reply with personal information.
Not foolproof, of course; the people who stole hashes & emails would be the obvious choice to attempt a quick phish, and they now have all the account emails. I wonder if I'll see an uptick in spam...
The irony is that the email was generated by Scribd itself, and not by any spammers.