

Poul-Henning Kamp: LinkedIn Password Leak? Salt Their Hide - CowboyRobot
http://queue.acm.org/detail.cfm?id=2254400

======
dredmorbius
I had a long discussion with my colleague Commander Adams today about
improving password management policies for Krell Power Systems client logins.

This is a change that would have to be added to the current project backlog,
specified and designed, developed, and implemented. Selling this means making
a compelling case that salting and changing our hashes would actually solve a
problem for us and our clients. My sense is that this is the case, but
articulating the case in an unassailable way is still something that needs
work.

The most compelling case would be for our clients to demand this as part of
their security requirements for our systems. This sword cuts two ways, and a
number of our existing password policies are clearly based on well-intended
but somewhat misguided client-based requirements. The sane thing is to get
_good_ requirements.

Absent that, the question becomes: what is the threat, what is the risk model,
what is the mitigation, and what benefits does that mitigation buy us and our
clients?

The risk as I see it is disclosure of our user authentication hashes (thank
Krell we're not storing cleartext passwords ... at least not there).

Leaking unsalted hashes means both that rainbow tables can be applied against
the known hashes and that duplicate hash instances (hence duplicate passwords)
can be identified and targeted for rainbow-table or brute-force attacks.
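A minimal Python sketch of the duplicate-hash point, using hypothetical sample accounts: with unsalted SHA1, two users who chose the same password end up with identical stored hashes, while a random per-user salt (16 bytes here, an arbitrary choice) makes every stored hash distinct. Salted SHA1 is still fast to brute-force; salting only addresses duplicates and precomputed tables.

```python
import hashlib
import os
from collections import Counter

# Hypothetical sample accounts; alice and bob share a weak password.
users = {"alice": "letmein", "bob": "letmein", "carol": "correct horse"}

# Unsalted: the stored value depends only on the password.
unsalted = {name: hashlib.sha1(pw.encode()).hexdigest() for name, pw in users.items()}

# Salted: a fresh random salt per user is stored alongside the hash.
salted = {}
for name, pw in users.items():
    salt = os.urandom(16)
    salted[name] = (salt.hex(), hashlib.sha1(salt + pw.encode()).hexdigest())

def shared_hashes(hashes):
    """Return every stored hash value that appears for more than one user."""
    return [h for h, count in Counter(hashes).items() if count > 1]

dup_unsalted = shared_hashes(unsalted.values())
dup_salted = shared_hashes(h for _, h in salted.values())
print("unsalted duplicates:", len(dup_unsalted))
print("salted duplicates:", len(dup_salted))
```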

Leaking non-bcrypt hashes means that brute-forcing is cheap. By some
estimates, 3.3 billion keys per second on $1000 of hardware for MD5, roughly
half that for SHA1
(http://www.extremetech.com/computing/84314-how-to-secure-yourself-from-gpu-password-cracking).
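To put those rates in perspective, a rough back-of-the-envelope sketch in Python: the worst-case time to try every password of a given length and alphabet is the keyspace size divided by the guess rate. The 3.3 billion/s MD5 figure is the estimate quoted above; the password shapes are arbitrary examples.

```python
MD5_RATE = 3.3e9          # guesses per second, the MD5 estimate quoted above
SHA1_RATE = MD5_RATE / 2  # "roughly half that for SHA1"

def seconds_to_exhaust(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to try every password of exactly this length."""
    return alphabet_size ** length / rate

examples = [
    ("8 lowercase letters", 26, 8),
    ("8 mixed-case alphanumerics", 62, 8),
    ("10 mixed-case alphanumerics", 62, 10),
]
for label, alphabet, length in examples:
    for name, rate in (("MD5", MD5_RATE), ("SHA1", SHA1_RATE)):
        hours = seconds_to_exhaust(alphabet, length, rate) / 3600
        print(f"{label:28s} {name:4s} {hours:14,.2f} hours")
```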

A successful attack would gain user access, and might gain access to user
information (of varying but largely low sensitivity) and be able to
impersonate the user for communications purposes. Some collections of user
data might be valuable for contact/communications/social-engineering purposes.

The biggest risk would be for users who reuse passwords across several
services. As a fair number of our clients are corporate, and it's fairly well
known that corporate password policies are often even weaker than
individuals', the likelihood of compromised passwords being used to access
other user accounts is, in some instances, fairly high.

The question remains: how large are any of these risks?

What does salting and bcrypting buy in the way of protection?

My read is that, on the technical side:

- salting hides common passwords within our userbase and renders rainbow
tables useless. Weak passwords are somewhat better protected.

- bcrypt makes brute-forcing passwords markedly more expensive. Very, very
weak passwords could still be cracked, but we're talking on the order of
searching through perhaps tens of millions of keys (62^4 is about 14.8
million), so 4-character mixed-case alphanumeric passwords would be at risk.

- checking proposed (or entered) passwords against a known set of common
passwords -- even just a few tens of thousands of the most common ones --
would further reduce low-hanging fruit. Ideally I'd like to see a publicly
available corpus of all known passwords, to be used to exclude duplicates.
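The third point is cheap to implement. A minimal Python sketch, assuming the common-password list is available as an iterable of strings (the six entries below are a hypothetical stand-in for a real list of tens of thousands) and an arbitrary minimum length of 8:

```python
from collections.abc import Iterable

def build_blocklist(words: Iterable[str]) -> frozenset[str]:
    """Normalize once so later lookups are case-insensitive."""
    return frozenset(w.strip().lower() for w in words if w.strip())

def acceptable(candidate: str, blocklist: frozenset[str], min_length: int = 8) -> bool:
    """Reject passwords that are too short or appear in the common-password list."""
    if len(candidate) < min_length:
        return False
    return candidate.lower() not in blocklist

# Hypothetical tiny sample; in practice, load the list from a file.
BLOCKLIST = build_blocklist(["123456", "password", "12345678", "qwerty", "letmein", "Password1"])

for pw in ("password", "PASSWORD1", "abc", "mauve-teapot-91"):
    print(pw, acceptable(pw, BLOCKLIST))
```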

But again, the question becomes, what demonstrable benefits does this present
to us and our clients? How do I make the case?

~~~
Daniel_Newby
"What does salting and bcrypting buy in way of protection?"

Information leaks are common: a backup tape gets FedExed to the wrong address,
file sharing gets accidentally turned on, a Russian hacker finds a security
hole in your machine while scanning millions of machines, some idiot puts the
password database on a laptop and loses it. These sorts of problems are
constantly making the headlines.

If you have bcrypt-style password hashing, such leaks are a nuisance and an
embarrassment.

If you do not hash passwords properly, the leak recipient can easily
impersonate any and all users. They can control your system, create false
communications, cause industrial equipment to destroy itself, send harassing
messages, conduct financial fraud, and so forth.

The cost of proper password hashing is a little engineering labor; the return
on investment is a substantial reduction in risk.

~~~
Estragon
The trouble with defense in depth is that you have to admit your existing
defenses may be inadequate. I can see how that could be politically difficult
in a large organization.

~~~
dredmorbius
Even for a less-than-large organization, there are issues.

One is the perceived fear of looking incompetent in front of your
users/clients. For which I feel the appropriate response is "we'll look a lot
more competent if we mitigate the risks of such an event than if we don't,
regardless of whether or not it happens".

But really, the big one is simply: can you justify the engineering/product
cost of this change on the basis of a material business benefit to us and our
clients?

------
davidjohnstone
Would it be possible to come up with a simple little icon that can be put on
sign-up pages to indicate that the service is using PBKDF2 or bcrypt or the
like?

Then, it would need to become popular enough for users to start to recognise
it and look out for it when signing up. Even if most users don't have any idea
what it's about, plenty of the more technically inclined users would, and they
tend to be the early adopters anyway...

The idea is to add a bit of pressure on services to store passwords correctly
(similar to how users look for the green SSL bar when doing important stuff
online), and to provide some transparency to the users who care about this.

~~~
gizzlon
This is a bad idea, and here's why:

Honestly, I see it as almost self-evident that users would never ever learn
this.

But more importantly, what would stop anyone from putting up these icons? Who
would check that they actually implemented it?

Even if that was solved, people would just implement this one thing because it
looked good. But there are plenty of other ways to ruin your password
security, so you couldn't really trust them more than you could in the first
place. (IMHO, this is a core issue with security standardization)

~~~
davidjohnstone
Yes, most users wouldn't understand what this is. However, if some did, and
they expected it to be there, that might be enough.

You're quite right that there'd be nothing stopping people from using this
dishonestly, except their consciences and the fact they may have some
explaining to do if a dump of MD5s of their passwords was released. That may
or may not be enough.

In any case, I'm sure that this industry can do a bit better than it is at the
moment. With big breaches of LinkedIn, Last.fm and eHarmony in the last 48
hours, surely something can be done.

~~~
rlpb
The problem is that the people who don't use a KDF don't know any better.
Aren't these the same sort of people who will implement the same logo as other
sites use without understanding what it means?

> except their consciences

I would add ignorance to that list.

------
SagelyGuru
It is all very well suggesting running the hash millions of times, but sites
with many users might not want such a performance hit.

This kind of escalating competition based purely on computing power indicates
to me that the very concept of passwords has probably had its day and we
should seriously think of better alternatives.

Passwords are no fun to remember and to keep secure for the users either.
Anyone with a reasonably active 'online life' suffers from this.

Maybe this is the real reason why Facebook is doing so well? Only one password
to remember.

~~~
chris_j
In order to reduce the computational overhead on the server, perhaps one
option is to run (at least part of) the hash on the client (e.g. in JavaScript).
Does anyone have any idea of how that would perform and if it would be
feasible?

~~~
darklajid
In that case you just change the password. It's now whatever the client
(pre)computes, the user input is just used to derive the 'real password'.

~~~
tomerv
That's not entirely correct. The scheme that chris_j proposed can help prevent
weak passwords from being cracked, since now an attacker needs to do one of
two things to crack passwords:

1. Try lots of weak passwords, hashing each one and comparing against the
list. This is slow, because the hash is slow.

2. Try breaking passwords with the partial hash. In this case the attacker
either needs to try very difficult passwords (since these are passwords after
a partial hash, what you called the 'real password'), or get the partial
hashes from the users, which requires more effort.
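A rough Python sketch of the split chris_j and tomerv describe, with hypothetical names and parameters throughout: the client runs the slow PBKDF2 step with a fixed, site-specific salt (assumed here so the client needs no lookup), and the server stores only a fast salted hash of the result. A leaked table then forces an attacker to redo the slow step for every guess but, as darklajid points out, the pre-hash itself becomes the credential, so whoever captures it in transit can log in with it.

```python
import hashlib
import hmac
import os

SITE_SALT = b"example.com/login/v1"  # hypothetical fixed salt known to every client
CLIENT_ROUNDS = 100_000              # hypothetical work factor paid by the client

def client_prehash(password: str) -> bytes:
    """Slow step; in a browser this would have to be ported to JavaScript."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SITE_SALT, CLIENT_ROUNDS)

def server_store(prehash: bytes) -> tuple[bytes, bytes]:
    """Fast step on the server: per-user random salt plus one SHA-256."""
    salt = os.urandom(16)
    return salt, hashlib.sha256(salt + prehash).digest()

def server_verify(prehash: bytes, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.sha256(salt + prehash).digest()
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = server_store(client_prehash("hunter2"))
print(server_verify(client_prehash("hunter2"), salt, stored))
print(server_verify(client_prehash("hunter3"), salt, stored))
```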

~~~
darklajid
I might be totally wrong, but for me the consequences of that approach are:

1. The attacker doesn't need the text the user entered anymore, just the
precomputed hash.

2. The length and alphabet are probably fixed now, which might
obfuscate/protect 'password' or 'test', but reduces the value of a strong
password. Granted, this last part is a gut feeling.

~~~
Robin_Message
Re the gut feeling: this is only a problem if the password has significantly
more entropy than the hash. So, in the worst case of MD5 (128 bits), this is
potentially bad for people using printable ASCII (95 characters) with passwords
longer than 19 characters.

But it's still not a real problem, since 128 bits of entropy is unguessable in
the lifetime of the universe. Checking 2^64 hashes a second, which is
obscenely many (perhaps every processor on the planet dedicated to the task
would be enough), covers 5% of the search space in about 29 billion years.
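These figures can be checked with a few lines of Python. The 2^64 guesses per second rate is the hypothetical one from the comment; 95 is the number of printable ASCII characters, and a year is taken as 365.25 days.

```python
import math

PRINTABLE_ASCII = 95        # characters 0x20 through 0x7E
HASH_BITS = 128             # MD5 output size
RATE = 2 ** 64              # hypothetical guesses per second, as in the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

bits_per_char = math.log2(PRINTABLE_ASCII)
# Shortest password length whose keyspace exceeds the hash's output space.
chars_to_exceed_hash = math.ceil(HASH_BITS / bits_per_char)

years_for_5_percent = 0.05 * 2 ** HASH_BITS / RATE / SECONDS_PER_YEAR

print(f"{bits_per_char:.2f} bits per character")
print(f"passwords of {chars_to_exceed_hash}+ characters exceed {HASH_BITS} bits")
print(f"5% of the space takes about {years_for_5_percent / 1e9:.1f} billion years")
```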

~~~
darklajid
Thanks for the answer. I can safely say that I'm far from an expert on the
subject. If you'd be willing to educate me a tiny bit more, though:

Is the first case (judging normal passwords) factoring in that a password
varies in length? I mean, stupid thought again: you need to test all
one-character passwords, all two-character passwords, and so on, whereas a
hash is fixed in its length?

And I wouldn't want to find the original input, I'd want to get in. For that
my totally fallible gut says that I'd need to create a 'word list' of
hexadecimal character permutations of length x. Is this really an impossible
task?

~~~
Robin_Message
Sorry, I missed this, but hopefully you will see it.

A hash is fixed, but at a long length. Now, because of geometric growth, the
shorter lengths are basically irrelevant (with printable ASCII there are 95
times more 19-character passwords than 18-character ones, about 9,000 times
more 19 than 17, and so on).
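The geometric-growth point is easy to quantify. In this small Python sketch, all passwords shorter than the maximum length together amount to only about 1/(alphabet size - 1) of the candidates at the maximum length alone: roughly 1% for printable ASCII and about 7% for hex digits.

```python
def share_of_shorter(alphabet_size: int, max_length: int) -> float:
    """Candidates of length 1..max_length-1, relative to those of exactly max_length."""
    shorter = sum(alphabet_size ** k for k in range(1, max_length))
    return shorter / alphabet_size ** max_length

print(f"printable ASCII, 19 chars: {share_of_shorter(95, 19):.4%}")
print(f"hex digits, 32 chars:      {share_of_shorter(16, 32):.4%}")
```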

On the second point, yes, exactly, you need a word list of all hexadecimal
strings of length x. Again, in the case of MD5 (128 bits), this is all the 32
character hexadecimal strings (since 32 characters * 4 bits per hexadecimal
character is 128 bits). Such a list has a length of 2 to the power of 128 by
definition - 340282366920938463463374607431768211456 items (about 10^38).

Making a list 10^38 items long is not impossible, since that's well below the
number of atoms in the earth (about 10^50). It is probably impractical,
however. Even if you could store the list in iron (the most abundant element
in the earth by mass), you'd need to store each item of the list in about
0.01 nanograms.

------
jules
One thing that I always wondered about this approach:

    
    
        for (i = 0; i < 1000; i++)
            scrambled_password = HASH(scrambled_password);
    

Aren't we weakening the hash function? Presumably the hash function is not
one-to-one, so if you iterate this for many iterations there is a danger that
you could end up with a function that has a much higher probability of
collisions?

~~~
yuvadam
Why would you assume that?

 _Presumably_ there should be no real reason why HASH(8_char_password) =
160_bit_hash should be less strong than HASH(160_bit_hash).

Not only that, but most hashing algorithms already do several iterations
before returning the hash.

~~~
rfergie
I think the concern is that the hash function may converge.
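The convergence concern can be seen on a toy scale. This Python sketch truncates SHA-256 to 16 bits (an artificially small output chosen only so the effect is visible) and counts how many distinct values are still reachable after repeatedly applying it to every possible input. For a random function, roughly 1 - 1/e (about 63%) of outputs survive the first round, and the set keeps shrinking. With a real 160-bit or 256-bit output the same effect costs roughly log2 of the round count in bits (on the order of 10 bits for 1000 rounds), which still leaves an enormous space. Standard KDFs such as PBKDF2 also avoid the plain `x = HASH(x)` loop by feeding the password into every round through HMAC.

```python
import hashlib

OUTPUT_BYTES = 2                  # toy 16-bit hash; real hashes use 16+ bytes
N = 1 << (8 * OUTPUT_BYTES)       # size of the toy output space

def toy_hash(x: int) -> int:
    """SHA-256 truncated to OUTPUT_BYTES, mapping the toy space to itself."""
    digest = hashlib.sha256(x.to_bytes(OUTPUT_BYTES, "big")).digest()
    return int.from_bytes(digest[:OUTPUT_BYTES], "big")

def reachable_after(rounds: int, checkpoints: set[int]) -> dict[int, int]:
    """Apply toy_hash repeatedly to every input; record distinct outputs at checkpoints."""
    current = set(range(N))
    sizes = {}
    for r in range(1, rounds + 1):
        current = {toy_hash(x) for x in current}
        if r in checkpoints:
            sizes[r] = len(current)
    return sizes

sizes = reachable_after(1000, {1, 10, 100, 1000})
for r, count in sizes.items():
    print(f"after {r:4d} rounds: {count:5d} of {N} values reachable")
```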

------
sp332
A full second? Facebook has 900,000,000 active users. They would need over
10,000 CPUs running for 24 hours just to log them in.

~~~
lurker14
..amortized over several weeks. People don't log in every day.

~~~
sp332
If you share a computer, people log in several times a day. And I bet the
distribution over time is lumpy. Even assuming that demand is completely flat,
that's nearly 500 CPUs running 24/7 for three weeks _just for the hashes_ to
log people in.
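The CPU estimates in this exchange as a quick Python calculation, assuming one CPU-second per login and exactly one login per active user, spread perfectly evenly over the period:

```python
ACTIVE_USERS = 900_000_000
CPU_SECONDS_PER_LOGIN = 1.0   # the "full second" of hashing per login
SECONDS_PER_DAY = 86_400

total_cpu_seconds = ACTIVE_USERS * CPU_SECONDS_PER_LOGIN

def cpus_needed(days: float) -> float:
    """CPUs busy around the clock to hash every user's login once within `days`."""
    return total_cpu_seconds / (days * SECONDS_PER_DAY)

print(f"one day:     {cpus_needed(1):,.0f} CPUs")
print(f"three weeks: {cpus_needed(21):,.0f} CPUs")
```

Peaks and repeat logins push the real number up; a cheaper per-login hash (tens of milliseconds rather than a full second) pushes it down proportionally.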

------
ragmondo
Why can't _I_ be allowed to choose the authentication method I use to access
_MY_ data (and be responsible for the consequences if it's misused)? Is my data
on LinkedIn really my data after all?

~~~
eli
It wasn't the method of authentication that was the problem, it was that the
stored credentials were inadequately protected against brute force attacks.

------
sparre
Is there a reason that one doesn't use a public-key encryption function with a
unique, random public key per password to store the scrambled passwords? One
would then store the public key and the encrypted password the same way
md5crypt stores the salt and the hashed password.

This is of course not run-time configurable to increase the computational
complexity of the password scrambling, but besides that, what are the
problems? (I assume that there must be some, since I haven't ever heard of
anybody handling passwords this way.)

~~~
jcromartie
Does this imply the _client_ doing the encryption? I.e. the client creates a
key pair and sends the public key to the server?

It sounds good but the challenge, as always, is the infrastructure. I think it
would be great if I had a single personal private key from which I could issue
chained keys for each domain where I have an account. But imagine managing
this across desktops, browsers, phones, game systems, etc. ...

~~~
sparre
My intent was that the server should still be responsible for scrambling the
password as usual. My question is only about changing the algorithm
server-side.

------
drostie
Dear tech journalists, please stop saying stuff like "But we have yet to find
out why nobody objected to them protecting 150+ million user passwords with
1970s methods." We _do_ know why people use SHA1(unsalted password), and it's
because the dev stack still doesn't support something like SHA-256 or better
yet bcrypt/PBKDF2 at all levels.

So, right, I was a web developer pushing my PHP-based company to have a more
robust-against-db-compromise password hashing strategy. You know what the huge
problem was? The huge problem was, MySQL (and hence phpMyAdmin) didn't have a
SHA2() function until mid-2010. Not only is SHA2() 'not enough', i.e. it's too
fast and you want to do key stretching -- but even then, they didn't even have
_that_.

So suppose you are developing an agile product, someone loses access to their
account and asks for a new password, you type `head -c 9 /dev/urandom |
base64` into your shell and get back `pYG3fvp9c06m`. If you don't have
anything better built yet, you're going to go into the database and write the
one-off query `UPDATE users SET pw_hash=SHA1('pYG3fvp9c06m') WHERE username =
'bob.bobertson'`, or, at best, `SET salt='tyDvBBHioUNS',
pw_hash=SHA1('tyDvBBHioUNSpYG3fvp9c06m')`.

If you _could_ get an interoperable PBKDF2 working in MySQL/Postgres, PHP, et
cetera, devs would use that. It's precisely because it's not easy that it's
not adopted.
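For what it's worth, some application languages do ship PBKDF2 in their standard libraries. A minimal Python sketch of the password-reset scenario above, where the random temporary password mirrors `head -c 9 /dev/urandom | base64` and the stored value is a single self-describing string (the `pbkdf2_sha256$iterations$salt$hash` layout and the iteration count are illustrative choices, not a standard) that a one-off UPDATE could write instead of SHA1(...):

```python
import base64
import hashlib
import hmac
import os

SCHEME = "pbkdf2_sha256"   # illustrative label for the stored format
ITERATIONS = 100_000       # hypothetical work factor; tune for your hardware

def new_temp_password() -> str:
    """Like `head -c 9 /dev/urandom | base64`: 9 random bytes give 12 characters."""
    return base64.b64encode(os.urandom(9)).decode("ascii")

def encode_hash(password: str) -> str:
    """Return one string holding scheme, iteration count, salt and digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return "$".join([SCHEME, str(ITERATIONS), salt.hex(), digest.hex()])

def check_password(password: str, stored: str) -> bool:
    scheme, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != SCHEME:
        raise ValueError(f"unsupported scheme: {scheme}")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

temp = new_temp_password()
stored = encode_hash(temp)
print(temp, stored)
```

Storing the iteration count with each hash means the work factor can be raised later without invalidating existing rows.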

EDIT: My apologies to Poul-Henning Kamp for implying that he was a journalist.
I thought that would be a sort of compliment but I can see now that it's more
of a sort of category error. (But I still think that the problem is precisely
that the whole dev stack doesn't support any standard.)

~~~
willvarfar
Poul-Henning Kamp is many things, but journalist?

He is allowed to say stuff like "But we have yet to find out why nobody
objected to them protecting 150+ million user passwords with 1970s methods."

And this is LinkedIn. They should know and do better.

I actually imagine that their very gifted developers are running around
wondering how they themselves didn't audit this.

~~~
willvarfar
> I actually imagine that their very gifted developers are running around
> wondering how they themselves didn't audit this.

or perhaps it's that some third party can authenticate users using SHA1
passwords, i.e. internally LinkedIn passwords are scrypted or something, but
this dump was from a MitM between a third-party plugin and LinkedIn?

------
ludwigvan
In last.fm's defense, they argued that there were some hardware devices
(radios) that had last.fm clients, so they couldn't update their password
system.

I have a question, though: how would a password hash that takes around 1
second to compute affect the scalability of these systems? Would it impact the
login times of users a lot? Imagine thousands of people trying to log in at
once. Might that be the reason LinkedIn didn't hash and salt properly?

~~~
tosh
They still could've STORED them in a different way in the backend (e.g. use
the md5 hash as 'password' and then use pbkdf2) then the leak would not have
been as much of a problem as it was.
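tosh's suggestion can be sketched in Python. Assuming the legacy table holds unsalted MD5 hex digests (and that some clients, like the radios mentioned above, can only send that digest), each stored value is wrapped in salted PBKDF2 without knowing any password; at login the server recomputes the MD5 (or takes it from the device) and applies the same wrapping. The iteration count and function names are hypothetical placeholders.

```python
import hashlib
import hmac
import os

ROUNDS = 100_000  # hypothetical work factor

def wrap(md5_hex: str, salt: bytes) -> bytes:
    """Stretch a legacy MD5 hex digest with salted PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", md5_hex.encode("ascii"), salt, ROUNDS)

def upgrade_legacy_row(md5_hex: str) -> tuple[bytes, bytes]:
    """One-time migration of a stored MD5 digest; no plaintext password needed."""
    salt = os.urandom(16)
    return salt, wrap(md5_hex, salt)

def verify_md5(md5_hex: str, salt: bytes, stored: bytes) -> bool:
    """Login path for legacy devices that can only send the MD5 digest."""
    return hmac.compare_digest(wrap(md5_hex, salt), stored)

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Login path for clients that send the plaintext password."""
    return verify_md5(hashlib.md5(password.encode()).hexdigest(), salt, stored)

legacy = hashlib.md5(b"correct horse").hexdigest()  # what the old table held
salt, stored = upgrade_legacy_row(legacy)
print(verify_password("correct horse", salt, stored))
```

A leaked table of wrapped values is now slow to attack, but the device path means an MD5 digest obtained elsewhere still works directly as a credential.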

------
elchief
1. Wouldn't it be better if they used OpenID like Stack Overflow?

2. Any advice on encrypting passwords? We store passwords for some third-party
services for our users.

