

One way to fix your rubbish password database - jgrahamc
http://blog.jgc.org/2012/06/one-way-to-fix-your-rubbish-password.html

======
healsdata
Is there a security disadvantage to taking the MD5 hashes you already have and
running those through bcrypt? It seems like that would let you get to a salted
bcrypt implementation in one day as opposed to waiting for all your users to
log in. Perhaps you could do the mix (md5 + bcrypt) until the user logs in and
then switch them solely to bcrypt?
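
A minimal sketch of that wrapping step, assuming Python's standard library.
scrypt stands in for bcrypt only because `hashlib` ships it; the function
names, the record layout, and the cost parameters are assumptions made for
illustration and do not come from the article.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumptions; tune for real hardware).
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)


def wrap_legacy_hash(md5_hex: str) -> dict:
    """Wrap a stored unsalted MD5 hex digest without knowing the password."""
    salt = os.urandom(16)
    wrapped = hashlib.scrypt(md5_hex.encode(), salt=salt, **SCRYPT_PARAMS)
    return {"scheme": "scrypt(md5)", "salt": salt, "hash": wrapped}


def verify_wrapped(password: str, record: dict) -> bool:
    """Recompute md5(password), then scrypt it with the stored salt."""
    md5_hex = hashlib.md5(password.encode()).hexdigest()
    candidate = hashlib.scrypt(
        md5_hex.encode(), salt=record["salt"], **SCRYPT_PARAMS
    )
    # Constant-time comparison of the two derived keys.
    return hmac.compare_digest(candidate, record["hash"])
```

The whole table can be wrapped offline in one pass, since only the old hash
is needed, and the login check keeps working with the user's real password.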

~~~
tptacek
No. I don't believe there's any disadvantage to this. An MD5 hash is a 128 bit
random number; it's 16 fully random characters, better than almost any human
password.

~~~
lukeschlather
An MD5 hash is not a random number; it is generated from some text string.
It's possible that bcrypt(salt + MD5(text)) opens you up to collision attacks
that are not possible with bcrypt(salt + text). It seems unlikely that it
would open you up to attacks that are not possible with md5(text) but MD5 is
not a random number so I'm not too sure.

~~~
joshuahedlund
I'm still trying to learn this stuff, but I do not understand fundamentally
how bcrypt(salt + MD5(text)) could be worse than bcrypt(salt + text). What if
everyone's plaintext password was already a string of characters identical to
some MD5(text)? If bcrypt(salt + MD5(text)) could be bad, then doesn't that
mean bcrypt(salt + text) could be bad too?

~~~
lmkg
If you compose hash functions, you get the union of possible collisions.

Let's say that "foo" and "bar" are two distinct passwords that have the same
MD5 hash. Then bcrypt(md5("foo")) == bcrypt(md5("bar")), regardless of how
bcrypt("foo") compares to bcrypt("bar"). By pre-hashing with MD5, you have
added possible collisions that weren't there previously, and those collisions
remain regardless of how many more hashes you pile on top.
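
A toy illustration of that point, assuming a deliberately weak inner hash (the
first byte of MD5) so that a collision is easy to find, and SHA-256 as a
stand-in for the strong outer hash. Neither function is what a real system
would use; they only make the composition argument concrete.

```python
import hashlib
from itertools import count


def weak_inner(text: str) -> bytes:
    """Toy collision-prone hash: only the first byte of MD5."""
    return hashlib.md5(text.encode()).digest()[:1]


def outer(data: bytes) -> bytes:
    """Stand-in for the strong outer hash (SHA-256 here, not bcrypt)."""
    return hashlib.sha256(data).digest()


def find_inner_collision():
    """Try candidate strings until two of them share a weak_inner value."""
    seen = {}
    for i in count():
        candidate = f"pw{i}"
        digest = weak_inner(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate


a, b = find_inner_collision()
# The inner collision survives the outer hash...
same_after_composition = outer(weak_inner(a)) == outer(weak_inner(b))
# ...even though the outer hash alone tells the two inputs apart.
same_without_inner = outer(a.encode()) == outer(b.encode())
```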

~~~
chc
We're not pre-hashing with MD5. The MD5 was already there. It's the only
source text we have. The proper comparison here isn't MD5+bcrypt vs. just
bcrypt — it's MD5+bcrypt vs. just MD5. So any collisions that MD5 causes are
immaterial — they'd be there either way.

It seems to me that the most obvious problem is that you get two chances at
colliding — once with MD5 and once with bcrypt. But bcrypt is not known to be
especially vulnerable to collision attacks, so this setup is probably not
noticeably worse than MD5 alone. But that's just looking at probabilities — I
ain't no fancy crypto expert or nothin', so there might be much more subtle
vulnerabilities than the added chance of collision.

~~~
lmkg
> _It seems to me that the most obvious problem is that you get two chances at
> colliding_

Yeah, that's all I'm saying. I was answering a question about being
"fundamentally worse," and fundamentally, there are now two sources of potential
collisions instead of one. In theory, that's twice as insecure! However, the
practical effect is unlikely to rise above absolute nil anytime soon.

------
jcromartie
This is a good step, but unfortunately all of the actual passwords are still
out there, so they need to be changed.

I think a better idea would be to establish an easily implemented pattern for
"password bankruptcy" that companies could follow in the case of a leak.

~~~
donpdonp
What would a password bankruptcy pattern look like?

One thought is to invalidate all passwords and fall back on email password
recovery when a login is attempted.

This leads me to an idea I've tried once - if access to the inbox is
equivalent to password credentials, why not use an email to log in? By this I
mean the web site login is a single field - email address. The system emails a
one-click-login URL to the user that can be re-used (possibly with a month
expiration time). The user can look up the URL in their inbox when they want
to log in again, or use a long-lived cookie.
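
One way such a reusable, expiring login link could be built is with an HMAC
signed token. This is only a sketch under assumptions: the secret key, the
token layout, and the function names are hypothetical, and the 30-day lifetime
is just a reading of the "month expiration" mentioned above.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # assumption
MAX_AGE_SECONDS = 30 * 24 * 3600  # roughly one month


def make_login_token(email: str, now: Optional[float] = None) -> str:
    """Build 'payload.signature', both URL-safe base64 encoded."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}|{issued}".encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def check_login_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email if the token is authentic and unexpired, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        email, issued = payload.decode().rsplit("|", 1)
        issued_at = int(issued)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: forged or altered
    current = time.time() if now is None else now
    if current - issued_at > MAX_AGE_SECONDS:
        return None  # expired
    return email
```

The token would be appended to a login URL and emailed; the server needs no
per-token storage, at the cost of not being able to revoke a single link
without rotating the key.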

~~~
desas
Emailing a link to log in was one of two supported login methods for Red Hat's
Mugshot social network. The other was sending the link via XMPP.

In practice I end up doing this for little-used sites because I use my phone,
a tablet, and two laptops for browsing the internet.

It's annoying if you work somewhere that doesn't allow access to personal
email accounts and you want to log in to something.

------
rb2k_
Even if you run a bad codebase that just uses unsalted MD5 and you don't want
to add a new crypto algorithm:

Couldn't you just run your whole database through X more rounds of MD5 and do
the same in your authentication function?

That way, script kiddies couldn't use precomputed rainbow tables they
downloaded somewhere off Bittorrent.

Each additional round will also reduce the speed of a brute force attack while
keeping the changes to the codebase pretty small.

Unless there are rainbow tables for a certain number of MD5 iterations, it
would be a start...
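
A sketch of what that proposal might look like, assuming the database holds
unsalted MD5 hex digests. The round count and function names are hypothetical.
Note that the result is still unsalted, so identical passwords still produce
identical stored values.

```python
import hashlib
import hmac

ROUNDS = 10_000  # hypothetical extra round count


def iterate_md5(md5_hex: str, rounds: int = ROUNDS) -> str:
    """Apply MD5 `rounds` more times to an existing MD5 hex digest."""
    digest = md5_hex
    for _ in range(rounds):
        digest = hashlib.md5(digest.encode()).hexdigest()
    return digest


def check_password(password: str, stored: str, rounds: int = ROUNDS) -> bool:
    """Hash once as the old code did, then repeat the same extra rounds."""
    first = hashlib.md5(password.encode()).hexdigest()
    return hmac.compare_digest(iterate_md5(first, rounds), stored)
```

The database pass only needs `iterate_md5` on each stored value; the login
path uses `check_password`.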

~~~
IgorPartola
I think the conventional wisdom is that you should not re-invent security. I
am slowly learning this, but the give-away seems to be questions that start
with "Couldn't you just..."

~~~
rb2k_
It also was conventional wisdom that banks were too big to fail ;)

Are there any actual arguments against using this as an 'easy' fix to the
precomputed rainbow tables scenario? Multiple rounds of a cipher seem to be a
relatively common operation in crypto and have helped other old ciphers. One
of the more prominent ones would probably be the move from DES to triple DES.

I guess dictionary attacks on GPUs would still be easy enough, even with more
iterations, but anything that isn't directly in a dictionary might benefit
quite a bit from multiple iterations.

It's not as good as actually using proper crypto rather than hashing
algorithms that were designed to be fast, but it seems like an easy to
implement low-risk solution.

~~~
IgorPartola
I am no security expert. I can't tell bcrypt from a hole in the ground. All I
know is that it's all fun and games until there is a problem with this home-
brew implementation and then it's too late. That is why I think it's best to
avoid anything like this and instead go with bcrypt/scrypt, etc. and re-
evaluate periodically based on latest industry standards. Perhaps an actual
security expert on here can evaluate your idea. I seem to remember it being
raised many times on here, so there may be an answer to this on one of the
discussions of bcrypt vs scrypt or some such.

------
ernesth
Doesn't the fact that scrypt/bcrypt is costly by design prevent this idea from
being carried out?

~~~
tedunangst
Assuming you use 0.01s per hash settings, you can upgrade somewhere around a
million accounts per hour.
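
A back-of-the-envelope check of that figure. The `workers` parameter is an
assumption added here; the comment does not say how many hashes run in
parallel.

```python
def accounts_per_hour(seconds_per_hash: float, workers: int = 1) -> int:
    """Accounts re-hashed per hour at a fixed cost per hash."""
    return round(3600 / seconds_per_hash * workers)


single = accounts_per_hour(0.01)            # one worker
three = accounts_per_hour(0.01, workers=3)  # three workers in parallel
```

At 0.01s per hash a single worker handles 360,000 accounts per hour, so a
handful of cores gets to roughly a million.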

------
mjschultz
I'll admit, I must be the only one that doesn't quite get the jump from step 4
to step 5.

In step 4, we make the assumption that their API is out in the wild, in use,
and sends the md5(s, p) in the request. I get that we take that value, run it
through scrypt and match against our stored value to authenticate. So the
database has:

    scrypt(s', md5(s, p))

No problem authenticating the API requests with that.

Step 5 says once the user logs in with their actual password, we update
entirely to the new scheme of scrypt(s'', p) and store just that. Now the
database only has:

    scrypt(s'', p)

But the API user still sends md5(s, p) to authenticate, right?

So then what happens when that same user goes back to the API-using app? It
still uses the API, so it'll send md5(s, p) and fail, since we've discarded
the transitional scrypt value when they logged in via the web interface.

Is there a deprecation period that supports both types while API-using apps
update to a new API for the new scheme?

~~~
jgrahamc
Should have made clear that you can't do 5 if you need 4.

~~~
mjschultz
Ah, okay. So step 5 is the else condition from the "if" that begins step 4. At
least, until the API is upgraded to the new improved edition and most/all API
apps are using the new version.
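
The conflict between steps 4 and 5 can be sketched as follows. This assumes
md5(s, p) means MD5 over the salt followed by the password, and the record
layout, function names, and scrypt parameters are all hypothetical.

```python
import hashlib
import hmac
import os

PARAMS = dict(n=2**14, r=8, p=1, dklen=32)  # illustrative cost settings


def legacy_md5(old_salt: bytes, password: str) -> bytes:
    """md5(s, p): what existing API clients compute and send."""
    return hashlib.md5(old_salt + password.encode()).digest()


def make_transitional_record(old_salt: bytes, stored_md5: bytes) -> dict:
    """Step 4 storage: scrypt(s', md5(s, p)), built from the old hash."""
    new_salt = os.urandom(16)
    wrapped = hashlib.scrypt(stored_md5, salt=new_salt, **PARAMS)
    return {"old_salt": old_salt, "new_salt": new_salt, "hash": wrapped}


def make_final_record(password: str) -> dict:
    """Step 5 storage: scrypt(s'', p), with the MD5 layer gone."""
    new_salt = os.urandom(16)
    final = hashlib.scrypt(password.encode(), salt=new_salt, **PARAMS)
    return {"new_salt": new_salt, "hash": final}


def verify_api_request(sent_md5: bytes, record: dict) -> bool:
    """Check an API request that carries md5(s, p) instead of the password."""
    candidate = hashlib.scrypt(sent_md5, salt=record["new_salt"], **PARAMS)
    return hmac.compare_digest(candidate, record["hash"])


def verify_web_login(password: str, record: dict) -> bool:
    """Web login against a transitional record: rebuild md5(s, p) first."""
    return verify_api_request(legacy_md5(record["old_salt"], password), record)
```

Against a transitional record both paths succeed; against a step 5 record the
API path can never match, which is why step 5 has to wait until API clients
stop sending md5(s, p).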

------
nateabele
Okay, I must really be missing something here.

If your original database contains a bunch of unsalted SHA1 (or worse, MD5)
hashes, what good does securing the hashes themselves do if the means to
generate the corresponding plaintext has already been released into the wild?

Someone please tell me I'm missing something obvious.

~~~
joshrice
It's how to fix _your_ rubbish password storage, not LinkedIn's or the others
who've been compromised. That's the difference.

------
yathern
The article states that LinkedIn was using salted SHA-1 hashes, but I thought
that wasn't the case. Either way, aren't salted hashes essentially uncrackable
by all means except full out brute force?

If my password is "password", and I change it to "#b1@password%3dy", and then
hash it, isn't it secure from basic dictionary/rainbow table attacks?

I'm a bit new to cryptography, so please forgive me if I'm not understanding
some of this correctly.

~~~
IgorPartola
Brute force is sometimes all you need. The problem is that using GPUs you can
compute so many hashes a second that a short password simply cannot withstand
such an attack for long. The salt helps a bit, but if someone is brute-forcing
the hashes all it means is that once they have your password they don't have
the other person's who happens to use the same one.

~~~
yathern
I see, but isn't brute-forcing "aecd8c83718c381cpassworda3802..." going to
take far, far longer? Even on some huge botnet clusters, I still don't imagine
how it could be possible to crack that very quickly.

~~~
IgorPartola
Oh, of course. But it will still take less time than you think. After trying a
common dictionary the attacker just starts brute-forcing every single
combination, and since md5 is so quick and works so well on the GPU, it may
take mere hours to find the answer. I've personally had what I considered a
secure password cracked out of a sha1 + salt setup. Now I use LastPass and
generate random different 32 character passwords for every service I use.
The LinkedIn leak does not affect me: 32 chars is enough to give me a day or two
to change my password and none of my other accounts are compromised even if
the attacker gets my LinkedIn password.
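
A rough way to put numbers on this, assuming a hypothetical attacker speed of
one billion fast hashes per second (the real figure depends on hardware and
algorithm). A salt stored next to the hash is known to whoever has the leaked
database, so it does not add to the number of candidates per user; only the
password's own length and alphabet do.

```python
def seconds_to_exhaust(alphabet_size: int, length: int,
                       hashes_per_second: float) -> float:
    """Worst-case time to try every password of exactly `length` symbols."""
    return alphabet_size ** length / hashes_per_second


ASSUMED_RATE = 1e9  # hypothetical attacker speed in hashes per second

short = seconds_to_exhaust(36, 8, ASSUMED_RATE)   # 8 chars, a-z and 0-9
long_ = seconds_to_exhaust(62, 32, ASSUMED_RATE)  # 32 chars, a-z, A-Z, 0-9
print(f"8 chars:  {short / 60:.0f} minutes")
print(f"32 chars: {long_ / (3600 * 24 * 365):.2e} years")
```

Under that assumed rate the 8-character space falls in about 47 minutes,
while each extra character multiplies the work by the alphabet size.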

~~~
yathern
Okay, thanks for the answer. I was under the impression that brute forcing
takes a long time.

------
notmyname
On step 3, where you say "scrypt(s'i, md5(si, password))", don't you actually
need "scrypt(s'i, md5(s0, password))", where s0 is the original salt? In other
words, you still need to know the original salt you were using to successfully
migrate.

Therefore, if you are storing the per-user salt as the first bytes in the
hashed password field, then you have to be careful when you "throw away the
old weak hash hi and forget it ever existed."

~~~
jgrahamc
The original salt is s_i which you do need to keep around. The new salt is
s^'_i.

~~~
notmyname
Ah. my mistake. I completely missed the "'" as I was reading it.

------
mistercow
If I'm not mistaken, the first paragraph is wrong. All signs point to LinkedIn
having used _un_salted hashes.

~~~
aidos
According to their statement, they started salting some time recently. I guess
you got a salt when you next logged in?

------
PaulHoule
I used this strategy years ago (2002) to migrate plaintext passwords in a site
with 50k+ users. In fact, I built this into the system so I could do arbitrary
migrations between password encodings whenever I felt it was necessary.

It works well.
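
A sketch of how such a built-in migration path might look, assuming each
stored value is tagged with the scheme that produced it. The `scheme$salt$hash`
layout, the scheme names, and the function names are hypothetical and not a
description of the system mentioned above.

```python
import hashlib
import hmac
import os


def _plain(password: str, salt: bytes) -> bytes:
    return password.encode()  # legacy plaintext storage


def _md5(password: str, salt: bytes) -> bytes:
    return hashlib.md5(salt + password.encode()).digest()


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)


SCHEMES = {"plain": _plain, "md5": _md5, "scrypt": _scrypt}
PREFERRED = "scrypt"  # change this to trigger migration on next login


def encode(password: str, scheme: str = PREFERRED) -> str:
    """Produce a 'scheme$salt$hash' record with a fresh random salt."""
    salt = os.urandom(16)
    return f"{scheme}${salt.hex()}${SCHEMES[scheme](password, salt).hex()}"


def verify_and_upgrade(password: str, stored: str):
    """Return (ok, record_to_store); re-encode on success if outdated."""
    scheme, salt_hex, hash_hex = stored.split("$")
    candidate = SCHEMES[scheme](password, bytes.fromhex(salt_hex))
    if not hmac.compare_digest(candidate, bytes.fromhex(hash_hex)):
        return False, stored
    if scheme != PREFERRED:
        return True, encode(password)  # upgrade while we have the password
    return True, stored
```

The caller writes the returned record back to the database whenever it
differs from the stored one.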

~~~
IgorPartola
Ruby and Django could do well to have this type of strategy baked in. This way
you update your configuration and the passwords are immediately upgraded.

------
danskil
I was just doing a write up on swapping auth back ends to gain more security
[http://schneems.com/post/24678036532/zomg-my-passwords-are-i...](http://schneems.com/post/24678036532/zomg-my-passwords-are-insecure-now-what).

------
gioele
> 4\. If, like last.fm, you were also allowing third-parties to authorize
> users...

... then you should stop doing that and you should start using OAuth, so the
client application never sees your user's password.

~~~
alexmuller
Theoretically, sure. But I can't think of a nice way to authorise users on
something like [1]. They'd then need a computer with the radio to provide some
kind of access code, I guess?

[1]
[http://www.robertsradio.co.uk/Products/Internet_radios/STREA...](http://www.robertsradio.co.uk/Products/Internet_radios/STREAM83i/index.htm)

~~~
ErrantX
One time passwords (feed the radio your generated password & let it use that
to negotiate authorisation/api keys).

------
esbwhat
I always wondered why people don't just use rainbow tables to get all the raw
passwords, and then hash them with the better algorithm. The ones that are
left, you just change upon login.

------
peteretep
Here's a worked example of a similar technique I wrote up ages ago:

<https://gist.github.com/1051238>

------
StavrosK
Did you get that from here?: <http://news.ycombinator.com/item?id=4078751>

~~~
jgrahamc
No, I asked a question yesterday about this
(<http://news.ycombinator.com/item?id=4080823>) and spent a long time thinking
about it. Great minds...

~~~
StavrosK
Yeah, I guess it's not that amazing a coincidence... Most people would arrive
at that.

~~~
jgrahamc
A more fun instance of this sort of thing on HN is when I suggested that HN
might be attackable because of a flaw in random number generation and then
someone else who hadn't seen my suggestion went ahead and did it.

Me mentioning it: <http://news.ycombinator.com/item?id=596126>

The attack: <http://news.ycombinator.com/item?id=639976>

~~~
mixmax
That was an absolutely amazing hack

------
pdenya
The variable names in this article are throwing me off. Is there a special
significance to the subscripts and superscripts in the variables?

~~~
kbanman
The subscript i denotes that the variable belongs to a single user i. The tick
at the top is pronounced 'prime' and is used to differentiate between versions
or iterations.

~~~
Ineffable
Is that called "prime" by most people? I've always heard it just pronounced
"dash", as in "s-dash" or "f-dash".

~~~
drivebyacct2
>Is that called "prime" by most people?

yes.

