> You were just going to run MD5 on a salt and a password and store the hash...

KateLawson · on Jan 9, 2010

It sounds like your understanding is a bit off. Let's keep these terms separate: composition and nonce. A composition is some fixed function, composed of multiple iterations of a smaller primitive. A nonce (or salt) is a unique value per instance, where "instance" in this case is a single stored hash result.

The BSD MD5-crypt password hash algorithm is composed of multiple MD5 iterations. However, if you look at the code, you'll see that the variation of what gets tossed into MD5 only depends on the loop counter (i). So the hash function is not data-dependent. You could unroll all the loops into a single, large function that applies MD5 repeatedly to various inputs.

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libcrypt/crypt...
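
To make the "only depends on the loop counter" point concrete, here is a rough Python sketch of the stretching loop. It is a simplification: the initial digest is a placeholder I made up (real MD5-crypt builds it differently), so this will not reproduce actual MD5-crypt output. The function name is hypothetical too.

```python
import hashlib


def md5crypt_style_rounds(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    # Placeholder initial digest (assumption): the real algorithm derives
    # this value with a more involved setup step.
    final = hashlib.md5(password + salt).digest()

    for i in range(rounds):
        ctx = hashlib.md5()
        # Which inputs go in, and in what order, is decided by i alone,
        # never by the value of the password, salt, or previous digest.
        ctx.update(password if i & 1 else final)
        if i % 3:
            ctx.update(salt)
        if i % 7:
            ctx.update(password)
        ctx.update(final if i & 1 else password)
        final = ctx.digest()

    return final
```

Since every branch tests `i` only, the loop could be unrolled into one long, fixed sequence of MD5 calls.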

There is a salt in MD5-crypt, but it is just input data to the function. It does not change the actual hash function that is performed.

Now, your question: "is iterated MD5 of the message alone less secure than iterated MD5 of the data + a salt?" (And a related question that I brought in above: "is a function that itself varies based on the input data more secure than either of these?")

From an algorithmic perspective, there should be no difference. If you can create an MD5+ that calculates exactly the same output as MD5(MD5(data)) but always in fewer steps than 2*MD5, then you have broken MD5. The easy way to see this is that if you can create this MD5+, you can also "roll back" an MD5 result to some previous value, a 2nd-preimage attack.

From the perspective of creating a brute-force search device, there is some difference. If you use the salt to vary your function, it starts to take up more logic area. The larger the state space you create, the more RAM your device will require or the more slightly-different logic blocks. This is what Colin is doing with scrypt. It's a good idea and one that we should be moving to in the future.

http://www.tarsnap.com/scrypt.html
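
For reference, a minimal sketch of calling scrypt from Python's standard library, assuming a build where `hashlib.scrypt` is available (it needs OpenSSL support). The cost parameters here are illustrative assumptions, not a recommendation, and `scrypt_hash` is a hypothetical helper name.

```python
import hashlib


def scrypt_hash(password: bytes, salt: bytes) -> bytes:
    # n is the CPU/memory cost, r the block size, p the parallelism.
    # Memory use is roughly 128 * n * r bytes, about 4 MiB with these values.
    return hashlib.scrypt(password, salt=salt, n=2**12, r=8, p=1, dklen=32)
```

Raising `n` increases both the time and the RAM needed per guess, which is what makes a dedicated brute-force device more expensive to build.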

However, there is no difference between MD5(MD5(data || nonce)) and MD5(nonce || MD5(data || nonce)) from a brute force or algorithmic perspective.
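
A small sketch of the two constructions side by side, counting MD5 invocations during a dictionary attack. The helper names and the counter are my own illustration; `||` is taken to mean plain byte concatenation.

```python
import hashlib

# Mutable counter so we can see how much work an attacker does.
COUNTS = {"md5": 0}


def md5(data: bytes) -> bytes:
    COUNTS["md5"] += 1
    return hashlib.md5(data).digest()


def scheme_a(data: bytes, nonce: bytes) -> bytes:
    # MD5(MD5(data || nonce))
    return md5(md5(data + nonce))


def scheme_b(data: bytes, nonce: bytes) -> bytes:
    # MD5(nonce || MD5(data || nonce))
    return md5(nonce + md5(data + nonce))


def brute_force(scheme, target: bytes, nonce: bytes, candidates):
    # The nonce is stored next to the hash, so the attacker knows it.
    for guess in candidates:
        if scheme(guess, nonce) == target:
            return guess
    return None
```

Either way, each guess costs the attacker two MD5 calls with a known nonce, so the search effort is the same.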

philh · on Jan 10, 2010

I think my mistake was in assuming that you could create a lookup table for iterated MD5, which I realise now would take up far too much space.

ErrantX · on Jan 9, 2010

Adding the nonce would make it harder; but the iteration is hard to crack anyway (ignoring collisions for the moment; you shouldn't be using md5 full stop because of that).

shabble · on Jan 9, 2010

Presumably, the real issue of cracking a hashed password database is that you can then test if the user has the same password for other services (which is a fairly reasonable assumption). Collisions are likely to hinder that goal, by providing multiple 'valid' passwords which may not be the ones originally chosen by the user.

Still, I'm guilty of implementing the brain-dead 'append-a-secret-and-md5-it' scheme, so I'll be using something more sensible in future.

KateLawson · on Jan 9, 2010

I don't know if you've seen it, but I outlined the attacks against H(msg || secret), aka "secret suffix", here:

http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-has...

In particular, there are simple attacks when it is used with a hash function that has effective collision attacks, such as MD5.
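
The reason is that MD5 processes its input block by block: if two equal-length messages that end on a block boundary collide, their internal state is identical at that point, so appending the same secret gives the same tag for both. A sketch of the unsafe construction next to HMAC from Python's standard library, as one commonly used alternative; the function names are hypothetical and SHA-256 is my own choice.

```python
import hashlib
import hmac


def secret_suffix_tag(msg: bytes, secret: bytes) -> str:
    # Unsafe "secret suffix": H(msg || secret). A collision on msg
    # carries straight through to the tag.
    return hashlib.md5(msg + secret).hexdigest()


def hmac_tag(msg: bytes, secret: bytes) -> str:
    # HMAC mixes the key in before and after hashing the message.
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify(msg: bytes, secret: bytes, tag: str) -> bool:
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(hmac_tag(msg, secret), tag)
```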

tptacek · on Jan 9, 2010

MD5(MD5(x)) isn't the same hashing algorithm as MD5(x). You can't have collisions between the two of them.

philh · on Jan 10, 2010

I was talking about collisions within the same hashing algorithm. It might be the case that MD5^2(a) == MD5^2(b) where MD5(a) != MD5(b), but not the other way around. So collisions in MD5^2 will be a superset of collisions in MD5. Iterating thousands of times will further increase the number of collisions.
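
A toy way to see this, using a 16-bit hash small enough to enumerate completely. The toy hash (first two bytes of MD5) is purely my own stand-in; nothing here says anything quantitative about full 128-bit MD5.

```python
import hashlib


def toy_hash(x: int) -> int:
    # 16-bit toy hash: first two bytes of MD5 over the 2-byte encoding of x.
    digest = hashlib.md5(x.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")


def image_sizes(iterations: int) -> list:
    # Start from every possible 16-bit input and track how many distinct
    # outputs survive after each further application of the hash.
    current = set(range(2**16))
    sizes = []
    for _ in range(iterations):
        current = {toy_hash(x) for x in current}
        sizes.append(len(current))
    return sizes
```

The set of reachable outputs can only stay the same or shrink with each iteration, which is the same thing as saying the collisions of the iterated function include all collisions of the earlier rounds.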

cperciva · on Jan 10, 2010

> So collisions in MD5^2 will be a superset of collisions in MD5. Iterating thousands of times will further increase the number of collisions.

This is 100% true, but 99% irrelevant. For any good hash, the number of collisions only increases linearly (proof: if it increased any faster, you would have a fast collision-finding algorithm); so you'd need to iterate your hash an unreasonably large number of times before extra collisions outweighed the benefit of slowing down the attacker.

That said: This is exactly why we have PBKDF2. PBKDF1 was just an iterated hash; PBKDF2 was introduced to "reduce concerns about the recursion degenerating into a small set of values".
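
For the curious, a sketch of what PBKDF2 does differently from a plain iterated hash: every round is an HMAC keyed with the password, and the round outputs are XORed together rather than only the last one being kept. This hand-rolled version covers a single output block with HMAC-SHA-256 (my choice of PRF) and exists only to show the structure; in practice use `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac


def pbkdf2_one_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    # U_1 = HMAC(password, salt || INT(1)), for block index 1.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    acc = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_j = HMAC(password, U_{j-1}); the password is re-injected each round.
        u = hmac.new(password, u, hashlib.sha256).digest()
        # The result is U_1 xor U_2 xor ... xor U_c.
        acc ^= int.from_bytes(u, "big")
    return acc.to_bytes(32, "big")
```

With the same inputs this should match `hashlib.pbkdf2_hmac("sha256", password, salt, iterations)` at its default output length.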

tptacek · on Jan 10, 2010

You're suggesting that there are passwords that will generate 16-byte hashes that will themselves collide on the next application of MD5?

I'm not sure this has much to do with the security of PBKDF2.