> You were just going to run MD5 on a salt and a password and store the hash. PHK runs MD5 for thousands of iterations. That’s called “stretching”.
My understanding is that unless you add a nonce at each iteration, this wouldn't help at all, and probably would just increase the number of collisions. Is that correct?
I recently had to look at the phpBB password hashing scheme. As I recall, they iterate MD5 a small-ish number of times without adding a new nonce. There's a comment which says their scheme would be improved if they did something - but then it would be slower, which means they'd use fewer iterations, which would make it less secure.
It sounds like your understanding is a bit off. Let's keep these terms separate: composition and nonce. A composition is some fixed function, composed of multiple iterations of a smaller primitive. A nonce (or salt) is a unique value per instance, where "instance" in this case is a single stored hash result.
The BSD MD5-crypt password hash algorithm is composed of multiple MD5 iterations. However, if you look at the code, you'll see that the variation of what gets tossed into MD5 only depends on the loop counter (i). So the hash function is not data-dependent. You could unroll all the loops into a single, large function that applies MD5 repeatedly to various inputs.
There is a salt in MD5-crypt, but it is just input data to the function. It does not change the actual hash function that is performed.
Now, your question: "is iterated MD5 of the message alone less secure than iterated MD5 of the data + a salt?" (And a related question that I brought in above: "is a function that itself varies based on the input data more secure than either of these?)
From an algorithmic perspective, there should be no difference. If you can create an MD5+ that calculates exactly the same output as MD5(MD5(data)) but always in less steps than 2*MD5, then you have broken MD5. The easy way to see this if you can create this MD5+, you can also "roll back" an MD5 result to some previous value, a 2nd-preimage attack.
From the perspective of creating a brute-force search device, there is some difference. If you use the salt to vary your function, it starts to take up more logic area. The larger the state space you create, the more RAM your device will require or the more slightly-different logic blocks. This is what Colin is doing with scrypt. It's a good idea and one that we should be moving to in the future.
Adding the nonce would make it harder; but the iteration is hard to crack anyway (ignoring collisions for the moment; you shouldn't be using md5 full stop because of that).
presumably, the real issue of cracking a hashed password database is that you can then test if the user has the same password for other services (which is a fairly reasonable assumption). Collisions are likely to hinder that goal, by providing multiple 'valid' passwords which may not be the ones originally chosen by the user.
Still, I'm guilty of implementing the brain-dead 'append-a-secret-and-md5-it' scheme, so I'll be using something more sensible in future.
I was talking about collisions within the same hashing algorithm. It might be the case that MD5^2(a) == MD5^2(b) where MD5(a)!= MD5(b), but not the other way around. So collisions in MD5^2 will be a superset of collisions in MD5. Iterating thousands of times will further increase the number of collisions.
So collisions in MD5^2 will be a superset of collisions in MD5. Iterating thousands of times will further increase the number of collisions.
This is 100% true, but 99% irrelevant. For any good hash, the number of collisions only increases linearly (proof: if it increased any faster, you would have a fast collision-finding algorithm); so you'd need to iterate your hash an unreasonably large number of times before extra collisions outweighed the benefit of slowing down the attacker.
That said: This is exactly why we have PBKDF2. PBKDF1 was just an iterated hash; PBKDF2 was introduced to "reduce concerns about the recursion degenerating into a small set of values".
My understanding is that unless you add a nonce at each iteration, this wouldn't help at all, and probably would just increase the number of collisions. Is that correct?
I recently had to look at the phpBB password hashing scheme. As I recall, they iterate MD5 a small-ish number of times without adding a new nonce. There's a comment which says their scheme would be improved if they did something - but then it would be slower, which means they'd use fewer iterations, which would make it less secure.