You realize that despite the fact that I still don't follow the logic here, SHA ...

paulgb · on June 6, 2009

Actually I didn't, but that more or less answers my original question. Now I realize why you (or rather, your character) said that it didn't matter that md5 is broken in this case.

In general though, I'm still convinced that putting a hash function in a loop will increase the probability of a collision.

Or maybe I misunderstood your point about the loop? I assume by putting a loop around the hash you mean something like this:

  key = SHA1(password)
  1000 times do
    key = SHA1(key)
  return key

tptacek · on June 6, 2009

Look I don't want to be dense here --- and there's a good chance that's exactly what's happening --- but it sounds to me like you're talking about collisions between SHA1, SHA1(SHA1), SHA1(SHA1(SHA1)), etc.

Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.

Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?

They're two different hash functions.

jibiki · on June 6, 2009

No, you're still missing the point :)

He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).

Let's do this with a very simple, silly hash function.

Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 = 2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) = (h2(x)/2) % 10.

Here's a table:

  x:  h  h2  h3
  0   0  0   0
  1   0  0   0
  2   1  0   0
  3   1  0   0
  4   2  1   0
  5   2  1   0
  6   3  1   0
  7   3  1   0
  8   4  2   1
  9   4  2   1
  10  5  2   1
  11  5  2   1
  12  6  3   1
  13  6  3   1
  14  7  3   1
  15  7  3   1
  16  8  4   2
  17  8  4   2
  18  9  4   2
  19  9  4   2
  20  0  0   0
  21  0  0   0
  ...

Hopefully this table shows that collisions are much more likely for h2 than for h, and more likely for h3 than for h2. The range of h3 is smaller than the range of h2 which is smaller than the range of h.

paulgb · on June 7, 2009

Right, that's what I had in mind.

Of course, it's a bit of a jump to assume that the result is the same when you use a cryptographic hash function instead of a trivial one.

But regardless of the hash function, the same problem will occur unless there are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I can tell, SHA1 can't guarantee this.