And to answer your question (I'm not a crypto expert, so take with a pinch of salt):
With SHA-1, that probability is generally accepted to be 2^(-160). 1000 is roughly 2^10, so the final probability is about 2^(-150). I guess that's still improbable enough...
But I think you do make a very valid and interesting point.
Maybe a better way to explain my concern is to consider hashing something once versus twice, rather than 1000 times.
Let h be a hash function and p be the probability P(h(x) = h(y)) for randomly chosen x and y.
Say you have unique inputs a and b.
Obviously, P(h(a) = h(b)) = p.
Now consider P(h(h(a)) = h(h(b))). h(h(a)) = h(h(b)) if:
h(a) = h(b) [probability p]
h(a) != h(b) [probability 1-p], and
h(h(a)) = h(h(b)) [probability p].
Since p is small, the probability of a collision has nearly doubled just by applying the hash function again.
[note: Higher up in this thread I mistakenly implied that the probability increased linearly. With a low number of iterations and realistic values for p it should be close to linear, but not exact.]
The converse is not true. There will be cases where SHA1(SHA1(m)) = SHA1(SHA1(n)), but SHA1(m) != SHA1(n). (Or does SHA1 somehow guarantee that this will not happen?)
So it seems to follow that the chances that H(a) = H(b) when H(m) -> SHA1(SHA1(m)) must be higher than when H(m) -> SHA1(m).
Am I missing something?
But on the off chance that we can end this thread gracefully, I'll point out that SHA256(SHA(256(m), m) is SHAd256(m), and considered by Ferguson to address security concerns in straight SHA256.
If SHA1(n) = SHA1(m), we know SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of SHA1 being deterministic)
Also, even if SHA1(n) != SHA1(m), there is still a chance that SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of collisions: SHA1(a) may equal SHA1(b) even if a != b.)
So the probability that SHA1(SHA1(n)) = SHA1(SHA1(m)) must be more than the probability that SHA1(n) = SHA1(m).
In general though, I'm still convinced that putting a hash function in a loop will increase the probability of a collision.
Or maybe I misunderstood your point about the loop? I assume by putting a loop around the hash you mean something like this:
key = SHA1(password)
1000 times do
key = SHA1(key)
Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.
Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?
They're two different hash functions.
He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).
Let's do this with a very simple, silly hash function.
Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 = 2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) = (h2(x)/2) % 10.
Here's a table:
x: h h2 h3
0 0 0 0
1 0 0 0
2 1 0 0
3 1 0 0
4 2 1 0
5 2 1 0
6 3 1 0
7 3 1 0
8 4 2 1
9 4 2 1
10 5 2 1
11 5 2 1
12 6 3 1
13 6 3 1
14 7 3 1
15 7 3 1
16 8 4 2
17 8 4 2
18 9 4 2
19 9 4 2
20 0 0 0
21 0 0 0
Of course, it's a bit of a jump to assume that the result is the same when you use a cryptographic hash function instead of a trivial one.
But regardless of the hash function, the same problem will occur unless there are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I can tell, SHA1 can't guarantee this.