Hacker News new | past | comments | ask | show | jobs | submit login

As far as I can tell, you're confusing collisions with cycles. I don't think any of the main cryptographic hashes have problems with short cycles.



I could well be confusing the two, because I'm not sure if I even know what cycles are :-). I'm guessing from the name that cycles are when h(h(h(... h(a)))) = a? That's not quite what I'm getting at.

I'll try explaining my concern a little better. Suppose you have a hash function h. Let p be the probability that h(x) == h(y) for any unique x and y.

Now suppose you have some arbitrary inputs a and b.

  h(a) = h(b) with probability p
  h(h(a)) = h(h(b)) with probability 2p
  h(h(h(a))) = h(h(h(b))) with probability 3p
  etc.
The probability at each step increases because a collision is possible at each step, but a collision in a previous step guarantees a collision in the current one.

With a good hash function, p should be astronomically low, so I guess that's why it isn't a problem in this case? Or am I completely off-base in my assumptions?


I see what you mean now. And your intuition on cycles was spot-on.

And to answer your question (I'm not a crypto expert, so take with a pinch of salt):

With SHA-1, that probability is generally accepted to be 2^(-160). 1000 is roughly 2^10, so the final probability is about 2^(-150). I guess that's still improbable enough...

But I think you do make a very valid and interesting point.


The loop has nothing to do with the probability of a collision. If your key comes out of /usr/share/dict/words, you have a 2^18 search space, not a 2^160 space. If you have a 2^18 search space, you better hope your constant factors are very high. Hence the loop.


Yeah, I get why the loop is there. But I still think it increases the probability of a collision by nearly 1000x over the probability of a collision without the loop.

Maybe a better way to explain my concern is to consider hashing something once versus twice, rather than 1000 times.

Let h be a hash function and p be the probability P(h(x) = h(y)) for randomly chosen x and y.

Say you have unique inputs a and b.

Obviously, P(h(a) = h(b)) = p.

Now consider P(h(h(a)) = h(h(b))). h(h(a)) = h(h(b)) if:

  h(a) = h(b) [probability p]
     OR
  h(a) != h(b) [probability 1-p], and
  h(h(a)) = h(h(b)) [probability p].
So the total probability when we repeat the hash is p + (1-p) * p

Since p is small, the probability of a collision has nearly doubled just by applying the hash function again.

[note: Higher up in this thread I mistakenly implied that the probability increased linearly. With a low number of iterations and realistic values for p it should be close to linear, but not exact.]

[edit:formatting]


H(m) -> SHA1(SHA1(m)) is not the same hash function as H(m) -> SHA1(m).


Right, but if SHA1(m) = SHA1(n), then SHA1(SHA1(m)) = SHA1(SHA1(n)) also.

The converse is not true. There will be cases where SHA1(SHA1(m)) = SHA1(SHA1(n)), but SHA1(m) != SHA1(n). (Or does SHA1 somehow guarantee that this will not happen?)

So it seems to follow that the chances that H(a) = H(b) when H(m) -> SHA1(SHA1(m)) must be higher than when H(m) -> SHA1(m).

Am I missing something?


I think you're going to need to better explain the logic behind graf 2 before I can answer you.

But on the off chance that we can end this thread gracefully, I'll point out that SHA256(SHA(256(m), m) is SHAd256(m), and considered by Ferguson to address security concerns in straight SHA256.


It boils down to this:

If SHA1(n) = SHA1(m), we know SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of SHA1 being deterministic)

Also, even if SHA1(n) != SHA1(m), there is still a chance that SHA1(SHA1(n)) = SHA1(SHA1(m)). (this is a consequence of collisions: SHA1(a) may equal SHA1(b) even if a != b.)

So the probability that SHA1(SHA1(n)) = SHA1(SHA1(m)) must be more than the probability that SHA1(n) = SHA1(m).


You realize that despite the fact that I still don't follow the logic here, SHA collisions have no meaningful impact on the security of PBKDF, right?


Actually I didn't, but that more or less answers my original question. Now I realize why you (or rather, your character) said that it didn't matter that md5 is broken in this case.

In general though, I'm still convinced that putting a hash function in a loop will increase the probability of a collision.

Or maybe I misunderstood your point about the loop? I assume by putting a loop around the hash you mean something like this:

  key = SHA1(password)
  1000 times do
    key = SHA1(key)
  return key


Look I don't want to be dense here --- and there's a good chance that's exactly what's happening --- but it sounds to me like you're talking about collisions between SHA1, SHA1(SHA1), SHA1(SHA1(SHA1)), etc.

Rename the functions. SHA1(SHA1) is now SHA1', SHA1(SHA1(SHA1)) is now SHA1''.

Explain why I care if there's an m and an n such that SHA1'(m) = SHA1''(n)?

They're two different hash functions.


No, you're still missing the point :)

He's saying that SHA1'(m) = SHA1'(n) implies that SHA1"(m) = SHA1"(n).

Let's do this with a very simple, silly hash function.

Let h(x) = (x/2)%10 (in integer arithmetic, e.g, 5/2 = 2, so h(5) = (5/2)%10 = 2%10 = 2.) So then let h2(x) = h(h(x)) = (h(x)/2)%10, and h3(x) = h(h(h(x))) = (h2(x)/2) % 10.

Here's a table:

  x:  h  h2  h3
  0   0  0   0
  1   0  0   0
  2   1  0   0
  3   1  0   0
  4   2  1   0
  5   2  1   0
  6   3  1   0
  7   3  1   0
  8   4  2   1
  9   4  2   1
  10  5  2   1
  11  5  2   1
  12  6  3   1
  13  6  3   1
  14  7  3   1
  15  7  3   1
  16  8  4   2
  17  8  4   2
  18  9  4   2
  19  9  4   2
  20  0  0   0
  21  0  0   0
  ...
Hopefully this table shows that collisions are much more likely for h2 than for h, and more likely for h3 than for h2. The range of h3 is smaller than the range of h2 which is smaller than the range of h.


Right, that's what I had in mind.

Of course, it's a bit of a jump to assume that the result is the same when you use a cryptographic hash function instead of a trivial one.

But regardless of the hash function, the same problem will occur unless there are no collisions for input between 0 and 2^160 - 1 (inclusive). From what I can tell, SHA1 can't guarantee this.




Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: