
Lessons from the history of attacks on secure hash functions (2019) - erwan
https://electriccoin.co/blog/lessons-from-the-history-of-attacks-on-secure-hash-functions/
======
userbinator
I'm no expert but I did a bunch of reading around the time the MD5 collisions
were found, and have what I think is a semi-intuitive notion of why preimage
resistance is much easier than collision resistance. A hash function has an
internal state that is altered by its input, and so each of the bits of its
output is essentially a very large Boolean expression of all its input bits. A
pre-image attack means that you're given the outputs and need to solve for the
inputs, while a collision means setting two sets of these equations together
and finding a differing set of inputs which will "cancel out" internally and
result in the same output. The latter is easier, because you're not
constrained by the outputs --- the final hash output can be anything, as long
as you find two different inputs that give the same output.

This very interesting article about using "tunnels" to find collisions in MD5
is worth reading:
[http://eprint.iacr.org/2006/105.pdf](http://eprint.iacr.org/2006/105.pdf)

------
deerIRL
Did anyone else find the tables completely unreadable? Embedding them in a
scrolling iFrame without anchoring the headers to keep half the screen
whitespace just seems insane to me.

~~~
abathur
They could certainly be easier to use. Sticky left/top rows might help. A
modal might help.

An interactive, filterable visualization for the timelines might help, too.
The timeline could be greatly condensed if the exact change points were marked
for a non-confusing number of functions.

Or perhaps a cheeky play on Minard's graph of Napoleon's Russia campaign, but
with factors like the cost of compute instead of the temperature. Or on
Munroe's flow graph of U.S. congress composition/ideology.

------
kerkeslager
Given only the explanation, I think I'd have a hard time figuring out how
collision resistance differs from second-preimage resistance. After all, if
you generate a collision, you have two inputs (preimages) that hash to the
same output (image) so isn't the second input a second pre-image of the first
input?

I'm not an expert, but I'll try to explain how I think this works, and invite
real experts to step in if I get it wrong.

I think the way this works is that, with collisions, the attacker gets to
choose their inputs. They don't care what two inputs they find that generate
to the same output, they only care that they _do_ generate to the same output.
With a second-preimage attack, one of the inputs is chosen by the hasher.

As it turns out, the ability to choose both your inputs is quite useful! You
can view hash functions in reverse: for a given hash function, the inverse
hash function is a function which takes the output of the hash function and
returns the set of inputs to the hash function. For many hash functions, the
inverse hash function IS NOT uniformly difficult to compute partial solutions
to: some inverse hashes are much easier to compute than others. So if you can
find an output for which the inverse hash is easy to partially compute, you
can generate collisions at that point much more easily.

However, in the space of possible hash outputs, the number for which the
inverse hash is easier to compute is probably very small, or the possibility
of computing the inverse hash would have been noticed by the creators of the
hash function. So it's highly improbable that the hasher will have chosen a
hash input at a point where the hash input is one where the inverse hash is
easy to compute.

~~~
captaincrowbar
Collision attack: Find two different messages with the same hash.

[First] pre-image attack: Given a hash value, find a message with that hash.

Second pre-image attack: Given a message, find a different message with the
same hash.

------
pabs3
Another post on the lifetime of hash functions:

[http://valerieaurora.org/hash.html](http://valerieaurora.org/hash.html)

------
dependenttypes
> all others (RSA, DSA, ECDSA, Ed25519, etc.)

This table is incorrect. Ed25519 was explicitly designed with the intention of
only needing preimage resistance, not collision resistance.

~~~
erwan
The table is correct. First because design intent is irrelevant. Second
because the author clearly states that the focus of his comparative study is
on *-preimage resistance vs collision. There is more to the article than the
table :)

~~~
dependenttypes
I really do not understand what you are trying to say. Just like SPHINCS,
Ed25519 only cares that the hash function used with it has *-preimage
resistance and it would not be broken if a collision was found in it. See
[https://ed25519.cr.yp.to/ed25519-20110926.pdf](https://ed25519.cr.yp.to/ed25519-20110926.pdf)

> There is more to the article than the table :)

I am aware but my issue was with the table...

------
willvarfar
At the bottom the article mentions that the NSA said there was something wrong
with SHA-0. Do we now know what that was?

~~~
bannable
Boomerang attacks on SHA-0 have a complexity on the order of 2^33.

~~~
willvarfar
The boomerang (against block ciphers) was coined in 1999. It was used against
SHA-0 in 2008?

Apparently the NSA warned against SHA-0 in 1995.

So do we now know what they perceived the weakness to be? Or are we still
guessing?

