
Lessons from the History of Attacks on Secure Hash Functions - luu
https://z.cash/technology/history-of-hash-function-attacks.html
======
zackmorris
From what I understand, the Feb 23rd SHA-1 attack (SHAttered) was possible
because they figured out how to get the internal state (160 bits, or 5 words
of 32 bits) to match from two separate pieces of data. After that, additional
data could be appended to both pieces, as long as the appended data was
identical.

The internal state would play back the same sequence from there on, just like
two random number generators starting from the same seed.
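
Here's a minimal sketch of that "same seed" behaviour, using a made-up
Merkle-Damgård-style toy hash (nothing here resembles SHA-1 internally, and
the compression function is deliberately weak so that a colliding pair is
easy to construct by hand):

    BLOCK = 8  # bytes per block; arbitrary for this demo

    def compress(state: int, block: bytes) -> int:
        # Deliberately weak stand-in compression function: it only
        # depends on the block's byte sum, so collisions are trivial.
        return (state * 31 + sum(block)) & 0xFFFFFFFF

    def toy_hash(msg: bytes) -> int:
        state = 0x12345678                    # fixed IV
        msg += b"\x00" * (-len(msg) % BLOCK)  # real padding also encodes length
        for i in range(0, len(msg), BLOCK):
            state = compress(state, msg[i:i + BLOCK])
        return state

    a = b"ABCDEFGH"   # one block
    b = b"BADCFEHG"   # different block, same byte sum -> same internal state
    assert toy_hash(a) == toy_hash(b)

    # once the states match, any identical suffix keeps the digests identical
    suffix = b"more identical data here"
    assert toy_hash(a + suffix) == toy_hash(b + suffix)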

Here is a comparison of internal state sizes:

[https://en.wikipedia.org/wiki/SHA-0#Comparison_of_SHA_functi...](https://en.wikipedia.org/wiki/SHA-0#Comparison_of_SHA_functions)

SHA-256 is susceptible to the same flaw; an attack would just take longer,
because it has about 128 bits of collision security instead of SHA-1's fewer
than 80. It looks like only SHA-3 really "gets it", with a 1600-bit state
size.

After all of the effort put into making highly pseudo-random hash functions,
it's a wonder that the state size is only the size of the hash output. By
comparison, Mersenne Twister's state size is 19937 bits (624 words of 32 bits,
minus 31 bits):

[http://www.quadibloc.com/crypto/co4814.htm](http://www.quadibloc.com/crypto/co4814.htm)

Does anyone know why hash algorithms keep using such small state sizes,
leaving us vulnerable to this same issue?

~~~
meta_AU
A 1:1 state-to-input ratio is just the optimal performance trade-off (more
input than state lowers security; the other way around lowers efficiency).
SHA-3 still has that ratio; it just supports consuming more bytes at a time.
Increasing this size too much reduces efficiency for small input sizes.

SHA-512/256 is in a pretty good spot: the internal state is twice the
recommended hash size, and the output is equal to the recommended size
(256-bit for a 128-bit security level). Common tree hash constructions want
to take two inputs and reduce them to one output, which is exactly the ratio
SHA-512/256 gives you; see the sketch below.
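
A quick sketch of that 2-to-1 tree-hash step in Python, assuming hashlib
exposes "sha512_256" (it does only where the underlying OpenSSL build
provides it):

    import hashlib

    def parent_node(left: bytes, right: bytes) -> bytes:
        # two 32-byte child digests in, one 32-byte parent digest out:
        # the 2:1 reduction described above (a real tree hash would also
        # domain-separate leaf nodes from internal nodes)
        return hashlib.new("sha512_256", left + right).digest()

    leaf_a = hashlib.new("sha512_256", b"chunk A").digest()
    leaf_b = hashlib.new("sha512_256", b"chunk B").digest()
    print(parent_node(leaf_a, leaf_b).hex())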

~~~
magila
What you really care about for tree hashes is the block-size-to-output
ratio. SHA-512/256 has a 1024-bit block size, so for tree hashes it is less
optimal than SHA-256, which has a 512-bit block size. This assumes you don't
need any extra data in the internal nodes.
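
For what it's worth, both sizes can be read straight off Python's hashlib
(sha512 stands in for SHA-512/256 below, since they share the same 1024-bit
block and SHA-512/256 just truncates the output):

    import hashlib

    for name in ("sha256", "sha512"):
        h = hashlib.new(name)
        print(f"{name}: {h.block_size * 8}-bit block, "
              f"{h.digest_size * 8}-bit digest")
    # sha256: 512-bit block, 256-bit digest -> two child digests fill one block
    # sha512: 1024-bit block, 512-bit digest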

~~~
ribasushi
Could you expand on this comment and/or provide links to relevant reading?

Much appreciated!

------
hannob
> Another interesting pattern that I perceive in these results is that maybe
> sometime between 1996 (Tiger) and 2000 (Whirlpool), humanity learned how to
> make collision-resistant hash functions,

I actually feel this can be generalized even further: at some point, people
learned how to create unbreakable algorithms. There is literally no
mainstream crypto algorithm from the 2000s onwards that has seen any
significant breakage. And very likely there never will be, with one
exception: quantum computers will break modern ECC.

I think there was simply a dark age of crypto research covering the 90s
algorithms and earlier. Which isn't surprising: back then, people were
fighting over whether it was even legal to do that kind of research.

~~~
meredydd
This is far too optimistic - just look at the "History" chart. The average
age of the 90s hashes when they were broken was 10-15 years. It's equally
plausible that the "modern" algorithms are simply too young for us to have
seen them broken.

~~~
hannob
There's an aspect you're missing: for both major hash breakages there was a
~10 year warning phase (for MD5 the first breakthrough was in 1996, for SHA-1
in 2004), during which it was basically clear these hashes were bad; it was
just that no one had done the full attack yet. There's no such warning from
any modern hash yet.

------
tromp
The broken SPHINCS links should perhaps point to the paper
[https://sphincs.cr.yp.to/sphincs-20150202.pdf](https://sphincs.cr.yp.to/sphincs-20150202.pdf)

~~~
StavrosK
I'd never heard of this before. Is it as good as it looks?

------
jepler
There's a second hypothesis that explains why (second-)preimage attacks are
not commonly published: after a hash collision is produced, the crypto
community has no further interest. How much was _ever_ published about, say,
MD4 after the 1995 document this article cites, which presented a practical
piece of software for generating collisions? I haven't tried to answer that
question, but I suspect the answer is "not much".

~~~
jepler
I could be wrong about this. For instance,
[http://dl.acm.org/citation.cfm?id=2148250.2148477](http://dl.acm.org/citation.cfm?id=2148250.2148477)
[https://pdfs.semanticscholar.org/868b/443aec1e87951c8a896b12...](https://pdfs.semanticscholar.org/868b/443aec1e87951c8a896b1298c1fa9cdd1e1e.pdf)
"Improved preimage attack on one-block MD4" is a 2012 publication improving
the preimage attack on MD4 from 2^107 to 2^95. So 17 years after a practical
collision attack on MD4, academics _ARE_ still working on it as a
cryptographic primitive, but despite that they have _NOT_ found a practical
preimage attack on it.

------
chronial
> Hash-based digital signatures are secure (resistant to forgery) as long as
> the hash function they are built on has second-pre-image resistance

I am not very experienced with this, but isn't this clearly wrong?

If I have a controllable collision (as with SHA-1), I can get someone to sign
document A, then destroy all evidence of document A's existence and claim
they signed document B.

Isn't it essential that a digital signature scheme be immune to such an
attack?
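
For a concrete illustration, assuming you've downloaded the two colliding
PDFs published at https://shattered.io (the filenames below are just those
downloads): any scheme that signs the SHA-1 digest of a document literally
cannot tell them apart.

    import hashlib

    def sha1_of(path: str) -> str:
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # the victim signs (the digest of) one PDF...
    signed = sha1_of("shattered-1.pdf")
    # ...the attacker later presents the other, visually different PDF
    claimed = sha1_of("shattered-2.pdf")
    assert signed == claimed  # same digest, so one signature covers both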

~~~
modalduality
Generally, people won't sign garbage or random documents, so Document A has
to come from a very specific subset of documents a person would actually
sign. Then finding Document B constitutes a second-preimage attack.

~~~
hueving
You're right as long as the format doesn't contain a place to hide all of the
garbage (e.g. Crap not rendered in a PDF by the PDF viewer).

------
loup-vaillant
ed25519 is listed as sensitive to hash collisions.

I believe that's an error: the official site¹ puts "collision resilience" in
the list of features.

(1) [http://ed25519.cr.yp.to/](http://ed25519.cr.yp.to/)

