From Collisions to Chosen-Prefix Collisions Application to Full SHA-1 [pdf]

nneonneo · on May 10, 2019

There are, broadly, two kinds of collisions. In the first kind, you can pick an arbitrary prefix P shared between the collision pair; the algorithm spits out a pair of (essentially random) blocks of data C1, C2 in the middle that are different but yield the same hash value (H(P+C1) = H(P+C2)). In the second kind, you can pick two arbitrary (equal-length) prefixes P1, P2, and the algorithm outputs collision blocks C1, C2 such that H(P1+C1) = H(P2+C2). Because of the Merkle–Damgård construction common to MD5 and SHA-1, you can add arbitrary (equal) suffixes without affecting the collision.

With the first kind, you don't have much control over the collision blocks, and the prefixes and suffixes have to be the same. This can still be useful - file formats can be carefully rigged so that the differences in the collision blocks are noticed somehow (e.g. misaligning length fields, changing testable field values, introducing different code). With MD5, these collisions take just seconds to compute; with SHA-1, the one known example (Shattered) took 2^63 hash evaluations to compute.

With the second kind, you can do a lot more evil, since you can pick two completely arbitrary prefixes. This was abused (for example) to create a rogue CA certificate from an ordinary signed website cert. With MD5, this takes several hours with a good GPU and CPU; with SHA-1, it was previously thought to take around 2^77 evaluations - more than 16,000 times harder than the first type of attack.

This paper describes a method that would take only 2^69 evaluations for the second kind of attack on SHA-1, only 64x harder than the first type (in the worst case). This would be feasible for a big institution like Google or a government.

All of which reminds us that, yet again, SHA-1 deserves to be laid to rest.

hinkley · on May 10, 2019

Around 2010 I had to play Benevolent Dictator and refuse to support SHA-1 signatures on software that could potentially kill people.

Every time another article like this comes out I experience two emotions. First, a bit of smugness mixed with a lot of relief that I won that argument. Second, a bit of shock when people say "it's time to stop using SHA-1" as if that hasn't been the recommendation for 10 years.

It's dead, dead, dead. Now is not the time to have discussions about upgrading. Now is the time to start throwing people under the buss for dragging their feet on upgrades.

twoodfin · on May 10, 2019

Some nice practical detail: They estimate using the methods of previous research that the new technique brings the price on EC2 of your very own SHA-1 chosen-prefix collision to somewhere between $1.2MM and $7MM.

JackC · on May 10, 2019

Does this have any implications for the urgency of replacing SHA-1 in git?

I recently asked a question on the pip issue tracker[1] about whether it would make sense for pip to start accepting sha-1 git refspecs when running in --require-hashes mode, on the grounds that there's more upside to encouraging use of --require-hashes than there is downside risk of sha-1 collision attacks. This surely doesn't help that argument, but I'm not sure if it hurts it either.

[1] https://github.com/pypa/pip/issues/6469

latchkey · on May 12, 2019

Previously: https://news.ycombinator.com/item?id=13719368

floatingatoll · on May 11, 2019

The Git SHA1 to SHA256 transition plan is here, though it may not be finalized yet: https://github.com/git/git/blob/master/Documentation/technic...

CiPHPerCoder · on May 10, 2019

First SHA1 was shattered. https://shattered.io

Now it's reduced to shambles.

It's time to stop using SHA1. (HMAC-SHA1 is still okay.)

tptacek · on May 10, 2019

It's OK in the sense that it's not a security emergency, but, don't use HMAC-SHA1 (or HMAC-MD5, which is also in the same sense "ok").

wolf550e · on May 10, 2019

When do you expect it would be feasible to forge HMAC-SHA1 or HMAC-MD5?

nneonneo · on May 10, 2019

All current collision attacks assume you know the intermediate hash values (IHVs). With a typical HMAC forgery scenario, you don’t know the secret key so you don’t know the IHV corresponding to the inner hash application, which means existing collision attacks are not applicable.

If you do know the secret key, collisions are easy, but probably pointless since you can already forge HMAC signatures for anything you want.

Still, even if there’s no way to break HMAC-SHA1 (or even HMAC-MD5) faster than the birthday attack, you should still prefer to pick a better hash algorithm for safety.

a1369209993 · on May 10, 2019

Also, you shouldn't use a different hash algorithm for HMAC than for other hashing, because that increases the amount of security-critical code you have, which directly increases the chance that there's a bug in your security-critical code.

tptacek · on May 10, 2019

It's pretty unlikely that you're going to have a bug in your SHA1 and SHA2 cores, for whatever that's worth to you. Virtually nobody implements them by hand.

kbwt · on May 12, 2019

Anecdotally I know of one very popular application that has a bug in their SHA-1 implementation causing it to effectively only perform a single round.

wolf550e · on May 12, 2019

Would that make it only interoperable with itself, and not have the security a correct implementation of SHA1 provides? Is it a security bug (i.e. is there anything worth stealing by breaking their not-quite-SHA1-hash)?

kbwt · on May 13, 2019

The application is using a proprietary client/server protocol, so it already lacks lacks any kind of interoperability.

In this specific case, it's unclear whether the bug has direct security implications. The broken SHA-1 is used on some user-controlled data that gets XORed onto the server's decryption of a user-specified payload before being passed into an RC4 key schedule. It's certainly plausible that this might produce a server-assisted privacy compromise of other users' sessions.

CiPHPerCoder · on May 10, 2019

Quite right.

tzs · on May 10, 2019

I didn't see any date in that paper. Is it new? (The URL suggests 2019, but that could just be when it was put on that server, not when the paper was written).

lelf · on May 10, 2019

It is, https://eurocrypt.iacr.org/2019/program.html