
Freestart collisions for SHA-1 - dchest
https://sites.google.com/site/itstheshappening/
======
mdewinter
My side project tries to give secure default settings for all major webservers
and other software (like haproxy, mysql, mailservers etc):
[https://cipherli.st/](https://cipherli.st/)

From the start it has recommended certificates signed with SHA-256 or stronger.

If you want to test your site for a SHA-1 cert, you can try my other side
project: [https://ssldecoder.org/](https://ssldecoder.org/) - you can also
use the SSL Labs test, but mine is faster if you only want to check the
certificate type. (And it's open source, so you can use it internally as well.)

Mozilla also has a good wiki page for SSL recommended settings:
[https://wiki.mozilla.org/Security/Server_Side_TLS](https://wiki.mozilla.org/Security/Server_Side_TLS)

~~~
praseodym
Mozilla has a similar tool:
[https://mozilla.github.io/server-side-tls/ssl-config-generat...](https://mozilla.github.io/server-side-tls/ssl-config-generator/)

------
m1el
Does this mean that it is now possible to create a circular data structure in
git? :)

~~~
coldpie
The big take-away from the article seems to be:

> Even though freestart collisions do not directly lead to actual collisions
> for SHA-1, in our case, the experimental data we obtained in the process
> enable significantly more accurate projections on the real-world cost of
> actual collisions for SHA-1, compared to previous projections. Concretely,
> we estimate the SHA-1 collision cost today (i.e., Fall 2015) between 75K$
> and 120K$ renting Amazon EC2 cloud computing over a few months.

So if I understand their estimates correctly, it would cost around $100,000
and several months to create your circular data structure. (Or, somewhat less
trivially, compromise Git's SHA1 integrity promise in a still-probably-useless
way.)

~~~
joeyh
There's a way to embed arbitrary garbage invisibly inside git commit objects.
Just put your garbage after a NUL and nothing will display it as part of the
commit message. Using that to generate two commit objects that collide is the
most likely attack method.
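
For context, a minimal sketch of how git names an object: the ID is the SHA-1 of a short header (`<type> <length>` plus a NUL byte) followed by the object body. The helper name `git_object_id` and the sample commit text are made up for illustration; only the header layout and the two well-known empty-object IDs are taken from git itself. The bytes after the NUL in the second commit stand in for the "garbage" an attacker would vary while searching for a collision.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <len>\\0' + body, which is how git derives object IDs."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical commit body; the tree ID is git's well-known empty tree.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"Harmless-looking message\n"
)

# Same visible message, plus attacker-controlled bytes hidden after a NUL.
with_garbage = commit + b"\x00" + b"attacker-controlled bytes"

print(git_object_id("blob", b""))            # ID of the empty blob
print(git_object_id("commit", commit))
print(git_object_id("commit", with_garbage))  # differs unless a collision was found
```

Everything up to the NUL is identical in both commits, so the attack described above amounts to finding two trailing blobs that make the two IDs match.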

Exploiting the result will involve some social engineering. Starting with
getting one of the colliding objects accepted into the repo you want to
attack.

At this point, it's cheaper to generate a SHA1 collision than it will be to
fix git not to use SHA1. Which is deeply worrying.

Basic hygiene at this point probably includes only merging git commits from
others that are gpg signed (as well as gpg signing as many commits yourself as
you can without going mad at the password prompts). Unfortunately, tooling
doesn't make this easy, and some things like git format-patch are actively
unhelpful by not preserving gpg signatures.

~~~
agwa
How does GPG-signing help, when the signature is over a SHA-1 hash?

~~~
joeyh
Thanks, I forgot that git got that wrong :/

Edit: Actually, looking at the code (do_sign_commit), git appears to gpg sign
the whole commit object.

I think it's in signed tags where git only signs the sha1 being tagged.

~~~
agwa
Yeah, I just emerged from the Git code myself ;-)

So you're correct that GPG-signing commits (but not tags) prevents collisions
in commit objects. The problem though is that a commit ultimately contains a
SHA-1 hash of a tree object, so now the concern is someone generating
colliding tree objects.

Edit: fortunately, the format of tree objects looks pretty rigid. I feel
somewhat reassured, but only somewhat.

~~~
joeyh
[http://joeyh.name/blog/entry/size_of_the_git_sha1_collision_...](http://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/)
has some discussion of colliding tree objects.

------
devit
Is the very accessible 120K$ estimate just for an arbitrary collision, or is
it also possible to cheaply obtain a collision with chosen start or end, or a
preimage, or a preimage with chosen start or end?

~~~
coldpie
I have only a little crypto experience (uni courses and as a light hobby), so
I could be wrong. My understanding is that a "collision attack" is a well-
defined term, where the attacker chooses both preimages. An attack where you
are given a preimage and a hash and must compute a second preimage with the
same hash is called a "second preimage" attack. A "first preimage" attack
would be finding a message that computes to a given hash.

There are different attack scenarios for each attack type; one isn't strictly
a subset of another.

The paper given here describes a collision attack, so they chose both messages
that result in the same hash. Further, they also chose a different IV value
for each of the two SHA-1 computations, while in practice the IV value is
fixed. This is what "freestart" means.

I found this useful reference:

[http://cstheory.stackexchange.com/questions/585/what-is-the-...](http://cstheory.stackexchange.com/questions/585/what-is-the-difference-between-a-second-preimage-attack-and-a-collision-attack)

I hope this answers your question.

------
JosephRedfern
Does the algorithm solve the problem of "find me any two pieces of data that
produce a clash" (thereby taking advantage of the Birthday Paradox), or does
it say "given this piece of data, output data whose SHA-1 sum is the same"?

I assume that the second problem would be harder to solve - but I'm not sure
by what order of magnitude. If it's the former, what are the practical
applications of the attack?

~~~
ddlatham
The first problem is called _collision resistance_. The second is called
_second pre-image resistance_.

[https://en.wikipedia.org/wiki/Cryptographic_hash_function#Pr...](https://en.wikipedia.org/wiki/Cryptographic_hash_function#Properties)
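
On the "order of magnitude" question: for an ideal n-bit hash, a generic collision search takes about 2^(n/2) evaluations (the birthday bound) while a generic second-preimage search takes about 2^n, so roughly 2^80 versus 2^160 for SHA-1 before any cryptanalytic shortcuts. A rough sketch of the difference on a toy hash, assuming SHA-1 truncated to 16 bits; the function names and message formats are invented for this example.

```python
import hashlib
from itertools import count

TOY_BYTES = 2  # toy digest: first 16 bits of SHA-1 (real SHA-1 has 160 bits)

def tiny_hash(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[:TOY_BYTES]

def find_collision():
    """Attacker picks BOTH messages: remember every digest seen so far."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_second_preimage(target: bytes):
    """One message is fixed in advance: only an exact digest match counts."""
    want = tiny_hash(target)
    for i in count():
        msg = b"try-%d" % i
        if msg != target and tiny_hash(msg) == want:
            return msg, i + 1

a, b, collision_tries = find_collision()
other, preimage_tries = find_second_preimage(b"given message")
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```

On a 16-bit digest one would expect the collision search to finish in a few hundred tries and the second-preimage search to need tens of thousands, which is the same gap scaled down.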

~~~
jobigoud
So, what is a _freestart collision_?

The two messages from OP have many bytes in common. I would expect a hashing
function like SHA-1 to give wildly different outputs given similar inputs due
to the avalanche effect.

~~~
jebed091
It's a collision if you can choose the IV. That's not much practical use, it's
just an interesting demonstration that confirms research is slowly progressing
towards a real collision.

SHA-1 uses a Merkle-Damgard construction, which means the input is split into
fixed-size blocks and the output from one block becomes the IV for the next.
This is because SHA-1's compression function operates on one fixed-size block
at a time, so if you have more than 1 block of data, it's chained together
like this:

    
    
      /* One compression step: mixes a single fixed-size block into the current IV. */
      char *SHA1(char *IV, char *block);
    
      Digest = SHA1(SHA1(SHA1(StartIV, block1), block2), block3);
    

A freestart collision is a collision if you can choose the IV. That's not much
use, because we don't know how to find a block (or chain of blocks) that
produces the IV we want.
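
The chaining above can be sketched in a few lines. This is a toy, not SHA-1: `toy_compress` is a made-up stand-in built from truncated SHA-256, the block and state sizes are shrunk, and padding is left out entirely. It only shows the shape of the construction and why "choosing the IV" is the same as starting from the middle of a chain.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 8  # toy block size in bytes (SHA-1 uses 64)
STATE_SIZE = 4  # toy chaining value size in bytes (SHA-1 uses 20)
START_IV = b"\x00" * STATE_SIZE  # fixed by the "standard" in this toy

def toy_compress(iv: bytes, block: bytes) -> bytes:
    """Stand-in compression function: fixed-size state in, fixed-size state out."""
    assert len(iv) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(iv + block).digest()[:STATE_SIZE]

def toy_hash(blocks, iv: bytes = START_IV) -> bytes:
    """Merkle-Damgard chaining: each output becomes the IV for the next block."""
    return reduce(toy_compress, blocks, iv)

block1, block2, block3 = b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"

digest = toy_hash([block1, block2, block3])
nested = toy_compress(toy_compress(toy_compress(START_IV, block1), block2), block3)

# "Freestart": skip block1 and begin from a chosen IV instead of START_IV.
mid_state = toy_compress(START_IV, block1)
resumed = toy_hash([block2, block3], iv=mid_state)

print(digest.hex(), nested.hex(), resumed.hex())
```

All three values come out equal here only because `mid_state` was produced honestly from `block1`; an attacker with a freestart collision has the chosen IV but no block that leads to it.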

~~~
kkl
To make sure I understand correctly:

Suppose I had the magical ability to find a special value that would equal the
IV listed in the link (i.e. 506b0178ff6d1890....) when that special value was
applied to SHA-1. I could then use the results from this research to compute a
collision using input that is prefixed with that special value?

~~~
dunkelheit
Yes. If you are able to construct P1 such that SHA1(P1) = IV1 and P2 s.t.
SHA1(P2) = IV2, then

    
    
        SHA1(P1 + padding(P1) + M1) = SHA1(P2 + padding(P2) + M2) = f020486f071bf11053547a86f4a7153b3c950f4b
    

where padding(M) is the padding appended to message M as the first step of
computing SHA1.

(edit: clarified padding)
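
The identity above can be checked with a SHA-1 whose IV is a parameter. A sketch under assumptions: this is a plain, unoptimized SHA-1 written out for illustration, the names `compress`, `padding`, and `sha1_from` are this example's own, and `P` and `M` are arbitrary placeholder strings, not the values from the paper. It does not produce a collision; it only shows that hashing `M` from the IV `SHA1(P)` gives the same result as hashing `P + padding(P) + M` from the standard IV.

```python
import hashlib
import struct

STANDARD_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def compress(state, block: bytes):
    """SHA-1 compression function: one 64-byte block into a 5-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def padding(total_len: int) -> bytes:
    """0x80, zero bytes, then the 64-bit big-endian bit length of the message."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def sha1_from(iv, data: bytes, total_len: int):
    """Hash `data` starting from `iv`; `total_len` is the length written into the padding."""
    data += padding(total_len)
    state = iv
    for off in range(0, len(data), 64):
        state = compress(state, data[off:off + 64])
    return state

def hexdigest(state) -> str:
    return struct.pack(">5I", *state).hex()

P = b"placeholder prefix"
M = b"placeholder message"
prefix = P + padding(len(P))             # what SHA-1 actually processes for P
iv = sha1_from(STANDARD_IV, P, len(P))   # chaining value, i.e. SHA1(P)

full = hashlib.sha1(prefix + M).hexdigest()
resumed = hexdigest(sha1_from(iv, M, len(prefix) + len(M)))
print(full == resumed)
```

The hard part the thread keeps coming back to is the reverse direction: given the IV from the paper, nobody knows a `P` whose SHA-1 is that value.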

~~~
kkl
Thanks for clarifying. That is a really concise explanation.

------
colinbartlett
Can someone with more crypto experience than me explain what a "freestart
collision" is?

~~~
tptacek
Every SHA1 hash starts with an initialization vector, which is a kind of ur-
hash:

    
    
        67452301EFCDAB8998BADCFE10325476C3D2E1F0
    

The function of SHA1 is to transform this ur-hash into the hash of a specific
stream of data.

In a freestart collision, attackers start from something other than the
initialization vector. Think of it as if they're starting from the hash of
some piece of data and then hashing more data into it (that's essentially what
they'd be doing).

~~~
fla
But the standard IV is hardcoded in the SHA libraries, right?

~~~
sdevlin
Yes. It's usually not even accessible through the hash function's exposed
interface.

------
andrewflnr
So I know we've been dis-recommending SHA1 for a while, but does this mean
we've officially entered "hell no!" territory? At what point do we consider it
"broken" and/or tell our PHBs to delay features while we transition away from
it?

------
ddlatham
Are there any known SHA-1 collisions yet? If not, and the cost is that cheap,
I'm surprised.

~~~
dunkelheit
I guess it means that they are probably known, it is just not public
knowledge.

