Just a curiosity, since people are talking about Git still using SHA-1 (despite ...

rst · on Jan 7, 2020

Well, according to that reference, it's hardened against a specific, previously known attack. Do you have any information on whether that also protects against the different, new attack which was just published?

tialaramex · on Jan 7, 2020

Not so much the specific attack, as the broad class of attacks. I think this new work is in that same broad class but I am not a mathematician.

The idea in Marc Stevens' anti-collision work is that some inputs are "disturbance vectors" which do unusual things to the SHA-1 internals, and we want to detect those and handle that case specially since there is almost no chance of that happening by accident. It has a list of such vectors found during his research.

This paper doesn't talk about "disturbance vectors" but it does discuss ideas like "Boomerangs", which I think end up being similar - I just don't understand the mathematics enough to know whether that means "the same" or not.

jlokier · on Jan 7, 2020

I was wondering the same thing, and hoping someone else would answer that.

joeyh · on Jan 7, 2020

Hardened sha1 does detect this new attack. Easy to test: Check their pair of files into a git repo and see that they have different checksums, while sha1sum(1) generates the same for both.

mzs · on Jan 7, 2020

Checks out, thanks.

    $ mkdir sha1
    $ cd sha1
    $ curl -O https://sha-mbles.github.io/messageA
    ...
    $ curl -O https://sha-mbles.github.io/messageB
    ...
    $ echo foo > bar
    $ echo foo > baz
    $ openssl sha1 *
    SHA1(bar)= f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
    SHA1(baz)= f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
    SHA1(messageA)= 8ac60ba76f1999a1ab70223f225aefdc78d4ddc0
    SHA1(messageB)= 8ac60ba76f1999a1ab70223f225aefdc78d4ddc0
    $ git init
    Initialized empty Git repository in ...
    $ git add *
    $ git commit
    [master (root-commit) b274c88] sha1 collision test
    ...
     4 files changed, 2 insertions(+)
     create mode 100644 bar
     create mode 100644 baz
     create mode 100644 messageA
     create mode 100644 messageB
    $ git ls-files -s *
    100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 0 bar
    100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 0 baz
    100644 5a7c30e97646c66422abe0a9793a5fcb9f1cf8d6 0 messageA
    100644 fe39178400a7ebeedca8ccfd0f3a64ceecdb9cda 0 messageB
    $

Thorrez · on Jan 8, 2020

No, you and joeyh are incorrect about the test (but correct about the result). As can be seen in the output, SHA1(bar) = f1d2d2f924e986ac86fdf7b36c94bcdf32beec15 but git_SHA1(bar) = 257cc5642cb1a054f08cc83f2d943e56fd3ebe99. Why is there a difference? Not because of hardened SHA1. Hardened SHA1 essentially always produces identical outputs to SHA1:

> git doesn't really use SHA-1 anymore, it uses Hardened-SHA-1 (they just so happen to produce the same outputs 99.99999999999...% of the time).[1]

[1] https://stackoverflow.com/questions/10434326/hash-collision-...

There's essentially no chance that the string "foo\n" fell into that tiny probability of difference. The reason for the difference is that git does not hash the raw file contents: it first prepends an object header (for a file, "blob", the size in bytes, and a NUL byte), and that extra prefix broke the carefully created collision. But a chosen-prefix attack might mean the header can be accounted for, and a collision could still be found.
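To make the difference concrete, here is a minimal Python sketch of the two computations. The header layout (`blob`, a space, the decimal size, a NUL byte) comes from git's object format rather than from this thread, and the function names are made up for illustration. Note that `hashlib.sha1` is plain SHA-1, not the hardened variant; for ordinary inputs such as "foo\n" the two are expected to agree.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA-1 of the raw bytes, comparable to `openssl sha1` or sha1sum."""
    return hashlib.sha1(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a file's contents (a "blob").

    git hashes a short header followed by the contents, so the
    result differs from the plain SHA-1 of the same bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


content = b"foo\n"
print(plain_sha1(content))   # f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
print(git_blob_id(content))  # 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

Both values match the `openssl sha1` and `git ls-files -s` lines for `bar` in the transcript above, which is why differing ids for messageA and messageB inside git say nothing about hardened SHA-1 by themselves.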

So we need to directly run hardened SHA1 on the data, which I believe is located at https://github.com/cr-marcstevens/sha1collisiondetection

As seen in https://github.com/git/git/blob/master/sha1dc_git.c

So I tested that one:

    $ sha1collisiondetection-master/bin/sha1dcsum bar baz messageA messageB shattered-1.pdf shattered-2.pdf
    f1d2d2f924e986ac86fdf7b36c94bcdf32beec15  bar
    f1d2d2f924e986ac86fdf7b36c94bcdf32beec15  baz
    4f3d9be4a472c4dae83c6314aa6c36a064c1fd14 *coll* messageA
    9ed5d77a4f48be1dbf3e9e15650733eb850897f2 *coll* messageB
    16e96b70000dd1e7c85b8368ee197754400e58ec *coll* shattered-1.pdf
    e1761773e6a35916d99f891b77663e6405313587 *coll* shattered-2.pdf

So it does protect against the new attack.

mzs · on Jan 8, 2020

I really appreciate this, thanks!

FactolSarin · on Jan 7, 2020

How would an attack on a git repo work? You create a repo with identical hashes but different content and next time the user clones from scratch they get your modified version?

flatiron · on Jan 7, 2020

yeah my thoughts about git are similar. look at the two messages they have as an example:

Key is part of a collision! It's a trap!yE'NsbK#ދW]{1gKmCx's/vr| -pJO_,1$)uB1qXv#U)9ESU;p~0G:Y ݕbBIjFra눰3&t'lB_!h5M([,˴QMK#|o5pv|i,+yYpݍD7_Rf\'GUZ,ϵdvAYAugV=Lk8_E 2 +nolBtxXoQt&+?Y3LP:'Qt(,ۛuԪWJm:A"M6<|B4kVv̨ޠA=M+m%殺j5N|EMA\Ed- s&@u@:a?pq^Xf0U?R}

and

Practical SHA-1 chosen-prefix collision!'lka}vbI3,·W]Ǟ+gK}Cxs/v&r| }-hRJO_ rO̳;bzC ,1&uRP-MXrU3aO;pr0:sY'2 l&r7#(A{oNyCJ_W,8 əbحBYީpFr2a8#&t+n_15q(_,ˤQMW#hzYMgVV=L,kO0E*N +oc@BpXoᯖd&?+?[{3LвP&'U t ( WJÏm\:A"6>>|SB(k;Vv̨ޠ^A=Y ;om%j-|cUAAۜEТ&@o@:La3psH^eXf0QJm ݶd

they have the same sha1sum, but in all practicality it's nonsense since both messages are pure trash. you couldn't have malicious C code that would have the same hash as non malicious C code in this example

saalweachter · on Jan 7, 2020

Isn't that like incredibly simple?

Dump your garbage string behind a // or inside an #if 0, restrict the garbage string character set to characters which will not disturb that, and your compiler will whistle while it works.

pathseeker · on Jan 7, 2020

Depends on if the chosen prefix attack allows the content to appear arbitrarily in the middle of the byte stream like that.

Thorrez · on Jan 8, 2020

That's exactly what a chosen prefix attack means. You choose the arbitrary prefixes. Then the garbage is inserted. Then (due to SHA1's Merkle–Damgård construction) you append a postfix that's mostly arbitrary (but the same in both files).
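The "same suffix keeps the collision" step can be sketched with a toy hash. Everything below is an assumption made for illustration: the 16-bit `toy_hash`, its block size, and the brute-force search are hypothetical stand-ins, not SHA-1 and not the attack from the paper. The only point is that once two different prefixes have been steered into the same internal state, any identical suffix leaves the final hashes equal.

```python
import hashlib

BLOCK = 4          # toy block size in bytes
IV = b"\x00\x00"   # toy 16-bit initial state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-256 truncated to 16 bits."""
    return hashlib.sha256(state + block).digest()[:2]


def absorb(state: bytes, data: bytes) -> bytes:
    """Feed whole blocks through the compression function."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_hash(msg: bytes) -> str:
    """Merkle-Damgard style hash: zero-pad, then append a length block."""
    padded = msg + b"\x00" * (-len(msg) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    return absorb(IV, padded).hex()


def find_collision_blocks(prefix_a: bytes, prefix_b: bytes):
    """Brute-force one "garbage" block per side so both states match."""
    state_a, state_b = absorb(IV, prefix_a), absorb(IV, prefix_b)
    seen = {}
    for i in range(4096):
        block = i.to_bytes(BLOCK, "big")
        seen[compress(state_a, block)] = block
    for j in range(2**32):
        block = j.to_bytes(BLOCK, "big")
        hit = seen.get(compress(state_b, block))
        if hit is not None:
            return hit, block
    raise RuntimeError("no collision found")


# Two different, freely chosen prefixes (each a multiple of BLOCK bytes).
prefix_a = b"int safe = 1;\n//"
prefix_b = b"int safe = 0;\n//"
block_a, block_b = find_collision_blocks(prefix_a, prefix_b)

# The same suffix is appended to both files.
suffix = b"\nreturn safe;\n"
file_a = prefix_a + block_a + suffix
file_b = prefix_b + block_b + suffix

print(file_a != file_b)                      # the files differ
print(toy_hash(file_a) == toy_hash(file_b))  # the hashes still match
```

The suffix only works because both files have the same length up to that point, so the length padding at the end is identical too. With a 16-bit state the search finishes instantly; for SHA-1 the equivalent step is the expensive part of the attack.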

flatiron · on Jan 7, 2020

anyone checking diffs would notice that, or working on the file, etc. it wouldn't survive long

munk-a · on Jan 7, 2020

I think active projects would detect this fine - but what if that commit was pushed to lpad and everyone ended up pulling it to local because it's a dependency of a dependency of a dependency in NPM?

Or what if it's a really obscure library for parsing like... pyramidal jpeg2000s, are the library consumers going to be checking the source? Heck, most people already don't check download checksums unless their downloader does it automatically.

saalweachter · on Jan 7, 2020

Hmmm, does the garbage string actually have to survive long?

If there's a followup CL to "delete a garbage string that accidentally made it into the repo", which doesn't actually fix whatever else was added, would that get you anywhere?

munk-a · on Jan 7, 2020

If you could push up a commit that computed to the same hash as the last tagged release in a repo... I'm not certain, the tag might end up referencing the new object? Certain versions of git (e.g. maybe git for windows) may also react in different manners.

In theory you might get people building software packages for distros to build your malicious version, you may also just temporarily shut down the ability for anyone to check out the version (basically denial of service for making?) but the time window would be weird.

thenewnewguy · on Jan 7, 2020

You'd probably be most successful modifying the original repo - either by being the creator of the software or gaining their trust. However, it would have to be a rather powerful SHA1 attack for the commit to still be valid syntax, hard to detect, and make a meaningful malicious change.