
Linus on Git and SHA-1 - dankohn1
https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL
======
joatmon-snoo
The actual mailing list discussion thread can be found here, and is infinitely
more informative than any of the bull being spouted in this thread:
[http://public-inbox.org/git/20170226004657.zowlojdzqrrcalsm@...](http://public-inbox.org/git/20170226004657.zowlojdzqrrcalsm@sigill.intra.peff.net/T/#t)

~~~
azernik
My takeaway from all that - the git project is internally taking steps to
mitigate vulnerabilities from this particular attack (making it harder to
insert the arbitrary binary data necessary into git metadata), but a) is just
throwing up their hands at the problem of projects that store binary blobs
like image data in their repos, and b) is not taking this as a signal that
more serious sha-1 attacks are on the horizon and they should speed up their
hash-replacement efforts.

This latter leads into the problems with Linus' positions in particular. In
that thread, he does not take seriously the threats that this poses to the
broader git userbase, because he only seems to care about the kernel use-case:
trusted hosting infrastructure at kernel.org (itself an iffy assumption, given
previous hacks and the use of mirrors), and the exclusive storage of human-
readable text in the repo which makes binary garbage harder to sneak in. These
do not apply to most users of git. His rather extreme public position
(paraphrased, "our security doesn't depend on SHA1") is even more troubling -
it absolutely _does_ depend on SHA1, this just isn't (yet) a strong enough
attack to absolutely screw over the kernel. A stronger collision attack (eg a
chosen-prefix as opposed to identical-prefix, or god forbid a pre-image attack)
would _absolutely_ invalidate the whole git security model.

~~~
Chyzwar
SHA1 is only used for uniqueness. You still need write access to the repo to
perform the attack. If you already have write access, you do not need to
create a collision...

~~~
azernik
For this particular attack - what if you do not have write access, but have
sufficient social capital to get a pull request merged with a benign-seeming
file?

~~~
joatmon-snoo
If we're just talking about this particular attack, the two files don't even
resolve to the same SHA-1 in Git:

    $ sha1sum shattered*
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

    $ git hash-object shattered*
    ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0
    b621eeccd5c7edac9b7dcba35a8d5afd075e24f2
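
The two commands disagree because Git does not hash the raw file bytes. It hashes a blob as `blob <size>\0<contents>`, so the header changes the input. A minimal Python sketch of that (the function names are only for illustration):

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 object ID that `git hash-object` reports for a blob."""
    # Git prepends "blob <decimal length>\0" to the contents before hashing.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def plain_sha1(data: bytes) -> str:
    """Return the SHA-1 of the raw bytes, which is what `sha1sum` prints."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    sample = b"hello world\n"
    print("sha1sum:        ", plain_sha1(sample))
    print("git hash-object:", git_blob_hash(sample))
```

Two files that collide under plain SHA-1 therefore only collide in Git if the collision was computed with that header already in place.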

~~~
acqq
As soon as somebody invests a few $100k there are going to be such files. Now
it's known how much it takes. That much.

Luckily, there is also a known way to detect that kind of file.

------
bascule
Linus's transition plan seems to involve truncating SHA-256 to 160-bits. This
is bad for several reasons:

- Truncating to 160-bits still has a birthday bound at 80-bits. That would
still require a lot more brute force than the 2^63 computations involved to
find this collision, but it is much weaker than is generally considered secure

- Post-quantum, this means there will only be 80-bits of preimage resistance

(Also: if he's going to truncate a hash, he should use SHA-512, which will be
faster on 64-bit platforms)
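
For reference, here is the arithmetic behind those two bullets as a rough Python sketch. It uses only the generic bounds (birthday bound of n/2 bits for collisions, Grover's square-root speedup for preimages) and the 2^63 figure quoted above. The helper names are made up for illustration, and nothing here reflects what Git actually implements.

```python
import hashlib


def truncated_sha256(data: bytes, bits: int = 160) -> bytes:
    """SHA-256 digest cut down to its first `bits` bits (multiple of 8)."""
    if bits % 8 != 0 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    return hashlib.sha256(data).digest()[: bits // 8]


def collision_bits(output_bits: int) -> int:
    """Generic birthday bound: roughly 2^(n/2) work to find a collision."""
    return output_bits // 2


def quantum_preimage_bits(output_bits: int) -> int:
    """Generic Grover bound: roughly 2^(n/2) quantum work for a preimage."""
    return output_bits // 2


if __name__ == "__main__":
    digest = truncated_sha256(b"example")
    print(len(digest) * 8, "bit digest")
    print("collision work: 2^%d" % collision_bits(160))
    print("post-quantum preimage work: 2^%d" % quantum_preimage_bits(160))
    # Gap between the birthday bound and the 2^63 cost of the SHA-1 collision.
    print("extra factor over 2^63: 2^%d" % (collision_bits(160) - 63))
```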

Do either of these weak security levels impact Git?

Preimage resistance does matter if we're worried about attackers reversing
commit hashes back into their contents. Linus doesn't seem to care about this
one, but I think he should.

Collision resistance absolutely matters for the commit signing case, and once
again Linus is downplaying this. He starts off talking about how they're not
doing that, then halfway through adds an "oh wait but some people do that",
then tries to downplay it again by talking about how an attacker would need
to influence the original commit.

Of course, this happens all the time: it's called a pull request. Linus
insists that prior proper source code review will prevent an attacker who
sends you a malicious pull request from being able to pull off a chosen prefix
collision. I have doubts about that, especially in any repos containing binary
blobs (and especially if those binary blobs are executables)

Linus just doesn't take this stuff seriously. I really wish he would, though.

~~~
tytso
That's _not_ the plan. That was an idea that was thrown out there in case this
was an emergency (the hard part is handling different-length hashes, and doing
so without forcing a flag day conversion), but once people realized that, in
fact, the sky was not falling, the plan which Linus outlined in his G+ post
was devised, which does not involve truncating a 256-bit hash.

~~~
bascule
Can you please link me to "the plan" then? I have been trying to follow some
of the ML discussion and that was the last plan I saw him put forth, e.g.:

[https://marc.info/?l=git&m=148787047422954](https://marc.info/?l=git&m=148787047422954)

~~~
pedrocr
By clicking "next in thread" in that very post you get Linus replying to
himself with a non-truncating plan:

[https://marc.info/?l=git&m=148787163023435&w=2](https://marc.info/?l=git&m=148787163023435&w=2)

~~~
bascule
Thanks for the link.

Re: snark, I did hit "next in thread" but managed to skim over that in his
response.

But again, thanks anyway.

(Note: this still sounds more like spitballing than "the plan", but at least
it's a step in the right direction)

------
runeks
One thing SHA-256 has going for it is that millions can be made from finding
pre-image weaknesses in it, because it's used in Bitcoin mining. If you could
"figure out" SHA-256, and use it to take over Bitcoin mining, you'd make $2M
the first 24 hours, at current rates. And if you play it wise, it could take a
long time before anyone figures out what's going on.

With regards to market price for a successful attack, I don't think any hash
function stands close to SHA-256. And for that reason I think it would be the
right choice.

~~~
wolf550e
[https://twitter.com/veorq/status/834872988445065218](https://twitter.com/veorq/status/834872988445065218)

~~~
runeks
I think the claim is interesting, and I certainly wouldn't reject it. But just
relaying someone else's claim, without any substantiating argument, I find
quite uninteresting.

~~~
wolf550e
Jean-Philippe Aumasson[1] is an important, currently active cryptographer who
studied this particular problem closely. If he says he does not expect an
algorithmic break of SHA2, I believe him that I don't need to plan to switch
from SHA2 to SHA3 in a hurry.

I don't have the math to understand a more detailed explanation for his
reasons to make such a statement. If you do have the math, ask him.

1 - [https://131002.net/index.html](https://131002.net/index.html)

------
maxander
I don't really get the threat model here. If an attacker is pushing commits
into your repository, you're long since toast on all possible security fronts,
right? Is there anything nefarious they could accomplish through hash
collisions that couldn't be done simply by editing commit history?

~~~
leoh
Not really. From Linus — I think the most important point that has not been
discussed extensively:

> But if you use git for source control like in the kernel, the stuff you
> really care about is source code, which is very much a transparent medium.
> If somebody inserts random odd generated crud in the middle of your source
> code, you will absolutely notice.

~~~
stinos
_If somebody inserts random odd generated crud in the middle of your source
code, you will absolutely notice_

Unless I'm misunderstanding how this would work in practice (I assume commits
being added under my name or existing commits being modified?), I'm absolutely
not buying that though. Way too general, no? I would likely notice something
like that, but I know enough colleagues who are just as likely to not notice.
Mostly because they don't fully grasp git so they're just like 'hmm, thing
says I'm behind, ha I know the fix is pull, yes I'm a git wizard' and just
pull then continue working without even checking what got changed.

~~~
ProblemFactory
Sure, most people don't review all the code that they pull.

But if you are in a position to insert "garbage data" into a commit in
preparation for an attack, it will be much easier to insert malicious but
safe-looking code in the first place. Omit an array bounds check, disable SSL
certificate validation, or anything else that looks like a mistake but will
allow you to compromise the running code later.

------
hannob
One thing that I think is worth mentioning: This was completely avoidable. Git
isn't that old, it wasn't taken by surprise by the SHA1 attacks.

The first paper from Wang et al, which should've put SHA1 to rest, was
published in 2004, the year before the first ever Git version was released. It
could have been easy: Just take a secure hash from the beginning.

------
ploxiln
If anyone is really interested in more assurance of git commit contents,
there's "git-evtag", which does a sha-512 hash over the full contents of a
commit, including all trees and blob contents.

[https://github.com/cgwalters/git-evtag](https://github.com/cgwalters/git-evtag)

------
simias
While this post sounds very reasonable to me there's one point that I really
don't get: why does he keep saying that git commit hashes have nothing to do
with security?

If he believes that, why does git allow signing tags and commits and why does
Linus himself sign kernel release tags? Isn't that the very definition of
"using a hash for security"?

------
hackuser
Related, from Mozilla:

* The end of SHA-1 on the Public Web

[https://blog.mozilla.org/security/2017/02/23/the-end-of-sha-...](https://blog.mozilla.org/security/2017/02/23/the-end-of-sha-1-on-the-public-web/)

 _As announced last fall, we’ve been disabling SHA-1 for increasing numbers of
Firefox users since the release of Firefox 51 using a gradual phase-in
technique. Tomorrow [Feb 24th], this deprecation policy will reach all Firefox
users. It is enabled by default in Firefox 52._

~~~
azernik
To be fair (although I've been commenting angrily about git's continued use of
SHA-1 elsewhere) it's a lot easier for a browser to change hash algorithms
than for git.

------
luckydude
Linus is a little behind the times with this comment:

``Other SCM's have used things like CRC's for error detection, although
honestly the most common error handling method in most SCM's tends to be
"tough luck, maybe your data is there, maybe it isn't, I don't care".''

BitKeeper has an error detection (CRC per block) and error correction (XOR
block at the end) system. Any single block loss is correctable. Block sizes
vary with file size so large files have to lose a large amount of data to be
non-correctable.
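
To make the idea concrete, here is a toy Python sketch of "CRC per block to detect damage, one XOR block to repair it". This is not BitKeeper's actual on-disk format, which I am not reproducing here. The function names and the fixed equal-length blocks are assumptions for illustration.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(blocks):
    """Return (list of per-block CRC32 values, XOR parity block)."""
    return [zlib.crc32(b) for b in blocks], xor_blocks(blocks)


def repair(blocks, crcs, parity):
    """Find damaged blocks via CRC and rebuild a single one from parity."""
    bad = [i for i, b in enumerate(blocks)
           if b is None or zlib.crc32(b) != crcs[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("more than one damaged block; XOR parity cannot recover")
    # XOR of every surviving block with the parity block yields the lost one.
    good = [b for i, b in enumerate(blocks) if i != bad[0]]
    fixed = list(blocks)
    fixed[bad[0]] = xor_blocks(good + [parity])
    return fixed


if __name__ == "__main__":
    original = [b"abcd", b"efgh", b"ijkl"]
    crcs, parity = encode(original)
    damaged = [b"abcd", b"XXXX", b"ijkl"]
    print(repair(damaged, crcs, parity) == original)
```

The CRC only says which block is bad; the XOR block is what lets you rebuild it, and only when a single block is lost.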

------
butwhynotmore
In the specific case of cryptography where it's unknown how bulletproof the
algorithm will be why not use multiple hash functions? Perhaps using the top
10 best hash functions of the day. That way you're not putting all your eggs
in one basket and if nefarious collisions are able to be created in the future
you still have the other hash functions to both "trust" and double check
against. It's even more unlikely that nefarious collisions will be able to be
constructed that collide all the other hash functions as well. You could just
append the hashes to each other or put them in a hash table or something.
Maybe my computer science is not up to snuff but it seems like this would
provide more resiliency against future and non-public mathematical
breakthroughs as well as increased computing power such as quantum computing.
Yes, it would take a little longer to compute all the hashes in day to day
use, but with the benefit of a more robust system both now and in the future.
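
As a sketch of what "append the hashes to each other" could look like in Python: the choice of three functions and the `:` separator are arbitrary assumptions for illustration, not a recommendation, and this says nothing about how much security the combination really adds.

```python
import hashlib

# Illustrative selection only; any set of independent hash functions would do.
HASHES = {
    "sha256": hashlib.sha256,
    "sha3_256": hashlib.sha3_256,
    "blake2b": hashlib.blake2b,
}


def multi_digest(data: bytes) -> str:
    """Concatenate the hex digests of several hash functions over `data`."""
    return ":".join(make(data).hexdigest() for make in HASHES.values())


def verify(data: bytes, expected: str) -> bool:
    """True only if every component hash matches the recorded value."""
    return multi_digest(data) == expected


if __name__ == "__main__":
    recorded = multi_digest(b"some file contents")
    print(recorded)
    print(verify(b"some file contents", recorded))
    print(verify(b"tampered contents", recorded))
```

An attacker would have to collide all components at once for `verify` to accept a substituted file, at the cost of a longer identifier and more hashing work.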

------
theseoafs
Have there been writings on what exactly git's migration strategy to a new
hash function will be? Apparently they have a seamless transition designed
that won't require anyone to update their repositories, which seems like a
pretty crazy promise in the absence of details.

~~~
keeperofdakeys
In git the SHA-1 hash is simply an identifier for an object - it's used in the
filename, but not stored in the object. And when a commit or tree object
references others, it's just a name that can be looked up in the database. So
a commit object hashed with SHA-256 can easily reference a previous commit
that was hashed with SHA-1.

During the switch, a bit of deduplication may be lost. But the only
interesting issue I can see is how git fsck will tell which hash an object was
created with when verifying the hash (maybe with length?).
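
A toy Python model of that idea: objects are stored under the hash of their contents, a newer object can mention an older name as plain text, and the checker guesses the algorithm from the name length. This is a sketch of the comment above, not Git's real object format or its actual transition design. The class and its methods are hypothetical.

```python
import hashlib

# Hex digest length -> constructor. Assumes the lengths never clash.
BY_HEX_LENGTH = {40: hashlib.sha1, 64: hashlib.sha256}


class ObjectStore:
    """Minimal content-addressed store keyed by hex digest."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, algo=hashlib.sha256) -> str:
        """Store `data` under the hex digest produced by `algo`."""
        name = algo(data).hexdigest()
        self._objects[name] = data
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def fsck(self) -> list:
        """Return the names whose contents no longer hash to that name."""
        bad = []
        for name, data in self._objects.items():
            algo = BY_HEX_LENGTH.get(len(name))
            if algo is None or algo(data).hexdigest() != name:
                bad.append(name)
        return bad


if __name__ == "__main__":
    store = ObjectStore()
    old = store.put(b"first commit", hashlib.sha1)
    # The SHA-256 object refers to the SHA-1 object purely by name.
    new = store.put(b"parent " + old.encode("ascii") + b"\nsecond commit")
    print(len(old), len(new), store.fsck())
```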

~~~
iamgopal
Maybe some kind of "git update repo" command?

------
claar
Also see discussion of Linus's earlier comments at
[https://news.ycombinator.com/item?id=13719368](https://news.ycombinator.com/item?id=13719368)

------
godzilla82
Newbie question: can someone please help me understand the attack scenario?
If I, as the attacker, want to inject malicious code/binary into a git repo,
then I need to write my malicious code/binary in such a way that the resultant
hash collides with one of the commits (or the last one?) in the repo. Is
this correct?

------
jmount
Probably isn't the sky falling. But if knowing the length fixed all hash
function issues, then cryptographic hashes would just use some more bits for
length.

------
frik
Can someone correct me if I'm wrong? SVN/Subversion and Git are both affected
by the SHA-1 problem. SVN uses SHA-1 internally, but exposes only a numeric
int as the revision. Git uses SHA-1 internally and as the revision. So if
someone commits a modified PDF that collides, they can wreak havoc on both SVN
and Git at the moment. It seems easier to fix the issue in SVN than in Git.

~~~
hannob
It's a somewhat different issue.

Git can probably not be havoced by committing two colliding files (and doing
so would require doing another chosen prefix attack with a git blob header).
But git loses cryptographic integrity promises due to this attack (aka: you
can have different source trees with different histories leading to the same
top commit hash). svn never had any cryptographic integrity to begin with.

------
yuhong
I do wonder how many outside of crypto circles know about SHA-2 circa 2004.

------
colin_fraizer
This, btw, is why we have e-cigarette bans. The fact that the generally high-
IQ, paid-to-think-about-subtle-categorization community of software developers
needs to be inoculated against the "I Heard SHA-1 Was Bad Now" meme, should
serve as a reminder for why most things should not be managed by democracy.

(Yeah, I know this will be read as a plea for monarchy and downvoted. It
simply proves my point: people are WAY too subject to errors in the classes
(1) "I hate him because he said something 'bad' about something 'good'." and
(2) "I hate him because he said something 'good' about something The Tribe now
knows is 'bad.'")

~~~
grzm
Save yourself some downvotes and remove the mention that you expect them.

~~~
colin_fraizer
Save myself from people proving my point? Why?

~~~
grzm
The HN guidelines specifically ask not to express the expectation of
downvotes. You may be downvoted purely for ignoring the guidelines, regardless
of the rest of your comment.

 _Please don't bait other users by inviting them to downvote you or proclaim
that you expect to get downvoted._

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
colin_fraizer
A generally reasonable guideline, but in this case, I am actually criticizing
the tribalism that makes people rise to that bait.

It similarly leads to the discussion of how "I can't believe Linus is trying
to defend SHA-1 when The Tribe already knows it is cryptographically 'bad'."

~~~
grzm
If you are taking the view that you're expecting downvotes to prove the point
that people who are trying to uphold community standards are doing so blindly
or ignorantly, you'll very likely think you're proven correct when you do
receive downvotes. Can you blame them? You're explicitly flaunting the
guidelines they choose to abide by while telling them they're wrong to do so
_in your special case_.

~~~
recursive
ahem... "flouting"

~~~
grzm
You're right, of course. Thanks!

------
dboreham
Um what? Software written in the past 20 years has a baked-in assumption that
the length of some ID can't change?

------
debatem1
I'm mystified as to why this is even a discussion.

SHA1 is busted. That impacts some git users. The fix is not invasive. Fix the
bug. Make the transition. Move on.

Super unprofessional.

~~~
AsyncAwait
It's not that simple. Git is widely used software integrated into many
places, so keeping some backwards compatibility is important.

Just going ahead and breaking things would _really_ be unprofessional.

~~~
snakeanus
They did break backwards compatibility with git v2.0 but sadly they did not
bother to change the hash function.

~~~
joatmon-snoo
What backwards compatibility was broken?

~~~
Kubuxu
Little bits in the CLI interface:
[https://blogs.atlassian.com/2014/06/happened-git-2-0-full-go...](https://blogs.atlassian.com/2014/06/happened-git-2-0-full-goodies/)

