
Linus' reply on Git and SHA-1 collision - sampo
http://marc.info/?l=git&m=148787047422954
======
notfed
Pertinent facts for the worried:

1) Git doesn't rely on SHA-1 for security. It relies on HTTPS, and a web of
trust.

2) Even if git did rely on SHA-1, there's no imminent threat. What happened
today was a SHA-1 collision, not a preimage attack. Generically, if a collision
costs 2^n work, a preimage attack costs around 2^(2n); for an ideal 160-bit
hash, that's roughly 2^80 versus 2^160.

3) Even if someone managed to pull off a preimage attack, creating a
"poisonous" version of one of your git repository's objects, they'd still have
to convince you to pull from their repo. That requires trust.

4) Even if you pulled it in, your git client would simply ignore their
"poison" object, because it would say, "oh, no thanks, I already have that
object". At worst, the code simply wouldn't work. No harm would be done.

When it comes to git, an attacker's time is better spent creating a secret
buffer overflow than wasting millions of dollars on a SHA-1 collision.

~~~
Perseids
I don't like the living-on-the-edge attitude that Linus and others here
promote regarding SHA-1 in git. First, attacks only get faster over time. What
costs millions today is likely to be achievable on commodity hardware in the
coming years. Second, attacks only get more flexible over time. A contrived
collision on MD5 in 2004 was perfected to a single-block collision in 2010
[1]. Third, devising an update strategy and rolling it out takes time. I can't
guess how much hardwired use of SHA-1 there is in GitHub.

Fourth, people use git in creative ways. Linus may think it is a cardinal sin
to commit binary blobs to a git repository, but I can't imagine I'm the only
one using git as a poor man's backup and file-sharing solution.

And last but not least, relying on SHA-1 takes the constant effort of
asserting that its use in git is still secure. Support requests to that end
will skyrocket from now on, both of the constructive kind, like the
technically concerned coworker ("but isn't git insecure now that SHA-1 is
broken?"), and of the regulatory kind ("if you use SHA-1, MD5, … please fill
out these extra forms explaining why your process is still eligible for
certification with ISO norm foobar").

Since we have to migrate away from SHA-1 at some point in the future, I'd like
it to be sooner rather than later.

[1] See Wikipedia for a timeline and references:
[https://en.wikipedia.org/wiki/MD5#History_and_cryptanalysis](https://en.wikipedia.org/wiki/MD5#History_and_cryptanalysis)

~~~
maligree
"On the edge"

"A contrived collision on MD5 in 2004 got perfected to a single block
collision in 2010 [1]" so they'd have at least years to fix it were they using
md5?

He says they'll migrate, but it's no reason to go crazy. If anything, calmness
of this sort is what we need more of (this industry, anyway... we go crazy
about stuff way too much).

~~~
Florin_Andrei
As someone who has done "security" full time before - there's nothing worse
than the "Security By Jumping Up And Down Like An Excited Monkey" policy.

Fix what's broken, no doubt. But stay rational and look at the problem from
all perspectives.

------
paulddraper
Linus has toned down a lot from a decade ago.

> You are _literally_ arguing for the equivalent of "what if a meteorite hit
> my plane while it was in flight - maybe I should add three inches of high-
> tension armored steel around the plane, so that my passengers would be
> protected".

> That's not engineering. That's five-year-olds discussing building their
> imaginary forts ("I want gun-turrets and a mechanical horse one mile high,
> and my command center is 5 miles under-ground and totally encased in 5
> meters of lead").

> If we want to have any kind of confidence that the hash is really
> unbreakable, we should make it not just longer than 160 bits, we should
> make sure that it's two or more hashes, and that they are based on totally
> different principles.

> And we should all digitally sign every single object too, and we should use
> 4096-bit PGP keys and unguessable passphrases that are at least 20 words in
> length. And we should then build a bunker 5 miles underground, encased in
> lead, so that somebody cannot flip a few bits with a ray-gun, and make us
> believe that the sha1's match when they don't. Oh, and we need to all wear
> aluminum propeller beanies to make sure that they don't use that ray-gun to
> make us do the modification _ourselves_.

> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that
> you can generate an object that looks halfway sane and still gets you the
> sha1 you want. Even the "breakage" doesn't actually do that. And if it ever
> _does_ become true, it will quite possibly be thanks to some technology that
> breaks other hashes too.

> I worry about accidental hashes, and in 160 bits of good hashing, that just
> isn't an issue.

[http://www.gelato.unsw.edu.au/archives/git/0504/0885.html](http://www.gelato.unsw.edu.au/archives/git/0504/0885.html)

~~~
zkms
> "what if a meteorite hit my plane while it was in flight - maybe I should
> add three inches of high-tension armored steel around the plane, so that my
> passengers would be protected"

I think this is a shockingly good example of how smart people get security
questions utterly wrong. The right analogy when it comes to security has to
involve some type of adversary, not just random, unmotivated natural phenomena
-- as long as we're using aircraft analogies, it's not so much "a meteorite
might randomly hit my plane in flight" as "there's angry and armed people
shooting at my plane with armour-piercing ammunition".

Indeed, in the real world, putting hundreds of kilograms of armour on aircraft
isn't the absurd/childish idea Linus seems to think it is:

[https://en.wikipedia.org/wiki/Fairchild_Republic_A-10_Thunde...](https://en.wikipedia.org/wiki/Fairchild_Republic_A-10_Thunderbolt_II#Durability)

~~~
DiabloD3
The A-10 isn't a plane, it's a gun with a plane wrapped around it.

And it hunts tanks.

~~~
hermitdev
I'm sad that the AF wants to retire this plane. It was designed with one
purpose: build a plane around this gun, that gun being a 30mm autocannon
firing depleted uranium shells (not radioactive) at such a rate that it
retards the velocity of the plane carrying it, and one that would melt if
fired continuously from full to empty.

The plane was built for survivability. It can withstand an engine being shot
off (which is why the engines are external, on "pods"). The cockpit is (I
think) surrounded by a 2-inch titanium tub.

I'm not sure the JSF, i.e. the F-35, will be capable of taking over this role
as intended.

~~~
MereInterest
As a side note, depleted uranium is certainly still radioactive. "Depleted"
refers to the percentage of U-235, the isotope used for making nuclear
weapons. Only about 0.7% of naturally occurring uranium is U-235, with the
rest being mainly U-238. U-238 emits alpha particles, with a half life of
about 4 billion years.

The radioactivity is unrelated to its use as bullets, which relies on its high
density, but it is radioactive.

------
chx
Extremely relevant discussion on Stack Overflow from 2012 on how git would
handle a SHA-1 collision: someone called Ruben changed the hash function to
just 4 bits padded with zeroes and checked what git actually does on
collisions:
[http://stackoverflow.com/a/34599081/308851](http://stackoverflow.com/a/34599081/308851)

------
Strom
Downloading the PDFs [1] and comparing their sizes takes less than a minute.
They're the exact same size. Yet here we have Linus making one bet after
another that the size would have to be different for this attack to work.

Now, to be fair, he also keeps repeating that he _hasn't seen the attack yet_.
Which leads me to question why this post is interesting to HN. Is it to show
how Linus aimlessly speculates and gets his guesses wrong?

--

[1] [http://shattered.it/](http://shattered.it/)

~~~
more_original
The PDFs have the same size, but they do not have a header _in the file_ that
states their overall size. If PDF had a header at the beginning of the file
that states the file size, then it could be harder to find a collision. From
what I understand, the attack works by inserting garbage data after a fixed
file prefix and before a fixed file suffix (anyone please correct me if I'm
wrong).

~~~
acqq
> If PDF had a header at the beginning of the file that states the file size,
> then it could be harder to find a collision.

No. It doesn't change anything if the size is in the PDF header. The sizes of
both PDFs are the same, and the headers of both "shattered" files are already
identical.

What Linus says is that if you tried to put these two PDF files into git, it
would not see them as the same, because git calculates the SHA-1 differently.
But Google would be able to produce two PDF files that would, as git sees
them, appear to be the same, just as easily as the ones that were produced.

P.S. (in answer to your reply to this message) Note, you wrote one level above:

> If PDF had a header at the beginning of the file that states the file size,
> then it could be harder to find a collision.

And I argued that it isn't harder, but irrelevant.

From your answer:

> But to generate a collision with a different prefix q one would have to do
> the expensive computation all over again

Yes. Now read your claim again. It's not harder. Exactly as easy as the first
time.

~~~
more_original
> But Google would be able to produce two PDF files that would, as git sees
> them, appear to be same just as easy as these that were produced.

Right, but they would have to re-do their enormous calculation. ("This attack
required over 9,223,372,036,854,775,808 SHA1 computations.")

Google started with a common prefix p (the PDF header), then computed blocks
M11, M12, M21 and M22, such that (p || M11 || M21 || S) and (p || M12 || M22
|| S) collide for any suffix S. Given p, M11, M12, M21 and M22, anyone can
make colliding PDFs that show different contents quickly. But to generate a
collision with a different prefix q, e.g. one including the file size, one
would have to do the expensive computation all over again, I think.

Note: I'm not trying to argue that SHA-1 can be made secure with padding. I
was just trying to say that the statement "The PDFs have the same size" misses
the point.
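
One way to see the fixed-suffix property directly: the colliding block pairs
leave SHA-1's internal state identical, so appending the same data to both
published files preserves the collision:

    wget https://shattered.it/static/shattered-1.pdf
    wget https://shattered.it/static/shattered-2.pdf
    sha1sum shattered-*.pdf            # identical digests
    echo "any identical suffix" >> shattered-1.pdf
    echo "any identical suffix" >> shattered-2.pdf
    sha1sum shattered-*.pdf            # still identical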

------
cpeterso
Here is Mercurial's response to the SHA-1 attacks: "SHA1 and Mercurial
security: Why you shouldn't panic yet."

[https://www.mercurial-scm.org/wiki/mpm/SHA1](https://www.mercurial-scm.org/wiki/mpm/SHA1)

------
stickfigure
Several years ago I worked on a security product that used git as a sort of
tripwire-type database. Since SHA1 was considered inadequate for Real
Security, we had to hack jgit to use SHA256. It took a stupid amount of work -
the 160-bit hash size was scattered all over the codebase in countless magic
numbers. But it worked.

The product was cancelled. I always wondered if the patch would be of any use
to anyone.

~~~
kqr
160 bits is still quite a lot. You could have done what Linus suggests: use a
better hash but truncate it. If it's a good hash, a truncation of it should
still be good (modulo the smaller number of bits, of course).
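
For instance, a 160-bit name could come from plain truncation of SHA-256 (a
command-line sketch; git would of course do this internally):

    # first 20 bytes (160 bits) of a SHA-256 digest, printed as hex
    openssl dgst -sha256 -binary somefile | head -c 20 | xxd -p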

~~~
hvidgaard
That is jury-rigging it. If a stronger hash function is known to be secure,
you throw all that confidence out the window when you truncate it. At best you
reduce the brute-force complexity; at worst you enable preimage attacks.

~~~
Nutomic
This is wrong: there is no known speedup for preimage attacks against
truncated SHA-2.

[http://crypto.stackexchange.com/questions/9435/is-truncating...](http://crypto.stackexchange.com/questions/9435/is-truncating-a-sha512-hash-to-the-first-160-bits-as-secure-as-using-sha1)

~~~
hvidgaard
I never said that, at all. I explicitly say:

    At best you reduce the brute force complexity, at worst you enable pre-image attacks.

One thing I hate about crypto talk is statements like this:

    So, truncating one of the SHA-2 functions to 160 bits is around 2^20 times stronger when it comes to collision resistance.

Which is all too broad. What if SHA-1 is down to 2^10; is truncated SHA-2 then
at 2^30? Does that mean we have proved that no weakness exists in SHA-2? A
correct statement would simply be that no known attack on truncated SHA-2
exists yet.

------
mcbits
In 20 years, the $100,000 attack will be a $100 attack (or perhaps a $1
attack), but programmers of the day will be overwhelmed with fixing all the
32-bit timestamps that everyone ignored for 50 years because the clearly
forecast problem hadn't blown up in their faces quite yet.

~~~
otabdeveloper
> In 20 years, the $100,000 attack will be a $100 attack (or perhaps a $1
> attack)

No. Moore's law has been dead for years and will never come back. The gains we
saw in recent years came from people figuring out how to compile code for SIMD
processors like GPUs, not from faster or cheaper silicon.

~~~
hvidgaard
Moore's law still holds, and is expected to hold until at least 2025 (see
Wikipedia). I don't think it will be done even by then, but that is just
guesswork.

~~~
olavgg
How much faster is a CPU from today than one from 2012 when you compare them
by single-thread performance at 3 GHz?

~~~
hvidgaard
Moore's law is not about single-thread performance, and attacks like this are
easily parallelized anyway. Not to mention that pinning the clock at 3 GHz is
just trying to coerce the conclusion into being that we have seen little gain
in the last 5 years.

------
vog
So many paragraphs in the beginning, just to finally read:

 _> Do we want to migrate to another hash? Yes._

Wouldn't all that time spent explaining away the SHA-1 issues be better spent
developing a safe transition plan? Work on this could have started long ago,
and had it started, going from SHA-256 to SHA-512 to SHA-3 to ... would be a
no-brainer by now.

In the simplest case, ensure that all newly created git repositories work with
SHA-256 by default (or SHA-512, or whatever), and fall back to SHA-1 only for
old repositories.

In the more advanced case, give existing repositories the possibility of
carrying multiple hash values (SHA-1, SHA-256) for every blob/commit, then
phase out client support for old hashes as time goes on. When some SHA-1
collision happens, those who use newer git versions would notice and keep a
consistent repository.

If all those different browsers and web servers were able to coordinate an
SSL/TLS hash transition from SHA-1 to SHA-256, then a protocol like git, with
roughly two widespread implementations, should be able to do that too.

~~~
majewsky
> Work on this could have started long ago

I read through this thread yesterday and walked away with the impression that
they _have_ started working towards a hash migration (and general
crypto-agility) already, albeit not with much priority.

------
phaemon
There is some speculation on whether Linus got it right or wrong, but I
haven't seen anyone actually test this with the shattered-1 & 2 files, so I
did.

Git sees them as different despite them having the same SHA-1. You can test
with:

    mkdir shattered && cd shattered
    git init
    wget https://shattered.it/static/shattered-1.pdf
    git add shattered-1.pdf
    git commit -am "First shattered pdf"
    git status
    wget https://shattered.it/static/shattered-2.pdf
    sha1sum *
    md5sum *
    mv shattered-2.pdf shattered-1.pdf
    git status

So it doesn't see the files as the same.

Apologies for those on mobile (please fix this HN!): the commands are: mkdir
shattered && cd shattered && git init && wget
[https://shattered.it/static/shattered-1.pdf](https://shattered.it/static/shattered-1.pdf)
&& git add shattered-1.pdf && git commit -am "First shattered pdf" && git
status && wget
[https://shattered.it/static/shattered-2.pdf](https://shattered.it/static/shattered-2.pdf)
&& sha1sum * && md5sum * && mv shattered-2.pdf shattered-1.pdf && git status

EDIT: Ah, of course! Git adds a header and takes the sha1sum of
header+content, which breaks the identical-SHA-1 trick. You can add a footer
on and they keep the same SHA-1, though. I don't have time to play about with
this more just now, but try `cat`ing some identical headers and footers onto
the pdfs.
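
You can also see the header's effect without committing anything, since `git
hash-object` hashes "blob <size>\0" plus the contents:

    git hash-object shattered-1.pdf    # one object name
    git hash-object shattered-2.pdf    # a different one, despite sha1sum agreeing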

EDIT2: Actually, this is discussed more extensively in the other thread which
I hadn't read yet. Go there for more details:
[https://news.ycombinator.com/item?id=13713480](https://news.ycombinator.com/item?id=13713480)

------
dlubarov
> That usually tends to make collision attacks much harder, because you either
> have to make the resulting size the same too, or you have to be able to also
> edit the size field in the header.

> pdf's don't have that issue, they have a fixed header and you can fairly
> arbitrarily add silent data to the middle that just doesn't get shown.

This doesn't seem like much of an obstacle, since you can add silent data to
all kinds of files:

- With HTML, JS, etc. you can just add whitespace.

- Some formats like GIF89a have variable-length comments.

- With any media format that uses palettes, you can add extra, unused colors.

- Just about any compression algorithm can be tuned to manipulate the
compressed size. E.g. with DEFLATE (which is used by PNG in addition to some
archive formats), you can use a suboptimal static coding rather than the
optimal Huffman tree.

- With most human-readable document formats, you can add zero-width spaces or
something similar.

~~~
xenadu02
Yes, but the arbitrary data in the PDF doesn't have to be rendered.

It is much more difficult to change source code in a way that:

1) generates a collision

2) is still valid source code

3) causes a desired effect (like a backdoor)

4) keeps the same file size

~~~
pg314
People store all kinds of binary data in git, not just source files. E.g.
images or PDF files with specs or documentation.

------
moonshinefe
Correct me if I'm wrong, but if you're letting untrusted people push to your
git repositories, you're pretty much screwed anyway.

Given a case where someone with permission to push gets compromised and a
malicious actor can pull this SHA-1 attack off, aren't there bigger problems
at hand? The history will be there and detectable, or, if they're rewriting
history, that's usually pretty noticeable too.

I may be totally missing a situation where this could totally screw someone,
but it just seems highly unlikely to me that people will get burned by this
unless the stars align and they're totally oblivious to their repo history. So
I guess I agree with the "the sky isn't falling" assessment.

~~~
CJefferson
The problem is that I can't now download a git repository from someone I don't
trust, verify it is correct, and then publish "I, Chris Jefferson, trust git
commit abc125.... is good." Now I would have to be sure that everyone who has
ever committed to that repository wasn't trying to do something dodgy.

I have put git commits into scripts I run on automated servers, for example,
to be sure that every server runs exactly the same copy of the program.
------
robertelder
I posted this on the reddit thread, but I thought it would be interesting to
hear feedback here too:

I don't know much about git internals, so forgive me if this is a bad idea,
but what does everyone think about it working like this:

Future versions of git could be updated to support multiple hash functions,
with the old legacy default being SHA-1. In this mode of operation you could
add or remove active hashes through configuration, so that you could perform
any integrity check using more than one hash at the same time (SHA-1 and
SHA-256). If the performance got bad, you could turn off the one you didn't
care about.

This way, by the time the same problem rolls around with the next hash
function being weakened, someone will probably have already added support for
various new hash functions. Once old hash functions become outdated, you can
just remove them from your config, as you would remove insecure ciphers from
HTTPS configurations or ssh config files. Also, you could namespace commit
hashes, with sha1 being the default:

git checkout sha256:7f83b1657ff1fc53b92dc18148a1d...

git checkout sha512:861844d6704e8573fec34d967e20bcfef3...

Enabling/disabling active hash functions would probably be an expensive
operation, but you wouldn't be doing it every day, so it probably wouldn't be
a huge problem.

~~~
citrusui
Take a look at multihash[0]. I don't know the inner workings of the program,
but I imagine it would be possible for `multihash` to periodically rehash
files (as a cron job?) when a new crypto algorithm gets introduced.

[0]:
[https://github.com/multiformats/multihash](https://github.com/multiformats/multihash)
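
For the curious, a multihash just prefixes the digest with a function code and
the digest length, so different algorithms can coexist; a sketch of the
layout:

    # multihash = <fn-code><digest-length><digest>
    # sha2-256 has code 0x12 and a 0x20 (32-byte) digest, so in hex:
    printf '1220%s\n' "$(echo -n 'Here be dragons' | sha256sum | cut -d' ' -f1)"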

------
almog
Does anyone know if git objects' size header was designed to deal with
possible collisions, or does it serve another purpose as well?

Just some context: git calculates an object's name from its content in the
following way. Say we have a blob representing a file whose content is the
17-byte string 'Here be dragons!\n'; then the object name would be:

    printf 'blob 17\0Here be dragons!\n' | openssl sha1
    # => a54eff8e0fa05c40cca0ab3851be5aa8058f20ea

(Single quotes keep the shell from mangling the '!'.) So the object gets
stored in '.git/objects/a5/4eff8e0fa05c40cca0ab3851be5aa8058f20ea'.
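
As a cross-check, `git hash-object` applies the same "blob <size>\0" header
for us; `echo` adds the trailing newline, making the content exactly 17 bytes,
so this should print the same digest as the pipeline above:

    echo 'Here be dragons!' | git hash-object --stdin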

~~~
Kubuxu
Also git compresses before hashing IIRC.

~~~
andrewshadura
It doesn't.

------
hannob
The Git docs have some very specific claims about cryptographic integrity of
git:

[https://git-scm.com/about/info-assurance](https://git-scm.com/about/info-assurance)

These claims are wrong as long as it uses SHA-1. Full stop.

It'd be really nice if git had cryptographic integrity. Not just because it'd
prevent some attacks on git repos, but because it'd make git essentially a
secure append only log. Which would be interesting, as it'd more or less
automatically give some kind of software transparency for many projects.

~~~
dromen
Some of the more advanced Git operations (which I sadly haven't needed so far)
can be used to break the "append only" part, right?

Like, for instance, rebases?

~~~
hannob
You can always break append-only logs; the point is, it's detectable.

~~~
dromen
What do you mean by detectable in the context of Git-as-append-only-log?

~~~
azernik
Mirrors that pull down the version with rewritten history will not be able to
fast-forward (because they will contain commits that are not in the upstream
version), and will loudly complain.
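
Concretely, a mirror that insists on fast-forwards would fail roughly like
this when upstream history has been rewritten (exact message varies by git
version):

    git fetch origin
    git merge --ff-only origin/master
    # fatal: Not possible to fast-forward, aborting.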

------
gsylvie
I prefer the entire thread:
[http://marc.info/?t=148786884600001&r=1&w=2](http://marc.info/?t=148786884600001&r=1&w=2)

~~~
SFJulie
The ratio of information newly relevant to this problem to all new information
is low.

If a high signal-to-noise ratio is the point, then the thread is less
interesting than Linus' mail:

- if we add the size, it will make forgery harder

- yes, SHA-1 should be replaced

What Linus is missing is people rewriting history. This will not be a concern
for git, but it certainly will be for any cryptocurrency relying on SHA-1 in
the near future. (Hint: this transaction belonged to me.)

~~~
petertodd
I'm not aware of any crypto currencies that rely on SHA1.

Can you name one that does?

~~~
rurban
Bitcoin is a popular one. 2 sha-1 rounds.

~~~
Retr0spectrum
No, it uses SHA-256

------
sandov
Sorry for my ignorance, but isn't SHA-1 in git supposed to protect only
against data corruption, not against someone maliciously replacing the entire
repo?

------
CJefferson
I wish they'd thought about this in advance. So many little things (like
suffixing every hash with '1', and rejecting commits with hash values which
don't end with '1') would have made the switchover much easier to do in a
backwards-compatible way.

------
nialv7
> but git doesn't actually just hash the data, it does prepend a type/length
> field to it.

To me it feels like this would just be a small hurdle? But I don't really know
this stuff that well. Can someone with more knowledge share their thoughts?

I think Linus also argued that SHA-1 is not a security feature for git
([https://youtu.be/4XpnKHJAok8?t=57m44s](https://youtu.be/4XpnKHJAok8?t=57m44s)).
Has that been changed?

~~~
lamontcg
Yeah, what is the attack here?

If you don't have permissions to my repo, that already limits the scope of
attackers to people who already have repo access. At that point there are tons
of simpler abuse avenues open.

If someone could fork a repo, submit a pull request, and push a SHA for an
already-existing commit, and that would get merged and accepted (but not show
up in the PR on GitHub), well, that would certainly be troubling. But at that
point I'm well past my understanding of git internals as to how plausible that
kind of attack would be...

~~~
mixologic
Repo access doesn't stop people from injecting code into your repository. A
pull request actually _puts_ objects into your repo, but under a different ref
than heads and tags.

1. Go to GitHub and do a git clone of the repo with --mirror.

2. cd into the bare repo.git and do a git show-ref, and you will see all the
pull requests in that repo (see the sketch below).

If any of those pull requests contained a duplicate hash, then in theory they
would be colliding with your existing objects. But since git falls back to
whatever was there first, I think it would be _very_ challenging indeed to
subvert a repo. You'd essentially have to find somebody who's forked a repo
with the intention of submitting a PR, say, on a long-running feature branch.

You could then submit your PR based off hashes in their repo before they do,
which would probably mean your colliding object would get preference.

It's pretty far-fetched, but the vector is non-zero.
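
The inspection steps above, spelled out (someuser/somerepo is a placeholder;
GitHub advertises pull requests under refs/pull/):

    git clone --mirror https://github.com/someuser/somerepo.git
    cd somerepo.git
    git show-ref | grep refs/pull/    # one entry per PR, e.g. refs/pull/42/head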

~~~
E6300
It's easier than that. Someone could create a benign patch and a malicious
patch whose hashes collide. But what could they do with that?

For example, let's say it goes like this (each letter-number pair represents a
hash):

State 1:

Real repo: R3 -> R2 -> R1 -> R0 (note: R3 is the head)

Benign PR: B1 -> B0 -> R1

Malicious PR: M1 -> M0 -> R1 (note: M0 = B0, but the contents are different)

State 2, after merging Benign PR:

Real repo: R4 -> R3 -> R2 -> R1 -> R0, R4 -> B1 -> B0 -> R1

If Malicious PR was merged now, Git would, I imagine, just believe that M1 is
a commit that branched off of B0, since that's where the "pointer" is at.

So, yeah, what would this actually accomplish?

------
meta_AU
Painful to read all the 'this isn't an issue because of bad reason X, Y, Z'
responses.

Git can implement checks for easily-collided data and warn the user, and
potentially even implement the safer-hash countermeasures too. The fact that
this isn't a second preimage, or that SHA-1 isn't used to authenticate a repo,
doesn't really factor into it.
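
For example, Marc Stevens' sha1collisiondetection library can flag inputs
bearing the fingerprints of known collision attacks; a rough sketch, assuming
you've built the sha1dcsum tool from that repo (output format from memory and
may differ):

    ./sha1dcsum shattered-1.pdf
    # prints the SHA-1 digest plus a collision-detected marker for crafted files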

~~~
swsieber
But he doesn't say that they won't implement checks for collided data and warn
the user (not in this post anyway).

He does say that the sky isn't falling, and there are some steps they can take
to mitigate it.

Edit: or do you mean all the posts here in the comments?

~~~
meta_AU
Yes to your edit, but I can see the ambiguity.

------
droopyEyelids
gcache

[http://webcache.googleusercontent.com/search?q=cache:Syesdur...](http://webcache.googleusercontent.com/search?q=cache:SyesdurCreIJ:marc.info/%3Fl%3Dgit%26m%3D148787047422954+&cd=1&hl=en&ct=clnk&gl=us)

------
htns
This is a shocking aspect of crypto. Old standards get broken regularly, yet
people waltz around with an "it would be embarrassing to do more, since no one
else does" attitude.

------
chmike
There is not much risk now, but git should be able to switch to another,
longer hash. Truncating another hash to 40 chars does not "fix" the problem;
it just moves it someplace else.

Another possibility, though this is a hack to keep the key length at 40 chars,
would be to change the key encoding from hex to base64. In 40 chars you could
encode 240 bits instead of 160. It would be preferable to get rid of the
hardcoded 40-char limit; it shouldn't be that hard.
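
To illustrate the arithmetic: base64 carries 6 bits per char, so 40 chars hold
240 bits, and 30 bytes of a truncated SHA-256 digest fit exactly:

    # 160-bit SHA-1 digest as 40 hex chars
    openssl dgst -sha1 -binary file | xxd -p
    # 240 bits (30 bytes) of SHA-256 in exactly 40 base64 chars, no padding
    openssl dgst -sha256 -binary file | head -c 30 | base64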

~~~
theseoafs
The problem isn't the length of the text representation of the hash, it's the
length of the actual binary representation of the hash.

~~~
chmike
The comment to which Linus responds reports that the constant 40 appears in
many places in the git code. The bit size of a SHA-1 hash is 160 bits, which
fits in 20 bytes; in hexadecimal, the length is 40 chars. So the problem
reported in the initial comment was the text length, not the bit length.

Of course, there are probably 20s hardcoded in many places too. So in this
case the bit length is an issue as well; you are right. Switching to base64
encoding would not fix any hardcoded bit lengths.

Using file hashes as file identifiers doesn't look like a good idea, as argued
here [https://valerieaurora.org/hash.html](https://valerieaurora.org/hash.html),
because hash lifetimes are short. The system should support changing the hash
every year. The work required to compute hashes increases too.

------
hzhou321
What exactly is the security issue on a repository?

~~~
coliveira
I think this is an important point. If you depend on a source control tool to
handle security, you have bigger problems in life. There are way too many
things to worry about when designing a source control tool to believe that
security will be handled properly.

------
budu3
Can someone with more expertise shed more light on this? What does he mean?
What kinds of things are they hiding in a Git commit? If git is open source,
then how can they hide anything?

> Git has opaque data in some places (we hide things in commit objects
> intentionally...

------
trengrj
So would it be possible to migrate to a different hash seamlessly?

~~~
clusmore
From reading various discussions, it sounds like there are quite a few places
that make implicit assumptions about the length of the hash, so from a
technical perspective it might be a hassle to migrate to longer hashes.

I think the bigger problem would be external -- tooling and other
integrations. I'm guessing if they did move to another algorithm, as part of
the migration git would need to re-compute the hash for every single object in
all of our repos and migrate all our refs over to the new hashes, so that
repos created before and after the change would be indistinguishable. This
would mean that every commit hash which appears in plaintext in commit logs,
emails, bug trackers, etc. would be wrong. Not to mention 3rd party tools
which make the same assumptions about hashes that git itself does. It sounds
like a nightmare to me, and one that I would only want to force on the
community if absolutely necessary.

~~~
hvidgaard
You could add the new hash functionality and enable Git to use variable-length
hash functions, with multiple different hashes based on setup. So for
compatibility you keep SHA-1, but for the future you use a better hash
function.

------
Gaelan
I've been toying with the idea of modifying git to use (IIRC) SHA-256 so that
any commit hash could be downloaded directly from IPFS. Seems like as good a
time as any.

------
detronizator
I'm not an expert on SHA-1 collisions, but I'd take Linus' word for it. :)

------
wmccullough
Normally I roll my eyes whenever I see some sort of reply to anything from
Linus, but this time I agree with him.

~~~
castis
Well considering git is his creation, it makes sense that he'd weigh in.

