
Git hash function transition plan - vszakats
https://github.com/git/git/blob/master/Documentation/technical/hash-function-transition.txt
======
benjaminjackman
I only had time to skim it and didn't see it mentioned anywhere, so this might be
a good time to suggest multihash:
[https://multiformats.io/multihash/](https://multiformats.io/multihash/)

Having Git use that could be a great opportunity to standardize on a de
facto hash function encoding standard.

What would be the best way to suggest that (if it hasn't been suggested already,
though I am guessing it likely has)?
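
For anyone unfamiliar with the format, here is a minimal sketch of what a multihash looks like: a varint algorithm code, a varint digest length, then the digest itself. The codes 0x11 (sha1) and 0x12 (sha2-256) come from the multihash table; the helper names `encode_varint` and `multihash` are made up for this illustration and are not part of any Git proposal.

```python
import hashlib

# Codes from the multihash table: sha1 = 0x11, sha2-256 = 0x12.
HASH_CODES = {"sha1": 0x11, "sha256": 0x12}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(name: str, data: bytes) -> bytes:
    """Return <varint code><varint digest length><digest> for `data`."""
    digest = hashlib.new(name, data).digest()
    return encode_varint(HASH_CODES[name]) + encode_varint(len(digest)) + digest


mh = multihash("sha256", b"hello")
print(mh[:2].hex())  # algorithm code and digest length prefix
```

The point of the prefix is that a reader can tell which function produced the digest without any out-of-band configuration, which is also exactly what the replies below worry about.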

~~~
rjzzleep
Wasn't there an issue with JWT that was summarized as this:

"This is a good idea, but it doesn't solve the underlying problem: attackers
control the choice of algorithm" ?

Here's another quote from the WireGuard paper[1]:

"Finally, WireGuard is cryptographically opinionated. It intentionally lacks
cipher and protocol agility. If holes are found in the underlying primitives,
all endpoints will be required to update"

[1]:
[https://www.wireguard.com/papers/wireguard.pdf](https://www.wireguard.com/papers/wireguard.pdf)

~~~
abritinthebay
That’s only true of JWT if you allow your server to _accept_ all algorithms.

You don’t actually have to.

~~~
rblatz
Correct, your token authority should specify which algorithms are valid, and
your clients should self configure via a secure back channel to only accept
the algorithms your token authority issues.
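
A rough sketch of that allow-list idea, using only the standard library. It assumes the token authority issues HS256 tokens only; `ALLOWED_ALGS`, `sign` and `verify` are hypothetical names for this illustration, not the API of any JWT library. The key step is that the verifier checks the `alg` header against its own list before doing anything else.

```python
import base64
import hashlib
import hmac
import json

ALLOWED_ALGS = {"HS256"}  # assumption: the authority only ever issues HS256


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(part: str) -> bytes:
    # Restore the padding that JWT strips before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign(payload: dict, key: bytes) -> str:
    """Issue an HS256-signed token (illustrative, no claims validation)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part).encode()) for part in (header, payload)
    )
    mac = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(mac)


def verify(token: str, key: bytes) -> dict:
    """Reject any token whose header names an algorithm we do not accept."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") not in ALLOWED_ALGS:
        raise ValueError("algorithm not allowed")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

With this shape, a token whose header says `"alg": "none"` is refused before its (empty) signature is even looked at.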

~~~
abritinthebay
Exactly! JWT is a much misunderstood system, it seems, though it doesn’t
exactly help itself by being quite complex.

------
ris
Funny, I always expected Git to transition by adding a stronger hash as a
piece of metadata to each commit and continue using SHA-1 for the day-to-day
identifier, seeing as most of the time Git doesn't go back and
_verify_ the whole commit chain unless you ask it to.

~~~
a_t48
They actually considered the reverse (search for `Using hash functions in
parallel`)

------
colinbartlett
Previous discussion:
[https://news.ycombinator.com/item?id=13906804](https://news.ycombinator.com/item?id=13906804)

------
styfle
This doesn’t render very well on mobile. I wish the Git team would write their
docs as a .md so GitHub could render as HTML with word wrap in all its glory.

~~~
nayuki
Here is a rich text version of the same document:
[https://www.kernel.org/pub/software/scm/git/docs/technical/h...](https://www.kernel.org/pub/software/scm/git/docs/technical/hash-function-transition.html)

------
pwagland
So, this is the transition plan. Is there anywhere where we can find what
progress has been made on the plan? As far as I can tell, it is only a plan at
the moment?

I also like the idea of a transition plan, but is there a proposed timeframe
anywhere for phasing out the non "post-transition" modes of operation? That
is, as an organisation, is there anything that we can do with this now towards
our future planning?

~~~
Piskvorrr
For something as widespread as Git, there _is_ no "post-transition", I'm
afraid: while maintained code will get migrated, old repositories will hang
around Forever.

Note that Git is a protocol - all of its implementations will eventually need
to change, and each repo using it as well. This is decentralized by the very
purpose of Git.

------
bjackman
So it says the protocol won't be extended initially, only the repo format. I'm
trying to figure out the implications of that. IIUC this basically boils down
to: can we make sure that when you have a signed tag (i.e. a hash signed with
GPG), the content of your repo is truly the same as what the signer intended,
and not a collision generated by a bad actor.

It says that there will be a new format for signed objects, i.e. you will now
be able to sign tags with NewHash. But if the format is not extended, does
that mean you can't push or fetch those objects? If so then I believe this
is just foundational work with no immediate functional impact, right?

(Not shitting on it btw, it's obviously still a Good Idea!)

~~~
mathw
The document explains this further, rather later on.

There's a compatibility mode, where it understands a translation between SHA-1
named objects and NewHash named objects, and translates them at the boundary -
i.e. during a pull or a push.

Obviously you're at risk to some extent of flaws in SHA-1 being exploited in
your remote, although presumably if the translation layer detects the SHA-1 of
something didn't change but the NewHash did then it'll scream.
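
Roughly what I imagine that translation layer doing, as a sketch only. It uses SHA-256 as a stand-in for NewHash and handles blobs only, since a blob's bytes are the same under both names; `TranslationTable` and `blob_name` are invented for this illustration and are not taken from the plan.

```python
import hashlib


def blob_name(algo: str, content: bytes) -> str:
    """Name a blob the way Git does: hash of 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()


class TranslationTable:
    """Hypothetical bidirectional map between SHA-1 and SHA-256 names."""

    def __init__(self):
        self.sha1_to_new = {}
        self.new_to_sha1 = {}

    def record(self, content: bytes):
        """Store both names for `content`; complain if SHA-1 maps twice."""
        old = blob_name("sha1", content)
        new = blob_name("sha256", content)
        known = self.sha1_to_new.get(old)
        if known is not None and known != new:
            # Same SHA-1 name, different SHA-256 name: two distinct objects
            # share a SHA-1, which is what a collision attack looks like.
            raise ValueError(f"SHA-1 collision suspected for {old}")
        self.sha1_to_new[old] = new
        self.new_to_sha1[new] = old
        return old, new


table = TranslationTable()
old, new = table.record(b"")
print(old)  # SHA-1 name of the empty blob
```

At the boundary, a push would look names up in `new_to_sha1` and a fetch in `sha1_to_new`, so the local repository only ever stores the stronger name.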

It does seem this is a temporary situation though, as it mentions in one small
sentence that for the final transition stage they envisage the protocol also
supporting NewHash, so they can throw away all SHA-1 metadata everywhere. What
they don't address in that plan is how the protocol gets extended, but they do
clearly rely on that happening for the full transition to take place.

------
cdancette
Torvalds on signing commits: [http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...](http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-td2582986.html)

~~~
scrollaway
He makes excellent points on tags; the one I hadn't considered before is that
tags indeed can be separated from the tree, which makes them a unique asset in
a git tree.

The problem with that however is how we use tags _today_. Creating a tag in
the modern lingua franca of git means creating a new version. If you push that
tag to GitHub or GitLab or what have you, a handy "release" will be created
for you. If you're signing all your commits for some security reason, you
don't want that, aye?

So you'd want tags that are tracked separately and that's not easy to do. `git
commit --gpg-sign` is going to include the signature in the commit, not create a
separately-tracked tag with an appropriate name or whatever. It certainly
sounds interesting, albeit unintuitive, and that summarizes git perfectly :)

~~~
finnthehuman
> _The problem with that however is how we use tags today._

"Doctor, it hurts when I cargo-cult workflow from GitHub..."

~~~
scrollaway
Do you have a point?

~~~
brlewis
Point being that one shouldn't cargo-cult workflow from GitHub.

The point is phrased using an old pop culture reference:
[https://en.wikipedia.org/wiki/Smith_and_Dale#.22Dr._Kronkhei...](https://en.wikipedia.org/wiki/Smith_and_Dale#.22Dr._Kronkheit_and_His_Only_Living_Patient.22)

~~~
scrollaway
It's not cargo-culting, it's not from github, and it's not even a workflow. If
that was the point, it's a terrible one to make. I genuinely don't understand
how people found that comment insightful, or anything short of
trolling/hostile, but _whatever_.

------
CobrastanJorji
The main downside to switching the hash function is that, when explaining why
developers should stop worrying about hash conflicts, we'll need to calculate
a new analogy to replace the standard, 160-bit "every member of your
programming team being attacked and killed by wolves in unrelated incidents on
the same night" scenario.

~~~
deathanatos
That analogy presumes that the hash function's output is uniformly random;
when you know how to manipulate it such that its output is _not_ random, then
obviously it doesn't hold.

The question of _accidental_ collisions is still relevant, even with SHA-256,
and the answer is still the same: it's so vanishingly improbable that it is
assumed to be impossible.
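
To put a number on "vanishingly improbable", here is a back-of-the-envelope sketch using the usual birthday-bound approximation p ≈ n(n-1) / 2^(b+1), which assumes the hash output is uniformly random. The figure of 10^9 objects is an arbitrary choice for illustration, not a measurement of any real repository.

```python
from fractions import Fraction


def collision_probability(num_objects: int, hash_bits: int) -> Fraction:
    """Birthday-bound approximation: p ~= n(n-1) / 2^(bits+1).

    This is also an upper bound on the true probability of at least one
    accidental collision among `num_objects` uniformly random hashes.
    """
    return Fraction(num_objects * (num_objects - 1), 2 ** (hash_bits + 1))


for bits in (160, 256):
    p = collision_probability(10**9, bits)
    print(f"{bits}-bit hash, 1e9 objects: p ~= {float(p):.3e}")
```

None of this says anything about deliberate collisions, where the uniform-randomness assumption is the very thing the attacker breaks.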

------
westurner
> Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, K12,
> and BLAKE2bp-256.

~~~
jszymborski
Not sure what K12 is (Keccak?), but BLAKE2 is a very attractive option.

------
amelius
How does it prevent this exact same problem in the future?

------
joseluisq
> In early 2005, around the time that Git was written, Xiaoyun Wang, Yiqun
> Lisa Yin, and Hongbo Yu announced an attack finding SHA-1 collisions in
> 2^69 operations. In August they published details. Luckily, no practical
> demonstrations of a collision in full SHA-1 were published until 10 years
> later, in 2017.

> The hash function NewHash to replace SHA-1 should be stronger than SHA-1
> was: we would like it to be trustworthy and useful in practice for at
> least 10 years.

------
hwc
Why is SHA-3 not explicitly mentioned as a candidate?

~~~
AlphaSite
SHA3 is slow.

------
anton_gogolev
NewHash is a terrible name - on par with Xbox One [X] and iPad New. Googling
stuff will be hard, and good luck explaining to less technically savvy users
what this is all about.

Plus, in 100 years, when SHA-256 is compromised, what would be the name of a
_new_ new format?

~~~
Ixio
Can someone explain the name? It does not look like a good name. Or is NewHash
just a placeholder name for the git project because they haven't made a final
decision on a new hash function? (It's hard to google and find out)

~~~
hdhzy
[https://github.com/git/git/blob/master/Documentation/technic...](https://github.com/git/git/blob/master/Documentation/technical/hash-function-transition.txt#L784)

------
derekmhewitt
Can someone explain why they would transition to a new hash function and not a
blockchain-based system of tracking? If one of the goals of introducing a
stronger hash function is signing of individual commits, it seems like a
blockchain would be ideal.

~~~
milkey_mouse
Chains of Git commits are already a blockchain - or at least a DAG and, more
specifically, a Merkle tree, just as a blockchain is. Internally, each commit
contains the hash of the parent commit it was based on:

    
    
    $ git cat-file -p HEAD
    tree e013f4d121199d60b70043f525aef4a7e641b5f6
    parent 152bbb43b30ced1b32e9ed6f5ba2ac448de725b6
    author Linus Torvalds <torvalds@linux-foundation.org> 1510512373 -0800
    committer Linus Torvalds <torvalds@linux-foundation.org> 1510512373 -0800

    Linux 4.14
    

You can even GPG sign each commit if you want to ensure authenticity. The
other aspects of cryptocurrency blockchains don't really apply here: we don't
need a single "true chain," in fact that's the point of branching.
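
To make the chaining concrete, here is a small sketch that computes commit IDs the way Git does today (SHA-1 over `commit <size>\0` plus the commit body). The `AUTHOR` line and the commit messages are placeholders, and `commit_id` is a name made up for this example; the empty-tree ID is the well-known one Git produces for a tree with no entries.

```python
import hashlib
from typing import Optional

AUTHOR = "A U Thor <author@example.com> 0 +0000"  # placeholder identity
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit body as SHA-1 of 'commit <size>\\0' + body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {AUTHOR}", f"committer {AUTHOR}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


root = commit_id(EMPTY_TREE, None, "root")
child = commit_id(EMPTY_TREE, root, "child")

# Tampering with the root changes its ID, and because the child embeds the
# parent ID in its own body, the child's ID changes as well.
forged_root = commit_id(EMPTY_TREE, None, "root (tampered)")
forged_child = commit_id(EMPTY_TREE, forged_root, "child")
print(child != forged_child)
```

That dependency of every ID on its ancestors is the Merkle property; signing one commit or tag therefore vouches for the whole history behind it, as long as the hash function holds up.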

(Kids these days with their blockchains...)

