Hacker News
Nth commit must have commit hash with N leading zeros (github.com)
133 points by lifthrasiir 3 months ago | 27 comments

The most recent commit [1], whose id has 13 leading zeroes, is believed to have the lowest SHA-1 hash on GitHub, per my cursory analysis.

[1] https://github.com/seungwonpark/kudeki-chain/commit/00000000...

Do you work for github or is there some other way to view commit hashes across the whole of github?

Also what happens if two unrelated git hashes collide on github?

I verified this via ghtorrent [1] when the previous record commit (with 12 zeroes) appeared (Korean tweet with a simple script [2]). However, I haven't verified the current commit yet, hence the "cursory" analysis.

On the git commit id collision question, I don't know what mitigation GitHub actually deploys, but they are certainly aware of the possibility [3].

[1] http://ghtorrent.org/ (Note that what I really used is a stripped-down version with only commit ids, so this also contributes to the word "cursory" in my cursory analysis.)

[2] https://twitter.com/senokay/status/1131826418340356096

[3] https://github.blog/2017-03-20-sha-1-collision-detection-on-...

That would be another fun version of this: “each commit is smaller/larger than the previous commit”

It wouldn't be that much different, as committing gets exponentially harder in both cases.

I must be missing something... how does requiring a higher (or lower) hash get exponentially harder?

For each extra leading 0, there are 1/16 as many hashes available as the previous one - so that is exponentially harder.

But "higher" could be higher by just 1, which is almost exactly the same difficulty... I guess the new hash would usually be uniformly distributed in the remaining range, so on average the range would have 1/2 as many hashes each time. Which is also exponential (though not as fast as 1/16). Is that the reasoning?

For growing hashes, we can limit the growth of the search complexity by choosing to accept a more constrained range of hashes in each iteration.

Suppose H is an L-bit hash function and let H_n be the commit hash at the nth iteration. If we seek to generate H_(n+1) subject only to the constraint H_(n+1) > H_n, the search complexity grows exponentially as you describe above.

But suppose we search subject to the additional constraint H_n < H_(n+1) < min(H_n + k, 2^L) for some constant k. So long as 2^L - H_n > k, the search complexity for H_(n+1) depends only on the constant k and the hash length L.
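The effect of the extra bound can be sketched numerically. This is a toy model (not the repo's code), assuming a 32-bit hash space and that each accepted hash lands, on average, midway in its allowed range:

```python
SPACE = 2 ** 32  # toy 32-bit "hash" space

def chain_costs(k=None, steps=20):
    """Expected attempts per commit when each hash must exceed the last.

    k=None: only require H_(n+1) > H_n, so the usable range roughly
    halves each step and the cost doubles (exponential).
    k set:  also require H_(n+1) < H_n + k, so the cost stays near
    SPACE / k (constant) while 2^L - H_n > k holds.
    """
    h, costs = 0, []
    for _ in range(steps):
        hi = SPACE if k is None else min(h + k, SPACE)
        costs.append(SPACE / (hi - h - 1))  # expected uniform draws
        h = (h + hi) // 2                   # average accepted hash
    return costs

print([round(c) for c in chain_costs(None, 10)])     # 1, 2, 4, ..., 512
print([round(c) for c in chain_costs(2 ** 24, 10)])  # ~256 every step
```

The unconstrained chain doubles in cost per commit, while the bounded one stays flat at roughly SPACE / k attempts.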

Heh, you made me consider if kudeki-chain also restricted the range (though not to a constant):

One could read the title as exactly n leading zeros, but of course it just means at least n leading zeros (it may have more), and the Python code is implemented that way. So, for example, if you started with 16 leading zeros, that commit could be used for any of the first 16 commits.

  # count leading zeros in the hex commit id, then require at least
  # as many as the commit's position in the chain
  work = len(sha1) - len(sha1.lstrip('0'))
  assert commit_num <= work, "Not enough work done."

BTW, cute that assert is exactly appropriate for the CI tests, for determining whether a commit is accepted.

Sure, it works, but it doesn't have the fun of the problem starting easy and getting harder over time.

Setting k as sqrt(2^L - H_n) would probably be a fun middle ground, though I'm too lazy to analyze its properties.

average number of days before you find a given prefix ~= charspace^num_prefix_chars / ((hashes per second) * (86,400 seconds in a day)), where charspace is 16 for hex digits

I think it would converge to a similar value rather quickly. Someone with access to fast hashing would vastly overtake the crowd and keep his #1 spot until a bigger fish appears.


This is similar in concept to how Bitcoin defines its hardness, though it uses a different hash and block structure of course. I wonder if successful commits in this repo are coming from that world or maybe from SHA1 cryptographic weaknesses.

I have been bringing up this point in conversations about blockchain for a while now. Git is essentially a chain of commits: the hash of a commit depends on what you're applying it on, because the previous commit's hash is included in the body that eventually gets hashed into HEAD.
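The chaining is visible in how git computes a commit id: the SHA-1 is taken over a "commit <length>\0" header plus the commit body, and the body names the parent commit's hash. A minimal sketch (the tree/parent ids below are made up for illustration):

```python
import hashlib

def git_commit_id(body: bytes) -> str:
    """SHA-1 of a git commit object: header 'commit <len>\\0' plus body."""
    data = b"commit %d\x00" % len(body) + body
    return hashlib.sha1(data).hexdigest()

parent = "a" * 40  # fake parent commit id
body = (f"tree {'b' * 40}\n"
        f"parent {parent}\n"
        "author A <a@example.com> 0 +0000\n"
        "committer A <a@example.com> 0 +0000\n"
        "\nmessage\n").encode()

# Changing the parent id changes the commit id, so every commit's hash
# depends (transitively) on the entire history before it:
body2 = body.replace(parent.encode(), b"c" * 40)
assert git_commit_id(body) != git_commit_id(body2)
```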

Also reminds me of that time back when we were using 8-char truncated git hashes as deployment tags and we eventually began getting collisions. We also had an interesting bug when all 8 chars were numeric and Python (or the framework/library that was used) ingested what should have been a string as a number.
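For 8 hex characters the space is only 16^8 (about 4.3 billion) tags, so the birthday bound makes collisions plausible at scale. A quick estimate using the standard birthday approximation (not tied to any particular deployment setup):

```python
import math

def collision_probability(n_tags: int, hex_chars: int = 8) -> float:
    """Birthday-bound probability that n random truncated-hash tags
    of the given hex length contain at least one collision."""
    space = 16 ** hex_chars
    return 1 - math.exp(-n_tags * (n_tags - 1) / (2 * space))

print(round(collision_probability(77_000), 2))  # ~0.5: even odds by ~77k tags
print(round(collision_probability(10_000), 3))
```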

Git is a Merkle tree. A blockchain is also a Merkle tree. They are the same data structure.


The difference between blockchain and git is the hashing function. Git's is designed for speed and constant difficulty; blockchain's is designed for slowness and increasing difficulty. The logical meaning of each block is different also.

Python is mostly strongly typed. You can get a mix of floats and ints from e.g. parsing JSON, but your library's authors must have been pretty explicit on wanting to confuse strings with numbers.

I know, I was surprised too. But I guess if it comes in as a number from the get-go, it will simply continue as a number until we tried to slice it or something, which made the deployer abort. :)
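A minimal reproduction of that failure mode, assuming the tag was coerced to an int somewhere upstream:

```python
raw_tag = "31415926"       # an all-numeric 8-char truncated hash

tag_as_str = raw_tag
tag_as_int = int(raw_tag)  # what the buggy ingestion produced

print(tag_as_str[:4])      # "3141" -- slicing a string works fine

try:
    tag_as_int[:4]         # slicing an int raises
except TypeError as e:
    print(e)               # 'int' object is not subscriptable
```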

I mean, of course the library in question could contain something like this:

  try:
      hash = int(raw_hash)
  except ValueError:
      hash = raw_hash
But WHY? Can you think of something one could do accidentally to achieve the same effect?

Maybe a migration from SVN and some legacy code somewhere. Since revisions will have r12345678 kind of format.

See, what it reminds me of is Knuth's bounty for errors.

I ran a computer and network security course from 2010 to 2013, and a simplified version of this was part of our wargames. The difficulty scaled with the recency of the last commit.

Admittedly, we had to minimize the number of points awarded for this, as it'd be a "money for marks" conversion otherwise, but it was fun enough to prove the concept.

Nth commit needing N leading zeros ups the ante a fairly hilarious amount ;)

This is PoW Gitcoin.

Could somebody explain in plain words what happens here?

(From my very limited understanding) making a commit requires finding a hash with enough leading zeros, thus making new commits exponentially harder over time.

At some point it would cost significant money (besides the developer's time) to make commits. It's basically an economic game like Bitcoin, where the people who have an economic interest in the code get to decide what's in it.

Silly games bruteforcing content until a "special" SHA1 output value is achieved.

HNers with long memories may remember the very similar XKCD "Alma mater" challenge from 2013[1][2] (which is no longer active). There is a sort of continuance of that online at https://beatthehash.com/ .

The goal of that challenge was similar: find some input string by brute force, such that the Skein-1024( input ) XOR a provided "target" has more zero bits than anyone else. (At the time, Skein was a SHA-3 finalist.)
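Skein-1024 isn't in Python's standard library, but the scoring is easy to sketch with a stand-in hash (sha3_512 here; the target below is also made up, whereas the real challenge published a fixed one):

```python
import hashlib

def differing_bits(digest: bytes, target: bytes) -> int:
    """Number of bit positions where digest and target disagree (XOR popcount)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(digest, target))

# Stand-in target; the real challenge fixed a Skein-1024 target value:
target = hashlib.sha3_512(b"target").digest()

# Lower score wins; brute force is just trying many inputs:
best = min((differing_bits(hashlib.sha3_512(guess).digest(), target), guess)
           for guess in (b"foo", b"bar", b"baz"))
print(best)
```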

Git commits must be minimally structured, whereas the Skein challenge input was arbitrary, but that doesn't change the problem much.

[1]: https://xkcd.com/1193/

[2]: E.g., https://web.archive.org/web/20130403124655/https://xkcd.com/...

The "global-warming" tag of the repo is appropriate.

This software _so_ should be a rewrite of the Apollo 11 guidance computer software...

To The Moon!!!

