Also what happens if two unrelated git hashes collide on github?
On the git commit id collision, I don't know about the actual mitigation deployed in Github but they are certainly aware of the possibility .
 http://ghtorrent.org/ (Note that what I really used is a stripped down version with only commit ids, so this also contributes to a word "cursory" in my cursory analysis.)
For each extra leading 0, there are 1/16 as many hashes available as the previous one - so that is exponentially harder.
But higher could be just by 1, and which is almost exactly the same difficulty... I guess, it would usually be randomly distribited in the remaining range, uniformly, so on average, the range would 1/2 as many each time. Which is also exponential (though, not as fast as 1/16). Is that the reasoning?
Suppose H is an L-bit hash function and let H_n be the commit hash at the nth iteration. If we seek to generate H_n subject only to the constraint H_n+1 > H_n, the search complexity grows exponentially as you describe above.
But suppose we search subject to the additional constraint H_n < H_n+1 < min(H_n + k, 2^L) for some constant k. So long as 2^L - H_n > k, the search complexity for H_n+1 is only dependent on the constant k and the number of bits of the hash function, L.
One could read the title as exactly n leading zeros, but of course, it just means it has n leading zeros (that is, it may have more, too) - and the py code is implemented that way. So, for example, if you started with 16 leading zeros, it could be used for the first 16 commits.
work = len(sha1) - len(sha1.lstrip('0'))
assert commit_num <= work, "Not enough work done."
BTW cute that assert is exactly appropriate for the ci tests, for determining whether it is accepted as a commit.
Setting k as sqrt(2^L-H_n) would be fun middle ground probably, though I'm lazy to analyze its properties.
This is similar in concept to how Bitcoin defines its hardness, though it uses a different hash and block structure of course. I wonder if successful commits in this repo are coming from that world or maybe from SHA1 cryptographic weaknesses.
Also reminds me of that time back when we were using 8-char truncated git hashes as deployment tags and we eventually begun getting collisions. We also had an interesting bug when all 8 chars were numeric and Python (or the framework/library that was used) ingested what would be a string it as a number.
The difference between blockhain and git is the hashing function. Git’s is designed for speed and constant difficulty, blockhain is designed for slowness and increasing difficulty. The logical meaning of each block is different also.
hash = int(raw_hash)
hash = raw_hash
Admittedly we had to minimize the number of points awarded for this as it'd be a "money for marks" conversion otherwise but it was fun enough to prove a concept.
Nth commit needing N leading zeros ups the ante a fairly hilarious amount ;)
At some point it would cost significant money (besides the developers time) to make commits. It's basically an economic game like bitcoin, that those people who have economic interest in the code, get to decide what's in it.
The goal of that challenge was similar: find some input string by brute force, such that the Skein-1024( input ) XOR a provided "target" has more zero bits than anyone else. (At the time, Skein was a SHA-3 finalist.)
Git commits must be marginally structured, whereas the Skein challenge input was arbritrary, but that doesn't change the problem much.
: E.g., https://web.archive.org/web/20130403124655/https://xkcd.com/...
To The Moon!!!