

Git Collisions? - hobbyist

I want all the git users to help me answer this query.

Quite often I worry about git hash collisions. I know the probability is as small as it can get. But what is the probability that, out of all the git users so far, any one of them has had a git collision? (This is not the same as the previous probability.)

Secondly, hypothetically, if I combined all the git repositories ever created in this world into one giant git repo, would I see any collisions?
======
shrughes
See <http://en.wikipedia.org/wiki/Birthday_attack>

A basic estimate is that the probability of collision is (n^2 / 2) / 2^160,
where n is the number of hashes that have been generated. For example, if
you've generated 2^64 hashes, you have a 1/2^33 chance of collision. This
works pretty well until n gets close to 2^80 or beyond.
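The 2^64 example can be checked with exact integer arithmetic. This is a minimal sketch assuming a 160-bit hash (SHA-1) and the (n^2 / 2) / 2^160 shorthand; the function name `approx_collision_prob` is hypothetical:

```python
from fractions import Fraction

HASH_BITS = 160  # SHA-1 digests are 160 bits

def approx_collision_prob(n: int, bits: int = HASH_BITS) -> Fraction:
    """Birthday shorthand (n^2 / 2) / 2^bits; only sensible while n is well below 2^(bits/2)."""
    return Fraction(n * n, 2) / (2 ** bits)

p = approx_collision_prob(2 ** 64)
print(p)                          # 1/8589934592
print(p == Fraction(1, 2 ** 33))  # True
```

At n = 2^80 the shorthand already gives 1/2, and a little beyond that it exceeds 1, which is why it stops being useful there.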

Edit: That formula is actually shorthand for (n * (n-1)/2) / 2^160. You get
it because on your first hash you have a 0/2^160 chance of collision, on your
second hash a 1/2^160 chance, on your third a 2/2^160 chance, and on your nth
an (n-1)/2^160 chance (assuming you haven't had any collisions already). So you
can add up those probabilities to get (n * (n-1)/2)/2^160 -- you can just add the
probabilities because the chance of having _two_ collisions is super unlikely
(until it is no longer super unlikely, around 2^80 or so). (A closer form is
1 - (1 - 1/2^160)^k, where k = n(n-1)/2 is the number of pairs of hashes, and
we're approximating that with k * (1/2^160).)
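To see how close the additive shortcut is to the exact product, here is a small Python sketch. It assumes a toy 32-bit hash space (so the collision chance is large enough to see and the loop over n stays short) and uniformly random hashes; all function names are hypothetical:

```python
import math

def exact_collision_prob(n: int, space: int) -> float:
    """1 - prod_{i=0}^{n-1} (1 - i/space): chance of at least one collision among n hashes."""
    # Summing log1p terms keeps precision when each i/space is tiny.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_no_collision)

def pairwise_collision_prob(n: int, space: int) -> float:
    """1 - (1 - 1/space)^k with k = n(n-1)/2 pairs, treating pairs as independent."""
    k = n * (n - 1) // 2
    return -math.expm1(k * math.log1p(-1 / space))

def summed_collision_prob(n: int, space: int) -> float:
    """The additive shorthand (n(n-1)/2) / space."""
    return (n * (n - 1) / 2) / space

SPACE = 2 ** 32  # toy hash space, an assumption for illustration only
n = 10_000
for f in (exact_collision_prob, pairwise_collision_prob, summed_collision_prob):
    print(f.__name__, f(n, SPACE))
```

While the probability is small, all three agree closely; the summed form slightly overestimates because it ignores the overlap between events.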

~~~
shrughes
Sorry, I had a bit of a brainfart when writing that. You can add the
probabilities simply because (1 - e - f) is a good approximation for
(1 - e)(1 - f) when e and f are tiny. No other reason. The chance of having
"two" collisions (I meant one existing collision) is zero because each of those
per-hash probabilities already assumed no earlier collision.
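Written out, the approximation drops only the cross terms, which are negligible while the individual probabilities are tiny; applied to every factor it recovers the formula above:

```latex
(1 - e)(1 - f) = 1 - e - f + ef \approx 1 - e - f \quad \text{when } e, f \ll 1,
\qquad\text{so}\qquad
1 - \prod_{i=1}^{n-1}\left(1 - \frac{i}{2^{160}}\right)
  \approx \sum_{i=1}^{n-1} \frac{i}{2^{160}}
  = \frac{n(n-1)/2}{2^{160}}.
```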

------
_ikke_
[https://www.wolframalpha.com/input/?i=number+of+combinations...](https://www.wolframalpha.com/input/?i=number+of+combinations+of+sha1)

> 983.544.651.006.435.431.129.340.665.918.456.708.968.598.498.835

or about 9.8 * 10^47. (Strictly, a 160-bit SHA-1 digest has 2^160, or about
1.46 * 10^48, possible values.) Although this doesn't give the complete
picture, it shows how big the SHA-1 space is.
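Since a SHA-1 digest is 20 bytes (160 bits), the size of its output space can also be computed directly, with no query interpretation involved. A minimal Python check:

```python
SHA1_BITS = 160  # a SHA-1 digest is 20 bytes = 160 bits

space = 2 ** SHA1_BITS
print(space)            # 1461501637330902918203684832716283019655932542976
print(f"{space:.2e}")   # 1.46e+48
print(len(str(space)))  # 49
```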

~~~
joelittlejohn
So in other words:

983 billion, billion, billion, billion, billion

If every human that ever lived had made a thousand git commits every second of
their lives to the same git repository, we would still expect no collisions.
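This claim can be sanity-checked with the birthday estimate from earlier in the thread. The sketch below rests on loudly hypothetical round numbers: about 10^11 people ever born, a 30-year average lifespan, one SHA-1 object per commit (real commits also create tree and blob objects), and the (n^2 / 2) / 2^160 approximation:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical round-number assumptions, not measured data.
PEOPLE_EVER = 1e11          # roughly 100 billion humans ever born
LIFESPAN_YEARS = 30         # crude average lifespan across history
COMMITS_PER_SECOND = 1000   # the thousand commits per second from the comment

n = PEOPLE_EVER * LIFESPAN_YEARS * SECONDS_PER_YEAR * COMMITS_PER_SECOND
expected_collisions = (n * n / 2) / 2 ** 160  # birthday-bound estimate

print(f"objects hashed: {n:.2e}")  # about 9.47e+22
print(f"expected collisions: {expected_collisions:.4f}")  # about 0.003
```

An expected count around 0.003 is consistent with "no collisions", but the margin is only a few hundredfold; counting tree and blob objects, or assuming longer lives, pushes it up.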

