Hacker News new | past | comments | ask | show | jobs | submit login

> by pure luck

No, even that is not true. If all digital (and non-digital) storage media ever manufactured by humans - meaning all hard drives, tape drives, CDs, DVDs, BluRays etc. ever manufactured to date, and every book and word ever printed or written down. If those ALL were only filled in with UUIDv4s generated from a good random source .. you would still not see even one single collision!

UUID collisions are only possible with currently known human technology if your randomness source is not good enough. And it will remain so unless there are some astronomical leaps in digital information storage technology - at least 10 orders of magnitude more storage than currently exists.

EDIT: I thought of a way for programmers to mentally visualize how unlikely UUID collisions really are. Let's imagine that in some not-too-distant future, there are 10 billion people on Earth. Each of them are given one thousand CPUs. These CPUs have 1024 cores each, and they run at 10 GHz (clock cycles per second). The CPUs implement a hypothetical instruction that can generate a totally random UUID in one clock cycle.

As an experiment, all people on Earth one day decide to program all their thousand CPUs each to run a tight loop that will indefinitely generate UUIDs on all 1024 cores and then immediately discard them.

After continuing to run this experiment (whose electicity bill will make Bitcoin look like Earth Hour) all day, 24/7, for about 800 years, the likelihood of one UUID ever having been generated twice will have exceeded 50%.




I'm sorry to say that your analysis is wildly incorrect.

- 10 billion people =~ 2^33

- 1000 CPUs =~ 2^10

- 1024 cores =~ 2^10

- 10 GHz =~ 2^33

So: one second's computation by all of these people is 2^86 UUIDs generated. UUIDs are 128 bits. With probability essentially 1, there will be a collision within one second.

The reason is known as the birthday paradox. If you sample random values from a set of size k, after you've chosen about sqrt(k) values you will have chosen the same value twice with probability very close to 1/2. By 10*sqrt(k) samples you'll have found a collision with probability well over 90%.

In this case, after sampling 2^64 values you'll have a collision with probability 1/2. That happens in roughly 250 nanoseconds (2^-22 seconds) in your thought experiment.

2^64 sounds like a lot, but in many contexts it's not all that much. Every bitcoin block mined takes well in excess of 2^70 SHA evaluations. Obviously the miners are not dedicated to generating UUID collisions, but if they were they'd easily find thousands of them in the time it takes to mine one block (this neglects the fact that it is much easier to sample a UUID than to evaluate double-SHA256).


> The reason is known as the birthday paradox.

Right. Forgot about that little thing. You're absolutely correct.

With this taken into account a _single_ person with 1000 pieces of 1000-core, 10 GHz CPUs generating UUIDs will generate a match in a few minutes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: