
Birthday Paradox Revisited - bookofjoe
http://datagenetics.com/blog/february72019/index.html
======
abetusk
The Birthday Problem [1]:

    
    
        How many people do you have to put in a room
        before there's a 50% chance of two having the
        same birthday?
    

Assuming a uniform distribution of birthdays, the answer turns out to be about
23. In general, for M possible birthdays (other than 365, say), you need on the
order of sqrt(M) people before you get a 50% chance.
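To make the sqrt(M) scaling concrete, here is a minimal sketch that computes the exact collision probability under the uniform assumption and searches for the smallest group that reaches 50%. The function names are made up for illustration. The constant in front of sqrt(M) works out to roughly sqrt(2 ln 2), or about 1.18, which is why 365 days gives 23 rather than 19.

```python
import math

def collision_probability(n, days=365):
    """Chance that at least two of n people share a birthday (uniform days)."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

def people_needed(days=365, target=0.5):
    """Smallest group size whose collision probability reaches the target."""
    n = 0
    while collision_probability(n, days) < target:
        n += 1
    return n

# Compare the exact answer with sqrt(M) for a few values of M.
for days in (365, 1000, 10000):
    print(days, people_needed(days), round(math.sqrt(days), 1))
```

For 365 days this reports 23 people, against a square root of about 19.1.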

The article is a standard introduction to the Birthday Problem with some real
world data thrown in. The 'paradox' comes from the surprise that the number is
so low (23) compared with the number of days in the year (365). As the article
points out, one short description of why the number is so low is that you're
comparing each new person with every person already in the room instead of
drawing two numbers at random and seeing if they're the same.

For the curious, I have a minimal post on how to derive the Birthday Paradox
and other canonical probability problems [2].

[1] [https://en.wikipedia.org/wiki/Birthday_problem](https://en.wikipedia.org/wiki/Birthday_problem)

[2] [https://mechaelephant.com/dev/Assorted-Small-Probability-Pro...](https://mechaelephant.com/dev/Assorted-Small-Probability-Problems/)

~~~
bhl
An even shorter intuition: the number of pairs in a group of n people grows as
O(n^2), which implies some square root factor. I would lead with that before
starting a more rigorous proof.

~~~
waterhouse
Agreed. To narrow down some of the details: There are n*(n-1)/2 = roughly n^2
/ 2 pairs. If each pair has a 1/k chance of matching, then the expected number
of pairs that match is (n^2 / 2) / k. Therefore, if we set n = √k, we get
roughly 0.5 pairs on average.

Now, if it were impossible to ever get multiple matching pairs, then "the
chance there's at least one match" would be equal to "the expected number of
matches". Specifically: "chance of at least 1 match" = "expected number of
matches" - "chance of at least 2 matches" - "chance of at least 3 matches"
- ... . Since it's possible but unlikely to get multiple matches, the
approximation should be reasonably close.
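The gap between the two quantities can be checked numerically. The sketch below (function names are illustrative, birthdays assumed uniform) prints the expected number of matching pairs next to the exact chance of at least one match. The expected count is always an upper bound, since the identity above only ever subtracts from it.

```python
def expected_matching_pairs(n, k):
    """Expected number of matching pairs: n*(n-1)/2 pairs, each matching with chance 1/k."""
    return n * (n - 1) / 2 / k

def chance_of_any_match(n, k):
    """Exact chance that at least one pair among n people matches (k equally likely values)."""
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (k - i) / k
    return 1.0 - p_no_match

k = 365
for n in (5, 10, 19, 23, 30):
    print(n,
          round(expected_matching_pairs(n, k), 3),
          round(chance_of_any_match(n, k), 3))
```

For small n the two columns nearly agree. They drift apart as n grows and multiple matches become more likely.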

------
jypepin
I really enjoy reading about those kinds of problems because they are so
opposite to my intuition.

Even after thinking about this more, and reading (and understanding) the
solution, my guess would still be "well, with 182 people, half the pigeonholes
would be taken, so there would be a 50% chance that the next person getting in
the room would pick a taken hole".

Another similar problem is the Monty Hall problem. Simple, easy to understand
when explained, but still, despite understanding the solution, doesn't feel
right!

~~~
admax88q
You're formulating the problem wrong in your mind.

It's true that if you have 182 people in the room, all with unique birthdays,
and you add one more random person to the room, there is a 50% chance of them
sharing a birthday, just like you described. But you're assuming you already
managed to gather 182 people without any birthday collisions. That's a
different problem than the one originally posed. In your case you collected
182 people _with unique birthdays_ and checked the probability of a collision
when adding one more. But the real question asks: if you grabbed those 182
people at random, what is the chance that any two of them _already_ share a
birthday?
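The two questions give very different numbers, which a short sketch can show. The helper names are hypothetical and birthdays are assumed uniform over 365 days.

```python
def chance_next_person_matches(unique_people, days=365):
    """Chance that one more person matches someone, GIVEN all current birthdays are distinct."""
    return unique_people / days

def chance_all_distinct(n, days=365):
    """Chance that n randomly chosen people have no shared birthday at all."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p

# Question 1: the 183rd person joining 182 people with unique birthdays.
print(chance_next_person_matches(182))

# Question 2: how likely is it to gather 182 random people with no collision?
print(chance_all_distinct(182))
```

The first number is just under 0.5. The second is astronomically small, so the starting point of the "182 pigeonholes" picture essentially never happens with random people.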

~~~
jypepin
Indeed, it makes more sense this way!

~~~
admax88q
Unfortunately I don't have any good intuition to share for the Monty Hall
problem, as I haven't been able to get an intuitive understanding of it yet.

However I did just read about Bertrand's Box Paradox[1], and it's very much
the same sort of thinking as the Monty Hall problem, but more intuitively
understandable for me at least.

[1] -
[https://en.wikipedia.org/wiki/Bertrand%27s_box_paradox](https://en.wikipedia.org/wiki/Bertrand%27s_box_paradox)
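For anyone who wants to check Bertrand's box by brute force, here is a minimal enumeration sketch. It assumes the standard setup from the linked article: three boxes holding gold/gold, gold/silver and silver/silver, a box picked at random, then a coin picked at random from that box. Given that the drawn coin is gold, it computes the chance that the other coin in the same box is also gold.

```python
from fractions import Fraction

# The three boxes of the standard puzzle.
boxes = [("gold", "gold"), ("gold", "silver"), ("silver", "silver")]

def chance_other_coin_is_gold():
    drew_gold = Fraction(0)
    drew_gold_and_other_gold = Fraction(0)
    for box in boxes:
        for drawn in range(2):
            # Each (box, coin) combination is equally likely: 1/3 * 1/2.
            weight = Fraction(1, len(boxes)) * Fraction(1, 2)
            other = box[1 - drawn]
            if box[drawn] == "gold":
                drew_gold += weight
                if other == "gold":
                    drew_gold_and_other_gold += weight
    # Conditional probability: P(other is gold | drew gold).
    return drew_gold_and_other_gold / drew_gold

print(chance_other_coin_is_gold())
```

The enumeration gives 2/3 rather than the tempting 1/2, because two of the three equally likely gold coins sit in the gold/gold box.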

------
nkurz
_The result is that the non-uniformity does have an impact effect, but it's
very, very small. When there were four or less people in the room, from my
experiments, the non-uniformity of distribution resulted in a slight decrease
in the chances of a birthday collision (but this could loss of precision from
too small a sample size as the percentage of times, for instance that two
people collide is just 0.274%). After five people, the non-unformity provided
a slight increase in the chances of all collisions. This is what I would
expect._

Does this make any sense? The author says that with a realistic non-uniform
distribution of birthdays, they found that collisions were less likely when
there were fewer than 5 people involved. I can't think of any reason this
would be the case. If certain birthdays are more common than others, I'd think
that the chances of a collision must go up regardless of the number of people
involved.

Is there any plausible mathematical explanation for this effect? Or was the
experiment just underpowered? Or worse, might the simulation code be buggy?
Presumably they would have run the simulation more than once after getting
such a counterintuitive answer, and it seems really unlikely that this
effect would be consistent unless something was broken.
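The question can be checked exactly rather than by simulation. For n independent draws from a distribution p, the chance that all draws are distinct is n! times the n-th elementary symmetric polynomial of the probabilities. The sketch below uses that with exact fractions. The skewed distribution is a made-up example (weights cycling through 9, 10, 11), not the article's real birth data, so it only illustrates the direction of the effect.

```python
from fractions import Fraction

def collision_probability(probs, n):
    """Exact chance that n independent draws from `probs` contain a repeat."""
    # e[k] holds the k-th elementary symmetric polynomial of the
    # probabilities seen so far; P(all n draws distinct) = n! * e[n].
    e = [Fraction(1)] + [Fraction(0)] * n
    for p in probs:
        for k in range(n, 0, -1):
            e[k] += p * e[k - 1]
    n_factorial = 1
    for i in range(2, n + 1):
        n_factorial *= i
    return 1 - n_factorial * e[n]

days = 365
uniform = [Fraction(1, days)] * days

# Hypothetical skew, NOT real birth data: weights cycle through 9, 10, 11.
weights = [9 + (day % 3) for day in range(days)]
total = sum(weights)
skewed = [Fraction(w, total) for w in weights]

for n in range(2, 6):
    print(n,
          float(collision_probability(uniform, n)),
          float(collision_probability(skewed, n)))
```

In this sketch the skewed chance is higher than the uniform one for every group size from 2 to 5, including the two-person case where the uniform value is 1/365, the 0.274% quoted above. That is consistent with the expectation that skew can only raise the collision chance, and it suggests the reported decrease for small groups was sampling noise or a bug.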

~~~
ordu
_> I can't think of any reason this would be the case._

The author suggests such a reason: the loss of precision. As I understand it,
this hints at the limited precision of float/double number representation. When
you calculate a sum of millions of fractions, some of the fractions could turn
into zero, because you have divided a very small number by a very large one.

~~~
nkurz
I think the floating-point precision interpretation is negated by the author's
suggestion that the loss of precision is "because of the small sample size". I
can sort-of see how a purely analytical solution might be susceptible to
rounding issues (and I say sort-of because I don't think such an approach
would actually include multiplying millions of fractions, and the
straightforward approach involves numbers near 1 instead of near 0), but the
reference to "sample size" makes me think that this was a numerical
simulation.

But presuming the author ran the simulation more than once (wouldn't you do
this if you had such a surprising result?) I don't see how there could be any
consistent effect unless there was a bug in the logic of the program. Unless
maybe they ran it multiple times with the same random seed and got exactly the
same results?
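Both possibilities are easy to illustrate. The sketch below assumes the simplest two-person simulation with uniform birthdays (the article's actual code is not shown, so this is only a stand-in). It prints the standard error of the estimate for a given number of trials and shows that reusing a seed reproduces a run exactly, while different seeds scatter around the true value.

```python
import math
import random

def estimate_pair_collision(trials, days=365, seed=None):
    """Monte Carlo estimate of the chance that two random people share a birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.randrange(days) == rng.randrange(days):
            hits += 1
    return hits / trials

def standard_error(p, trials):
    """Standard error of a proportion estimated from independent trials."""
    return math.sqrt(p * (1 - p) / trials)

trials = 100_000
p_true = 1 / 365
print("true value:    ", p_true)
print("standard error:", standard_error(p_true, trials))

# Seed 1 appears twice on purpose: the repeated run is identical.
for seed in (1, 2, 3, 1):
    print(seed, estimate_pair_collision(trials, seed=seed))
```

With 100,000 trials the standard error is about 6% of the value being estimated, so a very small effect from non-uniformity could easily be swamped, or even appear reversed, in a single run.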

------
jihadjihad
In a similar vein, I saw this linked from HN not too long ago and thought it
was a really neat application of the birthday paradox--all about finding
duplicate packs of Skittles:
[https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-f...](https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-found-two-identical-packs-of-skittles-among-468-packs-with-a-total-of-27740-skittles/)

------
mensetmanusman
The US data show that the healthcare system is happy to induce labor to
support holiday gatherings of hospital staff.

The interesting thing is that induction increases the risk of c-section, which
is a negative for the family, but a positive for the healthcare system (for
obvious reasons).

Also note that the US has one of the worst birthing mortality rates amongst
western countries...

USA#1

------
jbarberu
My sister, who is 4 years younger, and I share a birthday. Growing up, people
would often gasp and ask themselves "what are the odds??", to which I'd usually
reply "1/365?".

My father and step-sister also share their birthdays.

I guess probabilities are just really unintuitive to people.

~~~
EForEndeavour
Obligatory quote by Richard Feynman:

"You know, the most amazing thing happened to me tonight... I saw a car with
the license plate ARW 357. Can you imagine? Of all the millions of license
plates in the state, what was the chance that I would see that particular one
tonight? Amazing!"

------
gus_massa
Why is there a huge drop on July 4th?

~~~
dentemple
As the article points out, induced labor provides an element of choice with
regard to the date of birth.

Not as many mothers (or doctors) in the U.S. seem to want to induce labor on
U.S. Independence Day.

~~~
beefsack
A lot of maternity wards will try to avoid booking inductions on public
holidays, because they will have fewer staff to deal with the spontaneous
births on the day.

