
Paradoxes of Probability and Other Statistical Strangeness - robocaptain
http://quillette.com/2017/05/26/paradoxes-probability-statistical-strangeness/
======
boreas
For those who might be interested, and in a slightly different vein than the
examples in the article, there's the "sleeping beauty" paradox:
[https://en.wikipedia.org/wiki/Sleeping_Beauty_problem](https://en.wikipedia.org/wiki/Sleeping_Beauty_problem)

Basically, an agent is put to sleep and told they will be woken up once or
twice, depending on the results of a fair coin flip, without the ability to
remember other awakenings.

What probability does the agent assign to the event that the coin landed
heads?

The intuitive response is 1/3, but this poses obvious epistemological
problems. The agent has, ostensibly, no new information at all, and their
prior is surely 1/2. Hope someone else finds this as interesting as I do!

~~~
colonelxc
I mostly find it interesting in that people could think that the chance is 1/3
(and that it may even be obvious!). After reading the description I can
understand what they are getting at, but I think the conditional probability
is messed up.

Instead of P(Monday | Heads) = P(Monday | Tails) = P(Tuesday | Tails) it is
really P(Monday | Heads&Awake) = P(Monday | Tails&Awake) = P(Tuesday|
Tails&Awake) or something like that. But the interviewer isn't asking about
that, they are asking for the probability of the coin. The 3 positions are
only exhaustive given that you are awake to be interviewed about them, not
exhaustive of possible states (it's missing P(Tuesday | Heads&Asleep)). Since
you're always awakened at least once, I find the argument that being awake has
'given you information that it is not Tuesday AND heads' pretty weak. While
true, under both heads and tails you expect to be awoken while it is not both
Tuesday AND heads.

~~~
roenxi
The outcomes can be easily enumerated. If Sleeping Beauty always answers
"Heads" she will be right 33% of the times she is asked. This is pretty close
to the definition of a 33% chance.

She hasn't been given new information by waking up; she also knows, as she goes
into the experiment, that "most of the outcomes where I am being interviewed
involve the coin toss coming up tails".
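
A quick simulation sketch of the counting argument above, assuming the usual setup (heads means one awakening, tails means two). The function name, trial count, and seed are arbitrary choices for illustration. It reports the heads frequency two ways, per awakening and per experiment, since which of the two deserves to be called "the probability" is exactly what is in dispute:

```python
import random

def simulate_sleeping_beauty(trials=100_000, seed=0):
    """Return (heads share per awakening, heads share per experiment)."""
    rng = random.Random(seed)
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        # Assumed protocol: one awakening on heads, two on tails.
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += awakenings
    return (heads_awakenings / total_awakenings,
            heads_experiments / trials)

per_awakening, per_experiment = simulate_sleeping_beauty()
print(f"heads share of awakenings:  {per_awakening:.3f}")   # close to 1/3
print(f"heads share of experiments: {per_experiment:.3f}")  # close to 1/2
```

Always answering "Heads" is right in roughly a third of interviews, while the coin itself still lands heads in roughly half of the runs.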

------
Houshalter
By far the most unintuitive paradox for me personally is the one presented
here:
[https://youtu.be/go3xtDdsNQM?t=3m27s](https://youtu.be/go3xtDdsNQM?t=3m27s)

"Mr. Jones has 2 children. What is the probability he has a girl if he has a
boy born on Tuesday?" Somehow knowing the day of the week the boy was born
changes the result. It's completely bizarre.

~~~
astrocat
I'm an idiot, but I'm going to throw my hat in the ring here:

The video is wrong. The problem reads: Jones has 2 kids. What is P(he has a
girl) given that he has a boy born on a Tuesday. Consider, for a moment, what
information we're getting from "boy born on a Tuesday." This is no different
than "boy with red hair," or "boy with 5 freckles." The fact that the BOY was
born on a Tuesday does not change P(day of the week girl was born). Imagine
the "boy with 5 freckles" case - let 5 freckles be denoted by F5, six freckles
by F6 and so on... would the appropriate calculation include enumerating P(boy
F5, boy Fn) for all n? No.

The "born on Tuesday" is irrelevant. Thus you have the following scenarios:

- one kid is TuesdayBoy and the other is also a boy, born at any time
- one kid is TuesdayBoy and the other is a girl, born at any time

Out of these options P(Jones has a girl) is a flat out 50%. There is no need
to bring in concepts of "which was born first" or enumerate all possible days
of the week each child could have been born.

Ok... now all the real smartypants here can correct me :)

~~~
Chinjut
There are 2 * 7 * 2 * 7 ways to assign gender and birth-day-of-week to two
children. By convention, all are considered equiprobable (this is the same as
assuming kids' genders and birth day-of-weeks are independent of each other
and of all facts about other kids, and that both genders are equally likely
and all 7 days are equally likely for any given kid.)

Of these possibilities, 27 are situations where one kid is a Tuesday boy. [Do
you dispute this count?]

Of those, 14 are situations where one kid is a girl. [Do you dispute this
count?]

The answer to "What proportion of cases where there is at least one Tuesday
boy also have a girl?" is thus 14/27.

You have stated by fiat that certain things are irrelevant to certain other
things, that certain things have probability 50%, etc, but in doing so, you
have not considered the count correctly. You are likely misled by phrasing
such as "the boy", when there are families with two boys in which there is no
proper referent of "the boy" and no particular answer to questions like "Which
day was 'the boy' born?".
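
For anyone who wants to check the two counts rather than take them on trust, here is a brute-force sketch under the same equiprobability convention. The labels and the choice of index 2 for Tuesday are arbitrary and only for illustration:

```python
from itertools import product

GENDERS = ("boy", "girl")
DAYS = range(7)
TUESDAY = 2  # arbitrary index; any fixed day gives the same counts

# Every (gender, day) combination for one kid, then every ordered pair of kids.
kids = list(product(GENDERS, DAYS))
families = list(product(kids, repeat=2))

tuesday_boy = ("boy", TUESDAY)
with_tuesday_boy = [f for f in families if tuesday_boy in f]
with_girl = [f for f in with_tuesday_boy
             if any(gender == "girl" for gender, _ in f)]

print(len(families), len(with_tuesday_boy), len(with_girl))  # 196 27 14
```

The 27 is 14 + 14 - 1: the family with two Tuesday boys would otherwise be counted twice.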

~~~
astrocat
ok ok... let me try to get this straight. Just as kind of a mental process for
trying to understand whether or not something passes the smell test, I
typically try to take the basic premise and turn it up to 11 and see if that
still makes sense.

In this problem, as you've described it, we're enumerating "ways to assign
gender and birth-day-of-week." We can do this because there are a _countable_
number of "days of the week" (so we can map to the integers: 1-7) AND there is
also a surjective function of [child] -> [day of the week they were born]. Am
I right so far?

Now let's replace the set [1-7] with another countable set that also maintains
the surjective function. We could say "day in the lunar cycle" (so ~27
options), or better "day of the year" (366 options), for example. Do we now
need to consider the 2 * 366 * 2 * 366 ways to assign gender and
birth-day-of-the-year? Take it further with whatever you want: "birth weight in milligrams"
or "number of freckles" (as I previously suggested). All countable things that
meet the surjective requirement.

This is starting to smell funny, right? So let's take a look at the math.

You say there are 2 * 7 * 2 * 7 ways to configure day+gender, assuming
independence for kid 1 (k1) and kid 2 (k2). This represents: (k1 gender
options * k1 day of week options) * (k2 gender options * k2 day of week
options). Right? I'm with you so far. Then you say "Of these possibilities, 27
are situations where one kid is a Tuesday boy." Hold up.

We are given two pieces of information: that one of the kids is a boy, and
that particular boy was born on a Tuesday. Let's say the boy is k1 (this is an
assignment of enumeration, not of "who came first;" just like Sunday = 1 does
not mean that any kid born on a Sunday was born before every kid born on
Monday = 2). So now the k1 options are [1 * 1] (boy, Tuesday), and the total
number of options is: [1 * 1] * [2 * 7] = 14. Of those 14, 7 are girl options.
And we're back to a straight 50%.

So yes, I dispute the 27 number. It seems like it is arrived at by 2 * 1 * 2 * 7,
minus one for an apparent duplicate. But the 2 * 1 * 2 * 7 represents maintaining
gender non-specificity for Tuesday boy, which should be incorrect, no?

> You have stated by fiat that certain things are irrelevant to certain other
> things...

Yes, but that's what "independent" means, right? You also stated that you're
assuming these two things are independent, hence equiprobability. But
independence is defined by P(A) = P(A|B). The probability of A is completely
unaffected by B. Yet the outcome you arrive at is that P(A) IS affected by B,
so the math presented is internally inconsistent.

What am I missing here? I'm fascinated by the uncertainty around this little
problem.

~~~
finind
Let's see if I can help you understand this a bit better. First, let's clarify
the problem being asked. There are 2 different problems with different
solutions and it helps to explicitly separate them.

problem 1) You go up to a person and ask them if they have exactly 2 children,
at least one of which is a boy born on Tuesday. They say yes. What is the
probability that they have a girl?

problem 2) You go up to a person and ask them if they have exactly 2 children,
at least one of which is a boy. They say yes. You then ask them which day of
the week a boy they have was born on. They say Tuesday. What is the
probability that they have a girl?

The original problem that was posed is equivalent to problem 1, but not
equivalent to problem 2. This could be what is confusing you, because in
problem 2 the extra information plays no role in the selection process, while
it does play a role in problem 1. In problem 2, the answer is the standard
2/3. Why are the probabilities different between problem 1 and 2? Here's why:

Think about the set of people who could answer yes to the question in problem
2. The ratio of these groups is important. A parent with BB (two boys) is
equally likely to answer yes to problem 2 (100% likely to be exact) as a
parent with BG and GB (also 100% likely to answer yes), which leads to the
correct solution of 2/3. However, in problem 1 a parent with BB is NOT
EQUALLY LIKELY to answer yes as a parent with BG. This is because we added an
extra qualifier (must be born on Tuesday). The parent with BB has two chances
to meet this qualifier because they have two boys, so the parent with BB is
actually more likely to answer yes to the question than the parent with BG. As
the qualifier becomes more and more rare (day of lunar cycle), the probability
of the BB parent answering yes, P(yes|BB), approaches twice the value of P(yes|BG).
So now you're left with some subset of parents with BB, BG, and GB, but in
this scenario you've sampled from BB approximately twice as much as you've
sampled from each of the BG and GB groups, leaving you with approximately the
same number of people from group BB as the combined amount from groups BG and
GB. This is why the probability approaches 50%.
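
A rough sketch of that limiting behaviour for problem 1, assuming the qualifier takes one of n equally likely values independently of gender (n = 7 for day of the week, n = 1 for no qualifier at all). The function name and the "B"/"G" labels are made up for illustration:

```python
from fractions import Fraction
from itertools import product

def p_girl_given_special_boy(n):
    """P(has a girl | at least one boy with one fixed value out of n)."""
    kids = list(product("BG", range(n)))
    special_boy = ("B", 0)  # value 0 stands in for "Tuesday"
    matching = [f for f in product(kids, repeat=2) if special_boy in f]
    with_girl = [f for f in matching if any(g == "G" for g, _ in f)]
    return Fraction(len(with_girl), len(matching))

for n in (1, 7, 28, 366):
    p = p_girl_given_special_boy(n)
    print(f"n={n:>3}: {p} ~ {float(p):.4f}")
```

With n = 1 this gives the standard 2/3, with n = 7 it gives 14/27, and in general the counts work out to 2n/(4n - 1), which creeps toward 1/2 as the qualifier gets rarer.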

I spent a while writing this, so hopefully it helps!

~~~
astrocat
:) Thanks for taking the time. I've realized a few things, and found it helped
to get a bit more formal.

Jones has 2 kids. Let A be "he has a girl" and B be "he has a boy born on
Tuesday." First thing I realized is A and B are NOT independent - this is key.
P(A) includes the option of Jones having two girls. But if B is true, then the
two girls option isn't on the table anymore, which affects P(A). Realizing
this helped me start to better understand what kind of problem we're dealing
with.

Second was realizing that P(A&B) is not at all the same thing as P(A|B) - the
probability of A _given_ B - when A and B aren't independent. The problem is
asking for P(A|B), and by the rule of conditional probability: P(A|B) =
P(A&B)/P(B)

P(B) can be solved for without too much fuss: solve 1-P(!B). For each kid you
have 2 genders and 7 days of the week, or 2 * 7 = 14 options. 13 of those are
not "Boy & Tuesday." So P(!B) is (13/14) * (13/14) = 169/196. P(B) =
1 - 169/196 = 27/196.

This leaves us trying to figure out P(A&B). I can't think of any other way to
do it other than enumerating all options. We can take a shortcut and just look
at all 27 possible scenarios where B is true. This seems to be the method of
choice ;) As others have shown, we see that 14 of those satisfy A. So P(A&B) =
14/196.

Now, we can solve: P(A|B) = P(A&B)/P(B) = (14/196)/(27/196) = 14/27
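
For what it's worth, P(A&B) can also be reached without enumerating, using the same complement trick: given B, "no girl" can only mean two boys. A small sketch with exact fractions, where the variable names are just for illustration:

```python
from fractions import Fraction

# P(B): at least one kid is a Tuesday boy, via the complement.
p_not_tuesday_boy = Fraction(13, 14)
p_b = 1 - p_not_tuesday_boy ** 2

# P(two boys AND at least one of them born on Tuesday).
p_two_boys_and_b = Fraction(1, 4) * (1 - Fraction(6, 7) ** 2)

# Given B, "has a girl" is everything in B that is not two boys.
p_a_and_b = p_b - p_two_boys_and_b
p_a_given_b = p_a_and_b / p_b

print(p_b, p_two_boys_and_b, p_a_and_b, p_a_given_b)
```

Fraction prints in lowest terms, so 14/196 shows up as 1/14; the final ratio is still 14/27.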

So I'm now part of the "math checks out" club. Thanks for all the help people!

------
georgewsinger
Another "paradox": even though it's possible to randomly pick a rational
number from the reals, the probability of this happening is 0.

~~~
Sinergy2
Please describe how it is possible to pick such a number. For example, I can
readily imagine how to pick a random 32b float, but that is an entirely
different problem, with a nonzero probability.

~~~
georgewsinger
Usually in math we assume the axiom of choice :)
[https://en.wikipedia.org/wiki/Axiom_of_choice](https://en.wikipedia.org/wiki/Axiom_of_choice)
I'm assuming this could somehow lead to such a "random" pick in the technical
sense.

In terms of implementation, I'm not aware of an algorithm that can randomly
pick a real number on an actual computer. Perhaps a mathematician could show
how to pick one on some abstract machine with infinite resources, and not
constrained by finite bit representations of numbers.

~~~
dragonwriter
> In terms of implementation, I'm not aware of an algorithm that can randomly
> pick a real number on an actual computer

An actual (finite in time and space) computer can't even _represent_ arbitrary
real numbers, much less _randomly choose_ them.

~~~
dorgo
What about PI? We can represent it in terms of "we know what we are talking
about" and we can distinguish it from other numbers.

~~~
leni536
You can only have a countable number of first- or second-order logic
statements, each defining a specific real number.

------
fitchjo
My favorite statistical/probability paradox has always been the birthday
paradox.

~~~
beefield
I don't know if the Monty Hall problem counts as a paradox, but that is quite high
on my favourite list of counterintuitive probability results.

~~~
thousandautumns
In my experience the only reason the Monty Hall problem comes off as
paradoxical is because it is usually poorly explained.

------
pella
[https://en.wikipedia.org/wiki/Category:Statistical_paradoxes](https://en.wikipedia.org/wiki/Category:Statistical_paradoxes)

------
daxfohl
"Paradox" is a pretty strong term. The items presented are more in the
category of common errors and counter-intuitiveness.

~~~
Houshalter
[https://en.wikipedia.org/wiki/Veridical_paradox](https://en.wikipedia.org/wiki/Veridical_paradox)

~~~
daxfohl
Fine, but I was left feeling blah by the "paradoxes" presented.

