
I don't like this cartoon - pav3l
http://andrewgelman.com/2012/11/16808/
======
InclinedPlane
Interesting, but it IS a joke, which usually requires taking some liberties
with the truth.

For example, an issue I have with it is that it's not actually possible for
the Sun to explode. It is incapable of going nova. It's an old sci-fi trope,
the civilization that dies because its sun goes nova, but that isn't a thing
that actually happens. A nova can only occur under very specific
circumstances, and it is typically a recurring event. Moreover, it requires
the presence of a white dwarf star, which only comes into existence after a
star has passed through its red giant phase. None of these things are
conducive to life developing on nearby planets; they aren't even conducive to
the continued _existence_ of nearby planets. Regardless, our star is not a
white dwarf, nor does it have a companion, and therefore it will never produce
a nova. Moreover, it has only about 1/8th the mass needed to go supernova.

~~~
btilly
I learned something today. <http://en.wikipedia.org/wiki/Helium_flash> is what
I thought was the process that leads to novas; it is not.

That said, from our point of view a helium flash is pretty scary. Of course
the Sun is not expected to do this for another 5 billion years.

But according to
<http://www.whillyard.com/science-pages/our-solar-system/sun-evolution.html>
the Earth is likely to be uninhabitable in about a billion years anyways.

------
lisper
Since it's Sunday and I'm in procrastination mode I took a little excursion
into the rabbit hole and found this:

<http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf>

The money quote:

"More than that, though, the big, big problem with the Pr(sunrise tomorrow |
sunrise in the past) argument is not in the prior but in the likelihood, which
assumes a constant probability and independent events. Why should anyone
believe that? Why does it make sense to model a series of astronomical events
as though they were spins of a roulette wheel in Vegas? Why does stationarity
apply to this series? That’s not frequentist, it isn’t Bayesian, it’s just
dumb."

~~~
zanny
If the default state of procrastination were researching higher-order
mathematical theory, the world would be... probably more predictable.

I'll let myself out.

------
amouat
Reminds me of the letter Babbage wrote to Tennyson, correcting his poem:

"In your otherwise beautiful poem one verse reads,

Every moment dies a man, Every moment one is born

If this were true the population of the world would be at a standstill. In
truth, the rate of birth is slightly in excess of that of death. I would
suggest that the next edition of your poem should read:

Every moment dies a man Every moment 1 1/16 is born

Strictly speaking the actual figure is so long I cannot get it into a line,
but I believe the figure 1 1/16 will be sufficiently accurate for poetry."

Seems jgc has written a blog about it:

<http://blog.jgc.org/2010/09/on-being-nerd.html>

~~~
praptak
Actually a moment is an infinitesimally small amount of time, therefore in
order to model the birth/death distribution across moments we need to
rigorously define... wait, what am I doing with my life?!

------
mistercow
Here's the thing: the frequentist in the comic has made an error even by
frequentist standards, and that error is equivalent, in Bayesian thinking, to
choosing an inappropriate prior.

The problem is that many frequentist techniques implicitly choose a prior for
you. That's handy since choosing an appropriate prior is hard. But it also
abstracts away the choice of prior.

If a Bayesian makes this mistake, anyone can look at their math and say
"There. That's where you chose a bad prior."

If a frequentist makes this mistake, you have to do a complicated analysis to
explain why the method they used is inappropriate.
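
To make the "where you chose a bad prior" point concrete, here is a minimal
sketch of the comic's setup in Python. The detector's lie probability (double
sixes, 1/36) comes from the comic; the prior on the sun having gone nova is a
made-up illustrative number:

    # Comic setup: the detector rolls two dice and lies on double sixes.
    p_lie = 1.0 / 36.0
    prior_nova = 1e-20  # hypothetical prior; any remotely sane value gives the same conclusion

    # P(detector says YES | nova) = 1 - p_lie (it told the truth)
    # P(detector says YES | no nova) = p_lie (it lied)
    posterior_nova = ((1 - p_lie) * prior_nova) / (
        (1 - p_lie) * prior_nova + p_lie * (1 - prior_nova)
    )
    print(posterior_nova)  # ~3.5e-19

The p-value (1/36 ≈ 0.027) clears the 0.05 bar, but the posterior probability
that the sun actually exploded is still negligible, which is the point above:
the error is equivalent to implicitly choosing a wildly inappropriate prior.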

~~~
gajomi
I agree with this analysis. The cartoon is, of course, a caricature, but it
highlights the nature of the essential problem you point out, which is that
unraveling the line of reasoning carved out by a frequentist is not a
straightforward matter, even if the argument is correct. From the Bayesian
perspective all arguments are deductive once you have all the information. A
typical reply to this observation is that in most cases it will be clear how
the steps of a calculation translate into the various assumptions needed to
clarify the argument. Of course, what counts as "typical" depends on the kinds
of questions one asks. If there are only a few kinds of problems in your
field, then maybe you can get away with heuristic lines of reasoning, patching
up problems in special cases as needed. But if you are, say, trying to write
general-purpose software for statistical analysis, it will not be possible to
rely on inductive lines of reasoning particular to a certain field. This might
partially explain the popularity of the Bayesian approach in machine learning
circles.

------
rcthompson
Given that Randall Munroe has already expressed that the Frequentist vs.
Bayesian angle of the comic was an afterthought, I think the real point (and
the point that I actually took away from the comic) is about the blind
application by scientists of the 0.05 P-value threshold for significance
without regard to the specific circumstances of the experiment, which I can
attest is a _huge_ issue in the scientific literature. Another huge issue is
using a statistical test on data that is known not to satisfy the assumptions
of the test, either out of ignorance or because the test gives a good (i.e.
P<0.05) result where the correct test doesn't.

~~~
Evbn
Cf. Randall's own classic comic about jellybeans causing cancer, or maybe one
color of jellybean causing cancer.
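
For anyone who hasn't seen it: the jellybean comic runs 20 color-specific
tests at p < 0.05 and trumpets the one that comes up "significant." A quick
back-of-the-envelope sketch of why that's expected even when nothing is going
on (it assumes the 20 tests are independent):

    # Chance of at least one false positive across 20 independent null tests
    # at alpha = 0.05 each (illustrative; the comic's tests may not be independent).
    alpha, n_tests = 0.05, 20
    print(1 - (1 - alpha) ** n_tests)  # ~0.64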

------
jere
Wow, this is embarrassing. I took statistics, but it's been almost a decade. I
got a pretty nice review on Bayes' theorem in an AI class I took last year
though. I thought I understood the xkcd comic quite clearly, but I'm
completely lost on the posted article.

Apparently, Munroe was a bit confused also.

>The truth is, I genuinely didn’t realize Frequentists and Bayesians were
actual camps of people—all of whom are now emailing me.

<http://andrewgelman.com/2012/11/16808/#comment-109366>

~~~
acomar
Somehow this only makes the comic and the reaction even funnier.

------
pav3l
I really like this sentence:

> All statisticians use prior information in their statistical analysis. Non-
> Bayesians express their prior information not through a probability
> distribution on parameters but rather through their choice of methods.

Also, from the comments:

>My problem is ultimately not with the cartoon. My problem is that there are
practitioners and teachers of statistics who spread cartoonish ideas about
statistical methods without recognizing these ideas are inaccurate.

~~~
Symmetry
I'm not sure how it can be a good idea to be choosing from various statistical
methods based on which one will give you the sort of answer you intuitively
think is correct. I mean, if you're going to bite the bullet and bring priors
into your analysis in an ad-hoc way like that, why not just acknowledge their
existence mathematically?

~~~
pav3l
When you are reporting, say, a p-value from your ANOVA F-test, you are making
formal mathematical assumptions, such as normal marginal distribution of your
dependent variable. A lot of Frequentist methods (hypothesis testing) are
really just mathematical shortcuts from times when computation was more
expensive. The problem is that many people tend to misuse tests where they are
not appropriate, either because those tests give "better" answers or simply
out of ignorance.
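
As a concrete (and entirely illustrative) example of that misuse, here is a
small simulation sketch: the classic pooled-variance t-test applied to two
groups with the same mean but very different variances and sample sizes, a
setting where its assumptions are violated. All numbers are assumptions for
the sake of the example:

    import numpy as np
    from scipy import stats

    # Both groups have mean 0, so every "significant" result is a false positive.
    rng = np.random.default_rng(0)
    n_sims, false_positives = 20000, 0
    for _ in range(n_sims):
        a = rng.normal(0, 3, size=10)    # small group, large variance
        b = rng.normal(0, 1, size=100)   # large group, small variance
        _, p = stats.ttest_ind(a, b, equal_var=True)  # pooled t-test; assumption violated
        false_positives += (p < 0.05)
    print(false_positives / n_sims)  # noticeably above the nominal 0.05

Switching to Welch's test (equal_var=False), which does not assume equal
variances, brings the rate back to roughly 0.05.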

~~~
Symmetry
Right, but the way you deal with the situation in the comic is going to end up
being a much fuzzier and more subjective issue of picking your reference
classes.

------
DaniFong
I do like this cartoon; it's a very nice representation of the problem with
the viewpoint.

The main issue is that it makes frequentist humans look like idiots. They're
not.

They're simply often led astray -- as this example illustrates.

In particular, the epistemological fact that people misuse p-values must be
exposed. Just because it is unlikely something happened by chance, in one way,
in your model, does not prove your hypothesis correct.

------
colechristensen
While it's true (and should be obvious) that a competent statistician wouldn't
make such silly mistakes, there are quite a few scientists who are not-so-
competent statisticians who publish results with those exact errors.

~~~
Evbn
Including the entire A/B testing world, from startups to Amazon.com and big
banks. (Source: working with all three.)

------
lcargill99
Isn't the joke that if the sun _did_ just go nova, then a $50 bet is
meaningless?

~~~
thyrsus
Correct. The frequentist is being played for a double loser: either the sun
has not gone supernova and he's out $50 (p < 0.0000000...), or the sun has
gone supernova and there will never be an opportunity to settle.

------
keithwinstein
Here is my writeup on the "real" difference between frequentist and Bayesian
methods: <http://stats.stackexchange.com/a/2287/1122>

Even more here: <http://qr.ae/17BEW>

The truth is they both make tradeoffs that can appear ridiculous. In fact, the
criticisms of confidence intervals and p-values apply almost exactly, in
transpose, to credibility intervals and posterior probabilities.

Confidence intervals and p-values are a worst-case technique. The p-value will
always control the false positive rate below alpha, even for the worst-case input.
Sometimes you do want this -- e.g. when we say that the worst-case runtime of
QuickSort is O(n^2), that's useful, even if we do have a prior distribution
over the inputs and could also say that the expected runtime is O(n log n).
But the errors are correlated across observations. You can have a valid "95%"
confidence interval that always produces total nonsense when the experiment
ends up with output x, as long as x happens <5% of the time for all possible
inputs.

Credibility intervals and posterior probabilities are an average-case
technique, where we integrate over the prior. Even if the prior is correct,
the errors are correlated across inputs, which can be a problem. In the
cookie-jar example at stackexchange, the 70% credibility interval is "wrong"
80% of the time when the jar is type B. That means if you send out 100
"Bayesian" robots to assess what type of jar you have, each robot sampling one
cookie, you will expect 80 of the robots to get the wrong answer, each having
>73% posterior probability in that wrong conclusion! That's a problem,
especially if you want most of the robots to agree on the right answer. The
two methods just make different tradeoffs in the way they quantify
uncertainty.

My quibble with the cartoon, though, was that it's not really about the
frequentist vs. Bayesian debate. If you want to decide whether to take action
(like shuttering a satellite) in response to a "YES" output from the
instrument, everybody will agree that you need to calculate {rate of events} *
{false negative rate} * {cost of false negative} and compare it with {1 - rate
of events} * {false positive rate} * {cost of false positive}.

The frequentist agrees with this math, the Bayesian agrees with this math, and
the math doesn't even use Bayes' rule. This is basic actuarial science or
decision theory.
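
A minimal sketch of that inequality as code, with every number being an
illustrative assumption rather than anything from the comic or the thread:

    # The two expected-cost terms from the paragraph above.
    rate_of_events = 1e-6          # assumed base rate of the event
    false_negative_rate = 0.01     # assumed P(instrument says NO | event)
    false_positive_rate = 1 / 36   # assumed P(instrument says YES | no event)
    cost_of_false_negative = 1e9   # assumed cost of ignoring a real event
    cost_of_false_positive = 1e3   # assumed cost of acting on a false alarm

    cost_of_misses = rate_of_events * false_negative_rate * cost_of_false_negative
    cost_of_false_alarms = (1 - rate_of_events) * false_positive_rate * cost_of_false_positive
    print(cost_of_misses, cost_of_false_alarms)  # compare the two to pick an action

Both camps would plug the same numbers into the same arithmetic here, which is
the point.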

The frequentist might do the mechanics in a certain way. They may say they are
first going to calculate a p-value, and then ask whether the p-value is less
than a threshold alpha, where alpha was set based on the costs and rate of
events in order to control the "false discovery rate." And then take action
only on a "significant" result.

The Bayesian might do the calculation a little differently too; they could say
they are first going to use the expected rate of events as a prior, then
calculate the conditional probability that there has been an event (given the
instrument's reading), and then multiply this posterior probability by the
cost of false negative, and its complement by the cost of false positive, to
decide which action has lower expected cost.

But both the frequentist and Bayesian will get the same answer and end up with
the same result as somebody who evaluates the inequality above directly. I
don't think any technique has a monopoly on the correct answer here.

~~~
jules
The posterior distribution is only half the story. A true Bayesian uses a
utility function to make decisions. How much you care about the worst case vs
the average case is in the utility function. That's exactly the problem with
frequentist methods: you're still making assumptions, but they are implicit
and hardcoded in your choice of method, instead of explicitly stated and
tweakable. With a particular choice of prior and utility function, you can
recover many frequentist methods, but in most cases those will not be the
prior and utility function you actually want. For example maximum likelihood
estimation corresponds to a utility function equal to the likelihood (i.e. the
probability mass), which at the very least should strike you as ridiculous for
continuous quantities (maximum likelihood can still be useful as an
_approximation_ technique if the problem is intractable with your actual
utility function). With a frequentist method, you _are_ using a prior, you
just don't know which one.

For some problems you might be able to get the correct decision in a very
roundabout way by setting your alpha to the right magic value, but (1) it's
not clear how to find the right alpha and (2) in general you cannot encode a
complete utility function into a single number.

~~~
keithwinstein
> With a frequentist method, you _are_ using a prior, you just don't know
> which one.

I don't think so. Look over the cookie-jar example. The confidence interval
guarantees _worst-case_ coverage at least equal to its confidence parameter,
for all values of the parameter. The credibility interval gives average-case
coverage, integrated over the prior.

The confidence interval gives guaranteed coverage for every value of the
parameter (conditioned on each possible input value). The credibility interval
includes enough mass in the conditional probability function, conditioned on
each possible output observable.

These are different mathematical objects and they do different things. The
confidence interval doesn't use a prior over input values; it is giving you
guaranteed coverage for _any_ input value.

Let me put it this way: if you think the frequentist method is using some
prior, what choice of prior will make the 70% credibility intervals in the
cookie-jar case be identical to the 70% confidence intervals?

Anybody can think about utility to make decisions; it's not unique to Bayesian
methods. Statisticians and engineers have been calculating ROC curves and
choosing operating points on the ROC frontier (based on cost/benefit analysis)
since World War II.

~~~
jules
I meant that in the context of making a decision. The point of statistics is
to make decisions. For example you want to know whether a medical treatment
works so that you can decide whether or not to give it to people. So you do a
hypothesis test to see whether the treatment works better than a placebo, and
then if the p-value is small enough you give it to people, and otherwise you
don't. Instead of explicitly separating the assumptions (prior & utility) from
the logical deduction, the assumptions are embedded in this procedure. Why
would the assumptions implicitly made by the choice of procedure be the
assumptions you want to make? You take the answer to a question that's
irrelevant to the decision, namely "given that the treatment doesn't work, how
likely is the data" and try to tweak a decision based on that. There is no
principled way to make decisions based on that information.

Credibility intervals are about average-case coverage, but Bayesian statistics
as a whole is definitely NOT just about average case. In general the utility
`U` is a function of the decision `d`, and of the posterior knowledge you have
of the world `P`. In many practical cases the utility might be the expected
profit: U(d,P) = integral(profit(x,d)P(x)dx). But it certainly doesn't have to
be. If you are risk averse you might choose your utility as U(d,P) = min_x
profit(x,d) to ensure that your utility is the minimum profit you make given a
decision, rather than the average. Another example is U(x,P) = P(x) which
gives maximum likelihood estimation. Making a decision based on a hypothesis
test can also be emulated with a utility function. Suppose the hypothesis is H
and we make decision d1 if p-value > alpha and d2 if p-value < alpha. We
choose a prior P(I) that makes each possible observed data set I equally
likely, and we choose the utility function that reverses Bayes' rule to
compute P(I|H) to make a decision based on that:

    
    
        U(d1,P') = [P'(H)*P(I)/P(H) > alpha]
        U(d2,P') = [P'(H)*P(I)/P(H) < alpha]
    

where the brackets are indicator notation. Note that P(I) appears to access
the data set, which the utility function does not have access to, but recall
that P(I) is constant regardless of the measured data. Of course, not many
people have such a prior and utility function... so it doesn't really make
sense to hard-code them into the method.

In general the process works like this. Given a prior P, a utility U, and
measured information I:

1. Compute the posterior P' from the prior P and the information I according
to Bayes' rule.

2. Perform the decision argmax_d U(d,P').
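
A minimal sketch of that two-step recipe for a discrete set of hypotheses,
with the prior, likelihoods and utilities all being made-up illustrative
numbers (the expected-utility form is just the common special case of U(d,P')
mentioned above):

    # Step 0: illustrative prior and likelihood of the observed data I under each hypothesis.
    prior = {"treatment works": 0.5, "treatment doesn't work": 0.5}
    likelihood = {"treatment works": 0.30, "treatment doesn't work": 0.05}  # P(I | hypothesis)

    # Step 1: posterior P' via Bayes' rule.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    posterior = {h: p / total for h, p in unnormalized.items()}

    # Step 2: decision = argmax_d of expected utility under the posterior.
    utility = {("treat", "treatment works"): 10, ("treat", "treatment doesn't work"): -2,
               ("don't treat", "treatment works"): 0, ("don't treat", "treatment doesn't work"): 0}
    decisions = ["treat", "don't treat"]
    best = max(decisions, key=lambda d: sum(posterior[h] * utility[(d, h)] for h in posterior))
    print(posterior, best)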

Can you point me to a similarly principled approach to decision making based
on utility with frequentist methods?

~~~
keithwinstein
> Can you point me to a similarly principled approach to decision making based
> on utility with frequentist methods?

Sure, as I said anything involving ROC curves, where we pick an operating
point by trading off the cost of false positives vs. false negatives and a
design rate of incoming true positives and negatives.

~~~
jules
Can you give a mathematical recipe with assumptions and deductions? ROC curves
don't cut it; that's just twiddling a parameter of a classifier. Is it optimal
in any sense? How do you know it's a good classifier _for making decisions_,
when it's a classifier based on "given the hypothesis, how likely is the data"
and not the other way around? Is it generalizable to other situations?

~~~
keithwinstein
Yes, given a design rate of true positives or negatives, and a cost for false
positives and false negatives, you can pick the optimal operating point. It
will be optimal in the sense of minimizing average cost when the incoming rate
equals the rate you designed for. You'll get the exact same answer as a
"Bayesian" who uses conditional probability to calculate the same thing and
whose prior equals the design rate. I gave a worked-out example in my original
post ("If you want to decide whether to take action...").

Sure, it is generalizable -- we use ROC curves for radar, medical imaging,
almost any diagnostic test...
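
Here's a small illustrative sketch of what that looks like: sweep the
threshold of some (hypothetical) score, and pick the point on the resulting
ROC curve that minimizes expected cost at the design rate. The score
distributions, rate and costs are all assumptions for the sake of the example:

    import numpy as np

    rng = np.random.default_rng(1)
    neg_scores = rng.normal(0.0, 1.0, size=10000)  # hypothetical scores, no event
    pos_scores = rng.normal(1.5, 1.0, size=10000)  # hypothetical scores, real event

    design_rate = 0.01              # assumed rate of real events in deployment
    cost_fp, cost_fn = 1.0, 200.0   # assumed costs of false alarms / misses

    thresholds = np.linspace(-3, 5, 200)
    expected_costs = [
        (1 - design_rate) * np.mean(neg_scores >= t) * cost_fp
        + design_rate * np.mean(pos_scores < t) * cost_fn
        for t in thresholds
    ]
    print("chosen operating point:", thresholds[int(np.argmin(expected_costs))])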

~~~
jules
Sure, the problem is not which point on the ROC curve you pick; the problem is
which classifier you use to obtain it in the first place. I can pick a random
classifier with a tunable parameter and draw its ROC curve and then pick the
"optimal" point, but if the classifier sucks then that's no good. Why would a
frequentist classifier based on a hypothesis test be good? A hypothesis test
is the answer to the wrong question for the purposes of making a decision.

As I showed above, you can indeed get the same result from Bayesian decision
making if you use a weird prior and utility function, which shows that
frequentist decision making based on hypothesis tests is a subset (of measure
0) of Bayesian decision making. Again, that just means you encoded a prior and
utility that are most likely wrong into the choice of method, without any
justification.

------
zzzeek
Just that response seems to prove the cartoon's point.

~~~
jme3
I don't see how, really. Gelman himself is a (rather famous) Bayesian, and if
you read the comments, you'll see that Randall himself pops up and basically
cedes Gelman's point.

------
mannjani
Can't we all just take a cartoon AS a cartoon? I don't think Munroe is trying
to 'smack down' frequentists here; it's all just for a little fun. Look at his
own comment on the post:
<http://andrewgelman.com/2012/11/16808/#comment-109366>

------
StavrosK
I don't understand how the two camps can exist. Don't the two methods produce
different results? Surely, only one is true. Which is it?

~~~
Symmetry
It's mostly a philosophical difference between thinking of probabilities as
measures of relative frequency versus thinking of probabilities as measures of
one's uncertainty about the outcome. There isn't as much of a war between them
as there used to be, but if you want to read about the history of that, this
was a book I enjoyed:
<http://www.amazon.com/The-Theory-That-Would-Not/dp/0300169698/ref=tmm_hrd_title_0>

Being horribly biased in favor of the Bayesian interpretation ever since I
learned it was a thing, I'll give an example of a place where frequentists can
be wrong. People who disagree can give counterexamples. ;)

<http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/>

On the other hand, some argue that certain forms of inference are _invalid_
and that it doesn't matter if they give the correct answer or not in practice
because they're _invalid_. Calculus was attacked on this basis early on
because many mathematicians thought that taking the limit of something as it
approached 0 wasn't a thing you should be able to do.

~~~
StavrosK
Thanks, I'll read that post now. I'm actually going by Yudkowsky's post:

<http://lesswrong.com/lw/ul/my_bayesian_enlightenment/>

It might not have been that exact post; it was one where he mentioned the
riddle and how his friend got the wrong answer because he was a frequentist.
That might be where most of my misconception arises from.

------
Evbn
This cartoon actually explains why the Higgs Boson and tachyon hunting folks
use p=0.000001 or so, not p=0.05. Extraordinary claims require extraordinary
evidence. Not a problem with frequentism at all.
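
For reference, the conventional "discovery" threshold in particle physics is
five sigma, which works out to a one-sided tail probability in the same
ballpark as the figure above; a quick sanity check (assumes a Gaussian test
statistic):

    from scipy.stats import norm
    print(norm.sf(5))  # one-sided p-value at 5 sigma, ~2.9e-7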

