
The Great Statistical Schism - Schiphol
http://quillette.com/2015/11/13/the-great-statistical-schism/
======
msandford
What's amusing is that Bayesian reasoning is what human beings do all the
time. Most of the -isms people engage in (racism, sexism, ageism, etc.) have
to do with taking the prior (what I already know about other similar people)
and applying that knowledge to the new person. It's "too hard to understand"
and yet all kinds of people are doing it every day.

~~~
streptomycin
Frequentist: Here's a couple formulas that kind of make sense. Plug in your
data and go.

Bayesian: Here's Bayes' theorem, a single formula that makes perfect sense!
See, it's simpler, right? What, you want to use it? Something something beta
distribution something something Markov chain Monte Carlo something something
- okay, got it?

Good luck teaching that to undergrads. In the real world, intro stats classes
are taken mostly by people with little math background, no programming
background, little interest in statistics, and little innate talent for any of
those things. With the frequentist approach, at least they have a fucking
chance.
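(To make the "something something beta distribution" concrete: in the one
conveniently conjugate textbook case, the Bayesian update really is a
one-liner; it's everything past that case where the MCMC machinery kicks in.
A minimal sketch with invented numbers, assuming SciPy:)

    from scipy.stats import beta

    # Invented data: 7 successes in 10 trials.
    successes, trials = 7, 10

    # A uniform Beta(1, 1) prior plus a binomial likelihood gives a
    # Beta(1 + successes, 1 + failures) posterior -- conjugacy in one line.
    posterior = beta(1 + successes, 1 + (trials - successes))

    print(posterior.mean())          # posterior mean of the success rate
    print(posterior.interval(0.95))  # central 95% credible interval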

~~~
Turing_Machine
I agree that frequentist methods are more amenable to teaching students to
apply rote formulas that they do not understand.

I do not agree that this is a feature. Quite the contrary.

~~~
streptomycin
Whether it's a feature or not is subjective. And I think I mostly agree with
you - it's probably bad that we're encouraging the application of bad
statistics.

But the alternative isn't just replacing bad with good, it'd be more like
telling ~98% of the science majors in the country, "hey, you're too dumb to do
real statistics, so please just stop trying". I can't see that realistically
happening.

------
pdkl95
[http://jakevdp.github.io/blog/2014/03/11/frequentism-and-
bay...](http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-
a-practical-intro/)

I recommend this series of posts on frequentism and bayesianism. Of particular
interest is part 3, which has a very good explanation of why frequentism
shouldn't be used in science.

    
    
        Let me say it directly: you should be suspicious of the use of frequentist
        confidence intervals and p-values in science. In a scientific setting,
        confidence intervals, and closely-related p-values, provide the correct
        answer to the wrong question.

~~~
david_ar
Also read chapter 37 "Bayesian Inference and Sampling Theory" of
[http://www.inference.phy.cam.ac.uk/itila/book.html](http://www.inference.phy.cam.ac.uk/itila/book.html)

------
kriro
I think the author is treating the frequentists somewhat unfairly. A good
design involves both NHST (null-hypothesis significance testing) and
equivalence tests, plus a discussion of power/effect size (imo).

Random grab from my methods collection in my literature tool: "A Comparison Of
Equivalence Testing In Combination With Hypothesis Testing And Effect Sizes":
[http://digitalcommons.wayne.edu/jmasm/vol2/iss2/6/](http://digitalcommons.wayne.edu/jmasm/vol2/iss2/6/)

I think the biggest problem is that many fields just churn out people who can
click buttons in SPSS but couldn't really tell you what a p-value actually
represents. "Unfortunately" it's fairly easy to do the testing with tools.
From students, I often get the feeling that tests are conducted for the sake
of conducting a test (talking about BA theses etc.).

I'm not sure how widespread Bayesian thinking is in other fields. Coming from
CS, it feels somewhat natural, since it's pretty pervasive in AI/ML. I think our
econ students didn't get exposed to it at all (iirc). Would love some feedback
from other fields :)

------
cormullion
There's a good book on the subject, "The Theory That Would Not Die", which
covers the history and sheds some light on the people (Fisher, Laplace, Bayes,
Pearson) behind the names.

Reviewed:
[http://www.ams.org/notices/201205/rtx120500657p.pdf](http://www.ams.org/notices/201205/rtx120500657p.pdf)

------
imh
I really never understood the divide. They are all just tools in our toolbox.
It feels like arguing screwdrivers versus hammers. They may not do the same
thing, but both can be useful!

~~~
jonahx
The author specifically argues against this view as something that obscures
the superiority, in most cases, of the Bayesian approach:

 _One downside of this ecumenicalism is a reluctance to ask fundamental
questions: having a strong opinion on this matter has gone out of fashion.
Who’s to say one statistical philosophy is better than another? Aren’t all
statistical philosophies equally valid paths to good data analysis?
Frequentism is “true for me”. As in religion, so in statistics. If you
criticise a colleague for using p-values when posterior probabilities are
clearly more appropriate, it will lead to accusations of being a ‘zealot’ [2]
who should stop ‘crusading’_

and then later:

 _If Einstein’s theory were easier (as well as being more accurate), teaching
Newton’s would be silly. Yet that’s the way most statistics curricula are
structured. The only reason statisticians think frequentist ideas are easier
is that they are used to them_

~~~
imh
I guess I hate the idea of a "statistical philosophy." It's math for god's
sake. Given these axioms, these other things are true.

The point about teaching relativistic versus Newtonian mechanics is totally
different because they're incompatible. One is more correct than the other,
which is not the case here. In the stats case, both tools give correct answers
to different questions.

Maybe I just get frustrated feeling like people want a data analysis silver
bullet to teach.

~~~
kgwgk
I'd say the analogy is more valid than you think. Classical (frequentist)
methods, like Newtonian mechanics, are valid only insofar as their results
correspond to the relativistic (Bayesian) ones.

~~~
lottin
As far as I know, Bayes came first and Fisher later, so some may say the
analogy is the other way round.

~~~
david_ar
That's true, but Bayesian methods were handicapped by a lack of computational
resources for quite a long time. Modern Bayesian methods, although
fundamentally the same, are in practice quite different today compared to a
century ago. Frequentist statistics gained popularity because it was more
tractable to apply to complex problems until computational technology caught
up.

Edit: one can argue that frequentist methods are generally approximations to
Bayesian methods, much like Newton vs. Einstein.
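(A concrete sketch of that claim: for a binomial proportion, a Bayesian
credible interval under a fixed informative prior and a frequentist
confidence interval disagree at small n but converge as n grows. The
Beta(2, 2) prior and Wald interval below are arbitrary choices, and the data
are invented:)

    import numpy as np
    from scipy import stats

    # Pretend we observe a 70% success rate at increasing sample sizes.
    for n in (10, 100, 10_000):
        k = int(0.7 * n)
        # Frequentist (Wald) 95% confidence interval.
        p_hat = k / n
        se = np.sqrt(p_hat * (1 - p_hat) / n)
        wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)
        # Bayesian 95% credible interval under a Beta(2, 2) prior.
        cred = stats.beta(2 + k, 2 + n - k).interval(0.95)
        print(n, wald, cred)  # the two intervals converge as n grows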

------
keithwinstein
I think these kinds of articles generally overstate the "schism." I wrote
about this here: [http://blog.keithw.org/2013/02/q-what-is-difference-
between-...](http://blog.keithw.org/2013/02/q-what-is-difference-between-
bayesian.html) and here: [http://qr.ae/70H3k7](http://qr.ae/70H3k7)

Three points:

(1) I couldn't follow the author's Bayesian analysis or what "standard
assumptions" were used. Does anybody know what is meant?

Here's a plausible Bayesian analysis using what are arguably standard
assumptions: "assume, a priori, that the success rate of the new drug was
drawn uniformly between 0 and 1. Given the outcome of 83/100 i.i.d. successes,
the probability that the new drug has a success rate worse than the old drug
is Integrate[PDF[BetaDistribution[84,18],x],{x,0,0.7}]. In other words,
p(worse|outcome,uniform prior) = 0.0018."

Here's a plausible classical analysis: "Before doing the experiment, let's
specify how to calculate the p-value at the end. For whatever outcome we get,
we'll calculate the one-sided probability that the old drug produces an
outcome that good or better. Now perform the experiment: our outcome is
83/100. Per the method that we pre-specified, the probability that the old
drug would have produced an outcome that good or better is Sum[Binomial[100,i]
(0.7)^i (1-0.7)^(100-i), {i, 83, 100}]. In other words, p<0.0022."
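(Both numbers are easy to reproduce; here is the same arithmetic as a sketch
in Python/SciPy rather than the Mathematica-style notation above:)

    from scipy.stats import beta, binom

    # Bayesian: a uniform prior plus 83/100 successes gives a Beta(84, 18)
    # posterior; P(new drug worse than old) is the posterior CDF at 0.7.
    print(beta.cdf(0.7, 84, 18))   # ~0.0018

    # Classical: one-sided p-value, the probability that the old drug
    # (success rate 0.7) produces 83 or more successes out of 100.
    print(binom.sf(82, 100, 0.7))  # p < 0.0022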

Here's a plausible takeaway: "What do you know, the p-value from the classical
analysis and the posterior probability from the Bayesian analysis are almost
the same. There isn't much difference in this case, and contrary to a point
emphasized in the article, the posterior probability was actually slightly
smaller. The claim that p-values overstate the certainty of findings compared
with Bayesian methods is not supported and probably not true -- often the
accusation is the reverse, that p-values are too conservative!"

(2) The Bayesian analysis and the classical one are trying to achieve
different things. Speaking very generally, classical methods are about
designing an _experiment_ that has a cap on the rate of false positives, even
in the worst-case input (and then running that experiment). Depending on who
you are, this may or may not be what you really want to do. Bayesian methods
are (again very generally) about calculating the conditional probability of
some event, given a particular observation and well-stated prior assumptions.
Again, this may or may not be what you really want to do.

The difference is sort of like the difference between saying that the running
time of QuickSort is O(n^2) on adversarial input, compared with saying that
the running time is O(n lg n) in expectation, assuming the input order is
uniformly distributed. Both of these statements can be useful.

You don't have to pick a side and declare yourself a worstcaster or an
expecterian, any more than you have to call yourself a Bayesian or a
Frequentist. These are families of mathematically-sound techniques, not
religions.

(3) Ultimately, statistics doesn't _really_ matter until somebody starts
making a decision based on the results. And once you start putting a cost on
bad decisions and designing a decision rule to maximize utility (the domain of
decision theory), methods based on p-values and posterior probabilities end up
reaching THE SAME DECISIONS. This makes sense since there can really only be
one utility-maximizing decision theory.

This was understood in the early 1940s when they had to decide how to balance
the cost of mistakenly shooting down an Allied aircraft versus the cost of
mistakenly letting a Nazi aircraft off the hook. It was understood in the late
1940s when Shannon and others worked out the mathematical theory of
communication. These fields do not have squabbles about a Bayesian vs.
frequentist schism.

------
nonbel
>"For example, consider the ‘null’ hypothesis that the new drug has exactly
the same effectiveness as the old drug."

I'd encourage people to ask: Why should I consider that null hypothesis when
it sounds like a strawman? Wasn't the use of strawman arguments debunked over
1000 years ago?

~~~
andrewflnr
A strawman argument is a distorted version of someone else's position,
formulated to be easy to knock down for the purpose of appearing to win
debates. A null hypothesis is just a starting point for reasoning.

~~~
loup-vaillant
Interesting: how do you _choose_ that starting point in the first place?

Sounds like the null hypothesis is just as arbitrary as any Bayesian "prior"
(prior probability distribution, I mean).

~~~
evmar
The null hypothesis is the hypothesis that your control and experimental
groups behave roughly the same.* This is the baseline assumption when group
membership is randomly assigned. It is not at all arbitrary.

* see my comment below for elaboration on what "same" means
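(A sketch of why random assignment makes "the groups behave the same" the
natural baseline: under the null the group labels are interchangeable, which
is exactly what a permutation test exploits. The data below are invented:)

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented outcomes for a control group and an experimental group.
    control = rng.normal(0.0, 1.0, size=50)
    treated = rng.normal(0.3, 1.0, size=50)
    observed = treated.mean() - control.mean()

    # Under the null, labels are arbitrary: shuffle them to get the
    # distribution of differences produced by randomization alone.
    pooled = np.concatenate([control, treated])
    diffs = []
    for _ in range(10_000):
        rng.shuffle(pooled)
        diffs.append(pooled[50:].mean() - pooled[:50].mean())

    p = np.mean(np.abs(diffs) >= abs(observed))  # two-sided p-value
    print(observed, p)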

~~~
loup-vaillant
Okay, but it's like _expecting_ the two groups to be the same, which is a bit
ridiculous. I'd rather just measure how much evidence the experiment provides
either way.

~~~
nonbel
Exactly, we do not care whether they are different (usually). We care in what
way and how much they differ. Then we care about what possible explanations
there may be for that difference.

I would say an exception is studies of ESP, etc where the existence of any
effect would be surprising. However, in practice, everyone just assumes
paranormal researchers messed up the experiment somehow. Small effects are not
taken seriously even if they are "significant" because there are many
explanations for such differences.

~~~
loup-vaillant
> _I would say an exception is studies of ESP_

Actually, E.T. Jaynes touched on that topic a bit, and argued that it's not
really an exception: probability theory applies to these phenomena (or lack
thereof) as usual. If I recall correctly, he reached the same conclusion as
you just did: if a study looks like it demonstrated an instance of ESP, it
probably went wrong.

