
Statistical Thinking: My Journey from Frequentist to Bayesian Statistics - sebg
http://www.fharrell.com/2017/02/my-journey-from-frequentist-to-bayesian.html
======
nabla9
Bayesian vs. frequentist seems to have little to do with the underlying issue.

If clinical trialists use p-values wrong, how is moving to Bayesian methods
going to be less misused and misunderstood?

The real issue is the established practice in the research field. It's hard
to introduce new methods if peer review is not familiar with them, or if
everyone in the field has problems with such a basic concept as p-values.
Researchers have the tendency to apply statistical tools they have learned
mechanically and peer review accepts them mechanically. New tools need more
thought, not less.

~~~
stdbrouw
> If clinical trialists use p-values wrong, how is moving to Bayesian methods
> going to be less misused and misunderstood?

Bayesian methods have the advantage that, while they can still be misused and
misunderstood, at least when used correctly they tell us what we want to know
and are easy to interpret (a posterior probability is exactly what you think
it is), whereas p-values are hard to interpret correctly (a low p-value can
reflect a large effect size, a large sample size, or, in the face of
publication bias, lord knows what).

That said, I do think that the advantage to Bayesian thinking in research
would mostly not be about the methods but about the attitude: aside from a
couple of weirdos pushing Bayes factors, Bayesian statisticians almost
universally communicate results using credible intervals (HPD) instead of
dichotomizing the evidence into significant or not significant. Frequentist
confidence intervals will get you most of the way there and could completely
replace p-values, but if you're going to advocate for better statistics and
uproot established practices, might as well go all the way and encourage
better methods and better ways of communicating results at the same time.
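To make the "credible intervals instead of a significance verdict" idea concrete, here is a minimal sketch with invented numbers (12 conversions in 80 trials) using an equal-tailed interval computed on a grid; a true HPD interval would be similar for a posterior this close to unimodal and symmetric:

```python
# Equal-tailed 95% credible interval for a conversion rate, computed on a
# discrete grid. With a binomial likelihood and a uniform prior, the exact
# posterior is Beta(successes + 1, failures + 1); the grid approximates it.

def credible_interval(successes, trials, level=0.95, grid_size=2001):
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior: likelihood times uniform prior
    post = [t**successes * (1 - t)**(trials - successes) for t in thetas]
    z = sum(post)
    post = [p / z for p in post]
    # Walk the CDF to find the equal-tailed interval endpoints
    tail = (1 - level) / 2
    cdf, lo, hi = 0.0, None, None
    for t, p in zip(thetas, post):
        cdf += p
        if lo is None and cdf >= tail:
            lo = t
        if hi is None and cdf >= 1 - tail:
            hi = t
            break
    return lo, hi

lo, hi = credible_interval(12, 80)
```

Reporting `(lo, hi)` communicates the whole range of plausible effect sizes rather than collapsing the evidence into a significant/not-significant verdict.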

~~~
jamii
[http://pcl.missouri.edu/sites/default/files/freeLunch_0.pdf](http://pcl.missouri.edu/sites/default/files/freeLunch_0.pdf)
has a neat example of why Bayes factors might be preferred over credible
intervals.

> By using posterior credible intervals, we might reject the null, but by
> using Bayes’ rule directly we see that this rejection is made prematurely as
> there is no decrease in the plausibility of the zero point.
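The effect described in the quote can be reproduced in a toy setting (all numbers invented here, not taken from the linked paper): with a very diffuse prior on the effect, a 95% credible interval can exclude zero even though the Savage-Dickey density ratio shows the zero point has become no less plausible:

```python
# Conjugate normal model: effect ~ N(0, tau^2) prior, data gives an
# estimate xbar with standard error se. The Savage-Dickey ratio compares
# posterior and prior density at the null point (effect = 0).
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

tau = 10.0                 # very diffuse prior on the effect
xbar, se = 1.1, 0.5        # observed effect and its standard error (z = 2.2)

# Standard conjugate update: posterior is N(m, v)
v = 1 / (1 / tau**2 + 1 / se**2)
m = v * xbar / se**2

ci = (m - 1.96 * sqrt(v), m + 1.96 * sqrt(v))             # excludes zero
bf01 = normal_pdf(0, m, sqrt(v)) / normal_pdf(0, 0, tau)  # Savage-Dickey ratio
# bf01 > 1: the posterior density at zero exceeds the prior density there,
# so the data have not made the null point any less plausible.
```

This is the Jeffreys-Lindley-style tension the paper discusses: interval-based rejection and the Bayes factor can disagree when the prior is very spread out.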

~~~
stdbrouw
But that's the thing: there is just never any real need to do hypothesis
testing; the range of plausible effect sizes is all that matters.

~~~
conjectures
I see what you're saying and mostly agree, but...

This could totally come up if you have a non-factorising multivariate
posterior distribution and you want some sound way to summarise it when you
can't reason from the marginals alone.

------
pdkl95
"Confidence, Credibility, and why Frequentism and Science do not Mix"

[http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bay...](http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/)

------
digitalzombie
The post talks about the cons of the frequentist approach compared to the
Bayesian one, and ends up discussing Fisher's likelihood school of thought.

The post argues that likelihood is basically Bayes, but it's actually Fisher's
school of thought, which is a halfway point, or can be seen as a compromise,
between the frequentist and Bayesian approaches.

A good book that talks about this, which I've been meaning to read, is In All
Likelihood: Statistical Modelling and Inference Using Likelihood by Yudi
Pawitan.

I've read the first chapter and it's a very interesting read. I might skip
Bayes and go straight to likelihood, coming from the frequentist school.

~~~
stdbrouw
> A good book that talks about this, which I've been meaning to read, is In
> All Likelihood: Statistical Modelling and Inference Using Likelihood by Yudi
> Pawitan.

It's good but it's mostly a technical introduction to statistical inference
(i.e. the mathematics that make statistics work), and the remarks about the
differences between different schools of statistics are mostly short asides.
It's not a great recommendation for non-statisticians who want more insight
into the different schools of statistics.

Also, both frequentist and Bayesian methods rely on likelihoods as an
intermediary, so I guess you could say it's a "compromise" but only in a very
uninteresting way.

The likelihood L(theta|data) is simply shorthand for P(data|theta), and to get
a posterior probability you then apply Bayes' theorem:
P(theta|data) ~ P(data|theta) * P(theta), where the last factor is your prior.
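As a toy illustration (an invented coin-flip example, not anything from the book under discussion), that proportionality can be evaluated directly on a discrete grid of parameter values:

```python
# Posterior for a coin's bias theta after observing 7 heads in 10 flips,
# computed on a discrete grid via P(theta|data) ~ P(data|theta) * P(theta).
from math import comb

def posterior(heads, flips, grid_size=101):
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size   # uniform prior P(theta)
    # Likelihood L(theta|data) = P(data|theta): a binomial probability
    like = [comb(flips, heads) * t**heads * (1 - t)**(flips - heads)
            for t in thetas]
    unnorm = [l * p for l, p in zip(like, prior)]
    z = sum(unnorm)                         # normalizing constant P(data)
    return thetas, [u / z for u in unnorm]

thetas, post = posterior(7, 10)
mode = thetas[post.index(max(post))]        # posterior mode: theta = 0.7
```

With a uniform prior the posterior is just the normalized likelihood, which is why the two schools end up computing with the same intermediate object.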

------
konschubert
Frequentist and Bayesian statistics are different tools that answer different
questions. Arguing whether one is better than the other is like arguing
whether one should drink water or eat bread.

I also think that a lot of the criticism against Frequentist inference comes
from people who have experienced trouble applying a clean Frequentist solution
in a more complex case and thus switched to Bayesian statistics as a
replacement that is easier to handle.

~~~
a_bonobo
To me, the popularity of frequentist approaches in the literature means that I
see many misapplications and erroneous applications to get that holy p<0.05.
Bayesian thinking is not that popular with these people so it's not 'tainted'
yet.

It's a bit like the popularity of Excel - we see many people complain about
Excel's automated changing of strings to dates, for example. If we all
switched to R to fix that problem, everybody would complain about
stringsAsFactors=T instead.

~~~
digitalzombie
> misapplications and erroneous applications to get that holy p<0.05.

Or you can just force them to give confidence intervals.

------
Xcelerate
Probability is a weird concept. It's much more philosophical than it appears
at first glance, but much less philosophical than something like the
interpretations of quantum mechanics.

Ultimately, what is a "probability"? It's a number that can be used for making
_predictions_ about the future. Neither Bayesian methods nor frequentist
methods are "ideal" in the sense of predicting (computing) the future — in
fact, results from algorithmic information theory put the best bounds on what
we can "hope to predict". Very loosely speaking, the best predictor is the
shortest program that reproduces the data. But this is uncomputable in
general, which means we use something like minimum message length (MML) or
minimum description length (MDL) — which are also sometimes uncomputable, but
a bit more manageable at least.

In some situations, Bayesian methods can be shown to be equivalent to MDL, in
the sense that sizeof(model) + sizeof(parameters) + sizeof(residual data) is a
log reformulation of Bayes' theorem.
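Concretely, the log reformulation in question is just Bayes' theorem with negative logs taken (a standard identity, not anything specific to the linked paper):

```latex
-\log P(\theta \mid D)
  = \underbrace{-\log P(\theta)}_{\text{sizeof(model + parameters)}}
  + \underbrace{-\log P(D \mid \theta)}_{\text{sizeof(residual data)}}
  + \log P(D)
```

Since \(\log P(D)\) does not depend on \(\theta\), maximizing the posterior is the same as minimizing the two-part description length.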

See this paper for more details:
[http://homepages.cwi.nl/~paulv/papers/mdlindbayeskolmcompl.p...](http://homepages.cwi.nl/~paulv/papers/mdlindbayeskolmcompl.pdf)

------
lottin
Wikipedia does IMO an excellent job explaining the notion of a confidence
interval. On the other hand, I find the idea of being "73% certain" that
something will happen much harder to understand. A percentage implies a ratio,
but Bayesians never explain what the numerator and denominator are.

~~~
contravariant
The interpretation of the probability is the same as in frequentist
statistics, except you're making statements about the model resulting from
your assumptions and data, instead of some hypothetical experiment. I suppose
the Bayesian approach is more about building the model whereas the frequentist
approach is more about selecting the best model out of several.

~~~
BeetleB
>The interpretation of the probability is the same as in frequentist
statistics

Not at all. Frequentists cannot define a probability on whether it will rain
in a location on a given day. They will respond that such a probability is
meaningless. Bayesians can, however, give a meaning to it.

~~~
contravariant
True, but the way a Bayesianist (?) will assign meaning to it involves
creating a model, based on some assumptions, which will return a probability.
The Bayesian notion of probability is equivalent to the frequentist notion of
probability for experiments done _on that model_. In that sense they are the
same.

------
baybayes
I should say that the Bayesian point of view is about recognising that all
your computed probabilities are only the result of partial information; more
data means updating those probabilities using the new information and Bayes'
theorem.

In the Bayesian way of thinking you are always wondering if there is another
source of information that can enhance your probabilities. Like in the random
forest method, you try to gain new knowledge from many sources.

------
shurtler
"I really like penalized maximum likelihood estimation (which is really
empirical Bayes) but once we have a penalized model all of our frequentist
inferential framework fails us. No one can interpret a confidence interval for
a biased (shrunken; penalized) estimate."

Not sure I completely understand it, but good point.

------
eanzenberg
It's always funny seeing these articles where they do 'frequentist' analysis
wrong and then decide Bayesian is the winner.

Yes you can use prior evidence. Yes you have to be careful of multiple
comparisons. Yes you have to account for reproducibility.

Oh and Bayesian has p-values too.

------
LanceH
It seems a fair percentage of stats people think frequentism is inferior.

------
lngnmn
What a mess. They literally produced chimeras and used them as building
blocks and premises for other ones.

~~~
unit91
Stats amateur here. Care to explain?

~~~
lngnmn
Correlation, no matter how "strong", is not causation. Causation must be
proved experimentally. Statistical sampling does not constitute a valid
experiment. Statistical inference is not equivalent to logical inference, and
they cannot be used interchangeably. Unproven statistical models are no
better than astrological calculations or metaphysics.

------
gydfi
Bayesian statistics is a fairly sensible approach to certain types of
statistical problems, so I'm not sure why its proponents always seem to talk
about it as if they're pushing some new religious movement with a lot of
nudity, or Amway.

~~~
Ntrails
In my first stats lecture at university the lecturer informed us we would not
learn about Bayesian statistics and if we'd wanted to do that we should have
gone to _York_ or something.

~~~
gjm11
Why York in particular? I mean, was the subtext "York, which is our ancient
enemy" or "York, which everyone knows is a grossly inferior university" or
what?

~~~
kgwgk
Maybe related: [https://www.york.ac.uk/maths/staff/peter-lee/](https://www.york.ac.uk/maths/staff/peter-lee/)

~~~
keithpeter
[http://www-users.york.ac.uk/~pml1/bayes/book.htm](http://www-users.york.ac.uk/~pml1/bayes/book.htm)

He gives out PDFs of the exercises in his textbook on his home page. Thanks
for the pointer.

------
zump
Does someone have an ELI5 for frequentist vs. Bayesian?

~~~
tnone
Nobody seems to be capable of explaining this properly. It's like monad
tutorials, they explain what happens while mistakenly thinking they are
telling you why it happens. I keep trying to fit this idea into my head and I
can't because the information is not given.

- Where did this difference come from? When did it develop?

- What are the basic premises that a Bayesian believes and a frequentist
doesn't, and vice versa? Reason it all the way through, front and back.

- What does the Bayesian's or frequentist's model look like? What are the
pieces they use, how are they arranged, what are the dependencies, how does
causality flow?

- Why are the choices made by one invalid for the other's model? Where do
they deliberately agree despite this?

- What are the consequences in the real world? Give me a real example of why
this difference matters. "Real" meaning I don't care about dice, I care about
engineering and science.

Instead you get some bullshit about fitting a distribution you don't
understand to a model you can't see, while relying on understanding the
nuances between words like probability and likelihood which is what you are
trying to learn in the first place. Plus I swear the numbers agree in 99% of
the "examples" given, with some handwaving "but it's different" to excuse it.

Fucking explanations, how do they work? Not in academia.

~~~
eli_gottlieb
Ooooookay. So, _very_ long story short...

They're two different academic _traditions_ for what constitutes Good
Statistics. They're _originally rooted_ in the _philosophical_ dispute over
whether to treat probabilities as frequencies of random outcomes
("frequentist") or as degrees of plausibility ("Bayesian").

In actual fact, a well-trained frequentist knows exactly how and when to use
Bayes' rule for gambling, and a well-trained Bayesian knows exactly how and
when to publish a paper with a p-value.

The _really important_ difference is over how a whole field expresses its
consensus or tradition about what constitutes strong evidence or a plausible
theory. A Bayesian would like researchers to elicit priors before experiments
(which express something like what _reviewers'_ expectations will be about
the experiment), and then calculate posterior distributions _after_
experiments. We could thus then trade off "weak" and "strong" experiments
against prior beliefs, while also reducing publication bias' pernicious effect
on statistical strength -- or so Bayesians claim. Bayesian methods are also
usually more computationally intensive and can make use of small sample sizes.

Frequentists had a lot of disagreements with that sort of thing, and so
Neyman, Pearson, Fisher and the like developed a whole lot of statistical
methods that don't rely on ever treating a probability as a belief. They
preferred to differentiate clearly between a frequency of experimental
outcomes, and what researchers think. They figured that Bayesian "priors" were
subjective, biased, and untrustworthy. Also, quite importantly, their methods
involved a lot less rote computation and instead made use of impressively
large experimental samples.

Depending on which tradition you were raised in, and which philosophers of
science you side with, you can argue until the end of the world about which
one's better. My advice? Use whatever your field demands you use to publish,
but be Bayesian on the inside.

~~~
BeetleB
I'm not a statistician, and have only studied frequentist statistics (I assume
that's the standard taught in introductory stats courses in school).

Like the person at the root of this thread, I have struggled with explanations
on why Bayesian is so great. The answers that worry me tend to be along the
lines of "Well, suppose you want the probability for event X (typically a
"one-off" event). Frequentist statistics cannot give you an answer (one-off
events have no distribution to speak of). But with Bayesian statistics, I can
compute a probability for it!"

Yes, but as someone else has pointed out, what the heck do you mean by
"probability"? Frequentist statistics is fairly clear on the definition. The
whole argument given above seems like he is happy he has some mechanism to get
an answer, with little thought about whether he is asking a meaningful
question.

Which is why your comment resonates with me:

>They preferred to differentiate clearly between a frequency of experimental
outcomes, and what researchers think. They figured that Bayesian "priors" were
subjective, biased, and untrustworthy.

I don't want an answer that's dependent on how the person thought. That
definitely comes across as subjective to me.

~~~
eli_gottlieb
>I don't want an answer that's dependent on how the person thought. That
definitely comes across as subjective to me.

Then I think you'll be somewhat disappointed when you learn more about
philosophy of science and the core debates over methodology. The biggest
problem is: _nothing_ is purely objective. _Everything_ involves assumptions
of some sort, otherwise we run head-on into the Problem of Induction, white
ravens, No Free Lunch Theorems (on the more machine-learny side), and other
such problems.

>Yes, but as someone else has pointed out, what the heck do you mean by
"probability"? Frequentist statistics is fairly clear on the definition. The
whole argument given above seems like he is happy he has some mechanism to get
an answer, with little thought about whether he is asking a meaningful
question.

I don't think frequentist statistics are very clear here at all! A p-value,
after all, is a likelihood, which frequentist statisticians _insist_ is _not_
a probability, but which the math clearly says _is_ a conditional probability.
So when you get a p<0.05 finding, it _never_ means, "We actually ran this
experiment under a control hypothesis N times, for some large N, and fewer
than five came out this way." It's a measure of counterfactual outcomes,
conditional on an assumption which we pretend to expect to be true. When the
p-value is small, we then pretend to be surprised, and pretend to make an
interesting inference.

I say "pretend" because an ordinary NHST is mathematically equivalent to a
Bayesian credible hypothesis test with a uniform prior over the hypotheses.
Performing the frequentist test involves _pretending to believe_ that uniform
prior, even though you probably _actually_ set up the experiment _in order_ to
obtain a significant p-value.
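One concrete instance of that equivalence (a textbook normal-mean example, not something taken from this thread): with a known-variance normal likelihood and a flat prior on the mean, the one-sided p-value and the posterior probability that the effect is non-positive coincide numerically.

```python
# Frequentist one-sided p-value vs. Bayesian posterior tail probability
# for a normal mean with known sigma and a flat (improper uniform) prior.
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

xbar, sigma, n = 0.5, 1.0, 25        # sample mean, known sd, sample size
se = sigma / sqrt(n)

# Frequentist: probability of data at least this extreme under mu = 0
p_value = 1 - phi(xbar / se)

# Bayesian: flat prior => posterior is mu ~ N(xbar, se^2)
posterior_prob = phi(-xbar / se)     # P(mu <= 0 | data)

# The two numbers are identical for this flat-prior normal model.
```

So a "significant" p-value here is exactly a flat-prior posterior statement in disguise, which is the sense in which running the NHST amounts to pretending to hold that prior.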

In the end, the NHST is a chiefly _social_ practice, and the p-value is
chiefly _social_ evidence. It's a way of convincing peer reviewers to accept
(that is, subjectively believe) that you did a real experiment, when they
would otherwise skeptically believe that you made it all up (which,
unfortunately, some researchers have been known to do!).

Bayesian methods don't get rid of this subjective, social component to science
and make everything "objective", any more than you can do that by hiring Mr.
Spock to do your statistics. Bayesian methods drag the subjective, social
component of prior elicitation out into the sunlight where everyone involved
has to acknowledge it. They also give you numbers that are actually about the
experiment you really did, as opposed to measuring your experiment against an
infinity of counterfactual experiments you never really performed.

(And also they're easier with small sample sizes, their results are more
intuitive to interpret, and generative models are more intuitive to think
about than test statistics.)

All that said, I totally have used frequentist statistics (took a very
similar class to yours) when called upon to do so. Fighting a
philosophy-of-statistics holy war against your higher-ups in the workplace
hierarchy is a really bad idea, so however nice Bayesianism or frequentism
might sound, sometimes you buckle down and do what ships products and
publishes papers.

~~~
BeetleB
Your criticism of p-value _usage_ is legitimate. However, this is not core to
frequentist statistics.

When I first encountered p-values, even with a frequentist mindset, I saw the
huge problem that one could have with them. Many frequentists do not like
p-values. I wouldn't be surprised if most actual frequentist statisticians
(not those in fields like medicine, psychology, etc) do not like p-value
usage.

Attacking p-values is not a valid argument against frequentist statistics.

I'll also add that it seems many Bayesians are really dying for a _number_,
and because frequentist stats doesn't give it to them, they reach for another
tool that will - but with little thought about the validity of the tool. I'm
not here to defend frequentist statistics, but just because it doesn't give
all the answers, that doesn't mean some other tool that does give some
answers is correct.

It is just as abusable as p-values. I suppose if a Bayesian says he used
Bayesian approaches because they made sense given his problem, that's fine
(and in my mind, he is just being a statistician, not a Bayesian). The self-
identified Bayesians I always encounter don't fall into that mold. They fall
into the category of "Look what I can compute that I could not with
frequentist statistics" - but any attempt I make to understand what that
number means fails; they cannot explain it either, beyond "this is how I
feel".

~~~
eli_gottlieb
I'm not really trying to make an argument against frequentist statistics and
for Bayesian ones. I'm more trying to point out what each style exposes (by
printing it in your papers) or conceals (by leaving it semi-consciously
understood from that one class in grad school).

