
Frequentism and Bayesianism: A Practical Introduction - atakan_gurkan
http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/
======
thebear
It is perhaps worth drawing attention to this sentence in the article: "Though
Bayes' theorem is where Bayesians get their name, it is not this law itself
that is controversial, but the Bayesian interpretation of probability implied
by the term P(F_true | D)." A widespread misunderstanding is that there is
something fundamentally Bayesian about Bayes' theorem, or even that
frequentists don't believe in it. It is rarely pointed out that this is not
the case, and we should thank the authors for doing so.

------
atakan_gurkan
The follow-ups are also well worth reading:

[http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/](http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/)

[http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/](http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/)

[http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/](http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/)

~~~
nkurz
In case it's not clear from the beginning where he really stands on the
matter, in Part 3 he offers his opinion on the relative merits of both
approaches:

<spoiler>

The moral of the story is that frequentism and Science do not mix. Let me say
it directly: _you should be suspicious of the use of frequentist confidence
intervals and p-values in science_. In a scientific setting, confidence
intervals, and closely-related p-values, provide the correct answer to the
wrong question. In particular, if you ever find someone stating or implying
that a 95% confidence interval is 95% certain to contain a parameter of
interest, _do not trust their interpretation or their results_. If you happen
to be peer-reviewing the paper, _reject it_. Their data do not back-up their
conclusion.

</spoiler>

~~~
judk
Um, the folks at CERN don't agree with your assessment of frequentism in
science. Their papers are explicitly frequentist; they celebrate results based
on how many "sigmas" they get. P-values all the way.

~~~
bayesianhorse
I'm not a physicist, but as far as I know, CERN has access to tons of data
with every intention of drowning any prior belief. In this setting, I would
expect frequentist methods to shine, and bayesian methods to be intractable.

However I would not expect CERN papers to make the kinds of terminological /
theoretical lapses about confidence the parent thread was talking about.
Papers should be rejected for that kind of error, even if you are not a
bayesian.

~~~
betatim
Exactly this. At CERN (and all of HEP elsewhere) the Bayes vs Freq wars were
fought many years ago and are long over. The conclusion: when you have lots of
data they converge (as they should!).

In the case that they differ you almost always find that you have very few
observations. I would argue that this 'difference' is not that exciting
because it must be dominated by your assumptions, not your observations. After
all once you accumulate enough observations the two methods tend to converge.

Personal conclusion: if the methods disagree work on getting more data instead
of fighting over which method is better.
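
That convergence is easy to see numerically. Here is a rough sketch with my own illustrative numbers (nothing to do with CERN's actual analyses): for a binomial proportion with plenty of observations, a frequentist Wald interval and a Bayesian central credible interval under a flat prior land essentially on top of each other.

```python
import math

def wald_ci(k, n, z=1.96):
    """Frequentist 95% Wald confidence interval for a binomial proportion."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

def beta_credible_interval(k, n, mass=0.95, grid=100_000):
    """Central Bayesian credible interval under a flat Beta(1, 1) prior.

    The posterior is Beta(k + 1, n - k + 1); its quantiles are found
    numerically on a grid to keep this stdlib-only.
    """
    a, b = k + 1, n - k + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    xs = [(i + 0.5) / grid for i in range(grid)]
    pdf = [math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
           for x in xs]
    cdf = []
    total = 0.0
    for p in pdf:
        total += p / grid
        cdf.append(total)
    lo = next(x for x, c in zip(xs, cdf) if c >= (1 - mass) / 2)
    hi = next(x for x, c in zip(xs, cdf) if c >= 1 - (1 - mass) / 2)
    return (lo, hi)

# 5,230 successes in 10,000 trials: plenty of data, so the intervals agree.
f_lo, f_hi = wald_ci(5230, 10000)
b_lo, b_hi = beta_credible_interval(5230, 10000)
print(f"frequentist: ({f_lo:.4f}, {f_hi:.4f})")
print(f"bayesian:    ({b_lo:.4f}, {b_hi:.4f})")
```

Shrink n and the gap between the two starts to open up, which is exactly the few-observations regime where the prior begins to dominate.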

------
lutusp
Readers should be aware that the linked article was composed entirely in the
IPython notebook environment, which means Python code blocks, LaTeX
renderings, and graphics can all be freely mixed in a (to me) very nice,
readable article format.

[http://ipython.org/](http://ipython.org/)

~~~
bayesianhorse
And it literally can't be said enough times how awesome it is!

------
graycat
Here's how I relax, avoid both frequentism and bayesianism, and just love
probability:

I assume that there is a non-empty set, commonly called Omega, which I regard
as the set of all experimental 'trials' that I might observe. But, actually,
in all the history of everything in the universe, we see only one trial, only
one element of this set Omega.

Next, there is a non-empty collection, usually denoted by script F, of subsets
of Omega. I assume that script F contains Omega as an element and is
closed under relative complements and countable unions. By _relative
complements,_ suppose A is an element of script F. Then the _relative
complement_ of set A, maybe written A^c, is essentially set Omega - A, that
is, the set of all trials in Omega and not in A. Then set script F is a sigma-
algebra. Each set A in script F is an _event_. If our trial is in set A, then
we say that event A has _occurred._

Next there is a function P: script F --> [0, 1]. P assigns 0 to the empty set
(event) and is countably additive. Then function P is a _probability measure_.
So for each event A in script F, P(A) is a number in [0, 1] and is the
_probability_ of event A.

Now we can define what it means for two events to be _independent_ and can
generalize to two sigma algebras being independent.

Next, on the set R of real numbers, I consider the _usual topology_ , that is,
the collection T of open subsets of R. Then I let set B, the _Borel sets,_ be
the smallest sigma algebra such that T is a subset of B.

Next I consider a function X: Omega --> R such that for each Borel set A,
X^{-1}(A) is an element of script F. Then X is a _random variable_.

Essentially anything that can have a numerical value we can regard as a random
variable.

Then we can state and prove the classic limit theorems -- central limit
theorem, weak and strong laws of large numbers, martingale convergence
theorem, law of the iterated logarithm, etc.

Now we are ready to do applied probability and statistics. And we have never
mentioned either frequentism or Bayesianism.

For more details, in an elegant presentation, see J. Neveu, _Mathematical
Foundations of the Calculus of Probability._

~~~
judk
OK, now try to make some claims about the real world. You will have to make
frequentist confidence or Bayesian credibility claims.

~~~
graycat
No, commonly the key to "some claims about the real world" is independence.
Then we can apply, with meager assumptions, say, the weak law of large numbers
(which we do prove), that is, take an average. Also sometimes we can make some
"claims" based on conditional independence (which my axioms give us the
ability to define) and apply, say, the martingale convergence theorem.
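
For what it's worth, that "take an average" step is easy to watch in a simulation; a minimal sketch (the Bernoulli trials with p = 0.3 are my own arbitrary choice):

```python
import random

random.seed(0)  # reproducible illustration

def sample_mean(n, true_p=0.3):
    """Average of n independent Bernoulli(true_p) trials."""
    return sum(random.random() < true_p for _ in range(n)) / n

# Per the weak law of large numbers, the sample mean settles
# toward the true mean 0.3 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, sample_mean(n))
```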

------
eli_gottlieb
Frequentists and Bayesians care about two different likelihood functions:

* Frequentists care about p(evidence | parameters), and interpret probability as a measure over subsets of the counterfactual set of repeated trials (usually independently identically distributed) produced by their model.

* Bayesians care about p(parameters | evidence), and interpret probability as "belief" or "propensity to bet". This is, of course, philosophically ridiculous, since they proceed to ground _rational_ belief _in_ Bayesian statistics. What they are really doing is exactly what their likelihood function says: taking a measure over subsets of the counterfactual set of possible worlds which could have produced their evidence.

The frequentists have the advantage of their methods being more
computationally tractable. The Bayesians have the advantages of intuitiveness
and of yielding more accurate inferences from the same limited data-sets. Pick
the tool you need and remember what you're taking a measure over!
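
To make the two likelihood functions concrete, here is a tiny coin-flip sketch (the Beta(2, 2) prior is an arbitrary choice for illustration): the frequentist maximizes p(evidence | parameters), the Bayesian summarizes p(parameters | evidence).

```python
# Observed evidence: 7 heads in 10 coin flips.
k, n = 7, 10

# Frequentist: maximize p(evidence | parameter). For a binomial the
# maximum-likelihood estimate is just the observed frequency.
mle = k / n

# Bayesian: put a prior on the parameter and summarize p(parameter | evidence).
# With a Beta(2, 2) prior (an arbitrary choice here), the posterior is
# Beta(k + 2, n - k + 2) and its mean has a closed form.
a, b = 2, 2
posterior_mean = (k + a) / (n + a + b)

print(mle)             # 0.7
print(posterior_mean)  # 0.6428571428571429
```

The prior pulls the Bayesian estimate toward 0.5; with more flips the two numbers converge.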

------
jmount
Nice article. In this direction my group has been trying to help teach that
you tend to need to be familiar with both frequentist and Bayesian thought
(you can't always choose one or the other:
[http://www.win-vector.com/blog/2013/05/bayesian-and-frequentist-approaches-ask-the-right-question/](http://www.win-vector.com/blog/2013/05/bayesian-and-frequentist-approaches-ask-the-right-question/)
) and that Bayesianism only appears to be the more complicated of the two (
[http://www.win-vector.com/blog/2014/07/frequenstist-inference-only-seems-easy/](http://www.win-vector.com/blog/2014/07/frequenstist-inference-only-seems-easy/)
).

------
afafsd
I don't understand why Bayesian statistics needs to be an "-ism", and still
less why other statistics needs to be an "-ism" too. I don't understand why
people feel the need to line up on one side or the other or get so worked up
about it. Other branches of mathematics seem to avoid this kind of thing, they
have no problem with the idea that there's different ways of doing the same
thing.

It actually discourages me from learning more about Bayesian statistics,
because the whole thing sometimes comes off as a cult.

~~~
lutusp
> I don't understand why Bayesian statistics needs to be an "-ism", and still
> less why other statistics needs to be an "-ism" too. I don't understand why
> people feel the need to line up on one side or the other or get so worked up
> about it.

Don't get hung up on the terminology, instead pay attention to the ideas
behind the terms. The Frequentist and Bayesian approaches are very different,
and produce different outcomes, so they deserve to be understood and their
differences sorted out.

For example, and without providing all the technical details: based on a
Frequentist analysis, a decision was made to recommend breast cancer screening
x-rays for women over a given age. Later, after serious problems arose, a
Bayesian analysis showed that the ratio of false positives to true detections
was about 5 to 1 (5 false positives for every real cancer detection), meaning
far more women were told they had cancer than actually had it.

[https://www.princeton.edu/~achaney/tmve/wiki100k/docs/Bayes__theorem.html](https://www.princeton.edu/~achaney/tmve/wiki100k/docs/Bayes__theorem.html)

This is not to exalt the Bayesian approach over the Frequentist, because both
have their place, it is only to show how dramatic the difference can be.
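
The arithmetic behind that kind of result is just Bayes' theorem. The numbers below are hypothetical stand-ins (not the actual study's figures), chosen only to land near the 5-to-1 ratio mentioned above:

```python
# Hypothetical screening numbers -- illustrative, not the actual study's:
prevalence = 0.01    # P(cancer) among women screened
sensitivity = 0.90   # P(positive | cancer)
specificity = 0.95   # P(negative | no cancer)

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)

# Bayes' theorem: P(cancer | positive test)
p_cancer_given_positive = true_positives / (true_positives + false_positives)

print(false_positives / true_positives)   # ~5.5 false positives per detection
print(p_cancer_given_positive)            # ~0.154
```

Even a quite accurate test produces mostly false positives when the condition is rare, which is the whole point of conditioning on the evidence.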

~~~
pessimizer
It seems like you're confusing Bayes' Theorem with Bayesian statistics. Bayes
probably wasn't a Bayesian, and everybody uses Bayes' theorem.

~~~
lutusp
> It seems like you're confusing Bayes' Theorem with Bayesian statistics.

Yes, a tempest in a teapot. Anyone caught using Bayes' name without qualifying
the use is placed in the same position as someone referring to the Victorian
Era without mentioning that Victoria wasn't a Victorian. In most cases, it's
not worth the digression.

------
Tarrosion
What is the theoretical justification for taking a completely flat prior? "If
we set the prior P(Ftrue)∝1 (a flat prior),"

There's no probability distribution which is constant over the whole real
line. Is the idea that we can pick a distribution which is constant over an
arbitrarily large (but finite) interval around the observed data, and so in
practice, we may get results arbitrarily close to those given?

~~~
jmount
I'd say the justification has two parts. First is the Bernstein–von Mises
theorem (priors don't matter once you have enough data, as long as you didn't
violate Cromwell's rule by using zeros). The second part is that improper
priors are considered okay, as long as you check that the posterior
corresponds to a sensible distribution.

~~~
Houshalter
If I understand correctly, assuming all real numbers have equal prior
probability, then no matter how much data you gather, it's still infinitely
unlikely that the true value will exist within any finite range. E.g. the
probability of the true value being between -100! and +100! is 0. If you draw
from the distribution you will always draw "infinities".

~~~
jmount
That is why you have to check. If P(data|param) is concentrated, then even
for a uniform prior on the real line you can have P(param|data) proportional
to a sensible distribution. But you have to check for the specific model and
data (say P(data|param) = c e^{-(data-param)^2}); it isn't enough to verbally
work through infinities.
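
The check can be done numerically rather than verbally. A minimal sketch for that Gaussian-kernel example (the data value and integration range are arbitrary choices):

```python
import math

def unnormalized_posterior(param, data=2.0):
    """Flat (improper) prior times a concentrated likelihood:
    posterior kernel exp(-(data - param)^2)."""
    return math.exp(-(data - param) ** 2)

# Integrate the kernel over a wide grid: a finite total mass means the
# posterior normalizes to a proper distribution despite the improper prior.
lo, hi, steps = -50.0, 50.0, 200_000
dx = (hi - lo) / steps
mass = sum(unnormalized_posterior(lo + (i + 0.5) * dx) for i in range(steps)) * dx
print(mass)  # ~1.7724539 = sqrt(pi), finite, so the posterior is proper
```

A flat prior with, say, a Cauchy-tailed likelihood would need the same check; it is the concentration of P(data|param) doing the work, not the prior.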

------
droob
"37 Ways to More Accurately Read the Bones You're Casting to Predict the
Harvest"

------
adrianbg
Does anyone know how to make the formulas render properly? Even using the
iPython notebook viewer hasn't helped.

~~~
walrus
In your JavaScript console, run:

    var s = document.createElement('script');
    s.src = 'https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML';
    document.body.appendChild(s);


The page is pointing to the old MathJax CDN, which was decommissioned on July
31: [http://www.mathjax.org/changes-to-the-mathjax-cdn/](http://www.mathjax.org/changes-to-the-mathjax-cdn/)

