
Modern Science and the Bayesian-Frequentist Controversy (2005) [pdf] - ot
http://statweb.stanford.edu/~ckirby/brad/papers/2005NEWModernScience.pdf
======
tjradcliffe
One aspect of the Bayesian/Frequentist "controversy" (which amongst working
scientists isn't very controversial) is the supposedly subjective nature of
probability under the Bayesian system.

I think this is mostly a conceptual error on the part of Bayesians: we should
be treating the "degree of belief" in the Bayesian system as precisely that, a
degree of belief, and nothing more. It is not a "probability", which I'm happy
to leave with a frequentist definition.

That is, Bayes' theorem should read: p(h|e) = P(e|h)*p(h)/P(e)

where the lower-case "p"'s indicate plausibility (degree of belief) and the
upper case "P"'s indicate probability as measured by frequency of occurrence.
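
In Python, the update reads like this (a toy sketch; the numbers are invented purely for illustration):

    # p(h): prior degree of belief in hypothesis h (subjective plausibility)
    p_h = 0.3
    # P(e|h), P(e): frequencies of the evidence, with and without conditioning on h
    P_e_given_h = 0.8
    P_e = 0.4

    # Bayes' theorem in the notation above: probabilities in, plausibility out
    p_h_given_e = P_e_given_h * p_h / P_e
    print(p_h_given_e)  # 0.6: the evidence raised the degree of belief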

Probabilities under this system are completely objective. Plausibilities are--
naturally--subjective. As someone who advocates for an epistemology of a
knowing subject (that is: there is no knowing without a subject that knows,
and the central question of epistemology is how that subject knows), it is
entirely natural that the degree of belief in a proposition should be subject-
specific. After all, whose degree of belief is it?

This is not a failure of the Bayesian approach, but a virtue: it allows us to
successfully navigate the many traditional paradoxes of subject-free theories
of knowledge, which put "truth" outside of the realm of the knower and pursue
a false goal (certainty), much like pre-scientific alchemists pursued the false
goal of the philosopher's stone.

~~~
cscheid
Wait: the entire point of Bayes's idea was that under very sensible and
straightforward rules, "plausibilities" behave exactly like probabilities. So
why multiply concepts?

What is the difference, in your proposal, between "plausibilities" and
"probabilities"?

~~~
captainmuon
Well, one problem is that some people, including myself, object to assigning
probabilities to facts of nature. What is the probability that the mass of the
Higgs boson is 126 GeV ± 0.1 GeV? That doesn't make sense; it is either 100% or 0%, true or false, because the mass is a natural constant. It's like asking for the probability that 3 is between 2 and 5.

Of course, you can make statements like P(m=126) sensible by recognizing P as a degree of belief, or plausibility. These don't describe facts, but rather knowledge, or your uncertainty. The wonderful thing is that they fulfill the same axioms as ordinary probabilities, and thus you can use Bayes' theorem and so on with them.

Distinct from degrees of belief are probabilities (or frequencies in some
Bayesian literature). These do not describe your uncertainty, but are really
physical properties, or facts of reality, in disguise. If you have a fair die,
P(1) = P(2) = ... = P(6) = 1/6, the equality of the numbers means that all
faces have the same area (under the plausible assumption that the die has a
uniform density). The area of a face is (for small deviations from P=1/6) proportional to the probability of landing on that face. So a probability P != 1/6 means you have a deviation from the perfect cube form.

One reason I think it is very important to be aware of this distinction is
that people object to "Bayesianism" because most Bayesians don't make the
distinction. People do not reject Bayesianism because its methods don't work; they reject it because it seems philosophically unsound. If Bayesians stopped calling plausibilities probabilities, a lot of discussion and confusion would go away IMHO.

(Maybe we should forgo all the p-words and call the Bayesian quantity just "awesome score" or something. Fewer people would have a problem with evaluating an arbitrary formula that doesn't carry all the philosophical and political connotations that probabilities have.)

~~~
nkurz
_one problem is that some people, including myself, object to assigning
probabilities to facts of nature_

What do you feel you gain from this philosophical objection? Is there a
difference between a "fact of nature" and a one-time event that has already happened? Consider a coin flip that happens a light-second away. After the flip, but before the result has reached you, isn't it also, much like a physical constant, 100% heads or tails but just unknown to you? Does it change from being a probability to a plausibility at the time of the flip?

(asked with ignorance but genuine curiosity)

~~~
captainmuon
That's actually a good question. I don't know if you gain much from that
objection besides clarity. After all, both "kinds" of probabilities share the
same mathematics.

I think it's the other way around. People have certain intuitions, or
prejudices if you wish, and it's useful to have terms that match your
intuitions - even if they are slightly redundant. I find it helps me
understand and explain statistical problems better.

About the coin: I think you have to distinguish between one concrete flip of
the coin, and the coin itself. The coin itself has certain probabilities for
generating heads and tails, and they don't change.

A concrete flipping of the coin is a different situation. You might see the
coin, and know with near 100% certainty what it shows. Or you might have some
machine which detects whether it is heads or tails, and this machine has a
certain accuracy. When it signals heads, you might say there is a 95% plausibility that it really is heads. And before you measured the state of the coin, you might say the plausibility is 50%, assuming the coin is fair. So the plausibility does change at the time of observation (or when you get new information).

What about the time of the flip? Well, after the flip has occurred, this
concrete instance is either 100% heads or 100% tails of course. If I repeat
_this exact coin toss_ (which I can't do in reality), I will always get the
same result. And the probability to have heads, when you have heads, is 100%
(P(toss #1 is heads | toss #1 is heads) = 100%).

This statement is a bit silly, but I think it shows that the probability that an already-tossed coin, lying there, shows heads or tails is not a very interesting quantity.
Our degree of belief, given certain facts, that it is heads or tails, is
interesting. And the probability of the coin in general, determined by its
geometry, is interesting. And especially interesting is how you get from one
to the other, which is where Bayesian inference, hypothesis testing, and so on
come in (The coin is known to be rigged with heads=99%. My flawed detector
says it is tails. What should I believe is the state of the coin, or what
should I bet on?).
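
Working that last parenthetical example through (a sketch; I'm assuming the "flawed detector" is right 95% of the time regardless of the true face):

    # Rigged coin: P(heads) = 0.99. Detector accuracy of 95% is an assumption.
    prior_heads = 0.99
    p_reads_tails_given_heads = 0.05  # detector errs on a heads
    p_reads_tails_given_tails = 0.95  # detector correct on a tails

    # Bayes' rule: plausibility of heads after the detector reads "tails"
    num = p_reads_tails_given_heads * prior_heads
    den = num + p_reads_tails_given_tails * (1 - prior_heads)
    print(num / den)  # ~0.84: despite the reading, still bet on heads

The rigging dominates the detector here, which is exactly the kind of result that makes the inference step interesting.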

Finally, I should note that you should take what I've said with a grain of
salt. I'm at best an armchair statistician, although I do get to think about
stuff like this a lot at work (as a physicist).

------
tedsanders
I don't think this paper teaches the difference between Bayesianism and
Frequentism (though in fairness to the author, that may not be the goal). The
paper says Bayesianism is 'aggressive' and 'subjective' while Frequentism is
'cautious' and aims for 'universally acceptable' conclusions. I wouldn't
characterize them that way. Frequentism is also subjective
([http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequ...](http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/))
and Bayesianism can be quite cautious. To me, the real difference between Bayesian and Frequentist methods lies in the types of questions they aim to answer.
Bayesian methods try to answer the question of "What is the probability of the
model" given some observed data. Frequentist methods try to answer the
question of "What is the probability of the observed data" given an assumed
model. In this sense, they are somewhat complementary, which is why many
practicing statisticians don't actually care about the Bayesianism-Frequentism
debate.

I am happy that Bayesian methods are becoming more popular, though, since the questions we actually care about ("what is the probability of the model given the data?") are often best answered by Bayesian methods.

Also, as a physicist I'm disappointed that the paper asserts that physicists
disdained statistics in 1903. For the most part, statistical mechanics was
developed in the late 1800s!
([https://en.wikipedia.org/wiki/Statistical_mechanics#History](https://en.wikipedia.org/wiki/Statistical_mechanics#History))

~~~
gwern
> The paper says Bayesianism is 'aggressive' and 'subjective' while
> Frequentism is 'cautious' and aims for 'universally acceptable' conclusions.
> I wouldn't characterize them that way.

Worst-case vs average-case seems like the relevant distinction here. (
[http://lesswrong.com/lw/f7t/beyond_bayesians_and_frequentist...](http://lesswrong.com/lw/f7t/beyond_bayesians_and_frequentists/)
[http://lesswrong.com/lw/jne/a_fervent_defense_of_frequentist...](http://lesswrong.com/lw/jne/a_fervent_defense_of_frequentist_statistics/)
)

> Frequentist methods try to answer the question of "What is the probability
> of the observed data" given an assumed model.

Isn't that more null-hypothesis testing...? Frequentism is broader.

~~~
tedsanders
I'm not an expert, but my opinion mostly comes from this seminar by Michael
Jordan: Are you a Bayesian or a Frequentist?
[http://videolectures.net/mlss09uk_jordan_bfway/](http://videolectures.net/mlss09uk_jordan_bfway/)

Around the 10-15 minute mark, he talks about how the two schools of thought
naturally arise from conditioning over the two arguments of the loss function,
either conditioning over the models or over the data.

(Incidentally, he also characterizes Bayesianism as optimistic and Frequentism
as pessimistic. So maybe I'm wrong in that regard above.)

------
3pt14159
I remember how weird I felt in first-year university taking frequentist statistics. Unbeknownst to me, my statistics training in high school had been completely Bayesian, and when the university prof started talking about how we needed to sample a certain population set in order to come to an accurate conclusion, I felt very uneasy with the whole mess.

I know both systems are useful and reasonable in their own sense, and often
even equivalent, but I found the Bayesian way of thinking so much more
intuitive than the frequentist way.

~~~
icelancer
I've taught a fair number of people statistics, and whatever they learn first
is what is most intuitive - the vast majority of the time. It just so happens that they learn frequentist statistics (or some derivative of them) first.

------
cscheid
The paper is from 2005, so it might be worth noting that in the submission title.

It's also worth mentioning because Gelman and Shalizi have more recently written a really nice paper about the philosophy of Bayes: [http://www.stat.columbia.edu/~gelman/research/published/phil...](http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf) (PDF). Gelman is a well-known "bayesian", and Shalizi is a well-known "not all that is interesting is bayes"-ian.

------
graycat
Let's clean up the "controversy":

There's some pure math due to Lebesgue, Kolmogorov, and others that defines,
clearly, precisely, a 'probability space' along with trials, events,
probabilities, and random variables. It's a very nice box of math with a lot
of really nice theorems with rock solid proofs, e.g., the algebra of random
variables, approximation of random variables by other random variables,
convergence of random variables, independence, conditioning and conditional
independence, the central limit theorem, the weak and strong laws of large
numbers, ergodic theory, Brownian motion, martingales and the martingale
convergence theorem, Markov processes, and much more.

That math is clean and rock solid.

Does it work in practice? Often, yes. Much of the connection with practice is
easy to see. For much of the rest, the keys are independence and conditional independence, and there are some surprisingly good, intuitive approaches to these two. Next, it works. Now, call it done. 'Controversy' over with.

For the math, see, e.g., J. Neveu, 'Mathematical Foundations of the Calculus
of Probability'.

'Frequentist'? That's essentially answered by the weak/strong laws of large numbers. Bayesian? That's essentially conditional probability, with the advanced approach via the Radon-Nikodym theorem, which has a famous proof by von Neumann. One can also prove theorems saying that a Bayesian start doesn't have to hurt after a while.

------
murbard2
Frequentist statistics is so easy to use! Just put your raw data in a text
file, tar it together with your research paper, and put the tarball through a
hash function. If the first 7 bits are 0, then you can reject the null
hypothesis with p < 0.01.
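
For anyone who wants to run the joke, a minimal sketch (the tarball name is hypothetical):

    import hashlib

    # Hash the tarball of your paper plus raw data.
    digest = hashlib.sha256(open("paper_and_data.tar", "rb").read()).digest()

    # The top 7 bits of the first byte are zero iff that byte is 0 or 1,
    # which happens with probability 2/256 = 1/128 < 0.01 under any null.
    if digest[0] < 2:
        print("p < 0.01: reject the null hypothesis!")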

~~~
murbard2
So this is obviously a joke, but I think it illustrates something important.

First off, it's important to realize that the above approach is technically correct. The odds of getting 7 leading zero bits in the hash of your research paper by chance are indeed less than 1%.

However, this is obviously a useless test: its probability of rejecting the null hypothesis _given that it is false_ is still only about 1/128, because we don't really expect the phenomenon that we're measuring to produce good hash preimages. In technical terms, the test has no power.

However, to even talk about the "power" of a test, one needs a model of what
the effect looks like. For instance, we expect the medicine will increase the
rate of cures, we expect education will increase job success, etc.

And yet, the entire freaking point of p-value testing is to maintain a
pretense that Popperianism makes sense. Validating a hypothesis? Why, my simple-minded fellow, that would be anathema! Don't you know that Popper explained that you can only refute hypotheses! So we'll refute the null
hypothesis instead.

Fine, but if you do that, the hash-function method works. The reason it sounds
stupid is that Popperianism is wrong.

~~~
sa1
Can you explain the part where you are criticising Popperianism to me? (Honest question.) From what I gather, you seem to be criticising logical positivism more, but I only have a layman's idea of the philosophy involved.

~~~
eli_gottlieb
OH BOY I GET TO WRITE A RANT ON EPISTEMOLOGY OF SCIENCE AND INTERPRETATIONS OF
STATISTICS! _Eh-hem..._

Anyway, Popper believed that you could never really give affirmative support
to a hypothesis, only fail to falsify it. A good example of why he believed
this is that, after all, Newtonian gravitational mechanics passed hundreds of experimental tests over the centuries, but was still not actually _as true_ as Einstein's later Theory of General Relativity. Popper also held that
the actual process of coming up with theories required a Real Human Thinker
somewhere in the mix, and that the experimental procedure of forming
hypotheses, testing them, and converging to truth required the magic secret
sauce of Real Human Thought.

This combined very well with Fisher and Neyman-Pearson statistical testing,
under which one infers, "the conditional probability (likelihood) of my
evidence given absolute belief in my null hypothesis is very small", and thus
treats the null hypothesis as falsified in Popperian fashion. Of course, in
actual statistical testing, the critical values of the test statistic will
take a _particular_ alternate hypothesis (parameter value or distribution)
into account, and thus a very low likelihood of the evidence, given the null
hypothesis, is _taken_ (note the italics: this isn't _really_ a valid
probabilistic inference) as support for the built-in alternative hypothesis.
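
To make those quantities concrete, here is a minimal z-test sketch (invented data, with a known unit variance assumed for simplicity):

    import math

    # Made-up measurements; H0: the true mean is 0.
    data = [0.8, 1.2, 0.3, 1.5, 0.9, 1.1]
    n = len(data)
    z = (sum(data) / n) / (1 / math.sqrt(n))  # z-statistic under H0

    # Two-sided p-value: P(|Z| >= |z| | H0) -- a statement about the data
    # given the null, never about P(H0 | data).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    print(p)  # ~0.018, so the ritual says "reject H0"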

This is, of course, complete bunk: we've done no _quantitative_ reasoning
whatsoever about the _hypotheses_ themselves, null _or_ alternative. Which
leaves us with exactly the same problem in frequentist statistics as in
Popperian philosophy of science: there's a big box marked "And then the magic
of scientific thought happens, in the magical mind of an actual scientist!"
Well, we might ask, if the philosophy of science is supposed to guide
scientists as a genuine epistemological tool, _what does Popper advise that
the scientist should think and believe?_ And the answer is: conjecture things
and test them, _hoping_ that intuition and logic will guide the scientist to
conjecture things which later turn out easy to test but difficult to falsify,
this being taken as approximating eventual truth.

This gets rid of a problem that only exists in the academic study of
epistemology: the Problem of Induction. But look what a price we've paid to
sidestep it!

Scientific epistemology as Bayesian inference, on the other hand, has
considerably fewer such problems. The two jobs remaining for the scientist are
to invent coherent, testable hypotheses and to assign priors to them.
Testability is given a rigorous meaning: a hypothesis is testable when it
generates a non-uniform likelihood distribution over evidence. Priors are
grounded in subjectivity, which sounds dirty but only _really_ matters at small sample sizes (where frequentist inference would be weak _anyway_).
Inductive inference is then given a rigorous meaning: model the set of
hypotheses as a set of mutually exclusive propositions yielding nonuniform
likelihoods over the evidence, collect data, and then use Bayes' Rule to move
information from your (assumed) belief in the data to your (malleable) belief
in the various hypotheses. This _is_ rigorous probabilistic inference, since
Bayes' Theorem is derivable directly from the definitions of conditional and
joint probability.
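
A toy sketch of that recipe, with a two-hypothesis space (a fair coin vs. a coin assumed biased 80% toward heads; all numbers invented):

    # Mutually exclusive hypotheses and prior degrees of belief in them.
    priors  = {"fair": 0.5, "biased": 0.5}
    p_heads = {"fair": 0.5, "biased": 0.8}

    heads, flips = 8, 10  # observed data

    # Non-uniform likelihoods over the evidence (binomial kernel; the
    # binomial coefficient cancels in the normalization below).
    likelihood = {h: p_heads[h]**heads * (1 - p_heads[h])**(flips - heads)
                  for h in priors}

    # Bayes' Rule: posterior is proportional to likelihood times prior.
    unnorm = {h: likelihood[h] * priors[h] for h in priors}
    total = sum(unnorm.values())
    posterior = {h: v / total for h, v in unnorm.items()}
    print(posterior)  # ~87% of the belief moves to "biased"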

(Philosophically speaking, this is also how you can escape the trap of trusting only in _a priori_ reasoning: Bayes' Theorem and Cox's Theorem give a solid _a priori_ argument that you will always "lose at life" if you _don't_ reason using evidence and probability theory, compared to someone who does, _therefore you really, actually should, and we're not just making this up_.)

And then we can do what /u/murbard2 is alluding to, and go for "hardcore mode"
on Bayesianism: Objective Informative Bayesianism (better referred to as
Algorithmic Bayesianism). This is the circumstance in which we explicitly
treat statistical inference as a way of moving information from belief in data
to belief in hypotheses (Bayesian) or from belief in hypotheses to belief in
data (frequentist), treat models/hypotheses as computational objects, and
treat probabilities as measuring belief in terms of _information_ (usually
measured in "decibels" if you're being casual or "bits" if you're a Real
Information Theorist). Or, in fact, this approach lets us dissolve the very
notion of belief: information/evidence becomes something that _weighs upon_ a
hypothesis space to locate the truth, like how mass weighs upon space-time to
create gravity. You can thus view the real physical system under examination
as emitting information when an experiment takes place, with some anti-
information (randomness) mucking it up slightly. Your inductive process starts
with a hypothesis space "weighed up" or "evened out" with anti-information
(ignorance) which then comes to reflect the real world as more and more space
is "weighed down" by evidence-information the world emitted. The primary
problem, then, is how to assign priors: how to decide _how much_ information
must "weigh down" a particular hypothesis before it becomes "heavy"
(believable with confidence).

Notably, when we phrase it this way, probabilistic phrasings of Occam's Razor
become intuitive to the point of obviousness: a simpler theory will have a
higher prior, meaning we need less information to make us confident in it.
Thus, given a set of theories which all explain exactly the same data exactly
as well (generate the same likelihoods for that data), and an assignment of
priors such that simpler theories have greater priors, the theory about which
we are most informed by the evidence, and in which we can be most confident,
must therefore be the simplest.

(Of course, you could assign perverse priors that order your theories from the most complex to the simplest (in descending order of belief), but this just means you will require more evidence to arrive at the same answers.)
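
Numerically, the point is almost trivial (invented priors; both theories give the data the same likelihood by assumption):

    likelihood = 0.2                        # identical for both theories
    prior_simple, prior_complex = 0.7, 0.3  # the simpler theory gets the larger prior

    # Posterior is proportional to likelihood * prior, so equal likelihoods
    # leave the prior ordering untouched after normalizing.
    post_simple  = likelihood * prior_simple
    post_complex = likelihood * prior_complex
    print(post_simple / (post_simple + post_complex))  # 0.7: the simpler theory wins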

Algorithmic information theory then gives ways to formalize Occam's Razor by
assigning prior probabilities to all possible Turing Machines based on their
algorithmic complexity. This isn't very useful in real life, but _does_ actually _solve_ the Problem of Induction and give a provably optimal way to generate predictive probabilities for anything computable at all (i.e., just about anything).

~~~
sa1
Thank you, that was very informative. I have read about Bayesian inference in
epistemology before, but I need to study it in more detail.

But that doesn't answer my original question. The problem here being referred
to was people disproving the null hypothesis to indirectly validate the
alternative hypothesis, right?

Is that still Popperianism? Because as far as I understand it, Popper only tells us to keep trying to disprove the actual hypothesis/hypotheses.

Take the case of string theory: it may never be possible to have direct validation of string theory, and it still turns out to be mathematically useful. As long as there is no evidence that string theory is wrong, Popper says that people can still use it where it is mathematically useful. According to you, I am guessing, mathematically useful theories will have a higher prior. Both of these are in opposition to logical positivism, where people hold that string theory isn't meaningful because it isn't validated.

Of course, I accept your statement that Popperianism isn't the correct solution to the philosophical problem of how to guide scientists towards the truth. As far as I understand it, it's just a way of saying that things can still be meaningful if they haven't been validated yet.

But people misusing p-value testing as a way of confirming the actual hypothesis without testing it are just misusing Popper's idea as an indirect route to some validation for their actual hypothesis, and that seemed (seems?) closer to logical positivism to me, because of how they feel compelled to have some validation in order to say that their statement is meaningful.

~~~
eli_gottlieb
> But that doesn't answer my original question. The problem here being
> referred to was people disproving the null hypothesis to indirectly
> validate the alternative hypothesis, right?
>
> Is that still Popperianism? Because as far as I understand it, Popper only
> tells us to keep trying to disprove the actual hypothesis/hypotheses.

I actually did answer, but it was kind of buried. _Yes_, using p-values to
disprove the null hypothesis _is_ Popperianism, at least insofar as it holds
that you can't ever _really_ provide quantitative validation to your
alternative hypothesis and have to sort of wave your hands at its non-
falsification as if that meant something. It seems almost but not quite like
positivism because Popperians _desperately want_ to validate things but are
committed to an ideology telling them it's categorically impossible to do so.
So they _conjecture really fervently_ instead.

If this all sounds kinda stupid, well, I did give a whole rant on Bayesianism,
which _does_ let us quantify validation of hypotheses directly.

~~~
sa1
Well, even Bayesian inference effectively leads to a similar problem: we have x% confidence in a theory because it has y% simplicity. Does it take away the need to conjecture new theories which may turn out to be simpler?

Thanks for continuing to answer. I am learning new stuff.

------
jmount
A fun example I worked out, of a situation that is a bit of a problem for frequentist inference (i.e. designed to be compatible with Bayesian inference): [http://www.win-vector.com/blog/2014/07/frequenstist-inference-only-seems-easy/](http://www.win-vector.com/blog/2014/07/frequenstist-inference-only-seems-easy/)

------
pcvarmint
Lecture video (starts at 03:40):

[https://www.youtube.com/watch?v=DMnCFRtOITI](https://www.youtube.com/watch?v=DMnCFRtOITI)

Interview mentioned at introduction:

[https://projecteuclid.org/download/pdf_1/euclid.ss/106399498...](https://projecteuclid.org/download/pdf_1/euclid.ss/1063994981)

