
What Is Bayesian/Frequentist Inference? - vectorbunny
https://normaldeviate.wordpress.com/2012/11/17/what-is-bayesianfrequentist-inference/
======
Eliezer
I don't think this guy understands the debate. A quick summary:

If you think statistics is a big toolbox, some of the tools give different
answers that are better or worse in various ways, and you can just take out
whatever tool you like, you're a frequentist.

If you think that there's such a thing as a correct probability estimate, and
all coherent reasoning is required to come up with consistent answers
regardless of which different path was taken to arrive at the same
destination, you're a Bayesian. From this perspective, a "confidence interval"
isn't a tool that's useful on some occasions, it's just plain crazy and wrong,
like a weather forecaster who only tells you the probability that it's raining
here _xor_ in Narnia. Sure, the forecast is generated by a process that's
sorta related to the correct answer, but by manipulating the imaginary land of
Narnia you can make the forecast be basically anything. With Bayesianism there
are no degrees of freedom in the likelihood ratio you report. See
<http://xkcd.com/1132/>.
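
To put numbers on the xkcd strip (a minimal sketch; the 1/36 comes from the
comic's two dice, and the prior on the sun having exploded is a made-up tiny
number):

    # xkcd 1132: the detector answers truthfully unless its two dice
    # both come up six (probability 1/36), in which case it lies.
    p_lie = 1 / 36
    prior = 1e-9  # assumed prior that the sun just exploded

    # The detector says "yes, the sun exploded":
    p_yes_if_exploded = 1 - p_lie  # it told the truth
    p_yes_if_fine = p_lie          # it lied

    posterior = (p_yes_if_exploded * prior) / (
        p_yes_if_exploded * prior + p_yes_if_fine * (1 - prior))
    print(posterior)  # ~3.5e-8: bet the $50 that the sun is fine

    # The naive significance test instead reports
    # P(detector says yes | sun is fine) = 1/36 ~ 0.028 < 0.05
    # and "rejects the null hypothesis" that the sun is fine.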

It doesn't do any good to appeal to the idea that Bayesian methods are just
one tool in the toolbox. Only frequentists think in terms of toolboxes in the
first place.

Also Bayes's Rule is tautologically equivalent to Bayes's Theorem. There's
more wrong, but meanwhile, color me unimpressed.

~~~
keithwinstein
I wouldn't want to go up against Eliezer Yudkowsky casually, but here goes:
the guy is basically correct. (Although I also didn't follow his statement
that Bayes Theorem != Bayes Rule.)

You can see my essay here, where I express a similar view:
<http://www.quora.com/What-is-the-difference-between-Bayesian-and-frequentist-statisticians>

Confidence intervals and credibility intervals are both mathematical objects
that have well-specified (and different) properties. Confidence intervals are
a worst-case technique and posterior probabilities are a sort of average-case
technique. It's not "wrong" to say that the worst-case runtime of QuickSort is
O(n^2) and it's not wrong to say that, given a uniform probability
distribution over inputs, the expected runtime is O(n log n).

Which statement is useful to make depends on your requirements. They're both
true.
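
A quick demonstration of that distinction (a sketch using a naive
first-element-pivot quicksort; the counts are elements examined while
partitioning):

    import random

    def quicksort(xs):
        """Naive quicksort, first element as pivot. Returns the sorted
        list and the number of elements examined during partitioning."""
        if len(xs) <= 1:
            return xs, 0
        pivot, rest = xs[0], xs[1:]
        lo = [x for x in rest if x < pivot]
        hi = [x for x in rest if x >= pivot]
        s_lo, c_lo = quicksort(lo)
        s_hi, c_hi = quicksort(hi)
        return s_lo + [pivot] + s_hi, len(rest) + c_lo + c_hi

    n = 500  # small enough to stay within Python's recursion limit
    worst = list(range(n))                # already sorted: worst case
    typical = random.sample(range(n), n)  # uniformly random input

    print(quicksort(worst)[1])    # n*(n-1)/2 = 124750: the O(n^2) case
    print(quicksort(typical)[1])  # ~2n ln n, a few thousand on average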

In my "100 independent robots" example, for instance, the credibility interval
or posterior probability produces answers that are not helpful for the
application (and not particularly intuitive either -- the 70% credibility
interval is "wrong" 80% of the time, given a certain value of the parameter).
This can be a question of engineering and there's no need to be dogmatic about
it.
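
To make the worst-case vs. average-case distinction concrete, here is a
sketch with a hypothetical four-box model (simpler than, and not identical
to, the setup in my essay). With a uniform prior, the 70% highest-posterior
credible set is right at least 70% of the time averaged over box types, yet
one box type is never covered at all:

    # Hypothetical model: boxes 1-3 almost always show their own label;
    # box 'S' shows each label uniformly at random.
    likelihood = {
        1:   {1: 0.96, 2: 0.02, 3: 0.02},
        2:   {1: 0.02, 2: 0.96, 3: 0.02},
        3:   {1: 0.02, 2: 0.02, 3: 0.96},
        'S': {1: 1 / 3, 2: 1 / 3, 3: 1 / 3},
    }
    prior = {b: 0.25 for b in likelihood}  # uniform prior over box types

    def credible_set(x, level=0.70):
        """Smallest highest-posterior set of box types with mass >= level."""
        post = {b: likelihood[b][x] * prior[b] for b in likelihood}
        total = sum(post.values())
        chosen, mass = set(), 0.0
        for b in sorted(post, key=post.get, reverse=True):
            chosen.add(b)
            mass += post[b] / total
            if mass >= level:
                return chosen

    # Frequentist coverage of the 70% credible set, per box type:
    for b in likelihood:
        coverage = sum(p for x, p in likelihood[b].items()
                       if b in credible_set(x))
        print(b, coverage)
    # Boxes 1-3 are covered 96% of the time, but box 'S' is covered 0%
    # of the time: given any label x, the matching box alone already has
    # posterior 0.96 / (0.96 + 0.02 + 0.02 + 1/3) = 0.72 >= 0.70.
    # A 70% *confidence* procedure must cover every box type >= 70%.

A hundred independent robots each handed a type-'S' box would all report a
72% credible set that excludes 'S'; robots reporting a valid 70% confidence
set cannot fail that badly on any box type.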

It is perverse and incorrect to say that "Only frequentists think in terms of
toolboxes"! All mathematicians and engineers have access to the whole world of
theorems and algorithms and techniques, all of them true, or meeting their
specifications. No mathematician would argue that the Chinese remainder
theorem is wrong because they are a Galois theorist! And no practitioner of
Bayesian methods should argue that confidence intervals that meet their
worst-case coverage guarantee are "wrong" merely because that practitioner
prefers posterior probabilities.

(My quibble here is with dogmatism, not with Bayesians, because the dogmatic
frequentists are just as bad. They just don't hang out on Hacker News.)

~~~
gjm11
I would hesitate a great deal before entering an argument with either Larry
Wasserman _or_ Eliezer Yudkowsky, but here goes.

You're right that if, for whatever reason, someone is fascinated by "coverage"
then confidence intervals will answer their questions better than Bayesian
posteriors. But I think Eliezer's right that there's something _very wrong_
with thinking that "coverage" in this sense is what matters.

Let's consider your example again. In what circumstances is the following
actually a useful problem to solve? "Given an observation of one thing from a
box, tell me a set of box-types in such a way that for each box-type you'll
choose a set including the right one at least 70% of the time."

I can think of some. For example: a mad scientist starts sending you boxes,
with instructions to start guessing; he's going to monitor your results on
each box-type and if he sees you getting any type of box wrong more than about
30% of the time he'll kill you. Otherwise he'll reward you for nominating
fewer box-types each time. But (1) that's a desperately contrived situation
and (2) the most diehard Bayesian, in that situation, will produce something
like "confidence intervals" because that's what Bayesian decision theory says
to do.

Is there any not-so-contrived situation where the problem solved by confidence
intervals is actually an important one?

By the way, my best guess about the Rule / Theorem thing is that he's
distinguishing between a theorem about conditional probabilities, and a
normative rule saying "when you get new information, update your beliefs like
so".

~~~
keithwinstein
> the most diehard Bayesian, in that situation, will produce something like
> "confidence intervals" because that's what Bayesian decision theory says to
> do.

I disagree that this is what "Bayesian" decision theory says to do. It's what
decision theory says to do, and it's what math says to do, and it's what the
constraints require. It's not particularly "Bayesian" -- it's just what you
have to do.

If everything that happens to be the correct answer (including frequentist
confidence intervals when called for) is now described as Bayesian, then the
term has no meaning and we are all Bayesians. :-)

> Is there any not-so-contrived situation where the problem solved by
> confidence intervals is actually an important one?

What would you do in the case of my 100 robots, where you want 70 of them to
come to the correct decision, and they have to make their decisions
independently? Having them all calculate a posterior independently works
terribly (as I showed, 80% of them come to the wrong conclusion with >73%
belief). Confidence intervals work a heck of a lot better.

The optimal approach would consider what single algorithm works best when run
independently. Finding these solutions (on, e.g., a decentralized POMDP) is an
open problem.

~~~
gjm11
I called it Bayesian because it describes what Bayesians will do. It does
indeed also describe what anyone else sane will do. The point is that in order
to make confidence intervals the right answer you need a situation weird
enough to make even Bayesians use confidence intervals.

In the case of your 100 robots, _why_ am I supposed to want 70 of them to come
to the correct decision? This seems just like my mad-scientist example:
contrived to force confidence intervals (or something very like them) to be
the right answer. Can you explain in what sort of situation this would be a
sensible thing to care about?

------
ced
_The Goal of Bayesian Inference: Quantify and manipulate your degrees of
belief. In other words, Bayesian inference is the Analysis of Beliefs._

Bayesian inference is no more about beliefs than logic is (or any scientific
inference, really). "M(g) ∧ ∀x (M(x) → C(x)) ⊢ C(g)" can be rendered as "If
you believe that glass is a metal, and you believe that all metals are good
conductors, then you should also believe that glass is a good conductor".
Scientists omit the "if you believe" out of conciseness.

Some subjective Bayesians will tell you that their job is to produce the
above. Then they're done. "You said you believe that glass is a metal, so I
put that into my Bayesian inference procedure, and it says that you should
also believe that glass is a good conductor."

But this is not what science is about! Obviously, "glass is a conductor"
strongly contradicts empirical data. We have to challenge every assumption,
and possibly change models!

This is why smart Bayesians check the fit of their model, and I would
strongly recommend Gelman's _Induction and Deduction in Bayesian Data
Analysis_ to any statistician interested in that perspective. It places
Bayesianism squarely in the paradigm of traditional scientific analysis.

<http://www.rmm-journal.de/downloads/Article_Gelman.pdf>
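
As a flavor of what that model checking looks like in code, a minimal
posterior predictive check on a made-up coin-flip example (the data and the
test statistic are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
                     1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

    # Model: iid Bernoulli(theta) with a Beta(1, 1) prior, so the
    # posterior is Beta(1 + heads, 1 + tails).
    a, b = 1 + data.sum(), 1 + len(data) - data.sum()

    def longest_run(x):
        """Length of the longest run of 1s; our test statistic."""
        best = cur = 0
        for v in x:
            cur = cur + 1 if v == 1 else 0
            best = max(best, cur)
        return best

    # Draw replicated datasets from the posterior predictive and see
    # how often they look at least as extreme as the observed data.
    theta = rng.beta(a, b, size=5000)
    reps = rng.binomial(1, theta[:, None], size=(5000, len(data)))
    ppp = np.mean([longest_run(r) >= longest_run(data) for r in reps])
    print(ppp)  # a posterior predictive p-value near 0 or 1 flags misfit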

~~~
loup-vaillant
"Metal" only refer to a set of kinds of matter. A set _we_ shaped because it
helps us make useful inferences without using too much brain power. Like the
fast rules: "Most metals are good conductors", "Most metals are strong", "Most
metals are hard", "Most metals are heavy".

Then someone comes along and shows you a new material called "glass" that is
heavy, hard, and strong (this one is bulletproof). You'd be quick to infer
that it is a metal, and therefore probably a good conductor. But didn't we
tell you that most metals are opaque?

<http://lesswrong.com/lw/no/how_an_algorithm_feels_from_inside/>

------
mjn
I agree with much of this, and tend to be fairly ecumenical/pragmatic in my
own choice of tools, but there are two things that lead to the "identity
statistics" that are only briefly covered here, I think.

One is the entire philosophical debate, e.g. at least some Bayesians think
arguments against the coherence of frequentist statistics are damning enough
to make it questionable whether the methods should be considered rigorous
statistics at all (admittedly this is basically the hardline view) [1].

The other is that it's not always agreed when it's appropriate to look for
coverage versus to analyze beliefs, partly due to the philosophical debate,
and partly because often what you ultimately want is a _decision_, and there
are arguments for whether you should base decisions on frequentist-coverage
machinery, or on belief-update machinery. For example, to move slightly afield
from bounding a parameter, let's say we want an estimate of the region in
which bombs are likely to fall. This can be formulated in frequentist
statistics as a tolerance interval, with two decision thresholds, one for how
many bombs we want to bound, and one for how confident we want to be in the
bound: we want an interval that includes at least x% of the population with y%
confidence, e.g. that with 99% confidence we'll bound 99% of bombs [2]. On the
other hand, it can be formulated as a question about belief: essentially, we
want to find the range in which we believe (for some suitably conservative
definition of belief) we are going to find falling bombs, which Bayesian
predictive statistics looks at.

[1] One famous/infamous such argument:
<http://en.wikipedia.org/wiki/Likelihood_principle>

[2] I wrote a bit on why tolerance intervals should really be a more prominent
part of the frequentist toolbox:
<http://www.kmjn.org/notes/tolerance_intervals.html>
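
As a concrete footnote to [2], here is a sketch of the distribution-free
variant, which assumes only iid sampling: the fraction of the population
falling between the min and max of n iid draws is Beta(n-1, 2) distributed
(Wilks), so you can solve for the n that meets a given tolerance spec:

    from scipy.stats import beta

    def samples_needed(p=0.99, conf=0.99):
        """Smallest n such that [min, max] of n iid draws contains at
        least a fraction p of the population with confidence conf."""
        n = 2
        while beta.sf(p, n - 1, 2) < conf:  # P(coverage > p) at this n
            n += 1
        return n

    print(samples_needed(0.95, 0.95))  # 93 samples for a 95%/95% spec
    print(samples_needed(0.99, 0.99))  # several hundred for 99%/99%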

~~~
shardling
>let's say we want an estimate of the region in which bombs are likely to
fall.

That seems like fundamentally the wrong sort of question. You never actually
care directly about something like that: the point of statistics is to inform
some decision-making process. And it almost always becomes more obvious how
to proceed when you keep in mind what you're actually using the stats for.

~~~
sillysaurus
_the point of statistics is to inform some decision-making process._

An estimate of the region in which bombs are likely to fall would directly
inform your decision-making process ("Don't go over there!") so I don't
understand your objection.

~~~
lisper
I don't know what shardling intended, but here's my take on it: the kind of
estimate you want depends on the question you want to answer. I live in
California, so the analysis I'm doing on the rockets coming out of Gaza is
likely very different from the one being performed by the people living in Tel
Aviv. And both of those analyses are different from the ones being performed
by Hamas. So there is no such thing as a reliable "estimate of the region in
which bombs are likely to fall" independent of the particular question you
want to answer.

Another example, from the original article:

"a weather forecaster is good if it rains 95 percent of the times he says
there is a 95 percent chance of rain"

It's not clear whether or not there's something special about the number 95,
or whether the intent is that a forecaster is good if it rains X% of the time
he says there's an X% chance of rain for any X. So consider a 100 day period
during which it rains 10 days, and a forecaster who every day predicts a 10%
chance of rain. Is that a "good" forecast? If you're a farmer, it might be. If
you're planning a picnic, not so much.
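
To sharpen that: the always-10% forecaster is perfectly calibrated yet
carries no information. A quick sketch scoring both forecasters on that
100-day record with the Brier score (the mean squared error of the stated
probabilities):

    import numpy as np

    rain = np.zeros(100)
    rain[:10] = 1  # it rains on 10 of the 100 days

    constant = np.full(100, 0.10)  # always says "10% chance of rain"
    perfect = rain.copy()          # says 100% on rainy days, 0% otherwise

    def brier(forecast, outcome):
        return np.mean((forecast - outcome) ** 2)

    # Both forecasters are calibrated: on days the constant forecaster
    # says "10%", it rains exactly 10% of the time.
    print(brier(constant, rain))  # 0.09
    print(brier(perfect, rain))   # 0.0 -- calibration alone isn't skill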

------
madhadron
_sigh_ Here we go again. Inferential statistics is unified by a field called
decision theory, which is the mathematical formulation of how you choose a
"good" mapping from the set of possible outcomes from your experiments to a
set of possible decisions.

Bayesian and frequentist are interpretations of probability theory, and they
are not the only ones (nor are all interpretations even concerned with
formalizing a notion of "chance"). They are not necessary to statistics.

~~~
loup-vaillant
You mean _causal_ decision theory?

 _Omega comes to you and presents two boxes. One is transparent and contains
$1000. The other is opaque. Then Omega says "I give you 2 choices: either you
take the two boxes, or you take only the opaque one. I have studied your
brain, and have predicted your choice. If I have predicted that you will take
only the opaque box, I have put $1M in it. If I have predicted you will take
both boxes, I put nothing in it." Note that when Omega comes to you, the
content of the opaque box is already fixed. So: what do you choose?_

Decision theory is not solved yet.
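
For concreteness, the arithmetic behind the clash, assuming Omega predicts
correctly 99% of the time (the accuracy is an assumption; the story only
says Omega has studied your brain):

    ACC = 0.99  # assumed accuracy of Omega's prediction

    # Evidential reasoning: treat your choice as evidence about the
    # prediction, and hence about the opaque box's contents.
    ev_one_box = ACC * 1_000_000
    ev_two_box = ACC * 1_000 + (1 - ACC) * (1_000_000 + 1_000)
    print(ev_one_box, ev_two_box)  # 990000.0 vs 11000.0 -> one-box

    # Causal reasoning: the contents c were fixed before you chose, and
    # c + 1000 > c for either possible c -> two-boxing dominates.
    for c in (0, 1_000_000):
        print(c + 1_000, ">", c)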

------
Evbn
<http://en.m.wikipedia.org/wiki/Bayes_theorem#Bayes.27s_rule>

What is the difference between Bayes Rule and Bayes Theorem?

~~~
dllthomas
Also from that Wikipedia article, "The application of Bayes's theorem to
update beliefs is called Bayesian inference."

While this guy says,

"Bayesian Inference {\neq} Using Bayes Theorem"

If this guy is correct, he should update the wiki...

~~~
ced
Bayes' theorem and Bayes' rule are essentially the same equation. Most people
will use the two interchangeably.

Bayesianism is a perspective on how to do modelling under uncertainty. It
doesn't reduce to "use Bayes' theorem", even though all Bayesian inference
will do that in some fashion.
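
For reference, the two forms people usually mean; the second (the odds
form, which some authors single out as "Bayes' rule") is an algebraic
rearrangement of the first:

    P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

    \frac{P(A \mid B)}{P(\lnot A \mid B)}
        = \frac{P(B \mid A)}{P(B \mid \lnot A)} \cdot \frac{P(A)}{P(\lnot A)}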

~~~
dllthomas
Hm, I guess the point of contention would be the "to update beliefs" part?

