
For a non-expert, what's different between Bayesian and frequentist approaches? - nkurz
https://www.quora.com/For-a-non-expert-what-is-the-difference-between-Bayesian-and-frequentist-approaches/answer/Jason-Eisner?share=1
======
tel
The Bayesian approach says that "probability" is a way of representing our
collective knowledge and opinions about some matter. To say that there's a 1/2
chance of heads is to say that, to the best of our knowledge and understanding,
we have no way to expect one outcome over the other. It lives in the present.

The frequentist approach says that "probability" is a way of describing
repetition in experimentation. To say there's a 1/2 chance of heads is to say
that, should we flip it 100 times, the best guess as to the number of heads we
might obtain is (1/2)(100) = 50. It lives in a counterfactual future.

This is a foundation of the debate. The two schools are actually trying to
accomplish different things. It turns out that much of the time these two
endeavors land on identical solutions. It also turns out that in human
experience we often flip between the two intentions without notice.

It's not possible to say one is better than the other unless you pick one
semantics or the other to argue from (after which, obviously, the other is
poor). Instead, it's more productive to look at what you're really trying to
accomplish (often a task where errors and computation have costs, and where a
tradeoff exists between accuracy, speed, information use, communicability,
etc.) and understand how each approach helps or harms your end.

Good statisticians, usually no matter how much they punch for one side or the
other, will make these tradeoffs and use the technology which achieves the
best end.

But we're all also at least a little bit philosophers and epistemologists so
it's impossible to not want to take a few swings for one side or the other.
Especially with historical figures like Jaynes leading the way.

~~~
Retric
Bayesian logic is arguably less useful for other people.

ex: If I read someone's train of Bayesian reasoning from 1940, it's almost
useless to me.

It can also be harder to combine information from multiple sources: if Study
A is used as a prior in both Study B and Study C, I need to count A's impact
only once.

However, it's arguably much better for making decisions.

~~~
tel
I'm not sure I understand your 1940s point, but generally Bayesians would like
to communicate (edit) likelihoods instead of posteriors so that they can be
combined like you ask.

Edit: autocorrect ¯\\_(ツ)_/¯

~~~
dllthomas
"Likelihoods"

~~~
tel
Thanks!

------
leephillips
I forget where I stole this from, but I like it:

Imagine that you have a coin, and you are told that it is not fair
(asymmetrical mass distribution). But you are not told in which direction it
is biased. If you flip it, what are the chances that it comes up heads?

A Bayesian would say that p(heads) = 1/2.

A frequentist would say that the only thing we can say about p(heads) is
that p(heads) ≠ 1/2.

~~~
raverbashing
Statistics is funny. Both are right, I'd say.

If the chance of biasing it T>H is 50% and the chance of biasing it the other
way is also 50%, then _the average_ is 1/2 (in a Schrödinger's-cat kind of way)

At the same time, the frequentist is right: p(heads) _for the specific coin
you're given_ is not 1/2

~~~
philh
I'm not at all qualified to talk about this. Without further ado:

Bayesianism has the concept of an A_p distribution, which is roughly "the
probability that the probability is p".

My A_p distribution for this coin would be (almost) 50% on 0 and 50% on 1,
because I don't think it's possible to bias a coin except by making both faces
the same. If I didn't think that, it might be proportional to something like
(p-1/2)^2, because more extreme biases seem more likely.

In both cases, my probability of seeing heads is 1/2, because that's the
mean of the A_p distribution. But after flipping the coin once, my A_p
distribution updates, and now I can give an updated probability that the next
toss comes up heads.
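The update described above can be sketched in a few lines (my own sketch, not
from the thread; the names and the two-point prior are illustrative
assumptions). The hypotheses are p = 0 (two tails) and p = 1 (two heads), each
with prior weight 0.5:

```python
# Hypothetical sketch: a two-point A_p distribution over the heads
# probability p, updated by Bayes' rule after observing one flip.

prior = {0.0: 0.5, 1.0: 0.5}  # weight on "two tails" and "two heads"

def predictive(dist):
    """Probability of heads = mean of the A_p distribution."""
    return sum(p * w for p, w in dist.items())

def update(dist, heads):
    """Weight each hypothesis p by its likelihood of the observed flip,
    then renormalize."""
    posterior = {p: w * (p if heads else 1 - p) for p, w in dist.items()}
    total = sum(posterior.values())
    return {p: w / total for p, w in posterior.items()}

print(predictive(prior))               # 0.5 before any flips
posterior = update(prior, heads=True)
print(predictive(posterior))           # 1.0: one flip settles it
```

With this prior, a single observed flip is decisive: seeing heads rules out
the p = 0 hypothesis entirely, so the predictive probability jumps from 1/2
to 1.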

I haven't read the whole of it, but the relevant chapter from Jaynes:
[http://www-biba.inrialpes.fr/Jaynes/cc18i.pdf](http://www-biba.inrialpes.fr/Jaynes/cc18i.pdf)

~~~
bonoboTP
There's no need to introduce any special new concepts: it's simply a
hierarchical graphical model. You have one variable which stands for p
(distributed according to some prior, like a Beta distribution perhaps), and
you have another, Bernoulli-distributed variable (perhaps multiple) whose
parameter comes from the first variable and represents the actual coin value
that you get.

If you do multiple coin flips, you have more of these Bernoulli variables but
they all share a common parameter p.

One such example model is the Beta-Bernoulli process:
[http://www.math.uah.edu/stat/bernoulli/BetaBernoulli.html](http://www.math.uah.edu/stat/bernoulli/BetaBernoulli.html)

~~~
philh
> You have one variable which stands for p (distributed according to some
> prior, like a Beta distribution perhaps)

This prior sounds like the A_p distribution.

The point is it's a distribution, not a number. You can't just say "I think
the probability of seeing heads is _p_ ". But if you say "I think the
probability of seeing heads is distributed like this, which has a mean of _p_
", then that captures everything you need to know about the coin to update
your expectations when you see the results.

(In case it was unclear, A_p isn't a specific shape of distribution, or class
of shapes of distributions, like beta or gaussian. It's a distribution over
hypothesis-space, where the hypotheses are probabilities, and can have any
valid distribution shape.)

~~~
bonoboTP
I was suggesting that this is less "mysterious" than it sounds once you use a
hierarchical model, where one random variable's outcome is used as the
parameter for another variable.

Imagine one random variable p at the top center of a page, and then several
variables under it, each connected by an arrow coming from p.

The random variables down there are the coin flips (each results in 0 or 1),
and they are identically and independently distributed with a Bernoulli(p)
distribution. And p itself is a random variable which has a distribution
(expressing how likely each p is).

I'm just clarifying that all this fits into normal probability theory; there
is no need to introduce any sort of "meta-probability theory".

------
jehanson
A frequentist is a person whose long-run ambition is to be wrong 5% of the
time.

A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a
donkey, strongly believes he has seen a mule.

[http://www.statisticalengineering.com/frequentists_and_bayes...](http://www.statisticalengineering.com/frequentists_and_bayesians.htm)

------
pella
xkcd : Frequentists vs. Bayesians :
[https://xkcd.com/1132/](https://xkcd.com/1132/)

~~~
pella
"What's wrong with XKCD's Frequentists vs. Bayesians comic?"

[http://stats.stackexchange.com/questions/43339/whats-wrong-w...](http://stats.stackexchange.com/questions/43339/whats-wrong-with-xkcds-frequentists-vs-bayesians-comic)

~~~
Scarblac
Intuitively the main problem is that the chance of the machine lying is
enormously higher than the chance of the sun going nova. The result of
combining them says essentially nothing about the latter anymore.

What's wrong is that it's a useless machine.

~~~
SideburnsOfDoom
I always thought that it was more about the difficulty of collecting winnings
on the bet in the "sun actually gone nova" case. That holds no matter what
the odds are.

