
Holes in Bayesian Statistics - myle
https://statmodeling.stat.columbia.edu/2020/02/23/holes-in-bayesian-statistics/
======
mbil
_...the inferential procedure of Bayesian statistics is to assume a prior
distribution and a probability model for data and then use probability theory
to determine the posterior. But if these steps, or something approximating
them, are necessary, if you can’t just look at your data and come up with a
subjective posterior distribution, then how is it reasonable to suppose that
you could be able to come up with an unassailable subjective distribution before
seeing the data?_

Is the point to end up with an "unassailable subjective distribution"? I
believe the power of Bayesian thinking is that you can take a subjective
prior, which is necessarily assailable in its subjectivity, and then combine
it with data. The result is something that is better than either the
subjective prior alone or the likelihood estimator gleaned from data alone.
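In code, that combination is just a conjugate update. A minimal sketch with a Beta-Binomial pair (the prior and the data here are made-up numbers, purely for illustration):

```python
# Subjective Beta(a, b) prior on a coin's probability of heads,
# combined with observed flips. Beta is conjugate to the binomial,
# so the posterior is again a Beta distribution.
def beta_binomial_update(a, b, heads, tails):
    return a + heads, b + tails

a0, b0 = 5, 5                        # assailable subjective prior, mean 0.5
a1, b1 = beta_binomial_update(a0, b0, heads=9, tails=1)

prior_mean = a0 / (a0 + b0)          # 0.5: the prior alone
mle = 9 / 10                         # 0.9: the likelihood estimate from data alone
posterior_mean = a1 / (a1 + b1)      # 0.7: a compromise between the two
```

The posterior mean lands between the subjective prior and the data-only estimate, which is exactly the "better than either alone" claim above.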

~~~
zozbot234
Yes, it's a bit ironic that the likelihood principle ("all the evidence drawn
from data is summarized by the likelihood function") is something that both
frequentist and Bayesian statisticians like to claim as their own. Frequentist
statistics is basically Bayesian statistics done with a "flat" prior
distribution (where "flat" can depend on how the model is parameterized,
however). Seen from this POV, it may be somewhat weird to claim that Bayesian
stats has "holes" in it.

~~~
steerablesafe
Yes, the only difference is that in Bayesian stats the biggest hole (priors)
is explicit and in your face. In frequentist stat that "flat" prior you
mention is implicit and flies under the radar.
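That correspondence is easy to check numerically: with a flat prior the log-posterior is the log-likelihood plus a constant, so the MAP estimate and the MLE coincide. A toy sketch with binomial data (illustrative numbers):

```python
import math

def log_lik(p, heads=7, n=10):
    """Binomial log-likelihood for 7 heads in 10 flips (constant terms dropped)."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def flat_log_prior(p):
    return 0.0  # a uniform prior is constant in p, so it shifts nothing

# Grid search: with a flat prior, the posterior argmax equals the MLE.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=log_lik)
map_flat = max(grid, key=lambda p: flat_log_prior(p) + log_lik(p))
```

Same argmax, different story about what it means; and reparameterizing the model changes what "flat" is, which is the implicit prior flying under the radar.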

------
astrophysician
Every time I see an article like this, I eagerly read it hoping to find a
cogent and coherent criticism of Bayesian stats, and it always ends up being a
straw man or a very fair critique of somebody not correctly applying or
interpreting an application of Bayes’ theorem.

Bayesian stats is nothing more than a rigorous way to transform beliefs + data
into a posterior. Yes, flat priors are not always the best choice. That’s not
a criticism of Bayesian stats. It’s a statement about how actually formulating
a prior is oftentimes the hardest part of a problem. Is Bayesian stats useful
for describing or understanding QM? Idk, again, not really its job...

Use Bayesian stats, not with an air of suspicion, but a respect for the fact
that it will give you the results implied by your data and prior, under your
model assumptions. Nothing more, nothing less. By the way, what is the
alternative if you find yourself in a situation where your result depends
strongly on your prior and you aren’t really sure how to choose your prior?
Wave your hands and find an ad hoc frequentist approach? How about just
admitting to yourself that your data isn’t enough to make up for the fact that
you can’t really quantify your true prior belief?

If you read this and disagree, I sincerely implore you to comment — I just
don’t understand the “debate” aspect of Bayesian stats. Some people
misunderstand what Bayesian stats is sometimes (very understandable) but I
have yet to see a legitimate philosophical or mathematical critique of the
Bayesian approach that really made any sense to me. I would like to know if I
am wrong though...

~~~
steerablesafe
> That’s not a criticism of Bayesian stats. It’s a statement about how
> actually formulating a prior is oftentimes the hardest part of a problem.

Bayesian statistics doesn't give a satisfying answer/method for choosing
priors. Choosing priors is part of statistics. Therefore Bayesian statistics
has a "hole" in it.

~~~
08-15
E. T. Jaynes gave some very good answers (symmetry groups and maximum entropy
principle). You should read "Probability Theory: The Logic Of Science". In it,
you will learn that the supposed "hole" has been closed pretty nicely.
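Jaynes' maximum entropy recipe is concrete enough to sketch: among all distributions satisfying your stated constraints, pick the one with the largest entropy, i.e. the one that assumes the least beyond those constraints. With no constraint beyond normalization, that is the uniform distribution, which a quick numerical check confirms:

```python
import math, random

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = [0.25] * 4             # maxent candidate on a 4-outcome space
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(4)]
    total = sum(w)
    p = [x / total for x in w]   # a random normalized distribution
    # No random competitor beats the uniform distribution's entropy.
    assert entropy(p) <= entropy(uniform) + 1e-12
```

With a mean constraint on a positive quantity the same recipe yields an exponential prior, and with fixed mean and variance on the real line, a Gaussian.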

~~~
bjornsing
My favorite book of all time. :)

------
remarkEon
Upvoted this mostly because I love seeing these debates here on HN about
statistical methods. I don't do this kind of work in my day job,
unfortunately, but studied it in college and always look back to a certain
"fork in the road" moment that, had I adjusted my priors differently (heh),
would've very likely led me to an academic life instead of in business.

------
mikekchar
One of the comments at the site finally cracked a little bit of light for me
on Bell's theorem. To quote from the comment: "What his theorem says is that
the world can not simultaneously be _local_ and have hidden variables. His own
position was that it seemed exceedingly likely that the world was non-local
and that there were hidden variables (such as where is the photon at any given
time)."

I don't know why, but I kept imagining non-local scenarios and thinking, "This
seems like it should be OK, so I don't understand what's going on". Having it
spelled out is tremendously helpful. I still don't understand what's going on,
but at least I don't feel completely crazy ;-)

------
magoghm
From the paper linked in the post: "A probability model is a tool for
learning, not a suicide pact."

------
blt
I'm turned off by the invocation of quantum mechanics here. Most applications
of Bayesian statistics have nothing to do with QM, so "it's not compatible
with QM" doesn't seem like a strong argument.

QM is the number-one favorite topic of crank scientists and pseudoscience.
When using it in the discussion of a seemingly-unrelated topic, authors should
take extra care to motivate why QM is relevant.

(Of course probability theory and QM are not unrelated topics, but probability
exists independently of QM.)

~~~
syntonym2
While reading the blog post/abstract I had the same thought, but the full
paper makes it clear that the author intends something different:

> The second challenge that the uncertainty principle poses for Bayesian
> statistics is that [...] we routinely treat the act of measurement as a
> direct application of conditional probability.

Furthermore it states that this problem might also arise for other
applications of Bayesian statistics:

> If classical probability theory needs to be generalized to apply to quantum
> mechanics, then it makes us wonder if it should be generalized for
> applications in political science, economics, psychometrics, astronomy, and
> so forth. It’s not clear if there are any practical uses to this idea in
> statistics, outside of quantum physics. For example, would it make sense to
> use “two-slit-type” models in psychometrics, to capture the idea that asking
> one question affects the response to others?

------
throwawayjava
The linked paper is a very nice overview. Of course these problems are known
and there are people trying to fix all of the issues (mostly in the relative
obscurity of non-overhyped corners of academia), but the concise example-
guided description of these problems is great.

Somehow I think the most fundamentally damning critique (one that causality
research shares) is also the most vague: applied scientists/experimentalists
look at the "automation" that these approaches are supposed to enable and say
"that's either doing the trivial part of the job or giving you BS answers".

~~~
unishark
I've always felt bayesian statistics got more attention from researchers than
was warranted (including today despite deterministic methods taking over the
world) because it has a nice principled "theory of everything" starting point.
But then of course you have to approximate the heck out of it to be able to
solve it. Often far more than with other methods.

~~~
steerablesafe
You have to approximate the heck out of the Schrödinger equation as well,
otherwise we would be stuck describing the hydrogen atom, maybe helium, and
nothing more.

~~~
unishark
Yes, however as I said in the subsequent sentence, the approximation is often
"far more than other methods". For the most obvious example, a point estimate
like MAP doesn't need to compute the denominator in Bayes' law. That's two
(generally easier) terms to approximate rather than three. Those using
Bayesian methods point out the value in providing a full distribution, but the
necessary additional approximations to get it mean the location of its maximum
can actually end up less accurate than a simple MAP estimate. What always
bugged me, though, is multivariate problems where the Bayesian paper presumes
everything is independent and Gaussian. Great: after getting all psyched by
that intro talk about the value of getting a distribution, we get the simplest
imaginable one, just a mean and variance for each variable.
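The point about the denominator can be made concrete: since the evidence p(data) does not depend on the parameter, the MAP estimate maximizes prior × likelihood and never touches the normalizing constant. A sketch with a Beta prior on a coin (illustrative numbers; only the unnormalized log-posterior is ever evaluated):

```python
import math

def log_prior(p):
    """Beta(2, 2) prior, up to an additive constant."""
    return math.log(p) + math.log(1 - p)

def log_lik(p, heads=6, n=10):
    """Binomial log-likelihood for 6 heads in 10 flips (constant terms dropped)."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

# MAP: maximize log prior + log likelihood. log p(data) is a constant in p,
# so the third (and generally hardest) term of Bayes' law is never computed.
grid = [i / 10000 for i in range(1, 10000)]
map_est = max(grid, key=lambda p: log_prior(p) + log_lik(p))
# Conjugacy gives posterior Beta(8, 6), with mode (8-1)/(8+6-2) = 7/12.
```

Getting the full normalized posterior, by contrast, requires approximating that integral too, which is where the extra approximation error creeps in.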

------
WhompingWindows
Seems like a weak post, very short on evidence and reasoning for such a
massive topic.

~~~
currymj
the first two words are a hyperlink to a much longer paper.

~~~
tgflynn
That definitely could have been made clearer. I looked all over the page for a
link to an actual paper and missed it. I never would have guessed it was the
link on what appeared to me to be an author's name.

