
What is “Bayesian” Statistical Inference? - fogus
http://lingpipe-blog.com/2009/09/09/what-is-bayesian-statistical-inference/
======
btilly
The fundamental problem is simple. When people do statistics, the answer they
want is, "How likely is it that X is true?" The difficulty is that this
question is ill-posed: you lack sufficient information to answer it.

Classical statistics replaces the question with one that can be answered.
Namely, "How likely would this result be if the null hypothesis were true?"
This has several difficulties. The most noticeable one is that, no matter how
much the professor tells them not to, people replace the question actually
answered with the question that they want to answer. This mistake has been
made by anyone who says, "We confirmed the null hypothesis..."
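As a sketch (not from the thread itself), the classical question above can be
made concrete with a coin-flip example: suppose we see 7 heads in 10 flips and
ask how likely a result at least that extreme would be under a fair-coin null.

```python
from math import comb

# Observed data: 7 heads in 10 flips (example numbers, assumed here).
n, k = 10, 7

# One-sided p-value: P(at least k heads in n flips | fair coin).
# Note this answers "how likely is the data given the null?",
# NOT "how likely is the null given the data?" -- which is btilly's point.
p_value = sum(comb(n, j) * 0.5**n for j in range(k, n + 1))

print(round(p_value, 3))  # 0.172
```

The computed value (about 0.17) is the probability of the *data* under the
null hypothesis; mistaking it for the probability of the hypothesis itself is
exactly the error described above.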

Bayesian statistics confronts the problem head on by pointing out that the
conclusion you should draw depends on the prior beliefs you start with.
Bayesians then present complicated graphs that show how your prior affects
your conclusion across some reasonable family of priors. This approach avoids
misrepresenting the question, at the cost of presenting your answer in a
complicated way.
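A minimal sketch of that "family of priors" idea, using assumed example
numbers (7 heads in 10 flips) and the standard beta-binomial conjugacy, where
a Beta(a, a) prior plus k heads in n flips yields a Beta(a + k, a + n - k)
posterior:

```python
# How the posterior mean for a coin's bias shifts as the prior gets
# stronger, for the same data (7 heads in 10 flips).
n, k = 10, 7

# Beta(a, a) priors: a = 0.5 is weak, a = 50 strongly favors fairness.
# Posterior mean of Beta(a + k, a + n - k) is (a + k) / (2a + n).
means = {a: (a + k) / (2 * a + n) for a in (0.5, 1, 5, 50)}

for a, m in means.items():
    print(a, round(m, 3))
```

Stronger priors pull the posterior mean from the observed frequency (0.7)
toward the prior's center (0.5), which is the sensitivity those graphs are
meant to display.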

My feeling is that as long as the general scientific public remains
unconvinced that the classical hypothesis testing approach leads to wrong
results, simplicity will win. (And will continue to be misunderstood.)

------
pmichaud
Too dense, man. Only people who already get it will get it. People who don't
get it will still not get it after reading.

Try to explain it to your 5 year old daughter, or your 80 year old
grandmother.

~~~
jackdawjack
Seems pretty clear to me; it doesn't claim to be an article for the
non-mathematically inclined. Not that "pop" articles on this subject wouldn't
be pretty cool too.

~~~
hughprime
Well, I'm pretty mathematically inclined, but ignorant of Bayesian statistics,
and I still didn't really get it. For instance, the first full-length
paragraph:

 _The full Bayesian probability model includes the unobserved parameters. The
marginal distribution over parameters is known as the “prior” parameter
distribution, as it may be computed without reference to observable data. The
conditional distribution over parameters given observed data is known as the
“posterior” parameter distribution._

uses too much jargon; I'm sure I'd understand it if he'd defined "marginal
distribution" and "conditional distribution" and clarified exactly what the
difference between observable and unobservable data and/or parameters is. The
hypothetical audience for this seems to be people who are intimately familiar
with statistical terminology but know absolutely nothing about Bayesian
statistics.

~~~
jibiki
I think those concepts are best understood by example.
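Here's one such example (a sketch with assumed numbers, not from the article):
the "prior", "conditional (posterior)", and unobserved-parameter language from
the quoted paragraph, made concrete for a coin whose bias is the unobserved
parameter, using a coarse grid of possible biases.

```python
from math import comb

# Unobserved parameter: the coin's bias theta, on a coarse grid.
thetas = [i / 10 for i in range(11)]

# Prior: belief over theta computed without reference to any data
# (here, uniform).
prior = [1 / len(thetas)] * len(thetas)

# Observed data: 7 heads in 10 flips (example numbers, assumed here).
n, k = 10, 7

# Likelihood: probability of the observed data given each theta.
likelihood = [comb(n, j := k) * t**k * (1 - t)**(n - k) for t in thetas]

# Posterior: conditional distribution over theta given the data,
# proportional to prior times likelihood, normalized to sum to 1.
unnorm = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# The posterior peaks at the grid point nearest the observed frequency.
best = thetas[posterior.index(max(posterior))]
print(best)  # 0.7
```

With a flat prior the posterior simply tracks the likelihood, so it peaks at
the observed frequency 7/10; a non-flat prior would shift it.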

------
tokenadult
I like Eliezer's page on this much better:

<http://yudkowsky.net/rational/bayes>

It has been discussed on HN before:

<http://news.ycombinator.com/item?id=376631>

~~~
bob_carpenter
Eliezer Yudkowsky's page is a nice intro to Bayes's theorem, the understanding
of which is critical for understanding why the posterior is proportional to
the prior times the sampling distribution.

But Bayesian stats isn't just about applying Bayes's theorem. Its key feature
is using probabilities for model parameters and incorporating posterior
uncertainty of their values into inference.
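As a sketch of the first point, here is Bayes's theorem on a screening-test
example of the kind used in such intros (the specific numbers here are
assumptions: 1% prevalence, 80% sensitivity, 9.6% false-positive rate):

```python
# Assumed example numbers, not taken from the thread.
p_disease = 0.01             # prior P(disease)
p_pos_given_disease = 0.80   # sensitivity, P(positive | disease)
p_pos_given_healthy = 0.096  # false-positive rate, P(positive | healthy)

# Total probability of a positive test (the evidence term).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes's theorem: posterior = likelihood * prior / evidence.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # 0.078
```

The posterior (about 8%) is far below the sensitivity (80%), because the
small prior dominates; that prior-times-likelihood structure is the point
above. Full Bayesian inference then goes further, keeping the whole posterior
distribution over parameters rather than a single number.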

------
bob_carpenter
Here's what I actually said in response to all this on the original blog; the
discussion showed up as a huge spike in our traffic!

There was a sudden spike in traffic, and it turns out it comes from Y
Combinator Hacker News, where there's a discussion of this post with seven
comments as of today.

The criticisms were sound -- it's too technical (i.e. jargon filled) for
someone to understand who doesn't already get it. Ironically, I've been
telling Andrew Gelman that about his Bayesian Data Analysis book for years.

Unix man pages are the usual exemplar of documentation that only works if you
mostly know the answer. They're great once you already understand something,
but terrible for learning.

I think Andrew's BDA is that way -- it's clear, concise and it actually does
explain everything from first principles. And there are lots of examples. So
why is this so hard to understand?

I usually write with my earlier self in mind as an audience. Sorry for not
targeting a far-enough-back version of myself this time! The jargon should be
familiar to anyone who's taken math stats. I don't think it would have helped
if I'd spelled out the sum that defines the prior as a marginal.

------
ovi256
Wow, thanks, this is a great, well-written intro that makes a great refresher!

Anybody starting machine learning and data mining?

