What is “Bayesian” Statistical Inference? (lingpipe-blog.com)
26 points by fogus on Sept 10, 2009 | hide | past | favorite | 11 comments

The fundamental problem is simple. When people do statistics, the answer they want is, "How likely is it that X is true?" The difficulty is that the problem is ill-posed: you lack sufficient information to answer that question.

Classical statistics replaces the question with one that can be answered. Namely, "How likely would this result be if the null hypothesis were true?" This has several difficulties. The most noticeable one is that, no matter how much the professor tells them not to, people replace the question actually answered with the question that they want to answer. This mistake has been made by anyone who says, "We confirmed the null hypothesis..."
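To make the replaced question concrete, here's a minimal sketch (my own hypothetical numbers, not from the article): a one-sided binomial test of whether 60 heads in 100 flips is surprising under the null hypothesis of a fair coin. Note what it computes, the probability of data at least this extreme *given* the null, not the probability that the null is true.

```python
from math import comb

# Hypothetical example: 60 heads observed in 100 flips.
# Classical question: how likely is a result at least this extreme
# if the null hypothesis (fair coin, p = 0.5) were true?
n, k = 100, 60
p_value = sum(comb(n, i) * 0.5**n for i in range(k, n + 1))
print(round(p_value, 4))  # one-sided p-value under the null
```

A small p-value here licenses only "this data would be unlikely if the coin were fair," which is exactly the statement people keep inverting.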

Bayesian statisticians confront the problem head-on by pointing out that the conclusion you should draw depends on the prior beliefs you start with. They then present complicated graphs showing how your prior affects your conclusion across some reasonable family of priors. This approach avoids misrepresenting the question, at the cost of presenting your answer in a complicated way.
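The prior-sensitivity point can be sketched in a few lines (a hypothetical coin example of my own, not from the article). With k successes in n trials and a Beta(a, b) prior on the success probability, the posterior is Beta(a + k, b + n - k), so the posterior mean is (a + k) / (a + b + n), and different priors give visibly different conclusions from the same data:

```python
# Hypothetical data: 7 successes in 10 trials.
# Each Beta(a, b) prior yields posterior Beta(a + k, b + n - k);
# the posterior mean (a + k) / (a + b + n) shifts with the prior.
k, n = 7, 10
for a, b in [(1, 1), (10, 10), (1, 10)]:  # flat, skeptical, pessimistic
    post_mean = (a + k) / (a + b + n)
    print(f"Beta({a},{b}) prior -> posterior mean {post_mean:.3f}")
```

With lots of data the likelihood swamps the prior and these answers converge; with n = 10 they don't, which is why the "family of priors" graphs are needed.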

My feeling is that as long as the general scientific public remains unconvinced that the classical hypothesis testing approach leads to wrong results, simplicity will win. (And will continue to be misunderstood.)

Too dense, man. Only people who already get it will get it. People who don't get it will still not get it after reading.

Try to explain it to your 5 year old daughter, or your 80 year old grandmother.

Seems pretty clear to me; it doesn't claim to be an article for the non-mathematically inclined. Not that "pop" articles on this subject wouldn't be pretty cool too.

Well, I'm pretty mathematically inclined, but ignorant of Bayesian statistics, and I still didn't really get it. For instance, the first full-length paragraph:

The full Bayesian probability model includes the unobserved parameters. The marginal distribution over parameters is known as the “prior” parameter distribution, as it may be computed without reference to observable data. The conditional distribution over parameters given observed data is known as the “posterior” parameter distribution.

uses too much jargon; I'm sure I'd understand it if he'd defined "marginal distribution" and "conditional distribution" and clarified exactly what the difference between observable and unobservable data and/or parameters is. The hypothetical audience for this seems to be people who are intimately familiar with statistical terminology but know absolutely nothing about Bayesian statistics.

I think those concepts are best understood by example.
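Agreed, so here's a tiny discrete example of the quoted jargon (my own illustration, with made-up numbers): a joint model over an unobserved parameter theta (a coin's bias) and observed flips. The marginal over theta, fixed before seeing data, is the "prior"; the conditional on the observed data is the "posterior."

```python
# Unobserved parameter: theta, the coin's bias, taking one of two values.
prior = {0.3: 0.5, 0.7: 0.5}   # "prior": P(theta), set before seeing data
data = [1, 1, 0, 1]            # observed flips: 1 = heads, 0 = tails

def likelihood(theta, flips):
    # P(data | theta) for independent Bernoulli flips
    p = 1.0
    for x in flips:
        p *= theta if x else (1 - theta)
    return p

unnorm = {t: prior[t] * likelihood(t, data) for t in prior}
z = sum(unnorm.values())       # marginal likelihood P(data)
posterior = {t: v / z for t, v in unnorm.items()}  # "posterior": P(theta | data)
print(posterior)
```

After three heads in four flips, the posterior shifts most of its weight onto theta = 0.7, which is the whole move from prior to posterior in miniature.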

Amen... I'm a recovering mathophobe, and I was discouraged by how utterly impenetrable this article was. I mean, I honestly had absolutely no idea what was going on, despite really wanting to understand.

I like Eliezer's page on this much better:


It has been discussed on HN before:


Eliezer Yudkowsky's page is a nice intro to Bayes's theorem, the understanding of which is critical for understanding why the posterior is proportional to the prior times the sampling distribution.

But Bayesian stats isn't just about applying Bayes's theorem. Its key feature is using probabilities for model parameters and incorporating posterior uncertainty of their values into inference.
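That key feature can be sketched with a grid approximation (a hypothetical coin example of mine, not from the thread): build the posterior as prior times sampling distribution over a grid of parameter values, then make a prediction by averaging over the whole posterior rather than plugging in a single point estimate.

```python
from math import comb

k, n = 7, 10                                   # observed: 7 heads in 10 flips
grid = [i / 100 for i in range(1, 100)]        # grid over the bias theta
prior = [1.0 for _ in grid]                    # flat prior on the grid
lik = [comb(n, k) * t**k * (1 - t)**(n - k) for t in grid]  # sampling dist.
unnorm = [p * l for p, l in zip(prior, lik)]
z = sum(unnorm)
post = [u / z for u in unnorm]                 # normalized posterior

# Posterior predictive probability that the next flip is heads:
# average theta over the posterior, carrying parameter uncertainty
# into the inference instead of using one best-fit theta.
p_next_heads = sum(t * w for t, w in zip(grid, post))
print(round(p_next_heads, 3))
```

With a flat prior this matches the Beta(8, 4) posterior mean of 2/3; the point is that the prediction integrates over every plausible theta, weighted by its posterior probability.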

Here's what I actually said in response to all this on the original blog, which showed up as a huge spike in our traffic!

There was a sudden spike in traffic, and it turns out it comes from Y Combinator Hacker News, where there's a discussion of this post with seven comments as of today.

The criticisms were sound -- it's too technical (i.e., jargon-filled) for someone who doesn't already get it to understand. Ironically, I've been telling Andrew Gelman that about his Bayesian Data Analysis book for years.

Unix man pages are the usual exemplar of documentation that only works if you mostly know the answer. They're great once you already understand something, but terrible for learning.

I think Andrew's BDA is that way -- it's clear and concise, and it actually does explain everything from first principles. And there are lots of examples. So why is it so hard to understand?

I usually write with my earlier self in mind as an audience. Sorry for not targeting a far-enough-back version of myself this time! The jargon should be familiar to anyone who's taken math stats. I don't think it would have helped if I'd spelled out the sum defining the prior as a marginal.

Wow, thanks, this is a great, well-written intro that makes a great refresher!

Anybody starting machine learning and data mining?
