Classical statistics replaces the question with one that can be answered: namely, "How likely would this result be if the null hypothesis were true?" This has several difficulties. The most noticeable is that, no matter how often the professor tells them not to, people treat the question actually answered as if it were the question they wanted to answer. This mistake has been made by anyone who says, "We confirmed the null hypothesis..."
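To make the contrast concrete, here is a minimal sketch of the question classical statistics actually answers. The coin-flip numbers are invented for illustration:

```python
from math import comb

# Hypothetical data: 60 heads in 100 flips. Null hypothesis: the coin is fair.
n, k = 100, 60

# One-sided p-value: the probability of a result at least this extreme
# GIVEN that the null is true -- not the probability that the null is true.
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(round(p_value, 4))
```

A small p-value here says "fair coins rarely produce data this lopsided"; it does not confirm or refute the null by itself, which is exactly the substitution people keep making.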
Bayesian statistics confronts the problem head-on by pointing out that the conclusion you should draw depends on the prior beliefs you start with. Bayesian analyses then present complicated graphs that show how your prior affects your conclusion across some reasonable family of priors. This approach avoids misrepresenting the question at the cost of presenting your answer in a complicated way.
My feeling is that as long as the general scientific public remains unconvinced that the classical hypothesis testing approach leads to wrong results, simplicity will win. (And will continue to be misunderstood.)
Try to explain it to your 5-year-old daughter or your 80-year-old grandmother.
After that, try this: http://yudkowsky.net/rational/technical
The full Bayesian probability model includes the unobserved parameters. The marginal distribution over parameters is known as the “prior” parameter distribution, as it may be computed without reference to observable data. The conditional distribution over parameters given observed data is known as the “posterior” parameter distribution.
uses too much jargon; I'm sure I'd understand it if he'd defined "marginal distribution" and "conditional distribution" and clarified exactly what the difference between observable and unobservable data and/or parameters is. The hypothetical audience for this seems to be people who are intimately familiar with statistical terminology but know absolutely nothing about Bayesian statistics.
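For the record, the two terms the quote leans on can be unpacked in a couple of lines (a sketch, with $\theta$ standing for the unobserved parameters and $y$ for the observed data):

```latex
% "marginal" over parameters: integrate the joint over the data,
% so it can be computed without reference to any observations
p(\theta) = \int p(\theta, y)\, dy \qquad \text{(the prior)}

% "conditional" on observed data, via Bayes's theorem
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \qquad \text{(the posterior)}
```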
It has been discussed on HN before:
But Bayesian stats isn't just about applying Bayes's theorem. Its key feature is using probabilities for model parameters and incorporating posterior uncertainty of their values into inference.
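That distinction can be made concrete with a toy example (the Beta(8, 4) posterior is invented for illustration): predicting two future successes by plugging in a point estimate differs from averaging over the posterior.

```python
# Hypothetical Beta(8, 4) posterior for a success probability p.
a, b = 8, 4

# Plug-in prediction: treat the posterior mean as if p were known exactly.
p_hat = a / (a + b)
plug_in = p_hat ** 2

# Fully Bayesian prediction: average p**2 over the posterior, i.e. E[p^2],
# which for a Beta(a, b) is a*(a+1) / ((a+b)*(a+b+1)).
posterior_pred = a * (a + 1) / ((a + b) * (a + b + 1))

print(plug_in, posterior_pred)
```

The two numbers differ because the Bayesian answer carries the posterior uncertainty about p into the prediction instead of discarding it, which is the "key feature" referred to above.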
There was a sudden spike in traffic, and it turns out it comes from Y Combinator Hacker News, where there's a discussion of this post with seven comments as of today.
The criticisms were sound -- it's too technical (i.e. jargon filled) for someone to understand who doesn't already get it. Ironically, I've been telling Andrew Gelman that about his Bayesian Data Analysis book for years.
Unix man pages are the usual exemplar of documentation that only works if you mostly know the answer already. They're great once you understand something, but terrible for learning.
I think Andrew's BDA is that way -- it's clear, concise and it actually does explain everything from first principles. And there are lots of examples. So why is this so hard to understand?
I usually write with my earlier self in mind as the audience. Sorry for not targeting a far-enough-back version of myself this time! The jargon should be familiar to anyone who's taken math stats. I don't think it would have helped if I'd defined the prior as a marginal by writing out the sum.
Anybody starting machine learning and data mining?