Monkeying with Bayes’ theorem (johndcook.com)
52 points by niyazpk on Nov 23, 2012 | 8 comments



Here's previous discussion on HN that fleshes this out considerably: http://news.ycombinator.com/item?id=3693447


From that discussion, this comment is perhaps the best explanation of what might be going on: http://news.ycombinator.com/item?id=3694422


Right, here's the parent comment to that one, which is also relevant: http://news.ycombinator.com/item?id=3693779

Here's one "theoretical" justification of the idea. Just to establish notation, Bayes tells us to maximize:

  P(W|A) = P(A|W) P(W) / P(A) \propto P(A|W) P(W)
where we drop the last factor because we only vary W, so factors that depend only on A don't matter. (\propto means "is proportional to".)
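
Here's a minimal Python sketch of that maximization; the candidate words and all the probability numbers are made-up toy values, just to make the rule concrete:

  # Toy MAP estimate: argmax_W P(A|W) P(W); P(A) is dropped since it
  # doesn't depend on W. All numbers are invented for illustration.
  candidates = {
      # W: (P(W), P(A|W))
      "the":  (0.05,   0.2),
      "thee": (0.0001, 0.9),
  }

  def map_estimate(cands):
      return max(cands, key=lambda w: cands[w][1] * cands[w][0])

  print(map_estimate(candidates))  # "the": the prior outweighs the likelihood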

But instead of religiously following the Reverend's advice, we choose to maximize:

  f(W|A) = P(A|W) P(W)^a
for some a > 0. Taking logs, this is equivalent to maximizing

  g(W|A) = log P(A|W) + a log P(W)
What are these terms doing? The first is the log-likelihood, which measures how well the data A agree with the hypothesis W. The second is the log-prior, which measures how probable the hypothesis W is a priori.

You can look at the generalization (a != 1) as introducing a Lagrange multiplier into the maximization. With the Lagrange multiplier, the procedure becomes "maximize agreement of the hypothesis with the data subject to the constraint that the hypothesis be at least this likely". You then choose the Lagrange multiplier to mediate the tradeoff between these two desirable goals ("agrees with the data" vs. "likely a priori").

Note that since we're just maximizing, it does not matter which term we weight with the constant. Weighting the first term with 1/a is the same as weighting the second with a.
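
A quick sanity check of that equivalence in Python (the value a = 1.5 and the toy scores are assumptions): since g2 = g1 / a and a > 0, both scores have the same maximizer.

  import math

  a = 1.5
  candidates = {"the": (0.05, 0.2), "thee": (0.0001, 0.9)}  # W: (P(W), P(A|W))

  def argmax(score):
      return max(candidates, key=score)

  # Weight the prior by a ...
  g1 = lambda w: math.log(candidates[w][1]) + a * math.log(candidates[w][0])
  # ... or weight the likelihood by 1/a: g2 = g1 / a, and dividing by a
  # positive constant preserves the argmax.
  g2 = lambda w: (1/a) * math.log(candidates[w][1]) + math.log(candidates[w][0])

  assert argmax(g1) == argmax(g2)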

For reasons like this, it is common for a prior to have a leading factor of some sort (in the exponent of the distribution).


The point about hackily downplaying independence by raising the product to a fractional power was nice.
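
A toy Python sketch of what that buys you (the two "features" and the exponent 1/2 are assumptions for illustration, not anything from Norvig's code): if two pieces of evidence are nearly duplicates, the naive independence product counts the same signal twice; raising it to the 1/2 power (a geometric mean) restores roughly a single observation's worth of weight.

  # Two nearly duplicate evidence signals for the same hypothesis W.
  p_e1 = 0.9  # P(e1 | W)
  p_e2 = 0.9  # P(e2 | W): almost the same signal as e1

  naive = p_e1 * p_e2      # 0.81: double-counts the shared signal
  tempered = naive ** 0.5  # 0.9: about one observation's worth
  print(naive, tempered)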

http://www.johndcook.com/blog/2012/03/09/monkeying-with-baye...


I'm confused. So should that part of Bayes' Theorem be raised to the 1.5 power?


No, that's not what we should take away from this. Bayes' theorem is a mathematical theorem, after all, not a theory that you can falsify or improve upon. If you plug the correct inputs into the formula, you are guaranteed to get the right outputs back out.

So if you are getting a more useful result by modifying the formula, that means your inputs are wrong. In cases like this it can be helpful to make "modifications" to Bayes' theorem, but what you're really doing is adjusting the inputs.


No. Peter Norvig found a particular instance where a formula motivated by Bayes' theorem worked even better when the formula was changed. The altered formula doesn't have a theoretical basis, but it seems to work.


Comments on the blog post give the theoretical basis: correcting for incorrect assumptions of independence.



