
Monkeying with Bayes’ theorem - niyazpk
http://www.johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/
======
MaxGabriel
Here's previous discussion on HN that fleshes this out considerably:
<http://news.ycombinator.com/item?id=3693447>

~~~
sampo
From that discussion, this comment is perhaps the best explanation of what
might be going on: <http://news.ycombinator.com/item?id=3694422>

~~~
mturmon
Right, here's the parent comment to that one, which is also relevant:
<http://news.ycombinator.com/item?id=3693779>

Here's one "theoretical" justification of the idea. Just to establish
notation, Bayes tells us to maximize:

    
    
      P(W|A) = P(A|W) P(W) / P(A) \propto P(A|W) P(W)
    

where we drop the last factor because we only want to vary W, so we don't care
about factors depending on only A. (\propto is "is proportional to")

But instead of religiously following the Reverend's advice, we choose to
maximize:

    
    
      f(W|A) = P(A|W) P(W)^a
    

for some a > 0. Taking logs, this is equivalent to maximizing

    
    
      g(W|A) = log P(A|W) + a log P(W)
    

What are these terms doing? The first is the likelihood, which measures how
well the data A agree with the hypothesis W. The second is the prior, which
measures how plausible the hypothesis W is before seeing any data.

You can look at the generalization (a != 1) as introducing a Lagrange
multiplier into the maximization. With the Lagrange multiplier, the procedure
becomes "maximize agreement of hypothesis with data, subject to the constraint
that the hypothesis must be at least this likely". You then choose the
Lagrange multiplier to mediate the tradeoff between these two desirable goals
("agrees with data" vs. "likely to happen a priori").
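
To spell the constraint view out (a sketch in the same notation): the
constrained problem is

    
    
      maximize   log P(A|W)
      subject to log P(W) >= c
    

with Lagrangian

    
    
      L(W, a) = log P(A|W) + a (log P(W) - c)
    

For any fixed a > 0 the term -a*c is constant in W, so maximizing L over W is
exactly maximizing g(W|A) above. A larger a corresponds to a tighter floor c,
i.e. insisting on a more probable hypothesis.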

Note that since we're just maximizing, it does not matter which term we weight
with the constant. Weighting the first term with 1/a is the same as weighting
the second with a.

For reasons like this, it is common for a prior to have a leading factor of
some sort (in the exponent of the distribution).
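
As a toy illustration (hypothesis names and all numbers here are my own,
purely for demonstration), here is the maximization of g(W|A) over a small
discrete set of hypotheses, showing how a shifts the winner and why the
weighting is interchangeable:

```python
import math

# Toy numbers of my own, just to illustrate the tradeoff: three hypotheses W,
# a prior P(W), and a likelihood P(A|W) for one fixed observation A.
prior = {"w1": 0.70, "w2": 0.25, "w3": 0.05}
likelihood = {"w1": 0.02, "w2": 0.30, "w3": 0.90}

def map_w(a):
    """argmax over W of g(W|A) = log P(A|W) + a * log P(W)."""
    return max(prior, key=lambda w: math.log(likelihood[w]) + a * math.log(prior[w]))

print(map_w(1.0))  # a = 1: plain Bayes
print(map_w(0.2))  # small a: the data A dominate
print(map_w(3.0))  # large a: the prior dominates

# Scaling g by 1/a (i.e. weighting the likelihood by 1/a instead of the
# prior by a) cannot move the argmax, since multiplying by 1/a > 0 is
# monotone:
a = 0.2
same = max(prior, key=lambda w: (1 / a) * math.log(likelihood[w]) + math.log(prior[w]))
assert same == map_w(a)
```

With these particular numbers, each regime picks a different hypothesis,
which is the whole point of tuning a.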

------
Gravityloss
The point about hackily compensating for a false independence assumption by
raising the product to a fractional power was nice.

<http://www.johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/#comment-144221>
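
A tiny numeric sketch of that hack (toy numbers of my own): if a feature is
effectively duplicated in the data, the naive independence product counts the
same evidence twice, and a fractional power undoes the overcount:

```python
# Two hypotheses and one genuine piece of evidence, P(feature | W):
p_w1, p_w2 = 0.9, 0.3

odds_once = p_w1 / p_w2               # likelihood ratio, evidence counted once
odds_naive = p_w1**2 / p_w2**2        # a perfectly correlated "second" feature,
                                      # treated as independent, inflates the odds
odds_tempered = (p_w1**2) ** 0.5 / (p_w2**2) ** 0.5  # power 1/2 deflates it back

assert odds_naive > odds_once                 # naive product is overconfident
assert abs(odds_tempered - odds_once) < 1e-9  # tempering restores the single count
```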

------
jellyksong
I'm confused. So should that part of Bayes' Theorem be raised to the 1.5
power?

~~~
johndcook
No. Peter Norvig found a particular instance where a formula motivated by
Bayes' theorem worked even better when the formula was changed. The altered
formula doesn't have a theoretical basis, but it seems to work.

~~~
Evbn
Comments on the blog post give the theoretical basis: correcting for incorrect
assumptions of independence.

