Please explain the proper way to form priors, then. 50:50 is widely used, as is "0 for obviously wrong stuff". The author here suggested "something sufficiently close to 0", which to me is indistinguishable from the second one. Should an accused's guilt prior be based on the jury's own guilt, the number of crimes they've heard about recently, or the judge's conviction rate? Or maybe the accused's socio-economic class?
Bayes's rule doesn't help with the point that suggestive evidence is not convincing evidence. It just points out that prior beliefs are part of the equation, but will hopefully pale in comparison to actual data. In fact, I was taught to set practically useless hyperparameters to ensure that they do. No one does that outside of an experiment.
Let's say I believe (I don't) the height of pygmies is normally distributed, where the mean is also normally distributed with mean 130cm and standard deviation 10cm, and the standard deviation is inverse gamma distributed with shape 7 and scale 1cm. Assuming the height is actually normally distributed with mean 160cm and sd 15cm (it isn't), how many pygmies must I measure to admit that P(height>160cm)>20%? I'm not sure I can even do the math.
Here P=50% for the unknowable accurate model and P=0.13% for the prior model. How does the situation change when my prior is "sufficiently close to 0"?
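The math doesn't have to be done by hand: since this prior isn't conjugate (the mean and sd are given independent priors), a crude grid approximation over (mu, sigma) can track the posterior-predictive P(height>160) as observations accumulate. A sketch, using the prior and "true" model stated above — the grid ranges, resolution, and sample sizes are my own arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# The stated prior: mu ~ Normal(130, 10), sigma ~ InvGamma(shape=7, scale=1).
# The assumed "true" generating model: heights ~ Normal(160, 15).
mus = np.linspace(100, 200, 201)
sigmas = np.linspace(0.05, 40.0, 200)
M, S = np.meshgrid(mus, sigmas)
log_prior = stats.norm.logpdf(M, 130, 10) + stats.invgamma.logpdf(S, a=7, scale=1)

ps = {}
for n in [1, 10, 100]:
    data = rng.normal(160, 15, size=n)
    log_lik = stats.norm.logpdf(data[:, None, None], M, S).sum(axis=0)
    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-predictive P(height > 160), averaged over the (mu, sigma) grid.
    ps[n] = float((post * stats.norm.sf(160, M, S)).sum())
    print(n, ps[n])
```

With very few observations the behavior is erratic (that InvGamma(7, 1) prior concentrates the sd near 0.17cm, so a single data point drags the posterior mean right onto it), but by around n=100 the posterior predictive sits near the true 50%, which is the "data eventually swamps the prior" point in numbers.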
Selecting proper priors is quite a contentious issue, mainly because there does not seem to be one perfect answer, though there are a few guidelines one can follow. One of them is what you pointed out: building up a tower of hyperparameters. Hyperparameters have the same feeling as "turtles all the way down", but they aren't so bad if you have sufficient observations: one can then prove that any bias introduced by the priors disappears in the limit. For one-off decisions, though, that is not very useful.
In the legal case example, some clarity may be had by considering what the prior actually means. One answer: if you had to bet a million dollars on whether the person is guilty, knowing nothing about the person, how would you distribute your million dollars between the two events? Yes, it is subjective and personal, but it is hardly ever going to be 50:50. One can push the $1,000,000 analogy further by fixing a cost for each kind of mistake: what's the cost of a wrong conviction, and what's the cost of setting a guilty man free? The final decision can then be based on minimizing the expected financial loss given the likelihoods.
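The expected-loss rule above fits in a few lines. The two cost figures here are made-up illustrations, not anything from the thread; the point is only that the decision threshold falls out of the cost ratio rather than sitting at 50%:

```python
# Hypothetical costs of each kind of mistake (pure assumptions):
COST_WRONG_CONVICTION = 1_000_000  # convicting an innocent person
COST_WRONG_ACQUITTAL = 200_000     # setting a guilty person free

def decide(p_guilty):
    """Pick whichever action has the smaller expected loss."""
    loss_convict = (1 - p_guilty) * COST_WRONG_CONVICTION
    loss_acquit = p_guilty * COST_WRONG_ACQUITTAL
    return "convict" if loss_convict < loss_acquit else "acquit"

print(decide(0.5))  # acquit
print(decide(0.9))  # convict
```

With these numbers the break-even point is p = 1,000,000 / 1,200,000 ≈ 0.83, so even a 50:50 posterior leads to acquittal: the asymmetry of the costs, not the prior alone, does the work.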
One may bring socio-economic status into forming the priors, but one may not use any information source that concerns the accused individually.
Replying again as I missed out a vital piece. Bayesian reasoning is an online process, so after every decision one has to update the priors. The next time one uses the reasoning engine, one should work with the most recent prior. An alternative but equivalent way of stating the same thing is that one should look at the entire past to form the valid prior for that instant.
Let's take the example. There is a one-to-one correspondence between fictitious counts and priors. One way of encoding a 50:50 prior is to construct a possibly fictitious but representative past of (say) 2000 samples, split into 1000 guilty and 1000 not-guilty. After each prediction, and assuming the truth eventually becomes known, one updates the counts appropriately, so that the next time we use a different prior.
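The counting scheme above is only a few lines of code. This is a minimal sketch under the stated 1000/1000 setup (equivalent to a Beta(1000, 1000) prior on the guilt rate); the example outcomes are invented:

```python
# Encode a 50:50 prior as 1000/1000 fictitious counts, then update
# each time a prediction's true outcome becomes known.
counts = {"guilty": 1000, "not_guilty": 1000}

def prior_guilty():
    """Current prior probability of guilt, read off the counts."""
    return counts["guilty"] / (counts["guilty"] + counts["not_guilty"])

def observe(was_guilty):
    """Fold one resolved case back into the counts."""
    counts["guilty" if was_guilty else "not_guilty"] += 1

print(prior_guilty())  # 0.5
for outcome in [True, False, True]:  # made-up resolved cases
    observe(outcome)
print(prior_guilty())  # 1002/2003, just above 0.5
```

Note how large fictitious counts make the prior stiff: three real cases barely move a 2000-sample fictitious past, which is exactly the trade-off the next comment is about.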
Our initial prior may be wrong, but it will approach the correct one asymptotically. How fast it approaches the true prior, however, depends on how wrong the initial prior was.
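The speed-of-convergence point can be seen by running the counting scheme from two different starting points against the same stream of outcomes. The true rate and the two initial pseudocount choices are arbitrary illustrations:

```python
import random

random.seed(1)

TRUE_RATE = 0.3  # assumed true guilt rate, for illustration only
# Two initial priors as [guilty, not_guilty] pseudocounts:
# a mild 50:50 and a badly wrong but strongly held 90:10.
counts = {"mild 50:50": [10, 10], "strong 90:10": [900, 100]}

# Feed both the same 5000 resolved cases.
for _ in range(5000):
    guilty = random.random() < TRUE_RATE
    for c in counts.values():
        c[0 if guilty else 1] += 1

estimates = {name: g / (g + ng) for name, (g, ng) in counts.items()}
for name, est in estimates.items():
    print(name, round(est, 3))
```

Both estimates drift toward 0.3, but the heavily weighted wrong prior is still far off after 5000 observations, while the mild one has essentially arrived: being wrong is recoverable, being confidently wrong is slow to recover from.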