Please explain the proper way to form priors, then. 50:50 is widely used, as is "0 for obviously wrong stuff". The author here suggested "something sufficiently close to 0", which to me is indistinguishable from the second one. Should an accused's guilt prior be based on the jury's own guilt, the number of crimes they've heard about recently, or the judge's conviction rate? Or maybe the accused's socio-economic class?
Bayes's rule doesn't help with the point that suggestive evidence is not convincing evidence. It just points out that prior beliefs are part of the equation, but will hopefully pale in comparison to actual data. In fact, I was taught to set practically useless hyperparameters to ensure that they do. No one does that outside of an experiment.
Let's say I believe (I don't) the height of pygmies is normally distributed, where the mean is also normally distributed with mean 130cm and standard deviation 10cm, and the standard deviation is inverse gamma distributed with shape 7 and scale 1cm. Assuming the height is actually normally distributed with mean 160cm and sd 15cm (it isn't), how many pygmies must I measure to admit that P(height>160cm)>20%? I'm not sure I can even do the math.
Here P=50% for the unknowable accurate model and P=0.13% for the prior model. How does the situation change when my prior is "sufficiently close to 0"?
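The math doesn't have to be done by hand: since this prior isn't conjugate (the mean and sd are given independent priors), a crude grid approximation over (mu, sigma) can track the posterior-predictive P(height>160) as observations accumulate. A sketch, using the prior and "true" model stated above — the grid ranges, resolution, and sample sizes are my own arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# The stated prior: mu ~ Normal(130, 10), sigma ~ InvGamma(shape=7, scale=1).
# The assumed "true" generating model: heights ~ Normal(160, 15).
mus = np.linspace(100, 200, 201)
sigmas = np.linspace(0.05, 40.0, 200)
M, S = np.meshgrid(mus, sigmas)
log_prior = stats.norm.logpdf(M, 130, 10) + stats.invgamma.logpdf(S, a=7, scale=1)

ps = {}
for n in [1, 10, 100]:
    data = rng.normal(160, 15, size=n)
    log_lik = stats.norm.logpdf(data[:, None, None], M, S).sum(axis=0)
    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-predictive P(height > 160), averaged over the (mu, sigma) grid.
    ps[n] = float((post * stats.norm.sf(160, M, S)).sum())
    print(n, ps[n])
```

With very few observations the behavior is erratic (that InvGamma(7, 1) prior concentrates the sd near 0.17cm, so a single data point drags the posterior mean right onto it), but by around n=100 the posterior predictive sits near the true 50%, which is the "data eventually swamps the prior" point in numbers.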
Selecting proper priors is quite a contentious issue, mainly because there does not seem to be one perfect answer, though there are a few guidelines one can follow. One of them is what you pointed out: building up a tower of hyperparameters. Hyperparameters have the same feeling as "turtles all the way down", but they aren't so bad if you have sufficient observations: one can then prove that any bias introduced by the priors disappears in the limit. For one-off decisions, though, that is not very useful.
In the legal case example, some clarity may be had by considering what the prior actually means. One answer: if you had to bet a million dollars on whether the person is guilty, knowing nothing about the person, how would you distribute your million dollars between the two events? Yes, it is subjective and personal, but it is hardly ever going to be 50:50. One can push the $1,000,000 analogy further by fixing a cost for each kind of mistake: what's the cost of a wrong conviction, and what's the cost of setting a guilty man free? The final decision can then be based on minimizing the expected financial loss given the likelihoods.
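The expected-loss rule above fits in a few lines. The two cost figures here are made-up illustrations, not anything from the thread; the point is only that the decision threshold falls out of the cost ratio rather than sitting at 50%:

```python
# Hypothetical costs of each kind of mistake (pure assumptions):
COST_WRONG_CONVICTION = 1_000_000  # convicting an innocent person
COST_WRONG_ACQUITTAL = 200_000     # setting a guilty person free

def decide(p_guilty):
    """Pick whichever action has the smaller expected loss."""
    loss_convict = (1 - p_guilty) * COST_WRONG_CONVICTION
    loss_acquit = p_guilty * COST_WRONG_ACQUITTAL
    return "convict" if loss_convict < loss_acquit else "acquit"

print(decide(0.5))  # acquit
print(decide(0.9))  # convict
```

With these numbers the break-even point is p = 1,000,000 / 1,200,000 ≈ 0.83, so even a 50:50 posterior leads to acquittal: the asymmetry of the costs, not the prior alone, does the work.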
One may bring socio-economic status into forming the priors, but one may not use any information source that concerns the accused individually.
Replying again as I missed out a vital piece. Bayesian reasoning is an online process, so after every decision one has to update the priors. The next time one uses the reasoning engine, one should work with the most recent prior. An alternative but equivalent way of stating the same thing is that one should look at the entire past to form the valid prior for that instant.
Let's take the example. There is a one-to-one correspondence between fictitious counts and priors. One way of encoding a 50:50 prior is to construct a possibly fictitious but representative past of (say) 2000 samples, split into 1000 guilty and 1000 not-guilty. After each prediction, and assuming the truth eventually becomes known, one updates the counts appropriately, so that the next time we use a different prior.
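The counting scheme above is only a few lines of code. This is a minimal sketch under the stated 1000/1000 setup (equivalent to a Beta(1000, 1000) prior on the guilt rate); the example outcomes are invented:

```python
# Encode a 50:50 prior as 1000/1000 fictitious counts, then update
# each time a prediction's true outcome becomes known.
counts = {"guilty": 1000, "not_guilty": 1000}

def prior_guilty():
    """Current prior probability of guilt, read off the counts."""
    return counts["guilty"] / (counts["guilty"] + counts["not_guilty"])

def observe(was_guilty):
    """Fold one resolved case back into the counts."""
    counts["guilty" if was_guilty else "not_guilty"] += 1

print(prior_guilty())  # 0.5
for outcome in [True, False, True]:  # made-up resolved cases
    observe(outcome)
print(prior_guilty())  # 1002/2003, just above 0.5
```

Note how large fictitious counts make the prior stiff: three real cases barely move a 2000-sample fictitious past, which is exactly the trade-off the next comment is about.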
Our initial prior may be wrong, but it will approach the correct one asymptotically. How fast it approaches the true prior, however, depends on how wrong the initial prior was.
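The speed-of-convergence point can be seen by running the counting scheme from two different starting points against the same stream of outcomes. The true rate and the two initial pseudocount choices are arbitrary illustrations:

```python
import random

random.seed(1)

TRUE_RATE = 0.3  # assumed true guilt rate, for illustration only
# Two initial priors as [guilty, not_guilty] pseudocounts:
# a mild 50:50 and a badly wrong but strongly held 90:10.
counts = {"mild 50:50": [10, 10], "strong 90:10": [900, 100]}

# Feed both the same 5000 resolved cases.
for _ in range(5000):
    guilty = random.random() < TRUE_RATE
    for c in counts.values():
        c[0 if guilty else 1] += 1

estimates = {name: g / (g + ng) for name, (g, ng) in counts.items()}
for name, est in estimates.items():
    print(name, round(est, 3))
```

Both estimates drift toward 0.3, but the heavily weighted wrong prior is still far off after 5000 observations, while the mild one has essentially arrived: being wrong is recoverable, being confidently wrong is slow to recover from.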