Hacker News

Just for context, Andrew Gelman is one of the creators of Stan[1], one of the most popular probabilistic programming platforms for Bayesian inference. He also co-wrote a popular textbook on Bayesian methods, Bayesian Data Analysis[2].

Everyone hates picking priors in Bayesian analysis. If you pick an informative prior, you can always be criticized for it (in peer review, in defending a business decision, etc.) The usual dodge is to use a non-informative prior (like the Jeffreys prior[3].) I interpret Gelman's point as saying this can also lead to bad decisions. Thus, Bayesian analysts must steer between Scylla and Charybdis when picking priors. That's certainly a real pain point when using Bayesian methods.
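A quick numerical sketch of why the choice matters (the numbers here are made up for illustration): for a Bernoulli rate with k successes in n trials and a Beta(a, b) prior, the posterior is Beta(a + k, b + n - k), so different priors pull the posterior mean around.

```python
# Conjugate Beta-Bernoulli update: with k successes in n trials and a
# Beta(a, b) prior, the posterior mean is (a + k) / (a + b + n).

def posterior_mean(k, n, a, b):
    """Posterior mean of a Bernoulli rate under a Beta(a, b) prior."""
    return (a + k) / (a + b + n)

k, n = 3, 10  # hypothetical data: 3 successes in 10 trials

flat = posterior_mean(k, n, 1.0, 1.0)           # uniform prior
jeffreys = posterior_mean(k, n, 0.5, 0.5)       # Jeffreys prior for Bernoulli
informative = posterior_mean(k, n, 20.0, 20.0)  # strong prior centered at 0.5

print(flat, jeffreys, informative)  # roughly 0.333, 0.318, 0.46
```

With only 10 observations the informative prior drags the estimate from ~0.33 toward 0.5, which is exactly the kind of thing a reviewer can object to; the "non-informative" choices disagree with each other too, just less dramatically.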

However, it's pretty much the same pain point as choosing regularization parameters (or choosing not to regularize at all) when doing frequentist statistics. For example, sklearn was recently criticized for turning on L2 regularization by default, which could be viewed as a violation of the principle of least surprise, as well as causing practical problems when inputs are not standardized. But leaving regularization turned off is equivalent to choosing a non-informative or even improper prior. (Informally in many cases, and formally identical for linear regression with normally distributed errors[4].) So Scylla and Charybdis still loom on either side.
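The formal equivalence for the linear-Gaussian case can be shown in a few lines (a sketch with made-up data): the ridge estimate with penalty lam is the posterior mode under an independent zero-mean Gaussian prior on each coefficient, and lam -> 0 recovers OLS, i.e. a flat (improper) prior.

```python
import numpy as np

# Linear regression with Gaussian noise: ridge = MAP with a Gaussian prior.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=50)

lam = 0.7
# Ridge / MAP closed form: (X'X + lam*I)^{-1} X'y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# lam -> 0 is ordinary least squares, i.e. the flat (improper) prior.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(w_ridge)  # shrunk toward zero relative to w_ols
print(w_ols)
```

So "no regularization" is not a neutral choice; it is itself a prior, just one nobody had to write down.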

My problem with Bayesian models, completely unrelated to Gelman's criticism, is that the normalizing constant (partition function) is usually intractable and really only amenable to sampling-based methods (MCMC with NUTS[5], for example.) This makes them computationally expensive to fit, which in turn limits them to (relatively) small data sets. But using a lot more data is the single best way to let a model get more accurate while avoiding over-fitting! That is why I live with the following contradiction: 1) I believe Bayesian models have better theoretical foundations, and 2) I almost always use non-Bayesian methods for practical problems.
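To make the cost concrete, here is a minimal random-walk Metropolis sampler (much cruder than NUTS, and with a toy model I've invented for illustration): note that every single proposal requires a log-likelihood pass over the entire data set, which is what makes scaling painful.

```python
import numpy as np

# Random-walk Metropolis for the mean of a normal model with known unit
# variance and a flat prior. Each of the 5000 iterations evaluates the
# log-likelihood over all 1000 data points -- the cost MCMC pays that a
# single closed-form frequentist fit does not.
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=1000)

def log_post(mu):
    # log posterior up to a constant (flat prior, unit variance)
    return -0.5 * np.sum((data - mu) ** 2)

mu, samples = 0.0, []
for _ in range(5000):
    prop = mu + rng.normal(scale=0.2)  # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop  # accept
    samples.append(mu)

post_mean = np.mean(samples[1000:])  # discard burn-in
print(post_mean)  # should land near the sample mean of `data`
```

The sampler converges fine here, but the wall-clock cost grows with both the number of iterations and the data size, whereas the frequentist answer (the sample mean) is one pass.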

[1]: https://mc-stan.org/

[2]: https://www.amazon.com/Bayesian-Analysis-Chapman-Statistical...

[3]: https://en.wikipedia.org/wiki/Jeffreys_prior

[4]: https://stats.stackexchange.com/questions/163388/l2-regulari...

[5]: http://www.stat.columbia.edu/~gelman/research/published/nuts...




"Everyone hates picking priors in Bayesian analysis."

Everybody hates searching for their keys in the dark.



