
Read the note carefully: "While improvements in calibration are an empirically recognized benefit of a Bayesian approach, the enormous potential for gains in accuracy through Bayesian marginalization with neural networks is a largely overlooked advantage." Because the different parameter settings correspond to different high performing models, averaging their predictions will lead to more accurate predictions.
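
For reference, the marginalization the quote refers to is usually written as a Bayesian model average. This is the standard textbook form, not something taken from the note itself, and the symbols (x for an input, y for its label, w for the weights, D for the training data, K for the number of sampled weight settings) are my own notation:

```latex
p(y \mid x, \mathcal{D})
  = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw
  \approx \frac{1}{K} \sum_{k=1}^{K} p(y \mid x, w_k),
  \qquad w_k \sim p(w \mid \mathcal{D})
```

The sum on the right is the "averaging their predictions" step: each w_k is one parameter setting, weighted by how plausible it is given the data.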

Averaging predictions can lead to improvements, but not necessarily. If one method is consistently worse than the others in the batch, averaging may simply drag the accuracy down toward that weaker method. Averaging works well when the methods tend toward different outliers, so that their errors partly cancel out.
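
A toy simulation makes the two cases concrete. This is only a sketch under assumptions I made up for illustration: three hypothetical "models" that predict a known true value of 0 with independent Gaussian noise, and in the second case one of them carries a constant bias. The function name `simulate` and all the numbers are mine.

```python
import random


def simulate(noise_scales, biases, n_trials=20000, seed=0):
    """Return (list of per-model MSEs, MSE of the averaged prediction)."""
    rng = random.Random(seed)
    truth = 0.0
    sq_err = [0.0] * len(noise_scales)
    sq_err_avg = 0.0
    for _ in range(n_trials):
        # Each model predicts truth + its own bias + independent noise.
        preds = [truth + b + rng.gauss(0.0, s)
                 for s, b in zip(noise_scales, biases)]
        for i, p in enumerate(preds):
            sq_err[i] += (p - truth) ** 2
        avg_pred = sum(preds) / len(preds)
        sq_err_avg += (avg_pred - truth) ** 2
    return [e / n_trials for e in sq_err], sq_err_avg / n_trials


# Case A: three equally good models whose errors are independent.
indiv_a, avg_a = simulate([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

# Case B: same noise, but the third model is consistently off by 3.
indiv_b, avg_b = simulate([1.0, 1.0, 1.0], [0.0, 0.0, 3.0])

print("A: individual MSEs", indiv_a, "averaged MSE", avg_a)
print("B: individual MSEs", indiv_b, "averaged MSE", avg_b)
```

In case A the averaged prediction beats every individual model because the independent errors partly cancel. In case B the average ends up worse than the best individual model, though still better than the biased one, which is the "dragging down" effect.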

I have yet to see anyone successfully produce a high-dimensional joint posterior over all of the parameters in a neural network. Usually, people make the simplifying assumption of a separate, independent posterior for each parameter, which means that you can't just extract a bunch of different models.
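
Here is a rough two-weight sketch of what gets lost with per-parameter posteriors. Everything in it is hypothetical: the "true" joint posterior is invented so that w1 + w2 stays close to 1, and the factorized version is just the product of the two marginals, used as a stand-in for a per-parameter approximation (an actual variational fit would generally pick different variances).

```python
import random
import statistics

rng = random.Random(0)
n = 20000

# Hypothetical joint posterior: w1 is quite uncertain on its own, but w2 is
# tightly tied to it, so the sum w1 + w2 stays close to 1.
joint = []
for _ in range(n):
    w1 = rng.gauss(0.5, 1.0)
    w2 = 1.0 - w1 + rng.gauss(0.0, 0.1)
    joint.append((w1, w2))

# Factorized stand-in: keep each weight's marginal mean and spread, but
# sample the two weights independently of each other.
w1s = [w[0] for w in joint]
w2s = [w[1] for w in joint]
m1, s1 = statistics.fmean(w1s), statistics.stdev(w1s)
m2, s2 = statistics.fmean(w2s), statistics.stdev(w2s)
factorized = [(rng.gauss(m1, s1), rng.gauss(m2, s2)) for _ in range(n)]

# Suppose the model's output depends on w1 + w2. Compare how much that
# quantity varies across sampled "models" under each posterior.
std_joint = statistics.stdev([a + b for a, b in joint])
std_factorized = statistics.stdev([a + b for a, b in factorized])

print("spread of w1 + w2, joint samples:     ", std_joint)
print("spread of w1 + w2, factorized samples:", std_factorized)
```

The joint samples all agree closely on w1 + w2, while the independently sampled pairs scatter widely, so many of them are weight settings the joint posterior would consider implausible.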
