I agree that priors over aspects of the world would be more useful, but I don't think they're an important part of what makes natural intelligence powerful. In my experience, the important thing is to make your prior really broad, while still containing all kinds of different hypotheses with different kinds of rich structure.
I claim that knowing a priori about things like agents and objects just doesn't save you all that much data, as long as you have the imagination to consider all structures at least that complex.
This approach characterizes a different type of uncertainty than BNNs do, and the approaches can be combined. The BNN tracks uncertainty about parameters in the NN, and mixture density nets track the noise distribution _conditional on knowing the parameters_.
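To make the combination concrete, here's the decomposition I have in mind (my notation, not anything from the tutorial): the BNN contributes a posterior over the weights, and the mixture density head contributes the noise model conditional on those weights, so the overall predictive integrates the mixture over the weight posterior.

```latex
% Illustrative predictive distribution for a BNN with a mixture density head.
% \theta = network weights, D = training data, K = number of mixture components.
p(y \mid x, D)
  = \int p(\theta \mid D)
    \sum_{k=1}^{K} \pi_k(x; \theta)\,
      \mathcal{N}\!\left(y \mid \mu_k(x; \theta),\, \sigma_k^2(x; \theta)\right)
    \, d\theta
```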
I would argue against the idea that "MLE is just Bayes with a flat prior". The power of Bayes usually comes mainly from keeping around all the hypotheses that are compatible with the data, not from the prior. This is especially true in domains where something black-box (essentially prior-less) like a neural net has any chance of working.
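A toy coin-flip example of what I mean (my own illustration, not from the article): both approaches use a flat prior, but MLE commits to a single parameter value, while Bayes keeps every rate still compatible with the data and averages over them.

```python
# Hypothetical illustration: 3 heads observed in 3 flips.
heads, flips = 3, 3

# MLE with an (implicit) flat prior: a single point estimate.
p_mle = heads / flips                 # = 1.0, i.e. tails is declared impossible

# Bayes with an explicit flat Beta(1, 1) prior: the posterior is
# Beta(1 + heads, 1 + tails), and the posterior predictive probability
# of heads is its mean (Laplace's rule of succession).
p_bayes = (1 + heads) / (2 + flips)   # = 0.8, tails still gets probability

print(p_mle, p_bayes)
```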
Good point. We wrote this pre-double descent, and a massively overparameterized model would make a nice addition to the tutorial as a baseline. However, if you want a rich predictive distribution, it might still make sense to use a Bayesian NN.
I agree that Bayesian neural networks haven't been worth it in practice for many applications, but I think the main problem is that it's usually better to spend your compute training a single set of weights for a larger model, rather than doing approximate inference over weights in a smaller model. The exception is probably scientific applications where you mostly know the model, but then you don't really need a neural net anymore.
Choosing a prior is hard, but I'd say it's hard in roughly the same way that choosing an architecture is - if all else fails, you can do a brute-force search, and you even have the marginal likelihood to guide you. I don't think it's the main reason why people don't use BNNs much.
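As a minimal sketch of that kind of brute-force search (a toy conjugate model of my own, not something from the tutorial): for Bayesian linear regression with a N(0, 1/alpha) weight prior and known noise variance, the marginal likelihood is available in closed form, so you can score a grid of prior scales and keep the one the data prefer.

```python
# Toy example: choose the prior precision alpha by marginal likelihood.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(scale=0.5, size=d)
sigma2 = 0.1                                   # known noise variance
y = X @ true_w + rng.normal(scale=np.sqrt(sigma2), size=n)

def log_evidence(alpha):
    # With w ~ N(0, I / alpha), the marginal is y ~ N(0, sigma2*I + X X^T / alpha).
    cov = sigma2 * np.eye(n) + X @ X.T / alpha
    return multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"alpha = {alpha:7.2f}   log marginal likelihood = {log_evidence(alpha):9.2f}")
```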
I disagree with one conceptual point: if you are truly Bayesian you don’t “choose” a prior; by definition you “already have” a prior that you update with data to get to a posterior.
Sure, instead of saying "choose" a prior, you could say "elicit". But I think in this context, focusing on a practitioner's prior knowledge is missing the point. For the sorts of problems we use NNs for, we don't usually think that the guy designing the net has important knowledge that would help make good predictions. Choosing a prior is just an engineering challenge, where one has to avoid accidentally precluding plausible hypotheses.
I agree choosing priors is hard, but choosing ReLU versus LeakyReLU versus sigmoid seems like a problem with using neural nets in general, not Bayesian neural nets in particular. Am I misunderstanding?
Author here! What a surprise. This was an abandoned project from 2019 that we never linked or advertised anywhere, as far as I know. Anyways, happy to answer questions.
Why was this not picked up for further research (if it wasn't)? I know that OATML did quite a bit of work on this front as well, and it seems the direction is still being worked on. I'd like to get your 2 cents on this approach.
BNNs certainly have their uses, but I think people in general found that it's a better use of compute to fit a larger model on more data than to try to squeeze more juice from a given small dataset + model. Usually there is more data available, it's just somewhat tangentially related. LLMs are the ultimate example of how training on tons of tangentially-related data can ultimately be worthwhile for almost any task.
I still am excited by Dex (https://github.com/google-research/dex-lang/) and still write code in it! I have a bunch of demos and fixes written, and am just waiting for Dougal to finish his latest re-write before I can merge them.
Right, but the employer they choose is presumably the one offering the best deal in the world from their point of view. Wouldn't it make more sense to be mad at every other employer for not offering better pay than at the one that's offering the best pay for that worker?
I think it's very unlikely that any of those would lead to human extinction, especially since most of those take decades to unfold, and would still leave large parts of the earth habitable.
Sure, but think about how humans drove other species extinct. We never decided to "kill all woolly mammoths"; we just wanted to use their meat / habitats for other things.
The correlation you mention seems noisy enough that I wouldn't want to bet my civilization on it.
AI Safety hasn't been divorced from tech companies, at least not from DeepMind, OpenAI, and Anthropic. They were all founded by people who said explicitly that AGI will probably mean the end of human civilization as we know it.
All three of them have also hired heavily from the academic AI safety researcher pool. Whether they ultimately make costly sacrifices in the name of safety remains to be seen (although Anthropic already did so when they delayed the release of Claude until after ChatGPT came out). But they're not exactly "watching from the sidelines" - that's more true of Google and Meta.