
My summary:

Neural nets have massive "capacity" which means that, in the face of finite data sets, they can both (a) reasonably represent generalizable and non-generalizable functions and (b) can take on priors which do not distinguish between those classes of functions. The upshot is that after training, the posterior weight of robust/generalizable models will equal that of fragile/non-generalizable ones.
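
To make that upshot concrete, here is a minimal toy sketch (my own numbers, not the article's): if every hypothesis in a huge class interpolates the finite training set, the likelihood cannot separate them, so the posterior simply mirrors the prior.

    import numpy as np

    # Hypothetical toy numbers: 1,000,000 hypotheses fit the finite training
    # set perfectly, but only 10 of them also generalize.
    n_total = 1_000_000
    n_generalizable = 10

    prior = np.full(n_total, 1.0 / n_total)   # a prior that doesn't distinguish
    likelihood = np.ones(n_total)             # every hypothesis interpolates the data

    posterior = prior * likelihood            # Bayes: posterior proportional to prior * likelihood
    posterior /= posterior.sum()

    # Each generalizable hypothesis ends up with exactly the same posterior
    # weight as each fragile one; collectively they hold only ~1e-5 of the mass.
    print(posterior[:n_generalizable].sum())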

If we're to trust the posteriors produced by BNNs, we need to believe that the priors we actually use don't have that property. Should we?

Today, the prior in a network arises mostly from its topology, since initialization schemes are constrained by the practicalities of training. The article criticizes those who assert that network topology (plus initialization) yields a reasonable prior over the effective input -> output functions the network realizes.
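
As a rough illustration of what "a prior over input -> output functions" means here (a sketch under my own arbitrary choices of widths and scales, not the article's setup): fix an MLP topology, draw i.i.d. Gaussian weights, and look at the functions that fall out.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_mlp_function(widths=(1, 50, 50, 1), sigma_w=1.0, sigma_b=0.1):
        # One draw from the function-space prior induced by an MLP topology
        # plus i.i.d. Gaussian weights; widths and scales are illustrative only.
        params = []
        for n_in, n_out in zip(widths[:-1], widths[1:]):
            W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(n_in, n_out))
            b = rng.normal(0.0, sigma_b, size=n_out)
            params.append((W, b))
        def f(x):
            h = x
            for i, (W, b) in enumerate(params):
                h = h @ W + b
                if i < len(params) - 1:
                    h = np.tanh(h)
            return h
        return f

    # The prior over functions is never written down directly; it is whatever
    # these samples look like, inherited from topology plus initialization scale.
    x = np.linspace(-3, 3, 200).reshape(-1, 1)
    draws = np.stack([sample_mlp_function()(x).ravel() for _ in range(5)])
    print(draws.shape)  # (5, 200): five functions drawn from the induced prior

Whether that induced prior favors functions that generalize is exactly the question at issue.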

To put that differently, you might imagine an argument that network topologies are biologically inspired and therefore approximate the space of "achievable" implementations of functions for a given task reasonably well. But does an argument like that say anything about the generalization capability of the functions this prior favors? You might characterize the distinction as "Easy" versus "Correct".

I'm not trying to reproduce the arguments for why network topologies actually are reasonable at shaping "uninformative" Gaussian priors in weight space into "uninformative" yet "generalizable" priors in function space. There may be some really good arguments out there. But if we're going to understand NNs as reasonable Bayesian processes, that question needs to be interrogated.
