
Looking now at my thesis, I agree that I don't explicitly argue theoretically for why multilayer networks will (under suitable conditions) converge to Gaussian processes. However, it follows (at some level of rigour) pretty directly from a fact that I do note: if a network with a single hidden layer has multiple outputs, the functions computed by these outputs will be independent (in the prior) as the number of hidden units goes to infinity. So if you add another hidden layer, the functions computed by the units in this layer will be independent (they're like outputs of the previous layer), and the argument for why the outputs from this layer form a GP goes through as before. I'm not sure why I didn't explicitly note this. It's implicitly assumed in my discussion of how the covariance function for networks with step-function hidden units changes as you add more layers.
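For anyone who wants to see the independence claim numerically, here is a rough sketch. It makes assumptions that are not taken from the thesis: tanh hidden units, standard normal input-to-hidden weights and biases, hidden-to-output weights with variance 1/H, and a made-up function name. For any H, two outputs that share the same hidden units are uncorrelated, but they are still dependent, and that shows up as a positive correlation between their squares. The correlation should shrink towards zero as H grows.

```python
import numpy as np


def squared_output_correlation(n_hidden, n_networks=10000, x=1.0, seed=0):
    """Correlation between the squares of two outputs of random networks.

    Hypothetical helper for illustration. Assumed prior: N(0, 1) hidden
    weights and biases, tanh units, N(0, 1/n_hidden) output weights.
    """
    rng = np.random.default_rng(seed)
    # One row per sampled network, one column per hidden unit.
    w = rng.standard_normal((n_networks, n_hidden))
    b = rng.standard_normal((n_networks, n_hidden))
    h = np.tanh(w * x + b)
    # Two outputs share the same hidden units but have their own weights.
    v = rng.standard_normal((n_networks, n_hidden, 2)) / np.sqrt(n_hidden)
    f = np.einsum("nh,nhk->nk", h, v)
    # Uncorrelated outputs can still be dependent; squares reveal it.
    return np.corrcoef(f[:, 0] ** 2, f[:, 1] ** 2)[0, 1]


if __name__ == "__main__":
    for n_hidden in (1, 10, 200):
        print(n_hidden, squared_output_correlation(n_hidden))
```

This only checks one input point and one particular form of dependence, so it is a sanity check on the claim rather than evidence for the full GP limit.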

Right, so your argument would work if you allow the layer widths to tend to infinity sequentially (this corresponds to finite networks where each layer is much wider than the one after it). This is the argument presented by Lee et al. But note that it's nontrivial to argue that the GP limit still holds when the widths of all layers tend to infinity at the same time (arguably the more natural limit), which is one of the main contributions of Matthews et al. In my paper here, I also consider this limit where the widths tend to infinity at the same time.
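To make the distinction concrete, the two limits can be written roughly as follows. The notation is my own shorthand for this comment, not taken from either paper: n_1, ..., n_L are the hidden-layer widths, f is the random function computed by the network under the prior, and the limits are meant in distribution.

```latex
% Assumed notation: f_{n_1,\dots,n_L} is the network function under the prior
% for hidden-layer widths n_1, \dots, n_L.

% Sequential limit: the first layer's width goes to infinity first, then the next.
\lim_{n_L \to \infty} \cdots \lim_{n_2 \to \infty} \lim_{n_1 \to \infty}
  f_{n_1, \dots, n_L}

% Simultaneous limit: all widths grow together along a common index m.
\lim_{m \to \infty} f_{n_1(m), \dots, n_L(m)},
  \qquad n_\ell(m) \to \infty \ \text{for every } \ell = 1, \dots, L
```

In the sequential case each layer sees exactly independent GP inputs from the layer below, so the single-hidden-layer argument can be reapplied one layer at a time. In the simultaneous case the layer below is still finite at every step, which is why the argument needs more care.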

In any case, I'll update the paper to reflect our discussion here. Thanks, Radford!
