
OK, let's say NNs are GPs. What can we do with this information?



The paper says more than just that they are GPs; I believe it shows that the NTK (Neural Tangent Kernel, https://arxiv.org/abs/1806.07572) of a large class of ANN architectures converges at initialization. The NTK lets one gain insight into the training of NNs: speed of convergence, artifacts which can appear (checkerboard patterns can be explained with the NTK), generalisation of the NN...

It would be nice to get the same result during training for all these architectures. I believe that will be the next paper from G. Yang, and I'm eager to read it.
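For intuition, here is a minimal sketch of the empirical NTK of a toy two-layer network at initialization: Theta(x, x') = J(x) J(x')^T, with Jacobians taken with respect to the parameters. The architecture, widths, and scaling below are my own illustrative assumptions (in JAX), not the paper's exact setup:

    # Illustrative: empirical NTK of a toy two-layer network at initialization.
    import jax
    import jax.numpy as jnp

    def init_params(key, d_in=3, d_hidden=256, d_out=1):
        k1, k2 = jax.random.split(key)
        return {
            "W1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
            "W2": jax.random.normal(k2, (d_hidden, d_out)) / jnp.sqrt(d_hidden),
        }

    def mlp(params, x):
        # Scalar-output network; weights carry 1/sqrt(fan_in) scaling from init.
        h = jnp.tanh(x @ params["W1"])
        return (h @ params["W2"]).squeeze(-1)

    def empirical_ntk(params, x1, x2):
        # Theta(x1, x2) = J(x1) @ J(x2)^T, Jacobians w.r.t. all parameters.
        jac = jax.jacobian(mlp)            # differentiates w.r.t. params (arg 0)
        flatten = lambda j: jnp.concatenate(
            [leaf.reshape(leaf.shape[0], -1) for leaf in jax.tree_util.tree_leaves(j)],
            axis=1)
        return flatten(jac(params, x1)) @ flatten(jac(params, x2)).T

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (5, 3))
    print(empirical_ntk(init_params(key), x, x))   # 5x5 NTK Gram matrix at init

The eigenspectrum of this Gram matrix is what governs the convergence speeds mentioned above: gradient descent fits the components of the target along large-eigenvalue directions of the kernel fastest.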


You can use it to estimate model uncertainty, Yarin Gal has some nice writeups on this: https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801a... (in this case using dropout networks as GP approximations).
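A rough sketch of that idea (MC dropout: keep dropout active at prediction time and treat the spread of stochastic forward passes as model uncertainty). The tiny network and all constants below are placeholders of mine, not Gal's code:

    # Illustrative: MC dropout as an approximate GP predictive distribution.
    import jax
    import jax.numpy as jnp

    def dropout_mlp(params, x, key, rate=0.5):
        # Dropout stays active at prediction time (the point of MC dropout).
        h = jax.nn.relu(x @ params["W1"] + params["b1"])
        keep = jax.random.bernoulli(key, 1.0 - rate, h.shape)
        h = jnp.where(keep, h / (1.0 - rate), 0.0)
        return h @ params["W2"] + params["b2"]

    def mc_dropout_predict(params, x, key, n_samples=100):
        # Many stochastic forward passes: mean = prediction, std = uncertainty.
        keys = jax.random.split(key, n_samples)
        preds = jnp.stack([dropout_mlp(params, x, k) for k in keys])
        return preds.mean(axis=0), preds.std(axis=0)

    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    params = {"W1": 0.1 * jax.random.normal(k1, (2, 32)), "b1": jnp.zeros(32),
              "W2": 0.1 * jax.random.normal(k2, (32, 1)), "b2": jnp.zeros(1)}
    x = jax.random.normal(k3, (4, 2))
    mean, std = mc_dropout_predict(params, x, key)   # std = per-input uncertainty

Gal's argument is that a network trained with dropout approximates variational inference in a (deep) GP, so this Monte Carlo mean and variance approximate the GP's predictive distribution rather than just input noise.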


How would we use a property of networks with random weights to estimate the uncertainty of trained models, in which the weights are (as much as we can manage) trained to be not random?


I'd guess it lets us map information processes to physical processes. Information processes follow a graph which follows a power law, while physical processes of course follow GPs.

Depending on how complete the map is, it may let us come up with 'physical' laws of information. I am rooting for something which I call Boltzmann convergence.


This means there is a way to convert any Markovian model (or perhaps just Gaussian ones) to an ANN and vice versa.

This is interesting because Markovian processes are much easier to build intuition about.


How?


Probably just use a normal distribution instead of a NN.


Gaussian Process != simple normal distribution. A GP is a distribution over functions, where any finite set of outputs is jointly Gaussian.



