@fgabriel mentioned this below: if the network is parametrized in a certain way and trained with square loss, the GP evolves according to a linear equation. A different kernel shows up in that equation, known as the Neural Tangent Kernel. An intuitive way to think about it is to Taylor expand the parameters-to-function map to first order around the initial parameters \theta_0: f ≈ f_0 + J d\theta, where f_0 is the network function at initialization, d\theta = \theta - \theta_0, and J is the Jacobian of the network function with respect to the parameters, evaluated at \theta_0. So a change in parameters affects the network function roughly linearly, as long as the parameters don't venture too far from their initial values. The Neural Tangent Kernel is then given by JJ^T.
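To make the linearization concrete, here is a rough pure-Python sketch, not taken from either paper. It assumes a toy one-hidden-layer tanh network with scalar input and output and a 1/sqrt(m) output scaling. The names init_params, forward, jacobian_row, empirical_ntk and linearized are made up for illustration. It builds J analytically, forms the empirical kernel JJ^T on a few inputs, and compares the true network with f_0 + J d\theta after a small parameter change.

```python
import math
import random

def init_params(m, seed=0):
    # Flat parameter vector [w_1..w_m, b_1..b_m, a_1..a_m], all drawn from N(0, 1).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3 * m)]

def forward(theta, x):
    # Toy network (an assumption): f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x + b_j)
    m = len(theta) // 3
    w, b, a = theta[:m], theta[m:2 * m], theta[2 * m:]
    return sum(a[j] * math.tanh(w[j] * x + b[j]) for j in range(m)) / math.sqrt(m)

def jacobian_row(theta, x):
    # Row of J for input x: derivative of f(x) with respect to every parameter.
    m = len(theta) // 3
    w, b, a = theta[:m], theta[m:2 * m], theta[2 * m:]
    s = 1.0 / math.sqrt(m)
    h = [math.tanh(w[j] * x + b[j]) for j in range(m)]
    dw = [s * a[j] * (1.0 - h[j] ** 2) * x for j in range(m)]
    db = [s * a[j] * (1.0 - h[j] ** 2) for j in range(m)]
    da = [s * h[j] for j in range(m)]
    return dw + db + da

def empirical_ntk(theta, xs):
    # K = J J^T, where row i of J is the parameter gradient of f(xs[i]).
    J = [jacobian_row(theta, x) for x in xs]
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in J] for ri in J]

def linearized(theta0, dtheta, x):
    # First-order Taylor expansion around theta0: f_0(x) + J(x) . dtheta
    grad = jacobian_row(theta0, x)
    return forward(theta0, x) + sum(g * d for g, d in zip(grad, dtheta))

theta0 = init_params(m=64)
xs = [-1.0, 0.0, 0.5]
K = empirical_ntk(theta0, xs)

# Small, arbitrary perturbation of the parameters.
dtheta = [1e-4 * math.sin(i) for i in range(len(theta0))]
theta1 = [t + d for t, d in zip(theta0, dtheta)]
err = max(abs(forward(theta1, x) - linearized(theta0, dtheta, x)) for x in xs)

print("empirical NTK:", K)
print("max linearization error:", err)
```

The kernel here is the finite-width, empirical one at initialization. The gap between the real network and its linearization should shrink quadratically as d\theta gets smaller, which is the "don't venture too far" condition above.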

In addition to the paper mentioned by @fgabriel, this paper [1] explains it in more detail, and the equations you are looking for are 14, 15, and 16.

[1] https://arxiv.org/abs/1902.06720
