
Are partial derivatives the computational primitives of deep neural networks? - aidanrocke
https://keplerlounge.com/neural-computation/2020/01/26/partial-derivatives.html
======
tl;dr

1. The typical deep neural network tutorial introduces deep networks as
compositions of nonlinearities and affine transforms.

2. In fact, a deep network with ReLU activations simplifies to a linear
combination of affine transformations with compact support. But why would
affine transformations be useful?
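As a minimal sketch of this collapse (random weights, shapes chosen purely for illustration), a tiny two-layer ReLU network really is a single affine map A x + c on any region where the pattern of active units stays fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x0 = rng.normal(size=3)
pattern = W1 @ x0 + b1 > 0           # which units are active at x0
D = np.diag(pattern.astype(float))   # masks out the inactive units

# On the region where `pattern` holds, f collapses to A x + c:
A = W2 @ D @ W1
c = W2 @ D @ b1 + b2
assert np.allclose(f(x0), A @ x0 + c)

# A nearby point with the same activation pattern sees the same affine map.
x1 = x0 + 1e-4 * rng.normal(size=3)
if np.array_equal(W1 @ x1 + b1 > 0, pattern):
    assert np.allclose(f(x1), A @ x1 + c)
```

The partition of the input space into such activation regions is what makes the network piecewise affine overall.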

3. After recent discussions on Twitter, it occurred to me that the reason
they work is that they are actually first-order Taylor approximations of a
suitable analytic function.

4. What is really cool about this is that, by this logic, partial derivatives,
i.e. Jacobians, are computational primitives for both inference and learning.
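A numerical sketch of this view, using a hypothetical smooth function chosen only for illustration: the Jacobian (here estimated by central finite differences) is exactly the linear part of the first-order Taylor approximation, so f(x0 + h) ≈ f(x0) + J h for small h:

```python
import numpy as np

# Illustrative smooth map f : R^2 -> R^2 (stand-in for an analytic target).
def f(x):
    return np.array([np.sin(x[0]) * x[1], np.exp(x[0] - x[1])])

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian: J[i, j] = d f_i / d x_j."""
    n = x.size
    cols = [(f(x + eps * np.eye(n)[j]) - f(x - eps * np.eye(n)[j])) / (2 * eps)
            for j in range(n)]
    return np.column_stack(cols)

x0 = np.array([0.3, -0.5])
J = jacobian(f, x0)

# First-order Taylor approximation: the error is O(|h|^2).
h = np.array([1e-3, -2e-3])
assert np.allclose(f(x0 + h), f(x0) + J @ h, atol=1e-5)
```

In a deep learning framework the same Jacobian would come from automatic differentiation rather than finite differences, but the role it plays, as the affine map that locally stands in for the function, is the same.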

5. I think this also provides insight into how deep networks approximate
functions: they approximate the intrinsic geometry of a relation using
piecewise-linear functions.

This works because a suitable polynomial approximation exists and all
polynomials are locally Lipschitz.
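A small illustration of that last point, using sin as a stand-in analytic function: a piecewise-linear interpolant on a grid of spacing h approximates it with error at most h²/8 · max|f″| (a standard interpolation bound, here max|sin″| = 1), and the error shrinks quadratically as the grid refines:

```python
import numpy as np

# Piecewise-linear approximation of sin on [0, pi], with progressively
# finer grids; the interpolation error obeys the h^2 / 8 bound throughout.
xs = np.linspace(0.0, np.pi, 1000)
for knots in (5, 9, 17):
    grid = np.linspace(0.0, np.pi, knots)
    approx = np.interp(xs, grid, np.sin(grid))  # piecewise-linear interpolant
    err = np.max(np.abs(approx - np.sin(xs)))
    h = grid[1] - grid[0]
    assert err <= h**2 / 8  # since max|sin''| = 1 on [0, pi]
```

A ReLU network plays the same game in higher dimensions: its affine pieces are the linear segments, and the local Lipschitz property of the target is what keeps the piecewise approximation error under control.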

