
Automatic Differentiation via Contour Integration - aidanrocke
https://keplerlounge.com/neural-computation/2020/01/16/complex-auto-diff.html
======
g82918
The more relevant link is [https://keplerlounge.com/neural-
computation/2020/01/16/compl...](https://keplerlounge.com/neural-
computation/2020/01/16/complex-auto-diff.html), where the author discusses why
this repo exists. This isn't a very good way to do AutoDiff and it isn't exact
(look at standard reverse-mode AutoDiff, or at Julia's or Swift's
implementations, instead). The point of the repo is that the author believes it
is a biologically plausible mechanism for training neurons. Whether it is or
isn't is the interesting question.

~~~
dang
Ok, we switched to that from
[https://github.com/AidanRocke/AutoDiff](https://github.com/AidanRocke/AutoDiff).
Thanks!

------
yorwba
The discrete approximation to the integral involves sampling the function at a
finite number of points and then computing a weighted sum of the sampled
function values. In the case where N = 2, θ only takes the values 0 and π, and
the approximation is f'(x₀) ≈ (f(x₀+1) − f(x₀−1))/2, i.e. approximating the
derivative with a simple difference.
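
Concretely, here's a quick sketch of that discretization (my own illustration
in Python/NumPy, not the repo's code); with N = 2 and radius 1 it reduces
exactly to the central difference above:

```python
import numpy as np

def cauchy_derivative(f, x0, radius=1.0, n=2):
    """Estimate f'(x0) by a trapezoid-rule discretization of the Cauchy
    integral formula f'(x0) = (1/2πi) ∮ f(z)/(z - x0)² dz on a circle."""
    theta = 2 * np.pi * np.arange(n) / n            # sample angles on the contour
    samples = f(x0 + radius * np.exp(1j * theta))   # function values on the circle
    # Weighted sum of the samples; the imaginary part is ~0 when f is real on the real axis.
    return (np.mean(samples * np.exp(-1j * theta)) / radius).real

print(cauchy_derivative(np.exp, 0.0, n=2))    # (e - 1/e)/2 ≈ 1.175, the crude 2-point estimate
print(cauchy_derivative(np.exp, 0.0, n=32))   # ≈ 1.0, the exact derivative of exp at 0
```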

So I don't think this is more biologically plausible than neurons using finite
differences to do gradient descent.

~~~
mpoteat
There's evidence that signal differences between neurons in the visual cortex
are used for object recognition, a mechanism whose behavior gives rise to some
of the optical illusions we experience. There's also weak evidence that
temporal differentiation relies on the same underlying mechanism.

------
conistonwater
This is a standard technique in numerical analysis for complex-differentiable
functions; it's just not common for inputs to be guaranteed holomorphic, so
people don't use it often:
[http://mpmath.org/doc/current/calculus/differentiation.html#...](http://mpmath.org/doc/current/calculus/differentiation.html#mpmath.diff)
([https://github.com/fredrik-
johansson/mpmath/blob/981102736a8...](https://github.com/fredrik-
johansson/mpmath/blob/981102736a843ae30c26f73b388fdba622a46270/mpmath/calculus/differentiation.py#L198))
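
If I'm remembering the mpmath API correctly, the contour-integration variant
is exposed through `diff`'s options; the exact keywords below (`method='quad'`,
`radius`) are from memory, so check the linked docs:

```python
from mpmath import mp, diff, exp

mp.dps = 30  # work at 30 significant digits

# Default step-based differentiation:
print(diff(exp, 1))

# Differentiation via numerical integration of Cauchy's formula around the point
# (assumes the 'quad' method and 'radius' option exist as I recall them):
print(diff(exp, 1, method='quad', radius=0.25))
```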

~~~
kkylin
Indeed a standard technique in numerical analysis. Among the advantages of
doing things this way (when applicable, i.e., when one has holomorphic
integrands) are significantly better accuracy & stability:
[https://people.maths.ox.ac.uk/trefethen/sirev56-3_385.pdf](https://people.maths.ox.ac.uk/trefethen/sirev56-3_385.pdf)

(Haven't looked closely at the posted code so no idea how directly relevant
this is.)

------
eigenspace
This isn’t automatic differentiation.

An important feature of auto-diff is that it’s numerically exact. This can be
shown in a few lines to be equivalent to finite differencing.

That doesn’t take away from the fact that it’s a neat technique though!

~~~
kkylin
I quite agree this is not exact and is not automatic differentiation (which
_would_ be exact up to round-off). However, for holomorphic functions, contour
integration can be much more accurate. The reason is that under fairly general
conditions, the trapezoid rule for contour integrals is geometrically
accurate. Thus, using stepsize _h_ gives errors that are O(e^{-c/h}). (See the
Trefethen paper cited in my other comment in this post.) In contrast, standard
centered difference has error O(h^2). Moreover, finite differences are
unstable: because of loss of precision in taking differences, the relative
error tends to blow up as h goes to 0. Contour integration, though more
expensive (because it needs many more function evaluations for the same
stepsize h), does not, because integration is better behaved numerically.
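
A quick numerical sketch of both effects (my own illustration, not from the
posted code): the trapezoid-rule Cauchy estimate converges geometrically as
the number of contour points grows, while shrinking h in a centered difference
eventually makes the error worse because of cancellation.

```python
import numpy as np

def cauchy_diff(f, x0, radius=1.0, n=16):
    # Trapezoid rule for f'(x0) = (1/2πi) ∮ f(z)/(z - x0)² dz on a circle of the given radius.
    theta = 2 * np.pi * np.arange(n) / n
    return (np.mean(f(x0 + radius * np.exp(1j * theta)) * np.exp(-1j * theta)) / radius).real

def central_diff(f, x0, h):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

f, x0, exact = np.exp, 1.0, np.exp(1.0)

for n in (4, 8, 16, 32):                    # error drops roughly like e^{-cn}
    print(f"contour, n = {n:2d}:  error = {abs(cauchy_diff(f, x0, n=n) - exact):.1e}")

for h in (1e-2, 1e-5, 1e-8, 1e-11):         # error first falls like h^2, then blows back up
    print(f"central, h = {h:.0e}: error = {abs(central_diff(f, x0, h) - exact):.1e}")
```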

~~~
eigenspace
The reason for the difference in accuracy is that this method uses N sample
points instead of 2.

Another poster in the thread mentioned this as well.

~~~
kkylin
Yes and no -- of course if one uses more points the result can be more
accurate. The real difference between a center difference and contour
integration is that differences suffer from loss of information in finite-
precision arithmetic. For example, in (f(x+h) - f(x-h))/(2h), the numerator
actually contains very few bits of information when h is small. The Cauchy
integral formula, in contrast, behaves quite well so long as the point of
evaluation is bounded away from the contour.
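
To make the loss of bits concrete, a toy example (my own, not from the
thread's code): with f = exp, x = 1 and h = 1e-8, the two function values
agree in roughly their first 8 significant digits, so the subtraction throws
away about half the precision a double carries.

```python
import numpy as np

x, h = 1.0, 1e-8
a, b = np.exp(x + h), np.exp(x - h)

print(a, b)               # the two values share their leading ~8 significant digits
print((a - b) / (2 * h))  # central-difference estimate; only ~8 digits are trustworthy
print(np.exp(x))          # exact value, e = 2.718281828459045...
```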

------
pixelpoet
Gotta escape those TeX \cos functions. I should probably stop making this
not-so-useful comment, since the mistake is so common. At some point I should
instead go find some open-source code and make it emit a warning; not really
sure where, though...

------
koningrobot
This reminds me of Squire & Trapp[1] which seems to be a special case. That
paper provides a way of estimating Jacobian-vector products (aka forward-mode
autodiff) by adding a tiny imaginary noise vector to the input. The complex
output will have the function value in the real part and the Jacobian-vector
product in the imaginary part.

The cool thing is you can make the noise vector arbitrarily small (up to
machine precision), so it doesn't have the issues that finite differences
have. I'm not sure whether the same is true of the method described in the
article.

[1] Using Complex Variables to Estimate Derivatives of Real Functions,
[https://pdfs.semanticscholar.org/3de7/e8ae217a4214507b9abdac...](https://pdfs.semanticscholar.org/3de7/e8ae217a4214507b9abdac66503f057aaae9.pdf)
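
The complex-step trick is a two-liner; here's a rough sketch of both the
scalar case and the Jacobian-vector-product case (my own illustration, not the
paper's code):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-30):
    """f(x + ih) ≈ f(x) + ih·f'(x), so Im(f(x + ih))/h ≈ f'(x) with no subtractive cancellation."""
    return np.imag(f(x + 1j * h)) / h

def complex_step_jvp(f, x, v, h=1e-30):
    """Jacobian-vector product J(x) @ v for vector-valued f: perturb along v in the imaginary direction."""
    return np.imag(f(x + 1j * h * v)) / h

# h can be absurdly small because nothing is subtracted:
print(complex_step_derivative(np.sin, 1.0))   # ≈ cos(1) = 0.5403023058681398
print(np.cos(1.0))

g = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
print(complex_step_jvp(g, np.array([1.0, 2.0]), np.array([1.0, 0.0])))  # ≈ [2.0, cos(1)]
```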

~~~
abecedarius
There's a more leisurely blog-post-style explanation of this complex-variable
trick at [https://codewords.recurse.com/issues/four/hack-the-
derivativ...](https://codewords.recurse.com/issues/four/hack-the-derivative)

~~~
improbable22
That's neat. It's perhaps worth mentioning that this trick is only needed when
your computer understands complex numbers (1,i) but not dual numbers (1,ε),
where i^2 = -1 but ε^2 = 0.
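
For readers who haven't met dual numbers, a minimal sketch of forward-mode AD
with ε² = 0 (my own illustration); unlike the complex-step version there is no
h at all, so the derivative is exact up to round-off:

```python
from dataclasses import dataclass
import math

@dataclass
class Dual:
    val: float   # the value
    eps: float   # the coefficient of ε, i.e. the derivative carried along

    def __add__(self, other):
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        # (a + bε)(c + dε) = ac + (ad + bc)ε, because ε² = 0
        return Dual(self.val * other.val, self.val * other.eps + self.eps * other.val)

def sin(d):
    return Dual(math.sin(d.val), math.cos(d.val) * d.eps)

# d/dx [x·sin(x)] at x = 1: seed the variable with eps = 1
x = Dual(1.0, 1.0)
print((x * sin(x)).eps)                # sin(1) + cos(1), exactly the product rule
print(math.sin(1.0) + math.cos(1.0))   # same value, computed by hand
```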

------
mehrdadn
> Due to Taylor’s theorem, any differentiable real-valued function may be
> approximated by polynomials so Cauchy’s method is applicable to any function
> that is differentiable.

Sorry, what? Pretty sure this is not Taylor's theorem (and it's false)?

~~~
dan-robertson
Well, Taylor’s theorem is that any sufficiently differentiable function can be
approximated by a particular polynomial plus a particular error term. It just
turns out that the error term doesn’t need to be very well behaved (e.g.
consider the Taylor series at 0 of f(x) = exp(-1/x^2), which is identically
zero, so the "error" is the entire function).

I think instead this is an application of the fundamental theorem of applied
maths, which states, approximately, that in applied mathematics:

- all Taylor series converge

- all functions are piecewise smooth

- all sums and integrals can be transposed

- if the solution must be x if it exists, then the solution exists (and is x)

- if it looks right then it is

