
Hack the derivative - luu
https://codewords.recurse.com/issues/four/hack-the-derivative/
======
jacobolus
Yay for functions which are analytic in some neighborhood. Here’s the original
paper from the 60s about the idea in this post:
[http://www.math.fsu.edu/~okhanmoh/media/Lyness,%20Moler,%20S...](http://www.math.fsu.edu/~okhanmoh/media/Lyness,%20Moler,%20SJNA,%201967,%20Numerical%20differentiation%20of%20analytic%20functions.pdf)

The latest work on this general topic is
[http://arxiv.org/pdf/1404.2463.pdf](http://arxiv.org/pdf/1404.2463.pdf) which
manages to compute extremely accurate high-order derivatives (“...even the
100th derivative of an analytic function can be computed with near machine
precision accuracy using standard floating point arithmetic”!!!), see also
[http://www.chebfun.org/examples/cheb/Turbo.html](http://www.chebfun.org/examples/cheb/Turbo.html)

* * *

Anyhow, for folks trying to take derivatives of (or do various other
operations to) arbitrary continuous functions, I recommend checking out
Chebfun – [http://www.chebfun.org](http://www.chebfun.org) – a Matlab library
created by a group of applied mathematicians at Oxford.

Instead of just looking at a couple points near some point of interest,
Chebfun approximates the continuous function over some interval to near
machine precision by a high-degree polynomial, and then operates on that
polynomial.

Operations like numerical differentiation are much more accurate when you make
the right “global” (i.e. not just at one single point) approximation.
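
Roughly, the fit-then-differentiate idea looks like the sketch below in NumPy
terms (a simplification using the default [-1, 1] domain, not Chebfun itself):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    # Interpolate f at Chebyshev points on [-1, 1], then differentiate
    # the polynomial approximation instead of the function itself.
    coeffs = C.chebinterpolate(np.cos, 50)
    dcoeffs = C.chebder(coeffs)          # exact derivative of the fit
    print(C.chebval(0.3, dcoeffs))       # ~ -sin(0.3)
    print(-np.sin(0.3))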

Also check out Nick Trefethen’s book Approximation Theory and Approximation
Practice, the first 6 chapters of which are available online:
[http://www.chebfun.org/ATAP/atap-first6chapters.pdf](http://www.chebfun.org/ATAP/atap-first6chapters.pdf)

~~~
gjm11
The paper by Lyness and Moler is nice, but if it's really "about the idea in
this post" then the fact is somewhat hidden. It's mostly about using the
Cauchy integral formula to turn derivatives into integrals, using the Poisson
summation formula to relate those to finite sums, and using the Moebius
inversion formula to get those relations into a form from which you can
extract the derivatives.
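
For concreteness, the starting point is Cauchy's formula for the n-th
derivative, which an m-point trapezoid rule on a circle of radius r around x
turns into a finite sum:

    f^(n)(x) = n!/(2 pi i) * integral over |z - x| = r of f(z)/(z - x)^(n+1) dz
             ~ (n!/(m r^n)) * sum_{j=0}^{m-1} f(x + r e^{2 pi i j/m}) e^{-2 pi i j n/m}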

Maybe taking a very short partial sum of one of the series in Theorem 2 gives
you the result in the blog post here, but that seems rather sledgehammer-to-
crack-a-nut-ish when (as the blog post says) all you need is the Cauchy-
Riemann equations.

I think it's true that the Lyness-Moler idea of "numerical differentiation by
numerical complex integration" was, historically, part of the chain of ideas
that led to the very simple "complex-step approximation" discussed in the blog
post. But that's a far cry from saying that their paper is "about the idea in
this post"!

There's some discussion of the history here:
[http://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/](http://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/)
where Moler (as in "Lyness and Moler") says "The complex
step algorithm that I'm about to describe is much simpler than anything
described in that 1967 paper. So, although we have some claim of precedence
for the idea of using complex arithmetic on real functions, we certainly
cannot claim to have invented today's algorithm."

And yes, Chebfun is very nice.

~~~
jacobolus
Fair enough. :-)

------
jamessb
The article comments: "In a lot of respects, it’s quite amazing how accurate
many calculations can be made with floating point numbers, like orbital
mechanics and heat equations"

I'm not sure it is so surprising. In Nick Trefethen's _Numerical Analysis_
entry in _The Princeton Companion to Mathematics_, he notes:

"Thus, on a computer, the interval [1 , 2] , for example, is approximated by
about 10^16 numbers. It is interesting to compare the fineness of this
discretization with that of the discretizations of physics. In a handful of
solid or liquid or a balloonful of gas, the number of atoms or molecules in a
line from one point to another is on the order of 10^8 (the cube root of
Avogadro’s number). Such a system behaves enough like a continuum to justify
our definitions of physical quantities such as density, pressure, stress,
strain, and temperature. Computer arithmetic, however, is more than a million
times finer than this. Another comparison with physics concerns the precision
to which fundamental constants are known, such as (roughly) 4 digits for the
gravitational constant G, 7 digits for Planck’s constant h and the elementary
charge e, and 12 digits for the ratio μ_e/μ_B of the magnetic moment of the
electron to the Bohr magneton. At present, almost nothing in physics is known
to more than 12 or 13 digits of accuracy. Thus IEEE numbers are orders of
magnitude more precise than any number in science. (Of course, purely
mathematical quantities like π are another matter.)"
[http://people.maths.ox.ac.uk/trefethen/NAessay.pdf](http://people.maths.ox.ac.uk/trefethen/NAessay.pdf)

~~~
eriktaubeneck
That's an interesting point I had not considered. My perspective (mostly from
the math side; my shallow dive into physical systems was just a PDE class in
grad school) comes from unstable differential equations, where very small
changes to initial conditions can cause massive changes in the output of the
system. My uninformed instinct is that even with orders of magnitude more
digits than atoms in a physical system, perturbations of the initial
conditions smaller than the available precision could still have a
significant impact on the eventual state of the system.
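
A toy illustration of that instinct (my own sketch, not from the article):
iterate the logistic map in its chaotic regime from two starting points that
differ by roughly one ulp.

    # A perturbation near machine epsilon grows until the two
    # trajectories disagree completely.
    x, y = 0.4, 0.4 + 1e-15      # initial conditions differ by ~1 ulp
    for _ in range(60):
        x = 3.9 * x * (1 - x)    # logistic map, chaotic at r = 3.9
        y = 3.9 * y * (1 - y)
    print(abs(x - y))            # O(1) after a few dozen iterations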

~~~
replicant
Another point of view on why it is not striking that we can model heat
conduction well: the heat equation is well-posed, which, among other things,
means that the solution depends continuously on the input data. We observe
this in many physical systems, where small changes in the cause produce small
changes in the effect, so we should expect good mathematical models of
reality to display this property.

Now, orbital mechanics does display unstable behaviour; I won't venture a
guess as to how people work around this.
[https://en.wikipedia.org/wiki/Well-posed_problem](https://en.wikipedia.org/wiki/Well-posed_problem)

------
jjoonathan
I don't see any mention of automatic differentiation which I thought was the
dominant technique in this area. Is there an advantage to the method described
in OP (other than mathematical cuteness)?

~~~
lloda
No. Automatic differentiation is superior to the complex step method in every
way.

The only reason to use the complex step method is if you're using legacy
languages where it's difficult to implement dual numbers but you have good
support for complex numbers. I don't think anybody should be using the complex
step method in new applications.

Some reading: [http://aero-comlab.stanford.edu/Papers/martins.aiaa.01-0921.pdf](http://aero-comlab.stanford.edu/Papers/martins.aiaa.01-0921.pdf)

~~~
jordigh
wtf

Automatic differentiation only works for the simplest functions for which you
already know what the Taylor series looks like. For those cases, you might as
well just hardcode derivative functions and the basic derivative rules
(linearity and chain rule). It is not a general-purpose method.

For functions that you can't even express by a simple formula, you still have
to rely on finite differencing.

Don't call "legacy" anything that you don't understand.

~~~
lloda
The question was about automatic differentiation vs the method in the article
(the complex step method) which is essentially a defective variant of
automatic differentiation with a spurious numerical parameter.

Automatic differentiation works for any composition of those ‘simplest
functions’ you mention, which is quite a lot of stuff, including whole
programs.
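
For concreteness, here is a minimal forward-mode sketch in Python (my own
illustration; the class and the single sin rule are made up for the example):

    import math

    class Dual:
        # carries (value, derivative) through every operation
        def __init__(self, val, der=0.0):
            self.val, self.der = val, der
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.der + other.der)
        __radd__ = __add__
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.der * other.val + self.val * other.der)
        __rmul__ = __mul__

    def sin(x):
        # each 'simplest function' gets one rule; compositions follow
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)

    def f(x):
        return x * x + sin(x)

    print(f(Dual(1.5, 1.0)).der)    # exact: 2*1.5 + cos(1.5)
    print(2 * 1.5 + math.cos(1.5))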

Approximation methods have their place, sure. Sometimes they're good enough
and sometimes it's all you can do. What does that have to do with anything?

~~~
jjoonathan
Thanks for the link, btw. I'm in the process of digesting the paper.

------
srean
Had the following thought just to entertain myself. I don't expect it to have
any utility in practice, but here goes:

Take a small interval of your smooth function that includes the point where
you want to compute the derivative. Join the right and left ends by
reflecting the function, so that you have a differentiable periodic function.
Now sample that at a suitable rate, take its FFT, and derive the FFT of the
derivative by multiplying the k-th coefficient by 2\pi i k. Take the inverse
FFT and evaluate it at the point of interest. This may be useful if you want
to compute the derivative at many points, but seems wasteful otherwise.
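
Ignoring the reflection step, the FFT part is a few lines of NumPy (a rough
sketch; on a 2\pi-periodic grid the factor reduces to i k):

    import numpy as np

    # Spectral differentiation: sample a 2*pi-periodic function,
    # scale the k-th Fourier coefficient by i*k, transform back.
    N = 64
    x = 2 * np.pi * np.arange(N) / N
    f = np.sin(x)
    k = np.fft.fftfreq(N, d=1.0 / N)         # integer wavenumbers
    df = np.fft.ifft(1j * k * np.fft.fft(f)).real
    print(np.max(np.abs(df - np.cos(x))))    # ~machine precision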

One of course has to dot the i's and cross the t's (aliasing effects,
Nyquist limits, etc.). Perhaps some other _fast_ integral transform would work
even better. I will be surprised if there isn't an old industry around it.

~~~
cabinpark
This is called a spectral method and is widely used in numerical codes.

I highly recommend Spectral Methods in MATLAB by Trefethen (who someone
mentions above) for a very good tutorial. You can freely ignore the MATLAB
part and use whatever programming language you want, as long as you have an
FFT routine.

~~~
srean
Thanks for pointing me in the right direction in case I want to scratch that
itch further. I am not surprised that this has been done; I expected as much.
From a quick look, no one seems to suggest reflecting the original analytic
function to get a periodic function, so I guess some caveats lurk there.

Do other integral transforms work better than Fourier?

~~~
cabinpark
When you use this method you are implicitly making the function periodic. I
can give you any function on some interval (sufficiently well behaved) and you
can compute the Fourier series of it. Even though it's only defined on the
interval, if you plotted the Fourier series you would still find it to be
periodic. The same idea carries over to the spectral derivative. Even if the
function isn't periodic, the method still works, it's just that the numerics
near the discontinuity are going to be crap and can affect things further
away. But if you are sufficiently far away, things are kosher. Note there are
ways of dealing with this using Chebyshev interpolation (see
[http://math.mit.edu/~stevenj/fft-deriv.pdf](http://math.mit.edu/~stevenj/fft-deriv.pdf)
section 6), which Trefethen's book also discusses.

And no, FFTs are the only transformations I am aware of. I've never heard of
anyone using other transformations in a general numerical context, outside of
specialized problems.

~~~
srean
Given the local nature of the problem and the potential for using O(n)
operations (as opposed to O(n log n)), my first suspect was wavelet
transforms, and sure enough people have done that, but that body of work
seems more recent than the use of the FFT.

------
kmill
I was trying to figure out how this works algebraically, and I think I
understand it for polynomials (or power series) at least. Basically, for a
really small number a, the quantity ai has the property that (ai)^2 = -a^2 is
effectively zero, so one can think of this as working with the dual numbers
themselves (i.e., throwing out the x^2 term and onward in a power series).
The focus on analytic
functions is then because they are representable by power series at each
point. I think this is what lloda is referring to.
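
Spelling the power series out (my own restatement):

    f(x + ai) = f(x) + f'(x) ai + f''(x) (ai)^2 / 2 + ...
              = [f(x) - f''(x) a^2 / 2 + ...]
                + i [f'(x) a - f'''(x) a^3 / 6 + ...]

    =>  Im(f(x + ai)) / a = f'(x) + O(a^2)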

------
edoloughlin
Lost me a bit at

    
    
      Im(f(x+ih))/h
    

near the end (that's a fancy I that android FF won't paste). Can anyone
explain where 'm' came from?

Or is 'Im' just a fn returning the imaginary part of its argument?

EDIT: reading the code following, it's clear that Im is just that.

~~~
mbroshi
I was also lost at that part, although for a different reason. The function f
was assumed to be from R to R, so it does not make formal sense to evaluate
f(x+ih).

I guess the author has the unstated assumption that f is the restriction to R
of a function on C.

~~~
eriktaubeneck
Great point, and I apologize for the confusion. The function should be from C
to C, with the added restriction that it maps R to R.

------
cheez
This was so much fun to read.

~~~
eriktaubeneck
Thanks! Glad you enjoyed it.

------
eccstartup
I DO think Python libraries can do this already.

~~~
jamessb
The scipy.misc.derivative() function uses a central difference formula, rather
than exploiting the fact that a function is analytic:
[https://docs.scipy.org/doc/scipy/reference/generated/scipy.m...](https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html)

Applying the Cauchy–Riemann equations to essentially take finite differences
in the _imaginary_ direction is an interesting trick I hadn't seen before, so
I appreciated an article introducing it.

Edit: A separate python package, Numdifftools, does seem to support complex-
step differentiation
[https://pypi.python.org/pypi/Numdifftools](https://pypi.python.org/pypi/Numdifftools)
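
For anyone who wants to try the trick without a library, it is only a couple
of lines (my own sketch; the step size 1e-20 is arbitrary):

    import numpy as np

    def dfdx(f, x, h=1e-20):
        # complex-step: Im(f(x + ih))/h involves no subtraction of
        # nearly equal numbers, so h can sit far below machine epsilon
        return (f(x + 1j * h)).imag / h

    print(dfdx(np.sin, 1.0))    # matches np.cos(1.0)...
    print(np.cos(1.0))          # ...to full double precision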

