
Implementing the Exponential Function - hackernewsn00b
https://www.pseudorandom.com/implementing-exp
======
kens
A couple other ways to compute the exponential function: The Intel 8087
coprocessor used CORDIC. The chip contained a ROM with the values of
log2(1+2^-n) for various n. These constants allowed the exponential to be
rapidly computed with shifts and adds.
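
A sketch of how such a table drives a shift-and-add exponential (an
illustration of the idea in Python, certainly not the 8087's actual
algorithm in detail):

    import math

    # Table of log2(1 + 2**-n), the constants kept in ROM.
    TABLE = [(n, math.log2(1.0 + 2.0 ** -n)) for n in range(1, 33)]

    def exp2_shift_add(x):
        # Greedily decompose x (assumed in [0, 1)) into a sum of table
        # entries; each entry taken contributes a factor (1 + 2**-n),
        # which in hardware is just a shift and an add.
        result = 1.0
        for n, t in TABLE:
            if t <= x:
                x -= t
                result *= 1.0 + 2.0 ** -n
        return result

    print(exp2_shift_add(0.5), 2.0 ** 0.5)  # agree to ~9 digits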

The Sinclair Scientific calculator was notable for cramming transcendental
functions into a chip designed as a 4-function calculator, so they took some
severe shortcuts. Exponentials were based on computing 0.99^n through repeated
multiplication. Multiplying by 0.99 is super-cheap in BCD since you shift by
two digits and subtract. To compute 10^x, it computes 0.99^(-229.15*x). Slow
and inaccurate, but easy to implement.
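
A sketch of that scheme with ordinary floats (illustrative only; the real
chip worked digit by digit in BCD):

    def sinclair_pow10(x):
        # 10**x ~ 0.99**(-229.15 * x), computed by repeated multiplication.
        # On the calculator, each multiply by 0.99 was a two-digit shift
        # and a subtract.
        n = round(-229.15 * x)
        step = 0.99 if n >= 0 else 1.0 / 0.99
        result = 1.0
        for _ in range(abs(n)):
            result *= step
        return result

    print(sinclair_pow10(1.0))  # ~9.99 -- slow and inaccurate, as advertised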

~~~
Stratoscope
I was going to say there is a great article about the Sinclair Scientific
here:

[http://files.righto.com/calculator/sinclair_scientific_simul...](http://files.righto.com/calculator/sinclair_scientific_simulator.html)

And then I noticed who I'm replying to...

Despite its flaws, this calculator was quite an achievement, doing everything
it did with a ROM that holds only 320 instructions.

~~~
vardump
That calculator is a hack in the most positive and brilliant sense of the
word. Crazy constraints required crazy solutions. Amazing!

------
benrbray
Great post! The author touches on Chebychev polynomials, which are the basis
for quite a few numerical analysis tricks, including conjugate gradient [1],
accelerated gradient descent [2], and Chebychev semi-iterative methods (which
find the best combination of past iterates to use during an optimization
procedure; sadly I can't find a good reference).

There are a number of facts/folklore about Chebychev polynomials that tend to
be stated without proof in a lot of papers, so a few years ago I wrote up some
notes [3,4] covering the most common claims. Maybe someone will find them
useful!

[1] (p35) [http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradi...](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf)

[2] [http://blog.mrtz.org/2013/09/07/the-zen-of-gradient-descent....](http://blog.mrtz.org/2013/09/07/the-zen-of-gradient-descent.html)

[3] [https://benrbray.com/static/notes/chebychev-polynomials_dec1...](https://benrbray.com/static/notes/chebychev-polynomials_dec16.pdf)

[4] [https://benrbray.com/static/notes/minimax-approx_nov16.pdf](https://benrbray.com/static/notes/minimax-approx_nov16.pdf)

~~~
fluffything
And also quite useful in a wide range of numerical methods for PDEs like
discontinuous Galerkin (and spectral methods in general).

------
dvt
This is an awesome article! One of my favorite SO answers I remember
researching dealt with the Padé approximation of tanh[1] (which I found was
significantly better than the Taylor expansion), but the caveat was that it
only worked within a very narrow neighborhood.
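
For reference, a low-order Padé approximant of tanh looks like this (the
[3/2] approximant about 0, shown for illustration; the linked answer's exact
variant may differ):

    import math

    def tanh_pade(x):
        # [3/2] Pade approximant: tanh(x) ~ x * (15 + x**2) / (15 + 6 * x**2).
        # It matches the Taylor series through x**5, hence the "narrow
        # neighborhood" caveat: accuracy degrades quickly for larger |x|.
        return x * (15.0 + x * x) / (15.0 + 6.0 * x * x)

    print(tanh_pade(0.5), math.tanh(0.5))  # 0.46212... vs 0.46211...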

I will say that the article didn't really touch on techniques that minimize
IEEE 754 subnormal[2] performance impact, which is a very interesting problem
in itself. A lot of real-life implementations of something like e^x will have
various clever ways to avoid subnormals.

[1]
[https://stackoverflow.com/a/6118100/243613](https://stackoverflow.com/a/6118100/243613)

[2]
[https://mathworld.wolfram.com/SubnormalNumber.html](https://mathworld.wolfram.com/SubnormalNumber.html)

~~~
fractionalhare
Hi, I am the author. You make a good point about subnormals, thank you. I will
jot that down for a future update to the doc :)

It is funny you mention the Padé approximation; I was debating whether or not
to include that in this article. Originally I planned to stop after Minimax
(using Remez), but if I include rational approximation schemes anyway (e.g.
Carathéodory-Fejer), it probably makes sense to implement and analyze that as
well.

~~~
dia80
Just jumping in to say I'm loving the article, thanks! I'm dealing with a lot
of exps in an optimization problem right now and frankly have only a cursory
understanding of what's going on under the hood, so this was very enlightening
for me. Thanks again.

------
dietrichepp
I recently implemented the exponential function for a sound synthesizer
toolkit I’m working on. The method I used was not mentioned in the article, so
I’ll explain it here.

I used the Remez algorithm, modified to minimize equivalent input error. This
algorithm lets you find a polynomial with the smallest maximum error. You
start with a set of X coordinates, and create a polynomial which oscillates up
and down around the target function, so p(x0) = f(x0) + E, p(x1) = f(x1) - E,
p(x2) = f(x2) + E, etc.

You then iterate by replacing the reference set with the x values where the
error reaches its extrema. This converges to the polynomial with the smallest
maximum error.

From my experiments, you can use a fairly low order approximation for musical
applications. Used to calculate frequencies of musical notes, 2nd order is
within 3.0 cents, and 3rd order is within 0.13 cents.
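
A rough sketch of that iteration (my illustration, not dietrichepp's code; it
assumes the error keeps exactly n+2 alternation regions, which holds for
friendly targets like exp on a short interval, and skips the safeguards a
production implementation would need):

    import numpy as np

    def remez(f, degree, a, b, iters=8):
        n = degree
        k = np.arange(n + 2)
        # Initial reference set: Chebyshev points mapped to [a, b].
        x = 0.5 * (a + b) - 0.5 * (b - a) * np.cos(np.pi * k / (n + 1))
        for _ in range(iters):
            # Solve for coefficients c[0..n] and the levelled error E:
            #   sum_j c[j] * x_i**j + (-1)**i * E = f(x_i)
            A = np.hstack([np.vander(x, n + 1, increasing=True),
                           ((-1.0) ** k)[:, None]])
            *c, E = np.linalg.solve(A, f(x))
            # Exchange step: between consecutive sign changes of the error,
            # move the reference point to the largest |error| in that region.
            grid = np.linspace(a, b, 4001)
            err = np.polyval(c[::-1], grid) - f(grid)
            cuts = np.flatnonzero(np.diff(np.sign(err)) != 0) + 1
            x = np.sort([grid[seg[np.argmax(np.abs(err[seg]))]]
                         for seg in np.split(np.arange(grid.size), cuts)])
        return np.array(c), abs(E)

    coeffs, max_err = remez(np.exp, 3, 0.0, 1.0)
    print(coeffs, max_err)  # degree-3 minimax coefficients and levelled error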

~~~
jbay808
I wouldn't recommend doing this directly -- there's no good polynomial that
can represent the exponential function well over a wide range.

Instead, it's better to exploit the IEEE 754 representation of a float as an
exponent and a mantissa. You calculate the exponent directly (because e^x =
2^(x/ln2)), and use the Remez algorithm to find a polynomial that fits just
the mantissa.

~~~
dietrichepp
Honestly, reading this comment made me a bit unhappy. Don't mansplain to me
the things that I've done. The Remez algorithm requires that you specify the
domain. If you tried to use it to approximate the exponential function over
its entire domain the algorithm would diverge.

You have to explicitly choose a range [x1,x2] for the Remez algorithm. It
minimizes the maximum error over that range.

I'm ordinarily happy to share the work I'm doing but this comment reminds me
of the reasons I don't share much with HN.

~~~
jbay808
I'm writing this because I've done it too, for a performance-critical embedded
application, and I think it's an insight worth sharing, for anybody else who
comes across this and is trying to do something similar. Not to put you down
or one-up you or anything; I'm very sorry to have come across that way. Can I
try again?

For a domain like [0,1], the Remez algorithm will do a great job -- because
the exponent is barely changing, just the mantissa. For a range like [-10,
10], you won't find any polynomial of reasonable order that has acceptable
error.

But, there's a really neat trick that _does_ let the Remez algorithm work over
the entire domain of floats, and get excellent performance with just a fourth
or fifth order polynomial, or even third order if you're pressed for time, and
minimal extra computation.

That is, with just a little simplification, you multiply x by (1/ln(2)), floor
it, and stuff the resulting integer into the exponent bits. Then, you take the
remainder (what is left over after taking the floor), and make _that_ your
input to the polynomial, and stuff the result into the mantissa bits.

There's a little more to it; signs and NaNs need to be handled correctly,
subnormal numbers can be treated as a special case with a different Remez
polynomial, and a few other corner cases and exact values might be important
(e^0, e^1, e^-1). But the result works wonders and it's basically just using
Remez but transforming the input to a domain that behaves more like a
polynomial, leveraging the magic of IEEE-754.
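
A sketch of that recipe (my paraphrase, with quick least-squares coefficients
standing in for properly generated Remez ones, and none of the
sign/NaN/overflow/subnormal handling described above):

    import numpy as np

    # Illustrative polynomial for 2**f on [0, 1); a real implementation
    # would use minimax (Remez) coefficients here instead.
    _GRID = np.linspace(0.0, 1.0, 256)
    _COEF = np.polyfit(_GRID, np.exp2(_GRID), 4)

    def fast_exp(x):
        x = np.asarray(x, dtype=np.float64)
        t = x / np.log(2.0)        # e**x = 2**(x / ln 2)
        n = np.floor(t)            # integer part -> exponent bits
        f = t - n                  # fractional part in [0, 1) -> polynomial
        # Build 2.0**n directly: bias the exponent (1023 for binary64) and
        # shift it into the exponent field above the 52 mantissa bits.
        scale = ((n.astype(np.int64) + 1023) << 52).view(np.float64)
        return scale * np.polyval(_COEF, f)

    print(fast_exp(3.7), np.exp(3.7))  # agree to roughly 5 significant digits

Multiplying by the bit-built 2**n is equivalent to stuffing the exponent
field directly, since the polynomial factor stays within a single binade.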

~~~
fractionalhare
This is a good explanation of range reduction with respect to floating point.
I use a similar (but not quite identical) technique in the article; I may
expand it with more information about what's happening with the exponent and
mantissa bits, like you did here.

That reminds me, I'm using the domain [-1, 1] for range reduction. But since
e^(-x) = 1/e^x, I can actually use [0, 1]. Thank you!

~~~
dietrichepp
Depending on the particular architecture you may prefer [-0.5, +0.5].

------
nimish
Need to be careful about confusing smooth (infinitely differentiable) and
analytic (equal to its Taylor series) functions.

> Any function with this property can be uniquely represented as a Taylor
> series expansion “centered” at a

It's fairly easy to construct tame looking functions where this isn't true.
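
The standard counterexample, for anyone who hasn't seen it:

    $$ f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0 \end{cases} $$

This f is smooth everywhere, but every derivative at 0 vanishes, so its Taylor
series at 0 is identically zero and agrees with f only at the origin.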

~~~
bollu
Blessedly, in the complex analytic setting, these coincide, do they not?
[Attempting to sanity-check my own understanding here]

~~~
contravariant
In fact it is enough for functions to be complex differentiable once (in which
case they'll automatically be differentiable infinitely often).

~~~
ithinkso
It's one of those theorems that still boggles my mind even though I've known
it for years. Either complex functions are that much better behaved, or
complex differentiability is that much stronger a condition; I can't decide
which. Top it off with the uniqueness of analytic continuation and you start
to wonder what makes real functions such a pain.

If anyone knows some nice articles about this topic I would love to read them

~~~
joppy
One way of understanding why complex differentiability is so strong is looking
at a complex-to-complex function as a real function of two real inputs and two
real outputs. The fact that h rather than |h| appears in the denominator of
the complex derivative causes the derivative to be “aware” of the rotational
nature of complex functions: this turns into a differential equation which
must be satisfied by the real function (the Cauchy-Riemann equations).
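
For reference, writing f(x + iy) = u(x, y) + iv(x, y), those equations are:

    $$ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} $$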

------
m4r35n357
[Aside on Taylor Series] The section on recurrence relations is the basis of a
little-known ODE solving technique sometimes known as the Taylor Series Method
(occasionally Differential Transform). It allows an "extension" of Euler's
Method to arbitrary order (using arbitrary precision arithmetic), by
_calculating_ the higher derivatives rather than approximating them as in RK4
and friends.

Here is an open access link to just one of many papers on the subject:
[https://projecteuclid.org/euclid.em/1120145574](https://projecteuclid.org/euclid.em/1120145574)

It includes a link to the actual software they used.

[EDIT] Obviously applying the technique above to dx/dt = x reproduces the
Taylor Series in the article!
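
A minimal sketch of one step for dx/dt = x (my illustration, not the paper's
code):

    def taylor_step(x0, h, order=16):
        # For dx/dt = x, the normalized Taylor coefficients of the solution
        # satisfy c[k+1] = c[k] / (k + 1) -- the "calculated" derivatives.
        c = [x0]
        for k in range(order):
            c.append(c[k] / (k + 1))
        # Evaluate the Taylor polynomial at t = h by Horner's rule.
        x = 0.0
        for ck in reversed(c):
            x = x * h + ck
        return x

    print(taylor_step(1.0, 1.0))  # one unit step from x(0) = 1 gives ~e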

------
philzook
Really interesting article!

Some other resources you may find interesting

[https://fpbench.org/index.html](https://fpbench.org/index.html) - I think I saw a talk at FPTalks that felt related to your topic, "Creating correctly rounded math libraries for real number approximations"

[https://fpbench.org/community.html](https://fpbench.org/community.html)

[https://hal-ens-lyon.archives-ouvertes.fr/ensl-01529804](https://hal-ens-lyon.archives-ouvertes.fr/ensl-01529804) - Correctly Rounded libm

[https://www.mpfr.org/](https://www.mpfr.org/) - GNU multiple-precision floating point

[http://sollya.gforge.inria.fr/](http://sollya.gforge.inria.fr/) - A tool for floating point code development

------
nullc
A collaborator and I recently had fun code golfing integer implementations of
floor(106*log2(x)) and floor(log2(x!)):

[https://github.com/sipa/minisketch/blob/master/src/false_pos...](https://github.com/sipa/minisketch/blob/master/src/false_positives.h#L20)

------
nayuki
Excellent article; it's very detailed in the mathematical explanations and
code.

I was interested in the exponential function too, but approached it from the
standpoint of arbitrary-precision arithmetic. My explanation is an order of
magnitude shorter, but of course my code is a lot slower than a typical
floating-point algorithm. I offer a way to compute correctly rounded
exponentials without resorting to a big math package like
Mathematica/Maple/etc. [https://www.nayuki.io/page/approximating-eulers-number-corre...](https://www.nayuki.io/page/approximating-eulers-number-correctly)

------
defaultcompany
"We can see the values are relatively densely clustered near 0 and
increasingly sparse the further you move away from the origin. As another
example, half of all 32-bit floating point numbers reside in the real interval
[-1,1]. "

I knew this in general but this specific statistic was eye-opening to me.
Systems like OpenGL normalize coordinates so they only use values in this
[-1,1] range. That means they effectively lose half the expressive power of
the 32 bits. Are there other machine representations of real numbers which
would be better for that use case? i.e. restricted to that range and evenly
distributed?
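
One can sanity-check that statistic by counting binary32 bit patterns (a
back-of-envelope sketch, not from the parent comment):

    # binary32 layout: 1 sign bit | 8 exponent bits | 23 mantissa bits.
    # A finite float has magnitude < 1 exactly when its biased exponent
    # field is <= 126 (field 0 covers the subnormals).
    MANT = 2 ** 23
    finite_per_sign = 255 * MANT           # exponent fields 0..254
    below_one_per_sign = 127 * MANT        # exponent fields 0..126
    in_unit = 2 * below_one_per_sign + 2   # both signs, plus +1.0 and -1.0
    print(in_unit / (2 * finite_per_sign))  # ~0.498, i.e. about half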

~~~
mota7
'half' is ambiguous here: they lose 1 bit of expressiveness in the mantissa,
and 1 bit in the exponent, giving an efficiency loss of 2 bits out of 32 ==
~6%.

So there's very little loss in using single-precision floating point, and a
lot of gains in smoothly handling larger numbers that arise from addition et
al.

edit: I'm half wrong. The bit in the mantissa isn't wasted; the only bit
that's wasted is the sign bit in the exponent, so the efficiency loss is ~3%.

~~~
defaultcompany
That agrees with my intuition that losing one bit halves the number of
representable values. I probably don't understand what's meant by efficiency
loss, but my point is that a different (theoretical, non-floating-point)
encoding could represent twice as many values within [-1,1] for the same 32
bits. For some applications that seems like it would be a win.

------
olliej
Ugh, I had to implement all the transcendental functions a while ago. It was
miserable, especially dealing with intermediate rounding, etc.

I don’t recommend it.

That said, I liked that there's a dedicated exp(x)-1 function (otherwise you
lose most of the precision for small arguments).
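
A quick demonstration of why (a minimal example, not from the original
comment):

    import math

    x = 1e-12
    print(math.exp(x) - 1.0)  # ~1.0000889e-12: cancellation eats the digits
    print(math.expm1(x))      # ~1.0000000000005e-12: correct to full precision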

~~~
mjcohen
To go with that, you need ln(1+x). HP calculators had both.

~~~
jjgreen
As has C (since C99)

[http://www.cplusplus.com/reference/cmath/log1p/](http://www.cplusplus.com/reference/cmath/log1p/)

------
xelxebar
Enjoying the article. Looks like there's a math typo at the tail of the
"Precision Bound" section:

    $$ \log\left(\frac{x e}{n}\right) \leq \frac{1}{n}\log(\eta) $$

is not equivalent to

    $$ \frac{xe}{n} \leq \eta^{-n} $$

The latter should be

    $$ \frac{xe}{n} \leq \eta^{1/n} $$

and the "$x^{-n}$" in the following paragraph should be changed accordingly.

It's an understandable flub, though. Working with logarithms, it's easy to
mix up the minus-sign and 1/n conversions.

~~~
fractionalhare
Nice catch, thank you. I'll edit that and credit your comments in the errata
:)

------
chillee
There are two cool things related to floating point error that I learned recently.

First, `x + x + x + x + x == 5*x` holds exactly for every float x; the
analogous identity holds for up to five summands, but fails for six. Proving
this is a fun exercise, though a little painful, with a lot of cases.

Second, SMT solvers can actually prove facts like this relatively easily! In
Z3py, one can check it in 4 lines of code:

    from z3 import *
    x = FP('x', FPSort(8, 24))
    set_default_rounding_mode(RNE())
    solve(5*x != x + x + x + x + x)

------
nwallin
IMHO it's a bit cheeky to use exp2 from the standard library; it likely
involves all the same math exp does. Fortunately, if you know your exponent
is an integer, there's a constant-time version: just 5 instructions (add,
and, shift, mov, ret).

[https://godbolt.org/z/KNyoVd](https://godbolt.org/z/KNyoVd)
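
The same trick is easy to write in a high-level language (a sketch of the bit
construction for doubles, assuming a normal-range integer n; not the linked
assembly):

    import struct

    def exp2_int(n):
        # Build 2.0**n directly from IEEE-754 binary64 bits: bias the
        # exponent by 1023 and shift it above the 52 mantissa bits.
        # Valid for normal results only, i.e. -1022 <= n <= 1023.
        return struct.unpack('<d', struct.pack('<q', (n + 1023) << 52))[0]

    print(exp2_int(10), exp2_int(-3))  # 1024.0 0.125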

~~~
fractionalhare
Excellent, thank you! I agree, I had the use of `exp2(x)` on my list of
implementation nits to revisit. I'm going to revise to use your method and
credit the comment.

------
widdma
Nice article, thanks. For anyone interested in the topic, I highly recommend
Trefethen's Approximation Theory and Approximation Practice. It's
approachable, intuitive, and fun while still covering a lot of technical
detail.

~~~
fractionalhare
I have that in the further reading section :) I personally consider it one of
the best textbooks on numerical analysis.

I only wish Trefethen had written it with general C/C++ instead of his own
custom MATLAB library, chebfun. One of these days I'd like to get around to
implementing chebfun in C++.

------
riverlong
Great read and stunning web design (apart from the small italics in the
abstract, which are a little hard to read). A beautiful first post, certainly.

------
emmanueloga_
A few miscellaneous meta nitpicks:

* The article looks beautiful, but am I the only one who hates left-aligned content? Wide monitors are the norm these days, and reading left-aligned content literally hurts my neck (I have to resize the window manually to roughly center the content).

* Why would someone make the effort to write such a detailed analysis and conceal their identity? I just don't get the objective of not providing any sort of provenance... just a gripe of mine.

------
sakex
I was looking for exactly this today, thanks

------
BernardTatin
Very good article! Thanks!

------
nxpnsv
This is a great read!

