Implementing the Exponential Function 215 points by hackernewsn00b on June 28, 2020 | hide | past | favorite | 59 comments

 A couple other ways to compute the exponential function: The Intel 8087 coprocessor used CORDIC. The chip contained a ROM with the values of log2(1+2^-n) for various n. These constants allowed the exponential to be rapidly computed with shifts and adds.The Sinclair Scientific calculator was notable for cramming transcendental functions into a chip designed as a 4-function calculator, so they took some severe shortcuts. Exponentials were based on computing 0.99^n through repeated multiplication. Multiplying by 0.99 is super-cheap in BCD since you shift by two digits and subtract. To compute 10^x, it computes .99^(-229.15*x). Slow and inaccurate, but easy to implement.
 I was going to say there is a great article about the Sinclair Scientific here:http://files.righto.com/calculator/sinclair_scientific_simul...And then I noticed who I'm replying to...Despite its flaws, this calculator was quite an achievement, doing everything it did with a ROM that holds only 320 instructions.
 That calculator is a hack in the most positive and brilliant sense of the word. Crazy constraints required crazy solutions. Amazing!
 Yes, I like that article :-)
 Great post! The author touches on Chebychev polynomials, which are the basis for quite a few numerical analysis tricks, including conjugate gradient [1], accelerated gradient descent [2], and Chebychev semi-iterative methods (which find the best combination of past iterates to use during an optimization procedure; sadly I can't find a good reference).There are a number of facts/folklore about Chebychev polynomials that seem to go stated without proof in a lot of papers, so a few years ago I wrote up some notes [3,4] with the most common claims. Maybe someone will find them useful!
 And also quite useful in a wide range of numerical methods for PDEs like discontinuous Galerkin (and spectral methods in general).
 This is an awesome article! One of my favorite SO answers I remember researching dealt with the Padé approximation of tanh[1] (which I found was significantly better than the Taylor expansion), but the caveat was that it only worked within a very narrow neighborhood.I will say that the article didn't really touch on techniques that minimize IEEE 754 subnormal[2] performance impact, which is a very interesting problem in itself. A lot of real-life implementations of something like e^x will have various clever ways to avoid subnormals.
 Hi, I am the author. You make a good point about subnormals, thank you. I will jot that down for a future update to the doc :)It is funny you mention the Padé approximation; I was debating whether or not to include that in this article. Originally I planned to stop after Minimax (using Remez), but if I include rational approximation schemes anyway (e.g. Carathéodory-Fejer), it probably makes sense to implement and analyze that as well.
 Just jumping into to say I am loving the article, thanks! I'm dealing with a lot of exps in an optimization problem right now and frankly only have cursory understanding of what's going on under the hood so this was very enlightening for me. Thanks again.
 You might enjoy a few of my favorite sigmoids: https://raphlinus.github.io/audio/2018/09/05/sigmoid.htmlThat's not a deep exploration of the numerical analysis (especially compared with OP), but it does make one point: traditional "numerical analysis" techniques tend to assume the traditional four mathematical operators as canonical, but if your goal is to get something computed on actual computer hardware quickly and reasonably accurately, you have a wider palette to play with. In particular, inverse square root is a primitive about as efficient as reciprocal on modern hardware (or if using Carmack's wtf trick).
 This is great! Sometimes I wish I took more analysis back in school.
 Just a suggestion, have you looked at the MPFR project, and derivatives: https://www.mpfr.org/http://mpfr.loria.fr/mpfr-current/mpfr.html#index-mpfr_005fs...It is way to deep in the grubby details for my pay grade, but who knows?
 I recently implemented the exponential function for a sound synthesizer toolkit I’m working on. The method I used was not mentioned in the article, so I’ll explain it here.I used the Remez algorithm, modified to minimize equivalent input error. This algorithm lets you find a polynomial with the smallest maximum error. You start with a set of X coordinates, and create a polynomial which oscillates up and down around the target function, so p(x0) = f(x0) + E, p(x1) = f(x1) - E, p(x2) = f(x2) + E, etc.You then iterate by replacing the set of x values with the x values which have maximum error. This will converge to a polynomial with smallest maximum error.From my experiments, you can use a fairly low order approximation for musical applications. Used to calculate frequencies of musical notes, 2nd order is within 3.0 cents, and 3rd order is within 0.13 cents.
 This is way cool! Mad respect to you audio engineers out there. I tried implementing FFT last year in one of my open-source projects[1] (key word: tried).
 Glancing at your code diff, I think you're missing the bit-reversed permutation. Either way, feel free to steal from: https://www.nayuki.io/page/free-small-fft-in-multiple-langua...
 I wouldn't recommend doing this directly -- there's no good polynomial that can represent the exponential function well over a wide range.Instead, it's better to exploit the definition of IEEE754 as an exponent and a mantissa. You calculate the exponent directly (because e^x = 2^(x/ln2)), and use the Remez algorithm to find a polynomial that fits just the mantissa.
 Honestly, reading this comment made me a bit unhappy. Don't mansplain to me the things that I've done. The Remez algorithm requires that you specify the domain. If you tried to use it to approximate the exponential function over its entire domain the algorithm would diverge.You have to explicitly choose a range [x1,x2] for the Remez algorithm. It minimizes the maximum error over that range.I'm ordinarily happy to share the work I'm doing but this comment reminds me of the reasons I don't share much with HN.
 I'm writing this because I've done it too, for a performance-critical embedded application, and I think it's an insight worth sharing, for anybody else who comes across this and is trying to do something similar. Not to put you down or one-up you or anything; I'm very sorry to have come across that way. Can I try again?For a domain like [0,1], the Remez algorithm will do a great job -- because the exponent is barely changing, just the mantissa. For a range like [-10, 10], you won't find any polynomial of reasonable order that has acceptable error.But, there's a really neat trick that does let the Remez algorithm work over the entire domain of floats, and get excellent performance with just a fourth or fifth order polynomial, or even third order if you're pressed for time, and minimal extra computation.That is, with just a little simplification, you multiply x by (1/ln(2)), floor it, and stuff the resulting integer into the exponent bits. Then, you take the remainder (what is left over after taking the floor), and make that your input to the polynomial, and stuff the result into the mantissa bits.There's a little more to it; signs and NaNs need to be handled correctly, subnormal numbers can be treated as a special case with a different Remez polynomial, and a few other corner cases and exact values might be important (e^0, e^1, e^-1). But the result works wonders and it's basically just using Remez but transforming the input to a domain that behaves more like a polynomial, leveraging the magic of IEEE-754.
 This is a good explanation of range reduction with respect to floating point. I use a similar (but not quite identical) techniue in the article; I may expand it with more information about what's happening with the exponent and mantissa bits like you did here.That reminds me, I'm using the domain [-1, 1] for range reduction. But by symmetry of e^x, I can actually use [0, 1]. Thank you!
 Depending on the particular architecture you may prefer [-0.5, +0.5].
 For those interested in more detail / explanation, the paper below does a good job explaining the basic idea of how this works, but with only a 1st order approximation for the mantissa bits. The first order approximation reduces the calculation of exp to just one multiply and one add - which are often combined into just one instruction on modern processors.
 Most of this is covered in the original article.
 > Don't mansplain to meI honestly don't think this has anything to do with your gender or the gender of the person you're replying to.
 Remez is the most common implementation of Minimax approximation algorithms, so it's probably coming when the author makes it to section 6.
 Yes, this is the plan :)
 I used something similar to implement an approximate square root function in a graphical DSP platform that only has addition and multiplication. A 4th-order polynomial gives satisfactory accuracy from -20dBFS to 0 dBFS, which was the region I cared about. It's hard to do better near the origin (-inf dB) though, because the slope of the square root approaches infinity.
 Need to be careful about confusing smooth (infinitely differentiable) and analytic (= to Taylor series) functions.> Any function with this property can be uniquely represented as a Taylor series expansion “centered” at aIt's fairly easy to construct tame looking functions where this isn't true.
 Nice spot, thank you! I have corrected that. I also included your comment in an acknowledgments and errata section of the doc.
 Blessedly, in the complex analytic setting, these coincide, to they not? [Attempting to sanity-check my own understanding here]
 In fact it is enough for functions to be complex differentiable once (in which case they'll automatically be differentiable infinitely often).
 It's one of those theorems that still boggles my mind even though I know it for so many years now. Either complex functions are so much well behaved or complex differentiability is so much stronger condition, I can't decide which. Top it off with the uniqueness of analytic continuation and you start to wonder what causes real functions to be such a pain.If anyone knows some nice articles about this topic I would love to read them
 One way of understanding why complex differentiability is so strong is looking at a complex-to-complex function as a real function of two real inputs and two real outputs. The fact that h rather than |h| appears in the denominator of the complex derivative causes the derivative to be “aware” of the rotational nature of complex functions: this turns into a differential equation which must be satisfied by the real function (the Cauchy-Riemann equations).
 It's the latter. Complex differentiability is a very strong condition.
 Yes, all locally smooth complex functions (e.g. whose real and imaginary parts satisfy the cauchy-riemann equations) are analytic, one of the main reasons complex analysis is way less of a pain than real analysis.
 [Aside on Taylor Series] The section on recurrence relations is the basis of a little-known ODE solving technique sometimes known as the Taylor Series Method (occasionally Differential Transform). It allows an "extension" of Euler's Method to arbitrary order (using arbitrary precision arithmetic), by _calculating_ the higher derivatives rather than approximating them as in RK4 and friends.Here is an open access link to just one of many papers on the subject: https://projecteuclid.org/euclid.em/1120145574It includes a link to the actual software they used.[EDIT] Obviously applying the technique above to dx/dt = x reproduces the Taylor Series in the article!
 Really interesting article!Some other resources you may find interestinghttps://fpbench.org/index.html - I think I saw a talk at FPTalks that felt related to your topic "Creating correctly rounded math libraries for real number approximations"https://fpbench.org/community.htmlhttps://hal-ens-lyon.archives-ouvertes.fr/ensl-01529804 - Correctly Rounded libmhttps://www.mpfr.org/ - GNU Multi precision floating pointhttp://sollya.gforge.inria.fr/ - A tool for floating point code development
 A collaborator and I recently had fun code golfing integer implementations of floor(106*log2(x)) and floor(log2(x!)):
 Excellent article; it's very detailed in the mathematical explanations and code.I was interested in the exponential function too, but approached it from the standpoint of arbitrary-precision arithmetic. My explanation is an order of magnitude shorter, but of course my code is a lot slower than a typical floating-point algorithm. I offer a way to compute correctly rounded exponentials without resorting to a big math package like Mathematica/Maple/etc. https://www.nayuki.io/page/approximating-eulers-number-corre...
 "We can see the values are relatively densely clustered near 0 and increasingly sparse the further you move away from the origin. As another example, half of all 32-bit floating point numbers reside in the real interval [-1,1]. "I knew this in general but this specific statistic was eye-opening to me. Systems like OpenGL normalize coordinates so they only use values in this [-1,1] range. That means they effectively lose half the expressive power of the 32 bits. Are there other machine representations of real numbers which would be better for that use case? i.e. restricted to that range and evenly distributed?
 'half' is ambiguous here: They lose 1 bit of expressiveness for the mantissa, and 1 bit in the exponent, giving an efficiency loss of 2 bits from 32 == ~3%.So there's very little loss in using single-precision floating point, and a lot of gains in smoothly handling larger numbers that arise from addition et al.edit: I'm half wrong. The bit in mantissa isn't wasted, the only bit that's wasted in the sign bit in the exponent, so the efficiency loss is ~1.5%
 It agrees with my intuition that one bit lost would reduce the values by half. I probably don't understand what is meant by efficiency loss but my point is that a different (theoretical non floating point) encoding would be able to represent twice as many values within [-1,1] for the same 32 bits. For some applications that seems like it would be a win.
 Ugh I had to implement all the transcendental functions a while ago. It was miserable, especially dealing with intermediate rounding, etcI don’t recommend it.That said I liked that there’s a specific exp-1 function (otherwise you drop most precision).
 To go with that, you need ln(1+x). HP calculators had both.
 As has C (since C99)
 I believe that they're all part of the IEEE 754 spec, but some of these functions are just recommended.x87 has pretty much all of them, including two remainder implementations \o/
 I did implement that, and sqrt, etc. With integer math because the misery of needing more precision :D (or is that :( ? )
 Enjoying the article. Looks like there's a math typo at the tail of the "Precision Bound" section: $$\log\left(\frac{x e}{n}\right) \leq \frac{1}{n}\log(\eta)$$  is not equivalent to $$\frac{xe}{n} \leq \eta^{-n}$$  The latter should be $$\frac{xe}{n} \leq \eta^{1/n}$$  and the "$x^{-n}$" in the following paragraph should be changed accordingly.Certainly understandable flub, though. Working with logarithms, it's easy to mix up minus sign vs. 1/n conversions.
 Nice catch, thank you. I'll edit that and credit your comments in the errata :)
 There's 2 cool things related to floating point error I learned recently.First, x + x + x + x + x == 5x. This is true for values up to 5, but is not true for 6. Proving this is a fun exercise, but is a little bit painful and has a lot of cases.Second, SMT solvers can actually prove stuff like this relatively easy! In Z3py, one can prove this in 4 lines of code.from z3 import x = FP('x', FPSort(8, 24)) set_default_rounding_mode(RNE()) solve(5*x != x + x+ x+ x + x)
 IMHO it's a bit cheeky to use exp2 from the standard library. It's likely got all the same math exp does. Fortunately, if you know that your exponent is an integer, there's a constant time version. It's 5 instructions: add, and, shift, mov, ret.
 Excellent, thank you! I agree, I had the use of exp2(x) on my list of implementation nits to revisit. I'm going to revise to use your method and credit the comment.
 I think you can spare the "and", because if you overflow the exponent field, the result will be wrong either way.
 Nice article, thanks. For anyone interested in the topic, I highly recommend Trefethen's Approximation Theory and Approximation Practice. It's approachable, intuitive, and fun while still covering a lot of technical detail.
 I have that in the further reading section :) I personally consider it one of the best textbooks on numerical analysis.I only wish Trefethen had written it with general C/C++ instead of his own custom MATLAB library, chebfun. One of these days I'd like to get around to implementing chebfun in C++.
 Great read and stunning web design (short of the small italics in the abstract that are a little harder to read). A beautiful first post, certainly.
 Few miscellaneous meta nitpicks:* The article looks beautiful, but am I the only one that hates left-centered content? Wide monitors are the norm these days, and reading left centered content literally hurts my neck (need to resize manually to center the content, approximately.* Why would someone make the effort to write such detailed analysis and conceal their identity? I just don't get what's the objective of not providing any sort of provenance... just a gripe of mine.
 I was exactly looking for that today, thanks
 Very good article! Thanks!