
Implementing cosine in C from scratch - azhenley
http://web.eecs.utk.edu/~azh/blog/cosine.html
======
0xfaded
And, should have used Chebyshev polynomials. Or if you look at libc, you'll
find coefficients for a hand-picked, optimized polynomial that minimizes error
evenly across the floating point domain.

Actually, minimizing error in the floating point domain is a really cool
subject. Intuitively, you want the most absolute accuracy when you're close to
zero and the value itself is down around 1e-37. When you're in the region
around 1, +-1e-10 means a lot less.

One "trick" is to approximate a different polynomial, and then multiply by x,
since x itself is naturally distributed across the floating point domain.

E.g, instead of approximating sin(x), approximate sin(x)/x distributing the
error evenly between +-pi/4\. Then multiply by x to get sin(x).

For cos, cos(x) gets small and difficult near +-pi/2. If you look at libc,
you'll find lots of if statements that apply different hacks to preserve
accuracy depending on how close to zero the result is.
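
A rough sketch of that branching idea, reusing the hypothetical sin_poly()
above (real libm code also does the range reduction in extended precision,
which is ignored here):

    /* For 0 <= x <= pi/2: near pi/2 the result of cos is tiny, so evaluate
     * sin(pi/2 - x) on the small residual instead of a cos polynomial,
     * which preserves relative accuracy.  Placeholder Taylor coefficients. */
    static double cos_quadrant1(double x)
    {
        const double half_pi = 1.57079632679489661923;
        if (x > 0.5 * half_pi)
            return sin_poly(half_pi - x);   /* small argument, small result */
        double x2 = x * x;
        /* cos(x) ~= 1 - x^2/2 + x^4/24 - x^6/720 */
        return 1.0 + x2 * (-0.5 + x2 * (1.0 / 24.0 + x2 * (-1.0 / 720.0)));
    }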

~~~
ChrisLomont
>And, should have used Chebyshev polynomials

If I remember correctly, Pade approximants would be more accurate with less
work. Instead of using a polynomial, you use the ratio of two polynomials.
They generally work better than Chebyshev since they're a larger class of
approximants.
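
A minimal sketch of what that looks like for cos (not anyone's production
code): the [2/2] Pade approximant of cos about 0 works out to
(12 - 5x^2)/(12 + x^2), which matches the Taylor series through the x^4 term,
so it is only useful on a narrow, range-reduced interval:

    /* [2/2] Pade approximant of cos, decent for roughly |x| <= pi/4.
     * One multiply for x^2, a couple of multiply-adds, and one divide. */
    static double cos_pade22(double x)
    {
        double x2 = x * x;
        return (12.0 - 5.0 * x2) / (12.0 + x2);
    }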

~~~
Someone
Padé approximants
([https://en.m.wikipedia.org/wiki/Padé_approximant](https://en.m.wikipedia.org/wiki/Padé_approximant))
can be better, but “larger class” isn’t a good criterion. The ratio of two
polynomials of identical complexity will have twice the number of degrees of
freedom as each of its polynomials, and will typically take twice as long to
compute (and that’s ignoring the division, which is relatively expensive).

So, instead of dividing two polynomials of degree n, you could compute one of
degree 2n.

~~~
ChrisLomont
>The ratio of two polynomials of identical complexity will have twice the
number of degrees of freedom as each of its polynomials

I wrote "Pade approximants would be more accurate with _less work_" for a
reason.

If they have identical complexity they cannot have twice the degrees of
freedom. You're confusing ideas here.

First, there are a lot more ways to evaluate Pade approximants than naively
computing two polynomials and then dividing them. There's quite a large
literature on evaluation methods, just as there is for polynomials (and the
naive polynomial evaluation method is generally terrible - at least consider
Horner's method).

Second, the Chebyshev polynomial of degree d can simply be the Pade
approximant P(n)/1 with P(n) of degree d. Thus Chebyshev is a special case of
Pade at the same cost, and it's a smaller class, in the pure mathematical
sense that one space of functions is a strict subset of the other.

So if you want to check all approximations of fixed computational complexity,
vary the top and bottom degrees appropriately to check a strictly larger space
of approximants for fixed computational cost. You'll usually find that Pade,
for the same computational cost, makes a better approximant. It's simply
richer.

What's more, once you have computed the various powers of x, you can reuse
them (unless you're using Horner's method or similar, which is generally
better).

If you're really looking for minimal-cost computation, there is another entire
literature on decomposing functions into minimal operations; finding the
optimal decomposition is NP-complete (but completely doable for these low
order approximations). Knuth's TAOCP has an intro to the literature under (if
I recall) "addition chains" for evaluating polynomials. There is parallel
theory for Pade.

Thus Pade is a strictly better method. It completely encompasses Chebyshev,
and since I've used it a lot, I have found it always better, _for the same
computational work_.

It simply lets you optimize over a larger class of approximants.

------
magnio
An article in a similar vein, "Implementing the Exponential Function" in C++
and Python, with detailed analysis and visualizations of the error.

[https://www.pseudorandom.com/implementing-exp](https://www.pseudorandom.com/implementing-exp)

~~~
azhenley
Oh wow, those visualizations are fantastic. Thanks for sharing! Maybe one day
I’ll update my figures.

------
okaleniuk
You can also approximate a cosine with a polynomial
[https://wordsandbuttons.online/sympy_makes_math_fun_again.ht...](https://wordsandbuttons.online/sympy_makes_math_fun_again.html)

This works fast, doesn't use the cache, and performs reasonably well
precision-wise. It also has very small error at ±π/2, so, unlike with the
Taylor series, the continuation is virtually seamless.

Moreover, since cosine is symmetric, you can throw away the odd coefficients
completely and it will become even faster.
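
A minimal sketch of that symmetry point (truncated Taylor coefficients as
stand-ins, not the fitted ones from the linked article): since
cos(-x) == cos(x), only even powers appear, so the polynomial can be evaluated
as a shorter polynomial in u = x*x:

    /* Even polynomial for cos, intended for |x| <= pi/2.  Placeholder
     * coefficients; a fitted polynomial would use the same structure. */
    static double cos_even_poly(double x)
    {
        double u = x * x;
        return 1.0 + u * (-1.0 / 2.0 + u * (1.0 / 24.0 + u * (-1.0 / 720.0)));
    }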

~~~
Someone
The truncated Taylor series is a polynomial, too, and also even, but it has
the disadvantage that it doesn't distribute the error evenly (it's zero at
zero, but in general larger than that of the polynomial that minimizes the
maximum error over the interval of interest).

You may want to optimize for something other than minimizing the maximum
error, though, as that may still have glaring errors. For example, for
_sin(x)_, it feels worse to have _sin(0)_ not be exactly equal to zero than to
have _sin(π/4)_ not be exactly equal to ½√2.

It may be hard to come up with a better optimization criterion, though, and it
may be easier to avoid 'howler' errors by simply making your approximation
good enough.

------
cardiffspaceman
Since the fine article mentioned displaying a cosine wave as the goal, I would
also suggest that people in a similar situation consider the Minsky algorithm
for circle drawing, which generates points around the circle. You can then
take one of the coordinates and use it as the amplitude of a sine wave (a
minimal sketch follows the links below). Minsky's algorithm is discussed on
this site [1] and, in great detail, as one of the members of the family of DDA
algorithms, here [2]. The two citations converge in that the HN discussion
treats the circle as a solution of a differential equation, e.g. this comment
[3].

[1]
[https://news.ycombinator.com/item?id=15266331](https://news.ycombinator.com/item?id=15266331)

[2]
[https://www.hindawi.com/journals/mse/2014/916539/](https://www.hindawi.com/journals/mse/2014/916539/)

[3]
[https://news.ycombinator.com/item?id=15268477](https://news.ycombinator.com/item?id=15268477)
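
A minimal sketch of the Minsky/HAKMEM circle algorithm referred to above, in C
since that is what the article uses:

    #include <stdio.h>

    /* Minsky circle algorithm (HAKMEM item 149).  The second update
     * deliberately uses the already-updated x; that is what keeps the orbit
     * closed (a slightly eccentric ellipse) instead of spiraling in or out.
     * Sampling y each iteration gives an approximate sine wave. */
    int main(void)
    {
        double eps = 0.1;            /* smaller step -> closer to a true circle */
        double x = 1.0, y = 0.0;

        for (int i = 0; i < 64; i++) {
            x -= eps * y;
            y += eps * x;            /* note: uses the new x */
            printf("%d %f\n", i, y); /* y ~ sin(i * eps) */
        }
        return 0;
    }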

------
Const-me
I often use this method:
[https://github.com/microsoft/DirectXMath/blob/83634c742a85d1...](https://github.com/microsoft/DirectXMath/blob/83634c742a85d1027765af53fbe79506fd72e0c3/Inc/DirectXMathMisc.inl#L2158-L2193)

Similar code with optional double precision support and more options:
[https://www.geometrictools.com/GTE/Mathematics/CosEstimate.h](https://www.geometrictools.com/GTE/Mathematics/CosEstimate.h)

If you don’t like C++, both of them are fairly simple to port to C.

------
gdevenyi
Another option not covered is a CORDIC implementation, which is particularly
helpful for use in microcontrollers. I did this back in the day to sort out
compass directions for a navigation signal.

[https://en.wikipedia.org/wiki/CORDIC](https://en.wikipedia.org/wiki/CORDIC)
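
For reference, a minimal floating-point sketch of CORDIC in rotation mode (a
real microcontroller version would use fixed-point integers, shifts instead of
the scale multiplies, and a precomputed atan table, but the structure is the
same):

    #include <math.h>
    #include <stdio.h>

    /* CORDIC rotation mode: drive z toward 0 with rotations by atan(2^-i).
     * Valid for |angle| up to about 1.74 rad; range-reduce first otherwise. */
    static void cordic(double angle, int iters, double *cos_out, double *sin_out)
    {
        /* Start from the CORDIC gain so no final rescaling is needed:
         * K = product over i of 1 / sqrt(1 + 2^-2i) ~ 0.60725... */
        double k = 1.0;
        for (int i = 0; i < iters; i++)
            k /= sqrt(1.0 + ldexp(1.0, -2 * i));

        double x = k, y = 0.0, z = angle, pow2 = 1.0;
        for (int i = 0; i < iters; i++) {
            double d = (z >= 0.0) ? 1.0 : -1.0;  /* rotate toward z == 0 */
            double xn = x - d * y * pow2;
            double yn = y + d * x * pow2;
            z -= d * atan(pow2);
            x = xn;
            y = yn;
            pow2 *= 0.5;                         /* 2^-i: a shift in fixed point */
        }
        *cos_out = x;   /* converges to cos(angle) */
        *sin_out = y;   /* converges to sin(angle) */
    }

    int main(void)
    {
        double c, s;
        cordic(1.0, 32, &c, &s);
        printf("cos(1) ~= %.9f  sin(1) ~= %.9f\n", c, s);
        return 0;
    }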

~~~
azhenley
There is indeed a section on CORDIC with a picture!

I didn’t benchmark it though. I’m curious how it would do.

------
legerdemain
To reduce the number of redundant operations, why not use Horner's method? A
polynomial like a _x_ x _x + b_ x _x + c_ x + d becomes d + x _(c + x_ (b +
a*x). The more terms you have, the more flops you save.
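
A minimal sketch of that rewrite:

    /* Horner's method: evaluates a*x^3 + b*x^2 + c*x + d with three
     * multiplies and three adds, and no explicit powers of x. */
    static double cubic_horner(double x, double a, double b, double c, double d)
    {
        return d + x * (c + x * (b + x * a));
    }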

------
h2odragon
Very nice, thanks. It's easy to take these things for granted.

------
greatNespresso
Really cool work, thanks!

