
Practically Accurate Floating-Point Math - ntoronto
http://www.computer.org/cms/Computer.org/ComputingNow/issues/2014/10/mcs2014040080.pdf
======
conistonwater
If you are interested in writing accurate numerical algorithms, as further
reading I highly recommend "Accuracy and Stability of Numerical
Algorithms" by Nick Higham, a very good, comprehensive book. Like this
article says, floating-point arithmetic is often thought of as mysterious, but
it really isn't: it just obeys its own precisely specified rules. I think it's
very good to dispel the mystery, so I really like this article.

Another way to do the same type of error analysis of numerical algorithms is
to use a parameter sensitivity library. In the end, numerical stability can be
thought of as the sensitivity of the final answer to separate independent
changes in each intermediate value obtained in the computation of the answer.
This is sometimes overlooked, I think, and it's quite easy to implement as
well.

I have one quibble. They say "Use (x-y)(x+y) instead of x^2-y^2". This is
true, but (in my opinion) misleading: the function (x,y) -> x^2-y^2 is itself
ill-conditioned when x is near y, and using a more numerically stable
algorithm to evaluate an ill-conditioned function will not improve the
function's condition. If x and y are close to each other, the inaccuracy
introduced by using x^2-y^2 is close to the inaccuracy due to using
approximate values for x and y. So if you have x^2-y^2, I think the first
thing you should worry about is whether your computation is ill-conditioned,
and only then worry whether it's numerically stable. I say this because I
sometimes see people try to evaluate x^2-1, notice it is inaccurate for x near
1, and try to replace x^2-1 with (x-1)*(x+1), which is still inaccurate.
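The quibble is easy to see numerically. Here's a small Python sketch (my own illustration, not from the article), using 1 + 10^-7 as an x that is known only approximately: both forms land roughly equally far from the true answer, because the error in x itself dominates once it's amplified by the function's conditioning.

```python
# Sketch of the quibble above: near x = 1 the function x^2 - 1 is
# ill-conditioned, so the "stable" factored form is no more accurate
# than the naive one once x itself carries rounding error.
from fractions import Fraction

x = 1.0 + 1e-7                          # the double nearest the true x
true_x = 1 + Fraction(1, 10**7)         # the intended exact value
exact = true_x * true_x - 1             # true answer, as an exact rational

naive = x * x - 1.0                     # cancellation-prone form
factored = (x - 1.0) * (x + 1.0)        # the "stable" rewrite

rel_naive = abs(Fraction(naive) - exact) / exact
rel_factored = abs(Fraction(factored) - exact) / exact
print(float(rel_naive), float(rel_factored))   # both far above 1 ulp (~1e-16)
```

Both relative errors come out many orders of magnitude above machine epsilon, and neither rewrite can do better: the input error in x, times the condition number, sets the floor.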

~~~
stephencanon
The correct way to evaluate x^2 - 1 is by using fma(x,x,-1). Now that Intel
and AMD have finally made FMA available in hardware on commodity parts (better
late than never!), it's realistic to start using this much more freely.
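To see what the fma does here: it evaluates x*x - 1 with a single rounding at the end. Python only exposes math.fma from 3.13 on, so this sketch (mine, not from the comment) emulates the correctly rounded result with exact rational arithmetic instead:

```python
# fma(x, x, -1) computes x*x - 1 with one rounding at the end. Since
# math.fma only exists in Python 3.13+, emulate the correctly rounded
# result exactly with Fraction.
from fractions import Fraction

def fma_ref(a, b, c):
    """Correctly rounded a*b + c, via exact rational arithmetic."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

x = 1.0 + 2.0**-30
print(fma_ref(x, x, -1.0))    # single rounding: the fma answer
print((x - 1.0) * (x + 1.0))  # factored form (happens to match here)
print(x * x - 1.0)            # naive form: the 2^-60 term is lost
```

For this x the naive form rounds away the low-order 2^-60 term when computing x*x, while the fma-style result keeps it.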

~~~
conistonwater
That won't do much for x near 1, because the function x^2-1 is itself ill-
conditioned. In other words, it is relevant that the floating-point value of x
is itself only an approximation to some true value of x. So computing x^2-1
exactly for a given floating-point value of x does not give a good
approximation to the true value of x^2-1. This is a mathematical property of
the function x^2-1, and cannot be fixed with any algorithm. This is basically
why I consider the example x^2-y^2 => (x-y)(x+y) misleading.

~~~
stephencanon
There are multiple ways to analyze computations; condition number is one of
them. It is relevant when inputs are assumed to be approximations to some
hidden "correct" value. However, this assumption is not always warranted when
dealing with floating-point numbers; sometimes, we would instead like to
analyze errors under the assumption that the inputs are exact values. When
this is the case, fma(x,x,-1) produces a correctly rounded result, whereas
(x-1)(x+1) does not, and the ill-conditioning of the function x^2 - 1 does not
enter into the analysis anywhere.

If you are used to working with complex computations (most of what is usually
referred to as numerical analysis, for example), you likely do not find
yourself in the latter situation very often. However, for those of us who work
on low-level details of floating-point on a regular basis, the latter analysis
is often critical.

~~~
conistonwater
Okay, I agree, that is completely reasonable. I do usually work with fairly
complex algorithms, so I care about individual operations less than about the
final result.

------
SFjulie1
Sorry to tell you, but 99% of so-called expert developers don't understand
this. For them a real number is exactly a float. And I have been fired for
saying e-commerce should be done with fixed-point numbers (we do + - * / at
most). The headaches really come with multi-currency web sites, since
conversion brings incommensurability. They also told me that since we are
using linear equations most of the time (are we?), errors should dissipate
like the breeze (they also kind of said the bad cases are statistically so
rare that we should not care).
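The fixed-point argument is easy to demonstrate with Python's decimal module (a stand-in here; any fixed-point or decimal type makes the same point): binary doubles cannot represent most decimal currency amounts exactly.

```python
# Binary floats can't represent most decimal currency amounts exactly;
# a decimal fixed-point type can.
from decimal import Decimal

print(0.10 + 0.20)                        # 0.30000000000000004
print(Decimal("0.10") + Decimal("0.20"))  # 0.30

# A hundred 10-cent items:
total_float = sum(0.10 for _ in range(100))
total_dec = sum(Decimal("0.10") for _ in range(100))
print(total_float == 10.0, total_dec == Decimal("10.00"))   # False True
```

The float total drifts by a few ulps after only a hundred additions; the decimal total is exact, which is the whole point for money.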

Anyway, in my real job as a dev, the most impressive math I ever did was a
moving average, and when I resorted to an IFFT (convolution with a hammer
function) I was told it should not be used because it was unmaintainable:
no one understood it.

I have been so ... hammered down by computer engineers so sure of themselves
that I learned to shut my mouth to keep my job.

Who really read this?

I love it, but I never did any serious brainy stuff as a dev.

~~~
ntoronto
Ugh. It's shameful when people use wrong arguments about _mathematics_ to
oppress. I really wish you the best.

This article first appeared in "Computing in Science and Engineering"
magazine, so its audience is mainly scientists in various fields who use
floating point in their research. Because of that breadth, I tried to make it
as accessible as possible, while teaching enough floating-point principles to
make debugging make sense.

I honestly hope that everyone who uses floating point reads this.

~~~
SFjulie1
Computer science on the field is ugly.

I went to university to learn physics, so I have a basic understanding of
geometry and numerical analysis.

I honestly think the people who learned it as part of a curriculum for
graduating didn't care, and that they don't actually understand their own
difficulties even though they studied it. They don't see that it is a float.

One of the other misconceptions I have to fight is about what time is.

CS engineers believe in perfectly synchronized time and in timestamps that
grow monotonically, the same everywhere, with 10^-9-second precision, and
they rely on that for distributed systems. My gut (and general relativity
too, I guess) tells me they are wrong.

But since I am stuck reimplementing a bubble sort for phone numbers, I don't
quite have the time to work on it. I have to live with my tingling intuition
telling me that either I am an over-pessimistic ass, a pain to my colleagues,
going obsolete by lack of study, or the whole industry is getting more
magic-oriented than science-oriented.

Anyway, like a lot of people, I am becoming ill-adapted out there.

At least I have spare time to work on the game of life and complex systems...
sometimes, and I have a lovely, caring wife, so I am happy either way :)

------
ThatGeoGuy
Probably an obvious error, but on page 81 (second page), they say the
following:

    Another correct test case is as follows:
    > (rat (float -1 23 -6))
    - (26) / (34)

In truth, this should be -23/64, which is a very different number. Perhaps
there's comical value in an article about accurate math having inaccurate
typography?
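For what it's worth, reading the article's `(float -1 23 -6)` as sign × significand × 2^exponent (my assumption from the rat/float notation in the quote) reproduces the corrected value:

```python
# Assuming (float -1 23 -6) encodes sign * significand * 2^exponent,
# the exact rational value is -23/64.
from fractions import Fraction

sign, significand, exponent = -1, 23, -6
value = sign * significand * Fraction(2) ** exponent
print(value)   # -23/64
```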

~~~
ntoronto
That is indeed a typo. Thanks for catching it.

It's amazing how much you miss when you think you already know what's written.

------
dvt
Even though refreshing and enlightening, the article doesn't cover another
(major) reason why floating-point math is generally avoided (especially in
high-performance applications): computational slowdown when dealing with
subnormals[1][2].

I feel that there is a lot of overlap between cases where you may want to
minimize error while at the same time still be performant (simulations, ray
tracing, rendering, etc.). So you're left with a situation where you end up
minimizing error, but still being slow.

[1] [http://research.ihost.com/osihpa/osihpa-govind.pdf](http://research.ihost.com/osihpa/osihpa-govind.pdf)

[2]
[https://charm.cs.illinois.edu/newPapers/06-13/paper.pdf](https://charm.cs.illinois.edu/newPapers/06-13/paper.pdf)
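For readers who haven't met subnormals: they fill the gap between the smallest normal double and zero, trading significand bits for gradual underflow. A small Python illustration (mine, not from the papers above):

```python
# Subnormals fill the gap between the smallest normal double and zero,
# trading significand bits for gradual underflow.
import sys

smallest_normal = sys.float_info.min      # 2^-1022, about 2.2e-308
x = smallest_normal
while x / 2 > 0.0:                        # keep halving into the subnormal range
    x /= 2
print(x)                                  # 5e-324, the smallest subnormal

# Subnormals have fewer significand bits, so halving can lose the low bit:
t = smallest_normal * (1 + 2**-52)        # smallest normal, last bit set
print(t / 2 * 2 == t)                     # False: the low bit was rounded away
```

It's exactly this extra bookkeeping below the normal range that historically triggered the microcode-assist stalls the papers measure.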

~~~
stephencanon
A couple points:

Support for subnormals can easily be disabled (usually by enabling "Flush to
Zero" behavior) to allow for better performance on essentially all recent
hardware.

Over the last few years, subnormal stalls have shrunk rapidly, and gone away
entirely in some cases. Intel started this process with the Sandy Bridge
architecture. These stalls are also greatly reduced in some arm64
implementations.

~~~
ntoronto
We had this discussion on the Racket user's mailing list a couple of years
ago. One user with an AMD processor experienced significant stalls; others
with Intel processors didn't.

I don't know the status of AMD's floating point at the moment, but I hope
they're doing something about that.

------
ntoronto
This article (like all other IEEE Computing Now articles) is free to download
for a limited time.

~~~
alricb
Is it me, or have all the ells (the letter L) been replaced by ones (the digit
1) in the PDF?

Edit: It's me. It's just a font with sloping serifs, which is what confused
me.

~~~
ntoronto
It's not necessarily just you. I thought it was wrong when I first opened it.
There may be something funny going on with the font - I think the "1" and "l"
are switched.

Edit: Never mind, it copies/pastes correctly. The font is just different from
what I'm used to.

~~~
alricb
TIL: Prestige Elite [1] might not be the best font out there for programming.

1:
[http://en.wikipedia.org/wiki/Prestige_Elite](http://en.wikipedia.org/wiki/Prestige_Elite)

------
wukix
Since we're on the subject of floating-point numbers in Lisp/Scheme:
[https://wukix.com/lisp-decimals](https://wukix.com/lisp-decimals)

This uses the Lisp ratio type in place of floats, to avoid the usual float
issues.

ntoronto, any thoughts?

~~~
malkia
Ratios are good to have in any language, or as a library, but surely they're
slower and waste more memory (it's also hard to say exactly how much), and
they still have to settle for approximations of irrational numbers like pi,
which gets used quite a lot wherever floating-point numbers are (at least in
my experience).

~~~
ntoronto
All correct, for exact rationals. Racket's exact rational arithmetic tends to
take about 1000x the amount of time floating-point arithmetic takes to compute
similar functions, and creates a lot of garbage on the heap. It gets worse,
though: with long-running computations, exact rationals' numerators and
denominators tend to grow without bound, unless you explicitly round them to a
fixed precision. If you take the latter path, you might as well use bigfloats,
which wrap MPFR's multi-precision floats, and are faster.

Just looking at the Wu-Decimal page, though, I can't tell whether they're
internally exact rationals or base-10 floats. If they're base-10 floats, they
have all the same issues base-2 floats have. If they're internally exact
rationals, I wonder what they do for division, which the set D isn't closed
under.
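The unbounded growth is easy to reproduce. Here's a sketch using Python's Fraction as a stand-in for Racket's exact rationals, iterating the logistic map:

```python
# Exact rationals grow without bound under iteration: a few steps of the
# logistic map x <- r*x*(1-x) already need hundreds of bits of denominator.
from fractions import Fraction

r = Fraction(7, 2)
x = Fraction(1, 3)
for _ in range(8):
    x = r * x * (1 - x)
print(x.denominator.bit_length())   # hundreds of bits after only 8 steps
```

Each step roughly squares the denominator, so the cost of every subsequent operation keeps climbing unless you round back to a fixed precision.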

~~~
wukix
Wu-Decimal uses exact rationals (they aren't base 10 floats). Division works
according to normal Lisp semantics, since the CL ratio type is used for
arithmetic. Let's say you divide something by 3 and now you have infinitely
repeating digits: then it is no longer in set D, and Wu-Decimal no longer
considers it to be of decimal type. Instead, it is treated as a fraction,
again per standard CL semantics. The "Printing" example tries to clarify this
(notice that 1/2 prints as '0.5' but 1/3 remains '1/3').
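The set D described above has a simple membership test: a reduced fraction p/q is a terminating decimal iff q has no prime factors besides 2 and 5. A sketch of the printing rule (my reconstruction from the comment, not Wu-Decimal's actual code):

```python
# A reduced fraction terminates in decimal iff its denominator is of the
# form 2^a * 5^b; that decides whether it prints as a decimal or a ratio.
from fractions import Fraction

def is_decimal(x):
    """True iff x is a terminating decimal (a member of the set D)."""
    q = x.denominator
    for p in (2, 5):
        while q % p == 0:
            q //= p
    return q == 1

print(is_decimal(Fraction(1, 2)))   # True:  would print as 0.5
print(is_decimal(Fraction(1, 3)))   # False: stays 1/3
```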

~~~
ntoronto
I've monkeyed with a lot of numeric representations, and every one of them
involves tough theoretical and practical trade-offs. Returning an exact
rational as the result of division sounds reasonable.

My favorite representation so far pairs a signed float in [2^-512,2^512) with
a signed integer to extend the exponent range. The idea is to avoid overflow
and underflow, particularly when multiplying thousands of probabilities.

(On average, adding a few thousand log probabilities or densities and then
exponentiating the sum yields a number with about 9 digits precision. Worst
case for adding log probabilities and exponentiating is about 8 digits
precision, and for adding log _densities_ it's 0 digits. Multiplying the same
number of probabilities or densities retains about 13 digits in the worst
case.)
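A rough sketch of that pairing in Python (the [2^-512, 2^512) float range is from the comment above; this toy version normalizes mantissas to [0.5, 1) with frexp instead, which is the same idea with a narrower band):

```python
# Pair a float mantissa with a separate big exponent so products of
# thousands of probabilities never underflow.
import math

def make(x):
    """Represent x as (mantissa, exponent) with x = m * 2**e, 0.5 <= |m| < 1."""
    m, e = math.frexp(x)
    return m, e

def mul(a, b):
    m, e = math.frexp(a[0] * b[0])     # renormalize after each product
    return m, e + a[1] + b[1]

def log10_value(a):
    return math.log10(abs(a[0])) + a[1] * math.log10(2.0)

# 50,000 factors of 0.5: plain doubles underflow to zero ...
p = 1.0
for _ in range(50_000):
    p *= 0.5
print(p)                               # 0.0

# ... but the paired representation keeps the full exponent.
acc = make(1.0)
half = make(0.5)
for _ in range(50_000):
    acc = mul(acc, half)
print(log10_value(acc))                # about -15051.5 = 50000 * log10(0.5)
```

Using a wide band like [2^-512, 2^512) rather than renormalizing every step just amortizes the frexp bookkeeping over many multiplies.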

