
Consistency: How to defeat the purpose of IEEE floating point (2008) - aw1621107
https://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html
======
acqq
> 99.99% of the code snippets in that realm work great with 64b floating
> point, without the author having invested any thought at all into "numerical
> analysis"

In the cases where they do "work great" (really, _great_?), it's only because:

\- The code depends on properties of IEEE FP which were designed exactly so
that it's harder for a casual user to shoot himself in the foot -- and these
properties were intentionally designed into IEEE FP by people who DID
invest a lot in "numerical analysis" and the practical consequences of
potential bad decisions.

\- The code depends on libraries that were designed with much more effort than
the author of the above statement can imagine.

In short, yes, we do need _all_ features of IEEE FP. And to produce anything
non-trivial one should indeed learn more about all that, and care.

> Summary: use SSE2 or SSE, and if you can't, configure the FP CSR to use 64b
> intermediates and avoid 32b floats. Even the latter solution works passably
> in practice, as long as everybody is aware of it.

That was the default with Microsoft's compilers on Windows for decades (and I
guess it hasn't changed), and it's probably a sensible default for
non-Microsoft scenarios as well -- especially for "consistency" across
compilers, which matches the title of the article. Oh, and make sure that the
compiler doesn't do any optimization that produces unstable results.

That's about the "production" default. However, I still believe that during
the development of anything non-trivial the evaluation of the results using
different numbers of bits is worth doing.
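(A minimal sketch of such a cross-precision check in Python -- NumPy is assumed here just to get a 32-bit float type; the specific computation is made up for illustration:)

```python
import numpy as np

# Run the same naive accumulation in single and double precision,
# then compare the results.
def accumulate(dtype, n=10_000, step=0.1):
    total = dtype(0.0)
    inc = dtype(step)
    for _ in range(n):
        total = dtype(total + inc)
    return float(total)

f32 = accumulate(np.float32)
f64 = accumulate(np.float64)

# The float32 run drifts visibly away from the exact 1000.0;
# the float64 run stays much closer to it.
print(f32, f64)
```

If the two disagree beyond what you expect, that's exactly the signal that the computation deserves a closer numerical look.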

------
radford-neal
One additional problem is that IEEE floating point fails to require that
addition and multiplication be commutative.

"WHAT?", you say? Surely it has to be commutative!

Well, it is, except in cases where both operands are "NaN" (Not a Number). You
see, there's not just one NaN, but many, with different "payloads", intended
to indicate the source of the error leading to a NaN. The payload gets
propagated through arithmetic. But what happens when both operands are NaN,
with different payloads? The standard says that the result is one or the other
of these NaNs, but leaves unspecified which.

The old Intel FPU chose the NaN with the larger payload, which gives results
independent of the operand order. But SSE uses the payload from the first
operand. And so we get non-commutative addition and multiplication.

The compilers, of course, assume these operations are commutative, so the
results are completely arbitrary.

One practical effect: In R, missing data - NA - is implemented as a NaN with a
particular payload. So in R, if you write something like NA+sqrt(-1), you
arbitrarily get either NA or NaN as the result, and you probably get the
opposite for sqrt(-1)+NA. And both might vary depending on the context in
which the computation occurs (eg, in vector arithmetic or not).
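(A sketch of the payload mechanics in Python, using `struct` to build NaNs by hand. Which payload survives an addition is hardware-dependent -- SSE keeps the first operand's, the old x87 kept the larger, and some ISAs canonicalize the payload away -- which is exactly the problem:)

```python
import math
import struct

def nan_with_payload(payload):
    # Quiet NaN: sign 0, exponent all ones, quiet bit set, payload in
    # the low mantissa bits.
    bits = 0x7FF8000000000000 | payload
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def payload_of(x):
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits & 0x0007FFFFFFFFFFFF  # mantissa minus the quiet bit

a = nan_with_payload(1)
b = nan_with_payload(2)

# Both sums are NaN, but the surviving payload -- and hence whether
# a + b bitwise equals b + a -- depends on the hardware.
print(payload_of(a + b), payload_of(b + a))
```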

------
onetoo
This is also an issue in video game programming, where this lack of
consistency causes issues in the implementation of replays or lockstep
networking. The core idea of both is to store/share the inputs for each frame,
such that the game's state can be derived from them. Even small
inconsistencies each frame can compound into large divergences, due to the
sheer number of frames.

If you think this article is interesting, you may also be interested in
learning about posits.

They are an alternative to floats with better accuracy for values near 1,
which, the authors claim, makes them superior for things like machine
learning. Relevant to this article is the fact that they are defined to be
consistent, so if they become popular this will never be an issue again.

Here is an article from the authors of posits which explains their advantages.
[http://www.johngustafson.net/pdfs/BeatingFloatingPoint.pdf](http://www.johngustafson.net/pdfs/BeatingFloatingPoint.pdf)

Here is a more nuanced look at posits, which explains their disadvantages.
[https://hal.inria.fr/hal-01959581v3/document](https://hal.inria.fr/hal-01959581v3/document)

------
saagarjha
> Compilers, or more specifically buggy optimization passes, assume that
> floating point numbers can be treated as a field – you know, associativity,
> distributivity, the works.

Of course, this largely depends on how "YOLO" your compiler is. I believe GCC
and Clang try reasonably hard to follow IEEE 754, while ICC is much more lax.

~~~
not2b
Compilers do no such thing: they do not assume that floating point + and * are
associative, because they most definitely aren't. Regrouping changes the
roundoff, and the effect of this can be huge.

Consider (a+b)+c vs a+(b+c). Now suppose a is 1, b is 2^60, c is -b. The
first expression evaluates to 0, the second evaluates to 1.
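(A minimal reproduction in Python, which uses IEEE doubles:)

```python
a, b, c = 1.0, 2.0**60, -(2.0**60)

# A double has 53 mantissa bits, so 2^60 + 1 rounds straight back to
# 2^60 and the left-grouped sum loses the 1.0 entirely.
left = (a + b) + c    # 0.0
right = a + (b + c)   # 1.0
print(left, right)
```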

Scientific and engineering code is carefully designed for numerical stability
and compilers must not mess this up.

gcc has a "fast math" flag you can use if you don't care about the accuracy of
your results.

~~~
evozer
I don't understand your example, why would

(1 + 2.060) - 2.060 == 0

while

1 + (2.060 - 2.060) == 1?

Am I just misunderstanding what you wrote?

~~~
not2b
No, the site messed up my comment. b is supposed to be 2 raised to the 60th
power. The two asterisks were removed. Let's try ^, b is 2^60, c is minus b. I
edited my original comment.

------
jstewartmobile
Many of the older, business-driven mainframe designs have hardware BCD
instructions--for faithful/performant implementation of grade-school-style
dollars-and-cents base-10 arithmetic.
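(Software decimal types give the same grade-school base-10 behavior today; a quick sketch with Python's standard `decimal` module:)

```python
from decimal import Decimal

# Binary floats cannot represent 0.10 exactly, so cents drift:
binary = 0.10 + 0.20          # 0.30000000000000004

# A base-10 type keeps exact dollars-and-cents:
exact = Decimal("0.10") + Decimal("0.20")
print(binary, exact)          # 0.30000000000000004 0.30
```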

On the other hand, a great deal of PC evolution has been driven by games--
where performance is king. Hard to beat IEEE floating point on performance &
storage efficiency!

Then there are the rusty sharp edges of x86, but that is life...

I wonder if `-O0` would solve the inconsistency? I don't particularly trust
many compiler optimizations--too much temptation for a compiler writer to go
performance-crazy, and start treating this computer voodoo like it was actual
algebra.

~~~
saagarjha
> I don't particularly trust many compiler optimizations

They're pretty decent most of the time, unless you go do something undefined.

~~~
jstewartmobile
The basic math in his article wasn't undefined.

------
seanalltogether
How well does floating point work for 3D games/programs and gpus? That seems
to be a very large category of floating point usage but I have no knowledge on
whether it works well in that space. Would gpus be x% faster if they didn't
have to do floating point, would games have more or less rendering problems
without floating point?

~~~
taneq
Generally floating point works fine for most things. Every now and then it
doesn't, and you have to be aware enough of its weaknesses to (a) detect this,
(b) understand what the hell is going on, and (c) find a workaround for it.

~~~
magicalhippo
Indeed. Had a fun issue once where a colleague had prematurely optimized an
expression to (a * c * e) / (b * d * f); this was in a somewhat hot path, so
he wanted to eliminate divisions.

Turned out that in certain cases, the factors were all tiny and so both the
numerator and denominator became denormalized and the division returned a NaN,
ruining all further computations.

After a lot of debugging to find the source of the NaNs and understand why
exactly this happened, the solution was quite clear: simply expand it back
out as (a/b) * (c/d) * (e/f).

This was because, due to the nature of the math being implemented, those
ratios would always be roughly of order unity, even though each individual
variable could potentially hold a very small number.

This cost us something like 0.1% performance, but made the code handle all
inputs without issue.
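(A sketch of the failure mode in Python -- the magnitudes are made up for illustration, and note that where C quietly returns NaN for 0.0/0.0, Python raises an exception instead:)

```python
# All six factors are tiny; each triple product underflows to 0.0
# (1e-360 is far below the smallest denormal, ~5e-324).
a = b = c = d = e = f = 1e-120

numerator = a * c * e      # 0.0
denominator = b * d * f    # 0.0
# In C, numerator / denominator is 0.0/0.0 == NaN here (Python would
# raise ZeroDivisionError instead of returning NaN).

# Factored into ratios, every intermediate stays near 1.0:
result = (a / b) * (c / d) * (e / f)
print(numerator, denominator, result)   # 0.0 0.0 1.0
```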

~~~
srean
This comes up a lot in calculating odds, or ratio of probabilities. How one
implements these in code is a good indicator of how much experience a person
has with real world scenarios. One of those shibboleths. Another telltale
giveaway is accumulating the dot product of float32s in float32s.
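(A quick sketch of the float32-accumulator trap -- NumPy is assumed here just for the 32-bit type. Once the running sum reaches 2^24, adding 1.0f does nothing at all:)

```python
import numpy as np

# float32 has a 24-bit mantissa, so once the running sum reaches
# 2^24 = 16777216, adding 1.0f changes nothing.
acc32 = np.float32(2**24)
for _ in range(10):
    acc32 = np.float32(acc32 + np.float32(1.0))

acc64 = float(2**24)
for _ in range(10):
    acc64 += 1.0

print(acc32, acc64)   # 16777216.0 16777226.0
```

This is why a long float32 dot product should be accumulated in float64 (or with pairwise/compensated summation).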

~~~
eutectic
Serious probability calculations are usually best made in log-space, turning
multiplication into addition.
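(A minimal Python sketch: probabilities whose product is far below the double-precision underflow threshold multiply just fine in log-space:)

```python
import math

# Three probabilities whose product (~1e-977) is far below the
# smallest positive double:
log_probs = [-800.0, -700.0, -750.0]   # natural logs

# Multiplying in linear space underflows to 0.0 ...
direct = math.exp(-800.0) * math.exp(-700.0) * math.exp(-750.0)

# ... but in log-space the product is just a sum:
log_product = sum(log_probs)   # -2250.0
print(direct, log_product)
```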

~~~
srean
Indeed --
[https://en.wikipedia.org/wiki/Log_semiring](https://en.wikipedia.org/wiki/Log_semiring)

~~~
taneq
See this is why we keep you maths people around. Sometimes all that abstract
wizardry is really useful! :)

~~~
srean
I just hope that wasn't bitter sarcasm.

I think this would have been a more relevant link

[https://en.wikipedia.org/wiki/LogSumExp#log-sum-exp_trick_for_log-domain_calculations](https://en.wikipedia.org/wiki/LogSumExp#log-sum-exp_trick_for_log-domain_calculations)

------
PaulHoule
Numeric pros are not that happy w/ IEEE numbers. The main intellectual effort
involved was that Intel had some freshers make a floating point coprocessor,
then the standard just documented what the chip did.

~~~
gjm11
This seems ... not very accurate?

My understanding of the history:

Intel hired William Kahan (a professor at Berkeley, already quite eminent, and
familiar with FP on existing mainframes) to help get the FP design right,
precisely because they hoped to make it a standard.

Then _other_ microprocessor companies got an IEEE standardization effort
going. Kahan went along to the first meeting, went back to Intel and persuaded
them to take part too, and brought a proposal based on the (still in progress)
8087 design to the meeting. There were rival proposals from other companies.

The Intel proposal won largely because Intel had thought through the details
better than the others.

(For instance: The biggest fight was over something called gradual underflow.
Intel wanted it, DEC didn't. DEC hired a numerics expert to look into gradual
underflow, with the expectation that he would report that it wasn't useful. He
looked into it and reported that in fact it was a good idea and ought to be
done.)

So: (1) it wasn't just "some freshers", it included at least one really big
name in the field; (2) the standard and the 8087 were being developed at the
same time (and indeed the 8087 didn't quite do what the standard said; later
80x87 generations did); (3) the standard was an inter-company effort and if it
ended up being more or less what Intel proposed that was because Intel's
proposal was actually better.

I have to admit that my summary above is based to some extent on things
written by William Kahan, who of course was the guy Intel hired as a
consultant to make their floating-point design better. So there may be some
bias. If anything above is wrong, I'd be glad of corrections.

------
kstenerud
The problem is that instructions for ieee754 values use the full precision of
those values (or greater), when you almost never need that much. And if you
leave them as-is, you build up bias.

As your calculations progress, your results slowly build up significant digit
bias (which will be different depending on the architecture and libraries). To
get around this, you'd have to round regularly, but that also slows things
down (and is difficult to do in binary float).

If you're taking the results of calculations at their full precision, you're
just asking for trouble. 32-bit binary ieee754 may be able to represent 7
digits of precision, but I sure as hell wouldn't take the results of 32-bit
float operations to more than 6!
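(A one-liner illustrating the ~7-digit limit -- NumPy is assumed here just for the 32-bit type:)

```python
import numpy as np

# A 9-digit integer doesn't survive a round trip through float32:
# 24 mantissa bits buy only about 7 decimal digits.
x = np.float32(123456789)
print(int(x))   # 123456792
```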

The alternative is to get a contract from the compiler that everything will be
done in the same precision with the same bias for the specified type, and just
accept the buildup (which we're currently doing without that guarantee, and
getting burned by it).

