
The Floppy Disk of Floating Point - todsacerdoti
https://www.evanmiller.org/the-floppy-disk-of-floating-point.html
======
mayoff
> You’ll find F2XM1, which computes in a swoop 2^x − 1 – a kind of pauper’s
> fused multiply-add, some twenty-five years ahead of its time.

No. It computes pow(2, x) - 1.

Suppose x is close to 0, so that pow(2, x) is close to 1, like
1.00000000000000002345678. Because of that leading 1 (to the left of the radix
point), all those zeros also have to be stored, meaning some of the low-end
digits will have to be dropped, so you end up rounding to 1.00000000000000002.
But if you can subtract 1 from the exact result before rounding, then you can
store more of the low-end digits: 2.345678e-18. Hence the F2XM1 instruction.
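The effect is easy to reproduce with ordinary doubles, via the identity 2^x − 1 = expm1(x · ln 2). A minimal sketch in stdlib Python (the value of x is a made-up example):

```python
import math

x = 1e-18  # tiny exponent, so 2**x is barely above 1

# Naive route: 2**x rounds to exactly 1.0 in a double, and the
# subtraction then destroys every significant digit of the result.
naive = 2**x - 1            # 0.0, all information lost

# Subtracting the 1 "before rounding": 2**x - 1 == expm1(x * ln 2),
# and expm1 keeps the low-order digits the naive route drops.
accurate = math.expm1(x * math.log(2))

print(naive)     # 0.0
print(accurate)  # ~6.93e-19, i.e. x * ln 2
```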

~~~
lalaland1125
expm1 is the modern equivalent. That function, plus the related log1p, is
probably among the most underutilized standard library math functions. Both
can substantially improve the accuracy of tons of calculations.

~~~
nwallin
Can you give an example of a common situation where a programmer should have
used expm1 or log1p but didn't? (The way it's common for people to test
whether something is inside or outside a circle or sphere by comparing the
square root of the sum of squares to the radius, as opposed to comparing the
sum of squares to the square of the radius, or to use trig functions when
linear algebra will do.)
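For reference, the circle test from the parenthetical in both forms (a sketch; the `inside_circle_*` names are made up):

```python
import math

def inside_circle_sqrt(x, y, r):
    # common but wasteful: takes a square root just to compare magnitudes
    return math.sqrt(x * x + y * y) <= r

def inside_circle_squared(x, y, r):
    # equivalent predicate: compare the sum of squares to r squared
    return x * x + y * y <= r * r

# both agree on a classic 3-4-5 triangle
assert inside_circle_sqrt(3.0, 4.0, 5.0) == inside_circle_squared(3.0, 4.0, 5.0)
```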

I do a reasonable amount of computational programming but to my recollection
I've never used either of those functions.

~~~
saagarjha
One example I can think of is doing an interest calculation with small
interest rates/short periods, where you'd want to know that someone owes you
0.000000001234 dollars in interest today rather than owing you 1.000000001 - 1
dollars.
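That scenario can be sketched with log1p/expm1; the principal and the (absurdly small) rate below are made-up numbers chosen to make the failure total:

```python
import math

principal = 1_000.0
daily_rate = 1e-18   # hypothetical, absurdly small daily rate
days = 1

# Naive compound-interest factor: (1 + r)**n - 1.  With r this small,
# 1 + r rounds to exactly 1.0 and the interest vanishes entirely.
naive_interest = principal * ((1 + daily_rate) ** days - 1)

# The expm1/log1p route: (1 + r)**n - 1 == expm1(n * log1p(r)),
# which keeps the tiny interest instead of computing 1.000... - 1.
accurate_interest = principal * math.expm1(days * math.log1p(daily_rate))

print(naive_interest)     # 0.0
print(accurate_interest)  # ~1e-15 dollars
```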

------
WalterBright
I'm sad to see the x87 go. I've never been able to convince anyone of the
value of 80 bits of precision. Never mind that I've seen engineers convinced
their sums were correct (because the computer can't be wrong) when they had
used insufficient precision for adding that many numbers.

But you can thank (or blame!) me for extending its life - back when Microsoft
was developing their 64 bit Windows, they were all set to not save the x87
state during a context switch, effectively killing the x87. I talked them out
of it.

Microsoft's VC still doesn't support the x87; their "long double" is 64 bits.
I suppose that's for compatibility with the other processors it supports.

I'm proud of the Digital Mars C/C++ compiler which still sports full 80 bit
long doubles, as well as the D programming language which still (despite
opposition) delivers 80 bit precision for type real, even for 64 bit Windows.

~~~
dkarl
I had to debug a nasty x87-related issue with a piece of scientific computing
software. The output of each version of the software was deterministic, but
different versions usually (not always) gave very, very slightly different
output, even if the numerical code was unchanged. This bothered the original
author, but he was a hardware engineer, and his attitude towards it was
basically "software sucks, you can only depend on computers if you lay out the
circuits yourself."

So the mystery remained until I was brought on and was able to figure out that
the default settings of gcc at the time did not use any of the SSE2
instructions supported on the workstations we were targeting, instead using
x87 floating point instructions. The state of the x87 stack was affected by
code interleaved with the numerical code (there was a bit more going on than
just number-crunching), causing the 80-bit representations to be copied to and
from 64-bit registers (and hence rounded) at slightly different points in the
computation. This resulted in tiny changes in the output. I added a flag to
enable SSE2 instructions, and thereafter the output only changed when the
numerical code was changed.

~~~
alkonaut
Had the experience on several occasions in C# on 32 bit. The whole idea of the
80bit operations is flawed in an environment where you can’t control the
native code generated with register allocation and so on (so in most high
level languages). We became so used to these bugs that we immediately
recognized them when some calculation differed on just some machines.

As C# is jit-compiled, you could never be sure what code would actually run at
the end user machine, and where the truncation from 80 to 64 bits would occur.

In the end the best cure is to ensure you never ever use x87, which happened
automatically when dropping 32 bit support.

Determinism is too important to give up for 16 extra bits.

~~~
perl4ever
I feel like I read something once that said when writing numerical/scientific
code, traditionally there were so many weird high performance computers that
were used, e.g. Crays or whatever, you'd have to be robust to all sorts of
different types of FP anyway.

Nowadays maybe that sort of diversity is less of an issue? Expecting
determinism in the sense you mean it just seems weird to me.

~~~
alkonaut
Having absolute determinism is probably still difficult, but using SSE on x64
on Windows, where all users have compatible compilers (i.e. determinism
_without_ diversity), is at least “good enough” nowadays. I haven’t seen any
issues with that scenario so far, even though it’s certainly possible problems
can arise.

------
simonbyrne
One of the interesting things about the 80-bit format was that it allowed the
use of 64-bit integers on a 16- or 32-bit machine. This was what Thomas Nicely
was using them for when he found the Pentium FDIV bug:
[https://faculty.lynchburg.edu/~nicely/pentbug/pentbug.html](https://faculty.lynchburg.edu/~nicely/pentbug/pentbug.html)
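The integer point comes down to significand width: a plain double carries 53 bits, while the x87 extended format carries a full 64. The double half of that is easy to show in stdlib Python:

```python
# A double has a 53-bit significand, so 2**53 + 1 is the first
# integer it cannot represent: the addition rounds back down.
assert float(2**53) + 1.0 == float(2**53)

# Every integer up to 2**53 round-trips exactly.
assert int(float(2**53 - 1)) == 2**53 - 1

# The x87's 80-bit format has a 64-bit significand, which is why it
# could hold any 64-bit integer exactly; Python can't demonstrate that
# directly, since its floats are plain doubles.
```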

------
Someone
[https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Hardware_support)
shows that (non-IEEE) 128-bit floats existed in the 1970s, that IEEE ones were
added to Power in 2015, and that the clang, gcc, and Intel C compilers support
them in software. There clearly is a market for them.

Truth is that successive rounding errors accumulate: complex calculations can
lose a lot of bits of precision. That’s why you sometimes need to do
calculations with a lot more bits than are needed in the answer.
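That accumulation is easy to demonstrate, and, fittingly given the article's subject, Kahan's own compensated-summation trick recovers most of the lost bits without any wider hardware. A sketch:

```python
import math

def kahan_sum(xs):
    """Kahan compensated summation: track the low-order bits each
    addition rounds away and feed them back into the next step."""
    total = 0.0
    comp = 0.0   # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y   # what the addition just rounded off
        total = t
    return total

vals = [0.1] * 1_000_000
naive = sum(vals)             # accumulates a rounding error every step
compensated = kahan_sum(vals)
exact = math.fsum(vals)       # correctly rounded reference

print(abs(naive - exact))        # on the order of 1e-6
print(abs(compensated - exact))  # essentially zero
```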

Also, the x87 had 80-bit floats, but not all instructions computed all of
those bits correctly
([https://www.infania.net/misc/coproc.html](https://www.infania.net/misc/coproc.html))

I don’t know how current FPUs fare there, but would hope they have become
optimal.

~~~
acqq
> but not all instructions computed all of them correctly
> ([https://www.infania.net/misc/coproc.html](https://www.infania.net/misc/coproc.html))

I failed to find, in the linked text, which instructions weren't computed
correctly, and I'd like to know about that, in case it's something I wasn't
aware of. Can you be a bit more specific?

~~~
Someone
Search for “Accuracy of calculations performed by a coprocessor” in the
article I linked to.

It says “Intel claims that maximum error in the 8087, 80287, and 80387 for all
transcendental functions does not exceed two bits in the mantissa of the
double extended format”. Typically, those results get rounded to double
format, but that doesn’t guarantee that such errors get dropped.

Edit:
[http://www.math.utah.edu/~beebe/software/ieee/](http://www.math.utah.edu/~beebe/software/ieee/)
has lots of links and code for testing floats.

[http://li.mit.edu/Archive/Activities/Archive/CourseWork/Ju_L...](http://li.mit.edu/Archive/Activities/Archive/CourseWork/Ju_Li/MITCourses/18.335/Doc/IEEE754/ieee754.pdf)
gives quite a few corner cases, so it would be helpful when writing a test
suite.

[https://slixe.de/stash/projects/MAS/repos/libfirm/browse/ir/...](https://slixe.de/stash/projects/MAS/repos/libfirm/browse/ir/tv/IeeeCC754?at=8045fb7fc4f9dc8c0ae329e17b78d5fc2d679517)
has source code for a/the compliance checker

[https://core.ac.uk/download/pdf/151193633.pdf](https://core.ac.uk/download/pdf/151193633.pdf)
describes what seems to be a better version of that test framework (it adds
“++” to the name). Unfortunately I couldn’t find its source code.

~~~
acqq
The statement you refer to, regarding the instructions for transcendental
functions on 80-bit doubles, is a result of the way the hardware and the
microcode for these functions are implemented. Internally, 64 of the 80 bits
are available for the mantissa, the steps for calculating the result
accumulate error, and so the last two bits of these transcendental functions
are "noisy", which was still very good for its time. And it was documented.

But even that holds only for the "convenient" input range of the implemented
algorithm.

In practice, programmers sometimes want to pass values from the less
convenient input range, and things got much worse when the library functions
in high-level languages expected too much from the implementation in the
processor:

[https://randomascii.wordpress.com/2014/10/09/intel-
underesti...](https://randomascii.wordpress.com/2014/10/09/intel-
underestimates-error-bounds-by-1-3-quintillion/)

The resulting problem in the high-level implementations described there was
blind belief, by some of those who implemented the C library functions for
certain environments, in documentation that was wrong (in not correctly
stating the input range), only a few years ago:

Intel's documentation was "buggy. By claiming near-perfect accuracy across
a huge domain and then only providing it in a tiny domain Intel has misled
developers and caused them to make poor decisions."

And it was fixed in 2014:

[https://software.intel.com/blogs/2014/10/09/fsin-
documentati...](https://software.intel.com/blogs/2014/10/09/fsin-
documentation-improvements-in-the-intel-64-and-ia-32-architectures-software)

------
userbinator
The x87 FPU finds an interesting niche application in size-optimised demoscene
productions, where the stack-based architecture and scientific-calculator
operations allow for some impressively compact 3D effects. For example, this
one is 256 _bytes_ :

[http://www.pouet.net/prod.php?which=53816](http://www.pouet.net/prod.php?which=53816)

------
beckerdo
A good article. However, the real floppy disk is the floating point standard
itself. These days other number representations are exceeding floating point
in terms of dynamic range, flexibility, fewer exception cases, better
rounding, and cheaper/faster hardware implementations.

Read about Unum computing, pioneered by John Gustafson, and implemented on
many modern software and hardware platforms.

[https://en.wikipedia.org/wiki/Unum_(number_format)](https://en.wikipedia.org/wiki/Unum_\(number_format\))

------
microtherion
One problem with x87 is that it performed all operations in 80 bits,
converting to float and double on load/store.

This sounds nice, but in practice was quite problematic, as results depended
on whether or not intermediate results spilled into memory.

------
klodolph
> ...a kind of pauper’s fused multiply-add, some twenty-five years ahead of
> its time.

Maybe 12 years. The PowerPC had fused multiply-add in 1992. For whatever
reason, it didn't really appear in x86 until 2012.

~~~
jacquesm
And many DSPs had it long before then.

------
msspecter
I have a feeling that Kahan lived in a time where matrix inversion and
eigenvalue computation was considered the new hotness, just like neural
networks today.

It is very easy to build small _invertible_ matrices that cannot be inverted
with 32-bit floats, or even 64-bit floats, thus Kahan's insistence on very
high precision floating points.

Edit: small _invertible_ matrices
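Such a matrix fits in a few lines of stdlib Python, rounding through IEEE single precision with `struct` (a sketch; the matrix and the `f32` helper are made-up examples):

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

# A 2x2 matrix with nearly parallel rows:
#   [ 1      1+e ]
#   [ 1-e    1   ]
# Its exact determinant is 1 - (1+e)(1-e) = e**2, so it is invertible.
e = 1e-5

det64 = 1.0 - (1.0 + e) * (1.0 - e)             # ~1e-10, fine in doubles
det32 = f32(1.0 - f32(f32(1.0 + e) * f32(1.0 - e)))

print(det64 > 0.0)   # True: double precision still sees the matrix
print(det32)         # 0.0: in float32 it looks exactly singular
```

A naive float32 inversion would divide by that zero determinant, even though the matrix is perfectly invertible.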

~~~
everybodyknows
>lived

The man lives still, not to be numbered among the dead:

[https://en.m.wikipedia.org/wiki/William_Kahan](https://en.m.wikipedia.org/wiki/William_Kahan)

~~~
msspecter
Then he must be apoplectic about the new bfloat16 format - only 7 bit mantissa
- what the heck

It's totally possible that scientists in the future will laugh at our
extremely wide 16-bit neural networks. Computation at the biological synapse
level is considered to have an accuracy of 1 to 3 bits only - opinions differ.

------
vardump
At solar-system scale, 128-bit floats can be useful for storing coordinates.
Any other reasons to go beyond 64 (or 80) bits?

~~~
klodolph
Why do you need 128-bit floats for storing solar system coordinates? 64-bit
floats will give you millimeter precision on Pluto, if your coordinate system
has an origin in the sun.
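The millimeter claim checks out with `math.ulp` (the distance figure below is an assumed round number for Pluto's aphelion):

```python
import math

pluto_aphelion_m = 7.4e12   # roughly Pluto's aphelion distance in meters

# The spacing between adjacent doubles at that magnitude:
spacing = math.ulp(pluto_aphelion_m)

print(spacing)   # about 1e-3 m, i.e. roughly a millimeter
```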

~~~
cpeterso
Similarly, this NASA blog post has some fun facts about the precision of pi
that JPL uses in its calculations:

[https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-
decimal...](https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-
pi-do-we-really-need/)

* For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793.

* How many digits of pi would we need to calculate the circumference of a circle with a radius of 46 billion light years to an accuracy equal to the diameter of a hydrogen atom (the simplest atom)? The answer is that you would need 39 or 40 decimal places.
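A small aside one can verify in stdlib Python: the 16 digits JPL quotes are exactly the double-precision value of pi, i.e. the most precise pi a plain double can carry anyway.

```python
import math

# The value from the JPL post is precisely pi rounded to binary64.
jpl_pi = 3.141592653589793
assert jpl_pi == math.pi
```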

~~~
082349872349872
So if he'd only had interplanetary probes, غیاث الدین جمشید کاشانی‎ Ghiyās-ud-
dīn Jamshīd Kāshānī could in principle have navigated them to the proper
accuracy in 1424.

By 1596 Ludolph van Ceulen had 20 digits, and even more by the time he
designed his tombstone:
[https://upload.wikimedia.org/wikipedia/commons/e/e8/Monument...](https://upload.wikimedia.org/wikipedia/commons/e/e8/Monument_voor_PI.jpg)

39 or 40 decimal places would have to wait until 1699.

[https://en.wikipedia.org/wiki/Chronology_of_computation_of_π](https://en.wikipedia.org/wiki/Chronology_of_computation_of_π)

------
everybodyknows
> Professor Kahan was last seen ranting about benchmarks.

Incorrect. Scroll to the bottom of this page to find works dated 2016:

[https://people.eecs.berkeley.edu/~wkahan/](https://people.eecs.berkeley.edu/~wkahan/)

------
mmphosis
[https://en.wikipedia.org/wiki/Unum_(number_format)](https://en.wikipedia.org/wiki/Unum_\(number_format\))

~~~
layoutIfNeeded
I hope my new Mill CPU will natively support Unums. And the whole thing should
be memristor-based of course, running on carbon nanotube batteries.

------
gweinberg
"There’s even FYL2XP1, for all those occasions when you need to compute
ylog2(x+1) on the double" When would I want to do that?

~~~
nraynaud
“ The (ε+1) expression is commonly found in compound interest and annuity
calculations. The result can be simply converted into a value in another
logarithm base by including a scale factor in the ST(1) source operand.”
[https://www.felixcloutier.com/x86/fyl2xp1](https://www.felixcloutier.com/x86/fyl2xp1)
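For the curious, FYL2XP1's y·log2(x+1) maps onto the stdlib like this (a sketch; `fyl2xp1` is a made-up name, and the tiny rate is an illustrative value):

```python
import math

def fyl2xp1(y, x):
    # y * log2(x + 1), via log1p so a tiny x keeps its precision,
    # mirroring what the x87 instruction provides in one step
    return y * math.log1p(x) / math.log(2)

eps = 1e-20   # e.g. a minuscule per-period interest rate

# The naive form loses everything: 1 + 1e-20 rounds to exactly 1.0.
print(math.log2(1 + eps))   # 0.0
print(fyl2xp1(1.0, eps))    # ~1.44e-20
```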

