
What every computer programmer should know about floating point, part 1 - haberman
http://blog.reverberate.org/2014/09/what-every-computer-programmer-should.html
======
sheetjs
Here is a fun little anecdote you may consider incorporating into part 2:

The default Excel number format (known as General) renders numbers with at
most 11 characters, coarse enough to mask the ulps in results like 0.1+0.2.
Excel also supports fraction number formats. The format "?/?" writes the
closest fraction whose denominator is less than 10.

The algorithm used to calculate the fraction seems to agree with the Oliver
Aberth paper "A method for exact computation with rational numbers". Based on
this algorithm, 0.3 is represented as "2/7" and any number slightly larger
than 0.3 is represented as "1/3".

Try setting A1 to 0.3 and A2 to =0.1+0.2, then change the cell format to
Fraction > Up to one digit. Both cells appear to be 0.3 when rendered under
the General format. However, under the fraction format, the cells differ.

Google Docs renders both numbers as 2/7, but the XLSX export correctly
preserves the IEEE754 numbers. Unfortunately, LibreOffice clobbers the last
few bits, leading to incorrect results.
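
To see the ulp that the General format is masking, here's a quick check
(plain C, nothing Excel-specific):

    
    
      #include <stdio.h>
    
      int main(void) {
        printf("%.17g\n", 0.1 + 0.2); /* 0.30000000000000004 */
        printf("%.17g\n", 0.3);       /* 0.29999999999999999 */
        return 0;
      }
    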

~~~
hawkice
Fun fact: the fraction closest to a number, with a denominator that doesn't
exceed a certain bound, can be found using the following algorithm:

(1) Find the N/1 rationals corresponding to the closest integers on either
side of the number.

(2) For rationals a/b and c/d, calculate (a + c) / (b + d). This is not a
common operation, but you read it right. It'll be between a/b and c/d, and be
the number with the smallest denominator between them.

(3) Reduce the range to whichever segment from step (2) contains your number.
If the mediant's denominator exceeds your limit, stop and take whichever
endpoint is closer to your number. Otherwise, go back to step (2). (A code
sketch follows below.)

[There are some delightful proofs of this using tessellation that I strongly
encourage you to try to figure out, one of the most interesting puzzles I got
from a guy named Anton, the first user of MathOverflow and current Googler.]
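
In code, the search looks something like this (my own toy sketch in C, not
what any of the programs mentioned here actually ship; assumes the
denominator limit is at least 1):

    
    
      #include <math.h>
      #include <stdio.h>
    
      /* Closest fraction to x with denominator <= maxd, per steps (1)-(3). */
      void closest_fraction(double x, long maxd, long *num, long *den) {
        long a = (long)floor(x), b = 1; /* lower bound a/b */
        long c = a + 1,          d = 1; /* upper bound c/d */
        for (;;) {
          long mn = a + c, md = b + d;  /* the mediant (a+c)/(b+d) */
          if (md > maxd) break;         /* denominator limit reached */
          if (x < (double)mn / md) { c = mn; d = md; } /* left segment  */
          else                     { a = mn; b = md; } /* right segment */
        }
        if (fabs(x - (double)a / b) <= fabs(x - (double)c / d)) {
          *num = a; *den = b;           /* lower endpoint is closer */
        } else {
          *num = c; *den = d;           /* upper endpoint is closer */
        }
      }
    
      int main(void) {
        long n, d;
        closest_fraction(0.3, 9, &n, &d);
        printf("%ld/%ld\n", n, d);      /* prints 2/7 */
        return 0;
      }
    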

~~~
sheetjs
That's commonly referred to as the Mediant algorithm
[https://en.wikipedia.org/wiki/Mediant_(mathematics)](https://en.wikipedia.org/wiki/Mediant_(mathematics))
-- GDocs, LO and other programs use this algorithm when rendering.

Shameless plug: our JS fraction library
[https://github.com/SheetJS/frac](https://github.com/SheetJS/frac) implements
the mediant as well as the Aberth algorithm.

~~~
hawkice
I did not know this! :) Thanks for sharing.

Another thing I heard, but have not been able to prove (and is not in the Wiki
article), is that if you record whether you take the Left or Right segments in
the algorithm, you can construct a continued fraction for any real number as
int(N) + 1 / (1 ± k1 / (1 ± k2 / ...)), where each k_i counts how many times
you took the same direction before "turning around", with - for L and + for R.

My inability to prove it may be primarily because I never really got the flavor
of proofs about continued/infinite fractions -- considering the source I'd say
it's likely true (or I misremembered).

You don't happen to know where I'd look for hints on the proof there, do you?
:)

~~~
sheetjs
I'm sure it's hiding somewhere in Concrete Mathematics
(Graham/Knuth/Patashnik), and I'm sure there are more elegant proofs, but it's
not too hard to work out this proof yourself. Here is a hint:

Suppose you were at a point in the mediant algorithm where the two terms were
a/b and c/d. Taking the left segment transforms this to [a/b, (a+c)/(b+d)] and
taking the right segment transforms this to [(a+c)/(b+d), c/d].

Represent this as a matrix of the form [[a,b],[c,d]]. In this form, what does
"take the left segment" look like in terms of a matrix operation? what does
"take the right segment" look like? Can you find any useful properties (for
example, how do you repeat)

A convenient root matrix is [[0,1],[1,0]].

------
brudgers
For the really ambitious, there's always Knuth's treatment in _TAoCP volume 2:
Semi-Numerical Algorithms_. It runs about fifty pages, covers such interesting
topics as the statistical distribution of floating point numbers in order to
determine average running time, and of course, includes exercises like _[42]
Make further tests of floating point addition and subtraction, to confirm or
improve on the accuracy of Tables 1 and 2._

[http://www.amazon.com/Art-Computer-Programming-Volume-Seminu...](http://www.amazon.com/Art-Computer-Programming-Volume-Seminumerical/dp/0201896842)

On the more social side, Dr. Chuck's interview with William Kahan on the
history of the IEEE standard is a good read:

[http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.ht...](http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html)

------
baldajan
re: "As long as they are small enough, floating point numbers can represent
integers exactly."

I recently discovered a wonderful floating point bug in the Apple A7 GPU
(fragment shader). I would pass a float from the vertex shader to the fragment
shader, that was either 0.0, 1.0 or 2.0 and use that as an index to mask color
channels.

As such, I would convert to int and use it as an index. Problem was, the GPU
would sometimes, on some random region of pixels, decide I didn't pass in 1.0,
but 0.99999999999. Perform an int truncation and I have 0.0; I get the wrong
index, my app starts flickering, I want to rip my hair out...

Even on desktop machines, I had similar problems with large sets of tests that
would produce different results depending on which (modern desktop) processor
they ran on.

Lesson: floating point is certainly dangerous for consistent representations
that "matter". Regardless of a compiler decision or what the spec says.

~~~
sillysaurus3
It's hard to understand how such an error could occur. Due to the nature of
the floating point format, it seems very difficult for a bug to transform 1.0
into 0.999999.

Floating point 1.0 has a sign bit of 0, an exponent of 0 (after bias), and a
significand of 1.0. It's hard to imagine a sequence of transformations that
could somehow result in getting 0.999999999 after conversion.

I made a little "floating point binary explorer":
[http://ideone.com/61L4V6](http://ideone.com/61L4V6) (Apologies for the awful
code quality; it's just a quick hack.) It prints each bit of a 32-bit float,
which for 1.0 is "00111111 10000000 00000000 00000000".
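
The core of it is something like this (a from-memory sketch, not the exact
code at that link):

    
    
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>
    
      /* Print the 32 bits of a float, grouped into bytes. */
      void print_bits(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);       /* reinterpret the bytes */
        for (int i = 31; i >= 0; i--) {
          putchar('0' + (int)((u >> i) & 1));
          if (i % 8 == 0 && i > 0) putchar(' ');
        }
        putchar('\n');
      }
    
      int main(void) {
        print_bits(1.0f);      /* 00111111 10000000 00000000 00000000 */
        print_bits(0.999999f); /* 00111111 01111111 11111111 11101111 */
        return 0;
      }
    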

The first bit is the sign bit, the next 8 bits are the exponent, and the
remaining 23 bits are the significand. So, sign=0, exponent=127,
significand=0. The significand is actually 24-bit, not 23: the leading
"hidden bit" is always 1 (nitpick: unless the number is zero or subnormal).
That means the significand in this case isn't actually zero, but "1 followed
by 23 zeroes," so it's 2^23.

The final value for a 32-bit float is:

    
    
      (significand / 2^23) * 2^(exponent - 127)
    

That comes out to (2^23 / 2^23) * 2^(127 - 127), which is 1.0 * 2^0, which
equals 1.0.

For representing 2.0, everything is exactly the same, except the encoded
exponent is 128 instead of 127. That comes out to (2^23 / 2^23) * 2^(128 -
127), or 1.0 * 2^1, which equals 2.0.

Shader programs often use half precision floats (16-bit total, with 1 bit for
sign, 5 bits for exponent, and 10 for significand) but the concepts should be
exactly the same. 1.0 and 2.0 should always be 1.0 * 2^0 and 1.0 * 2^1
respectively, regardless of whether we're using half/single/double precision.
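
(In half precision, for instance, 1.0 is "0 01111 0000000000": sign 0, biased
exponent 15 with a bias of 15, mantissa 0, which again works out to 1.0 *
2^0.)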

I'm having trouble figuring out how your program wound up with 0.999999, even
if there was a bug in the conversion. 32-bit floating point 0.999999 is
"00111111 01111111 11111111 11101111" which is very different from the binary
1.0 float value: [http://ideone.com/Qlqu4W](http://ideone.com/Qlqu4W) It has
an exponent of 126 and almost the highest possible significand, which comes
out to 1.9999 * 2^(126 - 127) = 1.9999 * 2^-1 = 0.9999.

So in order for 1.0 to be erroneously transformed into 0.9999, the bug would
need to invert all the bits of the significand and subtract 1 from the
exponent.

Is there anything in particular that I should watch out for on the A7 GPU? I
was just hoping to hear about any details you may have left out for
simplicity, or any other factors that could've triggered the bug.

> Even on desktop machines, I had similar problems with large sets of tests
> that would produce different results depending on which (modern desktop)
> processor they ran on.

> Lesson: floating point is certainly dangerous for consistent representations
> that "matter". Regardless of a compiler decision or what the spec says.

Are there floating point bugs on desktop processors? The only one I've heard
of was the infamous Pentium FDIV bug
[http://en.wikipedia.org/wiki/Pentium_FDIV_bug](http://en.wikipedia.org/wiki/Pentium_FDIV_bug)
and the only reason it wasn't hotfixed to match the spec was that the logic
was hardwired into the chip rather than implemented in microcode, making a
patch impossible.

It seems like any deviation from the floating point standard in a modern
desktop CPU would be noticed almost immediately, because there is a lot of
code which relies on the exact IEEE floating point standard in order to
function properly. There was recently a hash algorithm featured on HN which
used floating point, and the only reason such an algorithm can be trusted is
if every desktop processor properly adheres to the spec.

~~~
acqq
I also suspect the 0.9999.. was an actual number in the commenter's inputs to
the fragment shader. Unless he checked the binary representation of the
numbers in question (in the input) he could have missed what's actually going
on. Formatting and printing the number in base 10 hides the only real thing,
which is the actual bits that represent the number.

~~~
jacquesm
It's not rare for GPUs to be pushed so close to the boundaries of what the
underlying hardware can do that you'll see little bits of noise like this.

For 'consumer' stuff this usually does not matter, but I think it is one of
the reasons (besides revenues) that Nvidia has a 'compute' line and a
'display' line. Even if they're based on the exact same tech it would not
surprise me one bit if the safety margins were eroded considerably on the
consumer stuff and if it wasn't tested as thoroughly before shipping (at the
chip level).

~~~
sillysaurus3
ATI would probably be a better example. Nvidia has historically been more
precise and has implemented specs pretty faithfully. ATI has of course become
AMD, and the reputation of modern AMD graphics cards among developers seems to
be pretty great.

It would surprise me if sending the value 1.0 from a vertex program to
fragment program ever resulted in anything except 1.0 on a desktop GPU. It's
not unusual to truncate a float in a fragment program and then use the
resulting integer as an index. I think Nvidia has a fairly rigorous battery of
tests which they use to ensure compliance with specs. The tests check to make
sure that values are being interpolated correctly, to check for off-by-one
errors at the edges of triangles, and other such corner cases. The tests are
automated, because once you implement a graphics spec correctly, you can
generate an output image which should be the same every time it's generated.
That way, if an error creeps into the driver, it should be caught pretty
quickly, since it will cause a deviation in the rendered output examined by
the test suite.

I'm not sure how often a problem occurs in an Nvidia consumer card but not
their 'compute' line. Nvidia tries to use the same driver codebase across as
many of their products as possible.

Here's an interesting thread where someone points out that certain OpenCL code
was giving the correct result on x86 and Nvidia cards, but not AMD cards:
[http://devgurus.amd.com/thread/145582](http://devgurus.amd.com/thread/145582)

In that thread, it's suggested that the reason Nvidia gave the correct result
was because their CUDA compute specs had better error margins compared to
OpenCL specs. So in that case, it would appear that Nvidia's compute line
actually enhanced their accuracy in other contexts such as OpenCL. Rather than
implement a different driver for OpenCL, CUDA, and consumer graphics, it seems
that Nvidia uses the same driver to power all three, which seems to enhance
accuracy rather than hinder it.

While writing up this comment and researching this, I stumbled across this
interesting paper from 2011 that details exactly what programmers can expect
when programming to an Nvidia GPU (in this case using CUDA):
[https://developer.nvidia.com/sites/default/files/akamai/cuda...](https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf)
... It's a pretty fun read which helps to solidify
the impression that Nvidia cares deeply about providing accurate floating
point results.

I also came across this article which was too awesome not to mention:
[http://randomascii.wordpress.com/2013/07/16/floating-point-d...](http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/)
... It's not really related to GPU floating point, but it's an
excellent exploration of floating point determinism and the effect of various
floating point settings.

For mobile graphics stacks like those in cellphones / iPads etc, I share your
skepticism. I'm not sure how rigorously tested mobile GPUs are, or how
faithfully those stacks adhere to specs. I was hoping to get more info from
the original commenter about how to reproduce the bug in the A7 to get a
better idea of what to watch out for on mobile.

------
forrestthewoods
The more you learn about floats, the more you realize you don't know. God
damn, floats are endlessly surprising, fickle bastards.

Bruce Dawson has at least 12 posts on floats. There are quite a few more after
this series as well.
[http://randomascii.wordpress.com/2012/02/25/comparing-floati...](http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/)

~~~
haberman
The goal of my article is to make them seem a little less fickle and
mysterious. I'm trying to focus on what _can_ be understood, and not the parts
that defy easy understanding. :)

~~~
forrestthewoods
My current favorite float fact is that taking a valid integer and casting it
to a float and back to int can cause an integer overflow. Alternatively the
final value could be either less than or greater than the original!
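
For example (a quick C sketch; I cast back into a wider type, since casting
the float directly back to int32_t would overflow, which is undefined
behavior in C):

    
    
      #include <stdint.h>
      #include <stdio.h>
    
      int main(void) {
        int32_t i = 2147483647;  /* INT32_MAX: not representable as a float */
        float   f = (float)i;    /* rounds up to 2147483648.0f */
        int64_t r = (int64_t)f;  /* 2147483648: bigger than INT32_MAX */
        printf("%d -> %.1f -> %lld\n", i, f, (long long)r);
        return 0;
      }
    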

Floats are dumb. I use them every day but they're dumb and I hate them!
<grumble>

~~~
brucedawson
The round-tripping behavior is certainly cool, but my favorite float-fact is
still this one:

    
    
      /* needs <stdio.h> and <math.h> */
      double pi = 3.14159265358979323846;
      printf("pi = %.33f\n + %.33f\n", pi, sin(pi));

sin(double(pi)) is equal to the error in the double(pi) constant: since
sin(pi - x) ≈ x for tiny x, the sine of a value just short of pi tells you
almost exactly how far short it is. If you manually add the two you can
easily get pi to about 34 digits. Useless, but cool. I discuss this in more
detail here:
[http://randomascii.wordpress.com/2012/02/25/comparing-floati...](http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/)

~~~
forrestthewoods
If I didn't encounter it personally, it can't possibly be my favorite.

------
chewxy
Great job Josh. I wrote a version of this too [0], but yours is much, much
better and more in-depth. I enjoyed it a lot more too :)

[0] [http://blog.chewxy.com/2014/02/24/what-every-javascript-deve...](http://blog.chewxy.com/2014/02/24/what-every-javascript-developer-should-know-about-floating-point-numbers/)

~~~
haberman
Thanks! I will have to read yours too and see if it will cover things I don't
yet understand.

------
skellystudios
"What every computer programmer should know about floating point, part
1.00001"

~~~
ygra
Technically that'd be part
1.000010000000000065512040237081237137317657470703125

------
JetSpiegel
No floating point article is complete without a reference to the Fast Inverse
Square Root [1].

1:
[https://en.wikipedia.org/wiki/Fast_inverse_square_root](https://en.wikipedia.org/wiki/Fast_inverse_square_root)

~~~
spb
Home of my favorite line of functioning code in any shipping product:

    
    
      i = 0x5f3759df - ( i >> 1 ); // what the fuck?
    

------
ja27
Years ago there was a triangle problem on an ACM Scholastic Programming
Contest. It looked simple. Given three lengths, output whether that forms a
triangle or not, and whether it's a right triangle or an isosceles triangle.
Simple, right? A^2 + B^2 = C^2.

Most programmers who were naive about floating point got burned. One item in
the test data set (which they kept hidden; they ran it against your program
and you only got a pass/fail result) would fail the simple equality check if
you used a double-precision float but would pass if you used a
single-precision one. Lots of learning took place.
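
The naive check was presumably something like this (my reconstruction, not
the actual contest code):

    
    
      #include <math.h>
      #include <stdio.h>
    
      int is_right(double a, double b, double c) {
        return a * a + b * b == c * c;  /* exact equality: fragile */
      }
    
      int main(void) {
        /* legs 1, 1 and hypotenuse sqrt(2): a right triangle on paper,
           but sqrt(2)*sqrt(2) comes out as 2.0000000000000004 in doubles */
        printf("%d\n", is_right(1.0, 1.0, sqrt(2.0))); /* prints 0 */
        return 0;
      }
    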

~~~
jacquesm
When doing tests for equality with fp you should always test against a
difference and an epsilon. like this:

    
    
      // in C; fabs() comes from <math.h>
    
      if (fabs(v1 - v2) < epsilon) {
        // equal, to within epsilon
      }
      else {
        // unequal
      }
    

And epsilon should be chosen with some care, as in: understand your problem
domain and run exhaustive tests to make sure you got it right. Test cases
should include 'just below', 'on' and 'just above' cases.

Even better if you can convert your values to some non-floating-point
representation. As they say, if you need floats, you don't understand your
problem (yet). Obviously that doesn't always apply, but it applies more often
than you might think at first.

~~~
pjungwir
> understand your problem domain

It sounds like the article's explanation of how precision erodes as the number
increases would help a lot when thinking about this. A good epsilon is a
function of the size of your numbers.
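
One common way to capture that is a relative comparison, e.g. this sketch
(the function name is mine):

    
    
      #include <math.h>
      #include <stdbool.h>
    
      /* True if a and b agree to within rel_eps of their magnitude. */
      bool nearly_equal(double a, double b, double rel_eps) {
        double scale = fmax(fabs(a), fabs(b));
        return fabs(a - b) <= rel_eps * scale;
      }
    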

I've been writing a lot of money-related code lately in C with long doubles,
so floating point has been on my mind, and this article is timed perfectly.
According to my tests I can get past $36 billion before I see $0.000001 error,
which I think is good enough for me, but I should revisit my reasoning now
that I understand things better!

~~~
AndrewOMartin
You may want to use a more intuitively behaving datatype for currency, even if
just to avoid flame on certain community sites on the web. Consider an integer
representing ten-thousandths of a penny, if guaranteed to stay in USD or GBP,
or a dedicated currency library otherwise. Currency's just one of those things
that makes some people very emotional when the notion of any kind of error or
rounding is mentioned.
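
Something along these lines (a toy C sketch, names are mine; a real system
would want overflow checks and explicit rounding rules):

    
    
      #include <stdint.h>
      #include <stdio.h>
    
      typedef int64_t money_t; /* 1 unit = one ten-thousandth of a penny */
    
      money_t money(int64_t dollars, int64_t cents) {
        return (dollars * 100 + cents) * 10000;
      }
    
      int main(void) {
        money_t total = money(0, 10) + money(0, 20); /* $0.10 + $0.20 */
        printf("$%lld.%02lld\n",
               (long long)(total / 1000000),
               (long long)((total / 10000) % 100));  /* $0.30, exactly */
        return 0;
      }
    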

------
Roboprog
If you do accounting instead of games, floating point is NOT your friend.
(some game devs might argue that floating point is not your friend there,
either, if your "universe" has a convenient "quantum" and scale/size)

Use fixed precision / "binary coded decimal" instead.

[http://roboprogs.com/devel/2010.02.html](http://roboprogs.com/devel/2010.02.html)

~~~
dllthomas
Binary coded decimal is horrible. It blows away a lot of useful
representations, and you don't really gain anything. It also doesn't actually
have any direct relationship with fractional amounts. Fixed precision can be
used with any denominator while keeping a binary representation.

~~~
Roboprog
BCD is slow. But it's trivial to make the number any size you want. If I have
a library that "glues together" multiple 32 bit or 64 bit integers, great, but
if what I get works with 4 bit chunks, oh well.

Yeah, BCD would suck for a game, but for a biz app, most of your time is going
to be spent churning in and out *ML text (or JSON, or CSV, ...) anyway.

~~~
lutusp
I've been in this game for a long time. BCD was a scheme to avoid the cost of
binary-to-decimal conversion at display time, when processing was much more
expensive than it is now. With binary storage of integers you can now get the
same display of integer values that you could then, but you have to convert
from binary to decimal on the fly. What has changed is that the conversion is
much easier and faster than it was then.

That means the storage differential of 100/256, i.e. the loss of bit real
estate, is the reason BCD isn't used any more.

~~~
dllthomas
_" That means the storage differential of 100/256"_

In terms of "number of representable values", it's 100/256 to the power of the
number of bytes in packed BCD rather than binary. Alternatively, with BCD
you're only using about 80% of your bits.

(Clarifying, more than disagreeing - up-voted).

------
frozenport
` As long as they are small enough, floating point numbers can represent
integers exactly. `

Is this true? For example, if I want to do integer + floating point addition,
the CPU might dump the floating point value into an 80-bit register, resulting
in a form which does not play as expected with integers.

~~~
wtallis
If your machine uses 80-bit floating point, then those extended-precision
floats can exactly represent any 64-bit integer, provided that your language's
standard library/runtime correctly converts without an intermediate step using
64-bit floats (53-bit mantissa).

~~~
saitohm
Yes. You can see the interplay between 80-bit floating point (long double in
C on x86) and 64-bit integers (long long int) with code like this:

    
    
      #include <stdio.h>
    
      long double x0, x1, y0, y1;
      x0 = 18446744073709551615.0L; /* 2^64 - 1 */
      x1 = 18446744073709551616.0L; /* 2^64     */
    
      y0 = x0 + 1.0L;
      y1 = x1 + 1.0L;
    
      printf("%d\n", x0 == y0); /* false (with enough precision) */
      printf("%d\n", x1 == y1); /* true  (precision being lost ) */

------
Shivetya
How often do programmers deal with floating point?

I deal with business math, where although there are positions to the right of
the decimal point, we have very well-defined numbers and such.

What languages other than JavaScript have issues such as the ones described in
this article?

~~~
martindevans
I'm a game programmer so I end up dealing with lots of floating point numbers
no matter what language I'm working in because physics and graphics engines
just love their floats. C#, C++ and TypeScript are my most frequent languages
but I would expect to encounter these same issues whatever language I'm
working in.

I did some pretty cool work on a fixed point physics engine once - using
integers to represent individual millimeters, so the game field was limited to
a mere 4294.97Km! But the graphics engine was still all floats, there's no
escape :'(
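
(That 4294.97Km is just 2^32 millimetres: 4,294,967,296 mm, or about
4,294.97 km, assuming unsigned 32-bit coordinates.)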

------
kasperset
This video may also supplement the knowledge about floating point:
[http://www.youtube.com/watch?v=jTSvthW34GU](http://www.youtube.com/watch?v=jTSvthW34GU)

------
yoha
Using arrows in both directions on your diagrams is _very_ confusing. Only
keep the right ones. Good write-up by the way.

------
Humjob
Very interesting. You've clarified a lot of points I had floating around in my
head about this topic.

~~~
haberman
That is exactly what I was going for -- glad to hear it! :)

------
jokoon
this needs to be taught in 1st year of any comp sci class

------
haberman
I've worked for a few months on this article as I've learned more about
floating point. I hope this helps others gain the same understanding. Please
let me know of any errors!

~~~
rsc
Title seems like a blatant rip-off of the classic article.

~~~
haberman
Hmm, it certainly wasn't my intent to be a rip-off.

The "What every computer programmer should know about X" seems to be a
somewhat established genre, sort of like "X considered harmful":

What Every Programmer Should Know About Memory:
[http://www.akkadia.org/drepper/cpumemory.pdf](http://www.akkadia.org/drepper/cpumemory.pdf)

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know
About Unicode and Character Sets (No Excuses!):
[http://www.joelonsoftware.com/articles/Unicode.html](http://www.joelonsoftware.com/articles/Unicode.html)

Latency Numbers Every Programmer Should Know:
[https://gist.github.com/jboner/2841832](https://gist.github.com/jboner/2841832)

While it's true that an article about floating-point already exists, it does
seem more aimed at Computer _Scientists_ than Computer _Programmers_ (as the
difference in title would indicate). And I link to the existing article in the
second paragraph and explain the distinction.

But if it's confusing or seen as disrespectful to the original work, I will
consider changing the title.

~~~
brazzy
Well, here's my 5 year old attempt to do exactly the same thing:
[http://floating-point-gui.de](http://floating-point-gui.de)

~~~
rurban
I prefer your version by far over the overly verbose and overly simple new
haberman guide. Knuth and Goldberg wrote fantastic and precise overviews, and
there is a need to boil them down for programmers, but not too much, not down
to grade-school level.

