

Bitten by the floating point bug - AndrewDucker
http://mikeducker.wordpress.com/2013/01/29/bitten-by-the-floating-point-bug/

======
smoyer
This isn't a bug ... you really need to understand how floating point numbers
work if you're going to do _ANYTHING_ with them.

At one point, I wrote an entire single-precision, IEEE-compliant floating point
library in assembly language, and while that product (an embedded system)
failed, the experience has been invaluable in the 20 years since. When working
with higher level languages, I completely understand how my numbers will be
rounded, how precision will be truncated, etc.

Casting back and forth to integers is often problematic, but even more telling
is that programs that have to be very precise generally use fixed-point
integers. Banking and accounting don't tend to like single "pixel" rounding
errors!
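
Sketch of the fixed-point idea in C (my illustration, not anything from the
article): keep amounts as scaled integers, so the arithmetic stays exact, and
only format the decimal point at the edges.

    
    
      #include <stdio.h>
    
      int main()
      {
          /* Money as integer cents: 3 * $19.99 is exactly 5997 cents,
             with no accumulated binary rounding error. */
          long long price_cents = 1999;
          long long total_cents = 3 * price_cents;
    
          printf("$%lld.%02lld\n", total_cents / 100, total_cents % 100);
          return 0;
      }
    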

~~~
acqq
Correct. It's not a bug in the floating point hardware: floating point
hardware that implements IEEE does exactly what it is supposed to do. It's a
"PEBKAC."

Moreover, if OP understood the subject a little better, he'd know that 123 is
stored exactly. Specifically, 123 in binary is 1111011. When it's stored in
IEEE format, you'll see the exact match in the mantissa bits: 111011 (the
leading 1 is implicit). You can check it here:

<http://www.h-schmidt.net/FloatConverter/IEEE754.html>

OP was bitten by assuming that decimal _parts_ have an exact representation,
whereas of .1 through .9 only .5 has one. Once he divides the exact 123 by the
exact 10, he has to have 12.3 stored in the register, but 12.3 has an
_infinite number of bits_ in this representation, and they have to be cut
somewhere to fit in any finite storage:

10001(0011)...

If you check with the same converter, IEEE rounding is even careful enough to
know that after the last 00 bits that fit in the mantissa two 1 bits follow,
so what's stored is 01 instead of 00 for the last digits.
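
You can see that rounded tail directly (a C sketch; the hex constant assumes
IEEE 754 doubles):

    
    
      #include <stdio.h>
      #include <string.h>
      #include <stdint.h>
    
      int main()
      {
          double d = 12.3;
          uint64_t bits;
          memcpy(&bits, &d, sizeof bits);   /* read the raw bit pattern */
    
          /* Mantissa hex 899999999999a: 10001 then 0011 repeating, with
             the final bits rounded up rather than truncated. */
          printf("%016llx\n", (unsigned long long)bits);  /* 402899999999999a */
          printf("%.17g\n", d);             /* 12.300000000000001 */
          return 0;
      }
    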

The only solution for people who don't understand all the pitfalls of the
standardized binary representation would be to use base-10 floating point,
which is also standardized, but which, the last time I checked, sadly existed
only in some IBM hardware:

<http://en.wikipedia.org/wiki/IEEE_floating_point>

(see IEEE 754-2008)

Software support is similarly rare.

~~~
MikeDucker
Hi acqq, I did type 123 into that float converter and the numbers I got back
are what I posted. I'm not sure how we're getting different FP reps for the
same number. I simply typed 123 into the decimal representation box. If I'm
missing something please let me know.

~~~
acqq
Read carefully. 123 is represented _100% exactly_, bit for bit. Look at _the
bits_ in the mantissa, 111011; the leading 1 is implicit, and it matches

    
    
       >>> bin( 123 )
       '0b1111011'
    

not the "interpretation for humans," which shows 1.921875. All _integers_ up
to some very big one (2^24 for single precision, 2^53 for double) are
represented exactly in IEEE base-2 FP (or you could say that the fractions
that result from storing the integers have a finite binary representation, but
then you make it harder for yourself to have an easy insight).
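
A C sketch of the same check (assuming IEEE single precision; the shifts and
masks just pull out the standard fields):

    
    
      #include <stdio.h>
      #include <string.h>
      #include <stdint.h>
    
      int main()
      {
          float f = 123.0f;
          uint32_t bits;
          memcpy(&bits, &f, sizeof bits);
    
          int exponent = ((bits >> 23) & 0xFF) - 127;   /* unbiased: 6 */
          uint32_t mantissa = bits & 0x7FFFFF;          /* 23 stored bits */
    
          /* Mantissa is 111011 followed by zeros; with the implicit
             leading 1 that is 1111011 = 123 in binary, exactly. */
          printf("exponent = %d\n", exponent);               /* 6 */
          printf("mantissa = 0x%06X\n", (unsigned)mantissa); /* 0x760000 */
          return 0;
      }
    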

Disclaimer: I have similar experiences to smoyer's; back in the eighties of
the previous century I implemented a reasonable chunk of IEEE 754 in x86
assembly.

~~~
MikeDucker
I see what you're saying, but if you take the 111011 as representing 1/2 1/4
1/8 etc. then it does equal 0.921875. I'm presuming that this value is added
onto 1 (fiddling with the tick boxes confirms this) to be used as the
mantissa.

I'm not doubting what you know about floating point, but I think that this
calculator is still trying to say (whether that be right or wrong based on
IEEE) 1.921875 * 2^6 (= 123)

~~~
Tuna-Fish
Well, it is, but it's still exact. Think about it for a while, in base-10 (if
that makes it easier for you).

Let's say you save numbers in the format a * 10^b. b has enough range not to
be a problem, but a can only hold 3 digits. What integers can you represent
exactly?
exactly?

1 = 1.00 * 10^0, 2 = 2.00 * 10^0, through to 999 = 9.99 * 10^2.

Or in other words, so long as the number you are converting has no more
significant bits than the mantissa holds, it can be represented perfectly in
floating point. So 4 is always exactly 4, not some number very close to 4. If
it ends up as something else, you're the one who has messed it up.
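
A sketch of where that limit sits for single precision (my example; the 2^24
bound follows from the 24-bit significand):

    
    
      #include <stdio.h>
    
      int main()
      {
          /* A float significand holds 24 bits (23 stored + 1 implicit),
             so every integer up to 2^24 = 16777216 is exact. */
          float exact = 16777216.0f;    /* 2^24: representable */
          float bumped = 16777217.0f;   /* 2^24 + 1: needs 25 bits */
    
          printf("%.1f\n", exact);      /* 16777216.0 */
          printf("%.1f\n", bumped);     /* 16777216.0 too (round to even) */
          printf("%s\n", exact == bumped ? "collapsed" : "distinct");
          return 0;
      }
    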

------
niggler
A useful and relevant article: "What every computer scientist should know
about floating-point arithmetic" -- Goldberg discusses how to use correction
factors to address the problem.

<http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>

~~~
kodablah
Non PDF:
[http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.ht...](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)

------
monochromatic
That is not a floating point bug. That is a programmer who doesn't grok
floating point.

edit: Also, I am only allowed to say this if OP is a guy. If OP is a girl and
she gets insulted, I'll be hearing about it on the HN front page for days.

~~~
MikeDucker
I never suggested this was a bug in how floating point works. I only meant I
had a bug and it was due to my use of floats and how it bit me in the ass :)

I am still a little surprised (but not that surprised) that essentially 4.0 *
0.1 and then (later) / 0.1 didn't end up with a value that cast to int as 4.
Sure, I shouldn't make such presumptions, but if you wrote

    
    
      float floaty_var = 4.f;
      int inty_var = floaty_var;
    

you would generally expect that inty_var would equal 4 and not 3. I have no
idea whether the IEEE standards mean that it will, or that it probably will,
or that it very well might not. In my experience it always has, which is quite
different from any certainty that 0.4 / 0.1 will cast to an int value of 4,
but that's what happened when I tested with variables in a simple function,
just not in the situation that led to my issue.
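
For what it's worth, as I now understand it, in plain double arithmetic the
literal (4.0 * 0.1) / 0.1 does come back as exactly 4.0 (multiplying by 4 only
shifts the exponent), which fits my experience of the simple test passing. A
nearby expression reliably truncates wrong, though (a sketch, not my original
code):

    
    
      #include <stdio.h>
      #include <math.h>
    
      int main()
      {
          double x = 0.3 / 0.1;   /* neither constant is exact in binary */
    
          printf("%.17g\n", x);                      /* 2.9999999999999996 */
          printf("(int)x   = %d\n", (int)x);         /* truncates to 2 */
          printf("round(x) = %d\n", (int)round(x));  /* 3 */
          return 0;
      }
    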

Either way I'm well aware it was a PEBCAK, even if my semi-pun blog post title
might have suggested otherwise ;)

~~~
CurtHagenlocher
Computer math is not math. You can't generally assume that (a * b) / b will
produce a. With floating point you have representational issues, and with all
types you can get overflow.
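
For the overflow half, a sketch (hypothetical values; IEEE doubles top out
near 1.8e308):

    
    
      #include <stdio.h>
    
      int main()
      {
          double a = 1e308, b = 10.0;
    
          /* a * b overflows to +infinity, and infinity / b is still
             infinity, so (a * b) / b never gets back to a. */
          printf("%g\n", (a * b) / b);   /* inf, not 1e308 */
          return 0;
      }
    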

~~~
MikeDucker
I have certainly never had a bug that so plainly highlighted this before. If
it had been a bug in a more complex calculation than (4.0 * 0.1) / 0.1, I
wouldn't have thought it worth mentioning. The fact that the calculation
generally does result in a number that casts to 4 made it more interesting to
me that in at least one case it didn't. I guess it was (possibly) some state
of the FPU that gave a different result when there was sufficient time between
the mult and the div. I will probably never know.

~~~
acqq
There's no "some state in the FPU": every partial result is represented in 64
bits if you use doubles, or in 32 bits if you use floats. Nothing is hidden
"somewhere." It's especially not something you'll "never know": every step can
be followed exactly, as long as you step through the assembly code, or simply
imagine that every partial result must also be represented as an FP number as
specified, not as you'd wish it to be. And don't think in decimal while you do
the steps.

~~~
MikeDucker
I remember being told a long time ago by our graphics programmer that the
result of a floating point operation might be different depending on the state
of the FPU. A quick internet search brings up things like this:
<http://www.codercorner.com/FPUFun.htm>, which talks about setting the default
rounding mode of the FPU. Not sure if that's what he was talking about, but it
certainly has no bearing on my issue, as I'm certainly not explicitly setting
anything like that.

That being the case, can you give a reasonable explanation of why carrying out
(x * y) and then dividing the result by y gave a value that casting would keep
as x, while leaving the division by y until sometime much later during the run
resulted in x - 1? There must have been something different between the two
runs apart from literal time, but if nothing is hidden, then what is the
difference that made the difference?

~~~
acqq
As far as I understand it, you are not sure what your code does, and when. If
you followed the steps of the code you wrote, you'd see that when the same
values travel the same code paths they always end up exactly the same. In
programming, different results can come only from different steps, unless you
change the environment (which would be your intentional rounding mode change,
or reading some uninitialized memory).

~~~
scott_s
_If you'd follow the steps of the code you wrote, you'd see that when the same
values travel the same code paths they end up being always exactly the same._

Unless, of course, the "code paths" you're talking about are at the C level.
In that case, it is possible for optimizations (or the lack of them) to end up
with slightly different values for the same C code. How often things are moved
in and out of the FPU registers can impact things (as I point out elsewhere),
as can the C compiler ordering operations differently.
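
A sketch of the register effect (hypothetical: it needs 32-bit x87 code
generation, e.g. gcc -m32 -mfpmath=387, the exact outcome depends on the
compiler and flags, and the helper name is mine):

    
    
      #include <stdio.h>
    
      double third(double x) { return x / 3.0; }
    
      int main()
      {
          /* On x87 the division is computed in an 80-bit register; the
             volatile forces one copy through a 64-bit memory slot. */
          volatile double stored = third(1.0);
    
          /* The second call's result may still sit in the 80-bit
             register, so "the same" C code can compare unequal. */
          printf("%s\n", stored == third(1.0) ? "same" : "different");
          return 0;
      }
    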

~~~
acqq
Once one code path is optimized, it is what it is; there can be nothing
"slightly different" as long as your code is the same. If you get different
results for the same inputs, you aren't following the same path. When you
change the code, you have a brand new landscape -- it's you who changed it.

Moving out of the FPU registers will always happen at the same points until
you change the code, and even where a move occurs you won't observe 80-bit
partial results unless you intentionally turn that feature on.

If you have any specific counterexample, I happily welcome it.

~~~
scott_s
_Once one code path is optimized it is what it is, there can be nothing
"slightly different" as long as your code is the same_

You have missed my point. For the author, "the code" is the C code that he can
see. The fact that passing different compiler flags may result in "the code"
ending up as different machine code is not something he's used to thinking
about. So you have to be careful when explaining such concepts to beginners.

~~~
acqq
I disagree. Changing FP optimizations producing different results is not
something that should surprise anybody. If the results were exactly the same,
we wouldn't need the different options in the compiler, as there wouldn't be a
trade-off to decide about.

It's the "religious" claims (presenting very well thought-through engineering
as if some god, magic, or a random number generator were inside; it's not)
that motivate some people to say things like "we'll never know." You won't if
you approach it dogmatically or fatalistically.

~~~
monochromatic
You don't think it could possibly surprise anyone that changing optimization
level can _change the meaning of the program_?

~~~
acqq
When you turn on FP optimizations you say to the compiler "sacrifice some bits
here and there, give me more speed." I don't know of any manual that explains
such options without making that clear. Counterexamples welcome here too.

~~~
monochromatic
I'm not saying you're wrong about what it does, or even about whether it's
documented. I'm saying that most compiler optimizations don't behave that way,
and I can certainly see this being a surprise to someone who views
optimizations generally as correctness-preserving transformations.

------
zvrba
> "The problem this has is that the number 4 might be stored as 3.999999992
> and if you cast that to an int naively you could end up with the integer
> value being 3 rather than 4. "

No it might not. The author completely misunderstands FP.

~~~
MikeDucker
Are you saying that a floating point value just under 4 won't necessarily cast
to 3 or something else? (I'm reading the links posted above, but am intrigued
as to what I've gotten wrong in the statement you quoted.)

~~~
monochromatic
4 is represented exactly. (But 0.4 isn't.)
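
You can see it in one line (a C sketch):

    
    
      #include <stdio.h>
    
      int main()
      {
          printf("%.17g\n", 4.0);   /* 4: exact */
          printf("%.17g\n", 0.4);   /* 0.40000000000000002: not exact */
          return 0;
      }
    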

~~~
MikeDucker
Ah, I see. I'll update my post. Thanks.

~~~
scott_s
More pernicious is that you may think the result of a computation should be an
exact number when it isn't. Consider:

    
    
      #include <stdio.h>
    
      int main()
      {
          double num = 3.0;
          unsigned long long i;
    
          /* In exact arithmetic this is 3 + 1000000000 * 1e-9 = 4, but
             1e-9 is not exact in binary and the error accumulates. */
          for (i = 0; i < 1000000000; ++i) {
              num += 1e-9;
          }
    
          printf("%f\n", num);
          printf("%llx\n", *((unsigned long long*)&num));
    
          double four = 4.0;
          printf("%f\n", four);
          printf("%llx\n", *((unsigned long long*)&four));
    
          return 0;
      }
    

And then executed,

    
    
      [scott_s@local ~] ./float
      4.000000
      40100000058d7800
      4.000000
      4010000000000000

------
echaozh
If it were 4 * 0.00000000001 / 0.00000000001, things might be different. But
since it's just 4 * 0.1 / 0.1, why not just use round()? With such a small
error, round() should produce 4.

~~~
deelowe
He addresses this in the article. Something about cpu usage, yadda yadda. I'm
curious as to how he can be so sure about performance, though, while
fundamentally not understanding how floats and ints get converted to binary.

~~~
MikeDucker
It's because I'm using ints for position data to allow fast look-ups of nearby
objects in a very simple 2D environment. It's not about the speed of the same
calculations between floats and ints (if that's what you were wondering).

------
0x0
Would it have helped to use, for example, 0.0625 or 0.125 as the scaling
factor, instead of 0.1?

~~~
mturmon
Exactly! Scale by a power of two and most of these surprises will go away.

0.1 is particularly problematic because it is not representable as a
finite-length binary sequence
(<http://www.wolframalpha.com/input/?i=0.1+to+binary>), and therefore not
exactly representable as an IEEE FP number. So taking exact numbers and
multiplying them by this scale factor will tend to produce inexact numbers,
which, when truncated, will cause surprises.

Even scaling by a sum of a few powers of two close to 0.1 would be better
(1/16 + 1/32 = 0.09375, which is exactly representable).
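
A sketch of why the power-of-two scale is safe (my example): multiplying and
dividing by 2^-4 only adjusts the exponent, so no mantissa bits are lost.

    
    
      #include <stdio.h>
    
      int main()
      {
          double x = 4.0;
          double scaled = x * 0.0625;   /* 2^-4: exactly 0.25 */
    
          /* The round trip is exact, so the cast is safe. */
          printf("%d\n", (int)(scaled / 0.0625));   /* always 4 */
          return 0;
      }
    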

------
loeg
Tl;dr: Use

    
    
        int i = (int)round(f);   /* round() is declared in <math.h> */
    

instead of:

    
    
        int i = (int)f;

