
GCC optimized code gives strange floating point results - tbrowbdidnso
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
======
cperciva
I met William Kahan about 15 years ago and asked him about this. His response:
"If your program says _double_ then yes, of course it's a bug if the compiler
does long-double precision arithmetic instead. Isn't that obvious? Why are you
even asking this?"

If the author of the IEEE 754 standard says that the way you do floating-point
arithmetic is obviously a bug, you're probably doing it wrong.

~~~
acqq
And I believe he's right, if he was asked specifically what that program
should do. If all the statements in the program are really just:

    
    
        #include <stdio.h>
    
        void test(double x, double y)
        {
          const double y2 = x + 1.0;
          if (y != y2) printf("error\n");
        }
    
        void main()
        {
          const double x = .012;
          const double y = x + 1.0;
    
          test(x, y);
        }
    

Then the comparison has to be done on doubles, and "but it's extended
precision internally" mentioned in some answers is the wrong answer. All the
variables in the program are stated to be doubles. If I were writing the x86
and x87 asm instructions myself, I could easily write them so that they work
as expected, even without SSE2.

However, if you compile with some "special" optimizations that are documented
to relax specific code-generation rules, then it can be OK for those specific
optimizations to behave differently; but then you asked for it by choosing
that documented optimization.

In the answers there, Richard Henderson wrote 2001-01-16 06:30:04 UTC:

"You are seeing the results of excess precision in the FPU. Either change the
rounding precision in the FPCR, or work around the problem with -ffloat-
store."

He's also right, provided -ffloat-store is documented to be turned off by -O
and it's documented that the flag also affects the precomputed calculations
done at compile time. If that second detail is not documented, the results
could be unexpected, but again it depends on what you expect from the
optimizer. A properly written compiler and optimizer should perform the
calculations done during code generation correctly, per the definition, and
not cut corners. If the generated code actually performs the != at run time,
then the compile-time calculations are irrelevant, but the sane default should
be something like -ffloat-store when not using SSE2, even with -O, if -O has
the semantics "optimizations that should not produce really unexpected
effects." It sounds complicated, but it has its reasons.
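
To make this concrete, here is a minimal sketch of the store-to-memory idea
that -ffloat-store automates, done by hand with a volatile and reusing the
shape of the test case above (assuming a 32-bit x87 target; whether the
unmodified program misbehaves at all depends on compiler, flags, and target):

    
    
        #include <stdio.h>
    
        /* Storing the intermediate into a volatile double forces the 80-bit
           x87 value to be rounded to 64 bits, so both sides of the comparison
           are genuine doubles. */
        void test(double x, double y)
        {
          volatile double y2 = x + 1.0;   /* rounded to double by the store */
          if (y != y2) printf("error\n");
        }
    
        int main(void)
        {
          const double x = .012;
          const double y = x + 1.0;
    
          test(x, y);
          return 0;
        }
    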

P.S. In answer to nsajko's post below: I mention -ffloat-store, not
-ffast-math. Please read the description of -ffloat-store:

[https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Optimize-
Options....](https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Optimize-Options.html)

and also "Note on x86 and m68080 floating-point math" here
[https://gcc.gnu.org/wiki/FloatingPointMath](https://gcc.gnu.org/wiki/FloatingPointMath)

~~~
kccqzy
With -O2 gcc actually compiles the program to

    
    
        xorl %eax, %eax
        ret
    

That is, it optimized away everything.

~~~
acqq
Yes, that's the correct optimization. That code obviously can't execute
printf("error\n") to manifest the bug; obviously both additions were done at
compile time, producing equal results.

------
trelliscoded
I remember that in university we had a test on a toy 16-bit floating point
ALU. We had to walk through each of the steps in the ALU for the usual
operations, showing the bit patterns at each step and keeping track of
precision loss, etc. Because we had to track state and some aspects of the
test were randomized, there were a few cases where identical operations on
identical inputs resulted in different outputs due to the residual state.

During the next lecture, someone sticks their hand up and says, "this FPU
thing is so nondeterministic, who in their right mind would use floating point
for 99% of the kinds of work that we use computers for?"

The professor says, "you should think real hard about that the next time you
decide to use a floating point type in your program."

~~~
brucedawson
> identical operations on identical inputs resulted in different outputs due
> to the residual state.

What? What residual state? Floating-point math is tricky, but on a particular
machine/compiler/compile-options set you should expect that identical
operations on identical inputs should give identical results.

What is this "residual state" you mention?

If you have an FPU that lets you change precision or rounding modes then you
can get different results from the same operations - is that what you're
talking about?

I get frustrated when people think that floating-point just randomly changes
bits for no reason. It doesn't. It randomly changes bits for specific reasons,
which can be understood.

~~~
effie
trelliscoded probably meant bits in scratchpad, which are not accessible to
the programmer. I read somewhere that those hidden bits may not be saved and
regenerated properly as part of cpu resuming execution of the program (task
switching). I do not know if that can happen, but even if not, achieving
consistency on subsequent runs is quite difficult - apparently one has to make
sure the program and all libraries it uses were compiled with special compiler
flags that prevent data alignment influencing results of FPU operations.

[https://software.intel.com/articles/consistency-of-
floating-...](https://software.intel.com/articles/consistency-of-floating-
point-results-using-the-intel-compiler/)

------
yokohummer7
Probably the bug report with the most duplicates?

> The x87 FPU was originally designed in (or before) 1980. I think that's why
> it is quite simple: it has only one unit for all FP data types. Of course,
> the precision must be of the widest type, which is the 80-bit long double.

> Consider you have a program, where all the FP variables are of the type
> double. You perform some FP operations and one of them is e.g. 1e-300/1e300,
> which results in 1e-600. Despite this value cannot be held by a "double", it
> is stored in an 80-bit FPU register as the result. Consider you use the
> variable "x" to hold that result. If the program has been compiled with
> optimization, the value need not be stored in RAM. So, say, it is still in
> the register.

> Consider you need x to be nonzero, so you perform the test x != 0. Since
> 1e-600 is not zero, the test yields true. While you perform some other
> computations, the value is moved to RAM and converted to 0 because x is of
> type "double". Now you want to use your certainly nonzero x... Hard luck :-(

> Note that if the result doesn't have its corresponding variable and you
> perform the test directly on an expression, the problem can come to light
> even without optimization.

> It could seem that performing all FP operations in extended precision can
> bring benefits only. But it introduces a serious pitfall: moving a value may
> change the value!!!

TIL long double is a thing.
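
For reference, a small sketch of the scenario quoted above (purely
illustrative: whether it actually misbehaves depends on the compiler, the
optimization level, and whether the code targets the x87 or SSE):

    
    
        #include <stdio.h>
    
        int main(void)
        {
          double a = 1e-300, b = 1e300;
          double x = a / b;        /* ~1e-600 fits in an 80-bit register,
                                      but underflows to 0 as a 64-bit double */
          if (x != 0)              /* may test the still-in-register value */
            printf("x is nonzero\n");
          printf("x = %g\n", x);   /* may print 0 once x is stored to RAM  */
          return 0;
        }
    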

~~~
vortico
So would turning off denormals fix the problem? That concept has given me 100
problems in the past and has helped me 0 times.

~~~
kr7
The solution is to compile with SSE2 on x86. (flags: -mfpmath=sse -msse
-msse2)

On x86-64, the compiler should default to SSE2.

SSE2 is ~16 years old so compatibility shouldn't be an issue.

~~~
phire
Technically, you only actually need the instructions from the original SSE set
to do floating point operations. SSE2 adds a bunch of really useful integer
instructions.

But the only extra cpus that gets you are the Pentium III, AMD Athlon XP, and
AMD Duron.

SSE2 is supported on every single x86 cpu released after those, such as the
Pentium 4, Pentium M, and Athlon 64.

It's a real shame that people are still using CPUs that don't support SSE4,
such as the AMD Phenom and Phenom II, otherwise everyone would have moved to
SSE4 exclusively.

~~~
kr7
SSE1 is single-precision only. SSE2 added double precision.

So the bug will still appear for 'double' using just SSE1.

------
twic
In case anyone is wondering why this showed up right now, it's because
tbrowbdidnso was reading the 'examples of unexpected mathematical images'
article [1] and came across the one about FPU-dependent aliasing artifacts
[2].

[1]
[https://news.ycombinator.com/item?id=13736568](https://news.ycombinator.com/item?id=13736568)

[2]
[http://www.cgl.uwaterloo.ca/csk/projects/alias/](http://www.cgl.uwaterloo.ca/csk/projects/alias/)

~~~
tbrowbdidnso
Eh, was wondering how I wandered to the page myself.

------
Animats

        *DIAGNOSTIC* THE TEST FOR EQUALITY BETWEEN NON-INTEGERS MAY NOT BE MEANINGFUL
    

UNIVAC 1107 FORTRAN IV error message for this, 1967.

------
Negitivefrags
This is the bug that makes std::set< float > crash randomly under GCC with
default compile options.

The fact that they consider this reasonable is insane.

~~~
emmelaich
Is having a set of floats reasonable?

~~~
Negitivefrags
Yes. There is nothing about floats that should make them unusable in a set.

In particular though, it's fairly common to use std::sets (or std::maps which
use the same underlying data structure) to maintain something in a sorted
order.

I ran into this myself when implementing A*. I was using an std::map with the
float as key for the priority queue. Under GCC with default compile arguments
it would randomly hang or crash due to GCC breaking the set invariant.

~~~
Sharlin
> Yes. There is nothing about floats that should make them unusable in a set.

Strictly speaking, there is. The < operator for floating-point types does not
impose a _strict weak order_ because NaNs are a thing.

In any case, comparing floating-point values for strict equality (instead of
using an appropriate epsilon) is more often a bad idea than not.

~~~
Someone
But std::set doesn't use operator< for comparing values; it uses _!(a<b) &&
!(b<a)_.

[http://www.cplusplus.com/reference/set/set](http://www.cplusplus.com/reference/set/set):

 _" Compare

A binary predicate that takes two arguments of the same type as the elements
and returns a bool. The expression comp(a,b), where comp is an object of this
type and a and b are key values, shall return true if a is considered to go
before b in the strict weak ordering the function defines. The set object uses
this expression to determine both the order the elements follow in the
container and whether two element keys are equivalent (by comparing them
reflexively: they are equivalent if !comp(a,b) && !comp(b,a)). No two elements
in a set container can be equivalent. This can be a function pointer or a
function object (see constructor for an example). This defaults to less<T>,
which returns the same as applying the less-than operator (a<b)."_

I think that guarantees that std::set works for IEEE floating point values,
and that NaNs are at the end of the set.

~~~
DannyBee
"I think that guarantees that std::set works for IEEE floating point values,
and that NaNs are at the end of the set. "

No, this is, by definition, not a strict weak order: it doesn't have
transitivity of equivalence.

It's at best a partial ordering. "Strict weak" order is, by the way, such an
incredibly stupid name it's hard to know where to begin (it's what they call
it in math, but that doesn't make it any more sensible).

It's much easier to understand in terms of the invariants.

Partial orders: f(x, x) must be false.

f(x, y) implies !f(y, x)

f(x, y) and f(y, z) imply f(x, z).

Strict weak ordering: All of the above, _plus_: define equivalence as f(x, y)
and f(y, x) both being false. Then if x is equivalent to y, and y is
equivalent to z, then x must be equivalent to z.

I.e. for all x, y, z: if f(x, y) and f(y, x) are _both_ false, and f(y, z) and
f(z, y) are _both_ false, then f(x, z) and f(z, x) must _both_ be false.

It's strict because it's not reflexive, and it's weak because !(b < a) &&
!(a < b) does not imply a == b. It just implies they are "equivalent", and as
mentioned, you get a total ordering on the _equivalence classes_.
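
To make the invariants concrete, here is a small sketch in C of a "less"
predicate on doubles that does satisfy them even when NaNs are present (the
function name is mine, not from the thread): every NaN is treated as
equivalent to every other NaN and sorts after all ordinary values.

    
    
        #include <math.h>
        #include <stdbool.h>
    
        /* Strict weak order on doubles: irreflexive, asymmetric, transitive,
           and with transitive equivalence, because all NaNs fall into one
           equivalence class ordered after everything else. */
        bool less_nan_last(double a, double b)
        {
          bool a_nan = isnan(a), b_nan = isnan(b);
          if (a_nan || b_nan)
            return !a_nan && b_nan;   /* only "number < NaN" is true   */
          return a < b;               /* ordinary comparison otherwise */
        }
    

Handing a predicate like this to the container, instead of relying on the raw
< operator, is one way to keep the strict-weak-ordering guarantee when the
data may contain NaNs.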

------
ChuckMcM
Hah! That is such a great read, all of the "this isn't a bug" and the
tombstones of all the "duplicate" bug reports. It reminds me how challenging
it is to do decimal floating point in binary. Mike Cowlishaw could go on for
literally _hours_ about how broken just the concept is, and he wrote and
shipped a decimal-based floating point library which has none of these
issues[1].

I still remember when I was told to never compare my floating point result to
zero because even when it should be it probably won't be, always compare it to
be less than a small sigma.

[1] [http://speleotrove.com/decimal/](http://speleotrove.com/decimal/)

~~~
mauvehaus
> always compare it to be less than a small sigma.

Unfortunately, this approach can result in problems with the usual expectation
that equality is transitive. Consider 0 < x < y < sigma. 0 == x, 0 == y, but x
!= y. This problem applies generally any time you're doing equality
comparisons with sigma.

~~~
SolarNet
You aren't following the instructions. You have to rewrite _all_ comparisons
to use a small sigma.

`0.f == x` becomes `SIGMA > x`

`0.f == y` becomes `SIGMA > y`

`x == y` becomes `SIGMA > abs(x - y)`

All the expressions work if written correctly.

~~~
mauvehaus
I probably should have given a better example instead of the hasty one. -sigma
< x < 0 < y < sigma, abs(x - y) > sigma. I hope we can agree that this causes
problems with transitivity :-)

There are _also_ problems with granularity. Suppose x and y are both much,
much larger than sigma, and the smallest representable difference between x
and y is e.g. 1.0. Now there's no possible way that abs(x - y) < sigma.

I suspect that the resolution to this is to compare equality within some
number of ULPs, but: 1) You still have the transitivity problem and 2) This is
not my field of expertise, and there may be a more appropriate solution.
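
For what it's worth, here is a minimal sketch of the ULP-style comparison
(assuming IEEE 754 binary64 and finite inputs, ignoring NaNs; the transitivity
caveat above still applies):

    
    
        #include <stdbool.h>
        #include <stdint.h>
        #include <string.h>
    
        /* For finite doubles of the same sign, the IEEE bit pattern is
           monotonic when read as an integer, so the integer distance counts
           representable values (ULPs) between the two numbers. */
        bool nearly_equal_ulps(double a, double b, uint64_t max_ulps)
        {
          uint64_t ia, ib;
          memcpy(&ia, &a, sizeof ia);
          memcpy(&ib, &b, sizeof ib);
          if ((ia >> 63) != (ib >> 63))   /* different signs: only +0/-0 match */
            return a == b;
          uint64_t diff = ia > ib ? ia - ib : ib - ia;
          return diff <= max_ulps;
        }
    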

~~~
SolarNet
> -sigma < x < 0 < y < sigma, abs(x - y) > sigma

This statement works fine. If the numbers are farther than sigma apart then
`abs(x - y) > sigma` is true _as it should be_. It's their distance _from each
other_, not their distance to 0, that matters.

> There are also problems with granularity. Suppose x and y are both much,
> much larger than sigma

This is part of the same problem. You have to choose a maximum acceptable
value and ensure you don't overflow it; then you can choose a sigma based on
that maximum value such that the smallest representable difference between any
two numbers never exceeds sigma. In practice you already have a maximum value
for your domain that the simulation should never exceed, or it means it is
incorrect anyway.

------
jlgustafson
I have been following this bug in its various forms for 30 years, and it is
covered in pages 55-62 of The End of Error: Unum Computing. It is the "hidden
scratchpad" bug where designers of a computing environment try to be helpful
by providing more precision than what is explicitly requested by the
programmer. If you read the blog postings of William Kahan about this issue,
you will find he speaks fervently on both sides, sometimes defending it and
other times vehemently deriding it. Thus the confusion. Kahan favors any
technique that improves accuracy, even if it creates behavior that can only be
explained through careful numerical analysis (which very few programmers have
the time or skill set to perform).

Unums and posits, incidentally, are designed to eliminate all sources of
irreproducibility and inexplicable behavior like this gcc bug. Hardware
support for posits is on the way, maybe before the year is out. There is a way
out of this mess, but it will take a few years to become mainstream.

~~~
wbl
No, Kahan advocates for what the C standard does, which GCC doesn't implement
correctly.

------
phkahler
After reading much of the discussion, my conclusion is that it's past time to
deprecate the x87 FPU. Ever since x86_64, it is expected that SSE2 will be
available and compilers are expected to use it over the old FPU. Face it, this
is an old Intel hardware "feature" that causes problems and has been obsolete
for a long time.

------
caf
The -fexcess-precision=standard option, which is enabled by -std=c99
(requesting ISO C99 compliance), forces excess precision to be rounded away on
all assignments and casts to narrower precision types.

See [https://gcc.gnu.org/onlinedocs/gcc/Optimize-
Options.html#ind...](https://gcc.gnu.org/onlinedocs/gcc/Optimize-
Options.html#index-fexcess-precision-960)
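
Roughly, a sketch of what that means for code on a 32-bit x87 target (the
function name is just for illustration):

    
    
        /* With -std=c99 or -fexcess-precision=standard, the assignment to d
           and the (double) cast below must both round the extended-precision
           intermediate to double, so this returns 1; with
           -fexcess-precision=fast the excess bits may survive and the result
           can vary with optimization level. */
        int rounds_on_assignment(double x, double y)
        {
          double d = x + y;              /* excess precision discarded here */
          return d == (double)(x + y);   /* the cast forces rounding too    */
        }
    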

------
Gibbon1
[https://randomascii.wordpress.com/2013/07/16/floating-
point-...](https://randomascii.wordpress.com/2013/07/16/floating-point-
determinism/)

------
solarexplorer
Always a good read:

David Goldberg: What Every Computer Scientist Should Know About Floating-Point
Arithmetic (1991)

[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)

------
vanderZwan
I'm very surprised nobody has brought up John Gustafson's Unum work. His
latest iteration is the posit, which he presented at Stanford earlier this
month[0][1]

EDIT: apologies to CalChris[2]. What I should have said is _discussed_
Gustafson's Unum work.

[0]
[https://www.youtube.com/watch?v=aP0Y1uAA-2Y](https://www.youtube.com/watch?v=aP0Y1uAA-2Y)
(relevant question at 1h22m37s)

[1]
[http://web.stanford.edu/class/ee380/Abstracts/170201.html](http://web.stanford.edu/class/ee380/Abstracts/170201.html)

[2]
[https://news.ycombinator.com/item?id=13739617](https://news.ycombinator.com/item?id=13739617)

------
Loic
Because I do a lot of numerical code in Fortran, using the Intel compiler on
Windows for production and GCC/GNU Fortran for prototyping on Linux, I was a
bit scared by the title. This is a problem with equality comparison of
doubles, something I try to avoid as much as possible, as this was the first
thing I learnt 25 years ago... never compare doubles for equality.

Anyway, the bug report links to a very nice research paper[0]: "The pitfalls
of verifying floating-point computations", which, even though very technical,
is a good read on the subject.

[0]: [https://hal.archives-ouvertes.fr/hal-00128124](https://hal.archives-
ouvertes.fr/hal-00128124)

------
nortiero
I think that this is a case of a mismatch between two standards, not a bug.
The C standard allows higher precision values to be used in place of lesser
ones (extended precision vs double) and REQUIRES conversion to the correct
precision during assignment (or casting). Also, the C abstract machine has the
option to store those values when/if it deems opportune; it won't change the
observables as defined there.

On the other side, IEEE 754 allows extended precision to be used in place of
double and of course requires that any chosen precision be kept or else.

But the C Standard mandates IEEE 754, too!

It seems to me that modern C deliberately chose to ignore this kind of
mismatch in the name of (substantial) performance gains. K&R, good or bad, was
way simpler!

~~~
leni536
> But the C Standard mandates IEEE 754, too!

Are you sure? I can't find it in the standard, specifically not in [1] page
28., section 5.2.4.2.2.

[1] (the final draft of C1X, dated 12 April 2011) [http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1570.pdf](http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)

~~~
nortiero
You are absolutely correct, "mandates" is too strong a word. Annex F, which is
normative, has a way out by not setting __STDC_IEC_559__; GCC 6.3.0 does set a
slightly different __GCC_IEC_559. Yet I think that my argument still holds:

floating point calculations can be executed at wider range (§5.1.2.3.12);

assignments and casts are under obligation to wipe out that extra precision
(same paragraph);

I am no language lawyer, but given the issue of program observable effects
(§5.1.2.3.12) and the implied "as-if" rule that governs optimizations, how
could equality possibly be stable over time?

~~~
leni536
Yeah, you are right. It seems that the problem is that we cram two standards
into too few types. In addition to float, double, and long double there could
be optional iec559_single_t, iec559_double_t and iec559_extended_t types
(similar to the stdint.h integer types). Operations on variables of these
types could be strictly iec559 conformant.

------
greglindahl
For the PathScale compiler's initial release on x86, in the unlikely event
that the x87 was used at all, I set -mpc64 as the default. We never got a
customer complaint about it, despite having many numerically-savvy customers.

------
castratikron
More proof that floats are evil. Moral of the story is to never use equality
to compare floats.

~~~
andrepd
Floats are evil? o.O What do you propose instead if you want to deal with non-
integer values?

~~~
tbirdz
There's always fixed point integers[0]. They're less flexible than floats, but
at least the amount of precision available is much more clear and uniform, and
all the rounding is pretty explicit.

[https://en.wikipedia.org/wiki/Fixed-
point_arithmetic](https://en.wikipedia.org/wiki/Fixed-point_arithmetic)

[https://en.wikipedia.org/wiki/Q_%28number_format%29](https://en.wikipedia.org/wiki/Q_%28number_format%29)
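
As a concrete illustration of the Q-format idea from the second link, here is
a tiny Q16.16 sketch (the format choice and names are just for illustration):

    
    
        #include <stdint.h>
        #include <stdio.h>
    
        /* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
        typedef int32_t q16_16;
        #define Q_ONE (1 << 16)
    
        q16_16 q_from_double(double d) { return (q16_16)(d * Q_ONE); }
        double q_to_double(q16_16 q)   { return (double)q / Q_ONE; }
    
        /* Multiply in 64 bits, then shift back; the precision loss happens
           explicitly at the shift. */
        q16_16 q_mul(q16_16 a, q16_16 b)
        {
          return (q16_16)(((int64_t)a * b) >> 16);
        }
    
        int main(void)
        {
          q16_16 x = q_from_double(1.5);
          q16_16 y = q_from_double(2.25);
          printf("%f\n", q_to_double(q_mul(x, y)));   /* 3.375000 */
          return 0;
        }
    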

~~~
rwallace
Back in the late eighties, early nineties, I was of the opinion that fixed
point is really the way to go, the only thing holding it back was lack of
adequate compiler support. Until I realized what would actually happen if the
world started using fixed point: programmers would always set the dynamic
range too small, so we'd end up with a world full of programs that were
constantly failing because they needed to be run on data larger than the test
cases. The headaches of floating point, annoying though they can be at times,
are much less onerous.

~~~
MaulingMonkey
> Until I realized what would actually happen if the world started using fixed
> point: programmers would always set the dynamic range too small, so we'd end
> up with a world full of programs that were constantly failing because they
> needed to be run on data larger than the test cases.

Floating point doesn't actually fix this issue. A fun bug I dealt with was a
black screen caused by a few pixels having NaN values - which were then picked
up by a blur pass which would poison basically the entire back buffer. The
original NaNs were caused by divides by near-zero values causing out-of-range
results.

You now add precision issues: Multiple coordinate systems are a common hack-
around to maintain local precision when using single-precision based physics
engines. Stuttering effects when time is represented as "floating point
seconds since uptime" when integer wraparound would avoid any problem in the
first place. And this is before even adding e.g. half precision or worse
floats into the mix.

There's a reason floating point isn't used for a lot of serious banking
calculations.

~~~
rwallace
To clarify, any time I in any way praise floating point, I'm always talking
about double precision. I think single precision floating point should never
have been invented, and given that it unfortunately was, it's best to behave
as though it doesn't exist. Don't get me started about half precision.

Financial calculations are another matter; they usually require decimal
arithmetic, not because it doesn't give rounding errors, but because it gives
_the same_ rounding errors an accountant gets when he checks the result with
his desk calculator, and that's the requirement.
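
A tiny illustration of the point, comparing IEEE binary doubles with scaled
integer cents (one common stand-in for decimal arithmetic in currency code):

    
    
        #include <stdio.h>
    
        int main(void)
        {
          /* 0.10 and 0.20 are not exactly representable as binary doubles,
             so the sum does not compare equal to the literal 0.30. */
          double a = 0.10, b = 0.20;
          printf("%d\n", a + b == 0.30);   /* prints 0 */
          printf("%.17g\n", a + b);        /* prints 0.30000000000000004 */
    
          /* Scaled integers (cents) round the way a desk calculator does. */
          long cents = 10 + 20;
          printf("%d\n", cents == 30);     /* prints 1 */
          return 0;
        }
    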

~~~
MaulingMonkey
> To clarify, any time I in any way praise floating point, I'm always talking
> about double precision.

64-bit fixed point solves a lot of problems too :)

> I think single precision floating point should never have been invented, and
> given that it unfortunately was, it's best to behave as though it doesn't
> exist. Don't get me started about half precision.

A 4x memory bandwidth & storage penalty for HDR buffers would be absolute
murder (bloating a single 4k RGBA surface to 265MB if I've done my math
right), so I'm glad we have the option. And in the historical context lower
precision options were invented in, RAM wasn't nearly as cheap as it is today.
Tradeoffs.

------
beached_whale
I have found that I often end up forsaking the performance and using a decimal
type. It is definitely slower, but the correctness is often more desirable.

~~~
Gibbon1
A friend mentioned being in a meeting where Mike Cowlishaw argued that
Javascript's floating point type should be decimal floats not binary floats.
Everyone including my friend thought he was out to lunch. My friend says now,
'a great opportunity was pissed away'

Because almost no one actually wants binary floating point.

~~~
brucedawson
Why not? Decimal floats have definite advantages if you are dealing with
decimal data, but that is not the only type of data that floats are used for,
and it probably isn't the majority. Binary floating point is better than
decimal floating point if you don't care about decimal-interop (more precision
per bit, more consistent precision), and decimal floating point has all the
same issues (except for confusing conversions to decimals).

So, if you are dealing with raw sensor data, neural nets, character models in
a game, numerical simulations, and many other types of data, the extra
precision of binary floating-point is far more important than any confusion
that arises during those (rare) conversions to decimal.

~~~
beached_whale
It's all about costs and benefits though. I do think that a decimal type
should be standard alongside float so that the most appropriate one can be
chosen. But in many contexts, one needs to go outside the immediate ecosystem
to third party libraries to get an alternative to floating point. Currency is
a good example of a type that should not use floating point arithmetic but far
too often does.

~~~
Gibbon1
I think the difference is that binary floating point tends to be used on
massive, automatically generated sets of data. And binary floats are mostly
perfect for that.

However, the bias towards scientific computing is a problem, because there are
large numbers of use cases where the inputs and outputs are decimal floats,
because you're dealing with humans and their preference for base 10.

------
Kenji
This is well-known by everyone who works with float determinism. You have to
set the FPU control word with assembler to force the desired width (among
other things). See: fldcw, LDMXCSR, etc.

Take into account rounding modes and denormals too, and make sure to disable
float optimisations that break determinism in your compiler.
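
For example, on glibc/x86 the x87 precision-control field can be set from C
via <fpu_control.h> (a sketch; other platforms need _controlfp, inline fldcw,
or similar, and note this narrows the significand but not the exponent range):

    
    
        #include <fpu_control.h>
    
        /* Set the x87 precision-control bits to 53-bit (double) rounding.
           This is the fldcw route mentioned above, wrapped by glibc macros;
           SSE code is unaffected, since its settings live in MXCSR instead. */
        void set_x87_double_precision(void)
        {
          fpu_control_t cw;
          _FPU_GETCW(cw);
          cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
          _FPU_SETCW(cw);
        }
    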

------
shruubi
> Regardless of where one wishes to put the blame, this problem will _not_ be
> fixed. Period.

I realise that floating-point numbers are, to use the technical term, "whack",
and I also realise that triaging the issue seems to point to a hardware error,
not a software error, but I can't get past this "yeah, there is a problem, so
what?" attitude. Am I missing something critical here that makes this
unreasonable to work around or fix?

~~~
pizza234
According [indirectly] to one of the comments, the issue is that there is no
solution which solves all the problems in the context.

Reference:

[https://gcc.gnu.org/ml/gcc/2003-08/msg01257.html](https://gcc.gnu.org/ml/gcc/2003-08/msg01257.html)

Specifically:

    
    
        There are two things missing. The ability to turn on the workaround in
        c (i.e. emit FP rounding instruction after every operation), and a bug
        fix for the register spills.

        Even with those things, I think we are still in trouble. In the first
        case, having explicit rounding instructions eliminates the excess
        precision problem, but it introduces a double-rounding problem. So we
        have lost again there. This is probably not fixable without changing
        the hardware. In the second case, fixing the problem with reload
        spills eliminates one source of unpredictable rounding; however, we
        still have the problem that local variables get rounded if they are
        allocated to the stack and not rounded if they are allocated to an FP
        reg-stack register. Thus we still have the problem that we get
        different results at different optimization levels. So we still lose
        there again also. This might be fixable by promoting all stack locals
        to long double to avoid unexpected rounding, but that will probably
        cause other problems in turn. It will break all programs that expect
        assignments to doubles will round to double, for instance. If we
        don't promote stack locals, then we need explicit rounding for them,
        and then we have double-rounding again.

        I really see no way to fix this problem other than by fixing the
        hardware. The hardware has to have explicit float and double
        operations.

------
77pt77
Comment 109 has a good exposition I'll just copy here:

Jan Lachnitt 2008-05-20 16:59:31 UTC

I also encountered such problems and was going to report it as a bug in GCC...
But in the GCC bug (not) reporting guide, there is fortunately a link to this
page and here (comment #96) is a link to David Monniaux's paper about
floating-point computations. This explains it closely but it is maybe too
long. I have almost read it and hope I have understood it properly. So I'll
give a brief explanation (for those who don't know it yet) of the reasons of
such a strange behaviour. Then I'll assess where the bug actually is (in GCC
or CPU). Then I'll write the solution (!) and finally a few recommendations to
the GCC team.

EXPLANATION

The x87 FPU was originally designed in (or before) 1980. I think that's why it
is quite simple: it has only one unit for all FP data types. Of course, the
precision must be of the widest type, which is the 80-bit long double.
Consider you have a program, where all the FP variables are of the type
double. You perform some FP operations and one of them is e.g. 1e-300/1e300,
which results in 1e-600. Despite this value cannot be held by a "double", it
is stored in an 80-bit FPU register as the result. Consider you use the
variable "x" to hold that result. If the program has been compiled with
optimization, the value need not be stored in RAM. So, say, it is still in the
register. Consider you need x to be nonzero, so you perform the test x != 0.
Since 1e-600 is not zero, the test yields true. While you perform some other
computations, the value is moved to RAM and converted to 0 because x is of
type "double". Now you want to use your certainly nonzero x... Hard luck :-(
Note that if the result doesn't have its corresponding variable and you
perform the test directly on an expression, the problem can come to light even
without optimization. _It could seem that performing all FP operations in
extended precision can bring benefits only. But it introduces a serious
pitfall: moving a value may change the value!!!_

WHERE'S THE BUG

This is really not a GCC bug. _The bug is actually in the x87 FPU because it
doesn't obey the IEEE standard._

SOLUTION

The x87 FPU is still present in contemporary processors (including AMD) due to
compatibility. I think most of PC software still uses it. But _new processors
have also another FPU, called SSE, and this do obey the IEEE_. GCC in 32-bit
mode compiles for x87 by default but it is able to compile for the SSE, too.
So the solution is to add these options to the compilation command: -march=*
-msse -mfpmath=sse

Yes, this definitely resolves the problem - but not for all processors. The *
can be one of the following: pentium3, pentium3m, pentium-m, pentium4,
pentium4m, prescott, nocona, athlon-4, athlon-xp, athlon-mp, k8, opteron,
athlon64, athlon-fx and c3-2 (I'm unsure about athlon and athlon-tbird).
Beside -msse, you can also add some of -mmmx, -msse2, -msse3 and -m3dnow, if
the CPU supports them (see GCC doc or CPU doc).

If you wish to compile for processors which don't have SSE, you have a few
possibilities:

(1) A very simple solution: Use long double everywhere. (But be careful when
transfering binary data in long double format between computers because this
format is not standardized and so the concrete bit representations vary
between different CPU architectures.)

(2) A partial but simple solution: Do comparisons on volatile variables only.

(3) A similar solution: Try to implement a "discard_extended_precision"
function suggested by Egon in comment #88.

(4) A complex solution: Before doing any mathematical operation or comparison,
put the operands into variables and put also the result to a variable (i.e.
don't use complex expressions). For example, instead of { c = 2*(a+b); },
write { double s = a+b; c = 2*s; }. I'm unsure about arrays but I think they
should be OK. When you have modified your code in this manner, then compile it
either without optimization or, when using optimization, use -ffloat-store. In
order to avoid double rounding (i.e. rounding twice), it is also good to
decrease the FPU precision by changing its control word in the beginning of
your program (see comment #60). Then you should also apply -frounding-math.
(5) A radical solution: Find a job/hobby where computers are not used at all.

~~~
77pt77
On a related note, testing floating point equality should be known as a recipe
for unexpected results.

------
gruntz
[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)

------
nayuki
This idea is related to the strictfp modifier in Java.
[https://en.wikipedia.org/wiki/Strictfp](https://en.wikipedia.org/wiki/Strictfp)

------
gasgiant
As a system administrator, my favorite part of the Bug 323 thread is the mail
bounces from postmaster.

------
laurent123456
Should add the date to the title (2000)

------
hzhou321
People need to understand that floating point comparison at floating precision
is _undefined_, so it is not a bug. A proper floating point representation
would track its precision. Since that is more complicated and potentially
affects performance, until then the programmer should manage the precision
tracking on their own (similar to memory allocation), that is, one should do:

    
    
      if (f_a < f_b - f_eps)          /* definitely less than      */
      if (f_a > f_b + f_eps)          /* definitely greater than   */
      if (fabs(f_a - f_b) < f_eps)    /* "equal" to within f_eps   */
    

If you don't do this, then you are on your own. Sometimes it is fine, but
sometimes it may give you grief, just like the dangers of other undefined
behaviors.

PS: I also have a problem with people thinking "undefined behavior" is
_undefined_ because certain standards say so. Behavior is undefined where
there is no common-sense agreement, or where it defies common sense. For
example, divide by zero: there is no common sense in doing so, and defining
its behavior would not change anything. Signed integer overflow: there is no
common sense agreement for that behavior either. The common sense for
undefined behaviors is to avoid them. (On the other hand, there seems to be
some common sense agreement on unsigned integer overflow, and people write
code based on that behavior. But that agreement is weak, so I would avoid it
if I can, regardless of what the standards say.)

PS2: The reason floating point comparison is undefined behavior is that,
unlike fixed precision or integers, floating point numbers have floating
precision. Floating precision has no common sense correspondence. In any real
application (where common sense is based), a finite and, above all, consistent
precision is always assumed.

And the solution of manually specifying an f_eps puts the comparison into a
common-sense understanding: you know the result of a comparison is only
meaningful above f_eps. With that common sense, having an internal calculation
in higher precision is always OK -- isn't that common sense?

PS3: The question is, why not turn undefined behaviors into errors? Undefined
behavior does not mean there is no behavior, and the behavior can be used, or
more often ignored, to achieve certain efficiency. When programmers write code
with undefined behaviors, they are taking their code out of the common sense
domain and into a special sense domain. A special sense domain is practically
fine. Everyone has their own efficient quirks; just do not expect your quirks
to be understood or supported at large, where common sense rules (unless you
carefully define and document them). A language that allows for undefined
behaviors allows for individual efficiency, which sometimes is rather good.

For example, it is ok to write programs with direct floating point number
comparisons as long as you understand that your program does not care when the
comparison falls into the ambiguous range. Only you who understand your
special application can assess that. If you are the only one who is
responsible for the result of that code, then it is perfectly fine!

~~~
enriquto
Essentially everything you say here is false.

Floating point numbers represent rational numbers, and their arithmetic is
straightforward and follows common sense: you compute the exact operation
using rational numbers, and the result is the closest available floating point
number.

Floating point operations are never "undefined" either, they are perfectly
deterministic and completely specified by the IEEE 754 standards.

There are three floating point values that do not correspond to rationals:
INF, -INF and NAN. There is also "-0" that corresponds to the same rational as
"0". The arithmetic of these values is extremely common sense to anyone that
knows some calculus: 1/0 = INF, INF*0=NAN, INF+INF=INF, INF-INF=NAN, etc.

There is no ambiguity whatsoever in floating point numbers.
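
(Those special-value rules are easy to check directly; a minimal sketch, with
the caveat that the textual spelling of inf/nan printed by %g is up to the
implementation:)

    
    
        #include <math.h>
        #include <stdio.h>
    
        int main(void)
        {
          printf("%g\n", 1.0 / 0.0);             /* inf */
          printf("%g\n", INFINITY * 0.0);        /* nan */
          printf("%g\n", INFINITY + INFINITY);   /* inf */
          printf("%g\n", INFINITY - INFINITY);   /* nan */
          return 0;
        }
    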

~~~
jlgustafson
You say 1/0 = INF, but the 754 standard says negative zero is different, and
1/(-0) = -INF. Yet it says -0 is numerically equal to 0. It also specifies
that the square root of -0 is -0. This is neither straightforward nor common
sense.

They also do not round to the nearest representable rational number. If you
double MAXREAL, it will round to INF, even though MAXREAL is infinitely closer
to 2*MAXREAL than INF is.

Despite efforts in the 2008 update to the 754 Standard, there still is no
guarantee that the Standard will produce reproducible, bitwise identical
results. The Standards committee admits this. It is important for everyone to
understand the shortcomings of the Standard.

~~~
enriquto
Well, "common sense" is not the same for everybody then. For a mathematician,
the rules 1/0 = INF, 1/(-0) = -INF are perfectly straightforward and common
sense (if you think about the limits).

Regarding sqrt(-0)=-0, I concede it looks a bit strange. Yet, as explained in
the famous Kahan article "much ado about nothing's sign bit", it turns out to
be natural when you work with complex numbers.

As for the nondeterminacy, I suppose you are talking about transcendental
functions? The basic arithmetic should be bitwise reproducible.

