
GNU GCC does not round floating-point divisions to the nearest value - guiambros
https://lemire.me/blog/2020/06/26/gcc-not-nearest/
======
sonofgod
An example where this matters: a new Mario 64 trick on the Virtual Console
version. Summing round-toward-zero numbers over a sine wave cycle produces a
small delta that builds up over a couple of days and makes a jump possible.

[https://www.kotaku.com.au/2018/06/the-mario-64-trick-that-takes-three-days-to-complete/](https://www.kotaku.com.au/2018/06/the-mario-64-trick-that-takes-three-days-to-complete/)
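
A minimal sketch of the mechanism (illustrative only, not the game's actual
code; the sample count is made up): summing the same sine samples under
round-toward-zero versus round-to-nearest leaves a small per-cycle residue,
which is the kind of drift the trick accumulates. Compile with something like
gcc -frounding-math demo.c -lm so the compiler doesn't assume the default
rounding mode.

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Precompute one cycle of sine samples under the default mode. */
        const float pi = 3.14159265f;
        float samples[1000];
        for (int i = 0; i < 1000; i++)
            samples[i] = sinf(2.0f * pi * (float)i / 1000.0f);

        /* Sum with round-toward-zero, as described above. */
        fesetround(FE_TOWARDZERO);
        float tz = 0.0f;
        for (int i = 0; i < 1000; i++) tz += samples[i];

        /* Sum again with the default round-to-nearest. */
        fesetround(FE_TONEAREST);
        float tn = 0.0f;
        for (int i = 0; i < 1000; i++) tn += samples[i];

        /* The tiny difference is the per-cycle delta that builds up. */
        printf("per-cycle delta: %g\n", (double)(tz - tn));
        return 0;
    }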

~~~
dx87
For anyone else interested in similar tricks, here is an in-depth video where
someone talks about getting a star with half a button press. It starts out
straightforward enough, but eventually gets really interesting when they start
talking about parallel worlds that you can only access by abusing bugs to
reach unrealistic speeds.

[https://www.youtube.com/watch?v=kpk2tdsPh0A&t=77s](https://www.youtube.com/watch?v=kpk2tdsPh0A&t=77s)

~~~
LambdaComplex
Best followed by this video, which made me laugh my ass off for several
minutes: [https://www.youtube.com/watch?v=hSa-pwML4z4](https://www.youtube.com/watch?v=hSa-pwML4z4)

(I grew up watching SpongeBob, if that matters)

~~~
Causality1
That was excellent. If I may suggest a followup:
[https://youtu.be/Pyn3N55elS4](https://youtu.be/Pyn3N55elS4)

------
saagarjha
I believe the issue is that some of this ends up evaluated at compile-time,
and the compiler really has no idea how to do rounding correctly at that
point. When implementing floating point tests for an x86 emulator, we had to
create a macro that would ensure that the calculation was done at runtime
rather than at compile time for this exact reason:
[https://github.com/ish-app/ish/blob/18176b6931d69bdabe48f137e48d8e8118c22122/emu/float80-test.c#L11](https://github.com/ish-app/ish/blob/18176b6931d69bdabe48f137e48d8e8118c22122/emu/float80-test.c#L11).
If anyone knows how to control floating point rounding in C reliably, I'd be
interested in hearing it :)
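
For reference, one common approach (a sketch of the general technique; not
necessarily what the linked macro does) is to launder operands through a
volatile temporary. Volatile accesses must be preserved, so the compiler
cannot constant-fold an expression built from them:

    /* Force a value to be materialized at runtime. */
    static double force_runtime(double x) {
        volatile double v = x;
        return v;
    }

    double ratio(void) {
        /* Without the laundering, a compiler may fold this division
           at compile time with its own rounding behavior. */
        return force_runtime(1.0) / force_runtime(3.0);
    }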

~~~
avar
> If anyone knows how to control floating point rounding in C reliably, I'd be
> interested in hearing it

I skimmed over that code, and it seems that hack is required because a) you're
not happy with what the compiler does with a double, and b) you're insisting
on using a "double" type.

It's easy to get your own semantics for numbers in C; it's not easy to do so
while insisting on using C's own types, whose behavior is contrary to what you
want.

If you want your own floating point semantics, why not make a struct with a
significand & base and pass that around? Something like that is how
established numbers-in-C-without-native-C-semantics libraries do it, e.g. GMP.
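
A minimal sketch of that idea (all names here are made up for illustration;
GMP does this far more thoroughly): carry the significand and exponent
yourself, so every rounding decision is explicit in your code instead of the
compiler's.

    #include <stdint.h>

    /* value = significand * 10^exponent */
    typedef struct {
        int64_t significand;
        int32_t exponent;
    } decnum;

    /* Exact product; rounding happens only when and how the caller
       decides (overflow handling omitted for brevity). */
    static decnum dec_mul(decnum a, decnum b) {
        return (decnum){ a.significand * b.significand,
                         a.exponent + b.exponent };
    }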

------
secondcoming
If you add '-ffast-math' to the compiler parameters in his godbolt link, you
get the 'right' answer. Doing that means the division is done with the 'divsd'
instruction, which clang uses even without the 'fast-math' flag.

That doesn't mean it's necessarily the right thing to do though.

Clang can be made to generate the same code as the 'broken' gcc output by
declaring the variables as 'long double' instead of 'double'. So I think this
is where the issue lies?

Edit: godbolt link requiring large monitor:
[https://godbolt.org/z/tFiZwW](https://godbolt.org/z/tFiZwW)

~~~
saagarjha
-ffast-math should rebrand itself to -ffast-and-more-correct-math ;)

~~~
garmaine
It’s normally not more correct.

~~~
all-fakes
I thought fast-math meant (among other things) that the compiler could
algebraically simplify things using some rules for real numbers that don't
apply to 32-/64-bit floating point numbers. I'd expect that reducing the
number of operations would more often than not reduce the error due to
rounding, which is what most people would want when they talk about
"correctness".

Which step in this (very naive) argument is wrong?

~~~
bonzini
Stuff like Kahan summation breaks horribly with -ffast-math, because if you
assume that addition is associative, the error term, which is ((s+x)-s)-x,
simplifies to zero.

[https://en.m.wikipedia.org/wiki/Kahan_summation_algorithm](https://en.m.wikipedia.org/wiki/Kahan_summation_algorithm)
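
Concretely, a textbook Kahan loop looks like the sketch below. Under
-ffast-math the compiler is allowed to treat addition as associative,
conclude that the compensation c is always zero, and silently degrade this to
a plain sum:

    /* Textbook Kahan (compensated) summation. */
    double kahan_sum(const double *x, int n) {
        double s = 0.0, c = 0.0;   /* running sum and compensation */
        for (int i = 0; i < n; i++) {
            double y = x[i] - c;
            double t = s + y;
            c = (t - s) - y;       /* algebraically zero; in floating
                                      point, the rounding error of s+y */
            s = t;
        }
        return s;
    }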

~~~
kevin_thibedeau
That's where you dust off volatile.

~~~
nestorD
Sadly true. That's the only portable way I found to implement something akin
to a Kahan sum so that no compiler / flag would destroy it.
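
For reference, a sketch of that hardening: volatile loads and stores are
observable behavior, so even -ffast-math cannot simplify the error term away.
The price is that every intermediate goes through memory.

    /* Kahan summation hardened with volatile intermediates. */
    double kahan_sum_volatile(const double *x, int n) {
        double s = 0.0;
        volatile double c = 0.0, y, t;
        for (int i = 0; i < n; i++) {
            y = x[i] - c;
            t = s + y;
            c = (t - s) - y;   /* survives -ffast-math */
            s = t;
        }
        return s;
    }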

------
brudgers
As I understand it, the IEEE floating point standards do not require exact
rounding for binary to decimal conversions. See _What Every Computer Scientist
Should Know About Floating-Point Arithmetic_
[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)

The behavior appears to be consistent with the standard. Keep in mind that GCC
needs to provide the same result irrespective of the architecture it runs on,
and one reason to use GCC for floating point is to take advantage of dedicated
hardware. And the reason to use floating point is speed, not precision (at
least when using decimal floating point). If precision is critical, then
arbitrary precision or fixed point is the way to go. Normally it isn’t that
critical.

~~~
maxlybbert
Look for the heading, “The IEEE Standard” in “What Every Computer Scientist
Should Know About Floating-Point Arithmetic,” then scroll down to the
subheading “Operations.” That section begins, “The IEEE standard requires that
the result of addition, subtraction, multiplication and division be exactly
rounded. That is, the result must be computed exactly and then rounded to the
nearest floating-point number (using round to even).”

A few paragraphs later, it adds, “There is not complete agreement on what
operations a floating-point standard should cover. In addition to the basic
operations +, -, × and /, the IEEE standard also specifies that square root,
remainder, and conversion between integer and floating-point be correctly
rounded. It also requires that conversion between internal formats and decimal
be correctly rounded (except for very large numbers).”

I will admit that I worked as a programmer for several years without knowing
that. I learned it from a former physics professor who has now moved on to
become a quant with a hedge fund. He usually didn’t worry much about it, but
when he needed to be sure a particular calculation was as accurate as
possible, he’d start counting individual operations and stay away from
particular function calls.

~~~
brudgers
Yes, you are correct. Thanks. This stuff is hard for me.

Reading down the thread, I see that the behavior might be related to x87,
since the build script references i386... which makes IEEE 754 conformance
moot. The methods of achieving conformance when targeting i386 are described
in the GCC documentation.

------
alkonaut
x87 is crazy. Especially since you can’t easily know when something is
truncated from 80 bits to 64. Add a parameter to a function, and an unrelated
calculation behaves differently because it now ran out of 80-bit registers.
Congratulations, you now have a spurious test failure you will spend weeks
trying to find.

The only way of doing FP on x86 and keeping your sanity is by making sure you
use 32/64 bit floats 100% of the time. That is: use SSE registers and never
the 80-bit x87 arithmetic.

I thought compilers for 64-bit programs these days did exactly this. I know
the .NET JIT does, for example.
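
For GCC specifically, that choice is controlled by standard flags (-m32,
-msse2, -mfpmath=sse). A small sketch using the article's division, with the
operands reconstructed from the Rust result quoted down-thread
(0.501782303180000055 is the correctly rounded quotient):

    #include <stdio.h>

    int main(void) {
        double x = 50178230318.0;
        double y = 100000000000.0;
        printf("%.18f\n", x / y);   /* correctly rounded: 0.501782303180000055 */
        return 0;
    }

    /* gcc -O2 ratio.c                           -> SSE2 divsd (x86-64 default)
       gcc -O2 -m32 ratio.c                      -> x87, double-rounded result
       gcc -O2 -m32 -msse2 -mfpmath=sse ratio.c  -> SSE even on 32-bit x86 */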

~~~
nwallin
The article is about 32 bit gcc. 64 bit gcc defaults to SSE, although you can
force x87 if you want to. (although... maybe don't do that)

------
amluto
Perhaps the title should mention that this is 32-bit x86, which is rather
obsolete. This should work okay on x86_64.

~~~
tbodt
Are you sure? The FPU instruction set is the same.

~~~
olliej
The platform ABI can differ on the same CPU.

I originally had the wrong answer here, but I'm including it below for
historical purposes and embarrassment. Re-reading on my laptop meant I didn't
misread it / read what I expected this time?

Basically, by default when compiling for IA-32, GCC assumes that there is no
SSE unit, and so it must use the x87 FPU.

Alas, the x87 unit doesn't compute in either the 32-bit or the 64-bit IEEE
floating point type, so the calculation done at runtime in 80-bit precision
must be rounded to float/double at the end of the computation. That results in
a double rounding that is incorrect.

Minor additional comment: x87's reduced-precision modes (which reduce mantissa
precision to that of IEEE 754 32/64-bit) don't reduce the exponent size, so
it's conceivable you could get multiple-rounding errors even then. But no one
would be toggling x87 state like that anyway, so it's not something that would
matter in most cases.

_Old incorrect answer:_

On IA-32, long double is the x87 80-bit IEEE 754 numeric type.

On x86_64 on Windows, and maybe Linux, long double is just a 64-bit IEEE float.

So my guess is that GCC computes constant evaluation of floating point
expressions in “long double” for maximum precision, and then _rounds_ to the
required precision at the end. This is not valid going from 64-bit to 32-bit
floats, and you could probably induce this bug if you tried hard enough.

Anyway it is not sound to round twice in any numeric operation, and by doing
the calculation at one precision and then _rounding_ that to the required
precision that is what gcc is doing.

This is a bug in GCC, floating point is well defined, there isn’t randomness
to it, and they’ve introduced additional rounding that is not valid.

~~~
bonzini
> Anyway it is not sound to round twice in any numeric operation, and by doing
> the calculation at one precision and then rounding that to the required
> precision that is what gcc is doing.
>
> This is a bug in GCC, floating point is well defined, there isn’t randomness
> to it, and they’ve introduced additional rounding that is not valid.

Except the C standard has a way to communicate how the extra rounding is being
introduced, and it's not random (for C, not C++).

There is simply no way around this issue with the x87 instruction set. GCC
(again, for C only, not C++) implements FLT_EVAL_METHOD==2, which is the most
sensible behavior that can be implemented with decent performance and without
breaking 80-bit long double.

If you are sure you don't need long doubles, use -mpc64.
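
You can check which contract you're compiling under from float.h; a quick
sketch:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* 0: evaluate in the declared type (e.g. SSE);
           2: evaluate in long double (x87 excess precision). */
        printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        return 0;
    }

With GCC, -m32 typically reports 2, while 64-bit builds (or -m32 -msse2
-mfpmath=sse) report 0.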

~~~
olliej
As my edit said, the problem was that GCC was confined to using x87 because
the compile args required it; the older version of the comment incorrectly
thought the optimizer was going wrong, rather than the runtime.

If you are constrained to x87, your only option is multiple roundings, which
is bad. Ideally you'd just keep processing in the x87 stack as long as
possible, but that would result in a different version of "incorrect" results.

Honestly, if you're a compiler required to use x87 with non-80-bit floating
point, I'm not sure how you can do the right thing :-/

I don't know what gcc has but llvm has a full software floating point
implementation to handle cross compiling so it can match native arithmetic.

~~~
bonzini
Yes, FLT_EVAL_METHOD 2 is an extension of "keep values on the stack as much as
possible" that also requires the compiler to spill intermediate results as
long doubles. Apparently it also doesn't cover constant folding, but knowing
who implemented it in GCC (Joseph Myers, who is extremely thorough), I have no
doubt it's allowed by the standard.

That's the best they can do.

~~~
olliej
yeah, once you are restricted to performing (at runtime) computation at a
precision that does not match the specified storage precision, the compiler is
kind of screwed :-/

------
wiml
This seems like a property of the Intel 8087 math coprocessor and its distant
descendants found in x86 chips, not anything to do with gcc?

~~~
bhouston
But then why do Python and JavaScript on the same machines get it right?

~~~
nwallin
Because they're compiled to use SSE instructions instead of x87 FPU
instructions.

If you are still writing 32 bit x86 C code, it's probably some embedded or
legacy CPU, and it's wrong to assume SSE instructions are available. So GCC
correctly defaults to using FPU instructions instead of SSE instructions on 32
bit x86.

JavaScript and Python will basically never be run in embedded environments,
and their legacy environments probably aren't _that_ legacy that SSE
instructions aren't available. So it's reasonable to default to SSE.

~~~
admax88q
Plus, JavaScript is interpreted and JITed; it can detect and decide at runtime
whether to use SSE or x87.

------
montalbano
Learning C recently, I ran into confusion around FLT_EVAL_METHOD when my dirty
(not recommended) trick for getting machine epsilon didn't work (7.0/3.0 -
4.0/3.0 - 1.0). With strict IEEE 754 floating point it works fine (Python,
Julia, and Go all work as expected). But gcc, with C11 flags at least,
performed the computation using the 80-bit upsampling mentioned in this post.
Caused some head-scratching initially.

Edit: Also, curiously, things worked as expected for me using -std=gnu11. This
was a fairly old version of GCC, though; things might have changed.
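
For anyone who wants to reproduce it, a minimal sketch: under strict IEEE 754
double evaluation the expression equals DBL_EPSILON exactly, while under
FLT_EVAL_METHOD == 2 the arithmetic is carried out with 80-bit excess
precision and the identity breaks.

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* The (not recommended) machine-epsilon trick from above. */
        double eps = 7.0 / 3.0 - 4.0 / 3.0 - 1.0;
        printf("trick:       %.20g\n", eps);
        printf("DBL_EPSILON: %.20g\n", (double)DBL_EPSILON);
        return 0;
    }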

------
fancyfredbot
The worst of this is that internal 80-bit x87 registers are truncated to 64
bits when written to memory. You have essentially no control over when this
happens from the language level. It meant that floating point code was all
rather undefined, and you got different results on different compilers. As
someone who has had to write multi-platform numerical code, I'm very thankful
it's gone from AMD64.

------
Traster
If you're interested in the accuracy of floating point operations, I recommend
taking a look at the Khronos OpenCL spec [1]; it's not as clear cut as you
would think. I think there are good reasons not to expect floating point to be
bit-accurate.

[1]: [https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#relative-error-as-ulps](https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#relative-error-as-ulps)

------
torstenvl
Clang with default settings seems to get it right:

[https://imgur.com/a/d3LBlLK](https://imgur.com/a/d3LBlLK)

~~~
bonzini
It may get it right for this particular test case, but it will get others
wrong. It's impossible to say "it gets it right" from a single test case or
three; you have to look at the compiler innards.

See
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323)
for the gory details; clang behaves the same as GCC's C++ frontend, and the
same way GCC <4.5 used to behave for C.

------
known
A DIY rounding function is given here:

In unbiased rounding, .5 rounds to even rather than always up, so 1.5 rounds
to 2 but 4.5 rounds to 4.

[https://www.gnu.org/software/gawk/manual/html_node/Round-Function.html](https://www.gnu.org/software/gawk/manual/html_node/Round-Function.html)
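
Note that C99 already provides unbiased rounding without a DIY function:
rint() and nearbyint() round halfway cases to even under the default
FE_TONEAREST mode. A quick sketch (link with -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Round-half-to-even: 1.5 -> 2, 2.5 -> 2, 4.5 -> 4. */
        printf("%.0f %.0f %.0f\n", rint(1.5), rint(2.5), rint(4.5));
        return 0;
    }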

------
upofadown
So the complaint is that this loses half a bit of precision on average? Is
that a lot?

~~~
kevingadd
If you lose half a bit on every operation and do multiple operations, that
could be pretty bad. If it's half a bit at the end of a long chain of
operations, that's tolerable, and not worse than the precision specified by
the docs for some execution environments (GPU shaders, for example).

It also matters a lot whether the error is predictable and possible to
compensate for.

------
mjcohen
Back in the '90s I was working on some Ada code with some problems that were
traced to conversion of strings to floating point at run time differing from
conversion at compile time.

------
helltone
I haven't seen anyone mention it, but I wonder if -ffloat-store fixes this?

------
mikedilger
Rust (default settings, amd64) gets 0.501782303180000055

~~~
MayeulC
Does Rust still use LLVM? If so, I'd tend to expect a result similar to
Clang's.

------
gcc_programmer
The thread comments didn't disappoint:

> someone defended gcc

> someone attacked it

> someone who just started learning C or C++ made a comment

> someone mentioned clang did it right

> someone mentioned Rust

~~~
gcc_programmer
and I got downvoted for not contributing meaningfully to the topic! The set is
full.

