
My C code works with -O3 but not with -O0 - mulle_nat
https://www.mulle-kybernetik.com/weblog/2020/compiler_or_cpu.html
======
mokus
The title is actually wrong - the -O0 version is correct, the -O3 version is
not (despite giving the output the author expected).

Casting the value to double ends up converting the long value
0x7fffffffffffffff to the nearest double value: 0x8000000000000000. As the -O0
version CORRECTLY reports, this does not round-trip back to the same value in
the "long" type. Many other values, though not all, down to about 1/1024 of
that value (1 / 2^(63-53)) will also fail to round-trip for similar reasons.

Unless my coffee-deficient brain is missing something at the moment, it should
be the case that any integer with 53 bits or fewer between the first and last
1 bit (inclusive) will roundtrip cleanly. Any other integer will not.

Edit: fixed a typo above, and coded up the idea I expected to work and ran it
through quickcheck for a few min, and this version seems to be correct ('int'
return rather than bool is just because haskell's ffi doesn't natively support
C99 bool):

    
    
        #include <limits.h>
        
        int fits(long x) {
          /* -LONG_MIN overflows a long, so handle it up front */
          if (x == LONG_MIN) return 0;
        
          /* work on the magnitude; 0x1fffffffffffff is 2^53 - 1 */
          unsigned long ux = x < 0 ? -x : x;
          /* trailing zero bits cost no mantissa bits, so shift them out */
          while (ux > 0x1fffffffffffffUL && !(ux & 1)) {
            ux /= 2;
          }
        
          return ux <= 0x1fffffffffffffUL;
        }
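
A quick spot-check of the function above, assuming a 64-bit long and IEEE-754 doubles (this harness is illustrative, not part of the original comment):

        #include <assert.h>
        #include <limits.h>
        
        /* fits() as defined above */
        
        int main(void) {
          assert(fits(1L << 53));        /* exactly 2^53: representable */
          assert(!fits((1L << 53) + 1)); /* 54 significant bits, odd: not representable */
          assert(fits(1L << 62));        /* single 1 bit: representable */
          assert(!fits(LONG_MAX));       /* 63 one-bits: not representable */
          return 0;
        }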

~~~
OskarS
So the problem here is that in this line:

    
    
          if( value < (double) LONG_MIN || value > (double) LONG_MAX)
              return( 0);
    

The cast of LONG_MAX to double rounds upwards (which is allowed by the
standard; it says the rounding direction is "implementation defined"), so
"value > (double) LONG_MAX" is false, right? Even though the mathematical
value of "value" is larger than LONG_MAX?

Which then leads to this line:

    
    
       l_val = (long) value;
    

Where value is cast to a long despite being outside of the range of longs,
thus causing undefined behavior. So to be clear, BOTH the -O0 and -O3
versions are "correct", since the program invokes undefined behaviour.

When -O0 and -O3 give different results, either at least one of them is
incorrect and you've stumbled on a compiler bug, or both of them are correct
and you're invoking UB (the far more common situation).

EDIT: no, I think I misunderstood it: it's not that value is larger than
(double) LONG_MAX, it IS (double)LONG_MAX, so of course "value >
(double)LONG_MAX" is false.

The problem is that "(long)(double)LONG_MAX" is undefined behaviour on
implementations that round (double)LONG_MAX upwards instead of downwards.
Which is allowed by the standard. Ok, cool :)
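
For the curious, a small program (assuming a 64-bit long and IEEE-754 doubles with the default round-to-nearest mode) that shows the rounded-up cast and why the range check lets the value through:

        #include <limits.h>
        #include <stdio.h>
        
        int main(void) {
            double d = (double) LONG_MAX;          /* 2^63 - 1 is not representable; rounds to 2^63 */
            printf("%.1f\n", d);                   /* typically prints 9223372036854775808.0 */
            printf("%d\n", d > (double) LONG_MAX); /* 0: d IS (double)LONG_MAX, so the check passes */
            /* (long) d at this point would be undefined behaviour: 2^63 is out of range for long */
            return 0;
        }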

~~~
Joker_vD
Huh. So, how does one check that a cast from double to long would succeed in
C? Just go and do lround()+errno check?

~~~
ekimekim
I think the most correct way would be to explicitly set the rounding mode to
down or towards-zero, then calculate largest_double_that_fits = (double)
LONG_MAX, then check value <= largest_double_that_fits. Since
largest_double_that_fits is guaranteed to be <= LONG_MAX, and value <=
largest_double_that_fits, it follows that value <= LONG_MAX.

~~~
ambrop7
This is not right. The current rounding mode is not guaranteed to apply to
integer to floating point conversions. Quoting C99, section 6.3.1.4, paragraph
2:

"When a value of integer type is converted to a real floating type, if the
value being converted can be represented exactly in the new type, it is
unchanged. If the value being converted is in the range of values that can be
represented but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in an
implementation-defined manner. If the value being converted is outside the
range of values that can be represented, the behavior is undefined."

Note that it says "in an implementation-defined manner", not "according to
the current rounding mode".

Test case:

    
    
        #pragma STDC FENV_ACCESS ON
        #include <stdio.h>
        #include <stdint.h>
        #include <fenv.h>
    
        int main()
        {
            fesetround(FE_DOWNWARD);
            printf("%f\n", (double)UINT64_MAX);
            fesetround(FE_UPWARD);
            printf("%f\n", (double)UINT64_MAX);
            return 0;
        }
    
        $ gcc -std=c99 -frounding-math x.c -lm
        $ ./a.out 
        18446744073709551616.000000
        18446744073709551616.000000
    

In both cases it rounded up.

~~~
ekimekim
I stand corrected, thanks! What a bizarre choice when floating point standards
already have a well-defined system for determining how to round things.

------
0xff00ffee
Hoo boy. This is what happens when C programmers start working with floating
point for the first time and don't know the fundamentals.

When you work with floating point, you need to remember to compare with a
tolerance (an epsilon), because values are rounded to finite precision and
different floating point units perform the conversion in different ways.

You must abandon the idea of '==' for floats.

This is why his code is unpredictable, because you cannot guarantee the
conversion of any integer to and from float is the same number. Period. The
LSBs of the mantissa can and do change, which is why we mask to a precision or
use signal-to-noise comparisons when evaluation bit drift between FP
computations.

He has the first part correct, < and > are your friend with FP. But to get
past the '==' hurdle, he needs to define his tolerance, the code should be
something like:

if (fabs(f1 - f2) < TOLERANCE) ... fits = true.

I was irked by his arrogance when he asks, "Intel CPUs have a history of bugs.
Did I hit one of those?" First, learn about floating point, then, work on an
FPUnit team for 10 years, and even then, don't assume you're smarter than a
team of floating point architects, you're not.

~~~
titzer
Floating point is indeed hard. It's made harder because the C programming
language, due to its history supporting a large number of targets, especially
for high performance computing, does not even mandate IEEE 754. (C predated
IEEE 754). IEEE 754 is actually very precise about rounding, rounding modes,
etc. The x87 floating point coprocessor is also heavily to blame, because of
its internal 80-bit precision. Decades of headaches. Other headaches come from
ARM NEON's vector instructions not implementing subnormals (flush-to-zero
mode), which is not IEEE and makes vector math differ from scalar math in
corner cases.
GPUs also went through a similar evolution. Slowly the industry is moving to
all-IEEE 754 compliant arithmetic. There's a lot more to say, but yes, I agree
with you, it's complicated.

~~~
Gibbon1
Yeah and then we have the problem that most floating point variables shouldn't
be IEEE 754. And we're mostly fucked because most languages don't support
anything else.

~~~
mark-r
If they shouldn't be IEEE 754 then what would you use instead? IEEE 754 exists
because a lot of smart people put a lot of thought into what the best floating
point representation should be. Then the hardware folks put a lot of work into
making it fast and accurate.

~~~
Gibbon1
IEEE 754 is designed for scientific computation and simulation. And it's very
good for that. However it's terrible for representing decimal numbers and
fractions. You end up with the exact problems described in the article.

It's kind of a problem since most programmers aren't writing programs to do
scientific computing or modeling. They're writing programs that do decimal
math and currency, where IEEE 754 is inappropriate.

~~~
mark-r
I ask again: what would you use instead? The industry has decided that IEEE
754 is the best all-around compromise, and everything is optimized for that
case. Most programs aren't doing "decimal math", but programmer expectations
are shaped by that since that's what they've used all their life.

~~~
coldtea
> _I ask again: what would you use instead?_

And I'll repeat what the parent said, again: decimal math.

> _Most programs aren't doing "decimal math"_

And yet they do. Most programs do decimal math with floats, either because the
programmers don't know that's not a thing (and consider floats the same as
decimals), or because the language doesn't give them anything else (e.g.
Javascript up until recently), or because workarounds are cumbersome (e.g.
representing as fractions or working with integers for money with a scale
factor and downscaling in the end, etc).

But in almost all programs, the kinds of math the program needs to do, from
money handling to estimating distances to layout, are better done as decimal.

Floating point doesn't represent a way to solve any kind of actual problem,
except the problem of speed and memory representation. So its use is because
of computing constraints, not because the problem domain requires
floating-point calculations.

~~~
mark-r
The only point I'll concede is monetary calculations. Estimating distances?
Layout? I can't see how the difference between decimal and binary changes
anything.

Decimal math isn't built into most of the languages or processors we use, so
that's why I keep asking what you'll use. It's a kind of hand-wavy response.
You'll need to seek out an appropriate library and use it consistently, which
is a lot more trouble than just using the built-in IEEE. Not impossible, just
usually more trouble than it's worth.

~~~
coldtea
> _Estimating distances? Layout? I can't see how the difference between
> decimal and binary changes anything._

Whether it changes everything is neither here nor there.

It might still result in the same output (e.g. the lack of precision might not
matter for layout purposes), but there's absolutely no domain need to do the
calculations in FP except for performance or lack of a better number type in
the language.

In all of these cases, exact math would be better than a lack of precision,
aside from the performance and similar (memory, etc.) costs.

That is, if decimal was computing-wise as fast and as available as FP, there
would be absolutely no reason to use FP. It's not a choice of mathematical
need (e.g. not imposed by the calculation) but a choice of computing
constraints.

> _Decimal math isn't built into most of the languages or processors we use,
> so that's why I keep asking what you'll use._

In my experience many languages have it (C#, Java, Python, Javascript is
getting it, etc) and people don't use it.

And in most cases, being in the processors wouldn't matter - we use FP for all
kinds of non-performance-critical calculations, not because it's needed for
the speedup, but because it's there.

When there is indeed a need for the CPU-support, I'd use FP myself, sure. But
not in most other cases, and absolutely not for money and other cases where
precision matters (but people still use FP).

------
hannob
This may be an instance of "you should really know the gcc/clang sanitizers
and use them to test your code":

        clang test.c -O0 -fsanitize=undefined
        ./a.out
        [...]
        test.c:17:12: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

Interestingly gcc doesn't throw that warning.

~~~
tyingq
Does that verifier have a strong possibility of false positives? I'm curious
why C compilers have such a strong history of making reasonable checks
optional and hidden behind a bunch of switches.

~~~
hannob
No, actually the false positive rate of these flags is practically zero. (I'm
not sure if it's 100% zero, but I used those extensively, reported many bugs
and every time a developer told me "this is a false positive" they were
wrong.)

The reason they aren't enabled by default is that that's not what they're
designed for. They have a significant performance impact, you can't enable
them all at once, they conflict with other security features and they may
introduce security issues.

These are developer features. They aren't there to run your production code,
they are there to test during development and bug finding.

~~~
saagarjha
Here's one that I encountered that I cannot figure out:

    
    
      $ g++ -x c++ -fsanitize=undefined -
      #include <iostream>
      
      int main() {
          int *a = nullptr;
          std::cout << std::addressof(*a) << std::endl;
      }
      ^D
      $ ./a.out
      <stdin>:5:29: runtime error: reference binding to null pointer of type 'int'
      0

~~~
Karliss
You are not allowed to dereference a null pointer, no matter how you use it,
even if you later convert the result back to an address.

~~~
hvdijk
You are allowed to dereference a null pointer, but you are not allowed to
access the result or use it as an initialiser for a reference. You can only
immediately discard the result, or immediately take the result's address.
Using std::addressof(*a) first binds a reference, then uses that reference to
take the address, hence the error. You won't get any UBSAN error with &*a.

~~~
MaulingMonkey
> You are allowed to dereference a null pointer

Can you cite me something for that? The very first mention of dereferencing in
the C++03 standard - ISO/IEC 14882:2003 1.9 Program execution ¶ 4 (page 5) -
would seem to disagree:

> Certain other operations are described in this International Standard as
> undefined (for example, the effect of dereferencing the null pointer). [
> _Note:_ this International Standard imposes no requirements on the behavior
> of programs that contain undefined behavior. ]

EDIT: As does 8.3.2 References ¶ 4 (page 136):

> [...] [ _Note:_ in particular, a null reference cannot exist in a well-
> defined program, because the only way to create such a reference would be to
> bind it to the “object” obtained by dereferencing a null pointer, which
> causes undefined behavior. As described in 9.6, a reference cannot be bound
> directly to a bit-field. ]

~~~
hvdijk
The standard doesn't say dereferencing a null pointer is invalid. The fact
that it gives it as an example of undefined behaviour is a defect in the
standard. In the discussion on core language issue #232, the intent has been
explicitly stated:

> Notes from the October 2003 meeting:

> ...

> We agreed that the approach in the standard seems okay: p = 0; *p; is not
> inherently an error. An lvalue-to-rvalue conversion would give it undefined
> behavior.

http://open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#232

WRT your edit: no, that says dereferencing a null pointer and binding a
reference to the result produces undefined behaviour. That agrees with what I
was saying. "which" refers to the whole of "to bind it to the "object"
obtained by dereferencing a null pointer", not just to "dereferencing a null
pointer".

~~~
MaulingMonkey
I see! Thanks for the reference.

------
ThreeFx
An integer value is only guaranteed to be representable losslessly in a
double if it fits in the mantissa: `64 - 1 (sign) - 11 (exponent) = 52`
explicit bits, i.e. up to 53 significant bits once the implicit leading 1 is
counted.

This should be fairly obvious with knowledge about how floating point numbers
are represented internally IMO.

Edit: Be more precise about what can be represented.

~~~
yoz-y
I've seen quite a lot of errors stemming from assumptions about floating
point numbers. Not sure there is a good way of handling this in the end,
except to exercise caution. Even basic assumptions like f + 1 > f will have
this issue.

~~~
andrepd
i+1>i breaks even for unsigneds

~~~
nybble41
At least for unsigned integers i+1!=i, and the only exception to i+1>i is
i==UINT_MAX. For signed integers it may be undefined behavior, of course—again
in just one instance—but for floating point you may have f+1==f, or (f+1)-f>1
depending on the rounding mode, for _many_ values of f which are nowhere near
the limits of the range, not to mention other oddities like (f==f)==false when
f is NaN. If your code involves floating-point operations then the number of
corner cases you need to test increases greatly.
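
A couple of the corner cases mentioned above, as a minimal C example (assuming IEEE-754 doubles):

        #include <math.h>
        #include <stdio.h>
        
        int main(void) {
            double f = 1e16;            /* above 2^53, where the spacing between doubles is 2 */
            printf("%d\n", f + 1 == f); /* 1: adding 1 is lost to rounding */
        
            double g = nan("");         /* quiet NaN */
            printf("%d\n", g == g);     /* 0: NaN compares unequal to everything, itself included */
            return 0;
        }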

------
dfranke
Clear your floating point exception register by calling
feclearexcept(FE_ALL_EXCEPT). Convert to long by calling lrint(rint(x)). Then
check your exception register using fetestexcept(). FE_INEXACT will indicate
that the input wasn't an integer, and FE_INVALID will indicate that the result
doesn't fit in a long.

Edit: check for me whether just calling lrint(x) works. The manpage doesn't
specify that lrint() will set FE_INEXACT, but it seems weird to me that it
wouldn't.
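
A minimal sketch of the sequence described above (assuming IEEE-754 doubles and a libm that raises the exception flags as C99 Annex F specifies; the function name here is illustrative):

        #include <fenv.h>
        #include <math.h>
        
        /* #pragma STDC FENV_ACCESS ON may be required for strict conformance */
        
        int converts_cleanly(double x) {
            feclearexcept(FE_ALL_EXCEPT);
            long l = lrint(rint(x));      /* rint flags non-integers, lrint flags out-of-range */
            (void) l;                     /* a real caller would use the converted value */
            if (fetestexcept(FE_INVALID)) /* result doesn't fit in a long */
                return 0;
            if (fetestexcept(FE_INEXACT)) /* input wasn't an integer */
                return 0;
            return 1;
        }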

~~~
pjc50
I had no idea this feature existed! Does it behave usefully in a multithreaded
context?

~~~
clarry
Yes it does. N1570 7.6:

> The floating-point environment has thread storage duration. The initial
> state for a thread's floating-point environment is the current state of the
> floating-point environment of the thread that creates it at the time of
> creation.

------
g82918
To save some time for new readers: the author is unfamiliar with floating
point representation and thinks that a double precision number, since it is 64
bits, can hold any 64-bit integer, and is somewhat confused as to what an xmm
register can hold (they believe it has 128 bits of precision instead of being
able to hold 2 64-bit doubles or 4 32-bit singles). They attempt to find the
issue a few ways. The correct solution is not to convert any integer larger
than 2^53 in absolute value, since only integers up to that magnitude can be
successfully converted to double and back (aside from a few larger ones that
exist sparsely).

------
pjc50
> When something very basic goes wrong, I have this hierarchy of potential
> culprits:

I don't know if this is supposed to be a joke or part of the setup for an
explanatory post about undefined behaviour, but that list is in exactly the
wrong order.

~~~
scoutt
I agree, but I'd also say that silicon bugs are rarer, so I put them at the
end of the list.

------
heftig
SSE xmm registers might be 128 bits wide, but the precision is still 64 bits.
The additional (high) bits are zeroed out.

What you're seeing is not excess precision due to wide registers but excess
precision due to optimization and constant propagation, which means GCC
calculates a fast path for (argc == 1) that doesn't round correctly and ends
up with "it fits".

Interestingly it does optimize to the correct "doesn't fit" with -mfpmath=387
-fexcess-precision=standard, so I guess this is a bug in how GCC treats SSE
math. The sanitizer (-fsanitize=float-cast-overflow) also notices the problem.

------
yuriko
Based on my experience, this title is a strong hint that some undefined
behaviour is triggered.

------
mojuba

        if( ! fits)
    

Why this (consistently) terrible formatting though? Never seen anyone using this
style.

~~~
inetknght
Ahh you ought to see my style. I've been told it's unique and quite ugly.
Linters tend to very much dislike it. Nonetheless my style has a purpose to me
and I'm sure so does the author's.

~~~
mojuba
I think gluing punctuation to the previous word but always having a space
before the next one makes the words more readable. Probably makes sense
although you always need research to back up such claims. For example there's
research that says snake_case is more readable than camelCase (and yet most
languages encourage camelCase for some reason)

------
mulle_nat
With the help of your comments, I could now write the conclusion to my
article. In a nutshell this is the solution:

    
    
        #include <math.h>
        #include <fenv.h>
    
    
        int   fits_long( double d)
        {
           long     l_val;
           double   d_val;
     
        // may be needed ?
        // #pragma STDC FENV_ACCESS ON
     
           feclearexcept( FE_INVALID);   
           l_val = lrint( d);            
           d_val = (double) l_val;       
           if( fetestexcept( FE_INVALID))
              return( 0);
     
           return( d_val == d);
        }
    

The article explains it in more detail. Thanks for the help.
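
For reference, a quick (hypothetical) call site exercising fits_long, assuming a 64-bit long and IEEE-754 doubles:

        #include <limits.h>
        #include <stdio.h>
        
        /* fits_long() as defined above */
        
        int main(void) {
            printf("%d\n", fits_long(42.0));              /* 1: small exact integer */
            printf("%d\n", fits_long(0.5));               /* 0: not an integer, d_val != d */
            printf("%d\n", fits_long((double) LONG_MAX)); /* 0: rounds to 2^63, lrint raises FE_INVALID */
            return 0;
        }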

------
NullPrefix
The first rule of floating point comparison is that you do not compare them
for equality, but instead calculate the difference and check if the difference
is less than epsilon.

~~~
ska
This is super common advice but it is generally wrong, at least the second
part.

Comparing floats is more subtle than most programmers realize, and there
really isn't a one-size-fits-all solution.

Things to consider due to the nature of fp representation:

- comparing results close to zero is different (i.e. "is a small?" needs a
different test than "are a & b close?")

- the distance between fp numbers depends on their magnitude, so comparing
two large numbers to each other shouldn't have the same bounds as comparing
two numbers near 1, say [1].

- if you aren't quite careful you can easily create tests where a == b but
b != a, which can cause sorting issues, etc.

Hand-wavily speaking if you want to do this "right", you should probably look
at doing the analysis in ULP (units in last place) rather than directly on the
floats. Don't do it for values near zero though. And have a fast path for
differently signed values.

The above doesn't even get into denormalized values.

[1] note that what people usually mean for epsilon is the version of machine
epsilon that is the difference between 1 and the next representable float
above 1 [2]. So by definition this is smaller than the representable
difference between any two numbers in larger decades

[2] MS .NET somewhat confusingly defines Epsilon as the smallest representable
normalized number.
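
A minimal sketch of the ULP-distance idea described above (assuming 64-bit IEEE-754 doubles; NaNs and differently-signed values are pushed onto the separate paths mentioned, and values near zero still need their own test):

        #include <math.h>
        #include <stdint.h>
        #include <string.h>
        
        /* ULP distance between two finite doubles of the same sign. */
        static int64_t ulp_distance(double a, double b) {
            if (isnan(a) || isnan(b) || signbit(a) != signbit(b))
                return INT64_MAX;          /* handle these cases separately */
            int64_t ia, ib;
            memcpy(&ia, &a, sizeof ia);    /* same-sign doubles order like their bit patterns */
            memcpy(&ib, &b, sizeof ib);
            int64_t d = ia - ib;
            return d < 0 ? -d : d;
        }
        
        /* "close enough" = within max_ulps representable values of each other */
        static int nearly_equal(double a, double b, int64_t max_ulps) {
            return ulp_distance(a, b) <= max_ulps;
        }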

------
ginko
>I am still looking for a better way to check, if a double will convert
cleanly to an integer of the same size or not.

I'd say the cleanest would be to decode the exponent and mantissa, check if
the exponent is within the 64-bit limit of long, then check if any bits are
set below the decimal point (plus some extra care for two's complement
negative numbers).

The problem with this is of course that this would be platform dependent.

~~~
clarry
I'd say the correct way is to use lrint or lround and check for errors the
standard way.

------
apta
This is why using safe languages is important. Even frequent users of C and
C++ end up making mistakes that are difficult to track down.

~~~
syockit
I'm not sure what you mean by safe in this context. A language that forces
type casts to be explicit? One that throws a runtime error when an
invalid/imprecise cast is done?

------
Aardwolf
The most surprising thing for me out of this is that casting a high positive
integer to double will output the _nearest_ double which could be higher, not
the highest one smaller than or equal to the integer value.

Is there a way to get the largest double smaller than or equal to some
positive integer?

------
correct_horse
> When something very basic goes wrong, I have this hierarchy of potential
> culprits: the compiler, buggy hardware, OS vendor, last and least me, because I
> don’t make mistakes :)

I really dislike the arrogant programmer trope. Can we all stop?

------
CGamesPlay
How did you decide that "the method works for LONG_MIN"? Did the method return
the expected output of false? Because it really seems like the code is working
correctly on `-O0` and incorrectly on `-O3`...

~~~
eMSF
Why would you expect the output to be false? On any typical system around
(with 8-byte longs), LONG_MIN has the value -2^63, which (converted to an
IEEE double-precision float) passes the function's checks just fine -- even if
the values around it don't.

~~~
CGamesPlay
Ah, my mistake. Still, seems like the -O0 is actually what's correct and the
-O3 is reporting an incorrect answer.

------
adammunich
I had something similar happen but with GCC generating an internal compiler
error and just plain failing. Still haven't figured out why.

------
syockit
I'd say just put volatile and be done with it. Now your -O3 will also break,
but at least it's consistent with -O0 :p

