
Fun with UB in C: returning uninitialized floats - luu
http://yosefk.com/blog/fun-with-ub-in-c-returning-uninitialized-floats.html
======
Filligree
Can I just suggest that... forcing programmers to become language lawyers just
so their language-lawyer compilers don't get overly clever and optimise away
half their program is _probably a bad thing_?

~~~
rcfox
Someone says this every time the topic of undefined behaviour comes up.

Ignoring the original portability concerns, there's a large set of
optimizations that are only possible if you can assume that no undefined
behaviour occurs.

Accessing past the bounds of an array is undefined, and generally a bad thing
to do. If the compiler decides that a block of code could only run if you
access outside of the array, why not delete that code? Surely, it'll never
run!

Even eliminating array bounds checking is an optimization that requires the
assumption that you don't go past the end of the array. Languages like Java
and Python pay a premium to ensure you don't do this on each iteration of your
for loops.
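
To make the cost concrete, here is a minimal sketch in C. The names `get_checked` and `get_unchecked` are made up for illustration: the first does roughly what a bounds-checked language does on every access, and the second relies on the caller keeping the index in range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Checked access: tests the index before every read and reports failure. */
bool get_checked(const int *a, size_t len, size_t i, int *out) {
    if (i >= len) {
        return false;          /* out of range: refuse instead of reading */
    }
    *out = a[i];
    return true;
}

/* Unchecked access: no comparison is emitted. Calling this with i >= len
   is undefined behaviour, so staying in range is the caller's job. */
int get_unchecked(const int *a, size_t len, size_t i) {
    (void)len;                 /* kept only to mirror the checked signature */
    return a[i];
}
```

The checked version costs one extra compare and branch per access; whether that matters depends on the loop.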

~~~
thoughtpolice
> Surely, it'll never run!

Oh, the hilarious irony of giving language-lawyery, glib responses of
"obviously, the code will not run" to users (who probably, you know, wrote
code _with the intention of it running_) - users who are complaining about
language lawyering optimizing compilers in the first place. It's like two
people reading the same page in a different book or something.

~~~
mikeash
Let's say I write a function that looks like:

    
    
        int ComputeStuff(int value) {
            int result = 0;  /* placeholder for the computed result */
            if (value < 27) {
                /* long and complex computation specialized for values under 27 */
                return result;
            } else {
                /* long and complex computation specialized for values 27 or more */
                return result;
            }
        }
    

Then I call it from somewhere else like so:

    
    
        int x = ComputeStuff(12);
    

Let's say the compiler decides this is a good candidate for inlining. Since
the programmer wrote code _with the intention of it running_, are you saying
that the compiler should not take advantage of the fact that it knows the
exact value being passed into the function in this case and can delete half
the code knowing it will never run?
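
As a rough sketch of that transformation, with placeholder arithmetic standing in for the long computations (the bodies and the name `compute_stuff_12` are assumptions, not anything a particular compiler is guaranteed to produce):

```c
/* Stand-in for the two-branch function; the arithmetic is a placeholder. */
int compute_stuff(int value) {
    if (value < 27) {
        return value * 2 + 1;      /* "small value" path */
    } else {
        return value / 3 - 5;      /* "large value" path */
    }
}

/* What an optimizer could reduce compute_stuff(12) to after inlining:
   the condition 12 < 27 is known to be true, so the else branch is dead. */
int compute_stuff_12(void) {
    return 12 * 2 + 1;
}
```

Both forms give the same answer for the argument 12; the second simply never contains the code for the other branch.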

~~~
SoftwareMaven
That's not removing code due to undefined behavior, so it's an apples/oranges
comparison. If we keep your function, but the call looks like this:

    
    
        int value1, value2;
        value1 = compute_value_1();
        ComputeStuff(value2);  // oops, fat-fingered the '2'
    

Do you really think the author meant to not have ComputeStuff run? Since
value2 isn't initialized, it could be optimized out.

Yes, in this case, you would get a warning, but it is illustrative of the
kinds of things that can cause optimizers to do very unexpected things to your
code. And it is surprisingly easy to find the UB conditions.

It's worth reading through this three-part post called _What Every C
Programmer Should Know About Undefined Behavior_ [1] from the LLVM folks to
see how UB can screw with you, including removing NULL checks, eliminating
overflow checks, and making debugging incredibly difficult to follow. It also
explains why they can't just generate errors while optimizing.

1. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
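
For the overflow-check case, a minimal sketch of the difference might look like this. The helper name `add_overflows` is hypothetical; the fragile form is shown only as a comment because it relies on signed overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile form (comment only): `a + b < a` for a positive b.
   Signed overflow is undefined, so a compiler may assume the sum never
   wraps and fold the whole test to false. */

/* Well-defined form: compare against the limits before adding. */
bool add_overflows(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;    /* a + b would exceed INT_MAX */
    }
    if (b < 0) {
        return a < INT_MIN - b;    /* a + b would fall below INT_MIN */
    }
    return false;                  /* adding zero never overflows */
}
```

The well-defined version never computes the overflowing sum, so there is nothing for the optimizer to assume away.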

~~~
mikeash
I don't think it is apples and oranges. Here's my next example:

    
    
        int ComputeStuff(int *value) {
            int result = 0;  /* placeholder for the computed result */
            if (value == NULL) {
                /* long and complex computation for a NULL value */
                return result;
            } else {
                /* long and complex computation using the data pointed to by value */
                return result;
            }
        }
    

Then I call it from somewhere else like so:

    
    
        // NOTE: value must be non-NULL
        void DoStuff(int *value) {
            int pointedTo = *value;
            // do some work with pointedTo
            int computedResult = ComputeStuff(value);
            // do some more work with whatever
        }
    

Now, are you saying the compiler should not take advantage of the fact that it
knows value is non-NULL at this particular call site and eliminate half of the
code in this situation?
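
A compilable sketch of the same situation, with placeholder bodies in place of the long computations (the return values are assumptions chosen only so the paths can be told apart):

```c
#include <stddef.h>

int compute_stuff(const int *value) {
    if (value == NULL) {
        return -1;                 /* placeholder for the NULL path */
    }
    return *value + 100;           /* placeholder for the non-NULL path */
}

/* value must be non-NULL: it is dereferenced before compute_stuff runs. */
int do_stuff(const int *value) {
    int pointed_to = *value;       /* UB if value is NULL, so the compiler
                                      may treat value as non-NULL below */
    return pointed_to + compute_stuff(value);
}
```

If `compute_stuff` is inlined into `do_stuff`, the `value == NULL` branch can be dropped at that call site without changing the behaviour of any defined program.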

------
userbinator
That seems like a very unusual way to define a function. I'd want 'ok' to be
the return value, and the actual value returned to be via the pointer, since
that allows for

    
    
        float c;
        if (get(v, &c)) {
            /* ...do something with c... */
        }
    

instead of the more verbose

    
    
        bool ok;
        float c;
        c = get(v, &ok);
        if (ok) {
            /* ...do something with c... */
        }

~~~
aciuix
I think it is a matter of being consistent. Both ways have certain syntactic
dis/advantages.

The first one enables you to have the function call directly in the if
statement, but requires you to define a variable beforehand.

The latter makes the check optional: you can pass a NULL if you don't need it,
for example, and use the return value directly.
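
Side by side, the two shapes could be sketched like this. The signatures are hypothetical (the thread never shows the real `get`), and the array-plus-length parameters are just one way to make the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Style 1: status is the return value, the result comes back via pointer. */
bool get_out(const float *v, size_t n, size_t i, float *out) {
    if (i >= n) {
        return false;
    }
    *out = v[i];
    return true;
}

/* Style 2: the result is the return value, status is optional via pointer. */
float get_ret(const float *v, size_t n, size_t i, bool *ok) {
    bool found = i < n;
    if (ok != NULL) {
        *ok = found;               /* callers may pass NULL to ignore it */
    }
    return found ? v[i] : 0.0f;    /* always return an initialized value */
}
```

Style 1 fits inside an `if` condition; style 2 can be used directly in an expression when the caller does not care about failure.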

------
exDM69
This is an interesting corner case but I'd like to see a practical piece of
code that actually causes this issue when compiled and executed. The example
code is quite contrived and compiler warnings should be raised.

Further, does the signalling NaN behavior happen with SSE (or NEON) or is this
an x87 issue?

~~~
stephencanon
The default behavior in every OS with which I'm familiar (this is specified by
IEEE-754) is for x87, SSE, VFP and NEON _not_ to trap on signaling NaNs. You
have to explicitly unmask the invalid floating-point exception in order for
this to trap. All that would happen with the default floating-point
environment is that the invalid flag would be raised in FPCR.

IIRC, FSTP st(0), used simply to clear the stack without using the result as
discussed in the article, doesn't even generate #IA, so it can't trap _or_
raise invalid. It only generates #IA when the store converts to a smaller FP
type (fun fact: this is so FLD/FSTP could be used to implement memcpy way back
when).
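
A small sketch of the "no trap by default" point, assuming IEEE-754 binary32 floats and the common encoding in which 0x7FA00000 is a signaling NaN (both are assumptions about the platform; the helper names are made up):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Builds a float from raw bits; assumes float is IEEE-754 binary32. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined way to reinterpret bits */
    return f;
}

/* Adds 1.0f to a (presumed) signaling NaN and reports whether the result
   is a NaN. With the default floating-point environment the operation may
   set the invalid flag, but execution simply continues. */
bool snan_plus_one_is_nan(void) {
    volatile float s = float_from_bits(0x7FA00000u);
    volatile float r = s + 1.0f;   /* volatile keeps this from being folded */
    float result = r;
    return result != result;       /* only a NaN compares unequal to itself */
}
```

Getting an actual trap would require unmasking the invalid exception first, which standard C does not provide a portable call for.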

------
panic
Is this really undefined behavior? The C spec says (6.7.8.10) "If an object
that has automatic storage duration is not initialized explicitly, its value
is indeterminate." The fact that the indeterminate value could be a signaling
NaN is a feature of floating point numbers, not C.

~~~
aciuix
The example shown in the article is in fact undefined behavior:

 _6.3.2.1,p2 If the lvalue designates an object of automatic storage duration
that could have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared with an
initializer and no assignment to it has been performed prior to use), the
behavior is undefined._
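
A minimal illustration of that paragraph, using a made-up function rather than the article's code. The undefined variant appears only as a comment; the defined variant differs by a single initializer.

```c
#include <stdbool.h>

/* Undefined per the quoted paragraph when `have` is false (comment only):
 *
 *     float pick(bool have, float x) { float c; if (have) c = x; return c; }
 *
 * c is automatic, never has its address taken, and is read uninitialized. */

/* Defined: c has an initializer, so the read in `return c` is always valid. */
float pick_defined(bool have, float x) {
    float c = 0.0f;
    if (have) {
        c = x;
    }
    return c;
}
```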

~~~
_yosefk
The funny thing is, returning it just to discard it constitutes "use",
apparently.

