
C compiler undefined behavior: calls never-called function - georgecmu
https://gcc.godbolt.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%22labels%22%3Atrue%2C%22directives%22%3Atrue%2C%22commentOnly%22%3Atrue%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22MQSwdgxgNgrgJgUwAQB4IGcAucogEYB8AUEZgJ4AOCiAZkuJkgBQBUAYjJJiAPZgCUTfgG4SWAIbcISDl15gkAER6iiEqfTCMAogCdx6BAEEoUIUgDeRJEl0JMMXQvRksCALZMARLvdIAtLp0APReIkQAviQAbjwgcEgAcgjRCLoAwuKm1OZWNspIALxIegbGpsI2kSQMSO7i4LnWtvaOCspCohFAA%3D%3D%22%2C%22compiler%22%3A%22%2Fopt%2Fclang%2Bllvm-3.4.1-x86_64-unknown-ubuntu12.04%2Fbin%2Fclang%2B%2B%22%2C%22options%22%3A%22-Os%20-std%3Dc%2B%2B11%20-Wall%22%7D%5D%7D
======
titzer
This is a technically correct compilation of the given C program. All
executions of a program containing undefined behavior are invalid. That means
not just the part of the execution "after" the undefined behavior, but also the
part "before" it.

Since the program unconditionally calls an uninitialized (actually zero-
initialized) function pointer, that is undefined behavior. It therefore has no
valid executions. The compiler is legally allowed to generate whatever code it
wants, and that is considered valid according to the standard. It just so
happens in this case that it probably generated no code at all for main,
falling through to code that it did generate. (And my guess as to why it did
generate code for NeverCalled() is that it's non-static and could be called
by functions outside this compilation unit; NeverCalled keeps EraseAll()
alive.)

edit: the above is only true if the compiler knows main is the entrypoint to
the program.
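
(For reference, the compressed godbolt link above encodes roughly the following widely circulated example. This is a reconstruction from the names used in the thread, not a verbatim copy of the linked source:)

        #include <cstdlib>

        typedef int (*Function)();

        static Function Do;            // static storage: zero-initialized, i.e. a null pointer

        static int EraseAll() {
          return system("rm -rf /");   // the "wipe your drive" payload discussed below
        }

        void NeverCalled() {           // non-static, so reachable from other translation units
          Do = EraseAll;               // the only store to Do anywhere in this file
        }

        int main() {
          return Do();                 // calling through a null function pointer is UB
        }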

~~~
antientropic
> It therefore has no valid executions. The compiler is legally allowed to
> generate whatever code it wants

If the compiler concludes that the program has no valid executions, then why
not just generate an error message at compile time?

I wonder in what other engineering disciplines this kind of attitude (i.e.
silently punishing users for mistakes by generating arbitrarily misbehaving
code) is considered acceptable. It seems unethical to me. Perhaps compiler
writers should think more about what constitutes sound engineering, rather
than what happens to be permitted by The Standard.

~~~
tom_mellior
> If the compiler concludes that the program has no valid executions, then why
> not just generate an error message at compile time?

I agree that that would be nice to see in this case, and it seems like this
particular case would be low-hanging fruit, but it's not that simple. In
particular, the compiler in this example most likely does not reason globally;
it never actually decides that "the program has no valid executions". It does
a series of local transformations, presumably based on facts of the form
"_if_ we can reach this point, _then_ certain values are invalid".

Another problem with such error messages is that they would push compilers
further in the direction of verification tools, _which they aren't, and cannot
be_.

Consider:

    
    
        int f() {
          return 1 / 0;
        }
        
        int g() {
          int a = 0;
          return 1 / a;
        }
    

Clang warns about the first function, but not about the second one. If it
warned about the second one, I could write a third function that always
divided by zero but that it could not warn about (see the sketch below). If it
learned to handle that, I would find a fourth one, etc. This is an arms race
that compilers cannot win, and they shouldn't try. You should use separate
verification tools for that, which do the opposite of compiler warnings: they
don't merely reject the program when it's trivially buggy, they reject it
_unless_ you prove that it is _not_ buggy.
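
For example, a hypothetical third function might hide the zero behind something the compiler can't see locally (zero_from_elsewhere is an invented name, used only for illustration):

        /* Defined in another translation unit; happens to always return 0. */
        extern int zero_from_elsewhere(void);

        int h() {
          /* Always divides by zero at run time, but no local warning is possible:
             the compiler cannot know what zero_from_elsewhere() returns. */
          return 1 / zero_from_elsewhere();
        }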

If compilers had more such warnings, programmers would be tempted to say
"look, there is no warning, the code must be fine". That's the whole "if it
compiles, it's correct" approach that gives some gullible people a false sense
of security in other languages.

~~~
mannykannot
I certainly hope that there are not many programmers thinking "if it compiles,
it's correct" after having made their first logic error, though I realize some
Haskell programmers make statements that are close to that.

Clang and gcc will warn you about some of these issues if you give them the
right flags (so much so that -Wall no longer means literally all warnings,
IIRC), but then you run into the static-analysis problem of being buried in
false positives for conditional / data-dependent undefined behavior.

I have found it instructive to play with Dafny, one of the languages where the
compiler does require you to prove that a variety of bugs will not occur, but
'play' is the operative word here. It is not a feasible option for C.

C has always been a language for (among other things) creating optimized code,
but the implication of that has changed. Originally, it meant / required being
'close to the metal', but since then, effective automated optimization has
come along, and the metal itself has become considerably more complicated
(partly as a consequence of more capable optimization). As C programmers, we
should understand that these changes have occurred, and what they mean for
programming in C - or not use the optimizer.

~~~
tome
> I certainly hope that there are not many programmers thinking "if it
> compiles, it's correct" after having made their first logic error, though I
> realize some Haskell programmers make statements that are close to that.

No, the Haskell slogan is "If it compiles it works". "It works" is something
quite different to "it is correct".

------
sddfd
So, the compiler works out that the function pointer Do is either null (never
assigned) or points to EraseAll, and since calling through a null function
pointer is undefined behavior, it assumes that 'Do' always points to EraseAll?

Is this what is happening?

While this is certainly unexpected, I can't fault the compiler here. The
example emphasizes the need for tools to check for undefined behavior.
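
(Roughly, yes. Paraphrasing the reasoning the optimizer is entitled to apply to the example program, its main may be rewritten as-if:)

        int main() {
          // Do has static storage, so it starts out null; the only store in the
          // file is Do = EraseAll; and calling a null function pointer is UB.
          // So on every execution that is not undefined, Do == EraseAll here,
          // and the indirect call can be devirtualized to a direct call:
          return EraseAll();
        }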

------
dzdt
The undefined behavior is that the static variable "Do" (a function pointer)
is used without initialization. The compiler optimization is that since, in
the whole program, the only value to ever be assigned to "Do" is the function
"EraseAll", the compiler statically initializes "Do" to "EraseAll". Doesn't
seem too crazy.

~~~
arximboldi
Actually, the optimization is not statically initializing Do; it is inlining
the function calls made via the Do pointer. I think this is quite a useful
optimization.

You can see that if you "fix" the program by checking Do before calling it, it
still inlines the call in the True branch:

[https://godbolt.org/g/9KS2r9](https://godbolt.org/g/9KS2r9)
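
(The linked code isn't reproduced here, but a guarded version along those lines might look like this sketch; the optimizer may still prove that EraseAll is the only non-null value Do can hold and inline it in the taken branch:)

        int main() {
          if (Do)            // guard the call, so a null Do is no longer UB
            return Do();     // the indirect call can still be devirtualized and
                             // inlined to EraseAll(), the only value ever stored in Do
          return 0;
        }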

~~~
_pmf_
Of course, in some cases NULL checks will be optimized out (this is why GCC
acquired the -fno-delete-null-pointer-checks flag ...). At one point, this was
a huge source of conflict: basically, when you have something like

        if (NULL != s) { s->a = 1; }

        s->b = 1;

then GCC may optimize away the NULL check, because dereferencing s->b after
the block protected by the NULL check indicates that the check is superfluous.
In this case that is actually true, but in cases involving aliasing that is
not detected by dataflow analysis, there might be circumstances where the NULL
check is required. Discussion of something related is here[0]; my example
might not be very good.

An even more insidious case is [1] (strong language), where manual overflow
checks are optimized out because the overflow itself technically invokes UB.

Compiler optimization gone wild.

[0]
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71892](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71892)
[1]
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475)
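
(A sketch of the kind of manual overflow check being referred to; the code in the bug report itself may differ:)

        int wrapped(int a) {
          /* Intended as "did a + 100 wrap around?", but signed overflow is UB,
             so the compiler may assume it never happens and fold this to 0,
             silently deleting the check. */
          return a + 100 < a;
        }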

~~~
pxndx
Your example also presents undefined behaviour, so the compiler is free to do
whatever it wants.

~~~
kstenerud
That's like saying "You used the power drill incorrectly, so it's free to
explode and burn your house down."

At some point safety matters, and so do the mistakes humans are most likely to
make.

------
MichaelBurge
If you rename "NeverCalled" to "Initialize" and "main" to
"ExecuteAfterInitialization", the behavior is much less mysterious.

NeverCalled is a "public" function since it isn't marked static, so the author
of this code intended for it to be called externally to initialize Do.

The compiler's optimization produces a correct program along the only valid
execution path: Compile this as an object file, and link it with some other
code that calls NeverCalled and then main.

If you mark NeverCalled as static, the compiler outputs an explicit "Undefined
Instruction", because there are then no execution paths that don't invoke
undefined behavior.
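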

~~~
HelloNurse
No, there is no valid execution path because of undefined behaviour. It
doesn't cease to be undefined because of imaginary additional code.

Compilers are not in the business of creatively guessing what "the author of
this code intended".

~~~
MichaelBurge
It's not creative or guessing. If you write a function without "static" it
means you intended for it to be called externally.

It's not theoretical at all: Before I saw this thread, I was in the middle of
writing some Haskell that loads some non-main symbols from an object file
derived from an executable's source code, and invokes them. People carefully
mark which functions are or are not static to ensure their executables don't
get bloated. I'd be very annoyed if my C compiler didn't take symbol
visibility into account when optimizing, because I depend on it all the time.

~~~
HelloNurse
If I write a function without "static" it means I want to _allow_ functions in
other modules to call it; it isn't implied that it will actually be called.

Real life analogy: declaring a static function is like locking your car
because you don't want it stolen: only hackers can access it through forbidden
means. If instead you leave your nonstatic car parked in the street, open and
with keys in the ignition, you are making it easy to steal (probably you want
it to be stolen; there might be a UB corpse in the trunk you want to get rid
of), but you cannot be sure that a thief will actually come along and take it
away; you are just advertising that they should.

------
exikyut
So, is this on the underhanded C shortlist yet?

I wonder how many static analysis tools will pick up that the function will be
run.

And then I start wondering if the static analysis tool "scene" realizes how
important their work is for security...

~~~
junke
> the function will be run.

You can't be sure, that's just what one compiler does today. Static analysis
tools tend to refer to the standard.

    
    
        $ tis-analyzer foo.c -val
    
        /tmp/foo.c:16:[kernel] warning: Function pointer and pointed function have incompatible types.
                      Pointer type: int ()
                      Non-function: {0}
                      assert \valid_function(Do);
        ....
        /tmp/foo.c:16:[value] Assertion 'Value,function_pointer' got final status invalid.
    

In other words, the pointer is not initialized.

~~~
gambiting
What worries me is that almost every other compiler supported by godbolt would
also wipe your drive. It's not an anomaly of Clang - there's something
fundamentally wrong in the design of compilers that allows them to do this at
the right optimization levels.

Which means that if I have a method that wipes my entire production database
but never ever call it, it might still get called because of a mistake like
this one, where someone forgot to initialize a function pointer.

~~~
phkahler
>> Which means that if I have a method that wipes my entire production
database but never ever call it, it might still get called because of mistake
like this one where someone forgot to initialize a function pointer.

Do you actually have a method like that? Somehow I doubt it. Also, the example
is completely contrived. If you have an uninitialized function pointer it's
going to be pretty uncommon that there's only one possible function for the
compiler to conclude is the right one. Also, you should not be testing on a
production system.

On a note related to your example, many automotive ECUs have the ability to
re-flash over the vehicle network. In many cases the OEM requires that they do
not contain any code to erase or program their own flash memory. This is for
the reason you cite - if somehow (for whatever reason) that code were to
execute it may brick that device. This then requires that the bootloader
download a couple functions to RAM prior to updating its own code - those
functions are the ones that erase and write flash. This is also convenient for
some microcontrollers that can't modify flash while code is being executed
from it.

------
narrator
I just read the C++ Primer because I wanted to hack on some C++ code, and the
number of things you can do in C or C++ that result in "undefined behavior" is
annoyingly large. There are some functions where they could have added a
bounds check to the implementation, but the whole philosophy of the language
seems to be that, when choosing between built-in checks that would prevent
easy-to-make errors causing undefined behavior and small speed increases on
microbenchmarks, the language designers chose the latter.
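
(One concrete illustration of that trade-off, as an aside: the standard library often offers both an unchecked and a checked way to do the same thing, and the unchecked one is the default spelling.)

        #include <vector>

        int first_of(const std::vector<int>& v) {
          // v[0] performs no bounds check: on an empty vector it is undefined behavior.
          // v.at(0) checks and throws std::out_of_range instead, at a small cost.
          return v.at(0);
        }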

~~~
maxlybbert
For what it's worth, in _Design and Evolution of C++_ , Stroustrup says that
when he was designing features, he would often think of a simple rule that he
couldn't enforce (e.g., "couldn't enforce" being "if you violate this rule,
the results are undefined") and a complex rule that he could enforce. He
generally tried to pick the simpler, easier to teach rule. Unfortunately, that
means there's more undefined behavior if those simpler rules are violated.

------
jbb67
This might well be valid according to the standard, but I find it very
depressing how many people consider it reasonable behavior.

~~~
guipsp
This is a very specific case: if there is any other function that sets the
pointer, the behavior is what you'd expect.
[https://godbolt.org/g/C3SYXt](https://godbolt.org/g/C3SYXt)

~~~
Ded7xSEoPKYNsDd
It might be what you expect for that particular compiler, but there's no
guarantee. A different compiler might decide that, because `Do` has only two
possible values, it should replace the dynamic function call with an if
condition deciding between two static function calls, like I did here manually
to trigger the bug again:
[https://godbolt.org/g/kWcvvb](https://godbolt.org/g/kWcvvb)
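
(The linked code isn't shown here, but the manual version described might look roughly like this sketch, with OtherFunc standing in for whichever second target the pointer can hold:)

        typedef int (*Function)();

        static int EraseAll()  { return 1; }   // stand-ins for the two possible targets
        static int OtherFunc() { return 2; }

        static Function Do;                    // stays null unless some caller sets it

        int main() {
          // If a compiler proves Do can only ever hold EraseAll or OtherFunc, it may
          // replace the indirect call with a branch between direct calls. The
          // "Do is still null" case is UB, so it can be folded into either branch,
          // which re-creates the surprising EraseAll() call.
          if (Do == OtherFunc)
            return OtherFunc();
          return EraseAll();
        }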

------
ben0x539
This is a really cool, graphic example of the importance of avoiding undefined
behavior even if you can't see it segfault or whatever.

------
alkonaut
Do compilers emit warnings here (do they usually do that when there is known
UB?)

Here it seems clear that "NeverCalled" is never called, so Do is never
assigned, so the call can't produce anything useful.

EDIT: Ugh missed the lack of "static" on the NeverCalled. Never mind.

Is this too hard to decide with certainty in C because you could do some
pointer gymnastics and fiddle with the variable without the compiler realizing
it? So, in order not to spit out false positives, compilers just have to
assume the author knows what he's doing and allow the suspected UB without a
warning?

I'm just used to higher level "safe" languages (Rust, C#, etc), where lack of
assignment is usually determined with certainty.

~~~
culturedsystems
"Here it's clear that "NeverCalled" is never called so it's clear that Do is
never assigned."

I don't think that is clear. If the given file is linked with another file
that initializes a static variable with a function that calls NeverCalled, the
initialization will be done before main is called, and so NeverCalled _will_
be called. The compiler can't tell from just analysing the one file shown here
whether or not NeverCalled is called.
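
(A hedged sketch of the kind of second file the comment has in mind; the names come from the example, and the mechanism is ordinary dynamic initialization of a namespace-scope static:)

        // other_file.cpp -- compiled separately and linked with the original file
        void NeverCalled();                        // non-static, so visible here

        // Dynamic initialization of a static-storage variable runs before main()
        // is entered, so Do has already been set to EraseAll by the time main()
        // calls through it.
        static int force_init = (NeverCalled(), 0);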

~~~
alkonaut
> I don't think that is clear

Right. I missed that the uncalled function was public. So the issue is not
clear until you link, I suppose; whether the function is called can only be
detected "globally". Is it fair to say that that kind of global analysis is
rare in the C compiler+linker toolchain world, and usually left to static
analyzers instead?

> The compiler can't tell from just analysing the one file shown here whether
> or not NeverCalled is called.

Right, and by making it static you get -Wunused-function. Great. That warning
is about the NeverCalled function being unused, though, and not about Do being
unassigned before it's called. Curious: is there any way to get a warning for
this? This is UB, correct? And the assignment is now known not to happen?

    
    
        typedef int (*Function)();
        static Function Do;
        int main() { return Do(); }

------
Piskvorrr
Still better than having demons fly out of your nose.

------
jhasse
There's only one value "Do" will ever have. He might as well set it from the
beginning ;)

~~~
caf
The only value it will ever have is NULL (but calling through a NULL function
pointer is UB so it can instead call system() :)

~~~
jhasse
It's uninitialized, not NULL.

edit: I was wrong, see below.

~~~
anon1385
C11 6.7.9 (10):

If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static or thread
storage duration is not initialized explicitly, then:

— if it has pointer type, it is initialized to a null pointer;

~~~
sigjuice
How is the C11 standard applicable to a C++ program?

~~~
anon1385
I didn't notice it was C++. In C++14:

From 3.6.2:

>Variables with static storage duration (3.7.1) or thread storage duration
(3.7.2) shall be zero-initialized (8.5) before any other initialization takes
place

and from 8.5:

>To zero-initialize an object or reference of type T means

>— if T is a scalar type (3.9), the object is initialized to the value
obtained by converting the integer literal 0 (zero) to T

and from 3.9:

>Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member
types (3.9.2), std::nullptr_t, and cv-qualified versions of these types
(3.9.3) are collectively called scalar types
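
(So for the program in question, the static function pointer behaves as if it had been written with an explicit null initializer:)

        typedef int (*Function)();
        static Function Do = nullptr;   // what zero-initialization of static storage amounts to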

------
Davidbrcz
On that subject: CppCon 2016: Michael Spencer, “My Little Optimizer: Undefined
Behavior is Magic”
([https://www.youtube.com/watch?v=g7entxbQOCc](https://www.youtube.com/watch?v=g7entxbQOCc))

------
Upvoter33
Compilers (and thus, compiler writers, i.e., people) have gone off the deep
end with C. In search of endless over-optimization (don't worry, they will
tell you that this optimization is "important" because some stupid benchmark
runs 21% faster), programmer intent (as well as mistakes) is turned into
bizarro behavior. Why we accept this is beyond me. For C, in particular, it
has made writing code that uses plain old C ints way more challenging than it
needs to be, among other things. We need to rethink what is important: a
little more performance, or representing (to our best possible guess) what the
programmer actually intended. </rant>

~~~
maxlybbert
> We need to rethink what is important: a little more performance, or
> representing (to our best possible guess) what the programmer actually
> intended.

I remember reading an old book about implementing a compiler. The book
actually recommended that compilers should make an effort to fix simple
mistakes; e.g., compilers should add semicolons if needed, check whether an
undefined variable or function was actually a misspelled variable or function,
etc. I wasn't sure whether I should laugh at the naivety or shudder about what
kind of code that would have led to. Especially given that each compiler would
be lenient in different ways from competing compilers.

~~~
umanwizard
This is basically what happened in real life with HTML, with the results you
predict.

------
caf
icc does this as well on the same example.

~~~
HelloNurse
GCC is the most fun: a jmp straight through the uninitialized function pointer.

