
I Do Not Know C: Short quiz on undefined behavior (2015) - waynecolvin
http://kukuruku.co/hub/programming/i-do-not-know-c
======
aidanhs
My 'favourite' bit of surprising (not undefined) behaviour I've seen recently
in the C11 spec is around infinite loops, where

void foo() { while (1) {} }

will loop forever, but

void foo(int i) { while (i) {} }

is permitted to terminate...even if i is 1:

> An iteration statement whose controlling expression is not a constant
> expression, that performs no input/output operations, does not access
> volatile objects, and performs no synchronization or atomic operations in
> its body, controlling expression, or (in the case of a for statement) its
> expression-3, may be assumed by the implementation to terminate

To make things a bit worse, llvm can incorrectly make _both_ of the above
terminate -
[https://bugs.llvm.org//show_bug.cgi?id=965](https://bugs.llvm.org//show_bug.cgi?id=965).

~~~
pcvarmint
It means that empty loops (loops with empty bodies) can be completely removed
if the controlling expression has no side effects.

> This is intended to allow compiler transformations such as removal of empty
> loops even when termination cannot be proven.

It means while(i) {} can be eliminated as if i were 0, because there are no
side effects in the loop expression or the loop body, and what would be the
point of the loop if it never terminated on a non-constant expression?

As an optimization, the optimizer is allowed to eliminate it as a useless loop
with no side effects. If you really want an infinite loop, you can use while
(1) {}.

There are cases where automatically generated C code might have empty loops
which are useless.

If you really want to go to sleep, use pause() or similar. An infinite loop
eats up CPU cycles.

~~~
TorKlingberg
It's quite common in embedded systems to have the fault handler end with an
infinite loop, to give the programmer a chance to attach a debugger and
inspect the call stack. Sometimes this behavior is turned on or off with a
debug flag, which can trigger this unexpected optimization if the flag is not
a compile-time constant.

~~~
boomlinde
You could always say

    if (flag) while (1);

~~~
TorKlingberg
Yes, that would make more sense. I have never seen this optimization actually
bite anyone.

------
DSMan195276
I'll be honest, I didn't find any of these to be particularly surprising. If
you've been using C and are familiar with strict-aliasing and common UB issues
I wouldn't expect any of these questions to seriously trip you up. Number 2 is
probably the one most people are unlikely to guess, but that example has also
been beaten to death so much since it started happening that I think lots of
people (Or at least, the people likely to read this) have already seen it
before.

I'd also add that there are ways to 'get around' some of these issues if
necessary - for example, gcc has a flag to disable strict aliasing
(-fno-strict-aliasing), and a flag for well-defined 2's-complement
signed-integer wrapping (-fwrapv).

~~~
mjevans
I don't think #2 has been fully beaten to death yet.

Assuming a platform where you don't segfault (say, one where 'page 0'
addresses are valid) and the program does proceed at runtime, I still can't
think of any _valid_ reason to eliminate the if that follows (focus on line 2
in the comments).

Under what set of logic does being able to dereference a pointer imply that
its value is not 0 (which is what the test equates to)?

In my opinion that is an often-working but incorrect optimization.

~~~
maxlybbert
C programmers expect dead code removal. Especially when the compiler also
inlines functions (and, of course, inlining makes the biggest impact on short
functions; and one way to get short functions is to have aggressive dead code
removal). And macros can expand into very weird, but valid, code; so the
statement that "nobody would ever write code like that" isn't relevant. The
compiler may well have to handle unnatural looking code.

As others have stated, compilers generally don't actually have special case
code to create unintuitive behavior if it looks like the programmer goofed.

It's possible and desirable for a compiler to remove branches of "if"
statements that it knows at compile time won't ever be true. And, of course,
one special case of statically known "if" statements are checks for NULL or
not-NULL pointers in cases where the compiler knows that a pointer will never
be NULL (e.g., it points to the stack) or will always be NULL (e.g., it was
initialized to NULL and passed to a function or macro).

So the standard allows the compiler to say "this pointer cannot be NULL at
this point because it was already dereferenced." Either the compiler is right
because the pointer couldn't be NULL, or dereferencing the pointer already
triggered undefined behavior, in which case unexpected behavior is perfectly
acceptable. Some programmers will complain because the compiler won't act
sensibly in this case, but C doesn't have any sensible option for what the
compiler should do when you dereference a NULL pointer (yes, your operating
system may give you a SEGFAULT, but the rules are written by a committee that
can't guarantee that there will _be_ an operating system).

~~~
vyodaiken
C programmers should be able to expect that "optimizations" will not transform
program meaning. And because C is so low level, certain types of optimizations
may be more difficult or impossible. If the pointer was explicitly set to
NULL, the compiler can justifiably deduce the branch will not be taken but the
deduction "if the programmer dereferenced the pointer it must not be NULL" is
not based on a sound rule. In fact, the whole concept that the compiler can
make any transformation it wants in the presence of UB is wacky. Optimization
should always be secondary to correctness.

~~~
maxlybbert
> C programmers should be able to expect that "optimizations" will not
> transform program meaning.

That's the official rule, but it's "program meaning as defined by the
standard." It's not perfect, but nobody's come up with a better alternative.
We get bugs because programmers expect some meaning that's not in the
standard. But compilers are written according to the standard, not according
to some folklore about what reasonable or experienced programmers expect.


Again, the idea isn't that the compiler found a mistake and will do its best
to make you regret it. Dereferencing a pointer is a clear statement that the
programmer believes the pointer isn't NULL. The standard allows the compiler
to believe that statement, partly because the language doesn't define what to
do if the statement is false.

~~~
nwmcsween
> But compilers are written according to the standard.

Written to the writer's _interpretation_ of the standard. I'd bet money that
no compiler written from a textual standard has followed said standard
exactly. It would be nice if a standard included code fragments used to
show/test the validity of what is stated.

~~~
adrianN
There are examples in the standard.

------
junk_disposal
Honestly, optimizing compilers will kill C.

They killed the one thing C was good at - simplicity (you know exactly what
happens where; note I'm not saying speed, as C++ can be quite a bit faster
than C).

Now, due to language lawyering, you can't just know C and your CPU, you have
to know your compiler (and every iteration of it!). And if you slip somewhere,
your security checks blow up
([http://blog.regehr.org/archives/970](http://blog.regehr.org/archives/970)
[https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...](https://bugs.chromium.org/p/nativeclient/issues/detail?id=245))
.

~~~
msbarnett
> Now, due to language lawyering, you can't just know C and your CPU, you have
> to know your compiler (and every iteration of it!).

This mythical time never existed. You _always_ had to know your compiler -- C
simply isn't well specified enough that you can accurately predict the meaning
of many constructs without reference to the implementation you're using.

It used to, if anything, be much, much _worse_, with different compilers on
different platforms behaving drastically differently.

~~~
vyodaiken
This is not really correct. The kinds of implementation dependencies one
usually encountered reflected processor architecture. The C standards
committee and compiler community have created a situation in which different
levels of "optimization" can change the logical behavior of the code! Truly a
ridiculous state of affairs. The standards committee has some mysterious
rationale, I suppose, but the compiler writers who want to do program
transformation should work on Mathematica or Prolog, not C.

~~~
prodigal_erik
Compiler writers have to use program transformation to do well on benchmarks.
Developers who don't prioritize benchmarks probably don't use C, and if they
do they really shouldn't, because sacrificing correctness for speed is the
only thing C is good for these days.

~~~
umanwizard
Speed isn't the only reason to use C. I often use C not because it's fast, but
because C is by far the simplest mainstream programming language. All these UB
warts notwithstanding, no language's interface comes as close as C's to a
basic, no-frills register machine.

~~~
red75prime
You don't need speed, but you want to write in a language that is closest to
assembly? Hmm. Interesting view of what is simple.

~~~
umanwizard
Maybe we mean different things by the word "simple". What language do you
think is simpler than C?

~~~
adrianN
Scheme, ML, Java, Lua...

~~~
umanwizard
Yeah, we must have different definitions of simplicity, as I expected.

Just off the top of my head, Java has the following complexities that C lacks:
exceptions, garbage collection, class inheritance, generics, and a very large
standard library.

What definition of simplicity are you using when you say Java is simpler than
C?

------
Tharre
I don't think this Q&A format makes for a good case of not knowing C.

I mean I got all answers right without thinking about them too much, but would
I too if I had to review hundreds of lines of someone else's code? What about
if I'm tired?

It's easy to spot mistakes in isolated code pieces, especially if the question
already tells you more or less what's wrong with it. But that doesn't mean
you'll spot those mistakes in a real codebase (or even when you write such
code yourself).

~~~
moosingin3space
This is further compounded by how difficult it is to build useful abstractions
in C, meaning that much real-world C consists of common patterns, and
reviewers focus on recognizing common patterns, which increases the chances
that small things slip through code review.

Agreed that these little examples aren't too difficult, especially if you have
experience, but I certainly do not envy Linus Torvalds' job.

------
hermitdev
It's worth noting that for example #12, the assert will only fire in debug
builds (i.e. when the macro NDEBUG is not defined). So, depending on how the
source is compiled, it may be possible to invoke the div function with b == 0.

------
eon1
C also:
[https://news.ycombinator.com/item?id=12902304](https://news.ycombinator.com/item?id=12902304)

------
userbinator
IMHO the problem is with compilers (and their developers) who think UB really
means they can do _anything_ , when what programmers usually expect is, and
the standard even notes for one of the possible interpretations of UB,
"behaving during translation or program execution in a documented manner
characteristic of the environment".

Related reading:

[http://blog.metaobject.com/2014/04/cc-osmartass.html](http://blog.metaobject.com/2014/04/cc-osmartass.html)

[http://blog.regehr.org/archives/1180](http://blog.regehr.org/archives/1180)
and
[https://news.ycombinator.com/item?id=8233484](https://news.ycombinator.com/item?id=8233484)

~~~
sjolsen
>the problem is with compilers (and their developers) who think UB really
means they can do anything

But that's exactly what undefined behavior means.

The _actual_ problem is that programmers are surprised-- that is, programmers'
expectations are not aligned with the actual behavior of the system. More
precisely, the misalignment is not between the actual behavior and the
specified behavior ( _any_ actual behavior is valid when the specified
behavior is undefined, by definition), but between the specified behavior and
the programmers' expectations.

In other words, the compiler is not at fault for doing surprising things in
cases where the behavior is undefined; that's the entire point of undefined
behavior. It's the language that's at fault for _specifying_ the behavior as
undefined.

In other other words, if programmers need to be able to rely on certain
behaviors, then those behaviors should be part of the specification.

~~~
wfo
In some sense the language is the compiler and the compiler is the language;
the language is much like a human language, used for its utility in expressing
things (ideas, programs). You can tell if your human language words work by
determining if people understand you. If people start being obtuse and
refusing to understand you because of an arbitrary grammar rule that isn't
really enforced, you'd be right to be upset with the people just as much as
the grammar.

It in fact doesn't matter at all what the standard says if GCC and LLVM say
something different, because you can't use the standard to generate assembly
code.

The standard doesn't have anything to say about UB, so it's the compiler's
responsibility to do the most reasonable, least surprising thing possible with
it: if I'm a GCC developer and you ran GCC on one of these fairly mundane
examples, and it compiled without error, then ran rm -rf / or stole your
private RSA keys and posted them on 4chan, and I said "well, you can't be mad
because it's undefined, it's the standard's fault", you'd probably punch me in
the face after some quick damage control.

If it deletes an if statement or terminates a spinlock early, that's
potentially even worse than those two examples.

~~~
sjolsen
>In some sense the language is the compiler and the compiler is the language;
the language is much like a human language, used for its utility in expressing
things (ideas, programs). You can tell if your human language words work by
determining if people understand you. If people start being obtuse and
refusing to understand you because of an arbitrary grammar rule that isn't
really enforced, you'd be right to be upset with the people just as much as
the grammar.

The shortcoming of this interpretation is that programs are not (only)
consumed by humans; they're consumed by computers as well. Computers are not
at all like humans: there is no such thing as "understanding" or "obtuseness"
or even "ideas." You cannot reasonably rely on a computer program, in general,
to take arbitrary (Turing-complete!) input and do something reasonable with
it, at least not without making compromises on what constitutes "reasonable."

Along this line of thinking, the purpose of the standard is not to generate
assembly code; it's to pin down exactly what compromises the compiler is
_allowed_ to make with regards to what "reasonable" means. It happens that C
allows an implementation to eschew "reasonable" guarantees about behavior for
things like "reasonable" guarantees about performance or "reasonable" ease of
implementation.

Now, an implementation may _choose_ to provide stronger guarantees for the
benefit of its users. It may even be reasonable to expect that in many cases.
But at that point you're no longer dealing with C; you're dealing with a
derivative language and non-portable programs. I think that for a lot of
developers, this is just as bad as a compiler that takes every liberty allowed
to it by the standard. The solution, then, is not for GCC and LLVM to make
guarantees that the C language standard doesn't strictly require; the solution
is for the C language standard to require that GCC and LLVM make those
guarantees.

Of course, it doesn't even have to be the C language standard; it could be a
"Safe C" standard. The point is that if you want to simultaneously satisfy the
constraints that programs be portable and that compilers provide useful
guarantees about behavior, then you need to codify those guarantees into
_some_ standard. If you just implicitly assume that GCC is going to do
something more or less "reasonable" and blame the GCC developers when it
doesn't, neither you nor they are going to be happy.

------
sparky_
I suppose this sort of ambiguity is what drives the passion of Rust and Go
programmers.

~~~
barsonme
Sorta. I write mostly Go (some JS, PHP) and I got 6/10, forgetting mostly
stupid stuff like passing (INT_MIN, -1) to #12.
But some of those are prevalent in Go. For example, 1.0 / 1e-309 is +Inf in
Go, just as it is in C—it's IEEE 754 rules. int might not always be able to
hold the size of an object in Go, just like C. In Go #6 wraps around and is an
infinite loop, just like C.

The questions that don't, in some way, translate to Go are #2, #7, #8, and
#10.

But, to your credit, I do like how Go has very limited UB (basically race
conditions + some uses of the unsafe package) and works pretty much how you'd
expect it to work.

------
federicoponzi
Before: What? I know C. After 3 questions: Ok, I don't know C. Well played
sir.

------
E6300
1. Unless C's variable definition rules are completely different from C++'s,
int i; is a full definition, not a declaration. If both definitions appear at
the same scope (e.g. global), this will cause either a compiler error or a
linker error. A variable declaration would be extern int i;

~~~
khedoros1
C's variable definition rules are different from C++'s. gcc happily compiles
those two lines, g++ exits with the "redefinition" error.

~~~
E6300
That was unexpected.

~~~
shabbyrobe
Said every C programmer ever!

------
brianmurphy
As a former C programmer, you know not to fool around at the max bounds of a
type. That avoids all of the integer overflow/underflow conditions. When in
doubt, you just throw a long or unsigned on there for insurance. :)

------
Kenji
I'm sorry, but the answer this website gives to 1. is wrong. See for yourself:

    
    
      int i;
      int i = 10;

      int main(int argc, char* argv[]) {
          return 0;
      }

Try to compile it. It doesn't work (gcc.exe (GCC) 5.3.0); the error is:

    
    
      a.cc:2:5: error: redefinition of 'int i'
       int i = 10;
           ^
      a.cc:1:5: note: 'int i' previously declared here
       int i;
           ^
    

Either I misunderstood the author and this example, or I do know C.

~~~
mauricioc
Judging by the .cc extension, you are compiling this with a C++ compiler.
Quoting from Annex C (which documents the incompatibilities between C++ and
ISO C) of the C++ standard:

    
    
       Change: C++ does not have “tentative definitions” as in C. E.g., at
       file scope,
    
       int i;
       int i;
    
       is valid in C, invalid in C++. This makes it impossible to define
       mutually referential file-local static objects, if initializers are
       restricted to the syntactic forms of C. For example,
    
       struct X { int i; struct X *next; };
       static struct X a;
       static struct X b = { 0, &a };
       static struct X a = { 1, &b };
    
       Rationale: This avoids having different initialization rules for
       fundamental types and user-defined types.
       
       Effect on original feature: Deletion of semantically well-defined
       feature.
    
       Difficulty of converting: Semantic transformation.
    
       Rationale: In C++, the initializer for one of a set of
       mutually-referential file-local static objects must invoke a
       function call to achieve the initialization.
    
       How widely used: Seldom.

~~~
Kenji
_facepalm_ Of course: even if I invoke gcc, compiling a.cc switches to the
C++ compiler. Thanks.

------
nightcracker
I got every single one right. Does that mean I know C through and through?
Perhaps. But all of these are the 'default' FAQ pitfalls of C, not the really
tricky stuff.

------
AndyKelley
I made this post as a response. Disclaimer: yet another programming language
trying to dethrone C. People seem to be less enthusiastic about the subject
these days.

[http://andrewkelley.me/post/zig-already-more-knowable-than-c.html](http://andrewkelley.me/post/zig-already-more-knowable-than-c.html)

------
kvakkefly
Anyone who enjoys this will also enjoy
[http://cppquiz.org](http://cppquiz.org)

------
Hydraulix989
I feel bad because I'm smart enough to answer these questions correctly in a
quiz format*, but if I saw any of them in production code, I would not even
think twice about it.

*(the quiz questions themselves lead you on, plus I read the MIT paper on
undefined behavior that was posted on here back in 2013)

------
rdc12
Isn't this line from #3 undefined behavior not mentioned in the article (a
sequence point violation)?

    *zp++ = *xp + *yp;

~~~
msbarnett
That's not a sequence point violation. The C standard makes it clear that zp
gets xp + *yp prior to the increment. Quoting 6.5.2.4

> The result of the postfix ++ operator is the value of the operand. After the
> result is obtained, the value of the operand is incremented. (That is, the
> value 1 of the appropriate type is added to it.) See the discussions of
> additive operators and compound assignment for information on constraints,
> types, and conversions and the effects of operations on pointers. The side
> effect of updating the stored value of the operand shall occur between the
> previous and the next sequence point.

The last sentence is key.

------
wmu
#4 is not really language issue, rather a floating point numbers feature.

------
raarts
(2015)

