
A glimpse of undefined behavior in C - shawndumas
http://blog.chris-cole.net/2013/11/30/a-glimpse-of-undefined-behavior-in-c/
======
sz4kerto
Knowing exactly how sequence points work might be an advanced topic, but I am
not sure I'd hire a junior C/C++ programmer who has not heard of sequence
points. Anybody who has clearly won't write something like this code. I am not
100% sure how they work either (haven't programmed C for 4 years), so I just
avoid these kinds of unnecessary complications in the code, if not for myself,
then for the guy who comes after me, and who, if he has a bad day, may
inadvertently cause some trouble.

Really, that's one of the reasons why Java still works well at big companies.
The code is usually so verbose and 'un-smart' that it's quite hard for
programmers with an inferiority complex to obfuscate it when they're trying to
prove that they're smart. It's a shame, however, that if the language is
primitive then you can always create a big framework to obfuscate stuff. :)

~~~
colanderman
_I am not 100% sure in how they work either (haven't programmed C for 4
years), so I just avoid these kind of unnecessary complications in the code_

Hear, hear. I am considered the "language lawyer" of my embedded group and
often get asked questions about C minutiæ. It isn't uncommon that my answer is
"I don't know how that works, because I would never write something that
requires an answer to that".

Modern compilers are blind to shorthands etc.; the only useful knowledge about
"C" (really C compiler & hardware) minutiæ relates to (a) how to convince the
compiler to optimize certain high-level constructs (e.g. loop unrolling), (b)
how to convince the compiler to emit certain low-level constructs (e.g. SIMD
instructions), and (c) how the hardware behaves (e.g. the cache model).
Generally anything else – you can rewrite it so you _don't_ have to think too
hard about how C works.

EDIT: Understanding type promotion is an exception to this. The integer type
hierarchy is unfortunately (a) deeply baked into C and (b) mostly brain-dead –
C mostly conflates physical integer width with modular arithmetic and provides
no non-modular integer types, often leading to subtle software bugs (see last
week's story about binary search).
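
A minimal sketch of the binary search overflow mentioned above, assuming `int` indices; the helper names `midpoint` and `find_sorted` are made up for illustration. Computing `(lo + hi) / 2` can overflow a signed `int`, which is undefined behavior, when both indices are large. Writing it as `lo + (hi - lo) / 2` stays in range whenever `0 <= lo <= hi`.

```c
/* Hypothetical helper: midpoint of [lo, hi] without ever computing lo + hi,
   which could overflow int. Assumes 0 <= lo <= hi. */
int midpoint(int lo, int hi) {
    return lo + (hi - lo) / 2;
}

/* Hypothetical helper: index of key in the sorted array a[0..n-1],
   or -1 if key is not present. */
int find_sorted(const int *a, int n, int key) {
    int lo = 0;
    int hi = n - 1;
    while (lo <= hi) {
        int mid = midpoint(lo, hi);
        if (a[mid] < key) {
            lo = mid + 1;
        } else if (a[mid] > key) {
            hi = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;
}
```

With a 32-bit `int`, `midpoint(2000000000, 2100000000)` is fine, whereas the naive `(lo + hi) / 2` would add two values whose sum does not fit.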

EDIT: So is understanding const-ness. At least you can ignore this if you
don't get it (or if C doesn't, as is the case with certain nested const
types).

~~~
sz4kerto
Const-ness is really different. It's a concept I really miss from other
languages as it _adds_ more semantic information to the code. It does not
obscure, it clarifies. It's a way to give orders instead of giving
recommendations.

~~~
oulu2006
I really liked const-ness as well; in functional languages const-ness comes
for free as all state is by default immutable -- one of the reasons I jumped
to Scala.

------
thisisnotatest
Aside: this test code would have been a bit easier to debug if it had used
decimal bases. That is, replace

    
    
      int a[] = {10,20,30};
      int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    

with

    
    
      int a[] = {1,2,3};
      int r = 1 * a[i++] + 10 * a[i++] + 100 * a[i++];
    

Then if your program outputs r = 111, it's obvious that it's doing

    
    
      a[0] + 10*a[0] + 100*a[0]
    

and if it outputs 321, it's obvious that it's doing

    
    
      a[0] + 10*a[1] + 100*a[2]
    

No disassembly required.
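
The digit trick above can be sketched as a small decoder. This assumes the test array is `{1, 2, 3}` with weights 1, 10 and 100, as in the comment; the name `index_used` is hypothetical. It only interprets a result that some compiler happened to print and does not make the original expression defined.

```c
/* Hypothetical helper: given r computed with a[] = {1, 2, 3} and weights
   1, 10, 100, return the array index that term number `term` (0, 1 or 2)
   read. Each decimal digit of r is the array value that term picked up,
   and the value stored at index k is k + 1. */
int index_used(int r, int term) {
    int weight = 1;
    for (int k = 0; k < term; k++) {
        weight *= 10;
    }
    return (r / weight) % 10 - 1;
}
```

For r = 111 every term decodes to index 0; for r = 321 the terms decode to indices 0, 1 and 2.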

~~~
Perseids
Alternatively, just think a moment longer and realize that the lower limit of
each value (10) already produces 60, and thus each must be 10.

------
natch
What an annoying post. Talks like he knows a lot, when he doesn't know this
most basic of basic points of how C works. It's understandable to not know
everything, even to have the odd hole in your basics like this. But the know-
it-all voice (and for such a basic topic) "I explained operator precedence..."
is just going to make you someone nobody wants to work with. This is not
undefined behavior, it's one of the first things you learn about C after
encountering the pre/postfix operators IF you pay attention to the details,
which, with C, you should know to do.

~~~
clarry
> This is not undefined behavior, it's one of the first things you learn about
> C after encountering the pre/postfix operators IF you pay attention to the
> details, which, with C, you should know to do.

What exactly is not undefined behavior? The post describes a statement where a
variable is modified multiple times between sequence points. What does the
standard say about this?

N1256, 6.5 says

    
    
      Between the previous and next sequence point an object
      shall have its stored value modified at most once by the
      evaluation of an expression.
    

J.2 Undefined behavior

    
    
      The behavior is undefined in the following circumstances:
      [..]
      * Between two sequence points, an object is modified more
        than once, or is modified and the prior value is read
        other than to determine the value to be stored (6.5).
    

What's with the know-it-all voice?

------
TorKlingberg
Yes, don't change the same variable in multiple places in the same statement
in C.

In some cases the behavior is well defined, if there are sequence points
between the assignments, but I would rather keep my code simple.
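
A rough sketch of two such cases, with hypothetical function names. The `&&` operator and the comma operator each have a sequence point after their left operand, so the two increments of `i` below never fall between the same pair of sequence points.

```c
/* `&&` has a sequence point after its left operand, so the first i++ is
   complete before the second one is evaluated (if it is evaluated at all).
   Returns the final value of i. */
int after_logical_and(int i) {
    if (i++ && i++) {
        /* both increments ran */
    }
    return i;
}

/* The comma operator also has a sequence point after its left operand.
   Returns the value of the whole expression, i.e. of the second i++. */
int second_of_comma(int i) {
    return (i++, i++);
}
```

Starting from 0, `after_logical_and` short-circuits after one increment; starting from 1, both increments run. That this is well defined does not make it pleasant to read.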

------
akairobotto
Sequence points are an interesting topic on their own.

[http://stackoverflow.com/a/4176333](http://stackoverflow.com/a/4176333)

~~~
pmr_
I think they are the only thing relevant in the article. The introductory
example is just one example of how not knowing about sequence points leads to
bugs. There are many more, and it makes for a much more interesting read to
first learn about sequence points and then apply that knowledge to various
examples.

This article goes about it the wrong way.

------
micro-ram
The way I learned that in C a long time ago was "post increment" was "post
statement increment".

~~~
ballard
There was a popular myth way back in the '90s that preincrement was always
faster than postincrement because the latter could generate a temporary. As if
this were the case:

    
    
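        /* x++: returns the old value, so a temporary is needed */
        static inline int postincrement(int *x) {
          int temp = *x;
          *x = *x + 1;
          return temp;
        }
    
        /* ++x: returns the new value, no temporary */
        static inline int preincrement(int *x) {
          *x = *x + 1;
          return *x;
        }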
    
    

But in typical usage, where the expression value is not used, it doesn't make
any difference:

    
    
        for (int i = 0; i < 10; ++i) {
    
    

If one looks at the assembly, it's clearly equivalent to:

    
    
        for (int i = 0; i < 10; i++) {
    
    

Incidentally, the latter seems more readable.

~~~
ezy
I always thought this "myth" was confined to C++ code (iterators) where there
may in fact be code that does something analogous to what your
"postincrement()" does.

Compilers are sometimes smart enough to remove the unnecessary operation in
C++ (e.g. switch to ++x themselves), but I always use ++i for the same reason
people simplify their usage of C in between sequence points --- it's an easy
transformation, and why tempt fate?

~~~
nly
The compiler can't necessarily optimise i++ away in C++, because operator++
or the destructor of the temporary iterator object returned may have side
effects. It's still not _quite_ equivalent to ballard's postincrement()
though, because the C++ standard allows for that temporary copy operation to
be optimised away.

------
RamiK
Why? What reason could there possibly be that such well known, gaping holes in
the standard have persisted for all these years and multiple standard
revisions? C90, C99, C11 and now C14... If performance or backward
compatibility is a concern, surely an optional macro like __STDC_STRICT__
could be suggested in the standard with clear expectations that could be
relied on.

~~~
dreish
That's a reasonable question. It's a shame this comment gets downvoted instead
of the various people falling all over themselves to show off how smart they
are by posting incorrect guesses about how undefined behavior doesn't matter.

Here's a good article that addresses the question "Why have undefined
behavior?"

[http://blog.regehr.org/archives/213](http://blog.regehr.org/archives/213)

This one linking to the above is also worth reading:

[http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html)

~~~
to3m
People seem to be forgetting that an implementation includes an execution
environment as well as just a compiler, and that a perfectly valid (and, to my
mind, very much recommended) option for undefined behaviour is to behave in
some documented manner characteristic of the platform. I don't see this "no
obligation" stuff as a great idea, not when it results in so much surprising
behaviour - and especially not for simple situations where there's something
obvious that the compiler could do, that would surprise absolutely nobody
familiar with the system in question.

I always imagined undefined behaviour was there to avoid tying
implementations' hands, by having the standard not mandate things that vary in
practice. But it seems that people are assuming it's there to give compiler
writers carte blanche to do whatever they like, and then point at the standard
as justification. Given how much stuff could potentially be added to C to
improve it, I don't know why people are spending all this time trying to
figure out all the ways in which the letter of the law allows them to confuse
the programmer.

See also somebody else's rant about strict aliasing:
[http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasing.html](http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasing.html)

------
EpicEng
An entire blog post focused on a daily recurring SO question. Too bad those
people won't know to search for "sequence point".

------
randv
I observed this back when I was in school, using a simpler expression:
i = 0; i = i++;

The answer will differ between compilers. I have used this as an interview
question ever since, not to get the right answer but to understand the
candidate's thought process in solving the problem and his/her
understanding of operator precedence :-)
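
As a sketch of why the answers differ, here are two orderings a compiler might plausibly pick for `i = i++`, each written out as well-defined C. The function names are hypothetical, and the standard promises neither ordering (nor any other outcome) for the original expression.

```c
/* Reading 1: the increment lands first, then the old value is stored back,
   so the increment is lost. */
int store_old_value_last(int i) {
    int old = i;
    i = i + 1;
    i = old;
    return i;
}

/* Reading 2: the old value is stored first, then the increment lands. */
int increment_last(int i) {
    int old = i;
    i = old;
    i = i + 1;
    return i;
}
```

Starting from 0, the first reading leaves 0 and the second leaves 1, which matches the kind of disagreement described above.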

------
SilasX
Lesson I took away:

The ++ operator should be deprecated except when it's the only operation on
that line.

~~~
deletes
I think users of for/while loops will disagree. Don't forget that =, += , ...
can all be abused in the same way.

~~~
SilasX

        for (int i=0;i<10;i++)

would count as three lines for purposes of that rule, and the

    
    
        i++

as one. I could have been more precise, but I thought it would be understood.

The point is, to be on the safe side, don't use ++/-- in any line (in the
sense of ;{}-delimited statement) that is also doing something else.

>Don't forget that =, += , ... can all be abused in the same way.

I didn't, and people should use the same safeguards around them, i.e. don't
mix them within lines that do other things, "cleverness" or "C golf" be
damned.

------
forgottenpaswrd
One of the things we train every newcomer to our company to do is to never do
what this man does: a complex single line whose expected behavior is based on
your (arbitrary) conventions.

You always do multiple lines, with comments on each one.

It sounds ridiculous, but this simple thing made some kinds of bugs
impossible: those that are right in front of you but that you can't see in a
million years. The atomic operation in code is the line; you can't debug a
complex line(1).

It is painful forcing people to do that, since people tend to hate being told
what to do, but at the same time they love the outcome so much. In the end
everybody loves it.

1. With assembly you are debugging an instance of your code. You are not
debugging what will be created with any compiler, any OS or architecture.

~~~
stinos
While I agree the code presented is just, plain and simple, bad, and the first
step toward a solution is splitting it up, I really don't get why you want
comments for every line? Or maybe you don't mean literally every line?

I mean suppose I split the first part of the statement like this:

    
    
      int r = a[ i ];
      ++i;
    

These lines are self-explanatory. It's pretty hard to argue they need a
comment? Especially when used in a function that already should have a
name/comment explaining what it does?
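
Carrying that split through the whole expression might look like the sketch below, assuming the article's weights 1, 2 and 3; the function name `weighted_sum` is hypothetical. With one increment per statement, `i` is never modified twice between sequence points.

```c
/* Hypothetical helper: the article's weighted sum with one increment per
   statement, so the result no longer depends on the compiler. */
int weighted_sum(const int a[3]) {
    int i = 0;
    int r = 1 * a[i];
    ++i;
    r += 2 * a[i];
    ++i;
    r += 3 * a[i];
    ++i;
    return r;
}
```

For the article's array `{10, 20, 30}` this gives 1*10 + 2*20 + 3*30 = 140 on any conforming compiler.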

------
ChuckMcM
Interesting, my first guess would have been '60' since the old C89 / K&R C had
the behavior

    
    
       (prefix ops) => ( statement ) =>(postfix ops) 
    

Which made code like

    
    
       *a++ = *b--; 
    

Change the pointers after the copy as opposed to having them change before the
copy.

It is interesting that this has become compiler defined.

~~~
EpicEng
It hasn't. You're not modifying a single variable twice between sequence
points in that example.
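
A small sketch of the same shape, with a hypothetical function name. It uses two indices rather than two pointers only so that the final `j--` does not step a pointer before the start of the array; the point is that `i` and `j` are distinct objects, each modified once per statement.

```c
/* Hypothetical helper: copy src[0..n-1] into dst in reverse order.
   dst and src must not overlap. */
void reverse_copy(int *dst, const int *src, int n) {
    int i = 0;
    int j = n - 1;
    while (j >= 0) {
        dst[i++] = src[j--];  /* i and j are each modified exactly once */
    }
}
```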

~~~
tsahyt
That's really all there is to it, isn't it? I've never understood why some
people feel the urge to modify the same variable multiple times in the same
statement. It doesn't help readability, it's harder to reason about and worst
of all leads you directly into the land of undefined behavior, where
everything that can go wrong, will go wrong.

~~~
EpicEng
Yeah, the need to shove as much code as possible into one line seems to go
away with experience. Once you have to debug and maintain that crap you
realize very quickly that one liners are not necessarily something to strive
for.

~~~
tsahyt
I used to strive for clever solutions. The shorter the better. I still like to
look at them but I want them as far away as possible from my code base. Sure,
if a solution is clever and efficient, while still readable, maintainable and
easy to understand when documented properly, I'd use it. However, my general
rule of thumb these days is: "Write this code, so an utter idiot can read it".
Because that's what's going to happen: One day I'll have to read it again, and
I'll feel like an idiot, trying to understand what once went through my brain.

That's something that took me way too long to learn and I feel that novice
programmers should embrace that more. We all like to feel like Einstein
sometimes, but a big code base is not the place for solutions only we can
understand... temporarily.

------
RBerenguel
I think I stumbled into this once, but didn't have the time atm to check what
the problem was... I simply rewrote the code to remove the infixes. The good
thing with C is that you can decide not to shoot yourself in the foot.

------
mmariani
I just ran this after compiling it with clang-500.2.79 based on LLVM 3.3svn,
and the output was 140. Maybe GCC is to blame?

~~~
hbar
It's undefined in the spec. So GCC and clang are both correct.

------
brihat
Isn't this actually called "implementation-defined" behaviour in the std
rather than "undefined" behaviour? I generally get a segfault for undefined
behaviors

~~~
twoodfin
It's undefined. That means the compiler and runtime can choose to segfault, do
something that looks right, silently corrupt your program, launch the nuclear
missiles, whatever. And there's no requirement that this behavior be
repeatable, so the code can work with debug flags turned on but fail in a
release build.

You can never rely on your compiler's implementation of undefined behavior.

~~~
ygra
> You can never rely on your compiler's implementation of undefined behavior.

I wish I could somehow get my coworker to understand this. He's mostly "Yeah,
well, I know it's undefined, but there's really no way this could be anything
other than <foo>. I know what the CPU does there."

------
goldenkey
Not "well-defined" is the proper terminology.

~~~
pmr_
No. You might be right if we were talking about "proper use" in English as a
spoken language, but "undefined behavior" is a technical term defined by the C
and C++ ISO standards.

~~~
goldenkey
I see, okay did not know "undefined behavior" was part of ISO. It's pretty
unfortunate that I have to pay $30 to see the ISO standard...

[http://isocpp.org/std/the-standard](http://isocpp.org/std/the-standard)

