
Undefined behavior is often a thing for C programmers (2011) - pcr910303
https://blog.llvm.org/posts/2011-05-14-what-every-c-programmer-should-know_14/
======
flohofwoe
Unpopular opinion: the problem isn't so much undefined behaviour, but rather
that compiler vendors have chosen to exploit undefined behaviour in their
optimization passes.

PS: some pragmatic cleanup of UB and IB in the C standard would be welcome
though.

~~~
shultays
The reason for having UB in the first place is so compilers can exploit them
so we can have more optimized programs.

UB is a feature, not a bug. So you don't really "clean" it up. Cleaning up of
UB means a lot of overhead for programs.

~~~
geofft
Yes, but also, UB is a "feature" to work around the fact that C is a
relatively un-expressive language with a simple type system. It's great that
the compiler can optimize

    for (i = 0; i < *len; i++)
        array[i] = 0;


without having to check on each iteration whether len secretly points inside
the array and got modified by the loop. It would be better if this were not a
_heuristic_ and there were clear compile-time information saying for sure that
this optimization is safe.

~~~
sharpneli
If len and array point to the same type (or one of them is char*), the
compiler must assume they can alias. One has to explicitly state with restrict
that they cannot alias.

In a similar case where the types differ, e.g. float array[N], the compiler
can assume that they do not alias.
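
A minimal sketch of that difference, assuming both pointers refer to size_t. The function names are made up for illustration and are not from the article. In the first version the compiler has to allow for len pointing into array; in the second, restrict is a promise from the caller that it does not, and breaking that promise is itself undefined behavior.

```c
#include <stddef.h>

/* Same pointee type and no restrict: a store to array[i] may change *len,
   so the compiler has to behave as if *len is re-read on every iteration. */
void zero_may_alias(size_t *array, const size_t *len)
{
    for (size_t i = 0; i < *len; i++)
        array[i] = 0;
}

/* restrict tells the compiler that array and len never overlap, so it is
   free to read *len once and turn the loop into a plain block clear.
   Calling this with overlapping pointers is undefined behavior. */
void zero_no_alias(size_t *restrict array, const size_t *restrict len)
{
    for (size_t i = 0; i < *len; i++)
        array[i] = 0;
}
```

Passing a pointer into the array as len is legal for zero_may_alias, and the loop then stops as soon as it overwrites its own bound.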

~~~
geofft
Right, that's a heuristic, and the consequence of that heuristic is that a
bunch of things where you intentionally have pointers of different types to
the same memory - the sort of low-level data manipulation that C ought to be
good for - is UB unless you are careful about it and you find one of the
exceptions to this rule.

~~~
sharpneli
Not a heuristic. Well defined, even if it could be defined better. I'd
personally make all types not alias by default and then have a separate
keyword 'alias' to specify things that must alias.

One of the reasons why Fortran was faster for decades was its default aliasing
rules; restrict finally got us out of that trap. If we always assumed
everything aliases, plenty of code would become way slower.

Also C has well defined ways to access memory via different types. Make a
union type and access via that. It makes it explicit.
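
A rough sketch of the union approach, assuming float and uint32_t have the same size (checked at compile time). The function names are hypothetical. As I read the C11 footnote on union member access, reading a member other than the one last stored reinterprets the bytes; C++ does not give the same guarantee. A memcpy version is included for comparison, since it is the other commonly used way to copy a representation.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit unsigned integer by
   storing one union member and reading the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "sketch assumes a 32-bit float");
    pun.f = f;
    return pun.u;
}

/* Same result by copying the object representation byte for byte. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The value you get back depends on the platform's float format; on an IEEE 754 machine 1.0f comes out as 0x3F800000.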

~~~
AlotOfReading
Is it really fair to call type punning "well defined"? To my knowledge, it was
considered implementation defined from C89 all the way through C11.

~~~
sharpneli
Before C11 char* was the only way to do type punning, badly. So it was
basically forbidden, but still well defined.

Since then they specified that unions are the way to go.

------
dboreham
Why oh why does "undefined behavior in C" come up constantly here? I wrote C
every day for decades. I was aware there's such a thing as undefined behavior.
I can't remember it ever being even a minuscule factor in my daily work.
What's changed?

~~~
bluetomcat
UB is blown out of proportion by compiler vendors and language-lawyer types
concerned with portability across exotic ranges of machines, including those,
where, for example, CHAR_BIT != 8.

For the most part, avoiding UB is common sense when you grasp the basic
abstract C memory model - types of storage and their lifetime, avoiding out of
bounds array access, keeping track of the lifetime of all allocations, etc.

~~~
srtjstjsj
That's not the issue. The issue is that simple bugs become global program
transformations into nonsense due to aggressive optimizations, when they
should instead be compile-time errors.

------
raphlinus
As good a time as any to pimp my own essay on the topic, from a couple years
ago:

[https://raphlinus.github.io/programming/rust/2018/08/17/unde...](https://raphlinus.github.io/programming/rust/2018/08/17/undefined-behavior.html)

------
merricksb
If interested, see discussion at time of publication in 2011:

[https://news.ycombinator.com/item?id=2548410](https://news.ycombinator.com/item?id=2548410)

------
mijoharas
The links to part 1[0] and part 3[2] in the text are dead (at least for me).
Here they are to save other people time:

[0] [https://blog.llvm.org/posts/2011-05-13-what-every-c-programm...](https://blog.llvm.org/posts/2011-05-13-what-every-c-programmer-should-know/)

[1] [https://blog.llvm.org/posts/2011-05-14-what-every-c-programm...](https://blog.llvm.org/posts/2011-05-14-what-every-c-programmer-should-know_14/)

[2] [https://blog.llvm.org/posts/2011-05-21-what-every-c-programm...](https://blog.llvm.org/posts/2011-05-21-what-every-c-programmer-should-know_21/)

------
xg15
Why the title change? The original title "Why undefined behavior is often a
scary and terrible thing for C programmers" may be a bit polemic but occurs
verbatim in the article - and IMO does a better job of expressing the
article's message.

------
not2b
The example case is flagged by a number of static checkers, for example
Coverity, as a likely error (pointer is dereferenced before it is tested for
null).
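
For anyone who hasn't seen the pattern, here is a small sketch of it with made-up function names; it is not a verbatim copy of the article's code. In the first function the dereference comes before the test, so an optimizer is allowed to conclude the pointer is non-null and delete the check. The second function shows the ordering a checker would want.

```c
#include <stddef.h>

/* Flagged pattern: *p is evaluated first, which is undefined behavior when
   p is NULL, so the compiler may treat the later NULL test as dead code. */
int read_then_check(int *p)
{
    int value = *p;
    if (p == NULL)
        return -1;
    return value;
}

/* Safer ordering: the test guards the dereference, so it cannot be
   removed on the grounds that p was already dereferenced. */
int check_then_read(int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Both functions behave the same for valid pointers; only the second has a defined result for NULL.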

------
fortran77
And yet, they managed to write the Linux kernel in it, as well as thousands of
other things we use every day.

~~~
barrkel
One of the more pernicious problems is security checks written to detect
underhanded violations of constraints being eliminated because the compiler
doesn't recognize the underhandedness is possible.

In a past life, I depended on signed overflow to detect out of bounds memory
allocations. That kind of stuff gets compiled away these days, you need to be
cleverer to detect when the user is deliberately trying to invoke UB. And
because the compiler doesn't believe in UB, there's the risk it's going to
remove your detection logic.
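
A sketch of what that looks like for a plain int addition, with hypothetical function names. The first check relies on the sum wrapping around, which is exactly the undefined case, so a compiler may fold it to false. The second compares against the limits before adding and never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile: for b >= 0 this only reports overflow if a + b wraps around.
   Signed overflow is undefined, so the compiler may assume it cannot
   happen and reduce the whole test to "false". */
bool add_overflows_fragile(int a, int b)
{
    return a + b < a;
}

/* Robust: decide from the operands alone, without computing a + b. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b would drop below INT_MIN */
    return false;
}
```

GCC and Clang also offer __builtin_add_overflow for this, and -fwrapv if you would rather make signed wraparound defined for the whole program.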

~~~
sharpneli
Making signed overflow well defined would be so much better.

IIRC C++ just decided that signed numbers are two's complement and that's
that. C could really use a similar thing.

~~~
steerablesafe
> IIRC C++ just decided that signed numbers are two's complement and that's
> that.

Signed integer overflow is still undefined in C++.

~~~
sharpneli
You are correct. I misremembered, it was just a proposal that was not taken
into the spec.

~~~
steerablesafe
No, it's taken into the spec. It affects unsigned <-> signed conversions and
maybe other things. Not integer overflow though.

~~~
sharpneli
Ah. Makes sense, as that’s relatively cheap to emulate by just doing the
conversion when loading and storing from memory, but allows the use of HW even
on a ones' complement implementation.

