
Undefined Behavior in LLVM [pdf] - ingve
http://www.cs.utah.edu/~regehr/llvm-ub.pdf
======
joosters
Whenever a story about UB comes up on Hacker News, the comments usually fill
with both the predictable "Don't use C" comments and a lot of ill-informed
ranting about how stupid all this UB is.

For the first group of commenters, please don't bother posting again, just
search HN and you can find a lifetime's supply of C love/hate arguments.

For the second group of people, I'd recommend reading two fantastic guides to
UB:

[http://blog.regehr.org/archives/213](http://blog.regehr.org/archives/213)

[http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html)

These should be enough to explain why UB is trickier than you think...

~~~
tomp
The second guide explains some cases for undefined behavior, but it mixes up
some things (or explains them poorly)...

For example, I understand why _signed integer overflow_ and _violating type
rules_ are UB; the first allows the compiler to optimize many inequalities,
while the second enables various optimizations that rely on alias analysis.

But I can't understand why e.g. _use of an uninitialized variable_ or
_oversized shift amounts_ are UB. Unless these also enable some powerful
optimizations (the article doesn't mention any, and I can't think of any
either), it would be much better to leave them _implementation defined_, not
_undefined_. In that case, reading an uninitialized variable could result in
any value, but it would still make the whole program valid.

Edit: should have said "platform-defined" instead of "implementation-defined".

~~~
josephlord
Surely undefined behaviour in a language specification is by definition
implementation defined?

Now the definition could vary between doing something that seems obviously
correct, crashing or even deleting every file on your disk but whatever it
does is up to the compiler implementer.

~~~
pdw
"Implementation defined" has a specific meaning in the context of the C
standard: it means that an implementation needs to define and document a
particular behavior. E.g. for GCC:
[https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html](https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html)

For undefined behavior, the standard makes no requirements at all on the
compiler.

------
jackweirdy

        #include <stdio.h>
        #include <stdlib.h>
        int main() {
            int *p = (int*)malloc(sizeof(int));
            int *q = (int*)realloc(p, sizeof(int));
            *p = 1;  /* UB: p is no longer valid after a successful realloc */
            *q = 2;
            if (p == q)
                printf("%d %d\n", *p, *q);
        }
        $ clang -O realloc.c ; ./a.out
        1 2
    

That's about as eyebrow-raising as you can get!

~~~
chrisseaton
I would have thought that the optimiser would see that p and q are aliased
after the `p == q` condition and so not use the written values after that.

But yeah, for aliasing that the compiler can't see (if it's not doing a stamp
based on the equality condition), I can understand why that's undefined. What
else would you want the compiler to do? Treat all memory as volatile? That
would destroy performance.

~~~
derefr
My default answer would be "disallow usage of p after the realloc." Which is
why I like Rust, I guess.

~~~
chrisseaton
But I believe the compiler has no knowledge of the semantics of these
functions to do that.

~~~
gpderetta
The behaviour of those functions is defined by the standard. The compiler is
aware that realloc invalidates its pointer parameter.

~~~
mhogomchungu
The pointer is invalidated only if the object is moved; the object is not
always moved, and hence the pointer is not always invalidated.

[http://pubs.opengroup.org/onlinepubs/7908799/xsh/realloc.htm...](http://pubs.opengroup.org/onlinepubs/7908799/xsh/realloc.html)

~~~
derefr
Do note that I wasn't proposing to invalidate the memory-address handle
contained in p; but rather to _lexically_ invalidate p itself, such that it
would no longer be legal for it to occur as an rvalue until a new value was
assigned to it as an lvalue.

You know, like Rust.

~~~
shultays
In this case it is easy to see that p is no longer valid, but what about other
pointers that point to the same location and are not trivially visible to the
compiler? It won't be possible without run-time checks or a memory system like
the one in Rust.

~~~
lmm
> It won't be possible without run time checks or memory system like the one
> in rust.

So do that then.

~~~
shultays
Run-time checks are run-time checks; they add overhead. IMHO, dangling
pointers are not worth crippling the language the way Rust does.

~~~
pcwalton
I've written hundreds of thousands of lines of working code in the "crippled"
language you're referring to.

And dangling pointers result in use-after-free, which frequently results in
remote code execution vulnerabilities. Do you have a practical alternative to
eliminating those RCE vulnerabilities, or should we accept them?

To be clear, your position is an intellectually coherent one to take. But we
should be clear about its implications.

~~~
ben0x539
Isn't Servo, which I assume is where most of your Rust code lives by now,
trading some safety to un-"cripple" Rust by having GC-backed smart pointers
that allow for uses-after-free if they aren't used correctly?

~~~
pcwalton
Absolutely not, for several reasons:

- The reason why we use GC pointers in Servo is that JavaScript is garbage
collected, and we can't do anything about that. If not for that, we probably
wouldn't use GC at all.

- In fact, to prove this point, we don't use GC anywhere except the DOM. Most
of my code is layout and rendering, which is GC-free, written in the safe
language, and is very much not crippled in any way.

- GC pointers are designed to be safe. There are some known safety holes
right now, mainly due to lack of time to plug them and wanting to balance the
time spent to fix up the current system against the time needed to add proper
GC support to Rust.

------
willvarfar
One thing I miss from this presentation is good war story examples of
unexpected or horrid things happening from UB causing the compiler to omit
critical code.

Here are some examples:
[http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html)

Anyone got any more? Please, anecdotes help explain this! :D

~~~
Kristine1975
More examples in this paper with the title "Undefined Behavior: What Happened
to My Code?":
[https://homes.cs.washington.edu/~akcheung/papers/apsys12.htm...](https://homes.cs.washington.edu/~akcheung/papers/apsys12.html)

