
The Value of Undefined Behavior - ingve
https://nullprogram.com/blog/2018/07/20/
======
arcticbull
It's undefined because it's not guaranteed to happen; that's the whole point,
and that's what makes it undesirable. Rust has it right here: don't allow
undefined behavior, and make the programmer specify the behavior they want.

If they want wrapping overflow, fine. If they don't, great. Tell the compiler;
don't make it guess/assume/optimize/hope. What this article is arguing for, IMO,
is not the concept of behavior that happens to sometimes work and sometimes
not depending on the target machine (because that's crazy), but the
functionality itself, which is valuable and should be explicit and defined.
The issue is that last-generation languages like C/C++ don't give you
mechanisms to express what you want, so you're left hoping the compiler figures
out what you meant. That's not a world I want to live in anymore; we can do
better.

UB is bad, period. Define what you want, make the compiler do it.

~~~
cperciva
_Rust has it right here: don't allow undefined behavior and make the
programmer specify the behavior they want.

If they want wrapping overflow, fine. If they don't, great._

What if I know overflow can't happen (e.g. I have an integer counting from 1
to 10)? How do I say "it doesn't matter how you handle overflow; just do
whatever is fastest"?

~~~
steveklabnik
The parent spoke a bit too broadly. Here’s the rules:

* overflow is a “program error”, not UB

* compilers must either check for overflow and panic, or two's complement wrap

* when debug_assertions are on, implementations are required to panic on overflow

This means release mode wraps today, but if it’s cheap enough someday, we
could make it panic.

If you want specific semantics on overflow, then you should use the various
wrappers and/or methods that let you do them directly, rather than relying on
any of the above. But you don’t _have_ to specify; by default, the above is
the semantics.

~~~
Paul-ish
What if Rust got "unchecked" (unsafe) versions of arithmetic operations that
have C arithmetic semantics? Kind of the opposite of "checked_add".

~~~
steveklabnik
It’s possible! Nobody has ever proposed it...

------
samatman
This is a fine opportunity to link to one of my favorite essays on the
subject, Undefined Intimacy With the Machine:
[http://thoughtmesh.net/publish/367.php](http://thoughtmesh.net/publish/367.php)

This argues that it is precisely C's undefined behavior which allowed for its
dominance.

~~~
arcticbull
I have a hard time accepting the premise that a language is better because you
have to hope that it does what you want instead of telling it, and for that
reason it has become successful.

In fact, the ANSI C spec quoted in your link even states that "Undefined
behavior gives the implementer license not to catch certain program errors
that are difficult to diagnose." That's just not an acceptable tradeoff
anymore -- in fact, I don't know that it ever was. That's akin to shipping
your org chart because you can't make your program do what everyone agrees it
should do: you know, catch programming errors, in this case.

~~~
samatman
The paper argues that some behaviors being undefined let compilers target the
widely varying architectures of the early microcomputer era, while still using
a language that remains at least somewhat portable between them.

~~~
arcticbull
Yes, but your program isn't doing what you think you told it to do, which is
worse, because the compiler tells you that it will, know what I mean? A false
guarantee of safety is just about the worst outcome.

~~~
DSMan195276
> Yes, but your program isn't doing what you think you told it to,

Your program is still doing what you told it to if you weren't relying on any
undefined behavior (Which, obviously, you shouldn't be). That's the point - if
you write C that doesn't rely on any UB, then it can run on a large variety of
architectures, work exactly the same on each one, and still get close to the
maximum speed for each architecture because they can take the fastest possible
option for any of the UB cases (And your program shouldn't care what they do
in that situation).
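
To make the point above concrete, here is a minimal sketch of the "don't rely
on UB" style in C (the helper name `checked_add` is mine, not from the thread):
instead of letting signed addition overflow and hoping the platform does
something sensible, test for overflow before it happens. Since the code never
executes UB, every conforming compiler on every architecture must produce the
same result.

```c
#include <assert.h>
#include <limits.h>

/* Portable overflow check: detect overflow *before* performing the
 * addition, so the program never relies on what any particular
 * machine does when a signed int wraps. Returns 1 on success
 * (writing the sum to *out) and 0 if the add would overflow. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;               /* would overflow: report failure */
    *out = a + b;
    return 1;
}
```

Code written this way behaves identically everywhere, while the compiler is
still free to pick the fastest instruction sequence for the non-overflowing
path.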

~~~
fiddlerwoaroof
The problem I generally have with C is that the scope of UB is really wide.
Languages like Common Lisp have UB too, but it’s generally easier to avoid in
“normal” code.

~~~
dfox
In CL's case, the "is an error" wording generally covers all of C's
"implementation-defined", "unspecified", and "undefined". And a surprising
amount of normal-looking constructs are in the implementation-defined category.

When you add type declarations and (optimize (safety 0)) into the mix, you get
a similar scope of UB as in C (another thing is that in reality nobody
actually writes code like that).

~~~
fiddlerwoaroof
I’m thinking of things like it being UB to modify a reader literal because the
reader is allowed to do things like tail sharing.

~~~
dfox
I think that modifying a reader literal is an error mostly to allow
implementations to store such pairs in read-only pages (e.g. in the .rodata
section when you build standalone executables) and to allow a simple
implementation of CDR-coding for reader literals (dfsch has three distinct
implementations of the abstract <pair> class for exactly this reason, and to
be able to actually signal a meaningful error condition when you violate this
restriction).

~~~
fiddlerwoaroof
But it’s not an error: for some of these cases the standard doesn’t specify
the consequences and most implementations don’t throw an error if you modify a
reader literal, even if you end up modifying the interned list.

~~~
dfox
"Is an error" is the wording used by the standard for things that are left
undefined. For things that should signal some kind of error condition, the
standard uses the wording "an error is signalled"; the set of such cases is
surprisingly small.

------
mehrdadn
To those annoyed about, say, signed integer overflow being undefined: do you
even have a habit of ensuring that what's inside your loop will behave
correctly around an overflow boundary? Is the real problem really only the
loop increment for you?

~~~
3pt14159
The problem is tractability. If you send me your data, your source, and your
OS, and I compile it with a slightly different compiler and can't reproduce
the problem, it gets incredibly frustrating to hunt down _why_ something isn't
working.

~~~
mehrdadn
Interesting, this is a less common complaint than what I usually hear. It kind
of makes sense if you run into this a lot (I don't in C++, but maybe it
happens more in C?), but at the same time I'm wondering why you can't just
use, e.g., -fwrapv if you want to rely on integer wrapping in your release
executable. It also doesn't sound like it's related to using a different
compiler, since using the same compiler with a flag like -fwrapv or with
optimizations disabled should also prevent UB from being exploited.

~~~
3pt14159
Sorry, I miscommunicated. I was not speaking about integer wrapping, I was
speaking about undefined behaviour in a general sense.

------
makecheck
Excessive flexibility for optimizers, schedulers, etc. adds a high price to
debugging, and it’s not like we have perfect programs that can only be
improved in these fancier ways.

Programs are not all written by experts. Performance may be left on the table
in areas that have nothing to do with optimizers or schedulers (e.g. picking
an absolutely terrible algorithm or not noticing unnecessary memory
allocations).

 _If_ something must remain “unspecified”, I at least want a debugging tool
for that behavior. If something “may” happen, give me a switch to _make_ it
happen. If something is data-dependent, show me the data that triggers a
different outcome or give me a way to scramble data enough to increase the
chance of triggering a change in behavior.

------
taneq
"Value" is the wrong word, because it implies goodness. It's taken me a while
to come around to this viewpoint, but I think "undefined behaviour" is
unambiguously bad in pretty much every context. I understand why it was
originally included in language specs, but in modern times, when even the
meanest embedded processors are generally 32-bit two's-complement machines
with an ALU, we can afford to define the behaviour of the language we're
using.

~~~
millstone
One example of undefined behavior is dereferencing a free'd pointer. How
should that be defined?

~~~
taneq
In a perfect world it would either be impossible by design or result in a
compile error. I'd settle for "reliably crash with an error message saying
where in the code it happened".

~~~
bluecalm
You can get it in C. Just set the pointer to NULL after free; you will get
your crash, problem solved.
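
A minimal sketch of that suggestion (the macro name is mine): null the pointer
at the free site, so a later dereference through that name faults immediately
on typical platforms instead of silently reading freed memory. Note this only
clears the one pointer variable passed in, which is the limitation raised in
the replies.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the allocation and immediately null the pointer variable, so a
 * use-after-free through this name becomes a NULL dereference rather
 * than a read of recycled memory. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

int demo(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;              /* allocation failed */
    *p = 42;
    FREE_AND_NULL(p);
    return p == NULL;           /* 1: the dangling name is now NULL */
}
```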

~~~
gpderetta
How would that work if you have two pointers pointing to the same object?

~~~
bluecalm
If your design is such that you destroy things that multiple pointers point
to, you have way bigger issues than UB or even the language you use.

~~~
gpderetta
Are you saying that a program should have exactly one pointer to each
allocated object?

------
MrBingley
The problem with the first example is that `int` and `unsigned` are not the
correct types to use for array indexing. As we saw, they were 32 bits wide on
a 64-bit platform, which led to an additional sign-extension and (in the case
of `unsigned`) truncation instruction. However, this can be avoided by using
`ptrdiff_t` and `size_t`, which are defined to match the platform size and
avoid the extension and truncation altogether. The fact that signed integer
overflow being undefined allowed the compiler to elide the `int` truncation is
somewhat irrelevant, since using the correct integer types leads to even
faster code that doesn't rely on undefined behaviour at all.
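
A sketch of the kind of loop in question (not the article's exact code): with
a `size_t` index, the counter already matches the pointer width, so on a
64-bit target the compiler needs no per-iteration sign extension or truncation
to form the address, and no overflow-related UB assumption is needed either.

```c
#include <assert.h>
#include <stddef.h>

/* size_t matches the platform's pointer width, so indexing needs no
 * widening of the counter, and unsigned size_t arithmetic has fully
 * defined wrap semantics: no UB is involved at all. */
long sum(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```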

------
saagarjha
> What irritates a lot of people is that compilers will still apply the strict
> aliasing rule even when it’s trivial for the compiler to prove that aliasing
> is occurring

I think better diagnostics for this would be a win for everyone.
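
For readers unfamiliar with the rule being quoted, a small sketch of the usual
pattern (the function name is mine): reading a float's bytes through an `int *`
violates strict aliasing even when the aliasing is obvious, while `memcpy`
expresses the same bit-level read with defined behavior, and compilers
typically lower it to a single move.

```c
#include <assert.h>
#include <string.h>

/* Defined-behavior type punning: copy the bytes instead of casting
 * the pointer. Assumes unsigned and float are both 4 bytes, which
 * holds on mainstream platforms. */
unsigned bits_of_float(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);   /* no pointer-type punning, no UB */
    return u;
}
```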

------
swayvil
Defined behavior is a mere shadow of undefined behavior. All form comes from
the constraining definition. One who limits his explorations to the defined
has crawled into a hole and pulled in the hole after him.

Ya, I know, offtopic.

------
Dylan16807
* Most forms of undefined behavior don't have any meaningful use to compilers.

* With integer overflow it would be relatively easy to write a spec that allows "n+1>n" optimizations but not utterly _anything_ to happen. With strict aliasing it's probably not too hard either. You'd still see some program misbehavior in the strict aliasing case, but it would be a vastly reduced set of misbehavior. And allowing signed numbers to act more like abstract numbers would often _decrease_ misbehavior.
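
The "n+1>n" optimization mentioned above, concretely (a sketch; the function
name is mine): because signed overflow is undefined in C, a compiler is
entitled to assume `x + 1` never wraps and fold this whole function to
`return 1;` unconditionally.

```c
#include <assert.h>

/* Since signed overflow is UB, the compiler may assume x + 1 never
 * wraps past INT_MAX, so this comparison can be folded to a constant
 * true. That is the class of optimization at issue. */
int succ_is_greater(int x)
{
    return x + 1 > x;
}
```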

------
iainmerrick
If these are the best justifications we can come up with, it doesn't change my
opinion that the current reliance on UB by compilers is a bad idea. Anyone got
any better ones?

~~~
saagarjha
It makes your code many times faster in a lot of cases?

~~~
iainmerrick
This article gives two examples where UB arguably helps you, and in each case
the article itself mentions a cleaner and better way of doing it.

Magic behavior of "int" (but not "unsigned int") --> just use a type that's
the correct size and better conveys your intention, like "size_t".

(And that one doesn't make your code "many times faster", in the example given
it saves a single instruction.)

Type-based assumptions about aliasing (with a special case for char*) --> be
explicit about your intention with the "restrict" keyword.
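
A minimal sketch of that alternative: `restrict` states the no-aliasing
assumption directly in the signature, so the compiler may vectorize or reorder
freely without needing any type-based aliasing rule (the function name is
mine).

```c
#include <assert.h>
#include <stddef.h>

/* restrict promises the compiler that dst and src never overlap for
 * the duration of the call; violating that promise is on the caller,
 * but the assumption is now explicit rather than inferred from types. */
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```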

------
IshKebab
This is a good history lesson on why UB was introduced, but times have changed
and it is definitely not justifiable today.

Saving one instruction was probably worth it in the 70s, when security wasn't
a thing, but it definitely isn't now.

~~~
millstone
Are you objecting to the UB in the specific named cases (aliasing and signed
overflow) or UB in general?

~~~
IshKebab
In general.

------
Asooka
I really think that undefined behaviour should be switched to machine-
dependent behaviour everywhere, and we should just have tools that can suggest
optimized variants of hot functions - to be rewritten in ways that take
advantage of assuming undefined behaviour doesn't happen. Or lift the `extern
"lang"{}' construct from C++ to C and have `extern "C-strict"', which turns on
full strict undefined-behaviour optimizations. Yes, I _know_ I can turn most
of them off. We ship C++ software and always compile with signed-overflow
wrapping enabled and strict aliasing disabled (among others). Those are just
too dangerous to be turned on for an entire codebase of a large C++
application, where you have no idea what will be inlined where and what LTO
will do. But if some UB allows for measurable performance gains, we should be
able to turn it back on for restricted, safe scopes.

And please, if the compiler can prove that the program invokes UB in "strict-
UB" mode, that should be an error, not a cause for the code to be deleted. I'm
talking about the case of this SPEC 2006 code [1].

It doesn't make sense to me to just apply potentially disastrous micro-
optimizations that save a couple of percent of run-time on the whole program.
This sounds like a classic case of optimizing before measuring.

1 [https://blog.regehr.org/archives/918](https://blog.regehr.org/archives/918)

~~~
tedunangst
> This sounds like a classic case of optimizing before measuring.

But you have done the measurements, right?

