
I'm not sure that's a productive way to think about UB.

The "weirdness" happens because the compiler is deducing things from false premises. For example,

1. Null pointers must never be dereferenced.

2. This pointer is dereferenced.

3. Therefore, it is not null.

4. If a pointer is provably non-null, the result of `if(p)` is true.

5. Therefore, the conditional can be removed.

There are definitely situations where many interacting rules and assumptions produce deeply weird, emergent behavior, but deep down, there is some kind of logic to it. It's not as if the compiler writers are doing

   if(find_undefined_behv(AST))
      emit_nasal_demons()
   else
      do_what_they_mean(AST)
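Concretely, here's a minimal C sketch of that inference chain (hypothetical code; whether a given compiler actually performs this transformation depends on version and flags):

    #include <stdio.h>

    int read_flags(int *p) {
        int flags = *p;       /* step 2: p is dereferenced here */
        if (p) {              /* steps 3-5: p is now provably non-null, so
                                 the compiler may fold this test to "true"
                                 and drop the branch entirely */
            printf("flags: %d\n", flags);
        }
        return flags;
    }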



The C and C++ (and D) compilers I wrote do not attempt to take advantage of UB. What you got with UB is what you expected to get - a seg fault with a null dereference, and wraparound 2's complement arithmetic on overflow.

I suppose I think in terms of "what would a reasonable person expect to happen with this use of UB" and do that. This probably derives, again, from my experience designing flight critical aircraft parts. You don't want to interpret the specification like a lawyer looking for loopholes.

It's the same thing I learned when I took a course in high performance race driving. The best way to avoid collisions with other cars is to be predictable. Doing unpredictable things is what causes other cars to crash into you. For example, I drive at the same speed as the surrounding traffic, and avoid overtaking on the right.


I think this is a core part of the problem; if the default for everything was to not take advantage of UB, things would be better - and we're fast enough that we shouldn't NEED all these optimizations except in the most performance-critical code, perhaps.

You should need something like

    gcc --emit-nasal-daemons
to get the optimizations that exploit UB, or at least loud warnings like "code that looks like it checks for null has been removed!!!!".


AFAIK GCC does have switches to control these optimizations; the issues begin when you want to use something other than GCC, otherwise you're just locking yourself to a single compiler - and at that point you might as well switch to a more comfortable language.
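For what it's worth, GCC (and largely Clang) already ships switches along these lines; a sketch of a more defensive invocation (the flags are real, but their exact scope varies by version, so check your docs):

    # each flag disables one class of UB-exploiting optimization;
    # -fsanitize=undefined adds runtime UB diagnostics (UBSan)
    gcc -O2 -fwrapv -fno-strict-aliasing \
        -fno-delete-null-pointer-checks \
        -fsanitize=undefined main.c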


> What you got with UB is what you expected to get - a seg fault with a null dereference, and wraparound 2's complement arithmetic on overflow.

This is how it worked in the "old days" when I learned C. You accessed a null pointer, you got a SIGSEGV. You wrote a "+", you got a machine add.


In the really old DOS days, when you wrote to a null pointer, you overwrote the interrupt vector table. If you were lucky, fixing it was just a reboot. If you were unlucky, it scrambled your disk drive.

It was awful.

The 8086 should have been set up so the ROM was at address 0.


This is the right approach IMO, but sadly not all C compilers work like that even when they could (e.g. when they target the same CPU). So even if one compiler guarantees it won't introduce bugs through an overzealous interpretation of UB, unless you plan to never use any other compiler you'll still be subject to those interpretations.

And if you do decide that sticking to a single compiler is best, then you might as well switch to a different, more comfortable language.


This is the problem; every compiler outcome is a series of small logic inferences that are each justifiable by language definition, the program's structure, and the target hardware. The nasal demons are emergent behavior.

It'd be one thing if programs hitting UB just vanished in a puff of smoke without a trace, but they don't. They can keep spazzing out literally forever, doing I/O and spewing garbage to the outside world; at that point UB can't even be contained to the process. I personally find it offensive and rude that tools get away with being so garbage that they can't even promise to help you crash and diagnose your own problems. One mistake and you invite the wrath of God!


> I personally find it offensive and rude that tools get away with being so garbage that they can't even promise to help you crash and diagnose your own problems.

This is literally why newer languages like Java, JavaScript, Python, Go, Rust, etc. exist. With the hindsight of C and C++, they were designed to drastically reduce the types of UB. They guarantee that a compile-time or run-time diagnostic is produced when something bad happens (e.g. NullPointerException). They don't include silly rules like "not ending a file with newline is UB". They overflow numbers in a consistent way (even if it's not a way you like, at least you can reliably reproduce a problem). They guarantee the consistent execution of statements like "i = i++ + i++". And for all the flak that JavaScript gets about its confusing weak type coercions, at least they are coded in the spec and must be implemented in one way. But all of these languages are not C/C++ and not compatible with them.
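For contrast, a minimal C version of that last example (hypothetical snippet; the Java result comes from its defined left-to-right evaluation order):

    #include <stdio.h>

    int main(void) {
        int i = 1;
        i = i++ + i++;   /* UB in C: i is modified twice with no
                            intervening sequence point */
        /* In Java the same statement is fully defined: the operands
           evaluate left to right (1 + 2) and the assignment stores 3,
           every time. A C compiler may print 3, 4, or anything else. */
        printf("%d\n", i);
        return 0;
    }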


Yes, and my personal progression from C to C++ to Java and other languages led me to design Virgil so that it has no UB, has well-defined semantics, and yet crashes reliably on program logic bugs, giving exact stack traces; unlike Java and JavaScript, it compiles natively and has some systems features.

Having well-defined semantics means that the chain of logic steps taken by the compiler in optimizing the program never introduces new behaviors; optimization is not observable.


It can get truly bizarre with multiple threads. Some other thread hits some UB and suddenly your code has garbage register state. I've had someone UB the FP register stack in another thread, so that when I tried to use it I got their values for a bit, and then NaN when it ran out. Static analysis had caught their mistake, and then a group of my peers looked at it and said it was a false warning, leaving me to find it long afterwards... I don't work with them anymore, and my new project is using Rust, but it doesn't really matter if people sign off on code reviews that contain unsafe{doHorribleStuff()}


On the contrary, the latter is a far more effective way to think about UB. If you try to imagine that the compiler's behaviour has some logic to it, sooner or later you will think that something that's UB is OK, and you will be wrong. (E.g. you'll assume that a program has reasonable, consistent behaviour on x86 even though it does an unaligned memory access). If you look at the way the GCC team responds to bug reports for programs that have undefined behaviour, they consider the emit_nasal_demons() version to be what GCC is designed to do.
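A minimal sketch of that unaligned-access case (hypothetical function; whether it actually faults depends on what the optimizer emits):

    int sum_two(const char *buf) {
        /* Plain x86 loads tolerate misalignment, so this "works" in
           testing; but the cast itself is UB, and an optimizer that
           assumes alignment may emit instructions that require it
           (e.g. aligned SSE loads) and fault at run time. memcpy into
           a local int is the well-defined way to do this. */
        const int *p = (const int *)(buf + 1);  /* misaligned: UB per
                                                   the standard, even
                                                   on x86 */
        return p[0] + p[1];
    }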


> There are definitely situations where many interacting rules and assumptions produce deeply weird, emergent behavior

The problem is that, due to other optimisations (mainly inlining), the emergent misbehaviour can occur in a seemingly unrelated part of the program. This can make the inference chain very difficult to follow, as you have to trace paths through the entire execution of the program - see the sketch below.

The same issue occurs with other types of data corruption (it's why NPEs are so disliked), but UB's blast radius is both larger and less predictable.
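A minimal sketch of that inlining effect (hypothetical helper names; whether the branch is actually deleted depends on compiler and flags):

    /* helper, dereferences unconditionally */
    static inline int get_mode(int *cfg) { return *cfg; }

    /* caller looks defensively written in isolation */
    int apply(int *cfg) {
        int mode = get_mode(cfg); /* after inlining, *cfg happens before
                                     the test below, so the compiler may
                                     infer cfg != NULL ...              */
        if (cfg == NULL)          /* ...and delete this branch entirely,
                                     far from where the UB was written  */
            return -1;
        return mode;
    }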



