
A Guide to Undefined Behavior in C and C++ (2010) - pmoriarty
https://blog.regehr.org/archives/213
======
banachtarski
For those viewing this thread, one year after this article was written, this
was standardized:

[https://en.cppreference.com/w/cpp/numeric/fenv](https://en.cppreference.com/w/cpp/numeric/fenv)
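
For anyone who hasn't used it, here is a minimal sketch of the now-standardized
<cfenv> interface (assuming an IEEE-754 implementation; strictly speaking, code
that inspects the FP environment also wants #pragma STDC FENV_ACCESS ON, whose
support varies by compiler):

    #include <cfenv>
    #include <cstdio>

    int main() {
        std::feclearexcept(FE_ALL_EXCEPT);   // clear the sticky exception flags
        volatile double zero = 0.0;          // volatile so the division isn't folded away
        double r = 1.0 / zero;               // IEEE-754: yields +infinity, raises FE_DIVBYZERO
        if (std::fetestexcept(FE_DIVBYZERO))
            std::printf("FE_DIVBYZERO was raised, r = %f\n", r);
    }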

------
kstenerud
And therein lies a major problem with C and C++ compilers:

It's effectively impossible to write bug-free code. Bugs in C and C++
usually trigger undefined behavior. It is therefore impossible to write a
conforming program, which makes any guarantees in the spec meaningless.

I've hit heisenbugs like these that only trigger when optimized and that
resist write(), out(), fflush(), etc., and it's infuriating.

Or even worse: programs that no longer work when compiled on a newer compiler.
With other languages, you are at least spared from this kind of code decay.

But everyone's writing compilers to the spec so tough :/

~~~
banachtarski
This is a myopic viewpoint. Undefined behavior is critical for code that needs
to be fast. The premise of languages like C and C++ is that the airgap between
the language's abstract execution and memory model and the hardware's is thin
to non-existent.

(edit: accidentally submitted early)

In this case, the UB is due to the compiler's ability to reorder statements.
This is such a fundamental optimization that I can't imagine you're really
suggesting that a language without this optimization capability is a
"problem." Rearranging instructions is critical for pretty much any
superscalar processor (which all the major ones are), and I hate to imagine
the hell I'd be in if I had to figure out the optimal load/store ordering
myself.
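
To make that concrete, a hypothetical example (not taken from the article):
because integer division by zero is UB, the compiler may assume c != 0 and
schedule the division before the logging call, so the log line may never
appear even though it textually comes first. That is exactly the
write()/fflush()-resistant heisenbug described upthread.

    #include <cstdio>

    // Sketch: b / c is UB when c == 0, so the compiler may assume c != 0 and
    // hoist the (now "side-effect-free") division above the fprintf call; if
    // the hardware then traps, the log line never appears.
    int divide_logged(int b, int c) {
        std::fprintf(stderr, "about to divide\n");
        return b / c;
    }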

~~~
kstenerud
My point is that it is humanly impossible to write a bug-free program. In C
and C++, bugs usually manifest as UB.

To make matters worse, compilers, ever searching for diminishing returns on
performance improvements, have been steadily making the CONSEQUENCES of UB
worse, to the point that even debugging is getting harder and harder. These
languages are unique in their growing user hostility.

~~~
banachtarski
Well, don't use it. You aren't the target customer, because for me, fixing
the performance bottleneck is a lot harder than finding a divide by zero,
and I certainly don't want to pay for the compiler to check things like
divide-by-zero without me asking it to. When I don't care about performance,
I reach for a scripting language or something. It's a tool; don't get all
emotionally worked up about it.

~~~
pjmlp
We are entitled to be emotional about it, because we all have to use tools
which have the misfortune of being written in C-derived languages.

Even if I don't touch those languages for production code, my whole stack's
safety depends on how well those underlying layers behave, and on how
responsible the developers were about writing secure code.

Which, as proven by the buffer overflows in IoT devices, is not much.

~~~
xmiller
So where is _your_ production code written in Ada or Pascal?

~~~
pjmlp
Ada cannot tell.

Pascal was replaced by Java and C#.

~~~
jacoblambda
Ada absolutely can tell. Divide by zero throws an exception and buffer
overflows are caught by essentially every modern compiler.

~~~
pjmlp
Sure it can, but that wasn't the question; the question was what happened
to the code I have written.

NDAs keep me from saying much about it.

And what magic variant of C or C++ compiler are you using that throws errors
on buffer corruption, unless you are speaking about shipping code with
debugging mode enabled in production instead of a proper release build?
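
That is, something like the snippet below compiles and runs silently in a
proper release build; only a sanitizer build (e.g. g++ or clang++ with
-fsanitize=address) reports the out-of-bounds access at run time, which is
the debug-versus-release distinction I mean:

    #include <vector>

    int main() {
        std::vector<int> v(4);
        return v[4];   // out-of-bounds read: UB, silent in a normal release build;
                       // an address-sanitizer build aborts with a report at run time
    }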

------
ilovecaching
This is why if you're currently writing C or C++, please, please look into
Rust.

Titus Winters just gave a great talk at Pacific++ outlining how difficult it
is to understand the quirks of C++. Not even Bjarne can keep all of it in
his head. The dice are completely loaded against you: you will introduce
bugs, unintended behavior, and vulnerabilities in your code.

Rust isn't a silver bullet that will suddenly fix your life, but it doesn't
deal with the craziness of C undefined behavior and doesn't have to be
backwards compatible. It's designed from the ground up to make comprehensive
sense, and safety is a feature enforced for you by the compiler in many
cases.

~~~
banachtarski
> This is why if you're currently writing C or C++, please, please look into
> Rust.

Ugh, if you read the article, you would know that the UB described here is
the same in Rust. There is this massive groupthink about which languages are
safe and/or fast, and whenever an article like this comes up, invariably
someone says "Rust!" without really looking at the problem itself.

Personally, I am in wait and see mode on Rust. Without better generics support
(aka template <size_t> or similar) and a number of other things, it still
doesn't meet my bar for writing fast generic code, and who knows how complex
it will be at that point? I say the jury is still out.

~~~
newacctjhro
> Ugh if you read the article, you would know that the UB described here is
> the same in Rust.

Rust doesn't have a formal memory model yet, but it's already known that UB in
Rust is quite restricted:

[https://doc.rust-lang.org/nomicon/what-unsafe-does.html](https://doc.rust-lang.org/nomicon/what-unsafe-does.html)

Most importantly, UB in Rust should only arise if you're writing unsafe code
(barring compiler bugs). Typically, most Rust code is safe. This is a huge
win.

------
anderskaseorg
(2010)

~~~
duneroadrunner
Yeah, with modern C++, you can largely choose to avoid using elements that are
prone to undefined behavior. For example, rather than native integers, you
could use a compatible integer class that checks for division by zero [1]. Or
one that checks for overflow too [2]. The Core Guidelines lifetime checker
aims to (eventually) make native pointers and references memory safe via
(severe, but not quite as severe as Rust) usage restrictions. And when you
need to circumvent the restrictions, you can use an unrestricted smart pointer
with run-time checking [3][4].

[1] shameless plug:
[https://github.com/duneroadrunner/SaferCPlusPlus#primitives](https://github.com/duneroadrunner/SaferCPlusPlus#primitives)

[2]
[https://github.com/boostorg/safe_numerics](https://github.com/boostorg/safe_numerics)

[3]
[https://github.com/duneroadrunner/SaferCPlusPlus#registered-...](https://github.com/duneroadrunner/SaferCPlusPlus#registered-pointers)

[4] [https://github.com/duneroadrunner/SaferCPlusPlus#norad-point...](https://github.com/duneroadrunner/SaferCPlusPlus#norad-pointers)
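
To give a flavor of the approach in [1] and [2] (a simplified hypothetical
sketch, not the actual SaferCPlusPlus or safe_numerics API), the wrapper
simply checks the precondition before performing the native operation that
would otherwise be UB:

    #include <limits>
    #include <stdexcept>

    // Hypothetical sketch of a checked integer: the two UB cases of native
    // integer division become exceptions instead.
    class checked_int {
        int v_;
    public:
        explicit checked_int(int v) : v_(v) {}
        int value() const { return v_; }

        checked_int operator/(checked_int rhs) const {
            if (rhs.v_ == 0)
                throw std::domain_error("division by zero");
            if (v_ == std::numeric_limits<int>::min() && rhs.v_ == -1)
                throw std::overflow_error("INT_MIN / -1 overflows");
            return checked_int(v_ / rhs.v_);
        }
    };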

~~~
the_why_of_y
Interesting project; are you aware of any actual users?

Meanwhile, in the real world, C++ programmers typically use operator+(int,int)
with UB on overflow because it's conveniently built into the language.

The problem with C++ isn't that doing the right thing is impossible, it's that
doing the wrong thing is the default, with no dependencies and no syntactic
overhead.
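
To illustrate the asymmetry (hypothetical example): the unchecked form is the
shortest thing you can type, while the checked form needs a compiler-specific
builtin or a manual range test.

    #include <climits>
    #include <cstdio>

    int main() {
        int a = INT_MAX, b = 1;

        // The path of least resistance would be `a + b`: signed overflow, i.e. UB.
        // Opting in to a check takes extra work (GCC/Clang builtin shown here; a
        // portable version would test a > INT_MAX - b before adding).
        int sum;
        if (__builtin_add_overflow(a, b, &sum))
            std::fprintf(stderr, "overflow detected\n");
        else
            std::printf("%d\n", sum);
    }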

~~~
jcelerier
> Meanwhile, in the real world, C++ programmers typically use
> operator+(int,int) with UB on overflow because it's conveniently built into
> the language.

In more than a decade of coding in C++ I have never been bitten by a
signed-overflow bug (which is UB). However, I have been hit by unsigned
underflow (e.g. if (2u - 3u > whatever)) way too often, even though it is
perfectly "legal" from the point of view of the language.
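
For anyone who hasn't been bitten by it, a tiny sketch of the trap: the
subtraction is perfectly well-defined modular arithmetic, it just wraps
around to a huge value.

    #include <iostream>

    int main() {
        unsigned a = 2, b = 3;
        // Well-defined (unsigned arithmetic wraps modulo 2^N) but rarely intended:
        // a - b is UINT_MAX here, so the comparison is true and the branch is taken.
        if (a - b > 100)
            std::cout << "taken: a - b == " << (a - b) << '\n';
    }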

