For years I've been growing steadily more worried about UB and its nebulous, Heisenfootgun nature. I got a few paragraphs into this noble work, scrolled down to check how much was yet to go, and decided to try reading the Rust book again.
I mean, I'll stick with C/C++ for now. I can write 'good enough' code which does what I expect and seems to be reliable. I just hate the feeling that without spending a truly ungodly amount of time on it, even the best code is peppered with unknown unknowns waiting to bite me at some critical moment.
I can highly recommend "Programming Rust" by Jim Blandy. Personally I found it got to the differences from C/C++ a lot quicker than the official Rust book.
Of course Rust still has plenty of undefined behaviour; it's just restricted to unsafe blocks. (There are occasional exceptions, such as the infamous floating-point cast bug [1].)
I don't mind there being some UB (in fact sometimes it's necessary, especially as I eventually want to use it on embedded targets that need to do all sorts of weird memory jiggery-pokery). I don't even mind bugs; all software has bugs. What I want to move away from is a language which embraces widespread and subtle UB as a design philosophy.
Is this really such a big deal mentally? Idealized, there are only two behaviors: expected and unexpected. Whether the unexpected behavior is the result of UB, implementation-defined behavior, a logic error, or an unforeseen corner case doesn't actually matter in practice. Your computer isn't any more likely to cause a nuclear explosion because it hit UB than because it hit some other kind of bug.
Of course reducing any kind of bug class is a plus, and Rust is great, but I never had the feeling that my C++ code was more error-prone than my Java or Python code just because some things are completely UB. What actually happens with UB is that the application crashes and/or data gets scrambled - I've yet to see my PC stand up and walk away just because it was technically allowed to.
In the case of undefined behavior, the actual behavior depends on the information your compiler has when it compiles that particular piece of code. It can thus change for apparently completely unrelated reasons, for example when an inlining heuristic flips from "don't inline this" to "inline this". That's really not nice to debug.
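As a concrete (and deliberately contrived) illustration of how the compiler's view of the code changes what UB does in practice, consider a signed-overflow check; the function name here is mine, not from any real codebase:

```cpp
#include <climits>

// Signed overflow is undefined behavior, so the optimizer is allowed
// to assume `x + 1 > x` holds for every int and fold this whole
// function down to `return false;`.
bool wraps_on_increment(int x) {
    return x + 1 < x;  // UB when x == INT_MAX
}
```

At -O0, GCC and Clang typically emit the literal comparison, so `wraps_on_increment(INT_MAX)` happens to return true via two's-complement wrapping; at -O2 the comparison is usually folded away and it returns false. Whether the call site gets inlined with a constant argument can flip the result too - exactly the "unrelated reasons" problem.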
It's about exposure to risk. Imagine you have a giant bowl of M&Ms and you ate one every day. Now say that one of them is laced with cyanide, maybe not enough to kill you, just enough to make you sick for a couple of weeks. It's a really big bowl, though - there are millions of M&Ms in there so you're probably fine to eat them. Now say 2, or 5, or 10 of the M&Ms are laced. How many M&Ms would you accept before you decided to get your M&Ms elsewhere?
Agreed that, generally, once I've built and tested my code with a given compiler, I seem to be about as reliable with C++ as I am with other languages. But if I can remove a whole class of reasons for me to have a really bad week, that seems like a winning move.
Furthering the M&M analogy: Let's say you have a box with 10,000,000 M&Ms. 1000 of them are poisonous, and of those 1000, 10 are deadly, 290 make you sick for a week, and 700 will mildly inconvenience you for a day.
My point is that removing UB is like removing 1 deadly, 20 sickening, and 70 inconveniencing M&Ms. Which, by every metric, is a great thing. The only thing that bothers me is that some people act like removing UB is equivalent to removing all 10 deadly M&Ms, which obviously isn't the case.
Oh, absolutely. Simply removing UB isn't going to solve all your problems, but it does remove some of the sneakiest ones. To keep the analogy going, let's say that all of the poisoned M&Ms were blue, except for the UB ones. So you at least know to be more wary of the blue ones and only eat them when necessary (or when you are able to put contingencies in place), but the other colours could still hit you with UB at an inopportune time.
That analogy makes me think of meeting people. Let's say there are 10,000,000 people. 1000 have a contagious virus, and of those 1000, 10 give a deadly new coronavirus, 290 give you the common cold for a week, and 700 lead to harmless sneezing...
As someone who writes a lot of C++ and quite likes the language, I think it is feasible, given a team of experienced C++ programmers and a good testing infrastructure, to write programs that trigger almost no UB for any reasonable input.
What I worry about is the unreasonable (i.e. hostile) input.
Really? Isn't it still the case that signed integer arithmetic can lead to UB? Despite this, there's a push for "Almost Always Int" from the C++ powers that be.
C++20 mandates two's complement representation for signed integers (wrapping was already the defined behaviour for unsigned), which makes previously murky things like out-of-range conversions well defined. Arithmetic overflow itself, though, including corner cases like negating INT_MIN, is still UB.
That’s correct, and it’s quoted in JF’s paper linked above: “The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.”
> Status quo: If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.
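For what it's worth, the wrapping semantics described there ("the same value bits as for the corresponding unsigned type") are already expressible in well-defined C++ by round-tripping through the unsigned type. A small sketch; the helper name is mine:

```cpp
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32 by definition, so doing the
// addition on uint32_t avoids signed-overflow UB. The conversion back
// to int32_t is well defined (modular) since C++20, and was
// implementation-defined (but two's complement everywhere in practice)
// before that.
int32_t wrapping_add(int32_t a, int32_t b) {
    return static_cast<int32_t>(
        static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}
```

With this, `wrapping_add(INT32_MAX, 1)` yields `INT32_MIN` instead of UB - essentially the default behavior the paper's author had originally hoped to standardize.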
I really don't understand why WG21 is against this; do you happen to have a link to the arguments against it? The only one I frequently see is "it's faster that way", with a link to godbolt where iterating with a 32-bit signed integer on a 64-bit system is faster than iterating with a 32-bit unsigned one, because the compiler exploits the UB to ignore overflow. Which is a completely useless example, because simply using a correctly sized integer for iteration, signed or unsigned, will always produce the fastest correct code.
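To make that godbolt argument concrete, here is a hypothetical sketch of the three loop shapes usually compared (the function names are mine):

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit signed index: overflow is UB, so the compiler may assume `i`
// never wraps and widen it to a 64-bit induction variable.
int64_t sum_signed(const int32_t* a, int32_t n) {
    int64_t s = 0;
    for (int32_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// 32-bit unsigned index: wraparound is well defined, so the compiler
// must preserve the possibility that `i` wraps back to 0, which can
// inhibit that widening.
int64_t sum_unsigned(const int32_t* a, uint32_t n) {
    int64_t s = 0;
    for (uint32_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Correctly sized index: no wrap is reachable for any valid array
// length, signedness is moot, and the codegen question disappears.
int64_t sum_sized(const int32_t* a, std::size_t n) {
    int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

All three return the same result for any sane input; the difference is only in what the optimizer is allowed to assume about the index variable.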
At least in the case of overflow, the article mentions a few arguments:
“””
This direction was motivated by:
- Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs;
- Implementation leeway for tools such as sanitizers;
- Data from Google suggesting that over 90% of all overflow is a bug, and defining wrapping behavior would not have solved the bug.
“””
Just the other day I saw an article here titled "Rust Lang in a nutshell" and I was thinking, "Rust" and "in a nutshell" have become a contradiction in terms, what with all the language's complexity. But seeing this litany of UB in C++ puts that very much in perspective.
FWIW, I welcome this enumeration as a good overview but, wow, does it motivate me to consider alternatives.