
If you want some nice examples of how undefined behavior results in weirdness, see https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

An interesting example from there is how the compiler can turn

    int table[4];
    bool exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return true;
        }
        return false;
    }
Into:

    bool exists_in_table(int v)
    {
        return true;
    }


The amazing part about examples like that is people read them, check that the compiler really does work on that basis, and then continue writing things in C++ anyway. Wild.

Suppose I should expand on this. The idea seems to be either 1/disbelief - compilers wouldn't really do this or 2/ infallibility - my code contains no UB.

Neither of those positions bears up well under reality. Programming C++ is working with an adversary that will make your code faster wherever it can, regardless of whether you like the resulting behaviour of the binary.

I suspect Rust has inherited this perspective in the compiler and guards against it with more aggressive semantic checks in the front end.


>The amazing part about examples like that is people read them, check that the compiler really does work on that basis, and then continue writing things in C++ anyway. Wild.

Well, in modern C++ this code would look like this:

    std::array<int, 4> table;
    bool exists_in_table(int v)
    {
        for (auto &elem : table) {
            if (elem == v) return true;
        }
        return false;
    }
Or even simpler:

    std::array<int, 4> table;
    bool exists_in_table(int v)
    {
        return std::ranges::contains(table, v);
    }
There's no shortage of footguns in C++, but nonetheless, modern C++ is safer than C.


Weirdly, GCC fails to optimize this, but Clang does (if you make the table static as in the original example).


I actually would prefer to get the second output. The result is wrong, but consistently and deterministically so. The naive implementation of the broken code is a heisenbug. Sometimes it will work, and sometimes it won't, and any attempt to debug it would likely perturb the system enough to make the issue not surface.

It wouldn't surprise me if I have run into the latter situation without realizing it. When I got to the problem, I would have just (incorrectly) assumed that the memory right after the array happened to have the relevant value. I would be counting my blessings that it happened consistently enough to be debuggable.


I agree that it is better to get deterministic and predictable behavior.

Reminds me of when, for a while, I worked on HP 9000s under HP-UX and in parallel on an Intel 80486-based Linux box. What I noticed is that the Unix workstations crashed sooner and more predictably with segmentation faults than Linux on the PC (not sure if this has changed since the early 1990s - it probably had to do with the MMU), so developing on HP under Unix and only then compiling under Linux led to better code quality.


> The amazing part about examples like that is people read them, check that the compiler really does work on that basis, and then continue writing things in C++ anyway.

That isn't idiomatic C++ and hasn't been for a long time. Sure, it's possible to do it retro C-style, because backward compatibility, but you generally don't see that in a modern code base.


The modern codebase has grown from a legacy one. The legacy one with parts of the codebase that were C, then got partially turned into object oriented C++, then partially turned into template abstractions. The parts least likely to have comprehensive test coverage. That place is indeed where a compiler upgrade is most likely to change the behaviour of your application.


Every day new greenfield projects start in C++ - nowadays in C++20, C++23...


There are plenty of similar things in C++. I do not think C++ is safer than C. std::vector does not do bounds checking by default, last time I checked.


And thank G-d it doesn't do it!


Remind me, how is this a good thing again? Especially considering that (if you write modern C++) the compiler should optimize away bound checks most of the time (and in all critical places) either way.


This is an excellent thing, and here is why:

> the compiler should optimize away bound checks most of the time (and in all critical places)

Unfortunately, this is true much more rarely than you might think. In order to optimize a check away, the compiler has to prove that it always passes, and that happens far less often than you'd like. When it can't be optimized away, bounds checking will slow down the code significantly, especially in tight loops. And that slowdown will be very hard to debug unless you know exactly where to look - you'll basically assume "that's how fast it gets and it can't be improved" (a.k.a. "buy better hardware!").

Second, with proper programming hygiene, bounds checking is often simply redundant. There are at least two ways to iterate directly over a vector that don't need it: the ranged `for (auto& e : vector) {}` loop and iterators. There is also the `<algorithm>` library, which implements many useful container-traversal functions that at most require you to supply a functor that operates on each element.
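
For instance, a minimal sketch of the two iteration styles mentioned above, plus std::find from `<algorithm>` (illustrative names):

    #include <algorithm>
    #include <vector>

    // Range-based for: no index at all, so no bounds to get wrong.
    bool contains_loop(const std::vector<int>& xs, int v)
    {
        for (const auto& e : xs) {
            if (e == v) return true;
        }
        return false;
    }

    // <algorithm>: the traversal itself is delegated to the library.
    bool contains_find(const std::vector<int>& xs, int v)
    {
        return std::find(xs.begin(), xs.end(), v) != xs.end();
    }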

And third - if you think that you really must have bounds checking, it's about as trivial to implement as:

  #include <cstddef>
  #include <memory>
  #include <vector>

  template<typename T, typename A = std::allocator<T>>
  class vectorbc : public ::std::vector<T, A> {
    public:
    using ::std::vector<T, A>::vector;   // inherit the base constructors

    // bounds-checked element access: at() throws std::out_of_range
    T& operator[](std::size_t idx){
         return this->at(idx);
    }
  };
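
A quick usage sketch of the class above (out-of-range access now throws instead of being undefined):

    vectorbc<int> v{1, 2, 3};
    v[7];   // at() throws std::out_of_range instead of silently reading past the end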


Genuinely curious, why do you say G-d and not god?


It's just as "amazing" to read these takes from techno purists. You use software written in C++ daily, and it can be a pragmatic choice regardless of your sensibilities.


And we have the core dumps to prove it.

When any Costco sells a desktop ten thousand times faster than the one I started on, we can afford runtime sanity checks. We don’t have to keep living like this, with stacks that randomly explode.


I don't know what line of work you're in, but I use a desktop orders of magnitude faster than my first computer also, and image processing, compilation, rendering, and plenty of other tasks aren't suddenly thousands of times faster. Not to mention that memory safety is just one type of failure in a cornucopia of potential logical bugs. In addition, I like core dumps because the failure is immediate, obvious, and fatal. Finally, stacks don't "randomly explode." You can overflow a stack in other languages also, I really just don't see what you're getting at.


> Not to mention that memory safety is just one type of failure in a cornucopia of potential logical bugs.

You can die of multiple illnesses so there's no point in developing treatment for any particular one of them.

> I like core dumps because the failure is immediate, obvious, and fatal.

Core dumps provide a terrible debugging experience, as the failure root cause is often disjoint from the dump itself. Not to mention that core dumps are but one outcome of memory errors, with other funnier outcomes such as data corruption and exploitable vulnerabilities as likely.

Lastly, memory safe languages throw an exception or panic on out-of-bounds access, which can be made as immediate and fatal as core dumps. And it is much more obvious to debug, since you can trust that the cause indeed starts at the point of failure.


I don’t mean a call stack, I mean “stack” in the LAMP sense—the kernel, drivers, shared libraries, datastores, applications, and display servers we try to depend on.


I dunno, my computers seem to keep running slower and slower despite being faster and faster. I blame programmers increasingly using languages with more and more guardrails, which are slower. I'd rather have a few core dumps and my fast computer back.


Definitely. There's loads of value delivered by C++ implementations, including implementations of C++ and other languages. The language design of speed over safety mostly imposes a cost in developer / debugging time and fear of upgrading the compiler toolchain. Occasionally it shows up in real world disasters.

I think we've got the balance wrong, partly because some engineering considerations derive directly from separate compilation. ODR no diagnostic required doesn't have to be a thing any more.
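
For the curious, a minimal sketch of the kind of thing that is ill-formed, no diagnostic required, today (hypothetical file names, two translation units linked into one program):

    // a.cpp
    inline int limit() { return 4; }
    int f() { return limit(); }

    // b.cpp -- a *different* definition of the same inline function
    inline int limit() { return 8; }
    int g() { return limit(); }

    // Linking a.o and b.o violates the One Definition Rule; no compiler or
    // linker is required to say anything, and one definition silently wins.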


But it isn’t Rust.


Lots of things 'aren't Rust'. In fact almost everything isn't Rust. For now. That may change in due course, but right now I would guesstimate the amount of Rust code running on my daily drivers to be pretty close to zero percent. The bulk is C or C++.


Hardly anything is. Literally none of the programs on my machine are coded in Rust. (Firefox is reputed to have a bit in it.)


If you're running Windows 11, you have Rust running in your kernel. And also some userspace system libraries.

https://www.bleepingcomputer.com/news/microsoft/new-windows-...

Someone else posted statistics that show Firefox being 10% Rust, but I'm not sure it makes sense to include HTML and Python and JavaScript in the comparison. If you compare Rust against C/C++, it's 20%


Who runs Windows?

There is no such language as C/C++. I presume you mean the sum of C and C++, and that you omit external libraries from the tally.


> I presume you mean the sum of C and C++

"I'd just like to interject for a moment. What you're refering to as Linux, is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX."

Isn't this a tad pedantic? You obviously understood what I was saying.

>and that you omit external libraries from the tally.

Mozilla vendors their dependencies. They're counted.



> check that the compiler really does work on that basis, and then continue writing things in C++ anyway. Wild.

My compiler (MSVC) doesn't do that[0]. Clang also doesn't do this[1]. It's wild to me that GCC does this optimization[2]. It's very subtle, but Raymond Chen and OP both say a compiler can create this optimization, not that it will.

[0]: https://godbolt.org/z/bdx4EMzxe

[1]: https://godbolt.org/z/z833Wa391

[2]: https://godbolt.org/z/6b8aq59M9


What's amazing is programmers haven't tarred and feathered the standards committee and compiler writers for allowing crap like that.


They've tried. Language lawyers are very good at their jobs, though.


Well, the argument brought up is that users want it this way, so this is existing practice which is implemented and should be standardized. So please complain and file bugs.

Also use the compiler and language features that help, such as variably-modified types (rather than bare pointers), attributes, compiler flags, etc.


I think at this point unfortunately regulators needs to start getting involved.


What's odd about that example is that the optimization is only valid if the loop in fact overflows the array every time. So the compiler is proving that the array is being overflowed and rather than emitting a warning to that effect, it generates absurd code.


> So the compiler is proving that the array is being overflowed and rather than emitting a warning to that effect

   <source>:5:13: warning: unsafe buffer access [-Wunsafe-buffer-usage]
      5 |         if (table[i] == v) return true;
        |             ^~~~~
https://godbolt.org/z/zGxnKxvz6

This one is weirdly hard to get a compiler warning out of, which is a fair critique, but so many of the "Look what the compiler did to my UB silently!" issues are not at all silent and would have been stopped dead with "-Wall -Wextra -Werror".


As noted elsewhere in this thread, GCC by default does the "optimization" and doesn't warn. No doubt there are other examples where Clang is the one that misbehaves.

How are we supposed to know whether our code is being compiled sensibly or not, without poring over the disassembly? Just set all the warning flags and hope for the best?


I think that a big problem is that for every compile that seems "not sensible" and is actually not sensible, there are 100s or 1000s of compiles that would look absolutely insane to a human but are actually exactly what you want when you sit down and think about it for a long time.

Almost all of the "don't do the overly clever stuff!" proposals would throw away a huge amount of actually productive clever stuff.


I think what the GP means by "not sensible" is that proving that the code is broken in order to silently optimize it more aggressively is not sensible. If your theorem prover can find a class of bugs then have it emit diagnostics. Don't only use those bugs to make the code run faster. Yes, make the code run faster, but let me know I may be doing something nonsensical, since chances are that it is nonsensical and it doesn't cost anything at run time.


A warning is only useful if it prescribes a code transformation that affirms the programmer's intent and silences the warning (unless the warning was a true positive and caught a bug). You cannot simply emit a warning every time you optimize based on UB.

There is no `if(obvious out-of-bound access) silently emit nonsense har har har` in the compiler's source code. The compiler doesn't understand intent or the program as a whole. It applies micro transformations that all make sense in isolation. And while the compiler also tries to detect erroneous programming patterns and warn about those, that's exceedingly more difficult.


>You cannot simply emit a warning every time you optimize based on UB.

And I'm not saying it should do that. I'm saying if the compiler is able to detect erroneous code, then it should emit a warning when it does so. An out of bounds access is an example of code that is basically always erroneous.

>There is no `if(obvious out-of-bound access) silently emit nonsense har har har` in the compiler's source code. The compiler doesn't understand intent or the program as a whole. It applies micro transformations that all make sense in isolation.

Yes, I understand that. However, like I said in my first response, this optimization in particular is only valid if the array is definitely accessed incorrectly. If the compiler is able to perform this optimization, there are only two possibilities: either the compiler can determine in some cases (and in this one in particular) that an array is accessed incorrectly and doesn't warn about it; or it can't determine that condition and this optimization is caused by a compiler bug and there are cases where the compiler incorrectly performs it, breaking the code. If the former is the case, then someone wrote the code to check whether an array is always accessed correctly. Either that, or nobody wrote it and the compiler deduces from even more basic principles that arrays must always be accessed by indices less than their lengths; which, I mean, that might be the case, but I seriously doubt it.


> if the compiler is able to detect erroneous code

Today in most cases nobody is writing this code. Neither C nor C++ have any mandate for such detection.

There is a proposal, which could perhaps make it into C++ 26, to formally specify "erroneous behaviour" and have the compiler do something particular and warn you that what you're doing is a bad idea for the specified cases†, but it's easily possible that doesn't end up in the IS, or that compiler vendors aren't interested in implementing it. Meanwhile, if it happens at all it's up to the vendor.

† "Erroneous behaviour" is one possible approach to the uninitialized-locals problem in C++. Once upon a time C said local variables can be declared and used without initializing them; this actually has Undefined Behaviour, which is very surprising for C and C++ programmers, who tend to imagine that they're getting the much milder Unspecified Behaviour, but they are not. Many outfits use compiler flags to say: look, when I do this, and I know sometimes I'll screw up, just give me zeroes, so that's Defined Behaviour; it's not Intended Behaviour but at least it's not Undefined. This includes all major OS vendors (Microsoft, Apple, Red Hat etc.)

Some people brought this approach to WG21, but there was pushback: if uninitialized variables are zero, then they're not really uninitialized, are they? This has two consequences: 1. Performance optimisations from not initializing data evaporate; and 2. It is now "correct" to use this zero initialization behaviour because it's specified by the language standard, so maybe you can't lint on it.

Erroneous Behaviour solves (2) by saying no, it's still wrong, it's just safely wrong, the compiler can report this is wrong and it must ensure your results are zero.

Another proposal offers a syntax to solve (1) by saying explicitly in your program, "No, I'm a smart C++ programmer, do not initialize these values", akin to the markers like ~~~ you may have seen to mean "Don't initialize this" in some other languages.
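
To make the uninitialized-local case concrete, here is a minimal sketch (the zero-init compiler flag the comment alludes to ships in current Clang and recent GCC as -ftrivial-auto-var-init=zero, but check your toolchain's spelling):

    // A minimal sketch, not taken from the thread:
    int current_limit(bool configured)
    {
        int limit;              // uninitialized local
        if (configured)
            limit = 42;
        return limit;           // if configured is false, this read is
                                // Undefined Behaviour, not merely "some value"
    }
    // With -ftrivial-auto-var-init=zero the untaken path returns 0 instead:
    // Defined Behaviour, though still not Intended Behaviour - exactly the
    // gap that "erroneous behaviour" is meant to name.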


> However, like I said in my first response, this optimization in particular is only valid if the array is definitely accessed incorrectly.

No. The code does not show any undefined behavior if any of the elements of `table` is equal to `v`, because then the loop is ended by an early return. The compiler certainly did not prove that this code always has undefined behavior.


Right and the next part is the hard part: defining this clearly. What I'm saying is that there is a surprising amount of "wait, actually I do want that" when you dig into this proposal.


A reasonable compiler would let you turn off a specific warning for a section of code.


They pretty much all do.

    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wwhatever"
    // code
    #pragma clang diagnostic pop


I was going to comment that GCC doesn't, but it seems it was added at some point since the last time I checked. I know at one time GCC had a policy of not allowing that.



I agree there should be a warning. But it is not trivial to teach a compiler when to warn and when not to, without generating too many false positives.

Not as good as warning, but UBSan catches this at run-time: https://godbolt.org/z/Mdjn7h8dj


> whether our code is being compiled sensibly or not

I'm failing to see what's not sensible about how that code is compiled.

The only possible way that function could return false is if you read past the end of the array and the value there happens to be different from `v`. Is it really more sensible to rely on that, rather than fixing a known behavior in case of array overflow?


If the compiler's going to interpret undefined behaviour as license to do something that runs counter to the programmer's expectations, the most sensible course of action is for the compiler to yell very loudly about it instead of near-silently producing (differently!) broken code.

Currently that piece of code doesn't trigger a warning with -Wall. It's not even flagged with -Wextra - it needs -Weverything.


One man's "broken code produced by the compiler" is another man's "excellently optimized code by the compiler".

Where to draw the line is not always clear, but here's a very clear-cut example[1] where emitting a warning would be bad. If you don't want to watch the video, it's basically this:

- the code technically contains undefined behavior, but it will never be actually triggered by the program

- changing the code to remove undefined behavior forces the compiler to emit terrible code

Making the compiler yell at the programmer in this case would be terrible, but it's clearly a consequence of what you're asking.

[1] https://youtu.be/yG1OZ69H_-o?t=2358


Exactly. I think a lot of this noise is by non-practitioners of the language. The compiler is steel-manning this loop. It is generously interpreting the 4 as irrelevant, and deducing that the loop must always exit early. The author can’t possibly have meant to access beyond the end, because that’s not defined. QED. It seems altogether sensible to me.


Wow, I must congratulate you because this reads equally well both as a serious argument and as a parody of that argument.

So let me reply to your comment as if it were serious: yes, if the programmer by supernatural means knows that "v" is always present somewhere in the array, then this function works exactly as intended: it would always return true, and the compiler optimises it to do so as quickly as possible! But... perhaps there is some other way to pass such programmer's knowledge ("the arguments are guaranteed to be such that this loop is guaranteed to finish early") to the compiler in a more explicit way? Some sort of explicitly written assertion? A pre-condition? A contract, if you like?

See, it's very difficult to maintain such unspoken contracts and invariants over the codebase's life precisely because they're unspoken and unwritten. Comments barely count, since compilers generally ignore them.
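
As a sketch of what writing that contract down could look like today (assert for debug builds; C++23's std::unreachable() shown commented out as an optional optimizer hint, since it is still newish):

    #include <cassert>
    // #include <utility>   // C++23, for std::unreachable()

    int table[4];

    // Precondition, now stated explicitly: v is guaranteed to be in table.
    bool exists_in_table(int v)
    {
        for (int i = 0; i < 4; i++) {        // in-bounds loop
            if (table[i] == v) return true;
        }
        assert(!"precondition violated: v not in table");
        // std::unreachable();               // optionally tell the optimizer too
        return false;
    }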


Thanks! I think anyone would have to be nuts to write a loop like this in C++ or tolerate C as a language. C++'s `ranges::find` does what it says, and communicates between the author and the reader as well as the author and the compiler.


> One man's "broken code produced by the compiler" is another man's "excellently optimized code by the compiler".

To be fair, it's not the compiler's fault that the source program is broken - the argument is over whether the compiler is being helpful or being obtuse, and in this particular case I'd argue the latter.

Thanks for the video link - it's an interesting example, but the crucial difference there, I think, is that in that case the compiler isn't doing something counter to the programmer's intent. The code isn't incorrect (assuming a non-pathological buffer size) - it's merely more convenient for the compiler when expressed with int32_t indices rather than uint32_t indices.
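
A sketch of that flavour of code (the usual 32-bit-index-on-a-64-bit-target situation; not necessarily the exact code from the talk):

    #include <cstdint>

    // uint32_t index: wraparound at 2^32 is well defined, so the compiler
    // must preserve it and often can't fold i + offset into one 64-bit index.
    float sum_u(const float *data, uint32_t n, uint32_t offset) {
        float s = 0.0f;
        for (uint32_t i = 0; i < n; ++i)
            s += data[i + offset];
        return s;
    }

    // int32_t index: overflow is UB, so the compiler may assume i + offset
    // never wraps and widen the whole thing to 64-bit addressing.
    float sum_s(const float *data, int32_t n, int32_t offset) {
        float s = 0.0f;
        for (int32_t i = 0; i < n; ++i)
            s += data[i + offset];
        return s;
    }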

I do appreciate, though, that deciding what to yell about and what not to yell about is an extremely non-trivial problem.


I am not sure this warning is proving any overflow; you can get the same warning by just accessing table[i].

https://godbolt.org/z/Gxd3rK9Ts


No, the logic for the optimization is:

- a correct program does not access table[4]

- therefore the loop must always exit early

- the only way to exit early is to return true


    int table[4];
    bool exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return true;
        }
        return false;
    }

    i -> 0: goto return true, or next iter
    i -> 1: goto return true, or next iter
    i -> 2: goto return true, or next iter
    i -> 3: goto return true, or next iter
    i -> 4: goto return true, or exit loop and return false

Since the branch at i -> 4 is on undefined behavior, it is okay for the compiler to choose any branch destination, or none at all (i.e. remove all further code). The compiler in this case likely chose to just remove the branch and any destinations. All that the prior code does is return true; since there is no next iter, that's all that is left.


No, the compiler knows the array isn't overflowed, because C programs don't contain overflows. Therefore the loop must exit via one of the return true statements.


> What's odd about that example is that the optimization is only valid if the loop in fact overflows the array every time.

No, the optimization is valid if the function is always called with a "v" that actually exists in the table; in that case the function should always return true, so it's only proper for the compiler to throw out the extraneous code.

And writing the loop in such a way is the programmer's promise/guarantee to the compiler that the function will indeed be called only in that manner. That's the essence of UB: it's the programmer who promises the compiler that she will perform all the necessary checks (or formal proofs of impossibility) herself; the compiler may go forward and rely on the implied invariants and preconditions.

And this is, of course, the main problem with UB, because 95% of the time the programmer does not actually intend to make such a guarantee: she simply is unaware (for whatever reason) that there is a pre-condition (checked by nobody!) that's required for the program's correctness to hold... or it's even just a typo she made.


No, the compiler is proving that the return true statement must be executed given the axiom that the loop cannot overflow.

This is tricky, because the code is perfectly valid if it always exits early (and I have written code like this myself that avoids bounds checking by guaranteeing an early exit, when micro-optimizing), so it is hard to statically reject it. On the other hand it seems a very obvious thing to warn on.
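
For what it's worth, a sketch of that early-exit trick (a sentinel search; the caller is assumed to reserve one extra slot at the end of the buffer):

    #include <cstddef>

    // buf has capacity n + 1; slot buf[n] is reserved for the sentinel.
    // Planting v at the end guarantees the loop exits before running off
    // the buffer, so no i < n check is needed per iteration.
    std::size_t find_with_sentinel(int *buf, std::size_t n, int v)
    {
        buf[n] = v;                  // plant the sentinel
        std::size_t i = 0;
        while (buf[i] != v) ++i;
        return i;                    // i == n means "not found"
    }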


Note that not all claims you find about UB on the internet are true. For example, in C, UB can not time-travel before observable behavior. And in general UB can not time-travel before any function call when the compiler can not show that the function returns to the caller (MSVC got this wrong though).


This is a great example.

There's obvious UB, the compiler sees it at compile time, and it should just stop the programmer from making the mistake at all times.

Instead, it's totally possible to just let the programmer tear his leg off for no clear reason.



