Hacker News
With Undefined Behavior, Anything Is Possible (raphlinus.github.io)
94 points by kibwen 3 months ago | 28 comments



Nice article! From the outside, it seems so obvious that the fix for C is to reduce the scope of undefined behavior, either by simply defining it or making it "implementation defined". But it seems equally obvious that this is not going to happen.

> The standards committee invented the concept of “undefined behavior” to capture this range of behavior. Essentially, it’s a license for the implementation to do anything it wants. And that’s reasonable; it’s hard to imagine nailing down the behavior any further without compromising performance or the fundamental nature of the problem. But given this hammer, the committee applied it far more broadly.

> However, compiler authors got bolder over time, feeling that everything allowed in the standard was fair game, at the same time getting more sophisticated in what optimizations could be done with stronger assumptions.

Are "compiler authors" and the "standards committee" really two distinct groups? I've presumed that the expansion of "undefined behavior" in C is an example of "regulatory capture". That is, the specifications of the language are being decided by those who see "undefined" as yet another opportunity for optimization, as opposed to the users of compilers who tend to be more worried about broken programs.


IMHO the standards committee has been very reasonable in its definition of undefined behaviour; the standard even mentions that one of the possible interpretations of UB is "behaving during translation or program execution in a documented manner characteristic of the environment", which is pretty much what everyone writing C with target platform(s) in mind is expecting.

In other words, I wouldn't blame the standards committee; they just set out to define a minimal "subset" language which would be suitably portable so as to be widely applicable, and left the rest to the implementers. They basically said "here is what we have standardised. The rest is up to you to define what is reasonable."

The "adversarial" attitude is entirely attributed to dogmatic compiler authors who are treating the standard as gospel in their theoretical world, and completely ignoring the real-world implications --- including the users of their products. What they don't seem to understand is the standard is merely a starting point, not the be-all-and-end-all of C. "Just because the standard allows, doesn't mean you should."

Linus has some more (less ranty) posts on the topic here: https://yarchive.net/comp/linux/gcc.html


I think the "compiler authors" and "standards committee" are in fact two mostly distinct groups. Now, there is some overlap, but more importantly feedback certainly flows both ways between the standards group group and compiler authors, and the standards committee is usually well aware of the concerns of the compiler authors. If it were any other way, it probably wouldn't work since it would be easy to ask for a feature that couldn't be implemented efficiently.

Said another way, the committee doesn't just come up with useful features and then leave it to the compiler authors to implement them: the accepted features usually already have a specific high-level implementation in mind, and you'll sometimes find various restrictions and quirks in a feature's definition that are specifically accommodations for how compilers are going to implement it. There are lots of examples, but one that comes to mind: RTTI simply doesn't work for objects without a virtual method. Since there is no obvious link between the two, it seems like an arbitrary restriction, but it is a reflection of the fact that compilers mostly all want to implement RTTI as a special entry in the vtable, so allowing RTTI on objects w/o a virtual method would require a vtable in every object or some other expensive mechanism. So you get the RTTI you can practically implement.

I think that's all fine and reasonable: a total disconnect between implementation just wouldn't work here (this is true even in higher level languages, but there is arguably more wiggle room there because the performance baseline is different).

> is an example of "regulatory capture". That is, the specifications of the language are being decided by those who see "undefined" as yet another opportunity for optimization, as opposed to the users of compilers who tend to be more worried about broken programs.

My view is different. At least most of the classic examples of "annoying" or "unexpected" undefined behavior seem to be quite old and have underlying reasons based on the hardware of the day:

1) Signed overflow being weird due to non-2's-complement machines.

2) Signed shifts being weird for the same reason.

3) Aliasing rules, due to all sorts of interesting concerns about storing objects in different memory spaces, segmentation, different pointer representations, and alignment (this one is the most iffy).

4) Pointer arithmetic rules, same as (3) but less iffy.

5) Reading uninitialized memory: hardware with trap representations, etc.

The list goes on, but most of those have been with us for a while. The most aggressive use of these rules, which we now hear about as some kind of terrible abuse by the compiler authors, has mostly occurred many years after the rules were in place.

Certainly there is probably resistance from the compiler authors to changing some of these rules to become defined - but I don't think that (mostly) explains their existence in the first place. Some of the rules are kind of papering over things that could actually be done more efficiently if higher-level semantics were available. For example, much of the "signed overflow allows loops to be optimized/vectorized more efficiently" argument would not apply if there were some built-in way to iterate over things like arrays (arrays largely don't even exist in C, as you mostly end up carrying around a decayed pointer and a size). Another example is the aliasing system: with stronger rules about aliasing memory areas, or better support for first-class arrays, or even just earlier introduction of restrict into the language, you wouldn't need to lean on the aliasing rules as heavily to get the effect through the strict-aliasing "backdoor".

I see most of the problems with UB in C and C++ as stemming from three main factors:

1) C is more than 40 years old, and C++ is more than 30. Many of the UB rules have been with us, formally or informally, for a long time, and a long time ago there was both more weird hardware in actual use and more uncertainty about what type of hardware would dominate in the future.

So a lot of the things that were left as UB, as well as lots of implementation-defined things like the sizes of all the various primitive types, may actually have made sense decades ago as a strategy for wide adoption and for hedging against future trends.

More recent languages have the advantage of a dramatic stabilization in hardware: everyone has really settled on 2's complement, non-weird word sizes, etc. It is very hard (I think), practically and ego-wise, for C to roll back the very old decisions which may have made sense when they were made.

2) C and C++ are both designed by committee. The committee is full of representatives from various hardware and software companies. These representatives are obviously going to "talk their own book" and make sure the language works well on their hardware. This is why we have all sorts of concessions to Itanium in the standard - do you think the committee could have said "no" to what seemed like the future architecture at the time? Itanium failed, but we are left with its legacy in the standards. Similarly we have the "consume" stuff in the C++11 (and C11?) memory model. Probably half of the complexity in the entire memory model is related to this one concept. The model is _much_ more difficult to understand, and many concepts that wouldn't otherwise have existed needed to be introduced to support it. Not only that, it affected all sorts of things outside of the memory model, such as the whole concept of "carries a dependency", which needs to be pervasive in the language spec.

That's _all_ there to support the POWER memory model in some scenarios. It seems like a shame to me. If you consider the amount of code in the world, on a cycle-weighted basis, that is running on POWER and would be affected by leaving this out, it must be a small fraction of 1% - but how can the committee say "Sorry IBM, you just aren't important enough anymore; you'll have to live without this crazy consume stuff"? It's just not going to happen. For a consensus-based body, rather than, say, a private, profit-driven business, a weakness is that it's hard to say "no", and everyone's little idea or weakness gets accommodated.

3) C and C++ were very reluctant to make things "implementation defined" and almost always favored performance over complexity. I think there is some method to that madness: implementation defined is kind of ugly. How do you ensure compilers really define it? Will they use the same kind of precise language as the standard? What happens to portability in a language with many impl-defined aspects? Every compiler would essentially define their own sub-language.

Still, I think it was worth trying to lean on this a bit more. For one, you could define some "default" way of doing things, like 2's-complement arithmetic or pointers-that-are-just-integers-and-otherwise-work-like-you-think, and have some standard macro you can check to see if that's the case. Almost like C did with `__STDC_IEC_559__`: if that macro is defined, you are using standard IEEE 754 floats (it didn't work out that well, because some deviations from the large IEEE standard mean that many compilers don't define it).

Given that, you could simply write programs that check the expected macros and error out if not supported and still have "portable" code as long as the target architecture is non-weird. You could just up-front check that you aren't running on a deathstation 9000.

If you wanted, you could also have conditional compilation: do the fast/normal thing if the hardware worked as expected (e.g., sign-extending shifts), and then fall back to some slower hack if not.
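A minimal sketch of that fallback idea (the function names are mine, not standard; note that probing `>>` on a negative value exercises implementation-defined, not undefined, behavior):

```c
#include <assert.h>

/* Probe whether `>>` on a negative int sign-extends. The result is
 * implementation-defined (not undefined), so this is a legal, if
 * compiler-specific, check. */
static int has_arithmetic_shift(void)
{
    return ((-1) >> 1) == -1;
}

/* Shift right with floor semantics. The fast path uses the hardware
 * shift; the fallback only ever shifts a non-negative value (fully
 * defined) and relies on the 2's-complement identity ~(~x >> n). */
static int floor_shift_right(int x, int n)
{
    if (x >= 0 || has_arithmetic_shift())
        return x >> n;
    return ~(~x >> n);
}
```

On a typical 2's-complement target with sign-extending shifts, the probe succeeds and the fallback is never taken.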


> C and C++ were very reluctant to make things "implementation defined" and almost always favored performance over complexity. I think there is some method to that madness: implementation defined is kind of ugly. How do you ensure compilers really define it? Will they use the same kind of precise language as the standard? What happens to portability in a language with many impl-defined aspects? Every compiler would essentially define their own sub-language.

Personally I think this was a mistake. A lot of things could be left as implementation-defined instead of undefined: most arithmetic operations, for example. The issue with undefined behavior is that you can’t have it in your program, period. If you do, your entire program is technically invalid, and the compiler can do unexpected things to it without warning. With implementation-defined behavior, however, you can use language constructs (e.g. #ifdef) to choose the correct implementation.
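For instance, whether plain `char` is signed is implementation-defined, but `<limits.h>` exposes the answer, so the preprocessor can select the right code (a small sketch; `to_byte` is a hypothetical helper):

```c
#include <limits.h>

/* Whether plain char is signed is implementation-defined; the standard
 * requires <limits.h> to reveal it, so we can branch at compile time. */
#if CHAR_MIN < 0
#define PLAIN_CHAR_IS_SIGNED 1
#else
#define PLAIN_CHAR_IS_SIGNED 0
#endif

/* Normalize a plain char to 0..255 regardless of its signedness. */
static int to_byte(char c)
{
#if PLAIN_CHAR_IS_SIGNED
    return (int)(unsigned char)c;
#else
    return (int)c;
#endif
}
```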


I also think it was mostly a mistake, but I was trying to convey what I imagine the thinking was at the time.

That said, I think defining things as implementation-defined is only really useful when you have specific feature-check macros you can use to determine the behavior.

Otherwise you are kind of just trading one type of portability problem for another: rather than relying on the way your compiler handles UB directly, you are relying on the way your compiler does some ID (implementation-defined) behavior. If you run your code on a different compiler, or a different version of the current compiler, the ID could change completely, and the unexpected result probably quickly leads to either UB or some functional bug anyway.


Wouldn't the better solution be having compiler support for rejecting compilation of constructs that have UB?


  int add(int a, int b)
  {
    return a + b;
  }
Do you want the compiler to reject this because (assuming it has external linkage) there's no guarantee that there is no call with arguments such that the addition would overflow?


This discussion came up recently, but my stance still is that if you know for sure at compile time that some code always results in UB or is just non-conforming, you should not fuck around and remove a big chunk of code just because the standard allows you to, but either simply refuse compilation (my preference) or at least still emit code representing what was written (effectively the implementation-defined-behavior route).


Sure, and compilers do now sometimes emit warnings like "shift amount is always out of range".

Overall though, this isn't how compilers work: they aren't "fucking around" with code they know for sure is broken and will be executed. They are just applying a large series of optimization passes, which includes passes like removing unreachable code, and that interacts with an analysis of paths that cannot be taken for various reasons (including paths that end in UB) to remove "big chunks of code".

The same passes that screw you over in any of these famous examples are the ones that help code generation in the 99% of remaining cases including many "obvious" optimizations that you'd want the compiler to perform.

I know the situation with undefined behavior is distressing and the examples make it look like the compiler writers are out to get you, but that's not really the case.


> That's _all_ there to support the POWER memory model in some scenarios.

Hmm. I’m ignorant of the actual history of the standardization process, but these days wouldn’t memory_order_consume benefit ARM systems equally (that is, if compilers were able to actually implement it)? The ARM architecture similarly provides an ordering guarantee for loads whose address depends on previous loads, in what is otherwise a weak memory model.


You are probably right: even as I was writing it, I was thinking "Isn't ARM like this too?". A big difference is that recent versions of the ARM ISAs have added "acquire" and "release" loads and stores, so they will perform well with those semantics in the future (and ARM chips generally pick up the new ISAs quickly).


Could one write a C compiler that intentionally caused noisy havoc in every situation left undefined by the language spec? As opposed to trying to do something reasonable. Then at least you’d know that you were on shaky ground?


That is more or less what the sanitizers do, and it's a reasonable approach to trying to discover UB.


Here's something I was wondering about:

Suppose I've got some nasty DSP algos that depend on type-punning, which is undefined behavior in C. To address this I use the union member-access trick, which is supported by a gcc extension.

If my only target is web assembly, then what is the reason I should care that my program is actually written in a gcc dialect of C and not standard C?


That puts you pretty firmly in what I call the "semi-portable" category (I would say "unportable" but wasm is very likely going to have flavors, such as a 64 bit pointer variant).

But if you're being careful, you can do type-punning. The "modern" way is generally with memcpy, but a union can be valid if done carefully. The clearest statement I was able to find of this is [1]. I find it frustrating that it's so hard to get a clear read on this. For example, if, as Regehr says, union-based punning is reliable, then what was the concern that the linked Linux kernel patch was trying to address? I haven't dug into it in super detail.

[1] https://blog.regehr.org/archives/1307#comment-18418
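The memcpy approach looks like this (a sketch; `float_bits` is my name for it, and it assumes float is 32 bits):

```c
#include <stdint.h>
#include <string.h>

/* Defined-behavior type punning: copy the object representation.
 * Compilers recognize the small fixed-size memcpy and lower it to a
 * plain register move at any reasonable optimization level. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* assumes sizeof(float) == 4 */
    return u;
}
```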


IMO it’s pretty iffy and you should steer clear of it, since the rules around it seem to be murky. I think the actual rule is that it’s illegal in C++, and valid in C99 and C11, but only because it was included in a footnote as an afterthought in the standard.


From a standards perspective that’s pretty much right AFAIK. From a practical perspective, it’s also explicitly endorsed in the GCC manual, so I don’t think there’s much risk of popular compilers breaking it in the foreseeable future:

> The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
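Concretely, the pattern the GCC manual blesses is (a sketch; the function name is mine):

```c
#include <stdint.h>

/* Read a union member other than the one last written, accessing the
 * memory through the union type - the case GCC documents as allowed
 * even under -fstrict-aliasing. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;   /* assumes 32-bit float and uint32_t */
}
```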


So what is the well-defined and performant way to fetch the exponent and mantissa of a double in standard C?


Use frexp from <math.h>.


The implementation for frexp uses type-punning.

Also, it adds a denormal and inf check which I don't need in this particular hot loop.


Does it? On my computer it's a builtin (__builtin_frexp). But even if it did use type-punning, that wouldn't make it legal for your program to do it. I don't believe there are any requirements on the C standard library implementation to follow the standard itself.

To answer the second part of your response, the correct way to do this yourself would be to memcpy the double and then inspect the result directly. Any compiler worth its salt will end up generating the assembly code you want (namely, no call to memcpy, just a couple of arithmetic shifts and masks).
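A sketch of that approach (names are mine; assumes the usual 64-bit IEEE 754 double):

```c
#include <stdint.h>
#include <string.h>

/* Extract the biased exponent and mantissa fields of a binary64
 * double via memcpy - well-defined in both C and C++. */
static void double_parts(double d, int *biased_exp, uint64_t *mantissa)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* assumes 64-bit double */
    *biased_exp = (int)((bits >> 52) & 0x7ff);
    *mantissa   = bits & 0xfffffffffffffULL; /* low 52 bits */
}
```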


> To answer the second part of your response, the correct way to do this yourself would be to memcpy the double and then inspect the result directly. Any compiler worth its salt will end up generating the assembly code you want (namely, no call to memcpy, just a couple of arithmetic shifts and masks).

What does a compiler "worth its salt" do with the memcpy at "-O0"?


See for yourself: https://godbolt.org/z/nsg9nt


To be clear, the use case I have in mind has soft realtime performance requirements.

So I only see two options:

1. Use memcpy, blindly hope that gcc optimizes it to the same bitmath I'm currently using, and blindly hope that gcc supports that optimized path for the foreseeable future.

2. Use the union type-punning trick, and blindly hope gcc supports their extension for the foreseeable future.

#1 above requires me to blindly hope twice, whereas #2 requires only one instance of blind hope. Furthermore, I already know how the blind hope of gcc optimizing memcpy into bitmath turns out-- gcc does not optimize it in this way.

So I cannot see any alternative to relying on the union type-punning trick in this case.

Edit: clarification


That article seems to imply that the only way out of this mess is to turn off strict aliasing, since easy type punning and other related low-level processing is literally one of the main reasons to use C. I agree completely.


> If my only target is web assembly, then what is the reason I should care that my program is actually written in a gcc dialect of C and not standard C?

How much do you trust the GCC implementors to a) maintain the behaviour that's relevant to your program b) maintain GCC at all? How much do you trust that the rest of the ecosystem will continue to support GCC and/or this GCC dialect?

Think of "the gcc dialect of c" the same way you'd think of any semi-obscure programming language, and ask the same questions you would: what kind of debuggers/profilers exist for it? How easy is it to hire people with experience in it? How much momentum is there behind it?


The obvious and uncontroversial answer is that there is no sane way this gcc extension can be dropped. Dropping the extension would mean there is no performant way to get at the bits of IEEE 754 floating point data in C (and perhaps other data types, but that's the one in my use case).


It's curious that this 30-year-old thing continues to be such a big problem with no fixes in sight, while most C code continues to have UB bugs. And many programmers have attitudes along the lines of "that's just your opinion, man" (e.g. the Linus-vs-signed-overflow rants).

The T-shirt is good though :)




