This is the most critical aspect. We have enough trouble already without the compiler actively fighting against security just because the code would fail on some machine from the 70s.
IMO, this is just plain ignorance - people arguing against the standard, while believing only their favorite platform is significant. C code is still big in embedded, you can't just trash the standard like that.
x86 SIMD is saturating as well
> If you want your naive overflow checks to work on your x86 project, why not just use a compiler option like -fwrapv?
Fair enough. Or people can stop pretending that UB is just an excuse to throw your hands in the air and do whatever they want with the code. Including null checks.
Because when things blow up it's on the major platforms.
Who are the standards committee people having strong resistance against this, and what in the world is their argument?
The standard is not a suitable babysitting tool for GCC maintainers, which is what I suspect is the motivation here.
Typically, behavior that may reasonably vary between compilers or machines is considered implementation-defined, not undefined.
Ah, but, for example, __attribute__((packed)) is undefined behavior; is the compiler free to break that?
This "free to break" is a juvenile fiction based on the idea that the only document that applies is ISO C; there is no other contract or promise between user and implementor.
The fact remains that the checks look correct to anyone familiar with 2's complement yet unfamiliar with the intricacies of the C and C++ standards, which are not low level, contrary to what they were told at school.
The resistance of the committee about this very issue, as shown by revision 2 of this proposal¹, is enough to make me hope C and C++ will go the way of COBOL.
That said, 2's complement guarantees that converting everything to unsigned before performing an operation, then converting back afterwards, produces the same results as -fwrapv (except for division and modulo). Source to source transformation tools may help us bypass undefined behaviour without resorting to non-standard compiler flags, or implementation defined behaviour.
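A minimal sketch of that transformation for addition (the helper name is mine; the conversion back to int is implementation-defined, but on two's complement targets it matches -fwrapv):

    /* Hypothetical helper: wrapping signed addition done in unsigned
       arithmetic, where overflow is defined to wrap modulo 2^N. */
    int wrapping_add(int a, int b)
    {
        unsigned int ua = (unsigned int)a;   /* signed -> unsigned: always well-defined */
        unsigned int ub = (unsigned int)b;
        unsigned int sum = ua + ub;          /* wraps, no UB */
        return (int)sum;                     /* implementation-defined if out of range;
                                                identity on two's complement targets */
    }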
(Unfortunately it sounds like this proposal has been revised to leave overflow behavior as undefined. That’s a biggie. Oh well)
This raises the question of whether the optimization was valid, given the target architecture. I am assuming that, technically speaking, it was, because that is what the standard allowed, but that view leaves unexamined the question of whether it is contrary to the purposes for having and using the C language in the first place.
I can imagine an argument pointing out that this optimization is applied at an abstract level of representation prior to code generation, and that it would be a violation of modularity to take into account the target architecture at that point. This, however, would be a point about the compiler architecture, which I think should, where practical, yield to concerns about the overall purpose and use of the compiler, where the principle of 'no (or minimal) surprises' is important.
More pedantically, they could have written: ...was correctly performing security checks if the standard had been limited to two’s complement integers. But that was clearly the intent.
Regardless, the fact that someone wrote code that specified undefined behavior and got what they asked for instead of what they wanted is not the whole problem. Unless this outcome is what the standards committee wanted (in which case we have a different problem), it is very reasonable to ask whether we are making it unduly difficult to stay within defined behavior, through rules that refuse to treat nearly hypothetical circumstances as special cases.
* The language standards came from a time where there was no standard (de facto, that is) for signed integer arithmetic across instruction architectures. Bear in mind that many people involved in standardization (rightly) want to standardize what is in actual practice in the world. If the world hasn't settled on one thing, it is difficult to standardize. (It's why the system administration parts of Unix were not addressed by IEEE 1003.1, for example. There were a whole lot of significantly different ways in which system administration was done.)
* Programmers were coding "knowing" that 2s-complement arithmetic led to certain tricks for detecting overflow and other sorts of bit twiddling (https://news.ycombinator.com/item?id=17044546); "knowing" that their processor architectures were 2s-complement; and "knowing" that compilers naively just translated straight to the arithmetic machine instructions of the target architecture.
* Compiler implementors were writing compilers knowing that programmers did not in fact have these guarantees, and implementing their optimizers as if the target processor architectures were not 2s-complement (in particular, as if integers had infinite bits); even when the actual machine code generation parts of their compilers were designed with the knowledge that the target processor architecture was 2s-complement.
The whole problem is that this is a mess that does not hang together.
There are several ways out of it. One is to make Sean Eron Anderson's life a living hell (https://graphics.stanford.edu/~seander/bithacks.html), and attempt to stamp out every piece of samizdat doco and programmer folklore that circulates these tricks, or at least make every one of them carry a lengthy "health warning" that the world is not, in fact, guaranteed to provide 2s-complement arithmetic to programmers. Another is to give in and say that the heretofore unwarranted assumptions by the programmers are now in fact supported, and that the compiler implementors have to change their now invalid compiler designs.
A third is to do part of each, by accepting and legitimizing the programmer folklore to an extent, while realizing that programmers often "know" quite the opposite and assume that they are not getting 2s-complement arithmetic. One programmer can be surprised to find the compiler treating (x + 1) > (x) as always true, because on the 2s-complement architecture that xe expects, it isn't; another programmer can be surprised to find that ((x * 2) / 2) == (x) is not always true, because in elementary school arithmetic multiplication by 2 is the inverse of division by 2, and be further surprised that (say) some deep nesting of macros that results in such expressions doesn't reduce to a no-op.
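To make those two surprises concrete, here is a rough sketch (my own example functions, not from the thread) of what a typical optimizer is allowed to do while signed overflow is undefined:

    /* With signed overflow undefined, a compiler may fold f() to "return 1"
       and g() to "return x", because it may assume x + 1 and x * 2 never
       overflow. With -fwrapv, or with unsigned types, neither folding is valid. */
    int f(int x) { return (x + 1) > x; }
    int g(int x) { return (x * 2) / 2; }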
This appears to be defining signed integer overflow semantics, which prevents the compiler from doing certain basic optimizations, for example, that (x*2)/2 == x. Is that part of this? Has anyone measured the perf cost on a real program?
...and it could be argued that if the compiler couldn't prove that was true, then it just helped you find a possible overflow bug in the code. If you actually wanted the compiler to unconditionally assume (x*2)/2==x and optimise accordingly, then you'd have to tell it such; e.g. MSVC has the __analysis_assume() feature.
For example consider the classic binary search blunder: `mid = (low + high)/2`. Defining signed overflow may avoid the UB in computing mid, but now we have a surprise negative value and it's easy to guess what happens next.
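For reference, the usual fix is to rearrange the computation so it cannot overflow in the first place; a minimal sketch:

    /* Overflow-safe midpoint, assuming 0 <= low <= high. */
    int midpoint(int low, int high)
    {
        /* (low + high) / 2 can overflow; this form cannot. */
        return low + (high - low) / 2;
    }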
It will be fun to see the trophies from this: perf regressions, bugs exposed, bugs fixed.
I find it amusing (and somewhat frustrating) that people who complain about risks from optimizations typically exhibit a lack of awareness of the tools that compilers already provide to diagnose related bugs.
Which one(s) is/are most useful depends on the user's needs.
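For example (assuming GCC or Clang), the existing options already cover the usual wishes; here is a tiny test program whose overflow each option treats differently:

    /* Build with one of (GCC/Clang):
         -fsanitize=signed-integer-overflow   -> runtime diagnostic
         -ftrapv                              -> abort on signed overflow
         -fwrapv                              -> defined wrapping behaviour */
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        int y = x + 1;            /* signed overflow happens here */
        printf("%d\n", y);
        return 0;
    }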
That being said, I think it's a rather weak justification for making overflow UB; after all, if you want to trap on overflows, wouldn't you want to catch unsigned overflow as well?
Fixed that for you.
If x is 3, then (x * 2) / 2 is also equal to 3 (as per the GP) but (x & ~1 * 2) / 2 is equal to 0.
(If you meant ((x & ~1) * 2) / 2, then that is equal to 2).
So the compiler optimizable fix is something like:
((x & INT_MAX) * 2) / 2 = x
((x & (INT_MAX / 2)) * 2) / 2
What I don't like about this proposal is defining signed integer overflow as 2's complement wrapping. Yes, yes, I know undefined behaviour is evil, and programs wouldn't become much slower. However, if a program has signed overflow, it's likely a bug anyway. Defining signed overflow as 2's complement wrapping would mean not allowing other behaviours, in particular, trapping. On architectures with optional overflow traps (most architectures not called x86 or ARM), trapping would be much more preferable to quiet bugs. Meanwhile, while it is undefined behaviour, the implementation would be still free to define it, for instance in GCC it can be done with `-fwrapv`.
There are programs that check for overflow after the fact. Is that a bug?
The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.
What they could have done is made it implementation defined, like sizeof(int), which depends on the implementation (hardware) but on the other hand isn't undefined behavior (so on x86/amd64 sizeof(int) will always be equal to 4).
    size_t size = /* unreasonably large number */;
    char *buf = malloc(size);
    char *mid = buf + size / 2;
    int index = 0;
    size_t x;
    for (x = 0; x < /* big number */; x++)
        mid[index++] = x;        /* index is an int and can overflow here */
    char *temp = mid + index;
    *(temp++) = x;
(I'm not a compiler engineer. Losing the optimization may be worth-while. Or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard)
It's undefined because in the majority of situations, it is the result of a bug, and the actual value (such as a wrapped value) is unexpected and causes a problem.
For instance, oh, the Y2038 problem with 32 bit time_t.
1. If it's a bug, it should overflow or crash (implementation defined, not undefined), or do what Rust does: crash on -O0 (or, if it's illegal to change defined behavior based on optimization level, create a --crash-on-overflow flag) and overflow on everything else.
2. There is plenty of code where it's intentional (such as the infamous if(a+5<a)).
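The intent of that idiom can be written without UB by checking against the limit before adding; a minimal sketch (the function name is mine):

    #include <limits.h>

    /* Returns 1 if a + b would overflow (for b >= 0), without ever
       overflowing itself, so the compiler cannot optimise it away. */
    int add_would_overflow(int a, int b)
    {
        return a > INT_MAX - b;
    }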
    char * buf = malloc(size);        // ordinary mutable pointer
    char * const buf = malloc(size);  // the pointer itself cannot be reassigned
    BIG_MACRO(x, y, z, buf);          // error! the macro tries to modify buf
It's also useful in C++, since innocent-looking function calls can steal mutable references:
cplusplusfun(x, y, z, buf); // error: arg 4 is non-const ref
Changing pointers returned by malloc is sometimes done:
    if ((newptr = realloc(buf, newsize)) != 0)
        buf = newptr;
If you enact a coding convention that all unchanged variables must be const, the programmers will just get used to a habit of removing the const whenever they find it convenient to introduce a mutation to a variable. "Oh, crap, error: x wasn't assigned anywhere before so it was const according to our coding convention. Must remove const, recompile; there we go!"
If you want to actually enforce such a convention of adding const, you need help from the compiler: a diagnostic like "foo.c: 123: variable x not mutated; suggest const qualifier".
I've never seen such a diagnostic; do you know of any compiler which has this?
I think that the average C module would spew reams of these diagnostics.
I'm sure it's still possible to come up with an optimization that takes signed-ness into account and doesn't give up much in performance or code size.
However, the optimization argument for signed overflow seems weird to me, because I can't see any reason why this argument would not apply to unsigned overflow as well.
If we keep undefined behavior to optimize things like "if (n < n + 1)" when n is signed, why not do the same when n is unsigned?
Conversely, if there is a good reason not to, then why would it not apply to signed overflow as well?
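Concretely, the asymmetry looks like this (a sketch of what GCC/Clang typically do at -O2, not something the standard mandates):

    int f_signed(int n)        { return n < n + 1; }  /* may be folded to 1: overflow is UB     */
    int f_unsigned(unsigned n) { return n < n + 1; }  /* must be kept: false when n == UINT_MAX */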
Compiling it and making it run? Sure. Bending over backwards to ensure it runs fast? Hell no.
A compiler targeting the x86 platform can implement sizeof int == 8, or whatever it pleases, as far as the C standard is concerned.
In practice compilers don't get creative about this. But there are real-world cases where stuff is different, for example: http://www.unix.org/version2/whatsnew/lp64_wp.html
If implementations are forced to define signed overflow, then these optimizations are necessarily lost. So implementation-defined is effectively the same as fully-defined.
Nothing is stopping your C compiler from making the guarantee sizeof(int)=4 on x86/amd64.
Even today, an implementation may define signed overflow.
That means that programmers don’t have to use trial and error to figure out how the compiler behaves and don’t have to _hope_ they found all the corner cases.
The parentheses are part of the operand and only needed for type names, to make them into cast expressions.
The fact that this behavior is now blessed by ISO C makes no difference to it being wrong, and causing some security issue in the program.
On the other hand (no pun intended), early mechanical (decimal) manual adding machines made use of complement arithmetic too:
Burroughs had a unique numeric representation. Numbers were 48 bits. Sign, sign of exponent, exponent, mantissa, with the binary point at the low end. Integers were thus valid floating point numbers. The math operations would maintain a value as an integer, with a zero exponent, if possible.
IEEE floating point also maintains integer values as integers until they don't fit, but the representation is not integer-like.
Except the "natural" way also recognizes a single zero with no sign, so it's still not accurately modeling that.
If you wanted to model natural arithmetic accurately you'd need 2 bits for the sign (positive, negative, unsigned). At that point, all of single bit signed magnitude, and complements are compromises.
Most software doesn't handle this properly; people don't realise that abs doesn't always return a positive number (as abs(INT_MIN) == INT_MIN), and there are many other similar problems.
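For instance, a sketch of an abs that is defined for every input (the name is mine), doing the negation in unsigned arithmetic:

    /* Returns |x| as an unsigned value; well-defined even for x == INT_MIN. */
    unsigned int uabs(int x)
    {
        if (x >= 0)
            return (unsigned int)x;
        return 0u - (unsigned int)x;
    }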
In an ideal world, I would only use unsigned when I cared about things like being able to use all bit representations, and would have made the all-1s number something like NaN for ints.
Interestingly, posits as originally proposed do have this property (except for infinity).
Negabinary operations are extremely simple and elegant. Like 2s complement and 1s complement, it suffers from asymmetry in its range, though even more so.
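As an illustration (my own sketch, not from the thread), converting an int to base -2 digits takes only a few lines:

    /* Write the negabinary (base -2) digits of n into out, least significant
       digit first; 40 chars is enough for any 32-bit int. */
    void to_negabinary(int n, char out[40])
    {
        int i = 0;
        if (n == 0)
            out[i++] = '0';
        while (n != 0) {
            int r = n % -2;                  /* -1, 0 or 1 in C */
            n /= -2;
            if (r < 0) { r += 2; n += 1; }   /* normalise the digit to 0 or 1 */
            out[i++] = (char)('0' + r);
        }
        out[i] = '\0';
    }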
Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2ⁿ.
...but that's not a change - it's been the case in C all along.
"Change Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2N."
This is no change, since we have that already, e.g. see https://en.cppreference.com/w/cpp/language/implicit_conversi... and the conversion operation on the bit pattern is the identity for two's complement representation. The relevant section in the latest C++ standard is:
4.8 Integral conversions [conv.integral]
1 A prvalue of an integer type can be converted to a prvalue of another integer type. ...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2ⁿ, where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
Therefore the inverse conversion exists and is the identity as well; this is what should be sanctioned.
Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.
A language spec can provide a more detailed two's complement model with certain behaviors being defined that only make sense on two's complement machines, without tossing other machines out the window.
There could be a separate spec for a detailed two's complement model. That could be an independent document. (Analogy: IEEE floating-point.) Or it could be an optional section in ISO C.
Two's complement has some nice properties, but isn't nice in other regards. Multi-precision integer libraries tend to use sign-magnitude, for good reasons.
What I suspect is going on here is that someone is unhappy with what is going on in GCC development, and thinks ISO C is the suitable babysitting tool. (Perhaps a reasonable assumption, if those people won't listen to anything that doesn't come from ISO C.)
No, I use 16-bit values to represent angles in embedded systems all the time. I routinely expect arithmetic on these values to roll over as 2's complement, and I expect to take differences of angles using 2's complement all the time. I'm fully aware that this is undefined behavior and needs to be verified on each compiler/processor combination. It has always worked, and yet it's undefined behavior. It would be nice for it to be defined. There are no modern machines that would be impacted by this.
Your compiler implementors can do that in their documentation; it doesn't have to be pushed into the standard.
There are reasons for it being regarded as not nice to define something like that or make it some compiler option or pragma and whatever. Overflow is in fact an error in many situations, because it can happen unexpectedly; it's useful for the compiler or machine to trap overflows.
What I was referring to in my above remark is mainly the removal of support for sign-magnitude; if you read my response more carefully you will see that I favor ways of making the behavior defined without sacrificing things.
Anyway, you can use unsigned arithmetic instead to do portable two's complement. Unsigned integers have the required roll-over behavior.
Some 28 years ago I made an emulator for the MC68000 processor. I used unsigned 32-bit integers for all the arithmetic, including the signed operations. E.g. the difference between a signed and unsigned addition was only in how the flags were calculated, like Z, X and C.
You can store the angle as a union of signed and unsigned type. Do arithmetic on the unsigned member, where overflow is defined. (Both members are equivalent angles)
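A sketch of that approach for 16-bit angles (the type and function names are mine; type-punning through a union like this is fine in C):

    #include <stdint.h>

    /* 0..65535 maps onto a full turn; differences wrap around naturally. */
    typedef union {
        uint16_t u;
        int16_t  s;
    } angle16;

    /* Signed shortest-path difference between two angles, computed in
       unsigned arithmetic where wrap-around is defined. */
    int16_t angle_diff(angle16 a, angle16 b)
    {
        angle16 d;
        d.u = (uint16_t)(a.u - b.u);   /* wraps modulo 2^16 */
        return d.s;                    /* read back through the signed member */
    }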
This conversion doesn't have undefined behavior; it produces an implementation defined result.
C programs can simulate two's complement math using unsigned types, avoiding UB. Then rely on IB to convert between signed and unsigned.
I take your point about the possible motives behind this proposal, which seem quite plausible.
It's like saying we have to drop USB 1.0 support in an OS in order to fix missing features in the Bluetooth stack.
And not having an ISO C standard for sign-magnitude machines (which is not a necessary consequence of the proposed change, it is just the worst case, depending on how ISO chose to deal with the consequences for such machines) does not necessarily force an end to actual C support for them.