Signed Integers Are Two’s Complement (open-std.org)
158 points by scott_s on May 30, 2018 | 123 comments

> Naïve overflow checks, which are often security-critical, often get eliminated by compilers. This leads to exploitable code when the intent was clearly not to and the code, while naïve, was correctly performing security checks for two’s complement integers.

This is the most critical aspect. We have enough trouble already without the compiler actively fighting against security just because the code would fail on some machine from the 70s.

Why are you assuming "machine from the 70s"? I know modern processors (DSPs) that need to saturate on integer arithmetic in order to maintain correctness. If you want your naive overflow checks to work on your x86 project, why not just use a compiler option like -fwrapv?
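As a sketch of the failure mode under discussion (the function name is illustrative, not from the thread): a post-hoc overflow check like the following invokes undefined behavior exactly when it matters, so an optimizer may legally delete the branch; `-fwrapv` makes the wrap well-defined and the check reliable.

```c
#include <limits.h>

/* Illustrative sketch: a naive post-hoc overflow check. Without -fwrapv,
   the addition below is UB on overflow, so a compiler may assume
   sum >= a always holds and remove the check. With -fwrapv (GCC/Clang),
   signed overflow wraps and the check behaves as the author intended. */
int add_100_checked(int a) {
    int sum = a + 100;
    if (sum < a)          /* intended: detect wraparound */
        return -1;
    return sum;
}
```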

IMO, this is just plain ignorance - people arguing against the standard, while believing only their favorite platform is significant. C code is still big in embedded, you can't just trash the standard like that.

I've used DSPs and embedded systems, those compilers have sufficient quirks already and usually don't follow the standard to the letter. I'm not worried about "signal" values, I'm worried about pointer arithmetic values.

x86 SIMD is saturating as well

> If you want your naive overflow checks to work on your x86 project, why not just use a compiler option like -fwrapv?

Fair enough. Or people can stop pretending that UB is just an excuse to throw your hands in the air and do whatever they want with the code. Including null checks.

Because when things blow up it's on the major platforms.

This got rejected in the next revision of this proposal. Naive overflow checks are still undefined.

If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this.

Who are the standards committee people having strong resistance against this, and what in the world is their argument?

Undefined doesn't mean incorrect; it's just the absence of a requirement. A compiler writer can add requirements locally (like "signed integer overflows have wrapping behavior") that are missing in the standard.

The standard is not a suitable babysitting tool for GCC maintainers, which is what I suspect is the motivation here.

This may be true in theory, but in practice it isn't. Compilers optimize under the assumption that undefined behavior never occurs, and the standard is written with this assumption. Any conforming compiler is free to break code that relies on this.

Typically, behavior that reasonably may vary from compiler/machine is considered implementation defined, and not undefined.

> Any conforming compiler is free to break code that relies on this.

Ah, but, for example, __attribute__((packed)) is undefined behavior; is the compiler free to break that?

This "free to break" is a juvenile fiction based on the idea that the only document that applies is ISO C; there is no other contract or promise between user and implementor.

GCC documentation is another such contract, which provides -fwrapv. If you want -fwrapv, use -fwrapv.

I am not sure why people pick GCC in particular for this issue. LLVM does exactly the same.

Okay, then the proposal is already mostly useless.

Isn't the quoted part 180 degrees wrong? Such code was not "correctly performing security checks", since it was undefined behaviour - two's complement or not. Which was the whole problem.

Adding "assuming the compiler is not insane and uses -fwrapv" would be a bit cumbersome.

The fact remains that the checks look correct to anyone familiar with 2's complement yet unfamiliar with the intricacies of the C and C++ standards, which are not low level, contrary to what people were taught at school.

The resistance of the committee about this very issue, as shown by revision 2 of this proposal¹, is enough to make me hope C and C++ will go the way of COBOL.

That said, 2's complement guarantees that converting everything to unsigned before performing an operation, then converting back afterwards, produces the same results as -fwrapv (except for division and modulo). Source-to-source transformation tools may help us bypass undefined behaviour without resorting to non-standard compiler flags or implementation-defined behaviour.
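The unsigned round-trip described above can be sketched like this (hypothetical helper name; note that the conversion back to int of an out-of-range value is implementation-defined before C++20, though it wraps on every mainstream compiler):

```c
#include <limits.h>

/* Sketch of the unsigned round-trip: unsigned overflow is defined to
   wrap, and on a two's complement machine converting back to int
   reproduces the -fwrapv result for addition. The back-conversion is
   implementation-defined pre-C++20, but wraps in practice everywhere. */
int wrapping_add(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}
```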


The checks were not correct for all possible representations. If the only representation were two’s complement, there could be fewer areas of undefined behavior, so simple and straightforward security checks would be much more likely to be correct.

(Unfortunately it sounds like this proposal has been revised to leave overflow behavior as undefined. That’s a biggie. Oh well)

> The checks were not correct for all possible representations.

This raises the question of whether the optimization was valid, given the target architecture. I am assuming that, technically speaking, it was, because that is what the standard allowed, but that view leaves unexamined the question of whether it is contrary to the purposes for having and using the C language in the first place.

I can imagine an argument pointing out that this optimization is applied at an abstract level of representation prior to code generation, and that it would be a violation of modularity to take into account the target architecture at that point. This, however, would be a point about the compiler architecture, which I think should, where practical, yield to concerns about the overall purpose and use of the compiler, where the principle of 'no (or minimal) surprises' is important.

Don't leave out the last part: ...was correctly performing security checks for two’s complement integers.

More pedantically, they could have written: ...was correctly performing security checks if the standard had been limited to two’s complement integers. But that was clearly the intent.

But the revised version is limited to twos complement and is still undefined, so such a check is still incorrect.

Okay, you may be right; I haven't looked through all diffs in detail (nor do I understand all intricacies of the standard).

This is a good example of why the aphorism 'technically right is the best kind of right' is problematical (of course, those who like it will point out that I am not technically correct...)

Regardless, the fact that someone wrote code that specified undefined behavior and got what they asked for instead of what they wrote is not the whole problem. Unless this outcome is what the standards committee wanted (in which case we have a different problem), then it is very reasonable to ask the question of whether we are making staying within defined behavior unduly difficult through rules that refuse to treat nearly hypothetical circumstances as special cases.

The whole problem is in fact this.

* The language standards came from a time where there was no standard (de facto, that is) for signed integer arithmetic across instruction architectures. Bear in mind that many people involved in standardization (rightly) want to standardize what is in actual practice in the world. If the world hasn't settled on one thing, it is difficult to standardize. (It's why the system administration parts of Unix were not addressed by IEEE 1003.1, for example. There were a whole lot of significantly different ways in which system administration was done.)

* Programmers were coding "knowing" that 2s-complement arithmetic led to certain tricks for detecting overflow and other sorts of bit twiddling (https://news.ycombinator.com/item?id=17044546); "knowing" that their processor architectures were 2s-complement; and "knowing" that compilers naively just translated straight to the arithmetic machine instructions of the target architecture.

* Compiler implementors were writing compilers knowing that programmers did not in fact have these guarantees, and implementing their optimizers as if the target processor architectures were not 2s-complement (in particular, as if integers had infinite bits); even when the actual machine code generation parts of their compilers were designed with the knowledge that the target processor architecture was 2s-complement.

The whole problem is that this is a mess that does not hang together.

There are several ways out of it. One is to make Sean Eron Anderson's life a living hell (https://graphics.stanford.edu/~seander/bithacks.html), and attempt to stamp out every piece of samizdat doco and programmer folklore that circulates these tricks, or at least make every one of them carry a lengthy "health warning" that the world is not, in fact, guaranteed to provide 2s-complement arithmetic to programmers. Another is to give in and say that the heretofore unwarranted assumptions by the programmers are now in fact supported, and that the compiler implementors have to change their now invalid compiler designs.

A third is to do part of each, by accepting and legitimizing the programmer folklore to an extent, but realizing that programmers often "know" quite the opposite case and assume that they are not using 2s-complement arithmetic. Where one programmer can be surprised to find that (x + 1) > (x) is always true because on the 2s-complement architecture that xe expects it isn't; another programmer can be surprised to find that ((x * 2) / 2) == (x) is not always true because in elementary school arithmetic multiplication by 2 is the inverse of division by 2, and be further surprised that (say) some deep nesting of macros that results in such things doesn't reduce to a no-op.

By all means, make the change for the new version of the language. In the meantime, though, why are people assuming two's complement?!

General knowledge. Have you read the entire standard? I haven't, but I know my machine uses two's complement internally, and C and C++ are generally supposed to be close to the bare metal.

The link is to the outdated revision r0; this is r2 (I don't know if it's the latest): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...

The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.

Gah, what is the point then?

The point is that signed integers are two's complement. After all, it is the title.

Naive overflow checks are often done by adding two positive numbers together and checking if the result is negative. If wrapping is undefined, this no longer works, and the overflow check may be eliminated.
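The pattern in question, sketched with an illustrative function name:

```c
#include <stdbool.h>

/* Sketch of the naive idiom: callers pass two values expected to be
   non-negative and rely on two's complement wrap-to-negative to signal
   overflow. Because signed overflow is UB, a compiler may fold this to
   "false" whenever it can prove a and b are non-negative. */
bool sum_went_negative(int a, int b) {
    return a + b < 0;
}
```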

And the joy I originally had is once again lost.

You can always get the most up-to-date public version using wg21.link/pxxxx. So for this proposal: https://wg21.link/p0907

In April, the author of this proposal said on Twitter that the C++ standards committee agreed to this proposal for C++20: https://twitter.com/jfbastien/status/989242576598327296?lang...

This proposal, but not this revision. (Tweets are careful about this.) In particular, signed integer overflow is still undefined.

Glad to hear! I enjoy standards documents which result in a net decrease of a standard due to increasing simplicity.

> Overflow in the positive direction shall wrap around

This appears to be defining signed integer overflow semantics, which prevents the compiler from doing certain basic optimizations, for example, that (x*2)/2 == x. Is that part of this? Has anyone measured the perf cost on a real program?

It prevents risky optimisations; this now requires the compiler to prove that such optimisations won't change the semantics of the code, e.g. in your case by essentially proving that the high 2 bits of x (only 1 in the unsigned case, due to sign-extension) will never be set.

...and it could be argued that if the compiler couldn't prove that was true, then it just helped you find a possible overflow bug in the code. If you actually wanted the compiler to unconditionally assume (x*2)/2==x and optimise accordingly, then you'd have to tell it such; e.g. MSVC has the __analysis_assume() feature.

The compiler could help you more by emitting trapping arithmetic. Code that triggered signed overflow will probably not be "fixed" by this change alone.

For example consider the classic binary search blunder: `mid = (low + high)/2`. Defining signed overflow may avoid the UB in computing mid, but now we have a surprise negative value and it's easy to guess what happens next.

It will be fun to see the trophies from this: perf regressions, bugs exposed, bugs fixed.

This is already an option: `-ftrapv` or `-fsanitize=signed-integer-overflow`.

I find it amusing (and somewhat frustrating) that people who complain about risks from optimizations typically exhibit a lack of awareness of the tools that compilers already provide to diagnose related bugs.

However it also prevents "abort on overflow" implementations from being conforming, which look like a much better way of finding actual overflow bugs in the code.

Do such implementations currently only abort on signed overflow and not on unsigned overflow? Aborting on unsigned overflow is currently not conforming, is it?

It is not conforming, which is why `-fsanitize=unsigned-integer-overflow` is not enabled by default by ubsan. However it is available if you want to try it.

That's a question of static vs. dynamic analysis for bug detection.

Which one(s) is/are most useful depends on the user's needs.

Static analysis was always possible; dynamic analysis isn't possible in a standard-compliant way once overflow is defined.

That said, I think it's a rather weak justification for making overflow UB; after all, if you want to trap on overflows, wouldn't you want to catch unsigned overflow as well?

The link is revision 0. Revision 1 reverted defining signed integer overflow. In revision 1, signed integer overflow is still undefined, precisely for this reason.

For some examples of the types of optimizations enabled by allowing overflow to be undefined: https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...

> for example, that (x&~1*2)/2 == x

Fixed that for you.


If x is 3, then (x * 2) / 2 is also equal to 3 (as per the GP) but (x & ~1 * 2) / 2 is equal to 0.

(If you meant ((x & ~1) * 2) / 2, then that is equal to 2).

You are of course correct. Not only did I mistake precedence, I also confused the lower bit with the upper one, where the overflow occurs.

So the compiler-optimizable fix is something like: ((x & INT_MAX) * 2) / 2 == x

  ((x & (INT_MAX / 2)) * 2) / 2


My version works up to x = INT_MAX and your version still fails for negative ints, so I don't see the benefit of restricting the range. Getting closer :)

I like the idea of forbidding signed integer representations other than 2's complement, as it is de facto standard, pretty much nobody makes CPUs with non-standard integer representations, partly due to C programs assuming 2's complement integer representation.

What I don't like about this proposal is defining signed integer overflow as 2's complement wrapping. Yes, yes, I know undefined behaviour is evil, and programs wouldn't become much slower. However, if a program has signed overflow, it's likely a bug anyway. Defining signed overflow as 2's complement wrapping would mean not allowing other behaviours, in particular, trapping. On architectures with optional overflow traps (most architectures not called x86 or ARM), trapping would be much more preferable to quiet bugs. Meanwhile, while it is undefined behaviour, the implementation would be still free to define it, for instance in GCC it can be done with `-fwrapv`.

> However, if a program has signed overflow, it's likely a bug anyway.

There are programs that check for overflow after the fact. Is that a bug?

No, but since most overflows are bugs, I think the right solution is to standardize something like GCC's __builtin_add_overflow for overflow checks. Rust does this.

Seems like this was changed in more recent revisions, according to another comment:

The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior.

The real issue isn't that C doesn't have a standard int overflow, but that it's undefined.

What they could have done is make it implementation-defined, like sizeof(int), which depends on the implementation (hardware) but isn't undefined behavior (so on x86/amd64, sizeof(int) will always be equal to 4).

It's undefined for a reason.

  size_t size = unreasonable large number;
  char buf = malloc (size);
  char *mid = buf + size / 2;
  int index = 0;
  for (size_t x = 0; x < big number; x++) mid[index++] = x;
A common optimization by a compiler is to introduce a temporary

  char *temp = mid + index;
prior to the loop and then replace the body of the loop with

  *(temp++) = x;
If the compiler has to worry about integer overflow, this optimization is not valid.

(I'm not a compiler engineer. Losing the optimization may be worthwhile. Or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard.)

Are you sure this was the intent of the standard writers back in the mid-to-late 80s, and not something that modern compilers just happened to take advantage of? I'd really expect it to be the latter.

Integer overflow is certainly not undefined for this reason.

It's undefined because in the majority of situations, it is the result of a bug, and the actual value (such as a wrapped value) is unexpected and causes a problem.

For instance, oh, the Y2038 problem with 32 bit time_t.

>It's undefined because in the majority of situations, it is the result of a bug,

1. If it's a bug, it should overflow or crash (implementation-defined, not undefined), or do what Rust does: crash at -O0 (or, if it's illegal to change defined behavior based on optimization level, add a --crash-on-overflow flag) and overflow on everything else.

2. There is plenty of code where it's intentional (such as the infamous if(a+5<a)).

You meant

    char * buf = malloc(size);
You dropped an asterisk. Since changing pointers returned by malloc() is a bad idea, I'd make it:

    char * const buf = malloc(size);

This is only useful if buf is involved in some preprocessor macrology which perpetrates a hidden mutation of buf. Then, with:

   BIG_MACRO(x, y, z, buf); // error!
the programmer is informed that, to his or her surprise, BIG_MACRO mutates buf and can take appropriate corrective action.

It's also useful in C++, since innocent-looking function calls can steal mutable references:

   cplusplusfun(x, y, z, buf); // error: arg 4 is non-const ref
No such thing in C, though; function calls are pure pass-by-value.

Changing pointers returned by malloc is sometimes done:

   if ((newptr = realloc(buf, newsize)) != 0)
     buf = newptr;
In my experience, C code doesn't use const for anywhere near all of the local variables which could be so qualified.

If you enact a coding convention that all unchanged variables must be const, the programmers will just get used to a habit of removing the const whenever they find it convenient to introduce a mutation to a variable. "Oh, crap, error: x wasn't assigned anywhere before so it was const according to our coding convention. Must remove const, recompile; there we go!"

If you want to actually enforce such a convention of adding const, you need help from the compiler: a diagnostic like "foo.c: 123: variable x not mutated; suggest const qualifier".

I've never seen such a diagnostic; do you know of any compiler which has this?

I think that the average C module would spew reams of these diagnostics.

> If the compiler has to worry about integer overflow, this optimization is not valid.

I'm sure it's still possible to come up with an optimization that takes into account signed-ness, and doesn't give in to performance or code-size much.

size_t is unsigned, overflow is defined.

The type of index is, however, signed int.

You're right, I read diagonally :)

However, the optimization argument for signed overflow seems weird to me, because I can't see any reason why this argument would not apply to unsigned overflow as well.

If we keep undefined behavior to optimize things like "if (n < n + 1)" when n is signed, why not do the same when n is unsigned?

Conversely, if there is a good reason not to, then why would it not apply to signed overflow as well?

This case is not worth optimising, because the index should be size_t just like the original size. Then the compiler knows it won't overflow, and doesn't have to check.

And, the fix is easy: just use types of the same width for the counter and the boundary. Using a narrower counter is just begging for errors to happen. This is not a good coding style, and there is no point in having the compiler condoning it.

Compiling it and making it run? Sure. Bending over backwards to ensure it runs fast? Hell no.

Just a nitpick: "implementation" refers to the particular compiler and runtime (stdlib) implementation, not the hardware. Hardware is the platform hosting the implementation (these are ISO C standard-defined terms).

A compiler targeting x86 platform can implement sizeof int == 8, or whatever it pleases, as far as C std is concerned.

In practice, compilers don't get creative about this. But there are real-world cases where things differ, for example: http://www.unix.org/version2/whatsnew/lp64_wp.html

The modern case for keeping signed overflow as UB is that it unlocks compiler optimizations. For example, it allows compilers to assume that `x+1>x`.

If implementations are forced to define signed overflow, then these optimizations are necessarily lost. So implementation-defined is effectively the same as fully-defined.

I suppose the question is, which of these optimisations are actually useful for the compiler to do automatically? Yours is the example that's always thrown about, but it always seems like the kind of optimisation that the programmer should be responsible for.

> on x86/amd4 sizeof(int) will always be equal to 4

Nothing is stopping your C compiler from making the guarantee sizeof(int)=4 on x86/amd64.

I think you are in agreement with the comment you are replying to.

The comment suggested the standard make it implementation defined rather than undefined. There's not a meaningful difference here.

Even today, an implementation may define unsigned overflow.

Yes, there is. Implementation defined means that a conforming implementation _must_ document its behavior.

That means that programmers don’t have to use trial and error to figure out how the compiler behaves and don’t have to _hope_ they found all the corner cases.

And that is how we get #if defined(_THIS_THING_SOME_COMPILER_DEFINES) && !defined(__BUT_NOT_THIS_ONE_THAT_COMPILER_X_DEFINES) soup ;)

Better that than silently ignoring an if guard preventing an overflow, and then overflowing anyway on the addition.

Oh, I see, I wonder if greenhouse_gas is suggesting a feature similar to sizeof() that can be used to portably adapt your program's design to the target's overflow capability.

C language lawyer in training: sizeof is not a function.

The parentheses are part of the operand and only needed for type names, to make them into cast expressions.

Getting rid of this useless (crap #!§$§$§$) legacy stuff was overdue, so I am very happy to see it done. I personally think it is _the_ most important proposal for C++20, since it will remove a lot of pointless pressure from secure coding attempts and in turn make the world a little bit more secure.

I don't see how. Integer overflows still can be security issues even if they wrap.

They can be security issues because the compiler is allowed to optimize. The compiler can determine that some checks the user added "don't make sense", since those would only be hit if something undefined happens. An example is shown in https://www.tripwire.com/state-of-security/vulnerability-man... but there are many more.

No, they can be security issues in that the program expected x + 1 to be a value bigger than x, but it is suddenly a big negative value.

The fact that this behavior is now blessed by ISO C makes no difference to it being wrong, and causing some security issue in the program.

There will always be unavoidable issues, since C and C++ are systems languages, designed to survive under tight performance pressure. The point here is to remove pointless obstacles.

Alas, nope. Later revisions of this proposal still have undefined signed overflow. We still need -fwrapv for the easy overflow checks.

I'm curious why some old architectures didn't use two's complement for signed numbers. What advantage did one's complement or signed magnitude have over two's complement?

Two's complement has the bizarre property of being asymmetric about zero. So things like `abs` can overflow, among several other oddities. It's not unambiguously better.
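Concretely (hypothetical helper): abs(INT_MIN) overflows because |INT_MIN| is not representable, so defensive code has to special-case it.

```c
#include <limits.h>
#include <stdbool.h>

/* Two's complement asymmetry: the range runs INT_MIN..INT_MAX with one
   more negative value than positive, so -INT_MIN overflows. A safe
   absolute value must report that case instead of negating. */
bool safe_abs(int x, int *out) {
    if (x == INT_MIN)
        return false;        /* |INT_MIN| > INT_MAX: not representable */
    *out = x < 0 ? -x : x;
    return true;
}
```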

With one's complement it is easier to multiply by minus one: just invert all bits. It is also symmetrical around the zero, so sequences of random numbers will truly tend to average to zero.
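On 8-bit patterns, the two negation rules look like this (sketch, illustrative names):

```c
#include <stdint.h>

/* One's complement negation: flip every bit.
   Two's complement negation: flip every bit, then add one. */
uint8_t negate_ones(uint8_t bits) { return (uint8_t)~bits; }
uint8_t negate_twos(uint8_t bits) { return (uint8_t)(~bits + 1u); }
```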

It can be useful to distinguish between positive and negative zero in some cases, for example when dealing with values that have been rounded to zero or limits approaching zero.

That's true for limits approaching any number, so if that's important you'll need more than negative zero.

Was that ever really a reason for signed magnitude, or did people just make use of the second representation of zero because it was available and they couldn't be bothered putting that information in another variable, or using floating point or fixed point or anything else that would have achieved the same result?

I have a feeling signed-magnitude predates binary and complement arithmetic --- it is, after all, the "natural" way humans work with numbers. A lot of the early non-binary computers used some form of sign-magnitude, all the way back to punch card formats:


On the other hand (no pun intended), early mechanical (decimal) manual adding machines made use of complement arithmetic too:



Burroughs 5xxx and 6xxx machines used signed-magnitude.

Burroughs had a unique numeric representation. Numbers were 48 bits. Sign, sign of exponent, exponent, mantissa, with the binary point at the low end. Integers were thus valid floating point numbers. The math operations would maintain a value as an integer, with a zero exponent, if possible.

IEEE floating point also maintains integer values as integers until they don't fit, but the representation is not integer-like.

TIL about signed overpunch

Except the "natural" way also recognizes a single zero with no sign, so it's still not accurately modeling that.

If you wanted to model natural arithmetic accurately you'd need 2 bits for the sign (positive, negative, unsigned). At that point, all of single bit signed magnitude, and complements are compromises.

The big one (for me) is that it's really annoying having one more negative value than positive value.

Most software doesn't handle this properly, they don't realise abs doesn't always return a positive number (as abs(INT_MIN)=INT_MIN), and many other similar problems.

In an ideal world, I would only use unsigned when you care about things like being able to use all bit representations, then have made the all-1s number something like NaN, for ints.

In addition to what's mentioned in the already great sibling comments, it's worth noting that IEEE floating point is signed-magnitude.

Sign-magnitude for the significand, and offset-binary for the exponent. The reason for this odd combination is probably historical.

It's done so that the bits will compare the same way whether treated as float or int. (modulo NaNs and stuff)

That is only true for non-negative floating point numbers. It's still useful though.
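A sketch of why that works for non-negative values (assuming IEEE-754 binary32; `float_bits` is an illustrative helper):

```c
#include <stdint.h>
#include <string.h>

/* For non-negative IEEE-754 floats, the exponent bits sit above the
   mantissa bits, so the raw bit patterns order the same way as the
   values: an integer comparison on the bits matches a float comparison. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined type pun */
    return u;
}
```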

Interestingly, posits as originally proposed do have this property (except for infinity).

Would have been nice if IEEE stored the complement sign.

It's also practical for hw implementation and has other nice qualities. E.g. comparisons and sorting are easy. Radix sort works for floating point (with some bit magic). Terms and conditions may apply (oddities w.r.t. INF, Nan, etc).

It doesn't seem odd to me. When describing things in nature, (+x, -x) tend to have more symmetry than (x, 1/x).

Is there a good way to represent floating points in order to do complement arithmetic?

JS is IEEE754, not a (one's or two's) complement-based representation.

Ah I misunderstood the question.

If you know a bit about the range you need to support you could use a fixed point representation

Don't forget negabinary! https://en.m.wikipedia.org/wiki/Negative_base

Negabinary operations are extremely simple and elegant. Like 2s complement and 1s complement, it suffers from asymmetry in its range, though even more so.
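A small sketch of base -2 encoding (hypothetical helper): every integer, negative or positive, gets an unsigned digit string, which is also where the greater range asymmetry shows up.

```c
#include <stdio.h>
#include <string.h>

/* Encode an int in base -2 (negabinary) as a string of 0/1 digits with
   no sign. Remainders are forced into {0, 1}, adjusting the quotient
   when C's truncating division yields a negative remainder. */
void to_negabinary(int n, char *buf) {
    char tmp[40];
    int len = 0;
    if (n == 0) { strcpy(buf, "0"); return; }
    while (n != 0) {
        int r = n % -2;
        n /= -2;
        if (r < 0) { r += 2; n += 1; }  /* force remainder into {0,1} */
        tmp[len++] = (char)('0' + r);
    }
    for (int i = 0; i < len; i++)       /* digits were produced LSB first */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}
```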

I always felt like negative bases are just strange enough, yet just practical enough, that they almost could have arisen as a system of numbers in a natural language. For example, phrasing 11 as 191, “one more than 90 less than 100”, would be unusual for such a small number but definitely sounds “naturalistic”, like phrasing 1990 as “a thousand, a hundred less than a thousand, ten less than a hundred” in Roman numerals, 99 as “four-twenty ten-nine” in French, or 9 as “five four” in Khmer.

That document has this listed as a Change:

Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2ⁿ.

...but that's not a change - it's been the case in C all along.
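Indeed, both C and C++ already define the signed-to-unsigned direction by modular arithmetic (sketch):

```c
#include <limits.h>

/* Signed-to-unsigned conversion is defined modulo 2^n in all revisions
   of C and C++: -1 always converts to UINT_MAX, regardless of the
   signed representation. */
unsigned to_unsigned(int x) {
    return (unsigned)x;
}
```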

There seems to be an error in this proposal:

"Change: Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2^N."

This is no change, since we have that already, e.g. see https://en.cppreference.com/w/cpp/language/implicit_conversi... and the conversion operation on the bit pattern is the identity for two's complement representation. The relevant section in the latest C++ standard is 4.8 Integral conversions [conv.integral]: "A prvalue of an integer type can be converted to a prvalue of another integer type. ... If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note]"

Therefore the inverse conversion exists and is the identity as well, this is what should be sanctioned.

FYI, you can prevent all of the non-two's-complement problems with -fwrapv, which forces two's complement wrapping math in gcc/clang/icc.

Also relevant is HAKMEM item 154 in which Bill Gosper concluded that the universe is two’s complement: http://catb.org/jargon/html/H/HAKMEM.html

Quick question: in the proposed rewording of intro.execution¶8, why is the following rewriting “((a + b) + 32765)” not reintegrated at the end of the untouched text? Have I misunderstood that with two's complement this would be legal?

Have they considered introducing new types for wrapping integers, checked integers, and saturating integers? I understand why they might not want to make a change that could have a large effect on existing programs. But if you introduce new types, then the new types will only affect new programs that choose to use them, and this seems like something that could be a library change rather than a language change.

Requiring two's complement just means you can't have a sensible C language on some sign-magnitude machine.

Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.

A language spec can provide a more detailed two's complement model with certain behaviors being defined that only make sense on two's complement machines, without tossing other machines out the window.

There could be a separate spec for a detailed two's complement model. That could be an independent document. (Analogy: IEEE floating-point.) Or it could be an optional section in ISO C.

Two's complement has some nice properties, but isn't nice in other regards. Multi-precision integer libraries tend to use sign-magnitude, for good reasons.

What I suspect is going on here is that someone is unhappy with what is going on in GCC development, and thinks ISO C is the suitable babysitting tool. (Perhaps a reasonable assumption, if those people won't listen to anything that doesn't come from ISO C.)

>> Even if nobody cares about such a machine, nothing is achieved other than perhaps simplifying a spec.

No, I use 16-bit values to represent angles in embedded systems all the time. I routinely expect arithmetic on these values to roll over as two's complement, and I take differences of angles relying on two's complement all the time. I'm fully aware that this is undefined behavior and needs to be verified on each compiler/processor combination. It has always worked, and yet it's undefined behavior. It would be nice for it to be defined. There are no modern machines that would be impacted by this.

> It would be nice for it to be defined.

Your compiler implementors can do that in their documentation; it doesn't have to be pushed into the standard.

There are reasons for it being regarded as not nice to define something like that or make it some compiler option or pragma and whatever. Overflow is in fact an error in many situations, because it can happen unexpectedly; it's useful for the compiler or machine to trap overflows.

What I was referring to in my above remark is mainly the removal of support for sign-magnitude; if you read my response more carefully you will see that I favor ways of making the behavior defined without sacrificing things.

Anyway, you can use unsigned arithmetic instead to do portable two's complement. Unsigned integers have the required roll-over behavior.

Some 28 years ago I made an emulator for the MC68000 processor. I used unsigned 32-bit integers for all the arithmetic, including the signed operations. E.g. the difference between a signed and an unsigned addition was only how the flags were calculated, like Z, X and C.

Yes, it is annoying to read comments that assume overflow is always a programming error.

You can store the angle as a union of signed and unsigned type. Do arithmetic on the unsigned member, where overflow is defined. (Both members are equivalent angles)

Or you could just convert from the unsigned to the signed type, rather than dragging unions into it.

This conversion doesn't have undefined behavior; it produces an implementation defined result.

C programs can simulate two's complement math using unsigned types, avoiding UB. Then rely on IB to convert between signed and unsigned.

Not always, but overflows are usually bugs. This is a replicated finding from dynamic overflow-checking tools, over and over.

And if overflow is well-defined, then those tools must switch your C dialect to a non-ISO-C conforming one in order to do their job.

The worst case is that there would not be an ISO C for such machines. As they are very unusual, this does not strike me as a big deal, and definitely less of an issue than making it easier to avoid invoking undefined behavior.

I take your point about the possible motives behind this proposal, which seem quite plausible.

But gutting support for sign-magnitude machines has nothing to do with making certain two's complement behaviors defined.

It's like saying we have to drop USB 1.0 support in an OS in order to fix missing features in the Bluetooth stack.

I don't think that analogy works well, because if there was a dependency in standards that led to this outcome, would it not be better to break the dependency going forward, or to have avoided it in the first place?

And not having an ISO C standard for sign-magnitude machines (which is not a necessary consequence of the proposed change, it is just the worst case, depending on how ISO chose to deal with the consequences for such machines) does not necessarily force an end to actual C support for them.

Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact