
Signed Integers Are Two’s Complement - scott_s
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r0.html
======
raverbashing
> Naïve overflow checks, which are often security-critical, often get
> eliminated by compilers. This leads to exploitable code when the intent was
> clearly not to and the code, while naïve, was correctly performing security
> checks for two’s complement integers.

This is the most critical aspect. We have enough trouble already without the
compiler _actually fighting against security_ just because this code would
fail on some machine from the '70s.

~~~
fulafel
Isn't the quoted part 180 degrees wrong? Such code was _not_ "correctly
performing security checks", since it was undefined behaviour - two's
complement or not. Which was the whole problem.

~~~
iainmerrick
The checks were not correct for all possible representations. If the only
representation were two’s complement, there could be fewer areas of undefined
behavior, so simple and straightforward security checks would be much more
likely to be correct.

(Unfortunately it sounds like this proposal has been revised to leave overflow
behavior as undefined. That’s a biggie. Oh well)

~~~
mannykannot
> The checks were not correct for all possible representations.

This raises the question of whether the optimization was valid, given the
target architecture. I am assuming that, technically speaking, it was, because
that is what the standard allowed, but that view leaves unexamined the
question of whether it is contrary to the purposes for having and using the C
language in the first place.

I can imagine an argument pointing out that this optimization is applied at an
abstract level of representation prior to code generation, and that it would
be a violation of modularity to take into account the target architecture at
that point. This, however, would be a point about the compiler architecture,
which I think should, where practical, yield to concerns about the overall
purpose and use of the compiler, where the principle of 'no (or minimal)
surprises' is important.

------
lloda
The link is to the outdated revision r0; this is r2 (I don't know if it's the
latest):
[http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r2.html](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r2.html)

~~~
jwilk
_The main change between [P0907r0] and the subsequent revision is to maintain
undefined behavior when signed integer overflow occurs, instead of defining
wrapping behavior._

~~~
badsectoracula
Gah, what is the point then?

~~~
sanxiyn
The point is that signed integers are two's complement. After all, it is the
title.

~~~
barrkel
Naive overflow checks are often done by adding two positive numbers together
and checking whether the result is negative. If overflow is undefined rather
than wrapping, this no longer works, and the check may be eliminated.
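
A minimal sketch of the pattern (function and variable names are hypothetical;
assume 32-bit int):

    /* Relies on wrap-around, which is undefined for signed int. Assuming
       len and delta are non-negative, a compiler may reason that len + delta
       can never be negative, and delete the branch entirely. */
    int grow(int len, int delta) {
        int newlen = len + delta;   /* UB if this overflows */
        if (newlen < 0)             /* may be optimized away */
            return -1;
        return newlen;
    }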

------
scott_s
In April, the author of this proposal said on Twitter that the C++ standards
committee agreed to this proposal for C++20:
[https://twitter.com/jfbastien/status/989242576598327296?lang...](https://twitter.com/jfbastien/status/989242576598327296?lang=en)

~~~
sanxiyn
This proposal, but not this revision. (Tweets are careful about this.) In
particular, signed integer overflow is still undefined.

------
ridiculous_fish
> Overflow in the positive direction shall wrap around

This appears to be defining signed integer overflow semantics, which prevents
the compiler from doing certain basic optimizations, for example, that (x*2)/2
== x. Is that part of this? Has anyone measured the perf cost on a real
program?
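
For reference, here's why the fold would be invalid under wrapping (a sketch
assuming 32-bit int; compile with -fwrapv to make the behavior defined):

    #include <stdio.h>

    int main(void) {
        int x = 1 << 30;         /* 2^30 */
        int y = (x * 2) / 2;     /* x*2 wraps to INT_MIN; INT_MIN/2 != x */
        printf("%d %d\n", x, y); /* prints 1073741824 -1073741824 */
        return 0;
    }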

~~~
userbinator
It prevents _risky_ optimisations; this now requires the compiler to _prove_
that such optimisations won't change the semantics of the code, e.g. in your
case by essentially proving that the high 2 bits of x (only 1 in the unsigned
case, due to sign-extension) will never be set.

...and it could be argued that if the compiler couldn't prove that was true,
then it just helped you find a possible overflow bug in the code. If you
actually wanted the compiler to unconditionally assume (x*2)/2==x and optimise
accordingly, then you'd have to tell it such; e.g. MSVC has the
__analysis_assume() feature.
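
A sketch of "telling it such" with the GCC/Clang idiom (__analysis_assume is
for static analysis; for codegen, clang has __builtin_assume, and both
compilers understand this __builtin_unreachable pattern):

    #include <limits.h>

    int halve_double(int x) {
        if (x > INT_MAX / 2 || x < INT_MIN / 2)
            __builtin_unreachable();  /* promise: x*2 cannot overflow */
        return (x * 2) / 2;           /* now foldable to x even with wrapping */
    }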

~~~
caf
However it also prevents "abort on overflow" implementations from being
conforming, which look like a much better way of finding actual overflow bugs
in the code.

~~~
mort96
Do such implementations currently only abort on signed overflow and not on
unsigned overflow? Aborting on unsigned overflow is currently not conforming,
is it?

~~~
scatters
It is not conforming, which is why `-fsanitize=unsigned-integer-overflow` is
not enabled by default by ubsan. However it is available if you want to try
it.
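
For example (a minimal sketch, assuming clang; the signed check is part of
`-fsanitize=undefined`, the unsigned one must be named explicitly, and the
file name is just for the example):

    /* clang -fsanitize=signed-integer-overflow,unsigned-integer-overflow demo.c */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned u = UINT_MAX;
        int s = INT_MAX;
        printf("%u\n", u + 1u); /* defined (wraps to 0), but reported by the unsigned check */
        printf("%d\n", s + 1);  /* UB, reported by the signed check */
        return 0;
    }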

------
GlitchMr
I like the idea of forbidding signed integer representations other than 2's
complement, as it is de facto standard, pretty much nobody makes CPUs with
non-standard integer representations, partly due to C programs assuming 2's
complement integer representation.

What I don't like about this proposal is defining signed integer overflow as
2's complement wrapping. Yes, yes, I know undefined behaviour is evil, and
programs wouldn't become much slower. However, if a program has signed
overflow, it's likely a bug anyway. Defining signed overflow as 2's complement
wrapping would mean not allowing other behaviours, in particular, trapping. On
architectures with optional overflow traps (most architectures not called x86
or ARM), trapping would be much more preferable to quiet bugs. Meanwhile,
while it is undefined behaviour, the implementation would be still free to
define it, for instance in GCC it can be done with `-fwrapv`.
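
Trapping is also already obtainable today, precisely because the behavior is
undefined: GCC and clang accept `-ftrapv`, which aborts on signed overflow. A
minimal sketch (file name is just for the example):

    /* gcc -ftrapv trap.c && ./a.out   -> aborts at runtime */
    #include <limits.h>

    int main(void) {
        int x = INT_MAX;
        return x + 1;   /* signed overflow: traps under -ftrapv */
    }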

~~~
loup-vaillant
> _However, if a program has signed overflow, it's likely a bug anyway._

There are programs that check for overflow after the fact. Is that a bug?

~~~
sanxiyn
No, but since most overflows are bugs, I think the right solution is to
standardize something like GCC's __builtin_add_overflow for overflow checks.
Rust does this.
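
For example (a minimal sketch using the GCC/Clang builtin):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int a = INT_MAX, b = 1, sum;
        /* Returns nonzero if a + b overflowed; the wrapped result
           is stored in sum either way. */
        if (__builtin_add_overflow(a, b, &sum))
            puts("overflow detected");
        else
            printf("sum = %d\n", sum);
        return 0;
    }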

------
greenhouse_gas
The real issue isn't that C doesn't have a standard integer overflow behavior,
but that it's undefined.

What they _could_ have done is make it _implementation-defined_, like
sizeof(int), which depends on the implementation (hardware) but on the other
hand isn't undefined behavior (so on x86/amd64, sizeof(int) will _always_ be
equal to 4).

~~~
cjensen
It's undefined for a reason.

    size_t size = /* unreasonably large number */;
    char *buf = malloc(size);
    char *mid = buf + size / 2;
    int index = 0;
    for (size_t x = 0; x < /* big number */; x++) mid[index++] = x;

A common optimization by a compiler is to introduce a temporary

    char *temp = mid + index;

prior to the loop and then replace the body of the loop with

    *(temp++) = x;

If the compiler has to worry about `index` wrapping around, this optimization
is not valid: a wrapped `index` would make `mid[index]` jump backwards, while
`temp` keeps marching forwards.

(I'm not a compiler engineer. Losing the optimization may be worthwhile. Or
maybe compilers have better ways of handling this nowadays. I'm just chiming
in on why int overflow is intentionally undefined in the Fine Standard.)

~~~
kazinator
Integer overflow is certainly not undefined _for this reason_.

It's undefined because in the majority of situations, it is the result of a
bug, and the actual value (such as a wrapped value) is unexpected and causes a
problem.

For instance, oh, the Y2038 problem with 32 bit time_t.

~~~
greenhouse_gas
> It's undefined because in the majority of situations, it is the result of a
bug,

1. If it's a bug, it should overflow or crash (implementation-defined, not
undefined), or do what Rust does: crash at -O0 (or, if it's illegal to change
defined behavior based on optimization level, add a --crash-on-overflow flag)
and overflow on everything else.

2. There is plenty of code where it's intentional (such as the infamous
if (a + 5 < a)).
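
For comparison, a sketch of that check written so it remains defined under
today's rules:

    #include <limits.h>

    /* Naive form: relies on wrapping, UB today, branch may be deleted. */
    int will_overflow_naive(int a) { return a + 5 < a; }

    /* Defined for all inputs: compare against the limit before adding. */
    int will_overflow_safe(int a) { return a > INT_MAX - 5; }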

------
beyondCritics
Getting rid of this useless (crap #!§$§$§$) legacy stuff was overdue, so I am
very happy to see it done. I personally think it is _the_ most important
proposal for C++20, since it will remove a lot of pointless pressure from
secure coding attempts and in turn make the world a little bit more secure.

~~~
sanxiyn
I don't see how. Integer overflows still can be security issues even if they
wrap.

~~~
johannes1234321
They can be security issues because the compiler is allowed to optimize stuff.
The compiler can determine that some checks the user added "don't make sense",
since those would only be hit if something undefined happens. An example is
shown in
[https://www.tripwire.com/state-of-security/vulnerability-management/compiler-undermining-secure-coding/](https://www.tripwire.com/state-of-security/vulnerability-management/compiler-undermining-secure-coding/),
but there are many more.

~~~
kazinator
No, they can be security issues in that the program expected x + 1 to be a
value bigger than x, but it is suddenly a big negative value.

The fact that this behavior is now blessed by ISO C makes no difference to it
being _wrong_ , and causing some security issue in the program.

------
fred256
I'm curious why some old architectures didn't use two's complement for signed
numbers. What advantage did one's complement or signed magnitude have over
two's complement?

~~~
0xcde4c3db
It can be useful to distinguish between positive and negative zero in some
cases, for example when dealing with values that have been rounded to zero or
limits approaching zero.

~~~
lopmotr
Was that ever really a reason for signed magnitude, or did people just make
use of the 2nd representation of zero because because it was available and
they couldn't be bothered putting that information in another variable or
using floating point or fixed point, or anything else that would have achieved
the same result?

~~~
userbinator
I have a feeling signed-magnitude predates binary and complement arithmetic
--- it is, after all, the "natural" way humans work with numbers. A lot of the
early non-binary computers used some form of sign-magnitude, all the way back
to punch card formats:

[https://en.wikipedia.org/wiki/Signed_overpunch](https://en.wikipedia.org/wiki/Signed_overpunch)

On the other hand (no pun intended), early mechanical (decimal) manual adding
machines made use of complement arithmetic too:

[https://en.wikipedia.org/wiki/Comptometer](https://en.wikipedia.org/wiki/Comptometer)

[https://en.wikipedia.org/wiki/Method_of_complements](https://en.wikipedia.org/wiki/Method_of_complements)

~~~
Animats
Burroughs 5xxx and 6xxx machines used signed-magnitude.

Burroughs had a unique numeric representation. Numbers were 48 bits. Sign,
sign of exponent, exponent, mantissa, with the binary point at the low end.
Integers were thus valid floating point numbers. The math operations would
maintain a value as an integer, with a zero exponent, if possible.

IEEE floating point also maintains integer values as integers until they don't
fit, but the representation is not integer-like.

------
microcolonel
The equivalent for C/WG14:
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm)

~~~
caf
That document has this listed as a Change:

 _Conversion from signed to unsigned is always well-defined: the result is the
unique value of the destination type that is congruent to the source integer
modulo 2ⁿ._

...but that's not a change - it's been the case in C all along.

------
beyondCritics
There seems to be an error in this proposal:

"Change Conversion from signed to unsigned is always well-defined: the result
is the unique value of the destination type that is congruent to the source
integer modulo 2N."

This is no change, since we have that already, e.g. see
[https://en.cppreference.com/w/cpp/language/implicit_conversi...](https://en.cppreference.com/w/cpp/language/implicit_conversion)
and the conversion operation on the bit pattern is the identity for two's
complement representation. The relevant section in the latest C++ standard is:
4.8 Integral conversions [conv.integral] 1 A prvalue of an integer type can be
converted to a prvalue of another integer type. ... 2 If the destination type
is unsigned, the resulting value is the least unsigned integer congruent to
the source integer (modulo 2 n where n is the number of bits used to represent
the unsigned type). [ Note: In a two’s complement representation, this
conversion is conceptual and there is no change in the bit pattern (if there
is no truncation). — end note ]

Therefore the inverse conversion exists and is the identity as well, this is
what should be sanctioned.
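
A minimal illustration (assuming 32-bit int on a two's complement machine):

    #include <stdio.h>

    int main(void) {
        int x = -1;
        unsigned u = (unsigned)x;   /* well-defined: congruent mod 2^32, so 0xFFFFFFFF */
        int back = (int)u;          /* currently implementation-defined; the identity
                                       on two's complement, which is what the
                                       proposal would guarantee */
        printf("%u %d\n", u, back); /* prints 4294967295 -1 */
        return 0;
    }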

------
FrozenVoid
FYI, you can prevent all of the non-two's-complement problems with `-fwrapv`,
which forces two's complement wrapping math in gcc/clang/icc.
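
A quick sketch of the difference it makes (file name is just for the example):

    /* gcc -O2 wrap.c         -> the check may be folded to "return 0"
       gcc -O2 -fwrapv wrap.c -> the check is kept and works as written */
    int will_overflow(int a) {
        return a + 1 < a;   /* true only for a == INT_MAX under -fwrapv */
    }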

------
fanf2
Also relevant is HAKMEM item 154 in which Bill Gosper concluded that the
universe is two’s complement:
[http://catb.org/jargon/html/H/HAKMEM.html](http://catb.org/jargon/html/H/HAKMEM.html)

------
jgtrosh
Quick question: in the proposed rewording of intro.execution¶8, why is the
rewriting “((a + b) + 32765)” not reintegrated at the end of the untouched
text? Have I misunderstood, or wouldn't this be legal with two's complement?

------
polthrowaway
have they considered introducing new types for wrapping integers, checked
integers and saturating integers. i understand why they might not want to make
a change that could have a large effect on existing programs. but if you
introduce new types then the new types will only effect new programs that
choose to use them and this seems to be something that could be a library
change than a language change.

------
kazinator
Requiring two's complement just means you can't have a sensible C language on
some sign-magnitude machine.

Even if nobody cares about such a machine, nothing is achieved other than
perhaps simplifying a spec.

A language spec can provide a more detailed two's complement model with
certain behaviors being defined that only make sense on two's complement
machines, without tossing other machines out the window.

There could be a separate spec for a detailed two's complement model. That
could be an independent document. (Analogy: IEEE floating-point.) Or it could
be an optional section in ISO C.

Two's complement has some nice properties, but isn't nice in other regards.
Multi-precision integer libraries tend to use sign-magnitude, for good
reasons.

What I suspect is going on here is that someone is unhappy with what is going
on in GCC development, and thinks ISO C is the suitable babysitting tool.
(Perhaps a reasonable assumption, if those people won't listen to anything
that doesn't come from ISO C.)

~~~
phkahler
>> Even if nobody cares about such a machine, nothing is achieved other than
perhaps simplifying a spec.

No, I use 16-bit values to represent angles in embedded systems all the time.
I routinely expect arithmetic on these values to roll over as 2's complement,
and I expect to take differences of angles using 2's complement all the time.
I'm fully aware that this is undefined behavior and needs to be verified on
each compiler/processor combination. It has always worked, and yet it's
undefined behavior. It would be nice for it to be defined. There are no modern
machines that would be impacted by this.

~~~
markrages
Yes, it is annoying to read comments that assume overflow is always a
programming error.

You can store the angle as a union of signed and unsigned type. Do arithmetic
on the unsigned member, where overflow is defined. (Both members represent
equivalent angles.)
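
A minimal sketch of that idea (type punning through a union is defined in C;
C++ is stricter about it):

    #include <stdint.h>

    /* 16-bit angle: 0x0000..0xFFFF maps onto one full turn. */
    typedef union {
        uint16_t u;  /* do arithmetic here: unsigned wrap-around is defined */
        int16_t  s;  /* read here when a signed difference is wanted */
    } angle16;

    /* Shortest signed difference between two angles. */
    int16_t angle_diff(angle16 a, angle16 b) {
        angle16 d;
        d.u = (uint16_t)(a.u - b.u);  /* wraps modulo 2^16 by definition */
        return d.s;                   /* reinterpret as signed */
    }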

~~~
sanxiyn
Not always, but overflows are usually bugs. This is a replicated finding of
dynamic overflow-checking tools, over and over.

~~~
kazinator
And if overflow is well-defined, then those tools must switch your C dialect
to a non-ISO-C conforming one in order to do their job.

