
Signed overflow in C - ingve
https://www.imperialviolet.org/2016/05/15/signedoverflow.html
======
nkurz
_If anything's messed up, then that would be my doing._

I thought at first that some of the words in the piece were distorted as a
joke about undefined behavior, but as I look at the source, I think something
is just broken with the custom fonts.

 _If signed overflow were defined, then 12 * i + 8 would have to have
correct overflow behaviour for 32-bit values and so this trick couldn't be
done._

I feel like this misses the difference between "undefined" and "implementation
defined". Despite the example given, those of us (like me) who complain about
compilers removing safety checks rarely claim that signed 32-bit ints must
have guaranteed behavior on overflow. Overflowing might be easiest, if that's
what the hardware does naturally, but it doesn't have to be defined to do
this.

In our minds, it would be just fine for the compiler to generate an "add"
instruction and let the registers fall where they may. What we object to is
the compiler's reasoning that since "undefined behavior can be presumed not to
occur, I'll just delete this spurious NULL check that the programmer
accidentally put here".

Granted, there are cases where a compiler can take proper advantage of such
optimizations, but it's emphatically not the case that the only alternative to
the current "nasal daemon" approach is disabling all loop optimizations.
Simply changing "undefined" to "implementation defined" would satisfy a
majority of the objections.

~~~
pcwalton
> Granted, there are cases where a compiler can take proper advantage of such
> optimizations, but it's emphatically not the case that the only alternative
> to the current "nasal daemon" approach is disabling all loop optimizations.
> Simply changing "undefined" to "implementation defined" would satisfy a
> majority of the objections.

No. Since a huge number of loop optimizations are dependent on trip count
detection, this is not in fact the case. Loop trip count detection is ruined
by making signed overflow implementation defined.

See this thread from just last week explaining this:
[https://news.ycombinator.com/item?id=11653940](https://news.ycombinator.com/item?id=11653940)

Additionally, there's less of a difference between "undefined" and
"implementation defined" than you might think. Consider the example in the
original post:

    
    
        if (x + y < x) {
            return 0; /* Overflow */
        }
        length = x + y;
    

With implementation defined overflow, the compiler is still free to remove
this check, because the value of "x + y" would become an undefined value on
overflow, and comparisons on undefined values are themselves undefined. If you
want the compiler to do anything else, then you're proposing a drastic
overhaul of (for example) LLVM's notion of "undef" [1], one that would have
massive consequences for optimization.

[1]: [http://llvm.org/docs/LangRef.html#undefined-values](http://llvm.org/docs/LangRef.html#undefined-values)

~~~
efaref
Am I the only one who thinks that writing

    
    
        if (x + y < x)
    

is utterly insane? It completely breaks the abstraction of algebra that's
being used in the language. Much better would be:

    
    
        if (ADDITION_WOULD_OVERFLOW_INT(x, y))
        {
           ...
        }
    

This is (a) far easier to read, and (b) more likely to be more correctly
implemented as something like:

    
    
        #define ADDITION_WOULD_OVERFLOW_INT(a, b)            \
            (((a) > 0 && (b) > 0 && (b) > INT_MAX - (a)) ||  \
             ((a) < 0 && (b) < 0 && (b) < INT_MIN - (a)))
    

For bonus points you could write this as part of the compiler support library
to derive the types and limits automatically, or even define it as part of the
compiler. Why do GCC/clang not have:

    
    
        __builtin_addition_would_overflow(a, b)
    
?

~~~
efaref
A comment below reminded me of these:
[https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html](https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html),
which are close, but have an insane API as they replace your arithmetic
operations with ugliness.

I think they can be fixed with sensible macros, though:

    
    
        #define ADDITION_WOULD_OVERFLOW(a, b) \
          ({ typeof((a) + (b)) __r; __builtin_add_overflow((a), (b), &__r); })
    

Hopefully the compiler would optimise:

    
    
        if (ADDITION_WOULD_OVERFLOW(a, b))
        {
           return 0;
        }
        return a + b;
    

to only do the addition once.

~~~
Kristine1975
How about:

    
    
      #define ADD_OR_DEFAULT(a, b, d) \
          ({ typeof((a) + (b)) r; __builtin_add_overflow((a), (b), &r) ? (d) : r; })
    

and then:

    
    
      return ADD_OR_DEFAULT(a, b, 0);
    

P.S.: Names beginning with two underscores are reserved for the implementation.

~~~
efaref
The problem with solutions like this is that they break the abstraction of
using algebraic notation for calculations. We're conditioned to expect
calculations to be written in algebraic notation, and indeed one of the
original selling points of C is that it lets you express calculations in that
form. To undo that is a massive shame.

The calculation itself absolutely MUST look like this:

    
    
        a + b
    

While maybe it's not too bad for simple addition, think of compound
expressions with multiple terms.

------
Kristine1975
Friendly reminder to use clang's and gcc's intrinsics to check for integer
overflow (signed or unsigned):

[https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html](https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html)

[http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins](http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins)

~~~
pascal_cuoq
On the other hand, using nonstandard extensions makes it harder to move away
from a compiler the direction of which you are not happy with, and can make
your situation worse in the long term.

~~~
Kristine1975
Just put the intrinsics into their own functions and re-implement those when
switching compilers.

------
pjc50
An interesting pet project for someone to attempt would be a language with
fully-specified arithmetic.

So rather than declaring vaguely that you want an "integer", you'd specify
bounds, signedness (two's or one's complement), and what happens on
overflowing the bounds (wrap / saturate / exception / trap / set flag / set
NaN, etc.)

Saturated arithmetic intrinsics are widely available but generally not used by
generated code. Similarly I've never seen an attempt at doing "integer NaN",
which could be quite useful in some situations.

You'd probably also want some flags to say "give me a compiler error or
warning if this code doesn't compile to intrinsics and is being emulated
instead".

~~~
vardump
Generic (compound) constraints could also be useful.

Like this:

(Value >= 0 && Value <= 1000 && (Value & 1) == 0) || Value in(value_enum)

This would limit Value to even numbers in [0 .. 1000], except when it's one
of the enum values (for error values).

~~~
pjc50
Hmm, I can see why that would be useful but that makes it much harder to
reason about in order to automatically eliminate checks wherever possible; you
can guarantee that adding a number in [0,100) to [0,100) will produce a number
in [0,200) without needing to apply checks, but one of your Values may
generate an error if you just increment it.

~~~
vardump
Compilers already reason about values like that. They can prove certain bits
in an integer to have certain values, etc., not just reason about ranges.
Actual checks could be optimized away pretty often.

Simple example: Say, you multiply a number by two. From that, the compiler
data flow analysis knows the lowest bit can only be zero. So it can safely
optimize away anything that checks that bit.

------
wnoise
> Given that the size of a Point is 12 bytes, the reference to the z
> members of points is turned into points + (12 * i + 8).

No, it's not. In the C abstract machine, arithmetic on pointers is not the
same as arithmetic on integers.

This really is "points + i", but on pointers to an object with sizeof 12, and
then a dereference with an offset of 8. These pointers _don't_ have the
overflow undefined behavior (but do have other undefined behavior if you have
a pointer outside a C object). As implemented in assembly, yes, it comes down
to multiplying by 12, but that's not a C int at that point. (And it can be
done with addressing trickery, as the article says.)

~~~
kevinnk
Perhaps someone who has a better grasp of the standard can correct me, but as
far as I can tell, your interpretation is correct and the given example is not
undefined behavior.

From 6.5.2.1 (Array Subscripting) of the C11 draft:

"A postfix expression followed by an expression in square brackets [] is a
subscripted designation of an element of an array object. The definition of
the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
Because of the conversion rules that apply to the binary + operator, if E1 is
an array object (equivalently, a pointer to the initial element of an array
object) and E2 is an integer, E1[E2] designates the E2-th element of E1
(counting from zero)."

From 6.5.6 (Additive Operators):

"When an expression that has integer type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the pointer
operand points to an element of an array object, and the array is large
enough, the result points to an element offset from the original element such
that the difference of the subscripts of the resulting and original array
elements equals the integer expression. [...] If both the pointer operand and
the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined."

To me, those sections together read that there should be no overflow as long
as i is smaller than the length of `points` + 1, and thus no undefined
behavior. Does anyone have another interpretation?

~~~
comex
Nope, you're correct. On the other hand, earlier today I read a post about a
pretty similar-seeming phenomenon (signed overflow UB reducing the number of
lea instructions that have to be generated):

[https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...](https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7)

Perhaps the author had heard that -fwrapv causes extra 'lea's but got mixed up
as to why. (TL;DR: it makes it harder for the compiler to prove that you're
not a sadist using _negative array indices_.)

------
quotemstr
Signed overflow optimization is just a cheap trick for avoiding the need to do
_real_ bounds analysis. Every single loop optimization possible by abusing
signed overflow is also possible with unsigned values (and thus
implementation-defined signed overflow) if you can prove your bounds.

I'm sick and tired of compiler authors whining about how we "need" undefined
signed overflow for optimization, particularly because I usually use unsigned
loop counters and so don't benefit from this optimization, but would benefit
from real, robust bounds analysis that worked for both signed and unsigned
types.

~~~
jjnoakes
GCC and clang are open source. Please contribute your patches.

------
akst
What's going on with the zalgo text? Is this an aesthetic choice, or is this a
legitimate rendering issue?

[http://i.imgur.com/tryjhMi.png](http://i.imgur.com/tryjhMi.png)

~~~
Kristine1975
Doesn't happen here (Firefox). Seems indeed like a rendering issue in your
browser.

~~~
akst
Hmm, must be Safari then.

------
johnp_
Fwiw here's the Rust side of things (also with reference to `lea`):

[https://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/](https://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/)

tl;dr: debug mode checks for overflow and panics when it happens; release mode
is defined to wrap as two's complement.

------
tempodox
Is there a request overflow? This page just won't load and gives no error,
either.

------
wnoise
For those wondering, it appears the author took it down:
[https://news.ycombinator.com/item?id=11714346](https://news.ycombinator.com/item?id=11714346)

------
inlined
Yes, (x - y < x) can't detect overflow because it overflows too, but what's
wrong with (UINT_MAX - x < y)?

~~~
Kristine1975
It seems to me that you're mixing (signed) int (x and y) and unsigned int
(UINT_MAX).

As for securely implementing overflow checks for signed addition, this seems
to be useful:
[http://blog.regehr.org/archives/1139](http://blog.regehr.org/archives/1139)

------
Sam_Harris
Not a link.

