
How Should You Write a Fast Integer Overflow Check? - cremno
http://blog.regehr.org/archives/1139
======
pbsd
We can take advantage of two's complement arithmetic, plus the knowledge that
the overflow flag is the xor of the carry into the sign bit with the carry out
of the sign bit, to devise a simple expression:

    
    
        int checked_add_n(int_t a, int_t b, int_t *rp) {
            uint_t s = (uint_t)a + (uint_t)b;
            *rp = s;
            return ((a ^ s)&(b ^ s)) >> (BITS - 1);
        }
    

This is essentially doing the same thing as checked_add_3. First, we extract
the carry into the sign bit with

    
    
        ci = s ^ a ^ b; // upper bit of ci contains the relevant bit
    

Carry out can be found via the usual majority expression:

    
    
        co = ci&a ^ ci&b ^ a&b;
    

Xoring the two, and simplifying using De Morgan's laws and their relatives
results in the expression above.
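
The simplification can be spelled out per bit: since s = a ^ b ^ ci, we have a ^ s = b ^ ci and b ^ s = a ^ ci, and expanding (b ^ ci) & (a ^ ci) over XOR gives a&b ^ ci&b ^ ci&a ^ ci, which is co ^ ci. As a sanity check, the sketch below compares the two forms at the sign bit for every pair of 8-bit operands. The function name is made up for illustration, and the 8-bit width is chosen only to keep the loop small.

```c
/* Returns 1 if the short sign-bit expression agrees with
   carry-in XOR carry-out for every pair of 8-bit operands, else 0. */
static int identity_holds_for_all_8bit(void) {
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            unsigned s  = (a + b) & 0xFFu;               /* wrapped 8-bit sum   */
            unsigned ci = s ^ a ^ b;                     /* carry into each bit */
            unsigned co = (ci & a) ^ (ci & b) ^ (a & b); /* carry out of each   */
            unsigned v_carry = ((ci ^ co) >> 7) & 1u;    /* overflow via carries */
            unsigned v_short = (((a ^ s) & (b ^ s)) >> 7) & 1u;
            if (v_carry != v_short)
                return 0;
        }
    }
    return 1;
}
```

Calling identity_holds_for_all_8bit() from a test should yield 1.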

Although the metric is far from optimal, clang gives me 6 arithmetic
instructions (minus the ret): [http://goo.gl/Amdo93](http://goo.gl/Amdo93)

~~~
torrent-of-ions
You can't take advantage of two's complement arithmetic if you want portable
C, though.
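
One way to stay within portable C is to test the operands against the limits before adding, so that no signed overflow is ever evaluated. The sketch below is only an illustration: the name checked_add_portable is hypothetical, and unlike the functions in the article it leaves *rp untouched when the sum would overflow.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a + b would overflow int.
   Otherwise stores the sum in *rp and returns 0.
   Only in-range arithmetic is performed, so nothing here relies on
   two's complement or on wraparound. */
static int checked_add_portable(int a, int b, int *rp) {
    if (b > 0 ? a > INT_MAX - b : a < INT_MIN - b)
        return 1;
    *rp = a + b;
    return 0;
}
```

The cost is a pair of comparisons and a branch, which is the kind of overhead the article is trying to measure.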

~~~
userbinator
Are there any machines in widespread use today that _don't_ use two's
complement? I guess if there were, we'd hear a lot more about this issue... I
know that some mainframes are an example, but they're somewhat of a niche.

~~~
nitrogen
Signed integer overflow is undefined in C, so optimizers can do really weird
things to code that has overflows.

~~~
anaphor
Yes, see: [http://people.csail.mit.edu/nickolai/papers/wang-undef.pdf](http://people.csail.mit.edu/nickolai/papers/wang-undef.pdf)

~~~
kps
And see John Regehr's ( [http://blog.regehr.org/](http://blog.regehr.org/), of
the headline article) other posts. He's written a great deal about undefined
behaviour in C/C++ and how modern compilers handle it. Check the ‘Compilers’
and ‘Software Correctness’ categories.

------
sltkr
I think "smallest number of instructions" is a pretty weird metric to use if
you are going for "fast" solutions.

checked_add_3() has a lot of instructions that can be executed in parallel.
checked_add_4() has a lot of conditional code which can be skipped. (Branching
might be expensive in that case, though on the other hand overflow is rare, so
these branches can probably be predicted accurately.)

I have little doubt that the assembly version is fastest in practice, but to
compare the other versions, they should be benchmarked.

------
professorTuring
EDIT 2: Editing...

EDIT 1: as spotted by @pbsd, this was not working correctly with some values...

I have added a way of checking the overflow and I have tested it:

    
    
      #define BITS (8*sizeof(int))
    
      int checked_add(int a, int b, int *rp)
      {
          *rp = a+b;
          
          return (a^b) < 0 ? 0 : ((a^(*rp)) < 0 ? 1 : 0);
      }
    

It is around 30% faster than the fastest in the blog and I believe the code is
way cleaner. What we are doing is checking whether all of the values have the
same sign (the "^" is just a "xor"): if so, there is no overflow; otherwise,
there is overflow.

This is the testing program (g++ compliant):

    
    
      #include <stdio.h>
      #include <time.h>
      #include <stdlib.h>
    
      #define BITS (8*sizeof(int))  /* full width, so BITS-1 is the sign bit */
    
      int checked_add2(int a, int b, int *rp) {
        uint ur = (uint)a + (uint)b;
        uint sr = ur >> (BITS-1);
        uint sa = (uint)a >> (BITS-1);
        uint sb = (uint)b >> (BITS-1);
        *rp = ur;
        return
          (sa && sb && !sr) ||
          (!sa && !sb && sr);
      }
    
    
      int checked_add(int a, int b, int *rp)
      {
          *rp = a+b;
          return ((a^b) | (a^(*rp)) < 0)  ? 1 : 0 ;
      }
    
    
      int main(int argc, char* argv[])
      {
          int a, b, c;
          long clong;
          srand(time(NULL));
    
    
          for(unsigned int i = 0; i < 50000000; ++i)
          {
              bool overflow = false;
              a = rand();
              b = rand();
              clong = (long)a + (long)b;
    
              overflow = checked_add(a,b,&c);
    
              if(clong != (long)c && !overflow)
                  printf("Overflow not detected %i + %i = %i \n", a, b, c);
    
          }
    
          return 0;
      }
    
    

Time for "checked_add": 2.694s

Time for "checked_add2": 3.200s

This is the dump (6 instructions so far):
[http://goo.gl/cnBKdS](http://goo.gl/cnBKdS)

~~~
pbsd
Your function is incorrect. Consider the case a = b = 0x80000000.

~~~
professorTuring
Ok, I think this should do the trick:

return((a ^ b) | (a ^ (*rp)) < 0) ? 1 : 0;

doesn't it?

~~~
pbsd
Not yet. Consider a = -1, b = 1. Due to the OR, any a and b with different
signs will return overflow.

~~~
professorTuring
Ok this is my dumbest time in life xD (I just directly didn't take into
consideration that part because it never has overflow...)...

~~~
barrkel
Most people don't appreciate how complicated C's mix of signed and unsigned
is, especially when undefined behaviour of signed overflow is included.

~~~
professorTuring
Definitely.

Child, this is why I always add unit tests.
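
A unit test for this kind of function can be exhaustive if the check is restated at 8 bits. The sketch below is an illustration under that assumption: it compares a candidate against a reference computed in a wider type. The names overflow_or8, overflow_and8 and count_mismatches are made up here, and the OR candidate follows pbsd's sign-bit reading of the expression above.

```c
#include <stdint.h>

/* 8-bit analogue of the OR expression, read as a sign-bit test. */
static int overflow_or8(int8_t a, int8_t b) {
    uint8_t ua = (uint8_t)a, ub = (uint8_t)b;
    uint8_t s = (uint8_t)(ua + ub);           /* wrapped 8-bit sum */
    return ((ua ^ ub) | (ua ^ s)) >> 7;
}

/* 8-bit analogue of pbsd's expression: overflow iff the sum's sign
   differs from the sign of both operands. */
static int overflow_and8(int8_t a, int8_t b) {
    uint8_t ua = (uint8_t)a, ub = (uint8_t)b;
    uint8_t s = (uint8_t)(ua + ub);
    return ((ua ^ s) & (ub ^ s)) >> 7;
}

/* Counts operand pairs where the candidate disagrees with a reference
   computed in plain int, which cannot overflow for 8-bit inputs. */
static int count_mismatches(int (*candidate)(int8_t, int8_t)) {
    int bad = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
        for (int b = INT8_MIN; b <= INT8_MAX; ++b) {
            int wide = a + b;
            int expected = (wide < INT8_MIN || wide > INT8_MAX);
            if (candidate((int8_t)a, (int8_t)b) != expected)
                ++bad;
        }
    }
    return bad;
}
```

Under this test the AND form produces no mismatches, while the OR form is flagged for mixed-sign inputs such as a = -1, b = 1.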

------
xiaq
It has always made me wonder why there is no direct way for checking for
overflow in most languages, since most CPUs _do_ have an overflow flag.
Dedicating an operator or keyword is probably too much, but at least it can be
a routine in stdlib that usually gets inlined.

I have also come to think "hey this perhaps could be a security hole" every
time it comes to unchecked overflow. Does anyone know such exploits?
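
For C specifically, GCC (from version 5) and Clang expose a compiler builtin that comes close to this idea, though it is an extension rather than part of the standard library. A minimal sketch, assuming one of those compilers; the wrapper name add_overflows_builtin is hypothetical:

```c
#include <stdbool.h>

/* Thin wrapper around the GCC/Clang builtin (a compiler extension).
   Returns true on overflow; *rp receives the wrapped result either way. */
static inline bool add_overflows_builtin(int a, int b, int *rp) {
    return __builtin_add_overflow(a, b, rp);
}
```

Because the compiler understands the builtin, it can inline it and pick the instruction sequence itself.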

~~~
tlarkworthy
The most amazing achievement of the computer software industry is its
continuing cancellation of the steady and staggering gains made by the
computer hardware industry.

    
    
            — Henry Petroski

------
solarexplorer
The assembly subroutine is not optimal because the compiler cannot inline it.
An asm expression [1] is much better.

Also, it is weird to write the result to memory just to avoid repeating the
addition. A memory operation is quite complex, but there are few things simpler
than adding two registers. A better abstraction would be to write a function
that returns the overflow bit and nothing else. The addition itself is
(almost) free anyway.

[1] [http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Extended-
Asm.htm...](http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Extended-Asm.html)
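
A rough sketch of what such an asm expression could look like, combined with the suggestion of returning only the overflow bit. It assumes GCC-style extended asm on x86 and falls back to a plain limit check elsewhere. The function name add_overflow_flag is hypothetical, and this is not the code from the article.

```c
#include <limits.h>

/* Returns 1 if a + b overflows int, 0 otherwise; the sum is discarded. */
static inline int add_overflow_flag(int a, int b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char of;
    __asm__("addl %2, %0\n\t"      /* a += b, sets the overflow flag */
            "seto %1"              /* of = overflow flag             */
            : "+r"(a), "=q"(of)
            : "r"(b)
            : "cc");
    return of;
#else
    /* Portable fallback: compare against the limits before adding. */
    return b > 0 ? a > INT_MAX - b : a < INT_MIN - b;
#endif
}
```

Because the function is static inline and touches only registers, the compiler is free to inline it at each call site, although the asm body itself stays opaque to the optimizer.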

------
jl6
The assembly version looks simpler and more maintainable than any of the C
versions. What's the overhead of using it? Having to rewrite for each
architecture you target (likely to be only 2 or 3)? If this level of
optimisation matters to you then I guess you'd be down in assembly anyway?

~~~
cokernel_hacker
Even if you could write perfect assembly, sticking it in your program makes it
more difficult for optimizations to flow across the assembly you have written
because it is opaque to the compiler.

Better yet, imagine that the compiler has performed value range propagation
[1] to determine that the range checks, after inlining, were redundant or
simply elidable. This would be infeasible if it were written in assembly.

[1] [http://llvm.org/devmtg/2007-05/05-Lewycky-Predsimplify.pdf](http://llvm.org/devmtg/2007-05/05-Lewycky-Predsimplify.pdf)

~~~
userbinator
If compilers don't understand inline Asm, maybe instead of avoiding it, why
not try improving the compiler so it can understand? I know this can't be done
if e.g. you use newer instructions that it doesn't know about and wouldn't
emit anyway, but in the case of faster, shorter sequences using existing
instructions and having an equivalent in C, I don't see why it can't
"decompile" the instructions you write and use that information for
optimisation. Even better if the compiler can _learn_ from it and use that
faster/shorter instruction sequence when it sees that expression in the
future.

 _Better yet, imagine that the compiler has performed value range propagation
[...] This would be infeasible if it were written in assembly._

VRP doesn't depend on the instruction set; it could be LLVM IR or it could be
x86 or ARM or 6502 or anything else. It's just looking at values and
operations on them.

~~~
ksk
Huh? ... decompile the optimized instructions to what? There is no "equivalent
C". That's the entire point. Otherwise someone would just write it in C and
the codegen would emit the optimized instructions.

~~~
erichocean
> Huh? ... decompile the optimized instructions to what?

To the compiler's own IR, whatever that is.

------
sehugg
The phrase "premature optimization" gets thrown around a lot, but it seems to
me we'd avoid this whole discussion (and benchmarking, and testing) if we
could safely and easily inline the assembly version of the function. After
all, a machine instruction is just a library function burnt into hardware.

Maybe a better term is "premature nonportability".

------
muraty
From `The CERT Oracle Secure Coding Standard for Java`

    
    
        public static int safeAdd(int l, int r) throws ArithmeticException {
            if (r > 0
                    ? l > Integer.MAX_VALUE - r
                    : l < Integer.MIN_VALUE - r)
                throw new ArithmeticException(String.format(
                        "Integer overflow: %d + %d", l, r));
            return l + r;
        }
    

[https://www.securecoding.cert.org/confluence/display/java/NU...](https://www.securecoding.cert.org/confluence/display/java/NUM00-J.+Detect+or+prevent+integer+overflow)

~~~
jerven
In Java 8 you should probably use the Exact methods in java.lang.Math (In
cases where the size is int or long and overflow errors need to be detected,
the methods addExact, subtractExact, multiplyExact, and toIntExact throw an
ArithmeticException when the results overflow)

------
haberman
As a similar but slightly different problem I wrote this article about testing
whether a value can be converted from one type to another without loss:
[http://blog.reverberate.org/2012/12/testing-for-integer-over...](http://blog.reverberate.org/2012/12/testing-for-integer-overflow-in-c-and-c.html?m=1)

------
jerven
I would have expected to see the "jo" instructions in the assembly. Does
anyone know why it was not used?

~~~
gsg
Because the function doesn't have anywhere to jump to. Instead it generates
zero-or-one from the overflow flag (with seto) and the caller will then do
something like test eax, eax/jnz handle_overflow.

In other words, the abstraction boundary introduced by the function call
mechanism gets in the way of using the overflow flag directly with jo.
Inlining would, presumably, re-expose the opportunity.
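
To illustrate the point about inlining, here is a sketch in which the check sits directly in the caller's control flow instead of behind a call boundary. It assumes the GCC/Clang builtin __builtin_add_overflow; the function name sum_checked is hypothetical. Whether the generated code actually branches with jo depends on the compiler and its flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sums n ints. Returns false as soon as an addition overflows,
   otherwise stores the total in *out and returns true. */
static inline bool sum_checked(const int *v, size_t n, int *out) {
    int acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* The overflow test feeds a branch directly, so no 0/1 value
           has to be materialised and re-tested by a caller. */
        if (__builtin_add_overflow(acc, v[i], &acc))
            return false;  /* rare path */
    }
    *out = acc;
    return true;
}
```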

~~~
jerven
That is another part I missed, the ASM implementation seemed to optimise the
function while potentially deoptimising the whole program.

For most code overflows are rare, jumping to deal with the rare cases but
otherwise just doing the normal is often best for whole program performance.

But to be honest I am not current in assembly as I like managed languages too
much.

------
Someone
For the referenced libo, the file overflow.c
([https://github.com/xiw/libo/blob/master/overflow.c](https://github.com/xiw/libo/blob/master/overflow.c))
surprised me: very short, and with a curious #include as its last line.

Anybody know the logic behind this?

------
toolslive
Some compilers (MLton, for example) have static analysis in place to skip the
check when no overflow can occur.

~~~
gsg
Note that MLton compiles a language which requires int arithmetic to raise an
exception on overflow, the overhead of which provides a strong motivation for
such analyses.

------
brianbarker
I saw Regehr and knew it had to be the U prof.

~~~
brianbarker
Lol, -3 for this? Must be some BYU fans up late...

