
Efficient Integer Overflow Checking in LLVM - rspivak
http://blog.regehr.org/archives/1384
======
vbezhenar
Efficient integer overflow should really be implemented in the processor. This
problem exists and often leads to security holes. In 99% of cases the developer
is not prepared to deal with overflows, so overflow shouldn't happen unless the
developer explicitly used some overflowing operator.

Unfortunately, almost all languages silently overflow by default. The only
exception I know of is C, where signed integer overflow is undefined behavior,
so it's legal to crash there. That's a sad state of things.

~~~
chrisseaton
> Efficient integer overflow should really be implemented in the processor.

But it already is. As a comment the article links to describes, add and then
jo is fused by the processor, so the jo causes no extra uops and we can assume
it's the same performance as a real add_and_jo instruction would be. The
article also says that the impact on the icache is insignificant so the
fact that it's two instructions rather than one until the fusing also doesn't
matter.

So what else did you want?

(Edit: spc476 points out that addressing mode arithmetic can't be used with
jo, so there is that)

~~~
caf
The fused uop also can't be executed on as many ports as a plain add.

------
Animats
Excellent! I've wanted compilers to optimize overflow checks and subscript
checks since my proof of correctness work decades ago. This was done in the
1980s for Pascal, but the technology was forgotten. It's finally going
mainstream.

The next optimization, which the parent article doesn't cover, is hoisting
checks out of loops. This is a huge win for subscript checks. Most newer
compilers optimize FOR loops, at least.

Having detected an error, what do you do? An exception would be nice, but few
languages other than Ada handle machine level exceptions well. Aborting the
whole program is drastic.

~~~
chrisseaton
Do you have a reference for the proof you did? I'm interested in optimising
overflow checks in Ruby.

~~~
Animats
The system: [1] The Nelson-Oppen decision procedure: [2]

There's a subset of arithmetic and logic that's completely decidable. It
includes addition, subtraction, multiplication by constants, "if", the theory
of arrays, a similar theory of structures, and the Boolean operators (on
bools, not integers.) This is enough for most subscript and overflow checks.
Complete solvers exist for this subset. If you try to prove all constraints
that need run-time checks with this approach, you can eliminate maybe 80-90%
of run time checks as unnecessary.

Some checks cannot be eliminated, but can be hoisted out of loops. This means
making one check before entering the loop, rather than one on each iteration.
FOR statements usually can be optimized in this way. That's almost a no-brainer
in modern compilers. The "Go" compiler does this. Heavier machinery is
needed for more complex loops.

[1]
[http://www.animats.com/papers/verifier/verifiermanual.pdf](http://www.animats.com/papers/verifier/verifiermanual.pdf)

[2]
[http://www.cs.cmu.edu/~joshuad/NelsonOppen.pdf](http://www.cs.cmu.edu/~joshuad/NelsonOppen.pdf)
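
To make the hoisting idea concrete, here is a minimal sketch in C. C has no built-in subscript checks, so they are written out by hand; the function names `fill_checked` and `fill_hoisted` are hypothetical, and this is not taken from the verifier or from any particular compiler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Naive form: one subscript check on every iteration. */
bool fill_checked(int *a, size_t len, size_t lo, size_t hi, int value) {
    for (size_t i = lo; i < hi; i++) {
        if (i >= len) {          /* check repeated inside the loop */
            return false;
        }
        a[i] = value;
    }
    return true;
}

/* Hoisted form: i only takes values in [lo, hi), so a single test of
   hi against len before the loop covers every access. */
bool fill_hoisted(int *a, size_t len, size_t lo, size_t hi, int value) {
    if (lo < hi && hi > len) {   /* single check, before the loop */
        return false;
    }
    for (size_t i = lo; i < hi; i++) {
        a[i] = value;            /* no per-iteration check needed */
    }
    return true;
}
```

The two are not strictly equivalent: on a bad range the naive version has already written some elements before it fails, while the hoisted one fails before touching the array. A compiler doing this automatically has to account for that difference, which is easy when a failed check simply aborts.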

~~~
dbaupp
_> The "Go" compiler does this_

Really? I was under the impression the Go compiler did only simplistic
optimisations, to get their wonderfully fast compile times. Do you have a
source (e.g. the source)?

~~~
Animats
That was from the developers mailing list. I think the only optimization
they're doing is for FOR statements, where you know the bounds on a loop
variable at loop start. This is a big win for loops that iterate over an array
and do something trivial, like setting the array elements to 0. Subscript
check performance is mostly an inner-loop problem.

~~~
dbaupp
Oh, I'm curious why you pointed out Go specifically, since that sounds like
fairly standard bounds check eliminations that pretty much all optimising
compilers can do (compilers built on both LLVM and GCC's infrastructure), and
most compilers aren't tied to specific looping constructs for it: it can work
with while loops, or for loops, or whatever.

And, my impression is that bounds checks are only avoided in Go with
range-based for loops (which are very similar to C++ or Rust iterators, which
avoid bounds checks in probably exactly the same way), not a loop over integers
with indexing.

------
greggman
I thought this was going to be about efficient overflow checking at a USER
level. I want to be able to write efficient code something like

    
    
        bool SafeAddInts(int a, int b, int* out) {
          if (willOverFlowAdd(a,b)) {
            return false;
          }
          *out = a + b;
          return true;
        }
    

Or better

    
    
        bool SafeAddInts(int a, int b, int* out) {
          int temp = a + b;
          if (magicCheckProcessorFlagsForOverflow()) {
            return false;
          }
          *out = temp;
          return true;
        }
    

Or something along those lines.

It's all great that the compiler can maybe generate code that crashes my
program when there's overflow, but what's the efficient way to handle it at a
user level, given that overflow, and therefore the result, is undefined in the
general case?
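
For what it's worth, the first version can be written in portable C today by testing the operands before adding. This is only a sketch: `willOverFlowAdd` is the hypothetical helper named above, filled in here as an assumption about what it would do, and it costs a compare and a subtract rather than reading a processor flag.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would not fit in an int.
   It only performs subtractions that cannot themselves overflow. */
bool willOverFlowAdd(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;  /* INT_MAX - b is in range for b > 0 */
    }
    return a < INT_MIN - b;      /* INT_MIN - b is in range for b <= 0 */
}

bool SafeAddInts(int a, int b, int *out) {
    if (willOverFlowAdd(a, b)) {
        return false;            /* *out is left untouched */
    }
    *out = a + b;                /* known to be in range at this point */
    return true;
}
```

Because the check happens before the addition, the undefined operation is never executed, so there is no undefined result to worry about.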

~~~
dbaupp
You may be interested in
[http://blog.regehr.org/archives/1139](http://blog.regehr.org/archives/1139)
by the same author, and also the (GNU) compiler built-ins for exactly that
task
[https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...](https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html).

(Note that your second example can't work in C/C++, since the check happens
after the operation.)

~~~
daurnimator
> (Note that your second example can't work in C/C++, since the check happens
> after the operation.)

I think the poster already knew that, hence the `magic`.

------
_chris_
I'm confused by SPECInt throwing overflows - did the blogger verify the output
matched the golden spec? I assume he did, but I'd really like verification.

With SPEC it's really easy to use a wrong compiler flag or make a change to an
expected int-width and end up with garbage outputs and be none the wiser.

~~~
pavpanchekha
The blogger in question is John Regehr, a very well known researcher in
efficient and sound compilation. I suppose I don't have his word, but I can
only imagine that he did make sure his modifications were safe.

------
Halienja
Take a look at this -
[http://clang.llvm.org/docs/LanguageExtensions.html#builtin-f...](http://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions)

