Efficient Integer Overflow Checking in LLVM

vbezhenar · on April 9, 2016

Efficient integer overflow should really be implemented in the processor. This problem exists and often leads to security holes. And developer in 99% cases is not prepared to deal with overflows, so it shouldn't happen unless developer explicitly used some overflowing operator.

Unfortunately almost all languages use overflow by default. I only know that C language has unspecified behavior for overflowing signed integers, so it's legal to crash here. That's a sad state of things.

spc476 · on April 9, 2016

The sad thing is---there are processors that can do that. The MIPS can trap on overflow (the ADD instruction will do that) but C compilers tend to use the non-trapping versions of said instructions (ADDU---add unsigned).

The VAX architecture (to go back some thirty-five years) can trap or not---there's a bit in the flags register that controls overflow traps.

On the x86 line, you can trap on overflow, but the check has to happen after each instruction. The obvious method (using INTO) is slow (http://boston.conman.org/2015/09/05.2) while using a conditional jump (JO) doesn't incur much overhead (http://boston.conman.org/2015/09/07.1) but it does lead to overall slower code because you have to use instructions that set the overflow flag. LEA (can be used to add and multiply) never sets the overflow flag, so instead of using a single instruction to, say, multiply by 16 and add 7, you have to use multiple instructions.

chrisseaton · on April 9, 2016

> Efficient integer overflow should really be implemented in the processor.

But it already is. As a comment the article links to describes, add and then jo is fused by the processor, so the jo causes no extra uops and we can assume it's the same performance as a real add_and_jo instruction would be. The article also says the that the impact on the icache is insignificant so the fact that it's two instructions rather than one until the fusing also doesn't matter.

So what else did you want?

(Edit: spc476 points out that addressing mode arithmetic can't be used with jo, so there is that)

caf · on April 10, 2016

The fused uop also can't be executed on as many ports as a plain add.

Animats · on April 9, 2016

VAXen did that. Integer overflow checking could be enabled on a per-function basis by setting the entry mask read by the CALL function at function entry.[1] However, you couldn't configure this for C's "unsigned overflow OK, signed overflow is an error" semantics. There was only one bit for enabling integer overflow detection. (You could also enable decimal overflow detection, in the unlikely event you were using the base 10 arithmetic instructions.) Overflow caused an exception at the machine level, which UNIX got as a signal.

I once rebuilt the UNIX tools on a VAX with the integer overflow detection bit set. About half of them still worked.

[1] http://odl.sysworks.biz/disk$axpdocmar981/opsys/vmsos71/ovms...

WalterBright · on April 10, 2016

It's one thing to use a jo (branch on overflow) instruction. But there's another problem:

    a = b + c + 47;

can be implemented as:

    LEA a, 47[b+c]

(a, b and c are assigned to registers)

This does not generate any trap or flag when it overflows, it just wraps. It's an important optimization, too.

_chris_ · on April 9, 2016

> Efficient integer overflow should really be implemented in the processor.

I disagree. It greatly adds to the complexity, verification, cost, and power for the HW to do this, which is especially onerous in that almost nobody cares or runs code that exercises it!

Instead, the real performance hit comes from the compiler being forced to serialize otherwise independent instruction streams. So why waste all that HW effort when you can pay it all in the SW when and only when you want to use it? The user probably won't even notice the difference between SW and HW overflow support on real workloads.

Animats · on April 9, 2016

Excellent! I've wanted compilers to optimize overflow checks and subscript checks since my proof of correctness work decades ago. This was done in the 1980s for Pascal, but the technology was forgotten. It's finally going mainstream.

The next optimization, which the parent article doesn't cover, is hoisting checks out of loops. This is a huge win for subscript checks. Most newer compilers optimize FOR loops, at least.

Having detected an error, what do you do? An exception would be nice, but few languages other than Ada handle machine level exceptions well. Aborting the whole program is drastic.

nikic · on April 9, 2016

Isn't value range inference for eliminating overflow / out-of-bounds checks, and LCM (/PRE) for hoisting checks out of loops pretty standard in any optimizing compiler nowadays?

chrisseaton · on April 9, 2016

Do you have a reference for the proof you did? I'm interested in optimising overflow checks in Ruby.

Animats · on April 9, 2016

The system: [1] The Nelson-Oppen decision procedure: [2]

There's a subset of of arithmetic and logic that's completely decidable. It includes addition, subtraction, multiplication by constants, "if", the theory of arrays, a similar theory of structures, and the Boolean operators (on bools, not integers.) This is enough for most subscript and overflow checks. Complete solvers exist for this subset. If you try to prove all constraints that need run-time checks with this approach, you can eliminate maybe 80-90% of run time checks as unnecessary.

Some checks cannot be eliminated, but can be hoisted out of loops. This means making one check before entering the loop, rather than one on each iteration. FOR statements usually can be optimized in this way. That's almost a no-brainer in modern compilers. The "Go" compiler does this. Heavier machinery is needed for more complex loops.

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf

[2] http://www.cs.cmu.edu/~joshuad/NelsonOppen.pdf

dbaupp · on April 10, 2016

> The "Go" compiler does this

Really? I was under the impression the Go compiler did only simplistic optimisations, to get their wonderfully fast compile times. Do you have a source (e.g. the source)?

Animats · on April 10, 2016

That was from the developers mailing list. I think the only optimization they're doing is for FOR statements, where you know the bounds on a loop variable at loop start. This is a big win for loops that iterate over an array and do something trivial, like setting the array elements to 0. Subscript check performance is mostly an inner-loop problem.

dbaupp · on April 10, 2016

Oh, I'm curious why you pointed out Go specifically, since that sounds like fairly standard bounds check eliminations that pretty much all optimising compilers can do (compilers built on both LLVM and GCC's infrastructure), and most compilers aren't tied to specific looping constructs for it: it can work with while loops, or for loops, or whatever.

And, my impression is that bounds checks are only avoided in Go with range-based for loops (which are very similar to C++ or Rust iterators, which avoid bounds checks in probably exactly the same way), not a loop over integers with indexing.

pcwalton · on April 10, 2016

I didn't see general LICM, but I suspect they'll add it eventually on the SSA backend.

greggman · on April 10, 2016

I thought this was going to be about efficient overflow checking at a USER level. I want to be able to write efficient code something like

    bool SafeAddInts(int a, int b, int* out) {
      if (willOverFlowAdd(a,b)) {
        return false;
      }
      *out = a + b;
      return true;
    }

Or better

    bool SafeAddInts(int a, int b, int* out) {
      int temp = a + b;
      if (magicCheckProcessorFlagsForOverflow()) {
        return false;
      }
      *out = temp;
      return true;
    }

Or something along those lines.

It's all great the compiler can maybe generate code that crashes my program there's overflow but what's the efficient case for handling it at a user level given that overflow is undefined in the general case given the result is undefined?

PeCaN · on April 10, 2016

GCC[1] and Clang[2] both support this with __builtin_add_overflow and family:

  int a = get_number_a(), b = get_number_b(), c;
  if (__builtin_add_overflow(a, b, &c))  // Note: Unlike your example, returns true iff overflow occurred
    uh_oh();
  else
    everything_is_ok();

These are quite efficient on many common architectures.

MSVC has SafeInt[3] which is more awkward but gets the job done.

1. https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...

2. http://clang.llvm.org/docs/LanguageExtensions.html#checked-a...

3. https://msdn.microsoft.com/en-us/library/dd570023.aspx

dbaupp · on April 10, 2016

You may be interested in http://blog.regehr.org/archives/1139 by the same author, and also the (GNU) compiler built-ins for exactly that task https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins... .

(Note that your second example can't work in C/C++, since the check happens after the operation.)

daurnimator · on April 10, 2016

> (Note that your second example can't work in C/C++, since the check happens after the operation.)

I think the poster already knew that, hence the `magic`.

chrisseaton · on April 10, 2016

That'll badly pollute your control flow graph though. You'll have a branch and a later merge for every SafeAddInts you do. Implicit handling of overflow with some kind of trap avoid this.

anon4 · on April 10, 2016

It's no worse than checking return codes though. And I really don't want integer overflow check to be turned on for all integers. I'm writing stuff that has to run, even if it occasionally produces a wrong answer. Having hidden checks inserted by the compiler that just crash the program would be really bad. Yes, I should check the inputs, but if I ever forget to check one, it's better that some ranges of input data produce nonsense results rather than crashes.

_chris_ · on April 9, 2016

I'm confused by SPECInt throwing overflows - did the blogger verify the output matched the golden spec? I assume he did, but I'd really like verification.

With SPEC it's really easy to use a wrong compiler flag or make a change to an expected int-width and end up with garbage outputs and be none the wiser.

pavpanchekha · on April 10, 2016

The blogger in question is John Regehr, a very well known researcher in efficient and sound compilation. I suppose I don't have his word, but I can only imagine that he did make sure his modifications were safe.

Halienja · on April 12, 2016

Take a look at this - http://clang.llvm.org/docs/LanguageExtensions.html#builtin-f...