Hacker News new | comments | show | ask | jobs | submit login
Fast integer overflow detection (2012) (kqueue.org)
64 points by luu on Dec 14, 2014 | hide | past | web | favorite | 45 comments



(This article may need a [2012])

GCC 5 has builtin, efficient overflow checks:

https://gcc.gnu.org/gcc-5/changes.html

A new set of built-in functions for arithmetics with overflow checking has been added: __builtin_add_overflow, __builtin_sub_overflow and __builtin_mul_overflow and for compatibility with clang also other variants.


Anyone knows if something like this is available for C++ when using Visual Studio? I just started experiencing issues with this a few days ago when porting my application to x64. Would be nice if it was possible to enable something in debug mode so that an error is triggered if an overflow occur


Look at valgrind?

Although I don't know if it works with VS.


Valgrind doesn't do integer overflow checking. In general, it's not even possible to add such checks via binary translation, as on two's complement systems most unsigned and signed operations are compiled to the same machine instruction.


Valgrind doesn't work on windows.


...Which only helps you if you discard compatibility, unless I'm mistaken.

The question remains of how to detect overflow on add in portable C code.


Detect on add? Not sure. Detect before add? Simple!

bool int_add_safe(int a, int b) { if (a >= 0 && b >= 0) return INT_MAX - a >= b; if (a < 0 && b < 0) return INT_MIN - a <= b; return true; }

If you ever have an instance that will result in an overflow you've got some kind of bug on your hand. Or at least some kind of circumstance that requires special handling. You don't want to overflow, not perform the add, or perform the add and discard the overflow. All three sucks and sweep a bug under the rug in some instance. Computers suck sometimes. :(


It's possible to simplify your code.

    bool int_add_safe(int a, int b) {
        if (b < 0) {
            return (a >= INT_MIN - b);
        } else {
            return (a <= INT_MAX - b);
        }
    }
Codegen comparison: http://goo.gl/Tgj7nu (If you switch to GCC it makes a mess of your version). You're right in that all portable solutions suck, but on x86 it will certainly be faster just to do the add and check the overflow flag.


Nice, with the conditional operator and a single return it's even terser:

    bool is_safe_to_add_ints( int a, int b ) {
        return b < 0 
               ? a >= INT_MIN - b 
               : a <= INT_MAX - b;
    }


Neither of those two compile down to what I imagine would be the best on x86, namely an add and then a check of the overflow flag.

Is there any portable version that compiles down to that on GCC or Clang on x86?


Once you have a function consistently used in your code the portable version can be used for the targets for which you don't provide an asm implementation, and the asm (or the solution customized to the compiler) can be selected implementation otherwise, covering the most common processors of today.


Alternatively, bug the compiler writers to make their compiler recognise that portable version. That way, you don't have to maintain all those special code paths.

Getting the compiler writers to help you is easier if your project is significant, or if you use a portable version that a big open source project uses.


My proposal costs the implementer just a few #if lines in one file. Your alternative costs much more people much more time, and the gain is then just... the implementer saves a few #ifs.


By that logic most HLLs shouldn't have been made.

Remember - write once run many times. Implementing it in GCC / etc once means that many people won't have to do it themselves - especially as it's far more likely that one implementation will be bug-free than many people writing their own.


It's not the feature that anybody would use as is, when you'd use it you'd prefer to call the function. But then what's behind the function is irrelevant for the users of the function. We don't force compiler authors to put in the compiler what can be nicely put in the libraries.

It's: write one header with a few #if's once, everybody use it, whenever she needs it.

You'd just call

    is_safe_to_add_ints( s, t[i] ) 
and be sure that on the common platform it's optimal thanks to the #if's in the header, and not write

    bool bsafe = (t[i] < 0) ? s >= INT_MIN - t[i] : s <= INT_MAX - t[i];
every time you'd want to check if the addition would be safe.


But you want

   if( !canSafelyAdd(a,b)) return ERROR;
   c = a + b;
To compile down to an add and a check of the overflow bit. For that, you need help from the compiler.

There are many examples where compilers special-case what is in a library. For example, compilers know they do not have to call strlen every time through this loop:

    for(int i = 0; i < strlen(s); ++i)


The enough help you need is, in the implementation of the library function #if X86 #if GCC etc and then the assembly for each case, otherwise the slow but standard implementation. And that's one file, the code never to be directly used but by function name. It's already done this way in the current libraries for a lot processor specific stuff.


Since the nature of overflow is basically completely undefined by the C standard the idea of detecting it with portable C code strikes me as an obvious oxymoron. You will have to sacrifice some kind of portability in order to detect it.


You have two fundamental misunderstandings here.

First, you misunderstand what I - and the article - mean by detecting overflow. To put it another way - you have two integer values, A and B, and you want to know if A + B is representable in a given (signed) integer type. If neither A nor B are of the longest supported type, it's easy, but if A or B are of the longest supported (signed) integer type on the system, it's much harder. That's what this article is talking about.

Say, for example, you're doing the Pythonic approach of storing integers as either an integer or a pointer to a bigint. Well, if you have two non-bigints, you need to know if the result will fit in a native word or if you're going to have to make a bigint and do the slow path.

Second, detecting if an operation would cause undefined behavior is something you can indeed do in valid portable C code, for some operations at least. Among other things, that's exactly what checked access for an array does!


"or if you're going to have to make a bigint and do the slow path."

You can cheat, sort of, and run your signedint code on a 64 bit processor and pretend you're running on a 63 bit processor and hit the slow path when you hit MAXINT for a 63 bit processor, not for a 64 bit processor. You can always add 63 bit signed ints on a 64 bit signed int capable processor.

Or maybe simpler explanation is there do exist 64 bit signed ints that can't be added and represented in a 64 bit signed register. But there do not exist any 32 bit signed ints that can't be added together and fit in a 64 bit register.

Its possible to play benchmarking games to make an unrealistic result that makes this look horrific. In reality under "normal" workloads it'll work just fine and the time you save by implementing artificially low limits on a slightly larger processor will save you a lot of debugging and writing and troubleshooting time.

"Well, yeah, this is a 64 bit proc, but to make life simpler we pretend its 1 sign bit and 62 int bits and anything bigger gets promoted to some arbitrary precision software library"

Another trick is not to use signed ints or at least minimize it, which is not as crazy as it may sound. If you can eliminate a whole class of arithmetic errors by changing program style...


Generally you'll be doing this anyways (as you'll be storing it with a bit to tag if it is a pointer).

However, it is... interesting... that a language that is so low-level does not have any simple construct for doing this, especially seeing how trivial it is in x86 assembly (add the numbers, then check the overflow flag).

And saying "just don't used signed arithmetic" misses the point. If a "general-purpose" programming language cannot do something that is trivial in the next layer down, and that occurs as commonly as wanting to do something if there would be an overflow, I am inclined to blame the language, not the coder.


> especially seeing how trivial it is in x86 assembly (add the numbers, then check the overflow flag)

I'm no expert, especially not on modern x86 processors or compilers for them, but I suspect it isn't so trivial as scattering such instructions may affects branch predictor tables and indirectly degrade performance in other parts of code.


You require one branch regardless, and the trivial version is one branch, i.e. already at the minimum amount of branches.


no, you can check your inflating operations before executing them.

But the idea that in 2014 this integer overflow detecting in C, is still a headline is frankly scary. We are all living a very fragile life.


Integer overflow is really a strange problem - it is no problem at all on the assembly level but then we abstracted away the details in our higher level languages for convenience or what ever only to realize that we now regularly shoot our feet. It is nothing more than a language design mistake.


Hm? In Lisp or Python, you don't have overflow issues because ints can't overflow. It's only a problem at the assembly/low level.

EDIT: The point of the parent comment is that integer overflow "isn't a problem at the assembly level, only in high level languages." But high level languages actually prevent the problem. You don't even have to worry about ints overflowing when you use Python or Lisp. You do have to worry if you're assembly or are using a lower level language like C. So the situation is actually opposite from what the parent comment describes.


No, it is a problem with languages if you want to have competitive performance or precision. On the assembly level it's trivial to check the OF. But compilers and the libm didn't care enough, and languages didn't know better, as old knowledge was forgotten or ignored.

The good old Lisp's used those assembly ops to detect the overflow flag for decades and promote to bigint afterwards, and bad languages like Perl, Python, Ruby (created much later then the good old languages) did it the slow way with multiple checks before and after and then promote to float and loose precision then. lua at least refused to deal with this crap and does everything with double and looses precision by definition.

Newer languages are now coming back to the old lisps and auto-promote to bigint, not double. But they haven't found out yet how to check the OF after the op, as only a few can compile natively or deal with inline assembly.


Arbitrary-precision arithmetic is probably not a general solution but it might indeed be worthwhile to consider using it more often and especially if performance is not a major concern.


Performance is not a major concern. Security and reliability trump performance considerations. In my experience, especially at large companies, unfounded "performance concerns" are the root of much evil.

This isn't true all of the time. HFT and gamedev, for example. But gamedev only uses C++ because everyone else uses C++, and HFT often uses languages other than C/C++ from what I've heard. Though the performance-critical sections are usually transformed into C.

Though as I write this, I think of Chromium and Firefox and realize that C++ was probably the correct choice for both. On the other hand, I hear Rust is making some pretty great strides in that context.


I would be interested in the impact of just replacing every fixed size integer with variable length ones. Plain integer array - gone. Single cycle math operations - gone. I really can't tell what the impact would be.


A sane implementation of bigints uses pointer tagging such that arithmetic on sufficiently small numbers (e.g. two 63-bit operands on a 64-bit machine) is still just a couple of cycles, with just an extra comparison and a (most likely optimally predicted) branch in the hot path.

Then, assuming the bigints are implemented in the core language, we could make the compiler optimize the bigint checks away when possible. For example, consider this hypothetical function:

    void memzero_range(char* array, int start, int end) {
        for(int i = start; i < end; i++)
             array[i] = 0;
    }
Now, what can we infer from the 'array[i] = 0;' in the body of the loop? Certainly, on a 64-bit machine it's not possible for i to go over 2^64 without experiencing undefined behaviour. So a smart (and conforming) compiler could just do if (is_bigint(start) || is_bigint(end)) abort(); once, and then execute the loop itself without any overhead from bigints.

Or well, that's at least my personal dream of what should happen.


HighER level than assembly is where the problem exists. You point out that it's even stranger. It doesn't exist at very low levels or at very high levels. But C has this problem anyway.


> It is nothing more than a language design mistake.

Well, Ada[0] had ranged integer types which allowed the compiler to determine whether overflow can occur statically. I don't know why modern languages like C#, Java, Go and Rust don't do this instead of taking the performance hit... in general no JIT will save you, and it's useful for things like formatted I/O and parsing as well

[0] https://en.wikipedia.org/wiki/Ada_%28programming_language%29...


I'm pretty sure Ada's ranged integer is tested dynamically, not statically. It's still a cool language feature. To get a static detection of integer overflow, I think that you need to use some variation of dependent typing.

So it's possible in Liquid Haskell. And of course, it's easily done in dependent-typed languages like Agda, Coq, and Idris.

Edit: I had earlier written Nim has this static range checking capability, but it looks like it also uses a runtime check.


Interoperability may be a factor. Everything that enters your world will tell you nothing about the possible ranges and you pretty quickly have to assume the worst case everywhere or add range limiting assertion early. It still sounds like a good idea but maybe it's harder to do than it seems.


Range checking early doesn't strike me as a bad thing at all.


Me neither. But it won't be for free and integer overflow is only an issue for a tiny fraction of all the code and developers are lazy...


It seems like a lot of problems in C/C++ is caused by "this code might be removed duration optimisation." Makes the whole environment undefined.

Why has nobody pushed for marking key sections of code with "don't optimise away" flags?

In C# you have this:

    [MethodImplAttribute(MethodImplOptions.NoOptimization)] 
    public static string GetCalendarName(Calendar cal)
    {
       return cal.ToString().Replace("System.Globalization.", "").
                 Replace("Calendar", "");
    }


In gcc there are some pragmas you can use to to this:

for instance `#pragma gcc optimize("O0")` should work.

https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Specif...

But the whole "this code might be removed during optimization" isn't exactly true. The compiler has to keep the meaning of your code the same. But once you introduce undefined behavior into the code, everything is meaningless from then on and gcc can do tons of wacky stuff.


>Why has nobody pushed for marking key sections of code with "don't optimise away" flags?

The issue isn't that certain code is optimized away. That's just a symptom of the fact that it's incorrect C code relying on the way a specific system treats undefined behavior. Checking for overflow can be done with correct C code that doesn't rely on undefined behavior, so putting in a hack that lets incorrect C work in some situations does not seem like the right answer.


In practice, there is no such thing as "no optimization" in C.

If you map every variable read to a memory read and every variable written to a memory store, your code will run incredibly slow. At the least, you will want to do register allocation and constant folding.

That's not so bad w.r.t. undefined behavior, but you also will want to do dead code elimination. If you don't, program size will blow up (think of macro definitions for DEBUG) That dead code elimination is what causes many of those portability issues.

In C, undefined behaviour is the price we pay for small code size and fast execution.

Also, are you sure C# doesn't optimize code marked DontOptimize? I'm asking because the documentation at http://msdn.microsoft.com/en-us/library/system.runtime.compi... says "The method is not optimized by the just-in-time (JIT) compiler or by native code generation (see Ngen.exe) when debugging possible code generation problems.", but doesn't tell what that last phrase means.


(Most) things that are undefined behavior are so for a reason.

In this case - you don't know if the system you're on even stores numbers as twos complement. And if it doesn't, it may not have the same behavior. So even the unoptimized version may fail.


I think in most cases, people writing code to check for overflow do know whether the system they're on stores integers as twos complement, and they'll have a pretty good idea what will happen when overflow occurs. A portable approach for this sort of thing would be very useful, but in the mean time, perhaps we could settle for compilers that don't strip out reasonable code?

I know, I know - I'm asking a lot.

My favourite two links on the subject: http://blog.metaobject.com/2014/04/cc-osmartass.html, http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasin...


The good thing is that this allows to optimize out in reasonable places too.


Chromium has implmenented a safe numerics library for overflow detection - see:

https://code.google.com/p/chromium/codesearch#chromium/src/b...




Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: