
How undefined signed overflow enables optimizations in GCC - ot
http://kristerw.blogspot.com/2016/02/how-undefined-signed-overflow-enables.html
======
pak
Wasn't there substantial controversy over gcc optimizing away certain code
that tried to check for integer overflows?

Check out this infamous thread:
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475)
...it gets pretty nasty!

The takeaway is that although overflow is technically undefined, a lot of
security-related code was naively implemented by checking for overflows
occurring in the most typical way (wraparound), and this check began to be
removed by this sort of optimization. This made some pragmatically minded
security folks quite angry, because although such behavior is technically
undefined by the spec, it was the most widely used method of defending
_against_ overflows, and recognizing that such code was already everywhere in
the wild, they feared all the vulnerabilities that would be introduced by gcc
dropping branches like `i > i + proposed_increment`.
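For reference, the check can be written without relying on wraparound at all, by testing the operands before the addition. A minimal sketch (the function name is mine):

```c
#include <limits.h>

/* Broken idiom: `a + b < a` relies on signed wraparound, which is UB,
 * so the compiler may delete the branch entirely.
 *
 * UB-free check: test the operands *before* adding. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;  /* a + b would exceed INT_MAX */
    else
        return a < INT_MIN - b;  /* a + b would fall below INT_MIN */
}
```

Both comparisons here stay within range, so no optimization is entitled to remove them.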

~~~
bluecalm
>>The takeaway is that although overflow is technically undefined

As they say: technically correct is the best kind of correct.

>>This made some pragmatically minded security folks quite angry

They sound very entitled to me. The C standard is clear on the issue, it's not
some arcane hidden thing, it's just a fundamental behavior of basic types in
the language. It's very annoying to read the entitled comments from a person
who is clearly in the wrong bashing the GCC crew.

>> it was the most widely used method of defending against overflows, and
recognizing that such code was already everywhere in the wild they feared all
the vulnerabilities that would be introduced by gcc dropping branches for i >
i + proposed_increment.

As mentioned by maintainers in the thread, there is already a flag to catch
signed overflows. They even mentioned a way to catch those overflows in the
code.

I want a compiler to be technically correct and do optimizations which are
possible as the whole point of writing things in C is for them to be fast and
efficient. It's nice that GCC has quite a big repertoire of flags, checks,
sanity checks, warnings, sanitizers etc. They are very helpful. Demanding that
the core compiler doesn't use the language specs to produce better code is
pathetic though.

It also didn't "break" any real-world things. You just need to use the proper
compiler flags to compile your non-conforming code if you want a newer version
of the compiler. Then thank the hard-working compiler maintainers for
providing those to you, as they could've spent their time on something more fun
than fixing shitty code for you.

/rant

~~~
userbinator
_The C standard is clear on the issue_

The problem is succinctly summarised by the saying "In theory, there is no
difference between theory and practice. In practice, there is." Unfortunately
what most programmers think of as "C" is subtly different from how the
standard defines it. In other words, perhaps we should fix compilers (and
eventually the standard) to match reality instead of the other way around.

The standard even states, in the section on undefined behaviour, that "behaving
during translation or program execution in a documented manner characteristic
of the environment" (exactly what C programmers are usually expecting from UB)
is a possible choice.

~~~
to3m
Yes. People are quick to point to the standard when it comes to compilers
doing something surprising, and quick to use this as evidence that programmers
are silly to expect the expected. But if compilers did exactly what these
programmers expect, they could point to the standard just the same...

(At least this article provides a good explanation of advantages to be had
from being aggressively pedantic about UB. Many by way of evidence just wave
their hands and claim people are stupid.)

~~~
stouset
If you don't go by the spec, what _do_ you go by? Intuition? What if yours is
different from someone else's?

In this case as well, acknowledging that signed integer overflow is undefined
behavior allows for _more_ consistent behavior in certain circumstances. For
instance, the compiler knows that `x < x + c` whenever `c > 0`, and can
optimize the comparison to `true`. Even if you don't care about the
optimization, is it not _equally_ surprising that such a statement could ever
be `false`?

No matter which way you go, _some_ behavior is going to be surprising. And in
these scenarios, the only plausible thing to do is to follow the standard, so
at least there's a consistent ideal that everyone's intuitions can converge
to, which is crucially important. Each compiler implementing a different
C-alike standard just leads to more confusion.
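To make that concrete: for unsigned operands the comparison really can be false, because wraparound is defined, while for signed operands the compiler is entitled to fold it to a constant. A small sketch (function names are mine):

```c
#include <limits.h>

/* Defined behaviour: unsigned arithmetic wraps modulo 2^N, so
 * x < x + 1 is genuinely false when x == UINT_MAX. */
int unsigned_lt_succ(unsigned x)
{
    return x < x + 1u;
}

/* For signed x, x < x + 1 cannot be false in any UB-free execution,
 * so a compiler may fold this whole function to `return 1;`. */
int signed_lt_succ(int x)
{
    return x < x + 1;  /* UB if x == INT_MAX */
}
```

Calling `signed_lt_succ(INT_MAX)` is itself UB, which is exactly why the fold is legal.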

~~~
userbinator
_If you don't go by the spec, what do you go by?_

To quote the standard, "a documented manner characteristic of the
environment."

If the machine is two's complement (which is the vast majority of them), then
you'd expect that behaviour. If it's sign-magnitude or one's complement or
saturating, then the behaviour would be different but still consistent.

_Even if you don't care about the optimization, is it not equally surprising
that such a statement could ever be `false`?_

Overflows wrapping around is not surprising, it's what decades of computing
hardware (or centuries if you include mechanical calculators like
[https://en.wikipedia.org/wiki/Pascal%27s_Calculator](https://en.wikipedia.org/wiki/Pascal%27s_Calculator)
) have always done.

~~~
stouset
> If the machine is two's complement (which is the vast majority of them),
> then you'd expect that behaviour. If it's sign-magnitude or one's complement
> or saturating, then the behaviour would be different but still consistent.

How is this different in practice from undefined behavior, other than the
compiler can't take advantage of it for optimizations?

------
eloff
Since this is likely to degenerate into a discussion on the merits and
pitfalls of optimizing around undefined behavior, how about a look at this
thread on mechanical-sympathy, where the topic was beaten to death just a
short while ago:

[https://groups.google.com/forum/#!topic/mechanical-
sympathy/...](https://groups.google.com/forum/#!topic/mechanical-
sympathy/jAqF8f0HdM0)

One of the juicier replies discussing some of the undefined behavior in C (by
Rajiv Kurian):

1\. A program in a hosted environment does not define a function named main
using one of the specified forms - not a compile time error, it's UB.

2\. The arguments to certain operators are such that could produce a negative
zero result, but the implementation does not support negative zeros.

3\. Two declarations of the same object or function specify types that are not
compatible - again not a compilation error, it's UB.

4\. An unmatched ' or " character is encountered on a logical source line
during tokenization. During tokenization!!

5\. An exceptional condition occurs during the evaluation of an expression -
how specific.

6\. An exceptional condition occurs during the evaluation of an expression -
hmmm

7\. A structure or union is defined as containing no named members - ookay.

8\. The value of an unnamed member of a structure or union is used - why is
there a compiler at all!

9\. The character sequence in an #include preprocessing directive does not
start with a letter - lol

10\. The program modifies the string pointed to by the value returned by the
setlocale function - dear god.

11\. The string set up by the getenv or strerror function is modified by the
program .

12\. The array being searched by the bsearch function does not have its
elements in proper order - not a logical error just straight up UB.

This is the tip of the iceberg. How confident is anyone now that their
program doesn't invoke UB? Things many programmers would assume are compiler
errors are actually UB. What would happen if gcc/clang exploited each one of
these?
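For what it's worth, #12 bites people because bsearch's contract only promises anything for input sorted by the same comparator. A minimal sketch of the defined usage (names are mine):

```c
#include <stdlib.h>

/* Comparator returning <0, 0, >0 without overflow risk. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* bsearch is only defined on an array sorted by the same comparator;
 * searching an unsorted array is UB, not merely a wrong answer. */
int *find_int(int key, int *sorted, size_t n)
{
    return bsearch(&key, sorted, n, sizeof *sorted, cmp_int);
}
```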

~~~
rdc12
12\. Well, breaking the invariants of a function is never a good idea, but
defining the behaviour in that case would put substantial restrictions on
how the function could be implemented (and it would still almost certainly do
something weird later anyway)

~~~
umanwizard
It could be implementation-defined without being UB.

------
barrkel
This is a list of optimization patterns; it's not an analysis of how much
actual code gets sped up, and it doesn't consider the accommodations that
would be made in the alternative universe, where defined signed overflow is
the default and there are other tools for when this impacts performance too
much.

The question is whether the alternative universe is better than this one,
where you have to work harder to protect against unwanted overflow - and
especially against the compiler eliminating your efforts to test whether it
occurred.

~~~
joosters
The command line switch '-fno-strict-overflow' will grant you access to this
alternative universe.

~~~
barrkel
The alternative universe includes a different C++ standard and potentially
additional language constructs or types to opt out. Different defaults have
cultural implications far beyond the mechanics of any single compile.

------
quotemstr
Any range information the compiler intuits from signed values not overflowing
could, in principle, be given explicitly with the following code:

    
    
      if (foo > 1000) __builtin_unreachable();
    

Or macros that make that kind of thing look prettier. I prefer using unsigned
values everywhere for both safety and semantics. (It's not possible to have a
negative number of bananas, so why should I use a type that admits a value of
-3 bananas?)

~~~
ot
Clang also has a more explicit __builtin_assume(expression) builtin, although
anecdotally GCC makes better use of these hints. Folly has a function to make
it portable:

[https://github.com/facebook/folly/blob/bb5ed8070d533c016e1e9...](https://github.com/facebook/folly/blob/bb5ed8070d533c016e1e93cd274e76ce28a780bb/folly/Assume.h#L35)
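The portable wrapper is short enough to sketch inline. Something in the spirit of Folly's helper (this is the general shape, not Folly's actual code) might look like:

```c
/* Sketch of a portable "assume" hint, in the spirit of folly's
 * assume(); not Folly's actual code. */
#if defined(__clang__)
#  define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)  /* no-op on unknown compilers */
#endif

int clamp_index(int i)
{
    ASSUME(i >= 0);   /* tells the optimizer i is non-negative... */
    return i % 16;    /* ...so this may now compile to i & 15 */
}
```

If the assumed condition is ever false at runtime, behaviour is undefined, so these hints deserve the same caution as any other UB.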

------
nullc
It doesn't say that much about the loop optimizations--

The cases where I've seen the largest gains come from loops where analysis
shows that the loop runs, say, exactly 8 times or forever (due to overflow).
Knowing it runs 8 times exactly lets the loop get unrolled, which then lets
vectorization work... and then some inner loop runs more than 3x faster.
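A sketch of the kind of loop in question (example is mine): because signed overflow is UB, the compiler may assume the counter never wraps, which is what lets it prove an exact trip count even with a `<=` bound.

```c
/* Because signed overflow is UB, the compiler may assume `i` never
 * wraps, so it can compute the trip count of this loop as n - lo + 1
 * even though the bound is `i <= n`. With defined wraparound and
 * n == INT_MAX, the loop would never terminate, which blocks the
 * trip-count analysis that enables unrolling and vectorization. */
long sum_range(const int *a, int lo, int n)
{
    long s = 0;
    for (int i = lo; i <= n; i++)
        s += a[i];
    return s;
}
```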

~~~
chrisseaton
In languages with GCs or managed in some other way, knowing the upper bound of
loop iterations is also important because it allows you to remove the check-
if-we-need-to-stop-the-world operation from the loop.

This applies even if the loop isn't unrolled. For example if I see a loop that
definitely doesn't iterate more than 2^32 times (because it doesn't overflow)
and I can see it only has a few simple instructions in it, then I may decide
to omit the check from the loop because I know it can't take that long.

~~~
barrkel
In Java specifically, I've gotten good results by following this technique:

\- start with an empty routine for the main loop to be optimized

\- add the simplest loop you can that iterates over your source data, written
such that has a simple obvious bounds check

\- test the loop in a microbenchmark harness and look at the generated
assembly if necessary to verify that bounds checks get hoisted out, the loop
is being unrolled, etc.

\- slowly add more code and continuously verify that assumptions are met

\- if the loop body gets too complex and the JVM stops optimizing it properly,
try to split the loop into multiple separate (possibly nested) loops that are
in their own functions, and repeat the process

\- when finished, experiment with inlining loops again

Coming at it from the other angle - poking at a complex loop until the
generated code is decent - is much more frustrating. E.g. manually unrolling
loops is fairly pointless: you really want the JVM to unroll the loops,
because it will do so while eliminating bounds checks in ways you can't
replicate yourself. Start with the JVM producing good fast code, and keep it
that way as the complexity grows.

~~~
chrisseaton
Yes I understand that the java.util.concurrent code includes a lot of loops
that look unusual, and in fact they're carefully constructed in such a way as
to just avoid the poll (the check-if-we-need-to-stop-the-world part) for
inner-loop concurrency operations.

------
MichaelBurge
I never see anyone do this, but it doesn't seem like a bad idea to run a
program through a C interpreter and see whether it ever triggers undefined
behavior. As I understand it, the C standard is written so that all undefined
behavior can be caught in this manner. I could even see something like gcc
shipping a 'create an interpreted executable for debugging purposes' flag.

It wouldn't constitute a formal proof or anything, but it would solve the
problem of people leaving overflow and null pointer checks in plain sight
expecting them to be hit.

~~~
throwaway2048
Undefined behaviour isn't a set of checks the compiler goes through before
deciding to call invoke_nasal_demons(), which you could then just replace with
yell_loudly_about_UB(). Undefined behaviour is a set of conditions that the
standard says the compiler can pretend never occur, so certain results are
easier to obtain because UB is never checked for and is assumed not to have
occurred.

UB is fundamentally not a well-defined condition, because it is the absence of
knowledge about the state of a program, and many UB conditions are very, very
hard to detect, if not outright impossible (cf. Turing), which is why the
standard says the compiler doesn't have to care in the first place.

Its very nature means that detecting UB in anything approaching a reliable way
is impossible; the C standard is not remotely compatible with such an idea.

------
brianberns
I wonder how often actual code uses these patterns. Is there anyone who would
write (-x)/(-y) instead of x/y?

Also, some of these patterns claim to eliminate something, but they actually
just move it around. For example, -(x/y) -> (-x)/y doesn't eliminate negation,
so it's not clear to me that the latter form is faster than the former.

~~~
Ono-Sendai
It doesn't really matter if someone would write such code by hand. Code can be
transformed into that pattern by compiler transformations such as inlining,
CSE, constant folding, etc.

------
cbsmith
This kind of makes a great case for why signed/unsigned ought to be a feature
independent of overflow.

------
Kenji
So, does that mean we should use unsigned integers whenever possible, such
that we have defined overflows and still benefit from optimizations?

~~~
lmm
No. You should use a language better aligned to the processor - one in which
you can express a "loop exactly n times" construct directly, without having to
add a ceremonial iteration variable.

~~~
chrisseaton
> one in which you can express a "loop exactly n times" construct directly

The problem with this advice is that there are so many things that the
processor may be interested in knowing declaratively, rather than having to
claw them back from the semantics of your application somehow, that the
problem has become totally unmanageable. And even if we could manage it, the
next generation of processor microarchitectures may be completely
uninterested in what we've carefully encoded into our programs and actually
want some other information instead.

~~~
1amzave
I don't think the problems with that advice are singular. Perhaps the most
relevant is that looping exactly N times without a "ceremonial" counter is
only very rarely useful. Truly, how often do you see the canonical `for (i =
0; i < N; i++)` pattern in C with _no_ references to `i` anywhere in the loop
body? (Hint: if there are any, it's not "ceremonial".) Blindly repeating the
_exact_ same sequence of code just isn't something you want to do very often.

To address the issue of "okay, in what languages _could_ we do this anyway?":
taken in full mathematical generality, such a capability would necessarily
require bignum support, which cuts down your options significantly. I'm not
sure if that was actually the intent, but if we instead accept the limitation
of "up to some largeish power of two", constructing a macro to do this
"directly" in C would also be quite easy. So... _(shrug)_

~~~
lmm
I agree that it's rare, but aren't those exactly the cases where this kind of
optimization is relevant? In the cases where you're using i and it could
overflow, the optimization couldn't be applied anyway.

------
ssalazar
EDIT: I see - a different reading of the post than my original take would
suggest the author is saying that you, as a C/C++ programmer, are "not
allowed" to write code that would cause under/overflow. The compiler assumes
that you are following this rule, enabling these optimizations. The author's
wording is unclear as to who or what is not being allowed to under/overflow,
hence my confusion.

\---

Right off the bat-

> Signed integers are not allowed to wrap in C and C++

What? This is...not true.

    
    
      $ cat > test.c
      #include <stdio.h>
      #include <limits.h>
      
      int main()
      {
          int i = INT_MAX;
          printf("%i\n%i\n", i, i+1);
      }
      
      $ cc test.c 
      $ ./a.out 
      2147483647
      -2147483648
    

I'm not an expert on these matters, but I suspect signed integers are _not
defined as wrapping_ and in fact are not defined as doing anything in
particular in under/overflow situations, though under one common convention,
two's complement integer arithmetic, they often do. The optimizations rely on
the fact that the programmer can't depend on any specific overflow or
underflow behavior.

~~~
hyperpape
You are incorrect, or perhaps correct in a nitpicky way[0]. It is undefined
behavior, as any Google search can tell you. Your compiler is just being nice:
it is allowed to print "screw you" as the result of that program.

For more about undefined behavior, try here:
[http://blog.llvm.org/2011/05/what-every-c-programmer-
should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-
know.html?m=1)

[0] Since it's undefined behavior, the compiler can choose to wrap it. If
that's what you're saying, I think it's pretty confusing expressed the way
you did.

In spite of that, a well-defined C program is not allowed to overflow, and a
compiler can compile a program that overflows in absolutely any way it chooses
to. It is in fact able to optimize the program based on the assumption that
the overflow cannot happen, even if it is actually possible.

~~~
geofft
I think 'ssalazar is interpreting the phrase "not allowed to wrap" to mean "if
the C compiler causes them to wrap, it is violating the spec". Undefined
behavior means _anything_ is allowed; this includes reasonable behavior.

(Remember that undefined behavior exists for the purpose of allowing
optimizations. Adding code to print "screw you" is rarely an optimization.
Doing just what the programmer expected is _usually_ an optimization, which is
why UB is so tricky; it's just not always an optimization, and doing something
unexpected might be faster.)

~~~
hyperpape
You're right. I added a note when I thought of this, and if that's what the
parent meant, I think it's an exceedingly unhelpful addition.

------
quotemstr
So compilers use a tricky and unintuitive corner of the C standard as a half-
assed substitute for doing real program-wide bounds analysis.

~~~
bluecalm
It's not a tricky corner of C. It's the most basic C thing ever if you are
not negligent and actually check the standard for what happens when you
overflow a signed int.

I mean, the C standard isn't that long nor complicated. There aren't many
basic types in C; learning how they behave in basic situations like division
by 0, overflow, etc. is an hour-long task.
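For the record, the list for basic signed `int` arithmetic really is short: division or remainder by zero, `INT_MIN / -1` (the quotient is unrepresentable), and any `+`, `-`, or `*` whose result leaves `[INT_MIN, INT_MAX]`. A predicate for the division case, as a sketch (the name is mine):

```c
#include <limits.h>

/* Returns 1 iff a / b (and a % b) is defined for int operands:
 * b must be nonzero, and INT_MIN / -1 is excluded because the
 * mathematical result, -INT_MIN, does not fit in an int on
 * two's-complement machines. */
int div_is_defined(int a, int b)
{
    return b != 0 && !(a == INT_MIN && b == -1);
}
```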

~~~
eloff
May I refer you to the 13 pages of section J.2, "Undefined behavior", of the C
standard[1]. I hope you don't have any other appointments after your
"hour-long" task...

[1][http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1256.pdf](http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)

~~~
bluecalm
We are talking about the basic operations on basic types in C, not all
possible undefined behaviors. You don't need to learn a list of possible
undefined behaviors. You just need to check how your basic types behave when
you try to store numbers too big for the number of bits they have. It's
something you learn in like first few hours of your introductory C course.

~~~
pjmlp
Assuming you aren't targeting multiple systems, each with its own C compiler.

