

Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior - ColinWright
http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf

======
osivertsson
More info: [https://css.csail.mit.edu/stack](https://css.csail.mit.edu/stack)

Code: [https://github.com/xiw/stack](https://github.com/xiw/stack)

------
octo_t
Simple solution: remove the undefined behaviour from your codebase.

~~~
DanWaterworth
Better solution: don't use languages that allow undefined behaviour.

~~~
wslh
Do you mean Haskell? Because its compiler is very advanced and does complex
optimizations.

~~~
tel
It's semantically well-defined, though. The compiler will never change the
value of your result, it will only discover more efficient ways to compute it.

------
Too
C is deceptively simple; I wonder how many people know about the strict
aliasing rules, for example. To verify a C program you should really 1. do
static analysis, preferably using many different tools, 2. test on different
optimization levels, 3. test on different compilers.

------
bowyakka
I posted this a while ago
[https://news.ycombinator.com/item?id=6230503](https://news.ycombinator.com/item?id=6230503)

It still intrigues me that this happens. I do wonder if it is limited to
languages like C, or if it could conceivably happen in anything with an
advanced runtime (think the Porter stemmer bug that broke Java 7 on release).

Maybe we can introduce this to jit-spraying?
([http://dsecrg.com/files/pub/pdf/HITB%20-%20JIT-Spray%20Attacks%20and%20Advanced%20Shellcode.pdf](http://dsecrg.com/files/pub/pdf/HITB%20-%20JIT-Spray%20Attacks%20and%20Advanced%20Shellcode.pdf))

------
mpweiher
Seems to be a problem of language-lawyers coding compiler optimizations. Stop
doing that. See also "Doctor, it hurts when I move my arm this way".

~~~
mikeash
Not at all. The problem is that there are useful, effective optimizations
allowed by the language spec which _also_ have unexpected consequences on
subtly incorrect code.

Take an oversized shift, for example. The result is undefined, and for good
reason. This allows an expression like x << y to be translated directly to a
single machine instruction on just about every architecture. If there was a
defined result for oversized shifts, then many architectures' instructions
would not match it, forcing the compiler to generate inefficient code on those
architectures.

For every example of a weird optimization causing problems, you can find an
example of that same optimization making code go much faster.

You could say that compilers should value correctness over speed, and I would
agree with you, _but:_ all of these optimizations _are_ correct. These are C
compilers, and they compile C programs just fine. Where they have trouble is
on programs that aren't actually valid C, but are just close. You can argue
that they are being technically correct in a bad way, but then if you abandon
the standard and decide that you should "correctly" compile a certain class of
invalid programs, how do you decide what "correct" means?

~~~
cliffbean
Ordinarily, unsigned left shift silently shifts bits off the end.

    
    
      0xffffffff <<  0 is 0xffffffff
      0xffffffff <<  1 is 0xfffffffe
      ...
      0xffffffff << 30 is 0xc0000000
      0xffffffff << 31 is 0x80000000
    

It is defined to ignore overflow. Therefore, I claim that there's enough
consistency here that we can extend the pattern to this:

    
    
      0xffffffff << 32 is 0x00000000
    

I claim this really is the mathematically correct answer. It's consistent with
the most obvious pattern, and there are no other relevant patterns.

C has chosen to call this undefined behavior instead of returning the correct
answer here, and I call that choosing optimization over correctness.

~~~
joosters
Are you so sure that all architectures that implement a left shift operator
work the same way? For example, if you shift left by 'x', they might shift
left by 'x modulo 32'.

With C, you could still use the shift instruction on that architecture. With
your 'correct answer', the compiler would have to output instructions to check
the value of x prior to the shift.
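To make that cost concrete, here is a sketch (the function name is made up) of the guarded shift a compiler would have to emit if oversized shifts were defined to produce zero, as cliffbean proposes above:

```c
#include <stdint.h>

/* Hypothetical "defined" left shift: returns 0 for counts >= the width,
   extending the pattern 0xffffffff << 31 == 0x80000000 one step further.
   On hardware whose shift instruction masks the count mod 32, the branch
   (or a conditional move) is extra work on every shift whose count is not
   a compile-time constant. */
static uint32_t shl32_checked(uint32_t x, unsigned n) {
    return n < 32 ? x << n : 0;
}
```

With the current C rules the compiler can emit the bare shift instruction and let each architecture do whatever it does for counts of 32 or more.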

Besides which, shifting is just one specific case. C has an easily
understandable and documented behavior: overflow is _undefined_, no matter
how you achieve it. We don't have to go into horrific detail about what
happens when you overflow with adds, multiplies, shifts and so on.

~~~
mikeash
On some architectures it's the way he wants it (always zero on overflow), on
some it's the way you suggest (shift is mod 32), and on some it just generates
junk.

Bitshifting is an operation that people generally expect to be extremely fast.
Adding a check on every single operation generated by a C compiler just to
guarantee identical results upon overflow probably would not go over well.

