
Will It Optimize? - DallaRosa
http://ridiculousfish.com/blog/archives/2010/07/23/will-it-optimize/index.html
======
udp
Previous submission and discussion:
<http://news.ycombinator.com/item?id=1540567>

------
Roritharr
I remember learning about compiler optimization at university. Sadly, all my
colleagues knew (and were taught) was Java, and the Sun JVM we used at the
time didn't even do most of the stuff we learned, not even tail-call
optimization, therefore the opinion of most of the students was that the
professor should stop babbling about things that have no application in real
life.

This was the one time I was on the side of academia...

~~~
iam
Thankfully since then the HotSpot JVM has become one of the most advanced VMs
in terms of JIT optimization, doing things that are so wild and crazy that
other JITs can only dream of.

------
ajross
Quibbles: the strlen() example is poorly explained. It doesn't hoist it
because it's "built-in", it hoists it because strlen() is marked "pure" (i.e.
without side effects) in the declaration. Any function can get that attribute.
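
For reference, a minimal sketch of how any user-defined function can opt in (the helper name here is made up, not from the article):

```c
#include <string.h>

/* A user-defined function marked pure: its result depends only on its
 * arguments and readable memory, and it has no side effects. GCC is then
 * free to hoist calls out of loops, just as it does with strlen(). */
__attribute__((pure))
static size_t my_len(const char *s) {
    return strlen(s);
}

unsigned sum(const char *s) {
    unsigned result = 0;
    /* With -O2, GCC can evaluate my_len(s) once before the loop. */
    for (size_t i = 0; i < my_len(s); i++)
        result += (unsigned char)s[i];
    return result;
}
```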

~~~
wzdd
There must be more to it than that. For example, if something in the loop body
writes to s, then the result of "strlen" could change and the optimisation
will be wrong.

So, it must know a bit about what strlen() does, mustn't it -- otherwise it
wouldn't know that it has to ensure that \0 isn't written to s inside the loop
body.

~~~
saucetenuto
GCC can statically prove that `s` isn't changed at all in the loop body, just
accessed. Thus the analysis goes: `strlen()` is pure, and its argument never
changes, so its result never changes, so I can hoist it.

~~~
ssp
No, there is more to it than that. It has to perform an aliasing analysis to
prove that nothing that s _points to_ changes. Consider this code:

    
    
        #include <string.h>
    
        unsigned sum(const char *s, char *t) {
            unsigned result = 0;
            for (size_t i = 0; i < strlen(s); i++) {
                result += (unsigned char)s[i];
                *t = 'x';
            }
            return result;
        }
    

gcc can't hoist strlen() in this case, because *t = 'x' could potentially
overwrite the null termination of s.
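
As an aside, C99's `restrict` is one way to hand the compiler exactly that aliasing guarantee. A sketch of my own (not from the article) where the caller promises s and t never overlap:

```c
#include <string.h>

/* restrict promises that s and t never alias, so the write through t
 * cannot clobber s's null terminator; GCC may then hoist strlen() again.
 * Passing overlapping pointers would be undefined behavior. */
unsigned sum(const char *restrict s, char *restrict t) {
    unsigned result = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        result += (unsigned char)s[i];
        *t = 'x';
    }
    return result;
}
```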

~~~
sausagefeet
Can GCC handle more complicated loops? For example, what if I go several
functions deep in my loop with s? What if I call a function in a shared
object? Does GCC do this analysis for all cases where a function is marked
pure, or is strlen a special case?

~~~
ajross
The compiler can't see inside functions in the general case (because they're
opaque binary blobs in a library that hasn't been linked yet), so no. It's
limited to optimizing stuff it can "see", which broadly means the current
function and anything inlinable into it.

But yes, if all your functions are marked inline and visible to the compiler,
there's no theoretical limit to GCC's ability to hoist constants out of the
innermost parts of the loop. In practice, there are surely some performance
limits to how much inlining will be expanded.

------
alecco
Oh, I remember getting upset last time with this "test". For example:

    
    
      3. Multiplication by 2 to addition - integer
      Will GCC transform an integer multiplication by 2 to addition?
    

The expression (not function call) x * 2 compiles to a shift of x left by 1
bit on almost all compilers. Shift has a lot less dependencies than ADD/LEA
and has better reciprocal throughput. Meh.
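
Either way, the two source forms are equivalent for unsigned values; a throwaway pair (my own) you can feed to `gcc -O2 -S` to see which instruction your compiler actually picks:

```c
/* Compile with gcc -O2 -S and diff the output: on x86 both functions
 * typically compile to the same single instruction (add, lea, or shl),
 * so the choice of source form is irrelevant to the generated code. */
unsigned twice_mul(unsigned x)   { return x * 2; }
unsigned twice_shift(unsigned x) { return x << 1; }
```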

~~~
DarkShikari
_Shift has a lot less dependencies than ADD/LEA and has better reciprocal
throughput._

Nope. Usually it's the reverse: Atom, for example, can only do one shift per
cycle, but it can dual-issue adds.

~~~
ajross
This surprises a lot of programmers. A barrel shifter is a _big_ circuit for
an ALU. Smaller than a multiplier obviously, but much larger than a ripple
carry adder and somewhat larger even than the fancy pipelined adders used in
fast CPUs.

It's the kind of thing that got skimped on in Atom. Lots of stuff runs
surprisingly slow on Atom. Another one I find striking is that there are two
pipes for ADD/SUB instructions, but ADC/SBB (the carry/borrow variants) are
not just single-issue, but apparently unpipelined. Intel lists their
throughput at 3 instead of 0.5!

~~~
psykotic
> Intel lists their throughput at 3 instead of 0.5!

Agner Fog lists ADC/SBB on Atom as having latency 2 and throughput 1/2. I
guess Intel could have made them run with ADD/SUB-level performance in cases
where there are no intra-pair dependencies on the carry/borrow flag, but the
extra interlocks for handling that were probably not worth the cost.

------
samth
The comment about tail calls is wrong -- it's not an optimization, it's a
guarantee about semantics of your program. Not optimizing tail calls is like
making array dereference run in linear time instead of constant: it changes
the programs you can write.

~~~
barrkel
Irrespective of how it changes the semantics of (mostly functional) languages
that rely on it for iteration, tail calling is also an optimization in
languages that don't need to rely on it for this, and it is not wrong to
describe it as such.
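
For instance, GCC applies it in plain C via sibling-call optimization even though the C standard guarantees nothing; a minimal sketch of my own, assuming -O2:

```c
/* A tail-recursive sum of 1..n. With -O2, GCC typically compiles the
 * recursive call into a jump, so this runs in constant stack space --
 * purely an optimization here, since C makes no promise either way. */
static unsigned long sum_to(unsigned long n, unsigned long acc) {
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);  /* tail position: eligible for TCO */
}
```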

~~~
samth
There are two things here: an optimization, and a semantics guarantee.
Obviously, if your language provides the guarantee, then implementing it
correctly isn't just an optimization. But the original article claims that the
transformation proves that proper tail calling is "just an optimization",
which it doesn't prove at all. It's still an important semantic guarantee
which increases the expressiveness of a language. This is true regardless of
whether it is the only means of expressing iteration in a language.

~~~
ridiculous_fish
What the factorial example proves is that the set of recursive functions that
can run in constant space is larger than just tail-recursive functions.

You're right to say that TCO guarantees increase the expressiveness of a
language. If we could provide the same guarantees for an even larger class of
recursive functions, it stands to reason our expressiveness would increase
even further!

------
merraksh
I have a little doubt about number 6. A switch() without a "default:" label
has undefined behavior if x is negative or above 5, while nothing would happen
in the cascading if's. Wouldn't a correct optimization of that be

    
    
      void function(int x) {
        switch (x) {
          case 0: f0(); break;
          case 1: f1(); break;
          case 2: f2(); break;
          case 3: f3(); break;
          case 4: f4(); break;
          case 5: f5(); break;
          default: break;
        }
      }

~~~
dchest
Why does it have undefined behavior? Switch statement with no match and
without the default case has defined behavior: the switch statement does
nothing.

~~~
rcfox
"If no converted case constant expression matches and there is no default
label, no part of the switch body is executed."

Section 6.8.4.2, item 5 of
<http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf> (page 134)
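
A quick illustration of that clause (the function is made up, not from the thread):

```c
/* If no case matches and there is no default label, control simply falls
 * past the switch body -- defined behavior per C99 6.8.4.2p5, not UB. */
int classify(int x) {
    int result = -1;              /* sentinel: "no case ran" */
    switch (x) {
        case 0: result = 100; break;
        case 1: result = 101; break;
    }
    return result;                /* e.g. x == 7 leaves result untouched */
}
```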

~~~
cbr
How is this different from "does nothing"?

~~~
rcfox
... it's not? I was just providing an authoritative reference for the parent's
claim.

(Pedantic note: It's technically incorrect to say that the switch statement
does nothing.

    
    
        switch(FunctionThatDoesSomethingAndReturns1()) {
            case 2:
                // foo
        }
    

The switch body does nothing, but the actual statement does something.)

