
Will It Optimize? (2010) - jeffreyrogers
http://ridiculousfish.com/blog/posts/will-it-optimize.html
======
gus_massa
This has been resubmitted a few times:

Most comments:
[https://news.ycombinator.com/item?id=1540567](https://news.ycombinator.com/item?id=1540567)
(357 points, 1555 days ago, 60 comments)

Latest:
[https://news.ycombinator.com/item?id=6823768](https://news.ycombinator.com/item?id=6823768)
(2 points, 329 days ago, 3 comments)

Also, it would be a good idea to add [2010] to the title, because some of the
optimizations may now be available in the latest gcc version. (I'd like to read
an update, or a year-by-year comparison.)

~~~
fdsary
And I upvote every time.

~~~
gus_massa
The article is interesting. But for many articles the HN comments are also
insightful and interesting, so I like to read them again. That's why I
usually link to the most voted/commented resubmission.

------
bad_user
On (1) recursion elimination ...

Tail recursion is the exact equivalent of a loop.

We've lost sight of what "recursion" means. There's a difference between
recursive functions and recursive _algorithms_, which by definition need a
stack, and there are lots of such problems that can't be optimized to run in
constant space. Whether that stack is the call stack or a manually managed
stack (within a loop or a tail recursion) doesn't matter; what matters is
that the space requirements are definitely not constant.

It's nice that the GCC compiler can optimize that Fibonacci sample. However,
you're not getting a guarantee that it can optimize every function that is
optimizable, and I'm pretty sure the compiler is extremely limited in what it
can do here. That is dangerous, because running in constant space is often a
matter of correctness, not one of optimization. I would rather have the
compiler give me an error when it sees non-tail-recursive calls than have it
try to be smart only some of the time.
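
To make the distinction concrete, here is a minimal C sketch (my own
illustrative functions, not the article's examples): the first function ends
in a tail call and can be lowered to a loop, while the second cannot run in
constant space no matter what the compiler does, short of rewriting the
algorithm.

    /* Tail call: nothing happens after the recursive call, so GCC's
       sibling-call optimization can turn this into a loop that runs in
       constant stack space. */
    unsigned gcd(unsigned a, unsigned b) {
        if (b == 0) return a;
        return gcd(b, a % b);
    }
    
    /* Two recursive calls, with work left to do after they return: a
       genuinely tree-shaped computation. No tail-call optimization will
       make this run in constant space; that would require changing the
       algorithm. */
    unsigned long naive_fib(unsigned n) {
        if (n < 2) return n;
        return naive_fib(n - 1) + naive_fib(n - 2);
    }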

~~~
gohrt
> recursive algorithms, which need by definition a stack

Your vocabulary is not standard.

[https://www.google.com/webhp?#q=recursive+algorithm](https://www.google.com/webhp?#q=recursive+algorithm)

[http://en.wikipedia.org/wiki/Recursion_(computer_science)](http://en.wikipedia.org/wiki/Recursion_\(computer_science\))

A recursive algorithm is one that computes a result by way of the same
algorithm applied to a smaller input. (Smaller, or else the algorithm will
diverge.)

The distinction between "stack" and "parameter" is a minor one. A simple
factorial function has an ever-growing (accumulating) parameter.

Tail-recursive algorithms can be translated into loop algorithms. They are
still recursive.
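
For instance, the accumulator version of factorial is tail-recursive, yet the
algorithm is still recursive; the state that would otherwise pile up on the
call stack has simply been folded into a parameter. A small sketch in C
(illustrative, not code from this thread):

    /* Tail-recursive factorial with an accumulating parameter: a compiler
       can lower this to a loop, but conceptually it is the same recursive
       algorithm. Call as fact_acc(n, 1). */
    unsigned long fact_acc(unsigned n, unsigned long acc) {
        if (n <= 1) return acc;
        return fact_acc(n - 1, acc * n);
    }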

~~~
JadeNB
> (Smaller, or else the algorithm will diverge.)

This is not true. The first example that comes to mind is the ridiculously
over-powered one: conjecturally
([https://en.wikipedia.org/wiki/Collatz_conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture)),
the definition

    
    
         f 1                  = [ whatever ]
         f x | x `mod` 2 == 0 = f (x `div` 2)
         f x | x `mod` 2 == 1 = f (3*x + 1)
    

is perfectly well defined on positive integers `x`, although many recursive
calls actually apply `f` to _larger_ values.

(To be fair, you didn't actually say what 'smaller' meant. I think it's one of
the standard term-rewriting theorems that, given the freedom to define
'smaller' appropriately, any total recursive function _does_ make only
smaller recursive calls.)

------
nhaehnle
The advice about enabling -ffast-math is dangerous. Be advised that the
fpclassify() family of functions (e.g. checks for NaN or infinity) no longer
works as expected. If you really want to go into that territory, be sure to
read the compiler documentation: -ffast-math is an umbrella option that in
fact enables a number of other flags, and you may want to be a bit pickier
about which ones you choose.
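
A small illustration (a sketch; the exact folding depends on the GCC version
and target): -ffast-math implies -ffinite-math-only, under which the compiler
may assume NaN never occurs and fold the check away.

    #include <math.h>
    
    /* Compiled with: gcc -O2 -ffast-math
       -ffast-math implies -ffinite-math-only, so the compiler may assume x
       is never NaN and reduce this function to "return 0;". Without
       -ffast-math it behaves as expected. */
    int is_nan(double x) {
        return isnan(x);
    }

If you only want part of the behaviour, flags such as -fno-math-errno or
-funsafe-math-optimizations can be enabled individually instead of the whole
-ffast-math bundle.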

------
nikita
The first example is really dangerous. We recently had an issue with GCC not
optimizing a tail recursion in a relatively large function whose parameters
were passed in as references to std collections. The result: it was running
out of stack space at large scale. So if you make a habit of using tail
recursion extensively, GCC can bite you where you don't expect it.

Of course, adding a unit test for every function saved the day as usual;
still, it is trivial to convert tail recursion into a loop and not worry
about stack overflow.
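
The manual conversion is indeed mechanical. Roughly (a sketch with made-up
types, not the code from the incident above): the recursive call becomes
parameter reassignment plus another loop iteration.

    #include <stddef.h>
    
    struct node { struct node *next; };
    
    /* Tail-recursive form: relies on the compiler actually performing the
       tail-call optimization to avoid O(n) stack usage. */
    long count_nodes_rec(const struct node *n, long acc) {
        if (n == NULL) return acc;
        return count_nodes_rec(n->next, acc + 1);
    }
    
    /* Hand-converted loop: the recursive call is replaced by reassigning
       the "parameters", so stack usage is constant by construction. */
    long count_nodes(const struct node *n) {
        long acc = 0;
        while (n != NULL) {
            acc++;
            n = n->next;
        }
        return acc;
    }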

~~~
Dewie
It's nice to be able to annotate functions that should have some optimization
applied to them, and then have the program fail to compile if the compiler
isn't able to perform the optimization.

~~~
mpartel
Scala has this: [http://www.scala-lang.org/api/current/index.html#scala.annotation.tailrec](http://www.scala-lang.org/api/current/index.html#scala.annotation.tailrec)

~~~
Dewie
I've used it when implementing a game "loop" (just continually asking the
user for input to the game). It's nice to be able to use recursion when you
want to, and be sure that the call stack won't grow just from playing the
game for several rounds.

------
userbinator
I wonder how that 2nd example optimisation would behave with multiple threads,
with one thread reading the string and another thread appending to it. Or
would that be considered undefined behaviour? Probably an edge case (and
likely to be a race condition bug), but still worth thinking about...

~~~
Too
That would be UB. In fact, C does not know much about threading at all: the
result of a function should be "equivalent" to what it would be if the
statements were executed in order, unless you use some synchronization
primitives.

But if you call a function inside the loop instead of simply doing
result += s[i], then even without supplying s as a parameter that function
could have side effects on the string, since the char *s is not "restrict".
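
A sketch of that point (function names are mine, and whether GCC actually
hoists the strlen() in the restrict case is not guaranteed):

    #include <string.h>
    
    void log_char(char c);   /* opaque: defined in another translation unit */
    
    /* Because log_char() might modify the string through some other alias
       (e.g. a global pointer to the same buffer), the compiler generally
       has to re-evaluate strlen(s) on every iteration. */
    size_t sum_chars(const char *s) {
        size_t result = 0;
        for (size_t i = 0; i < strlen(s); i++) {
            log_char(s[i]);
            result += s[i];
        }
        return result;
    }
    
    /* Declaring s restrict promises that the buffer is accessed only
       through s within this function, which may allow the strlen() call
       to be hoisted out of the loop despite the opaque call. */
    size_t sum_chars_restrict(const char *restrict s) {
        size_t result = 0;
        for (size_t i = 0; i < strlen(s); i++) {
            log_char(s[i]);
            result += s[i];
        }
        return result;
    }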

------
Slix
I'm surprised by the if chain one. I tried it using cremno's gcc.godbolt.org
suggestion, and the newest gcc doesn't turn it into a jump table like the
author suggests, but clang does.

[http://goo.gl/PgqAIs](http://goo.gl/PgqAIs) <-- The code
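
The kind of if-else chain at issue looks roughly like this (an illustrative
sketch, not necessarily the exact snippet behind the link):

    void f0(void); void f1(void); void f2(void); void f3(void);
    
    /* An if-else chain over consecutive small constants; the question is
       whether the compiler lowers it to the same jump table it would emit
       for the equivalent switch statement. */
    void dispatch(int x) {
        if (x == 0) f0();
        else if (x == 1) f1();
        else if (x == 2) f2();
        else if (x == 3) f3();
    }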

------
rikkus
Now I think back many years to when I used bit shifts in C++ to 'optimise'
some multiplications and realise I probably didn't measure twice before
cutting. These days I hope I'm a lot saner!

------
Dewie
It feels like I have decent intuitive hunches about what compilers can do; I
mostly just ask the question, "is the problem conceptually simple?", and go
from there. Of course, I'm missing all of the edge cases and subtleties when
I do that. And I don't actively try to do it with actual code; mostly just
for quizzes like this.

My intuition seems to fail me, though, when we get to things like escape
analysis in languages like Java. It seems like a conceptually simple problem -
that is, stack-allocating or "unpacking" objects (like value classes/objects)
that are __obviously__ not going to escape the scope of a method. I don't
mean simple as in determining this for _all_ objects. And the __obvious__
ones seem much more common than the non-obvious ones in regular code. But it
seems that this is a fairly recent addition, which might suggest that it
isn't so simple to implement?

~~~
lmm
The JVM is ...hairy, and poorly understood. Some of its code has not changed
since 1996, and much has been very heavily hand-optimized; the people who work
on it today are not the people who originally wrote it. Implementing
_anything_ in the JVM is not so simple; it's not really a reflection on the
complexity of the thing you're implementing.

(It might have got better now - I know the OpenJDK folks were making some
efforts in that direction)

