

The Future of Compiler Optimization - dochtman
http://blog.regehr.org/archives/247

======
sb
Excellent post. Aside from the very interesting and important (hence many PLDI
publications) work of Sorin Lerner, there is a workshop that deals with Prof.
Regehr's first point: Compiler Optimization meets Compiler Verification, a
satellite workshop of ETAPS. For interested people, that might very well be the
place to start digging for relevant publications.

------
CountHackulus
Surprisingly, compilers already do many of these optimizations, particularly
profile-directed feedback (basically a static JIT) and the mixing of high-level
and low-level optimizations.

Now, while LLVM may not do this, it's far from the only compiler on the block,
especially in the non-open-source world.

The IR argument is a non-argument. Certain optimizations work better with
certain IRs than others; that's just the way it is. It's unlikely that there
will be one IR to rule them all. More likely, we'll convert between them to use
the optimal one.

Superoptimizers are an interesting case; it's kind of like having a database of
optimality. While it may not give much speedup compared to the time invested,
if you're desperate for that last little bit of performance, it's a great tool.
Plus, it should make itself faster over time. You could even locate it in the
cloud, so that a compiler sends its hottest methods over to a remote machine,
which uses its massive database of optimality to optimize them as well as
possible.

Verifiable transformations are great in theory, but it's true that they're
extremely hard to implement currently.

The author definitely has a point about AOT/JIT compilers being the future of
optimization. Static compilers have been done to death, while dynamic
compilers are still (comparatively) new and fresh.

~~~
ohyes
The interesting thing about the peephole superoptimizer paper he linked is that
the actual peephole optimization process is not necessarily slower than a
regular peephole optimizer, since all it is doing is looking up faster
equivalent instruction sequences in a database.

The difficult (compute-intensive) part is training the database, and you only
have to do that once for each optimization...
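
In toy form, the compile-time side could look something like this sketch
(Python, with made-up table entries; a real pass would also have to check, for
instance, that the condition flags are dead before applying some replacements):

    # Table-driven peephole pass: the expensive offline search produces the
    # table once; at compile time the pass only does lookups over a sliding
    # window, longest match first.
    PEEPHOLE_TABLE = {
        ("mov eax, 0",): ("xor eax, eax",),                   # valid only if flags are dead
        ("add eax, 1",): ("inc eax",),                        # likewise (CF differs)
        ("mov ebx, eax", "mov eax, ebx"): ("mov ebx, eax",),  # drop the redundant move
    }

    def peephole(code, window=2):
        out, i = [], 0
        while i < len(code):
            for size in range(window, 0, -1):
                key = tuple(code[i:i + size])
                if len(key) == size and key in PEEPHOLE_TABLE:
                    out.extend(PEEPHOLE_TABLE[key])
                    i += size
                    break
            else:
                out.append(code[i])
                i += 1
        return out

    print(peephole(["mov eax, 0", "mov ebx, eax", "mov eax, ebx", "add eax, 1"]))
    # -> ['xor eax, eax', 'mov ebx, eax', 'inc eax']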

I am a little confused by why he stated that the results of the superoptimizer
were not very impressive. Presumably, man-years have been invested in defining
optimizations for compilers like GCC and the Intel C compiler, by human
experts...

To have a machine that even gets close to those results is impressive to me.
If the technique were used in an actual compiler, it would mean that no one
would have to write peephole optimizations; instead, the compiler would just
get better and better as it ages.

Combined with aggressive inlining (or maybe dynamic recompilation, e.g.
Lisp-style per-function compilation rather than C-style), it seems to me that
this sort of technology could be incredibly powerful. Am I wrong?

~~~
beza1e1
Superoptimizers are good at finding complex combinations of mini-optimizations,
but optimization is often about finding creative solutions to lots of special
cases. Machine learning does not help you create more mini-optimizations.

------
beza1e1
I think he emphasizes the joining of verification and optimization too much.
No doubt verification may get easier if it uses the IR and all the analysis
information the compiler has already computed. SSA form makes this easier, for
example.
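
As a toy illustration of the SSA point (Python, straight-line code only, no phi
nodes): once every name is assigned exactly once, a fact proved about a
definition, say "x1 == 1", can never be clobbered by a later assignment.

    def to_ssa(stmts):
        # rename each assignment target so that every variable is defined once
        version, out = {}, []
        for target, expr in stmts:
            expr = [f"{t}{version[t]}" if t in version else t for t in expr]
            version[target] = version.get(target, 0) + 1
            out.append((f"{target}{version[target]}", expr))
        return out

    # x = 1; y = x + x; x = y * 2
    prog = [("x", ["1"]), ("y", ["x", "+", "x"]), ("x", ["y", "*", "2"])]
    for name, expr in to_ssa(prog):
        print(name, "=", " ".join(expr))
    # x1 = 1
    # y1 = x1 + x1
    # x2 = y1 * 2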

Just tell me: what could optimization learn from verification? Everything
verification does is too slow to do within a compiler. Sure, you can produce
better code if you are allowed to use NP algorithms, but in reality I have
never seen any algorithm more complex than O(n^2) in a compiler. Usually
O(n log n) is the most one can afford.

~~~
namin
Arguably, types are a form of verification, and they already lead to better
optimizations.

~~~
beza1e1
Types were invented for optimization and used for verification later, weren't
they?

------
ced
What is the most advanced compiler, across all high-level (not C/C++)
languages? The GHC?

~~~
beza1e1
What do you mean by "most advanced"?

* LLVM is probably the open-source compiler that fits your intention best.

* GCC has man-years of effort invested in little bit/byte tweaks, which no other compiler has. This is not advanced, but tedious.

* libFirm is the only compiler which does not deconstruct SSA form. This is advanced in a mostly academic sense.

* GHC is probably the best functional (CPS) compiler.

* The Sun JVM (HotSpot) is probably the best online compiler.

* I'm not sure which compiler is "most advanced" for dynamically typed code. Some Smalltalk or JavaScript compiler? PyPy?

~~~
sb
I think the Milepost project is at least worth mentioning in this list:

* <http://ctuning.org/wiki/index.php/CTools:MilepostGCC>

Smalltalk JITs are pretty darn good; e.g., the Strongtalk project is known to
be pretty fast. In 2006, Sun released the Strongtalk source code. Its
publication record is relatively weak (aside from type-system-related
publications), but it contains a wealth of relevant optimizations, and I am
sure the interested reader/programmer will find something valuable in there.
(<http://www.strongtalk.org>)

Eliot Miranda has been implementing Smalltalk VMs for quite a while, and I
think his "Cog" VM for Squeak is probably the most recent addition to the
family of Smalltalk JIT compilers. Given his in-depth experience and expertise
(particularly with inline caching), it could probably serve as a blueprint for
other (Smalltalk) JITs.

V8 for JavaScript is supposedly very fast (an interesting side note: Robert
Griesemer works on V8, but worked on the Strongtalk interpreter before), but I
don't know much about the benchmarks involved or how the implementations stack
up against each other, particularly since trace-based JITs like TraceMonkey
came along.

Mike Pall's LuaJIT is a very interesting project, too (the only one-man JIT
project I know of).

PS: I am sorry for the overly long post...

~~~
beza1e1
Milepost GCC is interesting. Regehr does not believe in it, though: "machine
learning in compilers will not end up being fundamental."

His argument is flawed, though. He says, "I could always get a good
optimization result by running all of my optimization passes until a fixpoint
is reached," but unfortunately there is no such fixpoint. Many optimizations
reverse each other (e.g. loop fusion vs. loop splitting) or just arbitrarily
choose some normalization (e.g. 2*x vs. x+x vs. x<<1).
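
A toy illustration of the reversal problem (Python, made-up rewrites): a
strength-reduction pass that prefers shifts and a canonicalization pass that
prefers multiplies simply undo each other, so "run everything until nothing
changes" never terminates.

    def strength_reduce(e):                 # backend-style: 2*x -> x<<1
        op, lhs, rhs = e
        return ("shl", lhs, 1) if (op, rhs) == ("mul", 2) else e

    def canonicalize(e):                    # middle-end-style: x<<1 -> 2*x
        op, lhs, rhs = e
        return ("mul", lhs, 2) if (op, rhs) == ("shl", 1) else e

    e = ("mul", "x", 2)
    for i in range(4):                      # alternate the two passes
        e = strength_reduce(e) if i % 2 == 0 else canonicalize(e)
        print(e)
    # ('shl', 'x', 1), ('mul', 'x', 2), ('shl', 'x', 1), ('mul', 'x', 2), ...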

You can build a superoptimizer that constructs all variations (e.g. equality
saturation, <http://portal.acm.org/citation.cfm?doid=1480881.1480915>), but
this is not a fixpoint search; it is an optimization problem of choosing the
least-cost alternative. You cannot construct all variations anyway; for a
simple example, consider unrolling an infinite loop.
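
In toy form, the selection step looks roughly like this (made-up cost numbers;
the actual paper does a global selection over a PEG rather than a
per-expression minimum):

    # Keep all equivalent forms of a value instead of rewriting destructively,
    # then pick the cheapest one under a cost model for the target.
    forms = [("mul", "x", 2), ("add", "x", "x"), ("shl", "x", 1)]
    cost = {"mul": 3, "add": 1, "shl": 1}         # hypothetical per-op latencies
    print(min(forms, key=lambda e: cost[e[0]]))   # -> ('add', 'x', 'x')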

Hence, unlike Regehr, I would not devalue machine learning. I would not bet on
it either, though. ;)

