
Learning to Superoptimize Programs - chivalrous
https://arxiv.org/abs/1611.01787
======
sanxiyn
This work should be read in context of prior work, STOKE:
[http://stoke.stanford.edu/](http://stoke.stanford.edu/)

This work outperforms STOKE, and STOKE outperformed existing methods including
GCC, LLVM, ICC, traditional superoptimization, expert humans, etc. That's why
comparisons with existing methods are omitted.

This work is also limited to Hacker's Delight and automatically generated
programs, but so was STOKE, and STOKE readily extended to computational
kernels in libm, BLAS, OpenSSL, a ray tracer, computational fluid dynamics, etc.
You can read about them at the link above. The next step would be applying the
improvements to other STOKE tasks and more.

STOKE is open source.

~~~
nickpsecurity
That sounds pretty awesome. So, are these only instruction-level optimizations,
or is there a good toolkit for doing it at the SSA level as well? My idea is
combining it with formal equivalence checks in certified compilers, so we get
optimizations plus correctness (given the original version was correct) via a
certified transform. Compile certified to v1 -> superoptimize to v2 -> certified
checker (v1, v2) -> rinse, repeat.
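
In code, the loop I have in mind would look roughly like this (just a sketch;
`superoptimize` and `check_equivalence` are hypothetical stand-ins for a
STOKE-style optimizer and a certified equivalence checker, not real APIs):

    # Sketch of the certified-superoptimization loop described above.
    # `superoptimize` and `check_equivalence` are hypothetical stand-ins.
    def optimize_certified(block, superoptimize, check_equivalence, rounds=5):
        """Iteratively superoptimize `block`, keeping a candidate only if a
        certified equivalence check against the previous version succeeds."""
        current = block
        for _ in range(rounds):
            candidate = superoptimize(current)          # propose faster code (v2)
            if check_equivalence(current, candidate):   # certified check (v1, v2)
                current = candidate                     # accept and repeat
            # otherwise discard the candidate and keep searching
        return current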

~~~
jfbastien
souper: [https://github.com/google/souper](https://github.com/google/souper)

~~~
nickpsecurity
Hell yeah! Appreciate it.

------
gcc_programmer
This article uses a very contrived data set (bit manipulation), and also a
"synthetically generated dataset", which would never occur in real-life
applications. This means that the results are unreliable and need to be
repeated on something more realistic, or at least on programs that are used by
people in some context (spec2000, spec2006). Moreover, the authors never
compare the performance with that of llvm or gcc at different levels of
optimisation, or at least do not mention any of that.

~~~
bunelr
The authors of the original Stochastic superoptimization paper performed some
comparisons with the output of gcc. You can find their results in this paper:
[https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...](https://raw.githubusercontent.com/StanfordPL/stoke/develop/docs/papers/cacm16.pdf)

This is on the same type of dataset since, admittedly, these methods still
don't scale all that well to extremely large programs.

------
crb002
Where is the source? It's not science until you can reproduce their result.

~~~
carlob
Probably here soon

[http://www.robots.ox.ac.uk/~tvg/code.php](http://www.robots.ox.ac.uk/~tvg/code.php)

You might also say that it's not science until it's been peer reviewed, and
this is a preprint from 2 weeks ago…

------
coldcode
Even if this were useful, it presumes the ability to prove an application
correct while modifying it. That would only be doable in very limited
circumstances, not general-purpose applications.

~~~
psi-squared
(edited to add: Note that superoptimization isn't something you do to a whole
program; it's more a thing you do to speed-critical sections of a larger
program, ideally on the order of a few dozen instructions at most. That alone
makes things a lot more tractable.)

(edit 2: I've read the relevant papers now, and have corrected some of the
stuff below. Corrections are in italics.)

 _The paper says that this system is built on top of STOKE, which uses the
following verification method:_

* Generate a random set of test cases

* Each intermediate code block gets instrumented and run on the host processor

* The intermediate code blocks are given a cost depending on how different their results are from those of the original block _on the current test set_, as well as an estimate of the running time

* _If we get a code block which is correct on all the current test cases, we ask a theorem prover whether it's equivalent to the original one._ If it isn't, the theorem prover spits out a test case which proves they're different, and we add that to the test set for future iterations.

Finally, the fastest few provably-correct results are output, where they can
be further tested in a full run of the original program they were taken from.
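
A rough sketch of that loop in Python, if it helps (the propose/run/prove
functions are placeholders standing in for STOKE's machinery, not its actual
interfaces):

    # Sketch of the test-case / counterexample loop described above.
    # `propose_rewrite`, `run_on_tests`, and `prove_equivalent` are
    # hypothetical stand-ins, not STOKE's real APIs.
    def search(original, propose_rewrite, run_on_tests, prove_equivalent,
               tests, iterations=10000):
        best = None
        candidate = original
        for _ in range(iterations):
            candidate = propose_rewrite(candidate)
            # Cost = output mismatch on current tests + estimated running time.
            mismatch, runtime = run_on_tests(candidate, original, tests)
            if mismatch == 0:
                # Passes all current tests: ask the theorem prover.
                counterexample = prove_equivalent(candidate, original)
                if counterexample is None:
                    # Provably correct; keep it if it's the fastest so far.
                    if best is None or runtime < best[1]:
                        best = (candidate, runtime)
                else:
                    # Not equivalent: keep the distinguishing input as a new test.
                    tests.append(counterexample)
        return best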

The main room for error here is in the theorem proving step, which requires
knowing the full semantics of the host's machine code. But it feels like that
part should be shareable between different projects, as it's pretty much
independent of the search method used.

------
cnkal
This technique is only as good as the objective function. In most real world
scenarios there is no canonical objective function, nor a comprehensive test
suite exercising all branches of the code.

~~~
sanxiyn
While your point about the objective function is correct, optimizing compilers
still manage just fine with an imperfect objective function, and superoptimizers
do better. So "as good as the objective function" is good enough.

This work proves functional equivalence, so no test suite is necessary.

~~~
cnkal
> optimizing compilers still manage just fine with an imperfect objective
> function,

Imperfect or not, optimizing compilers do not require an objective function at
all. The majority of computer programs do not have an objective function.

------
bloaf
As someone not well versed in compilers, this approach seems bizarre.

1. Given some chunk of a program, we should know ahead of time which I/O
preserving transformations are available to us, because math.

2. Random search isn't going to find you some kind of Gödel-esque situation
where the optimized instructions are equivalent to the un-optimized ones in an
unprovable way, _especially_ when one of your steps is "prove equivalence."

So my question would be: why don't they just randomly apply I/O preserving
transformations to the problem, skip the "output similarity" step, skip the
theorem proving step, and just optimize against whatever actual performance
benchmark they used?

~~~
sanxiyn
Because we "should", but actually don't, know ahead of time what semantic
preserving transformations are available. The entire point is to discover
them.

If you find this bizarre, consider that your semantics-preserving
transformations would have to be able to produce the examples in Figures 12 and 13 of
[http://theory.stanford.edu/~aiken/publications/papers/asplos...](http://theory.stanford.edu/~aiken/publications/papers/asplos13.pdf)

~~~
bloaf
But it seems to me that the fact that they _have a theorem prover at all_
means they _do_ know ahead of time what semantic preserving transformations
are available.

~~~
dom0
No. They have a verificator that can show that two (sub-)programs (more
accurately; that an applied transformation preserves semantics) are
semantically identical. This is a bit like a P-verificator for the solution to
a NP-hard problem using a certificate. It can only help you prove that the
solution is right, not find new solutions.
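
To make that concrete, here's the kind of check such a verifier does, using
Z3's Python bindings on a Hacker's Delight-style identity (illustrative only;
this isn't the paper's toolchain):

    # Illustrative only: an SMT solver can confirm that two code fragments
    # are equivalent, but it won't find a faster fragment for you.
    from z3 import BitVec, prove

    x = BitVec('x', 32)

    # Two ways of clearing the lowest set bit of x.
    original  = x - (x & -x)   # subtract the isolated lowest set bit
    optimized = x & (x - 1)    # the classic shorter form

    prove(original == optimized)  # prints "proved" if they agree on all inputs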

------
currriuosly
TL;DR: how does this differ from usual performance optimization?

~~~
unlikelymordant
It is entirely automated; it can be thought of as kind of like Google's AlphaGo. A
neural net is used to guide Monte Carlo search, though here it works on the
space of programs and optimizes running time. Instead of using known rules,
it can learn its own optimization rules for software.

~~~
eutectic
Note that this uses Markov chain Monte Carlo (MCMC) sampling rather than the
Monte Carlo tree search (MCTS) of AlphaGo, although the latter might also be
interesting to try.
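
For reference, the MCMC side is essentially Metropolis-Hastings over program
rewrites; a toy sketch, where `random_rewrite` and `cost` are placeholders
(e.g. cost = correctness penalty on tests + estimated latency):

    import math
    import random

    # Toy Metropolis-Hastings-style search over programs, in the spirit of
    # stochastic superoptimization. `random_rewrite` and `cost` are
    # hypothetical stand-ins, not the paper's actual components.
    def mcmc_search(program, random_rewrite, cost, steps=100000, beta=1.0):
        current, current_cost = program, cost(program)
        best, best_cost = current, current_cost
        for _ in range(steps):
            candidate = random_rewrite(current)
            candidate_cost = cost(candidate)
            # Always accept improvements; accept regressions with probability
            # that decays exponentially in how much worse they are.
            worse_by = max(0.0, candidate_cost - current_cost)
            if random.random() < math.exp(-beta * worse_by):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        return best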

I suppose one disadvantage of naive MCTS is that if it made a suboptimal
decision near the root, correcting that mistake would require relearning the
rest of the program from scratch.

Maybe there could be some bidirectional variant of MCTS, where you search from
both the beginning and the end of the program and join the two fragments in
the middle. If the two trees work independently, can they still learn to find
the optimal solution?

