Learning to Superoptimize Programs (arxiv.org)
121 points by chivalrous on Nov 20, 2016 | 24 comments

This work should be read in context of prior work, STOKE: http://stoke.stanford.edu/

This work outperforms STOKE, and STOKE outperformed existing methods including GCC, LLVM, ICC, traditional superoptimization, expert humans, etc. That's why comparisons with existing methods are omitted.

This work is also limited to Hacker's Delight and automatically generated programs, but so was STOKE, and STOKE readily extended to computational kernels in libm, BLAS, OpenSSL, a ray tracer, computational fluid dynamics, etc. You can read about them at the link above. The next step would be applying the improvements to other STOKE tasks and more.

STOKE is open source.

That sounds pretty awesome. So, are these only instruction-level optimizations, or is there a good toolkit for doing it at the SSA level as well? My idea is combining it with formal equivalence checks in certified compilers, so we get optimizations plus a correctness guarantee, given that the original version was correct, via a certified transform. Certified compile to v1 -> superoptimize to v2 -> certified checker(v1, v2) -> rinse, repeat.

Hell yeah! Appreciate it.

This article uses a very contrived data set (bit manipulation) and also a "synthetically generated dataset", which would never occur in real-life applications. This means that the results are unreliable and need to be repeated on something more realistic, or at least on programs that are used by people in some context (SPEC2000, SPEC2006). Moreover, the authors never compare the performance with that of LLVM or GCC at different optimisation levels, or at least do not mention any of that.

The authors of the original stochastic superoptimization paper performed some comparisons with the outputs of gcc. You can find their results in this paper: https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...

This is on the same type of dataset; admittedly, these methods still don't scale all that well to extremely large programs.

Where is the source? It's not science until you can reproduce their result.

Probably here soon


You might also say that it's not science until it's been peer reviewed and this is a preprint from 2 weeks ago…

Even if this were useful, it presumes the ability to prove an application correct while modifying it. This would only be doable in very limited circumstances, not general-purpose applications.

(edited to add: Note that superoptimization isn't something you do to a whole program, it's more a thing you do to speed-critical sections of a larger program, ideally on the order of a few dozen instructions at most. That alone makes things a lot more tractable)

(edit 2: I've read the relevant papers now, and have corrected some of the stuff below. Corrections are in italics.)

The paper says that this system is built on top of STOKE, which uses the following verification method:

* Generate a random set of test cases

* Each intermediate code block gets instrumented and run on the host processor

* The intermediate code blocks are given a cost depending on how different their results are from those of the original block on the current test set, as well as an estimate of their running time

* If we get a code block which is correct on all the current test cases, we ask a theorem prover whether it's equivalent to the original one. If it isn't, the theorem prover spits out a test case which proves they're different, and we add that to the test set for future iterations.

Finally, the fastest few provably-correct results are output, and can then be further tested in a full run of the original program they were taken from.
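The loop described above can be sketched in pure Python. This is an illustrative sketch, not STOKE's actual API: an exhaustive check over 8-bit inputs stands in for the SMT-based equivalence query, and the cost/timing term is omitted.

```python
import random

def exhaustive_prover(f, g, bits=8):
    """Stand-in for the theorem prover: return a counterexample input
    on which f and g differ, or None if they agree on all inputs."""
    return next((x for x in range(1 << bits) if f(x) != g(x)), None)

def verify_candidates(original, candidates, bits=8):
    """Sketch of the STOKE-style verification loop."""
    tests = [random.randrange(1 << bits) for _ in range(16)]  # random test set
    verified = []
    for cand in candidates:
        # Cheap filter: reject candidates that differ on any cached test case.
        if any(cand(x) != original(x) for x in tests):
            continue
        # Expensive step: ask the "prover" for a counterexample.
        cex = exhaustive_prover(original, cand, bits)
        if cex is None:
            verified.append(cand)   # provably equivalent
        else:
            tests.append(cex)       # keep the counterexample for later candidates
    return verified

# "Clear the lowest set bit" two ways; the last candidate is subtly wrong.
original = lambda x: x & (x - 1)
good = lambda x: x ^ (x & -x)          # equivalent rewrite
bad = lambda x: (x - 1) & 0xFF         # wrong for even inputs and zero
```

Here `verify_candidates(original, [good, bad])` keeps only `good`: `bad` agrees with `original` on odd inputs, so a lucky test set can let it through the cheap filter, but the prover then produces a counterexample that is added to the test set.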

The main room for error here is in the theorem-proving step, which requires knowing the full semantics of the host's machine code. But it feels like that part should be shareable between different projects, as it's pretty much independent of the search method used.

You don't need to prove the application correct, you just have to prove the optimisation correct, i.e. that the generated code has the same observable behaviour as the given code, except for its resource usage.

Superoptimisation is an implementation detail, in the same way that existing compiler optimisations, JIT compilers, CPU branch predictors, etc. are implementation details. The machine will behave according to the code you wrote, whether that code is correct or not.

We already have tools for proving applications and compilations correct. The problem is there's little to no optimization of those applications, because optimizations can break correctness. One aspect of research like this is skirting that by throwing all kinds of optimizations at something, combined with equivalence checks that ensure the new form is semantically equal to the original form that we know was correct.

For example, the CompCert compiler will produce correct assembly from a correct C program. A set of optimizations might be available for the middle passes. Let's say the C program gets taken to form I, for intermediate. We know I is correct, because that's what CompCert does. Applying J = optimize(I) might break it semantically in the process of trying to speed it up; J might not equal I in some way. So the strategy is to develop an equivalence check as strong as CompCert: J = optimize(I); if equals(I, J), return J, else return I.
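That strategy is tiny to state in code. A minimal sketch, where a brute-force check over all 8-bit inputs stands in for a CompCert-strength equivalence proof (all names here are illustrative):

```python
def optimize_verified(original, optimize, equiv):
    """Keep the optimized form only if it is provably equivalent
    to the known-correct input; otherwise fall back to the original."""
    candidate = optimize(original)
    return candidate if equiv(original, candidate) else original

def equiv8(f, g):
    # Stand-in for a real equivalence checker: exhaustive over 8-bit inputs.
    return all(f(x) == g(x) for x in range(256))

slow  = lambda x: (x * 8) & 0xFF    # multiply by 8
fast  = lambda x: (x << 3) & 0xFF   # sound rewrite: shift by 3
wrong = lambda x: (x << 2) & 0xFF   # unsound rewrite: shift by 2

accepted = optimize_verified(slow, lambda _: fast, equiv8)   # keeps fast
rejected = optimize_verified(slow, lambda _: wrong, equiv8)  # falls back to slow
```

The point is that `optimize` can be arbitrarily aggressive and untrusted; only `equiv` has to be sound.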

An example of a project doing this for real-world stuff is effort to formalize and equivalence-check LLVM's intermediate code:


This technique is only as good as the objective function. In most real world scenarios there is no canonical objective function, nor a comprehensive test suite exercising all branches of the code.

While your point about the objective function is correct, optimizing compilers still manage just fine with an imperfect objective function, and superoptimizers do better. So "as good as the objective function" is good enough.

This work proves functional equivalence, so no test suite is necessary.

> optimizing compilers still manage just fine with imperfect objective function,

Imperfect or not, optimizing compilers do not require an objective function at all. The majority of computer programs do not have an objective function.

> In most real world scenarios there is no canonical objective function

How does this claim mesh with this quote from the paper: """Stoke (Schkufza et al., 2013) performs black-box optimization of a cost function on the space of programs, represented as a series of instructions. Each instruction is composed of an opcode, specifying what to execute, and some operands, specifying the corresponding registers. Each given input program T defines a cost function. For a candidate program R called rewrite, the goal is to optimize the following cost function"""
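A sketch of what such a per-target cost function looks like. The shape (a Hamming-distance correctness term on a test set plus a performance term) follows the quote above, but the `Program` class and the `instr_count` performance proxy are simplifications for illustration; STOKE uses per-opcode latency estimates.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Program:
    run: Callable[[int], int]   # semantics on a single 8-bit input
    instr_count: int            # crude stand-in for a latency estimate

def cost(rewrite: Program, target: Program, tests: List[int], w: float = 0.5) -> float:
    # Correctness term: total Hamming distance between the rewrite's and the
    # target's outputs on the test set, so "almost right" rewrites score
    # better than wildly wrong ones and the search gets a gradient to follow.
    eq = sum(bin((rewrite.run(x) ^ target.run(x)) & 0xFF).count("1") for x in tests)
    # Performance term, weighted so correctness dominates.
    return eq + w * rewrite.instr_count

target = Program(lambda x: (x * 2) & 0xFF, instr_count=3)
good   = Program(lambda x: (x << 1) & 0xFF, instr_count=1)  # correct and shorter
bad    = Program(lambda x: x, instr_count=1)                # short but wrong
tests  = list(range(16))
```

With these definitions, `cost(good, target, tests)` is just the weighted instruction count (0.5), while `bad` pays a correctness penalty on every test input where its output bits differ.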

As someone not well versed in compilers, this approach seems bizarre.

1. Given some chunk of a program, we should know ahead of time which I/O preserving transformations are available to us, because math.

2. Random search isn't going to find you some kind of Godel-esque situation where the optimized instructions are equivalent to the un-optimized ones in an unprovable way, especially when one of your steps is "prove equivalence."

So my question would be: why don't they just randomly apply I/O preserving transformations to the problem, skip the "output similarity" step, skip the theorem proving step, and just optimize against whatever actual performance benchmark they used?

Because we "should", but actually don't, know ahead of time what semantic preserving transformations are available. The entire point is to discover them.

If you find this bizarre, consider that your semantic preserving transformations should be able to do examples in Figure 12 and 13 of http://theory.stanford.edu/~aiken/publications/papers/asplos...

Ugh, the code in Figure 13 is mis-compiled by the super-optimizer. It seems that their checking procedure is flawed. How was that not caught in peer-review?

The problem is that the ranges pointed to by x and y could (partially) overlap, in which case the SSE-based optimization is incorrect.

Edit: Even more damning, gcc actually does produce an equivalent optimal code sequence for me when the incoming pointers are correctly marked as restrict and -march=native is used. (Although perhaps that's due to different gcc versions; maybe a gcc improvement was inspired by that article.)

Edit #2: I see they actually discuss this in the main article text. Oh well, I'm leaving the comment here since the figure by itself is misleading.

But it seems to me that the fact that they have a theorem prover at all means they do know ahead of time what semantic preserving transformations are available.

No. They have a verifier that can show that two (sub-)programs are semantically identical (more accurately, that an applied transformation preserves semantics). This is a bit like a polynomial-time verifier for a solution to an NP-hard problem using a certificate: it can only help you prove that a solution is right, not find new solutions.

TL;DR: how does it differ from usual performance optimization?

It is entirely automated; it can be thought of a bit like Google's AlphaGo. A neural net is used to guide Monte Carlo search, though here it works on the space of programs and optimizes running time. Instead of using known rules, it can learn its own optimization rules for software.

Note that this uses Markov chain Monte Carlo (MCMC) sampling rather than the Monte Carlo tree search (MCTS) of AlphaGo, although the latter might also be interesting to try.
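For intuition, here is a minimal Metropolis-style sampler in the spirit of STOKE's MCMC search. This is a toy: the real system proposes opcode and operand edits on x86-64 and scores them with its program cost function, whereas this sketch just minimizes a function over the integers; all parameter names are illustrative.

```python
import math, random

def mcmc_search(init, propose, cost, steps=2000, beta=1.0, seed=0):
    """Minimal Metropolis-Hastings search over an arbitrary space."""
    rng = random.Random(seed)
    cur, cur_cost = init, cost(init)
    best, best_cost = cur, cur_cost
    for _ in range(steps):
        nxt = propose(cur, rng)
        nxt_cost = cost(nxt)
        # Always accept downhill moves; accept uphill moves with probability
        # exp(-beta * delta), so the chain can escape local minima.
        if nxt_cost <= cur_cost or rng.random() < math.exp(-beta * (nxt_cost - cur_cost)):
            cur, cur_cost = nxt, nxt_cost
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
    return best, best_cost

# Toy search space: integers, proposals are +/-1 steps, cost is squared
# distance from an optimum at 5. A program search swaps out these pieces.
best, best_cost = mcmc_search(
    init=0,
    propose=lambda x, rng: x + rng.choice([-1, 1]),
    cost=lambda x: (x - 5) ** 2,
)
```

Unlike MCTS, there is no tree: the chain mutates one complete candidate in place, which is why an early "decision" can always be revised by a later proposal.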

I suppose one disadvantage of naive MCTS is that if it made a suboptimal decision near the root, correcting that mistake would require relearning the rest of the program from scratch.

Maybe there could be some bidirectional variant of MCTS, where you search from both the beginning and the end of the program and join the two fragments in the middle. If the two trees work independently, can they still learn to find the optimal solution?
