This work outperforms STOKE, and STOKE outperformed existing methods including GCC, LLVM, ICC, traditional superoptimization, and expert humans. That's why comparisons with existing methods are omitted.
This work is also limited to Hacker's Delight and automatically generated programs, but so was STOKE. And STOKE readily extended to computational kernels in libm, BLAS, OpenSSL, a ray tracer, computational fluid dynamics, etc. You can read about them at the link above. The next step would be applying these improvements to the other STOKE tasks and beyond.
STOKE is open source.
This is on the same type of dataset since, admittedly, these methods still don't scale all that well to extremely large programs.
You might also say that it's not science until it's been peer reviewed, and this is a preprint from 2 weeks ago…
(edit 2: I've read the relevant papers now, and have corrected some of the stuff below. Corrections are in italics.)
The paper says that this system is built on top of STOKE, which uses the following verification method:
* Generate a random set of test cases
* Each intermediate code block gets instrumented and run on the host processor
* Each intermediate code block is given a cost based on how different its results are from those of the original block on the current test set, plus an estimate of its running time
* If we get a code block which is correct on all the current test cases, we ask a theorem prover whether it's equivalent to the original one. If it isn't, the theorem prover spits out a test case which proves they're different, and we add that to the test set for future iterations.
Finally, the fastest few provably-correct results are output, where they can be further tested in a full run of the original program they were taken from.
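In Python, the loop looks roughly like this. This is a minimal sketch of the idea, not STOKE's actual implementation; `run_block`, `estimated_runtime`, and `prove_equivalent` are hypothetical stand-ins for native execution, a latency model, and the SMT query:

```python
import random

def random_input():
    return random.getrandbits(64)           # random 64-bit test case

def run_block(block, x):
    return block(x)                         # stand-in: code blocks as callables

def estimated_runtime(block):
    return getattr(block, "cost", 1.0)      # stand-in latency estimate

def prove_equivalent(original, rewrite):
    """Stand-in prover: returns None if equivalent, else a counterexample input."""
    for _ in range(100_000):
        x = random_input()
        if run_block(original, x) != run_block(rewrite, x):
            return x
    return None

def verify_candidates(original, candidates, n_tests=32, keep=5):
    tests = [random_input() for _ in range(n_tests)]
    verified = []
    for rewrite in candidates:
        # Cheap filter: disagreement with the original on the current test set.
        if any(run_block(rewrite, t) != run_block(original, t) for t in tests):
            continue                        # fails a test case, reject
        cex = prove_equivalent(original, rewrite)
        if cex is None:
            verified.append((estimated_runtime(rewrite), rewrite))
        else:
            tests.append(cex)               # counterexample strengthens future filtering
    verified.sort(key=lambda pair: pair[0])
    return [r for _, r in verified[:keep]]  # fastest few provably-correct rewrites
```

The nice property is that every counterexample the prover produces makes the cheap test-case filter stronger, so expensive prover calls get rarer as the search goes on.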
The main room for error here is in the theorem-proving step, which requires knowing the full semantics of the host's machine code. But it feels like that part should be shareable between different projects, as it's pretty much independent of the search method used.
Superoptimisation is an implementation detail, in the same way that existing compiler optimisations, JIT compilers, CPU branch predictors, etc. are implementation details. The machine will behave according to the code you wrote, whether that code is correct or not.
For example, the CompCert compiler will produce correct assembly from a correct C program. A set of optimizations might be available for the middle passes. Let's say the C program gets compiled to form I (for intermediate). We know I is correct, because that's what CompCert guarantees. Applying J = optimize(I) might break it semantically in the process of trying to speed it up; J might not equal I in some way. So the strategy is to develop an equivalence check as strong as CompCert's semantics: compute J = optimize(I); if equals(I, J), return J, else return I.
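In code, that strategy (usually called translation validation) looks roughly like this; `semantically_equal` is a hypothetical stand-in for a formally verified or SMT-backed equivalence checker:

```python
def validated_optimize(intermediate, optimize, semantically_equal):
    """Translation validation: accept an untrusted optimization pass
    only if the checker can prove it preserved the semantics."""
    candidate = optimize(intermediate)            # untrusted, possibly buggy pass
    if semantically_equal(intermediate, candidate):
        return candidate                          # keep the speedup
    return intermediate                           # fall back to the known-good IR
```

The point of the pattern is that only the equivalence checker needs to be trusted; the optimizer itself can be arbitrarily clever and unverified.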
An example of a project doing this for real-world stuff is the effort to formalize and equivalence-check LLVM's intermediate representation:
This work proves functional equivalence, so no test suite is necessary.
Imperfect or not, optimizing compilers do not require an objective function at all. The majority of computer programs do not have an objective function.
How does this claim mesh with this quote from the paper:
"""Stoke (Schkufza et al., 2013) performs black-box optimization of a cost function on the space of programs, represented as a series of instructions. Each instruction is composed of an opcode, specifying what to execute, and some operands, specifying the corresponding registers. Each given input program T defines a cost function. For a candidate program R called rewrite, the goal is to optimize the following cost function"""
1. Given some chunk of a program, we should know ahead of time which I/O-preserving transformations are available to us, because math.
2. Random search isn't going to find you some kind of Gödel-esque situation where the optimized instructions are equivalent to the unoptimized ones in an unprovable way, especially when one of your steps is "prove equivalence."
So my question would be: why don't they just randomly apply I/O-preserving transformations, skip the "output similarity" step, skip the theorem-proving step, and just optimize against whatever actual performance benchmark they used?
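Concretely, the proposal would be something like this hypothetical sketch, where `preserving_rewrites` and `benchmark` are assumed inputs, not anything from the paper:

```python
import random

def optimize_by_benchmark(program, preserving_rewrites, benchmark, steps=1000):
    """Hill-climb using only semantics-preserving rewrites, so correctness
    holds by construction and fitness comes from a real benchmark run."""
    best, best_time = program, benchmark(program)
    for _ in range(steps):
        candidate = random.choice(preserving_rewrites)(best)  # I/O-preserving by assumption
        t = benchmark(candidate)                              # measured, not estimated
        if t < best_time:
            best, best_time = candidate, t
    return best
```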
If you find this bizarre, consider that your semantics-preserving transformations would have to be able to produce the examples in Figures 12 and 13 of http://theory.stanford.edu/~aiken/publications/papers/asplos...
The problem is that the ranges pointed to by x and y could (partially) overlap, in which case the SSE-based optimization is incorrect.
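Here's a hypothetical Python stand-in for the C loops that shows the hazard, with `width` playing the role of the SSE vector width:

```python
def scalar_add(buf, dst, src, n):
    # Element by element: later reads can observe earlier writes.
    for k in range(n):
        buf[dst + k] += buf[src + k]

def vectorized_add(buf, dst, src, n, width=4):
    # "SSE-style": load a whole chunk before storing any of it.
    for k in range(0, n, width):
        chunk = [buf[src + k + j] for j in range(width)]
        for j in range(width):
            buf[dst + k + j] += chunk[j]

a = list(range(12)); b = list(range(12))
scalar_add(a, 1, 0, 8)       # dst and src ranges overlap
vectorized_add(b, 1, 0, 8)
print(a == b)                # False: the two versions disagree under overlap
```

With disjoint ranges the two functions always agree, which is exactly why `restrict` (a promise of no overlap) makes the vectorization legal.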
Edit: Even more damning, gcc actually does produce an equivalent optimal code sequence for me when the incoming pointers are correctly marked as restrict and -march=native is used. (Although perhaps that's due to different gcc versions; maybe a gcc improvement was inspired by that article.)
Edit #2: I see they actually discuss this in the main article text. Oh well, I'm leaving the comment here since the figure by itself is misleading.
I suppose one disadvantage of naive MCTS is that if it made a suboptimal decision near the root, correcting that mistake would require relearning the rest of the program from scratch.
Maybe there could be some bidirectional variant of MCTS, where you search from both the beginning and the end of the program and join the two fragments in the middle. If the two trees work independently, can they still learn to find the optimal solution?