(edit 2: I've read the relevant papers now, and have corrected some of the stuff below. Corrections are in italics.)
The paper says that this system is built on top of STOKE, which uses the following verification method:
* Generate a random set of test cases
* Each intermediate code block gets instrumented and run on the host processor
* The intermediate code block are given a cost depending on how different their results are from those of the original block on the current test set, as well as an estimate of the running time
* If we get a code block which is correct on all the current test cases, we ask a theorem prover whether it's equivalent to the original one. If it isn't, the theorem prover spits out a test case which proves they're different, and we add that to the test set for future iterations.
Finally, the fastest few provably-correct results are output, where they can be further tested in a full run of the original program they were taken from.
The main room for error here is in the theorem proving step, which requires knowing the full semantics of the host's machine code. But it feels like that part should be share-able between different projects, as it's pretty much independent of the search method used.