Stoke – A stochastic superoptimizer and program synthesizer (stanford.edu)
142 points by lelf on May 29, 2020 | 44 comments



I had to use a very crude form of superoptimization for an assembly course this summer. The goal was to create a fast to-upper function for ASCII.

I was already familiar with the method to check for a zero byte (with chars packed in a word): (x - 0x0101010101010101UL) & ~(x) & 0x8080808080808080UL

so my idea was to find an equivalent formula that puts 0x80 in every byte holding a value between 0 and 25, shift it right by 2, and xor it with the original word.

I simply tried every combination of simple operators and repeating constants and found (0x9999999999999999UL - x) & (~x) & 0x8080808080808080UL
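
For anyone who wants to check it, here is a quick C sanity test (a sketch, not the exact search code I ran) that the found formula sets 0x80 exactly for byte values 0 to 25; shifting that flag right by 2 then gives the 0x20 used in the final xor:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Test one byte at a time and look only at the low byte of the mask.
           The same idea applies byte-wise in a packed word, modulo borrows
           between adjacent bytes. */
        for (uint64_t b = 0; b < 256; b++) {
            uint64_t mask = (0x9999999999999999UL - b) & (~b) & 0x8080808080808080UL;
            int flagged  = (mask & 0x80) != 0;
            int expected = b <= 25;
            if (flagged != expected)
                printf("mismatch at byte value %llu\n", (unsigned long long)b);
        }
        return 0;
    }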


Back in the day, "stochastic superoptimization" would be a reasonable description for an application of genetic programming. Guess I will have to read the paper to understand what's new.


Yes, genetic programming arguably goes back to Turing, which is way back in the day, but the term superoptimizer goes back to Henry Massalin's 1987 paper, Superoptimizer: A Look at the Smallest Program.

https://web.stanford.edu/class/cs343/resources/cs343-annot-s...


Thanks for the pointer. Yeah, re: GP I was more referring to the work that came with Holland and after.

Re: "smallest program" I will always think of Tom Ray's Tierra. Went to a demo of his in 1991, mind-blowing.

https://www.cc.gatech.edu/~turk/bio_sim/articles/tierra_thom...


I think superoptimisation refers to doing a type of program optimisation beyond what an optimising compiler would do.

GP is certainly on-topic here! People are using GP to do this kind of thing too. But I think this system is not using GP.


Superoptimization is a methodology for building an optimizer. When you run a superoptimizer, it's "just" doing peephole optimization, although it's a bit like comparing bogosort (regular opt) to sorting networks (superopt).


And "stochastic superoptimization" sounds so much cooler than their original term; "super-duper random improvement."


I haven't looked at it in detail, but from what I gathered from the README, it just seems like a very simple form of linear genetic programming (with only mutation and no crossover).


Can it also reduce the size of the binary? I imagine there could be a lot of opportunities there given the huge size of generated programs these days.


Yes. "STOKE can be used in many different scenarios, such as optimizing code for performance or size". Note that this is the 4th sentence in the linked page. However the README hints that STOKE is unlikely to improve much over gcc or llvm optimizations such as -Os / -Oz for binary size.


I guess it depends on the optimization function? However, I suspect that to avoid loops it will use loop unrolling, and that to avoid interprocedural analysis (which I believe is beyond the scope of STOKE) it may use inlining, both of which may increase the size of the code.


Have you seen upx? It might be what you want


This is an extremely interesting domain. I feel that an even more interesting topic than optimizing code would be generating code to replicate a process based on input/output samples.


You are looking for "program synthesis". It's an active research area.


It's as easy as knowing the next number in the sequence:

1, 2, 4, ?


I know that one: 1, 2, 4, 8, 16, 31... https://oeis.org/A001591


Off-topic but this made me wonder. As the name implies, Solomonoff Induction only works for inductive inference from empirical data. Is there something similar for mathematics that would allow us to list the simplest sequences first, then more complex sequences, and so on? Or does the complexity of the formulation of a sequence depend on the choice of a foundation of mathematics and/or arbitrary conventions?


Why not do the same thing? Run a dovetailer, print sequences that terminate...

(Dovetailer: for all n from 0 up, for all m from 0 to n, treat m as a Turing machine and run it for n steps. If it halts, yield its output.)
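
A concrete (if toy) C version of that loop, with a made-up stand-in for the universal interpreter (a real one would run arbitrary encoded machines, some of which never halt):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for "treat m as a Turing machine": pretend machine m
       needs exactly m steps to halt and then outputs m*m. */
    static bool run_encoded_machine(uint64_t m, uint64_t steps, uint64_t *out)
    {
        if (steps < m)
            return false;   /* not enough steps yet */
        *out = m * m;
        return true;
    }

    int main(void)
    {
        bool done[100] = {false};               /* remember who already halted */
        for (uint64_t n = 0; n < 100; n++) {    /* in principle: run forever */
            for (uint64_t m = 0; m <= n; m++) {
                uint64_t out;
                if (!done[m] && run_encoded_machine(m, n, &out)) {
                    done[m] = true;
                    printf("machine %llu halted with output %llu\n",
                           (unsigned long long)m, (unsigned long long)out);
                }
            }
        }
        return 0;
    }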


Sorry, this is also off-topic. I saw your link 2 years ago to https://sites.google.com/view/lvu18lisbon

Can you recommend some decision theory reading? Maybe more on the practical side?


Depends on what you're interested in and your level of mathematical maturity. I assume you're interested in real-world, actually applied decision making rather than philosophical debates, right? Then you have to look for multicriteria decision making (MCDM) and multi-attribute utility theory (MAUT).

Eisenführ/Weber/Langer: Rational Decision Making. Springer 2010. This is a good practical introduction in the standard methodology.

Keeney/Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley & Sons 1976. This is an older work, but very good. It's mathematically more rigorous and important if you want to understand additive and multiplicative models.

Both of them are practical and give good examples. The first one is easier to read. If you're not afraid of more complicated mathematics, I can also give you some more references. I guess you wouldn't be asking me then, however, because there are plenty of sources. I generally recommend Peter C. Fishburn's work, insofar as I can understand it -- some of it is too hard for me.


Thank you so much for taking the time to reply! It really means a lot and I will follow your recommendations.


Why wouldn't Solomonoff induction work for number sequences?


Well, it's obviously not easy, but it's definitely interesting :-)


:-D


Isn't that exactly what neural networks are used for these days?


Could you use such a system to replicate a closed-source program?


This is, in essence, a monkey trying to lower a cost function by randomly manipulating an input AST (empty for synthesis). So theoretically it will be able to produce any program, just not in a reasonable time frame.
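
A toy sketch of that kind of cost-driven random search (my own illustration, not STOKE's actual code; the instruction set, test cases and acceptance rule here are made up): mutate one slot at random, keep the mutation when the test-case cost drops, and occasionally keep a worse one so the walk can escape local minima.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* A "program" is 4 slots of (opcode, operand); the target behaviour
       is f(x) = 2*x + 3, scored on a handful of test inputs. */
    enum { OP_NOP, OP_ADD, OP_XOR, OP_SHL, NUM_OPS };
    typedef struct { int op; uint32_t arg; } Insn;
    #define LEN 4

    static uint32_t run(const Insn *p, uint32_t x)
    {
        for (int i = 0; i < LEN; i++) {
            switch (p[i].op) {
            case OP_ADD: x += p[i].arg; break;
            case OP_XOR: x ^= p[i].arg; break;
            case OP_SHL: x <<= (p[i].arg & 31); break;
            default: break;
            }
        }
        return x;
    }

    /* Cost: how far the candidate is from the target on the test cases. */
    static uint64_t cost(const Insn *p)
    {
        static const uint32_t tests[] = {0, 1, 7, 100, 12345};
        uint64_t c = 0;
        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
            uint32_t want = 2 * tests[i] + 3, got = run(p, tests[i]);
            c += (got > want) ? got - want : want - got;
        }
        return c;
    }

    int main(void)
    {
        Insn cur[LEN] = {{OP_NOP, 0}};
        uint64_t cur_cost = cost(cur);
        for (long iter = 0; iter < 1000000 && cur_cost > 0; iter++) {
            Insn cand[LEN];
            for (int i = 0; i < LEN; i++) cand[i] = cur[i];
            int slot = rand() % LEN;                 /* random single mutation */
            cand[slot].op  = rand() % NUM_OPS;
            cand[slot].arg = (uint32_t)(rand() % 64);
            uint64_t cand_cost = cost(cand);
            /* Keep improvements; occasionally keep a worse candidate too. */
            if (cand_cost <= cur_cost || rand() % 100 == 0) {
                for (int i = 0; i < LEN; i++) cur[i] = cand[i];
                cur_cost = cand_cost;
            }
        }
        printf("final cost: %llu\n", (unsigned long long)cur_cost);
        return 0;
    }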


How do you reason about correctness when stoke changes your code? Does it rely on invariance of test cases or does it try to 'understand' the intent of the program?


STOKE proves full equivalence.


Of trivial code blocks.

That's not a terrible goal if you want to make key libraries faster and more efficient.

But I'm unconvinced that it would be easy to scale up these techniques to auto-clone and improve a big application like Photoshop or Word - or even a generic CRUD app.


This is basically doing what compiler engineers call peephole optimization: optimizing small windows of code, which is a standard part of the optimizer pipeline.

One advantage of peephole optimization is that pretty much all code is amenable to it, so it doesn't rely on there being one key, hot loop to optimize to see performance gains. The main disadvantage is that gains from peephole optimization tend to be quite small, so you're combing for single-digit percentage speedups.
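
To make "small windows of code" concrete, here is a toy peephole pass (mine, over a made-up IR, not from any real compiler): it drops "add 0" and strength-reduces "mul 2" into "shl 1".

    #include <stdio.h>
    #include <string.h>

    /* Tiny made-up IR: each instruction is an opcode name and a constant. */
    typedef struct { const char *op; int k; } Insn;

    int main(void)
    {
        Insn prog[] = {{"mul", 2}, {"add", 0}, {"add", 5}};
        int n = sizeof prog / sizeof prog[0], m = 0;

        for (int i = 0; i < n; i++) {
            Insn ins = prog[i];
            if (strcmp(ins.op, "add") == 0 && ins.k == 0)
                continue;                       /* "add 0" is a no-op: drop it */
            if (strcmp(ins.op, "mul") == 0 && ins.k == 2)
                ins = (Insn){"shl", 1};         /* strength-reduce mul 2 -> shl 1 */
            prog[m++] = ins;
        }
        for (int i = 0; i < m; i++)
            printf("%s %d\n", prog[i].op, prog[i].k);
        return 0;
    }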


Even big applications are composed of trivial code blocks. The optimisations are probably synergistic with other passes, and would probably expose many inlining opportunities that didn't exist before.


I guess this is what the second paragraph is about.

"In addition to searching over programs, STOKE contains verification infrastructure to show the equivalence between x86-64 programs. STOKE can consider test-cases, perform bounded verification all the way to fully formal verification that shows the equivalence for all possible inputs."


Program equivalence is in general uncomputable if I'm not mistaken, so a big part of the possible performance enhancements depend on the strength of their algorithm for proving equivalence.

Definitely an interesting though extremely challenging problem.


Undecidability of program equivalence only means that there is no algorithm that returns the correct answer to the question P = Q? for ALL programs P, Q in a Turing-complete programming language.

The original STOKE paper [1] only worked for loop-free programs, which may be a class of program fragments for which equivalence is decidable. Be that as it may, there is a fast heuristic for proving non-equivalence between two programs P and Q: just feed them random input -- if they return different outputs, they are NOT equivalent; if OTOH they return the same results, then they are candidates for equivalence and you can throw a modern SMT solver at them, which may actually be able to prove equivalence (or come up with a counterexample). Otherwise the SMT solver times out. You can even refine this scheme into some form of local search, in that you measure how much P and Q differ (e.g. Hamming distance). This is used in later papers, e.g. [2].

[1] Stochastic Superoptimization – ASPLOS 2013 https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...

[2] Stochastic Program Optimization – CACM 2016 https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...
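
A toy version of that first filter (my own sketch over plain C functions rather than x86-64 programs): hammer P and Q with random inputs and reject on the first differing output; agreement only makes them candidates for a real equivalence proof.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-ins for the candidate programs P and Q. */
    static uint32_t p(uint32_t x) { return x * 8; }
    static uint32_t q(uint32_t x) { return x << 3; }

    int main(void)
    {
        for (int i = 0; i < 1000000; i++) {
            uint32_t x = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
            if (p(x) != q(x)) {
                printf("not equivalent, counterexample: x = %u\n", x);
                return 1;
            }
        }
        /* No counterexample: P and Q are only candidates for equivalence;
           an actual proof would need an SMT solver or similar. */
        puts("no difference found on random tests");
        return 0;
    }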


Thanks, I was looking for this explanation.


Thanks for this


Could this be adapted for higher-level bytecode?

My thinking is to provide hand-crafted primitives to transform byte code, and maybe use an existing optimizer to do the search.


Souper does something like that with IR optimization.

https://github.com/google/souper


Slightly off-topic: there was a wee thread a few days ago about naming software projects, and now we see Stroke.

I hope we see a trend in brain-injury related software project names.


Yes, we should absolutely guard FOSS projects against people who read too fast.

It's a nice name for a strochastic optimizer, though...


Maybe I was having a stroke when I read it earlier. Heh. Oops.


It's called Stoke.


Perhaps your next project should be called "alexia"...? ;)



