Stoke – A stochastic superoptimizer and program synthesizer (stanford.edu)
142 points by lelf on May 29, 2020 | 44 comments



I had to use a very crude form of superoptimization for an assembly course this summer. The goal was to create a fast to-upper function for ASCII.

I was already familiar with the method to check for a zero byte (with chars packed in a word): (x - 0x0101010101010101UL) & ~(x) & 0x8080808080808080UL

so my idea was to find an equivalent formula that puts 0x80 in every byte holding a value between 0 and 25, shift it right by 2, and xor it with the original word.

I simply tried every combination of simple operators and repeating constants and found (0x9999999999999999UL - x) & (~x) & 0x8080808080808080UL
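
For anyone who wants to check it, here is a quick C sanity test (a sketch, not the exact search code I ran) that the found formula sets 0x80 exactly for byte values 0 to 25; shifting that flag right by 2 then gives the 0x20 used in the final xor:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Test one byte at a time and look only at the low byte of the mask.
           The same idea applies byte-wise in a packed word, modulo borrows
           between adjacent bytes. */
        for (uint64_t b = 0; b < 256; b++) {
            uint64_t mask = (0x9999999999999999UL - b) & (~b) & 0x8080808080808080UL;
            int flagged  = (mask & 0x80) != 0;
            int expected = b <= 25;
            if (flagged != expected)
                printf("mismatch at byte value %llu\n", (unsigned long long)b);
        }
        return 0;
    }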


Back in the day, "stochastic superoptimization" would be a reasonable description for an application of genetic programming. Guess I will have to read the paper to understand what's new.


Yes, genetic programming arguably goes back to Turing, which is way back in the day, but the term superoptimizer goes back to Henry Massalin's 1987 paper, Superoptimizer: A Look at the Smallest Program.

https://web.stanford.edu/class/cs343/resources/cs343-annot-s...


Thanks for the pointer. Yeah, re: GP I was more referring to the work that came with Holland and after.

Re: "smallest program" I will always think of Tom Ray's Tierra. Went to a demo of his in 1991, mind-blowing.

https://www.cc.gatech.edu/~turk/bio_sim/articles/tierra_thom...


I think superoptimisation refers to doing a type of program optimisation beyond what an optimising compiler would do.

GP is certainly on-topic here! People are using GP to do this kind of thing too. But I think this system is not using GP.


Superoptimization is a methodology for building an optimizer. When you run a superoptimizer, it's "just" doing peephole optimization, although it's a bit like comparing bogosort (regular opt) to sorting networks (superopt).


And "stochastic superoptimization" sounds so much cooler than their original term; "super-duper random improvement."


I haven't looked at it in detail, but from what I gathered from the README, it just seems like a very simple form of linear genetic programming (with only mutation and no crossover).


Can it also reduce the size of the binary? I imagine there could be a lot of opportunities there given the huge size of generated programs these days.


Yes. "STOKE can be used in many different scenarios, such as optimizing code for performance or size". Note that this is the 4th sentence in the linked page. However the README hints that STOKE is unlikely to improve much over gcc or llvm optimizations such as -Os / -Oz for binary size.


I guess it depends on the optimization function? However, I suspect that to avoid loops it will use loop unrolling, and that to avoid interprocedural analysis (which I believe is beyond the scope of STOKE) it may use inlining, both of which may increase the size of the code.


Have you seen upx? It might be what you want


This is an extremely interesting domain. I feel that an even more interesting topic than optimizing code would be generating code to replicate a process based on input/output samples.


You are looking for "program synthesis". It's an active research area.


It's as easy as knowing the next number in the sequence:

1, 2, 4, ?


I know that one: 1, 2, 4, 8, 16, 31... https://oeis.org/A001591


Off-topic but this made me wonder. As the name implies, Solomonoff Induction only works for inductive inference from empirical data. Is there something similar for mathematics that would allow us to list the simplest sequences first, then more complex sequences, and so on? Or does the complexity of the formulation of a sequence depend on the choice of a foundation of mathematics and/or arbitrary conventions?


Why not do the same thing? Run a dovetailer, print sequences that terminate...

(Dovetailer: for all n from 0 up, for all m from 0 to n, treat m as a Turing machine and run it for n steps. If it halts, yield its output.)
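
A concrete (if toy) C version of that loop, with a made-up stand-in for the universal interpreter (a real one would run arbitrary encoded machines, some of which never halt):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for "treat m as a Turing machine": pretend machine m
       needs exactly m steps to halt and then outputs m*m. */
    static bool run_encoded_machine(uint64_t m, uint64_t steps, uint64_t *out)
    {
        if (steps < m)
            return false;   /* not enough steps yet */
        *out = m * m;
        return true;
    }

    int main(void)
    {
        bool done[100] = {false};               /* remember who already halted */
        for (uint64_t n = 0; n < 100; n++) {    /* in principle: run forever */
            for (uint64_t m = 0; m <= n; m++) {
                uint64_t out;
                if (!done[m] && run_encoded_machine(m, n, &out)) {
                    done[m] = true;
                    printf("machine %llu halted with output %llu\n",
                           (unsigned long long)m, (unsigned long long)out);
                }
            }
        }
        return 0;
    }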


Sorry, this is also off-topic. I saw your link 2 years ago to https://sites.google.com/view/lvu18lisbon

Can you recommend some decision theory reading? Maybe more on the practical side?


Depends on what you're interested in and your level of mathematical maturity. I assume you're interested in real-world, actually applied decision making rather than philosophical debates, right? Then you have to look for multicriteria decision making (MCDM) and multi-attribute utility theory (MAUT).

Eisenführ/Weber/Langer: Rational Decision Making. Springer 2010. This is a good practical introduction in the standard methodology.

Keeney/Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley & Sons 1976. This is an older work, but very good. It's mathematically more rigorous and important if you want to understand additive and multiplicative models.

Both of them are practical and give good examples. The first one is easier to read. If you're not afraid of more complicated mathematics, I can also give you some more references. I guess you wouldn't be asking me then, however, because there are plenty of sources. I generally recommend Peter C. Fishburn's work, insofar as I can understand it -- some of it is too hard for me.


Thank you so much for taking the time to reply! It really means a lot and I will follow your recommendations.


Why wouldn't Solomonoff induction work for number sequences?


Well, it's obviously not easy, but it's definitely interesting :-)


:-D


Isn't that exactly what neural networks are used for these days?


Could you use such a system to replicate a closed-source program?


This is, in essence, a monkey trying to lower a cost function by randomly manipulating an input AST (empty for synthesis). So theoretically it will be able to produce any program, just not in a reasonable time frame.
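
A toy sketch of that kind of cost-driven random search (my own illustration, not STOKE's actual code; the instruction set, test cases and acceptance rule here are made up): mutate one slot at random, keep the mutation when the test-case cost drops, and occasionally keep a worse one so the walk can escape local minima.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* A "program" is 4 slots of (opcode, operand); the target behaviour
       is f(x) = 2*x + 3, scored on a handful of test inputs. */
    enum { OP_NOP, OP_ADD, OP_XOR, OP_SHL, NUM_OPS };
    typedef struct { int op; uint32_t arg; } Insn;
    #define LEN 4

    static uint32_t run(const Insn *p, uint32_t x)
    {
        for (int i = 0; i < LEN; i++) {
            switch (p[i].op) {
            case OP_ADD: x += p[i].arg; break;
            case OP_XOR: x ^= p[i].arg; break;
            case OP_SHL: x <<= (p[i].arg & 31); break;
            default: break;
            }
        }
        return x;
    }

    /* Cost: how far the candidate is from the target on the test cases. */
    static uint64_t cost(const Insn *p)
    {
        static const uint32_t tests[] = {0, 1, 7, 100, 12345};
        uint64_t c = 0;
        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
            uint32_t want = 2 * tests[i] + 3, got = run(p, tests[i]);
            c += (got > want) ? got - want : want - got;
        }
        return c;
    }

    int main(void)
    {
        Insn cur[LEN] = {{OP_NOP, 0}};
        uint64_t cur_cost = cost(cur);
        for (long iter = 0; iter < 1000000 && cur_cost > 0; iter++) {
            Insn cand[LEN];
            for (int i = 0; i < LEN; i++) cand[i] = cur[i];
            int slot = rand() % LEN;                 /* random single mutation */
            cand[slot].op  = rand() % NUM_OPS;
            cand[slot].arg = (uint32_t)(rand() % 64);
            uint64_t cand_cost = cost(cand);
            /* Keep improvements; occasionally keep a worse candidate too. */
            if (cand_cost <= cur_cost || rand() % 100 == 0) {
                for (int i = 0; i < LEN; i++) cur[i] = cand[i];
                cur_cost = cand_cost;
            }
        }
        printf("final cost: %llu\n", (unsigned long long)cur_cost);
        return 0;
    }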


How do you reason about correctness when stoke changes your code? Does it rely on invariance of test cases or does it try to 'understand' the intent of the program?


STOKE proves full equivalence.


Of trivial code blocks.

That's not a terrible goal if you want to make key libraries faster and more efficient.

But I'm unconvinced that it would be easy to scale up these techniques to auto-clone and improve a big application like Photoshop or Word - or even a generic CRUD app.


This is basically doing what compiler engineers call peephole optimization: optimizing small windows of code, which is a standard part of the optimizer pipeline.

One advantage of peephole optimization is that pretty much all code is amenable to it, so it doesn't rely on there being one key, hot loop to optimize to see performance gains. The main disadvantage is that gains from peephole optimization tend to be quite small, so you're combing for single-digit percentage speedups.
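
To make "small windows of code" concrete, here is a toy peephole pass (mine, over a made-up IR, not from any real compiler): it drops "add 0" and strength-reduces "mul 2" into "shl 1".

    #include <stdio.h>
    #include <string.h>

    /* Tiny made-up IR: each instruction is an opcode name and a constant. */
    typedef struct { const char *op; int k; } Insn;

    int main(void)
    {
        Insn prog[] = {{"mul", 2}, {"add", 0}, {"add", 5}};
        int n = sizeof prog / sizeof prog[0], m = 0;

        for (int i = 0; i < n; i++) {
            Insn ins = prog[i];
            if (strcmp(ins.op, "add") == 0 && ins.k == 0)
                continue;                       /* "add 0" is a no-op: drop it */
            if (strcmp(ins.op, "mul") == 0 && ins.k == 2)
                ins = (Insn){"shl", 1};         /* strength-reduce mul 2 -> shl 1 */
            prog[m++] = ins;
        }
        for (int i = 0; i < m; i++)
            printf("%s %d\n", prog[i].op, prog[i].k);
        return 0;
    }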


Even big applications are composed of trivial code blocks. The optimisations are probably synergistic with other passes, and would probably expose many inlining opportunities that didn't exist before.


I guess this is what the second paragraph is about.

"In addition to searching over programs, STOKE contains verification infrastructure to show the equivalence between x86-64 programs. STOKE can consider test-cases, perform bounded verification all the way to fully formal verification that shows the equivalence for all possible inputs."


Program equivalence is in general uncomputable if I'm not mistaken, so a big part of the possible performance enhancements depend on the strength of their algorithm for proving equivalence.

Definitely an interesting though extremely challenging problem.


Undecidability of program equivalence only means that there is no algorithm that returns the correct answer to the question P = Q? for ALL programs P, Q in a Turing-complete programming language.

The original STOKE paper [1] only worked for loop-free programs, which may be a class of program fragments for which equivalence is decidable. Be that as it may, there is a fast heuristic for proving non-equivalence between two programs P and Q: just feed them random input -- if they return different outputs, they are NOT equivalent; if OTOH they return the same results, then they are candidates for equivalence and you can throw a modern SMT solver at them, which may actually be able to prove equivalence (or come up with a counterexample). Otherwise the SMT solver times out. You can even refine this scheme into some form of local search, in that you measure how much P and Q differ (e.g. Hamming distance). This is used in later papers, e.g. [2].

[1] Stochastic Superoptimization – ASPLOS 2013 https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...

[2] Stochastic Program Optimization – CACM 2016 https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...
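
A toy version of that first filter (my own sketch over plain C functions rather than x86-64 programs): hammer P and Q with random inputs and reject on the first differing output; agreement only makes them candidates for a real equivalence proof.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-ins for the candidate programs P and Q. */
    static uint32_t p(uint32_t x) { return x * 8; }
    static uint32_t q(uint32_t x) { return x << 3; }

    int main(void)
    {
        for (int i = 0; i < 1000000; i++) {
            uint32_t x = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
            if (p(x) != q(x)) {
                printf("not equivalent, counterexample: x = %u\n", x);
                return 1;
            }
        }
        /* No counterexample: P and Q are only candidates for equivalence;
           an actual proof would need an SMT solver or similar. */
        puts("no difference found on random tests");
        return 0;
    }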


Thanks, I was looking for this explanation.


Thanks for this


Could this be adapted for higher-level bytecode?

My thinking is to provide hand-crafted primitives to transform byte code, and maybe use an existing optimizer to do the search.


Souper does something like that with IR optimization.

https://github.com/google/souper


Slightly off-topic: there was a wee thread a few days ago about naming software projects, and now we see Stroke.

I hope we see a trend in brain-injury related software project names.


Yes, we should absolutely guard FOSS projects against people who read too fast.

It's a nice name for a strochastic optimizer, though...


Maybe I was having a stroke when I read it earlier. Heh. Oops.


It's called Stoke.


Perhaps your next project should be called "alexia"...? ;)



