I had to use a very crude form of superoptimization for an assembly course this summer. The goal was to create a fast to-upper function for ASCII.
I was already familiar with the method to check for zero (with chars being packed in a word):
(x - 0x0101010101010101UL) & ~(x) & 0x8080808080808080UL
so my idea was to find an equivalent formula that puts 0x80 in every byte where there is a value between 0 and 25, shift it right by 2, and xor it with the original word.
I simply tried every combination of simple operators and repeating-byte constants and found (0x9999999999999999UL - x) & (~x) & 0x8080808080808080UL
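Conceptually, the constant search can be sketched like this (stripped down: it fixes the subtract-and-mask shape rather than trying every operator combination, and it checks one byte value at a time instead of a packed word):

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of the constant search: fix the shape (k - x) & ~x & 0x80...80
     * and try every repeating-byte constant k, keeping any k that flags
     * exactly the byte values 0..25 with 0x80. */
    int main(void) {
        for (unsigned c = 0; c < 256; c++) {
            uint64_t k = c * 0x0101010101010101ULL;   /* repeating-byte constant */
            int ok = 1;
            for (unsigned b = 0; b < 256 && ok; b++) {
                uint64_t x = b;
                uint64_t m = (k - x) & ~x & 0x80;     /* candidate formula, low byte only */
                if ((m == 0x80) != (b <= 25))         /* must flag exactly 0..25 */
                    ok = 0;
            }
            if (ok)
                printf("(0x%016llx - x) & ~x & 0x8080808080808080 works\n",
                       (unsigned long long)k);
        }
        return 0;
    }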
Back in the day, "stochastic superoptimization" would be a reasonable description for an application of genetic programming. Guess I will have to read the paper to understand what's new.
Yes, genetic programming arguably goes back to Turing, which is way back in the day, but the term superoptimizer goes back to Henry Massalin's 1987 paper, Superoptimizer: A Look at the Smallest Program.
Superoptimization is a methodology for building an optimizer. When you run a superoptimizer it's "just" doing peephole optimization, although that's a bit like comparing bogosort (regular optimization) to sorting networks (superoptimization).
I haven't looked at it in detail, but from what I gathered from the README, it just seems like a very simple form of linear genetic programming (with only mutation and no crossover).
Yes. "STOKE can be used in many different scenarios, such as optimizing code for performance or size". Note that this is the 4th sentence in the linked page. However the README hints that STOKE is unlikely to improve much over gcc or llvm optimizations such as -Os / -Oz for binary size.
I guess it depends on the optimization function? However, I suspect that to avoid loops it will rely on loop unrolling, and that to avoid interprocedural analyses (which I believe are beyond the scope of STOKE) it may rely on inlining, both of which may increase the size of the code.
This is an extremely interesting domain. I feel that an even more interesting topic than optimizing code would be generating code to replicate a process based on input/output samples.
Off-topic but this made me wonder. As the name implies, Solomonoff Induction only works for inductive inference from empirical data. Is there something similar for mathematics that would allow us to list the simplest sequences first, then more complex sequences, and so on? Or does the complexity of the formulation of a sequence depend on the choice of a foundation of mathematics and/or arbitrary conventions?
Depends on what you're interested in and your level of mathematical maturity. I assume you're interested in real-world, actually applied decision making rather than philosophical debates, right? Then you have to look for multicriteria decision making (MCDM) and multi-attribute utility theory (MAUT).
Eisenführ/Weber/Langer: Rational Decision Making. Springer 2010. This is a good practical introduction in the standard methodology.
Keeney/Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley & Sons 1976. This is an older work, but very good. It's mathematically more rigorous and important if you want to understand additive and multiplicative models.
Both of them are practical and give good examples. The first one is easier to read. If you're not afraid of more complicated mathematics, I can also give you some other references. I guess you wouldn't be asking me then, however, because there are plenty of sources. I generally recommend Peter C. Fishburn's work, insofar as I can understand it -- some of it is too hard for me.
This is, in essence, a monkey trying to lower a cost function by randomly manipulating an input AST (empty for synthesis). So theoretically it will be able to produce any program, just not in a reasonable time frame.
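Here is a toy sketch of that idea (plain keep-if-not-worse mutation over a tiny made-up straight-line "program" and a made-up target function; STOKE itself mutates x86-64 and uses an MCMC acceptance rule rather than this simple hill climb):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define PROG_LEN 3
    #define N_TESTS  64

    /* Toy "monkey" search: a candidate is a short straight-line program over a
     * tiny op set; the cost is how many test inputs it gets wrong; a random
     * mutation is kept whenever it does not increase the cost. */
    enum op_kind { OP_ADD, OP_XOR, OP_SHL, OP_SUB, N_OPS };
    struct op { enum op_kind kind; uint32_t imm; };

    /* Behaviour the search is trying to reproduce (made up for this sketch). */
    static uint32_t reference(uint32_t x) { return (x << 1) + 7; }

    static uint32_t run(const struct op *prog, uint32_t x) {
        for (int i = 0; i < PROG_LEN; i++) {
            switch (prog[i].kind) {
            case OP_ADD: x += prog[i].imm; break;
            case OP_XOR: x ^= prog[i].imm; break;
            case OP_SHL: x <<= prog[i].imm % 32; break;
            case OP_SUB: x -= prog[i].imm; break;
            default: break;
            }
        }
        return x;
    }

    /* Cost: number of test inputs on which the candidate disagrees. */
    static int cost(const struct op *prog, const uint32_t *tests) {
        int wrong = 0;
        for (int i = 0; i < N_TESTS; i++)
            wrong += run(prog, tests[i]) != reference(tests[i]);
        return wrong;
    }

    int main(void) {
        uint32_t tests[N_TESTS];
        for (int i = 0; i < N_TESTS; i++)
            tests[i] = (uint32_t)rand();

        struct op prog[PROG_LEN] = { { OP_ADD, 0 }, { OP_ADD, 0 }, { OP_ADD, 0 } };
        int best = cost(prog, tests);

        for (long iter = 0; iter < 2000000 && best > 0; iter++) {
            int i = rand() % PROG_LEN;                      /* pick an instruction */
            struct op saved = prog[i];
            prog[i].kind = (enum op_kind)(rand() % N_OPS);  /* random mutation */
            prog[i].imm  = (uint32_t)(rand() % 16);
            int c = cost(prog, tests);
            if (c <= best) best = c;                        /* keep if not worse */
            else prog[i] = saved;                           /* otherwise undo */
        }
        printf("best cost: %d of %d tests wrong\n", best, N_TESTS);
        return 0;
    }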
How do you reason about correctness when STOKE changes your code? Does it rely on invariance of test cases, or does it try to 'understand' the intent of the program?
That's not a terrible goal if you want to make key libraries faster and more efficient.
But I'm unconvinced that it would be easy to scale up these techniques to auto-clone and improve a big application like Photoshop or Word - or even a generic CRUD app.
This is basically doing what compiler engineers call peephole optimization: optimizing small windows of code, which is a standard part of the optimizer pipeline.
One advantage of peephole optimization is that pretty much all code is amenable to it, so it doesn't rely on there being one key hot loop to optimize to see performance gains. The main disadvantage is that gains from peephole optimization tend to be quite small, so you're combing for single-digit percentage speedups.
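For a concrete flavour of a window-level rewrite, here is a well-known before/after pair (used purely for illustration, not output from any particular tool):

    #include <stdint.h>
    #include <stdio.h>

    /* Before: the obvious, branching absolute value. */
    int32_t abs_branch(int32_t x) { return x < 0 ? -x : x; }

    /* After: a branch-free sequence of the kind such a pass can produce.
     * x >> 31 is 0 for non-negative x and -1 for negative x (arithmetic shift;
     * implementation-defined in C but universal in practice). Both versions
     * share the usual INT_MIN caveat. */
    int32_t abs_bits(int32_t x) {
        int32_t m = x >> 31;
        return (x ^ m) - m;     /* flip the bits and add 1 when x is negative */
    }

    int main(void) {
        for (int32_t x = -3; x <= 3; x++)
            printf("%d -> %d %d\n", (int)x, (int)abs_branch(x), (int)abs_bits(x));
        return 0;
    }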
Even big applications are composed of trivial code blocks. The optimisations are probably synergistic with other passes, and would probably expose many inlining opportunities that didn't exist before.
I guess this is what the second paragraph is about.
"In addition to searching over programs, STOKE contains verification infrastructure to show the equivalence between x86-64 programs. STOKE can consider test-cases, perform bounded verification all the way to fully formal verification that shows the equivalence for all possible inputs."
Program equivalence is in general uncomputable if I'm not mistaken, so a big part of the possible performance enhancements depend on the strength of their algorithm for proving equivalence.
Definitely an interesting though extremely challenging problem.
Undecidability of program equivalence only means that there is no algorithm that returns the correct answer to the question P = Q? for ALL programs P, Q in a Turing-complete programming language.
The original STOKE paper [1] only worked for loop-free programs, which may be a class of programs for which equivalence is decidable. Be that as it may, there is a fast heuristic for proving non-equivalence between two programs P and Q: just feed them random input -- if they return different outputs, they are NOT equivalent; if OTOH they return the same results, they are candidates for equivalence and you can throw a modern SMT solver at them, which may actually be able to prove equivalence (or come up with a counterexample). Otherwise the SMT solver times out. You can even refine this scheme into some form of local search, in that you measure how much P and Q differ (e.g. Hamming distance). This is used in later papers, e.g. [2].
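A minimal sketch of the random-input half of that scheme (p and q are stand-in functions so the example runs; in practice they would be the two code fragments in question):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef uint64_t (*kernel_fn)(uint64_t);

    /* Stand-in candidates whose equivalence is in question. */
    static uint64_t p(uint64_t x) { return x * 8; }
    static uint64_t q(uint64_t x) { return x << 3; }

    /* Fast non-equivalence heuristic: any mismatch on a random input proves
     * P != Q; surviving all trials only makes the pair a candidate for a real
     * (SMT-based) equivalence proof. */
    static int random_distinguish(kernel_fn P, kernel_fn Q, long trials) {
        for (long i = 0; i < trials; i++) {
            uint64_t x = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
            if (P(x) != Q(x)) {
                printf("counterexample: x = 0x%016llx\n", (unsigned long long)x);
                return 1;
            }
        }
        return 0;
    }

    int main(void) {
        if (!random_distinguish(p, q, 1000000))
            puts("no counterexample found; hand the pair to an SMT solver");
        return 0;
    }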