
Souper – A Superoptimizer for LLVM IR - albertzeyer
https://github.com/google/souper
======
regehr
I'm one of the Souper developers and can offer a few pointers that may be more
helpful than the github repo. Here's a somewhat recent blog post showing
things that Souper can do:

      http://blog.regehr.org/archives/1252

Here's a talk I gave at the University of Washington last winter (before
Souper did synthesis):

      https://www.youtube.com/watch?v=Ux0YnVEaI6A

Recently we've been teaching Souper to use dataflow facts such as LLVM's known
bits and demanded bits. Perhaps interestingly, Souper can also teach LLVM how
to compute these facts more precisely:

      http://lists.llvm.org/pipermail/llvm-dev/2015-September/089904.html

One really common question about Souper is "why operate on LLVM IR instead of
machine instructions?" One of the main answers is "so that we can interact
with dataflow analyses." This isn't something that previous superoptimizers
have done. It seems to be working out really well.
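To make "known bits" concrete, here is a rough Python sketch of the idea (illustrative only, not Souper's or LLVM's actual code or API): a dataflow analysis tracks, for each bit position of a value, whether that bit is known to be 0 or known to be 1, and transfer functions propagate this through operations.

```python
from collections import namedtuple

# Two disjoint bitmasks: bits known to be zero, bits known to be one.
KnownBits = namedtuple("KnownBits", ["zero", "one"])

def known_and(a: KnownBits, b: KnownBits) -> KnownBits:
    # A result bit is 0 if either input bit is known 0;
    # it is 1 only if both input bits are known 1.
    return KnownBits(zero=a.zero | b.zero, one=a.one & b.one)

def known_or(a: KnownBits, b: KnownBits) -> KnownBits:
    # Dual of AND: 1 if either is known 1, 0 only if both are known 0.
    return KnownBits(zero=a.zero & b.zero, one=a.one | b.one)

# Example on 8-bit values: x & 0xF0 where x is completely unknown.
x = KnownBits(zero=0, one=0)                    # nothing known about x
mask = KnownBits(zero=~0xF0 & 0xFF, one=0xF0)   # the constant 0xF0
result = known_and(x, mask)
print(hex(result.zero))  # the low nibble of the result is known to be zero
```

The more precisely the compiler can compute facts like these, the more dead masks, redundant comparisons, and so on it can remove, which is the kind of precision improvement the llvm-dev thread above is about.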

------
bhouston
It uses a redis cache? I wonder if a future version could use a global cache
somewhere on the internet, and use cloud solvers. It seems like you need a
redis cache per version, but that shouldn't be hard to do -- or just add a
version key to each cache request.

Seems weird to have this type of tool as a local tool rather than a cloud
based tool.

It would be neat to have a service you can query with a snippet of IR, and it
gives you back an optimal version of it. One could continually increase the
size of these snippets as more cloud resources become available.

Having statistics on which patterns are common, even if they are not optimized
would be really interesting. You could then use these statistics to target
future optimization research.

~~~
radarsat1
It's a good idea. Keep in mind that an optimiser takes a bit of code and
replaces it with a different bit of code. So a "cloud" solution would
certainly require some thought towards security.

------
unfamiliar
I found some slides with background information

[http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR....](http://llvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdf)

------
mbrock
Maybe a tall order, but if someone has it in them to provide a clear and
simple explanation of SMT/SAT at the level of an intrigued layman, that'd be
amazing. I've looked at the Wikipedia pages a few times and always just feel
kinda overwhelmed. I'm probably missing the education in formal logic/math to
understand why these are fruitful models, but they obviously are. Is this a
kind of forefront of artificial reasoning research? What are some examples of
problems they can help solve, and approximately what would be involved in
modelling those problems within the formalization?

~~~
gsnedders
They're a good model in part because SAT is the most fundamental of a number
of equivalent models, and the one for which the largest amount of work has
gone into creating performant implementations.
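For the intrigued layman: SAT asks whether there is a true/false assignment to a set of boolean variables that makes a formula true; SMT layers richer theories on top (bitvectors, arithmetic, arrays), which is what lets you ask questions like "are these two instruction sequences equivalent for every input?". A deliberately naive brute-force SAT check in Python (real solvers use much smarter search, e.g. CDCL, and scale to millions of clauses):

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Exhaustively search for a satisfying assignment.

    A formula is a list of clauses; a clause is a list of literals,
    where literal i means "variable i is true" and -i means "false".
    The formula is satisfied when every clause has at least one true literal.
    """
    for assignment in product([False, True], repeat=num_vars):
        def lit_true(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return assignment  # a model: one satisfying assignment
    return None  # unsatisfiable

# (a or b) and (not a or c) and (not b or not c)
print(brute_force_sat([[1, 2], [-1, 3], [-2, -3]], 3))
```

A superoptimizer uses a solver in the other direction: it asserts "the candidate and the original differ on some input" and hopes the solver answers "unsatisfiable", which proves the two are equivalent everywhere.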

------
inDigiNeous
Any examples of usage ?

~~~
lumpypua
And benchmarks?

~~~
pascal_cuoq
Benchmarks would not be relevant here. Souper is intended to detect missed
optimizations in LLVM code already generated by, say, Clang, and to point out
new patterns to recognize and transform to the LLVM developers. The
improvements brought by Souper are distilled over a long period and mixed with
other unrelated LLVM improvements. On the other hand, every improvement
suggested by Souper looked like a good idea to an LLVM developer who let
themselves be convinced at some point, and many of them simply remove
instructions, so at the very least they reduce the size of the generated
binary, even if nothing else.

~~~
unfamiliar
From the github page:

>Alternatively, you may immediately let Souper modify the bitcode and let it
apply the missed optimization opportunities by using the Souper llvm opt pass.

So it seems like a benchmark of the program before and after this process
would be relevant.

------
ingenter
I wonder if it's possible to do a similar thing for codegen, i.e. generate
more efficient machine code based on an SMT solver and a CPU model.

~~~
cwzwarich
It's been done:

[http://theory.stanford.edu/~aiken/publications/papers/asplos...](http://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf)

They had to restrict themselves to instruction sequences of length 3 due to
limited resources, and say that to go beyond instruction sequences of length 4
they would need a better approach.
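That length limit is easy to motivate with back-of-the-envelope arithmetic: the number of candidate sequences grows exponentially in sequence length, so each extra instruction multiplies the search space by the size of the instruction set. A quick illustration (400 opcodes is a made-up round number, not the count from the paper):

```python
# Candidate instruction sequences of each length for a hypothetical
# ISA with 400 opcodes -- ignoring operand choices, which make it worse.
isa_size = 400
for length in range(1, 5):
    print(length, isa_size ** length)
```

Even before picking registers and immediates, length 4 already means tens of billions of candidates, which is why going past short sequences needs something cleverer than enumeration.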

~~~
fulafel
This has been done in '87 by Massalin:
[https://www.cs.arizona.edu/~collberg/Teaching/553/2011/Resou...](https://www.cs.arizona.edu/~collberg/Teaching/553/2011/Resources/superoptimizer-massalin.pdf)

(both are brute force though, not solver-based)

~~~
cwzwarich
Massalin's superoptimizer required humans to check the correctness of
optimizations. It also didn't generate parameterized peephole optimizations;
it directly optimized concrete instruction sequences.

------
bhouston
How good is super optimization? Is this available in current compilers in some
form?

~~~
fulafel
It means an exhaustive search over possible operation sequences to find the
optimal one. So the result is optimal, but the search is impractical for
anything large. The applications are processing very short bits of code and
then applying the result, e.g. dropping it into some real-world code or
informing humans about compiler optimizer deficiencies.
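As a toy illustration of that exhaustive search, here is a sketch with a made-up four-operation "ISA" on 8-bit values (real superoptimizers search actual instruction sets and verify equivalence with a solver rather than by testing every input, which only works here because the input space is tiny):

```python
from itertools import product

# A hypothetical four-op instruction set over 8-bit values.
OPS = {
    "neg":  lambda x: (-x) & 0xFF,
    "not":  lambda x: (~x) & 0xFF,
    "shl1": lambda x: (x << 1) & 0xFF,
    "inc":  lambda x: (x + 1) & 0xFF,
}

def superopt(target, max_len):
    """Return the shortest op sequence equal to `target` on all 8-bit inputs."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            def run(x, seq=seq):
                for op in seq:
                    x = OPS[op](x)
                return x
            # Exhaustive check over all 256 inputs stands in for a
            # solver-based equivalence proof.
            if all(run(x) == target(x) for x in range(256)):
                return seq
    return None

# Searching for 2*x + 2: the search discovers (x + 1) << 1.
print(superopt(lambda x: (2 * x + 2) & 0xFF, 3))
```

The shape of the result is the interesting part: the search finds an algebraic identity (factoring out the doubling) that a human would have to notice by hand, which is exactly the kind of pattern a superoptimizer reports back to compiler developers.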

