
Branchless code sequences - mnem
http://www.davespace.co.uk/blog/20150131-branchless-sequences.html
======
wsxcde
Scanning
[http://www.hackersdelight.org/aha/aha.pdf](http://www.hackersdelight.org/aha/aha.pdf),
it looks like Aha performs some form of exhaustive search over all programs?
An SMT solver is probably a better way of doing this search; there's been some
work in this direction.

Jha, Gulwani and colleagues have a very nice paper on oracle-guided synthesis.
They show how you can generate loop-free bitvector programs using SMT:
[http://www.eecs.berkeley.edu/~sseshia/pubdir/synth-icse10.pd...](http://www.eecs.berkeley.edu/~sseshia/pubdir/synth-icse10.pdf).
John Regehr has also written about this on his blog:
[http://blog.regehr.org/archives/1146](http://blog.regehr.org/archives/1146).

Part of the complexity here in an SMT solution might be encoding all the
bizarre instructions that you could possibly take advantage of in x86-like
instruction sets.

~~~
TheLoneWolfling
Never underestimate the utility of brute force & ignorance.

------
andrewchambers
This sort of thing was applied automatically to entire binaries in this paper:

[http://theory.stanford.edu/~aiken/publications/papers/asplos...](http://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf)

They managed to beat gcc -O3: "we show speedups from 1.7 to a factor of 10 on
some compute intensive kernels"

~~~
andrewchambers
*-O2, not -O3. Here is a cached version:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf)

------
haberman
Interesting, I'd never heard of the technique of searching all possible
instruction sequences to find one with desired characteristics. It's the
Infinite Monkey Theorem for code.
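
To make the idea concrete, here is a minimal brute-force sketch in C. Everything in it is a hypothetical toy: a made-up seven-operation instruction set (`OP_NEG`, `OP_AND`, ...), a hard-coded target function (isolate the lowest set bit, `x & -x`), and a handful of test inputs. It enumerates every program of length 1, then 2, then 3, and keeps the first one that agrees with the target on all test inputs, so the first hit is a shortest program for this toy machine. Agreeing on a few inputs is not a proof of equivalence; a real tool would still need to verify the candidate, for example with an SMT solver as suggested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set: each instruction reads one or two
   earlier values (the input x or a previous result) and yields a new one. */
enum op { OP_NEG, OP_NOT, OP_AND, OP_OR, OP_XOR, OP_ADD, OP_SUB, OP_COUNT };

struct insn { enum op op; int a, b; };

#define MAX_LEN 3

static uint32_t apply(enum op op, uint32_t a, uint32_t b) {
    switch (op) {
    case OP_NEG: return 0u - a;   /* unary ops ignore b */
    case OP_NOT: return ~a;
    case OP_AND: return a & b;
    case OP_OR:  return a | b;
    case OP_XOR: return a ^ b;
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    default:     return 0;
    }
}

/* vals[0] is the input; vals[i + 1] is the result of instruction i.
   The program's output is its last value. */
static uint32_t run(const struct insn *prog, int len, uint32_t x) {
    uint32_t vals[MAX_LEN + 1];
    vals[0] = x;
    for (int i = 0; i < len; i++)
        vals[i + 1] = apply(prog[i].op, vals[prog[i].a], vals[prog[i].b]);
    return vals[len];
}

/* The behaviour we want a program for: isolate the lowest set bit. */
static uint32_t target(uint32_t x) { return x & (0u - x); }

static const uint32_t tests[] = {0, 1, 2, 3, 12, 0x80000000u, 0xFFFFFFFFu, 0x12345678u};
#define NTESTS (sizeof tests / sizeof tests[0])

static int matches(const struct insn *prog, int len) {
    for (size_t t = 0; t < NTESTS; t++)
        if (run(prog, len, tests[t]) != target(tests[t]))
            return 0;
    return 1;
}

/* Depth-first enumeration of every program with exactly len instructions. */
static int search(struct insn *prog, int pos, int len) {
    if (pos == len)
        return matches(prog, len);
    for (int op = 0; op < OP_COUNT; op++)
        for (int a = 0; a <= pos; a++)
            for (int b = 0; b <= pos; b++) {
                prog[pos] = (struct insn){(enum op)op, a, b};
                if (search(prog, pos + 1, len))
                    return 1;
            }
    return 0;
}

/* Try lengths 1, 2, ... so the first hit is a shortest program.
   Returns the length found, or 0 if nothing fits within MAX_LEN. */
static int superoptimize(struct insn *prog) {
    for (int len = 1; len <= MAX_LEN; len++)
        if (search(prog, 0, len))
            return len;
    return 0;
}
```

With these test inputs the search should stop at length 2 with `r1 = NEG x; r2 = AND x, r1`. The number of candidates multiplies with every extra instruction, which is why this kind of search only scales to very short sequences.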

~~~
molyss
Here's the original paper on superoptimization by Alexia Massalin:
[http://courses.cs.washington.edu/courses/cse501/15sp/papers/...](http://courses.cs.washington.edu/courses/cse501/15sp/papers/massalin.pdf)

I ran into it a couple of years ago, after hearing about some of her other
research. Her papers are really impressive. I would have liked the author of
the article to mention her.

On a side note, I wouldn't be surprised if JIT compilers made some use of
this, allowing some interpreted languages to outperform compiled C on some
benchmarks.

EDIT: formatting and JIT addition

~~~
kevinnk
JIT compilers are time-constrained, so superoptimization isn't a very good
match.

~~~
chrisseaton
Why are JIT compilers time-constrained? If you have a banking system that runs
all day every day, why does it matter if it takes 5 minutes to compile the hot
spots at the top tier of optimisation?

~~~
oso2k
Imagine you walk up to your ATM just as its mother banking system is in the
middle of a hotspot compile followed by a GC pause. What would you do if your
ATM took 5+ minutes to respond to your withdrawal request?

You'd probably freak out and worry about getting your ATM card back.

~~~
chrisseaton
The program doesn't have to stop to run the JIT - it can run it on a
background thread.

------
jamesrom
Is there any other reason to use a tool like this other than eliminating
branch misprediction?

~~~
rdc12
Superoptimisers can be used on any sequence of instructions to find a more
optimal sequence. Since they approach the problem very differently from a
compiler, they can do a better job in some cases. They can also work on
hand-written assembler.

They tend to be good at finding obscure solutions.
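
As a small illustration of the kind of non-obvious, branch-free sequences being discussed, here are a few classic bit tricks in C. These are well-known hand-derived identities of the sort collected in Hacker's Delight, not the output of any particular superoptimizer; the function names are made up for this example, and the code assumes 32-bit two's-complement integers.

```c
#include <stdint.h>

/* Branchless absolute value, done on unsigned values so no negative
   number is ever shifted. mask is all ones when x is negative, else 0.
   Returns uint32_t so that |INT32_MIN| (2^31) is representable. */
static uint32_t abs_branchless(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);
    return (ux ^ mask) - mask;
}

/* Branchless sign: -1, 0 or +1. Each comparison yields 0 or 1 in C;
   compilers usually lower these to flag-setting instructions rather
   than jumps, although that is not guaranteed. */
static int sign_branchless(int32_t x) {
    return (x > 0) - (x < 0);
}

/* Branchless minimum: when a < b the mask is all ones and the XOR
   trick yields a; otherwise the mask is 0 and it yields b.
   Converting back to int32_t assumes the usual two's-complement behaviour. */
static int32_t min_branchless(int32_t a, int32_t b) {
    uint32_t mask = 0u - (uint32_t)(a < b);
    return (int32_t)((uint32_t)b ^ (((uint32_t)a ^ (uint32_t)b) & mask));
}
```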

------
darkmighty
Anyone know why compilers don't already do this for small branches? Not worth
the effort?

~~~
AReallyGoodName
You miss out on branch prediction. There's a bit of a myth that branches are
always bad: a mispredicted branch is slower than a correctly predicted one, so
the reasoning goes that having no branches at all must be better.

A bit of code that eliminates branches but has dependent variables (A must be
computed before B can be computed) will have pipeline stalls while the
computation of A completes. Yes, it might be faster than a branch
misprediction, but it will never be as fast as a correctly predicted branch
(which takes effectively 0 cycles).

Notice the one thing missing in this post? Benchmarks!
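
To show the shape of the trade-off being described (with no benchmark numbers here either), here are two hypothetical versions of the same loop in C: one with a data-dependent branch and one that replaces it with a mask. On sorted input the branch is easy to predict; on random input it is not, while the branchless version does the extra mask work on every iteration regardless. Which one wins depends on the data and the CPU, and an optimizing compiler may turn either form into the other (or vectorize it), so a real comparison should check the generated assembly and measure.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy: the if is a conditional jump in the source; the compiler may or
   may not keep it as one. Cheap when the outcome is predictable (e.g. sorted
   input), expensive when it mispredicts often. */
static int64_t sum_at_least_branchy(const int32_t *v, size_t n, int32_t t) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] >= t)
            sum += v[i];
    return sum;
}

/* Branchless: the comparison becomes an all-ones or all-zeros mask, so every
   element is added, either as itself or as 0. Nothing to mispredict, but the
   mask computation sits in front of every add. */
static int64_t sum_at_least_branchless(const int32_t *v, size_t n, int32_t t) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= t);
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```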

~~~
duaneb
Among other things, there are branch-prediction-optimal algorithms that either
a) only mispredict the final branch, or b) minimize amortized mispredictions.
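
One well-known example in this spirit (not necessarily what was meant above) is a binary search whose loop body has no data-dependent branch. The comparison only selects the next `base`, which compilers often, though not always, lower to a conditional move, and the loop condition depends only on the length `n`, so in the common case the only mispredicted branch is the final loop exit. The function name is hypothetical; it follows the same contract as C++ `std::lower_bound`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first element >= key in the sorted range v[0..n),
   or n if every element is smaller. */
static size_t lower_bound_branchless(const int32_t *v, size_t n, int32_t key) {
    if (n == 0)
        return 0;
    const int32_t *base = v;
    while (n > 1) {               /* trip count depends only on n */
        size_t half = n / 2;
        /* Select, don't jump: a candidate for a conditional move. */
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    /* base is the last candidate; step past it if it is still too small. */
    return (size_t)(base - v) + (*base < key);
}
```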

------
wehadfun
What kind of bump does this produce?

~~~
chipaca
Goosebumps.

