
Jump Threading - nkurz
http://beza1e1.tuxen.de/articles/jump_threading.html
======
xpaulbettsx
The Windows OS build process does some of this type of optimization too, via a
process called BBT
([http://www.microsoft.com/windows/cse/bit_projects.mspx](http://www.microsoft.com/windows/cse/bit_projects.mspx))
- this is why if you disassemble Windows OS binaries, you'll see some weird
unconditional jumps and other "odd" disassembly

------
DannyBee
" I'm confident termination here can be very directly mapped to solving the
halting problem."

Why?

Iterative dataflow problems, in particular ones capturing second-order effects,
etc., are very well known things and often do not require solving the halting
problem or anything like it.

You are just propagating analysis facts around (i.e. "I can thread this jump").
In fact, I'm fairly confident standard jump threading (which this isn't quite)
can be split into an analysis pass and an optimization pass.

This is because you already know what the CFG will look like if you perform a
particular jump threading.

This is similar to PRE and other situations where you know what the end result
of PRE'ing a given expression will be, and thus you can figure out what the
full end state of the transform will be, regardless of whether you actually do
it.

In fact, if you couldn't do this, you couldn't do LCM as two unidirectional
dataflow problems.

Jump threading seems much the same. Completely decomposable, regardless of
whether you perform a given path of optimizations or not.

The only question is the expense of doing so.

~~~
MatzeBraun
I am not convinced it's that simple if one application of jump threading
creates a new opportunity. The endless-loop example in the post suggests that
it is a halting-problem instance to decide whether we should stop jump
threading because we are unrolling an endless loop.

~~~
DannyBee
First, that example is not jump threading, since it's cloning code and CFG
blocks.

[https://en.wikipedia.org/wiki/Jump_threading](https://en.wikipedia.org/wiki/Jump_threading)

is a better description of jump threading.

In what the author describes as jump threading, you have the same problem as
loop peeling or any other loop optimization that may get applied to an
infinite loop and wants to duplicate per-iteration code from that loop.

Yet somehow, as you'll see, we can prove things about the end states of those
optimizations. ;-)

------
userbinator
_at the expense of code size. For hardware with branch prediction, speculative
execution, and prefetching, this can greatly improve performance_

...up to the point when other code starts getting pushed out of the cache.
This seems like an odd statement to make, since the transformation appears to
be, like loop unrolling, of most benefit to processors which do _not_ have
features like branch prediction/OoO and thus very strongly prefer to execute
"straight-line" code with as few branches as possible. Examples include early
RISCs and the Pentium 4.

Also, the article describes something slightly different from both the GCC
manual and the Wikipedia article - these describe an optimisation which
rewrites chains of jumps into direct ones and does not involve duplicating
code.

~~~
qznc
If you convert branches into straight jumps, then branch prediction cannot be
wrong anymore. Thus improvement.

Without branch prediction and speculative execution, a branch and a jump
instruction should be equivalent, hence no improvement from that angle.
However, it should still yield an improvement because there is no condition to
evaluate anymore, hence less code to execute.

~~~
heinrich5991
>If you convert branches into straight jumps, then branch prediction cannot be
wrong anymore. Thus improvement.

It's more like: no degradation from misprediction. And there might be
performance degradation from the additional memory use.

~~~
qznc
I don't understand what you mean by "no degradation".

A branch predictor can be wrong. A simple jump is like the branch predictor is
always right. Thus, the code might run faster.

~~~
heinrich5991
Yes. But a branch predictor can also be right every time. Or it could be right
so often that keeping the branch is better than paying for the larger memory
footprint of the transformed program.

------
vardump
Not saying that this particular optimization is not generally beneficial;
however, I'm always a bit wary of optimizations that increase local code size.
An instruction cache miss costs a lot more than a mispredicted branch, and
increasing code size creates more opportunities for a cache miss to occur.

To really know whether or not this is a win, you need to test it in the
context of the whole program. Microbenchmarks often give the wrong idea:
they are usually not run under pressure on CPU resources such as the L1
instruction cache, L1 data cache, TLB, etc. In real code the situation is
different. Other code also needs those cache entries, and execution may very
well be worse overall because a local optimization elsewhere takes up
resources and bottlenecks execution.

------
dalias
Nice to see firm getting noticed. It's a really promising project.

------
dkersten
I remember Mike Pall talking about the use of this technique in LuaJIT.
Interesting topic.

