
Something about IR Optimization - sctb
https://brrt-to-the-future.blogspot.com/2019/03/something-about-ir-optimization.html
======
slavapestov
Every JIT that starts out by generating code directly from expression trees or
bytecode eventually grows an ad-hoc optimizer and IR. I think it's probably
better to avoid all the dead ends and lower directly to an SSA IR. You can
then implement tried-and-true optimizations from compiler textbooks, and
concerns such as "optimizing across expression boundaries" or "my expression
tree is actually a DAG" are defined away completely.

~~~
brrt
Hi! Author here. Honest question, because I don't know all the facts:

- Given that in this IR, every value has a single definition, and every use
must reference that value, I'd argue that this is already SSA form.

- In fact, the concept of 'variable' is mostly missing, there's only a
repeated reference to the same value. I think that simplifies things a lot,
but maybe too much.

- Optimizing across expression boundaries is in fact very much the point, if
we can keep the semantics the same.

Thanks for your interest!

~~~
comex
Well... one thing that's usually done as part of converting something to SSA
form is computing the dominator tree. If you did that, you could relatively
cheaply solve this:

> But it also means that the property we're interested in - a value is
> computed before it is used, in all possible code flow paths - isn't
> really expressible by the IR.

...since you could verify that the blocks a value is used in are dominated by
the block it's computed in.
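To make the suggestion concrete, here is a minimal sketch (not from the article or any real compiler; all names are made up) of that check: compute dominator sets over a CFG with a simple iterative dataflow pass, then flag any use whose block is not dominated by the defining block.

```python
def dominators(succ, entry):
    """Iterative dominator computation over a CFG given as a
    successor map. dom[b] is the set of blocks dominating b."""
    blocks = set(succ)
    pred = {b: set() for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].add(b)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            # a block is dominated by itself plus whatever
            # dominates all of its predecessors
            new = {b} | set.intersection(*(dom[p] for p in pred[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

def verify(succ, entry, defs, uses):
    """defs: value -> defining block; uses: value -> using blocks.
    Returns every (value, block) use not dominated by its definition."""
    dom = dominators(succ, entry)
    return [(val, b)
            for val, bs in uses.items()
            for b in bs
            if defs[val] not in dom[b]]
```

On a diamond CFG this flags exactly the bug from the article, where a value is computed in one branch but referenced in both:

```python
succ = {"entry": ["then", "else"], "then": ["join"],
        "else": ["join"], "join": []}
errs = verify(succ, "entry",
              defs={"$foo": "then"},
              uses={"$foo": ["then", "else"]})
print(errs)  # [('$foo', 'else')] - "then" does not dominate "else"
```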

------
matt0987
Just a note that this IR sounds a lot like the Testarossa Trees IR:
[https://github.com/eclipse/omr/blob/master/doc/compiler/il/I...](https://github.com/eclipse/omr/blob/master/doc/compiler/il/IntroToTrees.md)

In that IR the rule is that sharing cannot pass basic-block boundaries because
of the exact kind of bug you indicate.

~~~
brrt
Thank you for that link! That does look like a very similar IR structure
indeed. Very interesting.

------
UncleEntity
> For instance, if two branches of a conditional expression both refer to the
> same value (represented by name $foo) then the code generator will only emit
> code to compute its value when it encounters the first reference. When the
> code generator encounters $foo for the second time in the other branch, no
> code will be emitted.

Isn't that what phi nodes were designed to fix?

~~~
ridiculous_fish
I don't think so - phi nodes solve the "problem" of expressing a value in SSA
one-definition form, when that value is control-flow dependent. But this is
really an artifact of SSA form.

This seems like a much weirder problem, in which the optimizer is unaware of
control flow and so will allow a value to reference an instruction that does
not dominate it.

LLVM's IR verifier ensures that instructions dominate all of their uses, which
simply means that all users of an instruction must provably execute after that
instruction. This verification step seems to be what's missing.

~~~
brrt
Hi! Author here. I think you hit the nail on the head here. The optimizer _is_
unaware of the control flow, because the control flow is _added_ by the code
generator. (And with it, the distinction between computing a value and
referencing it.) It's a control-flow-less IR, sort of.

