However, I don’t recommend using this IR style (aka sea of nodes) because:
- The original CFG and the original ordering of operations contain useful information. You want an IR that preserves them at least up to the point where some pass proves that a better order exists.
- Lots of optimizations can be made cheaper to run by using basic blocks, so it’s good to have an IR in which control flow nodes are basic blocks. Load elimination is an example.
- IRs need to be intuitive to the people who work on the compiler. Sea of nodes is more challenging to think about than SSA over CFG.
- No program transform or analysis is uniquely enabled by sea of nodes. The transforms that sea of nodes makes easier are pretty easy to write in SSA over CFG as well.
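As a concrete illustration of the load-elimination point above, here's a minimal sketch (in Python, with a made-up tuple instruction format, not any real compiler's IR) of redundant-load elimination restricted to a single basic block: because control flow is straight-line, one forward pass with a hash map suffices, invalidating conservatively at stores.

```python
# Minimal sketch of redundant-load elimination within one basic block.
# Hypothetical instruction format: ("load", dst, addr), ("store", addr, src),
# or ("op", dst, ...). Straight-line control flow means one forward pass
# with a map from address -> register holding its last known value is enough.

def eliminate_redundant_loads(block):
    available = {}   # addr -> register holding the last loaded/stored value
    replaced = {}    # eliminated register -> register to use instead
    out = []
    for inst in block:
        if inst[0] == "load":
            _, dst, addr = inst
            if addr in available:
                replaced[dst] = available[addr]  # reuse the earlier value
                continue                          # drop the duplicate load
            available[addr] = dst
            out.append(inst)
        elif inst[0] == "store":
            _, addr, src = inst
            # Conservatively assume the store may alias every other address.
            available = {addr: src}
            out.append(inst)
        else:
            out.append(inst)
    return out, replaced

block = [
    ("load", "r1", "A"),
    ("load", "r2", "A"),   # redundant: same address, no intervening store
    ("store", "B", "r1"),
    ("load", "r3", "A"),   # kept: the store may have clobbered A
]
new_block, replaced = eliminate_redundant_loads(block)
```

With a sea-of-nodes graph, the same pass has to reason about memory-edge ordering globally; here the basic block gives you the ordering for free.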
I worked for a while on V8, which uses a sea-of-nodes representation, and this point really resonates with me. I and many of my teammates never completely developed an intuition for the different edges in the graph; there was a lot of trial and error involved.
I remember reading this paper and thinking "wow! this is the solution to all the problems I never knew I had." The generality and simplicity of a single graph that does everything is appealing. But in practice, I think you'd most often be better off with SSA over CFG.
Take a look at MLIR. It's a novel and powerful abstraction that can be applied in creative ways to solve problems that require translation from source syntax to one or more targets (a machine, an API, or a mix).
TensorFlow is the biggest user of MLIR so far. The project recently moved to the LLVM repo so it can be used by other LLVM subprojects, and because that's the natural place for it to live.
For example, a C/C++ -> MLIR -> LLVM IR pipeline can enable optimizations that cannot possibly be done at the LLVM IR level, since by then a lot of context from the original source (the language syntax) has been lost.
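A toy illustration of the "lost context" argument (in Python, not real MLIR syntax): a higher-level IR node that still records "this is a sum over range(0, n)" can be rewritten to a closed form with a trivial pattern match, whereas after lowering to a generic compare-and-branch CFG the same rewriter no longer matches and recovering the opportunity needs much heavier analysis.

```python
# Hypothetical two-level IR as nested tuples. At the high level, a reduction
# over a range is a single node; after lowering, it is an opaque CFG.

def simplify(node):
    # Pattern: sum over range(0, n) -> n * (n - 1) / 2, as an expression tree.
    if node[0] == "reduce_add" and node[1][0] == "range" and node[1][1] == 0:
        n = node[1][2]
        return ("div", ("mul", n, ("sub", n, 1)), 2)
    return node  # lowered form: nothing to match, returned unchanged

high = ("reduce_add", ("range", 0, "n"))                       # pre-lowering
low = ("cfg", ["entry", "loop_header", "loop_body", "exit"])   # post-lowering
```

Calling `simplify(high)` fires the rewrite; `simplify(low)` returns the node untouched, which is the point: the structural fact was erased by lowering, not by any optimization.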
The whole thing about using it for C++ optimizations... I’ll believe it when I see a statistically significant speedup. Until then it’s vapor.
As for C/C++ specifically, you can optimize much better and more safely if you have one or more intermediate dialects prior to LLVM's IR. You don't need to believe it; this is compilers 101.
I’m saying that MLIR is unproven for that purpose outside of the ML space.
It does not take 10 years to implement an SSA compiler with LICM, SCEV, and good register allocation. (No idea why you'd want Professional Employer Organizations as part of your RA, but that's just me.) For example, WebKit's B3 compiler took 4 months to write. It's not hard to write it if you know what to write. Hence the need for a good book. Maybe the reason why you think it takes ten years and Professional Employer Organizations is that you never read a good book on the subject!
I learned by interning on a team that knew how to do it right, then picked up more by word of mouth: asking the folks who architected compilers how and why they did things.
They're a (relatively) central point for this kind of metadata.
> You may be in a position to push on that idea though!
Ah, unfortunately probably not; I'm very much on the receiving end of the data.
> Papers are always so careful to point to grant info (when applicable),
There's a lot of variation here - it's surprisingly common for this not to be the case! Funders definitely want it, though.
Very few papers keep getting cited and talked about so long after publication!