If I understand correctly, this would require some correlation between disparate forms of optimization. My fear is that each optimization would alter/rewrite insns significantly enough that there may not be enough correlation left to build those search shortcuts. Guess-and-check cost evaluation across all the permutations might still be expensive.
Along the same lines of determining which combination of optimizations yields the best overall result, I wonder if one could train a NN to do it. Maybe there's some work in the area; I don't keep up with research. But in general I would imagine that with a large enough training set, the NN could say a spill in case X on arch Y is likely better than a re-eval. But then you get into non-determinism, which scares devs. Just a thought.
We use node splitting in the TurboFan compiler, but only do limited splitting during the scheduling phase, when the scheduler detects that placing a node with multiple uses would produce partially redundant code. Splitting these nodes allows the copies to be scheduled independently (essentially, moved down the schedule, avoiding paths where the value would be redundant), but of course it duplicates code.
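To make the trade-off concrete, here is a toy sketch (hypothetical block names and counters, not TurboFan's actual IR or scheduler): a pure node `t = a * b` is used in blocks B1 and B2 but not on a third path B3. Scheduling the single node at the common dominator B0 computes it on the B3 path too (the partial redundancy), while splitting into one copy per use lets each copy sink into the block that needs it, at the cost of duplicating the multiply.

```python
# Toy model of the two placement strategies. A "placement" maps a basic
# block to how many copies of the node end up there.

def schedule_without_splitting(use_blocks, dominator):
    # One node: it must be placed at the common dominator of all its uses,
    # so it also executes on paths (e.g. B3) that never consume the value.
    return {dominator: 1}

def schedule_with_splitting(use_blocks, dominator):
    # One copy per use: each copy sinks into the block that uses it.
    placement = {}
    for block in use_blocks:
        placement[block] = placement.get(block, 0) + 1
    return placement

uses = ["B1", "B2"]  # blocks that consume t = a * b
unsplit = schedule_without_splitting(uses, "B0")  # runs on the B3 path too
split   = schedule_with_splitting(uses, "B0")     # duplicated, but off B3
```

Here `unsplit` is `{'B0': 1}` and `split` is `{'B1': 1, 'B2': 1}`: one static copy versus two, but the split version does no work on the B3 path.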
In TurboFan, register allocation happens after instruction selection, which happens after scheduling. Instruction selection depends on the schedule, since it generally does not want to combine instructions across basic block boundaries because applying a selection tile across basic blocks might pull computations (even if they are just micro-ops) into a loop.
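A back-of-the-envelope illustration of why cross-block tiles can lose (hypothetical numbers, not TurboFan's real cost model): say `m = a * b` is defined before a loop, and `s = m + c[i]` is computed inside a loop that runs N times. Keeping selection within block boundaries emits the multiply once; fusing the multiply and add into one cross-block multiply-add tile drags the multiply's micro-op into the loop body.

```python
# Count micro-ops executed under each selection strategy (mul and add each
# counted as one micro-op).

N = 1000
ops_within_blocks = 1 + N   # one mul before the loop + one add per iteration
ops_cross_block   = 2 * N   # one fused mul-add per iteration = mul + add micro-ops
```

For any N > 1 the cross-block tile executes more micro-ops, even though it looks like a win when you consider the two nodes in isolation.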
Rematerialization is yet another optimization that works against CSE. Instead of spilling a value, the register allocator recomputes it at its uses. This undoes CSE much, much later, once the register assignment is partially known.
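A toy cost model for the spill-vs-rematerialize decision (all costs hypothetical and machine-dependent): spilling pays one store plus one reload per use, while rematerializing re-executes the defining instruction before each use, which is exactly the point where a CSE'd value quietly becomes "uncommon" again.

```python
def spill_cost(store_cost, load_cost, num_uses):
    # One store at the spill point, one reload before each use.
    return store_cost + load_cost * num_uses

def remat_cost(recompute_cost, num_uses):
    # Recompute the value immediately before each use; nothing hits memory.
    return recompute_cost * num_uses

# A value that is cheap to recompute (e.g. a constant materialization)
# favors remat; an expensive one (e.g. a division) flips the decision.
remat_wins  = remat_cost(recompute_cost=1,  num_uses=3) < spill_cost(3, 3, 3)
spill_wins  = remat_cost(recompute_cost=20, num_uses=3) > spill_cost(3, 3, 3)
```

With these numbers, the cheap value rematerializes (3 < 12) and the expensive one spills (60 > 12), and note the decision depends on `num_uses`, which is only known once allocation is underway.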
IMO closing that whole optimization loop is just prohibitively expensive (in compile time).
On the other hand, measuring tiny differences in code quality no longer seems possible on modern processors, due to all the runtime nondeterminism (turbo boost, cache state, reorder buffer state, interrupts, core migration). So how would we validate optimizations with such a small impact?
How do you validate these optimizations? Dunno; I'd recommend looking at Agner Fog's benchmark tools. But torturing the optimizer just to make a test suite number go down will eventually end in tears.
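A minimal sketch of one common mitigation (a measurement hygiene trick, not a full methodology): repeat the measurement many times and report the minimum or median sample, which discards much of the one-sided noise from interrupts, migrations, and cold caches. It narrows the error bars, but it still can't resolve code-quality differences below the noise floor.

```python
import timeit

def measure(fn, inner=1000, repeats=25):
    # Each sample runs `fn` `inner` times back to back; interference from
    # the OS or frequency scaling only ever makes a sample slower, so the
    # minimum is the least-contaminated estimate and the median is a
    # robust summary.
    samples = sorted(timeit.repeat(fn, number=inner, repeat=repeats))
    return samples[0], samples[len(samples) // 2]

best, median = measure(lambda: sum(range(100)))
```

Even with this, comparing two compiler outputs that differ by a fraction of a percent usually needs hardware performance counters (e.g. retired-instruction counts), not wall-clock time.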