One nitpicky observation: mem2reg doesn't "move values from memory [...] to registers (directly inside the CPU)." It's an SSA expansion pass, taking the minimal SSA form produced by the C/C++ frontend to turning it into a more aggressive SSA form by eliminating as many `alloca`s as possible. SSA is characterized by the presence of infinitely many "fresh" abstract registers, none of which correspond to actual CPU registers -- LLVM is responsible at a later stage for performing efficient lowering and allocation of registers/stack slots for the SSA registers.
Memory is mutable whereas SSA registers are immutable, so the compiler frontend just put all the C variables in memory to express the semantics of the C program in the simplest way possible. The mem2reg pass then moved all the variables into SSA registers, by simply creating a new register for each assignment and adding phi nodes to track the flow of variables between basic blocks (like the individual iterations of the loop body).
Instead, I'll point you out to the keyword "Dominator Frontier", and "SSA Construction Algorithm".
You should read that only after you understand SSA and Phi functions in particular. Once you know how Phi works, the next question is the algorithm for how they're constructed.
This page may also help.
> -mem2reg: Promote Memory to Register
> This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct “pruned” SSA form.
> The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. If the address space is not explicitly specified, the object is allocated in the alloca address space from the datalayout string.
Based on my reading and understanding of alloca and mem2reg above, I have to disagree with your assessment. It seems like alloca roughly corresponds to a "push" in your typical x86 assembly language, or maybe a "add esp, #ofBytes".
By removing alloca instructions, the mem2reg step is turning stack-memory into registers.
This is true, but it's also misleading: the transformation of stack slots into machine registers is a beneficial side effect of mem2reg. Reducing stack use is an important optimization, but producing an optimal SSA form is even more important, since key passes like folding, DCE and SRoA rely heavily on the SSA form. The "reg" in "mem2reg" refers explicitly to the latter (SSA registers), as the text you've excerpted directly says.
You can also prove this to yourself by contriving an IR function that contains more SSA registers than machine registers: you'll see that, in addition to any ABI constraints, LLVM will be forced to spill any excess SSA registers back onto the stack.
Edit: but also yes, to confirm: `alloca` corresponds more or less directly to `add ESP, <adjust>` within the x86 stack model. But it doesn't have to! LLVM's semantics are target independent even when the IR isn't.
I have crude familiarity with SSA (no personal experience but I've read a book once) but not with LLVM's terminology. I'm looking at the example in the blogpost, and I can see that its clearly adding in the important Phi functions needed
Strange that llvm names their aggressive SSA step in this manner. But yes, adding the phi functions in this manner is a very important step before we can think about optimizations.
The "inductive variable" is far easier to detect once its in a "Phi-form" so to speak.
The end result being that we have the same word "register" referring to two opposed resources (one infinite, one extremely finite) :-).
Having more SSA register than machine registers does not mean there will be any spilling as unlike SSA registers, machine registers can be reused once the old value is no longer needed.