Hacker News new | past | comments | ask | show | jobs | submit login

Great writeup! This is a nice overview of how LLVM's optimization passes compose to produce nicely optimized assembly.

One nitpicky observation: mem2reg doesn't "move values from memory [...] to registers (directly inside the CPU)." It's an SSA expansion pass, taking the minimal SSA form produced by the C/C++ frontend to turning it into a more aggressive SSA form by eliminating as many `alloca`s as possible. SSA is characterized by the presence of infinitely many "fresh" abstract registers, none of which correspond to actual CPU registers -- LLVM is responsible at a later stage for performing efficient lowering and allocation of registers/stack slots for the SSA registers.




Not nitpicky at all - I was confused as to why one would want to do register allocation before control/data flow analysis. Thanks for the explanation!


Thanks for the nitpick, I should have been more precise and write move value from memory to abstract register otherwise it is indeed confusing. I will fix the article and mention your comment.


Could you give an example of an alloca mem2reg would eliminate?


The article has an example: in the unoptimized IR generated by the frontend, all the variables are stored in memory on the stack and accessed through pointers stored in registers, whereas after the mem2reg pass variables are stored directly in registers.

Memory is mutable whereas SSA registers are immutable, so the compiler frontend just put all the C variables in memory to express the semantics of the C program in the simplest way possible. The mem2reg pass then moved all the variables into SSA registers, by simply creating a new register for each assignment and adding phi nodes to track the flow of variables between basic blocks (like the individual iterations of the loop body).


You've very concisely explained the bits that were unclear to me. Thank you.


I don't think that would be helpful in learning what's going on.

Instead, I'll point you out to the keyword "Dominator Frontier", and "SSA Construction Algorithm".

https://en.wikipedia.org/wiki/Static_single_assignment_form#...

You should read that only after you understand SSA and Phi functions in particular. Once you know how Phi works, the next question is the algorithm for how they're constructed.

https://en.wikipedia.org/wiki/Dominator_(graph_theory)

This page may also help.


https://llvm.org/docs/Passes.html#passes-mem2reg

> -mem2reg: Promote Memory to Register

> This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct “pruned” SSA form.

-----------

https://llvm.org/docs/LangRef.html#alloca-instruction

> The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. If the address space is not explicitly specified, the object is allocated in the alloca address space from the datalayout string.

--------

Based on my reading and understanding of alloca and mem2reg above, I have to disagree with your assessment. It seems like alloca roughly corresponds to a "push" in your typical x86 assembly language, or maybe a "add esp, #ofBytes".

By removing alloca instructions, the mem2reg step is turning stack-memory into registers.


> By removing alloca instructions, the mem2reg step is turning stack-memory into registers.

This is true, but it's also misleading: the transformation of stack slots into machine registers is a beneficial side effect of mem2reg. Reducing stack use is an important optimization, but producing an optimal SSA form is even more important, since key passes like folding, DCE and SRoA rely heavily on the SSA form. The "reg" in "mem2reg" refers explicitly to the latter (SSA registers), as the text you've excerpted directly says.

You can also prove this to yourself by contriving an IR function that contains more SSA registers than machine registers: you'll see that, in addition to any ABI constraints, LLVM will be forced to spill any excess SSA registers back onto the stack.

Edit: but also yes, to confirm: `alloca` corresponds more or less directly to `add ESP, <adjust>` within the x86 stack model. But it doesn't have to! LLVM's semantics are target independent even when the IR isn't.


Gotcha. I think I see what you're talking about now.

I have crude familiarity with SSA (no personal experience but I've read a book once) but not with LLVM's terminology. I'm looking at the example in the blogpost, and I can see that its clearly adding in the important Phi functions needed

Strange that llvm names their aggressive SSA step in this manner. But yes, adding the phi functions in this manner is a very important step before we can think about optimizations.

The "inductive variable" is far easier to detect once its in a "Phi-form" so to speak.


It's an extremely confusing naming choice! I don't know why they did it, but to take a guess: a lot of compiler literature refers to SSA values as "registers," since the SSA model of a computer program directly corresponds to the concept of an (infinite) register machine in abstract computation.

The end result being that we have the same word "register" referring to two opposed resources (one infinite, one extremely finite) :-).


It's not the only place where this terminology confuses people. Another one is "register-based virtual machine", in the context of interpreters. In that case the registers belong to the VM and are actually stored in stack memory, not on actual CPU registers. However, people get that confused all the time!


At least historically LLVM has itself been inconsistent in the naming of SSA "values", and mem2reg is arguably just the most prominent example of the "register" name being used.


> You can also prove this to yourself by contriving an IR function that contains more SSA registers than machine registers: you'll see that, in addition to any ABI constraints, LLVM will be forced to spill any excess SSA registers back onto the stack.

Having more SSA register than machine registers does not mean there will be any spilling as unlike SSA registers, machine registers can be reused once the old value is no longer needed.


Sure. To be more precise: you can prove this to yourself by contriving a function that has more live variables/values than there are machine registers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: