
How Clang Compiles a Function - stablemap
https://blog.regehr.org/archives/1605
======
abainbridge
How my brain optimizes sentences:

> we’re not going to look at any non-trivial C++ features

Becomes: we're only going to look at trivial C++ features.

Which is then recognized as a no-op and replaced with the null sentence ;-)

Nice article though.

~~~
MaxBarraclough
Careful with that brain. It's where the nasal demons come from.

~~~
MisterTea
Nasal demons come from little plastic bags.

~~~
MaxBarraclough
I think I may have lost some people -
[http://www.catb.org/jargon/html/N/nasal-demons.html](http://www.catb.org/jargon/html/N/nasal-demons.html)

------
mangoli
Clang noob here. Can someone explain in non tech terms what’s happening in
this article?

~~~
JdeBP
You possibly need some background concepts.

Compilers work by taking a (computer-)linguistic analysis of the source code
text, a _syntax tree_, and converting it into an _intermediate representation_
(IR), which is then progressively _lowered_ to the raw machine code form by a
series of transformations.

A connected graph of _basic blocks_ (BBs) is generally how such IRs work.
Basic blocks have a formal structure: each one is connected to other basic
blocks only at its top and bottom, and control never branches into or out of
its middle. A perhaps simplistic way to visualize them is as the polygons on a
flowchart.
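
To make the flowchart picture concrete, here is a small sketch of how a short
C++ function might be carved into basic blocks. The function name
`clamp_to_zero` and the block labels A, B and C are made up for illustration,
and the exact blocks a given compiler produces can differ.

```cpp
// Hypothetical example; the block labels in the comments are illustrative only.
int clamp_to_zero(int x) {
    // Block A (entry): evaluate the condition, then branch to B or to C.
    if (x < 0) {
        // Block B: straight-line code, then an unconditional jump to C.
        x = 0;
    }
    // Block C (join point): reached from A or from B; returns the result.
    return x;
}
```

Each block is entered only at its top and left only at its bottom, so the
whole function is a three-node graph with edges A to B, A to C, and B to C.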

So the article, which assumes that you know this background, is showing what
kinds of basic blocks in Clang's intermediate representation result, before
any lowering and optimizations, from the syntax tree. It's skipping over the
part where a syntax tree is constructed from the source code shown.

In many compilers, you don't get to see the intermediate representation. In
compilers like OpenWatcom C/C++ and GCC you can always go and look at the
source code to see what the IR data structures are. But the IR is in memory
(or internal temporary files) and not accessible outwith the compiler programs
themselves. Clang's design involves a publicly specified intermediate
representation that has documented serialization formats for writing to file
and then accessing with other, potentially third party, tools.

* [https://llvm.org/](https://llvm.org/)

~~~
yodsanklai
> A connected graph of basic blocks (BBs) is generally how such IRs work

I was wondering. When is this graph actually built? Is it before creating the
IR or after? The way I see it, you can generate IR without worrying about the
control flow graph. Then you can build the CFG from the IR in order to perform
common optimizations or register allocation. Is that correct?

~~~
jcranmer
In LLVM, a function is a linked list of BBs, each of which is a linked list of
instructions. The final instruction in each basic block is a special kind of
instruction that includes a list of all possible successor BBs.

In effect, the IR is exactly a control-flow graph represented in adjacency
list form, so it's not possible to construct the IR without constructing the
control flow graph. You could theoretically write the IR in textual form, but
that's definitely not the common case, and you generally need to construct the
CFG anyways to properly emit the IR (particularly since adding the phi nodes
for SSA requires knowing the predecessors of every basic block).
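
As a rough illustration of the adjacency-list idea, here is a toy model in
C++. The types `BasicBlock` and `Function` and the helpers `addBlock`,
`buildDiamond` and `predecessors` are hypothetical and are not LLVM's actual
classes or API. The sketch only shows successors stored alongside each block's
terminator, and predecessors derived from them.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these are NOT LLVM's classes.
struct BasicBlock {
    std::string name;
    std::list<std::string> instructions;        // non-terminator instructions
    std::string terminator;                     // e.g. "br" or "ret"
    std::vector<const BasicBlock*> successors;  // targets named by the terminator
};

struct Function {
    std::list<BasicBlock> blocks;  // std::list keeps element addresses stable
};

// Append an empty block and return a reference to it.
BasicBlock& addBlock(Function& f, std::string name, std::string terminator) {
    f.blocks.push_back(BasicBlock{std::move(name), {}, std::move(terminator), {}});
    return f.blocks.back();
}

// Build the CFG shape of: if (c) x = 1; else x = 2; return x;
void buildDiamond(Function& f) {
    BasicBlock& entry  = addBlock(f, "entry", "br");
    BasicBlock& thenBB = addBlock(f, "then", "br");
    BasicBlock& elseBB = addBlock(f, "else", "br");
    BasicBlock& merge  = addBlock(f, "merge", "ret");
    entry.successors  = {&thenBB, &elseBB};
    thenBB.successors = {&merge};
    elseBB.successors = {&merge};
}

// Derive predecessor lists by walking every block's successor edges.
std::map<const BasicBlock*, std::vector<const BasicBlock*>>
predecessors(const Function& f) {
    std::map<const BasicBlock*, std::vector<const BasicBlock*>> preds;
    for (const BasicBlock& bb : f.blocks) {
        (void)preds[&bb];  // make sure every block has an entry, even if empty
        for (const BasicBlock* succ : bb.successors) {
            assert(succ != nullptr);
            preds[succ].push_back(&bb);
        }
    }
    return preds;
}
```

In this toy diamond, `merge` ends up with two predecessors (`then` and
`else`), which is exactly the spot where a phi node for `x` would need one
incoming value per predecessor.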

