
A Tourist’s Guide to the LLVM Source Code - zdw
http://blog.regehr.org/archives/1453
======
nrjdhsbsid
Awesome overview! The author threw a hilarious quip in at the end too:

 _Target: the processor-specific parts of the backends live here. There are
lots of TableGen files. As far as I can tell, you create a new LLVM backend by
cloning the one for the architecture that looks the most like yours and then
beating on it for a couple of years._

~~~
gravypod
I've heard many times that LLVM targets are really hard to do. I don't
really understand why. The code has already been parsed, optimized, and
turned into a presumably clean format. What is the issue? Is the framework
provided by LLVM just crappy?

~~~
david-given
A while back I got about half of an LLVM backend for the VideoCore IV done.
I eventually ground to a halt due to intractable bugs; later, someone else
picked it up and completed it.

This is from years ago, mind, but my recollection of the pain points is:

\- enormously steep learning curve; very little of this layer is documented
and there are a lot of concepts hidden behind layers of abstractions hidden
behind other layers of abstractions, all of which you need to understand, but
they're all cyclic. There's nowhere to start; everything you look at depends
on your understanding of something else.

\- very hard to debug --- the compiler's main reaction to things going wrong
is to seg fault, which requires debugging through compiler internals (and
_understanding_ the compiler internals) to fix.

\- holes in the tools. After generating the IR node tree, the main piece of
machinery in the compiler is a pattern matcher which finds machine code
instructions for sets of nodes in the tree. This is automatically generated by
tablegen, but there are some patterns that it can't match, which means that
along with the tablegen matcher you also need to write a manual matcher in C++
for the other patterns, which is very verbose and difficult to keep track of.
Also, the tablegen matcher is pretty brittle and has a tendency to either hang
or match the wrong pattern if you don't get it just right. Understanding why
it's doing what it's doing is non-trivial.
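
To make the pattern-matching point concrete, here is a minimal Python sketch
of a tree-pattern instruction selector. It is a toy under stated assumptions,
not how TableGen or LLVM's generated matcher actually works: the opcodes
(`add`, `mul`), the instruction names (`MADD`, `ADD`, `MUL`), and every helper
function are hypothetical. Patterns are tried in order, so a larger pattern
listed first wins, and a node that no pattern covers raises an error, which is
roughly the gap a hand-written matcher would have to fill.

```python
from itertools import count

# A node is a tuple: (opcode, child, ...). Leaves are ("reg", name) or
# ("const", value). All opcode and instruction names are hypothetical.
PATTERNS = [
    # (pattern, instruction template); larger patterns first so they win.
    (("add", ("mul", "?a", "?b"), "?c"), "MADD {a}, {b}, {c}"),
    (("add", "?a", "?b"), "ADD {a}, {b}"),
    (("mul", "?a", "?b"), "MUL {a}, {b}"),
]

def match(pattern, node, binds):
    """Try to match pattern against node, recording wildcard bindings."""
    if isinstance(pattern, str):  # "?x" wildcard: binds any subtree
        binds[pattern[1:]] = node
        return True
    if node[0] != pattern[0] or len(node) != len(pattern):
        return False
    return all(match(p, n, binds) for p, n in zip(pattern[1:], node[1:]))

def select(node, out, fresh):
    """Emit instructions for node into out; return the name of its value."""
    if node[0] == "reg":
        return node[1]
    if node[0] == "const":
        return f"#{node[1]}"
    for pattern, template in PATTERNS:
        binds = {}
        if match(pattern, node, binds):
            # Select the bound subtrees first, then emit this instruction.
            operands = {k: select(v, out, fresh) for k, v in binds.items()}
            dest = f"t{next(fresh)}"
            out.append(f"{dest} = " + template.format(**operands))
            return dest
    raise ValueError(f"no pattern matches {node[0]!r}")

def compile_tree(tree):
    out = []
    select(tree, out, count())
    return out

# (r1 * r2) + 4 is covered by the single fused multiply-add pattern.
fused = compile_tree(("add", ("mul", ("reg", "r1"), ("reg", "r2")), ("const", 4)))
# r1 + (r2 * r3) has the multiply on the other side, so it needs two instructions.
split = compile_tree(("add", ("reg", "r1"), ("mul", ("reg", "r2"), ("reg", "r3"))))
```

Even in this toy, swapping the operands of the `add` is enough to miss the
fused pattern, which hints at why getting real patterns "just right" is
fiddly.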

The main problem, though, is that the pool of knowledge of how to write
targets is restricted to a very small number of people --- it's all what an
ex-boss of mine used to refer to as 'tribal knowledge'. So that when you run
into a problem, you're reliant on finding one of these people who has enough
time or incentive to help. Since the problems you run into tend to be complex,
helping is hard, so getting good help isn't easy, and so on.

I should add that gcc suffers from _exactly the same set of problems_, except
possibly a little more so. gcc's got better target layer documentation, but
the LLVM community is much more helpful; gcc is simpler, but LLVM I think has
the edge in good design. They're both much of a muchness.

It's frustrating, as compiler backends aren't _that_ hard. There should be a
compiler somewhere which combines easy porting with adequate code
generation[1], for people who don't want to spend the resources needed for an
LLVM or gcc port.

[1] I'm actually working on this.

~~~
speps
> There should be a compiler somewhere which combines easy porting with
> adequate code generation[1]
>
> [1] I'm actually working on this.

Anything to share yet? Will it take LLVM IR as input or are you targeting a
specific language? I'm very interested. So far, the only compilers I have
found with relatively straightforward code are the Plan 9 C compilers, which
were designed for this.

~~~
david-given
I have mostly-working code --- but the register allocator I was using turns
out to be pretty pants, so it's generating lousy code and is nowhere near ready to
look at yet. I'm taking time off to learn about graph colouring to try and do
a proper job. (Turns out graph colouring on irregular architectures is kinda
exciting, and not in a good way.)
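
As a rough illustration of why irregular architectures complicate graph
colouring, here is a minimal Python sketch of greedy colouring where each
virtual register has its own list of permitted physical registers. This is an
assumption-laden toy, not the allocator described above: the register names
(`r0`, `r1`), the `idx` value that may only live in `r0`, and the
most-constrained-first ordering are all made up for the example.

```python
def colour(interference, allowed):
    """Greedy colouring of an interference graph.

    interference: dict vreg -> set of vregs that are live at the same time.
    allowed: dict vreg -> list of physical registers the vreg may use.
    Returns (assignment, spilled).
    """
    assignment, spilled = {}, []
    # Most-constrained first: fewest allowed registers, then most neighbours,
    # then name (only to keep the order deterministic).
    order = sorted(
        interference,
        key=lambda v: (len(allowed[v]), -len(interference[v]), v),
    )
    for v in order:
        taken = {assignment[n] for n in interference[v] if n in assignment}
        free = [r for r in allowed[v] if r not in taken]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)  # no permitted register left: spill to memory
    return assignment, spilled

# Hypothetical machine with two registers, where "idx" may only use r0.
interference = {"a": {"b", "idx"}, "b": {"a"}, "idx": {"a"}}
allowed = {"a": ["r0", "r1"], "b": ["r0", "r1"], "idx": ["r0"]}
assignment, spilled = colour(interference, allowed)
```

If `a` were coloured first it would grab `r0` and force `idx` to spill, so
the visiting order matters much more than it does when every register is
interchangeable.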

It's not actually based on LLVM at all; I'm working with a completely
different (and simpler) compiler suite, albeit using some of the same
principles. The code quality is never going to be _good_, but I'm hoping the
end result will allow a compiler port to any random relatively sane
architecture in only two source files.

~~~
tom_mellior
> Turns out graph colouring on irregular architectures is kinda exciting, and
> not in a good way.

Have you looked into PBQP register allocation? It's almost as simple as basic
graph coloring but handles irregularities in a nice and disciplined way.
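
For readers who have not met PBQP, the formulation can be sketched in a few
lines of Python: each node (virtual register) gets a cost vector with one
entry per choice, each edge gets a cost matrix over pairs of choices, and the
goal is the selection with the lowest total cost. The solver below is plain
brute force for illustration only; real PBQP allocators rely on graph
reductions and heuristics instead. The register names, the spill cost of 5,
and the `idx` constraint are assumptions made up for the example.

```python
from itertools import product

INF = float("inf")

def solve_pbqp(node_costs, edge_costs):
    """Exhaustively solve a tiny PBQP instance.

    node_costs: dict node -> list of costs, one per choice.
    edge_costs: dict (u, v) -> matrix indexed as [choice_u][choice_v].
    Returns (best_cost, dict node -> chosen index).
    """
    nodes = sorted(node_costs)
    best = (INF, None)
    for choice in product(*(range(len(node_costs[n])) for n in nodes)):
        pick = dict(zip(nodes, choice))
        cost = sum(node_costs[n][pick[n]] for n in nodes)
        cost += sum(m[pick[u]][pick[v]] for (u, v), m in edge_costs.items())
        if cost < best[0]:
            best = (cost, pick)
    return best

# Choices per node: index 0 = r0, 1 = r1, 2 = spill (hypothetical names).
# "idx" cannot live in r1, which is expressed as an infinite node cost.
node_costs = {"a": [0, 0, 5], "b": [0, 0, 5], "idx": [0, INF, 5]}
# Interfering values must not share a register; spilling never conflicts.
interfere = [[INF, 0, 0],
             [0, INF, 0],
             [0, 0, 0]]
edge_costs = {("a", "b"): interfere, ("a", "idx"): interfere}
best_cost, best_pick = solve_pbqp(node_costs, edge_costs)
```

The irregularity (`idx` restricted to `r0`) costs nothing extra to model: it
is just another entry in a cost vector, which is the appeal described above.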

~~~
qznc
I helped build the one in libfirm and I agree. The x86 architecture is
quite regular, and we have not used the allocator for other architectures in
practice. However, for all irregularities I have come across I could
immediately see how to model them in theory.

~~~
tom_mellior
Oh, I thought PBQP was the only register allocator in libfirm. What do you use
for other architectures?

~~~
qznc
The default is a graph coloring algorithm. Well actually a "recoloring"
algorithm due to its SSA-based nature.

There is also an ILP one, which is only useful for comparison.

------
the_duke
100+ points, first place on front page, and not a single comment.

Now that's rare.

I assume everyone is excitedly browsing
[https://github.com/llvm-mirror/llvm](https://github.com/llvm-mirror/llvm)? ;)
(the official repo at [http://llvm.org/git/llvm](http://llvm.org/git/llvm) is
down)

~~~
akavel
I have a theory that since the introduction of "favorites" on HN, people more
often fave something as a "Read Later [Totally Some Day, Really, Only Just...
Not Today... But I Will Soo Come Back To It]" bookmark. Thus "lists of 100
interesting books", "algorithms you must know", "write yourself an OS in 99
days", etc. etc., I feel they have a tendency to behave like that. (The theory
is based partially on what I observed looking at myself ;). But please note
it's totally just a loose theory, I don't have data or whatever, nor would I
judge this as bad or good. Only observing with curiosity.

~~~
pjmorris
Anec-data: I favorite things as a "Read Later [Totally Some Day, Really, Only
Just... Not Today... But I Will Soo Come Back To It]" bookmark.

Well, mostly. I do read some percentage of the articles and comments as I go,
and I do go back to see updates to the comments... and to, every once in
a while, read the fave'd article.

I feel like I ought to have a side project that uses my HN favorites, pinboard
links, comments, etc to seed a recommendation engine of 'other stuff you
should probably read'.

~~~
ChickeNES
> I feel like I ought to have a side project that uses my HN favorites,
> pinboard links, comments, etc to seed a recommendation engine of 'other
> stuff you should probably read'.

I would totally use it. I considered writing something myself, but I have too
many ongoing side projects as it is, unfortunately.

------
omouse
This is the kind of doc that should be written about work projects as well. I
always love seeing great docs that make the job of understanding a codebase
way simpler.

------
nuclx
Regehr's blog is an invaluable source of practically relevant SE knowledge.

------
crb002
I've been putting it off, but I am definitely learning enough LLVM IR to code
the 57 exercises for programmers.

Anyone interested in banging out the LLVM IR solutions to
[http://rosettacode.org](http://rosettacode.org)? It looks like a few exist,
but not many:
[http://rosettacode.org/wiki/99_Bottles_of_Beer/Assembly#LLVM](http://rosettacode.org/wiki/99_Bottles_of_Beer/Assembly#LLVM)

------
zump
Only problem is the CamelCase is incredibly verbose.

~~~
swah
That's critical for enterprise adoption! More seriously, I guess some "design
patterns" really ought to appear in a project this big.

~~~
eliben
I wouldn't call CamelCase naming a design pattern, but I agree that a large
project should stick to consistent coding guidelines - which LLVM does
[[http://llvm.org/docs/CodingStandards.html](http://llvm.org/docs/CodingStandards.html)]

Any particular style grows on you with time. It's more important to make sure
the code is internally consistent.

~~~
swah
Sorry, I didn't realize the OP was really criticizing the casing; I thought it
was about AbstractFactoryGenerator kinds of names.

