
The Architecture of Open Source Applications: LLVM - ryannielsen
http://www.aosabook.org/en/llvm.html
======
yuvadam
I've been hearing about LLVM for quite some time, but never fully grasped
_what it is_. This article tackles everything you should know about LLVM. I'm
finally starting to get it.

Perfect.

------
obiterdictum
I've never built a compiler, but how well does having common optimizer for all
platforms work _in practice_? (11.5.1) Does it cause a "lowest common
denominator" effect thus making the code take less advantage of the target
architecture than a architecture/vendor-specific compiler would?

I'd love someone with the experience of building compilers to chime in.

~~~
wladimir
If the compiler takes a full machine description into account for each of the
target architectures (AFAIK, LLVM does that. ), it won't be based around a
"lowest common denominator". The machine description is a list of instructions
and what they do and things like timing and scheduling constraints. The task
of the compiler is to find the most optimized instruction sequence that do a
certain thing for the architecture it is compiling for. This principle is the
same for every architecture, so you don't need a specific optimizer per
platform.

~~~
_delirium
That's the theory, but you do run into problems if the entire parameterization
misses some architectural differences that didn't exist when it was first
developed (or weren't important). That's one problem gcc has, that its
architecture description is aged and doesn't have a good way of specifying
some things that are very important for 2010s performance, so has had to
undergo some major surgery (still in progress). LLVM currently avoids that
partly by being newer. It might also be better designed; we'll see how it ages
over the next 20 years. =]

~~~
froydnj
> That's one problem gcc has, that its architecture description is aged and
> doesn't have a good way of specifying some things that are very important
> for 2010s performance, so has had to undergo some major surgery (still in
> progress). LLVM currently avoids that partly by being newer.

LLVM also currently avoids that by not generating code that's as good for
those 2010 architectures.

I would also be interested in hearing you expound upon those things that are
"very important for 2010s performance".

~~~
_delirium
> I would also be interested in hearing you expound upon those things that are
> "very important for 2010s performance".

I'm not particularly an expert in this area, but my understanding from
articles/mailing-lists/etc. was that there were inadequacies in the previous
machine-description and IR frameworks that led to the revamping with Gimple
and MD-RTL over the past few years. Some of that was just maintainability, but
I thought there were also optimization-related issues with the old internals.
Is that not the case?

~~~
froydnj
It's hard to say without reference to specifics; certainly the MD files have
been around since time-immemorial, even if they've been augmented with new
features over the years. And the gimple bits were primarily aimed at making
the in-memory representation more efficient, a goal which they accomplished.

In any event, as you say, things have been revamped, so whatever issues there
were with the old internals no longer apply, right? So there's no point in
dragging out old chestnuts like "GCC can't support modern architectures or
modern optimization techniques", because that's not true anymore, right?

(I apologize if this comes across as harsh; I'm just tired of seeing LLVM
articles where commenters appear to be drinking the LLVM Kool-Aid without
having any idea what the LLVM folks are grousing about. Sometimes the LLVM
folks have a point, sometimes they're just asserting their engineering
decisions are superior, which is debatable, and sometimes they're just
grousing because they don't seem to like GCC. It's hard to say exactly what's
in view from commenters and from the LLVM folks themselves.

GCC currently supports 8-bit microcontrollers, 32/64-bit desktop chips, a few
"nonstandard" VLIW and DSP architectures, and lots of other chips in between.
I, for one, am impressed with how much Clang and LLVM have done, but I'll also
be more impressed if, in a decade and a half, LLVM's architecture hasn't
acquired some warts and it seriously supports more than two architectures.
After all, GCC was, in many ways, state-of-the-art when it first came out
too...)

~~~
Someone
A child will grow old, but that does not change the fact that it is young
_now_.

LLVM _is_ the new kid on the block, and gcc _is_ old. GCC also is wise, but I
think, given a) that we learned a lot about compiler writing since gcc first
appeared and b) the support that LLVM has both in business and in academia,
the writing is on the wall for gcc as the favorite compiler, first for X86 and
ARM, later for other architectures.

Of course that may change; gcc can evolve. However, I doubt that will happen
fast enough. the FSF does not like allowing proprietary compiler plugins and
(I guess) has to little manpower to work on gcc.

~~~
froydnj
> the FSF does not like allowing proprietary compiler plugins and (I guess)
> has to little manpower to work on gcc.

Fortunately, the FSF is not the entity driving development of GCC. In fact, I
can't remember a commit in the last five years (there's been ~60k commits, so
there'd be plenty to choose from) made by an FSF employee. So there are plenty
of people and companies focused on moving GCC forward.

------
forgottenpaswrd
Wow

Very good article, I started liking LLVM some time ago. Now I will use it on
my programs, it seems really easy to do, I agree so much with the gcc design
flaws(that forced me to create my own parsers and code generators).

------
henning
As this essay discusses, GCC has serious technical flaws. LLVM is one of the
best examples I've ever seen of "the best way to complain is to make things."
That and the iPhone.

~~~
gaius
It is said that these "flaws" are deliberate in GCC, to prevent it being
reused as a component (e.g. as a syntax highlighting engine) in someone else's
IDE that hides that it's using GCC under the hood.

~~~
shadowfox
Interesting. Is there any source for this speculation?

~~~
prodigal_erik
This isn't a very controversial theory:

<http://lwn.net/Articles/259196/>

<http://lwn.net/Articles/301135/>

~~~
schrototo
Wow, so they're actively preventing progress for political reasons? Why don't
they try to solve their ideological licensing problems with, I don't know,
licensing solutions? Crippling the underlying technology doesn't seem like a
very hacker thing to do...

------
nhaehnle
To anybody curious about how LLVM looks on the inside, I recommend reading
their excellent tutorial on building a front-end, which you can find here:
<http://llvm.org/docs/tutorial/>

------
akkartik
Looking at the big picture raised one question for me: how does LLVM deal with
dependencies between optimization passes?

It is well-known, for example, that the whether you do register allocation
before or after instruction scheduling can have huge implications for
performance. Some programs prefer scheduling done earlier, and some prefer it
later. gcc's optimization flags switch phase orderings among other things.

Does LLVM have any abstractions for managing phase orderings and intelligently
picking between them? Or must the optimization classes be manually
instantiated in the right order?

~~~
Someone
LLVM leaves that problem for users of the library. Users instantiate
optimization passes and specify in what order to call them. For example, 'opt'
has (<http://llvm.org/cmds/opt.html>):

    
    
      -{passname}
    

opt provides the ability to run any of LLVM's optimization or analysis passes
in any order. The -help option lists all the passes available. The order in
which the options occur on the command line are the order in which they are
executed (within pass constraints).

    
    
      -std-compile-opts
    

This is short hand for a standard list of compile time optimization passes.
This is typically used to optimize the output from the llvm-gcc front end. It
might be useful for other front end compilers as well. To discover the full
set of options available, use the following command:

    
    
      llvm-as < /dev/null | opt -std-compile-opts -disable-output -debug-pass=Arguments
    

I do not know whether compilers 'included' with LLVM (clang, llvm_gcc) have
support for tweaking the order. If they do not, you can always let them write
unoptimized IR, and pipe the output through opt.

------
kenjackson
Does anyone understand their purchasing table? It seems like some money goes
unaccounted for: <http://www.aosabook.org/en/index.html#purchase>

~~~
Tuna-Fish
Receiving money costs money.

