
No Compiler – On LLVM, and writing software without a compiler - btrask
http://msm.runhello.com/p/1003
======
pcwalton
> And actually we’re now entering a world where at minimum supporting LLVM as
> a backend option is mandatory.

I would have said this too a few years ago, but Go is a huge counterexample.
It's become very popular without using LLVM, and the compiler that does use a
mature, optimizing backend (gccgo) doesn't get nearly as much use as the Plan
9-based one.

I should also mention that increasingly it's become popular to have a mid-
level IR that you lower the AST to before lowering to LLVM IR. GHC does it,
Swift is doing it, and Rust is working on doing it. The reasons are (a) you
can do language-specific optimizations like devirtualization, which are often
pretty hard to do in LLVM; (b) going from AST to LLVM IR is often a big, messy
leap for any language that isn't C.
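
For instance, devirtualization on a mid-level IR can be little more than a
rewrite over call instructions, because the language's class hierarchy is
still visible at that stage. A toy sketch in Python (the IR shapes and the
"known final" analysis are made up for illustration, not taken from any
real compiler):

    FINAL_METHODS = {("Circle", "area")}  # hypothetical analysis result

    def devirtualize(instrs):
        out = []
        for op in instrs:
            if op[0] == "call_virtual" and (op[1], op[2]) in FINAL_METHODS:
                # Receiver type is known and final: call directly,
                # skipping the vtable.
                out.append(("call_direct", op[1] + "::" + op[2]) + op[3:])
            else:
                out.append(op)
        return out

    print(devirtualize([("call_virtual", "Circle", "area", "%obj"),
                        ("call_virtual", "Shape", "area", "%obj2")]))
    # [('call_direct', 'Circle::area', '%obj'),
    #  ('call_virtual', 'Shape', 'area', '%obj2')]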

~~~
exDM69
> I would have said this too a few years ago, but Go is a huge counterexample.

I've been wondering about this ever since Go came out. If I recall correctly,
LLVM was already a viable option back then.

The latest release of Go, earlier this week, featured a pretty big revamp
of the compiler, with a new SSA-based middle end and backend. I wonder how
much work went into that, and whether they could have achieved it faster
had they been on LLVM since day 1.

~~~
amelius
I'd like to know what the downside of using LLVM is. Why did they choose
to write their own optimizer?

Anyway, the most difficult part of writing the compiler was probably the
garbage collector.

~~~
pjmlp
> I'd like to know what the downside of using LLVM is.

Depending on C++ code, and keeping alive the belief that you can only
create compilers in C and C++, as many people without experience in
compiler development think?

Also, and this is especially painful when compiling Rust or Swift from
source, they save themselves the pain of waiting around 4 hours for a
clean compile.

~~~
kibwen
LLVM does take a long time to compile, but four hours is an exaggeration. A
mid-range computer can compile it in 30 minutes with -j4, and a good computer
can compile it in 10 with -j8.

~~~
pjmlp
Core Duo netbook with 4 GB RAM and a 5400 RPM hard disk.

I can assure you it really does take 4 hours.

And to bring the point home, I can link you to the Swift issue where I was
discussing this matter.

~~~
bitJericho
A Core Duo isn't even in the realm of mid-range. That's very nearly a
10-year-old processor!

~~~
pjmlp
Yeah, but it works perfectly well as a travel netbook for coding on the go.

Also, I never had 4-hour builds with single-core processors back when I
was working full time with C++. The slowest ones were around 2 hours.

A Core Duo would run circles around those computer systems, hence my
dismay at how much hardware LLVM requires for building.

~~~
bitJericho
I would imagine compiler design has become much more complex in 10 years,
and a low-end i3 is something like at least 3 times faster than a Core Duo.

~~~
pjmlp
I would agree if other compilers that aren't LLVM-based took as long to
compile.

Also, that is surely a side effect of how they use C++, which desperately
needs modules.

~~~
pcwalton
Those compilers don't do nearly as much optimization as LLVM does. It's hard
to overstate how wide the gap between the classical compilers you like and
LLVM/GCC truly is. LLVM has no fewer than three IRs, each with a full suite of
compiler optimizations (four if you count ScheduleDAG). LICM, just to name one
example, runs _at each stage_ (see MachineLICM.cpp). Turbo Pascal did nowhere
near that level of optimization, and it produced unacceptably slow programs by
modern standards.
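
To make LICM concrete, here is the transformation in Python terms; this is
an illustration of the idea only, since LLVM performs it on its IRs rather
than on source code:

    # Before LICM: x * y is recomputed on every iteration, even though it
    # never changes inside the loop.
    def before(xs, x, y):
        total = 0
        for i in xs:
            total += x * y + i
        return total

    # After hoisting: the invariant expression moves out of the loop. LLVM
    # does this on its IR and again on machine instructions (MachineLICM).
    def after(xs, x, y):
        inv = x * y
        total = 0
        for i in xs:
            total += inv + i
        return total

    assert before(range(10), 3, 4) == after(range(10), 3, 4)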

------
erichocean
You may be better off building up the LLVM IR as text, vs. using the FFI
builder. (If you want/need type safety as opposed to just waiting for the LLVM
verifier to tell you what went wrong, you could write the builder in Lua.)

Here's a link with a Python example (along with a justification for this
approach):
[http://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/](http://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/)
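
A minimal sketch of the text-based approach using llvmlite's binding layer
(the module contents here are illustrative):

    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    # The module is built as a plain string; no builder API involved.
    ir_text = r"""
    define i32 @add1(i32 %x) {
    entry:
      %r = add i32 %x, 1
      ret i32 %r
    }
    """

    mod = llvm.parse_assembly(ir_text)  # raises RuntimeError on a parse error
    mod.verify()                        # catches deeper mistakes up front
    print(mod)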

------
drmeister
So I've done basically the same thing, with a few differences: (1) I chose
Common Lisp as the language I would implement; (2) I used C++ template
programming to automatically generate wrappers for C++ library functions
(like boost::python and luabind do) so that I could expose C++ functions
directly to Common Lisp. That allowed me to use the LLVM C++ API directly.

The project is called clasp: github.com/drmeister/clasp. Send me a
message - we should talk and compare notes.

------
vbit
Instead of generating LLVM code, you could generate 'typed Lua', using Lua
as a metaprogramming language. The 'typed Lua' then gets compiled down by
LLVM into object code.

This is what [http://terralang.org](http://terralang.org) does.

~~~
fizixer
I never understood in simple terms what terralang is actually about. Even
in your sentence, if typed Lua is being "generated" (i.e., it's the
target), how come Lua is being used as a metaprogramming language (i.e., a
language that does the generation)?

I understand generative metaprogramming as consisting of a language that a
human deals with and a language that gets generated. In the case of
Terra/Lua, it's not clear which generates and which is generated, or
whether it's a completely different approach (to which I say, what's the
point?).

On the Terra webpage, it says 'Terra is low-level' but then 'Terra is code
embedded in a Lua meta-program', not to mention 'Terra-Lua programs'.

~~~
smaddox
Essentially it's an implementation of the concept that metaprogramming should
use the same semantics and syntax as normal programming. This is the case in
Lisp, for example, but Lisp is not (or not typically) compiled, so Lisp
metaprogramming is essentially limited to defining syntactic sugar. In
Terra, metaprogramming (or multi-stage programming, as they call it) can
actually result in improved performance compared to a typical C-like
implementation of the same algorithm.
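
A rough Python analogue of what that staging buys you (Terra generates
compiled code, so the payoff there is far larger; this only illustrates
the shape of multi-stage programming):

    def make_pow(n):
        # Stage one runs now: emit straight-line source with the loop
        # already unrolled, then let Python compile it.
        body = " * ".join(["x"] * n) if n > 0 else "1"
        src = "def pow_n(x):\n    return " + body + "\n"
        ns = {}
        exec(src, ns)
        return ns["pow_n"]

    pow5 = make_pow(5)  # pow5(x) is literally x * x * x * x * x
    print(pow5(2))      # 32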

The website has a couple papers that provide benchmarks for some interesting
test cases.

~~~
zeveb
> This is the case in Lisp, for example, but Lisp is not (or not typically)
> compiled …

Say what? Lisp is often compiled; the standard extensively discusses
compilation[1] and — to address your point re. performance-improving
metaprogramming — compiler macros[2] specifically exist in order to advise the
compiler, e.g. for performance.

So, no, Lisp macros are _not_ limited to defining syntactic sugar. It's
like I keep on saying: the Common Lisp standard from 1994 contains within
its covers functionality that people _still_ don't know about.

[1] [http://www.lispworks.com/documentation/HyperSpec/Body/03_b.htm](http://www.lispworks.com/documentation/HyperSpec/Body/03_b.htm)

[2] [http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_c.htm#compiler_macro](http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_c.htm#compiler_macro)

------
daniel-kun
Been there, done that.

See [https://github.com/daniel-kun/omni](https://github.com/daniel-kun/omni)
(look at the little gif in the “Status Quo” section, it will say more than 100
words).

I have the “Code Model” (that is what I call the AST, because there is no
syntax) up for a very small, C-like language, and I have a frontend that
can manipulate parts of it. However, these are not yet combined, and I
stopped working on it because I am in the middle of transforming my
efforts into a web app. This will allow you to edit and run a code model
in your browser, and download a compiled binary when you are finished. By
the way, you will be able to link code elements such as classes to online
resources such as a task tracker, a diagram, a mindmap, etc., to have the
whole ALM process embedded into your code base.

Sounds good? Sounds good to me.

------
smaddox
Interesting article. Looking at the code, it's pretty clear that this is not a
particularly practical approach, but I think it points in the direction of
where language design should head. Abstractions shouldn't be forced, unless
absolutely necessary; forced abstractions lead to inefficiency.

I've been working on a related concept, but more along the lines of developing
a low-level IR (LLIR) and corresponding syntax that allows direct coding in
the LLIR. It would essentially be a portable assembly language. The LLIR
could then be compiled down to machine code for different architectures, or even
run directly in a VM.

This isn't too different from the LLVM concept, except the IR would be a
practical programming language, much like assembly. Furthermore, the IR would
provide access control primitives (reference capabilities) that most languages
do not support, but that are extremely useful for low-overhead secure
computing and concurrent computing. These primitives could be especially
useful in the design of exokernel OSes and related applications.

~~~
fritzo
Gosh, I thought C was "portable assembly". How would your LLIR differ from C?

~~~
eru
C is not really portable assembly. It's full of undefined behaviour, for one.

------
couchand
I've run into so many of these issues myself, trying to work through the
forest of LLVM. The abysmal state of the generated docs and those cryptic
segfaults are the bane of my existence.

So I started to slap together a little library that gives some order to the
madness. I call it petard [0], for reasons that should be obvious if you've
heard that word. At the moment I'm focused on JavaScript bindings, but the
idea is to build a sensible object-based API that's simple enough to export
pure C bindings to make FFI easy.

There's currently support for lots of the basics, but there are some
significant issues too (missing features as well as poor C++ style).
Stern, memory-management-focused pull requests would be very welcome, and
feature additions would be helpful, too.

[0]: [https://github.com/couchand/petard](https://github.com/couchand/petard)

------
a-dub
This seems like a fun way to poke around with LLVM but I don't see how this is
any easier than writing down a grammar and feeding it into a parser
generator...

"Writing a parser" sounds really annoying and complicated, but in my
admittedly limited experience with ocamlyacc, it was easy peasy and dare I say
it, fun!

The hard part I guess was defining the language and maybe that's what he's
trying to avoid.

But even still, scripting around the AST-building functions in LLVM and
coding directly to them seems way more annoying to me in the long run
(especially the writing-programs part).

But who knows, maybe I'm missing the point...

~~~
cturner
Here's the potential I see. Imagine if it was routine to write an application
stack entirely in a repl language. Not just the application code - everything.
You'd write your application code. Then other functions would dispatch the
tree of application code to the LLVM API where it would be compiled. Then
other functions could take the resulting artifacts and push them through to
deployment.

Doing this at the moment would involve writing fiddly wrappers around lots of
layers of technology - dynamically constructing makefiles, and shelling out to
call things.
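
With something like llvmlite you can already get a taste of that loop:
hand a module to the LLVM API from the REPL, JIT it, and call the result,
all in one process. A minimal sketch (the module contents are
illustrative):

    import ctypes
    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    mod = llvm.parse_assembly("""
    define i32 @double(i32 %x) {
      %r = add i32 %x, %x
      ret i32 %r
    }
    """)
    mod.verify()

    machine = llvm.Target.from_default_triple().create_target_machine()
    engine = llvm.create_mcjit_compiler(mod, machine)
    engine.finalize_object()

    addr = engine.get_function_address("double")
    dbl = ctypes.CFUNCTYPE(ctypes.c_int32, ctypes.c_int32)(addr)
    print(dbl(21))  # 42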

------
barrkel
There's an enormous lump of complexity that this chap seems to be unaware of:
the type system. This is the semantic analysis bit that he seems to think is
done by LLVM, but it isn't.

It's like he mostly thinks of compilers as turning statements and expressions
into executable code, but IMO that's one of the easier bits. Defining how
different values interact, what operations are legal and aren't, and the
runtime support - this is a lot of work, and in particular, it's different for
every language, because it's what makes languages distinct.

~~~
amasad
I think she's a woman, not a "chap". But anyway, you'd be relying on the IR
semantics and checking to do the legwork for you. LLVM IR _is_ a language
after all.

~~~
barrkel
It is typed at a very low level. It doesn't cover interesting data structures,
like hashes, vtables, closures, etc.

------
benjismith
This is the most kickass thing I've read all day.

------
prewett
Seems like what he really wants is an LLVM assembler.

~~~
lmm
Yeah. LLVM IR is the most suitable representation for doing this kind of
thing, but it's the one that's not compatible across LLVM versions. Maybe it
would be worth defining an alternate format for the bytecode that can be read
and written by external tools?

------
greydius
Excellent post highlighting the problem solving skills and tenacity of a
talented engineer. Of course, this is also the "duct tape and paper clips"
approach to writing software that I certainly would not advocate for
production systems.

~~~
driusan
I don't know where you got the impression that he was doing any of this
with the intention of putting it into production.

~~~
greydius
I didn't get that impression. I was trying to say that I admire the author
without also advocating putting a bunch of clever hacks into production.

------
sklogic
For everyone here who mentioned building wrappers around llvm-c or llvm c++
api: there is a nice tool for automating this job (and many other similar
things): it's called "Clang". Or, more specifically, cindex. It's easy to
use Python + cindex to parse the LLVM headers and spit out a nice wrapper
library for your language of choice.

An example of this approach:
[https://github.com/combinatorylogic/clike/blob/master/llvm-wrapper/rebuild.py](https://github.com/combinatorylogic/clike/blob/master/llvm-wrapper/rebuild.py)
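
A minimal sketch of the cindex side of that pipeline (the header and
include path are placeholders; point them at a real LLVM installation):

    import clang.cindex as ci

    index = ci.Index.create()
    tu = index.parse("llvm-c/Core.h", args=["-I/usr/include"])

    # Walk the translation unit and print one binding stub per function.
    for cur in tu.cursor.walk_preorder():
        if cur.kind == ci.CursorKind.FUNCTION_DECL:
            args = ", ".join(a.type.spelling for a in cur.get_arguments())
            print("%s %s(%s)" % (cur.result_type.spelling, cur.spelling,
                                 args))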

------
kungfooman
I really like the idea. Why not hack a scripting engine to add bindings to
itself, e.g. the Duktape JavaScript engine? Then a JavaScript function
could use Duktape's C API to generate ASTs at runtime for itself. But I
think the VM would be the limit then - support for coroutines, etc.

