
Show HN: QBE – a new compiler back end - mpu
http://c9x.me/compile/
======
munin
> Implementing SSA construction is hard. To save its users from having to
> implement it, LLVM provides stack slots. This means that one increment of a
> variable v will be composed of three LLVM instructions: one load, one add,
> and one store.

This is just wrong, though! LLVM provides stack slots for a good reason: stack
allocated variables can escape. You need a place in memory to stick them. If
mem2reg can prove that the stack allocated variable doesn't escape, it gets
promoted into an SSA value.
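
A minimal C sketch of that distinction, for illustration only. The names `remember`, `count_up`, and `count_up_escaping` are made up for this example, and whether a particular optimizer actually promotes `v` depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: it keeps the address it is given,
   so the pointed-to variable "escapes" its function. */
static int *saved = NULL;
static void remember(int *p) { saved = p; }

/* The address of v is never taken. A mem2reg-style pass can keep v
   as an SSA value, so v++ needs no load or store. */
int count_up(int n) {
    int v = 0;
    for (int i = 0; i < n; i++)
        v++;
    return v;
}

/* Here &v escapes through remember(), so v needs a real memory
   location (a stack slot) while other code may touch it. */
int count_up_escaping(int n) {
    int v = 0;
    remember(&v);
    assert(saved != NULL);
    for (int i = 0; i < n; i++)
        (*saved)++;          /* updates v through the escaped pointer */
    saved = NULL;            /* do not leave a dangling pointer behind */
    return v;
}
```

Both functions return the same value; the difference is only in where `v` is allowed to live.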

~~~
jklontz
Yes! And I would also add that:

> LLVM IL is more cluttered with type annotations and casts.

With LLVM moving to typeless pointers[1], this will be less true in the future
than it is now.

[1] [https://groups.google.com/forum/#!topic/llvm-dev/vphBegBWyyE](https://groups.google.com/forum/#!topic/llvm-dev/vphBegBWyyE)

~~~
munin
I don't know if that would alleviate the author's concern, because (as I
understand that proposal) the type information is still present, it's just
bolted into the load/store/GEP instructions now.

Nothing is stopping you from writing LLVM that doesn't use types though!
That's entirely a front end decision. If you wanted to, in your front end, you
could never emit record or array types and do book keeping on cells in memory
entirely on your own using pointer arithmetic / pointertoint / inttopointer.
Doing so is entirely at the expense of the speed of the generated code though!
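
The two styles can be sketched in C rather than LLVM IR (a rough analogy only; `struct point`, `get_y_typed`, and `get_y_raw` are hypothetical names). The first function keeps the record type, the second treats the record as raw bytes and does the offset bookkeeping by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct point { int x; int y; };   /* hypothetical record type */

/* Typed access: the compiler sees which field is read. */
int get_y_typed(const struct point *p) {
    assert(p != NULL);
    return p->y;
}

/* "Untyped" access: the record is just a cell of bytes, and the
   caller-side bookkeeping supplies the offset of the field. */
int get_y_raw(const void *cell) {
    int y;
    assert(cell != NULL);
    memcpy(&y, (const unsigned char *)cell + offsetof(struct point, y),
           sizeof y);
    return y;
}
```

Both return the same value; the second simply hands the optimizer less information to work with.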

~~~
tinco
Hi Munin, I'm writing a C compiler on LLVM at the moment, and hitting problems
with types and pointers, the complexity of which made me think of just using
casts everywhere. What is it that costs speed in the generated code?
Does it mean certain LLVM optimizer phases won't work anymore?

~~~
sklogic
Types in C should not be complex. If you're hitting some kind of complexity,
you're likely doing it the wrong way. C types are directly translated (one
way, of course) into LLVM types.

Casts are evil: they break aliasing analysis, they screw up address spaces,
they break more advanced forms of vectorisation (like polyhedral analysis),
etc.

------
gravypod
I've always found this sort of "I'm going to make my own" approach as being
the best for innovation. I really hope this goes through and continues
development.

~~~
mpu
Thank you for your words. It is often called NIH, but eh, I learned a lot! And
I think that I made some modest improvements over LLVM, you can check them out
in my comparison at
[http://c9x.me/compile/doc/llvm.html](http://c9x.me/compile/doc/llvm.html)

~~~
tinco
Looks good! I know your target is 70% of the performance, but is there any
fundamental reason in QBE that it couldn't be more? Suppose I ported my
compiler from LLVM to QBE (I think I could do so with not too much effort);
would I at some point be able to work on porting some of LLVM's optimizing
phases to QBE to get my performance up to par, or is there a design decision
you made that will get in the way of the last 30%?

~~~
mpu
It's only a goal I set for myself; if we can do better, heck, let's do it!
Keeping the code short, on the other hand, is really something I care about.

~~~
tinco
Great, I have the same goal for my compiler. I'd love for the compiler to be
able to bootstrap its own backend, and as it only compiles C that would rule
out llvm.

I am not sure how far along I am in actually compiling C; if I had to guess
I'd say around 60%. Hopefully there aren't too many crazy things on the
horizon.

I'm not a C programmer myself so I first implemented the switch statement in a
naive way, and then I discovered the way they actually work and spent days
getting it right.
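
One thing that commonly surprises people implementing `switch` is fallthrough: a case without `break` keeps running into the next label. A small sketch (the function name `bodies_run` is made up for illustration, and this may not be the exact issue hit here):

```c
#include <assert.h>

/* Counts how many case bodies execute when the switch is entered
   at `start`. Cases without `break` fall into the next label. */
int bodies_run(int start) {
    int n = 0;
    switch (start) {
    case 0:
        n++;
        /* falls through */
    case 1:
        n++;
        /* falls through */
    case 2:
        n++;
        break;
    case 3:
        n++;
        break;
    default:
        break;
    }
    assert(n >= 0 && n <= 3);   /* at most three bodies can run */
    return n;
}
```

Entering at 0 runs three bodies, at 1 two, at 2 or 3 one, and anything else none, so a lowering that treats each case as an isolated block gets the wrong answer.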

If I get to the point where I can compile trivial C programs like the
benchmarks game, I'll research a move to QBE :)

------
gnuvince
This is a wonderful idea! As a compiler person, I'm excited to see how well
this works and wish you all the success in the world.

> QBE aims to be a pure C embeddable backend that provides _70% of the
> performance of advanced compilers in 10% of the code._

This philosophy is very similar to that of vis [1], a vim-like editor that
aims to have 80% of vim's functionality in 1% of the code. I hope that more
such projects see the light of day; I'm very interested in simpler programs
and simpler ecosystems, even if that comes at the cost of features.

[1] [https://github.com/martanne/vis](https://github.com/martanne/vis)

~~~
sitkack
In the first version we learn and discover, edits and rewrites.

In the second version we build with symmetry and regularity.

We need both versions, and in the second we drop some corner cases to make
things crystalline. It is important to recognize the things we drop and why,
sometimes they are important, but the purity of the construction takes
precedence. And sometimes the corner cases are the thing, without them the
chandelier would cast no light.

Anyway, I don't know where I am going with this. Small, playful,
understandable systems are the most informative. We need playful, useful
models that can be used for experimentation in a momentum free way. And it is
important not to suffocate the future with exhaustive engineering. Recognize
the is.

~~~
marmaduke
This is a great point; implementations always compromise on speed, generality,
domain decomposition, flexibility, ease of use/debugging, compile time, etc...

It's always a compromise subject to design goals of the project or just
personal preferences. There are a lot of reads on the Internet about specific
compromises chosen for a technology but contrasting choices between projects
or versions of a project are difficult to find.

Also I think large systems are just as informative as smaller ones but require
more investment.

------
nickpsecurity
Interesting project mpu. I like the stuff here:

[http://c9x.me/compile/doc/llvm.html](http://c9x.me/compile/doc/llvm.html)

Especially the optimizations that get you the first 70%. I've been asking
optimization people as I see them to do a survey of various methods to find
the smallest collection that provides the hugest benefit. This will help people
writing new compilers and formal verification community get a head start.

Now, back to correctness. Your project aims to be small, simple, and rigorous.
That's the perfect time to use methods for higher assurance software or design
yours to work with them easily. So, I suggest trying Design-by-Contract w/
interface checks, some static analysis tools, or even something like
Softbound+CETS that makes C safer automatically. I also usually recommend
writing compilers in something like Ocaml as it prevents lots of bugs and is
easier to do.
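
As a rough idea of what Design-by-Contract with interface checks can look like in plain C, here is a sketch using `assert` for preconditions and postconditions. The `reg_stack` type and its functions are hypothetical and not taken from QBE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity stack of virtual register numbers. */
enum { REG_STACK_CAP = 8 };

struct reg_stack {
    int items[REG_STACK_CAP];
    size_t len;
};

/* Invariant: the stack exists and len never exceeds the capacity. */
static int reg_stack_ok(const struct reg_stack *s) {
    return s != NULL && s->len <= REG_STACK_CAP;
}

void reg_stack_push(struct reg_stack *s, int reg) {
    assert(reg_stack_ok(s));             /* precondition: valid stack */
    assert(s->len < REG_STACK_CAP);      /* precondition: room left */
    size_t old_len = s->len;
    s->items[s->len++] = reg;
    assert(s->len == old_len + 1);       /* postcondition: grew by one */
    assert(s->items[s->len - 1] == reg); /* postcondition: top is reg */
}

int reg_stack_pop(struct reg_stack *s) {
    assert(reg_stack_ok(s));             /* precondition: valid stack */
    assert(s->len > 0);                  /* precondition: not empty */
    return s->items[--s->len];
}
```

The checks vanish when compiled with `NDEBUG`, so they cost nothing in a release build but catch interface misuse during testing.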

Also, if you do Ocaml or safe imperative, you can always do two
implementations side-by-side in the safe language and in portable C. You run
tests, analysis, etc available for each one. Problems in one might be problems
in the other. Regardless, though, you get the benefits of the safer language
by default with a C implementation if you absolutely need it. Many of us have
used this with Sandia Labs even doing it for hardware with ML and hardware
language of a Java processor that came out _great_.

So, I challenge you to try to push the envelope by using some available
techniques and tools to boost the assurance of your code. Preferably getting
it off C, too, due to all the errors it (esp pointers) introduces into
compilers.

~~~
andrewchambers
My qbe C frontend is actually written in myrddin which is a lot like
rust/ocaml.

To be honest, C is not well suited to places where there is adversarial input
such as servers, but when it comes to logical correctness of code I do not see
amazing benefits from languages like ocaml.

~~~
nickpsecurity
I'd usually ignore the language. Yet, given your interesting resume, there
could be some practical decisions and decent code in the compiler worthy of a
look in the near future. ;) It indeed has some benefits from Ocaml and is safer
than plain C. So, good work covering that detail.

I'll try to address this, though:

"but when it comes to logical correctness of code I do not see amazing
benefits from languages like ocaml."

First, why Ocaml is good for compilers. Most of this is still true and why
Rust team used it:

[http://flint.cs.yale.edu/cs421/case-for-ml.html](http://flint.cs.yale.edu/cs421/case-for-ml.html)

What I can't overstate, already in that link, is how much easier it is to
verify the properties of ML languages. They were originally designed to write
a theorem prover IIRC. SML had a formal semantics for easier modeling. It
allowed for functional programming style that avoids state management issues
while (esp Ocaml) still lets you get hands dirty where needed. Modifications
for tracking information flow, dependent types, concurrency... all sorts of
things were pretty straight-forward. Had a certifying compiler early. Relevant
to logical correctness, it's easier to map functional specifications to a ML
than a mess like C. This was proven when people doused Leroy et al with praise
over first, verified compiler for C because it really was that hard. Likewise
seL4 matching functional specs to C kernel code took man-years of effort.

So, languages like C are _really_ hard to get logically correct. Languages
like SML/Ocaml are pretty easy to get it done right if you understand the
problem. Rarely the language causing your issues. A hybrid like a modified C
or Modula that has ML-like advantages without C's disadvantages brings you
closer to that ideal while preserving performance & control.

~~~
andrewchambers
just to be clear, I'm not the author of qbe, but I have followed it since it
started. The author uses the handle mpu.

~~~
nickpsecurity
I was briefly confused but gathered that. The only other confusing thing was
that mpu works with major players in high-assurance per a comment but didn't
respond to only comment (mine) about applying assurance tech. Wasn't bothered
but didn't expect it either. Unusual.

~~~
mpu
I actually decided to take more time to answer your comment more thoroughly
than others. Also, I TA'd twice the class from where you linked the article
below, so I know about it :).

~~~
nickpsecurity
Great! One of my main reasons posting here is getting next generation of high
assurance developers info they need plus learning from them in what's not my
specialty (esp formal verification). Just hate missed opportunities given I
rarely run into people that even know what the phrase means or why it matters.
;)

------
riscy
At times I feel that LLVM is quite a monster, and I really like your goals of
keeping things small & simple. Kudos!

Can you expand on your point about "LLVM does NOT provide full C compatibility
for you"? I have specifically been hacking on calling conventions/ABI stuff
recently in LLVM and this "well known" problem is news to me.

~~~
andrewchambers
In llvm you cannot just pass or return structs, every frontend needs to
explicitly handle the details of when and how to registerize structs to handle
the system V abi for example.

That code is not so trivial to do yourself actually.

~~~
riscy
LLVM supports passing/returning structs (I'm using 3.8.1):
[https://ghostbin.com/paste/ozsh3](https://ghostbin.com/paste/ozsh3)

Furthermore, the output does match the System V ABI:
[https://ghostbin.com/paste/4a5ms](https://ghostbin.com/paste/4a5ms)

Notice how the struct is placed on the stack and the pointer to it is placed
in %rdi for call/return. If you reduce the number of i32's in the struct to 3,
the struct's fields are passed via registers since the type is fewer than four
eightbytes.

My older clang does indeed produce wonky LLVM IR code that seems to try and do
this classification in the frontend, so ABI compatibility may have been a
problem in the past, but I'm not convinced it's a problem in current versions
of LLVM.

~~~
mpu
Hi, thanks for the information, but LLVM still does _not_ provide ABI
compatibility. If you reduce the struct to 3 i32, it is passed in edi, esi,
and edx on my machine. However according to the ABI it should be packed in rdi
and rsi.
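
To picture the packing, here is a small C sketch that copies a 12-byte struct into two 8-byte chunks, the way the ABI's "eightbyte" classification describes it. The types `three` and `eightbytes` and both functions are hypothetical; `first` and `second` merely stand in for rdi and rsi, and no real calling convention is exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct three { int32_t a, b, c; };   /* 12 bytes, no padding expected */

/* Hypothetical stand-in for the two integer argument registers. */
struct eightbytes { uint64_t first, second; };

/* Bytes 0..7 (a and b) go into the first eightbyte; bytes 8..11 (c)
   go into the first four bytes of the second one, which is its low
   half on a little-endian machine such as x86-64. */
struct eightbytes pack_three(struct three s) {
    struct eightbytes r = {0, 0};
    assert(sizeof(struct three) == 12);
    memcpy(&r.first, &s, 8);
    memcpy(&r.second, (const unsigned char *)&s + 8, 4);
    return r;
}

/* Inverse operation, as a callee would do to recover the fields. */
struct three unpack_three(struct eightbytes r) {
    struct three s;
    memcpy(&s, &r.first, 8);
    memcpy((unsigned char *)&s + 8, &r.second, 4);
    return s;
}
```

So two registers carry three fields, rather than one field per register as in the edi/esi/edx output described above.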

Check out the QBE transcription and what it will compile to
[http://c9x.me/paste/mGOO](http://c9x.me/paste/mGOO) (there is a bit of
register shuffling because hinting in regalloc is not very mature yet, but
note that SSA form for the input is not required!).

~~~
riscy
Ahh, good catch! Luckily, I don't use SysV struct passing. :)

------
npx
How does this compare to libfirm[1]?

[1] [http://pp.ipd.kit.edu/firm/](http://pp.ipd.kit.edu/firm/)

~~~
mpu
It's much much smaller (I think libfirm is over 100kloc, QBE is about 6k).

But the major difference is the IL: I use a human-readable and easily-
printable text IL. This means that you don't need a graph-viewing tool to read
the IL (it's just text) and that you can modify the IL between two passes
super easily. This simple IL is a blessing when debugging a compiler.

I think QBE also has better support for the x64 ABI.

Finally, it is much less advanced (less optimizations, less tested) than
libfirm and supports only x64 as a target.

This is a sketchy comparison.

~~~
swah
It's probably also easy for you to output a symbol-file to be used by
IDE/editors, right?

------
mioelnir
This confuses me. The title says it's an alternative, but the text says it
does not intend to solve all problems of industry-grade languages.

Which to me reads as Qbe being entirely differently scoped, and therefore not
an alternative.

~~~
mpu
It's an alternative if you fit in the use case. I did not try to clone LLVM.

~~~
dang
We've taken 'LLVM' out of the title above, since experience has shown that
discussions about titles tend to be off-topic and/or shallow. We also added
'Show HN' since this is your own work. Good luck!

~~~
mpu
Cool, thank you guys!

------
nickysielicki
Here's an excerpt from a 2013 OpenBSD mailing list post about the state of
compilers, whose points are still valid in 2016, and I think it has some
relevance to this submission.

    
    
        Assuming the upstream developers fail to deliver, it's up to us to fix
        or workaround compiler problems as we encounter them; sometimes it's as
        easy as finding out which patch has been commited upstream, but not
        backported to the version we use; and sometimes it's a genuine issue
        which may or may not have been reported in the latest compiler version,
        and we are on our own. When this happens, we can only rely upon our
        developer skills and intimacy with the compiler.
    
        A few of our developers have, over the years, become unafraid of gcc,
        and able to investigate issues, backport fixes, and fix or work around
        bugs: I'll only mention niklas@, espie@, etoh@ and otto@, and hope the
        few others will forgive me for not listing their names. This has not
        been an easy road, to say the least. Now, another few of our developers
        are working on building a similar knowledge of llvm. I wish them a lot
        of luck, and I will try to join them in the near future.
    
        In the meantime I am not sure they feel confident enough to support
        switching the most popular OpenBSD platforms from gcc to llvm.
    
        In a few months or years from now, things will be different...
    
        ...but there is something I wish would happen first.
    
        An LTS release of an open source compiler.
        Because all compilers nowadays are full of subtle bugs, but so many of
        them than you can't avoid them as soon as you compile any nontrivial
        piece of code, and because we can't afford to going back to assembly, we
        need a compiler we can trust.
    
        GCC, as well as LLVM, have Fortune 500 companies backing them, paying
        smart developers to work fulltime on these projects.
    
        Yet none of them dares to provide a long time support version. Bugs in
        version N are fixed in version N+1, but new bugs are introduced. And
        noone cares about trying to settle things down and produce a compiler
        one can trust (because version N+1 runs 3.14% faster in the loonystones
        benchmark which doesn't match any real life use case). Who cares?
        Tomorrow's compiler will generate code which will complete an infinite
        loop in less than 5 seconds; stay tuned for more accomplishments!
    
        The free software world needs an LTS compiler. The last de-facto LTS
        compiler we have had was gcc 2.7.2.1, and it is too old to compile
        modern C and C++ code.
    
        Should a free software LTS compiler appear (be it a gcc fork, or an llvm
        fork, or something else), then OpenBSD would consider using it, very
        seriously. And we probably wouldn't be the only free software project
        doing so.
    

-- [http://marc.info/?l=openbsd-misc&m=137530560232232](http://marc.info/?l=openbsd-misc&m=137530560232232)

( Previous discussion of this post on HN:
[https://news.ycombinator.com/item?id=9322259](https://news.ycombinator.com/item?id=9322259)
)

I like that this project is in-line with these same goals.

------
danbruc
QBE [1] is also a query language, Query by Example, developed at about the
same time as SQL in the 1970s.

[1]
[https://en.wikipedia.org/wiki/Query_by_Example](https://en.wikipedia.org/wiki/Query_by_Example)

------
Ace17
Can QBE do coroutines (e.g 'yield')? Or maybe it's a front-end issue only?

~~~
ante_annum
coroutines and yielding are a pretty high-level concept that I wouldn't expect
to see in a backend, but there's a really great explanation of how to map them
to low-level concepts here: [http://llvm.lyngvig.org/Articles/Mapping-High-
Level-Construc...](http://llvm.lyngvig.org/Articles/Mapping-High-Level-
Constructs-to-LLVM-IR#38)
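
As a rough sketch of the general idea (not taken from the linked article), a front end can lower a generator into a struct holding the resume point plus the live locals, and a function that switches on that resume point. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator that yields 0, 1, ..., limit-1, then -1.
   The suspended coroutine is just this struct: a resume point plus
   the locals that must survive across yields. */
struct counter {
    int state;   /* 0 = not started, 1 = suspended, 2 = finished */
    int i;
    int limit;
};

void counter_init(struct counter *c, int limit) {
    assert(c != NULL);
    c->state = 0;
    c->i = 0;
    c->limit = limit;
}

int counter_next(struct counter *c) {
    assert(c != NULL);
    switch (c->state) {
    case 0:                    /* first call: run the prologue */
        c->i = 0;
        c->state = 1;
        /* falls through */
    case 1:                    /* resumed after a yield */
        if (c->i < c->limit)
            return c->i++;     /* "yield i", then advance for next resume */
        c->state = 2;
        /* falls through */
    default:                   /* finished: keep returning the sentinel */
        return -1;
    }
}
```

Nothing here needs backend support beyond ordinary calls, loads, stores, and branches, which fits the point that coroutines are mostly a front-end concern.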

------
azakai
Any benchmarks of compile times vs other compilers?

~~~
andrewchambers
I have done some using my C frontend, qbe is at least 5 times faster than both
gcc and clang for -O2 -S. It is extremely fast from my experience.

------
hobo_mark
> Very good compatibility with C code

How much harder would it be to call C++?

~~~
azakai
It might be interesting to use the clang frontend and modify it to emit QBE
instead of LLVM. I believe Visual Studio and ICC do that.

~~~
andrewchambers
One thought I had would be to make a qbe backend for llvm, just as a temporary
solution until there are faster smaller frontends that can replace them.

------
marmaduke
If the IL is text, and your goal is short understandable code, why not use a
language like Python?

(Compiling it with Cython would allow you to provide a C api and library with
only lib python as a dep)

~~~
andrewchambers
that would be painfully slow, as it stands now, qbe is actually quite a lot
faster than gcc and clang.

~~~
marmaduke
So the 70% performance was about compiler performance, not compiled code
performance?

~~~
andrewchambers
no, the compiled code performance is slower than gcc, but the compiler itself
executes far faster.

------
Drup
> Its small size serves both its aspirations of correctness ...

No, a compiler (hell, a software) written in C is not correct, period. If you
want a C compiler that has slight chances to be correct, you use compcert.

The small size is an awesome property for _education_ (both for the writer and
the reader). For this purpose, it's absolutely great, so kudos for that. :)

~~~
mpu
At least, I can try!

And also, we are seeing more and more certified C programs: see the DeepSpec
NSF expedition grant, the Verified Software Toolchain, and the CertiKOS
project for examples. I work with these guys.

~~~
nickpsecurity
You are soooo lucky. DeepSpec has a near dream team of people working on this
issue. Appel and Chlipala alone could probably knock out most of the problem
given enough time. Add the others and great stuff on publication list is
entirely unsurprising. Except in its cleverness. :) Glad you brought it up as
I haven't read the info flow for c & asm paper yet.

Btw, how's progress coming on those projects? Specifically, are any of the
tools (a) useful for non-experts with a little bit of training by tutorials,
etc and (b) available for download in open-source or binary form yet? Thanks
ahead of time.

