Opinion piece on GHC back ends (andreaspk.github.io)
81 points by allenleein 43 days ago | 17 comments



When I looked into LLVM for my own little language, my general impression was that LLVM was great if you are writing a C-like language or something close, since you can use the whole compilation pipeline (even clang for the FFI). If you have interesting high-level features like threads, exceptions, etc. with semantics different from C/C++, you run into translation issues and have to re-implement some LLVM intrinsics, which is a fair amount of work but doable. But with lazy semantics like GHC, LLVM's whole idea of "calling" is barely usable and you can't use most of the inlining or cross-method optimizations. So LLVM ends up being an assembler producing machine code.
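
To make that concrete, here is a heavily simplified C sketch (not GHC's actual object layout or calling convention, just an illustration of the idea) of what a lazy binding amounts to: a heap-allocated "thunk" that is entered through a code pointer the first time it is demanded and then overwritten with its value. Every use site is an indirect call into code LLVM usually can't see through, which is why its inlining and interprocedural passes get so little traction on lazy code.

    #include <stdio.h>

    /* Illustrative thunk: a code pointer plus a payload. GHC's real closures
       differ in layout and are entered via its own calling convention. */
    typedef struct Thunk Thunk;
    struct Thunk {
        int (*enter)(Thunk *self);  /* entry code: evaluate, or return the cached value */
        int value;                  /* result, once evaluated */
        int arg;                    /* captured free variable, before evaluation */
    };

    static int return_cached(Thunk *self) { return self->value; }

    static int eval_square(Thunk *self) {
        /* First demand: compute, memoize, and redirect future entries. */
        self->value = self->arg * self->arg;
        self->enter = return_cached;
        return self->value;
    }

    int main(void) {
        Thunk t = { eval_square, 0, 7 };  /* "let x = 7 * 7": allocate, don't compute */
        printf("%d\n", t.enter(&t));      /* forced here: prints 49 */
        printf("%d\n", t.enter(&t));      /* second force just reuses the cached 49 */
        return 0;
    }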

As an assembler, LLVM is not particularly fast or convenient. It has a huge API, and lots of optimization passes that have to be tweaked to get reasonable output. The posts I read that got that far generally recommended ditching LLVM and forking a JIT engine like LuaJIT, libjit, etc.

tl;dr I agree with the author that maintaining NCG is the way to go. The difference is, I expect LLVM to implode long-term rather than expand.


There are several projects that have tried to use LLVM as a direct backend for higher-level languages, and few of them have gone well.

- Rubinius - Ruby JIT on LLVM, often slower than the standard interpreter (!!)

- Pyston - Python JIT by Dropbox, abandoned because it wasn't going anywhere

- FTL - JavaScript backend in WebKit's JavaScriptCore; they're now doing their own thing with B3 and abandoned LLVM because it wasn't working out

The only ones that have worked well, like Falcon (Java backend by Azul), have a substantial IR and optimization passes of their own in front of LLVM and use it like a portable assembler - but as you say, that's not ideal anyway.

You can't just dump your language into LLVM and expect it to work miracles, but people keep trying again and again and again.


* Crystal

* R

* Julia

* MATLAB

* Mathematica

* Swift

all use LLVM as their backend.


These are just more examples of exactly what I mean!

- Crystal is Ruby simplified in order to make it amenable to compilation by LLVM. They couldn't move LLVM to the language, so they had to move the language to LLVM.

- R's performance is very limited.

- Julia was specifically designed to be easy to compile.

- Swift has SIL - they don't just output LLVM. See what they say about LLVM! https://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighL... 'Wide abstraction gap between source and LLVM IR', 'IR isn't suitable for source-level analysis', etc. And they have these problems even with the best LLVM experts on their team!

I don't know much about MATLAB or Mathematica, sorry.


Even Clang would greatly benefit from an intermediate IR between the C/C++ AST and LLVM IR; the presentation you linked starts off by going over that (and I can confirm that it’s still a problem). That doesn’t mean that LLVM or its IR is bad, or that C or C++ are not suited to being compiled to it. It just means that some analyses and optimizations are best done at a higher level.


> It just means that some analyses and optimizations are best done at a higher level.

Yes, that's my point - you can't just emit LLVM and cross your fingers. But people keep on trying.


Rust as well.

Notably not on this list and known for fast compile times:

* Nim

* Go


No, Rust has MIR, and Go is a relatively simple-to-compile language that maps to LLVM pretty directly. I don't know about Nim, sorry.


Ah, I missed "directly" in your original comment. Yeah, Rust has a couple of layers.


LLVM's MLIR does apparently fix some of those problems, e.g. by being a generic medium-level IR rather than just an assembler.


MLIR is a Google/TensorFlow project, not an LLVM one, and AFAICT it is only used by TensorFlow. That might be because it was only released a few months ago, but also, from skimming through their "write a language" tutorial, it seems to provide barely anything useful: just JSON-like read/write of the IR, an LLVM output pass that is marginally simpler than writing it yourself, and abstraction machinery for doing generic SSA operations and sharing "dialects" with other DSLs.

I wouldn't say it's worthless, since TensorFlow is using it and MLIR could grow into the middle-end for some new HPC / data science language, but personally I am somewhat leery. GHC already has Core, STG, and C--, so it doesn't particularly need a new intermediate language, and for a new language I would prefer to investigate something besides SSA, like Thorin (https://anydsl.github.io/Thorin.html).


It is also starting to be used by Julia.


Today, if you need an assembler or a JIT, WebAssembly should be your default option.


Sadly, WebAssembly does not support goto, which means that in some cases you have to simulate it using a big loop with a series of if statements, roughly as sketched below.
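
For illustration, here is a rough C sketch of that lowering (the function and label names are made up): the gotos collapse into a dispatch loop over a state variable, which is essentially the fallback shape that relooper-style algorithms emit when they can't recover structured control flow.

    /* Original control flow, written with goto: */
    int collatz_steps_goto(int n) {
        int steps = 0;
    loop:
        if (n == 1) goto done;
        if (n % 2 == 0) { n /= 2; } else { n = 3 * n + 1; }
        steps++;
        goto loop;
    done:
        return steps;
    }

    /* The same flow after lowering: labels become cases of a switch (or a
       chain of ifs), driven by a state variable inside one big loop. */
    int collatz_steps_lowered(int n) {
        int steps = 0;
        enum { L_LOOP, L_DONE } state = L_LOOP;
        for (;;) {
            switch (state) {
            case L_LOOP:
                if (n == 1) { state = L_DONE; break; }
                if (n % 2 == 0) { n /= 2; } else { n = 3 * n + 1; }
                steps++;
                state = L_LOOP;
                break;
            case L_DONE:
                return steps;
            }
        }
    }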


> As an assembler, LLVM is not particularly fast or convenient. It has a huge API, and lots of optimization passes that have to be tweaked to get reasonable output. The posts I read that got that far generally recommended ditching LLVM and forking a JIT engine like LuaJIT, libjit, etc.

But it can't be so bad that it offsets the gain of the many supported platforms, can it?


Realistically, you need to do a lot more work to support many platforms unless your language is C-like in the first place (meaning that it needs only a very limited runtime).


I've had the pleasure of interacting with, and sometimes collaborating with, Andreas over the past year or two. He's one of my favorite newer GHC contributors, and he's done some fantastic work improving the code gen for GHC.



