Implementing a LLVM Micro C compiler in Haskell (josephmorag.com)
155 points by pcr910303 3 months ago | 15 comments

This is awesome. This essentially will serve as a more up-to-date tutorial on the LLVM bindings for Haskell, so programming language designers could use it to build a new language in Haskell. Currently, the LLVM documentation and its tutorial leave much to be desired. This work improves things on the Haskell side by quite a bit.

I especially love the fact that instead of implementing a novel language like Kaleidoscope (which the official LLVM tutorial does), it implements a subset of C (a language that's low-level enough and widely understood). C has all of the basic constructs; once you understand how to write a compiler (to LLVM) for them, you can implement far more advanced languages.

Thank you, Joseph Morag, for this work, and Théophile Choutri, Moritz Kiefer, et al. for making it possible.

Kaleidoscope seems to focus only on C++ nowadays; the OCaml version is no longer there.


This reminds me of a project that I did at university. It was a compiler for my own C-like programming language, going straight to x86 assembly. It had typechecking, basic primitive types, local and global variables, arrays and function calls. My professor recommended Haskell, saying that it would make the job easier for me, and he was right. The thing basically worked as soon as I got it to compile. I remember that another student used Java for the project, and he ended up with a 10x larger codebase, riddled with bugs.

The code is still on my github. I recently took a look at it and I was surprised how readable it was. It's a shame that in my professional career I didn't get the chance to use either Haskell or any of the PL skills that I picked up in uni, because I really had fun with that project. Though maybe that's for the best. If you have to do it for work, you sometimes end up hating it.

This is amazing. I tried following Stephen Diehl's JIT compiler in LLVM tutorial[0] a few years ago but it was already outdated (the llvm-hs library changed quite a bit), and subsequent web searches didn't turn up much.

For those interested in tutorials like this, I'd also recommend a very literate Haskell compiler for the PCF language to C[1], which is essentially lambda calculus with some primitives and pattern matching. It details a number of transformations such as closure conversion and lambda lifting.

[0] https://www.stephendiehl.com/llvm/

[1] https://github.com/jozefg/pcf/

Are there any similar tutorials written for the LLDB bindings? The only LLDB documentation I could find was auto-generated Doxygen with no usage code.

This looks great. Are there any similar tutorials written for languages other than Haskell?

Why not go for the full C language? C was made to make compilers easy to build. It might also help expose difficulties in the chosen approach.

They went with a subset of C; it's enough to write a working executable. Generally, for an introductory tutorial on something as complex as writing a compiler, a subset like the one presented in the article is enough to show how all the pieces work together to produce a working compiler.

Yup, for the purposes of a demo like this, a simplified top-to-bottom "vertical slice" is of far more value than total feature coverage.

Unless you are talking about something like "A Retargetable C Compiler: Design and Implementation", it is definitely not easy.


During the early '80s, the best that home computers could get was Small-C.


Small-C was the best open-source one, but there were many others. This one was my favorite: https://www.bdsoft.com/resources/bdsc.html

What I got back in the day came in book form; it was yet another flavour.

"A book on C"


It uses a K&R C subset with bytecodes and a bytecode -> machine language translation.

I just never bothered to type it in though.

A conforming ISO C compiler seems extremely difficult to build, and I don’t think ease of writing a compiler is a part of the design these days.

They could stick to ANSI C.

ANSI C and ISO C are the same thing, at least for the 1989/90 revision. It’s still not a particularly easy language to implement, at any stage.
