Hacker News
Write You a Haskell: Building a modern functional compiler from first principles (stephendiehl.com)
472 points by tenslisi on Jan 5, 2015 | hide | past | favorite | 47 comments

Now this is very interesting!

There are plenty of compiler writing tutorials for conservative, imperative programming languages with a straightforward static type system (like C).

I did study a little bit of compilers for functional languages from Simon Peyton Jones's old book "The Implementation of Functional Programming Languages" [0]. It predates the Haskell programming language and uses a contemporary research language, Miranda, as both the target and the "host" language.

... which brings me to another topic: Monads. The SPJ book has a few chapters on implementing a Hindley-Milner-style type inference algorithm. It's written in Miranda without any monads and uses a clever trick for coming up with unique temporary type names: there's an infinite list of integers [0..], which is "split" when traversing the syntax tree so that the left branch of the recursion gets the even numbers and the right branch gets the odd numbers.
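
The trick described above, sketched in present-day Haskell (the data type and function names here are mine, not SPJ's actual code): each recursive call hands half of an infinite supply of integers to each branch, so no two branches can ever produce the same name.

```haskell
data Expr = Var String | App Expr Expr
  deriving (Show, Eq)

-- Split a supply into evens and odds so sibling branches never collide.
split :: [Int] -> ([Int], [Int])
split ns = ([n | n <- ns, even n], [n | n <- ns, odd n])

-- Tag every Var with a "fresh" number drawn from the supply it was given.
label :: [Int] -> Expr -> Expr
label ns (Var x)   = Var (x ++ show (head ns))
label ns (App f a) = let (ls, rs) = split ns
                     in App (label ls f) (label rs a)

main :: IO ()
main = print (label [0..] (App (Var "f") (Var "x")))
```

Note how the supply has to be threaded through every call; that plumbing is exactly what the monadic version below makes disappear.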

The method works fine and is indeed very clever, but try comparing that code to the same algorithm re-implemented (by me) with monads (State and Error monads plus a transformer) [1]. The original algorithm is rather long and hard to follow, because lists of unique integers have to be passed to all the functions.
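
For contrast, here is a minimal sketch of the monadic style (this is a toy, not the linked code), with a hand-rolled State monad so it runs on base alone. The counter is threaded implicitly, so none of the traversal functions mention the supply at all:

```haskell
-- A minimal State monad, defined inline to keep this base-only.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f m = State $ \s -> let (a, s') = runState m s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  mf <*> ma = State $ \s ->
    let (f, s')  = runState mf s
        (a, s'') = runState ma s'
    in (f a, s'')

instance Monad (State s) where
  m >>= k = State $ \s -> let (a, s') = runState m s in runState (k a) s'

-- Allocate the next unique type-variable name.
fresh :: State Int String
fresh = State $ \n -> ("t" ++ show n, n + 1)

data Expr = Var String | App Expr Expr
  deriving (Show, Eq)

-- No supply plumbing in sight: the state carries the counter.
annotate :: Expr -> State Int Expr
annotate (Var x)   = do t <- fresh
                        pure (Var (x ++ ":" ++ t))
annotate (App f a) = App <$> annotate f <*> annotate a

main :: IO ()
main = print (fst (runState (annotate (App (Var "f") (Var "x"))) 0))
```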

I had a blast writing this algorithm, and I wanted to share it. It's been a few years since I last worked with it. It was my first non-trivial use of monads and I really like how it turned out in the end.

[0] http://research.microsoft.com/en-us/um/people/simonpj/papers... [1] https://github.com/rikusalminen/funfun/blob/master/FunFun/Ty...

Note that SPJ's book (1987) pre-dates Wadler's popularization of monads (1992) by a good 5 years, and Moggi's idea to use monads to describe different notions of computation by 4 years^1.

So they're missing for a reason.

1. http://www.cs.cmu.edu/afs/cs/user/crary/www/819-f09/Moggi91....

Yes, I know that. There's a long and interesting history on the "round hole, square peg" problem of I/O, side effects and functional programming.

My sole intent was to emphasize, using a practical example, how neat and elegant monads can be at best.

Don't forget that Simon Peyton Jones's second book on compiling functional languages, written with David Lester, is also out of print but available from him directly on the web, and it includes the source code from the book. Way fun, if slightly dated, stuff.

Thanks for sharing, and for anyone else reading, you might also like the following book: "Modern Compiler Implementation in ML" by Appel. (http://www.cs.princeton.edu/~appel/modern/ml/)

I think the online version of that book actually uses Haskell (but the text still refers to Miranda).

I don't think so, but Miranda and Haskell are very alike so it's easy to be mistaken.

That seems like an easy mistake to make, from what I remember of reading about the origins of Haskell. They had issues with the licensing and cost of Miranda and struck out to build upon it. Here's the original article, "A History of Haskell: Being Lazy with Class" [0]

[0]: http://research.microsoft.com/en-us/um/people/simonpj/Papers...

It looks like the author of this (Stephen Diehl) is looking for a job in Boston. Maybe HN can help him out?


Huh, I just realized I inherited a JS codebase from him in a previous job. The code was ingenious, perhaps too ingenious to maintain.

He surely must be getting headhunted a lot.

If this guy isn't getting headhunted, I have no idea what headhunting is. It's pretty rare for people to be able to write correct and easy-to-read guides to such low level concepts.

Finding a job is easy, finding the right job is hard for people like this.

Part and parcel of "There are not enough programmers [willing to work long hours for low pay]!"

For that talent, it might even be that the work is too boring regardless of the hours or pay.

Very true. I'm running into that right now: humdrum legacy CRUD apps, where the only challenge lies in being allowed* to refactor the code into decent shape.

*I wish I could put that in sarcastic quotes.

I recommend Lisp in Small Pieces (LiSP) by Christian Queinnec and Compiling with Continuations by Andrew Appel if you're interested in writing a functional compiler.

In addition to Compiling with Continuations, I found Compiling with Continuations, Continued by Andrew Kennedy very useful, as well.


For the interested, I have implemented Cardelli's paper here: https://github.com/noteed/mojito/blob/master/Language/Mojito...

As a sidenote, Stephen is also the author/maintainer of "What I Wish I Knew When I Was Learning Haskell"[0] which is a great resource both for those who are learning and for slightly advanced programmers.

[0] http://dev.stephendiehl.com/hask/

Your comment implies that programmers stop learning once they are slightly advanced.

No, it doesn't imply that, at all. My least favorite thing on the Internet nowadays is shallow fault-finding comments like yours from people who just want to interject themselves into discussions where they have nothing substantive to say. I used to be that guy and still have to actively resist the temptation. Don't be that guy.

Oh, man — not being that guy, at least in the fault-finding sense; just apparently failing to make an obvious joke. Very strictly, grammatically speaking, it does say that, and it was a humorous implication to me, since I have known so many programmers who apparently did stop learning once they were very slightly advanced. But I'm completely aware that the implication was accidental.

No, it actually doesn't even say that; in no way does it preclude the existence of a third group. Not only are you unfunny, you fail logic.

To pull this forward, the author is also the author of "Implementing a JIT Compiled Language in Haskell and LLVM" [0].

[0] http://www.stephendiehl.com/llvm/

If you haven't spent time with this, you need to stop everything and look at it. It's wonderful.

Wonderful for learning to use LLVM? I mean, isn't the heavy lifting (the most interesting part?) done by LLVM, so you're just using it?

I haven't read it, that's why I'm asking this.

So there is a basic introduction to generating LLVM in Haskell there, but what's so great about it is that writing a lexer/parser/(interpreter|compiler) is almost magic in Haskell. The total code of that project is remarkably small. For that reason, it's a pretty standard Haskell workflow to do everything in Haskell, have it emit LLVM/CUDA/whatever to do the heavy lifting, and then you get to reason in Haskell while your back-end takes care of the rest. As much as that sounds complicated, it's really, really not.
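
To give a flavor of why parsing feels so lightweight in Haskell, here is a complete toy combinator parser built from scratch on base only (no Parsec; all names are made up), parsing and evaluating sums like "1+2+30":

```haskell
import Data.Char (isDigit)

-- A parser is a function from input to a result plus leftover input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f p = Parser $ \s -> fmap (\(a, r) -> (f a, r)) (runParser p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  pf <*> pa = Parser $ \s -> case runParser pf s of
    Nothing     -> Nothing
    Just (f, r) -> fmap (\(a, r') -> (f a, r')) (runParser pa r)

-- Consume one character matching a predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
  (c:cs) | p c -> Just (c, cs)
  _            -> Nothing

-- Zero-or-more and one-or-more repetitions.
many0, many1 :: Parser a -> Parser [a]
many0 p = Parser $ \s -> case runParser p s of
  Nothing     -> Just ([], s)
  Just (a, r) -> fmap (\(as, r') -> (a : as, r')) (runParser (many0 p) r)
many1 p = (:) <$> p <*> many0 p

number :: Parser Int
number = read <$> many1 (satisfy isDigit)

-- A sum of natural numbers separated by '+', evaluated on the fly.
expr :: Parser Int
expr = (\n ns -> n + sum ns) <$> number <*> many0 (satisfy (== '+') *> number)

main :: IO ()
main = print (runParser expr "1+2+30")
```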

Think of it as a template for this new essay. Most of the hard optimization work is embedded in LLVM. But the process of going from string syntax to intermediate language to back-end compilation is an important journey, and doing it in Haskell as Stephen does is an elegant way to scope out those steps.

I'd like to recommend The Programming Language Zoo by Andrej Bauer. It is a collection of programming language implementations, small and clear, written in OCaml.


Oh, wow. I've been learning OCaml lately, specifically to dive into Hack, Facebook's new statically-typed PHP derivative. This will really help with that, thanks so much for the link!

This looks really interesting! But I can't see any way to subscribe to book updates. A mailing list (my preferred means of subscribing to this kind of stuff these days) or an RSS feed to get notified when new chapters are ready would be really handy.

You could `watch` the repository on GitHub to get notified when there is an update.

On the subject of compiler tutorials, there's a really sweet tutorial I found a while back by Abdulaziz Ghuloum (http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf)

It simplifies some things (for example, tiling when producing the final asm).
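
The flavor of the tutorial's first step, sketched in Haskell rather than Scheme (the function name is mine, and the mnemonics only approximate the paper's x86 examples): the entire first "compiler" turns a single integer into an assembly program that returns it, and every later chapter grows it incrementally.

```haskell
-- Step 1 of an incremental compiler: a "program" is one integer literal,
-- and compilation emits assembly that moves it into the return register.
compileProgram :: Int -> String
compileProgram n = unlines
  [ "    .globl scheme_entry"
  , "scheme_entry:"
  , "    movl $" ++ show n ++ ", %eax"
  , "    ret"
  ]

main :: IO ()
main = putStr (compileProgram 42)
```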

Wow, I've been looking for something just like this. Absolutely fantastic!

Very interested in seeing how he'll deal with lexing Haskell. It's a major pain: tools like Parsec are very good out of the box with whitespace-insensitive languages, but not with whitespace-sensitive ones.

It's not that hard to implement some form of whitespace-sensitive parsing; you need to maintain explicit state, essentially a stack, plus a number of combinators. See, for example, the PureScript parser.

The state is defined in https://github.com/purescript/purescript/blob/master/src/Lan...

(it's just the column number). I believe the whitespace rules for Haskell are somewhat subtler, but they shouldn't be much harder.

I tried to change the whitespace rules in the PureScript parser to match Haskell recently (to support explicit braces), and it proved to be very difficult. The "L" function which is defined in the Haskell 98 Report depends heavily on an interaction between the lexer and parser, in which a failed parse causes the indenter to behave differently in the lexer. For this reason, I stayed with the simpler rules that we have now, and I plan to support explicit braces in the parser instead.

IIRC, Haskell has a well-defined mapping of whitespace to explicit braces and semicolons -- I suppose one could perform that transform first to desugar, then parse the resulting token stream normally...
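
A toy version of that transform, under a much simpler rule than Haskell's actual layout (no layout keywords, no parse-error clause, so this is only a rough approximation): equal indentation inserts ';', deeper indentation opens '{', and shallower indentation closes '}', tracked with a stack of open indent columns.

```haskell
import Data.Char (isSpace)

-- Column of the first non-space character.
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- Group lines into { ; } blocks purely by column.
layoutLines :: [String] -> String
layoutLines src = go [] [ (indentOf l, dropWhile (== ' ') l)
                        | l <- src, any (not . isSpace) l ]
  where
    go ms []          = concat (replicate (length ms) "}")  -- close all
    go [] ((n, l):ls) = "{ " ++ l ++ " " ++ go [n] ls       -- open first
    go (m:ms) ((n, l):ls)
      | n == m    = "; " ++ l ++ " " ++ go (m:ms) ls        -- same block
      | n >  m    = "{ " ++ l ++ " " ++ go (n:m:ms) ls      -- nested block
      | otherwise = "} " ++ go ms ((n, l):ls)               -- dedent
```

For example, `layoutLines ["a = 1", "b =", "  c", "  d", "e = 2"]` yields `"{ a = 1 ; b = { c ; d } ; e = 2 }"` — exactly the kind of token stream a whitespace-oblivious parser could then consume, which is why the parse-error clause in the real rules is such an unwelcome complication.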

If you think of well-defined as combining the most subtle aspects of JavaScript semicolons and Python indentation, I agree. As hobby projects, I have coded a significant fraction of a Haskell parser and a majority of a Python parser. It was good I did the Python one first. The Haskell rules include a clause which says: if you don't yet have a syntax error from your tokens, and you would have a syntax error by adding the next token, and you would not have a syntax error if you first added a closing brace and then the next token, then insert the corrective closing brace. So you can't just tokenize, infer what's missing, and then parse the tokens.

I suppose using haskell-src-exts is cheating...

"I will build haskell by taking the source code and running it"

I jest, but this is an issue I've thought about idly for a while. Maybe it's possible to write a two-step parser: one pass that turns chars into tokens (and cheats a little bit by stripping spaces in some contexts to make it context-free), and then you have a nice CFG that you can handle easily.

I once took a compiler design course where the instructor said we could use whatever language we wanted (it was a tiny class - I think there were 8 of us?).

I asked whether I could use bash, and consider gcc a feature of the language.

I was informed that would be cheating.

This is awesome. I look forward to following the progression.

I was just looking at the Fay compiler because I wanted to learn more about how Haskell is compiled. This looks awesome.

This is really cool! I'm excited to read the section on row polymorphism.
