Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Crafting Interpreters – A handbook for making programming languages (craftinginterpreters.com)
423 points by munificent on Jan 15, 2017 | hide | past | favorite | 75 comments

A fantastic book covering similar things that you can read right now is https://pragprog.com/book/tpdsl/language-implementation-patt.... It's the book that got me into interpreters and compilers, I highly recommend it.

It's slightly less in depth than the roadmap of "Crafting Interpreters", but again, it exists right now.

Ha yeah, Terence parr is a référence.

Seems to be a hot topic, a friend and colleague published this in December 2016 and I can thoroughly recommend it (Uses Go as an implementation language):

- https://interpreterbook.com

I can recommend this book as well! It is easy to read and understand and the choice of Go as the host language is also great.

I am in the process writing a semi-natural query language (boolean logic) targeted at domain experts (in this case biomedical researchers) that compiles to either database queries (e.g. SQL) or a function call for arbitrary logic. The goal of the compiler bit is to abstract away where the data ultimately lives (similar to GraphQL) or how it is structured.

I bought this when it appeared on HN around xmas, can recommend, its a pretty great introduction to the subject

Yeah, I've talked to Thorsten over a email a little. I'm excited about his book too, and I hope it does really well. Go is a cool choice for a language book. It's low-level enough to let you show some of the fundamental algorithms, but not as tedious to work with as C and its manual memory management.

Looking forward to the chapters about the C interpreter. Meanwhile, it looks like both interpreters are available to study now in the Git repo. [1]

[1]: https://github.com/munificent/craftinginterpreters

Yes! The code for the whole book is done already and sliced into chapters. I had to do that first before I started putting things online. I didn't want to paint myself into a corner by having something in an earlier chapter that ended up not making sense later.

So the code is all done, and now "all" that's left to do is write and edit all of the prose. Which, of course, is really the hardest part by far.

Me too; I'm looking for a way to learn more about building data-structures in C.

Hanson's C Interfaces and Implementations [0] is one of the best books on the subject.

Strongly complemented by Sedgewick's Algorithms in C.

[0] https://www.amazon.com/Interfaces-Implementations-Techniques...

I added both to my wish-list; Thanks!

Nice project. What do all the

//> SomeName not-yet

in the code mean? Are they meant for some kind of automatic processing? Its a bit hard to read with all of them there...

Sorry, yeah. Those are the markers that the build script uses to insert the code snippets in the right place in each chapter.

I know it makes the code hard to read. I should probably slap together a little script to strip them out.

I really enjoyed http://gameprogrammingpatterns.com/ and I've been looking forward for this book since I saw the link on munificent's homepage

Thank you so much for the book, please, continue!

I am currently working on Scheme R5RS implementation in Java. It is a hobby project, but I'm really enjoying it. It is somewhat enlightening.

Look forward to reading the Optimization chapter: I couldn't find much useful and practical info on that. I think it is quite easy to implement your interpreter, but it is really difficult (at least for me) to make it fast.

Can anyone recommend any articles or books with common optimization tricks and techniques?

> Thank you so much for the book, please, continue!

I will!

> Look forward to reading the Optimization chapter: I couldn't find much useful and practical info on that.

There's not too much depth in that particular chapter. The whole second part is really about implementing an interpreter (fairly) efficiently in C. As you probably know, you have to think about performance at every level—your object model, bytecode format, hash table implementation, etc.

If you'd like a sneak peek, the code for the entire book is actually done already (!). You can see which snippets go into each chapter cloning the repo[1] then running:

    make chapters
    make c_chapters
    make diffs
Then take a look in build/diffs.

There are a ton of optimization tricks that are specific to Scheme and Lisps and a lot of literature behind them. Macros and eval make it a lot harder. It's pretty dense, but I liked "Lisp in Small Pieces" a lot.

[1]: https://github.com/munificent/craftinginterpreters

For making an efficient Scheme implementation: most everything written by R. Kent Dybvig or his students. I'd start with his PHd thesis (Three Implementation Models for Scheme, specifically the stack-based part).


(See also Lua's upvalues for a way to optimize closure variable access that's a bit simpler than his display closures).

> See also Lua's upvalues for a way to optimize closure variable access

Yes! In fact, that's how the C version of the interpreter in my book implements closures. It's pretty brilliant. <3 those Lua folks.

Will you cover anything about LuaJIT's "NaN-coding"?

Or perhaps do you have a link about how it works?

Yes, we'll do NaN-tagged values in the last chapter on optimization.

Lisp In Small Pieces by Christian Queinnec

I'm trying to bootstrap a forth and a scheme but i'm still at the thinking phase right now.

I've been reading http://beautifulracket.com/ and its a really good introduction on making interprets.

Interesting! I like racket...just wish it was faster. Anybody know if there will be a successor to "Realm of Racket"?

Edit: Now I remember it uses LGPL as the license. I understand the reasoning, but I've always viewed the freedoms as giving the user freedom over the implementer...kind of just trading power.

I agree.

Scheme/lisp based languages and Haskell are a great choice for this kind of project.

I clicked the link without looking at the poster, and by the end of the "What’s the Catch?" paragraph I was thinking to myself that this really seems like something written by The Munificent Bob. Then I read the first sentence of the next paragraph :)

I really like your blog, I have read all of it, some posts even 2-3 times, and not only because of the content, but your writing style. I will probably read this book too ;)

Also, I think Magpie is awesome.


One think I find lacking in books about building languages is a focus on tool. I am working on one (available for 0$ on leanpub: "How to create pragmatic, lightweight languages" https://leanpub.com/create_languages ) with the intention to focus more on building editors and simulator.

If you are interested in the subject you could take a look to my blog (https://tomassetti.me) I write 1-2 posts per week on building languages. Most of the time I talk about DSL.

This is awesome, since I'm currently working on my own (toy) language! Thanks for making this public!

On a side note, to anyone interested in an overview of high-level language features and what their implementation requires, I can wholeheartedly recommend Michael Scott's Programming Language Pragmatics. It doesn't go into implementation details, but tells you a lot about the historical choices and implementation possibilities of (mostly) mainstream languages.

Yes! PL Pragmatics was the first book on languages I read and I really enjoyed it.

That's so cool! Learning to make a programming language is in my life's todo list, just as doing my own (super simple) operating system. Thank's for contributing with a free book, I very much appreciate it. Does anybody know of something similar but for operating systems?

I found https://www.cs.bham.ac.uk/~exr/lectures/opsys/10_11/lectures... from a UK university. I like the style of it and the emphasis on getting started coding straight away.

Not quite but this is a good read. https://littleosbook.github.io/

This is good! It will definitely do.

Thank you very much!

This looks great, for some reason the idea of creating a language from scratch and then using it in some professional capacity always seemed really cool to me.

Really looking forward to this!

A much less involved language-building tutorial that's available now is Beautiful Racket.[0] Racket is a Scheme, but it's also specially outfitted to be a language workbench.

And 16 hours later: [0] http://beautifulracket.com

A little late to the party here, but I'll throw another tutorial that I absolutely loved, buildyourownlisp.com which guides you to building 'lispy' in C.

Racket is also a great environment for making languages. See, e.g.:


One question here for the more advanced compiler writers.. How do you go about writing a "pre-lexer" for LISP like language where you want to expand sexpr? I.e. Does the lexer even see `(...), or does it get pre-extracted to (quote ..)? And if so, how does the pre-extraction work exactly? Is it built using regex..? That can be tricky.

Lisp dialects usually have an input mechanism called a "reader". It scans the character syntax and produces the equivalent object, combining lexical analysis and parsing. Special notations like 'x denoting the same thing (quote x) are easily handled here. When the reader sees a quote, it consumes it, calls itself recursively to scan the expression which follows to get an object O and then returns the object (quote O).

Those are handled by the reader. If you Google "Lisp reader" and "Lisp reader macro", you should find a bunch of info.

Just from reading the first few pages of the book, I'm already a fan.

However my one tiny complaint is that the sign-up box is on every page, and even after I've signed up, it still has to be closed every page I go to.

Yeah, that is annoying. I think for my previous book, I used a little cookie support in JS to hide it persistently after the first time you closed it.

I'll try to resurrect that for this one.


This looks great. Thanks for sharing. Looking forward to updates.

Any pointers to intermediate representations for compilers?

Another great-looking book on making dynamic languages. There's been some good ones recently, and this one looks like another good one covering how to make a basic, interpreted language.

Some of us have written 1 or 2 dynamic, interpreted languages, feel pretty comfortable with those concepts, and are interested in transiting to more advanced, static type systems and writing a good type checker.

Anyone up for writing such a high-quality book as this one on statically-typed languages, ideally functional languages? The only decent books I've found on the subject are 20-30 year-old textbooks. I'm watching "Write You a Haskell" with interest (and have partially read it), even though I'm more into ML and Scala than Haskell, but it's still pretty unfinished.

There's also the plzoo, which is a great resource but not a book.

Oh, and I definitely need to read Pierce's Types and Programming Languages textbook, I've ordered it and need to sit down with it. I suspect it will be very enlightening.

But a more approachable book on statically-typed languages in the same genre as this one is (popular, less academic) would also be welcome.

I did think about trying to sneak a type checker in here too, but it's just too much for a first book on languages and interpreters.

If you're looking for a book on statically typed functional languages, I think Appel's book[1] is canon. From what I've heard, the C and Java ones aren't very great. It's worth learning enough ML to get through that version because ML is clearly the language Appel thinks in.

[1]: http://www.cs.princeton.edu/~appel/modern/

> 20-30 year-old textbooks

I assume you're talking about something like "Modern Compiler Implementation in ML"? That book is still a good starting point, and once you get a basic implementation going you'll be in a good position to start learning about and implementing more advanced features. You'll end up teaching yourself as you figure out how to add features to your own codebase.

edit: fix title

There's nothing more "advanced" about statically typed languages. Static typing and type inference are straightforward and can make implementation details simpler than dynamic languages (because you're greatly reducing or removing the unknowns you have to deal with at runtime). This is assuming you're making a batch-style compiler, as most implementors do, and not an interactive system.

If you're focused on benchmark numbers, dynamically typed languages are much more challenging to squeeze performance out of better code generation.

You can also add static type inference and consistency checking to a dynamic language, and indeed some systems do in the name of better

I meant "advanced static-typing" (e.g, like System F, which is advanced to someone learning PL). Not that it was more advanced, although I think in fact it is from a learner's point of view. It's incredibly simple to throw together a Scheme, or even a more complex interpreted dynamic language.

Add in a static typing, type inferencing, and type checking, and that all becomes much more complicated, at least from this learner's point of view. From how highly upvoted my comment is, I suspect I'm far from being alone here. Learning, understanding, and implementing Hindley-Milner by itself was harder for me than learning how to put together a basic interpreter for a simple scheme. And I think a big part of that is the dearth of learning materials for things like Hindley-Milner and related concepts, whereas there are 100 books and tutorials on how to implement a Scheme interpreter in language X.

Finally, if you're talking about language performance, then yes clearly it's much harder to squeeze performance out of dynamic languages. But I'm talking about learning exercises here, not production programming languages.

   dearth of learning materials for things 
   like Hindley-Milner 
What's wrong with the original paper [1]? It's really basic and boiled down to the essence (in true Milner style). I give it to all my students interested in typing systems as first paper. (There is a typo on Page 211.)

[1] L. Damas, R. Milner, Principal type-schemes for functional programs.

Eric Lippert[0] is doing an "excessive explanation" of this paper[1], where he breaks the entire paper down, line by line, with examples that C#/Java people would recognize. It's nice for people like myself who are strong on concepts, but weak on notation.


  map  : ∀α ∀β ((α → β) → (α list → β list))
Which I previously couldn't read, but I could easily explain the concept.

[0] - Former C# language designer, Roslyn compiler team, etc.



[1] - https://ericlippert.com/?s=excessive&submit=Search

Thank you, that's an interesting point.

The thought that somebody would not be able to understand the type of map had not occurred to me, because I'm surrounded by people with a strong maths/FP background. Shows just how narrow-minded I am.

While I agree that compilation of dynamically typed languages is more challenging, inference for advanced typing systems is full of problems:

- How to ensure that the inference algorithm is correct.

- How to design typing systems that are expressive, yet lightweight, and have enough type inference to be usable in practise.

- How to deal with concurrency and parallelism.

None of these problems is really solved. In practise even making type inference fast is a challenge, e.g. in older versions of the Scala compiler, type inference was really slow due to implicits.

> Pierce's Types and Programming Languages

Pierce's book is as approachable as you're likely to see with respect to type systems. You mostly just have to dig in.

Yeah, I suspect you're correct at the moment. In fact, TaPL looks pretty approachable for an academic book.

But it seems to be there's a place for a more popular book on type systems and making a statically-typed language, in the vein of this Crafting Interpreters book (and others like it).

Perhaps I'll read TaPL, write a nice statically-typed language, and then write the book I'm looking for.

I'm reading TaPL, done 4 chapters.

I think you can replace the math with intuitive diagrams and be a little less rigorous and that would be a book for the masses.

TaPL requires you to learn a lot between coding sessions too which may put off more practically minded folk.

I want this too, but is weird to me that is a huge chasm in this area, where a lot of material stop at the most basic, and then it cover a niche that is complicated (like for example, type inference).

If I wanna to implement a dialect of ML, I haven't' find a simple explanation in how implement product/sum types, modules & pattern matching.

Yet, tons of type inference and other less common things.

ie: The basics stop fast and suddenly jump into something else.

> If I wanna to implement a dialect of ML, I haven't' find a simple explanation in how implement product/sum types, modules & pattern matching.

Exactly. I mean, how in the world to implement pattern matching in ML-like languages? And ADTs? These are exactly the kind of subjects I'd like to see covered in a more advanced book on building interpreters or compilers for statically-typed languages.

Writing a type-checker is fun and pretty interesting. For that, I recommend you get acquainted with formal semantics and judgements (that's just the language for describing type-systems). You'll probably be interested in then looking up

  * simply typed lambda calculus
  * Hindley-Milner type inference
  * bidirectional typing
After that, there are all sorts of cool things like linear/affine types, system F, dependent types, etc. which can be picked up.

All of this (except maybe the first part) is easily learnable either from Wikipedia, the original papers, or recent tutorials.

Shameless plug: If you know OCaml, then I wrote this tiny implementation of Hindley-Milner[0] for a tiny functional language.

It's (hopefully) very well commented, so it should clarify a bunch of concepts for newcomers.

[0] - https://github.com/prakhar1989/type-inference

Notes only tell half the story, but these are good notes on the first two topics.


Those are very good notes that have come up frequently in my Google searches.

I'm starting a series of blog posts on making first a dynamic Lisp and then adding a type system to it, then type inferencer, etc. I'd be interested in what resources you found helpful and easy to read.

Are you going to share the links once you're done?

I can share now if you like. They're at http://bernsteinbear.com/blog/lisp/ -- still a work in progress. I hope to be done with Primitives II tonight.

> more advanced, static type systems

Are static type systems more advanced than dynamic type systems? This sounds surprising, given how much more one can do with a dynamic type system.

Basically, yes. You can do more with a dynamic type system than a static type system because you don’t need to (and actually, aren’t allowed to) prove that your code is consistent. Advancements in static checking and inference mean you can prove more things about your programs, so you don’t have to fall back on dynamic types and data structures so much.

The quality of a type system isn't measured solely by what it allows you to do. After all, no type system at all would allow you to do anything and write as many bugs as you want-- not great, right? Instead, a better type system will let you make guarantees about your code, while still allowing correct programs to run. Can you guarantee there are no type errors? Can you guarantee arrays aren't accessed out of bounds? That sort of thing.

Well, you could look at a dynamic type system as just a static type system with 1 type—a sum type of everything. Looking at it that way, a dynamic type system is much less powerful.

   Anyone up for writing such a high-quality 
   book as this one on statically-typed languages, 
Not a book, but maybe "The Essence of ML Type Inference" by Pottier and Remy [1] is of interest. It's a bit more in-depth than TAPL.

[1] gallium.inria.fr/~fpottier/publis/emlti-final.pdf

'Git ta werk. ;)

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact