Hacker News new | comments | show | ask | jobs | submit login
[dupe] Crafting Interpreters (craftinginterpreters.com)
187 points by afshinmeh 36 days ago | hide | past | web | favorite | 41 comments

This is awesome. I have been writing and re-writing several horrible regex-based parsers (https://github.com/erikpukinskis/parse-a-little-js) to back an IDE I am building, and I can sense I am well passed the “you need to learn how to do this properly, Erik) point.

So far this book seems to be just at the right pitch for my purposes.

Does anyone else have any recommendationed reading for someone like me? My absolute priority is keeping the code simple and having as few features as possible.

I’ve actually been using my ignorance as a design constraint... I don’t want to allow complex syntax in the editor, (it’s not a general purpose editor) and I want the parser source to be approachable.

But perhaps by this point I may have the resolve to resist the power a proper parser will give me.

Bob Nystrom is a fantastic writer. If you like this book, then I would highly recommend his previous book Game Programming Patterns. Even if you don't write games, there's a lot of great stuff in here. He's a very good teacher.

My interpreters course used this book, as well as 'the little schemer'. They follow a lot of conventions that I find really useful for software development (various intricacies of languages in active development tend to follow the same conventions of diagramming, labeling, notation, etc - all really important things to keep standard when reasoning about, teaching, and explaining programming languages!)




There's also this book that from my understanding, a lot of people consider fairly foundational / useful (own it, haven't read it)


> Does anyone else have any recommendationed reading for someone like me?

Writing an Interpreter in Go (https://interpreterbook.com/)

It's a very approachable introduction to lexing and parsing using nothing but Go and its standard library.

> Does anyone else have any recommendationed reading for someone like me? My absolute priority is keeping the code simple and having as few features as possible.


the focus is on speed, but honestly I found this code easier to read and write (once I got my head around the general concept) than most other parser code I've seen.

I completed the "Tree walking interpreter" section of this book about a month ago, I used modern JS to write it and I found it a really good experience and learned a fair bit. A lot of the content is obviously about implementing the interpreter so there's probably quite a bit that doesn't apply to your project, but

If your looking to make your lexer as simple as possible I would recommend looking at (http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...). It's written by the same guy, and gives a short look on pratt parsers, which is one of the simplest ways to write a parser. There's also a simple language he wrote that uses the method as an example available on GitHub. At a top level you end up with declarations like:

infix(token: "plus", parselet: BinaryOperator, precedence: 13);

Having done a fair bit of prototyping parsers and reading about it the hardest parts of parsing JS are regex literals and automatic semi-colon insertion.

You might check "Language Implementation Patterns" I have found it usefull in the pass when creating a DSL https://pragprog.com/book/tpdsl/language-implementation-patt...

> Does anyone else have any recommendationed reading for someone like me? My absolute priority is keeping the code simple and having as few features as possible.

Of course, try the OP, maybe there is something for you.

You might want to watch Per Vognsen's "bitwise" series (another guy that was at RAD Game Tools I think, like Sean Barrett who is linked in a sibling comment) on youtube as well. The recordings are very approachable and mostly straightforward.

Having a little theory to guide you is important, but mostly it's a practical issue. So I mostly recommend trying various approaches yourself. Get high-level ideas somewhere else, and then make implementations on your own.

In that spirit, a few high-level tips for approaching parsing, from the top of my head.

How you parse a language depends heavily on what the parsed language is. You need a good feel for the language to understand how it wants to be parsed. If you have the luxury of making your own language, great!

Pretty much every programming language should be split into parser and lexer for sure, although the stages might need to interact (e.g., parser detects a switch to a different, embedded language, and lets the lexer know).

As for the lexer, absolutely don't use any tools. Probably not even regular expressions. Just handcode the stuff as you go. It's boring but extremely straightforward code. You get a lot of flexibility and debuggability this way, and no painful dependencies. In the simplest case the Lexer is a only few type definitions (Token stream wrapper around some I/O facility, Token kind enum, Token record) and a lex_next_token() function.

As for the parser, the best computer languages are those that can be parsed by recursive descent (again, no tools. A C-like language is enough for implementation). You look at the next input token and do the appropriate thing. Parsing is extremely simple if a language is designed to be parsed that way.

The more mathematical approaches to parsing (like LR) are pretty boring and unapproachable in my opinion. They are comparatively hard to understand and debug.

Besides parsing of global structure, most languages have an expression grammar with infix operators and precedence levels. You can work out how to parse that separately. There is the shunting yard algorithm, which is a table driven method unlike recursive descent. I guess you should know how to use it, but you will also quickly see that it gets hairy as soon as function calls and other non-infix syntaxes come into play. Parsing expressions with recursive descent, too, turns out to be not hard after all, and a nice benefit from that approach is that the parser is written in a consistent style.

It would be interesting if authors (of this and similar materials) will comment how this kind of activity helped them landing new gigs and/or investments, contacts, etc...

It looks like the author made the transition from gamedev to working at Google on Dart language team, so it seems that his writing and side projects worked well for him.

Seems so, especially in case of Google, where it is impossible to apply to a specific team/position/project.

Interesting, how initial contact was made... Did the author contact someone from the Dart team, or did the team/Google contact him in the first place?

> where it is impossible to apply to a specific team/position/project.

Impossible is too strong. Unusual, yes, but not impossible. If you pay attention to their job listings, you will occasionally see openings posted for specific roles in specific teams.

Portfolio, you know?

I love the authors other work, it really helped me when I was getting into compiler design. I'm very excited to start reading.

This is shaping up to be an interesting read if you've ever thought about designing your own language and wondered what happens next after the basic stuff that every compiler tutorial covers.

It's also beautifully presented, so compliments to author for that as well.

i think this is cool. i could learn new languages all day. however I've seen people argue on HN that writing language is a waste of time. why should we embark on such an exercise?

It makes programming just about everything a billion times easier to think about. Every programming thing you have to work on is a map between the problem space and the finished program. The problem space itself can be thought of as a program. You are just mapping from one space to the other. That's what writing languages teaches you how to do.

Some people love Yegge, some hate him. This post is... well it's a rant. But it kinda answers this question in an interesting way https://steve-yegge.blogspot.com/2007/06/rich-programmer-foo...

I didn’t realize it was from 2007 until I got to this tidbit:

>By which I mean Machine Learning, since the term "AI" smacks of not just determinism, but also a distinct lack of VC funding.

For the chance to expand your mind. If you've never done it before, you'll learn a ton doing it the first time. If its been a while since you've done it, you may learn some new things in the processes given what you learned in the intervening years since the last time.

> writing language

Is all we do, in a way or other:


> Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Thats why "Interpreter" is one of the common OO patterns on the book.

I done a lot of micro-languages (later I found are called "Domain Specific Languages" by hipsters). I have one that build the Database interface code automatically from the .sql script that build the postgresql database.

I code the database at hand, run "builddb.py" and it spit out all the mapping and part of the ORM layer. I doit with bad hacks, regex and stupid stuff (from a proper compiler construction perspective) but I only spend like half day doing it.

Later, I also build the inserts/updates and other extra stuff that sometimes is useful. Then I also spit out both F# and swift code, with the same terrible hacks...


Now, after do this thing like millon times before I start to think that I'm a idiot that could have donde this MUCH better. So, this time as a side project I'm doing the same thing, like a ORM-made-as-language.

And boy, this thing is fun. I not only can run "sql" agains my RDBMS but also with owm in-memory structures, similar to LINQ but, IMHO, better.

Now, the thing is... if this little project get to a point where is good, why not release it as a language?

Most of us probably would be wasting our time writing a general purpose programming language that we think could compete with Python or C.

If you think smaller though, there's a ton of utility to be had in taking the ideas and applying them to domain specific languages. I don't know much about Racket, but it feels like that's its reason to be.

For fun and learning.

In the last 10 years all sorts of new languages have come out and been successful. We're not nearly at the end game of language development.

I've started reading it, until it used the visitor pattern for its parser. It sort of generates Java code with some hacky way of doing things. It felt complex for no good reason.

I have nothing against patterns, but it made it harder to follow, so I stopped reading it.

There is no pattern matching in Java (yet).

If you do the whole process in one pass like in C you can use the parser evaluation but even a simple feature like lambdas requires more than one pass.

So either you use a Visitor or you develop pattern matching in your language and bootstrap it rapidly to be able to use you language to write the backend of your language.

Did you have a previous experience of something easier? May you share about it?

What is the actual problem? Is it multiple dispatch? I think some languages (Lisp may be one) handle it natively.

Visitor is the only way I know of dealing with it in C++. Once you do it enough, Visitor starts to feel like a pretty natural way to extend a class without modifying the class.

> Visitor is the only way I know of dealing with it in C++. Once you do it enough, Visitor starts to feel like a pretty natural way to extend a class without modifying the class.

Are you talking about traditonal OOP visitor pattern ? It's pretty terrible, when compared to std::variant ; imagine for instance that you want to have a way to draw the rect that contains a given shape without changing your shape class: https://godbolt.org/g/34KyMw

If your interface is that consistent, why not just make the bounding box drawing function a template?

Visitor is nice when some kind of transformation needs to be applied to the visited objects.

Not the OP, but I took a survey of the /r/programminglanguages reddit, where basically everybody has to deal with the "visitor issue".


The top comment supported the use of the visitor pattern, but I would say that MOST people in the thread did NOT use visitors.

You only really need a visitor if you have more than one traversal. And even then you don't necessarily need it.

I'm curious about this as well. When writing my own interpreter it didn't take me long before I opted to use the visitor pattern, and once I did it felt like a natural fit.

If using C++ I'd usually just opt for a type enum and switch on it. Visitor is a better way of using type to dispatch, but harder to see control flow.

Notice that while the linked page says "You can read the whole book, for free, online", chapters 18 to 30 haven't been written yet (e.g: http://craftinginterpreters.com/types-of-values.html).

Slightly below the button marked "start reading" there is a section entitled "What’s the Catch?" that explains as much.

I was about to write that very comment. It's not like he's hiding it, and he's giving the (e)book out for free, what more could you ask?

The same was true of the author's previous book, Game Programming Patterns, which is now complete and fully published.

(... and still free online!)

i love these kinds of articles, but i dont understand why put the first 2 lines in it, which kind of contradict eachother.

" A handbook for making programming languages.

This book contains everything you need to implement a full-featured, efficient scripting language."

The title makes it look like you would write a compiler /assembler or some plugin for existing one. and then the first scentence dives down to the actual content.

This is obvious from the title here. it just bugs me these kind of words (programming language vs scripting language) are used like that.

Other than that, always interested to read about both topics :D

If you write a C interpreter, is it not a programming language anymore? Also, the fundamental concepts of a scripting language (lexing, parsing, compilation, but to bytecode) apply to compiled programming languages as well. There is no actual contradiction.

Isn't a scripting language a programming language? The way I understood it is that not all programming languages are scripting languages but all scripting languages are programming languages. I'm making my own you see and I also use the same denomination.

Maybe it was changed, but the current title is "Crafting Interpreters". Interpreters are used for scripting languages, so not sure there's any valid complaint that programming language inside an interpreter book is synonymous with scripting language. That's like opening a book on algebra that mentions math and complaining it's not about calculus...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact