Seems to be a hot topic: a friend and colleague published this in December 2016, and I can thoroughly recommend it (it uses Go as the implementation language):
I can recommend this book as well! It is easy to read and understand and the choice of Go as the host language is also great.
I am in the process of writing a semi-natural query language (boolean logic) targeted at domain experts (in this case biomedical researchers) that compiles to either database queries (e.g. SQL) or a function call for arbitrary logic. The goal of the compiler bit is to abstract away where the data ultimately lives (similar to GraphQL) or how it is structured.
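For readers curious what that looks like, here is a minimal sketch of the general idea (not the commenter's actual system; all names here are made up for illustration): one boolean AST, two backends, so callers never see where the data lives.

```python
# Hypothetical sketch: compile a small boolean-logic AST either to a SQL
# WHERE fragment or to a Python predicate over dict-shaped rows.
from dataclasses import dataclass

@dataclass
class Cond:
    field: str
    op: str        # e.g. "=", ">", "<"
    value: object

@dataclass
class And:
    left: object
    right: object

@dataclass
class Or:
    left: object
    right: object

def to_sql(node):
    """Compile the AST to a SQL WHERE fragment."""
    if isinstance(node, Cond):
        val = f"'{node.value}'" if isinstance(node.value, str) else node.value
        return f"{node.field} {node.op} {val}"
    op = "AND" if isinstance(node, And) else "OR"
    return f"({to_sql(node.left)} {op} {to_sql(node.right)})"

def to_predicate(node):
    """Compile the same AST to a Python function over dict-shaped rows."""
    if isinstance(node, Cond):
        ops = {"=": lambda a, b: a == b,
               ">": lambda a, b: a > b,
               "<": lambda a, b: a < b}
        return lambda row: ops[node.op](row[node.field], node.value)
    l, r = to_predicate(node.left), to_predicate(node.right)
    if isinstance(node, And):
        return lambda row: l(row) and r(row)
    return lambda row: l(row) or r(row)

query = And(Cond("species", "=", "mouse"), Cond("age", ">", 10))
print(to_sql(query))
# (species = 'mouse' AND age > 10)
```

The same `query` object also runs in-process: `to_predicate(query)({"species": "mouse", "age": 12})` returns `True`.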
Yeah, I've talked to Thorsten over email a little. I'm excited about his book too, and I hope it does really well. Go is a cool choice for a language book. It's low-level enough to let you show some of the fundamental algorithms, but not as tedious to work with as C and its manual memory management.
Yes! The code for the whole book is done already and sliced into chapters. I had to do that first before I started putting things online. I didn't want to paint myself into a corner by having something in an earlier chapter that ended up not making sense later.
So the code is all done, and now "all" that's left to do is write and edit all of the prose. Which, of course, is really the hardest part by far.
I am currently working on a Scheme R5RS implementation in Java. It is a hobby project, but I'm really enjoying it. It is somewhat enlightening.
I look forward to reading the Optimization chapter: I couldn't find much useful and practical info on that topic. I think it is quite easy to implement an interpreter, but it is really difficult (at least for me) to make it fast.
Can anyone recommend any articles or books with common optimization tricks and techniques?
> Thank you so much for the book, please, continue!
I will!
> Look forward to reading the Optimization chapter: I couldn't find much useful and practical info on that.
There's not too much depth in that particular chapter. The whole second part is really about implementing an interpreter (fairly) efficiently in C. As you probably know, you have to think about performance at every level—your object model, bytecode format, hash table implementation, etc.
If you'd like a sneak peek, the code for the entire book is actually done already (!). You can see which snippets go into each chapter by cloning the repo[1] and then running:
make chapters
make c_chapters
make diffs
Then take a look in build/diffs.
There are a ton of optimization tricks that are specific to Scheme and Lisps and a lot of literature behind them. Macros and eval make it a lot harder. It's pretty dense, but I liked "Lisp in Small Pieces" a lot.
For making an efficient Scheme implementation: almost everything written by R. Kent Dybvig or his students. I'd start with his PhD thesis (Three Implementation Models for Scheme, specifically the stack-based part).
Interesting! I like Racket... just wish it were faster. Anybody know if there will be a successor to "Realm of Racket"?
Edit:
Now I remember it uses LGPL as the license. I understand the reasoning, but I've always viewed the freedoms as giving the user freedom over the implementer...kind of just trading power.
I clicked the link without looking at the poster, and by the end of the "What’s the Catch?" paragraph I was thinking to myself that this really seems like something written by The Munificent Bob. Then I read the first sentence of the next paragraph :)
I really like your blog, I have read all of it, some posts even 2-3 times, and not only because of the content, but your writing style. I will probably read this book too ;)
One thing I find lacking in books about building languages is a focus on tooling. I am working on one (available for $0 on Leanpub: "How to create pragmatic, lightweight languages" https://leanpub.com/create_languages) with the intention of focusing more on building editors and simulators.
If you are interested in the subject you could take a look at my blog (https://tomassetti.me); I write 1-2 posts per week on building languages. Most of the time I talk about DSLs.
This is awesome, since I'm currently working on my own (toy) language! Thanks for making this public!
On a side note, to anyone interested in an overview of high-level language features and what their implementation requires, I can wholeheartedly recommend Michael Scott's Programming Language Pragmatics. It doesn't go into implementation details, but tells you a lot about the historical choices and implementation possibilities of (mostly) mainstream languages.
That's so cool!
Learning to make a programming language is on my life's todo list, along with writing my own (super simple) operating system.
Thanks for contributing a free book; I very much appreciate it.
Does anybody know of something similar but for operating systems?
This looks great, for some reason the idea of creating a language from scratch and then using it in some professional capacity always seemed really cool to me.
A much less involved language-building tutorial that's available now is Beautiful Racket.[0] Racket is a Scheme, but it's also specially outfitted to be a language workbench.
A little late to the party here, but I'll throw in another tutorial that I absolutely loved: buildyourownlisp.com, which guides you through building 'lispy' in C.
One question here for the more advanced compiler writers: how do you go about writing a "pre-lexer" for a LISP-like language where you want to expand sexprs? I.e., does the lexer even see `(...), or does it get pre-extracted to (quote ..)? And if so, how does the pre-extraction work exactly? Is it built using regex? That can be tricky.
Lisp dialects usually have an input mechanism called a "reader". It scans the character syntax and produces the equivalent object, combining lexical analysis and parsing. Special notations like 'x denoting the same thing (quote x) are easily handled here. When the reader sees a quote, it consumes it, calls itself recursively to scan the expression which follows to get an object O and then returns the object (quote O).
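The recursive reader described above can be sketched in a few lines. This is a minimal illustration in Python (not from any particular Lisp), using nested lists as the object representation; note there is no separate "pre-lexer" pass, the quote is handled during reading:

```python
# Minimal sketch of a Lisp reader that expands 'x into (quote x) while reading.
import re

def tokenize(src):
    # Parens and the quote mark are single-character tokens;
    # everything else between whitespace is an atom.
    return re.findall(r"[()']|[^\s()']+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok == "'":
        # Consume the quote, recursively read the next expression O,
        # and return the object (quote O).
        return ["quote", read(tokens)]
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return tok  # an atom

print(read(tokenize("'(a 'b)")))
# ['quote', ['a', ['quote', 'b']]]
```

Because `read` calls itself for the quoted expression, nested quotes fall out for free; no regex trickery on the raw text is needed.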
Another great-looking book on making dynamic languages. There have been some good ones recently, and this one looks like another, covering how to make a basic, interpreted language.
Some of us have written 1 or 2 dynamic, interpreted languages, feel pretty comfortable with those concepts, and are interested in transitioning to more advanced, static type systems and writing a good type checker.
Anyone up for writing such a high-quality book as this one on statically-typed languages, ideally functional languages? The only decent books I've found on the subject are 20-30 year-old textbooks. I'm watching "Write You a Haskell" with interest (and have partially read it), even though I'm more into ML and Scala than Haskell, but it's still pretty unfinished.
There's also the plzoo, which is a great resource but not a book.
Oh, and I definitely need to read Pierce's Types and Programming Languages textbook, I've ordered it and need to sit down with it. I suspect it will be very enlightening.
But a more approachable book on statically-typed languages in the same genre as this one is (popular, less academic) would also be welcome.
I did think about trying to sneak a type checker in here too, but it's just too much for a first book on languages and interpreters.
If you're looking for a book on statically typed functional languages, I think Appel's book[1] is canon. From what I've heard, the C and Java ones aren't great. It's worth learning enough ML to get through that version because ML is clearly the language Appel thinks in.
I assume you're talking about something like "Modern Compiler Implementation in ML"? That book is still a good starting point, and once you get a basic implementation going you'll be in a good position to start learning about and implementing more advanced features. You'll end up teaching yourself as you figure out how to add features to your own codebase.
There's nothing more "advanced" about statically typed languages. Static typing and type inference are straightforward and can make implementation details simpler than dynamic languages (because you're greatly reducing or removing the unknowns you have to deal with at runtime). This is assuming you're making a batch-style compiler, as most implementors do, and not an interactive system.
If you're focused on benchmark numbers, dynamically typed languages are much more challenging to squeeze performance out of through better code generation.
You can also add static type inference and consistency checking to a dynamic language, and indeed some systems do in the name of better performance.
I meant "advanced static typing" (e.g., System F, which is advanced to someone learning PL). Not that static typing as such is more advanced, although I think in fact it is from a learner's point of view. It's incredibly simple to throw together a Scheme, or even a more complex interpreted dynamic language.
Add in static typing, type inference, and type checking, and that all becomes much more complicated, at least from this learner's point of view. From how highly upvoted my comment is, I suspect I'm far from being alone here. Learning, understanding, and implementing Hindley-Milner by itself was harder for me than learning how to put together a basic interpreter for a simple Scheme. And I think a big part of that is the dearth of learning materials for things like Hindley-Milner and related concepts, whereas there are 100 books and tutorials on how to implement a Scheme interpreter in language X.
Finally, if you're talking about language performance, then yes clearly it's much harder to squeeze performance out of dynamic languages. But I'm talking about learning exercises here, not production programming languages.
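For anyone in the same boat: the piece of Hindley-Milner that usually trips people up is first-order unification, and it is smaller than it looks. Here is a rough sketch (my own simplification, not from any of the books mentioned; the occurs check and let-generalization are omitted, so this is the core idea only):

```python
# Minimal sketch of first-order unification, the engine inside Hindley-Milner.
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str          # a type variable, e.g. 'a

@dataclass(frozen=True)
class TCon:
    name: str          # a type constructor, e.g. "int" or "->"
    args: tuple = ()   # its arguments; ("->", (a, b)) is the type a -> b

def apply(subst, t):
    """Apply a substitution (dict: TVar -> type) to a type, chasing links."""
    if isinstance(t, TVar):
        return apply(subst, subst[t]) if t in subst else t
    return TCon(t.name, tuple(apply(subst, a) for a in t.args))

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 equal, or raise TypeError.
    NOTE: the occurs check is omitted for brevity."""
    subst = dict(subst or {})
    t1, t2 = apply(subst, t1), apply(subst, t2)
    if t1 == t2:
        return subst
    if isinstance(t1, TVar):
        subst[t1] = t2
        return subst
    if isinstance(t2, TVar):
        subst[t2] = t1
        return subst
    if t1.name == t2.name and len(t1.args) == len(t2.args):
        for a, b in zip(t1.args, t2.args):
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} and {t2}")
```

For example, unifying `a -> int` with `bool -> b` solves `a = bool` and `b = int`; unifying `int` with `bool` raises the type error. Inference itself is then a walk over the AST that generates these unification calls.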
> dearth of learning materials for things like Hindley-Milner
What's wrong with the original paper [1]? It's really basic and boiled down to the essence (in true Milner style). I give it to all my students interested in typing systems as their first paper. (There is a typo on Page 211.)
[1] L. Damas, R. Milner, Principal type-schemes for functional programs.
Eric Lippert[0] is doing an "excessive explanation" of this paper[1], where he breaks the entire paper down, line by line, with examples that C#/Java people would recognize. It's nice for people like myself who are strong on concepts, but weak on notation.
ex:
map : ∀α ∀β ((α → β) → (α list → β list))
Which I previously couldn't read, but I could easily explain the concept.
[0] - Former C# language designer, Roslyn compiler team, etc.
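For anyone else stuck on that notation: the type says map takes a function from α to β and returns a function from lists of α to lists of β, and the ∀s just say it works for any choice of α and β. As a rough analogy (mine, not Lippert's), the same shape in Python's generics:

```python
# map : ∀α ∀β ((α → β) → (α list → β list)), written with Python type hints.
from typing import Callable, TypeVar

A = TypeVar("A")  # plays the role of α
B = TypeVar("B")  # plays the role of β

def map_list(f: Callable[[A], B]) -> Callable[[list[A]], list[B]]:
    # Curried like the ML version: given f, return the list-transforming function.
    return lambda xs: [f(x) for x in xs]

print(map_list(len)(["a", "bb", "ccc"]))
# [1, 2, 3]
```

Here `len : str -> int` instantiates α = str and β = int, so the result maps `list[str]` to `list[int]`.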
The thought that somebody would not be able to understand the type of map had not occurred to me, because I'm surrounded by people with a strong maths/FP background. Shows just how narrow-minded I am.
While I agree that compilation of dynamically typed languages is more challenging, inference for advanced typing systems is full of problems:
- How to ensure that the inference algorithm is correct.
- How to design typing systems that are expressive, yet lightweight, and have enough type inference to be usable in practice.
- How to deal with concurrency and parallelism.
None of these problems is really solved. In practice, even making type inference fast is a challenge; e.g., in older versions of the Scala compiler, type inference was really slow due to implicits.
Yeah, I suspect you're correct at the moment. In fact, TaPL looks pretty approachable for an academic book.
But it seems to be there's a place for a more popular book on type systems and making a statically-typed language, in the vein of this Crafting Interpreters book (and others like it).
Perhaps I'll read TaPL, write a nice statically-typed language, and then write the book I'm looking for.
I want this too, but it's weird to me that there's a huge chasm in this area, where a lot of material stops at the most basic level and then covers a niche that is complicated (like, for example, type inference).
If I want to implement a dialect of ML, I haven't found a simple explanation of how to implement product/sum types, modules, and pattern matching.
Yet, tons of type inference and other less common things.
ie: The basics stop fast and suddenly jump into something else.
> If I want to implement a dialect of ML, I haven't found a simple explanation of how to implement product/sum types, modules, and pattern matching.
Exactly. I mean, how in the world to implement pattern matching in ML-like languages? And ADTs? These are exactly the kind of subjects I'd like to see covered in a more advanced book on building interpreters or compilers for statically-typed languages.
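For what it's worth, one common approach (not from any particular book, and heavily simplified) is to represent each constructor as a tagged record and to compile a match expression into a chain of tag tests that bind the constructor's fields. A rough sketch of what an ML-style `type shape = Circle of float | Rect of float * float` might lower to:

```python
# Hypothetical lowering of an ML sum type to tagged tuples,
# with a match expression compiled to tag dispatch.
def Circle(r):
    return ("Circle", r)

def Rect(w, h):
    return ("Rect", w, h)

# match shape with
# | Circle r      -> pi * r * r
# | Rect (w, h)   -> w * h
def area(shape):
    tag = shape[0]
    if tag == "Circle":        # pattern: Circle r
        r = shape[1]
        return 3.14159 * r * r
    if tag == "Rect":          # pattern: Rect (w, h)
        _, w, h = shape
        return w * h
    raise ValueError("non-exhaustive match")  # what a failed exhaustiveness check costs you

print(area(Rect(3.0, 4.0)))
# 12.0
```

Real compilers go further: nested patterns become decision trees so overlapping cases share tests, and the type checker proves exhaustiveness statically so the fall-through error is unreachable.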
Writing a type-checker is fun and pretty interesting. For that, I recommend you get acquainted with formal semantics and judgements (that's just the language for describing type-systems). You'll probably be interested in then looking up
I'm starting a series of blog posts on making first a dynamic Lisp and then adding a type system to it, then type inferencer, etc. I'd be interested in what resources you found helpful and easy to read.
Basically, yes. You can do more with a dynamic type system than a static type system because you don’t need to (and actually, aren’t allowed to) prove that your code is consistent. Advancements in static checking and inference mean you can prove more things about your programs, so you don’t have to fall back on dynamic types and data structures so much.
The quality of a type system isn't measured solely by what it allows you to do. After all, no type system at all would allow you to do anything and write as many bugs as you want, which isn't great, right? Instead, a better type system will let you make guarantees about your code, while still allowing correct programs to run. Can you guarantee there are no type errors? Can you guarantee arrays aren't accessed out of bounds? That sort of thing.
Well, you could look at a dynamic type system as just a static type system with 1 type—a sum type of everything. Looking at it that way, a dynamic type system is much less powerful.
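That "one type" view can be made concrete. Here is a small sketch (an illustration of the idea, with made-up names): every value belongs to a single static sum type, which is exactly why every operation has to dispatch on tags at runtime.

```python
# A dynamically typed value modeled as one static sum type of everything.
from dataclasses import dataclass
from typing import Union

@dataclass
class VNum:
    value: float

@dataclass
class VStr:
    value: str

@dataclass
class VBool:
    value: bool

Value = Union[VNum, VStr, VBool]  # "the" single type a dynamic language has

def add(a: Value, b: Value) -> Value:
    # The static checker can't rule anything out, so we tag-test at runtime.
    if isinstance(a, VNum) and isinstance(b, VNum):
        return VNum(a.value + b.value)
    raise TypeError("operands must be numbers")  # a runtime type error
```

A finer-grained static system is "more powerful" in the sense that it moves this tag test (and the possible TypeError) from runtime to compile time.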
It's slightly less in depth than the roadmap of "Crafting Interpreters", but again, it exists right now.