Hacker News
Ask HN: Resources for building a programming language?
239 points by kevinSuttle on Sept 4, 2017 | 83 comments
Looking for tips, ideas, and actual detailed instructions for creating a programming language. No specific target or runtime. Just want to try my hand at it and leverage my API design experience.

I'd probably start by using Racket (Racket is a scheme is a lisp) which has a whole toolkit for language building. Arc is written this way, for instance. Often these are lispy languages but there is no reason they have to be. A starting point: http://beautifulracket.com/stacker/ or https://www.hashcollision.org/brainfudge/

That will get you started ridiculously quickly and give you things like GC and JIT for free. You can always implement it outside of Racket once it starts to take further shape.

It's worth mentioning that Pyret followed this path. It was originally a Racket #lang. Once we got the design off the ground and decided to primarily target the browser, we wrote a new implementation in JavaScript and Pyret. The tools in Racket (including ragg [http://docs.racket-lang.org/ragg/]) were invaluable in getting started quickly.

This archives the state of the system when we made the switch:


Some things, like the grammar (https://github.com/brownplt/pyret-lang/blob/5f22ec7c8affde15...) have survived largely intact from that prototype for years.

Another one for that list is Hackett, a promising language in development that I'm following closely. It too piggybacks on Racket and aims to bring together the best features of Haskell and of Lispiness.


Note that Hacker News, which we are now using, is written in Arc, which was written in (a now older version of) Racket.

Also note that Racket is a very fun language :)

Most of the responses so far are about implementing. My favorite intro to programming languages was http://www.eopl3.com/ (though the first edition was more fun and not quite so focused on being a classroom textbook). Working through it, you write interpreters, but that's to make the ideas concrete and testable, not to replace a compilers class. There are newer books that may be better -- I hope someone who's studied them will bring them up.

http://wiki.c2.com/?ProgrammingLanguagesAnInterpreterBasedAp... was very fun and surveys a wider variety of languages, though it's also very dated now.

The first edition of EOPL was eye-opening. I learned what a continuation was (and how to do CPS) from it.

The second and third editions feel less "hard core" somehow.

Do the later editions have less content? Would your recommendation be to read the first edition? Thanks!

(strictly imo) the 2nd and 3rd editions are fine books and cover a lot of material, but they don't have as much (for lack of a better word) depth as the first edition. If you can get the first edition, it is certainly worth reading.

I worked through most of the first edition (skipping much of the OO chapter) and read the later editions pretty lightly. They're more polished and developed, e.g. explaining better code for the CPS transform, adding a chapter on types, and such. But they do give me a bit of a spoon-fed vibe compared to the first edition, I guess as a result of optimizing it from classroom feedback. I'm not sure depth is the right characterization of the difference -- I didn't get that impression, besides the compiling chapter getting dropped. But then I didn't study them as much (though I did report an erratum in the second edition).

From this review: http://eli.thegreenplace.net/2017/book-review-essentials-of-...

The book doesn't seem that great for self-study. Would you confirm or reject that assessment?

I did self-study it. I agree that just reading probably won't teach you much -- you need to program your way through the exercises, at least many of them. Getting stuck on a problem on your own could make things difficult and rob you of momentum. This is a hitch I run into with math books a lot more than programming books, personally, but now that you bring it up this may be an issue I should've thought of. (Like him I also worked through SICP this way.)

Good to know. Could you say how long it took you to go through EOPL? Would you recommend another book on the topic for self-study?

I've been hesitant to start these books because I was thinking too that without the exercises they would be a lot less useful, and so it would be a bigger time/effort commitment than I want to get into.

Here's one of the newer ones I mentioned, which I haven't read but seems well regarded: http://cs.brown.edu/~sk/Publications/Books/ProgLangs/

EoPL took about as much work as a good college class, minus the sitting-in-lectures-and-tests part. Some parts aren't depended on later, like the OO chapter IIRC, so they're up to whether you're interested.

I used it (1st edition) for self-study. Just a data point.

I've really been enjoying http://www.craftinginterpreters.com/; unfortunately, it isn't complete yet.

I've been recommending this to co-workers who want to learn how to write their own language. I had ideas on how I would write a book on compilers, and Bob had all the same ideas, but he actually went ahead and acted on them.

1. Avoid long discussions on the theory of parsing, how to transform regular expressions into table-driven DFAs, how to build an LR(1) parser, etc. These topics can be interesting later on, but for the benefit of someone who just wants to learn to write a language, the author should instead focus on how to write the scanner and the parser by hand using predictive recursive descent. The value of this approach is three-fold: (1) the book is shorter, (2) it's a simple approach that works for any language without a specialized tool (useful if someone wants to write such a tool for vim or Emacs, say), (3) it feels more concrete, more "real" to write a parser by hand rather than create a few grammar rules and let an external tool generate the code that does the parsing.

2. "Breadth-first" rather than "depth-first". In a typical compiler textbook, each chapter exhausts almost all that there is to say about a topic before moving on. I think a more practical book will avoid such deep discussions and try to go as quickly as possible from source language to executable program. After the initial implementation is complete, the author can go back and add more details. Bob's approach of having two interpreters, one AST-walker and one that builds bytecode, fits that idea quite well.

3. Multiple implementation languages. The implementation of a language is largely guided by its host language: if using ML or Haskell, then sum types will be quite handy; in Java we can rely on the library's excellent collections. It's cool that Bob decided to write an interpreter in Java and another one in C; the reader will get to see how some decisions in the former (e.g., using exceptions) are "translated" in the latter.

4. A full implementation in the book. Many compiler textbooks use pseudo-code and steer clear of "pedestrian" topics like good error handling. By having a full implementation, Bob ensures that the dirty little details are also addressed, not swept under the rug and left for the readers to discover and struggle with.
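To make point 1 concrete, here's a minimal sketch of a hand-written predictive recursive-descent parser for a small expression grammar. This is my own toy illustration, not code from the book; the grammar and names are made up for the example.

```python
import re

# Toy grammar (LL(1)):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

def tokenize(src):
    """Split source into number and operator tokens."""
    return re.findall(r"\d+|[()+\-*/]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        # One function per grammar rule; loops give left associativity.
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.eat(), node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.eat(), node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

print(Parser(tokenize("2 * (3 + 4)")).expr())  # ('*', 2, ('+', 3, 4))
```

Because the grammar is predictive, a single token of lookahead (`peek`) decides which branch to take, and error reporting can point at exactly the token that went wrong.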

I really look forward to the day when I can order a copy of Crafting Interpreters; I have no doubt that it will be a terrific book.

You might also like my article https://codewords.recurse.com/issues/seven/dragon-taming-wit... -- it's one chapter instead of a whole book, but it does follow your points 2 and 4. (Not 1 because it just uses Python's built-in parser to ASTs, and not 3 because it's just one compiler.)

I'm looking forward to Crafting Interpreters too.

I'll second this, it's quite good so far.

Have a look at http://t3x.org

There are lots of small, comprehensible compilers and interpreters there, and there is an interpreter and runtime construction kit; look for "S9 core" in http://t3x.org/s9fes/. You can also order books explaining all the stuff. And yes, I'm the author! :)

Many suggest resources for compiler implementations; I personally would start by prototyping the language itself: think about its type system and its runtime properties. There are many possibilities in terms of type systems, from static to dynamic and everything in between, with concepts like linear types, algebraic types, etc. Runtime properties are things like its memory model (garbage collection, yes or no). All of this depends on the purpose of your language.

The second part would be choosing a platform to run on (so you only have to write a compiler frontend). There are many great technologies to host a language, but I'd advocate two specific ones here:

* https://github.com/zetavm/zetavm (shoutout to @love2code)

* https://llvm.org/

ZetaVM is a great target for dynamically typed and/or JIT-compiled languages like Python or JavaScript. LLVM is used in many low-level / near-metal languages like C, C++, Rust and so on.

Neither has to be your final target; your compiler frontend can be moved to target another platform like the JVM later.

As a last part, implement your compiler frontend. This involves lexing/parsing source code, type checking if needed, and generating your target's intermediate representation. I personally prefer hand-written recursive descent parsers using a parser combinator framework (like https://github.com/Geal/nom) over parser generators. For further processing like type checking, there is the common monolithic approach and the nanopass framework/approach: https://www.youtube.com/watch?v=Os7FE3J-U5Q Also there's a great series on compiler frontends by Alex Aiken: https://www.youtube.com/watch?v=sm0QQO-WZlM&list=PLFB9EC7B8F...
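As a rough illustration of the nanopass idea: each pass is a small, self-contained AST-to-AST transformation, and the compiler is just their composition. This is my own toy example, not code from the linked talk.

```python
# Toy nanopass-style pipeline over tuple-shaped AST nodes:
#   ('num', n), ('add', l, r), ('neg', e)
# Each pass does exactly one job and passes the tree along.

def desugar_neg(ast):
    """Pass 1: rewrite ('neg', e) as ('sub', ('num', 0), e)."""
    tag = ast[0]
    if tag == 'num':
        return ast
    if tag == 'neg':
        return ('sub', ('num', 0), desugar_neg(ast[1]))
    return (tag,) + tuple(desugar_neg(c) for c in ast[1:])

def fold_constants(ast):
    """Pass 2: evaluate operations whose children are all literals."""
    tag = ast[0]
    if tag == 'num':
        return ast
    kids = [fold_constants(c) for c in ast[1:]]
    if all(k[0] == 'num' for k in kids):
        l, r = kids[0][1], kids[1][1]
        return ('num', l + r if tag == 'add' else l - r)
    return (tag,) + tuple(kids)

def compile_expr(ast):
    # The "compiler" is simply the composition of small passes.
    return fold_constants(desugar_neg(ast))

print(compile_expr(('add', ('num', 1), ('neg', ('num', 5)))))
# ('num', -4)
```

The appeal is that each pass is trivially testable in isolation, which is exactly the argument the nanopass people make against one monolithic tree walk.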

Hope this gives an overview. :)

> I personally prefer hand-written recursive descent parsers using a parser combinator framework (like https://github.com/Geal/nom) over parser generators.


Why handwritten, or why using a parser combinator framework?

Handwritten parsers are most commonly used because most tools make it hard to get things like proper error reporting good enough to be worthwhile.

Parser combinators are really just a fancy way of composing a parser from higher-order functions instead of writing out a function body. Done well, they basically provide a DSL to build the parsers, and the way this works, by composing small fragments, is pretty much what we do with recursive descent anyway, so it's a good fit.
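A minimal sketch of that idea in Python (toy code for illustration only; nom itself is a Rust library with a different API): each parser is a function from input to a (value, rest) pair, and combinators compose small parsers into bigger ones.

```python
# Each parser: str -> (value, remaining) on success, or None on failure.

def char(c):
    """Parser that matches a single literal character."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def seq(*parsers):
    """Run parsers one after another; fail if any fails."""
    def parse(s):
        values = []
        for p in parsers:
            r = p(s)
            if r is None:
                return None
            v, s = r
            values.append(v)
        return (values, s)
    return parse

def alt(*parsers):
    """Try parsers in order; return the first success."""
    def parse(s):
        for p in parsers:
            r = p(s)
            if r is not None:
                return r
        return None
    return parse

def many(p):
    """Apply p zero or more times, collecting results."""
    def parse(s):
        values = []
        while (r := p(s)) is not None:
            v, s = r
            values.append(v)
        return (values, s)
    return parse

digit = alt(*[char(d) for d in "0123456789"])
number = many(digit)
print(number("42abc"))  # (['4', '2'], 'abc')
```

The composed result reads like a grammar, but it's still just ordinary function calls, which is why it meshes naturally with a hand-written recursive-descent structure.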

There are tons of resources on implementing languages (interpreters, JITs, and normal compilers), but I haven't seen that many good resources on PL design. Something like the recently discussed Graydon Hoare post[1], but a bit more approachable for an outsider and maybe less focused on the very bleeding edge.

[1] https://news.ycombinator.com/item?id=15051645

I recommend playing a bit with OMeta:


The framework has since evolved into Ohm, but I think OMeta/JS is more fun for playing around with.


The Ohm interactive editor is a lot of fun as well - you get real time feedback on examples as you edit your grammar.


Here's an ebook on the subject: "How to Create Your Own Freaking Awesome Programming Language"


Here's what some programming luminaries had to say (lifted from the website):

“The book I want to read.” — Matz, creator of the Ruby language

“I really love this book.” — Jeremy Ashkenas, creator of the CoffeeScript language

"The book I want to read." - that's a rather odd testimonial.

Japanese/English impedance mismatch?

I highly recommend Write Yourself a Scheme [0]. It is self-contained; besides knowing Haskell (and perhaps Scheme), it doesn't expect you to know anything about writing compilers.

While I'm here, does anyone know of any practical resources on how to implement LLVM languages with GC? The documentation [1] leaves something to be desired.

[0] https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_...

[1] https://llvm.org/docs/GarbageCollection.html

https://interpreterbook.com is a pretty good resource to hit the ground running, with unit tests and little magic.

Once you've gone through it, you'll probably want to use the magic tools as they're well tested at this point, and make your life easier. Still, understanding what they're doing for you is a massive boon.

I should note, it uses Go but not in an idiomatic way. Strings instead of using an error type, etc. This is theoretically for the purpose of making it easier to follow along in another language, but I still wasn't a fan of that aspect.

Author of the book here. Thanks for the shout out! I'm happy to hear that you appreciate the book exactly in the way it was intended to - little magic, lots of code and unit tests.

I'm also super curious about your "not in an idiomatic way" comment. If you have a minute, feel free to send me a longer version - me at thorstenball.com. I'd really love to hear what you'd do differently and what could be more idiomatic.

Sent an email from my username at gmail.

> I'm happy to hear that you appreciate the book exactly in the way it was intended to - little magic, lots of code and unit tests.

Yeah, that's exactly what I wanted out of it, and my coworker recommended it at the best possible time on top of that.

Here's a cool programming language in a live-coding environment that I developed with my interns two summers ago. It enables natural language programming which is great for introducing young kids to computer science.

It is built in Java and uses a recursive descent parser to parse sentences like "draw a red circle at 300, 300", and is well commented because the intention was for advanced students at my education program to be able to easily extend and hack on it.


That idea runs into scaling problems quickly. See COBOL-60 and HyperTalk, which are surprisingly similar.

A useful question: you want to talk to a computer conversationally, and get beyond the Alexa/Siri level of tasks. How would you do that? Fully general natural language requires strong AI. Could you come up with some more restricted form with enough expressive power but understandable by a computer? The computer might have to rephrase your questions and ask "did you mean...?" Come up with a way to converge when the understanding is poor, and you'll have something.

Correctable dead-ends are better educationally than crafted successes. I don't think the goal is to teach students how to parse English, or to make a conversational programming language, but it could show why programming languages use something other than natural language.

What do you want to do? You could write a LISP interpreter in a high level language in an hour, you could target the JVM/CLR/whatever and get fairly high performance and GC without too much work, you could spend months/years writing a quality language runtime + native compiler, etc.

http://www.craftinginterpreters.com/ is coming along really nicely.

Reading academic papers can give you some excellent ideas once you get into the weeds (e.g. garbage collection, compiler optimization).

This is for building a compiler for the language tiny: http://thinkingeek.com/gcc-tiny/

Otherwise, get your hands dirty with a parser generator (PEG parser generators[1] tend to be fairly forgiving). It is pretty easy to get started making an interpreter that way, and it is quick to prototype with.

[1]: http://bford.info/packrat/

I would recommend keeping your hands clean and using ANTLR [1]. ANTLR4 is a powerful lexer/parser generator; LL(*) is ridiculously powerful. Also, ANTLR is well documented and his book [2] is quite readable.

[1] www.antlr.org

[2] https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-re...

Also, if you're really only interested in the language then you should think about targeting LLVM IR or the JVM.

For beginners I recommend building a tiny Lisp interpreter with this article: http://norvig.com/lispy.html

It's really easy, so I could build it with CoffeeScript in just 4 hours while I was a student.
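The core of Norvig's approach fits in a few lines. Roughly (my condensed paraphrase for illustration, not his exact code): tokenize by padding parentheses, read tokens into nested lists, then evaluate against an environment of primitives.

```python
import math
import operator as op

def tokenize(src):
    """Pad parens with spaces so split() yields one token per symbol."""
    return src.replace('(', ' ( ').replace(')', ' ) ').split()

def read(tokens):
    """Turn a flat token list into nested Python lists (the AST)."""
    tok = tokens.pop(0)
    if tok == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(read(tokens))
        tokens.pop(0)  # discard ')'
        return expr
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol

ENV = {'+': op.add, '-': op.sub, '*': op.mul, 'pi': math.pi}

def evaluate(x, env=ENV):
    if isinstance(x, str):       # symbol: look it up
        return env[x]
    if not isinstance(x, list):  # literal number
        return x
    fn, *args = [evaluate(e, env) for e in x]
    return fn(*args)

print(evaluate(read(tokenize("(* 2 (+ 3 4))"))))  # 14
```

The full article adds `define`, `if`, lambdas, and lexical scoping, but the read/eval skeleton above is the whole trick.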

If you're looking to create a dynamic language with reasonable performance, you could look at implementing it with RPython [1], which is used for PyPy, or for the Parrot VM [2], which is used for Perl 6.

Once it becomes a little more mature, Zeta VM [3] may also be a good target.

[1] https://rpython.readthedocs.io/en/latest/

[2] http://www.parrot.org/

[3] https://github.com/zetavm/zetavm

I don't think Parrot has been used in some time. Now P6 has the Rakudo distribution using MoarVM, I think. It's always confused me a bit.

Another option is to target Lua. The C version runs everywhere and the LuaJIT version is super fast.

Using Forth as a substrate lets you focus on the more interesting aspects to an even higher degree than Lisp. The last thing you want is detailed instructions; unless you're just building another whatever, which never really made sense to me. Build the most simple and naive thing possible that works the way you want it to, and go from there. That's how Snabel was born:


Therein lies my problem with Forth. Yes there is a ton of power, but that is inaccessible to a lot of users. I know I can look at Jones Forth & MeCrisp, but I honestly couldn't see where to start. I'd like to see a tutorial start with either an assembly or C base and then teach Forth fundamentals such as how to start your dictionary and choose between direct/indirect threading and how to implement each. I'm always curious how so many Forth users got to that stage.

Edit: It looks like Snabel is a Forth-inspired concatenative language written in C++ with some Perl-like features. That's pretty cool. If you ever get the chance I think I'd enjoy it if you made some video tutorials explaining the design and some of the code choices.

I came into Forth with 32 years of mixed experience from Lisp, Smalltalk, Haskell, C++, C, Java, Perl, Python, Clojure, Scala and more; it's difficult for me to judge anything from that perspective.

I've never written a standard Forth program in my life though, never installed any other implementation. The second I was introduced to Forth, it clicked; Lisp took me much, much longer to get by comparison.

Like I said, I'm not very much into rebuilding what has already been built better by someone else. Anyone who's written any amount of code is bound to have their own ideas, and the nice thing about Forth as a substrate is that it doesn't come with that many ideas of its own.

I've written a few blog posts explaining design choices (https://github.com/andreas-gone-wild/blog/blob/master/forthy...), there's more interesting stuff to come now that the pieces are falling into place; I've only been working on Snabel for a couple of months.

Thanks for the reply. I'll take a look. When you said it clicked, what were you looking at? Starting Forth or Thinking Forth or what? I'm trying to find out how most people learn to build their own system.

It was a random blog post praising the simplicity of Forth; couldn't find it again if my life depended on it. Thankfully it didn't go too much into Forth dogmatics; what it did instead was to cut the ideas down to their core, which simplified my thought process to the point where I could finally see a clear path to something that felt like it was worth my effort.

We over-complicate things, for different reasons. The truth is that nothing is very complicated once you understand it well enough to break it down to its core.

How often do you use Forth for your professional / non-hobby work?

I believe the gforth manual contains a fair bit about its implementation and design decisions.

For threading, this is probably a good start: https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Thr...

EDIT: I don't know how well you know Forth, but it's probably a waste of time trying to implement it before you grok it from userland, if that's where you are. Understanding first the concatenative paradigm and then (more importantly) the deep metaprogramming runtime-is-a-fundamental-part-of-your-application paradigm (which Forth shares with Lisp and Smalltalk) is super important.

I understand a bit of userland, but not a whole lot. I figured you needed to learn Forth via a bit of both as all things tend to get circular, as in I'd understand this better if I knew how it was implemented.

"Write you a Haskell": http://dev.stephendiehl.com/fun/

To keep your motivation, you need to have results quickly while focusing on topics you are interested in. You should choose the high-level language (with at least a GC) in which you are the most proficient. Keep your ambitions for error diagnostics very modest. If your interest is mainly in language syntax and semantics, you can either write an interpreter or target an existing language (C++ initially generated C).

This is a useful comment. The suggestion to go with an interpreter first if you want to have something in a relatively short amount of time is a good one.

I strongly suggest you build something esoteric and fun first. This should be a "bare minimum" VM or interpreter. Here's one of mine that I wrote like 8 years ago[1]: https://github.com/dvx/zeded/

Don't start off with YACC/Bison, as they hide a lot of stuff under the hood. It's cool learning things from scratch. The most commonly-suggested book on compilers is known as the Dragon Book[2], and if you want to take this endeavour seriously, you should really get a copy.

[1] https://code.google.com/archive/p/zeded/

[2] https://www.amazon.com/Compilers-Principles-Techniques-Tools...
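For a sense of how "bare minimum" a VM can be, here's a sketch of a tiny stack machine. This is purely illustrative, my own made-up instruction set, not Zeded's actual one.

```python
# A minimal stack-based VM: bytecode is a list of (opcode, arg) pairs.

def run(program):
    stack = []
    pc = 0  # program counter
    while pc < len(program):
        opcode, arg = program[pc]
        if opcode == 'PUSH':
            stack.append(arg)
        elif opcode == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == 'MUL':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == 'JMP_IF_ZERO':
            # Conditional jump: pop the top of stack, branch if zero.
            if stack.pop() == 0:
                pc = arg
                continue
        pc += 1
    return stack.pop()

# Compute (2 + 3) * 4.
bytecode = [('PUSH', 2), ('PUSH', 3), ('ADD', None),
            ('PUSH', 4), ('MUL', None)]
print(run(bytecode))  # 20
```

Once the dispatch loop exists, growing it into something "real" is mostly a matter of adding opcodes, a call stack, and a frontend that emits the bytecode.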

For the dragon book, the first edition is significantly cheaper. Is there anything in the new version making it worth over 10x as much?

The later chapters on ILP, software pipelining, and optimizing for parallelism and locality are completely new. Basically, chapters 8-11 have been either rewritten or written from scratch. On the other hand, the 2nd edition dropped the "Want to write a compiler" and "A look at some compilers" chapters.

Here's a tutorial on how to create a programming language using LLVM (in C++ or OCaml): https://llvm.org/docs/tutorial/

I would suggest you focus a lot on the design: research the history and design of other languages. Try to understand how they evolved and try to figure out the bad decisions they made along the way.


Python's PEPs, PHP's RFCs, mailing lists

Building the language itself is the easy part...

Robert Harper's book Practical Foundations for Programming Languages (free draft) http://www.cs.cmu.edu/~rwh/pfpl.html

This book will stop you from making easy mistakes in the design of whatever DSL or language you're trying to create. Also see the commentary he keeps on it, which changes often: http://www.cs.cmu.edu/~rwh/pfpl/commentary.pdf

Understanding Computation by Tom Stuart [0]. While not solely dedicated to creating a language, the first couple of chapters deal with building the semantics of a simple language using Ruby as the implementation language (but this is easily done in any other language you're familiar with). Implementing the virtual machine the language runs upon in a language you already know provides some really wonderful insights.

[0] http://a.co/7MI5j7h

That book is not for the faint of heart. It gets deep fast haha.

Hi, for the parser and compiler this helped me a lot: https://compilers.iecc.com/crenshaw/

If you are building an interpreter too this was a great inspiration: http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInst...

Your URL is slightly off: https://compilers.iecc.com/crenshaw/


I'm building a language right now and I found this tutorial to be a great help: http://lisperator.net/pltut/

Here's the language I'm building: https://github.com/kevwu/kythera

Gary Bernhardt has a recent screencast on creating a compiler


Not free, but his screencasts are excellent, so worth a look if you see a topic that interests you.

I accumulated some (ok, I lie: it's a lot!):


For DSLs, although the tools mentioned can be used for GPLs, see DSL Engineering: http://voelter.de/data/books/markusvoelter-dslengineering-1....

Xtext looks like one interesting way to get started with your own language. I only just started playing around with it and so far it seems pretty cool.



you can continue with the links from there.

do you want to target something in particular?

implementing a lisp (scheme) or a forth is a good starting point.

You'll want the Dragon Book as a reference: not necessary to get started with, but to build the semantic tree on which you will need to pin concepts. https://www.amazon.com/Compilers-Principles-Techniques-Tools...

If you aren't yet committed to any language, you can start building a parser with PyParsing. It's really easy. http://pyparsing.wikispaces.com/

If you want to take a quick (albeit expensive) class on it, Dave Beazley offers one: http://www.dabeaz.com/chicago/compiler.html

Hmm. This might be kind of blasphemous, but I think I'd recommend against the dragon book? As I remember, the emphasis in that book is on syntax-directed translation. I might argue that the less you're thinking about syntax and lexing/parsing crap the better. (For that reason: ignore other answers that tell you to learn about a particular parser generator (e.g. ANTLR). That's just fluff.)

Really, if you're coming to programming language design with the thought "I'm going to make an imperative, object oriented language" (with parallelism as an afterthought), you're doing it wrong. The world has enough of those already and you're going to invent something worse than what's already there.

Probably, instead of inventing a new language (say Matlab for matrix operations or Prolog for logical reasoning), you'd be better off implementing a library that handles the same concepts and embeds into another language (which is really what happened with Tensorflow or MapReduce to think of two examples).

(Grune's book "Parsing Techniques" is a great reference on parsing crap, but the secret is that if you design your grammar to be LL(1) you can parse your language using recursive descent: you only need a fancy parser if you designed a more complicated grammar (why'd you do that?))

Recommended book: The Reasoned Schemer. It's a cute (maybe too cute) book that shows how to implement a logic programming language (~Datalog) using Scheme as a base language. The Wizard book (Structure and Interpretation of Computer Programs) also has really cool examples that I think 60% of programmers I've worked with in industry don't fully appreciate.

Only the first couple of chapters deal with lexing and parsing. The Dragon book covers a lot more than just syntax analysis: semantic analysis, type checking, run-time environment design, intermediate language design, code generation, code optimization, and more. It doesn't have the newer stuff like JIT or the Hindley–Milner type system, but those can be picked up separately by reading the papers.

OP asks for resources to build a language. The Dragon book is a very good book for covering the whole process.

> This might be kind of blasphemous, but I think I'd recommend against the dragon book?

I agree! It's hard to criticize a book rightly regarded as a classic, but I think it solves the wrong problems, or at least emphasizes the wrong areas.

> if you design your grammar to be LL(1) you can parse your language using recursive descent

Yep, that's the way to do it.

I read the question as the OP wanting to go through a didactic exercise in building a programming language from the ground up rather than building a DSL, in which case lexing and parsing is a fundamental concept to cover, among others. I agree with keeping it LL(1), but it may also be worth it for the OP to understand what LL(1) means. I agree parsing is not at all the most important thing to learn in language implementation, but it seems to me almost a necessary thing to cover in order to complete one's education in the subject. To omit it would be akin to omitting a discussion of limits in a study of calculus.

Guy Steele, Growing a Language: https://www.youtube.com/watch?v=_ahvzDzKdB0

I suggest reading the source code to a professional quality compiler. There are lots to choose from.

"looking at leveraging my experience... anyone have a resource that shows me how to do it in detail?"

Get the dragon book, learn LLVM.

Nature finds a way.
