
Ask HN: What's a starting point for learning how to write programming languages? - da_murvel
I've been thinking about writing my own programming language for some time, to gain a deeper understanding of how programming languages, and by extension, computers work. I've bought the Flex and Bison book by John Levine as a resource on parsing and lexing. I have come to realize, however, that writing a complete language like Java, Python, or Ruby from the bottom up is no small task. So I started to think about instead writing a compiler for another language, or writing a "simple" language like SQL or Markdown to begin with.

What are your experiences when writing a new programming language? Where and how did you start?
======
evincarofautumn
Others have already mentioned various resources for learning the details of
implementing a language. As someone who does programming language development
as a hobby and PL technology as a career, I have a higher-level bit of advice:

Take a top-down approach, and focus on semantics over syntax.

If you want to design a new language, think of a moderately sized
representative example of a problem you have day-to-day, and write a sketch of
a program in the magical pseudocode you _wish_ you were able to write. It
doesn’t have to be a big problem, or a big language, just a concrete problem
domain.

Then, piece by piece, investigate the details of how to make that notation
work in reality: how do you parse, typecheck, optimise, compile, and implement
a runtime for it? Figure out the smallest set of core primitive operations
needed to express the semantics of your language—that’s your core calculus,
which you can prove things about or use as an intermediate representation in
your compiler.

In addition, a productive strategy for producing an actual implementation,
followed in some compiler courses, is to build your compiler (or interpreter)
in such a way that you _always have a working implementation_ at every step—of
a language that starts small and grows over time.

For example, write a program that takes in a “hello world” or other primitive
program written in your language, and just produces the expected output
without any analysis—the result of the program, or a binary (machine code,
.NET/JVM bytecode, &c.) with the right behaviour. Then, incrementally add
features—outputting other messages, doing other things than simple
input/output, evaluating complex terms, rejecting invalid programs, and
ultimately taking advantage of the semantic features that make your language
unique. This style of development helps encourage you to write a test suite of
increasingly complex example programs, which are an essential part of a
language implementation—you don’t need to follow strict rules like TDD, but
you do need to test _everything_.
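The grow-it-incrementally idea can be sketched concretely. Below is a hypothetical "stage 0" compiler (the toy language and its single `print` statement are invented for illustration): it only handles hello-world-shaped programs, translating them to Python source, and each later stage grows the language while this behaviour keeps working.

```python
# A hypothetical "stage 0" compiler: it only understands programs made of
# lines of the form `print "some text"` and translates them to Python
# source. Every later stage adds features while this test keeps passing.

def compile_stage0(source: str) -> str:
    out = []
    for raw in source.strip().splitlines():
        stmt = raw.strip()
        if stmt.startswith('print "') and stmt.endswith('"'):
            text = stmt[len('print "'):-1]
            out.append(f'print({text!r})')
        else:
            raise SyntaxError(f"stage 0 cannot compile: {stmt}")
    return "\n".join(out)

program = 'print "hello, world"'
python_src = compile_stage0(program)
exec(python_src)  # prints: hello, world
```

Each feature added later (expressions, variables, control flow) gets its own example program, which naturally becomes the test suite the comment above describes.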

~~~
da_murvel
Thanks for your reply! I think I usually go about programming by starting
small and then adding features as they are needed; it never occurred to me,
for some reason, that I could do the same in this case. From reading the other
replies, I think a combination of starting small and using a couple of "how
to" links and books as references will be my best bet.

------
ChuckMcM
For many people, and I include myself in that group, the class in compiler and
language design is where all of the concepts taught in CS came together into a
bigger and richer picture. A common observation is that it was like learning
"how" music is written rather than just "playing" music.

That said, everyone who learns to program usually learns more than one
language. For me, it was after about the fifth language that the syntax of a
particular language fell away and what remained was the algorithm that was
being expressed. It was this ability to see the algorithm distinct from the
syntactic construction of the language that lets me see where the language is
"helping" and where it is "hurting" in expressing the algorithm.

The advice, then, for a starting point in developing new languages would be to
take some algorithms you know well and implement them in as many languages as
you can find. Ideally you'll have at least one imperative language (BASIC, C,
Fortran, COBOL all qualify), one functional language (Haskell/APL), one lambda
language (Lisp-ish), and one semantic language (Ada, Mesa, Modula). Typically
sorting, string handling, and asynchronous execution (or schedulers) all make
for good algorithms to try. See what is "easy" and what is "hard" to implement
based on the features the language brings to the table.

Then optimize for 'easy'.

That will give you verbs and structure that support your need; once you have
those, giving them names and a syntactic structure is fairly straightforward.

~~~
da_murvel
Thanks for your reply! As a musician myself I like your analogy. It was a huge
revelation for me when I learned how to write my own music, and I also started
to "consume" music in a completely different way. I also sort of hope that the
same will happen with my programming, taking it to the next level.

------
tluyben2
A Forth or a Lisp is a simple task for an experienced programmer, and even
basic ones are really satisfying, IMHO. Seeing simple constructs work after an
hour of coding feels great. I have implemented more complex languages (with
types) for work, which were more like DSLs, in languages less suited to the
type of DSL we required, but I find the most pleasure in doing little Forths
(like [1]) or Lisp-likes when I am waiting for hours (on a plane, for
instance). It is like playing games for me. I write them on my phone or iPad.

I would recommend a lot of books here as well, for a solid base and to give
you the background needed to write more complex languages, including static
types, compilers, JITs, and VMs. But because I really do not have that much
time, that would stretch over too long a period and I would lose interest, so
I stick to things I know I can finish (or rather, get working and achieve
something a previous incarnation did not achieve: smallest, simplest, fastest,
etc.).

[1]
[https://gist.github.com/tluyben/16ee2645c4c8aed813005d51488d...](https://gist.github.com/tluyben/16ee2645c4c8aed813005d51488d5c6a)
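For a sense of how small the core of a Forth can be, here is a hedged sketch in Python (the word set here is invented for illustration and does not match any particular Forth):

```python
# A tiny Forth-like interpreter: whitespace-separated words, a data
# stack, and a dictionary of built-in words. Anything that isn't a
# known word is treated as a number and pushed.

def run(source):
    stack = []
    words = {
        "+":   lambda: stack.append(stack.pop() + stack.pop()),
        "*":   lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        ".":   lambda: print(stack.pop()),
    }
    for token in source.split():
        if token in words:
            words[token]()
        else:
            stack.append(int(token))
    return stack

run("2 3 + 4 *")  # leaves [20] on the stack
```

A real Forth adds a return stack and user-defined words (`: square dup * ;`), but the dictionary-plus-stack skeleton stays this small.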

~~~
gleenn
Someone wrote an amazing write-your-own-Lisp-using-Python tutorial; I found it
amazingly consumable and it only took a weekend:

[https://github.com/kvalle/diy-lang](https://github.com/kvalle/diy-lang)

~~~
emidln
There's also Make a Lisp:
[https://github.com/kanaka/mal/blob/master/process/guide.md](https://github.com/kanaka/mal/blob/master/process/guide.md)

It's available in a variety of languages and has a nice guide (linked) to
various steps.

------
axlprose
Beautiful Racket[0] is a good starting point that covers a lot of nice small
examples that incrementally build up your understanding of the different
components involved in making languages. And it uses Racket[1], which
incorporates the idea of crafting many small "domain-specific" languages to
solve different problems into its entire philosophy[2].

[0] [https://beautifulracket.com/](https://beautifulracket.com/)

[1] [https://racket-lang.org/](https://racket-lang.org/)

[2]
[http://felleisen.org/matthias/manifesto/](http://felleisen.org/matthias/manifesto/)

~~~
mikeurbach
I'm just finishing up the last tutorial, and I highly recommend this as well.
I made one traditional C compiler back in college with flex and bison, which I
consider an invaluable experience, but I've always wanted to tap in to the
power of Scheme for developing new languages. I had some false starts on my
own and with mediocre tutorials, but in the end Beautiful Racket was exactly
the kind of pedagogy I needed.

------
Impossible
Although it's still in development,
[http://craftinginterpreters.com/](http://craftinginterpreters.com/)
(previously on HN
[https://news.ycombinator.com/item?id=13406081](https://news.ycombinator.com/item?id=13406081))
seems promising, and the existing content is good.

~~~
cgag
I'm not super far in but I'm very happy with it so far. I recommend it very
highly and am very skeptical of anyone recommending reading the dragon book
without doing something more hands on like this first.

Maybe that book is more practical than I realize though.

------
CyberFonic
I would suggest looking at Niklaus Wirth's Compiler Construction book and even
the Oberon compiler. This is one of the best ways to learn about compilers and
writing programming languages. See:
[https://www.inf.ethz.ch/personal/wirth/](https://www.inf.ethz.ch/personal/wirth/)

Another approach is to look at SICP; it is Scheme oriented and leans more
towards macros. See:
[https://mitpress.mit.edu/sites/default/files/sicp/index.html](https://mitpress.mit.edu/sites/default/files/sicp/index.html)

SQL is a deceptively complex language; the optimisations its implementations
perform are very complex. Markdown, on the other hand, is more of a transpiler
to HTML and is not Turing complete.

------
tptacek
Write a parser for a complicated file format.

Write a simple interpreter for a C-like language by parsing it into a syntax
tree, constructing a stack of symbol tables, and evaluating the nodes.

Once that works, write a version that emits code for a stack-based virtual
machine (a simple stack machine written concisely might eat just a few
screenfuls of code).
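As a rough illustration of how little code such a machine needs, here is a minimal sketch (the opcode names and tuple bytecode format are invented for illustration):

```python
# A toy stack machine: bytecode is a list of (opcode, args...) tuples,
# executed by a loop over a value stack.

def execute(code):
    stack = []
    pc = 0
    while pc < len(code):
        op, *args = code[pc]
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            print(stack.pop())
        pc += 1
    return stack

# Bytecode for (1 + 2) * 10:
print(execute([("PUSH", 1), ("PUSH", 2), ("ADD",), ("PUSH", 10), ("MUL",)]))
# [30]
```

Jumps and a call stack are the next opcodes to add; the dispatch loop itself does not change shape.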

When that works, symbolically evaluate the stack bytecode into variable
references, pick a simple architecture (I like MSP430 for this but ARM will
do) do a trivial register allocator, implement register spill to the stack,
and emit real code.

Then read Lisp In Small Pieces.

Worked OK for me!

A surprising amount of the complexity you might perceive in a "real"
programming language just comes down to how expressions are parsed.

~~~
pvg
What's the purpose of the file format bit?

~~~
tptacek
Declarative thingies are easier to parse? You can skip a lot of these steps, I
guess --- or just stop at a working interpreter, which will get you as far as
MRI Ruby got before RoR took off.

~~~
pvg
Oh, I see. I thought it was something like 'writing an ad-hoc parser for
something convoluted will make you understand and appreciate systematic
approaches to parsing better' but I guess that's just me being a terrible
person.

~~~
tptacek
I pretty much just use yaccs.

------
laxd
Put that flex & bison book away. Don't think about compilers. Design your own
little interpreted language. Write a lexer by hand (not that hard). Write a
recursive descent parser by hand. It's a bit tricky to get infix operator
parsing right, but it's an important learning experience. Once you've got the
basic parsing algorithm right, you can probably add lots of fun stuff to your
language with just general programming skills. When you feel ready to move on,
read the flex & bison book and explore more advanced topics, knowing you don't
need to write lexers and parsers by hand again.
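One common technique for getting infix operator precedence right in a hand-written parser is precedence climbing. A minimal sketch in Python, operating on a pre-tokenized list (lexing elided; only a handful of operators shown):

```python
# Precedence climbing: each call parses operators at or above min_prec,
# recursing with a higher floor for the right-hand side so that tighter
# operators bind first. Produces (op, lhs, rhs) tuples.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_prec=1):
    node = parse_atom(tokens)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        rhs = parse_expr(tokens, PREC[op] + 1)  # +1 gives left associativity
        node = (op, node, rhs)
    return node

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse_expr(tokens)
        tokens.pop(0)  # consume ")"
        return node
    return int(tok)

print(parse_expr(["1", "+", "2", "*", "3"]))  # ('+', 1, ('*', 2, 3))
```

Adding a new binary operator is then just a new entry in the precedence table, which is the payoff over hard-coding one grammar rule per precedence level.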

~~~
bootsz
+1. This is the approach I'm currently taking. I tried starting with books and
tutorials but felt I wasn't learning much just being spoon-fed code and
copy+pasting. I think it's much more illuminating to attack the problem with a
beginner's mindset and work it out for yourself, at least up to a point (to
start I'm working on implementing a small subset of Scheme). If I occasionally
get stuck I'll peek at a book just enough to get un-stuck. I'm having a blast
with it and learning a ton. Once I get tired of implementing language features
I'll probably start from scratch and follow a book the second time through.
For me it's often easier to absorb problem-solving concepts if I've attempted
the problem myself before.

------
feniv
I highly recommend Norvig's (How to Write a (Lisp) Interpreter (in Python)) as
a very short, practical introduction to implementing a language.
[http://norvig.com/lispy.html](http://norvig.com/lispy.html)

There's also a followup article implementing additional data types and tail
call optimization.
[http://norvig.com/lispy2.html](http://norvig.com/lispy2.html)

------
nostrademons
Learn the mechanics of compiler construction first - that'll inform everything
about your language, and is a prerequisite to actually building something
other people can use. I'd recommend The Dragon Book [1], SICP [2], and Modern
Compiler Implementation in ML [3] for this.

After that, it's a question of where you want to go with your language, and
what distinguishing features you'd like it to have relative to other
languages. Languages which just change some syntax and keywords but otherwise
look like C are common, but they tend not to get usage beyond their creators.
Much of the academic research lately is about type systems (TAPL has already
been mentioned) - there's research into dependent types, effect typing, total
functional programming, linear types already made it into Rust's borrow
checker, etc.

However, many of the worst problems in industry - dependencies, versioning,
smoothly migrating from a prototype to robust production code, code
navigation, security, parallelization & distribution, the memory hierarchy -
have gotten relatively little attention in academic research, and are
desperately awaiting solutions. If you have some industry experience, it may
be worth trying to imagine how you might fix these problems if you were
allowed to change the programming language at will.

[1]
[https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools)

[2]
[http://web.mit.edu/alexmv/6.037/sicp.pdf](http://web.mit.edu/alexmv/6.037/sicp.pdf)

[3] [https://www.amazon.com/Modern-Compiler-Implementation-
Andrew...](https://www.amazon.com/Modern-Compiler-Implementation-Andrew-
Appel/dp/0521607647)

~~~
earenndil
> Languages which just change some syntax and keywords but otherwise look like
> C are common, but they tend not to get usage beyond their creators

Considering c++ is one of the most widely-used languages out there, I would
question this.

~~~
nostrademons
C++ has a bunch of new semantics on top of C - classes, access modifiers,
virtual functions, constructors/destructors, operator overloading, different
casting rules, exceptions, namespaces, templates, etc. People who know both of
them well consider them very distinct languages, because the programming style
you use in a C++ program is very different from a C program.

------
citeguised
Check out "Writing An Interpreter In Go"
[https://interpreterbook.com/](https://interpreterbook.com/) It uses Golang to
make an interpreter for a simple C-Style language and it's "only" 200 pages
long. I'm about a third in and my head is smoking.

~~~
dvlsg
That looks pretty awesome. Thanks for the suggestion. Looks like it gets to a
good amount of functionality too, with first class functions and closures and
the like.

~~~
stevekemp
Very much so, and once you've completed the book there are a lot of extensions
various people have made.

For example here's a version with classes / OO-behaviour:

[https://github.com/haifenghuang/monkey](https://github.com/haifenghuang/monkey)

Or here's my version which "just" adds some more standard-library facilities,
allowing you to open/read/write files:

[https://github.com/skx/monkey](https://github.com/skx/monkey)

I found the language fun to play with, and I used a virtually identical lexer
and parser design in a recent project of my own, rather than "parsing" an
input file with regular expressions, which is how I would previously have
approached the implementation of simple DSLs. I guess that showed I'd learned
something useful!

------
rzzzt
I found Jack Crenshaw's "Let's Build a Compiler" series in a magazine, quite a
while ago:
[https://compilers.iecc.com/crenshaw/](https://compilers.iecc.com/crenshaw/)

------
arh68
There are more than a few ways to do it.

Ohm [0] is a good place to start. It's a parser generator. If you can
understand the math example [1], you can probably see how things get built up
from there. Then you can focus on what you want your language to do, how
you'll interpret it, what problems it should express, &c.

[0] [https://ohmlang.github.io/](https://ohmlang.github.io/)

[1]
[https://github.com/harc/ohm/blob/master/examples/math/index....](https://github.com/harc/ohm/blob/master/examples/math/index.html)

Also consider reading about OMeta, if Ohm isn't your thing.

~~~
guidoism
Ohm’s web playground is an awesome way to learn how to build up a language
from scratch. I’m in the process of porting Postscript to it and it’s
amazingly easy.

~~~
carapace
FWIW Prolog DCGs (Definite Clause Grammar) are very similar to Ohm/OMeta.

------
derekp7
I'd like to offer a somewhat different approach. First, start off by making a
simple calculator program that uses RPN syntax. This is dirt simple to
implement: read a line of input; if it is a number, push it onto a stack. If
it is a math operator, call the function handling that operator (addition,
multiplication, etc.) on the last two items on the stack and push the result
back onto the stack.

Now add error handling (i.e., unrecognized operators, or stack underflow
condition) -- figure out what you want to do, such as crash the interpreter
with an error message, or return an error and let the user continue.

Now, read up on the Shunting Yard algorithm. This lets you take a regular
(infix) math expression, and convert it to RPN (postfix) notation, which you
can then feed directly into the RPN interpreter ("stack machine") that you
just wrote.

Now look at handling parentheses, which should be covered in the same
explanation of the Shunting Yard algorithm.
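The pipeline described above, shunting yard producing RPN that the stack evaluator then consumes, can be sketched as follows (a simplified version: four operators, all left-associative, numbers as the only operands):

```python
# Shunting yard: move tokens to the output, holding operators on a
# stack until a lower-precedence operator (or close paren) flushes them.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    output, ops = [], []
    for tok in tokens:
        if tok in PREC:
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # a number
    while ops:
        output.append(ops.pop())
    return output

def eval_rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in PREC:
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b, "/": a / b}[tok])
        else:
            stack.append(float(tok))
    return stack.pop()

print(eval_rpn(to_rpn(["3", "*", "(", "1", "+", "2", ")"])))  # 9.0
```

Note how parentheses never reach the evaluator: the shunting yard step resolves them entirely, which is exactly the division of labour the comment describes.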

Next, add in variables in the original stack language, and figure out what
minimal changes to the second piece (the algebraic expression parser) you need
to do to get it to handle variables.

Finally, add in functions, and you have a simple language. I followed this
exact formula when putting together a toy language many years ago, after which
I learned how much I didn't know that I didn't know about language design. But
I know enough now for a second shot.

Oh, and some other resources to study that may help: look for the reference
books for the PostScript language. The "Blue" book teaches you PostScript, and
I believe it is the "Red" book that tells you enough of the internals for you
to write your own PostScript interpreter (or at least get inspiration for
creating your own version of a stack machine interpreter).

------
mballantyne
I suggest "Programming Languages: Application and Interpretation", available
free here:
[http://cs.brown.edu/courses/cs173/2012/book/](http://cs.brown.edu/courses/cs173/2012/book/)
It's an undergraduate-level programming languages class textbook.

Other books burn a lot of time on things like lexing and parsing; this one
does not. In the lisp tradition, it assumes s-expressions and gets to the
juicy bits of programming language semantics immediately: lexical scope, first
class functions, objects, etc.
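The "assume s-expressions" shortcut is easy to see in code: a complete reader is a few lines. A sketch in Python, in the spirit of Norvig's lis.py tokenizer:

```python
# An s-expression reader: pad parens with spaces, split on whitespace,
# then recursively build nested lists.

def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # discard ")"
        return lst
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol

print(read(tokenize("(define (square x) (* x x))")))
# ['define', ['square', 'x'], ['*', 'x', 'x']]
```

With parsing dispensed with this cheaply, all the remaining effort goes into evaluation, which is the book's point.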

~~~
bendmorris
Second this. I'm writing a compiler now
([https://github.com/kitlang/kit](https://github.com/kitlang/kit)) and this
book has been invaluable.

A few other tips:

* Use a parser generator. Parsing is more or less a solved problem; get it out of the way as fast as possible.

* Writing a language that compiles to another language is very helpful as you can often borrow the underlying language's implementation of a feature. Doubly helpful if you can leverage the underlying language's libraries for things like syscalls.

* Make a todo list (I like Trello for this) full of self-contained tasks. Pick one at a time, think through how you would implement that feature, and go to work. Compilers look like a lot of work in the beginning, but if you take one step at a time, one day you'll look back and have accomplished more than you realize. The list can also be helpful to avoid scope creep; add any new ideas to your backlog and deal with them later.

* Write unit tests, as you'll very likely want to refactor something down the line. Once your compiler becomes complex enough, add functional tests that verify entire programs can compile and run and produce the expected output.

------
alexashka
If you like books, this one is great:

[https://www.elsevier.com/books/programming-language-
pragmati...](https://www.elsevier.com/books/programming-language-
pragmatics/scott/978-0-12-410409-9)

If you like excellent free university lectures online, this is great:

[https://www.youtube.com/watch?v=3N__tvmZrzc](https://www.youtube.com/watch?v=3N__tvmZrzc)

In reality, programming languages are most likely what you think they are, if
you simply start with the basics: what's a variable, what's a function, how do
I call functions inside other functions, do I want variables to have types,
and what kinds of types?

If you start with the very, very basics (global variables, global functions,
no calling functions within other functions) and implement that, you'll get a
feel for it, which is most likely as far as you'll want to go, because it gets
very tedious very quickly as soon as you want something as 'simple' as proper
Unicode support, garbage collection, generics, inheritance, a foreign function
interface, and the list goes on and on.

To get your feet wet regarding what it's all about - those video lectures
above are top notch.

------
coryrc
The book "Lisp In Small Pieces" takes you through the journey of creating a
few compilers of increasing difficulty, from a simple language's interpreter
written in a complex language to a complex language's compiler written in a
simple language.

By using a Lisp-y language you can skip the tedious and uninteresting task of
parsing and get to the real meat of compiling and (some) optimizing.

~~~
affinespaces
Love Lisp in Small Pieces! Beautiful book. I also enjoyed Essentials of
Programming Languages by Friedman and Wand: a very SICP-ish intro to
programming language implementation, in that it weaves the material into a
narrative with uniting underlying themes rather than bombarding the reader
with seemingly disparate concepts and leaving it as an exercise to unite them,
as many "classic" texts in this discipline do.

------
kovrik
What worked for me: read Norvig's tutorials (How to Write a (Lisp)
Interpreter (in Python)). They give you a basic understanding.

Then continue enhancing that tiny language, add more features.

Start with an interpreter, they are easier to grasp. You can always add a
compiler later.

I grabbed Scheme R5RS standard and started implementing it in Kotlin:
[https://github.com/kovrik/scheme-in-kotlin](https://github.com/kovrik/scheme-
in-kotlin)

Then added more Clojure-like and Racket-like features, plus some Kotlin
features.

------
dfox
For me the starting point was Structure and Interpretation of Computer
Programs, particularly the 80's lecture recordings.

Also, I would recommend staying away from lexer and parser generators; just
use a hand-built recursive descent parser and some set of ad-hoc lexical
analysis subroutines.

------
kris-s
Chris Lattner, the creator of Swift, has a nice walkthrough for making a small
language called Kaleidoscope:
[http://web.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/docs/...](http://web.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/docs/tutorial/LangImpl1.html)

~~~
ChickeNES
This is a better link:
[https://llvm.org/docs/tutorial/](https://llvm.org/docs/tutorial/) That page
uses a version of LLVM that is over 10 years old.

~~~
alecbenzer
I loved going through that, definitely second this.

------
merlincorey
Types and Programming Languages (TAPL) is required reading, in my opinion:
[https://www.cis.upenn.edu/~bcpierce/tapl/](https://www.cis.upenn.edu/~bcpierce/tapl/)

~~~
alexashka
Absolutely not for a markdown parser, which is what the author is asking about
to get started

~~~
merlincorey
Parsing is but 1/3 of compiling.

Markup languages are not generally thought of as programming languages, but
maybe that's an archaic view.

At any rate, 90% of the material in this thread is way overkill if all one
wants to do is parse Markdown.

------
e19293001
I bet this is what you are looking for: [https://www.amazon.com/Compiler-
Construction-Using-Java-Java...](https://www.amazon.com/Compiler-Construction-
Using-Java-JavaCC/dp/0470949597)

This book taught me how to write a compiler and build a programming language
of my own.

To answer your question "where and how did you start?": this is where I
started learning about compilers and programming language implementation.

Here is its description from its website:

* Comprehensive treatment of compiler construction.

* JavaCC and Yacc coverage optional.

* Entire book is Java oriented.

* Powerful software package available to students that tests and evaluates their compilers.

* Fully defines many projects so students can learn how to put the theory into practice.

* Includes supplements on theory so that the book can be used in a course that combines compiler construction with formal languages, automata theory, and computability theory.

What I promise you with this book: you'll learn how to write your own
programming language. You'll learn not only about compilers and the language
itself, but also about computer architecture. This book is very easy to read
and understand. It starts with very basic topics, then slowly builds on them
until you grok implementing the language given the specification. You'll have
the chance to build a compiler from scratch or by using JavaCC and Yacc.

------
nakkaya
I maintain a programming language called Ferret [1]. You can read through the
manual; it is a literate program and covers everything you need to implement
your own Lisp: compiler, optimizations, runtime, garbage collection, etc. [2]
overlaps with [1] and covers how to compile a restricted subset of Clojure to
C++. If you are interested in interpreters, [3] has an implementation of "A
Micro-Manual for Lisp - not the whole Truth" in C.

[1] [https://ferret-lang.org/](https://ferret-lang.org/) [2]
[https://nakkaya.com/2011/06/29/ferret-an-experimental-
clojur...](https://nakkaya.com/2011/06/29/ferret-an-experimental-clojure-
compiler/) [3] [https://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-
imple...](https://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-
in-c/)

------
hyperpallium
> a "simple" language like SQL, Markdown

If you're happy to consider Markdown a language (which it is, just not a
_programming_ language), you could also consider a parser for a ( _subset_ of)
XML or JSON. It's a nice place to start, because the scope is very small, it
covers a subset of PL work (parsing), and the completed thing is actually
usable and useful.

Plus, it gets you a step on the path to a programming language - arguably, the
fundamental step.
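As a taste of the subset-parser exercise, here is a hedged sketch of a recursive descent parser for a tiny JSON subset (arrays and integers only; real JSON adds strings, escapes, objects, floats, and proper error handling):

```python
# Each parse_* function consumes a prefix of the input string and
# returns (value, remaining_input) -- the classic recursive descent shape.

def parse(src):
    value, rest = parse_value(src.strip())
    assert rest.strip() == "", "trailing garbage"
    return value

def parse_value(src):
    if src[0] == "[":
        return parse_array(src)
    i = 0  # otherwise: an integer; consume sign/digits
    while i < len(src) and (src[i].isdigit() or src[i] == "-"):
        i += 1
    return int(src[:i]), src[i:]

def parse_array(src):
    src = src[1:].lstrip()  # drop "["
    items = []
    while src[0] != "]":
        item, src = parse_value(src)
        items.append(item)
        src = src.lstrip()
        if src[0] == ",":
            src = src[1:].lstrip()
    return items, src[1:]  # drop "]"

print(parse("[1, [2, 3], 4]"))  # [1, [2, 3], 4]
```

The grammar functions mirror the grammar rules one-to-one, which is the habit that transfers directly to parsing a programming language later.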

I recommend a recursive descent parser, and this dragon compiler book
([https://en.m.wikipedia.org/wiki/Principles_of_Compiler_Desig...](https://en.m.wikipedia.org/wiki/Principles_of_Compiler_Design);
your local uni library has it). An early chapter gives a complete calculator
(arithmetic parser) in very simple and clear C code.

NB: the most recent dragon book uses horrific, over-complex,
architecture-astronaut, design-patterns OO Java code. OOP has its place, but
not like this. Not like this.

------
oldandtired
There is much good advice given by others here regarding the implementation of
languages. However, if you really want to get into language design, there is
one area that you need to understand (as far as I am concerned), and that is
how to specify the meaning of your language and its constructs. If this is not
done, you end up with too many gotchas and special cases in how you do things.

I have found too many language designers have not thought about this clearly
enough. When you go to actually use the language, you will suddenly find all
sorts of oddities in how you construct the program you are trying to write.
Undefined behaviour is a curse, implementation defined behaviour is a curse.

If your language allows you to write something and it passes the compiler,
then make sure that it is completely defined so that no matter what there are
no subtle surprises.

~~~
chrisseaton
> that is how to specify the meaning of your language and its constructs

You pose the question but don't seem to answer it - this area is called
_semantics_ , and there are several approaches and notations you can use.
Types and Programming Languages by Pierce is the most common starting point.

~~~
oldandtired
I am not posing a question. I am suggesting he look into the subject. The word
"semantics" doesn't hold much meaning for many people; they understand the
word "meaning" better. I could have dived straight in and suggested that he
look at "denotational semantics", but I considered that just a little too much
to start him off with.

By the way, how much are you doing on Katahdin these days?

~~~
chrisseaton
I haven't touched Katahdin in the last decade!

Some of the ideas live on in Truffle and Graal.

[http://chrisseaton.com/rubytruffle/pldi17-truffle/pldi17-tru...](http://chrisseaton.com/rubytruffle/pldi17-truffle/pldi17-truffle.pdf)

------
marianoguerra
Write lexers and parsers for data formats; JSON and EDN are two good starting
points.

Write a Forth or Lisp lexer, parser, and interpreter.

Then write a small language that compiles to a target, like JS; make sure your
language is close in semantics to the target language to avoid headaches.
------
nils-m-holm
You might have a look at "Write Your Own Compiler", which builds a full
compiler (emitting ELF) for a block-structured language in less than 1600
lines of code. It is a very short introduction (slightly more than 100 pages)
and yet covers many aspects of compiler writing. The compiler source code is
in the public domain.

Yes, I'm the author! There are other, more in-depth, compiler books on my
homepage: [http://t3x.org](http://t3x.org)

The home page also contains the source code to many compilers and interpreters
at different levels of complexity, from trivial to real-world.

------
hyperpape
There's a ton of great responses in this thread. Allow me to go a bit meta:
this is a big topic, and as you enter it, you'll eventually find out which
parts of it appeal to you. Do you care about parsers? Type theory? Garbage
collection? Code generation/optimization? Notation as a tool for thought?

Ever since I started programming, the idea of writing my own language was a
sort of "white whale". I have one idea for a language that I think could be
usable in a very particular domain. For that, I wrote a very primitive
tree-walking interpreter (terrible performance, very technically uninteresting).
The idea for the language was what I wanted to explore. I recently started
writing a regex->jvm bytecode compiler. That's much lower level, and if I
stick with it, performance will be what I optimize for. In neither case did I
do anything interesting with parsers, though I used to love learning about new
parsing techniques.

In that vein, see dfox's suggestion to "just use hand built recursive descent
parser and some set of ad-hoc lexical analysis subroutines". That's the
fastest way to get started, whether or not it'll be the best in the long term.

So, my advice is to explore and figure out which things you're interested in,
and what you want to accomplish. That will tell you where you want to invest
your time.

------
kartickv
You're right that writing a complete language like Java or Python is not a
small task, but that's partly because these languages have redundant ways of
doing the same thing. If you simplify them by eliminating redundancy while
retaining power, you'll have a much simpler implementation task. Here's a
language I designed for this: [https://kartick-log.blogspot.com/2016/04/why-arent-there-
sim...](https://kartick-log.blogspot.com/2016/04/why-arent-there-simple-
programming.html) If you want to learn language implementation rather than
design, you can implement this language as is.

Another option is to build a VM, that accepts as input byte in a text (not
binary) form. Less efficient, but easier to implement, and you can write your
bytecode by hand instead of also building a compiler to do so. Like object-
oriented assembly language, if you will. If you're interested, I'll scan and
share my dissertation report with you.
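A text-form bytecode VM of the sort described above can be surprisingly small. Here's a minimal sketch in Python with an invented instruction set (the opcode names are illustrative only, not taken from the dissertation):

```python
# A minimal stack-based VM whose "bytecode" is plain text, one
# instruction per line. All opcode names here are invented for
# illustration.

def run(program):
    stack = []
    for line in program.strip().splitlines():
        op, *args = line.split()
        if op == "push":
            stack.append(int(args[0]))
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "print":
            print(stack[-1])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

# "5 * (1 + 2)" written by hand as text bytecode:
result = run("""
push 1
push 2
add
push 5
mul
""")
```

Because the bytecode is plain text, you can write programs by hand in any editor, which is exactly what makes this a gentle first step before building a compiler that emits it.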

You can also implement a domain-specific language. Did you, in the past, feel
that a task you were doing would be simpler if there was a custom language for
it? If so, maybe you can try building that DSL and then using that to make
your main task easier.

------
corysama
Speaking from daydreams and not experience, I’ve always thought the
combination of [http://leafo.net/guides/parsing-expression-
grammars.html](http://leafo.net/guides/parsing-expression-grammars.html) and
[http://terralang.org](http://terralang.org) would be a great toolset on which
to build a language.

------
stiff
Implementing an evaluator for infix parenthesized arithmetic expressions, like
"5*(1+2)+60", is a good starting point.

You can begin with writing a tiny lexer and a recursive descent parser that
simply writes out the value of an expression that the user typed in. Then you
can make it generate assembly for computing the expression instead (hardcoding
some further assembly to print the result out at the end of the execution of
the program), turning it into either a traditional compiler or a JIT compiler.
Once you do that, it should not be too hard to see how to add variables and
simple loops. This will already demystify things quite a bit and make further
learning easier.
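The first step described here, a tiny lexer plus a recursive descent parser that computes the value directly, fits on a page. A rough Python sketch (before any assembly generation):

```python
# Hand-written recursive descent evaluator for "5*(1+2)+60"-style input.
# Grammar: expr -> term (('+'|'-') term)*
#          term -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'

import re

def tokenize(src):
    # The lexer: numbers, parens, and the four operators.
    return re.findall(r"\d+|[()+\-*/]", src)

def evaluate(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def next_token():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            next_token()          # consume '('
            value = expr()
            next_token()          # consume ')'
            return value
        return int(next_token())  # NUMBER

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if next_token() == "*":
                value *= factor()
            else:
                value //= factor()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if next_token() == "+":
                value += term()
            else:
                value -= term()
        return value

    return expr()
```

From here, replacing "compute the value" with "emit instructions" at each step is exactly the move that turns this into a compiler.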

I specifically recommend implementing your own parser and generating assembly
on your own, at least if your primary goal is learning. You could instead use
an existing parser generator and e.g. a library for generating JVM bytecode,
but those ready-made components are high level enough to turn it all into the
usual exercise in plumbing libraries together, rather than a learning
experience across many layers of the computing stack.

------
truth_be_told
IMO, the best example of designing and implementing a "small" language can be
seen in Chapter 8 of "The Unix Programming Environment by Kernighan & Pike"
where they walk you through the construction of "hoc" (higher order
calculator) using the standard compiler construction tools. The example is
realistic, not too small nor too big and every step is explained. The
exposition is very clear and you really can "get" how languages are designed
and implemented. This should be the first thing you should read :-)

If you have a specific problem, you can then write a small DSL for it. I once
had the opportunity to write a small "test language" for the sub-protocols of
H.323. These protocols are specified using SDL
([https://en.wikipedia.org/wiki/Specification_and_Description_...](https://en.wikipedia.org/wiki/Specification_and_Description_Language))
and I just converted the tokens in the spec directly into my language and put
a simple markdown syntax around it. The tests simply exercise the protocol
FSM.

Eg:

    
    
       <H245>
          <Test1>
             <Prestimuli>
                Var1=Val1         //these are assignments
                Var2=Val2
             </Prestimuli>
             <Stimuli>
                MsgFromNetwork=<msg name>  //trigger to FSM
             </Stimuli>
             <Poststimuli>
                 Var1=Val1.1       //these are assertions
                 Var2=Val2.1
             </Poststimuli>
          </Test1>
       </H245>
    

Though simple, this worked nicely, allowing me to push off the actual writing
of the test cases to non-programmers, who just had to convert the SDL diagrams
from the spec to text as given above.

------
iafisher
Build Your Own Lisp
([http://buildyourownlisp.com/](http://buildyourownlisp.com/)) is as good a
starting point as any. You learn to implement a high-level dynamic programming
language in C, which is a common choice of implementation language for
interpreters (e.g., CPython, Ruby). I really can't recommend the book enough.

------
annywhey
In phases, this is roughly how I learned:

1. Cute hacks like using the eval function in Python and rewriting the string
before passing it in. This is basically metaprogramming by macro-processing
and a good stepping stone to thinking about what a programming language really
accomplishes.

2. Writing various Forth-like and Lisp-like languages. This let me explore a
working system without spending too much time on parsing - I could try
interpreting the AST directly or compiling it to an intermediate language.

3. Gradually learned that a lot of the features and technologies in general
purpose programming languages aren't things I want to care about (how to
implement arithmetic, perform variable assignment, perform typechecking, etc.).
Focused on data languages like JSON and small supplemental languages for e.g.
scheduling concurrency.

4. Found a useful model for prototyping semantics before adding a syntax, by
envisioning the whole language in the form of a state machine that builds
expressions one API call at a time, then compiles/executes the result with a
"commit" call or similar. This is a practical technique for many domain
specific problems, and makes the necessary syntax emerge naturally.
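Phase 4 can be sketched as a builder object whose method calls construct the program and whose commit() executes it. A toy query language in Python (all method names here are invented for illustration):

```python
# "Semantics before syntax": the whole language is a builder object,
# one API call per construct, executed only on commit().

class Query:
    def __init__(self, rows):
        self.rows = rows
        self.filters = []
        self.projection = None

    def where(self, predicate):
        self.filters.append(predicate)
        return self            # chaining = statement sequencing

    def select(self, *fields):
        self.projection = fields
        return self

    def commit(self):          # "compile/execute the result"
        out = [r for r in self.rows if all(f(r) for f in self.filters)]
        if self.projection:
            out = [{k: r[k] for k in self.projection} for r in out]
        return out

people = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]
result = Query(people).where(lambda r: r["age"] > 40).select("name").commit()
```

Once an API like this feels right, a surface syntax is mostly a matter of parsing text into the same sequence of calls.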

A lot of what goes into making a useful production language isn't really on
the theoretical side, but on finish-and-polish stuff: fast builds, helpful
docs and error messages, well-rounded libraries, debugging tools, ease of
deployment and so on. Learning this through the process of writing small PLs
helped change how I viewed language selection to be more like picking any
other application, vs a point of religion: choosing something with
approximately the right features and tooling so that the groundwork is easy,
and then magnifying that leverage by adding a bit of custom syntax, static
checking, code generation, what-have-you on top. Turning the tech into an
airtight abstraction generalized across a whole language is a ton of work, but
getting 80% of it in for the specific app takes a fraction of the time in
comparison.

------
parthdesai
I'm going through this book right now; I find it to be pretty good!
[https://www.amazon.ca/Writing-Interpreter-Go-Thorsten-
Ball-e...](https://www.amazon.ca/Writing-Interpreter-Go-Thorsten-Ball-
ebook/dp/B01N2T1VD2/)

------
ewright
I have ADHD that makes it nearly impossible for me to learn how to code, and
thus I've never been able to perfect it. I know bits and pieces here and
there, but I can tell you what usually works best for me for learning complex
tasks.

1. You need to actively set aside time each day for this task.

2. Leave time for this to last a long time, two months, minimum.

3. Plan every step out.

4. Repeat, repeat, REPEAT!! As in, repeat your training tasks. People have to repeat something seven times to burn it into permanent memory, and you need to remind yourself of it at least once a month for a year, then once every six months after that.

If you do those four things, which are surely no easy task, you should be an
excellent programmer in no time. Good luck :)

------
nybblesio
I'm writing a programming language from scratch called Basecode. It's very
early days yet. The bootstrap compiler (written in C++) is about three months
away from being complete. Then I'll rewrite the compiler in Basecode itself to
make it self-hosted. After the language is self-hosted, there's a second level
of features I'll be adding that I think are quite exciting.

I've captured the entire process on video:
[https://www.youtube.com/watch?v=0_PX2Ue6F_A&list=PLWMUVtnFsZ...](https://www.youtube.com/watch?v=0_PX2Ue6F_A&list=PLWMUVtnFsZu6YAU-
Fwva_Raz6-IbPd-2U)

The YouTube playlist is an archive of my daily streams on Twitch.

------
ryanmccullagh
I spent 3 years writing como-lang
[https://github.com/analang/ana](https://github.com/analang/ana) I didn’t have
a compilers course at my college. I think the first thing you should
understand is how a language works. The first part is the syntax and grammar.
It would also be very useful to know how assembly works; from there it's a
short step to stack machines, which are very simple yet powerful machines
capable of representing expressions in a linear way. Feel free to email me if
you have any questions! I’d love to help you get on the right track.
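The stack-machine point is easy to see in code: a nested expression like 5*(1+2) flattens into the linear postfix sequence "5 1 2 + *", which a loop and a stack can evaluate. A minimal sketch in Python:

```python
# Evaluate an expression written linearly in postfix (stack machine) order.

def eval_postfix(tokens):
    stack = []
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    for tok in tokens.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # pop operands, rightmost first
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))            # a number: push it
    return stack.pop()
```

The same loop, with "emit an instruction" in place of "compute a value", is the skeleton of a bytecode interpreter.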

When I was learning, my only resources were books, google, and PHP-src and
CPython.

------
ecesena
I've been working for a while on a sort of language for writing highly
optimized cryptographic code, mostly elliptic curve cryptography where there's
a very elegant mathematical description, but when you try to optimize every
single detail manually it's easy to get very dirty code.

The idea was then to have a language that maintains the elegance and clarity,
and a compiler that generates, in my case, C code. This let me focus on the
"core" of a compiler, without too much emphasis on parsing, tons of
optimizations, corner cases.

So this would be my advice. Pick a domain, maybe focus on getting highly
optimized code, design a small language and build your first compiler.

------
doombolt
This awesome essay by Steve Yegge: [http://steve-
yegge.blogspot.com/2007/06/rich-programmer-food...](http://steve-
yegge.blogspot.com/2007/06/rich-programmer-food.html)

------
chooseaname
Algorithms + Data Structures = Programs by Wirth

Specifically Chapter 5 - Language Structures and Compilers

------
codebudo
Write a DSL. This is really easy in languages like Ruby that don't need
parentheses around function parameters. You can write a Ruby DSL in a few
hours and expand its syntax and features as you like.

The next step is to actually parse code with an abstract syntax tree. The easy
path here is actually javascript. There are many transpilers for JS so things
like bluebird/harmony can work.

Next step is to do things "the right way"(tm) with yacc/bison and build a
compilable language. For that, I suggest you read a few books on languages and
understand concepts such as Backus-Naur form before continuing.

------
picometer
If at any point you find yourself looking for parser generators, Nearley JS is
a fun tool for experimenting with lightweight grammars. (Emphasis on
lightweight - it's a javascript library.) It has good docs as well. If you're
interested in the Earley algorithm, you can run it in a nodejs shell and look
under the hood at the data structure, step-by-step as it scans the string.

[https://nearley.js.org/](https://nearley.js.org/)

------
e12e
Lots of great advice here. In addition to "use prolog" and "play with
ometa/ohm", you might want to consider looking at an ML, like OCaml (used
for bootstrapping Rust) or Standard ML (SML).

Another approach is to look at RPython, the restricted Python subset used for
PyPy; see for example "other languages" on:

[https://www.pypy.org/features.html](https://www.pypy.org/features.html)

------
mark_l_watson
I am working on my first programming language. I decided to use Racket and I
am using the book Beautiful Racket and miscellaneous blog articles to guide
me.

I started out just working on a set of linked/cooperating libraries and then
decided to offer both the libraries and a custom language. I am writing
something for my own use but when I release the MVL, I obviously also hope to
get users and pull requests.

------
svilen_dobrev
languages.. are everywhere. You keep making them even if you do not notice.
Say, a function with 3 arguments is a small language.. just missing all the
fuss, checks and generality. A repeating data structure that feeds some
switch-decision tree is another language.. an implicit one; essentially the
code is an interpreter.
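That implicit-interpreter idea in concrete form: a list of tuples is the "program", a dispatch table is the "semantics", and the loop is the interpreter (the instruction names are invented for illustration):

```python
# A repeating data structure feeding a dispatch table: the data is a
# program, and this loop is its interpreter.

def interpret(program, env):
    handlers = {
        "set":  lambda env, k, v: env.__setitem__(k, v),
        "incr": lambda env, k, n: env.__setitem__(k, env[k] + n),
    }
    for instr, *args in program:
        handlers[instr](env, *args)
    return env

env = interpret([("set", "x", 1), ("incr", "x", 41)], {})
```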

But it's much better to make languages explicit. And there's no need to invent
new grammars and parsers.. keep those for real extreme cases.

i express my languages in python, it's easiest to use as a base and build
semantics on top (and even syntax). One can just use function/class-based
declarations, of domain-specific notions, and then interpret them any way one
wants. Or one can even turn function-calls into "statements", and
"with"-construct into "blocks-of-statements", etc. Gives you strict debuggable
proto-syntax, on which to build your own "syntax"/semantics/error-
checking/reporting.
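A sketch of the with-construct-as-blocks idea in Python: context managers give you nesting for free, and the "statements" are just function calls (the `block`/`stmt` names and the markup-ish output are invented for illustration):

```python
# "with"-blocks as the block structure of a proto-language: here they
# emit an indented markup-like tree, but they could just as well build
# an AST for later interpretation.

from contextlib import contextmanager

output = []
depth = [0]   # mutable cell so nested functions can update it

@contextmanager
def block(name):
    output.append("  " * depth[0] + f"<{name}>")
    depth[0] += 1
    yield
    depth[0] -= 1
    output.append("  " * depth[0] + f"</{name}>")

def stmt(text):
    output.append("  " * depth[0] + text)

with block("Test1"):
    with block("Prestimuli"):
        stmt("Var1=Val1")
```

Because it's still Python, you get scoping, debugging, and error reporting on the proto-syntax without writing a parser.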

but anyway, try to see/feel the Language behind the syntax/implementation.
have fun

------
chaoticmass
Probably the greatest amount of growth I've had in my lifelong programming
journey was during the years when I was working on making my own scripting
language. I think the skills I learned have shaped how I approach other
programming projects. Think debugging is hard? Try debugging why your script
interpreter isn't running a given script correctly.

------
barrkel
Jack Crenshaw, all the way. Parser for sure: go recursive descent. Don't mind
386 assembler - generate stack machine code.

------
knicholes
For my capstone class for my CS BS, we had to write a compiler that compiled
to machine code that the virtual machines we wrote could run. I can't remember
the name of the book, but we called it "The Dragon Book." Looking in the other
comments, it looks like someone already linked it.

------
axilmar
There is a nice tool called "The Gold Parser Builder" which is free and can
help you design a programming language.

[http://www.goldparser.org/](http://www.goldparser.org/)

------
mr_toad
The simplicity of a language doesn’t mean it will be easy to write a parser
for. The grammar of SQL is actually quite lengthy. As others have suggested a
‘harder’ language like Lisp might be a better place to start.

------
icemaze
I found this amazing site (craftinginterpreters.com) and made a (barebones)
language in a little more than a week. I used Instaparse instead of a custom
scanner and parser to save time.

------
quickthrower2
I think there is value in just writing an interpreter, figuring it out as you
go along. It won't be optimal, but it will bring to mind the questions you
want to ask to make it better.

------
kevinSuttle
Some good answers here:
[https://news.ycombinator.com/item?id=15171238](https://news.ycombinator.com/item?id=15171238)

------
charlysl
If you haven't yet, you may enjoy reading "Concepts, Techniques, and Models of
Computer Programming" [1]. The reason is that it is chock-full of interesting
ideas:

- kernel language, which is not a practical language, but contains all the
concepts for building one of a given model

- growing said kernel language one concept at a time, when its expressiveness
is stretched

- the case for multi-paradigm languages, with a real example throughout the
book (Oz)

- and I mean multi-paradigm (although the author dislikes this word): first
declarative, then with concurrency, then with ports, then with explicit state,
then object oriented with and without concurrency (which in fact isn't really
an extension of the kernel language), then relational ... all being simple but
very careful extensions to that kernel language

- the case for each concept, which results in an extension to the language,
is very carefully explained and many alternatives are considered, and much
justification is given to why it would improve the language; if you are
considering designing your own language, you could do worse than learning from
the author's carefully described approach

- each concept results in an addition to the grammar, the semantics of which
are described in detail; they are also described formally in an appendix

- whenever a new programming model is described (which is basically a set of
concepts), an archetypal example is discussed: Haskell (concurrent
declarative), Erlang (message passing), Java (concurrent OO), Prolog
(relational)

- techniques for programming with each concept are introduced

- each chapter has exercises at the end, several of which consist of doing
your own extensions to the language; some of these are research level

- as an example of how deep but accessible this book is, check the "Principal
programming paradigms" poster [pdf] [2]

- the extensive bibliography at the end is fantastic

And since I believe that it hasn't been mentioned in the comments, and in case
you don't know yet, check out "Lambda the Ultimate", _the programming
languages weblog_ [3]

[1]
[https://www.info.ucl.ac.be/~pvr/book.html](https://www.info.ucl.ac.be/~pvr/book.html)

[2]
[https://www.info.ucl.ac.be/~pvr/paradigmsDIAGRAMeng108.pdf](https://www.info.ucl.ac.be/~pvr/paradigmsDIAGRAMeng108.pdf)

[3] [http://lambda-the-ultimate.org/](http://lambda-the-ultimate.org/)

------
unilynx
Decide first whether your language will be case-sensitive or case-insensitive.
Then regret your choice forever.

------
carapace
This is going to sound "out of left field" perhaps, but _use Prolog_.

I've been exploring exactly this, compiling with Prolog, this past month or so
and I'm simply blown away. It's so simple, elegant, and easy. It will be
faster and easier for you to learn the basics of Prolog and then learn how to
write compilers using it, than to learn to write compilers in another language
(even a language you already know.) Prolog is an extremely simple language
despite its incredible power. It's very different from imperative/functional
languages so it might take a minute to wrap your head around it, but then
you're operating on a higher plane.

Read "Logic Programming and Compiler Writing" by David Warren
[http://sovietov.com/tmp/warren1980.pdf](http://sovietov.com/tmp/warren1980.pdf)
But beware: it's dated. Or, if you can get a copy of "The Art of Prolog" you
can read Chapter 24, which covers the same concepts but has more modern code.
Specifically, you're going to want to use Definite Clause Grammars (DCG):

[https://en.wikipedia.org/wiki/Definite_clause_grammar](https://en.wikipedia.org/wiki/Definite_clause_grammar)
[http://www.pathwayslms.com/swipltuts/dcg/index.html](http://www.pathwayslms.com/swipltuts/dcg/index.html)
[https://www.metalevel.at/prolog/dcg](https://www.metalevel.at/prolog/dcg)
[http://www.amzi.com/manuals/amzi/pro/ref_dcg.htm](http://www.amzi.com/manuals/amzi/pro/ref_dcg.htm)

Study "Toy compiler in Prolog"
[http://faculty.cooper.edu/smyth/cs225/ch7/prolog.htm](http://faculty.cooper.edu/smyth/cs225/ch7/prolog.htm)
And maybe this:
[https://github.com/fusiongyro/dailyprogrammer/tree/master/20...](https://github.com/fusiongyro/dailyprogrammer/tree/master/2013/05/22-halt-
simulation)

I can vouch for this. I wrote a parser for Oberon in two days. I wrote an
assembler for the RISC CPU used in Project Oberon in another two days. Both of
them are no longer than two pages and the code is essentially just grammar
rules. I haven't (yet) written the bit in-between that translates the AST from
the parser into assembly code for the chip, but only because I have other fish
to fry. But I can see how to do it and it would be very easy. The compiler is
described in the Project Oberon book if you want to attempt that language.

I'm implementing a compiler for Joy on top of the assembler and it's been like
a dream. Because the code is so simple, and the editor I'm using has an
extension that uses the Prolog interpreter to lint as I go, it's actually
_hard to introduce bugs._ I write the asm code then abstract patterns into the
compiler to refactor and (lightly) abstract the raw asm into what are
effectively AST nodes. But there is no language above this. I think I have to
write a blog post called "Compiler for No Language" or something. Eventually,
I am going to abstract my way to "atomic" nodes that represent Joy functions
and then I'll have a Joy compiler. But in the meantime, I'm floating around in
a semantic sea between parsing and assembly, and it's warm and tropical and
you would not believe the fish I've seen.

I have to conclude that anyone who is writing any sort of compiler and NOT
using Prolog is working way too hard.

I've been programming in Python for twenty years, happy as a clam, and now you
couldn't pay me enough to NOT use Prolog.

"These are the words of VAL. They are true. Hear them."

~~~
carapace
In fact, it would actually be easier to _implement Prolog_ and then implement
a compiler in it, than to implement a compiler directly in e.g. C, Java, Lisp,
etc.

------
fusiongyro
Understanding how computers work is the reason that compilers is the capstone
class in many CS programs. It unites automata with algorithms and machine
code, bringing everything together in one program. I'm pretty sure I wouldn't
have gotten through this exercise myself outside college.

A lot of the resources out there are going to assume you have one or two sides
of this triangle already and are ready to bring it all together. If you don't,
you'll probably feel like you're in over your head.

I would recommend checking out "Let's Build a Compiler". It's a bit old-
fashioned but at least things are pretty simple.
[https://compilers.iecc.com/crenshaw/](https://compilers.iecc.com/crenshaw/)

Different flavors of Markdown are not a bad place to start. Try and find
something with a real syntax. As I recall, the original/official version is
defined by a Perl program and has no true grammar. But other variants probably
do.

SQL is not simple. Just the parser for PostgreSQL's dialect was around 1500
lines of code, last I checked. Although, if you did a small subset, maybe it
wouldn't be so bad.

I really like the book _Semantics with Applications_ which discusses the
meaning of different programming languages. When I was starting out, it seemed
to me like there was a lot riding on the syntactic differences between
languages like C and Pascal. Now having seen the fringe, a lot of those
differences are not as significant to me as they were before. This book helps
you drill into what makes some languages different below the surface, and how
to model those differences formally.

I think it's eminently possible to make a small language, like "While", in a
few pages of code. I would start with a very simple unambiguous grammar, and a
little interpreter for it, but if you really want the full experience,
generate assembler. This paper sort of presents the language I'm talking
about, but there is a lot here about type theory and semantics that may not be
interesting to you, so just ignore that.
[https://arxiv.org/pdf/1603.08949.pdf](https://arxiv.org/pdf/1603.08949.pdf)
When I took compilers ~15 years ago, we generated code for nasm because gas
syntax was too weird for us. Anyway.

Another approach that looks interesting to me is "many small steps":
[http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf)
The idea here is to start with a program that only knows about integers, but
handle everything end-to-end. Then you slowly add features one-by-one, but in
an end-to-end fashion. I think this would be a lot easier to iterate and it
hearkens back to Crenshaw's approach.
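Ghuloum's first step is striking in how small it is: the entire source language is a single integer literal, yet the compiler is end-to-end, emitting real assembly. A sketch in Python (AT&T-syntax x86-64 output; the entry-point label is arbitrary):

```python
# Step one of an incremental compiler: the whole input language is one
# integer literal, compiled end-to-end to assembly text.

def compile_program(source):
    value = int(source)        # the entire "frontend"
    return "\n".join([
        "    .globl scheme_entry",
        "scheme_entry:",
        f"    movq ${value}, %rax",   # return value goes in %rax
        "    ret",
    ])

asm = compile_program("42")
```

Every later step (immediates, unary primitives, binary operators, ...) grows this function while keeping the whole pipeline runnable and testable, which is what makes the approach easy to iterate on.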

A lot of compiler construction material focuses on Lisp and Scheme. This is
because it sort of hides or ignores the parsing problem. You will probably
have to learn some other languages to build a compiler; it wouldn't hurt to
know some Lisp or Scheme to be able to access these resources. A lot of people
build a Lisp-like language as their first for this reason too. Just something
I'm pointing out.

Anyway, it's a noble goal and best of luck!

