
Ask HN: How to learn to write a compiler and interpreter? - lala_lala
Greetings,

I am in my late 30s without a formal CS education. I write software for a living and have this itch to learn to write a compiler / interpreter.

Do I have to take any courses (e.g. Discrete Maths, Automata Theory) in order to do it? If so, do you have recommendations for those as online courses?

I really want to do this in 2019, but the thought that I have to study Maths, then Automata, maybe OS, then Compilers sounds like a long path.

Is there anyone who has learned to write a compiler / interpreter without a CS background? How did you do it?
======
stephen82
I would highly recommend you start with this book first
[https://interpreterbook.com](https://interpreterbook.com) and then feel free
to read Nystrom's online book as ablerman suggested.

Then you can read a more formal CS book for the sake of covering the necessary
theory.

Another suggestion would be to check the source code of various SQL databases,
such as SQLite3 and PostgreSQL.

Also, I would suggest reading lots and lots of source code on GitHub.

Here is a short list of suggestions:

    
    
      * https://github.com/lemon-lang/lemon
      * https://github.com/d5/tengo
      * https://github.com/wren-lang/wren (by Robert Nystrom)
      * https://github.com/tj/luna (by TJ Holowaychuk, creator of Express.js)

------
WaltPurvis
I cannot recommend Thorsten Ball's books highly enough. They're really
fantastic, step-by-step tutorials, and you certainly don't need a CS degree to
understand what he's doing.

_Writing An Interpreter In Go_: [https://amzn.to/2FYwwiQ](https://amzn.to/2FYwwiQ)

_Writing A Compiler In Go_: [https://amzn.to/2sLilph](https://amzn.to/2sLilph)

------
ablerman
Robert Nystrom has been putting an online book together. I’ve found it really
clear so far.

[http://craftinginterpreters.com/](http://craftinginterpreters.com/)

~~~
lala_lala
Does it require any background in CS?

~~~
snazz
You can probably pick the CS and math up as you go. I haven’t gone through
that resource but it looks interesting and useful.

------
shoo
Graydon Hoare's "One-Day Compilers" presentation:
[http://www.venge.net/graydon/talks/mkc/html/mgp00001.html](http://www.venge.net/graydon/talks/mkc/html/mgp00001.html)

    
    
      > goal: translate subset of makefile syntax into native executables
    
      result: pipeline to compile subset of makefile syntax into native executables, in less than 400 lines of code
    
      > including
      > * lexing and parsing
      > * variable binding
      > * type checking
      > * error reporting
      > * native code generation
    
      - use custom camlp4 pre processor
      - define Ocaml types and functions to model things found in input makefile (variable, rules, dependencies). leverage Ocaml semantics 
      - parse input simple makefile language
      - map parsed input into AST for Ocaml functions defined earlier
      - use ocaml's quotation machinery to define ocaml AST values from standard ocaml syntax, without building AST values by hand 
      - bundle everything into an ocaml program
      - ocaml program produces single ocaml value that models the input program described by makefile
    
      - translate ocaml value into C code:
      - generate C code from templated quotations, again using ocaml quotation support, but for C language, not ocaml
      - compile generated C code with gcc
    
      - glue everything together into one pipeline

------
ksherlock
If you're familiar with recursion and basic data structures, that's probably a
sufficient background. Toy compilers aren't that difficult when you break them
down into parts. (Real world languages end up with warts and special cases).

You have a text file. You break it up into tokens (numbers, strings, keywords,
operators, etc.). You assemble the tokens into a structured tree (statements,
expressions, etc). Then you walk the tree and generate code (for a compiler)
or execute it (for an interpreter). That's it. Really.
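
A minimal sketch of that pipeline, assuming a toy language of integer arithmetic with `+ - * /` and parentheses. The function names `tokenize`, `parse` and `evaluate` are made up for illustration, and the tree is just nested tuples. It shows the interpreter half only: the last step executes the tree instead of generating code.

```python
import operator
import re

# One token is either a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def tokenize(text):
    """Step 1: break the text into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Step 2: assemble tokens into a tree such as ('+', 2, ('*', 3, 4))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():  # expression := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | '(' expression ')'
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def evaluate(node):
    """Step 3: walk the tree and execute it."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(parse(tokenize("2 + 3 * 4"))))
```

Splitting the grammar into `expression`, `term` and `factor` is one common way to get operator precedence right; it is a design choice of this sketch, not something the comment prescribes.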

Every other week, somebody posts a tutorial to HN about writing a lisp
interpreter in javascript. or go. or python. or rust. or lisp. If you can't
wait, Let's Build a Compiler by Jack Crenshaw is an oldie but a goodie.

[https://compilers.iecc.com/crenshaw/](https://compilers.iecc.com/crenshaw/)

~~~
shoo
i also heartily recommend Crenshaw's "let's build a compiler" if you want some
guidance and motivation to help you just roll up your sleeves and start
building a bare-bones but functional compiler.

> Maths, then Automata, maybe OS,

i don't think any of these are a requirement for building a simple compiler.

after reading Crenshaw some years ago, i was motivated to just knuckle down
and start building a compiler for brainfuck (yes, a toy compiler, entirely
pointless). this was probably one of the most enjoyable periods of programming
i can remember.

i think my descent into fun with compilers went something like:

* "let's implement a compiler for a simple language (brainfuck) to something low level (assembler) in a high-level language i am familiar with (python)"

* ok, that was easy, now what?

* "self-hosting compilers are cool. can i write a brainfuck compiler in brainfuck?"

* "this is horrifying"

* "OK, what if i defined a simple language that is much easier to do useful work, but could be readily compiled to brainfuck? i could build a compiler from that new language to brainfuck"

If you have a bit of spare time i heartily recommend (i) learning to implement
a basic compiler, and (ii) building the compiler in or for a language that has
a radically different concept to languages you are more familiar with.
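
To give a feel for the first step in the list above, here is a rough sketch of a Brainfuck compiler written in Python. Note the swap: it targets Python source text rather than assembler, purely so the output can be run without an assembler toolchain. The names `compile_bf` and `load` are invented for this example, and the 30000-cell tape with 8-bit wrapping cells is a conventional assumption, not something taken from the abfc project.

```python
def compile_bf(source):
    """Translate Brainfuck into the text of a Python function run(input_bytes)."""
    lines = [
        "def run(input_bytes=b''):",
        "    tape, ptr, out = [0] * 30000, 0, bytearray()",
        "    inp = iter(input_bytes)",
    ]
    simple = {
        ">": "ptr += 1",
        "<": "ptr -= 1",
        "+": "tape[ptr] = (tape[ptr] + 1) % 256",
        "-": "tape[ptr] = (tape[ptr] - 1) % 256",
        ".": "out.append(tape[ptr])",
        ",": "tape[ptr] = next(inp, 0)",  # assumption: end of input reads as 0
    }
    indent = 1
    for ch in source:
        if ch in simple:
            lines.append("    " * indent + simple[ch])
        elif ch == "[":
            lines.append("    " * indent + "while tape[ptr]:")
            indent += 1
        elif ch == "]":
            if indent == 1:
                raise SyntaxError("unmatched ']'")
            # 'pass' keeps an empty loop body such as "[]" valid Python
            lines.append("    " * indent + "pass")
            indent -= 1
        # any other character is a comment in Brainfuck and is skipped
    if indent != 1:
        raise SyntaxError("unmatched '['")
    lines.append("    return bytes(out)")
    return "\n".join(lines)

def load(python_source):
    """Turn the generated source text into a callable."""
    namespace = {}
    exec(python_source, namespace)
    return namespace["run"]

program = "++++++++[>++++++++<-]>+."   # 8 * 8 + 1 = 65, the code for 'A'
print(compile_bf(program))
print(load(compile_bf(program))())
```

One limitation of this particular target: Python rejects very deeply nested blocks, so heavily nested Brainfuck loops would not compile this way. A real assembler backend would use labels and jumps instead.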

[https://github.com/fcostin/abfc](https://github.com/fcostin/abfc)

------
faical
I designed a programming language and wrote an interpreter for it [1] for fun
without a formal CS education. I started a series of articles [2] to teach
people everything they need to do the same. Give it a try :).

[1] [https://github.com/ftchirou/blink](https://github.com/ftchirou/blink)

[2] [https://hackernoon.com/lets-build-a-programming-language-261...](https://hackernoon.com/lets-build-a-programming-language-2612349105c6)

(P.S. Feel free to reach out to me [github_username] at gmail.com)

------
tudelo
You may be interested in this
[https://beautifulracket.com/stacker/intro.html](https://beautifulracket.com/stacker/intro.html).
At the end you have a small interpreter that can handle math. There are
further tutorials on
[https://beautifulracket.com](https://beautifulracket.com) to expand it. This
is probably the most approachable option.

------
PhilWright
You can get started on the compiler/interpreter specific learning as long as
you already know a programming language and basic data structures. You do not
need any math skills.

1) Lexical Analysis (Tokeniser)

The first step of your compiler takes in free form text and turns it into a
set of tokens that make the rest of the process much easier. To understand
this you should learn about Regular Expressions and Finite State Machines.

2) Syntax Analysis (Parsing)

Processing the above tokens and ensuring the input has the valid syntax of a
program is your next step. You need to learn about Context-Free Grammars,
Backus-Naur Form and Abstract Syntax Trees.

3) Semantic Analysis

Just because the input has valid syntax does not mean it makes actual sense.
Here you need to learn about Symbol Tables and Type Checking. The specifics
are very dependent on the actual language you are dealing with.

4) Code Generation

A compiler generates code, whereas an interpreter executes the Abstract Syntax
Tree operations directly. Again, the specifics are very dependent on the actual
language you are dealing with and the target. Start by looking into Code
Generation, Code Optimizations and Assembly Code.
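
Steps 1, 2 and 4 are what most tutorials show, so here is a small sketch of step 3 only. It assumes a made-up tree format (tuples like `("assign", name, expr)`, `("var", name)`, `("add", left, right)`) and a hypothetical `check` function; none of this comes from a particular course or book. The symbol table is a plain dict from variable name to type.

```python
def check(program):
    """Return a list of semantic errors for a list of ('assign', name, expr) nodes."""
    symbols = {}   # symbol table: variable name -> "int" or "str"
    errors = []

    def type_of(expr):
        kind = expr[0]
        if kind in ("int", "str"):          # literal: ("int", 3) or ("str", "hi")
            return kind
        if kind == "var":                   # variable use: ("var", "a")
            name = expr[1]
            if name not in symbols:
                errors.append(f"undefined variable {name!r}")
                return None
            return symbols[name]
        if kind == "add":                   # addition: ("add", left, right)
            left, right = type_of(expr[1]), type_of(expr[2])
            if left is None or right is None:
                return None                 # an error was already reported
            if left != right:
                errors.append(f"cannot add {left} and {right}")
                return None
            return left
        raise ValueError(f"unknown node kind {kind!r}")

    for _, name, expr in program:
        result = type_of(expr)
        if result is not None:
            symbols[name] = result          # record the variable's type
    return errors

program = [
    ("assign", "a", ("int", 3)),
    ("assign", "b", ("add", ("var", "a"), ("str", "x"))),   # syntactically fine, but int + str
    ("assign", "c", ("var", "d")),                           # d was never assigned
]
print(check(program))
```

Both bad statements would parse without complaint; it takes the symbol table and a type rule to notice that they make no sense.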

I recommend an online, free course, like the following from Stanford that is
self-paced and results in a compiler that generates assembly level
instructions for test running in an emulator.

[https://lagunita.stanford.edu/courses/Engineering/Compilers/...](https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about)

------
ryanmccullagh
@stephen82 has great references. In particular, wren and luna are references I
used when I created my language como-lang-ng:
[https://github.com/rmccullagh/como-lang-ng](https://github.com/rmccullagh/como-lang-ng)
You do not need Discrete Math or Automata Theory to learn how to write a
compiler and interpreter.

I would recommend first figuring out what you want your language to look
like (the syntax). Then you can define a grammar for your language and
research parsers. From parsing, you will be led to Abstract Syntax Trees
(the most modern way to get from text file to bytecode). After that you'll
need to compile the abstract syntax tree (which you would have built in the
parser). Compiling the AST is typically done by targeting your own simplified
instruction set.
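
As a rough illustration of "compile the AST to your own simplified instruction set", here is a sketch with a tiny stack machine. The opcode names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) and the functions `compile_expr` and `run` are invented for this example; they are not como-lang-ng's or CPython's. The AST is assumed to be nested tuples like `("+", 2, ("*", "x", 4))`, with strings standing for variable names.

```python
import operator

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def compile_expr(node, code=None):
    """Flatten an expression tree into a list of (opcode, argument) pairs."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(("PUSH", node))        # constant
    elif isinstance(node, str):
        code.append(("LOAD", node))        # variable lookup
    else:
        op, left, right = node
        compile_expr(left, code)           # operands first...
        compile_expr(right, code)
        code.append((OPCODES[op], None))   # ...then the operator (postfix order)
    return code

def run(code, variables=None):
    """A minimal virtual machine that executes the instruction list."""
    variables = variables or {}
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(variables[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[opcode](left, right))
    return stack.pop()

bytecode = compile_expr(("+", 2, ("*", "x", 4)))
for instruction in bytecode:
    print(instruction)
print(run(bytecode, {"x": 3}))
```

Because every instruction works on the stack, the compiler never has to decide where a value lives, which is a large part of why this kind of target is popular for a first language.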

For an example instruction set, check out Python's header file at
[https://github.com/python/cpython/blob/master/Include/opcode...](https://github.com/python/cpython/blob/master/Include/opcode.h)

------
kazinator
Some books on compilers go into the automata theory for the following reason:
they teach about the workings of _tools that generate_ lexical analyzers and
parsers. A compiler writer is usually just a consumer of those tools. The
skill that comes into play is being good at program design: structuring code
and data.

How much CS comes into play depends on how advanced you make the compiler and
what design choices you make. E.g. if you make a stack-based VM, or one with
unlimited registers, instead of targeting a real machine with a fixed number
of registers, you don't have to study the algorithms for register allocation.

Some languages avoid the complexities of parsing, such as those in the Forth
family, and to a large extent Lisp family. To make a language, you don't even
need to be a consumer of scanner and parser generating tools, let alone
understand how they work.
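
To make the Forth point concrete, here is a sketch of a Forth-flavoured interpreter in which the whole "parser" is a call to `str.split()`. It is not real Forth (there is no way to define new words, for one), and the function name `forth` and the small word set are assumptions chosen for the example.

```python
def forth(source):
    """Run a whitespace-separated program and return the final stack."""
    stack = []

    def binary(fn):
        # Build a word that pops two values and pushes fn(left, right).
        def word():
            right = stack.pop()
            left = stack.pop()
            stack.append(fn(left, right))
        return word

    def dup():
        stack.append(stack[-1])

    def swap():
        stack[-1], stack[-2] = stack[-2], stack[-1]

    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "dup": dup,
        "swap": swap,
    }

    # No grammar and no tree: each token is either a known word or a number.
    for token in source.split():
        if token in words:
            words[token]()
        else:
            stack.append(int(token))
    return stack

print(forth("2 3 4 * +"))
print(forth("5 dup *"))
```

Since the program is already written in evaluation order, there is no precedence to resolve and nothing for a parser generator to do.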

------
e19293001
I think this book suits your situation, and I've been in that situation myself!

Compiler Construction Using Java, JavaCC, and Yacc, IEEE/Wiley, 2012

Get this book. Please. _I promise_. You'll be able to grok compilers
after reading it and working through its exercises.

Feel free to email me if you want to talk further (see my profile).

Here's my previous comment somewhere in HN:

I bet this is what you are looking for:
[https://www.amazon.com/Compiler-Construction-Using-Java-Java...](https://www.amazon.com/Compiler-Construction-Using-Java-Java..).

This book taught me how to write a compiler and build a programming language
of my own.

To answer your question "where and how did you start?": This is where I started
learning about compilers and programming language implementation.

Here is its description from its website:

* Comprehensive treatment of compiler construction.

* JavaCC and Yacc coverage optional.

* Entire book is Java oriented.

* Powerful software package available to students that tests and evaluates their compilers.

* Fully defines many projects so students can learn how to put the theory into practice.

* Includes supplements on theory so that the book can be used in a course that combines compiler construction with formal languages, automata theory, and computability theory.

What I promise you with this book: You'll learn how to write your own
programming language. You'll learn not only about compilers and the language
but also about computer architecture. This book is very easy to read and
understand. It starts with very basic topics, then slowly builds on them until
you grok implementing the language given the specification. You'll have the
chance to build a compiler from scratch or by using JavaCC and Yacc.

------
Someone
If you don’t want to read up front, don’t; start with a calculator. If you
know regular expressions (and even if you don’t), getting something that
handles

    
    
       a=3
       b=4
       a=a+b
       print a
    

shouldn’t be too hard. In iteration 1, forget about expressions with more than
one operator such as

    
    
       a=3*b+c
    

If you think it’s easier, also forget about multi-character identifiers (a…z
should be enough for everybody) and types (all variables can be int or, if
desired, float).
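
For a sense of scale, iteration 1 as described so far (single-letter variables, integers only, at most one operator per expression) fits in a couple of dozen lines. This is only a sketch; the function name `run` and the choice to return the printed values as a list are assumptions made for the example.

```python
import operator
import re

# statement:  <letter>=<expression>      expression: operand [operator operand]
STATEMENT = re.compile(r"([a-z])=(.+)$")
EXPRESSION = re.compile(r"([a-z]|\d+)(?:([-+*/])([a-z]|\d+))?$")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def run(program):
    """Interpret the program line by line; return the list of printed values."""
    variables = {}
    output = []

    def value(operand):
        return variables[operand] if operand.isalpha() else int(operand)

    def evaluate(text):
        # Deliberately no error handling: invalid input makes match() return
        # None and the interpreter crashes, which is fine for iteration 1.
        left, op, right = EXPRESSION.match(text.replace(" ", "")).groups()
        if op is None:
            return value(left)
        return OPS[op](value(left), value(right))

    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):
            output.append(evaluate(line[6:]))
        else:
            name, expr = STATEMENT.match(line).groups()
            variables[name] = evaluate(expr)
    return output

print(run("""
a=3
b=4
a=a+b
print a
"""))
```

Feeding it `aa=0` or `a=3*b+c` crashes with an unhelpful Python error, exactly the kind of rough edge that is acceptable at this stage.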

Also, initially forget about making a product that doesn’t crash on invalid
input. If a…z is enough for everybody, your toy language may crash on

    
    
      A=3
    

or

    
    
      aa=0
    

Then, add features as you see fit. You will run into problems, such as
questioning whether

    
    
      print=3
      print print
    

should be a valid program, or whether it’s a good idea if

    
    
      a=2+3*4
      print a
    

prints ‘20’, but that’s part of the fun (if you have a puzzling mind, figuring
out how to do that ‘properly’ without reading about it isn’t out of reach).
And those features you add shouldn’t be large. For example, looping:

    
    
      a=1
      @again
      print a
      a=a+1
      >again
    

add simple if statements:

    
    
      a=1
      @again
      print a
      a=a+1
      if a<20 >again
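
Labels and jumps mostly mean replacing "for each line" with a program counter. A sketch of how that might look, assuming `@name` defines a label, `>name` jumps to it, and `if COND >name` jumps only when the condition is true, with `<` as the only comparison. The function name `run` and the `max_steps` safety limit are additions for this example, not part of the toy language above.

```python
import operator
import re

EXPRESSION = re.compile(r"(\w+)(?:([-+*/<])(\w+))?$")
CONDITIONAL_JUMP = re.compile(r"if (.+) >(\w+)$")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.floordiv, "<": operator.lt}

def run(program, max_steps=10_000):
    """Interpret the program with a program counter; return the printed values."""
    lines = [line.strip() for line in program.splitlines() if line.strip()]
    # First pass: remember which line each label sits on.
    labels = {line[1:]: index for index, line in enumerate(lines)
              if line.startswith("@")}
    variables, output = {}, []

    def value(operand):
        return int(operand) if operand.isdigit() else variables[operand]

    def evaluate(text):
        left, op, right = EXPRESSION.match(text.replace(" ", "")).groups()
        if op is None:
            return value(left)
        return OPS[op](value(left), value(right))

    pc, steps = 0, 0
    while pc < len(lines):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit reached; the program may loop forever")
        line = lines[pc]
        pc += 1
        if line.startswith("@"):                 # label definition: nothing to do
            continue
        if line.startswith(">"):                 # unconditional jump
            pc = labels[line[1:]]
            continue
        jump = CONDITIONAL_JUMP.match(line)
        if jump:                                 # conditional jump
            if evaluate(jump.group(1)):
                pc = labels[jump.group(2)]
            continue
        if line.startswith("print "):
            output.append(evaluate(line[6:]))
            continue
        name, expr = line.split("=", 1)          # plain assignment
        variables[name] = evaluate(expr)
    return output

print(run("""
a=1
@again
print a
a=a+1
if a<20 >again
"""))
```

The version of the loop without the `if` never terminates, which is why this sketch carries a step limit.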
    

The end result will likely be buggy in places. For example, if you decide to
support

    
    
       a=3;b=4
    

and parse by splitting each line on semicolons, adding strings

    
    
       a="strings may have semicolons, too;"
    

may result in an interpreter/compiler that is broken for a while. Who cares?
You aren’t building a compiler or interpreter, you’re learning.
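
The semicolon problem is easy to reproduce. The sketch below compares the naive split with a character-by-character scan that tracks whether it is inside a string. It assumes plain double quotes (`"`) and no escape sequences; both function names are made up for the example.

```python
def split_naive(line):
    """Split on every semicolon, even ones inside string literals."""
    return line.split(";")

def split_statements(line):
    """Split on semicolons, but not while inside a double-quoted string."""
    parts, current, in_string = [], [], False
    for ch in line:
        if ch == '"':
            in_string = not in_string      # entering or leaving a string
        if ch == ";" and not in_string:
            parts.append("".join(current)) # statement boundary
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [part for part in parts if part]  # drop empty statements

line = 'a="strings may have semicolons, too;";b=4'
print(split_naive(line))
print(split_statements(line))
```

The second function is really the beginning of a tokenizer, which is usually where this kind of fix ends up.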

Also: start with an interpreter. A compiler isn’t that much harder, but
requires you to know about your system’s assembly, object code format, etc.

------
creatornator
I found tutorials like this one [0] to be pretty useful, since they show how
to actually use tools like LLVM and lex/yacc in practice. Note that
this one is from 2009, and uses an outdated version of LLVM. Actually
following along requires some debugging and digging through documentation,
which is also helpful in its own way.

[0] [https://gnuu.org/2009/09/18/writing-your-own-toy-
compiler/](https://gnuu.org/2009/09/18/writing-your-own-toy-compiler/)

------
vkaku
Practical steps:

1. Create a grammar for your language. Start with Lex/Yacc: model your
grammar in them and use them to generate the lexer and parser.

2. Write sample code in your new language. Run the generated parser on
that sample file to produce a list of syntax trees.

3. From those syntax trees, either generate target code (a compiler) or
evaluate the tree and print results (an interpreter).

Iterate and repeat steps until your language is complete.

------
auganov
Well, the way you ask your question in the abstract sort of biases it towards a
CS-heavy approach, doesn't it?

What's your goal?

I've written non-trivial macros that could be understood to be a tiny
compiler, but I was never interested in compiler design and I can't claim to
know much about it.

~~~
lala_lala
It is just that I have a job and a family to support. So I want to create
something in whatever time I have. The prospect of engaging myself in a myriad
of courses sounds dull.

Maybe it's a folly but I am just intrigued by how they work and want to be
able to develop them without committing to taking a lot of courses.

~~~
wycliffb
maybe you could look into LLVM's Kaleidoscope tutorial.

------
nouney
Read the Dragon Book[0]

[0]
[https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools)

