
Ask HN: What's the process of writing a new programming language? - audace
I know the process at a very high level. But what would the first 5-10 steps look like for writing a language that compiles into C (like Go, I believe)?
======
panic
1\. Sketch out what you want the first iteration of the language to look like.
Write some example programs in your language.

2\. Write a lexer: a program which turns a string of source code in your
language into a list of values like "open parenthesis", "plus sign", "if
keyword", "identifier". These values are called tokens. To test your lexer,
output each token and see if your example programs tokenize properly.

3\. Write a parser: a program which turns a list of tokens into a structured
representation of your program's source code. For example, "if keyword"
"identifier" "equals sign" "number" "then keyword" "print keyword"
"identifier" might turn into an IfStatement with a predicate EqualsExpression
(that itself has a left IdentifierExpression and a right
LiteralNumberExpression) and a list of Statements for the code to run. You can
write the parser yourself (look up recursive descent parsing) or use a parser
generator tool to do it.

4\. Write a code generator: a program which goes through your structured
representation and outputs lower-level code (in this case, C) for each
expression and statement.
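To make the four steps above concrete, here is a minimal sketch in Python (the thread itself has no code, so the language is an assumption). It handles only the example from step 3, `if x == 1 then print x`, plus bare `print` statements. The token names, the dataclass node names (borrowed from step 3's description), and translating `print` into a C `printf` call are hypothetical choices for illustration, not part of any real compiler.

```python
import re
from dataclasses import dataclass

# --- Step 2: lexer (hypothetical token names) ---
TOKEN_SPEC = [("NUMBER", r"\d+"), ("EQUALS", r"=="),
              ("IDENT", r"[A-Za-z_]\w*"), ("SKIP", r"\s+")]
KEYWORDS = {"if", "then", "print"}
MASTER = re.compile("|".join(f"(?P<{kind}>{regex})" for kind, regex in TOKEN_SPEC))

def lex(source):
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = text.upper()  # e.g. "if" becomes an IF keyword token
        if kind != "SKIP":  # whitespace is matched but not kept
            tokens.append((kind, text))
        pos = m.end()
    return tokens

# --- Step 3: AST node types and a recursive descent parser ---
@dataclass
class IdentifierExpression:
    name: str
@dataclass
class LiteralNumberExpression:
    value: int
@dataclass
class EqualsExpression:
    left: object
    right: object
@dataclass
class PrintStatement:
    expr: object
@dataclass
class IfStatement:
    predicate: EqualsExpression
    body: list

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        self.pos += 1
        return self.tokens[self.pos - 1][1]

    def statement(self):
        if self.peek() == "IF":
            self.expect("IF")
            predicate = self.equality()
            self.expect("THEN")
            return IfStatement(predicate, [self.statement()])
        self.expect("PRINT")
        return PrintStatement(self.primary())

    def equality(self):
        left = self.primary()
        self.expect("EQUALS")
        return EqualsExpression(left, self.primary())

    def primary(self):
        if self.peek() == "NUMBER":
            return LiteralNumberExpression(int(self.expect("NUMBER")))
        return IdentifierExpression(self.expect("IDENT"))

def parse(tokens):
    parser = Parser(tokens)
    node = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError("unexpected tokens after statement")
    return node

# --- Step 4: code generator emitting C source text ---
def gen(node):
    if isinstance(node, IdentifierExpression):
        return node.name
    if isinstance(node, LiteralNumberExpression):
        return str(node.value)
    if isinstance(node, EqualsExpression):
        return f"({gen(node.left)} == {gen(node.right)})"
    if isinstance(node, PrintStatement):
        return f'printf("%d\\n", {gen(node.expr)});'
    if isinstance(node, IfStatement):
        body = " ".join(gen(stmt) for stmt in node.body)
        return f"if {gen(node.predicate)} {{ {body} }}"
    raise TypeError(f"unknown node: {node!r}")

print(gen(parse(lex("if x == 1 then print x"))))
# prints: if (x == 1) { printf("%d\n", x); }
```

A real compiler would also wrap the output in a `main` function, declare the variables it uses, and hand the result to a C compiler; this sketch only emits the statement itself.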

~~~
dsacco
I'm sure this is vastly oversimplified, but thanks for writing a quick summary
of the steps. Do you have a good recommendation for books on compilers other
than the Dragon Book?

~~~
panic
I don't know of any good books, sorry! There's plenty of helpful stuff on the
web, though it can be hard to sift through. Personally I think just trying to
write the code is the best way to learn.

------
kayamon
Go read Jack Crenshaw's "Let's Build a Compiler!" series of articles. It's
somewhat dated nowadays, but you can't beat it for simplicity.

~~~
kirang1989
Highly recommended. It really helped me simplify the process of writing a
compiler.

------
david927
Here is a place to start:

[https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools#Second_edition)

~~~
audace
As the question is about a side project, does the book also cover
languages that compile into another relatively high-level language (i.e. C),
as opposed to languages that compile directly into assembly? I'm more
interested in building something lightweight to understand the general
process than in building something completely from the ground up.

------
GregBuchholz
If you are interested in new languages, then you might want to think about
writing an interpreter first, and then writing a compiler later when you have
more experience with your new language. Languages like Prolog are well suited
to writing interpreters, along with Lisp/Scheme and ML/Haskell.
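To show what "interpreter first" means in practice, here is a minimal tree-walking interpreter sketch, written in Python rather than the Prolog, Lisp or ML suggested above. The nested-tuple AST (`("+", 1, 2)`, `("let", name, value, body)`, `("if", cond, then, else)`) is a hypothetical encoding chosen for brevity, standing in for whatever a real parser would produce. Instead of emitting C, `evaluate` computes each node's value directly.

```python
import operator

# Hypothetical AST encoding: ints are literals, strs are variable names,
# and tuples are (operator, operands...).
BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "==": operator.eq}

def evaluate(node, env):
    """Walk the tree and compute its value; no code is generated."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]  # variable lookup
    op = node[0]
    if op == "let":
        _, name, value, body = node
        # Evaluate the body in an extended copy of the environment.
        return evaluate(body, {**env, name: evaluate(value, env)})
    if op == "if":
        _, cond, then_branch, else_branch = node
        # Only the chosen branch is evaluated.
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    _, left, right = node
    return BINARY_OPS[op](evaluate(left, env), evaluate(right, env))

program = ("let", "x", 4, ("if", ("==", "x", 4), ("*", "x", 10), 0))
print(evaluate(program, {}))  # 40
```

Adding a new construct to the language is just one more branch in `evaluate`, which is part of why an interpreter is a convenient place to experiment before committing to a code generator.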

------
robertelder
I haven't made my own programming language yet, but I am working on a from-
scratch C compiler
([http://recc.robertelder.org/](http://recc.robertelder.org/)), so I'll give
you a few ideas:

1) You'll probably want to start by thinking about what the programming
language will do, and what the grammar
([https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_F...](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form))
of the language will look like. I would recommend starting by writing an LL
grammar
([https://en.wikipedia.org/wiki/LL_grammar](https://en.wikipedia.org/wiki/LL_grammar)),
so you can write a recursive descent parser for it
([https://en.wikipedia.org/wiki/Recursive_descent_parser](https://en.wikipedia.org/wiki/Recursive_descent_parser)).
You will also need to be careful not to introduce direct or indirect left
recursion into your grammar, since a recursive descent parser would recurse
forever on it
([https://en.wikipedia.org/wiki/Left_recursion](https://en.wikipedia.org/wiki/Left_recursion)).

2) The first step will naturally lead you to need to consider tokenization:
[https://en.wikipedia.org/wiki/Tokenization_(lexical_analysis...](https://en.wikipedia.org/wiki/Tokenization_\(lexical_analysis\))

3) Some caveats of step 1 include the dangling else problem
([https://en.wikipedia.org/wiki/Dangling_else](https://en.wikipedia.org/wiki/Dangling_else))
and other grammar ambiguities
([https://en.wikipedia.org/wiki/Ambiguous_grammar](https://en.wikipedia.org/wiki/Ambiguous_grammar)).
A recursive descent parser may need to backtrack, so you'll want to think
about how to also roll back any internal state that gets built up as the
parser does its thing.

4) Once you can create a full parse tree and traverse it, you can consider
code generation. For unoptimized code, this is probably the easiest part, but
once you start considering possible optimizations, you'll probably want to
write a 'back-end', and you could probably spend the rest of your life creating
new optimizations.

5) Of course, this all gets more complicated if you want to use an LR grammar
instead, or if you want an interpreted language. You can also think about
things like just-in-time (JIT) compilation.
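As an illustration of points 1 and 3, here is a Python sketch for a hypothetical arithmetic-expression grammar. The natural rule `expr = expr "+" term | term` is left-recursive: a recursive descent function for `expr` would call itself before consuming any input and never terminate. The usual fix, shown in the EBNF comment, replaces the recursion with repetition, and the `while` loops then build left-associative trees. For this particular grammar one token of lookahead always decides which rule applies, so no backtracking is needed. The tuple-based tree is an assumption for brevity.

```python
import re

# Grammar in EBNF, with left recursion replaced by repetition:
#   expr   = term { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

def tokenize(text):
    tokens = re.findall(r"\d+|[-+*/()]", text)
    # findall silently skips unknown characters, so check nothing was lost.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("unexpected character in input")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop makes "-" left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("8 - 3 - 2"))    # ('-', ('-', 8, 3), 2)
print(parse("2 * (3 + 4)"))  # ('*', 2, ('+', 3, 4))
```

Each grammar rule becomes one method, which is what makes LL grammars pleasant to parse by hand.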

------
montyedwards
The Go programming language doesn't compile to C. Compiling Go is faster than
compiling C.

If Go transpiled into C code first and then had to compile the resulting C
code, that entire process would be slower.

The Nim programming language compiles to C, so you may want to reach out and
ask their community. It used to be called Nimrod, but is now Nim.

The Rust programming language leverages LLVM instead of transpiling to C, so
you may want to take a look at how that is done. A recent post about Rust MIR
is well-written and is an enjoyable read for anyone interested in compilers.

------
bjourne
First you write an RPN calculator. Make a program that takes the input
"3 4 * 4 + 2 /" and figures out that the answer is 8.
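A minimal sketch of that exercise in Python, assuming whitespace-separated tokens and floating-point arithmetic: each number is pushed onto a stack, and each operator pops two operands and pushes the result. The first value popped is the right-hand operand, which matters for `-` and `/`.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def eval_rpn(expression):
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()  # top of the stack is the right operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))  # raises ValueError on a bad token
    if len(stack) != 1:
        raise ValueError("malformed expression: leftover operands")
    return stack[0]

print(eval_rpn("3 4 * 4 + 2 /"))  # 8.0
```

Tracing the example: `3 4 *` leaves 12, `4 +` leaves 16, and `2 /` leaves 8.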

