
How I wrote a programming language, and how you can too - adamnemecek
https://medium.com/@william01110111/the-programming-language-pipeline-91d3f449c919
======
BuuQu9hu
Compiler author here. Overall cool post. I have nitpicks and opinions. Don't
let me stop you, but consider what I have to say.

"There are two major types of languages, compiled and interpreted."

Well, no. There are two types of language runtimes, compilers and
interpreters. (And there is a technique, partial evaluation, which can turn
interpreters into compilers. [0]) This matters quite a bit if you want your
language to not be permanently
married to a single compiler; your language design should somewhat accommodate
both compiled and interpreted modes as possibilities. This seems obvious in
hindsight, but even today, popular languages like Go do not have compilation
models which make REPLs easy or even possible.

"I chose C++ because of its performance and large feature set. Also, I
actually do enjoy working in C++. If you are writing an interpreted language,
it makes a lot of sense to write it in a compiled one (like C, C++ or Swift)
because the performance lost in the language of your interpreter and the
interpreter that is interpreting your interpreter will compound. If you plan
to compile, a slower language (like Python or JavaScript) is more acceptable.
Compile time may be bad, but in my opinion that isn’t nearly as big a deal as
bad run time."

Okay. This is totally up to the author. It sounds painful, though. JVM legend
Cliff Click has said during talks that choosing C/C++ for a VM implementation
language is one of the worst decisions that a VM author can make in the modern
ecosystem. Memory-unsafety notwithstanding, the amount of effort required to
write good C++ is pretty high, and compiler algorithms are complex, so writing
a compiler in C++ is very hard.

"In the end, I didn’t see significant benefits of using Flex, at least not
enough to justify adding a dependency and complicating the build process. My
lexer is only a few hundred lines long, and rarely gives me any trouble.
Rolling my own lexer also gives me more flexibility, such as the ability to
add an operator to the language without editing multiple files."

On one hand, once your code is correctly factored, an operator _will_ be
smeared over several files, presuming that your lexer, parser, and AST class
are in at least three distinct modules. On the other hand, hand-written lexers
_are_ better at error messages. So it's a tough call and it largely comes down
to whether you prefer the pain of writing lex/flex/etc. or writing C++. (This
is why a better interpreter language can make all the difference!)
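To make the trade-off concrete, here is a minimal sketch of a hand-rolled lexer (all names are made up). The operator table lives in one place, so adding an operator to the lexer means editing only one dict, though as noted above the parser and AST modules will still need to learn about it.

```python
import re

# Hypothetical operator table: one place to add a new operator to the lexer.
OPERATORS = {"==": "EQ", "+": "PLUS", "-": "MINUS", "*": "STAR"}

# One alternation per token class; longest operators ("==") listed first.
TOKEN = re.compile(r"\s*(?:(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>==|[+\-*]))")

def lex(source):
    """Yield (kind, text) pairs; hand-rolling makes good error positions cheap."""
    pos, end = 0, len(source.rstrip())
    while pos < end:
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.group("num"):
            yield ("NUMBER", m.group("num"))
        elif m.group("name"):
            yield ("NAME", m.group("name"))
        else:
            yield (OPERATORS[m.group("op")], m.group("op"))
        pos = m.end()
```

A few hundred lines of this style scales to a real token set, which matches the article's experience.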

Mr. Bright on parsers, via the author: "Somewhat more controversial, I
wouldn’t bother wasting time with lexer or parser generators and other so-
called “compiler compilers.” They’re a waste of time. Writing a lexer and
parser is a tiny percentage of the job of writing a compiler. Using a
generator will take up about as much time as writing one by hand, and it will
marry you to the generator (which matters when porting the compiler to a new
platform). And generators also have the unfortunate reputation of emitting
lousy error messages."

I want parser generators to be good enough. But they largely aren't. These
points are all absolutely correct.

"Put simply, the action tree is the AST with context."

These are normally called _annotated_ ASTs. Each node carries an annotation.
This can be used for type-checking, two-level syntax, specialization, side-
effect-tracking, etc. It's a great technique that really should be publicized
in compiler literature. Unfortunately, every interpreter language has
different ways to do it. Haskellers will want to use a comonad. Hash-table-
rich languages will probably use a hash table mapping AST nodes to
annotations.
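The side-table approach might look like this sketch for a toy expression language (all names hypothetical): the tree itself stays untouched, and a table keyed by node carries the context.

```python
from dataclasses import dataclass

# Toy AST for a hypothetical expression language. frozen=True makes the
# nodes hashable, so they can key the annotation table.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def annotate(node, types):
    """Fill `types` (node -> type name) bottom-up: the AST plus its context."""
    if isinstance(node, Num):
        types[node] = "int"
    else:
        annotate(node.left, types)
        annotate(node.right, types)
        ok = types[node.left] == types[node.right] == "int"
        types[node] = "int" if ok else "type error"
    return types

tree = Add(Num(1), Add(Num(2), Num(3)))
types = annotate(tree, {})
```

Later passes (specialization, effect tracking) can keep their own tables over the same untouched tree, which is the appeal of the technique.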

"It is, but compiling is harder than interpreting."

Efficient compilation is as hard as efficient interpretation. Ignoring the
theory of _partial evaluation_ [0], an interpreter usually has to make the
same optimizations that a compiler does in order to be speedy. Python and
friends, like many other Smalltalk relatives, compile to bytecode, for
example. If the bytecode compiler is any good, then the bytecode interpreter
will be good too.
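CPython makes the point concrete: a function is compiled to bytecode once, at definition time, and the interpreter loop executes those instructions rather than the source text. The standard `dis` module shows the result:

```python
import dis

def add(a, b):
    return a + b

# `add` was compiled to bytecode when `def` ran. The opcode names vary
# across CPython versions (e.g. BINARY_ADD vs. BINARY_OP), but there is
# always a compiled instruction stream for the interpreter to execute.
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)
```

So "interpreted Python" is already a compiler feeding an interpreter; the quality of that bytecode compiler bounds the quality of the whole pipeline.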

[0]
[http://www.itu.dk/~sestoft/pebook/pebook.html](http://www.itu.dk/~sestoft/pebook/pebook.html)

