
An ultra-simplified example of a modern compiler written in JavaScript - alokrai
https://github.com/jamiebuilds/the-super-tiny-compiler
======
mrkeen
9/10 times when I see a "compiler" on GitHub, it's just an interpreter, or the
compilation phase is completely missing.

In this case I believe the "compiler" is an "ast printer".

~~~
westoncb
As far as I can tell you are saying "just an interpreter" and referring to
this as an "ast printer" because the output language is not machine code.

I'm not sure why that would be so significant. Don't get me wrong, I'm sure
targeting some real machine code presents plenty of unique difficulties—but
the interesting thing about compilation to me (and I'd wager a significant
proportion of other readers here) is effective ways of structuring translation
processes.

In the case of this project, the goal is to be a minimal demonstration,
distilling the essence of how a compiler is structured. The fact that the
output language is not machine code doesn't affect the essential structure of
a compiler (as far as I know).

~~~
thosakwe
What most compiler tutorials are missing is coverage of intermediate
representations. There are a billion parsing and calculator tutorials, but not
nearly as many that show the process of turning, say, C-style code, into some
sort of intermediate format, _before_ emitting machine code.

~~~
westoncb
The AST _is_ an intermediate format. Is it important to have an intermediate
_language_? Otherwise why does the AST not qualify?

My understanding is that having something like LLVM’s IR, for instance, is
part of its decoupling the compiler frontend/backend, but probably wouldn’t be
desirable for a single language compiler. But maybe I’m mistaken, or you are
referring to something else :)

~~~
unlinked_dll
Not OP and I agree, "IR" is just the data structure that holds the AST or a
human readable version of it.

That said I think what they're getting at is that the interesting bits of
modern compilers are transforms between ASTs to lower from one IR to another
either to perform some kind of optimization or to replace an abstraction with
an implementation before it gets to codegen.

For example if you have generators in your language it's pretty easy to see
how to turn a "yield" statement into an AST node. But to actually make the
system work you'll probably need a compiler pass over your AST to transform
coroutine definitions to subroutine definitions and a state machine to
represent the execution context and a constructor/destructor for the state.

Same goes for all interesting language concepts, in the compiler the
interesting bit is the pass that transforms the top level AST/IR into more
explicit IR and going through the pipeline to get to codegen. Which is as
complex as everything else these days when it needs to be fast.

The example talks about this but doesn't dive too deep.

~~~
agumonkey
Not really agreeing on this one. By IR people means middle ground between
various semantics. Parsing gets you an abstract structure of the semantic-
world you typed in. An IR would be a something bridging the gap between
concepts and register machines, or whatever.

------
neilparikh
If anyone is curious, I threw together a similar (although not commented yet)
example of how I would write this using parser combinators in Haskell:
[https://gist.github.com/neilparikh/b8a9c758ebbbb5a3ee9b2ed99...](https://gist.github.com/neilparikh/b8a9c758ebbbb5a3ee9b2ed999740b0d)

The JS example wrote everything from scratch, so to do the same, I wrote my
own parser combinator library (based on
[https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf](https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf)),
but you could use an existing library, and get rid of `Parser.hs`.

~~~
matlin
For those interested in this approach, one of the authors (Hutton) of the
paper has a great video explaining functional parsing
[https://youtu.be/dDtZLm7HIJs](https://youtu.be/dDtZLm7HIJs)

------
ducaale
There was a person who began rewriting this using reason [1]. Sadly, he never
got to complete it.

I began tinkering with F# recently and found that ML languages are perfect for
parsing. Is there a version of this in F# or ocaml?

[1] [https://medium.com/@javierwchavarri/building-the-super-
tiny-...](https://medium.com/@javierwchavarri/building-the-super-tiny-
compiler-with-reason-part-1-21460cd4ae7b)

~~~
cxr
Haxe is written in OCaml.

[https://github.com/HaxeFoundation/haxe/search?l=ocaml](https://github.com/HaxeFoundation/haxe/search?l=ocaml)

------
masswerk
Maybe interesting for a comparison: 21 years ago I wrote my own tiny demo
compiler in JS, much inspired by Niklaus Wirth.

[https://www.masswerk.at/demospace/eal/eal.htm](https://www.masswerk.at/demospace/eal/eal.htm)

See the page source for the code. (Probably fun, old syntax both for HTML and
JS, pre-ECMA JavaScript [meaning, the semicolon isn't a delimiter, but a
separator], even the indentation style is outdated. Mind that back then, even
the support of JS object literals wasn't guaranteed.)

Syntax, etc:
[https://www.masswerk.at/demospace/eal/syntax.htm](https://www.masswerk.at/demospace/eal/syntax.htm)

Byte code reference:
[https://www.masswerk.at/demospace/eal/pcode.htm](https://www.masswerk.at/demospace/eal/pcode.htm)

------
Kwantuum
Reminds me of this old screencast about writing a compiler in ruby:

[https://www.destroyallsoftware.com/screencasts/catalog/a-com...](https://www.destroyallsoftware.com/screencasts/catalog/a-compiler-
from-scratch)

------
mrkeen
I want to dispute the 'modern' claim.

I don't know my dates well enough. But is there anything in here that isn't at
least 50 years old?

The high-level is tokenize->parse->ast_transform->codegen.

The low-level looks like `char = input[++current];`

------
tobr
What does “modern” mean in this context?

~~~
gameswithgo
I don't know but I don't think I want any part of it. A compiler is a very
foundational piece of software, where performance is important, you run it
many times, often on large things. It should be written in a a language with a
very high performance ceiling and quick start up time, imho.

------
MR4D
Petty cool. Very well documented too (I only read through a few hundred lines
before bookmarking it for later).

------
aliswe
Wouldn't this be closer to a "transpiler"?

------
lostmsu
Does this parse in linear time?

