
A Brief and Brisk Overview of Compiler Architecture - signa11
https://blog.felixangell.com/compilers-brief-and-brisk
======
nabla9
Compiler architecture is also a good metaphor for understanding what Common
Lisp is all about.

Common Lisp = the programmable compiler & runtime, all in one. Most CL
implementations compile before calling eval. The read-eval-print loop is
actually a parse-compile-execute-print loop.

    
    
      * (defun foo (x) (+ x 1))
      FOO
      * (disassemble 'foo)
    
      ; disassembly for FOO
      ; Size: 38 bytes. Origin: #x1003A7DD1C
      ; 1C:       498B4C2460       MOV RCX, [R12+96] ; thread.binding-stack-pointer
                                                     ; no-arg-parsing entry point
      ; 21:       48894DF8         MOV [RBP-8], RCX
      ; 25:       BF02000000       MOV EDI, 2
      ; 2A:       488BD3           MOV RDX, RBX
      ; 2D:       41BBC0010020     MOV R11D, 536871360  ; GENERIC-+
      ; 33:       41FFD3           CALL R11
      ; 36:       488B5DF0         MOV RBX, [RBP-16]
      ; 3A:       488BE5           MOV RSP, RBP
      ; 3D:       F8               CLC
      ; 3E:       5D               POP RBP
      ; 3F:       C3               RET
      ; 40:       CC10             BREAK 16   ; Invalid argument count trap
      NIL
    
    

Front end = The READ function does the lexical analysis. CL syntax maps almost
one-to-one onto the AST, so the lexer/parser part is simple. You can reprogram
the lexer with reader macros if you want. This is the last place where CL code
is text. After the reader is done with it, the code is represented with CL data
structures.
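
For example, READ already produces the list structure the compiler works on. A
quick REPL illustration (output shown is from SBCL, but any CL behaves the
same):

      * (read-from-string "(defun foo (x) (+ x 1))")
      (DEFUN FOO (X) (+ X 1))
      23

READ-FROM-STRING returns the form as a list plus the index where reading
stopped; from here on, the compiler never sees text again.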

Middle End = CL makes multiple passes over the AST. Normal macros and compiler
macros both operate here. They deal with CL data: lists, symbols, structs,
objects, arrays, numbers, etc. CL packages are symbol tables. AST and IR are
all CL data structures.
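
For instance, an ordinary macro is just a function from list structure to list
structure, and MACROEXPAND-1 shows one such pass (a minimal illustration, not
from the article):

      * (defmacro square (x) `(* ,x ,x))
      SQUARE
      * (macroexpand-1 '(square 5))
      (* 5 5)
      T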

Back End = finally, the function COMPILE compiles the end result into native
code. Saving the image and exiting the runtime is like compiling in normal
languages.
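
That last step can be watched directly at the REPL (SBCL here; a minimal
illustration): COMPILE takes a lambda expression and hands back a natively
compiled function object.

      * (defparameter f (compile nil '(lambda (x) (* x 2))))
      F
      * (funcall f 21)
      42
      * (compiled-function-p f)
      T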

~~~
lispm
Compiling will not change READ into PARSE. READ is still there.

> Most CL implementations compile before calling eval

Many, or a lot, but I'm not sure it is actually most.

There are quite a lot which use an interpreter in the REPL, but where one can
compile on demand. SBCL compiles by default; LispWorks, for example, does not.

> After the reader is done with it, the code is represented with CL data
> structures.

Right, but the data structures carry no syntactic information. If you write
(1 2 +), the reader accepts it and does not know that this is invalid syntax.
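
A quick REPL check makes the point (SBCL; READ happily builds the list, and
only EVAL or COMPILE rejects it later):

      * (read-from-string "(1 2 +)")
      (1 2 +)
      7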

> AST and IR are all CL data structures.

True, but not necessarily like s-expressions...

See for example data structures the SBCL compiler uses:

[https://github.com/sbcl/sbcl/blob/master/src/compiler/node.l...](https://github.com/sbcl/sbcl/blob/master/src/compiler/node.lisp)

> Saving the image and exiting the runtime is like compiling in normal
> languages.

Working with images is optional - the Common Lisp standard says nothing about
images at all.

------
jfoutz
If you would like a heartwarming tale of an engineer's first compiler, this is
the article for you.

There is either nothing to learn here, or a spectacular insight into
compilation, depending on where you are in your career.

Compilers are fun. It’s an invaluable experience having written one. There is
no magic left in a computer once you’ve written a compiler.

If you have written a compiler, this article is amusing. You get to see a
programmer's eyes open.

If you have not written a compiler, you get some insight into how they work,
but perhaps you get more insight into how you work.

Also, the author should constant fold his arithmetic example. It slows down
compilation, but makes the program faster.
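
Constant folding just means evaluating arithmetic on compile-time-known
literals ahead of time. A toy sketch of the idea in CL (my illustration,
nothing from the article):

      (defun fold (form)
        "Replace arithmetic on integer literals with its result, bottom-up.
      A toy pass; real compilers do this on their IR with much more care."
        (if (atom form)
            form
            (let ((folded (mapcar #'fold form)))
              (if (and (member (first folded) '(+ - *))
                       (every #'integerp (rest folded)))
                  (apply (first folded) (rest folded))
                  folded))))

      * (fold '(* 2 (+ 3 4)))
      14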

~~~
bluedino
I love these articles, as they are a bit 'magical' to me. They make me wish I
had gotten a CS degree.

~~~
eatonphil
You don't need a CS degree to enjoy this stuff or learn more! I do not have a
CS degree, or any degree at all. It took me a few years before I built up the
courage to write an interpreter or compiler but I was compelled by peers who
had built their own (and didn't attend college either).

There are numerous bloggers who dig deeper into the implementation of things
and explain them. You just have to start asking "how do I build X from
scratch".

~~~
felixangell
This is very much the case! I started tinkering with compilers around 4 years
ago when I was 16. If it was possible for me to do then, I'm sure it's
possible for anyone else to learn too.

When I write articles like these, I think of what I would have liked to have
read when I was first getting into the topic at hand.

------
pcr910303
Are there resources to learn compilers in general? It’s hard to find
resources that are

  * In text
  * Free and accessible on the web
  * Not lengthy (not a 1000-page book)
  * Points to resources to read after.

Any suggestions? I have read the unfinished
[http://www.craftinginterpreters.com/](http://www.craftinginterpreters.com/) ,
but I would like to know more about compilers (that emit machine code).

~~~
rwmj
For the majority of use cases a simple rule suffices: Write a parser which
generates LLVM IR and feed the IR into LLVM. You only need to research two
topics: parser generators and LLVM IR, and both are pretty simple skills to
pick up.
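
To make that concrete, here is a toy sketch (mine, and heavily simplified)
that emits LLVM IR text for integer arithmetic, using s-expressions in place
of a parsed AST; the output can be fed straight to lli or llc:

      (defvar *temps* 0)

      (defun emit (form)
        "Return an LLVM operand plus the instruction lines that compute it."
        (if (integerp form)
            (values (format nil "~D" form) '())
            (destructuring-bind (op a b) form
              (multiple-value-bind (ra lines-a) (emit a)
                (multiple-value-bind (rb lines-b) (emit b)
                  (let ((r (format nil "%t~D" (incf *temps*))))
                    (values r
                            (append lines-a lines-b
                                    (list (format nil "  ~A = ~A i64 ~A, ~A" r
                                                  (ecase op (+ "add") (- "sub") (* "mul"))
                                                  ra rb))))))))))

      (defun to-llvm (form)
        "Wrap the compiled expression in an i64 main function."
        (setf *temps* 0)
        (multiple-value-bind (result lines) (emit form)
          (format nil "define i64 @main() {~%~{~A~%~}  ret i64 ~A~%}" lines result)))

      ;; (princ (to-llvm '(+ (* 2 3) 4))) prints:
      ;;   define i64 @main() {
      ;;     %t1 = mul i64 2, 3
      ;;     %t2 = add i64 %t1, 4
      ;;     ret i64 %t2
      ;;   }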

There are a lot of cases where that doesn't work (e.g. writing a functional
language, a JIT, or doing programming language research), but those are edge
cases. In those cases you probably want to read one of Appel's books ("Modern
Compiler Implementation in ML", for example), or the Dragon Book.

~~~
sischoel
These books are kind of old. Are they still state of the art or is there
something newer?

I had a compiler class at university, but that was a while ago, so I would
like to update my knowledge, because there seems to be a lot of exciting stuff
going on with compilers and transpilers.

~~~
rwmj
The basics don't change. There was this rather fun talk on HN 2 weeks ago
[https://news.ycombinator.com/item?id=19657983](https://news.ycombinator.com/item?id=19657983)
where the speaker asserts that only 8 optimizations are necessary to get about
80% of best-case performance, and all of those were catalogued in _1971_.

~~~
felixangell
Wow, that's a really interesting fact. I hope you don't mind but I updated the
article to reference this and credited you accordingly :^)

------
nv-vn
The funny thing about compilers is that one architecture (the one described
here) has been the dominant architecture for probably the past 50 years.
Almost without exception, every compiler is structured in just about exactly
the same way. If you're interested in an alternative architecture, it's worth
reading up on Nanopass compilers. While built on largely the same basic ideas,
Nanopass tries to distill each discrete action into its own pass; think of it
like composing a bunch of programs using shell pipes. So rather than just
Frontend => Middle End => Backend, you end up with something like
Lex => Parse => Remove Lambdas => Remove Loops => Remove Conditionals => SSA =>
Allocate Registers => Eliminate Dead Code => ...
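
A toy sketch of that shape (my illustration, nothing to do with the actual
Nanopass framework): every pass is a small s-expression-to-s-expression
function, and the compiler is just their composition.

      (defun desugar-when (form)
        "One tiny pass: rewrite (when test . body) as (if test (progn . body) nil)."
        (if (atom form)
            form
            (let ((form (mapcar #'desugar-when form)))
              (if (eq (first form) 'when)
                  `(if ,(second form) (progn ,@(cddr form)) nil)
                  form))))

      (defun desugar-unless (form)
        "Another tiny pass: rewrite (unless test . body) as (if test nil (progn . body))."
        (if (atom form)
            form
            (let ((form (mapcar #'desugar-unless form)))
              (if (eq (first form) 'unless)
                  `(if ,(second form) nil (progn ,@(cddr form)))
                  form))))

      (defun run-passes (form passes)
        "The whole 'compiler': thread FORM through each pass in order."
        (reduce (lambda (acc pass) (funcall pass acc))
                passes :initial-value form))

      ;; (run-passes '(when (> x 0) (unless done (go-on)))
      ;;             (list #'desugar-when #'desugar-unless))
      ;; => (IF (> X 0) (PROGN (IF DONE NIL (PROGN (GO-ON)))) NIL)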

~~~
enos_feedler
When I worked on GPU compiler backends, the middle-end and backend passes were
basically a collection of more basic passes that each did one thing, in
sequence. I wouldn't call this an alternative architecture; I think it's just
an implementation detail of the dominant architecture.

