Common Lisp = the programmable compiler & runtime all in one. Most CL implementations compile before calling eval. The read–eval–print loop is actually a parse-compile-execute-print loop.
* (defun foo (x) (+ x 1))
* (disassemble 'foo)
; disassembly for FOO
; Size: 38 bytes. Origin: #x1003A7DD1C
; 1C: 498B4C2460 MOV RCX, [R12+96] ; thread.binding-stack-pointer
; no-arg-parsing entry point
; 21: 48894DF8 MOV [RBP-8], RCX
; 25: BF02000000 MOV EDI, 2
; 2A: 488BD3 MOV RDX, RBX
; 2D: 41BBC0010020 MOV R11D, 536871360 ; GENERIC-+
; 33: 41FFD3 CALL R11
; 36: 488B5DF0 MOV RBX, [RBP-16]
; 3A: 488BE5 MOV RSP, RBP
; 3D: F8 CLC
; 3E: 5D POP RBP
; 3F: C3 RET
; 40: CC10 BREAK 16 ; Invalid argument count trap
Middle End = CL has multiple passes over the AST. Normal macros run first, then compiler macros operate here. They deal with CL data: lists, symbols, structs, objects, arrays, numbers, etc. CL packages are symbol tables. The AST and IR are all CL data structures.
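For instance, a compiler macro lets you rewrite particular call forms at compile time without changing the function itself. A minimal sketch (ADD1 is a made-up name for illustration):

* (defun add1 (x) (+ x 1))

* (define-compiler-macro add1 (&whole form x)
    ;; If the argument is a literal number, fold the call at compile
    ;; time; otherwise hand the original form back to the compiler.
    (if (numberp x)
        (1+ x)
        form))

Now (add1 41) compiles to the constant 42, while (add1 y) still compiles to an ordinary call.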
Back End = finally, the function 'compile' compiles the end result into native code. Saving the image and exiting the runtime is like compiling in normal languages.
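A minimal sketch of that back end in action, assuming SBCL (save-lisp-and-die is an SBCL extension; other implementations have their own image-saving functions):

* (defun square (x) (* x x))
* (compile 'square)   ; explicit here; SBCL's DEFUN compiles already
* (defun main () (format t "~a~%" (square 7)))
* (sb-ext:save-lisp-and-die "app" :executable t :toplevel #'main)
; writes an executable image; running ./app prints 49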
> Most CL implementations compile before calling eval
Many, or a lot, but I'm not sure it's actually most.
There are quite a lot that use an interpreter in the REPL, but where one can compile on demand. SBCL compiles by default. LispWorks, for example, does not.
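One way to probe what your REPL actually did, using only standard functions:

* (defun foo (x) (+ x 1))
* (compiled-function-p #'foo)
T   ; on SBCL; an interpreting REPL typically returns NIL here
* (compile 'foo)   ; ...until you ask it to compile on demand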
> After its reader is done with it, the code is represented with CL data structures.
Right, but the data structures carry no syntactic information. If you write (1 2 +), the reader accepts it and does not know that this is invalid syntax.
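You can watch this at the REPL: the reader happily builds the list, and only evaluation or compilation rejects it. Something like:

* (read-from-string "(1 2 +)")
(1 2 +)   ; first value: the list, read without complaint
7         ; second value: characters consumed
* (eval '(1 2 +))
; error: illegal function call (wording varies by implementation)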
> AST and IR are all CL data structures.
True, but not necessarily like s-expressions...
See, for example, the data structures the SBCL compiler uses: its internals documentation describes the IR1 ("implicit continuation") and IR2 ("virtual machine") representations.
> Saving the image and exiting the runtime is like compiling in normal languages.
Working with images is optional - the Common Lisp standard says nothing about images at all.
There is either nothing to learn here, or a spectacular insight into compilation, depending on where you are in your career.
Compilers are fun. It’s an invaluable experience having written one. There is no magic left in a computer once you’ve written a compiler.
If you have written a compiler, this article is amusing. You get to see a programmer's eyes open.
If you have not written a compiler, you get some insight into how they work, but perhaps you get more insight into how you work.
Also, the author should constant-fold his arithmetic example. It slows down compile time, but makes the program faster.
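For what it's worth, SBCL already folds constant arithmetic; a quick check (output abridged, and the exact encoding varies by version and platform):

* (defun seconds-per-day () (* 60 60 24))
* (disassemble 'seconds-per-day)
; ...
; MOV EDX, 172800   ; 86400 already folded (tagged as a fixnum, value * 2)
; ...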
There are numerous bloggers who dig deeper into the implementation of things and explain them. You just have to start asking "how do I build X from scratch".
When I write articles like these, I think of what I would have liked to have read when I was first getting into the topic at hand.
* In text
* Free and accessible on the web
* Not lengthy (not a 1000 page book)
* Points to resources to read after.
I have read the unfinished http://www.craftinginterpreters.com/ , but I would like to know more about compilers (that emit machine code).
There are a lot of cases where that doesn't work (e.g. writing a functional language, a JIT, or doing programming language research), but those are edge cases. In those cases you probably want to read one of Appel's books ("Modern Compiler Implementation in ML", for example), or the Dragon Book.
For a parser, you should consider using tree-sitter. Tree-sitter gives you live editor support for free. Impressive work by Max Brunsfeld.
First. Design your language if you are not going to use something that already exists. So you need to understand grammars and the various pitfalls. There are quite a number of tutorials on this subject freely available around the web.
Second. Understand what you want your language to actually mean; this is the subject of semantics. So if you are starting out, make a simple grammar and simple associated semantics. Again, there is various material around that you can avail yourself of.
Third. Have a look at the various parser generators out there and what they expect in terms of your language. There are many and each has its pros and cons. Again, there are various tutorials about the subject.
Fourth. If your language is relatively simple and has relatively simple semantics, see what is required for the generation of LLVM IR. If there is any complexity in the language and its semantics, LLVM IR may not necessarily be the way to go. So, back to the first and second points, keep your investigatory language simple in both its syntax (grammar) and its semantics.
Fifth. Have fun. The entire subject can be a delight to learn, but some of the material out there is deadly boring and will quickly turn most people off the subject. But it is actually fun. So enjoy the time you spend learning about the subject.
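As a toy illustration of the first two steps, here is a made-up arithmetic language with the Lisp reader standing in for the parser; the grammar and the name EVALUATE are invented for the example:

;; Grammar:   expr ::= number | (op expr expr)    op ::= + | - | * | /
;; Semantics: a number denotes itself; (op a b) denotes op applied
;; to the values denoted by a and b.
(defun evaluate (expr)
  (cond ((numberp expr) expr)
        ((and (consp expr)
              (member (first expr) '(+ - * /))
              (= (length expr) 3))
         (funcall (first expr)
                  (evaluate (second expr))
                  (evaluate (third expr))))
        (t (error "Not a valid expression: ~s" expr))))

* (evaluate '(* (+ 1 2) 4))
12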
I had a compiler class at university but that was a while ago, so I would like to update my knowledge because there seems to be a lot of exciting stuff going on with compilers and transpilers.
The newer things:
* SLP vectorization. Instead of classic loop vectorization (where you convert a loop into vector operations), SLP vectorization tries to form vectors from the straight-line code of a single basic block, and it can work quite well for small SIMD units such as SSE.
* Polyhedral loop transformation. This is sort of the equivalent of the lattice-based dataflow analysis methodology, which is to say it's a very powerful, and slow, general-purpose technique which actually isn't used all that much in production compilers.
* Decompilation, disassembly, and other forms of static binary analysis have progressed a fair amount in the past decade or so.
* Dynamic binary analysis/translation, of which the state of the art is probably Intel Pin (https://dl.acm.org/citation.cfm?id=1065034).
* JITs have evolved a lot. In terms of what's missing from the classic compiler books, this is the big missing area. Things such as tracing, recompilation techniques, garbage collection, guard techniques, etc. have come a long way, and I'm not off-hand aware of any actual good book or paper here that covers most of the topic.
* Superoptimization is a technique that's still mostly in the academic phase, but you are seeing peephole optimizations and other information populated by ahead-of-time superoptimization (e.g., Regehr's Souper work).
* Symbolic and concolic execution is something else that's in the transition phase between academic and industry work. The most advanced use of concolic execution is fuzzing work where concolic execution is used to guide the fuzzer to generate specific inputs to test as many paths as possible.
Most of these topics wouldn't be covered in something as introductory as the Dragon Book (which itself has poor coverage of the major loop transformations), and generally only come into play in more specific scenarios.
All this is to say the barrier to entry in PL/compiler/interpreter implementation is only as high as you want it to be. You can do the whole thing yourself as a learning experience or you can use existing tools to finish a fairly-fully-featured project in a few hours over a few weeks.
There are many series like mine throughout the internet that are generally useful aids when combined with the typical dense, classical compiler textbook. The single exception to "dense textbooks" on compilers is Niklaus Wirth's Compiler Construction, where he takes you through writing a compiler from scratch in ~100 pages. This is one of my favorite compiler (and programming?) books, by one of the coolest programming language researchers. But even it may not be super accessible as a first book on compilers.
and you will know how a cross-platform self-hosting C compiler works in 1,982 lines of code
and if you want to know how an assembler works read this: https://github.com/oriansj/mescc-tools/
only about 562 lines for the linker, 645 for the macro assembler, and 204 lines if you want DWARF stubs in your binaries. Oh, and it is cross-platform too.
But, note "The PLAtform NEutral Transpiler"..."allows one to compile a subset of the C language".
Subset of C not being C.
It sounds like someone took the Unix philosophy "each program should do one thing well" and applied it to compilers.