Also, writing a Lisp in a purely functional language isn't the best experience :X Things like by-reference function environments and atoms are hard to implement if you're not in Haskell, at least if you follow the approach the guide outlines...
Without the assumption that any other software exists
The radically simple implementation of TCO really surprised me.
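For anyone who hasn't seen that part of the guide yet: the whole trick is that a call in tail position leaves no pending work in the current frame, so it can be replaced by reassigning the parameters and looping. A minimal sketch in C (a made-up toy, not mal code):

    #include <stdio.h>

    /* Tail call: the recursive call is the very last thing the function does,
       so nothing is left to do in the current frame afterwards. */
    static long long sum_rec(long long n, long long acc)
    {
        if (n == 0)
            return acc;
        return sum_rec(n - 1, acc + n);   /* tail position */
    }

    /* Which is why it can be rewritten mechanically as a loop: reassign the
       parameters and jump back to the top instead of calling. mal's EVAL does
       essentially this: for forms in tail position it updates ast/env and
       loops, instead of calling EVAL recursively. */
    static long long sum_loop(long long n, long long acc)
    {
        for (;;) {
            if (n == 0)
                return acc;
            acc += n;
            n -= 1;
        }
    }

    int main(void)
    {
        printf("%lld\n", sum_rec(1000, 0));       /* small n: fine even without TCO */
        printf("%lld\n", sum_loop(1000000, 0));   /* large n: constant stack space */
        return 0;
    }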
Question: how do I go from tackling this to tackling compilers?
Choose a good Macro Assembler like NASM or YASM, and implement the bytecodes as macros.
It won't win any benchmarks, but you will get native executables.
Or use Lisp macros to generate the code.
Here is a Scheme-based tutorial.
Then, write the compiler in that language.
The compiler walks the code just like the interpreter, but instead of interpreting, it spits out a translation.
Spitting out some kind of translation into something isn't a very high bar; the difficulties are all in optimization: things like analyzing lexical scopes for the presence of closures which escape, so as to be able to stack-allocate environments that aren't captured by closures. Plus all those other difficult things: turning a program into a graph of basic blocks and doing flow analysis; register allocation; instruction selection for a myriad of target machines; debugging support; and so on. Any of these requirements can be excluded from a compiler, yet it's still a compiler.
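For what it's worth, here is roughly what that low bar looks like: a hypothetical toy in C where the same recursive walk over an expression tree either computes a value (interpreter) or prints a C translation of it (compiler), with no optimization of any kind:

    #include <stdio.h>

    /* Toy expression tree: numbers, sums, products. */
    typedef struct expr {
        enum { NUM, ADD, MUL } tag;
        int num;                      /* used when tag == NUM */
        struct expr *left, *right;    /* used when tag is ADD or MUL */
    } expr;

    static int eval(const expr *e)            /* interpret: walk and compute */
    {
        switch (e->tag) {
        case NUM: return e->num;
        case ADD: return eval(e->left) + eval(e->right);
        default:  return eval(e->left) * eval(e->right);
        }
    }

    static void compile(const expr *e)        /* compile: same walk, emit C text */
    {
        if (e->tag == NUM) {
            printf("%d", e->num);
            return;
        }
        printf("(");
        compile(e->left);
        printf(e->tag == ADD ? " + " : " * ");
        compile(e->right);
        printf(")");
    }

    int main(void)
    {
        expr three = { NUM, 3, 0, 0 }, four = { NUM, 4, 0, 0 }, two = { NUM, 2, 0, 0 };
        expr sum  = { ADD, 0, &three, &four };
        expr prod = { MUL, 0, &two, &sum };

        printf("interpreted: %d\n", eval(&prod));   /* 14 */
        printf("compiled: ");
        compile(&prod);                             /* (2 * (3 + 4)) */
        printf("\n");
        return 0;
    }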
The "Lisp in 500 lines of C" (plus accopmanying .lisp) file implemented a compiler. The complete source is hard to find, but here is a derivative project: https://github.com/jackpal/lisp5000
A chunk of init500.lisp implements compilation to C. Search the file for terms like "compile" and also "deftransform".
If you've ever written a macro, you've written a piece of a compiler. Macros take an input syntax tree and spit out a new one. Many actions traditionally associated with a compiler (by Lisp-unaware computer scientists) can be done by macros or in the macro expander. Basic compiling is a lot like macro expansion. It can actually be Lisp-to-Lisp even; the transformers can spit out a Lisp data structure representing a target language. (Heck, GCC spits out a Lisp-like data structure for intermediate code. Of course it does: it was originally a project started by Stallman.)
From there, you’ll be more than ready to generate machine code.
For example, when arg_cnt is 2, mv->val.f2 receives func converted to void *(*)(void *, void *), that is, a pointer to a function that accepts two parameters of type pointer-to-void and returns pointer-to-void.
C permits converting a pointer-to-function of one type to a pointer-to-function of another type. Converting back is also defined, and the resulting pointer will compare equal to the original. However, calling a function through a pointer-to-function whose type does not match the function's actual type invokes undefined behavior.
case 13: mv->val.f13 = (F13)func; break;
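To make the rule concrete, here is a standalone toy in C (unrelated to the TXR sources) showing the well-defined part: convert, convert back, compare, and call only through the matching type:

    #include <stdio.h>

    static int add(int a, int b)
    {
        return a + b;
    }

    int main(void)
    {
        typedef void (*any_fn)(void);        /* generic "some function" pointer */
        typedef int (*int_fn)(int, int);     /* the real type of add */

        any_fn stored = (any_fn)add;         /* conversion: well defined */
        int_fn back   = (int_fn)stored;      /* converting back: well defined */

        printf("round trip equal: %d\n", back == add);            /* prints 1 */
        printf("call through matching type: %d\n", back(2, 3));   /* prints 5 */

        /* Calling through `stored` itself (a mismatched type) would be
           undefined behavior. */
        return 0;
    }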
One can bend the rules of prim-and-proper well-defined ISO C just a little bit and assign to the simplest function pointer, taking advantage of all function pointers having the same representation on every machine known to mortal hacker, and all being overlaid by the union. Thus all the cases collapse down to this:
mv->val.u.f0 = (void (*)(void)) func;
These functions are only used from within C code; they are not the basis for any intrinsic functions. The call to each of these functions knows exactly the type of function it wants to make.
There is a notation encoded in the naming.
Firstly, the leading 'f' denotes functions with an environment value, whereas 'n' denotes non-environment functions. Then there is a number indicating the number of arguments. Then comes an optional letter: 'o' if the function has optional arguments, or 'v' if it is variadic. Both can be present, in which case we have 'ov'.
Thus for example func_n2ov(some_c_function, 1) will create a variadic function with 2 fixed arguments, of which 1 is required (so one is optional). some_c_function has to have the type signature val (*)(val, val, varg). No casting is required because func_n2ov just assigns this to the correct union member. If we pass a function with an incorrect signature, we get a C compile error.
The func_n2ov constructor is used when registering the unique function in eval.c:
reg_fun(intern(lit("unique"), user_package), func_n2ov(unique, 1));
I wrote TXR in a way that is easy to understand and maintain. It is implemented only in C, though, which makes it a good codebase to study if you want to see how to make a Lisp interpreter in C. Yet it is production code for real work.
The benchmark results of 42 implementations are very impressive:
which are discussed in Japanese at:
They wrote the app in Common Lisp and claimed that it was their secret weapon which allowed them to iterate faster and stay ahead of their competition. They ended up selling their company (ViaWeb) to Yahoo for $50 million (IIRC).
PG also authored two books on Lisp, and HN is written in Arc (a Lisp built on top of Racket).
Around 2005 I was doing a lot of Lisp programming and I was really interested in what PG had to say. He was writing a lot of essays then and giving lots of talks, etc.
Unfortunately, I don't get the opportunity to use Lisp as much as I used to, but I'm doing a lot of Python and even played around with Hy (Lisp on top of Python). So I'm interested in Lisp as I find it to be an incredibly powerful language that is really fun to use.
It's a beautiful thing. I wish I never had to climb down into the cave (languages with more advanced parsers) again.
Also, it's interpreters all the way down. Even assembly is interpreted.
Or recently this way:
Interestingly enough, the primitive versions of LISP that aren't high-performance (i.e., without advanced compilers) could probably be done by hand in hardware, with the whole thing bootstrapped up with LISP-only tech. The LISP would start/stop at the abstract level, state machines, or RTL of the bootstrapped version.
EDIT: Good luck on making it through the hurricane. Feel for yall out there.
Learning assembly is great, and every programmer should dive into at least a simpler version of it, but it's not exactly the best way to learn how your high-level language is implementing tail-call optimization.
Which you can code in a convenient way by using Lisp and Lisp macros to generate the <insert target CPU> machine language output using <insert target CPU> opcodes.
Speak truth to power. Every time someone says this, there seems to be nothing but haters.
Because it's not completely true. If a CPU is microcoded, then it's accurate to say "assembly is interpreted" because every instruction is effectively an address into a lookup table of microinstructions. But in a non-microcoded (e.g. purely RISC) CPU, the bits of the instruction word are effectively enable and data lines to a bunch of hardware logic gates and flip-flops, which cause register transfers and arithmetic operations to happen. In this case, the ones and zeros in the instruction word are voltage levels on logic gates. Calling the latter "interpretation" is a stretch.
To be fair, there aren't many pure RISC implementations around these days. Most everything has some degree of microcode involved, so to that extent you're right.
Could you please elaborate on this point? I'm still very much in the cave.
That is the case for all instructions. Each one is interpreted by the CPU. And for modern CPUs, even translated into an internal format that is further interpreted.
 Sans power right now. Using my cellphone as a hot-spot and my iPad as a laptop. The aftermath of hurricanes is brutal in Florida.
(Good luck recovering from the hurricane! Keep your head down!)
- Homoiconicity: Code and data are the same (s-expressions)
- Lisp code is basically the AST in itself
- It's trivial to implement lisp in lisp (eval)
- Continuations (call/cc)
It's a truly fascinating language.
I never really did any Lisp coding outside some university projects and I still obsess and read about Scheme all the time.
Lisp is addictive, like a hard drug.
Lisp is the original "programmable programming language".
Experienced programmers with no previous knowledge of Lisp should relish the features provided (or doubt them in disbelief).
Lisp, at least Common Lisp (but other Lisp dialects as well), might claim to be the most powerful and versatile programming language (and environment) available.
CL can run pretty fast: roughly at the same speed as Java on the Oracle JVM, and if some tricks are applied it can match C and Fortran speed under certain conditions. CL code is extremely portable.
Lisp has been used for very high level tasks (AI, etc) and for low level tasks (operating systems for Lisp machines).
You have also noticed that lots of people use it to create cool things which reach the HN front page.
github.com/AlDanial/cloc v 1.70 T=2.86 s (433.6 files/s, 67570.0 lines/s)
Language files blank comment code
Visual Basic 36 1569 132 8036
Swift 36 1199 1606 7951
SQL 36 956 649 7594
Pascal 19 704 951 6253
Ada 28 1975 380 5856
Elm 19 1619 226 5647
awk 16 290 15 5169
Lisp 37 825 256 4562
VHDL 17 434 32 4228
C# 20 658 346 4175
Python 37 574 394 4018
make 89 1244 873 4010
C 19 466 392 3499
Perl 35 413 171 3450
Go 17 281 148 3397
Rust 18 304 130 3331
Forth 18 516 158 3228
C++ 18 568 71 2986
Java 17 325 135 2938
D 17 363 10 2938
Racket 35 480 385 2740
TypeScript 17 284 56 2740
Markdown 8 714 0 2730
Rexx 17 283 47 2713
Bourne Shell 30 390 315 2692
Tcl/Tk 17 284 43 2416
Dart 16 255 41 2303
F# 20 337 13 2298
Objective C 17 285 150 2189
Erlang 17 277 191 2185
MATLAB 26 184 128 2165
Haxe 19 254 79 2161
Crystal 18 417 58 2146
vim script 17 242 50 2076
Haskell 17 331 99 2000
PHP 18 249 120 1955
Lua 18 248 87 1874
Scala 16 221 113 1816
Elixir 19 366 35 1809
JSON 28 246 0 1765
R 17 201 100 1612
Groovy 17 170 98 1582
Kotlin 17 312 0 1554
Julia 17 189 126 1405
Nim 16 320 36 1397
lex 18 209 0 1355
Ruby 17 167 87 1264
OCaml 15 109 98 1211
PowerShell 11 128 71 1146
ClojureC 15 257 129 1053
CoffeeScript 17 195 126 1037
CSS 6 150 132 792
C/C++ Header 22 266 44 752
HTML 2 36 35 486
Bourne Again Shell 47 1 4 135
YAML 2 6 10 86
Maven 1 2 7 85
Clojure 2 8 12 65
ClojureScript 1 1 0 2
SUM: 1242 24806 10447 158293
make stats | egrep "Running|total|comments"
That being said, the following implementations are "smallish" (in both lines of code and bytes): mal itself (self-hosted), Clojure, Factor, Perl 6, Ruby. The following are "largish": Awk, PL-pgSQL, C, PL-SQL, Chuck, Swift 2, Ada.
OCaml is in the "smallest" third of the implementations.
It then grew into a personal tool for learning new languages. As I implemented more languages, I refactored the structure to make it more portable and incremental. At some point I realized it might be an interesting learning tool for others, so I wrote an early version of the guide and had a friend work through it and give feedback. He liked it enough that he immediately did a second implementation.
At some point the project got some twitter/HN attention and other people began contributing implementations and feedback for the guide.
Mal also serves as a programming language "Rosetta Stone". A bit like rosettacode.org, but using a full working project.
It's important to re-do things which others have already worked out, in order to reach a greater understanding :)
I learned how to implement Lisp by reading the source code for multiple compilers. Started with Franz Lisp, then KCL and CMUCL. I don't feel that a toy interpreter is going to teach the same things.
A toy language won't teach the exact same things, but it covers a lot of the same topics.
I had a go in Rust recently. Was a great way to learn new things about Rust and also some interpreter concepts.