I won't spend a lot of time making excuses; only point out that things happen, and priorities change. In the four years since installment fourteen, I've managed to get laid off, get divorced, have a nervous breakdown, begin a new career as a writer, begin another one as a consultant, move, work on two real-time systems, and raise fourteen baby birds, three pigeons, six possums, and a duck.
The author has quite a diverse work history.
(Links provided for info, not complaining about dupes.)
Found one article: https://m.eet.com/media/1171254/toolbox.pdf
It's not based on the linked guide, but there's also 8cc, which outputs x86-64 only.
A diary of how it was written: https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...
GitHub sources and history: https://github.com/rui314/8cc
The "gen.c" source file might be helpful for some of your questions about x86-64: https://github.com/rui314/8cc/blob/b8a46abdb5f0633bdbff31482...
Adding floating-point arithmetic needs another dozen or so instructions. The x87 opcodes (FLD, FST, FADD etc.) are very easy to program against, since the x87 is a stack machine: a post-order traversal of the expression tree is usually sufficient, as long as the stack doesn't overflow, though SSE2 instructions are the norm on x86-64.
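To make that concrete, here's a minimal sketch of the post-order approach. The Node type, the constant handling, and the NASM-flavored output are my own assumptions for illustration, nothing from 8cc:

    /* Toy expression tree; each visit leaves its result on the x87
       stack, so a binary op just combines the top two slots. */
    #include <stdio.h>

    typedef struct Node {
        char op;                  /* '+', '-', '*', '/' or 0 for a leaf */
        double val;               /* used when op == 0 */
        struct Node *lhs, *rhs;
    } Node;

    static void emit_x87(Node *n) {
        if (n->op == 0) {
            /* FLD m64: push a constant onto the x87 stack.  A real
               compiler would place the literal in .rodata with a label. */
            printf("    fld qword [L%p]    ; %g\n", (void *)n, n->val);
            return;
        }
        emit_x87(n->lhs);         /* post-order: children first */
        emit_x87(n->rhs);
        switch (n->op) {          /* st1 = st1 OP st0, then pop */
        case '+': puts("    faddp st1"); break;
        case '-': puts("    fsubp st1"); break;
        case '*': puts("    fmulp st1"); break;
        case '/': puts("    fdivp st1"); break;
        }
    }

    int main(void) {
        /* (1.0 + 2.0) * 3.0 */
        Node a = {0, 1.0, 0, 0}, b = {0, 2.0, 0, 0}, c = {0, 3.0, 0, 0};
        Node add = {'+', 0, &a, &b}, mul = {'*', 0, &add, &c};
        emit_x87(&mul);
        return 0;
    }

The overflow caveat is real: the x87 stack is only eight slots deep, so a deep enough expression will blow it, and a real compiler has to spill intermediates to memory when that happens.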
Large portions of the full instruction set are vector operations which you wouldn't realistically be emitting for a didactic toy compiler. You can get by perfectly well without the string operations too.
I once tried to write an assembler in Python for just one instruction, MOV. And not even the 64- or 32-bit forms (it was either 16-bit or 8-bit).
After scratching my head over all the corner cases and handling about 80% of them, my Python code looked like a complete mess.
There is a paper floating around (posted here multiple times) showing that MOV is Turing complete.
There's even a compiler that compiles C to MOV instructions only.
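For a flavor of how that can possibly work: the core trick is that a conditional becomes an unconditional load whose address depends on data. A rough sketch of just that idea in C (my own illustration, not the paper's actual construction):

    /* Branch-free selection by indexing: in x86 terms, every line
       below is just a MOV (plus addressing arithmetic). */
    #include <stdio.h>

    int main(void) {
        int if_false = 10, if_true = 20;
        int *table[2] = { &if_false, &if_true };  /* mov [table], ...; mov [table+4], ... */

        int cond = 1;                             /* 0 or 1 */
        int result = *table[cond];                /* mov eax, [table + cond*4]
                                                     mov eax, [eax]            */
        printf("%d\n", result);                   /* prints 20 */
        return 0;
    }

Wrap constructions like this in one big unconditional loop and you have, roughly, the skeleton of the paper's machine.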
Also, the opcodes (and especially the ModRM byte) make far more sense when written in octal - see http://www.dabo.de/ccc99/www.camp.ccc.de/radio/help.txt
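To illustrate the octal point: ModRM is literally three octal digits, mod (2 bits), reg (3 bits), r/m (3 bits), so the fields read straight off. A tiny sketch (the enum values are the standard x86 register encodings):

    #include <stdio.h>

    enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };  /* reg/rm encodings */

    static unsigned char modrm(unsigned mod, unsigned reg, unsigned rm) {
        return (unsigned char)((mod << 6) | (reg << 3) | rm);
    }

    int main(void) {
        /* mov eax, ecx  ->  opcode 0x89 (MOV r/m32, r32) with
           mod=3 (register direct), reg=ECX (source), rm=EAX (dest) */
        unsigned char code[2] = { 0x89, modrm(3, ECX, EAX) };
        printf("%02X %02X  (ModRM = 0%o in octal)\n",
               code[0], code[1], code[1]);
        return 0;
    }

Hex 0xC8 is opaque, but octal 0310 decomposes on sight: mod=3, reg=1 (ecx), rm=0 (eax).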
Don't believe this claim - any computer has access to only a finite amount of memory, while a Turing machine needs an infinite tape to work. So the proof must be false.
In other words: only a mathematically idealized version of MOV might be Turing complete - but that cannot be the MOV that x86 implements.
Turing-completeness isn't a property of computational models; it's a property of problems. A problem is Turing-complete when it cannot be solved by any computational model less powerful than a Turing machine or another Turing-equivalent system, such as the lambda calculus.
Moreover, a Turing machine does not have an infinite tape. It has an unbounded tape. Its tape is not limitless, it is extensible, such that a computation cannot fail due to running out of tape. Hey, you wanted to get pedantic.
Finally, in practical land, the distinction between Turing-equivalent and non-Turing-equivalent is very relevant to real programming languages: Epigram is pointedly not Turing-equivalent, and therefore it's possible to prove programs in that language will halt. Moving away from Turing-equivalence frees you from the tyranny of the Halting Problem.
So if you argue that the silent ‘(but obviously with a machine-bound tape)’ assumption is OK, you argue that it is OK to identify Turing machines with a class of machines that is no more powerful than finite-state machines.
Any professor of theoretical computer science would be horrified by such claims.
It’s not about classes of machines. A machine with a limited tape is a perfect stand-in for a machine with an unlimited tape as long as no program you care about hits the limit.
Professors won’t be horrified - they use the silent parenthetical constraint too. Look, for example, at formal publications like Dolan’s and see how academics talk about it.
Or, if you want to skip assembly language entirely and compile directly to 386/ELF:
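As a rough sketch of what that entails (Linux-only, my own hard-coded illustration using <elf.h>; the payload just does exit(42)):

    /* Write a minimal 32-bit ELF executable, headers and all,
       with no assembler or linker involved. */
    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        unsigned char code[] = {
            0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1   (sys_exit) */
            0xBB, 0x2A, 0x00, 0x00, 0x00,   /* mov ebx, 42  (status)   */
            0xCD, 0x80                      /* int 0x80                */
        };
        Elf32_Ehdr eh = {0};
        Elf32_Phdr ph = {0};
        Elf32_Addr base = 0x08048000;
        Elf32_Off  off  = sizeof eh + sizeof ph;  /* code follows headers */

        memcpy(eh.e_ident, ELFMAG, SELFMAG);
        eh.e_ident[EI_CLASS]   = ELFCLASS32;
        eh.e_ident[EI_DATA]    = ELFDATA2LSB;
        eh.e_ident[EI_VERSION] = EV_CURRENT;
        eh.e_type      = ET_EXEC;
        eh.e_machine   = EM_386;
        eh.e_version   = EV_CURRENT;
        eh.e_entry     = base + off;        /* entry point = start of code */
        eh.e_phoff     = sizeof eh;
        eh.e_ehsize    = sizeof eh;
        eh.e_phentsize = sizeof ph;
        eh.e_phnum     = 1;

        ph.p_type   = PT_LOAD;              /* one segment: whole file, R+X */
        ph.p_offset = 0;
        ph.p_vaddr  = base;
        ph.p_paddr  = base;
        ph.p_filesz = off + sizeof code;
        ph.p_memsz  = off + sizeof code;
        ph.p_flags  = PF_R | PF_X;
        ph.p_align  = 0x1000;

        FILE *f = fopen("tiny", "wb");
        if (!f) return 1;
        fwrite(&eh, sizeof eh, 1, f);
        fwrite(&ph, sizeof ph, 1, f);
        fwrite(code, sizeof code, 1, f);
        fclose(f);
        return 0;
    }

Then chmod +x tiny && ./tiny; echo $? should print 42 (on a kernel with 32-bit binary support). A backend targeting ELF directly is essentially this, with real generated code in place of the canned payload.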
64-bit is almost entirely a superset of 32-bit, so that will at least get you started.
That's just a shallow impression, though. Convince me I'm wrong?
For example, I learned to write assembly for an STM32F0 microcontroller. I would never do that in practice, but it worked wonders in teaching me the intricacies of a processor at a very low level.
In particular, old compiler texts will often do things like emit code directly from the parser, try to minimise the number of AST passes, or keep complex global symbol tables separate from the AST. These are mostly workarounds for technical limitations that are no longer relevant, and in fact only obscure the principles being taught (code generation during parsing is my pet example of this).
For example, the author doesn't split the compiler into multiple passes just to make it fit in RAM; he assumes you have "enough" RAM for a simple compiler to fit, and goes from there.
Similarly, he jumps right into recursive code, because that's simpler, and he assumes your computer has a stack of reasonable depth. (Go back far enough and computers had really shallow stacks. Go back farther and computers didn't have call stacks at all.)
Finally, his compiler doesn't optimize the code, because he assumes the obvious code will run fast enough. That's a fairly modern idea, and one that removes the complexity that would otherwise drive the design and force things like multiple passes and a complex internal representation.
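Here's roughly what that style looks like in miniature (my own Crenshaw-flavored sketch, single digits and two operators only): a recursive-descent parser that prints stack-machine x86 as a side effect of parsing, with no AST and no separate passes:

    #include <stdio.h>

    static const char *p;                /* cursor into the source string */

    static void factor(void) {
        printf("    push %c\n", *p++);   /* single-digit literal */
    }

    static void term(void) {             /* term := factor ('*' factor)* */
        factor();
        while (*p == '*') {
            p++;
            factor();
            printf("    pop  ebx\n    pop  eax\n"
                   "    imul eax, ebx\n    push eax\n");
        }
    }

    static void expr(void) {             /* expr := term ('+' term)* */
        term();
        while (*p == '+') {
            p++;
            term();
            printf("    pop  ebx\n    pop  eax\n"
                   "    add  eax, ebx\n    push eax\n");
        }
    }

    int main(void) {
        p = "1+2*3";
        expr();                          /* code appears as parsing proceeds */
        return 0;
    }

The recursion mirrors the grammar, so the code falls out in the right evaluation order with no intermediate representation at all.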
This is useful for both practical and teaching purposes: practically, because it keeps things simple when the additional complexity isn't needed (e.g. scripting languages), and pedagogically, because the student learns both approaches (both of which are used on real-world problems) while also learning why one might be preferable to the other. And if you do the partial-AST variant, you introduce the idea of an AST gradually, building on knowledge and experience the student has already acquired.
And if you want or need to later, you can trivially introduce an AST in those compilers by replacing the calls to the code generator with calls to a tree builder, and then writing a visitor-style driver for the code generation.
Instead of calling the code emitter, you call an AST builder.
Then you build a tree walker that calls the code emitter.
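A sketch of that refactor against the same toy grammar as the sketch above (again my own illustration): the printf calls in the parser become node construction, and a short post-order walker produces identical output:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node {
        char op;                         /* '+', '*', or 0 for a literal */
        char lit;
        struct Node *lhs, *rhs;
    } Node;

    static Node *node(char op, Node *l, Node *r, char lit) {
        Node *n = malloc(sizeof *n);
        n->op = op; n->lhs = l; n->rhs = r; n->lit = lit;
        return n;
    }

    static const char *p;

    static Node *factor(void) { return node(0, NULL, NULL, *p++); }

    static Node *term(void) {
        Node *n = factor();
        while (*p == '*') { p++; n = node('*', n, factor(), 0); }
        return n;
    }

    static Node *expr(void) {
        Node *n = term();
        while (*p == '+') { p++; n = node('+', n, term(), 0); }
        return n;
    }

    static void emit(Node *n) {          /* the tree-walking driver */
        if (n->op == 0) { printf("    push %c\n", n->lit); return; }
        emit(n->lhs);
        emit(n->rhs);
        printf("    pop  ebx\n    pop  eax\n    %s eax, ebx\n    push eax\n",
               n->op == '+' ? "add " : "imul");
    }

    int main(void) {
        p = "1+2*3";
        emit(expr());                    /* build the tree, then walk it */
        return 0;
    }

The parser's shape doesn't change at all; only what it calls at each reduction does, which is why the retrofit is so cheap.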
The Wirth Oberon compiler, at least, was retrofitted with an AST in exactly this way by at least one student in Wirth's group, as part of experiments with optimization passes.
When developers ask, "I am curious about compilers, where should I start?", a common answer, even today, is "the dragon book". This is just bad advice; there are better introductory texts on the subject. However, if we step back, there's no reason not to suggest "Let's Build a Compiler", even though it seems dated at this point. Why? Because the dev is only curious, and the content of "Let's Build a Compiler" is still, to this day, a good introduction to what compilers actually do. It's light on theory and heavier on practicality, and can easily satisfy that curiosity.