I feel like many people missed one important point before they complained about how hard it is to write a compiler: start simple! You don’t have to start with mature languages like Java even you really want just support a subset of it. Same for codegen targets: try some simpler (pseudo) architectures/bytecode before even thinking about the real ones like JVM bytecode or native assembly
Agreed! Start with something really really simple like arithmetic. Part of the motivation for this blog post was that I struggled to get over the hump that came once I had done the basic stuff like arithmetic, variables, if/else, top level functions, and got to the harder bits like closures, garbage collection, etc.
If you look at many early compilers (BCPL, Mini Pascal, B, ... and come to think of it, arguably Python etc.) they massively simplified their lives by only doing codegen to very small VMs.
which? B? I'd thought that was at least threaded. BCPL and Mini Pascal definitely had VM targets...
Edit: anything targeting PDP-8 native code could be considered to target a reduced ISA, but I don't think any of those were really mainstream. PDP-7 as a target looks like a pretty vanilla ISA to me?
I can't understand what you are trying to say. Are you calling a bytecode interpreter a virtual machine? Are you thinking of abstract machines like the p-machine or warren abstract machine (which were really just bytecode interpreters with fancy names)? Every BCPL compiler I know of generated assembly code; Mini Pascal was written in OCaml so implicitly targeted its bytecodes...?
And what do you mean that the PDP-8 had a "reduced ISA"?
yes, bytecode interpreter. Martin Richards' BCPL[0] generates bytecode (but maybe there's an optional native codegen that gets bootstrapped that way?). Niklaus Wirth's Mini[1] Pascal was load-and-go in the sense that it would JIT a rudimentary bytecode into memory and then interpret that.
PDP-8 I was just guessing at trying to figure out what you meant, not realising we were talking about different implementations. (thinking you might have meant ISAs were simpler back then — IIRC PDP-8 ISA is not very different from the very small bytecode Wirth's mini Pascal interpreted [edit 2: cf infra], but PDP-7 ISA, while simpler than modern ones, is in my eyes only quantitatively and not qualitatively so)
In any case, the biggest transformation is from a tree-structured AST down to a linear (associative) sequence of data and ops, and if one has learned how to do that for an interpreted bytecode, it's plug-and-chug to do the same for a native ISA.
also, I just realised the confusion with the other concept of virtual machine in terms of guest supervisor levels. I was using the virtual machine which was virtual in the sense that it is program, not machine, interpreted and hence usually designed to be an easy target.
Edit 2: eg the PL/0 virtual machine ops:
lit 0,a : load constant a
opr 0,a : execute operation a
lod l,a : load varible l,a
sto l,a : store varible l,a
cal l,a : call procedure a at level l
int 0,a : increment t-register by a
jmp 0,a : jump to a
jpc 0,a : jump conditional to a