Hacker News new | past | comments | ask | show | jobs | submit login
How to think about compiling (uptointerpretation.com)
40 points by hardwaregeek on Dec 11, 2022 | hide | past | favorite | 13 comments



I feel like many people missed one important point before they complained about how hard it is to write a compiler: start simple! You don’t have to start with mature languages like Java even you really want just support a subset of it. Same for codegen targets: try some simpler (pseudo) architectures/bytecode before even thinking about the real ones like JVM bytecode or native assembly


Agreed! Start with something really really simple like arithmetic. Part of the motivation for this blog post was that I struggled to get over the hump that came once I had done the basic stuff like arithmetic, variables, if/else, top level functions, and got to the harder bits like closures, garbage collection, etc.


I did just that, compiling reverse-polish mathematical operations, such as "1 3 / 9 *", into x86 assembly.

Even trivial compilers like that are fun to work on, as reminders of the low-level facilities available.

https://github.com/skx/math-compiler


If you look at many early compilers (BCPL, Mini Pascal, B, ... and come to think of it, arguably Python etc.) they massively simplified their lives by only doing codegen to very small VMs.


Not VM. ISA.


(ISA is https://en.wikipedia.org/wiki/Instruction_set_architecture in case anyone else was wondering).


which? B? I'd thought that was at least threaded. BCPL and Mini Pascal definitely had VM targets...

Edit: anything targeting PDP-8 native code could be considered to target a reduced ISA, but I don't think any of those were really mainstream. PDP-7 as a target looks like a pretty vanilla ISA to me?


I can't understand what you are trying to say. Are you calling a bytecode interpreter a virtual machine? Are you thinking of abstract machines like the p-machine or warren abstract machine (which were really just bytecode interpreters with fancy names)? Every BCPL compiler I know of generated assembly code; Mini Pascal was written in OCaml so implicitly targeted its bytecodes...?

And what do you mean that the PDP-8 had a "reduced ISA"?

I'm utterly puzzled here.


yes, bytecode interpreter. Martin Richards' BCPL[0] generates bytecode (but maybe there's an optional native codegen that gets bootstrapped that way?). Niklaus Wirth's Mini[1] Pascal was load-and-go in the sense that it would JIT a rudimentary bytecode into memory and then interpret that.

PDP-8 I was just guessing at trying to figure out what you meant, not realising we were talking about different implementations. (thinking you might have meant ISAs were simpler back then — IIRC PDP-8 ISA is not very different from the very small bytecode Wirth's mini Pascal interpreted [edit 2: cf infra], but PDP-7 ISA, while simpler than modern ones, is in my eyes only quantitatively and not qualitatively so)

In any case, the biggest transformation is from a tree-structured AST down to a linear (associative) sequence of data and ops, and if one has learned how to do that for an interpreted bytecode, it's plug-and-chug to do the same for a native ISA.

[0] https://www.cl.cam.ac.uk/~mr10/BCPL.html

Edit: [1] sorry, got my names confused. Wirth's is PL/0. See bottom of page at http://pascal.hansotten.com/niklaus-wirth/pl0/

also, I just realised the confusion with the other concept of virtual machine in terms of guest supervisor levels. I was using the virtual machine which was virtual in the sense that it is program, not machine, interpreted and hence usually designed to be an easy target.

Edit 2: eg the PL/0 virtual machine ops:

    lit 0,a  :  load constant a
    opr 0,a  :  execute operation a
    lod l,a  :  load varible l,a
    sto l,a  :  store varible l,a
    cal l,a  :  call procedure a at level l
    int 0,a  :  increment t-register by a
    jmp 0,a  :  jump to a
    jpc 0,a  :  jump conditional to a


That's why I've been glad the term abstract machine has often been used for this case. The jvm would better have been called the "jam"!


> I mean, how the heck do you take a nice, elegant language like Java and turn it into [..] assembly?

Here's the neat part: you don't!


...follow the dependencies?

JVM is built on C syscalls.


> how the heck do you take a nice, elegant language like Java

Ahh, it's a work of parody. :)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: