I wish more resource were available on the part most of these leave out. Emitting the AST into machine code or IL, or instead integrating with llvm. Seems these articles all assume people would only be interested in parsers and interpreted non-compiled languages.
Turning an expression tree into basic machine code is typically relatively straightforward if the target is stack-based. Basically you do a post-order traversal. For leaf nodes, just push the value. For other nodes, obtain the arguments from the stack, perform the operation and push the result.
let me teach you 4 years of physics and now you can build a rocket that goes to the moon. compiler books feel exactly like this, parsing something seems obvious but most compiler books devote 90% of their content to parsing, but how do you convert something from a class to binary code or how do you represent loops and modules, you don't even know where to begin. I figured out parsing pretty quickly and was able to build a working parser in a week, it took me a good few months to grasp converting that to x86 instructions.
Yes! Thank you! This is 100% what I mean. Especially higher level constructs like structs or classes as well. Emitting is for some reason even more of a dark art than parsing. How crazy is that?
I started writing a simple multithreaded interpreter that processes an imaginary assembly. Here's a program in that imaginary assembly that sends integers to other threads and then sends a jump instruction to another thread to jump to some code. It uses an pseudo actor implementation I wrote that can send 20-100 million messages between threads a second. When sending a single integer at a time it's a lot slower and can only send 1.7 million messages a second due to overhead of the interpreter. Switch interpreters aren't fast such as deegen or Java's template interpreter or V8
threads 25
<start>
mailbox numbers
mailbox methods
set running 1
set current_thread 0
set received_value 0
set current 1
set increment 1
:while1
while running :end
receive numbers received_value :send
receivecode methods :send :send
:send
add received_value current
addv current_thread 1
modulo current_thread 25
send numbers current_thread increment :while1
sendcode methods current_thread :print
endwhile :while1
jump :end
:print
println current
return
:end
I then started writing a parser for a high level language and then code generation from the AST to the imaginary assembly. My interpreter is multithreaded and can send integers between interpreters. It is very early and doesn't do much.
The high level language looks similar to Javascript except I tried to parse everything as an expression. I need to change the parser to parse functions as expressions.
I was experimenting with Protothreads in C recently to try understand how it worked and I wrote a giant switch statement and a while loop in Java to simulate async/await. It would be interesting to do codegen for coroutines.
One idea for a stackless design I had was to preallocate memory for each method call referenced by a method for a call to that function and avoid a stack altogether. This would allow coroutines between methods and avoid the function colour problem because everything is a coroutine.
Is there any communities for programming language developers? Where do all the language developers meet up and talk theory and implementation? I am on eatonphil's discord and we talk there.
One problem I am trying to understand how to solve is how you would write a multithreaded interpreter and language that allowed parallel interpretation similar to C# and Java. If the allocator is thread safe and you share an object pool between interpreters and you hash object equality by sourcecode, then you could send objects between threads with only a synchronization cost.
I believe Python has the problem that object identity is different in each subinterpreter so you need to marshall the data.
> Is there any communities for programming language developers? Where do all the language developers meet up and talk theory and implementation? I am on eatonphil's discord and we talk there.
I've briefly looked into it. It looks like it is fairly active and has the types of discussions you're looking for. It seems there are a lot of people out there trying to write languages.
I've done some small expression languages for work (and customized rhino js for a work project). I plan to build out a small language (self hosting, dependent typed, compile to JS) as a learning exercise, but at the moment I'm learning dependent type theory and contributing to Idris.
Yep this is the place. I've drifted away from my language project but the year or two I was actively developing it this community was exceedingly helpful.
It shows the process of building an optimizing compiler for the MONAD language from last year's Advent of Code. It covers topics like:
- no-op elimination (episode 1)
- constant propagation (episode 2)
- value numbering (episode 3)
- value range analysis (episode 4, coming out ~next week)