A Compiler Writing Playground

obi1kenobi · on Nov 26, 2022

Folks interested in this might also enjoy my hands-on blog series about compiler optimizations: https://predr.ag/blog/compiler-adventures-part1-no-op-instru...

It shows the process of building an optimizing compiler for the MONAD language from last year's Advent of Code. It covers topics like:

- no-op elimination (episode 1)

- constant propagation (episode 2)

- value numbering (episode 3)

- value range analysis (episode 4, coming out ~next week)

beefman · on Nov 26, 2022

The playground seems to be at: https://citw.dev/tutorial/create-your-own-compiler

lloydatkinson · on Nov 27, 2022

I wish more resource were available on the part most of these leave out. Emitting the AST into machine code or IL, or instead integrating with llvm. Seems these articles all assume people would only be interested in parsers and interpreted non-compiled languages.

dukoid · on Nov 27, 2022

Turning an expression tree into basic machine code is typically relatively straightforward if the target is stack-based. Basically you do a post-order traversal. For leaf nodes, just push the value. For other nodes, obtain the arguments from the stack, perform the operation and push the result.

evnix · on Nov 28, 2022

let me teach you 4 years of physics and now you can build a rocket that goes to the moon. compiler books feel exactly like this, parsing something seems obvious but most compiler books devote 90% of their content to parsing, but how do you convert something from a class to binary code or how do you represent loops and modules, you don't even know where to begin. I figured out parsing pretty quickly and was able to build a working parser in a week, it took me a good few months to grasp converting that to x86 instructions.

lloydatkinson · on Nov 28, 2022

Yes! Thank you! This is 100% what I mean. Especially higher level constructs like structs or classes as well. Emitting is for some reason even more of a dark art than parsing. How crazy is that?

samsquire · on Nov 26, 2022

Thank you.

There is also OMR

https://github.com/eclipse/omr

but I'm not sure how powerful that is.

I started writing a simple multithreaded interpreter that processes an imaginary assembly. Here's a program in that imaginary assembly that sends integers to other threads and then sends a jump instruction to another thread to jump to some code. It uses an pseudo actor implementation I wrote that can send 20-100 million messages between threads a second. When sending a single integer at a time it's a lot slower and can only send 1.7 million messages a second due to overhead of the interpreter. Switch interpreters aren't fast such as deegen or Java's template interpreter or V8

  threads 25
  <start>
  mailbox numbers
  mailbox methods
  set running 1
  set current_thread 0
  set received_value 0
  set current 1
  set increment 1
  :while1
  while running :end
  receive numbers received_value :send
  receivecode methods :send :send
  :send
  add received_value current
  addv current_thread 1
  modulo current_thread 25
  send numbers current_thread increment :while1
  sendcode methods current_thread :print
  endwhile :while1
  jump :end
  :print
  println current
  return
  :end

I then started writing a parser for a high level language and then code generation from the AST to the imaginary assembly. My interpreter is multithreaded and can send integers between interpreters. It is very early and doesn't do much.

Code is at https://github.com/samsquire/multiversion-concurrency-contro...

The high level language looks similar to Javascript except I tried to parse everything as an expression. I need to change the parser to parse functions as expressions.

I was experimenting with Protothreads in C recently to try understand how it worked and I wrote a giant switch statement and a while loop in Java to simulate async/await. It would be interesting to do codegen for coroutines.

here's that giant switch statement and scheduler https://github.com/samsquire/multiversion-concurrency-contro...

One idea for a stackless design I had was to preallocate memory for each method call referenced by a method for a call to that function and avoid a stack altogether. This would allow coroutines between methods and avoid the function colour problem because everything is a coroutine.

Is there any communities for programming language developers? Where do all the language developers meet up and talk theory and implementation? I am on eatonphil's discord and we talk there.

One problem I am trying to understand how to solve is how you would write a multithreaded interpreter and language that allowed parallel interpretation similar to C# and Java. If the allocator is thread safe and you share an object pool between interpreters and you hash object equality by sourcecode, then you could send objects between threads with only a synchronization cost.

I believe Python has the problem that object identity is different in each subinterpreter so you need to marshall the data.

dunham · on Nov 26, 2022

> Is there any communities for programming language developers? Where do all the language developers meet up and talk theory and implementation? I am on eatonphil's discord and we talk there.

There is a discord for that: https://proglangdesign.net/wiki/discord

I've briefly looked into it. It looks like it is fairly active and has the types of discussions you're looking for. It seems there are a lot of people out there trying to write languages.

I've done some small expression languages for work (and customized rhino js for a work project). I plan to build out a small language (self hosting, dependent typed, compile to JS) as a learning exercise, but at the moment I'm learning dependent type theory and contributing to Idris.

MH15 · on Nov 27, 2022

Yep this is the place. I've drifted away from my language project but the year or two I was actively developing it this community was exceedingly helpful.