So, when I was starting work on my own toy compiler, I was totally going to go with the vertical approach... Well, I quickly realized it didn't work for me when writing my first ever compiler.
You see, when you go vertical you have to learn a little about every layer of a compiler with every language feature you add: a bit of lexing, a bit of parsing, a bit of semantic analysis, a bit of assembly generation. Basically, you have to constantly switch context between the layers. And I personally found it much easier to learn about managing the stack, calling conventions, and translating expressions into assembly in one go, without switching context away from the code generation layer.
In contrast, layers of compilers tend to be rather loosely coupled, so you don't have to think about parsing at all while working on semantic analysis.
But if I ever set out to write a second compiler, I'll try to go vertical again. Now that I have a more or less clear idea of how to build each layer, it feels like I can build them slice by slice. What helps is that I can keep a picture of the end result in mind when adding each piece.
In a sense, even if my students' project is their first compiler, they have already seen and done exercises on each compilation step individually, a bit like you said.
I agree, this is an illustrative example of data coupling: very loose, easily observable, and there are even nice visualization techniques for certain steps. Each layer can be understood and tested in isolation.
Also, how vertical is the "vertical" approach, really? In a first iteration you ought to lay the foundation of each component; later you just add additional branches/structures, so to speak.
This is fine, but I think there is a minimal sub-language you want to implement in the first iteration: small enough that you don't get lost in the hairier details, but large enough that it presses you to find a 'workable flexibility' for each of the components.
It can be quite vertical, really. Like starting with a compiler for a language where the only correct programs consist of a single integer. This is what I do with my students to get them started on their compiler project. Of course, you may have to rethink and rewrite or adapt some code depending on the order in which you're adding features and how they interact, but that will be true anyway if your language evolves :).
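For a sense of scale, such a compiler fits in a handful of lines. Here's a sketch in Rust (the function name and the AT&T-syntax output are my own choices, not from any particular course):

```rust
// A complete "compiler" for a language whose only valid programs are a
// single integer literal: parse it, then emit x86-64 assembly (AT&T
// syntax) for a `main` that returns that integer.
fn compile(source: &str) -> Result<String, String> {
    let n: i64 = source
        .trim()
        .parse()
        .map_err(|e| format!("parse error: {e}"))?;
    Ok(format!(
        "    .globl main\nmain:\n    movq ${n}, %rax\n    ret\n"
    ))
}

fn main() {
    println!("{}", compile("42").unwrap());
}
```

Every later feature (negation, addition, variables...) then grows each stage a little, which is exactly the "vertical slice" idea.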
After this experience, I do agree that going with the "vertical" approach you described would be significantly easier - I actually considered doing this for just the raw lambda calculus so that I could get CPS transform and native codegen working on a smaller scale.
Unfortunately, I haven't had time to work on the project in a while (I'm finishing my PhD in chemical biology), but I hope to come back to it soon.
I realize this is more about Rust and writing compilers, but if anyone is interested in finding out more about Standard ML, give the Programming Languages course on Coursera a shot. Part A is devoted to Standard ML (SML). To say the instructor is great would be a huge understatement. The code we wrote for assignments used only immutable values, recursion instead of loops, and partial application. It didn't make me a hardcore functional programming proponent, but it seriously broadened my horizons. On top of that, we used pattern matching extensively, which in combination with discriminated unions impressed me the most: using these two to model a lot of problems felt really natural.
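For anyone who hasn't seen the style, the combination described above looks roughly like this — sketched in Rust, whose enums and `match` closely mirror SML's datatypes and pattern matching (the `Expr` type here is my own toy example, not from the course):

```rust
// A discriminated union modeling arithmetic expressions, plus a
// pattern-matching evaluator -- each constructor gets exactly one arm,
// and the compiler checks the match is exhaustive.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (1 + 2) * 3
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(1)), Box::new(Expr::Num(2)))),
        Box::new(Expr::Num(3)),
    );
    println!("{}", eval(&e)); // 9
}
```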
GhostCell: Separating Permissions from Data in Rust
The essential idea is that you can use one of these "GhostCell" structures to assign a lifetime to a related collection of objects (such as the nodes in a graph), which allows the entire graph to be treated as if it had a single owner and a single lifetime rather than a separate owner and lifetime per node.
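The `ghost_cell` crate implements this. As a rough illustration of the underlying "branded lifetime" trick, here is a heavily simplified std-only sketch — not the crate's actual API, and eliding the soundness subtleties the paper actually proves:

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

// The token is the single "owner" of every cell sharing its brand.
// The `fn(&'brand ()) -> &'brand ()` marker makes 'brand invariant,
// so two different brands can never be unified.
struct Token<'brand>(PhantomData<fn(&'brand ()) -> &'brand ()>);

struct BrandedCell<'brand, T> {
    value: UnsafeCell<T>,
    _brand: PhantomData<fn(&'brand ()) -> &'brand ()>,
}

impl<'brand, T> BrandedCell<'brand, T> {
    fn new(value: T) -> Self {
        BrandedCell { value: UnsafeCell::new(value), _brand: PhantomData }
    }
    // A shared borrow of the token grants shared access to ALL cells
    // of that brand; an exclusive borrow grants exclusive access.
    fn borrow<'a>(&'a self, _t: &'a Token<'brand>) -> &'a T {
        unsafe { &*self.value.get() }
    }
    fn borrow_mut<'a>(&'a self, _t: &'a mut Token<'brand>) -> &'a mut T {
        unsafe { &mut *self.value.get() }
    }
}

// The higher-ranked closure keeps the brand from escaping its scope.
fn with_token<R>(f: impl for<'brand> FnOnce(Token<'brand>) -> R) -> R {
    f(Token(PhantomData))
}

fn main() {
    let sum = with_token(|mut token| {
        let a = BrandedCell::new(1);
        let b = BrandedCell::new(2);
        *a.borrow_mut(&mut token) += 10;
        a.borrow(&token) + b.borrow(&token)
    });
    println!("{sum}"); // 13
}
```

The point is that aliasing is checked against one token rather than per cell, which is what lets a whole graph of interlinked nodes live under a single owner.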
The paper is relatively recent (2021), and GhostCell is not, so far as I am aware, used in rustc at this point.
I don't know the details of how this is being used in the project, but it might be worth investigating.
Hope that helps.
If Rust claims to solve memory management, then we should not fool ourselves by allocating everything in a huge chunk which is then freed when the program ends.
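For context, the pattern being criticized is the bump/arena style, where allocation is just appending to one big buffer and nothing is freed individually. A minimal index-based sketch:

```rust
// Minimal sketch of the "one huge chunk" strategy: an index-based
// arena. alloc only ever appends; individual items are never freed,
// the whole Vec is dropped at once when the arena goes away.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1 // the "pointer" is just an index
    }
    fn get(&self, id: usize) -> &T {
        &self.items[id]
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc("node a");
    let b = arena.alloc("node b");
    println!("{} {}", arena.get(a), arena.get(b));
}
```

It's a convenient strategy for short-lived batch programs like compilers, but, as the comment above says, freeing everything only at program exit sidesteps per-object memory management rather than solving it.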
This loses the whitespace and consequently positions... but should be enough to get you started...
I use a UTF-8 crate to split words that keeps all the whitespace, so I can track positions correctly, in case you want to do that:
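If you'd rather avoid a dependency, the same effect is easy to get with the standard library alone. A sketch that splits on whitespace boundaries while keeping every run, each tagged with its byte offset into the source (the function name is mine):

```rust
// Split a string into alternating non-whitespace/whitespace runs,
// keeping every run together with its byte offset, so token positions
// can be reconstructed exactly.
fn split_keep_ws(src: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut in_ws = src.chars().next().map_or(false, char::is_whitespace);
    for (i, c) in src.char_indices() {
        if c.is_whitespace() != in_ws {
            out.push((start, &src[start..i]));
            start = i;
            in_ws = !in_ws;
        }
    }
    if start < src.len() {
        out.push((start, &src[start..]));
    }
    out
}

fn main() {
    for (pos, tok) in split_keep_ws("let x = 1") {
        println!("{pos}: {tok:?}");
    }
}
```

Since no byte of the input is dropped, concatenating the runs gives back the original source, which is exactly what makes positions recoverable.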
There is no SSA IR in rustc itself (MIR, regrettably, lowers only control flow, and uses variables instead of representing dataflow).
So basically it sidesteps the type system issues in a similar way that an inconvenient car trip instead of taking a flight sidesteps the TSA issues? ;)
F# is the most mainstream of the 3. If you can live without the fancier type-system features (trust me, you probably can!) it's the easiest ride.
In practice you have to use a lot of C# class based code, which brings an inherent mismatch to the functional model and makes code look more like a multi-paradigm language like Scala.
The tooling and ecosystem is great (though not as good as for C#, obviously...), the language has some very nice features and design decisions, but you have to live with a somewhat messy reality.
In practice sticking to C# is probably wiser.
And a talk from Don Syme, the author of F#, on good F# code as they see it. It's a great talk overall, and they specifically address OOP's good and bad bits starting from about 39:00.
When I (an utter beginner in it) code in F#, I miss ReSharper a lot. I wish there were an IDE extension with features similar to those of ReSharper, but for F# rather than C#.
Personally, as an Ocaml coder, I make extensive use of Ocaml's fancier type-system features and would be sad to lose them.
One could argue that not only you can, but you should (for the sake of having more readable programs).
I'm not sure there is much reason to use SML itself today, given that it has stagnated since the late 90s, but it is an excellent language for teaching programming because you don't have to sweep anything under the rug. If you do want to use it, it actually has well-maintained, pretty good compilers; MLton probably generates faster code than OCaml and F# in many cases. The main problem with SML is that there are not a lot of SML programmers around, and hence very few libraries, particularly for handling modern protocols and formats.
It seemed like both I and a lot of other students had a lot of issues with it, and the university apparently agreed, as it later moved away from it.
Basic SML might not be hard to learn, but you quickly have to move on to concepts like recursion, which is not an easy concept for beginners to grasp. SML seems more suited to the theoretical, mathematical (for lack of a better word) parts of computer science than to the more practical parts.