
MLton Compiler Overview - undecidabot
http://mlton.org/CompilerOverview
======
Athas
This is a very nice wiki. I am implementing an ML dialect myself, and studying
the implementation of MLKit, MosML, and MLton has been very enlightening. I
work with the maintainers of the former two, but my compiler has ended up
being more like MLton in its design.

~~~
rwmj
So am I! What parser are you using? I'm using a recursive-descent (i.e. hand-
written) parser, and it has some trouble parsing ML completely.

~~~
user2994cb
Just out of interest (I worked on a table-driven parser for SML a couple of
decades ago) - what bits of ML syntax are giving you trouble? I remember one
bit that (theoretically) needed unbounded lookahead, and user-defined infix
operators were fun.

~~~
rwmj
There's a difficult problem with precedence and unary operators. For example:

    set_field !list_end 1 cons

cannot be parsed: without precedence information, the parser can't tell
whether "!" is a prefix dereference applied to list_end or a bare identifier
being passed as an argument. Anyway, since then I rewrote the lexer to work
like the sibling comment describes, and I'm still getting that working.

------
mattnewton
Oh wow, I had Professor Fluet in college, and taught me functional programming
concepts in haskell and lisp. Cool to see his work on the front page of Hn.

In addition to being a smart guy he was a really good professor. I really
enjoyed the class (survey of many different paradigms), even if I did come
away jaded and thinking that all the standard languages I have used in my
career since are poor approximations of what they could be!

~~~
bjg
+1. Professor Fluet was one of my favorite professors. It was his first year
at RIT, and he did an awesome job.

------
rwmj
With work-in-progress support for RISC-V:
[https://github.com/agoode/mlton/commits/riscv-wip](https://github.com/agoode/mlton/commits/riscv-wip)

~~~
hajile
Now if we could only get better threading support.

------
nickpsecurity
The highest-performing compiler for ML is MLton. The most rigorously verified
is CakeML. Anyone looking for an interesting project in compilers and
verification should consider porting some MLton techniques to CakeML.

[https://cakeml.org](https://cakeml.org)

~~~
tytytytytytytyt
In what ways is it more correct?

~~~
steinuil
It uses formal methods to verify that the generated machine code has the same
behavior as the source program.

~~~
tytytytytytytyt
That's not at all helpful - I meant specific examples...

~~~
chrisseaton
They mean 'correct' in the technical PL sense.

So they don't mean there are specific things that are incorrect in MLton -
there may not be any examples anyone can give you. They mean that we don't
know, mathematically, how correct MLton is, because nobody has done that work,
whereas we do know, to a certain degree, that CakeML is correct, because it
has been mathematically proven so.

------
sevensor
I don't know compilers, but this seems like a lot more steps than I would have
expected. Is it actually an unusually large number of transformations, or is
that just how compilers are done?

~~~
undecidabot
Compilers come in all shapes and sizes.

Some are single-pass, like Turbo Pascal: they generate machine code directly
while parsing (no AST!). Niklaus Wirth (the Pascal/Modula/Oberon guy) wrote a
book following this approach [1].

Some have many passes, like Chez Scheme: lots of simple passes, each of which
they call a "nanopass". Andrew Keep has a great talk on this approach [2].

In practice, most compilers today are multi-pass, though probably with not as
many passes as Chez. If we look at Rust, it goes from AST -> HIR -> MIR ->
LLVM IR -> machine code [3]. There are probably more things going on from
LLVM IR to machine code, but I'm not knowledgeable enough to comment on it.

I think the trade-off here is clear: fewer passes -> shorter compile times;
more passes -> faster generated code and a more modular compiler. Martin
Odersky (the Scala guy) has a paper attempting to get the best of both [4].

[1]
[http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf](http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf)

[2]
[https://www.youtube.com/watch?v=Os7FE3J-U5Q](https://www.youtube.com/watch?v=Os7FE3J-U5Q)

[3] [https://blog.rust-lang.org/2016/04/19/MIR.html](https://blog.rust-lang.org/2016/04/19/MIR.html)

[4]
[https://infoscience.epfl.ch/record/228518/files/paper.pdf](https://infoscience.epfl.ch/record/228518/files/paper.pdf)
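To make the multi-pass idea concrete, here is a minimal sketch (in Python, with invented IR and pass names - it doesn't correspond to any real compiler) of the nanopass style: each pass does exactly one job and hands a complete intermediate representation to the next.

```python
# A minimal multi-pass compiler sketch: each pass does one job and
# passes a complete intermediate representation to the next pass.
# All names here are illustrative, not from any real compiler.
from dataclasses import dataclass

# --- The IR: a tiny expression tree ---
@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

# --- Pass 1: constant folding (tree -> simplified tree) ---
def fold(node):
    if isinstance(node, Num):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Num) and isinstance(right, Num):
        return Num(left.value + right.value)
    return Add(left, right)

# --- Pass 2: lowering (tree -> stack-machine instructions) ---
def lower(node):
    if isinstance(node, Num):
        return [("PUSH", node.value)]
    return lower(node.left) + lower(node.right) + [("ADD", None)]

# --- A tiny "machine" to run the generated code ---
def run(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:  # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

tree = Add(Add(Num(1), Num(2)), Num(4))
code = lower(fold(tree))
print(code)       # [('PUSH', 7)]
print(run(code))  # 7
```

A single-pass compiler would fuse all of this into the parser; splitting it up costs extra traversals but lets each stage be understood, tested, and reordered on its own - which is the modularity trade-off mentioned above.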

~~~
le-mark
The unmentioned history here is memory use. In ye olden days, machines had
much less memory, which forced more passes, simply because there was no space
to hold all the data at once: each pass read the previous pass's output, did
its one job, and wrote its result back out. So one might see a separate pass
for preprocessing, lexing, AST generation, optimization, code generation,
etc. Note that some of this structure still exists today, e.g. gcc runs the
assembler (gas) as a separate program and has flags to dump the generated
assembly.

~~~
pjmlp
Yep, implementing a compiler on systems with at most 512 KB of memory on
average did not leave much room for clever optimizations.

Using compilers on 8-bit systems was even worse (64 KB max).

Many game studios used UNIX/VMS systems with cross-compilers, uploading the
output to ZX and C64 computers as their development cycle.

------
nikofeyn
I have seen some work on a real-time version of MLton. I would love to see
more of this work done, because nothing would please me more than being able
to do embedded systems in an ML language.

