
Why ML/OCaml are good for writing compilers (1998) - jasim
http://flint.cs.yale.edu/cs421/case-for-ml.html
======
GreaterFool
Every time I see ML on HN I hope it's about ML the language and I'm sad when
it turns out to be Machine Learning instead, haha.

It's a shame the ML family of languages isn't very popular. 1ML, for instance,
could be a fantastic modern language, but I don't see that happening.

~~~
adrusi
What is 1ML? I've never heard of it, and it's hard to google.

~~~
riscy
1ML is basically SML with first-class modules. It's very recent research
(paper appeared in ICFP 15) so I'm not surprised there isn't much coming up.

EDIT: I'd like to add that the video from ICFP is up:
[https://www.youtube.com/watch?v=42Wn-mXWcms](https://www.youtube.com/watch?v=42Wn-mXWcms)

~~~
david-given
Is there a compiler for it yet?

~~~
riscy
Only a prototype interpreter [1]. This stuff is fresh out of the research lab.

[1] [http://www.mpi-sws.org/~rossberg/1ml/](http://www.mpi-sws.org/~rossberg/1ml/)

~~~
david-given
Can't make it build, unfortunately:

    
    
        ocamlopt -c lib.ml
        File "lib.ml", line 24, characters 49-58:
        Error: Unbound value List.mapi
    

TBF, this is probably a good thing; it's not like I have any time...

~~~
gnuvince
List.mapi was added in OCaml 4.00.0 [1], so it's likely that you are using an
older version of the compiler.

[1] [http://caml.inria.fr/pub/docs/manual-ocaml/libref/List.html](http://caml.inria.fr/pub/docs/manual-ocaml/libref/List.html)

------
MaxScheiber
In my experience, #4 has held up extremely well over time and is the most
compelling reason for choosing an ML over another language. Algebraic data
types are an absolute joy to work with when dealing with abstract syntax
trees. The other points are obviously great to have, but strong language
support for ADTs is key, in my opinion.
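
To illustrate the ADT point, here is a minimal sketch of a toy expression language and an evaluator for it. The `expr` type and its constructors are invented for this example and are not taken from the linked article.

```ocaml
(* A toy expression AST; the constructors are made up for illustration. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Let of string * expr * expr   (* let x = e1 in e2 *)

(* Evaluate an expression under an association-list environment. *)
let rec eval env = function
  | Int n -> n
  | Var x -> List.assoc x env     (* raises Not_found if x is unbound *)
  | Add (a, b) -> eval env a + eval env b
  | Mul (a, b) -> eval env a * eval env b
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2

let () =
  (* let x = 2 + 3 in x * 4 *)
  let e = Let ("x", Add (Int 2, Int 3), Mul (Var "x", Int 4)) in
  Printf.printf "%d\n" (eval [] e)
```

If a constructor is later added to `expr`, the compiler's exhaustiveness warning points at each match that does not handle it yet, which is a large part of why ADTs suit syntax trees.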

I'm not certain I agree with #3. It seems to defeat the purpose of a strong
type system. Either way, it can be very nice to express meta-level constructs
in a matching object-level type. For example, if the language you are writing
a compiler for has an int32 datatype, but you use an int64 in the language
you're writing the compiler in, you'll need to simulate overflow. It would
just be easier and safer to use an int32 in both places.
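
A small sketch of the overflow point, assuming the object language's int32 wraps on overflow. The function names `add_target` and `add_wide` are made up here.

```ocaml
(* Int32 arithmetic wraps modulo 2^32, so it behaves like the
   (assumed) wrapping int32 of the language being compiled. *)
let add_target (a : int32) (b : int32) : int32 = Int32.add a b

(* The same sum computed in Int64 does not wrap at 32 bits, so a
   compiler folding constants this way must truncate by hand. *)
let add_wide (a : int32) (b : int32) : int64 =
  Int64.add (Int64.of_int32 a) (Int64.of_int32 b)

let () =
  let wide = add_wide Int32.max_int 1l in
  Printf.printf "int32 result: %ld\n" (add_target Int32.max_int 1l);
  Printf.printf "int64 result: %Ld\n" wide;
  (* Explicit truncation brings the wide result back in line. *)
  Printf.printf "truncated:    %ld\n" (Int64.to_int32 wide)
```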

These days, I'd recommend Menhir over ocamlyacc unless you have a very
specific use case that the former breaks on but the latter works on.

~~~
wtetzner
> In my experience, #4 has held up extremely well over time and is the most
> compelling reason for choosing an ML over another language.

I have a hard time choosing between ADTs and the module system, honestly. If I
really had to choose, I'd probably pick ADTs, but it'd be very close. It's
really disappointing every time a new language comes out and it has a weak
module system. Especially F#, which was primarily influenced by OCaml.

------
gnuvince
I love writing compilers in OCaml: I T.A. a compiler class and my own
implementation of the project (a compiler for a not-quite subset of Go) is
written in OCaml. Compared to my thesis project which is written in Java,
OCaml is a breath of fresh air and I feel that the single, most important
feature of OCaml is its type system, especially having sum types available.
Once you learn to use sum types to design your solutions, it's very hard to go
back to other languages that don't support them and where you need to figure
out the least painful way to emulate them.

~~~
smcl
Hey - minor long shot but have you got a link to the course? Or is it on
edx/coursera?

~~~
MaxScheiber
Not the original comment author, but I thoroughly enjoyed taking Penn's
undergraduate compilers course, which used OCaml to compile a basic OO
language with single inheritance and dynamic dispatch.
[http://www.seas.upenn.edu/~cis341](http://www.seas.upenn.edu/~cis341)

------
sklogic
MLs are far better than the average Blub for writing compilers, yet they
still have some issues. The worst is the amount of boilerplate code needed to
implement even the smallest pass over an AST. Say you want to visit and
rewrite all the identifier nodes: you must still implement recursive cases
for _all_ the node types. In Haskell, a bit of Scrap Your Boilerplate magic
relieves this problem somewhat, but there is still a long way to go to reach
the convenience of Nanopass.
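
A minimal sketch of that boilerplate, using a toy AST whose constructors are invented for this example:

```ocaml
(* Toy AST, made up for illustration. *)
type expr =
  | Int of int
  | Var of string
  | Neg of expr
  | Add of expr * expr
  | If of expr * expr * expr

(* Rewrite every identifier with f. Only the Var case does real work;
   every other case exists purely to recurse into the children. *)
let rec rename f = function
  | Int n -> Int n
  | Var x -> Var (f x)
  | Neg e -> Neg (rename f e)
  | Add (a, b) -> Add (rename f a, rename f b)
  | If (c, t, e) -> If (rename f c, rename f t, rename f e)
```

With five constructors this is tolerable; with the dozens found in a real language, each small pass repeats the same traversal code.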

The next issue is related: slightly amended AST types are hard to define; they
cannot be derived from an existing one by a simple rewrite.

Also, MLs do not allow an easy way to handle metadata transparently.

What I really want is the ML type system and Nanopass density combined in
one language (working on it, not done yet).

~~~
rwmj
You can usually generate the visitor function, if your ML has a decent macro
language (as does OCaml).

I'll give you the one about amended ASTs though. I once had to write an HTML /
CSS processor. It started with the HTML DOM [tree] and performed successive
processing stages on it, each time adding a little bit of extra data into the
tree and extra node types. I had to tediously write an html type, an html1
type, an html2 type and so on. Never did find out if there's a better way to
do this.

~~~
wtetzner
I don't know that anyone has really solved that problem:

[http://lambda-the-ultimate.org/node/4170](http://lambda-the-ultimate.org/node/4170)

Honestly, maybe the solution is just to look at the problem a different way,
and use an attribute grammar instead, like Ted Kaminski mentioned.

~~~
jasim
Asking as a complete layperson, are there any other strongly typed languages
that don't have this issue when parsing an AST and adding metadata during
subsequent passes?

~~~
wtetzner
Well, in ML you can easily create an AST type that allows you to add metadata
as you go, by using option types. The problem is that what you really want is
to have multiple representations of the AST, each only supporting relevant
data.

You could potentially solve the problem using macros, to generate different
versions of the AST, but it doesn't solve the problem of having multiple ASTs.

I suppose you could parameterize your type with whatever metadata should be on
it, which means you only have to create one tree. But then pattern matching on
it becomes a little bit messier, and your type is a bit more complicated.
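
The parameterized approach might look like the following sketch. The record layout and every name in it are hypothetical; it is one of several ways to set this up.

```ocaml
(* One tree shape, parameterized by the annotation carried on each node.
   All names here are hypothetical. *)
type 'a expr = { desc : 'a desc; ann : 'a }
and 'a desc =
  | Int of int
  | Bool of bool
  | Add of 'a expr * 'a expr

type ty = TInt | TBool

(* Parser output carries no metadata, so it is a unit expr. *)
let mk d = { desc = d; ann = () }

(* A later pass turns a unit expr into a ty expr. Note the extra
   indirection: we match on e.desc rather than on e itself. *)
let rec annotate (e : unit expr) : ty expr =
  match e.desc with
  | Int n -> { desc = Int n; ann = TInt }
  | Bool b -> { desc = Bool b; ann = TBool }
  | Add (a, b) ->
      let a' = annotate a and b' = annotate b in
      if a'.ann = TInt && b'.ann = TInt
      then { desc = Add (a', b'); ann = TInt }
      else failwith "type error: Add expects ints"
```

This keeps a single tree definition, but it only varies the per-node metadata. It does not help when a pass needs to add or remove node kinds.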

So it's not that the problem doesn't have any solutions at all, it's just that
none of them are totally satisfactory.

The reason you don't have this problem in untyped languages is that you aren't
trying to type them :). The first solution I mentioned, using option types, is
effectively how you would handle it in a dynamic language.

------
xvilka
One of the main obstacles to wider OCaml adoption, I think, is poor Windows
support: for example, missing Unicode support [1] and the lack of opam [2] on
this platform.

[1]
[https://github.com/ocaml/ocaml/pull/153](https://github.com/ocaml/ocaml/pull/153)

[2]
[https://github.com/ocaml/opam/issues/2191](https://github.com/ocaml/opam/issues/2191)

------
pcwalton
> ML has, perhaps, the best gc around; it's so fast that for many real apps
> it's as fast as the C++ malloc/free, and maybe even a little faster in some
> cases. You don't have to wring your hands about using ML's gc, as you do
> with Java's, which is slow.

This is from 1998. I highly doubt this is true today. HotSpot now has an
incredibly good generational, concurrent garbage collector.

~~~
jordwalke
Other tracing collectors have certainly caught up since 1998, but I'd keep my
eye on two things that have also happened since 1998:

1\. Work by Damien Doligez that improved OCaml's current tracing collector
(in 4.03) and further reduced pause times.

2\. The multicore OCaml tracing collector work that's going on right now at
Cambridge. If I understand correctly, this work involved not only partitioning
memory into multiple thread local heaps, and a shared heap, but it also
improved the collector in general.

I haven't tested it but I'm curious if the combination of this work will again
put it ahead of other alternatives.

------
baldfat
I went through the videos of a coursera class for Programming Languages with
Dan Grossman.
[https://www.coursera.org/course/proglang](https://www.coursera.org/course/proglang)

He starts teaching programming with ML and then moves to Racket and ends with
Ruby.

I had tried to teach myself Haskell several times, but it always fell flat. I
ended up loving ML and Racket (especially Racket), and 1ML does look very
interesting. Racket is pretty amazing for me. I learned a ton and was able to
really improve my code in Python and R.

------
jmartinpetersen
SML is one of a very few languages that have a formal description of the
meaning of the language.

[http://sml-family.org/sml97-defn.pdf](http://sml-family.org/sml97-defn.pdf)

~~~
qznc
This is also true for C [0,1] and other mainstream languages [2] now.

[0]
[http://robbertkrebbers.nl/thesis.html](http://robbertkrebbers.nl/thesis.html)
[1]
[https://github.com/kframework/c-semantics](https://github.com/kframework/c-semantics)
[2]
[http://www.kframework.org/index.php/Projects](http://www.kframework.org/index.php/Projects)

------
bpyne
I used SML/NJ for a course I took last Fall. It was my first experience with
ML other than looking over some sample code on the web. I'm not sure why, but
my thought processes just work well with ML. Beautiful language. Very
expressive while light on verbosity.

I heard an interview with Benjamin Pierce in which he extolled the virtues of
the OCaml compiler. While he said that many other languages are interesting to
him, OCaml is the go-to for getting stuff done.

Given people's comments and the linked post, my project for next Summer is
going to be writing a compiler in OCaml for some simple language I'll define
and implement.

------
Drup
For interested people, there is a very nice OCaml MOOC for beginners
currently: [https://www.france-universite-numerique-
mooc.fr/courses/pari...](https://www.france-universite-numerique-
mooc.fr/courses/parisdiderot/56002/session01/courseware/W1/) !

------
kruhft
A (classic?) book on writing compilers with ML:

[http://www.amazon.com/Compiling-Continuations-Andrew-W-
Appel...](http://www.amazon.com/Compiling-Continuations-Andrew-W-
Appel/dp/052103311X)

------
kod
Despite being almost 20 years old, the only thing about this that sounds
anachronistic is extolling the virtues of exceptions failing at runtime.

------
jackweirdy
While #8 is useful, I've been writing a static analysis tool for JS in OCaml
and it's been extremely hard to refactor when type inference is happening on
function parameters. Mostly because I can't easily look to see whether I've
refactored a method to use my new types.

But on the other hand, I think that's because (I at least get the impression)
there's a different strategy for refactoring functional programs effectively.
I haven't quite figured out what that is, though.

~~~
conistonwater
Actually, in Haskell the usual coding style is to explicitly declare the type
of all top-level things, which kind of contradicts his first example of 8. The
idea is that every time you change a type, all uses of it that use the
previous type are then highlighted as compile-time errors, which is definitely
useful for refactoring. I'm not sure about the example that he gives, but if
used this way type inference is actually quite useful for refactoring. Plus,
all intermediate non-top-level types are adjusted automatically whenever
possible, which saves work — I think this is really what he's saying.
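
The same style carries over to OCaml. A minimal sketch, with a made-up `shape` type standing in for whatever is being refactored:

```ocaml
(* Hypothetical type used only for this example. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Explicit signature on a top-level function. The annotation pins the
   parameter type, so a body that stops matching it is reported here
   instead of being silently inferred as some other type. *)
let area (s : shape) : float =
  match s with
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h

(* Local code can stay unannotated and let inference follow along. *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. area s) 0.0 shapes
```

Putting the same signatures in an `.mli` file is another common way to get this effect for a whole module.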

------
devit
In the meantime, several newly designed languages do in fact have most or all
of these features along with other pragmatic advantages.

Rust and Scala being perhaps the most popular and maybe best ones.

------
groovy2shoes
Previous discussion:
[https://news.ycombinator.com/item?id=3002838](https://news.ycombinator.com/item?id=3002838)

------
jhallenworld
These sound like good reasons for using it for compiler front ends. Is it
equally helpful for compiler back ends? I mean optimization and register
allocation.

