
Why ML/OCaml are good for writing compilers (1998) - monssoen
http://flint.cs.yale.edu/cs421/case-for-ml.html
======
hongbo_zhang
For web developers who are looking for an industrial strength functional
language instead of JS, OCaml probably has the best story here.

Actually it has two OCaml->JS compilers of very high quality. The first one,
js_of_ocaml, could bootstrap the whole compiler several years ago (probably the
first one there).

The more recent one,
[https://github.com/bloomberg/bucklescript](https://github.com/bloomberg/bucklescript),
pushes JS compilation to the next level: it generates fairly readable code,
has a good FFI story, and its compilation is _extremely_ fast. Check out the
JS version of the compiler
([http://bloomberg.github.io/bucklescript/js-demo/](http://bloomberg.github.io/bucklescript/js-demo/))
and imagine how fast the native version would be. BuckleScript has a good
story for Windows and generates fairly efficient code; see the benchmark here:
[https://github.com/neonsquare/bucklescript-benchmark](https://github.com/neonsquare/bucklescript-benchmark).
BuckleScript is already used in production by big companies: for example, 25%
of Facebook's messenger.com is powered by BuckleScript, and the new WebAssembly
spec interpreter by Google is also partly cross-compiled into JS by BuckleScript.

Disclaimer: I am one of the authors of BuckleScript

~~~
atombender
I'm optimistic about Reason, Facebook's new syntax "skin" on top of OCaml. I
find OCaml's syntax to be quite gnarly; of the MLs, F# is probably the
cleanest and most modern-feeling. Something like F# without the .NET stuff
could have been amazing.

~~~
haskellandchill
I find OCaml's syntax simple and clear. I don't get the reason for Reason, but
hope it leads to more OCaml adoption.

~~~
e12e
OCaml might be simple and clear, but starting from Standard ML - there's a
number of differences that feel like warts and needless complications for no
discernable gain for the _programmer_.

I do think Reason fixes up a few of these ancient and partially crumbled stone
walls, making the OCaml landscape easier to criss-cross for a new generation of
programmers.

~~~
cies
> there's a number of differences that feel like warts and needless
> complications for no discernable gain

Such as?

~~~
e12e
It's been a while, but pretty much what Reason addresses, although I'd prefer
filling in SML for js-ish syntax.

------
Dangeranger
After learning Elm I wanted to understand ML/OCaml a bit more, so I worked
through some documentation from the OCaml site and walked away pleasantly
surprised.

After using it for a couple of weeks I am confused why ML/OCaml aren't more
popular. They are safe, functional, stable, fast, and have great tooling. They
seem poised to take over the functional domain.

While the syntax took a little getting used to (emphasis on little), once you
are used to it, it's very natural. Union types are wonderful, and the implicit
type safety within programs was nice.

~~~
throwaway7645
Crappy Windows support for OCaml. F# is similar to OCaml, but is very
difficult for beginners and those not familiar with .NET. I also don't see a
lot of beginner material for OCaml.

~~~
poizan42
In the computer science program at the University of Copenhagen, F# has been
used for the introduction courses the past couple of years, which as far as I
know has been a big success, so I'm doubtful of it not being beginner friendly.

~~~
throwaway7645
Mileage is probably not representative when you're in a university setting
being taught by professors in an intro course versus a professional trying to
use a multitude of the data science libraries and having trouble tying it
together. Python has a zillion books published and a large percentage of them
are beginner oriented. F# has only a few books and they are almost all for
experts.

~~~
jackfoxy
The community surrounding F# today is very supportive and helpful with people
of all experience levels. Compared to when I learned F# in 2010 it is orders
of magnitude easier and more social. So many resources: the F# Foundation
website [http://fsharp.org/](http://fsharp.org/) is a good place to start.
[http://fsharpforfunandprofit.com/](http://fsharpforfunandprofit.com/) and the
book form of the site's articles on github
[https://swlaschin.gitbooks.io/fsharpforfunandprofit/content/...](https://swlaschin.gitbooks.io/fsharpforfunandprofit/content/site-contents/index.html)
have great information for people at all levels. And
finally just get on twitter #fsharp to start meeting people in the community.

~~~
throwaway7645
I've gone through all of those and although it's nice, it is not what I'd call
beginner friendly at all. Python has books teaching you to program in Python
which is a really good way to learn a programming language. Scott's tutorial
using Frankenstein to explain Monads is interesting, but I need to see how to
build simple programs and modules first. How does one organize a program using
pure FP, or what's the best way to mix in OOP? It is also really confusing to
learn all the pragmas and compiler directives. I'm not sure if I'm using the
right terminology, but a lot of example code uses FSI which has to call the
modules differently than if you make an executable. I really would love
nothing more than F# to be my go to language, but I need a little more help
getting there. I realize not all users have this problem though.

~~~
jackmott
I am an expert and Scott's website is usually bananas to me. For some people
his approach doesn't seem to click.

A good beginner book for F# is a good idea.

~~~
throwaway7645
Glad to hear I'm not the only one lol. I'd spend top dollar for a beginner's
book focusing on creating short 1/2 page programs like guess my number,
hangman, plotting graphs...etc and only mix in things like currying and monads
later in the book along with C# interop.

------
rbehrends
I'll note that some of the aspects don't necessarily work out like that in
practice:

1\. The GC part is true, but one has to remember that this was written at a
time when GC was still a bit of an unusual feature in mainstream languages.

2\. Tail recursion doesn't really make much of a difference for walking trees,
which is recursive, but (mostly) not tail recursive.

3\. OCaml in particular uses 63/31-bit ints due to implementation details,
which isn't a good fit for 64/32-bit integers. The strings and bignum part is
mostly right, though.

4\. ADTs can be good or bad for describing ASTs. Once you enrich ASTs with
semantics shared by all variants (such as source coordinates), inheritance can
become a better fit than ADTs.

8\. Type inference doesn't really extend to module signatures, which you have
to write out explicitly (though tooling such as `ocamlc -i` allows you to let
the compiler help you write them). I also generally find it better to
explicitly annotate functions with types. Not only does it make the code more
readable later in its life, but you get fewer truly impenetrable type error
messages because you forgot parentheses or a semicolon somewhere.

That said, there are several good points still.
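
To make point 3 concrete, here is a minimal sketch of the int-width issue using only standard-library values (`Sys.int_size`, `max_int`, `Int64`). The names `wraps` and `widened` are made up for illustration, and the exact widths depend on the platform the code is built for.

```ocaml
(* OCaml's native int gives up one bit of the machine word to the runtime,
   so it is 63 bits wide on a typical 64-bit build and 31 bits on 32-bit. *)
let () =
  Printf.printf "int is %d bits on a %d-bit word\n" Sys.int_size Sys.word_size

(* Native ints wrap silently at that boundary. *)
let wraps = max_int + 1 = min_int

(* Int64 offers full 64-bit arithmetic when the target language needs it,
   at the cost of boxed values and explicit function calls. *)
let widened = Int64.add (Int64.of_int max_int) 1L

let () = Printf.printf "wraps: %b, widened: %Ld\n" wraps widened
```

A compiler that has to model a target's 64/32-bit integers exactly would probably use `Int64`/`Int32` for constant folding rather than native `int`.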

~~~
coolsunglasses
>2\. Tail recursion doesn't really make much of a difference for walking
trees, which is recursive, but (mostly) not tail recursive.

Non-strictness helps here more than TCO in a strict language.

>4\. ADTs can be good or bad for describing ASTs. Once you enrich ASTs with
semantics shared by all variants (such as source coordinates), inheritance can
become a better fit than ADTs.

Since this article was written we have better ways of augmenting/annotating
ASTs. There's a lot of this out there, but here's one example:
[https://brianmckenna.org/blog/type_annotation_cofree](https://brianmckenna.org/blog/type_annotation_cofree)

There are other alternatives that are like inheritance but with better
reasoning properties as well. Finally tagless comes to mind.

>I also generally find it better to explicitly annotate functions with types.

This Haskeller whole-heartedly agrees for all the reasons stated.
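
For readers who want to see the annotation idea without the Haskell machinery, here is a rough OCaml sketch. It is not the code from the linked post; the type and function names (`expr_f`, `expr`, `loc`, `map_ann`, `at`) are all hypothetical. The shape of one AST layer is kept separate from the annotation that every node carries, so shared data such as source coordinates lives in one place.

```ocaml
(* One layer of expression shape; 'a stands for the child positions. *)
type 'a expr_f =
  | Num of int
  | Add of 'a * 'a
  | Neg of 'a

(* Every node pairs an annotation with one layer of shape. *)
type 'ann expr = { ann : 'ann; node : 'ann expr expr_f }

(* Source coordinates, shared by all variants. *)
type loc = { line : int; col : int }

(* Rewrite every annotation while leaving the tree shape alone. *)
let rec map_ann f { ann; node } =
  let node =
    match node with
    | Num n -> Num n
    | Add (a, b) -> Add (map_ann f a, map_ann f b)
    | Neg a -> Neg (map_ann f a)
  in
  { ann = f ann; node }

(* Evaluation ignores annotations, so it works for any 'ann. *)
let rec eval e =
  match e.node with
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Neg a -> -(eval a)

(* Helper to build a located node. *)
let at line col node = { ann = { line; col }; node }

(* Represents the source text "2 + -3" on line 1. *)
let sample = at 1 1 (Add (at 1 1 (Num 2), at 1 5 (Neg (at 1 6 (Num 3)))))

(* A later pass can drop locations by mapping them to unit. *)
let stripped = map_ann (fun _ -> ()) sample
```

The same `expr` type can later carry inferred types or other analysis results by choosing a different `'ann`.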

~~~
tom_mellior
> Non-strictness helps here more than TCO in a strict language.

Can you explain? Assume I want to fold a function over a large tree _and_
fully inspect the final result. (For example, to compile a large expression to
a piece of code.) If I use non-tail recursion, my stack will be exhausted. How
does non-strictness help with stack usage?
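
Setting the non-strictness question aside, the stack concern itself is usually handled in a strict language by moving the pending work onto the heap. The following is only a sketch of that idea with made-up names (`tree`, `sum_leaves`, `left_spine`); it folds a sum over a very deep tree using an explicit work list, so the only recursion is a tail call.

```ocaml
type tree = Leaf of int | Node of tree * tree

(* Sum all leaves. The subtrees still to visit live in a heap-allocated
   list, so `go` only ever calls itself in tail position. *)
let sum_leaves t =
  let rec go acc pending =
    match pending with
    | [] -> acc
    | Leaf n :: rest -> go (acc + n) rest
    | Node (l, r) :: rest -> go acc (l :: r :: rest)
  in
  go 0 [ t ]

(* Build a deliberately lopsided tree with `depth` nested Node cells,
   each holding one extra leaf of value 1. *)
let left_spine depth =
  let rec build acc i =
    if i = 0 then acc else build (Node (acc, Leaf 1)) (i - 1)
  in
  build (Leaf 1) depth

let () = Printf.printf "%d\n" (sum_leaves (left_spine 1_000_000))
```

A continuation-passing version achieves the same effect; the work list is just the most direct form to read.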

------
nv-vn
The problem with this article is that it's missing an answer to why one would
choose ML/OCaml over Haskell. Haskell has many more features, a more advanced
type system, arguably superior syntax, and much better library support.
However, I believe that OCaml/SML are often a better choice for a number of
reasons.

First of all, OCaml/SML are the best choice in terms of example code for
compilers. They're historically the choice of many compiler/interpreter/type
theory texts (Types and Programming Languages, Modern Compiler Implementation
in ML, and an ML is even used as a language to interpret in Essentials of
Programming Languages). Andrej Bauer's PLZOO is also written in OCaml. Equally
important is the fact that there are a variety of ML implementations, all of
which are much more approachable than GHC. The OCaml compiler's codebase is a
reasonable size that an individual could get a good idea of how it works in a
few weeks or so. SMLNJ, MLKit, MLton, CakeML are all open source and on
Github, and all seem to be fairly approachable in comparison to the monolith
that is GHC. And that's not even mentioning other compilers written in ML
(Haxe, Rust's bootstrap compiler, Facebook's Hack compiler, etc.). The fact
that there are real-world, open source compilers with perfectly approachable
code bases (even without great familiarity with the language; compilers in
Haskell might require an in-depth understanding of many of the core type
classes and language extensions available) is highly attractive to novice
compiler writers.

Additionally, the feature set in MLs is a good choice for compilers. While
they lack some of the cooler features of Haskell, MLs make up for it in
simplicity; lots of the features in GHC's type system (especially with
language extensions) mean very little for 90% of compiler writers, and getting
rid of them from the get-go helps keep the code small and easy to reason about
(even if you won't have as much type safety in the compiler itself). This also
means that there are far fewer ways to do a single thing, which can be nice
when you're not sure exactly how you're going to implement a certain feature.
However, one thing I really find incredibly useful is OCaml's polymorphic
variants. These are pretty much perfect for statically enforcing a
nanopass-like system in your compiler and are a great way of designing your AST/data
types in your compiler. I feel like this gets passed up a ton (as far as I
know I'm the first person who's used them to create nanopasses), but it's
quite convenient and makes OCaml a good competitor for Scheme in this regard.
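
As an illustration of the polymorphic-variant idea (not the commenter's actual code; the languages `surface` and `core` and the pass `desugar` are invented for this sketch), each pass can take one variant type and return a smaller one, so the type checker verifies that the removed form really is gone afterwards.

```ocaml
(* Surface language: includes the sugar form `Let. *)
type surface =
  [ `Num of int
  | `Var of string
  | `Add of surface * surface
  | `Let of string * surface * surface
  | `App of surface * surface
  | `Lam of string * surface ]

(* Core language: the same tags minus `Let. *)
type core =
  [ `Num of int
  | `Var of string
  | `Add of core * core
  | `App of core * core
  | `Lam of string * core ]

(* One small pass: rewrite `let x = rhs in body` as `(fun x -> body) rhs`.
   The return annotation makes it a type error to leave a `Let behind. *)
let rec desugar (e : surface) : core =
  match e with
  | `Num n -> `Num n
  | `Var x -> `Var x
  | `Add (a, b) -> `Add (desugar a, desugar b)
  | `Let (x, rhs, body) -> `App (`Lam (x, desugar body), desugar rhs)
  | `App (f, a) -> `App (desugar f, desugar a)
  | `Lam (x, body) -> `Lam (x, desugar body)

(* A later pass over `core` needs no `Let case to be exhaustive. *)
let rec count_lams (e : core) : int =
  match e with
  | `Num _ | `Var _ -> 0
  | `Add (a, b) | `App (a, b) -> count_lams a + count_lams b
  | `Lam (_, body) -> 1 + count_lams body

let program : surface = `Let ("x", `Num 1, `Add (`Var "x", `Num 2))
let lowered = desugar program
```

Because the tags are shared between the two types, unchanged cases are rebuilt with the same constructor names rather than being duplicated under a new prefix, which is what keeps many small passes tolerable.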

~~~
throwaway7645
Haskell may indeed be one of the most advanced languages out there in terms of
raw power, but it is very complex (how many monad tutorials does it seriously
take to teach one of the most core pieces of the language) and how much
category theory do you need to know to be moderately effective? Also, the
ecosystem could use some work. For example, the main string library is
passed over in favor of a different one. Using the first and obvious one leads
to performance worse than Python and Perl even after you compile. I'm being
nitpicky, but anytime someone writes a blog post comparing a programming
language to playing Dark Souls (a game where you die thousands of times) I'd
say you have an issue.

~~~
data_hope
I started to write a toy compiler in OCaml. I had some previous experience
with Haskell, but was in no way an expert, i.e. no category theory background
and only shallow exposure to monads.

My "problems" with OCaml started when I wanted to "map" over a data structure
I defined. I ended up having to define custom mapping functions for all
container-like data structures I wrote and call them in a non-polymorphic
fashion (where I would have just used fmap in Haskell).

Sure, in OCaml I needed to use a parser generator where I would have used
megaparsec in Haskell, but it was also a tolerable inconvenience.

Trouble started when I needed to track state in the compilation process. I.e.
I was generating variable names for temporary results and values, and I needed
to track a number that increased. In the end I used a mutable state for it,
and it turned out nightmarish in my unit tests.

After a while, I just ported the code base to Haskell and never looked back.
The State monad was an easy fix for my mutable state issues. Parser
combinators made the parser much more elegant. And many code paths improved,
became much more concise. It is hard to describe, but in direct comparison,
OCaml felt much more procedural and Haskell much more declarative (and
actually easier to read).

The only advantage of OCaml to me is the strict evaluation. I don't think lazy
evaluation by default in Haskell is a great idea.

~~~
ms013
I assume you were just not interested in passing the state around to the
functions that needed it, and preferred the fact that the state monad hides
that plumbing for you via bind and return. It's worth noting that there exist
Ocaml libraries that provide the same operators and even similar do notation
syntax that desugars to bind/return operators (via PPX).
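
As a rough sketch of what such plumbing-hiding can look like, here is a tiny hand-rolled state monad for generating temporary names. It does not use any particular library; `gen`, `fresh`, `three_temps`, and `run` are hypothetical names, and it relies on the `let*` binding operators available in newer OCaml releases (4.08 and later) instead of a PPX.

```ocaml
(* A computation that threads a counter: it takes the current counter and
   returns a result together with the updated counter. *)
type 'a gen = int -> 'a * int

let return (x : 'a) : 'a gen = fun n -> (x, n)

let bind (m : 'a gen) (f : 'a -> 'b gen) : 'b gen =
 fun n ->
  let x, n' = m n in
  f x n'

(* Binding operator so sequencing reads much like do-notation. *)
let ( let* ) = bind

(* Hand out a fresh temporary name and bump the counter. *)
let fresh : string gen = fun n -> (Printf.sprintf "t%d" n, n + 1)

(* The counter never appears in code that merely uses `fresh`. *)
let three_temps : (string * string * string) gen =
  let* a = fresh in
  let* b = fresh in
  let* c = fresh in
  return (a, b, c)

(* Run a computation starting from counter 0 and discard the final state. *)
let run (m : 'a gen) : 'a = fst (m 0)
```

Since every `run` starts from its own counter, two unit tests cannot interfere with each other the way they can with a global mutable counter.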

OCaml does tend to be more verbose than Haskell; it's just the nature of the
language syntax. E.g., in OCaml, one says (fun x -> x+1) vs (\x -> x+1).
Similarly, OCaml is cursed by the excessive "in"s that accompany let
bindings. "Let .. in let .. in let ...". That can get annoying.

Interestingly, I had the opposite experience with a commercial compiler
project. Haskell's syntactic cleverness (monadic syntax, combinator libraries,
etc..) eventually got in the way - it became very difficult to understand what
a single line of code actually meant since one had to mentally unpack layers
of type abstractions. Migrating to ocaml, the verbosity eventually was more
tolerable than the opacity of the equivalent Haskell code once the compiler
got sufficiently complex.

My experience may vary from yours. I've been doing Haskell/OCaml in production
for many years, so the pain points I've adapted to are likely different from
those of someone working on toy compilers or weekend projects. And no, category theory
exposure is not and never has been necessary for understanding Haskell or FP
unless one is a PL researcher (and even then, only a subset of PL researchers
are concerned with those areas). And one can be quite productive and prolific
in Haskell without a deep understanding of monads and monad transformers - the
blogosphere has given you the wrong impression if you believe otherwise.

~~~
throwaway7645
Thanks for the thorough reply; it sounds like you're quite experienced
here. Any chance of going into more detail about what you do for a living? Do
you maintain a compiler for something more mainstream?

~~~
ms013
I cofounded a company recently that is using code transformation and
optimization methods to accelerate data analytics code on special purpose
hardware. Our compilation toolchain is all ocaml, and the language that is
compiled/transformed/optimized is Python. Prior to this venture, I did similar
work - code analysis and transformation, but in that case largely around high
performance computing for scientific applications. That tooling was mostly
ocaml/Haskell, but not production focused - it was mostly research code.

------
vmasto
For anyone who is interested and isn't aware yet, Facebook is developing
Reason, a layer over OCaml. I've been fiddling with it for the past couple of
weeks and, coming from JavaScript, I personally found the experience generally
enjoyable.

[https://facebook.github.io/reason/](https://facebook.github.io/reason/)

~~~
mhink
Hah! Thanks for the link- I discovered this gem just now in their list of
comparisons to JS syntax:

    
    
      Javascript    | Reason
      --------------+----------------------------
      const x = y;  | let x = y;
      let x = y;    | reference cells
      var x = y;    | No equivalent (thankfully)

~~~
vmasto
:) Yeah, unfortunately JavaScript can only be improved by educating people not
to use the horrible parts (rather than fixing/deprecating them which is not
possible). const and let were extremely necessary.

------
StrykerKKD
I agree that OCaml is just extremely well suited for making new programming
languages. If you are interested in OCaml+programming languages check out the
plzoo: [http://plzoo.andrej.com/](http://plzoo.andrej.com/)

I personally think that OCaml is really good at this, because I started
converting the Scheme examples from the PLAI book to OCaml and it just felt
right (maybe because I'm not a fan of the Scheme syntax).

------
chairmanwow
Currently taking a compilers course that uses SML/NJ and it has been an
absolute delight. The functional paradigm is a little strange to get used to
at first, but after a while its strong suits make themselves known. The
trivial type inferences and pattern matching capabilities make it easy to
efficiently describe complicated and precise program situations.

------
kornakiewicz
I'm a relatively young developer (three years older than Java) and don't fully
get the thing about exceptions (point 7). It sounds very similar to the
solution I know from Java, and it does not make safety, or even the feeling of
it, any better. If you can still write code that can throw an exception, and an
explicit assurance about it is the only way to prevent the crash, it doesn't
change anything, actually.

P.S. I'm not really familiar with ML/OCaml, but have decent experience with
large code bases in languages that are not very keen to protect you from
yourself.

~~~
gizmo686
Exceptions in ML languages are very similar to those in Java. The reason for
that is simple: they are simply a good way of dealing with computations that
might fail. Having said that, exceptions are best used (in any language) where
you want to deal with the failure way up in the call stack. If you catch the
exception right where it occurs, you should just use a safe method that
reports a failure in the return value.

Speaking as a Haskell programmer, never use exceptions. You can get away with
this advice because the _Either_ monad allows you to have the behavior of
exceptions (namely, at any point you can "fail" a computation and have the
error automatically propagate up to the handler). However, this approach
relies heavily on having a type system more advanced than OCaml's in order to
be reasonable.

~~~
thedufer
The Either pattern for errors works great in OCaml; most of the code I write
uses it. I'm not sure what problem you're referring to.
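
For anyone who hasn't seen it, a minimal sketch of the pattern with the standard `result` type and `Result.bind` (OCaml 4.08 or later) might look like this. The `error` type and the functions `lookup`, `safe_div`, and `ratio` are invented for the example.

```ocaml
(* Errors are ordinary values, visible in every function's type. *)
type error = Unbound of string | Div_by_zero

(* Look a variable up in an association list. *)
let lookup env x =
  match List.assoc_opt x env with
  | Some v -> Ok v
  | None -> Error (Unbound x)

let safe_div a b = if b = 0 then Error Div_by_zero else Ok (a / b)

(* Result.bind stops at the first Error and passes it along, much like an
   exception propagating to a handler, but tracked by the type checker. *)
let ratio env x y =
  Result.bind (lookup env x) (fun a ->
      Result.bind (lookup env y) (fun b -> safe_div a b))
```

Callers have to pattern match on `Ok`/`Error` (or keep binding), so a forgotten failure case shows up as a type error or a non-exhaustive match warning rather than at run time.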

------
agumonkey
FP is based on recursion, and recursive types and inductive algorithms are
essential for linguistic processing and transformation.

------
hyperpallium
How are simple parsers written in ML (or ocaml)?

You can't use the coding style used for recursive descent in the Dragon
compiler book, without using mutable variables.

Do you have to use parser combinators, which have their own limitations?
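
One common answer, sketched here under simplifying assumptions, is to write ordinary recursive descent where each function takes the remaining token list and returns a value plus whatever tokens are left, so no mutable cursor is needed. The token type, the grammar (integers, `+`, `*`, parentheses), and all function names below are made up for the example, and it evaluates directly instead of building an AST to stay short.

```ocaml
type token = Int of int | Plus | Star | LParen | RParen

exception Parse_error of string

(* Grammar:  expr ::= term ('+' term)*
             term ::= atom ('*' atom)*
             atom ::= INT | '(' expr ')'
   Each function returns (value, remaining tokens). *)
let rec expr toks =
  let lhs, rest = term toks in
  expr_tail lhs rest

(* Looping on an accumulator handles left-associative '+' without a
   left-recursive rule. *)
and expr_tail acc toks =
  match toks with
  | Plus :: rest ->
      let rhs, rest = term rest in
      expr_tail (acc + rhs) rest
  | _ -> (acc, toks)

and term toks =
  let lhs, rest = atom toks in
  term_tail lhs rest

and term_tail acc toks =
  match toks with
  | Star :: rest ->
      let rhs, rest = atom rest in
      term_tail (acc * rhs) rest
  | _ -> (acc, toks)

and atom toks =
  match toks with
  | Int n :: rest -> (n, rest)
  | LParen :: rest -> (
      match expr rest with
      | v, RParen :: rest -> (v, rest)
      | _ -> raise (Parse_error "expected ')'"))
  | _ -> raise (Parse_error "expected number or '('")

(* Entry point: the whole input must be consumed. *)
let parse toks =
  match expr toks with
  | v, [] -> v
  | _ -> raise (Parse_error "trailing tokens")
```

For instance, `parse [Int 1; Plus; Int 2; Star; Int 3]` follows the usual precedence because `term` binds tighter than `expr`.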

~~~
mafribe
Parser combinators tend to work by recursive descent, so cannot handle left
recursive grammars [1], and tend to be really slow. The latter is not a
problem for many applications, but removing left recursion can be irritating
even for small grammars. It is possible to build combinator parsers that can
handle all context-free grammars [2], but I'm not sure any of Ocaml's are
built that way.

In any case, OCaml has parser generators that are fast, do bottom-up parsing
(hence handle left-recursion without issue) and are _not_ based on parser
combinators, e.g. ocamlyacc [3].

I'd use parser combinators for quick prototypes, and, if measurement shows a
performance problem, replace them with a generated (e.g. by ocamlyacc) parser.
As far as I remember the parser in Ocaml's (superfast) compiler is generated
by ocamlyacc.

[1]
[https://en.wikipedia.org/wiki/Left_recursion](https://en.wikipedia.org/wiki/Left_recursion)

[2] T. Ridge, Simple, functional, sound and complete parsing for all
context-free grammars.
[http://www.tom-ridge.com/resources/ridge11parsing-cpp.pdf](http://www.tom-ridge.com/resources/ridge11parsing-cpp.pdf)

[3] [https://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html](https://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html)

------
jackmott
OCaml has the potential to be great for almost everything with some work and a
bigger ecosystem. F# as well (very similar). I can only imagine how great the
world would be if a Standard ML had accidentally become the web browser
language and all that mind share had gone into evolving it and optimizing it
and its tools.

