
SML and OCaml: Why was OCaml faster? - cannam
http://thebreakfastpost.com/2015/05/10/sml-and-ocaml-so-why-was-the-ocaml-faster/
======
CHY872
This is why we don't microbenchmark languages. Usually if you dig a little
deeper, you find that actually the difference between the languages is mostly
just that one of them does one thing badly.

This by the way is MLton's _worst case_ scenario. MLton is designed to perform
full-program optimisation (unlike almost any other compiler I've seen); the
gap is supposed to grow as the code gets larger (read: more realistic).

Sadly due to its design tradeoffs (it's a compiler) it's totally unsuitable
for the big ML projects (interactive theorem provers).

~~~
jasperry
Can you elaborate on why being a compiler makes MLton unsuitable for theorem
provers? Is it because they are usually written on top of the interpreter's
REPL, and need its reflection capabilities?

~~~
CHY872
It's because they use metaprogramming so extensively. As an example, HOL4 (I
can only speak for that) accepts a script written by the user (as though in a
REPL), compiles and runs it, and produces an sml theory file which represents
their defined theory [0]. When users start their next theory, they are
supposed to use this computer-generated code.

You would have to write your own REPL and your own way of loading and writing
theories, etc., and by that time you've lost performance (HOL, even after 20
years of progress and tweaks, now has only mostly OK performance).

So it would at least require some rearchitecting, which is my point; it's not
a drop in replacement.
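
To make the shape of that workflow concrete, here is a minimal sketch in Python rather than SML. It is not HOL4's actual file format or API: `build_theory`, `load_theory`, the `THEOREMS` dictionary and the file names are all hypothetical. It only shows the pattern of a script emitting source code that a later session has to compile and load at run time.

```python
import importlib.util
import tempfile
from pathlib import Path


def build_theory(name, theorems, out_dir):
    """Run a 'script' and emit a generated source file holding its results."""
    lines = [f"# generated from the {name} script; do not edit", "THEOREMS = {"]
    for thm_name, statement in theorems.items():
        lines.append(f"    {thm_name!r}: {statement!r},")
    lines.append("}")
    path = Path(out_dir) / f"{name}Theory.py"
    path.write_text("\n".join(lines) + "\n")
    return path


def load_theory(path):
    """Compile and load the generated file at run time, as a later script would."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    generated = build_theory("demo", {"add_comm": "a + b = b + a"}, tmp)
    demo_theory = load_theory(generated)
    print(demo_theory.THEOREMS["add_comm"])
```

The `load_theory` step is the part an ahead-of-time, whole-program compiler does not give you for free: the generated code does not exist yet when the main program is compiled.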

[0] - This is AES, before and after building!
[https://gist.github.com/j-baker/244ef3e59f19d74c352b](https://gist.github.com/j-baker/244ef3e59f19d74c352b)

~~~
jasperry
That makes sense. It's pretty cool to think that ML was designed as a
metalanguage for theorem provers, and the innovations required for that turned
out to be the next step (or next several steps) in the evolution of general-
purpose languages.

------
tsuyoshi
Another issue, comparing the MLton FFI and the Ocaml FFI, is that the FFI in
MLton is somewhat slower. When I converted an Ocaml program to SML, I
discovered that a small portion written in C ran much faster when I rewrote it
in SML. In Ocaml, writing portions in C is almost always a performance win,
just because the Ocaml compiler doesn't optimize very well. In MLton there's a
bit of overhead just for the FFI (possibly because the code generator
allocates registers very differently from how it's done in C). That overhead
goes away if you use the C code generator instead, but usually the native code
generator performs better anyway.

If I remember correctly, it has been proposed to rewrite the MLton floating
point conversion in pure SML for precisely this reason.

------
saosebastiao
So basically Ocaml was faster because it was FFI?

~~~
ori_b
Both did FFI. Ocaml was faster because there was less data munging around the
FFI.

------
agumonkey
Still surprising how much time is devoted to formatting (ocaml or any
language).

~~~
ori_b
Input and output of floating point involves a decent amount of bigint math to
do correctly. It's not surprising to me that it is slow -- doing it quickly is
still an active area of research.

Writing a bit about that is still on my todo list.
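
As a rough illustration of where the big integers come from, here is a small Python sketch that prints the exact decimal value of a double. The function name `exact_decimal` is made up for this example, and this is not how production printers work (they aim for the shortest digits, not all of them). It only shows that the exact value does not fit in a machine word.

```python
from fractions import Fraction


def exact_decimal(x: float) -> str:
    """Return the exact decimal expansion of a finite float, with no rounding."""
    frac = Fraction(x)  # exact: every finite double is n / 2**k
    n, d = frac.numerator, frac.denominator
    sign = "-" if n < 0 else ""
    n = abs(n)
    k = d.bit_length() - 1  # d == 2**k
    # n / 2**k == (n * 5**k) / 10**k, so this division is exact;
    # the intermediate value is an arbitrary-size integer.
    scaled = n * 10**k // d
    digits = str(scaled).rjust(k + 1, "0")
    int_part = digits[:len(digits) - k]
    frac_part = digits[len(digits) - k:].rstrip("0")
    return sign + int_part + ("." + frac_part if frac_part else "")


print(exact_decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```

For the smallest subnormal double, `5e-324`, the scaled integer is `5**1074`, which needs well over two thousand bits.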

~~~
thechao
I remember reading, years ago, a paper which provided an algorithm for
converting an IEEE 754 float into a string using only a fixed number of
machine-word-length scalars, but I certainly can't find it now.

~~~
sanxiyn
I think you are looking for "Printing floating-point numbers quickly and
accurately with integers (2010)".

[http://dl.acm.org/citation.cfm?id=1806623](http://dl.acm.org/citation.cfm?id=1806623)

The title is a play on "Printing floating-point numbers quickly and accurately
(1996)", which in turn is a play on "How to print floating-point numbers
accurately (1990)".

[http://dl.acm.org/citation.cfm?id=231397](http://dl.acm.org/citation.cfm?id=231397)
[http://dl.acm.org/citation.cfm?id=93559](http://dl.acm.org/citation.cfm?id=93559)
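
The goal these papers share is a short decimal string that reads back as exactly the same float. A brute-force Python sketch of that goal is below. It is not the algorithm from any of the three papers, and `short_roundtrip` is a name invented here. It simply retries with more significant digits until the round trip succeeds, and leaves the real digit generation to the runtime's formatter.

```python
def short_roundtrip(x: float) -> str:
    """Short digit string that parses back to exactly x, found by trial."""
    if x != x:
        raise ValueError("NaN never compares equal, so it cannot round-trip")
    # 17 significant digits are always enough to identify a double.
    for precision in range(1, 18):
        text = f"{x:.{precision}g}"
        if float(text) == x:
            return text
    raise AssertionError("unreachable for finite doubles")


print(short_roundtrip(0.1))        # 0.1
print(short_roundtrip(0.1 + 0.2))  # 0.30000000000000004
```

Each attempt here does a full correctly rounded conversion, which is exactly the slow path the papers try to avoid or speed up.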

~~~
kristianp
I wonder if any C standard libraries have implemented this.

~~~
jwmerrill
I understand that V8 uses this algorithm, and Julia also now uses it
thanks to work by Jacob Quinn:
[https://github.com/JuliaLang/julia/tree/master/base/grisu](https://github.com/JuliaLang/julia/tree/master/base/grisu)

~~~
sanxiyn
Rust also now uses this algorithm, thanks to Kang Seonghoon:
[https://github.com/rust-lang/rust/tree/master/src/libcore/nu...](https://github.com/rust-lang/rust/tree/master/src/libcore/num/flt2dec)

The Rust implementation is heavily commented with beautiful ASCII diagrams.

