
Four MLs (and a Python) - atombender
http://thebreakfastpost.com/2015/04/22/four-mls-and-a-python/
======
unhammer

        $ time wc big-data.csv                                                                     
         1252045  1252045 17805309 big-data.csv
        
        real    0m0.417s
        user    0m0.411s
        sys     0m0.004s
    
        $ time ./sum-ocaml big-data.csv                                                            
        6143290.,13087196.,14473656.,50128757.,3765822.,1290420.
        
        real    0m3.680s
        user    0m3.666s
        sys     0m0.008s
    
    
        $ time python3 sum.py big-data.csv          
        6143290.0,13087196.0,14473656.0,50128757.0,3765822.0,1290420.0
        
        real    0m7.186s
        user    0m7.131s
        sys     0m0.036s
    
        $ time python2 sum.py big-data.csv         
        6143290.0,13087196.0,14473656.0,50128757.0,3765822.0,1290420.0
        
        real    0m5.146s
        user    0m5.111s
        sys     0m0.012s
    
        $ time pypy2 sum.py big-data.csv            
        6143290.0,13087196.0,14473656.0,50128757.0,3765822.0,1290420.0
        
        real    0m2.754s
        user    0m2.697s
        sys     0m0.052s
    

I really like OCaml, but I have to admit, PyPy is pretty amazing.

~~~
andor
It completely depends on the application. I have yet to see a use case that
benefits from PyPy. Here's an example I tried with CPython2, 3 and PyPy
yesterday: It loads the most probable translations for all unigrams from a
Moses phrase translation table (11GB) into a dict, and using that dict
translates 73 MB of text word by word. Most of the time is spent filling the
dict/reading the large file.
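For reference, the word-by-word approach described above can be sketched in a few lines of Python (a simplification, not the gist's actual code; the ` ||| `-separated phrase-table layout and all helper names here are assumptions):

```python
def load_table(path):
    """Keep only the most probable translation per source unigram.

    Assumes the Moses ' ||| '-separated layout: source ||| target ||| scores.
    """
    best = {}  # source word -> (probability, translation)
    with open(path) as f:
        for line in f:
            fields = line.split(" ||| ")
            if len(fields) < 3 or " " in fields[0].strip():
                continue  # malformed line or multi-word phrase: skip
            source, target = fields[0].strip(), fields[1].strip()
            prob = float(fields[2].split()[0])
            if source not in best or prob > best[source][0]:
                best[source] = (prob, target)
    return {src: tgt for src, (_, tgt) in best.items()}


def translate(text, table):
    # word-by-word replacement; unknown words pass through unchanged
    return " ".join(table.get(w, w) for w in text.split())
```

Most of the wall-clock time in a script like this is the single pass over the 11GB table, which is why a faster interpreter barely helps.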

    
    
      $ time python literal_translation.py phrase-table en test4
      translation table contains 76885 elements
      ...................................................................... 070
      .......................................................
      125 files translated
      
      real	1m14.345s
      user	1m9.836s
      sys	0m4.157s
      
      
      $ time python3 literal_translation.py phrase-table en test5
      translation table contains 76885 elements
      ...................................................................... 070
      .......................................................
      125 files translated
      
      real	2m17.680s
      user	2m2.605s
      sys	0m4.987s
      
      
      $ time pypy literal_translation.py phrase-table en test6
      translation table contains 76885 elements
      ...................................................................... 070
      .......................................................
      125 files translated
      
      real	1m13.347s
      user	1m0.033s
      sys	0m4.734s
    
    

[https://gist.github.com/andreasf/af6bdc00cf5a928712b5](https://gist.github.com/andreasf/af6bdc00cf5a928712b5)

~~~
314
If your problem is memory-bound then no compiler will help you.

Memory-mapping a serialisation of the dict would help, but I'm not sure how
you would do that outside of an extension lib.
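Not quite a memory-map, but from pure Python the stdlib `dbm` module gets the dict out of the process heap by keeping it on disk. A rough sketch with toy data (`translations.db` is a hypothetical filename):

```python
import dbm

# One-time build step: store the best translation per word on disk.
# (Toy data; a real build would do the single expensive pass over the table.)
with dbm.open("translations.db", "c") as db:
    db["hund"] = "dog"
    db["katze"] = "cat"


def translate(text):
    # Later runs open the on-disk table and look words up lazily, so the
    # dict never has to be rebuilt in memory on every invocation.
    with dbm.open("translations.db", "r") as db:
        out = []
        for w in text.split():
            try:
                out.append(db[w].decode())
            except KeyError:
                out.append(w)  # unknown words pass through unchanged
        return " ".join(out)
```

The trade-off is that each lookup now hits the disk cache instead of a hash table in RAM, so this only wins when building the dict dominates, as it does here.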

~~~
unhammer
1m for 11GB sounds IO-bound, but yeah, that's still not a use case where
switching to pypy will make a difference.

But did you try using it as a service (i.e. avoiding the startup time) and
timing the actual translation?

At work, we switched to pypy for the speed. For a spell-checker I wrote in
python, pypy made dictionary compilation 3x faster and spell-checking 2x
faster, so it can really matter. On the other hand, pypy's memory usage is
often a bit worse, and startup time can also be a bit slower, at least if your
program otherwise has "zero" startup time.
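A minimal sketch of what such a service loop might look like (hypothetical names; the table is built once by the caller):

```python
def translate_stream(table, lines, write):
    # Load-once, serve-many: the caller builds `table` a single time, then
    # this loop handles any number of requests, so interpreter startup and
    # (under PyPy) JIT warm-up are paid only once.
    for line in lines:
        write(" ".join(table.get(w, w) for w in line.split()) + "\n")

# A real service would read from stdin or a socket, e.g.:
#   translate_stream(table, sys.stdin, sys.stdout.write)
```

Under PyPy this matters twice over: the tracing JIT only gets fast after the hot loop has run for a while, so a long-lived process benefits far more than a one-shot script.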

~~~
andor
No, I didn't try to optimize this any further. I just checked whether any of
the new interpreters is really faster. It's a simple one-off experiment, and
compared to real statistical machine translation with Moses on the same data
the translation is pretty much instant. Moses needs something on the order of
1600 CPU-hours (50 nodes * 8 threads * 4 hours).

------
gsg
Hmm, the OCaml could be a lot shorter:

    
    
        let rec fold_channel f acc chan =
          match input_line chan with
          | line -> fold_channel f (f line acc) chan
          | exception End_of_file -> acc
    
        let comma = Str.regexp ","
    
        let values_of_line line =
          List.map float_of_string (Str.split comma line)
    
        let sum_channel chan =
          let folder line tot = List.map2 (+.) tot (values_of_line line) in
          fold_channel folder (values_of_line (input_line chan)) chan
    
        let sum_and_print_file filename =
          let chan = open_in filename in
          sum_channel chan |> List.map string_of_float |> String.concat "," |> print_string;
          close_in chan
    
        let () =
          match Sys.argv with
          | [|_; filename|] -> sum_and_print_file filename
          | _ -> failwith "Exactly 1 filename must be given"
    

This is also a bit faster, and almost all of the difference is because I
hoisted the Str.regexp ",". I suspect using a split-on-character operation
would make a bit more difference there, but of course the spartan OCaml stdlib
lacks such a function.

~~~
gsg
Oops, breaks on empty files. I should have matched on the input_line chan in
sum_channel to catch End_of_file - damn exceptions.

print_string should also be print_endline to match the original.

~~~
ignoramous
Also, is doing flow control using exceptions an acceptable pattern in the
OCaml world?

Like in Java:

    
    
      try {
         print(awesomeObj.getSomeState());
      } catch(NullPointerException npe) {
         awesomeObj = AwesomeObj.getInstance();
      }
    
    

would be considered very bad. You would instead be expected to do:

    
    
       if (awesomeObj == null) awesomeObj = AwesomeObj.getInstance();
       print(awesomeObj.getSomeState());
    
    

(how do you <code-wrap> stuff on here?)

~~~
kenko
It's always weird to me when people talk about using exceptions for control
flow.

Exceptions are, pretty much explicitly, a control flow mechanism. That's what
they do, they transfer control from one place to another. It's like people are
getting hung up on the _name_.

~~~
ignoramous
Exceptions, in Java land at least, are very costly (think interrupts and
context switching). To do flow control using them is shooting yourself in the
foot.

------
leovonl
Your OCaml code is not doing the same thing as the Python one: you are parsing
a regexp in a loop, and then splitting by the regexp - not by a delimiter.

I moved the regexp instantiation outside the loop, see the results below
("orig" is your code, "proper" has the regexp parsed only once):

    
    
      > bash -c 'time ./orig bigdata.csv'
      15502592020.,15502065537.6,15519223046.6,15498884970.,15502078298.,15519530367.3,15510803256.2,15519590717.2,15511590976.
    
      real    0m3.110s
      user    0m3.107s
      sys     0m0.003s
    
      > bash -c 'time ./proper bigdata.csv'
      15502592020.,15502065537.6,15519223046.6,15498884970.,15502078298.,15519530367.3,15510803256.2,15519590717.2,15511590976.
    
      real    0m2.596s
      user    0m2.590s
      sys     0m0.003s
    

I'd say the rest of the difference is probably due to the standard library
performance itself. BTW, compilation time for ocamlopt is 70ms.

~~~
cannam
Any idea why the compiler doesn't hoist the constant single-character regexp
automatically?

~~~
leovonl
1) It's not a constant expression, it's a function call.

2) The compiler does not know if there are collateral effects or not.

3) Ocaml is strict eval - no memoisation

Therefore this would be an unsafe optimisation if the compiler tried to
perform it, which could lead to unintended behavioural changes.

The only way to fix this properly would be to add an effects system in some
sense, at least a way to mark pure/impure functions.

------
jkldotio
I wanted to have a look at how much PyPy might speed up the task so I
generated a one million line x 20, 287M, big-data.csv of my own with the
following.

    
    
        import random
        with open("big-data.csv","w") as f:
            for b in range(1000000):
                f.write(','.join([str(random.random()) for x in range(20)])+"\n")  
    

Each line looks like:
"0.47509825737,0.525866136528,0.167956183215...0.888687040645".

The script from the blog takes 7.3 seconds to chew through it on my machine
with Python 2.7.8, and 4.9-5.5 seconds with PyPy, a reasonable improvement.
Out of interest, stripping out all the boilerplate and argument checking, the
following is 7 lines versus the 24 or so in the blog, and runs in a similar
4.9-5.4 second range on PyPy (although performance blows out to 13 seconds
under Python 2.7.8, ~6 seconds slower than the blog). Somewhat more Pythonic,
and more likely to be done live in the REPL without even bothering to write a
script, to my mind.

    
    
        from collections import Counter
        c = Counter()
        lines = (rawline.strip().split(",") for rawline in open("big-data.csv"))
        for line in lines:
            for colname, colval in enumerate(line):
                c[colname]+=float(colval)
        print c  
    

Above on PyPy.

    
    
        time pypy sum5.py ./big-data.csv
    
        real	0m4.972s
        user	0m4.946s
        sys	0m0.024s
    
    

wc on the same machine.

    
    
        time wc big-data.csv  
    
        real	0m3.701s
        user	0m3.672s
        sys	0m0.028s
    

Bonus one-liner (6.6 seconds under PyPy).

    
    
        print map(sum,zip(*([float(x) for x in rawline.strip().split(",")] for rawline in open("big-data.csv"))))

~~~
dagw
Bonus bonus easy to read one-liner (using numpy):

    
    
       numpy.loadtxt("big-data.csv",delimiter=',').sum(axis=0)

~~~
nine_k
This is a nice data-processing trick, but obviously runs _very_ little Python
code.

~~~
dagw
Well over 99% of the run time is spent in loadtxt which is pure python. In
fact this code is slower than the original code due to the overhead of the
checks that loadtxt does.

------
ignoramous
No wonder OCaml compiled to native code is faster than the other MLs and
Python. I remember stumbling on benchmarks elsewhere where compiled OCaml
rivaled C++ in terms of performance on certain tests.

In my experience with OCaml, I found the syntax a bit too unfavourable, like
the author did, but the resulting code was very fast. With concurrency sorted
out, I wonder if it could take on Rust, Nim, and Go as far as systems
programming is concerned (especially since Go has the weight of Google behind
it).

It also helps that the language designers have written a large tutorial on
systems programming:
[https://ocaml.github.io/ocamlunix/](https://ocaml.github.io/ocamlunix/)

Then there's this epic Mirage OS built with Cloud Computing in mind by Anil
and his team of OCaml zealots:
[http://openmirage.org/](http://openmirage.org/)

~~~
steveklabnik
Fun fact: the Rust compiler was originally written in OCaml.

~~~
cannam
What is it written in now? (Rust?)

~~~
steveklabnik
Yup!

------
pron
It's worth mentioning that OCaml also has a good, multi-threaded
implementation on the JVM: [http://www.ocamljava.org/](http://www.ocamljava.org/)

------
gnud
I absolutely agree with the author, in that I find SML the most pleasant to
read (and write).

And I've said this before: If only SML actually had proper unicode support, I
might use it for something real.

------
drostie
This is the start of a solution in Haskell:

    
    
        import System.Environment
        import qualified Data.Vector.Unboxed as V
        import Data.Vector.Unboxed (Vector)
    
        display :: Vector Double -> String
        display ds | V.null ds = ""
                   | otherwise = tail . concatMap format . V.toList $ ds
          where format d = ',' : show d

        csv_sums :: String -> Vector Double
        csv_sums input = go (V.length h) 1 h t where
            h : t = map (V.fromList . read . bracket) $ lines input
            bracket s = '[' : s ++ "]"
            go s n summary [] = summary
            go s n summary (x:xs)
                | V.length x == s  = go s (n + 1) (V.zipWith (+) summary x) xs
                | otherwise        = if blankLastLine then summary
                                     else error "Inconsistent lengths."
                where blankLastLine = case xs of
                                          [] -> V.length x == 0
                                          _  -> False
    
        main = do
            paths <- getArgs
            if length paths /= 1
                then error "Exactly one filename must be given."
                else readFile (head paths) >>= putStrLn . display . csv_sums
    

It takes about 22x longer than Python on my platform though, which I'm not
sure how to solve. (The vector keeps it nice and un-thunked so it doesn't seem
like it's laziness that's killing it, maybe it's the use of strings over
Data.ByteString.Lazy or so?)

~~~
Codas
The main problem is probably creating and traversing the strings, although I
have not benchmarked it. This version
[https://gist.github.com/Codas/894694eea247aaacf35f](https://gist.github.com/Codas/894694eea247aaacf35f)
runs about 4 times faster than the python version on my machine.

It does use some libraries that are not in the haskell platform though.

~~~
drostie
Very cool!

------
atmosx
Can someone explain to me why a programmer would want to develop a web
application using an ML derivative (e.g. F#) instead of Ruby/JavaScript/PHP or
Python?

Apart from a totally different programming paradigm, what do ML derivatives
have to offer?

Also, the author states that ML is "statically typed" with "type inference"...
Why does a statically typed language need "type inference"? As I understand it
- I'm probably wrong, but - dynamically typed == type inference.
If anyone takes the time to answer... Well thanks a priori! :^)

~~~
mercurial
> Why does a statically typed langauge need "type inference"
    
    
       let suffix str = str ^ "mysuffix";;
       val suffix : bytes -> bytes = <fun>
    

Type inference in action: the OCaml toplevel displays the type of the 'suffix'
function as something that takes a value of type bytes and returns a value of
type bytes, without any type declaration (though obviously, the type inference
system works in much more complex cases as well). You can write entire
programs without a single type declaration.

Dynamic languages do not do type inference. They do duck-typing. The big
difference is that OCaml will not let you write a program like:

    
    
        let suffixed_five = suffix 5
    

You will get:

    
    
        Error: This expression has type int but an expression was expected of type bytes
    

Python will not let you concatenate integers and strings, either, but will
blow up at runtime.
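A quick Python sketch of that runtime blow-up, for contrast with the compile-time error above:

```python
def suffix(s):
    # Python analogue of the OCaml `suffix` above: no declared types,
    # and no inference either - any argument is accepted at call time.
    return s + "mysuffix"

print(suffix("abc"))  # works

try:
    suffix(5)  # the type error only surfaces when this line actually runs
except TypeError as e:
    print("runtime failure:", e)
```

OCaml rejects the equivalent call before the program ever runs; Python only complains once the bad call is reached, which may be deep into production.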

~~~
brudgers
Some dynamic languages, notably Ruby, use duck typing. Many tag all values
with types; Lisps are an example. Tagged types allow a dynamic language to be
strongly typed [dynamic/static is orthogonal to strong/weak typing, C being an
example of weak static typing].

Type inference is syntactic sugar in most languages that allow it, and
explicit type declaration is [almost?] invariably permitted. Of course, the
price of omitting type declarations in code is the potential loss of clarity
that comes anytime implicit communication is used in lieu of explicit
statements of intent.

------
Zolomon
Haskell might also be an interesting choice in this case? Worth looking into!

------
adultSwim
Wouldn't the OCaml example be much faster if it used arrays instead of lists?
Doesn't seem like a fair comparison.

------
sgoody
I enjoyed reading this and I'm not bothered about tweaking the solutions into
oblivion to eke out performance gains. This is a nice, simple comparison, with
performance being a simple demonstration of code written by one author.

The only obvious omission seems to be Haskell.

------
ufo
I noticed that the author is comparing the size of the executables. Dunno
about all the other MLs, but in OCaml the executables can blow up in size if
you statically link them instead of letting them be dynamically linked, as is
the default. The presence of debugging symbols also matters a lot.

------
keenerd
Is a 300MB CSV really considered "big data" now? I routinely generate larger
CSVs using a $10 sensor (rtl-sdr with rtl_power) and process them on a $70 ARM
device (Odroid U3).

~~~
maxerickson
The author addressed that in a comment over there:

 _The choice of filename was tongue-in-cheek. (The main thing is that it’s big
enough to blow the stack with non-tail-recursive code, and to take a non-
trivial amount of time to run.)_

------
sologoub
For the Python test, why not use csv module instead?

~~~
dorfsmay
Or pandas or numpy... I think they wanted to compare the raw languages, in
terms of syntax, size of the code, and ease of writing for a given problem; as
a matter of fact, look at their disclaimer:

> Do it with R instead .../... Or at least use an existing CSV-mangling
> library.

Having said that, I think your comment is to the point; one of the reasons I
use Python a lot is its libraries. I once decided to use OCaml to solve a
performance issue, and was surprised by the lack of libraries for what seemed
common in more modern languages (even with opam).

------
iopq
You forgot the most important ML derivative: Haskell. I'd like to see how it
measures up.

