

Memory Access Microbenchmark: Clojure, F#, Go, Julia, Lua, Racket, Ruby, Self - logicchains
http://togototo.wordpress.com/2014/02/22/memory-access-microbenchmark-clojure-f-go-julia-lua-racket-ruby-self/

======
lqdc13
The Python code is terrible. That's not how you code in Python. Also, this
implementation seems to take more than 32 GB on my machine. Here is a
refactor:
[https://gist.github.com/lqdc/9149772](https://gist.github.com/lqdc/9149772)

Anyway, this problem shouldn't be solved this way in Python in any case. I
would have stored everything in a Pandas table and computed the results from there.
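
For illustration only (the actual trade fields aren't shown in this thread, so the column names and values below are made up), a Pandas approach might look roughly like:

```python
import pandas as pd

# Hypothetical trade data; the real benchmark uses far more rows.
trades = pd.DataFrame({
    "trade_id": range(5),
    "client_id": [1, 1, 2, 2, 3],
    "price": [10.0, 10.5, 9.8, 10.2, 11.0],
})

# Vectorised aggregation instead of a Python-level loop.
mean_price_per_client = trades.groupby("client_id")["price"].mean()
```

The point is that the loop runs in C inside Pandas rather than in the Python interpreter.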

~~~
logicchains
Thanks! I'm rerunning it and updating the tables progressively every few hours
as optimisations are suggested. Feel free to make a pull request. Ideally
within a day or so the implementations will have all been optimised enough to
make for fairer comparison.

I understand that a Pandas table would be a better idea, but the purpose of
this benchmark is comparing the speed of the raw languages.

~~~
lqdc13
I updated with a pandas script as well.

I think the strength of a language has in part to do with its libraries.
You have to pick the right tool for the right task.

The pandas version runs 3x faster than regular python, but 17x slower than C.

~~~
logicchains
Great, I'll include that in the next run of the benchmark. Is it Python 3 or
2? (Does it work with PyPy?) Also, if you're comparing it to C you should
compare it to C3.c, from the final tables, which uses bignums like Python does
automatically.

*Edit: I tried your faster_py.py, but it didn't seem any faster than Pypy2.py in the repo (running both with PyPy). I haven't yet got the Pandas version working due to library compatibility issues (I've got about five versions of Python installed; still working on it).

~~~
lqdc13
It's about 2x faster in regular CPython 2.7 on my machine. In PyPy it is
actually way slower than your original one.

In many cases PyPy can't be used, because it's not compatible with all the
libraries.

~~~
logicchains
Ah, right. I've included your version (converted to Python 3) as Python3_fst;
it's around twice as fast as regular CPython 3.3.

------
mattchamb
I had a play around with your F# implementation so it's a bit cleaner and uses
a bit more F#-ish style:

[https://gist.github.com/mattchamb/20019b22ae841ff5ce1b](https://gist.github.com/mattchamb/20019b22ae841ff5ce1b)

I'm not sure if it is any faster, though, as each run varies quite widely from
the previous one.

~~~
logicchains
Interesting, thanks. It might be faster than FS.fs but not FS2.fs, as I think:

    tradesArray.[i] <- {
        TradeId = idx;
        ClientId = idx;
        ... }

would lead to the creation of a new object every time, whereas:

    trades.[i].TradeId <- (int64 i)
    trades.[i].ClientId <- (int64 1) etc.

doesn't.
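
The same allocation trade-off exists in any garbage-collected language. As a rough Python sketch (the `Trade` class and its field names here just mirror the F# record above; they're not from the benchmark repo):

```python
class Trade:
    """Mutable record standing in for the F# trade type (hypothetical)."""
    __slots__ = ("trade_id", "client_id")

    def __init__(self, trade_id=0, client_id=0):
        self.trade_id = trade_id
        self.client_id = client_id


def fill_fresh(trades):
    # Like the record-expression version: allocates a brand-new
    # object per element, producing one piece of garbage each.
    for i in range(len(trades)):
        trades[i] = Trade(trade_id=i, client_id=1)


def fill_in_place(trades):
    # Like the field-assignment version: mutates pre-allocated
    # records, so the loop itself allocates nothing.
    for i, t in enumerate(trades):
        t.trade_id = i
        t.client_id = 1


trades = [Trade() for _ in range(100_000)]
fill_in_place(trades)
```

Both loops leave the list in the same state; the difference is only in how much work the garbage collector is handed afterwards.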

~~~
taspeotis
If you wanted to, you could set GCSettings.LatencyMode [1] to LowLatency [2].

> Enables garbage collection that is more conservative in reclaiming objects.
> Full collections occur only if the system is under memory pressure, whereas
> generation 0 and generation 1 collections might occur more frequently ...
> This mode is not available for the server garbage collector.

Just for fun, you wouldn't want to run in LowLatency for too long.

[1] [http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx](http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode\(v=vs.110\).aspx)

[2] [http://msdn.microsoft.com/en-us/library/system.runtime.gclatencymode(v=vs.110).aspx](http://msdn.microsoft.com/en-us/library/system.runtime.gclatencymode\(v=vs.110\).aspx)
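
A rough Python analogue (purely illustrative, not .NET): CPython's `gc` module lets you suspend cyclic collection around a latency-sensitive section, which is the same "defer collections while the hot code runs" idea:

```python
import gc


def latency_sensitive_section():
    # Stand-in for the benchmark's hot loop (hypothetical work).
    return sum(range(1000))


was_enabled = gc.isenabled()
gc.disable()  # roughly analogous to a low-latency mode: defer collections
try:
    result = latency_sensitive_section()
finally:
    if was_enabled:
        gc.enable()  # restore normal collection afterwards

assert result == 499500
```

As with LowLatency, you wouldn't want to leave collection suspended for long; the `try`/`finally` makes sure it's restored.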

~~~
logicchains
Interesting. I suspect for this particular program it'd have to be in low
latency mode for the entire run. It doesn't seem to make too much of a
different to the C#2 implementation however (the one using sensible
allocation), so I suspect that one isn't generating too much garbage.

