
Benchmarking level generation: Go, Rust, Haskell and D - stock_toaster
http://togototo.wordpress.com/2013/07/23/benchmarking-level-generation-go-rust-haskell-and-d/
======
crntaylor
The problem with almost every language benchmarking blog post I see is that
the author is generally an expert in at most one of the languages they are
benchmarking, and so they write slow or non-idiomatic code in the other
languages, leading to a useless comparison.

Here's one example that I picked up. The author writes that they had to
refactor part of the Haskell code from (paraphrasing)

    
    
        if (checkColl tr rsDone || checkBound tr)
          then {- branch 1 -}
          else {- branch 2 -}
    

into

    
    
        if checkBound tr
          then noFit
          else if checkColl tr rsDone
            then noFit
            else {- branch 2 -}
          where
            noFit = {- branch 1 -}
    

because of "problems with lazy evaluation". But in fact the only problem is
that the call to `checkBound` is fast whereas the call to `checkColl` is slow,
and the || operator evaluates its left argument before deciding if it needs to
evaluate its right argument (just like in C, and every other language I've
ever used). So all that is required to get the speedup is to switch the order
of the calls:

    
    
        if (checkBound tr || checkColl tr rsDone)
          then {- branch 1 -}
          else {- branch 2 -}

~~~
logicchains
I should have made it clearer; when I said "Lazy evaluation often tricked me
up", I meant that I made the stupid mistake of thinking Haskell could somehow
magically evaluate the shortest branch first via lazy evaluation. I.e. lazy
evaluation led me to make mistakes by my expecting too much from it, not
because of any inherent fault in lazy evaluation.

------
dbaupp
The Rust code was very non-idiomatic, and is dramatically disadvantaged by
using the ISAAC random number generator (which is cryptographically secure,
and so quite slow). By rewriting it to use more iterators (for non-bounds
checked vector access), more vector-building functions (specifically,
`vec::with_capacity` and `vec::from_fn`), and a non-cryptographically secure
RNG[1], I get the following:

    
    
      Rust       0.34
      Rust isaac 0.57
      Rust orig  0.70
      Clang      0.42
      GCC        0.36
    

My rewritten version: [https://github.com/huonw/levgen-benchmarks/blob/master/R.rs](https://github.com/huonw/levgen-benchmarks/blob/master/R.rs)
(I didn't bother implementing seeding of the RNG for this, so it's the same
result every time. Edit: implemented now, no performance change.)

[1]: The new RNG is XorShiftRng, which is also in the standard library, and
actually has far better randomness properties than most platforms' `rand()`
functions anyway.
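
For reference, xorshift generators are only a few lines. This is a sketch of the classic xorshift32 step (Marsaglia's 13/17/5 shift triple), not the actual `XorShiftRng` from the 2013 Rust standard library, which is a 128-bit variant of the same family:

```rust
// Minimal xorshift32 PRNG sketch; the state must stay non-zero,
// or the generator is stuck at 0 forever.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        XorShift32 { state: if seed == 0 { 1 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13; // Marsaglia's 13/17/5 triple
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

fn main() {
    // same seed, same stream: cheap and deterministic
    let (mut a, mut b) = (XorShift32::new(42), XorShift32::new(42));
    for _ in 0..1000 {
        assert_eq!(a.next_u32(), b.next_u32());
    }
}
```

The whole step is three shifts and three XORs, which is why swapping it in for a cryptographically secure RNG moves the benchmark so much.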

(Edit: merging two adjacent `if` statements, took it down to 0.34 from 0.36.)

~~~
masklinn
Here's the compare view for an overview of the changes dbaupp applied to the
original code: [https://github.com/huonw/levgen-benchmarks/compare/logicchai...](https://github.com/huonw/levgen-benchmarks/compare/logicchains:master...master?w=1)

edit: the following addition was wrong; left for posterity, ignore it
(apparently Go uses ISAAC, so the change to XorShiftRng might be unfair to Go
(though it's trailing anyway). I expect C uses the standard libc generator?)

~~~
pbsd
Assuming this [1] is what Go is using, it is not ISAAC. It looks like a lagged
Fibonacci generator.

[1]
[http://golang.org/src/pkg/math/rand/rng.go](http://golang.org/src/pkg/math/rand/rng.go)
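
For the curious, an additive lagged Fibonacci generator computes s[n] = s[n-j] + s[n-k]. Go's generator uses much larger lags; the tiny lags (j=7, k=10) and the seeding scramble in this Rust sketch are invented just to keep it short:

```rust
// Toy additive lagged Fibonacci generator: s[n] = s[n-j] + s[n-k] (mod 2^64).
struct LaggedFib {
    buf: Vec<u64>, // ring buffer of the last k outputs
    i: usize,      // next position to overwrite
}

impl LaggedFib {
    fn new(seed: u64) -> Self {
        // crude LCG-based seeding; real implementations are far more careful
        let mut x = seed;
        let buf = (0..10)
            .map(|_| {
                x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
                x
            })
            .collect();
        LaggedFib { buf, i: 0 }
    }

    fn next_u64(&mut self) -> u64 {
        let k = self.buf.len(); // k = 10
        let j = 7;
        // buf[i] holds s[n-k]; buf[(i + k - j) % k] holds s[n-j]
        let v = self.buf[self.i].wrapping_add(self.buf[(self.i + k - j) % k]);
        self.buf[self.i] = v;
        self.i = (self.i + 1) % k;
        v
    }
}

fn main() {
    // same seed gives the same stream
    let (mut r1, mut r2) = (LaggedFib::new(1), LaggedFib::new(1));
    for _ in 0..100 {
        assert_eq!(r1.next_u64(), r2.next_u64());
    }
}
```

Each step is one add and a couple of index updates, so like xorshift it is far cheaper than a cryptographic generator such as ISAAC.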

~~~
masklinn
Ah yes, I misread a comment below which talked about Go and Rust; I didn't
read closely enough and took the ISAAC mention to apply to Go.

------
nknighthb

        if((((rx + rw +1 ) < x) || ((rx > (x+w +1 ))))) { RoomOkay=true
        } else if((((ry + rh +1 ) < y) || ((ry > (y+h +1 ))))) {RoomOkay=true
        }else {RoomOkay=false}
        if(RoomOkay==false){ return true}		
    
    

Gaaaah, really? "go fmt" is your friend! That mess automatically cleans up
into this:

    
    
        if ((rx + rw + 1) < x) || (rx > (x + w + 1)) {
            RoomOkay = true
        } else if ((ry + rh + 1) < y) || (ry > (y + h + 1)) {
            RoomOkay = true
        } else {
            RoomOkay = false
        }
        if RoomOkay == false {
            return true
        }
    

> _Particularly ugly is_
    
    
        (*mySlice)[indexI].myField=newValue;
    

> _I’m not sure why the language can’t automatically dereference the slice
> when the [] operator is applied to it_

I don't know exactly what the Go team would have to say about this, but my
take would be language simplicity. Pointers are pointers, and special-casing
some of them breaks the rules that both the compiler and the programmer rely
on.

It may look like C, but it isn't. There's no pointer/array equivalence here,
indexing and dereferencing aren't the same thing. If I weren't coming from a C
background (which I'm guessing the author is, too), I'm not sure it would ever
occur to me to expect a pointer-to-slice/array to be something you could
index.

Also, as a slice is basically a pointer to array plus two ints, you normally
wouldn't pass a pointer to it in the first place. You only need to do that if
you want to resize it.

~~~
laumars
The problem when people talk about code beauty is that beauty doesn't always
equate to readability. I find Go very much one of those languages that
doesn't look particularly pretty but is a joy to read compared to many other
languages of the same ilk.

------
pcwalton
> Hopefully in future Rust will adopt something like Go’s automatic
> initialisation of all values to zero, or at least have this as an option (or
> document it better, if it already is an option).

First, thanks for the feedback!

Unfortunately, I don't think this feature will be added to Rust, though. IMO
it causes bugs, as accidentally failing to initialize a variable yields a
garbage value at runtime. There's no reason that 0 is any more likely to be a
valid integer value than 1, or 42, or 0xdeadbeef, and the compiler shouldn't
try to guess. And the feature wouldn't work in Rust anyhow due to the lack of
null pointers.

However, having an explicit zero value is convenient, and this is what the
Zero trait is for. With Rust traits you can overload on the return value, like
Haskell, so you can write:

    
    
        let (emptyr: Room, emptyt: Tile, emptyl: Lev) = Zero::zero();
        let mut ls = [ emptyl, ..100 ];
    

assuming that your types derive Zero (which you can do with
#[deriving(Zero)]). In the future it should be:

    
    
        let (emptyr, emptyt, emptyl) = zero();
        let mut ls = [ emptyl, ..100 ];
    

This could be done by either writing a zero() wrapper in the prelude or
allowing static methods to be imported, both of which are probably good ideas.
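
The `Zero` trait and `#[deriving(Zero)]` are 2013-era Rust and no longer exist; the closest modern analogue is the `Default` trait, where return-position inference works the same way pcwalton describes. A sketch with an invented `Room` struct:

```rust
// `#[derive(Default)]` zeroes every field, playing the role the old
// `#[deriving(Zero)]` did; the fields of `Room` are made up here.
#[derive(Default, Clone, Copy, Debug, PartialEq)]
struct Room { x: i32, y: i32, w: i32, h: i32 }

fn main() {
    // the annotation on the left selects the impl, Haskell-style
    let empty: Room = Default::default();
    assert_eq!(empty, Room { x: 0, y: 0, w: 0, h: 0 });

    // an array of 100 zeroed rooms, as in the original snippet
    let ls = [empty; 100];
    assert_eq!(ls.len(), 100);
}
```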

~~~
pbsd
There's no notion of (default) constructors in Rust? How do you build complex
objects in it?

~~~
masklinn
The same way you do in other struct-based languages I expect, either build the
struct directly or go through a static factory function.

------
p0nce
I work almost full-time on C++ optimization and I find most benchmarks of
this type moot, and _not_ because of methodology flaws.

We need to stop just looking at numbers and investigate what these compilers
allow in terms of productivity, optimization opportunities and explicit
control of generated code.

If you look at x264's numbers you'll see that there is up to a 10x gap between
hand-optimized assembly and native C, and I believe that this gap exists in
most fast programs because optimizing backends are too much of a black box:
they consistently produce quite good, sometimes even amazing results, but with
little control or feedback.

The nature of highly optimized code is detailed, explicit control of what
should happen. I will take a poorly optimizing compiler any day if it allows
me complete control over the backend optimizations it chooses, if I can give
it register hints that actually matter, if it stops compiling and asks for the
annotations it needs to optimize better. C++ compilers are good at this game,
but this could be taken further.

Also, it's unfair to never look at language productivity alongside speed,
since the ability to produce fast programs is notably affected by how quickly
we can write and tune them in the first place.

~~~
dom96
I agree. However, how do you suggest we judge languages in terms of their
productivity? Isn't that highly subjective? I have never seen anyone do a
proper 'productivity benchmark', but perhaps with some rigor it could be done
reliably.

~~~
p0nce
I don't know. All methods I can think of are expensive and wasteful. So
perhaps it's a good proxy.

------
davidw
> For those who haven’t been following Rust, there are around seven different
> kinds of pointers: @ (garbage collected heap allocation), ~ (uniquely owned
> heap allocation), & (borrowed pointer to heap/stack allocation), * (raw C
> pointer, only useable in unsafe code), and mutable versions of the first
> three of those (actually, I’m not sure if there’s a mut & pointer, so maybe
> just six pointers in total).

I am not sure I like the sound of that. I guess having the unsafe stuff
available is very useful for certain things, but it all sounds a bit messy,
too.
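
A later-era gloss for readers coming to this afterwards: the sigils in that quote were removed from the language and map roughly onto modern Rust as the sketch below shows (this mapping is not part of the original discussion):

```rust
//   ~T -> Box<T>          (uniquely owned heap allocation)
//   @T -> Rc<T>           (the GC'd pointer was dropped; Rc is the nearest analogue)
//   &T -> &T / &mut T     (borrowed references)
//   *T -> *const T/*mut T (raw pointers, unsafe to dereference)
use std::rc::Rc;

fn main() {
    let owned: Box<i32> = Box::new(1);  // was ~1
    let shared: Rc<i32> = Rc::new(2);   // nearest surviving analogue of @2
    let borrowed: &i32 = &owned;        // deref-coerced borrow of the Box
    let raw: *const i32 = &*shared;     // raw pointer, unsafe to read
    assert_eq!(*borrowed + unsafe { *raw }, 3);
}
```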

~~~
noelwelsh
Being able to talk about memory allocation in a safe way is one of the key
objectives of Rust. Rust is the only modern language (i.e. not C or C++) I
know of that allows one to mix GC'ed and manually allocated memory, and heap
and stack allocated memory.

Reasoning about mutability is necessary for safe (and fast!) concurrency.

~~~
masklinn
> Rust is the only modern language (i.e. not C or C++) I know of that allows
> one to mix [...] heap and stack allocated memory.

Technically I believe (though this needs verifying) Go does that as well, but
the mix is implicit and the system (not the developer) decides on its own
whether a value should be stack- or heap-allocated.

C# will also allocate value-type local variables on the stack (technically
all local variables are stack-allocated, but for reference types it obviously
stack-allocates only a reference to the heap; the end result is that, under
the usual understanding of "stack allocation", only value-type local
variables and parameters are stack-allocated).

~~~
pkolaczk
The JVM's HotSpot does this, too, implicitly, by means of escape analysis.

------
ozataman
A sidenote: Your blog's code formatting is awful. I would recommend that you
embed one of the common solutions for code snippet syntax highlighting into
your blog. I needed special motivation to continue reading your post past the
first code sample.

Edit: And then of course actually _use_ the prominent guidelines for good-
looking code in each language!

~~~
logicchains
My apologies, I was in a hurry; I'll try to put more effort into presentation
in future. Are there prominent guidelines for good-looking code in Rust?

~~~
dbaupp
There's the Rust style guide[1] but it's not really that useful at the moment;
however, 2 points of general advice:

- make use of functions like `vec::from_fn`, `uint::range` and
`some_number.times`, rather than writing `while`-loops with an explicit
counter.

- use iterators[2,3] for traversing data structures, since that allows the
data structure to optimise the accesses (e.g. the vector iterator avoids
bounds checks).

However, the tutorial and manual are out of date, and the documentation makes
it very hard to find out about these functions, so it's definitely not the
programmer's fault when their code turns out to be non-idiomatic Rust.

[1]: [https://github.com/mozilla/rust/wiki/Note-style-guide](https://github.com/mozilla/rust/wiki/Note-style-guide)
[2]: [http://static.rust-lang.org/doc/tutorial-container.html#iterators](http://static.rust-lang.org/doc/tutorial-container.html#iterators)
[3]: [http://static.rust-lang.org/doc/core/iterator.html](http://static.rust-lang.org/doc/core/iterator.html)
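
The functions named above (`vec::from_fn`, `uint::range`, `some_number.times`) are 2013-era Rust, but the advice carries over to modern iterator adaptors. A sketch:

```rust
// Let the container drive the traversal instead of indexing with a counter.
fn main() {
    // instead of a while-loop with a counter building a vector:
    let squares: Vec<u64> = (0..10).map(|i| i * i).collect();
    assert_eq!(squares[3], 9);

    // instead of `for i in 0..v.len() { sum += v[i] }`, which bounds-checks
    // every access, iterate directly:
    let sum: u64 = squares.iter().sum();
    assert_eq!(sum, 285); // 0+1+4+9+16+25+36+49+64+81
}
```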

------
samnardoni
> The initial version must have been checking each room for collisions (a long
> process) before checking whether it fit within the level bounds (a short
> process), so manually sequencing it with if-then-else led to a spectacular
> 268% improvement.

Or just change `(checkColl tr rsDone || checkBound tr)` to `(checkBound tr ||
checkColl tr rsDone)`.

------
qznc
The dmd middle-end and backend have gotten little attention over the last few
years, especially compared to GCC and LLVM. No wonder it produces slower code.

With D it is not a bad idea to use dmd for development and GDC/LDC for release
builds.

------
mercurial
Nice to see Rust doing so well, but I'm disappointed by the performance of the
Haskell code.

~~~
moomin
The reports from all of these micro-benchmarks seem to be consistent on the
subject of Rust: it doesn't yet have the performance that justifies its
existence. I'm guessing you could get similar numbers out of Java.

I'm hoping Rust's numbers improve. It's incredibly promising, but without C++
level performance, it's too much semantic complexity.

~~~
pkolaczk
+1 The same is true probably also about D.

~~~
p0nce
No because D has C performance, now. Have you seen the blog numbers?

~~~
pkolaczk
As pointed out in the other comments on this thread, this benchmark mostly
measures standard RNG performance, not the quality of the code generated by
each compiler. Here you've got a much more computationally intensive
benchmark, showing that D is "as fast as C" only in the sense that Java is:
[http://attractivechaos.wordpress.com/2013/04/06/performance-...](http://attractivechaos.wordpress.com/2013/04/06/performance-of-rust-and-dart-in-sudoku-solving/)

It is close, but Java is just as close. Not a tremendous performance
advantage that would make someone choose one over the other, and not what I'd
expect from a language marketed as "performance oriented".

~~~
p0nce
Ok you live in a world where Java is "as fast as C". Let's terminate this
discussion.

~~~
pkolaczk
I didn't say that; you said D was as fast as C now. Then, by _your_
definition of "as fast as C", Java must be as fast as C as well. Your
definition of "as fast as C" seems to be something between 30% and 95% of C's
performance, because those are the most common numbers reported for D in
various benchmarks (I cited one of them, and the one in the blog also
confirms this: only one D compiler managed to match the best C compiler, and
the others were significantly behind). The same numbers are consistently
reported for Java, so I don't really understand what you're arguing about. I
conclude both are in the same league, with two minor exceptions: 1. I've seen
Java beating GCC many times, once quite severely, by almost 2x on simple
numerical array code, while I never saw that happening for D; and 2. D has a
much worse GC implementation than Java or C# have.

------
dom96
Any chance you could also benchmark the nimrod programming language?

I think Nimrod is a very good contender, and it's a shame it's never
benchmarked when Go, D and Rust are.

I will gladly help you write a benchmark if you wish to pursue this.

~~~
pjmlp
Which C compiler would you use, given that Nimrod lacks a proper compiler of its own?

~~~
dom96
GCC most likely.

~~~
pjmlp
Then it would be testing how GCC compiles C code and not Nimrod, right?

~~~
dom96
I can see what you are trying to say. But it would still be testing the speed
of Nimrod: whether the Nimrod compiler compiles Nimrod code to C or to
assembly is irrelevant. The fact is that Nimrod is not C, and we are
interested in the speed that Nimrod can achieve.

