
Comparing k-NN in Rust - dbaupp
http://huonw.github.io/2014/06/10/knn-rust.html
======
pcwalton
From the perspective of a compiler writer, the most interesting thing about
this code is that it contains no bounds checks while still being safe. Using
the high-level iteration APIs like map/filter/etc. is not only good for
readability, it's good for performance as well. C-style for loops require
bounds checks to maintain safety, but there's no need for them with a properly
designed iterator API.
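The contrast, sketched in current Rust syntax (hypothetical `dist_*` helpers, not the post's actual code):

```rust
// Indexed loop: each `a[i]` and `b[i]` access carries a bounds check
// unless the compiler can prove it away.
fn dist_indexed(a: &[i64], b: &[i64]) -> i64 {
    let mut acc = 0;
    for i in 0..a.len() {
        let d = a[i] - b[i];
        acc += d * d;
    }
    acc
}

// Iterator style: `zip` stops at the shorter slice, so the single
// per-step check is both the loop condition and the bounds proof.
fn dist_iter(a: &[i64], b: &[i64]) -> i64 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let a = [1, 2, 3];
    let b = [4, 6, 8];
    assert_eq!(dist_indexed(&a, &b), 50);
    assert_eq!(dist_iter(&a, &b), 50);
}
```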

~~~
andrewflnr
What about all those calls to unwrap? Don't those sweep some of the bounds
checking under the rug? It's okay in a proof of concept like this, but I
wouldn't be comfortable running that code "for real".

Edit: to be more clear, I wouldn't expect it to have an impact on performance,
but I'm not convinced this is a good example of expressiveness and safety in
the same package.

~~~
ben0x539
.unwrap() is checked: if it goes wrong, it will safely cause task failure and
stack unwinding. There won't be any memory unsafety.
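A small illustration in current Rust terms (where 2014's "task failure" is today's panic):

```rust
fn main() {
    let nums = vec![1, 2, 3];

    // Checked access returns an Option instead of touching invalid memory.
    assert_eq!(nums.get(1), Some(&2));
    assert_eq!(nums.get(10), None);

    // Calling .unwrap() on that None would panic and unwind the stack;
    // it never yields a dangling reference.
    let second = nums.get(1).unwrap();
    assert_eq!(*second, 2);
}
```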

~~~
andrewflnr
You can't get corruption, but I don't consider a program that violently
terminates on unexpected input to be "safe". Just semantics, I guess.

~~~
dbaupp
Yeah, saying 'safe' around Rust is generally interpreted as meaning memory
safety.

------
pbsd
For reference, a straightforward translation of the Rust code to C++, compiled
using clang, halves the running time. There are two reasons for this:

- `int` is the fastest available integer type in C(++), whereas in Rust it is
defined to be pointer-sized. This allows the C++ version to vectorize better
than Rust by using 32-bit integers (also there aren't native SIMD 64-bit
multiplications in current x86).

- Even when changing the native type from `int` to `i32`, to match the C++
code, rustc does not vectorize the distance function. I do not know the exact
reason for this, but I would guess the iterator code is unable to elide null
pointer checks. It's unclear to me whether this is a language limitation or
simply a current compiler limitation.
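To ground the width point: 2014's `int` survives in today's Rust as `isize`, which is defined to be pointer-sized, so a check like this (a sketch, using current names) shows why 32-bit lanes pack twice as densely:

```rust
use std::mem::size_of;

fn main() {
    // `isize` (2014's `int`) is pointer-sized: 8 bytes on x86-64,
    // versus 4 bytes for `i32`, so a SIMD register holds twice as
    // many `i32` lanes as `isize` lanes.
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
    assert_eq!(size_of::<i32>(), 4);
}
```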

~~~
pcwalton
Rust doesn't have null pointers, so I see no language-level reason why that
wouldn't vectorize. LLVM's vectorizer tends to be pretty brittle, so probably
tweaking the generated IR a bit would cause us to vectorize. Feel free to file
an issue if you have time :)

~~~
pbsd
Well, Rust _is_ checking for null pointers at the assembly level:
[http://goo.gl/eh2aqc](http://goo.gl/eh2aqc) (lines 22--27).

~~~
pcwalton
Blech. I'm pretty sure LLVM, not the frontend, is inserting those null checks
for some reason.

~~~
kzrdude
The slice iterator yields `Option<&T>`, the None variant is represented by a
null pointer. That's probably the null check.

------
cousin_it
Rust's iterators are really nice. As far as I can tell, the idea is to do only
one check on each iteration, which doubles as an array bounds check (for
memory safety) and a loop condition check (for loop termination). That's much
better than the usual "for" loop with an index into a checked array, which
uses one check for loop termination, and two more checks upon array indexing.
Why don't more languages do that? It seems like a simple enough idea; I
blogged about it some time ago.
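The single-check idea can be sketched with a toy iterator (hypothetical; the real slice iterator works on raw pointers, but the shape is the same):

```rust
// A toy slice iterator: the one `pos < len` test is simultaneously the
// loop-termination condition and the bounds proof. The indexing inside
// the branch is dominated by that test, so the compiler can elide the
// redundant check.
struct ToyIter<'a, T> {
    slice: &'a [T],
    pos: usize,
}

impl<'a, T> Iterator for ToyIter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        if self.pos < self.slice.len() {
            let item = &self.slice[self.pos];
            self.pos += 1;
            Some(item)
        } else {
            None
        }
    }
}

fn main() {
    let xs = [10i32, 20, 30];
    let sum: i32 = ToyIter { slice: &xs, pos: 0 }.sum();
    assert_eq!(sum, 60);
}
```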

~~~
pcwalton
In the case of Java, I believe it's because the underlying JVM only supports
bounds-checked array indexing. So even though the iterator APIs in Java could
in theory eliminate the bounds checks, the high-level information is compiled
away and the JVM has to prove on its own that the bounds check is not
necessary.

~~~
cousin_it
I wonder if the language can support similarly fast access to user-defined
data structures. Though maybe it's not a very interesting question, if arrays
are built in. Most user-defined data structures are already their own
iterators in some sense. For example, iterating over an ML-style linked list
obviously requires only one check per iteration.
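The parent's linked-list point, sketched in Rust (a hypothetical cons list, not from the post):

```rust
// An ML-style cons list: the Cons/Nil match is the single check per
// step; there is no separate bounds test to elide.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        // Matching Cons both terminates the recursion and proves
        // head/tail are valid to read.
        List::Cons(head, tail) => head + sum(tail),
        List::Nil => 0,
    }
}

fn main() {
    use List::{Cons, Nil};
    let xs = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    assert_eq!(sum(&xs), 6);
}
```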

------
mrjbq7
A Factor version compares favorably at 13 lines of code, and is as fast as the
parallel OCaml version.

[http://re-factor.blogspot.com/2014/06/comparing-k-nn-in-fact...](http://re-factor.blogspot.com/2014/06/comparing-k-nn-in-factor.html)

------
thinkpad20
Very cool! The code is quite clean and readable. I'm quite impressed by the D
sample linked as well. I had a brief foray with D a year or so ago and liked
it a lot. I'd like to see a speed comparison. I'm not surprised the author had
trouble getting it to compile though; D seems to be similar to Rust in that it
is still actively evolving as a language (I recall several times where I
couldn't get the code in Alexandrescu's book to compile).

~~~
dbaupp
The author of the D post and I are having/had a discussion on reddit about the
problems I had:
[http://www.reddit.com/r/programming/comments/27s7g6/comparin...](http://www.reddit.com/r/programming/comments/27s7g6/comparing_knn_in_rust/ci3v7y9)

------
bjz_
In summary, the direct source translation into Rust is about 3.5–4× faster
than the OCaml. The parallel version is faster still. This is in safe code
with bounds checks.

~~~
kibwen
In addition, "I made no effort to remove/reduce/streamline allocations", which
is a bit of a cruel tease. I want to know how much faster it could still be!
:)

~~~
dbaupp
I just ran perf on it: the slowest allocation function takes 0.04% CPU time in
total, meaning there's not much time to gain from just removing the
allocations directly. There may still be a benefit from the better data
locality that fewer allocations would give.

~~~
kibwen
In the future I'd recommend perf for benchmarking as well. `perf stat -r 3
./foo` will do the repeated runs for you, and give you output like
"1.002251432 seconds time elapsed ( +- 0.025% )", where that latter number
appears to be the coefficient of variation.

~~~
dbaupp
Oh that's nice, thanks.

------
thinkpad20
For anyone interested, I wrote up a Haskell version[1]. Initially, it was
immensely slow (~ 2 minutes); however, switching over to unboxed vectors
brought it to be on par with OCaml. There are still opportunities for
improvement.

[1] [http://lpaste.net/105413](http://lpaste.net/105413)

~~~
thinkpad20
And here's one that runs in half that time (or possibly less if you have a lot
of cores)[1]. Speedups due to suggestions in this thread[2].

EDIT: now runs in parallel and is faster than the Rust single-threaded
version, and on par with the parallel Rust version, but only ~35 lines.

[1] [http://lpaste.net/105456](http://lpaste.net/105456)

[2]
[http://www.reddit.com/r/haskell/comments/27tcvz/knearest_nei...](http://www.reddit.com/r/haskell/comments/27tcvz/knearest_neighbors_in_haskell_is_elegant_but_slow/)

------
sdegutis
Related: Has there been any ETA announced on when Rust will stabilize? I'm
very interested in learning more about it, but not until I don't have to read
changelogs regularly.

~~~
dbaupp
A 1.0 release is planned for the end of the year. Note: it will really only be
stabilising the core language and an unspecified set of core libraries; it
will likely take a while after that until there are stable libraries on a
scale rivaling Python or Go.

~~~
sdegutis
I'm just hoping to avoid the level of instability that I've experienced in my
Ruby projects over the years. It makes it nigh impossible to keep a project
maintained sanely long-term as its dependencies evolve and even the language
or stdlib change in backwards-incompatible ways every couple of years.

~~~
brson
Not to dismiss the importance of language stability, but forward porting in
Rust _should_ be significantly easier than Ruby for a few reasons:

* Rust's strong type system means the pieces fit together only in specific ways - roughly, if you can get it to compile again it will work; in more dynamic languages you have little confidence that the code works until runtime.

* Co-evolving the language with the downstream community is such a critical issue that Rust is developing several tools and processes to help, and this should set it apart from other open source languages that have gone down this path:

The Rust process already attempts to tag all breaking changes in the commit
log with `[breaking-change]` and we've heard anecdotally that this has made
forward-porting Servo much easier. This log isn't published anywhere besides
the commit log yet, but it will be.

Secondly, Rust has a [stability](http://doc.rust-lang.org/rust.html#stability)
system that tracks API stability at a fine level. This is influenced by
node.js, but in Rust stability is detected and use of unstable APIs can be
enforced by the tooling. This is still in development but you can see it in
the [docs](http://doc.rust-lang.org/std/intrinsics/).

------
orthecreedence
I've been watching rust from the sidelines for a few months now, cheering it
on. I'm really excited for the language to stabilize a bit.

Has anyone here who uses it run into any downsides in comparison to other
languages (aside from it being new and changing)?

~~~
sanderjd
I love rust, but the entire system of tracking lifetimes is both a major
upside and a downside. When it "just works", it's like a dream where you can
make a mess and have it cleaned up automatically, cheaply, and at exactly the
right time. When you start getting errors or want to refactor across function
call boundaries, it can be frustrating and/or require a quite deep
understanding of the workings of the system to figure out how to fix it.

tl;dr: If you consider "has a fairly complex concept that is unfamiliar and
necessary to learn with a fair degree of depth" a downside, then I think it
has one.

~~~
burntsushi
I somewhat agree with your characterization. Lifetimes definitely add a bit
of complexity to the language. And you're right about the benefit; lifetimes
are frankly awesome. It's really _really_ cool to write memory safe code
without using a GC.

When I first started with Rust (was also my first foray with lifetimes), the
compiler completely kicked my ass for at least a few days. I struggled a lot
with writing anything beyond a few functions, and especially when those
functions were returning borrowed pointers. I think the code I wrote at that
point could be fairly characterized as "the bare minimum that I could
convince the compiler to accept."

But as I wrote more code, I got better at it pretty quickly. At this point, I
can look at most Rust code and feel pretty good about spotting lifetime errors
before consulting the compiler. I'd say it only took me a couple thousand
lines of code to get there, which isn't a huge price to pay.

Anyway, this is obviously a personal anecdote. But it's coming from someone
who thought Rust was crazy complex only a few months ago. FWIW, it took me
about 48 days from knowing absolutely zero Rust (other than random Internet
buzz) to writing and getting libregex merged upstream.

~~~
sanderjd
We're in violent agreement. (Though you've gone much further with Rust more
quickly than I have; libregex is beautiful.) A couple thousand lines to feel
pretty solid with lifetimes sounds about right - I'm getting close to that
range and the issues I'm hitting are increasingly of the obscure kind rather
than the initial "nothing works, I guess I'll put it all on the heap!" kind.

This is basically why I said _if_ you consider this a downside. I think it's
only a "downside" in the same way that purity is a "downside" in Haskell - it
isn't _accidental_ complexity, and wrapping your head around it is just a part
of really learning the language.

------
lpw25
By replacing the distance function with:

    let distance (a1 : int array) (a2 : int array) =
      let open Array in
      let len = length a1 in
      let acc = ref 0 in
      for i = 0 to len - 1 do
        let v1 = unsafe_get a1 i in
        let v2 = unsafe_get a2 i in
        let d = v1 - v2 in
        acc := !acc + d * d
      done;
      !acc

the OCaml goes 3 times faster. This is what would be produced if OCaml's
inliner had triggered on the original definition of `distance`, so that is
probably the main difference in the two languages' performance. If you inline
some of the other functions by hand (and tidy up some of the sillier parts of
the OCaml code) it easily runs 4 times faster than the original.

~~~
lpw25
Indeed, using the improved OCaml inliner
([http://www.ocamlpro.com/blog/2013/07/11/inlining-progress-re...](http://www.ocamlpro.com/blog/2013/07/11/inlining-progress-report.html))
on the original OCaml gives a speedup of 2.75×.

------
kzrdude
How much work would it be to rewrite `slurp_file` to return an `Option<..>`
result, that is, to replace all of the .unwrap calls with some kind of short-
circuit return?
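For illustration, the short-circuit style in current Rust looks something like this (a hypothetical reworking, not the post's code; the `?` operator plays the role of what was then the `try!` macro):

```rust
use std::fs::File;
use std::io::{self, Read};

// Each `?` returns the error to the caller immediately instead of
// unwrapping on the spot, so the caller chooses how to handle failure.
fn slurp_file(path: &str) -> io::Result<String> {
    let mut contents = String::new();
    File::open(path)?.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() {
    match slurp_file("some-missing-file") {
        Ok(s) => println!("read {} bytes", s.len()),
        Err(e) => println!("could not read: {}", e),
    }
}
```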

~~~
dbaupp
I wrote it up: [http://huonw.github.io/2014/06/11/error-handling-in-rust-knn...](http://huonw.github.io/2014/06/11/error-handling-in-rust-knn-case-study.html)

~~~
kzrdude
Ok that's the amazingest answer you could give to my question! Very cool, it
doesn't look so bad.

In my mind this style should be the default for error handling when coding
(i.e. I'd prefer to push the choice of using .unwrap or not to the caller)

------
thikonom
You could use move_iter() instead of iter(); it's more idiomatic.

~~~
burntsushi
Huh? They serve two different purposes. `move_iter` returns an iterator that
consumes the values (i.e., ownership transfers) while `iter` yields borrowed
references to the values (i.e., no ownership transfer).

The only place in the code that I can spot where `move_iter` is even
available is on line 46. But I don't see any compelling reason to use
move_iter there (plus, `validation_sample.len()` in the final println would
have to be moved up and `let`-bound before the call to move_iter).
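The distinction in current Rust terms, where `move_iter` became `into_iter` (a toy example, not the post's code):

```rust
fn main() {
    let v = vec![String::from("a"), String::from("bb")];

    // iter(): yields &String; `v` is only borrowed and usable afterwards.
    let lens: Vec<usize> = v.iter().map(|s| s.len()).collect();
    assert_eq!(lens, vec![1, 2]);
    assert_eq!(v.len(), 2); // still fine

    // into_iter() (2014's move_iter): yields String by value, consuming
    // `v`, which is why v.len() would have to be computed beforehand.
    let n = v.len();
    let owned: Vec<String> = v.into_iter().collect();
    assert_eq!(owned.len(), n);
}
```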

~~~
steveklabnik
I think they were trying to say that borrowing is preferred over ownership
transfer.

