Hacker News new | past | comments | ask | show | jobs | submit login
Comparing k-NN in Rust (huonw.github.io)
115 points by dbaupp on June 10, 2014 | hide | past | favorite | 75 comments

From the perspective of a compiler writer, the most interesting thing about this code is that it contains no bounds checks while still being safe. Using the high-level iteration APIs like map/filter/etc. is not only good for readability, it's good for performance as well. C-style for loops require bounds checks to maintain safety, but there's no need for them with a properly designed iterator API.

What about all those calls to unwrap? Don't those sweep some of the bounds checking under the rug? It's okay in a proof of concept like this, but I wouldn't be comfortable running that code "for real".

Edit: to be more clear, I wouldn't expect it to have an impact on performance, but I'm not convinced this is a good example of expressiveness and safety in the same package.

The unwraps are not memory unsafe, unlike avoiding bounds checks. In any case, rewriting to avoid them isn't so hard: http://huonw.github.io/2014/06/11/error-handling-in-rust-knn...

(Notably, since it's all in the types, you're guaranteed to not ignore any errors.)

.unwrap() is checked, if it goes wrong, it will safely cause task failure and stack unwinding. There won't be any memory unsafety.

You can't get corruption, but I don't consider a program that violently terminates on unexpected input to be "safe". Just semantics, I guess.

Yeah, saying 'safe' around Rust is generally interpreted as meaning memory safety.

The `unwrap` calls are all in the input parsing code (`slurf_file` fails if input is invalid), except for one on the result of `min_by` (`classify` fails if you pass it an empty array). I wouldn't say any of these "sweep bounds checking under the rug."

That's actually very interesting, I've had to implement bounds checking in a C-style language, and it definitely was a lot of fun and required some thorough checks. What's taken into consideration when designing the iterator API? Could you please provide an example of using a properly designed iterator API?


For reference, a straightforward translation of the Rust code to C++, compiled using clang, halves the running time. There are two reasons for this:

- `int` is the fastest available integer type in C(++), whereas in Rust it is defined to be pointer-sized. This allows the C++ version to vectorize better than Rust by using 32-bit integers (also there aren't native SIMD 64-bit multiplications in current x86).

- Even when changing the native type from `int` to `i32`, to match the C++ code, rustc does not vectorize the distance function. I do not know the exact reason for this, but I would guess the iterator code is unable to elide null pointer checks. It's unclear to me whether this is a language limitation or simply a current compiler limitation.

Rust doesn't have null pointers, so I see no language-level reason why that wouldn't vectorize. LLVM's vectorizer tends to be pretty brittle, so probably tweaking the generated IR a bit would cause us to vectorize. Feel free to file an issue if you have time :)

Well Rust is checking for null pointers at the assembly level: http://goo.gl/eh2aqc (lines 22--27).

Blech. I'm pretty sure LLVM, not the frontend, is inserting those null checks for some reason.

The slice iterator yields `Option<&T>`, the None variant is represented by a null pointer. That's probably the null check.

This is https://github.com/mozilla/rust/issues/11751: LLVM doesn't understand (non)null pointers enough.

That might still be the case. IIRC I read somewhere that when the iterator code gets inlined, the non-nullness of the pointer gets lost, as the iterators internally use raw pointers (which can be null in Rust).

Yeah, nonnull is only allowed on parameters in LLVM IR, so it gets lost post-inlining. I'm pretty sure it's LLVM that is inserting those null checks, as we never do in the frontend (since safe pointers are not null in Rust, and unsafe pointers are never checked for null since they're unsafe).

Doesn't it insert None checks for Option<&u32> (because of zip)? LLVM probably doesn't figure out that they'll never be NULL.

Yeah, that's probably it. (The compiler isn't inserting the checks, but rather the libraries are.) The vectorizer can't safely vectorize that code without changing when failure occurs if the vectors are different lengths.

Perhaps the best thing to do is to change zip, or add a new parameter to zip, to allow it to "chunk" its inputs and thereby enable vectorization. Or we could have a special type that encapsulates two or more vectors and checks up front that they're the same length, so that zip can then assume that they're the same length. It also might be possible to add an optimization to LLVM to have it figure out that the vectors are the same length.

If zip is following the 'functional' definition, it should only return a list of the length of the shorter of the two inputs. In Haskell:

    zip :: [a] -> [b] -> [(a, b)]

    zip :: [a] -> [b] -> [(Maybe a, Maybe b)]
If you want to allow processing a sequence of tails, you could have:

    zipWithTail :: [a] -> [b] -> ([(a, b)], Either [a] [b])
Which would return a list of pairs and a list of either the tail of the first or the second input, whichever was longer.

It is following that definition, but the issue is that it's built on iterators, which are of unknown length. Haskell would have effectively the same checks if built on lists because of the way lazy thunks work.

Prompted by pcwalton I've added a tiny specialized iterator for zipping two vector slices that apparently is simple enough to avoid the null pointer checks after all. This brings the runtime on my system down from ~3.1 seconds to ~2.06 seconds. Going from i64 to i32 brings it further down to ~1.03s, beating your C++ version (thanks!) which runs in ~1.55s on my system.

Code at http://ix.io/cUd

Nice! Is this going into the standard library?

I've looked at the output of both inner loops, and it seems I put too much faith in std::valarray's expression templates. Changing the distance function to

    int distance(std::valarray<int> const& x, std::valarray<int> const& y) {
        return inner_product(begin(x), end(x), begin(y), 0, 
        std::plus<int>(), [](int x, int y){ return (x-y)*(x-y); });
results in the exact same inner loop as Rust. The remaining speed difference comes from the file loading code, which admittedly is crap (way too many memory allocations). So for all intents and purposes, I admit defeat.

> Nice! Is this going into the standard library?

Possibly. There's been some talk in that direction, but also some "can't we just fix llvm?" objections. I guess someone needs to write up a pull request (and see what other iterator traits to implement, maybe).

So, after reading this thread I still didn't really get if that's something that most likely will be fixed over time, or Rust is fundamentally inferior to C++ in that sense?

I think we can fix it by adding a zip method on slices to avoid the null checks (as we just investigated).

But note that the C++ version has no bounds checks, so it's unsafe. You could always write C in Rust in the unsafe sublanguage, so saying it's "fundamentally inferior" is not true in any case.

For the sake of reproducibility, transparency and all that, here's the C++ version: http://pastebin.com/5pMka6ZJ

What's the justification for having a variable size int as the default? That seems like a mistake to me.

Variable-size as in C++, or variable-size as in Rust? Because they both seem like mistakes to me, but not everyone seems to agree. :)

I don't think it's a mistake in Rust, because of array indices.

I'm not convinced that array indexing is the primary use case of integral variables. And since this is the use case most often espoused for ints, they should probably be renamed to something more appropriate (intptr/uintptr? index/uindex? idx/uidx?). And heck, while we're at it, bare numeric literals should probably default to i32 rather than int (as gross as that feels, it's the least bad option that doesn't involve removing numeric literal type inference). But perhaps this is the wrong forum for this discussion. :P

I agree it is weird too.

Because usually ints are used as array indices, and you want them to be the same size as a pointer in that case.

There is an issue to remove the notion of a "default" integer type.

Ah, I see. Personally I think that isn't a strong enough justification, since this choice introduces possible 32/64 bit portability issues and prevents optimizations in scenarios like this. Arrays with >4 billion elements are rare and it would be a shame to compromise the language design for them.

It would also be a shame to end up with an overly restrictively language in future, where large arrays/values need special `..._64` functions (taking 64 bit integers, rather than 32 bit ones).

I like that justification, assuming there are int8, int16, int 32, int64, etc, there is no need for another int anyways.

Rust's iterators are really nice. As far as I can tell, the idea is to do only one check on each iteration, which doubles as an array bounds check (for memory safety) and a loop condition check (for loop termination). That's much better than the usual "for" loop with an index into a checked array, which uses one check for loop termination, and two more checks upon array indexing. Why don't more languages do that? It seems like a simple enough idea, I blogged about it sometime ago.

In the case of Java, I believe it's because the underlying JVM only supports bounds-checked array indexing. So even though the iterator APIs in Java could in theory eliminate the bounds checks, the high-level information is compiled away and the JVM has to prove on its own that the bounds check is not necessary.

I wonder if the language can support similarly fast access to user-defined data structures. Though maybe it's not a very interesting question, if arrays are built in. Most user-defined data structures are already their own iterators in some sense. For example, iterating over an ML-style linked list obviously requires only one check per iteration.

A Factor version compares favorably at 13 lines of code, and as fast as the parallel OCaml version.


Very cool! The code is quite clean and readable. I'm quite impressed by the D sample linked as well. I had a brief foray with D a year or so ago and liked it a lot. I'd like to see a speed comparison. I'm not surprised the author had trouble getting it to compile though; D seems to be similar to Rust in that it is still actively evolving as a language (I recall several times where I couldn't get the code in Alexandrescu's book to compile).

The author of the D post and I are having/had a discussion on reddit about the problems I had: http://www.reddit.com/r/programming/comments/27s7g6/comparin...

In summary, the Rust code in the direct source translation is about 3.5–4× faster than the OCaml. The parallel version is even faster. This is in safe code with bounds checks.

In addition, "I made no effort to remove/reduce/streamline allocations", which is a bit of a cruel tease. I want to know how much faster it could still be! :)

I just ran perf on it: the slowest allocation function takes 0.04% CPU time in total, meaning there's not much time to gain from just removing the allocations directly. There may still be a benefit from better data locality from fewer allocations.

In the future I'd recommend perf for benchmarking as well. `perf stat -r 3 ./foo` will do the repeated runs for you, and give you output like "1.002251432 seconds time elapsed ( +- 0.025% )", where that latter number appears to be the coefficient of variation.

Oh that's nice, thanks.

For any interested, I wrote up a Haskell version[1]. Initially, it was immensely slow (~ 2 minutes); however, switching over to unboxed vectors brought it to be on par with OCaml. There are still opportunities for improvement.

[1] http://lpaste.net/105413

And here's one that runs in half that time (or possibly less if you have a lot of cores)[1]. Speedups due to suggestions in this thread[2].

EDIT: now runs in parallel and is faster than the Rust single-threaded version, and on par with the parallel Rust version, but only ~35 lines.

[1] http://lpaste.net/105456

[2] http://www.reddit.com/r/haskell/comments/27tcvz/knearest_nei...

Related: Has there been any ETA announced on when Rust will stabilize? I'm very interested in learning more about it, but not until I don't have to read changelogs regularly.

A 1.0 release is wanted at the end of the year. Note: it will really only be stabilising the core language and an unspecified set of core libraries; it will likely take a while after that until there are stable libraries on a scale rivaling Python or Go.

I'm just hoping to avoid the level of instability that I've experienced in my Ruby projects over the years. It makes it nigh impossible to keep a project maintained sanely long-term as its dependencies evolve and even the language or stdlib change in backwards-incompatible ways every couple of years.

Not to dismiss the importance of language stability, but forward porting in Rust should be significantly easier than Ruby for a few reasons:

* Rust's strong type system means the pieces fit together only in specific ways - roughly, if you can get it to compile again it will work; in more dynamic languages you have little confidence that the code works until runtime.

* Co-evolving the language with the downstream community is such a critical issue that Rust is developing several tools and processes to help, and this should set it apart from other open source languages that have gone down this path:

The Rust process already attempts to tag all breaking changes in the commit log with `[breaking-change]` and we've heard anecdotally that this has made forward-porting Servo much easier. This log isn't published anywhere besides the commit log yet, but it will be.

Secondly, Rust has a [stability](http://doc.rust-lang.org/rust.html#stability) system that tracks API stability at a fine level. This is influenced by node.js, but in Rust stability is detected and use of unstable API's can be enforced by the tooling. This is still in development but you can see it in the [docs](http://doc.rust-lang.org/std/intrinsics/).

It's a fine line to walk. Don't change enough and you hemorrhage developers due to not supporting new and interesting features and concepts, change too much and you you hemorrhage developers who don't want to start a new project to run into the same problems as the last one, where an upgrade caused headaches. FWIW I think Rust's tagging of breaking changes (and eventually making them easy to identify and fix) as stated in a same level comment is wise no matter the strategy.

To give some concrete data about Rust's instability: https://gist.github.com/steveklabnik/616e89860446eb2a6732

I believe you've picked up some wisdom, congrats!

it will take at least a decade to rival Python. Go is a much, much easier target.

I doubt it will take that long to match what is considered the useful subset of Python libraries, if the language becomes popular.

What I've noticed from watching the Python, Ruby and Javascript module ecosystems approach and in some cases surpass CPAN (in terms of raw numbers of packages) is that there's a lot more programmers out there now, and a lot more existing implementations available in similar languages for just about anything you are attempting to do.

No one is aiming for all Python libs, just the standard libs (datetime,regex, etc.)

that looks like a 3-5 years affair, too. rust interests me greatly so i'll be really happy to be wrong.

1.0 is scheduled to be released by the end of 2014.

I've been watching rust from the sidelines for a few months now, cheering it on. I'm really excited for the language to stabilize a bit.

Has anyone here who uses it run into any downsides in comparison to other languages (aside from it being new and changing)?

I love rust, but the entire system of tracking lifetimes is both a major upside and a downside. When it "just works", it's like a dream where you can make a mess and have it cleaned up automatically, cheaply, and at exactly the right time. When you start getting errors or want to refactor across function call boundaries, it can be frustrating and/or require a quite deep understanding of the workings of the system to figure out how to fix it.

tl;dr: If you consider "has a fairly complex concept that is unfamiliar and necessary to learn with a fair degree of depth" a downside, then I think it has one.

I somewhat agree with your characterization. Lifetimes definitely adds a bit of complexity to the language. And you're right about the benefit; lifetimes are frankly awesome. It's really really cool to write memory safe code without using a GC.

When I first started with Rust (was also my first foray with lifetimes), the compiler completely kicked my ass for at least a few days. I struggled a lot with writing anything beyond a few functions, and especially when those functions were returning borrowed pointers. I think the code I wrote at the point could be fairly characterized as, "the bare minimum that I could convince the compiler to accept."

But as I wrote more code, I got better at it pretty quickly. At this point, I can look at most Rust code and feel pretty good about spotting lifetime errors before consulting the compiler. I'd say it only took me a couple thousand lines of code to get there, which isn't a huge price to pay.

Anyway, this is obviously a personal anecdote. But it's coming from someone who thought Rust was crazy complex only a few months ago. FWIW, it took me about 48 days from knowing absolutely zero Rust (other than random Internet buzz) to writing and getting libregex merged upstream.

We're in violent agreement. (Though you've gone much further with rust more quickly than I - libregex is beautiful.) A couple thousand lines to feel pretty solid with lifetimes sounds about right - I'm getting close to that range and the issues I'm hitting are increasingly of the obscure kind rather than the initial "nothing works, I guess I'll put it all on the heap!" kind.

This is basically why I said if you consider this a downside. I think it's only a "downside" in the same way that purity is a "downside" in Haskell - it isn't accidental complexity, and wrapping your head around it is just a part of really learning the language.

When I experimented with rust, I ran into 6393. ezyang had a good blog post about that and 6268:


By replacing the distance function with:

    let distance (a1 : int array) (a2 : int array) =
      let open Array in
      let len = length a1 in
      let acc = ref 0 in
      for i = 0 to len - 1 do
        let v1 = unsafe_get a1 i in
        let v2 = unsafe_get a2 i in
        let d = v1 - v2 in
        acc := !acc + d * d
the OCaml goes 3 times faster. This is what would be produced if OCaml's inliner had triggered on the original definition of `distance`, so that is probably the main difference in the two language's performance. If you inline some of the other functions by hand (and tidy up some of the sillier parts of the OCaml code) it easily runs 4 times faster than the original.

Indeed using the improved OCaml inliner (http://www.ocamlpro.com/blog/2013/07/11/inlining-progress-re...) on the original OCaml gives a speedup of 2.75.

How much work would it be to rewrite `slurp_file` to return an `Option<..>` result, that is replace all of the .unwrap calls into some kind of short-circuit return?

Ok that's the amazingest answer you could give to my question! Very cool, it doesn't look so bad.

In my mind this style should be the default for error handling when coding (i.e. I'd prefer to push the choice of using .unwrap or not to the caller)

It shouldn't be too hard, but you may have to write the short circuiting `map` by hand with a for loop.

you could use move_iter() instead of iter(), its more idiomatic.

Huh? They serve two different purposes. `move_iter` returns an iterator that consumes values (i.e., ownership transfers) while `iter` consumes borrowed references to values (i.e., no ownership transfer).

The only place in the code, that I can spot, where `move_iter` is even available is on line 46. But I don't see any compelling reason to use move_iter there (plus, `validation_sample.len()` in the final println would have to be moved up and let bound before the call to move_iter).

I think they were trying to say that borrowing is preferred over ownership transfer.

I don't need ownership inside the loop, so using `.iter` is fine. `.iter` is also allows the inner loop to be free from destructors, meaning it will be tighter and (likely) faster.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact