Some years ago I submitted an n-body implementation that used the crunchy crate to unroll the inner loop. This was rejected as being non-idiomatic and obscuring what compilers are/aren't capable of. Rust is currently leading the benchmark because someone added the flag -C llvm-args='-unroll-threshold=500' which achieves the same effect. Why one of these is acceptable and the other isn't is beyond me, and all of this makes the whole project very discouraging.
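For context, the crunchy approach looks roughly like this (a sketch, not my actual submission; the function and array names are made up):

```rust
use crunchy::unroll;

// Hypothetical inner loop: crunchy's unroll! macro expands the body once per
// iteration at compile time, so the optimizer sees straight-line code instead
// of a loop it may or may not decide to unroll on its own.
fn dot4(xs: &[f64; 4], ys: &[f64; 4]) -> f64 {
    let mut acc = 0.0;
    unroll! {
        for i in 0..4 {
            acc += xs[i] * ys[i];
        }
    }
    acc
}
```

The `-C llvm-args='-unroll-threshold=500'` flag just raises LLVM's unrolling threshold so the backend produces much the same straight-line code without the macro.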
It's not entirely surprising that a carefully-optimized C program using explicit SSE intrinsics, plus a fancy trick involving a low-precision square root instruction fixed up with two iterations of Newton's method, would be fast. :-)
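For reference, that trick looks something like this (a rough sketch, not the benchmark's actual code): the low-precision rsqrtps instruction gives an estimate of 1/sqrt(x) good to about 12 bits, and two Newton-Raphson steps in double precision sharpen it up.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Sketch only: start from the ~12-bit rsqrtps estimate, then apply two
// Newton-Raphson iterations, y = y * (1.5 - 0.5 * x * y * y), in double
// precision to recover nearly full accuracy for the distance calculation.
#[cfg(target_arch = "x86_64")]
unsafe fn approx_rsqrt(x: __m128d) -> __m128d {
    let mut y = _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(x))); // rough estimate
    let half_x = _mm_mul_pd(_mm_set1_pd(0.5), x);
    let three_halves = _mm_set1_pd(1.5);
    for _ in 0..2 {
        let y2 = _mm_mul_pd(y, y);
        y = _mm_mul_pd(y, _mm_sub_pd(three_halves, _mm_mul_pd(half_x, y2)));
    }
    y
}
```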
What impresses me is that the Rust version didn't do any of that stuff, just wrote very boring, straightforward code -- and got the same speed anyway. Some impressive compilation there!
Someone wrote a six-part blog post [0] about porting that n-body benchmark from C to Rust. They went from a straight line-by-line port in unsafe Rust, using the same SSE-based design, to clean Rust with no unsafe and no SSE that was faster than the original C code with its hand-optimized SSE.
It is a great example of how the Rust compiler can auto-vectorize code.
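The kind of code involved is roughly like this (a toy illustration under my own assumptions, not the blog's actual code): plain loops over fixed-size f64 arrays that LLVM will typically turn into packed SIMD loads, multiplies, and adds without any intrinsics in the source.

```rust
const N: usize = 5;

// Structure-of-arrays layout: each coordinate gets its own contiguous array,
// which is the friendliest shape for the auto-vectorizer.
struct Bodies {
    x: [f64; N], y: [f64; N], z: [f64; N],
    vx: [f64; N], vy: [f64; N], vz: [f64; N],
}

// A boring, straightforward loop; with optimizations on, this typically
// compiles to vector instructions with no unsafe and no intrinsics.
fn advance_positions(b: &mut Bodies, dt: f64) {
    for i in 0..N {
        b.x[i] += dt * b.vx[i];
        b.y[i] += dt * b.vy[i];
        b.z[i] += dt * b.vz[i];
    }
}
```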
I would be shocked if those two programs performed the same. C and Rust are using the same compiler backend, and better aliasing information probably isn't going to make that big of a difference.
Are you sure the Rust performance data isn't for one of the other implementations that use the same crufty tricks as the C version?
It also looks like this version is using an algorithm that none of the others use: it precomputes the distance pairs for all bodies. Some of the others precompute the vectors between the pairs, but that's not the expensive part of computing the distance.
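If I'm reading it right, the precompute step is shaped something like this (my own sketch of the idea, not the submitted code): with N bodies there are N*(N-1)/2 pairs, and the reciprocal distances can all be computed in one pass before the force update touches any velocities.

```rust
const N: usize = 5;
const PAIRS: usize = N * (N - 1) / 2;

// Compute 1/|r_i - r_j| for every unordered pair of bodies up front,
// so the force-accumulation loop only does multiplies and adds.
fn pairwise_inv_dist(pos: &[[f64; 3]; N]) -> [f64; PAIRS] {
    let mut out = [0.0; PAIRS];
    let mut k = 0;
    for i in 0..N {
        for j in (i + 1)..N {
            let dx = pos[i][0] - pos[j][0];
            let dy = pos[i][1] - pos[j][1];
            let dz = pos[i][2] - pos[j][2];
            out[k] = 1.0 / (dx * dx + dy * dy + dz * dz).sqrt();
            k += 1;
        }
    }
    out
}
```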
As an aside: this made me notice this n-body simulation only has 5 bodies! This is a pretty strange case that makes these O(n^2) optimizations practical. The n-body simulations I've been familiar with in the past had thousands of bodies, where this approach probably isn't a good idea.
But C and C++ compilers can also autovectorize quite well. I had some SSE and AVX algorithms where the compiler ended up doing the same or a slightly better job in C++, for instance.
So maybe it's just the C benchmark being a cargo cult.