Some years ago I submitted an n-body implementation that used the crunchy crate to unroll the inner loop. This was rejected as being non-idiomatic and obscuring what compilers are/aren't capable of. Rust is currently leading the benchmark because someone added the flag -C llvm-args='-unroll-threshold=500' which achieves the same effect. Why one of these is acceptable and the other isn't is beyond me, and all of this makes the whole project very discouraging.
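For context, the crunchy approach looks roughly like this (a sketch, not my actual submission; the function and array names are made up):

```rust
use crunchy::unroll;

// Hypothetical inner loop: crunchy's unroll! macro expands the body once per
// iteration at compile time, so the optimizer sees straight-line code instead
// of a loop it may or may not decide to unroll on its own.
fn dot4(xs: &[f64; 4], ys: &[f64; 4]) -> f64 {
    let mut acc = 0.0;
    unroll! {
        for i in 0..4 {
            acc += xs[i] * ys[i];
        }
    }
    acc
}
```

The `-C llvm-args='-unroll-threshold=500'` flag just raises LLVM's unrolling threshold so the backend produces much the same straight-line code without the macro.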
It's not entirely surprising that a carefully-optimized C program using explicit SSE intrinsics, plus a fancy trick involving a low-precision square root instruction fixed up with two iterations of Newton's method, would be fast. :-)
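For reference, that trick looks something like this (a rough sketch, not the benchmark's actual code): the low-precision rsqrtps instruction gives an estimate of 1/sqrt(x) good to about 12 bits, and two Newton-Raphson steps in double precision sharpen it up.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Sketch only: start from the ~12-bit rsqrtps estimate, then apply two
// Newton-Raphson iterations, y = y * (1.5 - 0.5 * x * y * y), in double
// precision to recover nearly full accuracy for the distance calculation.
#[cfg(target_arch = "x86_64")]
unsafe fn approx_rsqrt(x: __m128d) -> __m128d {
    let mut y = _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(x))); // rough estimate
    let half_x = _mm_mul_pd(_mm_set1_pd(0.5), x);
    let three_halves = _mm_set1_pd(1.5);
    for _ in 0..2 {
        let y2 = _mm_mul_pd(y, y);
        y = _mm_mul_pd(y, _mm_sub_pd(three_halves, _mm_mul_pd(half_x, y2)));
    }
    y
}
```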
What impresses me is that the Rust version didn't do any of that stuff, just wrote very boring, straightforward code -- and got the same speed anyway. Some impressive compilation there!
Someone wrote a six-part blog post [0] about porting that n-body benchmark from C to Rust. They went from a straight line-by-line port in unsafe Rust, using the same SSE-based design, to clean Rust with no unsafe and no SSE that was faster than the original C code with its hand-optimized SSE.
It is a great example of how the Rust compiler can auto-vectorize code.
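The kind of code involved is roughly like this (a toy illustration under my own assumptions, not the blog's actual code): plain loops over fixed-size f64 arrays that LLVM will typically turn into packed SIMD loads, multiplies, and adds without any intrinsics in the source.

```rust
const N: usize = 5;

// Structure-of-arrays layout: each coordinate gets its own contiguous array,
// which is the friendliest shape for the auto-vectorizer.
struct Bodies {
    x: [f64; N], y: [f64; N], z: [f64; N],
    vx: [f64; N], vy: [f64; N], vz: [f64; N],
}

// A boring, straightforward loop; with optimizations on, this typically
// compiles to vector instructions with no unsafe and no intrinsics.
fn advance_positions(b: &mut Bodies, dt: f64) {
    for i in 0..N {
        b.x[i] += dt * b.vx[i];
        b.y[i] += dt * b.vy[i];
        b.z[i] += dt * b.vz[i];
    }
}
```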
I would be shocked if those two programs performed the same. C and Rust are using the same compiler backend, and better aliasing information probably isn't going to make that big of a difference.
Are you sure the Rust performance data isn't for one of the other implementations that use the same crufty tricks as the C version?
It also looks like this version is using an algorithm that none of the others use: it precomputes the distance pairs for all bodies. Some of the others precompute the vectors between the pairs, but that's not the expensive part of computing the distance.
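If I'm reading it right, the precompute step is shaped something like this (my own sketch of the idea, not the submitted code): with N bodies there are N*(N-1)/2 pairs, and the reciprocal distances can all be computed in one pass before the force update touches any velocities.

```rust
const N: usize = 5;
const PAIRS: usize = N * (N - 1) / 2;

// Compute 1/|r_i - r_j| for every unordered pair of bodies up front,
// so the force-accumulation loop only does multiplies and adds.
fn pairwise_inv_dist(pos: &[[f64; 3]; N]) -> [f64; PAIRS] {
    let mut out = [0.0; PAIRS];
    let mut k = 0;
    for i in 0..N {
        for j in (i + 1)..N {
            let dx = pos[i][0] - pos[j][0];
            let dy = pos[i][1] - pos[j][1];
            let dz = pos[i][2] - pos[j][2];
            out[k] = 1.0 / (dx * dx + dy * dy + dz * dz).sqrt();
            k += 1;
        }
    }
    out
}
```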
As an aside: this made me notice this n-body simulation only has 5 bodies! This is a pretty strange case that makes these O(n^2) optimizations practical. The n-body simulations I've been familiar with in the past had thousands of bodies, where this approach probably isn't a good idea.
But C and C++ compilers can also autovectorize quite well. I had some SSE and AVX algorithms where the compiler ended up doing the same or a slightly better job in C++, for instance.
So maybe it's just the C benchmark being a cargo cult.