
Learning SIMD with Rust by Finding Planets (2018) - btashton
https://medium.com/@Razican/learning-simd-with-rust-by-finding-planets-b85ccfb724c3
======
gameswithgo
You may enjoy my video tutorial on SIMD Intrinsics as well:

[https://www.youtube.com/watch?v=4Gs_CA_vm3o](https://www.youtube.com/watch?v=4Gs_CA_vm3o)

I also use Rust, but it's perfectly fine for learning about intrinsics in
C/C++ or .NET as well. I cover some of the fundamental strategies for using
them well: how to lay out data in memory, how to deal with branches, etc.
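One of those data-layout strategies can be sketched in a few lines (names here are illustrative, not from the video): SIMD loads want the same field from consecutive elements to be contiguous, so a struct-of-arrays layout usually vectorizes better than array-of-structs.

```rust
// Array-of-structs: x and y interleaved, so consecutive x values are
// 16 bytes apart and can't be grabbed with a single wide load.
#[derive(Clone, Copy)]
struct BodyAoS { x: f64, y: f64 }

// Struct-of-arrays: all x values contiguous; one 256-bit load gets four.
struct BodiesSoA { xs: Vec<f64>, ys: Vec<f64> }

fn to_soa(bodies: &[BodyAoS]) -> BodiesSoA {
    BodiesSoA {
        xs: bodies.iter().map(|b| b.x).collect(),
        ys: bodies.iter().map(|b| b.y).collect(),
    }
}

fn main() {
    let aos = [BodyAoS { x: 1.0, y: 2.0 }, BodyAoS { x: 3.0, y: 4.0 }];
    let soa = to_soa(&aos);
    assert_eq!(soa.xs, vec![1.0, 3.0]);
    assert_eq!(soa.ys, vec![2.0, 4.0]);
}
```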

------
pixelpoet
> After running benchmarks with all the variants and planets, the improvement
> is about 9% to 12%.

Pretty weak speedup, maybe a straight up n-body implementation would see
closer to the 8x theoretical speedup.

~~~
Fronzie
> but Rust does not provide the Intel _mm256_cos_pd() instruction yet.

That might be part of the reason. Even with experience it's really hard to
optimize code without detailed profiling: either with a profiler that shows
clock ticks per instruction, or by making very small changes to your code and
keeping a log of the total running time after each change.

~~~
tom_mellior
> > but Rust does not provide the Intel _mm256_cos_pd() instruction yet.

> That might be part of the reason.

Yes, a cosine calculation should dominate all the rest of the computation.
Grepping through
[https://www.agner.org/optimize/instruction_tables.pdf](https://www.agner.org/optimize/instruction_tables.pdf),
the latency of FCOS is listed as at least 10x the latency of a floating-point
add or multiply across pretty much all microarchitectures.

I'm also unsure about re-packing the results of the cosine just to allow a
single multiply, the results of which are then unpacked again. It might be
faster to just do that multiply in scalar code, though that's exactly the
thing that would need to be measured.
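A hedged sketch of the scalar alternative being suggested (the names `radius` and `angles` are made up, not from the article): since the cosines are computed in scalar code anyway, the follow-up multiply can stay scalar too, skipping the pack/unpack round trip.

```rust
// Scalar cos + scalar multiply per lane, instead of packing the four
// cosine results into a __m256d just for one vector multiply.
fn scaled_cosines(radius: f64, angles: &[f64; 4]) -> [f64; 4] {
    let mut out = [0.0f64; 4];
    for i in 0..4 {
        out[i] = radius * angles[i].cos();
    }
    out
}

fn main() {
    // cos(0) == 1.0, so every lane comes out as the radius.
    assert_eq!(scaled_cosines(2.0, &[0.0; 4]), [2.0, 2.0, 2.0, 2.0]);
}
```

Whether this beats the pack/multiply/unpack version is, as the comment says, exactly the thing that would need to be measured.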

------
qiqitori
> AVX functions start with _mm256_

I don't know anything about Rust, but a nicer word is probably "intrinsics".
They usually compile to a single instruction.
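As a sketch of that point (this example is mine, not from the article): each `_mm256_*` intrinsic maps to roughly one AVX instruction. The example adds four `f64` lanes at once, with runtime feature detection and a scalar fallback.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

fn add4(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safe because we just checked the CPU supports AVX.
            unsafe {
                let va = _mm256_loadu_pd(a.as_ptr());    // vmovupd
                let vb = _mm256_loadu_pd(b.as_ptr());    // vmovupd
                let sum = _mm256_add_pd(va, vb);         // vaddpd
                let mut out = [0.0f64; 4];
                _mm256_storeu_pd(out.as_mut_ptr(), sum); // vmovupd
                return out;
            }
        }
    }
    // Scalar fallback for CPUs/targets without AVX.
    let mut out = [0.0f64; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    let s = add4(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0]);
    assert_eq!(s, [11.0, 22.0, 33.0, 44.0]);
}
```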

~~~
maeln
It's because they just use the name of the actual AVX ops
([https://software.intel.com/sites/landingpage/IntrinsicsGuide...](https://software.intel.com/sites/landingpage/IntrinsicsGuide/)).

This is a low-level lib. They don't want to hide anything. If you see the
_mm256_* prefix you know you are using AVX and which version (which is
important for knowing which CPUs are supported).

High-level libs do use more natural names.

~~~
xiphias2
The commenter was pointing out that intrinsic functions shouldn't just be
called functions (I have no strong opinion on that comment). He wasn't
commenting about the names of the functions themselves.

~~~
maeln
Ah ok, I didn't understand.

------
krapht
This looks kinda gross to me. Do the Rust developers not want to emulate what
ispc and CUDA do? Writing intrinsics by hand is not what I expect from a 2019
language.

~~~
psv1
To be honest, everything in Rust looks a bit ugly to me. I really tried to
like the language but the syntax, everything being overly annotated, and the
number of features that you need to understand to do simple tasks - all of
these make it not really worth it to pick Rust for new projects. There are
other problems like the ecosystem of crates and the lack of learning resources
but at least they aren't intrinsic to the language itself.

~~~
ekidd
> To be honest, everything in Rust looks a bit ugly to me.

I write a lot of Rust code at work, and I admit that it can sometimes be
pretty noisy. There are several major contributors to this:

1. Rust offers fine-grained control over pass-by-value, pass-by-reference,
and pass-by-mutable reference. This is great for performance. But it also adds
a lot of "&" and "&mut" and "x.to_owned()" clutter everywhere.

2. Rust provides support for generics (aka parameterized types). Once again,
this is great for performance, and it also allows better compile-time error
detection. But again, you wind up adding a lot of "<T>" and "where T:" clutter
everywhere.

3. Usually, Rust can automatically infer lifetimes. But every once in a
while, you want to do something messy, and you end up needing to write out the
lifetimes manually. This is when you end up seeing weird things like "'a". But
in my experience, this is pretty rare unless I'm doing something hairy. And if
I'm doing something hairy, I'm just as happy to have more explicit
documentation in the source code.
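Toy illustrations of those three noise sources (my examples, not the commenter's):

```rust
// 1. Explicit borrowing: `&` for shared access, `&mut` for exclusive.
fn total(v: &[i32]) -> i32 { v.iter().sum() }
fn push_twice(v: &mut Vec<i32>, x: i32) { v.push(x); v.push(x); }

// 2. Generics with a `where` bound, checked at compile time.
fn largest<T>(items: &[T]) -> Option<&T>
where
    T: PartialOrd,
{
    items.iter().fold(None, |best, x| match best {
        Some(b) if b >= x => Some(b),
        _ => Some(x),
    })
}

// 3. An explicit lifetime 'a tying the returned slice to the input.
//    (Inference would actually handle this simple case; the annotation
//    shows the syntax you see in hairier ones.)
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let mut v = vec![1, 2];
    push_twice(&mut v, 3);
    assert_eq!(total(&v), 9);
    assert_eq!(largest(&v), Some(&3));
    assert_eq!(first_word("hello world"), "hello");
}
```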

Really, the underlying problem here is that (a) Rust fills the same high-
control, high-performance niche as C++, but (b) Rust prefers explicit control
where C++ sometimes offers magic, invisible conversions. (Yes, I declare all
my C++ constructors "explicit" and avoid conversion operators.)

Syntax is a hard problem, and I've struggled to get syntax right for even tiny
languages. But syntax for languages with low-level control is an even harder
problem. At some point, you just need to make a decision and get used to it.

In practice, I really enjoy writing Rust. It's definitely not as simple as
Ruby, Python or Go. But it fills a very different ecological niche, with
finer-grained control over memory representations, and support for generics.

