

Practical Cross Platform SIMD Math - rahulroy
http://www.gamedev.net/page/resources/_/technical/general-programming/practical-cross-platform-simd-math-r3068

======
Keyframe
I'm not sure I understand why he returns results from the functions. Wouldn't
it be faster if the result variable were passed by pointer or reference instead?

~~~
exDM69
No, pass by value is a lot faster if the values can be passed in registers,
which is the case with SIMD vector values and inline functions. Passing by
pointer restricts what the compiler can optimize (it has to account for
possible aliasing), and in the worst case you end up pushing values to the
stack just to read them back into registers in the called function.
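A minimal sketch of the two calling styles, assuming x86 SSE intrinsics from `<xmmintrin.h>` and the GCC/Clang `always_inline` attribute. The names `vec4`, `FORCE_INLINE`, `vec4_add` and `vec4_add_ptr` are hypothetical, not taken from the library linked below.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical vector type and force-inline macro (GCC/Clang syntax). */
typedef __m128 vec4;
#define FORCE_INLINE static inline __attribute__((always_inline))

/* By value: operands and result can stay in XMM registers. */
FORCE_INLINE vec4 vec4_add(vec4 a, vec4 b) {
    return _mm_add_ps(a, b);
}

/* By pointer: the values must be addressable in memory, and unless the
   call is inlined the compiler must assume out may alias a or b. */
FORCE_INLINE void vec4_add_ptr(vec4 *out, const vec4 *a, const vec4 *b) {
    *out = _mm_add_ps(*a, *b);
}
```

Both compute the same thing; the difference only shows up in the generated assembly, which is worth comparing with `-O2 -S` on your own compiler.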

I recently spent a good deal of time looking at the assembly code compilers
emit for SIMD code written in C with intrinsics. On GCC and Clang, pass by
value combined with force_inline functions gave the best results (at least
until link-time optimization becomes more mainstream). This was true even for
4x4 matrix structs, not just SIMD vectors.
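A sketch of what a by-value 4x4 matrix might look like, assuming a column-major struct of four SSE columns; `mat4`, `mat4_mul_vec4` and `mat4_mul` are hypothetical names. When such a function is not inlined, a 64-byte struct like this is typically passed through memory under the x86-64 calling convention, which is why the force-inline part of the advice matters.

```c
#include <xmmintrin.h>

#define FORCE_INLINE static inline __attribute__((always_inline))

/* Hypothetical column-major 4x4 matrix: four SSE columns. */
typedef struct { __m128 col[4]; } mat4;

/* Matrix * vector: r = col0*v.x + col1*v.y + col2*v.z + col3*v.w. */
FORCE_INLINE __m128 mat4_mul_vec4(mat4 m, __m128 v) {
    __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)); /* broadcast v.x */
    __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast v.y */
    __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)); /* broadcast v.z */
    __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)); /* broadcast v.w */
    __m128 r = _mm_mul_ps(m.col[0], x);
    r = _mm_add_ps(r, _mm_mul_ps(m.col[1], y));
    r = _mm_add_ps(r, _mm_mul_ps(m.col[2], z));
    r = _mm_add_ps(r, _mm_mul_ps(m.col[3], w));
    return r;
}

/* Matrix * matrix, passed and returned by value: each result column
   is m times the corresponding column of n. */
FORCE_INLINE mat4 mat4_mul(mat4 m, mat4 n) {
    mat4 r;
    for (int i = 0; i < 4; ++i)
        r.col[i] = mat4_mul_vec4(m, n.col[i]);
    return r;
}
```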

The speed doesn't come from making individual functions fast, but from letting
the compiler inline and combine several function calls and keep live values
in registers from one function to the next.
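As an illustration of that point, here is a small composite built from by-value primitives, again assuming SSE and GCC/Clang; all names are hypothetical. After inlining, the intermediate results (`b - a` and the scaled difference) never need to be stored to memory, so the whole `vec4_lerp` call can reduce to a handful of register-to-register instructions.

```c
#include <xmmintrin.h>

#define FORCE_INLINE static inline __attribute__((always_inline))

typedef __m128 vec4;

/* Tiny by-value primitives. */
FORCE_INLINE vec4 vec4_add(vec4 a, vec4 b) { return _mm_add_ps(a, b); }
FORCE_INLINE vec4 vec4_sub(vec4 a, vec4 b) { return _mm_sub_ps(a, b); }
FORCE_INLINE vec4 vec4_scale(vec4 v, float s) { return _mm_mul_ps(v, _mm_set1_ps(s)); }

/* Linear interpolation a + t*(b - a), composed from the primitives above.
   Once everything is inlined, the temporaries can stay in XMM registers. */
FORCE_INLINE vec4 vec4_lerp(vec4 a, vec4 b, float t) {
    return vec4_add(a, vec4_scale(vec4_sub(b, a), t));
}
```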

Here's my SIMD math lib: <https://github.com/rikusalminen/threedee-simd>

~~~
Keyframe
Compiler knows best. Again. Thanks, I wasn't really aware of this. I'll have
to make a habit of reading asm output from my compiler... something I haven't
done in years.

Your math lib looks nifty! Which license would you share it under? Also, why
-std=gnu99 in particular?

~~~
exDM69
> Your math lib looks nifty! Which license would you share it under?

Thanks! zlib license.

> Also, why -std=gnu99 in particular?

Because I use some C99 features, and -std=c99 disables some POSIX/GNU
extensions. I think it was clock_gettime() from time.h, which I was using for
benchmarking.
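For reference, a sketch of the usual alternative to -std=gnu99 when only POSIX timing is needed: define a feature-test macro before any system header so that glibc exposes `clock_gettime()` even under strict -std=c99. The helper name `now_ns` is hypothetical; very old glibc versions (before 2.17) also need `-lrt` at link time.

```c
/* Must come before any #include: under -std=c99, glibc hides POSIX
   declarations such as clock_gettime() unless this is defined. */
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Monotonic time in nanoseconds, suitable for timing benchmark loops. */
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Calling `now_ns()` before and after a loop and subtracting gives the elapsed time; CLOCK_MONOTONIC is preferred over CLOCK_REALTIME because it is not affected by system clock adjustments.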

