Hacker News
Practical Cross Platform SIMD Math (gamedev.net)
34 points by rahulroy 1426 days ago | 5 comments



I'm not sure I understand why he is returning results by value from the functions. Wouldn't it be faster if the result variable were passed by pointer or reference instead?


No, pass by value is a lot faster if the values being passed can be passed in registers, which is the case with SIMD vector values and inline functions. Passing by pointer adds restrictions to what the compiler can optimize and in the worst case you actually end up pushing values to the stack just to read them back to registers in the called function.
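A minimal sketch of the two calling styles being compared, in case it helps. Assumptions: it uses the GCC/Clang vector extension instead of platform intrinsics so it compiles on any target, and the names vec4, vec4_add, vec4_scale, vec4_add_ptr and vec4_madd are made up for illustration (they are not taken from the article or from my library).

```c
/* Hypothetical 4-wide float vector built on the GCC/Clang vector extension. */
typedef float vec4 __attribute__((vector_size(16)));

/* By value: arguments and the result can live in SIMD registers. */
static inline vec4 vec4_add(vec4 a, vec4 b)
{
    return a + b;
}

static inline vec4 vec4_scale(vec4 v, float s)
{
    vec4 k = { s, s, s, s };   /* broadcast the scalar to all four lanes */
    return v * k;
}

/* By pointer: 'out' may alias 'a' or 'b', and if the call is not inlined
   the operands have to be in memory, so values may round-trip via the stack. */
static inline void vec4_add_ptr(vec4 *out, const vec4 *a, const vec4 *b)
{
    *out = *a + *b;
}

/* Chaining by-value calls: once inlined, the intermediate result of
   vec4_scale never needs an address, so it can stay in a register. */
static inline vec4 vec4_madd(vec4 a, vec4 b, float s)
{
    return vec4_add(a, vec4_scale(b, s));
}
```

Compiling both variants with -O2 -S and comparing the assembly is the easiest way to check what your own compiler does with them.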

I recently spent a good deal of time looking at assembly code emitted by compilers when doing SIMD code with C and intrinsics. On GCC and Clang, pass by value combined with force_inline functions gave the best results (at least until link time optimization becomes more mainstream). This was even the case with 4x4 matrix structs, not just SIMD vectors.

The speed comes not from making individual functions fast, but from letting the compiler inline and combine several function calls and keep live values in registers from one function to the next.
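Roughly what that looks like with a 4x4 matrix struct passed by value. This is only a sketch under assumptions: the FORCE_INLINE macro, the column-major mat4 layout and all function names are hypothetical, and it again uses the GCC/Clang vector extension rather than intrinsics.

```c
typedef float vec4 __attribute__((vector_size(16)));

/* Hypothetical column-major 4x4 matrix: four SIMD columns. */
typedef struct { vec4 col[4]; } mat4;

/* GCC/Clang attribute that asks for inlining even when the heuristics decline. */
#define FORCE_INLINE static inline __attribute__((always_inline))

FORCE_INLINE vec4 vec4_splat(float s)
{
    vec4 v = { s, s, s, s };
    return v;
}

/* Matrix times vector: a linear combination of the columns. */
FORCE_INLINE vec4 mat4_mul_vec4(mat4 m, vec4 v)
{
    return m.col[0] * vec4_splat(v[0])
         + m.col[1] * vec4_splat(v[1])
         + m.col[2] * vec4_splat(v[2])
         + m.col[3] * vec4_splat(v[3]);
}

/* Matrix times matrix: transform each column of b by a. */
FORCE_INLINE mat4 mat4_mul(mat4 a, mat4 b)
{
    mat4 r;
    for (int i = 0; i < 4; ++i)
        r.col[i] = mat4_mul_vec4(a, b.col[i]);
    return r;
}

FORCE_INLINE mat4 mat4_scaling(float x, float y, float z)
{
    mat4 m = {{
        { x,    0.0f, 0.0f, 0.0f },
        { 0.0f, y,    0.0f, 0.0f },
        { 0.0f, 0.0f, z,    0.0f },
        { 0.0f, 0.0f, 0.0f, 1.0f },
    }};
    return m;
}
```

Because every function takes and returns values and is force-inlined, an expression such as mat4_mul_vec4(mat4_mul(a, b), v) gives the optimizer one flat block of vector arithmetic to schedule, with no pointers to reason about.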

Here's my SIMD math lib: https://github.com/rikusalminen/threedee-simd


Compiler knows best. Again. Thanks, I wasn't really aware of this. I'll have to make a habit of reading asm output from my compiler... something I haven't done in years.

Your math lib looks nifty! Which license would you share it under? Also, why in particular std=gnu99?


> Your math lib looks nifty! Which license would you share it under?

Thanks! zlib license.

> Also, why in particular std=gnu99?

Because I use some C99 features, and -std=c99 disables some POSIX/GNU extensions. I think it was clock_gettime from time.h, which I was using for benchmarking.
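For reference, a sketch of that kind of timing code. On glibc, -std=c99 hides the POSIX declarations in time.h unless a feature-test macro is defined, so the snippet defines one when nothing else has; with -std=gnu99 the macro is not needed. The helper names elapsed_seconds and time_loop are hypothetical, not taken from the library.

```c
/* Needed with -std=c99; harmless with -std=gnu99. Must precede any #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Difference between two timestamps, in seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Time a dummy loop; 'volatile' keeps the compiler from deleting the work. */
static double time_loop(unsigned iterations)
{
    struct timespec t0, t1;
    volatile float acc = 0.0f;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iterations; ++i)
        acc += 1.0f;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_seconds(t0, t1);
}
```

On older glibc versions you may also need to link with -lrt for clock_gettime.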


Return value optimization (RVO) is a widely implemented technique, permitted by the standard, that makes most returns by value equivalent to passing the result by reference.



