
Looking at interpret.cpp for SIMD potential: I bet you could add an allocator for std::vector that aligns and pads every buffer to 32 bytes, then replace the scalar op loops with loops over AVX intrinsics. No need for an external library.
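For illustration, here is a rough sketch of the kind of allocator I mean. It assumes C++17 (for aligned operator new), and the names AlignedAllocator, padded_bytes and aligned_vector are made up for this example, not anything from interpret.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator: hands out storage aligned to Align bytes and
// rounds every request up to a whole multiple of Align bytes.
template <class T, std::size_t Align = 32>
struct AlignedAllocator {
    using value_type = T;
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");
    static_assert(Align >= alignof(T), "Align is too small for T");

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    // Needed explicitly because of the non-type template parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    // Byte count for n elements, rounded up to a multiple of Align.
    static std::size_t padded_bytes(std::size_t n) {
        return (n * sizeof(T) + Align - 1) / Align * Align;
    }

    T* allocate(std::size_t n) {
        if (n > SIZE_MAX / sizeof(T) - Align) throw std::bad_alloc();
        void* p = ::operator new(padded_bytes(n), std::align_val_t(Align));
        assert(reinterpret_cast<std::uintptr_t>(p) % Align == 0);
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Align));
    }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Drop-in replacement for std::vector<T> with 32-byte-aligned data().
template <class T>
using aligned_vector = std::vector<T, AlignedAllocator<T>>;
```

With something like this, aligned_vector<float> v(13) gets 64 bytes of storage starting on a 32-byte boundary, so a loop can process whole 8-float blocks without running past the allocation. The padding bytes past size() are uninitialised, so any results computed from them should be ignored.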



That's a possibility to get something running near term. I'm trying to avoid CPU-specific intrinsics since I have a fantasy that this might be run on ARM in the future, though that may be getting really ahead of myself.


NEON intrinsics are pretty easy as well ;) As long as you are doing simple +-*&| ops they work the same as SSE.
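To show how closely the two line up for a simple op, here is a small sketch of a float add with an SSE path, a NEON path, and a scalar fallback. The function name add_arrays is hypothetical, and the preprocessor checks are an assumption about what typical GCC, Clang and MSVC builds define.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>
  #define EXAMPLE_USE_SSE 1
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  #define EXAMPLE_USE_NEON 1
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
// Uses unaligned loads and stores, so it works on any float buffers.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#if defined(EXAMPLE_USE_SSE)
    // SSE: four floats per 128-bit register.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(EXAMPLE_USE_NEON)
    // NEON: same shape, different spelling.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail, and the whole loop when neither instruction set is available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The two vector loops differ only in type and function names, so hiding them behind a handful of inline wrappers keeps the op loops themselves free of CPU-specific code.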



