
Mostly. ARM actually has instructions that allow a full 64-bit shift then add.

But I checked the Cortex-A78 optimisation manual: those instructions take 1 cycle if the shift is 4 or less, and 2 cycles otherwise.
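For anyone who wants to look at this themselves, here is a minimal C sketch of the two cases. The function names are made up for illustration. An AArch64 compiler will typically fold the shift into a single `add` with a shifted operand, but that is an assumption about codegen, not something the C guarantees, so check the disassembly on your own toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: shift amount of 4, which falls in the
 * "4 or less" case quoted above. Likely lowered on AArch64 to
 * something like: add x0, x0, x1, lsl #4 (assumption). */
uint64_t add_shifted4(uint64_t base, uint64_t index) {
    return base + (index << 4);
}

/* Hypothetical helper: shift amount of 12, which falls in the
 * "other cases" bucket. Same instruction form, larger shift. */
uint64_t add_shifted12(uint64_t base, uint64_t index) {
    return base + (index << 12);
}
```

Both functions compute the same kind of expression; only the shift amount differs, and that is what the quoted latency numbers depend on.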



0-4 shift not 0-3? That is a little bit weird.


Arm64 has fast 128-bit loads. Not just with NEON, but with regular integer instructions, you can quickly load 128 bits into a pair of 64-bit registers.

So it kind of makes sense to support a fast shift by four. Though it's more likely they just profiled a bunch of code and decided a fast shift by four was worth budgeting for.
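A rough sketch of the pattern I have in mind, in C. The `pair128` type and `sum_pair` function are hypothetical names. Indexing an array of 16-byte elements means scaling the index by 16, i.e. a shift by four, and the two 64-bit fields can then be fetched together. On Arm64 a compiler may turn this into a shifted add followed by a load-pair, but that is my assumption about typical output, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit element made of two 64-bit halves (16 bytes). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair128;

/* Element address is table + i * 16, i.e. table + (i << 4).
 * Both halves sit next to each other in memory, so they are a
 * candidate for a single paired load into two 64-bit registers. */
uint64_t sum_pair(const pair128 *table, size_t i) {
    const pair128 *p = &table[i];
    return p->lo + p->hi;
}
```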



