
nth_set_bit_u64: wouldn't that be __builtin_ctzll(_pdep_u64(1ULL << n, v)) with BMI2?



That's my guess as well.

Bitstring rank/select is a well-known problem, and the BMI and non-BMI (Hacker's Delight) versions are available as a reference.
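
To make the pdep-based select from the top comment concrete, here is a minimal sketch in Rust. It models `pdep` in portable code rather than calling the real instruction, so it runs anywhere. The names `pdep_reference` and `nth_set_bit_reference` are made up for illustration and come from neither the article nor any library.

```rust
/// Portable reference model of the BMI2 `pdep` instruction: scatter the low
/// bits of `src` into the positions of the set bits of `mask`, lowest first.
/// (Hypothetical helper, written only to illustrate the semantics.)
fn pdep_reference(src: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut k = 0u32; // index of the next source bit to deposit
    while mask != 0 {
        // Isolate the lowest set bit still present in the mask.
        let lowest = mask & mask.wrapping_neg();
        if (src >> k) & 1 == 1 {
            result |= lowest;
        }
        mask &= mask - 1; // clear that bit and move to the next one
        k += 1;
    }
    result
}

/// Index of the n-th (0-based) set bit of `v`, or `None` if `v` has fewer
/// than n + 1 set bits.
fn nth_set_bit_reference(v: u64, n: u32) -> Option<u32> {
    if n >= v.count_ones() {
        return None;
    }
    // A source with only bit n set lands on the n-th set bit of `v`;
    // counting trailing zeros then recovers that bit's index.
    Some(pdep_reference(1u64 << n, v).trailing_zeros())
}
```

With v = 0b1011_0100 (bits 2, 4, 5 and 7 set), n = 2 deposits the single source bit at position 5, so the function returns Some(5).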


That's assuming you're OK with your program not running on some older CPUs.


That, and that you're not willing to entertain splitting out the manual version as a #[cfg(not(target_feature = "bmi2"))] fallback implementation. For something already down to ~1 ns, both of those may well be very reasonable assumptions, of course.
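
A rough sketch of what that compile-time split could look like, assuming a plain function signature of my own choosing (the article's actual `nth_set_bit_u64` may differ). The fallback here is a simple clear-the-lowest-bit loop, not the article's manual version. The BMI2 path is only compiled when the feature is enabled statically, for example with `-C target-feature=+bmi2` or `-C target-cpu=native`.

```rust
/// Index of the n-th (0-based) set bit of `v`.
/// Assumes n < 64; returns 64 if `v` has fewer than n + 1 set bits.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn nth_set_bit_u64(v: u64, n: u32) -> u32 {
    use core::arch::x86_64::_pdep_u64;
    // SAFETY: this function is only compiled when BMI2 is statically
    // enabled, so the instruction is available on the target CPU.
    // (Newer toolchains may not require the `unsafe` block here.)
    let deposited = unsafe { _pdep_u64(1u64 << n, v) };
    deposited.trailing_zeros()
}

/// Portable fallback, compiled whenever BMI2 is not statically enabled.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
pub fn nth_set_bit_u64(mut v: u64, n: u32) -> u32 {
    for _ in 0..n {
        v &= v.wrapping_sub(1); // clear the lowest set bit
    }
    v.trailing_zeros()
}
```

Both versions agree on every input with n < 64, so callers do not need to know which one was compiled in. The trade-off is that a binary built with BMI2 enabled will still fault on CPUs without it; avoiding that needs runtime detection rather than `cfg`.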


AMD machines prior to Zen 3 had a microcoded implementation of pdep and pext, so those instructions are actually relatively expensive on the earlier Zen machines (as well as Bulldozer). Some people still have Ryzen 3000 series chips.

On the Intel side, pdep has been fast since its release with Haswell in 2013, so pretty much everyone using Intel should be fine in this regard.


I believe the equivalent ARM64 instructions are in SVE2, which isn’t yet supported on Apple’s M-series chips as of M4, sadly.



