
On modern CPUs, atomic adds are now reasonably fast, but only when they are uncontended. If the cache line holding the value has to bounce between CPUs, that usually adds about 100 ns (nanoseconds, not cycles) per operation.

Writing performant parallel code always means absolutely minimizing communication between threads.
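
To make that concrete, here is a minimal C++ sketch of the two extremes. The function names (count_shared, count_local) are made up for illustration and don't come from any library. The first version has every thread hammer a single atomic, so the cache line bounces between cores on nearly every increment. The second accumulates in a thread-local variable and touches the shared atomic once per thread.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Contended version: every increment is an atomic read-modify-write
// on the same cache line, shared by all threads.
std::uint64_t count_shared(unsigned num_threads, std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters_per_thread] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Low-communication version: each thread counts in a private local
// variable and publishes its result with a single atomic add.
std::uint64_t count_local(unsigned num_threads, std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters_per_thread] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                ++local;  // plain add, no cross-thread traffic
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Both return num_threads * iters_per_thread. The difference is only in how often threads talk to each other: once per increment versus once per thread. The trivial loop in count_local will likely be folded away by an optimizing compiler, so treat this as an illustration of the pattern, not a benchmark.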





Sure, but even the uncontended case is ~10x slower than regular ADD.


