Tuna-Fish 12 days ago \| on: Shared_ptr<T>: the (not always) atomic reference c...

On modern CPUs atomic adds are now reasonably fast, but only when they are uncontended. If the cache line holding the value has to bounce between CPUs, that usually adds around 100 ns (nanoseconds, not cycles). Writing performant parallel code always means minimizing communication between threads.

Sure, but even the uncontended case is ~10x slower than a regular ADD.
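
For anyone who wants to see the contended vs. uncontended gap on their own machine, here is a rough sketch rather than a rigorous benchmark. The names `PaddedCounter`, `Result` and `run_adds` are made up for illustration, the 64-byte cache line size is an assumption (true for most x86 parts, not universal), and it assumes C++17 so that the over-aligned vector allocation is honored. The timing includes thread startup, so use a large iteration count.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: one counter per cache line, so per-thread counters
// do not share a line. 64 bytes is an assumed cache line size.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

struct Result {
    std::uint64_t total;           // sum of all counters after the run
    double ns_per_add_per_thread;  // wall time / adds done by each thread
};

// shared == true:  every thread does fetch_add on the same counter, so the
//                  cache line bounces between cores (contended case).
// shared == false: each thread increments its own padded counter
//                  (uncontended case).
Result run_adds(unsigned threads, std::uint64_t adds_per_thread, bool shared) {
    std::vector<PaddedCounter> counters(shared ? 1u : threads);
    std::vector<std::thread> workers;

    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        PaddedCounter& c = counters[shared ? 0 : t];
        workers.emplace_back([&c, adds_per_thread] {
            for (std::uint64_t i = 0; i < adds_per_thread; ++i) {
                // Relaxed ordering: we only need atomicity of the add itself.
                c.value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    auto stop = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (auto& c : counters) total += c.value.load(std::memory_order_relaxed);

    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {total, ns / static_cast<double>(adds_per_thread)};
}
```

Calling `run_adds(4, 50'000'000, true)` and `run_adds(4, 50'000'000, false)` and comparing `ns_per_add_per_thread` should show the difference. The absolute numbers depend heavily on the CPU and on which cores the threads land on, so treat them as indicative only.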