
What are some applications that use "relaxed memory concurrency"? Which ones really benefit from the performance increase?

How does it compare with using lock-free / immutable data structures? IMO that strategy is less error-prone and easier to test. I've written multi-threaded C++, but mostly before C++11.




In general, relaxed memory models are useful because you do not need to fetch a cache line from another core (slow!) every time you want to perform a read, or possibly even a write (although that could introduce a data race). In the case of a lock, where you want to guarantee that some state looks the same to all cores, you need strong memory-ordering guarantees.

For other applications, where it only matters that writes eventually propagate, a relaxed memory model is sufficient. Think of adding elements to a vector (which could be serialized in multiple orders anyway), or of performing computation on data that is implicitly shared between cores but not actually read by another core during the operation. Relaxed ordering is also faster and more power efficient, since high-speed core interconnects are a major power drain compared to, let's say, the ALU in a modern CPU.
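A common example of that "only needs to propagate eventually" pattern is a shared statistics counter. Here's a minimal sketch (the names `g_events` and `run_counter_demo` are mine, not from any library): each thread increments with `std::memory_order_relaxed` because no other data is published alongside the value, so atomicity is all we need.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Shared event counter. Relaxed ordering is enough: we only need the
// increments to be atomic, not ordered relative to other memory.
std::atomic<long> g_events{0};

void record_event() {
    g_events.fetch_add(1, std::memory_order_relaxed);
}

long run_counter_demo(int threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) record_event();
        });
    for (auto& w : workers) w.join();
    // join() synchronizes with each thread's completion, so this read
    // is guaranteed to observe every increment.
    return g_events.load(std::memory_order_relaxed);
}
```

No increment is ever lost, but a reader polling mid-run may see a stale total for a while; that's the trade the relaxed order buys performance with.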

Lock-free structures are useful, but as your data structure gets more complicated and the updates involve more operations, writing a lock-free version of an operation becomes very non-trivial. It's easier to simply lock and perform an 'atomic' update. Immutable structures often carry performance overheads, which is undesirable in the C++ STL.
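To make the "just lock and do an 'atomic' update" point concrete, here's a sketch (the `Stats` type is a made-up example): an update that touches two fields at once can't be done with a single CAS, so a lock is the pragmatic choice.

```cpp
#include <mutex>
#include <cstdint>

// Two fields must change together. A single compare-and-swap cannot
// cover both, so a lock-free version would need a more elaborate
// scheme (e.g. packing both into one word, or versioned pointers).
struct Stats {
    std::mutex m;
    std::int64_t sum = 0;
    std::int64_t count = 0;

    void add(std::int64_t v) {
        std::lock_guard<std::mutex> lock(m);
        sum += v;   // both fields update under the same lock...
        ++count;    // ...so readers never see a half-applied update
    }

    double mean() {
        std::lock_guard<std::mutex> lock(m);
        return count ? double(sum) / double(count) : 0.0;
    }
};
```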


Minor nitpick: there is no such thing as fetching a cache line from another core, at least on the Intel x64 architecture. All the synchronizing between cores happens at the L3 cache layer and QPI, because the caches are inclusive and L3 is common to all cores. When a core writes to L3, it invalidates the same cache line in the L1 and L2 caches of the other cores. Therefore if your cores write to the same cache line, you have a big problem regardless of the chosen memory-order model.


Relaxed is useful when you need atomicity but not consistency between cores. x86 guarantees that reads/writes are atomic, i.e. no torn reads/writes, but I don't think all architectures have this guarantee.

Also, there may be times in your program when you know your thread has exclusive access to a variable during a certain segment. During that segment it may be beneficial to do relaxed reads/writes and then complete the critical section with a release operation.

This is used heavily in single producer single consumer queues.
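Here's a minimal sketch of that SPSC pattern (a toy ring buffer, not a production queue; the `SpscQueue` name is mine). Each side reads the index it owns with a relaxed load (nobody else writes it), reads the other side's index with acquire, and publishes with a release store so the slot's contents become visible before the index advance.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer single-consumer ring buffer. One slot is left empty
// to distinguish full from empty, so usable capacity is N - 1.
template <typename T, std::size_t N>
class SpscQueue {
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // written only by the consumer
    std::atomic<std::size_t> tail_{0};  // written only by the producer

public:
    bool push(const T& v) {             // call from producer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;               // full
        buf_[t] = v;
        // Release: the write to buf_[t] is visible before tail_ moves.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {            // call from consumer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;        // empty
        T v = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return v;
    }
};
```

The relaxed loads of the owned index are exactly the "lazy reads" mentioned above: no synchronization is paid for data only this thread writes.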


Minor nitpick, but x86 only guarantees atomic reads/writes for properly aligned values. I believe ARM is the same, but I'm not very sure about that.



