As someone who has used TSX to optimize synchronization primitives, you can expect to see a ~15-20% performance increase, if (big if) your program is heavy on disjoint data access, i.e. a lock is needed for correctness, but conflicts are rare in practice. If you have a lot of threads frequently writing the same cache lines, you are probably going to see worse performance with TSX as opposed to traditional locking. It helps to think about TSX as transparently performing optimistic concurrency control, which is actually pretty much how it is implemented under the hood.
While the API of TSX (_xbegin(), _xabort(), _xend()) gives the appearance of implementing transactional memory, it is really an optimized fast path - the abort rate will determine performance. The technical term for what TSX actually implements is 'Hardware Lock Elision'.
If you are going to use TSX, don't use it directly unless you are writing your own primitives (i.e. transactional condition variables [0]), prefer using Andi Kleene's TSX enabled glibc [1] or the speculative_spin_mutex [2] in TBB.
Thanks for the links. Im aware of Andi's work and enjoy reading his posts on TSX. You're absolutely right about HLE vs HTM.
Mostly I wish there was a better fallback than locking when transactions fail, but I suppose this is more in line with thinking of them as optimistic locks (such as what I assume the speculative spin lock does).
Do you have any insight into how it compares to lock-free techniques?
A full-blown (say) lock-free hash table is almost insurmountable - I recall seeing an efficient reusable implementation a couple of years ago, but most before that had serious performance or correctness issues. However, if you take this into account early enough in the design of the system, you can usually do with much simpler lock free data structures (seqlock and friends).
transactional memory is, of course, a useful abstraction when you can't (or didn't) take this into account upfront.
Lock-freedom is all about progress, not performance. i.e. generally the only reason to use a lock-free algorithm is if you have issues with ill-behaved code stalling other processes. (I can think of several simple ways to make a fast concurrent hash table, none of which provide lock-freedom, yet which avoid taking locks when necessary.)
Transactional memory is an access abstraction that presents memory as something over which transactions can be made. It says nothing about whether the implementation is lock-free (though the better implementations approach, or are lock-free). There exist software TM algorithms which lock during the transaction, which lock only briefly at the end of the transaction, and which are lock-free, and some which are a combination of these.
TSX (specifically HLE) only takes a lock if the transaction fails without taking a lock, in order to guarantee forward progress in the absence of an ill-behaved transaction (assuming fair locks). Were the software fallback implemented using lock-free techniques (possible with the more general RTM portion of TSX), this guarantee would extend to ill-behaved transactions as well.
> Lock-freedom is all about progress, not performance. i.e. generally the only reason to use a lock-free algorithm is if you have issues with ill-behaved code stalling other processes.
Lock freedom is about progress, but in many practical cases (short time between load-linked/store-conditional pair, for example) it beats locks and everything else in performance -- basically, in the best case you do the same work as locks (i.e. synchronize caches among CPUs), and in the worst case, you spin instead of suspending the process.
Of course, if you do a lot of work between the load and store (or whatever pair of operations need to surround your concurrent operation), and there's a good chance of contention, then ... yes, it will not perform well. But that's not how you should use it (or how it is used most of the time).
And .. transactional memory is an abstraction the leaks differently than locks, but you must still be aware of the failure modes. It's higher level, but does not magically resolve contention.
While the API of TSX (_xbegin(), _xabort(), _xend()) gives the appearance of implementing transactional memory, it is really an optimized fast path - the abort rate will determine performance. The technical term for what TSX actually implements is 'Hardware Lock Elision'.
If you are going to use TSX, don't use it directly unless you are writing your own primitives (i.e. transactional condition variables [0]), prefer using Andi Kleene's TSX enabled glibc [1] or the speculative_spin_mutex [2] in TBB.
[0] - http://transact09.cs.washington.edu/9_paper.pdf
[1] - https://github.com/andikleen/glibc
[2] - http://www.threadingbuildingblocks.org/docs/doxygen/a00248.h...