Hacker News

Ah, I think I see. This would be useful for shared state that is read much more often than it is written. That seems to position it mainly as a competitor to RCU, which is likewise designed to have cheap reads and more expensive writes. But compared to RCU, both reads and writes would be faster with TSX (and in particular, uncontended writes would be especially faster).



The idea behind TSX was originally called speculative lock elision, and comes from this MICRO 2001 paper [1]. The paper provides a more coherent motivation of SLE/TSX than the Intel marketing literature, and contains some concrete (simulated) performance results.

[1] http://pages.cs.wisc.edu/~rajwar/papers/micro01.pdf


I see. So if I may paraphrase, TSX can also help write workloads by eliminating locks that weren't actually needed because no actual contention over the same data occurred. This can happen if the lock was too coarse-grained and the writes were actually to different memory, or if the dynamically executed path didn't actually mutate shared memory.

Both of these are problems you can code around, but doing that would likely require greater effort and complexity.

It's unclear to me whether recovering from mis-speculation is more expensive than acquiring the lock up-front. In other words, is there overhead in the case that you almost always encounter actual contention?

It's also unclear to me how this can be done without instruction set support, as the paper claims. Without having the critical sections explicitly marked, how can this be implemented?


> I see. So if I may paraphrase, TSX can also help write workloads by eliminating locks that weren't actually needed because no actual contention over the same data occurred. This can happen if the lock was too coarse-grained and the writes were actually to different memory, or if the dynamically executed path didn't actually mutate shared memory.

Yes, those two cases are what I'd consider secondary benefits that could be achieved straightforwardly (but not easily, as you note) without TSX. The primary case -- when the lock is fine-grained, and thread A does mutate a value that thread B does eventually read, but B's read doesn't come until A is already out of the critical section -- is the big win for well-designed software, IMO.

> It's unclear to me whether recovering from mis-speculation is more expensive than acquiring the lock up-front. In other words, is there overhead in the case that you almost always encounter actual contention?

On big x86 cores, there shouldn't be much performance overhead. x86 already has mechanisms for holding tons of in-flight memory operations in the core without releasing and/or committing them to the rest of the memory system. TSX simply specifies a new reason for holding them ("we're in the middle of an SLE transaction"). If your transaction is aborted because another core touched the lock, you can flush that buffer of memory ops and roll back to a checkpointed register file almost instantly. The paper describes a hardware predictor that decides whether to initiate a transaction on a given store; in any smart implementation, this predictor would incorporate information like "have I incurred expensive rollbacks due to transaction aborts on this store?" In that way, a smart implementation should tune itself to yield arbitrarily low average- or worst-case overhead over the non-TSX case.

> It's also unclear to me how this can be done without instruction set support, as the paper claims. Without having the critical sections explicitly marked, how can this be implemented?

(No pedantry intended, I'm sure you already read this) It's explained in Section 3.3. They look for this pattern:

  Initially, MEM[x] = A
  STORE MEM[x] = B
  ...
  ...
  STORE MEM[x] = A (original value)

They consider anything between these "silent store pairs" to be a potential critical section, and the predictor can choose to try to elide the two stores and form a transaction, if it wishes. It is always OK to elide these two stores, whether it's a "real" critical section or not, as long as nobody tries to read MEM[x] in the middle (in the memory model they assume, anyway).

You definitely require ISA support if:

1) Your silent stores may have other side effects (I/O?)

2) You have synchronization primitives that don't fit the silent store pattern (ticket lock?)

3) You have a strong business interest in making the behavior of existing software, even at the microarchitectural level, as predictable as possible (Intel?)



