It’s not so much cache misses as allowing the core to run something else while the write completes.
This is why some code scales poorly while other code achieves near-linear speedups.
A core stalls on a write only if the store buffer is full. Because hyperthreads share the store buffer, SMT makes store stalls more likely, not less (though they are still unlikely to be the bottleneck).
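To make the store-buffer point concrete, here is a toy queue model, not a description of any real microarchitecture. Everything in it is an assumption chosen for illustration: the function name `simulate`, the capacity of 16 entries, the drain rate of one entry every 4 cycles, and the idea that each thread tries to issue one store per cycle in a burst. It only shows the mechanism: a thread stalls when the shared buffer is full, and two threads filling the same buffer reach that point sooner.

```python
def simulate(n_threads, stores_per_thread, capacity, drain_every):
    """Toy model: count thread-cycles stalled on a full, shared store buffer.

    All parameters are hypothetical; real store buffers differ per CPU.
    """
    remaining = [stores_per_thread] * n_threads  # stores each thread still wants to issue
    occupancy = 0                                # entries currently in the buffer
    stalls = 0
    cycle = 0
    while any(remaining) or occupancy:
        cycle += 1
        # One buffered store retires to the cache every `drain_every` cycles.
        if occupancy and cycle % drain_every == 0:
            occupancy -= 1
        for t in range(n_threads):
            if remaining[t]:
                if occupancy < capacity:
                    occupancy += 1       # store accepted, thread keeps running
                    remaining[t] -= 1
                else:
                    stalls += 1          # buffer full: this thread stalls this cycle
    return stalls


if __name__ == "__main__":
    # Same 16-store burst per thread, same 16-entry buffer.
    print("1 thread :", simulate(1, 16, 16, 4))
    print("2 threads:", simulate(2, 16, 16, 4))
```

With these made-up numbers, the single thread never stalls because its whole burst fits in the buffer, while two threads sharing that buffer do stall. Outside of write bursts like this one the buffer rarely fills, which fits the point that store stalls are unlikely to be the bottleneck.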