
Lock-Free Algorithms For Ultimate Performance - ari_elle
http://www.infoq.com/presentations/Lock-Free-Algorithms
======
colanderman
Here's one better, if all your data is strictly less than (not equal to) a
cache line (usually 64 bytes) in size, and your processor guarantees memory
ordering within a cache line (most do):

1) Keep head and tail pointers local to the consumer and producer.

2) Associate a bit with each entry in the queue which denotes whether the
entry is full or not. The bit must live within the same cache line as the
entry itself.

3) Block on this bit, rather than head & tail pointers.

4) Set an entry's bit after filling the entry; clear it before.

5) You can now elide the memory fence (implicit in the .lazySet() method of
the atomic objects). Performance will skyrocket.

~~~
justanotheratom
Interestingly, the USB 3.0 xHCI Specification uses the scheme you mention for
circular queues shared between software and hardware.

4.9 TRB Ring: [http://www.intel.com/content/www/us/en/io/universal-
serial-b...](http://www.intel.com/content/www/us/en/io/universal-serial-
bus/extensible-host-controler-interface-usb-xhci.html)

------
svag
The presentation in pdf [http://qconsf.com/dl/qcon-
sanfran-2012/slides/MartinThompson...](http://qconsf.com/dl/qcon-
sanfran-2012/slides/MartinThompson_LockFreeAlgorithmsForUltimatePerformanceMOVEDTOBALLROOMA.pdf)
for off-line viewing.

------
mgl
If you interested in HPC topic, especially in terms of HFT, you should his
previous presentation on LMAX system:
<http://www.infoq.com/presentations/LMAX>

------
exDM69
Really good presentation, although a bit long.

What was most interesting to me was the actual performance figures and the
small optimizations that yield big improvements.

------
cnlwsu
great presentation

