Hacker News new | past | comments | ask | show | jobs | submit login
Simple, Fast, Practical Non-Blocking and Blocking Concurrent Queue Algorithms (rochester.edu)
70 points by luu on May 26, 2014 | hide | past | favorite | 5 comments



You can get better performance in the one writer thread / one reader thread case by using simple lock-free circular buffers. No special instructions are needed for this. Cache line thrashing can be minimized by allocating the pointers into four cache lines: one writer local, one for writer-to-read communications, one reader local, and one for reader-to-writer communications.

You do need barriers: issue a write barrier before giving a copy of the write pointer to the reader. Issue a read barrier before reading the data.

Avoid cache line thrashing in a mostly full case by never allowing the circular buffer to become completely full (leave at least one cache line free).

The reader thread can quickly poll many circular buffers for work. The pointers will all reside in the reader's cache until someone has written some data (and updates the reader's copy of the write pointer).

You can get more benefits by delaying the update of the other thread's pointers: on the writer's side, until we have no more data available to write. On the reader's side, until we have read all the data (or some larger chunk of it). This allows the cache-line prefetching hardware to work (to prefetch likely used data).

Anyway, if you really want to use a linked list, at the very least allocate multiple items per cache line, and then link the cache lines together (so one link pointer or less per cache line).


If you want to see an implementation along the lines that jhallenworld describes Martin Thomson has some very high quality implementations here in Java. These are just single producer, single consumer queues

https://github.com/mjpt777/examples/tree/master/src/java/uk/...

I have a number of Go implementations of the same queues here

https://github.com/fmstephe/queues


This should really be tagged as 1996, especially as since this was written the world of lock-free algorithms has moved forwards a lot. At the very least Fober et al suggested an alternative that performs a tad better in 2002 (http://nedko.arnaudov.name/soft/L17_Fober.pdf).


I see a problem with implementing the non-blocking version in C. Compare-and-swap must operate on an entire struct. But for example gcc only allows integer types. Quote from gcc docs:The definition given in the Intel documentation allows only for the use of the types int, long, long long as well as their unsigned counterparts. GCC will allow any integral scalar or pointer type that is 1, 2, 4 or 8 bytes in length.

C11 has a new header stdatomic.h which only allows atomic for integer types.

How do you perform CAS on a struct in C( without using a mutex of course )?


By storing the struct in union with an integral type - or, if you need a value but not an address, casting to such a union. You might also think to just abuse pointer casting, but IIRC that breaks C strict aliasing assumptions and causes GCC's optimizations to do bad things to your program.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: