
A Wait-Free Queue as Fast as Fetch-And-Add [pdf] - EvgeniyZh
http://chaoran.me/assets/pdf/wfq-ppopp16.pdf
======
svckr
I played around with wait-free algorithms some time ago but never got
anywhere meaningful.

A wait-free queue for multiple consumers and multiple producers would be kind
of the holy grail when it comes to wait-free data structures. Is it MPMC
though? I skimmed through the PDF and could not find a definitive claim…

Also, big thumbs up to Chaoran for putting the implementation on GitHub:
[https://github.com/chaoran/fast-wait-free-queue](https://github.com/chaoran/fast-wait-free-queue)
The code looks easy to follow and it's MIT licensed.

~~~
OJFord
Ah! I thought I recognised that name/domain.

He previously saved my bacon with `node-finish` - a Node.js library that lets
you fire off a bunch of tasks and wait for _all_ of them to finish before
executing a callback.

What a hero.

~~~
rco8786
Promise.all?

~~~
striking
But for callbacks, which is much more of a pain.

There was a time before Promises, you know.

~~~
rco8786
OP said "recently".

Even before Promises, wrapping up some callback functions and tracking async
task state isn't rocket surgery.

~~~
OJFord

> OP said "recently".

I did? I think you misread my "previously".

Regardless, I was new to JS and not up to the task of implementing it myself.
@chaoran's library did exactly what I needed.

An appreciative note didn't need to turn into criticism of the project's
worth.

------
adrianratnapala
So I just recently learned how a rather simple ring buffer implementation is
lock-free and doesn't even need primitives like CAS (although it does need
memory barriers).

In spite of this, all the literature about lock-free queues that I see
(including this one) is for some kind of linked structure. Any reason for
that?

I guess one big limitation of the ring buffer is that it only works for a
single reader and single writer. Although maybe you can work around that using
CAS primitives...
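For reference, here is a minimal sketch (names illustrative, not from any particular library) of the kind of SPSC ring buffer being described: C11 atomics with acquire/release ordering standing in for the memory barriers, and no CAS anywhere.

```c
// Single-producer/single-consumer ring buffer: no CAS, only atomic
// loads/stores with acquire/release ordering. A hedged sketch, not a
// production implementation (no padding against false sharing, etc.).
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 8  /* power of two so masking replaces modulo */

typedef struct {
    int buf[CAP];
    _Atomic size_t head;  /* advanced only by the consumer */
    _Atomic size_t tail;  /* advanced only by the producer */
} spsc_ring;

/* Producer side: returns false when the ring is full.
 * Note it completes without retrying -- no CAS loop. */
bool spsc_push(spsc_ring *q, int v) {
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == CAP)
        return false;                       /* full */
    q->buf[t & (CAP - 1)] = v;
    /* release: the buf write becomes visible before the new tail */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool spsc_pop(spsc_ring *q, int *out) {
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t)
        return false;                       /* empty */
    *out = q->buf[h & (CAP - 1)];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}
```

The key trick is that each index is written by exactly one side, so plain atomic stores suffice and neither side can force the other to retry.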

~~~
Betelgeuse90
I'm not sure you're aware of this, but there's a distinction between wait-free
and lock-free.

[http://rethinkdb.com/blog/lock-free-vs-wait-free-concurrency/](http://rethinkdb.com/blog/lock-free-vs-wait-free-concurrency/)

This might be relevant to answering your question.

~~~
adrianratnapala
I became aware of the distinction when I skimmed the paper. But I couldn't
figure out what it meant in this case.

Specifically, does the _every thread_ requirement rule out algorithms where
only one reader and one writer are allowed? If not, then this ring buffer is
wait-free, since there is no CAS-and-retry step. (Of course an operation can
_fail_ because the queue is empty/full, but that still counts as completion.)

~~~
the8472
there are several aspects to consider when talking about a concurrent queue:

* multiple/single producer, multiple/single consumer

* progress guarantees (wait-free, lock-free)

* bounded vs. unbounded

* whether it's double-ended or single-ended

* memory reclamation being dependent on a GC or not

* which atomic operations they need (e.g. some require double-CAS)

What you mention is an SPSC bounded queue. This paper is about an MPMC
unbounded queue, and from skimming it, it seems it does not require a GC
either.

So it's far more powerful.

Of course SPSC queues still have their uses even if we have a good MPMC
queue; they generally incur less overhead.

------
kbwt
For context, the traditional wait-free queues are only wait-free for one side,
writing or reading. The other side has to spin in an atomic fetch loop,
waiting for the wait-free multi-step operation to complete.

The double wait-free queue from the paper uses a technicality to achieve its
status. The spinning operation is replaced with a loop over all the threads
doing the opposite (multi-step) operation, finishing it on behalf of threads
that may be blocked in the middle of the steps. It's not that it doesn't spin,
but the spinning is bounded by the number of threads. Edit: Actually, spinning
is bounded by O(#threads^4) according to the paper.
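To make the helping idea concrete, here is a toy, hypothetical sketch (not the paper's code; all names are mine, and the "operation" is a trivial computation instead of a queue step). Each thread announces its pending operation in a shared array; any thread can scan the array and complete announced operations, and completion is an idempotent CAS so racing helpers are harmless.

```c
// Illustrative "helping" pattern: publish an operation descriptor, let
// anyone finish it. The scan is bounded by NTHREADS, which is what turns
// unbounded spinning into a bounded (wait-free) loop.
#include <stdatomic.h>

#define NTHREADS 4
#define NO_RESULT (-1)

typedef struct {
    _Atomic int pending;  /* 1 while the operation is in flight */
    int arg;              /* operation input, published before pending */
    _Atomic int result;   /* NO_RESULT until some thread completes it */
} op_desc;

static op_desc announce[NTHREADS];

/* Anyone may call this: finish every announced operation. Here the
 * "operation" is just computing arg + 1; a real queue would apply an
 * enqueue/dequeue step instead. */
void help_all(void) {
    for (int i = 0; i < NTHREADS; i++) {
        op_desc *d = &announce[i];
        if (atomic_load(&d->pending)) {
            int expected = NO_RESULT;
            /* idempotent: only the first helper's CAS takes effect */
            atomic_compare_exchange_strong(&d->result, &expected, d->arg + 1);
            atomic_store(&d->pending, 0);
        }
    }
}

/* A thread announces its op, then helps everyone (including itself),
 * so it completes in a bounded number of steps. */
int run_op(int tid, int arg) {
    announce[tid].arg = arg;
    atomic_store(&announce[tid].result, NO_RESULT);
    atomic_store(&announce[tid].pending, 1);
    help_all();
    return atomic_load(&announce[tid].result);
}
```

The real algorithm's bookkeeping is far more involved (hence the O(#threads^4) bound), but the shape is the same: no thread ever waits on one stalled peer; it finishes the peer's work instead.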

------
swalsh
Funny, I had a coworker several years ago who built this exact algorithm. The
performance of our system increased by literally orders of magnitude.

I've been slipping it into systems for a couple years now :D

~~~
bjterry
Did his algorithm include the helper concept that makes this wait-free, or are
you just talking about the fetch-and-add queue part?
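For anyone wondering what "the fetch-and-add queue part" refers to, here is a rough, hypothetical sketch (mine, not the paper's), simplified to a fixed-size array; the real queue is unbounded and layers the helping slow path on top. Enqueuers and dequeuers each claim a cell with one FAA, so under low contention neither side retries. Note the retry loops mean this version alone is only lock-free; the helping mechanism is what upgrades it to wait-free.

```c
// Fetch-and-add fast path of an MPMC queue, toy version. Values must be
// positive in this sketch so they can't collide with the sentinels.
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 64
#define EMPTY 0    /* cell not yet written */
#define TAKEN (-1) /* a dequeuer arrived before the enqueuer */

static _Atomic int cells[QSIZE];  /* all start EMPTY */
static _Atomic int enq_idx, deq_idx;

/* Claim the next cell with FAA, then try to install the value. If a
 * dequeuer already poisoned the cell, move on to the next one. */
bool enqueue(int v) {
    while (1) {
        int i = atomic_fetch_add(&enq_idx, 1);
        if (i >= QSIZE)
            return false;          /* toy queue exhausted */
        int expected = EMPTY;
        if (atomic_compare_exchange_strong(&cells[i], &expected, v))
            return true;           /* fast path: one FAA + one CAS */
    }
}

/* Swap the claimed cell to TAKEN: either we get the value, or we poison
 * the cell so a late enqueuer knows to skip it. (Returning false here
 * wastes a dequeue index -- a simplification the real algorithm avoids.) */
bool dequeue(int *out) {
    while (1) {
        int i = atomic_fetch_add(&deq_idx, 1);
        if (i >= atomic_load(&enq_idx) || i >= QSIZE)
            return false;          /* nothing (yet) to take */
        int v = atomic_exchange(&cells[i], TAKEN);
        if (v != EMPTY) {
            *out = v;
            return true;
        }
        /* cell was empty: we poisoned it; try the next one */
    }
}
```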

------
biokoda
Is it better than
[http://www.1024cores.net/home/lock-free-algorithms/queues/non-intrusive-mpsc-node-based-queue](http://www.1024cores.net/home/lock-free-algorithms/queues/non-intrusive-mpsc-node-based-queue)?

That is what Rust's mpsc uses. It has a big advantage in being short and easy
to understand.
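For readers unfamiliar with it, the linked queue can be sketched roughly like this (from memory and simplified, e.g. no cache-line padding; see the 1024cores page for the authoritative version). Producers push with a single atomic exchange; the lone consumer pops without any atomic read-modify-write at all.

```c
// Non-intrusive Vyukov-style MPSC queue, simplified sketch.
#include <stdatomic.h>
#include <stdlib.h>

typedef struct mpsc_node {
    struct mpsc_node *_Atomic next;
    int value;
} mpsc_node;

typedef struct {
    mpsc_node *_Atomic head;  /* producers swap themselves in here */
    mpsc_node *tail;          /* consumer-only; a consumed placeholder */
    mpsc_node stub;
} mpsc_queue;

void mpsc_init(mpsc_queue *q) {
    atomic_store(&q->stub.next, NULL);
    atomic_store(&q->head, &q->stub);
    q->tail = &q->stub;
}

/* Any number of producers; the XCHG serializes them. If a producer stalls
 * between the XCHG and the next-pointer store, the consumer cannot advance
 * past prev -- which is why this queue is not non-blocking. */
void mpsc_push(mpsc_queue *q, int v) {
    mpsc_node *n = malloc(sizeof *n);
    n->value = v;
    atomic_store(&n->next, NULL);
    mpsc_node *prev = atomic_exchange(&q->head, n);
    atomic_store(&prev->next, n);
}

/* Single consumer only. Returns 0 if the queue looks empty. */
int mpsc_pop(mpsc_queue *q, int *out) {
    mpsc_node *tail = q->tail;
    mpsc_node *next = atomic_load(&tail->next);
    if (!next)
        return 0;
    *out = next->value;
    q->tail = next;                 /* next becomes the new placeholder */
    if (tail != &q->stub)
        free(tail);
    return 1;
}
```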

~~~
gpderetta
Dmitry Vyukov's queue is awesome, but as the author himself notes, a slow
writer can prevent the reader from making progress, so it is not even
non-blocking.

------
zvrba
Hmm.. why bother with a complicated approach now that transactional memory
(Intel TSX) is available in HW? What are the advantages and disadvantages of
this approach vs. using TSX?

~~~
hendzen
Hardware transactions as implemented by TSX are not wait free. There is no
guarantee that any TSX transaction will ever commit, so you need a non-
transactional fallback path to guarantee forward progress.

~~~
zvrba
Do you know the relative performance of Power7 and Haswell? The implementation
is not wait-free on Power7 because of a HW limitation, and in the benchmarks
it has significantly lower throughput than on Haswell.

------
brightball
So how different is this from something like
[https://github.com/chanks/que](https://github.com/chanks/que)

