
Thread-Safe Lock Free Priority Queues in Golang - slobdell
http://scottlobdell.me/2016/09/thread-safe-lock-free-priority-queues-golang/
======
fmstephe
It is interesting to me how popular linked lists are for non-blocking data
structures. They are often slow and always so hard to reason about and
implement safely.

Here's a very quick alternative

    
    
        push all messages into a channel IN.
        A goroutine reads from IN and inserts into a sorted TREE.
        The same goroutine reads the max from TREE and pushes it to channel OUT.
        Workers read from OUT.
    

This is a pretty mediocre implementation. But I post it because the numbers
posted in this article were so low. I think we can do better with something
far less sophisticated.

[https://gist.github.com/fmstephe/4fdc930ff180be3e92c693ad5a2...](https://gist.github.com/fmstephe/4fdc930ff180be3e92c693ad5a24e1b3)

I timed that to write 600,000 messages concurrently and read 600,000
sequentially in 1.5 seconds. Not a perfect measurement but I'm on a train and
I'm about to get to my stop.

If we need really high performance we can start substituting faster, and
simpler, queues than Go's channels and we can very likely improve that red-
black tree that is doing the sorting. Each of these components is simpler,
more likely correct, and from what I can see substantially faster than the
sophisticated lock-free queue described in the article.

I don't want to come across as snarky. But I am surprised by the popularity of
linked-list based lock-free data structures like this.

EDIT: The tone of my post came across nastier than I really intended. I quite
enjoyed the article, and I'd be interested in how far the performance can be
improved.

~~~
slobdell
thanks for posting...I will try the implementation you linked to and see the
effects on performance.

Channels end up locking under the hood as well, so from a purity standpoint I
wanted to avoid those...if channels are indeed faster, then clearly I need to
rework my approach.

~~~
jasonwatkinspdx
My experience is that you can lock and unlock a golang mutex in around 130ns,
do a cas in around 60ns, and a fetch and increment in around 30ns. So building
something atop the atomic primitives is potentially faster, but the difference
is not so dramatic that you shouldn't try a simple implementation first.

~~~
greenleafjacob
Lock-free structures typically have worse average performance, and sometimes
uniformly worse performance. What matters is what happens under contention,
and predictability. Lock freedom guarantees that at least one process always
makes progress, whereas with locks a process can acquire a lock and then get
switched between cores, get GC paused, or otherwise stall while holding it.

------
theaustinseven
These queues are thread safe and lock-free, but they are far from efficient.
My biggest issue here is that the runtime evaluations are wrong. The insert
operation is definitely O(N) on average (where N is the number of elements in
the queue) and has a worst case of infinity (since the insert operation is
restarted if a race condition occurs, a single insert could be repeated
forever). I don't know if I missed anything, so I could be wrong, but this
implementation seems incredibly flawed.

That said, it is always nice to see new data-structures and I really wish
there were more posts like this. I always like to see the different ways that
people try to solve these problems.

~~~
slobdell
I agree, these are currently far from efficient (I hope to resolve this). My
suspicion right now is that a lot of time is being spent somewhere in spinning
loops waiting for progress to be made.

Average runtime of insertion is constant time because the size of the linked
list does not grow beyond a certain size, and inserting into the priority
queue is constant. The worst case is not infinity because there is always at
least one goroutine making progress.

Thank you for the comment about wanting to see posts like this :)

I do not think this implementation is flawed, in the sense that at least one
goroutine is always making progress, there are no locks, and results are
returned in order as long as dequeues are not saturated.

This is not something that would make it into a white paper, because the
implementation does not return deterministic results, and I could not
characterize mathematically how far out of order the returned results can be,
since they're not guaranteed to be in order.

I should have also included the profiling results. Currently 25% of time is
spent garbage collecting, which is expected because the priority queues
effectively use multi-version concurrency control, but this will come in handy
when I want to persist data to the hard drive with as little locking as
possible (and for general simplicity in avoiding corruption)

Another 15% or so is spent sleeping, which would be the spinning loop part.

~~~
theaustinseven
The insert is still an O(N) operation, since N in this case is the number of
nodes in the list. It would only be O(1) if the list were of constant size
(and even that would be misleading). The worst case is still infinity, because
we are concerned with a single insert operation, not with the fact that some
operation somewhere is making progress. We are merely concerned with the
progress that a given insert operation makes, and there could potentially be
an infinite loop. You can reduce the probability of this happening by
shortening the loop body between CAS retries.
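
For reference, the retry pattern in question looks like this. The sketch uses a Treiber-stack push rather than the article's skip-list insert (the stack and its names are my own example); the CAS loop can in principle lose every race and spin forever under contention:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

type node struct {
	value int
	next  unsafe.Pointer // *node
}

type stack struct {
	head unsafe.Pointer // *node
}

// push retries its CAS until it wins; a single caller can, in
// principle, lose every race -- the "worst case of infinity" being
// discussed. It returns how many attempts the insert took.
func (s *stack) push(v int) int {
	n := &node{value: v}
	attempts := 0
	for {
		attempts++
		old := atomic.LoadPointer(&s.head)
		n.next = old
		if atomic.CompareAndSwapPointer(&s.head, old, unsafe.Pointer(n)) {
			return attempts
		}
	}
}

// pop removes and returns the top value, retrying on lost races.
func (s *stack) pop() (int, bool) {
	for {
		old := atomic.LoadPointer(&s.head)
		if old == nil {
			return 0, false
		}
		next := atomic.LoadPointer(&(*node)(old).next)
		if atomic.CompareAndSwapPointer(&s.head, old, next) {
			return (*node)(old).value, true
		}
	}
}

func main() {
	var s stack
	fmt.Println(s.push(1), s.push(2)) // uncontended: 1 1 (one attempt each)
	v, _ := s.pop()
	fmt.Println(v) // 2 (LIFO)
}
```

Uncontended, each push succeeds on its first attempt; the unbounded worst case only appears when other goroutines keep winning the CAS.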

Otherwise I think it would be awesome to see profiling results, I suspect that
the garbage collection time can be reduced significantly. I have recently been
implementing a locking thread-safe queue, and I think it would be interesting
to compare the differences in speed and garbage produced, just to get a better
idea of the pros and cons. I suspect that if you shortened your retry loop,
your implementation would be faster.

When I said flawed, I meant that it was not really enforcing order on
insertion, but for many applications this doesn't matter _that_ much. Close
enough is often good enough (and the probability of this going wrong is
fairly small, except in extreme cases).

------
bjacokes
I get that this is dry subject matter and some humor helps to lighten it up,
but I found the writing style distracting. Some paragraphs are dense with
algorithm details, others just have a picture of a tech office, a story about
running into someone on BART, or descriptions of data structures as "lame" or
"losers". Made it really tough to retrace my steps and get context from the
preceding section.

~~~
Animats
Yes. I'm starting to call this "neckbeard writing".

~~~
pfarnsworth
I don't think neckbeards have 24" arms (as per OP's site).

------
Animats
Can you write lock-free code in Go without assembly language support?
Sometimes you need fence instructions or hardware compare-and-swap.

~~~
jasonwatkinspdx
Yes. Portable primitives are provided in the sync/atomic package. They've been
careful about the details (e.g., inserting memory fences on architectures like
ARM). Most applications shouldn't touch this stuff, but it's there if you want
to try to write a lock-free data structure or algorithm.

~~~
mappu
There are some interesting caveats to using sync/atomic. Code that worked on
x86_64 panicked on x86_32 because the target struct member was no longer
64-bit aligned.

~~~
jasonwatkinspdx
Yeah. I believe the API docs do call that out. There's not much that can be
done about it either, short of forcing 8-byte alignment for all golang objects
on 32-bit platforms.

~~~
mappu
Not all, only those that can be proven to be passed to atomic calls.

Given the rejuvenated emphasis on compile performance, it's probably not
worthwhile to figure that out :) Overall Go's "don't sugarcoat everything with
abstractions" attitude suits me fine for now.

------
jalfresi
Hats off, this is a GREAT post! Lots of detail and I'm way out of my depth but
lots to dig into and learn from!

Thanks for posting!

------
happytrails
I didn't make it past the bar pic :(

~~~
employee8000
Comments like this are NOT appreciated on Hacker News, especially if you
haven't taken the time to read the article.

~~~
softawre
Really? This seems like (potentially poorly worded) feedback that might be
useful to the author.

