
Implementing Queues for Event-Driven Programs - ingve
http://ithare.com/implementing-queues-for-event-driven-programs/
======
heavenlyhash
> In the extreme cases (and ultra-small sizes like 2 or 3), we can even run
> into deadlocks (!).

If this happens to you, you're doing it wrong. You _have_ deadlocks; they
just usually aren't happening unless you're "unlucky"... but happen they will!

Build your program to run correctly with all queues at a buffer size of 1. If
it doesn't work, you're doing it wrong.

 _Now_ raise the buffer sizes to something reasonable to reduce the
performance frictions of too-strict scheduling. You will never have deadlocks.

~~~
jcbeard
Agreed. If you're having deadlocks, there's likely a cyclic dependency the
programmer/author didn't consider. This happens a lot in streaming/data-flow
code. Worse, it can be something wonky in the queue itself. That seems to
happen with multi-arch lockless FIFOs when the authors didn't consider corner
cases of architectures with more relaxed coherence, or of architectural
features (e.g., 128- vs. 64-byte cache lines).

------
gpderetta
Regarding "Removing locks completely", the author laments not knowing of any
readily available library providing the blocking-when-necessary support.

I suggest looking into Event Counts, which can be used to non-intrusively add
blocking behavior to most lock free queues; they are the non-blocking world
equivalent of condition variables. Facebook's Folly library provides a readily
available open source implementation of Event Counts.

~~~
no-bugs
THANKS! I've added it (not 100% sure how it would work in practice, but
certainly looks interesting).

~~~
gpderetta
You are welcome. Note that the event count is a general synchronization
primitive, the one in folly is just an implementation.

BTW, you can build an efficient event count on top of eventfd, so you can even
poll/select it.

------
colanderman
On Linux, another way to signal arbitrary events instead of an anonymous pipe
is via eventfd(2).

~~~
no-bugs
Added it, THANKS!

------
Koromix
> is an ability to push asynchronous messages/events there (usually from
> different threads), and to get them back (usually from one single thread) –
> in FIFO order, of course.

It is worth noting that FIFO only holds for messages coming from a single
thread. If you have one thread posting keyboard events and another mouse
events it is very possible that the reader will get them in the "wrong" order,
even with tens of milliseconds separating them from the user's POV.

Now it seems obvious when I say it, but it is easy to forget when you develop
the reader side, especially since most of the time the ordering will hold.
This leads to intermittent and non-obvious bugs down the line.

On the other hand, if you fully acknowledge the lack of an ordering guarantee,
you can use that to make a more efficient queue by using one buffer and lock
per thread. Each producer fills its own lock-protected queue, and the reader
just needs to find a non-empty queue, lock it and pop an item, without any
penalty for the other threads.

Of course, the inter-thread ordering will be even worse but you keep the
ordering of events from one producer, which was the only actual guarantee you
had to begin with in the naive version.

~~~
no-bugs
> make a more efficient queue by using one-buffer-and-lock-per-thread.

How would the reader block on such a distributed queue when it's empty
(without creating one single mutex, which would re-establish the single
contention point)? If there is no way to block while waiting for input, it
means polling, and polling is a Really Bad Thing...

~~~
Koromix
The reader scans all queues, without locking. If one is non-empty, you lock it
and pop an item. Only if all queues are empty do you hit the slow path and
wait on a synchronization primitive (a condition variable).

Of course if you can assume that the queue will always be very busy, you can
ditch that too and poll instead. Polling is often bad, but not always. When
things go very fast, it can be more efficient.

A good example of this occurs with the new NVM Express storage technology on
Linux, where the traditional I/O completion interrupt used until now is
starting to become a bottleneck:
[https://lwn.net/Articles/663879/](https://lwn.net/Articles/663879/)

------
exDM69
From the first code example:

> Yep, notifying outside of lock is usually BETTER. Otherwise the other thread
> would be released but will immediately run into our own lock above, causing
> an unnecessary (and Very Expensive) context switch

Is this true and if so, on which platform(s)? Aren't most mutex/condition
variable implementations optimized to avoid this case by deferring the
condition signal to the mutex unlock?

Additionally, in the implementation of kill() in the latter examples, there
should be notify_all() instead of notify_one().

~~~
no-bugs
> Aren't most mutex/condition variable implementations optimized to avoid this
> case by deferring the condition signal to the mutex unlock?

Yes, _some_ but not _all_ implementations do this; however, as it is not
really guaranteed (and is actually a workaround for poorly written programs),
the standing recommendation is still to notify after releasing the lock (which
can be better, can be the same, but won't be worse than notifying within it);
see for example
[http://en.cppreference.com/w/cpp/thread/condition_variable/n...](http://en.cppreference.com/w/cpp/thread/condition_variable/notify_one):

"The notifying thread does not need to hold the lock on the same mutex as the
one held by the waiting thread(s); in fact doing so is a pessimization, since
the notified thread would immediately block again, waiting for the notifying
thread to release the lock. However, some implementations (in particular many
implementations of pthreads) recognize this situation and avoid this "hurry up
and wait" scenario by transferring the waiting thread from the condition
variable's queue directly to the queue of the mutex within the notify call,
without waking it up."

> Additionally, in the implementation of kill() in the latter examples, there
> should be notify_all() instead of notify_one().

In MOST cases, it didn't matter (as there was only one blocking thread - the
one reading), but yes, there was one case when it was indeed important. Fixed
now, THANKS!

~~~
exDM69
Since most platforms defer condition signalling to mutex unlocks, the way the
code is now written will cause _more_ spurious wakeups and context switching
than necessary.

The pathological case in scheduling goes like this:

    
        1. Reader A enters pop_front on an empty queue, goes to wait on the condition variable
        2. Writer W enters push_back, adds an element to the list, releases the mutex and gets pre-empted (on line 25 of the first example)
        3. Reader B enters pop_front, sees the queue is not empty, pops the element and leaves
        4. Writer W signals the condition, waking up Reader A
        5. Reader A wakes up but the queue is empty, goes back to sleep, causing an unnecessary context switch and thrashing the TLB
    

The code is "correct" either way, but it's now optimized for the rare
platforms that have a badly implemented condition variable.

I would say it's usually better to signal your condition variables with
mutexes locked unless you _know_ you're running on a platform with flaky
condition variables. Easier to guarantee correctness and avoid spurious
wakeups that way.

related note: cppreference.com is a terrible resource. It was worse 10 years
ago, but it hasn't improved much. I would not trust the weasel wording ("some
implementations" etc.) in the link you posted; what it says about pthreads
contradicts the pthreads man pages (which encourage signalling while holding
the lock).

~~~
no-bugs
From what I've seen (YMMV), the chances of that happening are MUCH smaller
than the chances of getting a context switch under the lock (with lots of
threads running into this lock and having their own context switches), because
of spending more time under the lock than is really necessary (and ANY call,
especially a kernel call at 300+ clocks, is a LOT of time). Strictly speaking,
it needs to be measured, but until that point I'm keeping my mutex locks as
small as possible.

------
Myrmornis
Do educated adults find the cartoons of hares to usefully or enjoyably
complement the text?

~~~
aarongolliver
Some, yes. [0]

[0] [http://i.imgur.com/Y5X3h96.png](http://i.imgur.com/Y5X3h96.png)

~~~
Myrmornis
Haha OK, thanks! I don't mind the cartoon of the queue at the top but for some
reason I don't like having a little cartoon lagomorph appear in a speech
bubble by the paragraphs while I'm reading.

