
The Convoy Phenomenon - feross
https://blog.acolyer.org/2019/07/01/the-convoy-phenomenon/
======
PaulHoule
Many systems have a performance vs load curve which, at a given workload, may
have two branches, a high performance branch and a low performance branch.

A good example is a web server which is memory constrained. Once it starts to
swap, performance degrades rapidly, and memory consumption per unit of
throughput goes up dramatically since it takes so much longer to handle each
request.

Thus there is hysteresis: a brief load spike kicks off swapping, puts the
system in a low-performance state, and then the machine stays in the low-
performance state despite having a workload which the system could handle
just fine if it were in the high-performance state.
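The swap-induced hysteresis can be sketched as a toy simulation (all numbers below are made up for illustration, not measurements of any real server): once the backlog crosses a "memory" threshold the server starts "swapping", per-request service slows below the arrival rate, and the backlog never drains even after the spike is over.

```python
# Toy model of swap-induced hysteresis (illustrative numbers only).
# Requests arrive at a steady rate; a one-off spike pushes the backlog
# past a threshold, which flips the server into a slow "swapping" state
# that persists until the backlog fully drains -- which it never does,
# because the slow service rate is below the arrival rate.

def simulate(arrivals_per_tick, spike_at, spike_size, ticks=200,
             swap_threshold=50, fast_rate=2.0, slow_rate=0.8):
    backlog = 0.0
    swapping = False
    for t in range(ticks):
        backlog += arrivals_per_tick
        if t == spike_at:
            backlog += spike_size              # the brief load spike
        # hysteresis: swapping starts above the threshold and only
        # stops once the backlog drains completely
        swapping = backlog > swap_threshold or (swapping and backlog > 0)
        rate = slow_rate if swapping else fast_rate
        backlog = max(0.0, backlog - rate)
    return backlog, swapping

print(simulate(1.0, spike_at=100, spike_size=0))    # no spike: stays fast
print(simulate(1.0, spike_at=100, spike_size=100))  # spike: stuck swapping
```

With the spike, the system ends the run still swapping with a growing backlog, even though the steady workload (1 arrival/tick vs. 2 served/tick when fast) is well within capacity.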

------
cryptonector
> Given a single processor, if a convoy exists when a high contention lock is
> released then the releasor dequeues all members of the convoy from the lock,
> marks the lock as free, and then signals all members of the convoy.

This is a thundering herd problem.
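A toy demonstration of that herd (a sketch using Python's `threading.Condition`, not the paper's actual primitives): each release signals *all* waiters, but only one can take the lock, so the rest are woken just to go back to sleep.

```python
import threading

cond = threading.Condition()
lock_free = False
ready = 0            # waiters that have reached cond.wait()
wakeups = 0          # total times any waiter woke up
acquisitions = 0     # times a waiter actually got the lock
N_WAITERS = 4

def waiter():
    global lock_free, ready, wakeups, acquisitions
    with cond:
        ready += 1
        cond.notify_all()             # tell the releaser we're parked
        while not lock_free:
            cond.wait()
            wakeups += 1              # woke up -- possibly for nothing
        lock_free = False             # won the race: the lock is ours
        acquisitions += 1
        cond.notify_all()             # tell the releaser the lock is taken

threads = [threading.Thread(target=waiter) for _ in range(N_WAITERS)]
for t in threads:
    t.start()

with cond:                            # wait until every waiter is parked
    while ready < N_WAITERS:
        cond.wait()

for _ in range(N_WAITERS):            # release the lock once per waiter
    with cond:
        lock_free = True
        cond.notify_all()             # the herd: every waiter is woken
        while lock_free:              # block until exactly one wins
            cond.wait()

for t in threads:
    t.join()

print(f"{acquisitions} acquisitions, {wakeups} wakeups")
```

Every waiter eventually acquires once, but the wakeup count comes out at least as large as the acquisition count, and typically larger: those extra wakeups are the wasted scheduling work the thundering herd refers to.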

------
andreareina
How does going from awarding the lock in FIFO order to random order help?

~~~
T3OU-736
Switching from FIFO to random order would increase the chances of breaking
the convoy cycle (where a process which initially held a high-contention
resource lock comes back to the end of the queue to repeat the cycle again
and again, since the queue/convoy didn't clear in the time it took for the
process to come back).

Edit: message cut off due to fat-finger

~~~
andreareina
I'm still not making the connection. If processes enter the queue every N
cycles on average, and hold the lock for M cycles, then you'll hit a steady
queue state anytime you have N/M runnable processes. If a waiting process
becomes unblocked every M cycles how does it matter whether it came from the
front, back, or middle of the queue? Isn't the rate of enqueueing and
dequeueing processes the same in any case?

~~~
muststopmyths
At least in Windows, the theory (edit2: theory of lock convoy mitigation) goes
something like this:

- There is a queue of threads contending on a lock.

- There are some threads in that group that release the lock, do some
context-switchless work, and need the lock again.

- In a lock convoy, these would go back to the end of the queue while any
other threads that were waiting on the lock now need to be readied and
context-switched in.

- By instead making the locking unfair and letting the next ready process
grab it, there is a better chance that the threads which have quantum slices
left will finish out their use of the lock and exit the convoy.

Synchronization objects were redesigned in Vista around this mechanism of
taking a ready thread instead of the next thread in a fair queue.
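A toy scheduler model of that fair-vs-unfair difference (made-up cost units, not Windows' actual internals): one running thread releases and re-acquires the lock several times per quantum. Under fair FIFO handoff, every release hands ownership to the queue head, forcing a context switch even though the releaser has quantum left; under the unfair policy, the lock is simply marked free and the still-running thread re-grabs it.

```python
from collections import deque

# Toy model of fair FIFO handoff vs. unfair locking (illustrative
# units, not Windows internals). One running thread releases and
# re-acquires the lock several times per quantum while other threads
# sit in the wait queue.

def context_switches(policy, threads=4, lock_ops_per_quantum=5, quanta=10):
    switches = 0
    waiters = deque(range(1, threads))   # everyone but thread 0 is queued
    running = 0
    for _ in range(quanta):
        for _ in range(lock_ops_per_quantum):
            # the running thread releases the lock...
            if policy == "fair" and waiters:
                # ...fair FIFO: ownership passes to the queue head, so
                # the releaser is switched out and re-queued despite
                # having quantum left -- the convoy.
                waiters.append(running)
                running = waiters.popleft()
                switches += 1
            # unfair: the lock is just marked free, and the still-
            # running thread re-grabs it with no switch at all.
    return switches

print(context_switches("fair"))    # prints 50: every lock op costs a switch
print(context_switches("unfair"))  # prints 0
```

In the fair case every single lock operation pays for a context switch; in the unfair case the running thread finishes its quantum's worth of lock operations without a single one.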

I suppose something similar is at play in the article? I don't see how just
randomness fixes anything.

(edit: formatting)

