
Measuring context switching and memory overheads for Linux threads - ingve
https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/
======
amluto
I didn’t see a mention of kernel version. On Linux 4.14 and newer, if you
context switch between different processes, PCID gets used to try to avoid a
TLB flush, which makes your code run faster after the switch. It shows up more
if you do something a bit more involved than just switching right back to the
process you came from.

And yes, damnit, _this_ is why I made Linux start using PCID. It had nothing
to do with the Meltdown mess, and it was in a released version of Linux before
I learned about Meltdown.

------
pron
There are several issues with this analysis. At the core is the implicit
assumption that programming with lightweight threads looks the same as
programming with heavyweight threads. But the goal of lightweight threading is
to allow the use of a _software_ unit of concurrency (thread) to easily map to
a _domain_ unit of concurrency (session, request, transaction, service call,
etc.):

1. When programming with lightweight threads we want thread creation to be
very, very cheap; pretty much as cheap as a context switch, because many
concurrency units are transient (e.g. every incoming request can result in
hundreds of outgoing concurrent service requests).

2. When programming with lightweight threads, it is entirely possible that
many of them will run for significantly less than 1-5us between blocking
operations.

3. When programming with lightweight threads you have hundreds of thousands
or millions of concurrently active threads. Kernel scheduling overhead should
be measured under those conditions, not with only two threads.

4. Finally, when relying on virtual memory, the smallest unit is the page.
But when working with potentially millions of lightweight threads, many of
them would have stacks even smaller than 1K.

This is not to say that kernels will not be able to make their threads
sufficiently lightweight in the future to enable a lightweight-concurrency
programming style, but for now, they don't seem competitive with language-
runtime implementations.

------
tzs
> These code samples measure context switching overheads using two different
> techniques:

> 1. A pipe which is used by two threads to ping-pong a tiny amount of data.
> Every read on the pipe blocks the reading thread, and the kernel switches to
> the writing thread, and so on.

> 2. A condition variable used by two threads to signal an event to each
> other.

Another technique I've seen used is to have some shared memory and several
threads running loops that just copy the CPU time stamp counter to the shared
memory. Call these the "writer" threads.

Another group of threads (the "readers") run loops that read the counter,
subtract the value in shared memory from it, and print the difference whenever
it is lower than any difference they have seen before. Each reader thus prints
a decreasing sequence of times, bounded below by the minimal thread switch
time.

