
Single Writer Principle (2011) - jorangreef
https://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html
======
chrisweekly
Wonderful stuff, as usual. And, because it's written from first principles, it
holds up almost a decade later.

------
hpoe
So I have heard about the idea of OOP being about passing messages between
objects, rather than just calling functions with parameters, but I can't seem
to build a mental model of it.

How does passing a message to an object differ from just calling an object's
methods with parameters? Are there other good sources for understanding the
difference and rewiring the mental model?

~~~
spooneybarger
A large part of your confusion might be that a lot of OO teaching sources
define message passing as calling a function.

Message passing allows the receiver of the message to run in a different
execution context; Erlang, for example.

Calling an object's methods with parameters is tied to the caller's execution
context.

Message passing is a larger concept than OOP. Microkernels for example are
built around message passing as the means of communication.

~~~
int_19h
> Calling an objects methods with parameters is tied to that execution
> context.

Not necessarily - this depends on the language, or rather, on the object
model. But consider COM or CORBA: you call a method, but the object might be
on a different machine altogether.

------
yeahgoodok
Can't believe this is almost a decade old. I recently tried explaining this
concept during an interview and got a lot of strange looks. Most devs (even
senior) were taught von Neumann architecture and assume all memory is shared.
This is an excellent explainer and one I'll refer to in the future.

~~~
brickbrd
What's a way to implement this in practice? Does the article imply that if one
core were writing the value of a variable (without any locking) and other
cores were simply reading the value of that variable, also without any
locking, it would just work correctly due to the underlying CPU cache
coherency protocols?

------
ncmncm
This principle is the basis for the far better throughput of ring buffers
over other so-called "lock-free" queue structures. Things labeled
"lock-free" typically are not; instead, they rely on locks at the hardware
level that are subject to the same sort of contention seen with software-
visible locks, such as mutexes, and the same sort of stalls, with just smaller
numbers. So, a lock-free queue is often little faster than a locking one.

But a ring buffer suffers from none of these overheads. The reader has to be
sure to keep up with the writer, over a short enough time scale not to "get
lapped"; but you need that anyway.

~~~
troutwine
> Things labeled "lock-free" typically are not; instead, they rely on locks at
> the hardware level that are subject to the same sort of contention seen with
> software-visible locks, such as mutexes, and the same sort of stalls, with
> just smaller numbers.

This isn't quite true. The 'hardware level locks' you're referring to are
memory ordering primitives that allow you to coordinate threads of execution
so that they see memory operations in sequences that come to the same thing.
Your notion of a software-visible lock is a construct built out of these
primitives: the simplest mutex is an Acquire/Release pair.

The difference might seem subtle -- and it is -- but it's very important. A
lock requires that all coordinated threads operate in lockstep with one
another at some point in their execution. A lock-free structure does not: it
is built to allow coordinated threads to see memory in ways that come to the
same thing, but without requiring lockstep. A subclass of these structures is
wait-free, that is, they provably have no 'stalls' where coordinated threads
must wait for the forward progress of one of their cohort before they can
operate -- if I understand correctly what you're referring to as a stall.

I do, however, acknowledge that x86 will coordinate your threads in some cases
because of its strong consistency guarantees. That is a complication.

> So, a lock-free queue is often little faster than a locking one.

This strongly, strongly depends on the environment you're running in, the
nature of your queues, and how contended your structures are. OS-provided
locks will often have hooks into the scheduler, which is a significant win
over naive user-space spins. That's not to say you can't signal the scheduler
yourself about being unable to proceed, and this is often done. I will readily
admit that lock-free structures -- especially wait-free -- are specialty
structures and you're almost surely better off starting with a traditional
locking structure, unless everything you're doing can be done with relaxed
ordering.

> But a ring buffer suffers from none of these overheads.

This is not quite accurate. While there are many designs for ring buffers --
do you want MPSC, MPMC, do you need to detect 'lapping', etc. -- any of the
thread-safe variants I'm aware of will have to participate in some kind of
explicit memory ordering, not to mention memory reclamation.

~~~
ncmncm
"Not quite true", here, meaning not false. Hardware sequencing differs in
detail from mutexes, but has to solve many of the same problems, ultimately
much the same way. If your atomic-swap fails, you have to loop and try again,
which absolutely counts as a stall, particularly counting your branch
misprediction. A mutex that is ever held much longer than that mispredicted
loop would indicate poor design.

And, of course, a single-writer ring buffer is entirely different in
character. Writes cannot stall, and readers can work wholly in parallel
without contention of any kind. (Multi-writer ring buffer? Good one.) New
Intel chips have an instruction for a reader to sleep waiting on a cache
invalidation, allowing the other hyperthread to run unimpeded by its partner's
spin loop.

Lapping is easy to detect with a generation counter if needed, but it is
better just to keep up; detected or not, being lapped generally indicates a
system failure.

------
dang
Discussed just a bit at the time:
https://news.ycombinator.com/item?id=3219193

