
Notes on Lock Free Programming - ingve
https://loonytek.com/2017/03/17/lock-free-algorithms-part-1/
======
ajamesm
I don't like the use of the term "lock free" to mean "locks that are less
costly".

There's a difference between (A) locking (waiting, really) on access to a
critical section (where you spinlock, yield your thread, etc.) and (B) locking
the processor to safely execute a synchronization primitive
(mutexes/semaphores).

CAS is "lock free" only in the sense that it doesn't require the processor to
stop the world in order to flip the mutex boolean. It's still a mutex, and it
still gates access to a critical section, and you still need some kind of
strategy to deal with waiting for the critical section to become available
(e.g., spinlocking, signaling the OS to sleep thread execution).

    
    
      An example of operation using coarse locks would be a method in Java with “synchronized” keyword.
      If a thread T is executing a synchronized method on a particular object,
      no other concurrent thread can invoke any other synchronized method on the same object.
    

That's just putting one giant mutex around all access to the object. Having
finer granularity != "lockless".

~~~
fwilliams
I've always understood lock-free to mean that every worker is guaranteed to
make progress in a (finite) bounded amount of time.

~~~
raphlinus
I like the terminology that Dmitry Vyukov uses[0], where "wait-free" means
that any individual thread is guaranteed to make progress, and "lock-free"
means that some thread is guaranteed to make progress. There's also the weaker
guarantee "obstruction-free" where livelock is a possibility.

[0] [http://www.1024cores.net/home/lock-free-algorithms/introduction](http://www.1024cores.net/home/lock-free-algorithms/introduction)
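As an illustration of the middle guarantee, here is a sketch (mine, not from the linked page) of a Treiber-style stack push in Go: the CAS retry loop is lock-free but not wait-free, because any individual push can lose the race and retry indefinitely, yet every failed CAS means some other thread's CAS succeeded, so the system as a whole always makes progress.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// snode is one stack cell; stack.top is updated only via CAS.
type snode struct {
	val  int
	next *snode
}

type stack struct {
	top atomic.Pointer[snode] // requires Go 1.19+
}

func (s *stack) push(v int) {
	n := &snode{val: v}
	for {
		old := s.top.Load()
		n.next = old
		if s.top.CompareAndSwap(old, n) {
			return // our CAS won; this thread made progress
		}
		// CAS lost: another thread's push succeeded first; retry.
	}
}

func main() {
	var s stack
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(v int) { defer wg.Done(); s.push(v) }(i)
	}
	wg.Wait()
	count := 0
	for n := s.top.Load(); n != nil; n = n.next {
		count++
	}
	fmt.Println(count) // prints 100
}
```

(In Go the garbage collector sidesteps the ABA hazard that a pop would face in C; more on ABA further down the thread.)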

~~~
stcredzero
Couldn't there be an even stronger requirement like "obstruction-free"? I
could imagine code that is "obstruction free" but which severely slows down
execution for itself or other threads.

~~~
dbaupp
Obstruction-free is the weakest requirement, so your use of "even stronger"
confuses me.

In any case, in practice wait-free has this property a bit: the guarantee
that all threads make progress in finite time generally requires them to
execute slower individually, on average. For example, one common strategy is
for thread A to help thread B finish its work, because thread A needs that
result.

~~~
stcredzero
Executing a little slower would be fine. Executing two orders of magnitude
slower can be almost as useless as being obstructed.

------
joshuak
Perhaps an example will help clarify the difference between mutex locking and
CAS (lock-free) operations. Check out Sled[0], an implementation of the
'ctrie'[1] data structure in Go. Sled is a lock-free key-value store. Because
it is lock-free, any number of threads can read and write to it simultaneously
without ever interfering with each other. The application remains parallel at
all times.

Locks cause parallel applications to become single threaded for the scope of
the lock. This means that most of the time you should just put things that
require locks into their own thread and then communicate with that thread from
others.
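A minimal Go sketch of that pattern (the names, like `counterOwner`, are made up for illustration): one goroutine owns the state that would otherwise need a lock, and other goroutines reach it only by sending messages over channels.

```go
package main

import "fmt"

// incReq asks the owning goroutine to add delta and report the total.
type incReq struct {
	delta int
	reply chan int
}

// counterOwner is the only goroutine that ever touches total,
// so no lock is needed around it.
func counterOwner(reqs chan incReq) {
	total := 0
	for r := range reqs {
		total += r.delta
		r.reply <- total
	}
}

func main() {
	reqs := make(chan incReq)
	go counterOwner(reqs)

	reply := make(chan int)
	for i := 0; i < 5; i++ {
		reqs <- incReq{delta: 1, reply: reply}
		fmt.Println(<-reply) // prints 1 2 3 4 5, one per line
	}
	close(reqs)
}
```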

"Do not communicate by sharing memory; instead, share memory by
communicating." \- ancient go proverb

However, that is still potentially a blocking operation, because channels must
block until both goroutines are in sync (with some optional buffering to help
reduce the effect).

With truly lock free data structures, all threads can share data but do not
need to synchronize access. This leads to much, much higher performance.
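For a trivial illustration of that (my sketch, not Sled's code): a shared counter built on a single hardware atomic add, where no goroutine ever waits for another.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addAll has n goroutines each add 1 to a shared counter k times.
// atomic.AddInt64 is a single hardware fetch-and-add: there is no
// critical section for goroutines to queue up behind.
func addAll(n, k int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < k; j++ {
				atomic.AddInt64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(addAll(8, 1000)) // prints 8000
}
```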

[0]: [https://github.com/Avalanche-io/sled](https://github.com/Avalanche-io/sled)

[1]: [https://github.com/Avalanche-io/sled/blob/master/ctries_paper.pdf](https://github.com/Avalanche-io/sled/blob/master/ctries_paper.pdf)

------
raphlinus
Note that the linked paper predates the C++11 / C11 memory model, which I would
consider essential for anyone implementing lock-free algorithms in C or C++
today. It also long predates an understanding of the ABA problem[0].

[0]
[http://www.stroustrup.com/isorc2010.pdf](http://www.stroustrup.com/isorc2010.pdf)

~~~
yongjik
Eh? The ABA problem has been known for a long time. Google Scholar[1] shows a
paper titled "Correction of a Memory Management Method for Lock-Free Data
Structures" from 1995, which in turn cites the "ABA problem" as mentioned in
"System/370 Principles of Operation, IBM Corporation, 1983" (!!).

[1]
[https://scholar.google.com/scholar?q=aba+problem](https://scholar.google.com/scholar?q=aba+problem)

~~~
raphlinus
I stand corrected, thanks! I had the feeling that the ABA problem affected a
lot of the earlier lock-free algorithms, but didn't know the understanding of
it went that far back.

------
chinhodado
I don't quite understand it. When you replace a lock with CAS (compare-and-
swap), aren't you just replacing one synchronization primitive with another?
Is a lock not implemented in the CPU as a primitive instruction, and therefore
less performant? What overhead does locking have compared to CAS?

~~~
devbug
Besides the other benefits mentioned, a spinlock is a user-space construct; you
avoid the overhead of switching to kernel mode, at the cost of losing the
benefits and guarantees those kernel primitives provide. (I'm simplifying
somewhat.)

~~~
netule
On Win32, this has been abstracted into the critical section API, which lives
entirely in user space and is a lightweight mutex. Much of the functionality
implemented in the article is already part of that API (spin-acquire, etc.).

Though from what I've read, on Linux pthreads implements its mutexes in user
space (via futexes) as well, and they are fairly cheap to use.

~~~
gpderetta
Yes, the difference is that spinlocks never enter the kernel, while a good
mutex only does so under contention. In practice most mutexes use a hybrid
approach.

------
nhaehnle
Pet peeve: The example linked list implementation suffers from not using
double-pointers (or pointers-to-pointers).

For example, when the "special cases" of adding an element at the head or tail
are discussed: those special cases disappear entirely when the search
procedure simply returns a pointer to the _next_ field of the preceding
element (or a pointer to the root pointer of the linked list, if the new
element should be inserted at the head).
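A sketch of that pointer-to-pointer idea in Go (single-threaded, just to show how the head special case disappears; a lock-free version would CAS through the returned link instead of assigning):

```go
package main

import "fmt"

type node struct {
	val  int
	next *node
}

// findLink returns a pointer to the link to update: either the
// list's root pointer or some node's next field. Head, middle,
// and tail insertions all look identical to the caller.
func findLink(head **node, val int) **node {
	link := head
	for *link != nil && (*link).val < val {
		link = &(*link).next
	}
	return link
}

// insert does a sorted insert with no special cases.
func insert(head **node, val int) {
	link := findLink(head, val)
	*link = &node{val: val, next: *link}
}

func main() {
	var head *node
	for _, v := range []int{3, 1, 2} {
		insert(&head, v)
	}
	for n := head; n != nil; n = n.next {
		fmt.Print(n.val, " ") // prints 1 2 3
	}
	fmt.Println()
}
```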

------
maykr
This post directly complements the original one:
[http://concurrencyfreaks.blogspot.com/2017/02/bitnext-lock-free-queue.html](http://concurrencyfreaks.blogspot.com/2017/02/bitnext-lock-free-queue.html)

------
stcredzero
Are there any lock-free in-memory spatial databases?

------
jankedeen
Last time I checked, lock-free meant no locks.

