

A spin lock in C that does not require a machine level exchange instruction - keithgabryelski
https://github.com/keithgabryelski/spin-lock

======
colanderman
(EDITED; misread algorithm earlier.) No, sorry, that doesn't work. Consider
the following sequence:

1\. A sets g1 = A

2\. A reads g2 = none

3\. A reads g1 = A

4\. B sets g1 = B

5\. B reads g2 = none

6\. B reads g1 = B

7\. A sets g2 = A (read in #3)

8\. A reads g1 = B

9\. A retries

10\. A reads g2 = A

11\. A succeeds

12\. B sets g2 = B (read in #6)

13\. B reads g1 = B

14\. B reads g2 = B

15\. B succeeds

You also are missing memory fences, which are needed to ensure proper ordering
of memory operations. (Otherwise the processor is free to speculate reads and
delay writes between cores.) See
[https://www.kernel.org/doc/Documentation/memory-
barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
for a thorough explanation.

~~~
keithgabryelski
the order you supply here can not happen (at least if i understand what you
are saying)

if step #9 happens then #10 will be skipped by the "if (state->g2 ==
SPIN_LOCK_NO_LOCKER) {"

what am i missing?

~~~
wzdd
Thread A sets g1 to A (line 31), thread A enters the "if" statement (line 32),
thread A reads the value of g1 and is about to write that value to g2 but gets
pre-empted (half of line 33).

Thread B sets g1 to B (line 31), thread B enters the "if" statement (because A
hasn't yet written g1 to g2, line 32), thread B reads the value of g1 (half of
line 33) and writes that value to g2 (the other half), then does its tests and
returns OK.

Thread A sets g2 to A (the other half of line 33). Thread A does its tests and
notices that g1 != g2. Thread A returns to the top of the loop and notices
that g2 == A (line 22). Thread A returns OK.

This is assuming that reads are strongly ordered between processes and cores,
and that code executes in-order, none of which is true.

------
jacquesm
Hello dear Keith,

I can see a few problems with this code, please consider the following
constructive criticism.

    
    
      for (;;) {
        if (state->g2 == locker_id) {
          return;
        }
        while (state->g2 != SPIN_LOCK_NO_LOCKER) {
          if (spin_wait_callback != NULL) {
            spin_wait_callback();
          }
    

You don't need this 'continue', the loop will do that just fine without it.

    
    
          continue;
        }
    

The following four lines are problematic:

    
    
        state->g1 = locker_id;
    
        if (state->g2 == SPIN_LOCK_NO_LOCKER) {
    

Imagine the consequences of a task switch _right here_.

    
    
          state->g2 = state->g1;
    

This if does nothing.

    
    
          if (state->g1 == locker_id) {
    

And you're going to do the following if without all the extra conditionals
above when you do not return from here the next time you iterate through the
loop. So all that extra testing of state->g1 above does nothing.

    
    
            if (state->g2 == locker_id) {
              return;
            }
          }
        }
      }
    
    

I think I can see what you're trying to do here but once you fix the basic
problems with the code and you test it under very high load locking and
unlocking a resource you'll find that fairly soon two threads will be
accessing the resource at the same time.

Try a very tight loop that locks a counter, does a load, increment, store on
it and then unlocks the counter again. Keep separate tallies per thread and
use a third thread to keep track of the expected sum and the actual sum.

That should show you reasonably quickly why atomicity in the basic operation
of acquiring a lock is a base requirement.

Locking issues almost never show up when you are running lightly loaded on a
single core. But the lack of ensuring all cores see the same information, that
race conditions can't occur and that instructions are executed in the right
order will show up with a terrible certainty under load on multiple cores.

~~~
keithgabryelski
so, the attempt here (line 32 which you believe is problematic) I think can be
pre-empted.

if thread 1 is pre-empted by thread 2 and thread 2 is able to get to the same
point (line 33) then G1 will be equal to 2. both threads will assign G2 = 2
and thread 2 will win the race -- thread 1 will loop and begin to spin again.

have i missed something?

~~~
jacquesm
Yes, task switches unfortunately do not happen at the level of 'C' code but
between machine instructions!

Run your compiler -S and save the output, then try to reason through the
consequences of a context switch in between every instruction with multiple
threads contending for the lock. And have a look at the manual for your CPU to
learn about out-of-order execution and cache coherency between multiple cores
/ cpus.

Some interesting reading on the subject:

[https://www.kernel.org/doc/Documentation/memory-
barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)

If you want to do this entirely in userspace and you must avoid atomic
instructions for some reason consider abstracting out your locking to a
separate process and use IPC, first come first serve.

------
jpollock
Both the compiler and CPU will reorder operations. Proper locking requires a
memory barrier which will prevent both CPU and compiler from doing that. For
more information I recommend reading up on double checked locking. DDJ had a
good pair of articles the first where they presented the faulty algorithm, the
second where they talked about why it was broken.

C/c++ doesn't have many guarantees about operation ordering.
[http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLo...](http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)

------
colin_mccabe
There seems to be an implicit assumption in this code that writes which one
thread makes will be seen by other threads in the same order. This is not the
case on modern architectures. You need both compiler and machine-level memory
barriers to ensure this. C's "volatile" keyword does not provide either.

The Linux kernel guys wrote an in-depth explanation of why volatile doesn't
work for this use-case here:
[https://www.kernel.org/doc/Documentation/volatile-
considered...](https://www.kernel.org/doc/Documentation/volatile-considered-
harmful.txt)

------
comex
For real solutions to this problem, see:

[http://www.di.ens.fr/~pouzet/cours/systeme/bib/dijkstra.pdf](http://www.di.ens.fr/~pouzet/cours/systeme/bib/dijkstra.pdf)

[http://phoenix.goucher.edu/~kelliher/cs42/sep30.html](http://phoenix.goucher.edu/~kelliher/cs42/sep30.html)

------
tsukikage

        if (state->g2 == SPIN_LOCK_NO_LOCKER) {
          state->g2 = state->g1;
          if (state->g1 == locker_id) {
    		if (state->g2 == locker_id) {
    			return;
    		}
          }
        }
    

You need to insert memory barriers between the writes and the reads. Without,
on systems with multiple cores, other cores may not see the state changes in
program order, which will make it possible for multiple threads to conclude
they won the race.

[http://en.wikipedia.org/wiki/Memory_barrier](http://en.wikipedia.org/wiki/Memory_barrier)
[http://en.wikipedia.org/wiki/Weak_consistency](http://en.wikipedia.org/wiki/Weak_consistency)

~~~
mgraczyk
More importantly, the explicit barriers are needed on ARM chips where writes
are permitted to be reordered with reads.

~~~
tsukikage
Also PowerPC and any other weakly consistent architecture, natch.

------
Moral_
The example code he provides doesn't even compile:

    
    
         real_test.c: In function ‘main’:
         real_test.c:14:10: error: too many arguments to function ‘spin_allocate’
         spin-lock.h:20:32: note: declared here
         real_test.c:16:3: error: too few arguments to function ‘spin_lock’
         spin-lock.h:19:13: note: declared here
    

Easy to fix, just annoying.

------
asveikau
It seems like this is just storing a thread id in some shared variables a few
times and double checking that your id "made it" and you win the race. My
understanding of why this doesn't work is without atomics or memory barriers,
what you read might not reflect what another core has in its cache, and when
you write, it might not be immediately visible to others. So while in your
thread's view of the world you might have successfully written those 2 words
and won the race, it might not matter to the other guy. Though this depends on
the consistency model of the CPU. The table here comes to mind:
[http://en.m.wikipedia.org/wiki/Memory_ordering](http://en.m.wikipedia.org/wiki/Memory_ordering)
\- my memory is that x86 is generous in how much it lets you get away with
this, you might have more trouble running on say, power.

------
nkurz
Here's a copy of the functions in question so that it's easier to see what
people are talking about:

    
    
      void 
      spin_lock(struct spin_lock_state *state, int locker_id,
                void (*spin_wait_callback)()) {
        for (;;) {
          if (state->g2 == locker_id) {
            return;
          }
          while (state->g2 != SPIN_LOCK_NO_LOCKER) {
            if (spin_wait_callback != NULL) {
              spin_wait_callback();
            }
            continue;
          }
          state->g1 = locker_id;
          if (state->g2 == SPIN_LOCK_NO_LOCKER) {
            state->g2 = state->g1;
            if (state->g1 == locker_id) {
              if (state->g2 == locker_id) {
                return;
              }
            }
          }
        }
      }
    
      void
      spin_unlock(struct spin_lock_state *state, int locker_id)
      {
        state->g2 = SPIN_LOCK_NO_LOCKER;
      }

------
frozenport
I know how to do a spinlock in pthreads, and have done it using a while loop
and what I hope was an atomic instruction.

What is a `machine level exchange instruction`?

~~~
mbell
> What is a `machine level exchange instruction`?

[http://en.wikipedia.org/wiki/Compare-and-
swap](http://en.wikipedia.org/wiki/Compare-and-swap)

------
michaelmior
I'm confused why the description refers to it as a "user level" spinlock. Are
there some architectures where exchange is a privileged operation?

~~~
tsu
compare-and-swap instruction itself may not be privileged operation but user
space spin-lock seldom work in user-space (it's dangerous at least), because a
user thread could be preempted.

~~~
jacquesm
What's even more dangerous, on a single core machine or one that is lightly
loaded without any aggressive contention for the lock it may even _seem_ to be
working.

------
keithgabryelski
a reasonable primer:
[http://en.wikipedia.org/wiki/Spinlock](http://en.wikipedia.org/wiki/Spinlock)

