
Lock-Free Bugs - stablemap
https://research.swtch.com/lockfree
======
davidtgoldblatt
Here's a fun related issue: a common Linux mutex-implementation strategy for
dealing with this issue (waker-waiter races) is to release the mutex before
futex-waking any threads blocked on it. Since the waker doesn't know for sure
whether a waiter is present at the time of waking, a locking thread may fail
its futex-wait attempt, acquire the mutex, and destroy it, freeing the mutex's
memory back to the allocator. In the end the waker issues a wakeup call to an
address no longer occupied by a mutex.

As a result, the mutex implementation might spray futex-wake calls at
arbitrary addresses not owned by the implementation. This means that other
libraries in the same process can't rely on accurate wakeups, even when it
would otherwise be algorithmically correct to do so (unless they have some way
of guaranteeing that their synchronization locations never share an address
with other libraries).

If you've ever wondered why pthreads-style condition variables allow spurious
wakeups, this is one reasonable situation in which they can occur.

------
valarauca1

        “By June 1949, people had begun to realize that it was 
         not so easy to get a program right as had at one time 
         appeared. … It was on one of my journeys between the 
         EDSAC room and the punching equipment that the 
         realization came over me with full force that a good 
         part of the remainder of my life was going to be spent 
         in finding errors in my own programs.” - Wilkes 
    

That cuts deep

------
RossBencina
Interesting read. It's about a lock;free bug (freeing a mutex while it is
locked) in code using a custom mutex implementation.

------
eloff
This is not a lock-free bug. But it does illustrate how hard it is to get
thread synchronization correct. The only bug that ever beat me was a lock-free
bug in a concurrent skip list. I spent two months of evenings and weekends
trying to find it. I wrote special logging and visualization tools to try to
gain insight. The bug would happen roughly once every hundred billion
operations. In the end I decided it was easier to change the entire algorithm
than to fix the bug.

~~~
mikeash
It's not a lock-free bug, but it is a lock, free bug.

My favorite lock-free bug involved a concurrent ring buffer. At one point the
code had to figure out how much data to read, which was either the size of the
buffer passed in, or the amount of data available in the ring buffer,
whichever was smaller. So the code did something like:

    toRead = min(requestSize, availableData)

But, surprise! The min macro was a custom job using the typical bad
implementation of:

    #define min(a, b) ((a) < (b) ? (a) : (b))

But this is no problem as long as the two expressions have no side effects,
right? Wrong! Every so often, another thread would write to the ring buffer
between evaluating a<b and evaluating a or b the second time. And every so
often, this would happen when availableData was smaller than the request size,
but then became larger after writing the additional data. In those rare
circumstances, the read function would use that too-large availableData as the
amount of data to copy, and write past the end of the passed-in buffer.
Hilarity then ensued.

I deleted the custom min macro and used the (properly implemented) system-
provided one, and all was well.

~~~
jjnoakes
Your bug doesn't seem to be related to the 'min' macro, but to the fact that
your lock-free algorithm is not actually lock-free: if one thread can update
an 'availableData' variable while another thread can read it, make a choice
based on it, and then act on that choice, then that's not lock-free; that
requires a lock to keep the read, decision, and action "atomic".

I'm skeptical that using some other 'min' implementation is a real fix to the
root of your problem, but without knowing more, I can only speculate.

~~~
mikeash
The algorithm is sound given a proper min() implementation. The reader just
needs to read at most however much data is available at the moment or however
much data the passed-in buffer can hold. If the available data increases
during this operation, it's no problem, because the reader is allowed to read
less than all of the data available.

For some concrete numbers, if the ring holds 4 bytes, the caller passes in a
6-byte buffer, and a concurrent writer adds 4 more bytes, it's OK if the
reader only returns 4 bytes. It would also be OK if the reader sees the new
data and returns 6 bytes. It is not OK if the reader returns 8 bytes, which
the incorrect min macro could cause.

In this scenario, the incorrect min macro could return 4, 6, or 8, despite the
fact that one of the parameters was a constant 6 and the result should never
exceed 6. The correct min macro always returns 4 or 6.

~~~
jjnoakes
I see. So you are relying on single reads of your shared variables. They must
be suitably atomic on your architecture (no word-tearing, no out-of-order
visibility, etc.).

I would think that if you had all the proper barriers in place to do the reads
and writes atomically, though, that you would never have been able to use the
min() macro in the first place. You would have had to do your barrier reads
into local variables first, and then the min() macro on those local variables
would have produced the right result.

Unless I'm missing something. Like I said, it all depends on your
implementation, but in general, this sure sounds like the min() macro wasn't
at fault. Perhaps your specific scenario is much more constrained (single
architecture, no barriers required, non-portable, etc).

~~~
mikeash
Most lock-free code I've seen is pretty architecture dependent and often non-
portable, and this was no exception. I should also note that it was only
intended to be correct for a single reader and a single writer.

Given those constraints, it's not all that difficult. The amount of data in
the ring buffer gets stored in a properly aligned word-sized field which the
architecture guarantees will have atomic reads and writes. You just need a
write barrier after writing new data and before updating the amount of data
available, and a read barrier after reading the amount of data available and
before actually reading the data. The read code would look something like:

    amountToRead = min(outBufferSize, availableData)
    barrier()
    memcpy(outBuffer, ringBuffer, amountToRead)

(Ignoring the tricks you have to play to deal with wraparound, just to keep
things simple.)

This works fine as long as min() actually evaluates to the smaller of the two
numbers passed in. The moment it evaluates to a number larger than
outBufferSize, you have a buffer overrun.

~~~
alfalfasprout
Eh, the portability is actually getting a lot better nowadays. Most consumer
and server processors released in the last few years support double-width
compare-and-swap. Atomic increment, load, store, etc. are also pretty
widespread now.

The key, as someone said above, is that you cannot leave your code ambiguous.
In C++ especially, the atomic primitives can compile to different instructions
depending on how you use them, so it's important to be very explicit.

One thing that can be a pain is memory fences. The guarantees can be different
on different CPUs so you can run into weird bugs. In my case, generally I
compile to target particular CPUs on servers we own so it's not a big deal.
But I can imagine for consumer software this can be a huge pain.

~~~
mikeash
I haven't seen too much recent stuff. I made this fix probably around 2004,
and the code was at least a couple of years old at that point.

We initially targeted PPC Mac, then it eventually expanded to x86 Mac and ARM
iOS. All fairly similar when it comes to simple stuff like this, the barriers
are straightforward when needed, and no need for DCAS or similar fanciness.
(Indeed, no need for any CAS at all.)

------
beagle3
mods (and russ..): perhaps better to change the title to "lock;free" or
"lock+free" bugs (or something else entirely). The current title "Lock-Free
Bugs" is confusing since the article has nothing to do with lock freedom.

~~~
cespare
I read it as an intentional play on words.

------
fantasticsid
I really enjoyed reading it! System programming is hard :)

