

Practical Lock-Free Buffers - Sandman
http://www.ddj.com/hpc-high-performance-computing/219500200

======
kmavm
While this provides perfect ordering information, it does so at the cost of
bouncing a cacheline (the head of the queue) around the entire machine. That's
fine if perfect ordering information is really, really important, but if a
lack of probe effect is more important, try the DTrace approach: each thread
writes into a private buffer, with a best-effort timestamp (say, the CPU's
TSC). A drain then merge-sorts all the threads' buffers by timestamp.
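
A rough sketch of that idea, not DTrace's actual code: `TraceEvent`, `ThreadBuffer`, and `drain` are made-up names, and the timestamp is just an integer standing in for something like a TSC reading. It assumes each thread appends in time order, so every private buffer is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: a best-effort timestamp plus a payload.
struct TraceEvent {
    std::uint64_t timestamp;  // e.g. a TSC reading taken by the writing thread
    int thread_id;
    int payload;
};

// Each thread appends only to its own buffer, so writers share no cacheline.
using ThreadBuffer = std::vector<TraceEvent>;

bool earlier(const TraceEvent& a, const TraceEvent& b) {
    return a.timestamp < b.timestamp;
}

// Drain: merge every per-thread buffer into one timestamp-ordered stream.
std::vector<TraceEvent> drain(const std::vector<ThreadBuffer>& buffers) {
    std::vector<TraceEvent> merged;
    for (const ThreadBuffer& buf : buffers) {
        // Assumption: a thread's own events are already in time order.
        assert(std::is_sorted(buf.begin(), buf.end(), earlier));
        std::size_t mid = merged.size();
        merged.insert(merged.end(), buf.begin(), buf.end());
        // Merge the sorted prefix with the newly appended sorted run.
        std::inplace_merge(merged.begin(),
                           merged.begin() + static_cast<std::ptrdiff_t>(mid),
                           merged.end(), earlier);
    }
    return merged;
}
```

The ordering is only as good as the timestamps, which is the trade being made for not sharing a queue head between writers.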

------
agazso
I don't get it. At the beginning of the article he talks about the dangers of
locking IPC (inter-process communication) calls, then to prevent this, he
describes a lock-free mechanism using the CAS primitive, which can be used
only in-process, not between processes.

The TransferString function he proposes seems overly complex to me; using
locks would make it simpler and even faster. The code would look almost like
"lock(); memcpy(); unlock();", which is not prone to deadlocking.
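
Roughly what I have in mind, as a sketch only: `LockedBuffer` and `transfer_string` are names I made up, not the article's code, and `std::mutex` only works between threads. Between processes you would need a lock that lives in shared memory, such as a pthread mutex created with `PTHREAD_PROCESS_SHARED`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical fixed-size buffer guarded by a single mutex.
struct LockedBuffer {
    std::mutex lock;
    char data[256];
    std::size_t used = 0;
};

// Append len bytes under the lock. Returns false if they do not fit.
bool transfer_string(LockedBuffer& buf, const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::lock_guard<std::mutex> guard(buf.lock);         // lock()
    if (len > sizeof(buf.data) - buf.used) return false;
    std::memcpy(buf.data + buf.used, src, len);          // memcpy()
    buf.used += len;
    return true;                                         // unlock() when guard dies
}
```

Only one lock is ever held and nothing blocks while holding it, so there is no lock ordering to get wrong.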

I even downloaded the source code to the article, but he uses threads to test
it. Anyone care to explain this thing to me?

~~~
mustpax
You can use the CAS primitive between processes, as long as the word it
operates on lives in memory that both processes have mapped. It is guaranteed
to be atomic, which is why semaphores are usually built on top of CAS.

He essentially packs 4-byte strings into integers (assuming 32-bit integers)
in an atomic manner. In the strictest sense there are no locks here, but at
the end of the day he treats each memory location as its own lock. He also
does some busy waiting, which is just a good old spin lock:

    while(!InsertAt(data,insertedAt));

Interesting concept but nothing too new. I would also caution against using
this code without understanding it first. I don't see much error checking in
there.
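
To make the "each memory location is its own lock" point concrete, here is a minimal sketch using `std::atomic` rather than the article's code. `slots`, `pack4`, `insert_at`, and `insert` are hypothetical names, and it assumes an all-zero word means "free", so the text must not start with a zero byte.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSlots = 8;
constexpr std::uint32_t kEmpty = 0;  // assumption: an all-zero word marks a free slot

std::atomic<std::uint32_t> slots[kSlots] = {};  // every slot starts out free

// Pack 1 to 4 bytes of text into one 32-bit word.
std::uint32_t pack4(const char* s, std::size_t len) {
    assert(len >= 1 && len <= 4);
    std::uint32_t word = 0;
    std::memcpy(&word, s, len);
    return word;
}

// Try to claim one slot with a single CAS. Fails if the slot is already taken.
bool insert_at(std::uint32_t word, std::size_t index) {
    std::uint32_t expected = kEmpty;
    return slots[index].compare_exchange_strong(expected, word);
}

// Walk the slots until one CAS succeeds. Returns the slot index, or kSlots if full.
std::size_t insert(std::uint32_t word) {
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (insert_at(word, i)) return i;
    }
    return kSlots;
}
```

Unlike the busy-wait loop quoted above, this version gives up when the buffer is full instead of spinning forever.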

~~~
dmoney
How is CAS guaranteed to be atomic? Is it actually implemented by a machine
instruction, rather than the pseudocode in the article?

~~~
Freaky
Yes, look up cmpxchg and friends.
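
You rarely write the instruction by hand. As an illustration, C++'s `std::atomic` exposes it as `compare_exchange_weak`/`compare_exchange_strong`, which mainstream compilers typically lower to `lock cmpxchg` on x86. The helper name `fetch_add_with_cas` below is made up for this sketch.

```cpp
#include <atomic>
#include <cassert>

// Add delta to counter using nothing but compare-and-swap.
// Returns the value the counter held just before the add.
int fetch_add_with_cas(std::atomic<int>& counter, int delta) {
    // If this fails, the library is emulating atomics with a lock
    // instead of using a hardware instruction.
    assert(counter.is_lock_free());
    int seen = counter.load();
    // On failure, compare_exchange_weak overwrites `seen` with the current
    // value, so the loop retries against fresh data.
    while (!counter.compare_exchange_weak(seen, seen + delta)) {
    }
    return seen;
}
```

The swap either happens in full or not at all, and a failed attempt reports the value it actually found, which is what makes the retry loop safe.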

FreeBSD has atomic.h providing a bunch of different atomic ops, e.g.:
[http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/include/...](http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/include/atomic.h?rev=1.46)

I believe FreeBSD also uses this style of buffer for dmesg, which sadly also
results in a lot of interlacing if two kernel threads are trying to write to
the buffer at once.

------
tptacek
I did something similar at Arbor; single producer, multiple consumers, high-
volume message buffer (individual TCP connections off a monitored ISP core
network). When I got in the door, it was SYSV semaphores. Don't ever use SYSV
IPC. We needed an event loop, so we could do fine-grained timers. Instead of
using locks, we did a distributed commit scheme (using an atomic increment),
just as if we were synchronizing over the network.
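
For illustration only, and emphatically not the Arbor code: one way to get a lock-free single-producer, multiple-consumer buffer with an atomic increment as the commit step. `MessageLog`, `publish`, and `consume` are hypothetical names, and the sketch assumes an append-only log that never wraps, with every consumer seeing every message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCapacity = 1024;  // hypothetical fixed capacity

struct MessageLog {
    int messages[kCapacity];
    std::atomic<std::size_t> committed{0};  // count of fully written messages
};

// Producer (exactly one thread): write the message, then commit it
// with an atomic increment so consumers can see it.
bool publish(MessageLog& log, int msg) {
    std::size_t n = log.committed.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;  // log is full
    log.messages[n] = msg;
    log.committed.fetch_add(1, std::memory_order_release);
    return true;
}

// Consumer (any number of threads): each keeps a private cursor and only
// reads entries below the commit count. Returns false if nothing new.
bool consume(const MessageLog& log, std::size_t& cursor, int& out) {
    assert(cursor <= kCapacity);
    if (cursor >= log.committed.load(std::memory_order_acquire)) return false;
    out = log.messages[cursor++];
    return true;
}
```

The release increment paired with the acquire load is what guarantees a consumer never reads a slot before the producer has finished writing it.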

