

Barrier: Multithreading bugs are very delicate - l0stman
http://ridiculousfish.com/blog/archives/2007/02/17/barrier/

======
haberman
Please read the Linux documentation on memory barriers instead of this:
[http://lxr.linux.no/linux+v2.6.35/Documentation/memory-barriers.txt](http://lxr.linux.no/linux+v2.6.35/Documentation/memory-barriers.txt)

The Ridiculous Fish guy is clearly very smart and writes lots of interesting
stuff, but this is obviously not his area of expertise, and you can't afford
to learn from someone who has any confusion on the topic.

In his conclusion he makes what I would consider a highly misleading
comparison between locks and memory barriers. He calls locks "tanks"
("powerful, slow, safe, expensive, and prone to getting you stuck"). About
memory barriers he says: "Memory barriers are a faster, non-blocking, deadlock
free alternative to locks. They take more thought, and aren’t always
applicable, but your code’ll be faster and scale better."

But memory barriers aren't an alternative to locks at all. Locks let multiple
threads _write_ to shared memory. Memory barriers by themselves aren't very
useful; most lock-free algorithms need atomic operations like compare-and-
swap, which are comparable in cost to locks (indeed, locks are implemented in
terms of atomic operations).
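To make that last point concrete, here is a minimal sketch (illustrative C++, not from the post) of a spinlock built from a single compare-and-swap, showing how a lock rests on atomic operations; a real mutex additionally parks waiters in the kernel:

```cpp
#include <atomic>

// Illustrative only: a spinlock implemented with compare-and-swap.
struct SpinLock {
    std::atomic<bool> locked{false};

    void lock() {
        bool expected = false;
        // Retry the CAS until we flip the flag from false to true.
        while (!locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire)) {
            expected = false;  // CAS overwrites 'expected' on failure
        }
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```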

~~~
barrkel
Failed atomic operations aren't as expensive as failed lock acquisitions
though. Locks aren't just implemented in terms of atomic operations - they
need the kernel too. They potentially involve blocking the thread, a context
switch out, a context switch back, etc. before being able to make progress
again.

~~~
haberman
It's true that lock-free algorithms don't block, but they generally degrade
under contention too, just in a different way. Whereas a contended mutex
blocks, a contended lock-free data structure can cause the compare-and-swap
step to fail a potentially arbitrary number of times (unless the algorithm is
wait-free, which few are AFAIK).
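The retry behavior described above can be sketched like this (illustrative C++; the function name is made up): a lock-free update loops on compare-and-swap, and under contention each failed CAS means another thread won the race, so the loop can run an unbounded number of times:

```cpp
#include <atomic>

// Lock-free (but not wait-free) add: retries CAS until it succeeds.
int lock_free_add(std::atomic<int>& value, int delta) {
    int observed = value.load(std::memory_order_relaxed);
    int attempts = 0;
    // A failed CAS refreshes 'observed' with the current value; retry.
    while (!value.compare_exchange_weak(observed, observed + delta)) {
        ++attempts;
    }
    return attempts;  // how many times we lost the race
}
```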

Also, a surprisingly hard problem with lock-free data structures is knowing
when you can free/unmap any of their memory. Since there is no mutual
exclusion, there is no way of knowing that another thread didn't just read the
address of the thing you want to delete. He could read that address and then
get swapped out for 100 years, and you can't unmap that memory until he gets
rescheduled and finishes his load. Maged Michael published a technique for
dealing with this problem he calls "Safe Memory Reclamation" or SMR.

Don't get me wrong, I like lock-free data structures. I just think it's
important to understand that they have their issues too, and it's not as
though everyone should replace all their mutexes with lock-free structures.

I also think it's important to realize that atomic operations and memory
barriers are _not_ application-level constructs as mutexes are. People should
leave the atomic operations and memory barriers to the experts, and only use
higher-level abstractions in applications, like lock-free stack, queue, etc.
You wouldn't dream of implementing a mutex yourself in real code; the same
should be true of using atomic operations or memory barriers, unless you're
really an expert. One possible exception is atomic increment and decrement for
simple reference counting.
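That one exception might look like this (a hedged sketch; the type and function names are hypothetical, not from any particular library). Even here the memory-order choices are the kind of subtlety the comment warns about:

```cpp
#include <atomic>

// Illustrative atomic reference counting.
struct RefCounted {
    std::atomic<int> refs{1};  // starts owned by its creator
};

void retain(RefCounted* p) {
    // The increment only needs atomicity, not ordering.
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

bool release(RefCounted* p) {
    // The decrement needs release semantics, and the last decrementer
    // needs acquire semantics before tearing the object down.
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete p;
        return true;  // object was destroyed
    }
    return false;
}
```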

------
abstractbill
This was a very interesting post, but to be honest it didn't make me think "I
need to learn more about multithreading", it just convinced me that I need to
continue to stay away from multithreading whenever at all possible [1]. Having
programs run in a way that's so far away from the way you would expect _can't_
be the right way to do things.

[1] I tend to use processes and IPC whenever I can, for example.

~~~
viraptor
I agree. I've been in situations many times where (smart) people said "this
is ok, this situation is very simple to manage with threads", then discovered
some deadlocks a week later. It _is_ like juggling chainsaws [1]. I try very
hard to avoid multithreading - most of the high-performance code I write is
single threaded by default. I know MT programming enough to avoid many
problems, but also know it well enough to see it's a very costly trap, if it
can be avoided at all...

[1] [http://www.thecodist.com/article/writing-multithreaded-code-is-like-juggling-chainsaws](http://www.thecodist.com/article/writing-multithreaded-code-is-like-juggling-chainsaws)

~~~
api
The best way to do multi-threaded programming is to find some way to make your
task data-parallel or divide it up into work units using a technique like
MapReduce and then divide it into separate processes or separate worker
threads that run mostly like separate processes.

If you have a lot of critical sections all over the place, you're probably
doing it wrong.
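A minimal sketch of that data-parallel pattern (illustrative C++, assuming the work is a simple reduction): split the input into disjoint chunks, give each worker its own chunk and its own output slot, and only combine after join(), so no critical sections are needed:

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Each worker sums a disjoint slice into its own slot; no shared writes.
long parallel_sum(const std::vector<int>& data, int workers) {
    std::vector<long> partial(workers, 0);
    std::vector<std::thread> pool;
    size_t chunk = (data.size() + workers - 1) / workers;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            size_t begin = w * chunk;
            size_t end = std::min(data.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i)
                partial[w] += data[i];  // writes only to this worker's slot
        });
    }
    for (auto& t : pool) t.join();
    // Combine the per-worker results after all threads have finished.
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```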

The only exception is when you're forced to implement network code using a
thread-per-connection or thread-per-socket model, in which case you might end
up having to have your client/server threads work within your app's regular
workflow. Icky, and should be avoided when possible.

~~~
houseabsolute
Separate processes work fine as long as the constant overhead of your harness
is much smaller than the amount of data used by your worker. But I find that
this is often not the case at my workplace, and the immutable data, including
the program code itself, is much larger than the work unit input data. In a
situation like this, threads (or fork) are basically the only way to make use
of the additional cores without wasting the remainder of the RAM on the
machine.

~~~
jerf
I don't know the details of what you are doing, but the other possibility you
may consider is using an immutable-functional language, which can share
immutable values without duplication but doesn't let you get into too much
trouble. (Of course you're probably on top of things now, but when I need to
write a new program like this I reach for the immutable-functional languages
now unless I absolutely can't use them. Fortunately, Erlang is an option at
work for me.)

~~~
houseabsolute
Sure, but you'd still need a thread or a fork to take advantage of the
immutable object from more than one execution context without duplication.

~~~
jerf
The way you are using those words leads me to believe you don't know how they
work in these languages; I feel you are using them synonymously with the
operating system's idea of threads and forks. In fact that is not true; both
Haskell and Erlang can run in a single OS process, while handling threads
internally to their own runtime. They don't all the time, but they can. In
more conventional languages this is called green threads, in the functional
world it's simply how it is done.

I did say "I feel" because I could be wrong, but there are still a lot of
people who think that the operating system's idea of a thread is the only
possible meaning of the term (I meet them every time Node.js comes up and the
Node.js partisans argue passionately against operating system threads), but
those days are long gone. So even if you do understand this, there are others
who don't.

------
houseabsolute
The real takeaway is:

\- Don't write this stuff on your own.

\- Use someone else's mutex, and someone else's non-blocking datastructure.

\- Really, don't write your own.

Threading is not that hard unless you insist on leaving the safety off and
aiming it at your foot.

------
xilun0
This guy claims that x86 is strongly ordered, while it is not...

------
knodi
Erlang!

