

A Fast Wait-Free Hash Table that Scales Linearly to Hundreds of Threads - caustic
http://www.youtube.com/watch?v=WYXgtXWejRM

======
kmavm
If you are into wait-free and lock-free data structures, you owe it to
yourself to get Herlihy and Shavit's "Art of Multiprocessor Programming."

[http://www.amazon.com/Art-Multiprocessor-Programming-
Maurice...](http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-
Herlihy/dp/0123705916)

Not only does it provide a well-debugged, well-written collection of wait-free
data structures; it teaches you to _think_ about concurrency in a structured
way. The mutual exclusion chapter doesn't teach you which pthread routines to
use, but instead tells you how to implement mutual exclusion, and the trade-
offs inherent in, e.g., spin locks vs. queueing locks. A significant bonus is
Maurice Herlihy's distinctive prose voice: correct, concise, and funny in that
order. It is the best computer book I've read recently, and the one I
recommend to all colleagues who are trying to take their concurrency chops up
to the next level.

(I took a course from Herlihy in '99, but am otherwise unaffiliated.)

~~~
nkurz
I just got this book as an interlibrary loan based on this recommendation, and
so far I'm not finding it that helpful. This isn't because it's a bad book: to
the contrary, just as you say it's very readable. But it's also very strongly
a book about concurrency in Java.

My first thought was that this wouldn't matter --- an algorithm is an
algorithm. But for the purposes I'm interested in (multiprocessor malloc
replacements, fast userspace read-write locks, generally low-level Linux on
x86) I'm finding the high-level theoretical view to be impractical.

It's not that that using Java for the examples is a problem, but that apart
from the appendices there is very little discussion of what I find are the
real performance issues: cache misses, memory bandwidth, and processor
pipelines. All of which could be considered hardware specific implementation
details, but all of which make the difference between a 'provably correct
solution' and one that actually performs well in the real world.

Anyway, a fine book, but perhaps better titled "The Theory of Concurrency in
Java". I'll keep reading it, and I'm sure I'll learn from it, but I'll also
keep looking for something more nitty and gritty.

------
bensummers
Download here: <http://sourceforge.net/projects/high-scale-lib/>

And some relevant blog posts by the author, Cliff Click:

[http://www.azulsystems.com/blog/cliff-
click/2007-03-26-non-b...](http://www.azulsystems.com/blog/cliff-
click/2007-03-26-non-blocking-hashtable)
[http://www.azulsystems.com/blog/cliff-
click/2007-04-23-nonbl...](http://www.azulsystems.com/blog/cliff-
click/2007-04-23-nonblocking-hashtable-source-code)
[http://www.azulsystems.com/blog/cliff-
click/2007-06-03-engin...](http://www.azulsystems.com/blog/cliff-
click/2007-06-03-engineering-hash-table)
[http://www.azulsystems.com/blog/cliff-
click/2007-09-13-more-...](http://www.azulsystems.com/blog/cliff-
click/2007-09-13-more-thinking-about-non-blocking-structures)
[http://www.azulsystems.com/blog/cliff-
click/2008-01-07-addin...](http://www.azulsystems.com/blog/cliff-
click/2008-01-07-adding-transactions-non-blockinghashmap)
[http://www.azulsystems.com/blog/cliff-
click/2008-08-07-just-...](http://www.azulsystems.com/blog/cliff-
click/2008-08-07-just-what-heck-wait-free-algorithm)

His blog is a must-read if you like compilers, running things at scale, JITing
Java, interesting data structures, and generally being Really Clever At
Computing.

~~~
sparky
He also gave a version of the talk linked here as a Google Tech Talk
<http://video.google.com/videoplay?docid=2139967204534450862#>

------
dws
This is actually a two-part talk. At the 50 minute mark, there's a case study
on optimizing a heavily-threaded Java app. It's partially about throwing heaps
of RAM and cores at the problem (using the Azul platform), but it's also a
general refresher on how to attack performance problems that involve lots of
threads.

------
malkia
Now that is something worth patenting... I'm kidding of course (I am all
against them), but it seems non-trivial and non-obvious approach, unlike many
other ridiculous patents.

------
CodeMage
Is there a downloadable version of the video anywhere?

------
itistoday
How does this compare to Clojure's approach?

~~~
barrkel
Clojure relies on persistent containers - i.e. immutable containers where
every mutating operation returns a logically new container, which are cheap
(logarithmic) when implemented in terms trees - with transactional memory
slots, which may refer to such a container. An update might be in the form of
a function mapping the old container value to the new container value (with a
value added or removed), but it may be applied multiple times before it "gets
in", according to transactional semantics (rollback if conflict etc.).

This approach is different; for the specific purpose of concurrent updates to
a hash table, Cliff's approach is far more scalable to multiple threads. But
in many ways, the fact that it's a hash table is beside the point. Cliff is
presenting this as a different way of thinking about lock free concurrent
programming: consider every possible state and transition, and make sure that
they all make sense.

As a way of reasoning about multithreaded programs, Cliff's approach is not as
scalable as Clojure's approach, but it's very useful for thinking about low-
level implementation details.

~~~
itistoday
OK, but let me restate my question then:

Does Cliff's approach result in a faster hash table than Clojure's? Is it
"better"? In some situations or in all? If so, which?

When would you want to use Cliff's instead of Clojure's built in one? Would
you at all?

~~~
barrkel
Far faster. You'd want to use Cliff's version if you have thousands of threads
all potentially mutating the table at the same time.

But thinking in these terms is crude; it's more important to understand the
principles, hence my answer. It's like asking which is better: taking the
train or driving in a car?

~~~
itistoday
Sure, I understand, but I don't think my question was crude. I agree it's
important to understand the principles.

I understand the basic idea of how Clojure's data structures work with STM and
how they're immutable and how a "new" data structure is created upon mutation
without copying the old one. But that understanding isn't helping me.

I don't understand Cliff's version well enough to be able to perform
benchmarks in my head, nor do I understand either of the data structures well
enough to understand the principles (as you said) to know when to pick one
over the other.

So you said that when thread-count > 1000 I should pick Cliff's? Is it that
simple? I would suspect not. What if there's only 2 threads involved? I'd just
like a generalized overview of which is better for what types of situations,
and by all means, to understand the "why" of it would be great too.

Or is the idea that because Cliff's version is mutable and does the whole
"conflating state w/identity" thing, that it has no place in Clojure at all?

~~~
barrkel
With STM and persistent data structures, you're fundamentally limited by how
fast you can change a single memory location; and you can't freely modify the
memory location because of the risk of missed updates. Cliff's table operates
concurrently over the whole data structure - there's no single bottleneck,
rather every slot in the table is potentially an update location.

When I said thousands of threads, I was stressing the massively concurrent
potential of Cliff's design, but more importantly, the point where the
hardware is fully subscribed and physical hardware threads are frequently
trying to update the data structure. If there are only 2 threads involved on a
dual-core machine, depending on everything else (constant factors), I would
still expect Cliff's table to be faster.

Clojure's approach is at a different level of abstraction than Cliff's
approach. Probably a better analogy would be whether it's better to
manufacture a car or use a travel agent. Not everybody could safely
manufacture a car, but almost anyone can use a travel agent with moderate
phone, social and organizational skills. But they also solve problems with
radically different scopes.

