Skip lists and splay trees, two frequent suggestions on this thread, are "lesser known" for a reason. Skip lists, because for any given skip list implementation there is almost certainly an encoding of balanced binary trees that outperforms it, even under concurrency. Splay trees, because every balanced binary tree outperforms them; what you get in exchange for not writing and testing balancing code is the drawback that even reading the tree modifies the data structure.
for any given skip list implementation there is most probably an encoding of balanced binary trees that outperforms it
Not just probably, absolutely. Skip lists are probably pretty balanced, which is always less balanced than actually pretty balanced. Their sole claim to fame is their simplicity in implementation compared to actual self-balancing trees.
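That simplicity claim is easy to demonstrate. Here's a minimal sequential skip list sketch in C; the names, the MAX_LEVEL cap, and the p = 1/2 level distribution are my own choices, and this version is not safe for concurrent use:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEVEL 16

typedef struct node {
    int key;
    struct node *next[MAX_LEVEL];   /* forward pointers, one per level */
} node;

typedef struct {
    node head;    /* sentinel; its next[] pointers start each level */
    int level;    /* highest level currently in use */
} skiplist;

/* Flip coins: each node reaches level i+1 with probability 1/2. */
static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;
    return lvl;
}

void sl_init(skiplist *sl) {
    memset(&sl->head, 0, sizeof sl->head);
    sl->level = 1;
}

void sl_insert(skiplist *sl, int key) {
    node *update[MAX_LEVEL];
    node *x = &sl->head;
    /* Walk down the levels, remembering the rightmost node before key. */
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->next[i] && x->next[i]->key < key) x = x->next[i];
        update[i] = x;
    }
    int lvl = random_level();
    if (lvl > sl->level) {
        for (int i = sl->level; i < lvl; i++) update[i] = &sl->head;
        sl->level = lvl;
    }
    node *n = calloc(1, sizeof *n);
    n->key = key;
    for (int i = 0; i < lvl; i++) {       /* splice in at each level */
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
}

int sl_contains(const skiplist *sl, int key) {
    const node *x = &sl->head;
    for (int i = sl->level - 1; i >= 0; i--)
        while (x->next[i] && x->next[i]->key < key) x = x->next[i];
    x = x->next[0];
    return x && x->key == key;
}
```

Note there's no rebalancing code anywhere: the coin flips in random_level are the entire "balancing" story, which is exactly why the structure is only probably pretty balanced.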
Not so! There's one thing you can do with skip lists that I don't know of any easy way to do with other data structures. Suppose you want a priority queue, and you have a bunch of cores, and these cores want to insert into the queue and remove the minimum element concurrently. How do you implement this to allow fast concurrent access from a lot of threads?
First, there's the approach everybody remembers from Intro to Algorithms: use a binary min-heap. It's guaranteed to be balanced, so you get O(lg n) time insertions and delete-the-minimum operations, with low constant factors. Nice! But how do you make it concurrent? You can put a lock on the whole thing and only let one thread use it at once, but that's slow. You could use fancy fine-grained locking, but there will still be inter-thread memory conflicts arising from the heapify operations you need to maintain the heap invariant. There has been some work on this, and they've come up with some decent ideas, but it still has scaling problems.
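The put-a-lock-on-the-whole-thing version looks something like this sketch (names and layout are my own; the point is that every push and pop serializes on one mutex, which is exactly the scaling problem):

```c
#include <pthread.h>
#include <stdlib.h>

/* Coarse-grained locking: correct, simple, and a bottleneck,
   because only one thread can touch the heap at a time. */
typedef struct {
    int *a;
    size_t n, cap;
    pthread_mutex_t lock;
} heap;

void heap_init(heap *h, size_t cap) {
    h->a = malloc(cap * sizeof *h->a);
    h->n = 0;
    h->cap = cap;
    pthread_mutex_init(&h->lock, NULL);
}

void heap_push(heap *h, int v) {
    pthread_mutex_lock(&h->lock);
    size_t i = h->n++;
    h->a[i] = v;
    while (i > 0 && h->a[(i - 1) / 2] > h->a[i]) {   /* sift up */
        int t = h->a[i];
        h->a[i] = h->a[(i - 1) / 2];
        h->a[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
    pthread_mutex_unlock(&h->lock);
}

/* Caller must ensure the heap is nonempty. */
int heap_pop_min(heap *h) {
    pthread_mutex_lock(&h->lock);
    int min = h->a[0];
    h->a[0] = h->a[--h->n];
    size_t i = 0;
    for (;;) {                                        /* sift down */
        size_t l = 2 * i + 1, r = 2 * i + 2, s = i;
        if (l < h->n && h->a[l] < h->a[s]) s = l;
        if (r < h->n && h->a[r] < h->a[s]) s = r;
        if (s == i) break;
        int t = h->a[i];
        h->a[i] = h->a[s];
        h->a[s] = t;
        i = s;
    }
    pthread_mutex_unlock(&h->lock);
    return min;
}
```

Even with fine-grained per-node locks instead of this single mutex, the sift-up/sift-down paths all cross the root, so contention concentrates there.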
Now look at skip lists. A skip list is a randomized sorted-list data structure, and it claims to be, as you put it, probably pretty balanced. Various threads can insert concurrently without breaking that "probably pretty balanced" property. The memory read and write sets are very local, and it's possible to do all this with lock-free synchronization. The end result is a priority queue data structure that scales to hundreds of cores. And the code doesn't fry your brain, which is a plus. There's a pretty neat paper about it here:
By the way, if you happen to be using a processor with hardware transactional memory support (you aren't, yet), then this code becomes even easier to write, as you don't have to worry about how to do lock-free synchronization. I almost felt cheated by how simple it was.
The general lesson is that randomized algorithms often remove the need for synchronization between concurrent processes. For example, choosing ids at random from a large space rather than coordinating to choose unique ids.
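A sketch of the random-id idea, with the birthday-bound math in the comments (rand() here is a stand-in; real code would use a proper per-thread RNG):

```c
#include <stdint.h>
#include <stdlib.h>

/* Each thread draws its id independently from a 64-bit space;
   no coordination, no shared counter. With k ids drawn, the
   collision probability is roughly k^2 / 2^65 (birthday bound):
   for a million ids that's about 2.7e-8, which is usually an
   acceptable trade for removing all synchronization. */
uint64_t fresh_id(void) {
    uint64_t id = 0;
    for (int i = 0; i < 4; i++)            /* 4 x 16 random bits */
        id = (id << 16) | (uint64_t)(rand() & 0xFFFF);
    return id;
}
```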
Only in simulations, I'm afraid. As far as I know, the only processors to have HTM support are Sun's Rock processor (now discontinued by Oracle, I think) and the Vega chips from Azul Systems. I don't have access to either of these, and neither of them has the HTM enhancements that I'm studying. The state of the art in research is a lot more complex than the playing-it-safe state of the art in production chips.
That said, future HTM systems have a lot of potential, and I think we'll see them come into wider use eventually. The main problem with them seems to be that existing software ecosystems aren't written with HTM in mind, so you get pathological memory access patterns slowing things down. But with some minor enhancements to the HTM design (like a load instruction which immediately adds its destination address to the write set of a transaction) and compiler and runtime support, and a reasonable set of concurrent data structures, HTM can absolutely fly.
You can find blog posts and Stack Overflow articles talking about lock overhead in rb-trees versus skip lists, and just looking at a diagram you can intuitively sense where that idea comes from. But these kinds of analyses are usually flawed by the fact that they assume the most naive possible encoding of tree nodes: a struct holding a void* value and two tree* child pointers.
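To make "naive encoding" concrete, here's a sketch contrasting the three-pointer struct those analyses assume with one denser alternative, an index-linked node pool (field names and the pool size are illustrative, not from any particular library):

```c
#include <stdint.h>

/* The naive encoding: three pointers per node, 24 bytes on a
   64-bit machine before you've stored a key anywhere. */
struct naive_node {
    void *value;
    struct naive_node *left, *right;
};

/* One alternative: allocate nodes from a contiguous pool and link
   them with 32-bit indices instead of pointers. The links shrink
   by half and hot nodes end up close together in memory. */
#define NIL UINT32_MAX            /* index marking an absent child */

typedef struct {
    int32_t  key;
    uint32_t left, right;         /* pool indices, not pointers */
} packed_node;                    /* 12 bytes */

typedef struct {
    packed_node pool[1024];
    uint32_t next_free;
    uint32_t root;
} tree_pool;

uint32_t pool_alloc(tree_pool *t, int32_t key) {
    uint32_t i = t->next_free++;
    t->pool[i] = (packed_node){ key, NIL, NIL };
    return i;
}
```

This is only one of many tighter encodings, but it shows why a benchmark against the 24-byte struct isn't the last word on trees.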
I don't have an answer for you (I'm just making the point that when people argue about how inefficient trees are, they're usually assuming a struct with two edge pointers), but a more direct answer to your question might be:
Judy arrays came up once too; there's a really excellent critique of Judy arrays here:
http://www.nothings.org/computer/judy/
(Long story short: you can get comparable performance from a straightforward hash table, and Judy Arrays have to be tuned to the microarchitecture).
Favorite data structure not cited here:
Aguri trees, which marry a bounded-size radix trie (like you'd use in a software routing table) to an LRU list, and automatically synthesize aggregates (like, 10.0.0.0/16 from 1,000 observations across all IPs) from the pattern of insertion. They're best known in traffic analysis, but we've used them in runtime memory analysis as well.