
Relativistic hash tables - yiransheng
http://lwn.net/SubscriberLink/612021/92c8ccb5b82d4d90/
======
JoshTriplett
I wrote the paper and dissertation this work was based on, and I'm happy to
answer any questions people might have.

(Really awesome to see a production implementation of this.)

~~~
jperras
The application of a causally consistent model for hash table operations is
brilliant; treating CPUs and memory as a "traditional" distributed system is
brilliant.

I know I said it twice already, but: brilliant.

Reading your dissertation now.

~~~
chubot
I basically think of a single computer as a distributed system now. In other
words, it's not a Turing machine / state machine any more; it's a network of
such things.

Once you have multiple CPUs, you can't avoid concurrency, and message passing
and immutability become common themes. The strategy for many years was to
naively add locks to traditional stateful algorithms and mutable data
structures, but this often leads to bad (and unpredictable) performance.

There is an elaborate backward-compatible illusion presented by the
architecture, C, and most languages built on top of C. Internal buses and
caches are not exposed to you as a programmer, but they are there and affect
performance. But if you think of your app from the OS perspective rather than
the language perspective, the distributed model becomes clearer and more
natural.

------
chris_va
In case people find it interesting, this is very similar to how one
efficiently builds a search engine for high QPS and update rate (lockless
realtime document index), though that example is slightly more involved than a
hash map.

The basic premise is that the entry point to your data structure (or its
internal pointers) can change over time. You don't dismantle older entry
points/pointers until every reader that might still be using them has
finished, but you don't have to wait for readers that arrived after the
change.
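
A minimal userspace sketch of that pattern, with names of my own choosing
(C11 atomics; the crude reader counter stands in for a real grace-period
mechanism, and unlike actual RCU it waits for all readers, not just the ones
that started before the swap):

    #include <stdatomic.h>
    #include <stdlib.h>

    struct table { int payload; };            /* reader-immutable structure */

    static _Atomic(struct table *) entry_point;
    static atomic_int readers;                /* crude grace-period stand-in */

    struct table *read_begin(void)            /* lockless reader entry */
    {
        atomic_fetch_add(&readers, 1);
        return atomic_load(&entry_point);
    }

    void read_end(void)
    {
        atomic_fetch_sub(&readers, 1);
    }

    void update(struct table *next)           /* single writer assumed */
    {
        struct table *old = atomic_exchange(&entry_point, next);
        while (atomic_load(&readers) != 0)    /* wait for readers to drain */
            ;                                 /* (real RCU is far cheaper) */
        free(old);                            /* no reader can still see old */
    }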

~~~
b0b0b0b
I'd be interested in learning more about this.

~~~
chris_va
Oof, this may take a much longer blog post, but here is the very high-level
view.

The basic construction on one doc-sharded server looks like:

1) Maximum valid local_docid
2) A map of local_docid => state (valid, deleted)
3) A map of token_id (indexed term) => map of local_docids to positions in
   the doc

On document update, you allocate the next local_docid. You then rip through
the doc and extract its tokens. For each token, you insert (docid, position)
into map (3). Then you add the document to map (2) with state "valid", and
finally increment (1) to publish it.

On query, you first copy (1), then do the typical AND/OR retrieval over (3).
Any docids higher than your copy of (1) are ignored, and any docs retrieved
are then filtered by (2).

In this model, (1) is a volatile memory access. (2) and (3) are very similar
to this "relativistic hash map".

Deletions are complicated, and usually you filter out invalid docids from (3)
as a background compaction process.
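
A hypothetical single-writer sketch of that publish ordering, with fixed-size
arrays standing in for the maps (none of these names come from the comment
above):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_DOCS 1024
    enum doc_state { INVALID = 0, VALID, DELETED };

    static enum doc_state state[MAX_DOCS];     /* map (2) */
    static atomic_uint max_valid_docid;        /* (1); map (3) elided */

    unsigned add_document(void)                /* token extraction elided */
    {
        unsigned docid = atomic_load_explicit(&max_valid_docid,
                                              memory_order_relaxed) + 1;
        /* ... insert (docid, position) pairs into token map (3) ... */
        state[docid] = VALID;                  /* map (2) */
        atomic_store_explicit(&max_valid_docid, docid,
                              memory_order_release);   /* publish last */
        return docid;
    }

    unsigned query_begin(void)                 /* copy (1) once per query */
    {
        return atomic_load_explicit(&max_valid_docid, memory_order_acquire);
    }

    bool visible(unsigned docid, unsigned snapshot)    /* reader-side filter */
    {
        return docid <= snapshot && state[docid] == VALID;
    }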

------
howeyc
Maybe I'm daft, but the growing and shrinking explained here looks like how
you'd do it for any hash table. Is that not how a normal one works? Do normal
hash tables "freeze the world" to change tables or something?

Looks to me like an "RCU grace period" (not sure what this is, sleep maybe?)
is introduced to allow concurrent threads time to "finish reads" in between
pointer changes.

~~~
JoshTriplett
With a normal hash table, you'd acquire either a table-wide lock or every per-
bucket lock before resizing the table. With this algorithm, readers can
continue to run concurrently with the resize, and the concurrent resize will
never cause a reader to incorrectly fail to find a hash element.
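
For flavor, a reader-side sketch in the Linux kernel's RCU idiom (the item
struct and hashing are illustrative; the writer/resize side, which serializes
updaters with a lock and calls synchronize_rcu() before freeing old buckets,
is elided):

    #include <linux/rcupdate.h>
    #include <linux/rculist.h>
    #include <linux/types.h>

    struct item {
        int key;
        int val;
        struct hlist_node node;
    };

    /* Lockless lookup: safe even while a resize runs concurrently. */
    bool lookup(struct hlist_head *buckets, unsigned nbuckets,
                int key, int *out)
    {
        struct item *it;
        bool found = false;

        rcu_read_lock();              /* begin read-side critical section */
        hlist_for_each_entry_rcu(it, &buckets[key % nbuckets], node) {
            if (it->key == key) {
                *out = it->val;       /* copy out while still protected */
                found = true;
                break;
            }
        }
        rcu_read_unlock();            /* after this, the item may be freed */
        return found;
    }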

~~~
kazinator
This issue arises even in completely single-threaded programs that use hash
tables. Suppose you're iterating over a hash, invoking for each element a
callback function that has access to the hash and can insert new elements.
During the callback, an insertion could trigger a growth. If you allow the
hash table to reorganize, the reorganization has to be reconciled with the
in-progress traversal, so that it doesn't miss some elements or visit some
elements more than once.
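
A hypothetical open-addressing table makes the hazard concrete (names are
mine, not from the thread):

    #include <stddef.h>

    struct entry { int key; int used; };
    struct hash  { struct entry *slots; size_t nslots; };

    void hash_insert(struct hash *h, int key);   /* may realloc h->slots */

    void hash_walk(struct hash *h, void (*cb)(struct hash *, struct entry *))
    {
        for (size_t i = 0; i < h->nslots; i++) {
            struct entry *e = &h->slots[i];
            if (e->used)
                cb(h, e);   /* if cb() calls hash_insert() and triggers a
                             * grow, h->slots is reallocated and entries
                             * are rehashed: e dangles, and the walk can
                             * skip elements or visit them twice */
        }
    }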

~~~
im3w1l
I think if you insert an element during iteration, it is unpredictable
whether said element will be visited during the iteration or not, so it is
probably a bad idea even if you won't have a resize.

~~~
seabee
It's a bad idea if you don't have a well-defined mode of operation; for
example, you're fine if the hash table uses copy-on-write semantics.
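
A single-threaded sketch of that copy-on-write approach, with illustrative
names: mutation clones the table and swaps it in, so a walk that pinned the
old copy sees a stable snapshot:

    #include <stdlib.h>
    #include <string.h>

    struct entry { int key; int used; };
    struct table { struct entry *slots; size_t nslots; int refs; };

    static struct table *current_tbl;   /* always holds one reference */

    struct table *walk_begin(void)      /* iterator pins the current copy */
    {
        current_tbl->refs++;
        return current_tbl;
    }

    void walk_end(struct table *t)      /* last user frees the copy */
    {
        if (--t->refs == 0) {
            free(t->slots);
            free(t);
        }
    }

    void insert(int key)                /* never mutates a pinned copy */
    {
        struct table *old  = current_tbl;
        struct table *next = malloc(sizeof(*next));

        next->nslots = old->nslots;     /* grow here if load factor demands */
        next->slots  = malloc(next->nslots * sizeof(*next->slots));
        memcpy(next->slots, old->slots, next->nslots * sizeof(*next->slots));
        /* ... place key into next->slots ... */
        next->refs   = 1;               /* the table's own reference */
        current_tbl  = next;
        walk_end(old);                  /* freed unless a walk still pins it */
    }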

------
mamcx
Forgive me if this is stupid, but could this work for implementing a
concurrent VM?

~~~
JoshTriplett
Yes, that's exactly the kind of program you might want to use this for.

------
vkjv
Could someone explain all of the grounds in the diagrams?

~~~
MattHeard
Null next pointers?

~~~
riking
Or 0x00000001 next pointers, if it's using a hlist_nulls. See
http://lwn.net/Articles/609904/
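
For reference, the nulls encoding (from include/linux/list_nulls.h): a chain
ends not in NULL but in an odd value carrying an id, typically the bucket
number, so a lockless reader can tell when it has wandered onto a different
chain mid-traversal and must retry. The helper below is illustrative, not
kernel API:

    #include <linux/list_nulls.h>
    #include <linux/types.h>

    /* A terminator is ((id << 1) | 1), so id 0 ends in 0x00000001, the
     * value mentioned above.  is_a_nulls() and get_nulls_value() are the
     * real accessors. */
    static bool ended_in_home_bucket(struct hlist_nulls_node *pos,
                                     unsigned long bucket)
    {
        return is_a_nulls(pos) && get_nulls_value(pos) == bucket;
    }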

