Hacker News new | past | comments | ask | show | jobs | submit login
A Fast, Minimal Memory, Consistent Hash Algorithm (arxiv.org)
121 points by luu on Aug 5, 2014 | hide | past | favorite | 22 comments

It doesn't seem to be mentioned anywhere in the paper, but this is a description of the "consistent hashing" algorithm in Google's Guava library: http://docs.guava-libraries.googlecode.com/git/javadoc/src-h...

I find it kind of funny that they created and released an apparently novel hashing method, and then waited 2.5 years to actually explain how it works.

Hm I think you are right. The Guava code is apparently the simpler version of the algorithm that takes linear time (but still no memory). (Apologies, it appeared to be different at first and I downvoted you; wish I could undo it).

It seems likely that the new part is the optimizations. Still, it is pretty interesting as I didn't think Guava overlapped with this area and these authors much within Google (I work there). It's usually inaccurate to speak of Google as one entity with specific reasons for doing things a certain way; there are many people working on many different and overlapping things :)

It takes time and effort to do a writeup. And it is easy to procrastinate. That reminds me...

A bit of a plug here. I have a cute nearly-minimal perfect hashing algorithm designed to have good cache-friendly properties. Briefly, it is somewhat similar to hopscotch hashing, only you pre-calculate the positions of the elements to put them into the 'best' spots by solving the assignment problem. Works for up to about 50k elements. It feels like it might have good theoretical properties too, might be even optimal, but it was a while since I've taken the algorithms class.

If anyone is interested to do a writeup and publish clean source code - you'd be welcome.

It sounds similar to robin hood hashing. Is there source code anywhere?

Yes, it is similar to robin hood. Only you actually place items into optimal positions (by solving the assignment problem on your memory/cache access costs & access probabilities), rather than stochastically swapping items.

I'll put up sample code if somebody would be willing to do a writeup ;)

I first read that as prohashtinate.

One point to consider is that this algorithm appears to rely on a double-precision floating-point divide at its core, so the speediness measured on a Xeon E5 may not translate to speediness on architectures with weaker floating point units.

The purpose of reals is just to map from [0-1) to [0-n) where n is the number of hosts.

Floating points are used to ease the presentation, I think the algorithm can be ported to integer operations without loss of performance (didn't prove it, I just tried to write a pure integer implementation and checked distribution of results on some inputs).

Using doubles in a hash algorithm seems (to me) very dangerous.

For years I have had problems with different optimisation levels providing different results for floating point code, as different registers have different internal sizes, FP registers can have different internal precision, etc.

Can this hash function be trusted to produce repeatable results?

From the paper, this originates from:

> ...Since i is a lower bound on j, j will equal the largest i for which P(j ≥ i), thus the largest i satisfying i ≤ (b+1) / r. Thus, by the definition of the floor function, j = floor((b+1) / r).

It seems to me that since all the numbers are positive, you can safely use integer division if you like. AFAIK, floor(((double)a)/b) and a/b coincide for that case at least.

Can this hash function be trusted to produce repeatable results?

It turns out that it can be trusted to always produce consistent results regardless of which compiler or optimization settings you're using. Here's a quick proof:

  int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
    int64_t b = ­1, j = 0;
    while (j < num_buckets) {
      b = j;
      key = key * 2862933555777941757ULL + 1;
      j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
    return b;

Step 1:

      key = key * 2862933555777941757ULL + 1;
The 8-byte 'key' is multiplied by a large number (hex value: 0x27BB2EE687B0B0FD). After this multiplication, the first 4 bytes and the last 4 bytes of 'key' are nonzero. E.g. a key of 1 will yield 0x27BB2EE687B0B0FE, a key of 2 will yield 0x4F765DCD0F6161FB, etc. (The fact that the first 4 bytes become nonzero is important for the next step.)

Step 2:

      double(1LL << 31) / double((key >> 33) + 1)  
The lower 33 bits of 'key' are discarded, then 1 is added.

In order to understand what can happen, let's examine extreme inputs.

If key is 0, the denominator will be 1, and the result will be (double)0x80000000, which can be represented exactly. (A 64-bit float can exactly represent all integers in the range [-2^53, 2^53].)

If key is (2^64 - 1), then the denominator((key >> 33) + 1) is 0x80000000, which is equal to the numerator (1 << 31), so the result will be 1.

Let's plot this equation for all possible values of key: http://www.wolframalpha.com/input/?i=plot+%282%5E31%29+%2F+%...

(Wolfram Alpha doesn't understand bitshift notation, but that formula is equivalent to the C code.)

Step 3:

      j = (b + 1) * ( /*...result from step 2...*/ );
'b' is the previous value of 'j'. It's a uint64_t, and its value is always less than 'num_buckets' due to the nature of the while loop.

'num_buckets' is an int32_t, so its max value is 0x7FFFFFFF.

The result from step 2 has a maximum possible value of (double)0x80000000 and a minimum possible value of 1.0.

(b + 1) is promoted to a double, which is then multiplied with step 2's result.

Since all of the following conditions are true...

- step 2's result has a max value of 0x80000000

- the final result must be less than 'num_buckets' to continue iterating

- the final result is less than 2^53 (the maximum integer that can be exactly represented by a 64-bit float)

- each iteration of the loop quantizes the final result from a 64-bit float to a 64-bit integer (by storing it into the variable 'j'), preventing error from accumulating

... then it must be true that this floating point arithmetic will always yield consistent results. It can't matter which compiler you use or which optimization level / strictness settings you choose.

(It's probably possible to trim this proof down, but since it gets the job done, I'm just going to post it as-is.)

EDIT: In fact, I'm pretty sure that the algorithm would still produce consistent results even if it tried to store integers larger than 2^53 into a 64-bit float. Even though that would be out of range, the truncation would still be consistent.

The reason why it's consistent is because at each iteration of the loop, it quantizes the intermediate 64-bit float result into a 64-bit integer, essentially flooring the result. This prevents small floating point errors from carrying over into the next iteration. A secondary reason why it's consistent is because the actual values being divided and multiplied are very small (less than 0x80000000). For small integers x and y, it seems like (uint64_t)((double)x / (double)y) will always truncate consistently regardless of compiler variation or floating point optimization settings.

Maybe (uint64_t)((double)x / (double)y) will always yield a consistent result regardless of compiler or optimization settings, even if x and y aren't small. But I'm definitely sure that (uint64_t)(9.0/5.0) will always result in 1, and (uint_64)t)(11.0/5.0) will always result in 2, etc, and in that case, x and y are small. So, since the max intermediate values are less than 0x80000000, it seems likely that there's no numerator/denominator combination that could cause inconsistent results if you switch compilers or float optimization settings, as long as numerator > denominator.

In case anyone is interested in the random number, `2862933555777941757`, but didn't catch it while scanning the PDF, it is shared with the "64-bit Linear Congruential Generator" [1]

The author specifies that if the key is larger than 64 bits, then it should get a 64 bit hash for use in input. But I wonder, supposing the compiler or processor handled higher bits in a single register and we had a need for it, would this algorithm handle a change of the magic number without a problem? Where would one look for such numbers?

[1]: http://nuclear.llnl.gov/CNP/rng/rngman/node4.html

super quick, super dumb comparison to a variant on bob jenkins 64-bit mix/hash: https://gist.github.com/robmccoll/38f03971df66ca15e030

$./bin/googlehash 512 1000000 google hash 0.0765907 255374889 2.18324e+10 bob jenkins hash 0.0158996 255484409 2.18324e+10 google is 0.207592x faster bob is 4.81714x faster

$./bin/googlehash 29 1000000 google hash 0.0478618 14000129 6.99994e+07 bob jenkins hash 0.0158052 13999989 6.99994e+07 google is 0.330225x faster bob is 3.02824x faster

./bin/googlehash 65536 1000000 google hash 0.117784 32781492370 3.57002e+14 bob jenkins hash 0.0162152 32744680953 3.57004e+14 google is 0.137668x faster bob is 7.26383x faster

print out is time(s), sum, variance (could use a suggestion on better test for uniformity - only other idea was look at a histogram, anyone have suggestions?).

edit: it is likely that this is a poor comparison :-)

You are comparing apples to oranges.

The purpose of google algorithm is that, when changing the number of bins, a minimum number of items get moved.

This version of your codes prints the number of items which have been reaffected: https://gist.github.com/def-lkb/58243299e114244d3b90

$ ./a.out 1000 1002 100001 google hash 0.0343956 49967527 8.34091e+09, different bins 207 bob jenkins hash 0.0039651 50033275 8.34095e+09, different bins 99794 google is 0.115279x faster bob is 8.67458x faster google moved 0.206998x% objects bob moved 99.793x% objects optimal distribution required 200 movements (0.199998%)

bob is performing extremely bad.

edit: updated the gist to include the integer version

Also, the google algorithm does O(log(bins)) iterations in the loop, which explains the speed advantage of bob one. Given the task accomplished, it's really efficient and near optimal.

Very cool! Thanks, I clearly missed the point :-)

The algorithm is called "Jump". A quick search on GitHub revealed three projects that implement it and they are all written in Go: https://github.com/benbjohnson/jmphash

Given that the paper is by people from Google, where Go is fairly well adopted, and that both Go's and this algorithm's primary use-cases are server-related, that makes kind of sense.

Just small note: This is not a pair-to-pair competition to consistent hashing.

This algorithm requires consistent mapping between nodes and (consecutive) integers--that's not something you get for free in distributed systems where nodes may join or leave the pool at any time.

It seems a simple key mod bucket_size works to divide a workload based on a numeric key. I imagine this has a different distribution which works better for certain use cases. Anyone have an example of when mod will fail for something like this?

Edit: The paper covers this.

The problem this algorithm solves is not how to distribute keys in buckets (key mod bucket_size works fine for that) but how to gracefully adapt when bucket_size needs to increase. This algorithm minimizes the number of keys that have to be moved to new buckets while incurring no memory overhead nor requiring a complex data structure.

It fails when bucket_size changes.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact