
A Fast, Minimal Memory, Consistent Hash Algorithm - brandonb
http://arxiv.org/abs/1406.2294v1
======
ykl
Woah, this jumps out at me immediately since Eric Veach's name is on it. I
haven't seen a (relatively) new Eric Veach paper in a long long time.

Some background: Eric Veach is huge in the computer graphics world for laying
a ton of the foundations of modern physically based rendering in his PhD
thesis [1]. He then went on to work for Pixar and did a ton of work on
Renderman (for which he recently got an Academy Award), and then in the early
2000ish left Pixar to go work for Google, where he was the lead on developing
AdWords [2]. In short, he's had quite a career, and seeing a new paper from
him is always interesting.

[1]
[https://graphics.stanford.edu/papers/veach_thesis/](https://graphics.stanford.edu/papers/veach_thesis/)
[2] [http://www.bloomberg.com/bw/stories/2006-03-05/the-secret-
to...](http://www.bloomberg.com/bw/stories/2006-03-05/the-secret-to-googles-
success)

~~~
berkut
Slight correction: his sci-tech award was for MIS (multiple importance sampling) and "efficient Monte Carlo path tracing for image synthesis", which have only been in PRMan for the last two releases (PRMan 18 added geo area lights with MIS evaluation, and 19 added pure path tracing). So the award wasn't really for his work at Pixar: the techniques it honored date from 12+ years before they made it into PRMan.

------
dbenhur
Smaller standard error, significantly less memory, faster execution,
equivalent rebalance cost... what's not to like?

Thanks, Google:

> Google has not applied for patent protection for this algorithm, and, as of
> this writing, has no plans to. Rather, it wishes to contribute this
> algorithm to the community.

~~~
nightcracker
Except that's useless. Realistically you'd need an explicit legal patent grant, because right now Google can change its stance on patenting it at any moment and leave your code infringing.

~~~
noelwelsh
I was taught that patents can't be acquired on discoveries that have already
been made public. If that's the case, Google's publication of this method
before attempting to patent it makes it unpatentable.

~~~
DougMerritt
There is a 1 year grace period between disclosure and the deadline for
patentability.

[http://www.uspto.gov/web/offices/pac/mpep/s2133.html](http://www.uspto.gov/web/offices/pac/mpep/s2133.html)

~~~
gojomo
Grace periods are shorter or non-existent outside the US. The publication date
is June 9 of last year, so Google has less than 3 months to change its mind
and turn evil by pursuing a patent on this technique.

------
nkurz
That's a nifty algorithm. Sillysaurus3's wonderful explanation from last year
helped me figure out what's happening. Mostly, at least (my fault, not his):

[https://news.ycombinator.com/item?id=8140385](https://news.ycombinator.com/item?id=8140385)

It feels like there should be some faster way of doing the jump calculation
that doesn't require the floating-point divide. It's a relatively slow and
expensive operation, and doing ln(n) of them on each bucket lookup adds up
if you have lots of buckets.

~~~
ChrisLomont
His explanation has mistakes. For example, in step 1 he claims "After this
multiplication, the first 4 bytes and the last 4 bytes of 'key' are nonzero"

Take any 8-byte value b and multiply it by 16133697096952638549ULL (mod 2^64)
to get a key. Then key * 2862933555777941757ULL = b, so for any target value
there is an input key that hits it. The first and last 4 bytes are thus
certainly not always nonzero.

He then uses "The fact that the first 4 bytes become nonzero is important for
the next step" and proceeds. This is wrong.

For a concrete example, suppose key is 16133697096952638549ULL. Then key *
2862933555777941757ULL = 1. The first 7 bytes (actually, the first 63 bits)
are all zero. Adding 1 as in the code still leaves the first 62 bits zero.

(the value 16133697096952638549ULL is the multiplicative inverse of the
constant in the algorithm mod 2^64).

------
ccleve
This has already been discussed here:

[https://news.ycombinator.com/item?id=8136408](https://news.ycombinator.com/item?id=8136408)

It's been in Guava for a few years now.

------
cnlwsu
HRW is worth looking into as well if you have a smaller number of buckets:
[http://en.wikipedia.org/wiki/Rendezvous_hashing](http://en.wikipedia.org/wiki/Rendezvous_hashing)

------
bcook
How revolutionary is this? Will it replace all preceding hash algos, or does
it have a narrow usefulness?

~~~
gojomo
It's useful in the field of consistent-hashing: mapping items over shards
where the number of shards varies over time. (Only some systems do this, and
most hash algorithms are used in other fields.)

This technique also requires the shards be assigned contiguous integer IDs
starting at 0, and appears to work best if all shard entry/exit is intentional
and at the end of the ID range. (That is: if you have 15 shards, they're named
#0-#14. If you decide you need one less, you turn off shard #14.)

It doesn't appear to work well for systems with more chaotic entry/exit.
(Shard #4 just crashed, and you don't need it anymore? You'd still want to
restart/fill one shard as #4, then shut-down/drain shard #14. Other
consistent-hashes could just map #4's old duties among the remaining nodes,
albeit with their higher space/time costs.)

~~~
bcook
Would something like a network packet queueing algorithm, which hashes each
stream and iterates through them (ex. Stochastic Fair Queue), benefit from
this hashing algorithm?

~~~
gojomo
Do those algorithms already make use of consistent hashing? If so, yes, but
otherwise: probably not.

------
Lethalman
This is a nice algorithm and I thought about writing something that only
depended on numbers and not on allocating nodes in a ring.

It works well when adding nodes, but how does it work when removing a node
in the middle? When allocating nodes in a ring it's easy: just reassign the
vnodes. But what about this algorithm? Can anybody explain how it keeps track
of which nodes take over which part of the removed node?

------
beefsack
The paper was really interesting to read, it motivated me to implement it in
Go: [https://github.com/beefsack/go-jch](https://github.com/beefsack/go-jch)

~~~
maxpert
Same here :) even did a blog post about it
[http://www.creapptives.com/post/113786047187/improving-
shard...](http://www.creapptives.com/post/113786047187/improving-shard-re-
balancing-with-jump-consistent)

