
Fibonacci Hashing: The Optimization That the World Forgot - ingve
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
======
aappleby
Don't do this. Use a real hash function that guarantees a highly random
distribution, make your hash tables power-of-two sized, and map from hash
value to table index using (hash & (size-1)).
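
A minimal sketch of that mapping, assuming a 64-bit hash value that is already well mixed (names here are illustrative):

    #include <cstddef>
    #include <cstdint>
    
    // With a power-of-two table and a well-distributed hash function, the
    // bucket index is just the low bits of the hash value.
    static inline std::size_t bucket_index(std::uint64_t hash, std::size_t table_size) {
        // table_size must be a power of two for the mask to be valid.
        return static_cast<std::size_t>(hash) & (table_size - 1);
    }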

The fibonacci constant thing will help clean up the distribution of a bad hash
function, but it does nothing for collision resistance if the underlying hash
function is weak.

-Austin, author of Murmurhash and SMHasher

~~~
mickronome
He's clearly not talking about using it as a hash, or even as a secondary hash,
as someone mentioned.

The use case according to the article is strictly to replace integer modulo to
map into buckets for cases where that operation is the limiting factor. In his
case that's when 9ns per key is too much, and roughly 1ns is good.

For small hash tables sometimes the worst case of a linear scan of the entire
table is perfectly acceptable, and the additional performance the other
99.999% of the time is a welcome bonus.

~~~
attractivechaos
Fibonacci hashing takes the form "h * k>>(64-b)", where k is derived from the
golden ratio and b is the number of bits used to index the table (2^b buckets).
This involves one generic multiplication, which is the bottleneck. The
multiplication is not strictly necessary: you can replace it with k=1033, for
example, which can be implemented as "h+(h<<10)+(h<<3)". This is a good enough
safeguard against naive hash functions but is much faster to compute.
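
As a concrete sketch of both forms (the hex constant is the usual 64-bit golden-ratio value; names and signatures are illustrative):

    #include <cstdint>
    
    // Fibonacci hashing into a table of 2^b buckets: multiply by ~2^64/phi
    // and keep the top b bits. Assumes 1 <= b <= 63.
    static inline uint64_t fib_bucket(uint64_t h, unsigned b) {
        return (h * 0x9E3779B97F4A7C15ull) >> (64 - b);
    }
    
    // The k = 1033 alternative: 1033 = 1 + 2^3 + 2^10, so the multiply
    // becomes two shifts and two adds.
    static inline uint64_t cheap_bucket(uint64_t h, unsigned b) {
        uint64_t m = h + (h << 10) + (h << 3);   // m = h * 1033
        return m >> (64 - b);
    }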

With "h * k>>(64-b)", the result has a cycle of 2^b. Suppose b=3 and you have
input 1<<3|1, 2<<3|1 and 3<<3|1, Fibonacci hashing will put them to the same
bucket – it is not that effective. A safer strategy is to use a proper integer
hash function like Thomas Wang's 32-bit hash function. Although it involves
more steps, it only involves plus and bit operations and probably can be
computed faster than generic multiplication.
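
For reference, here is one commonly circulated version of Thomas Wang's 32-bit integer hash (treat this as a sketch; the constants are as usually published, not taken from the article):

    #include <cstdint>
    
    // Thomas Wang's 32-bit mix: shifts, adds and xors, plus one small
    // constant multiply that can itself be written as shifts and adds
    // (2057 = 1 + 2^3 + 2^11).
    static inline uint32_t wang_hash32(uint32_t key) {
        key = ~key + (key << 15);   // key = (key << 15) - key - 1
        key ^= key >> 12;
        key += key << 2;
        key ^= key >> 4;
        key *= 2057;
        key ^= key >> 16;
        return key;
    }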

At the end of the day, however, Fibonacci hashing and similar ideas only help
when distinct keys hash to similar integers; they have no effect when different
keys hash to the same integer. You have to use a reasonable hash function
anyway.

~~~
acqq
> Although it involves more steps, it only involves plus and bit operations
> and probably can be computed faster than generic multiplication.

On the most recent popular CPUs the multiplication is essentially "one step"
long, that's how wonderfully fast they have become. See e.g. Agner Fog's
instruction tables. And where it isn't, the number of "steps" is typically not
more than 2 or 3. Multiplication is implemented very efficiently today, unless
the CPU has to have a very low transistor count (like in some embedded
systems). The more exotic CPUs can, of course, be different.

------
rntz
Summary: The article is looking for an efficient way to map a hash code into a
smaller power-of-two-sized range, for use as a hashtable index. It dismisses
the common solution, masking off the high bits, because it discards
information, and proposes Fibonacci hashing: multiply by the golden ratio and
shift down. It presents measurements suggesting this gives better performance
in practice, and some theory as to why that might be.

But later it mentions, in passing, a simpler approach: just shift the high
bits down and xor! The author only tries doing this as a preprocessing step in
front of Fibonacci hashing to avoid "bad patterns". So I'm left wondering:
might shift-down-and-xor be good enough on its own?
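
For what it's worth, a sketch of what shift-down-and-xor on its own might look like for a table of 2^b buckets (this is my reading of the idea, not code from the article):

    #include <cstdint>
    
    // Fold the high bits of the hash down into the low bits with xor,
    // then mask off the bottom b bits as the bucket index.
    // Assumes 1 <= b <= 63.
    static inline uint64_t xor_fold_bucket(uint64_t h, unsigned b) {
        h ^= h >> 32;
        h ^= h >> 16;
        return h & ((1ull << b) - 1);
    }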

~~~
twic
The article presents Fibonacci hashing as an operation to map a hash code into
a smaller range, but it isn't that, really. The operation that does that is
still just taking some bits from the hash code.

What Fibonacci hashing actually is, is a way of stirring a hash code before
use, to spread its entropy out, so that the bits you end up taking are more
likely to be well-distributed.

If your hash codes are already well-distributed, then this is pointless. But
if they aren't, it's useful. So, it seems to me that rather than applying
Fibonacci hashing to all hash codes, it would be better to use it as the
original hash function for types which currently have bad hash functions. This
is something a C++ standard library could easily do.

For example, LLVM's libc++ implements string hashing using MurmurHash2 on
32-bit machines, and CityHash on 64-bit machines:

[https://github.com/llvm-mirror/libcxx/blob/master/include/__...](https://github.com/llvm-mirror/libcxx/blob/master/include/__string#L854)

[https://github.com/llvm-mirror/libcxx/blob/master/include/ut...](https://github.com/llvm-mirror/libcxx/blob/master/include/utility#L954)

But it hashes integers of all sizes to themselves:

[https://github.com/llvm-mirror/libcxx/blob/master/include/ex...](https://github.com/llvm-mirror/libcxx/blob/master/include/ext/__hash#L94)

Changing that to a Fibonacci hash, or a simpler shift-and-xor, could be a
quick win.

Provided that libc++'s unordered_map uses power-of-two table sizes, that is.
The code is labyrinthine, but I think, rather gloriously, sometimes it does and
sometimes it doesn't:

[https://github.com/llvm-mirror/libcxx/blob/master/include/__...](https://github.com/llvm-mirror/libcxx/blob/master/include/__hash_table#L2131)

[https://github.com/llvm-mirror/libcxx/blob/master/include/__...](https://github.com/llvm-mirror/libcxx/blob/master/include/__hash_table#L125)

__constrain_hash is a simple but entertaining bit of bit-dickery (reformatted
slightly):

    
    
        size_t __constrain_hash(size_t __h, size_t __bc) {
            return !(__bc & (__bc - 1))
                ? __h & (__bc - 1)
                : (__h < __bc ? __h : __h % __bc);
        }
    

The x & (x - 1) tests whether a number is a power of two, because for any
number that is not a power of two, subtracting one leaves the top bit set, so
the bitwise and will contain at least one set bit. If the bucket count (number
of slots) is a power of two, use a mask to extract the bottom bits of the
hash. If it's not, do a modulus - but spend a branch to avoid that if the hash
is already in the right range, which I'm surprised is a win.

~~~
acqq
> LLVM's libc++

Great find!

So the library solutions are still often suboptimal, and it's even easier to
hide bad decisions in the C++ sources, so whoever takes the "just use the
default library" approach should be aware of that once performance becomes
important.

Yes, even simple multiplicative constants can significantly improve the hash
when it does nothing with the input by default! The libraries should definitely
be fixed, and adding the multiplication step is a really simple and fast change
for a great benefit.

As an inspiration, Kernighan and Ritchie used the simple constant 31 in their C
book, and that simple hash is still quite good compared to much more complex
and more recent solutions; K&R also didn't use the (I guess misleadingly named)
"open addressing" for their hash table. Their solution is amazingly
minimalistic and, for chained hash tables, amazingly good in that context. I
wouldn't be surprised if just changing

    
    
        return __c;
    

to

    
    
        return __c * 31;
    

in the functions discovered would result in a great improvement. The nice thing
about such a constant is that it gives fast and small code even on old
architectures where "normal" multiplication is slow (e.g. even without a fast
multiplier the result can be obtained with one shift and one subtraction!). On
modern architectures using this constant can't cause any performance
degradation, while improving the hash behavior of these formerly unprocessed
inputs practically guarantees a speedup. And there are surely use cases where
using more complex functions is much better, e.g. those suggested by aappleby:

[https://news.ycombinator.com/item?id=17330787](https://news.ycombinator.com/item?id=17330787)
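
Going back to the * 31 suggestion, a minimal sketch of what that change might look like on a libc++-style trivial integral hasher (the surrounding code is illustrative, not the actual library source):

    #include <cstddef>
    
    // Hypothetical stand-in for the identity integer hash libraries often ship.
    struct identity_int_hash {
        std::size_t operator()(std::size_t __c) const noexcept {
            return __c;                  // no mixing at all
        }
    };
    
    // The suggested tweak: multiply by 31. Without a fast multiplier this
    // is one shift and one subtraction, since 31 * c == (c << 5) - c.
    struct mul31_int_hash {
        std::size_t operator()(std::size_t __c) const noexcept {
            return (__c << 5) - __c;     // == __c * 31
        }
    };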

Back to the "open addressing", if you are rolling your own hash table and
don't plan too much hash tables to be present in memory at once, it's often
much faster to use "chains" (like in the K&R C book) than trying to store
everything only in the table (which is misleadingly often called "open
addressing" even if "closed hashing" is a better term) and jump through the
table in the collision case. Maintaining lists per entry is typically much
faster when the table is fuller, provided the allocation routines are fast.

[https://www.strchr.com/hash_functions](https://www.strchr.com/hash_functions)
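
A minimal sketch of the K&R-style chained table with the * 31 string hash, translated to C++ (bucket count and names are illustrative):

    #include <array>
    #include <cstddef>
    #include <forward_list>
    #include <string>
    #include <utility>
    
    // Fixed array of buckets, each bucket a singly linked list ("chain"),
    // as in the K&R C book's lookup/install example.
    class ChainedTable {
    public:
        int* lookup(const std::string& key) {
            for (auto& kv : buckets_[bucket_of(key)])
                if (kv.first == key) return &kv.second;
            return nullptr;                                      // not found
        }
    
        void install(const std::string& key, int value) {
            if (int* v = lookup(key)) { *v = value; return; }    // overwrite
            buckets_[bucket_of(key)].emplace_front(key, value);  // else chain
        }
    
    private:
        static constexpr std::size_t kBuckets = 101;             // K&R used a small prime
    
        static std::size_t bucket_of(const std::string& s) {
            std::size_t h = 0;
            for (unsigned char c : s) h = h * 31 + c;            // the *31 hash
            return h % kBuckets;
        }
    
        std::array<std::forward_list<std::pair<std::string, int>>, kBuckets> buckets_;
    };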

By the way, MurmurHash2/3 and CityHash are definitely very good functions; the
problem is when they aren't used in the library, as it seems is the case in
libcxx for integer keys. And in the cases where simpler code is needed, even a
simple * 31 is much, much better than nothing!

And note, it seems there are even problems with these good functions,
security-wise: apparently language implementations and services accepting
uncontrolled inputs also have to care about the security aspects of their hash
functions:

[https://131002.net/siphash/](https://131002.net/siphash/)

"Jointly with Martin Boßlet, we demonstrated weaknesses in MurmurHash (used in
Ruby, Java, etc.), CityHash (used in Google), and in Python's hash. Some of
the technologies affected have switched to SipHash."

"SipHash was designed as a mitigation to hash-flooding DoS attacks. It is now
used in the hash tables implementation of Python, Ruby, Perl 5, etc."

"SipHash was designed by Jean-Philippe Aumasson and Daniel J. Bernstein."

~~~
twic
> "SipHash was designed as a mitigation to hash-flooding DoS attacks. It is
> now used in the hash tables implementation of Python, Ruby, Perl 5, etc."

It's also the default hasher in Rust.

Rust's hashing is interesting. Types that want to be hashable implement the
Hash trait. What the Hash trait requires is that a type knows how to feed its
fields to a Hasher, as a sequence of primitives - it doesn't require that it
actually computes a hash itself. It's the Hasher which computes the hash. This
is nice, because it's very easy to implement Hash; indeed, so easy that it can
be done automatically using a derive macro. The downside is that it's not
possible for a type to implement a custom hash that takes particular advantage
of its own structure, and so to get a particularly good tradeoff of
distribution against performance. The only place to make that tradeoff is in
the choice of Hasher, where it has to be made generically across all types.

That said, you can choose the hasher used for individual HashMaps, so if you
have a HashMap where you know the keys are integers, you can use a Hasher
which just does a Fibonacci hash.

------
allenz
It's not really fair to compare a custom implementation against the standard
unordered_map implementations, which need to be fully general. See
[https://news.ycombinator.com/item?id=9675608](https://news.ycombinator.com/item?id=9675608)

Still, I'm shocked that GCC, LLVM, and Boost all assign buckets using modulus,
which is very slow. I would love to know the reasoning. I assumed that they
masked off the high bits (i.e. AND with the table size minus one).

Fast hashmaps use xor to mix the high bits down into the low bits (examples
below). Fibonacci hashing amounts to running a second multiplicative hash over
your input, which is only worthwhile if you're paranoid about your input
distribution.

Java SDK:
[http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/s...](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java#l320)

Android (C++):
[https://android.googlesource.com/platform/system/core/+/mast...](https://android.googlesource.com/platform/system/core/+/master/libcutils/hashmap.cpp#83)

~~~
twic
std::unordered_map is a bit of a red-headed stepchild in the C++ world. I'm
not surprised that it hasn't had nearly the amount of tuning that hashmaps
have had in other languages.

When I asked some C++ers about it, they warned me off using it, for reasons I
didn't fully understand, but I got the impression that there are structural
reasons why it can never be really fast, so anyone who needs a really fast
hashmap uses some non-standard one anyway.

------
strainer
The Golden Ratio is proposed as a special multiplier for what is otherwise
known as 'linear congruential' pseudorandom number generation. I'm not clear on
whether the non-repetitive property of the ratio benefits its performance at
all.

This is the decimal computed for 2^64 / 1.618...: 11400714819323198486. It
looks like this in binary:
1001111000110111011110011011100101111111010010100111110000010101

The runs of 7 ones, 5 zeros and 5 ones could be sub-optimal.

[1] Donald Knuth himself discovered this one for 64 bits:
101100001010001111101000010110101001100100101010111111100101101

It contains runs of 7 ones and 5 ones, but the maximum run of zeros is 4.

I once mined multipliers for LCGs of different bit lengths by comparing quickly
measured statistics of their output (average deviation etc.) to values
precomputed from good pseudorandom sequences. Having found numbers which
achieved the basic signature of random data, I tested them with Marsaglia's old
'diehard' battery of tests, and they often passed.

Here is a multiplier discovered for a 32-bit LCG: 110010011101110100001011

I had a list of them I meant to examine here, but have lost it :(

Anyway, there is plenty of academic work to read on this subject:

[1] [https://en.wikipedia.org/wiki/Linear_congruential_generator#...](https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use)

------
wrs
It's a strangely distinct memory that when I needed to implement a hash table
in 1992, I read Knuth--in 1992 I guess there wasn't much else to do--and
apparently read it correctly. The beauty of the golden ratio technique made it
memorable and I've been doing "multiply by 2^n/phi" ever since! In fact just
last week I launched Ruby to calculate it again...

------
no_identd
I wonder if, under some circumstances, one ought to use this instead, for a
similar purpose:

[https://en.wikipedia.org/wiki/Plastic_number](https://en.wikipedia.org/wiki/Plastic_number)

It shares a property with phi which no other irrational shares with it (they
are known as the only two Morphic numbers, which one must avoid confusing with
a similarly named concept whose name I can't recall right now). _And_ Knuth
liked it (well, the reciprocal of its square anyway, although the cubed version
seems more interesting to me), but never found any application for it. (He even
made a special TeX symbol for the square of it, 'High Phi'; see Wikipedia for
details.)

~~~
extremelearning
As @twic and the OP discussed, an even distribution of numbers within a defined
range is naturally achieved through low-discrepancy sequences (e.g. additive
recurrence, Halton, Sobol, etc.). Furthermore, in ultra-high-speed / low-level
computing situations the additive recurrence methods are often preferred
because each successive term is incredibly fast and simple to calculate: just
add a constant value (modulo 1) to the previous term.

For the one-dimensional case, it is well known, and relatively easily proven,
that the additive recurrence method based on the golden ratio offers the
optimal 'evenness' [lowest discrepancy] in distribution [1]. For higher
dimensions, it is still an open research question how to create provably
optimal methods. However, one of my recent blog posts [2] explores the idea
that a generalization of the golden ratio produces results that are possibly
optimal, and better than existing contemporary low-discrepancy sequences. In
the one-dimensional case, the critical additive constant is of course the
golden ratio. In the two-dimensional case, the additive constant is based on
integral powers of the plastic number. The generalization to even higher
dimensions follows other Pisot numbers.

[1] [https://en.wikipedia.org/wiki/Low-discrepancy_sequence#Addit...](https://en.wikipedia.org/wiki/Low-discrepancy_sequence#Additive_recurrence)

[2] [http://www.extremelearning.com.au/unreasonable-effectiveness...](http://www.extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/)
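
As a tiny concrete illustration of the 1-D additive recurrence with the golden ratio (a sketch of the idea, not code from the post):

    #include <cmath>
    #include <cstdio>
    
    int main() {
        // x_{n+1} = (x_n + 1/phi) mod 1: successive points fill [0, 1)
        // very evenly (low discrepancy).
        const double inv_phi = (std::sqrt(5.0) - 1.0) / 2.0;   // 1/phi = phi - 1
        double x = 0.0;
        for (int n = 1; n <= 10; ++n) {
            x = std::fmod(x + inv_phi, 1.0);
            std::printf("x_%d = %.4f\n", n, x);
        }
        return 0;
    }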

------
carapace
Moral of the story: Always _always_ consult Knuth.

~~~
rurban
Unless Knuth is outdated. With hash tables Knuth is seriously outdated. With
hash functions you can consult his test functions, but he is also outdated.

E.g. the CRC scheme is the fastest by far (there's even a hardware instruction
for it), but it's too easily attackable. Trivially, really; any 10-year-old
could do it, due to some unfortunate CRC properties.

~~~
nhaehnle
> any 10 year old can do that

Is this kind of hyperbole really necessary?

~~~
rurban
Yes, because almost nobody knows. But when you see it it's super trivial.
Knuth would be ashamed.

------
ttctciyf
The article attempts to explain this property of phi:

> Maybe you have a picture of a flower, and you want to implement “every time
> the user clicks the mouse, add a petal to the flower.” In that case you want
> to use the golden ratio: Make the angle from one petal to the next 360/phi
> and you can loop around the circle forever, adding petals, and the next
> petal will always fit neatly into the biggest gap and you’ll never loop back
> to your starting position.

And it points to a video. I think there's a much clearer explanation, which
shows that this works because (in some sense) phi is the "most irrational"
number, stemming from its continued fraction expansion:
[https://www.youtube.com/watch?v=sj8Sg8qnjOg](https://www.youtube.com/watch?v=sj8Sg8qnjOg)

I freely admit I have not a clue how this does or doesn't improve hash algos,
but it's a cool video! :)

------
gct
If you have a reasonable hash function, why are power-of-two sizes and
bitmasking bad? If you're uniformly distributed over a range N, you'll be
uniformly distributed over N/2.

~~~
jamiek88
It isn't. You are correct. This is useful for shoring up the underlying hash
function, but at that point you might as well just improve that and take the
power-of-two bitmask approach.

~~~
vidarh
I think that boils the problem down nicely: if you're using a hash table, or
writing your own for a specialised use case, you should pick a good hash
function.

But if you're writing a general purpose hashtable implementation you have to
deal with the fact that a lot of users won't use a good hash while some will,
so you need to find a tradeoff between using their hash as-is and mixing it up
to improve on the bad ones.

The latter needs to come almost for free, however, or you'll ruin performance
for those who actually do their homework.

------
jmcminis
I think the search part is implementing the Fibonacci Search Technique[0].
It’s related to Golden ratio search[1].

[0]
[https://en.wikipedia.org/wiki/Fibonacci_search_technique](https://en.wikipedia.org/wiki/Fibonacci_search_technique)

[1] [https://en.wikipedia.org/wiki/Golden-section_search](https://en.wikipedia.org/wiki/Golden-section_search)

------
danbruc
On a related note, growing a hash table by a factor of two might not be optimal
due to interactions with the memory allocator. More specifically, the new
allocation is larger than all previous allocations combined, so previously
allocated memory can never be reused: starting with m buckets, you will have
allocated m * 2^t buckets after t growing operations, m * (2^(t + 1) - 1) in
total, which is still m buckets short of the m * 2^(t + 1) buckets required for
growing again [1]. For example, starting with 8 buckets you allocate 8 + 16 +
32 + 64 = 120 buckets in total, still short of the 128 needed for the next
growth. Whether this is an issue obviously depends on the memory allocator used
and other factors like the use of a compacting garbage collector. Phi appears
again when looking for a better growth factor, and I will just leave this link
to a Stack Overflow question [2] as a starting point because I am unable to
find the article I had initially in mind.

[1] That is assuming you could somehow incorporate the current allocation; the
previously freed allocations amount to only m * (2^t - 1) buckets.

[2] [https://stackoverflow.com/questions/2369467/why-are-hash-tab...](https://stackoverflow.com/questions/2369467/why-are-hash-table-expansions-usually-done-by-doubling-the-size)

------
ot
If you need to map a uniform b-bit integer x into the [0, n) range, using ((x
* n) >> b) will work just as well as the modulo in terms of (quasi-)even
distribution, and it avoids integer division.

The idea is just to imagine x as a b-bit fixed-point uniformly random number in
[0, 1), and rescale it by multiplication.

This works great for hash tables whose size is not a power of 2, provided that
you start from a good hash function.
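
A small sketch of that multiply-then-shift mapping for 32-bit hashes (the same trick as Lemire's fastrange, mentioned elsewhere in the thread; names are illustrative):

    #include <cstdint>
    
    // Map a uniform 32-bit value x into [0, n) without a division:
    // treat x as a fixed-point fraction in [0, 1) and scale it by n.
    static inline uint32_t map_to_range(uint32_t x, uint32_t n) {
        return static_cast<uint32_t>((static_cast<uint64_t>(x) * n) >> 32);
    }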

------
qume
Light travels around a foot in a nanosecond.

Makes thinking about processing times a bit easier when you can visualise it
as a distance.

~~~
fogleman
[https://www.youtube.com/watch?v=JEpsKnWZrJ8](https://www.youtube.com/watch?v=JEpsKnWZrJ8)

------
stochastic_monk
This sounds quite similar to the fastrange [0] method. I’ve used it for random
sampling but not a hash table. It’d be great for non-power-of-two tables while
avoiding integer modulus.

[0] [https://github.com/lemire/fastrange](https://github.com/lemire/fastrange)

~~~
lower
fastrange is discussed in the article.

------
exDM69
Am I correct in thinking that using this Fibonacci method of mapping hash
values to buckets will solve the "accidentally quadratic" behavior in a Robin
Hood hash table? Rust's default hash table suffered from the issue but their
fix was much less elegant.

------
pubby
The comments here have given me an idea for my own hash tables.

Rather than accounting for poor hash functions using phi or fmix, I'm going to
measure the hash function's distribution at run time and throw an error if it's
bad. For release builds I'll disable these checks.

~~~
twotwotwo
Runtime errors seem a little bit fussy, but "try a fast approach that usually
works, detect worst-case behavior, then fall back to a thing that's usually
slower but avoids the worst case" is a common implementation pattern (think
introsort). In principle, you could start with something trivial as your hash,
then rehash your key with a better-behaved-but-slower function if your normal
probing strategy is going on far longer than it ought to given the load
factor.

But we rarely see that, and there are probably good reasons. Hash tables are
more rarely the bottleneck in the real world than in benchmarks, and when they
are an issue, other factors (size, concurrency, weird pathologies, mem
latency) may matter more often than hashing time.

------
chucklenorris
Is there a way to use this technique for consistent hashing?

------
cinek
Great article, really blew my mind.

