

Avalanche diagrams showing the effectiveness of various hashing functions - petercooper
http://sites.google.com/site/murmurhash/avalanche

======
patio11
Another fun graphical way to see the effectiveness of a hash function is to:

1) Run a known dictionary against the hash function with a specified number of
buckets.

2) Calculate how much of an error there was versus the theoretical optimum
(even distribution of the inputs across all buckets).

3) Graph that (each bucket becomes a pixel in a rectangle) and use color to
convey the relative magnitude of the error.

If you do this, you get to use some very powerful software backing your eyes
as anomaly detectors. Low intensity noise is a good result. Anything which
your brain can sort out is a bad result.

Your eyes are more than good enough to identify most bad hash functions. (Try this
with an English dictionary and Java string hashing. You will, ahem, not have
difficulty finding the problem.)
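
The three steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: `java_string_hash` follows the documented `String.hashCode` formula (31*h + c with 32-bit wraparound), treated here as unsigned before taking the bucket modulus. The helper names, the relative-error measure, and the grayscale PGM output are all made up for this example, and the word list at the bottom is a synthetic stand-in for a real dictionary.

```python
def java_string_hash(s: str) -> int:
    """Documented Java String.hashCode formula, kept as an unsigned 32-bit value.

    Assumption: characters are in the BMP, so ord() matches Java's UTF-16 units.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_errors(words, hash_fn, num_buckets):
    """Steps 1 and 2: hash every word into a bucket, then return each bucket's
    relative deviation from the ideal even load (0.0 means exactly even)."""
    if not words:
        raise ValueError("need at least one word")
    counts = [0] * num_buckets
    for w in words:
        counts[hash_fn(w) % num_buckets] += 1
    expected = len(words) / num_buckets
    return [(c - expected) / expected for c in counts]


def to_grid(errors, width):
    """Step 3, part one: lay the buckets out as rows of `width` pixels."""
    if len(errors) % width != 0:
        raise ValueError("number of buckets must be a multiple of width")
    return [errors[i:i + width] for i in range(0, len(errors), width)]


def write_pgm(grid, path, max_abs_error=1.0):
    """Step 3, part two: write a plain-text grayscale PGM (P2) image.

    Mid-gray means no error, white is overfull, black is underfull.
    Errors beyond +/- max_abs_error are clamped.
    """
    height, width = len(grid), len(grid[0])
    with open(path, "w") as f:
        f.write(f"P2\n{width} {height}\n255\n")
        for row in grid:
            pixels = []
            for e in row:
                clamped = max(-max_abs_error, min(max_abs_error, e))
                pixels.append(str(round(127.5 + 127.5 * clamped / max_abs_error)))
            f.write(" ".join(pixels) + "\n")


if __name__ == "__main__":
    # Synthetic stand-in for a dictionary; swap in a real word list to try it.
    words = [f"word{i}" for i in range(10000)]
    errors = bucket_errors(words, java_string_hash, num_buckets=1024)
    grid = to_grid(errors, width=32)
    print("worst relative error:", max(abs(e) for e in errors))
```

Calling `write_pgm(grid, "buckets.pgm")` on the result gives an image most viewers can open. Visible stripes or blocks would suggest structure in the hash, while uniform noise is what you hope to see.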

[Edit to add: I might just be getting old, but I'm _positive_ a professor and
I broke the Java hash function wide open in college using this method, yet my
15 minute attempt to replicate this result gives me an image which looks like
a good hash function would result in: white noise.]

~~~
epochwolf
> I might just be getting old, but I'm positive a professor and I broke the
> Java hash function wide open in college using this method, yet my 15 minute
> attempt to replicate this result gives me an image which looks like a good
> hash function would result in: white noise.

Is it possible Java has updated the string hashing function since you did
this?

------
ComputerGuru
MurmurHash [1] (which is by the same author as TFA) is one of my favorite ever
finds. I use it in conjunction with Google Sparse/Dense Hash Maps [2] (another
great find, which is Google's very nicely-designed implementation of C++0x
compatible hash table STL structure with great performance and memory usage)
in almost all my projects these days.

1: <http://sites.google.com/site/murmurhash/>
2: <http://code.google.com/p/google-sparsehash/>

------
Keyframe
I've been doing some experiments with bloom filters recently, thanks for the
link. I enjoy these kinds of topics related to information complexity and
cryptography (it's my pet hobby).

------
jbellis
This (and speed) is why Cassandra uses murmur for its bloom filters:
<http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html>

