Why Bloom filters work the way they do

DanielRibeiro · on Sept 27, 2012

If you like blooms filters, you may like many variations of it, and some of its applications:

A Garden Variety of Bloom Filters[1]

[1] http://matthias.vallentin.net/blog/2011/06/a-garden-variety-...

Rickasaurus · on Sept 27, 2012

A great post. I wish we saw more like it on Hacker News.

shutton · on Sept 27, 2012

Very well written. It's the first technical CS article I've read in a while and really enjoyed it.

Dylan16807 · on Sept 27, 2012

Interesting, I read the introduction and could only think of three steps myself. Apparently you and I think of hashes in a different way. I went directly from 'list of urls' to 'hash table of urls' (with a very minor note of 'we only need to store yes/no so make the bucket size one bit'). This skips a large part of your train of logic entirely. From there it's the same 'use a couple hashes to avoid collisions', 'use overlapping storage because it's more efficient'.

alecdibble · on Sept 26, 2012

Thanks! I enjoyed the way you approached the explanation by exploring the problem instead of the solution.

michael_nielsen · on Sept 26, 2012

Thanks! Yeah, sometimes solutions alone are opaque -- it's often illuminating to explore other approaches, and to understand why they don't work so well, in order to understand why a solution is the way it is.

knodi · on Sept 27, 2012

I believe LevelDb is getting bloom filter to alleviate the key not-found performance issue soon.

Very cool article.