

A Garden Variety of Bloom Filters - srsamarthyam
http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/

======
jonpaul
For those of you who don't know what a Bloom Filter is or the concept isn't
quite clear, you should check out this site:
<http://www.jasondavies.com/bloomfilter/>

He has a nice interactive demo with an explanation of how it works.

~~~
peacemaker
Unless I'm misunderstanding how a Bloom Filter works, I don't think the
interactive demo is working correctly.

If you add the letters a-z as keys and then search for some numbers, such as
6, you get "Probably there". I was under the impression that a Bloom Filter
could determine whether something was definitely NOT in the data set, in which
case the example isn't working.

Please correct me if I'm wrong :)

EDIT: After playing around with it some more, I'm questioning whether I fully
understand how it is supposed to work. There seem to be many cases of
"Probably there" when the key definitely isn't there. So I'm guessing in those
cases you'd want to search the set to make sure?

~~~
mej10
You are slightly confused. False positives do indeed happen, but false
negatives never can: if any of the bits for a key's hashes isn't set, then
that key is guaranteed never to have been added.
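
To make that concrete, here is a minimal Python sketch of a Bloom filter. The
class and method names, the SHA-256-based hashing and the sizes used are
illustrative assumptions, not how the linked demo is implemented.

```python
import hashlib
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions (illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                  # number of bits in the filter
        self.k = k                  # number of hash functions
        self.bits = bytearray(m)    # one byte per bit, for clarity rather than compactness

    def _positions(self, key: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        # This is an assumed, simple scheme; real filters use faster hashes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        # Adding a key only ever sets bits, never clears them.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False if any bit is unset: the key was definitely never added.
        # True only means "probably there": other keys may have set those bits.
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter(m=64, k=3)
    for ch in "abcdefghijklmnopqrstuvwxyz":
        bf.add(ch)
    # Every added key is always reported as present (no false negatives).
    print(all(bf.might_contain(ch) for ch in "abcdefghijklmnopqrstuvwxyz"))
    # "6" was never added; a True here would be a false positive.
    print(bf.might_contain("6"))
```

With 26 keys packed into only 64 bits, many bits end up set, so queries for
keys that were never added (like "6" in the demo) frequently come back
"probably there".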

False positives in Bloom Filters come from collisions, much like collisions in
hash tables: a key that was never added can hash to bits already set by others.

Also, regarding your last statement: in some cases you don't need to look at
the actual set, but in many you do. Bloom Filters give you a fast, compact way
to avoid a lot of unnecessary searching. With a spell checker, for example,
you could reduce the number of times you have to search through the
dictionary; when looking for files on disk, you could eliminate a percentage
of the searches. How much these can be reduced depends on the filter's
false-positive rate, which is governed largely by the size of your Bloom
filter relative to the number of keys stored in it.
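
To put rough numbers on "how much", a commonly used approximation (assuming
independent, uniformly distributed hashes) gives the false-positive rate for
m bits, k hash functions and n stored keys as p ≈ (1 - e^(-kn/m))^k, and the k
that minimizes it as about (m/n) ln 2. The following Python sketch evaluates
that approximation; the function names and the dictionary size are
hypothetical.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate chance that a key never added is reported as "probably there"
    after n keys were inserted into m bits using k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Hash-function count that minimizes the approximation: about (m/n) ln 2."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    n = 100_000  # hypothetical number of dictionary words
    for bits_per_key in (4, 8, 16):
        m = bits_per_key * n
        k = optimal_k(m, n)
        p = false_positive_rate(m, n, k)
        # p is the fraction of misspelled (absent) words that still need a full lookup.
        print(f"{bits_per_key:>2} bits/key, k={k}: ~{p:.4%} of absent words still searched")
```

Each doubling of the bits per key cuts the share of wasted dictionary lookups
sharply, which is the size trade-off described above.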

~~~
peacemaker
Ok that makes sense, thanks for taking the time to explain. Seems obvious now!
:)

