
What are Bloom filters? (2015) - diggan
https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff
======
dmlittle
Bloom filters are great, but unfortunately they don't support deleting items:
a particular bit might be shared by more than one element in the filter, so
you don't know whether it's safe to clear. A few weeks ago, someone posted
about Cuckoo Filters [1], which are like Bloom filters but allow for key
deletion.

[1]: [https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
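
To make the deletion problem concrete, here is a minimal Python sketch. It is
not taken from the linked article; the class name, the sizes (m=64, k=3), and
the salted SHA-256 hashing are all assumptions chosen for illustration. It
clears an item's bits as a naive "delete" and shows that another item sharing
one of those bits then goes missing.

```python
import hashlib
from itertools import count


class BloomFilter:
    """Toy Bloom filter: m bits, k salted SHA-256 hashes (illustrative only)."""

    def __init__(self, m=64, k=3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p] for p in self.positions(item))

    def naive_delete(self, item):
        # Deliberately unsafe: these bits may also belong to other items.
        for p in self.positions(item):
            self.bits[p] = False


def demo_unsafe_delete():
    bf = BloomFilter()
    alice_bits = set(bf.positions("alice"))
    # Search for some other key that shares at least one bit with "alice".
    other = next(
        f"key{n}" for n in count() if alice_bits & set(bf.positions(f"key{n}"))
    )
    bf.add("alice")
    bf.add(other)
    bf.naive_delete("alice")
    # `other` was never deleted, yet the filter now reports it as absent.
    return other, bf.might_contain(other)
```

Calling `demo_unsafe_delete()` returns the colliding key together with `False`,
i.e. a false negative, which a plain Bloom filter is never supposed to produce.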

~~~
dlhavema
Isn't there a variant that uses a counter instead of a bit, where the flag
counts as set if the counter is > 0?

~~~
dmlittle
Yes, but by using a counter you increase the size of the filter and you
introduce the possibility of overflows. Let's say you use 3 bits per counter
(rather than 1 bit): your filter is now 3x the size, and if more than 7 items
land on the same counter, it overflows and your false positive rate will go
up. Cuckoo filters allow for deletions while using less space than a Bloom or
counting filter.
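
A rough Python sketch of the counting variant discussed above, with the
overflow case included. Everything here is an assumption for illustration: the
class name, the sizes, the salted SHA-256 hashing, and in particular the
policy of letting a full counter stick at 7 (a real implementation might
handle overflow differently).

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter with 3-bit saturating counters."""

    MAX = 7  # largest value a 3-bit counter can hold

    def __init__(self, m=64, k=3):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Derive k counter positions by salting the item with the hash index.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for p in self._positions(item):
            if self.counters[p] < self.MAX:
                self.counters[p] += 1  # saturate instead of wrapping to 0

    def remove(self, item):
        # Only meaningful for items that were actually added.
        for p in self._positions(item):
            # A saturated counter has lost its true count, so it stays stuck;
            # decrementing it could create false negatives for other items.
            if 0 < self.counters[p] < self.MAX:
                self.counters[p] -= 1

    def might_contain(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))
```

With this policy, adding and then removing an item brings its counters back to
zero, but once a counter has hit 7 it never drops again, so entries that were
removed can still be reported as present. That is one way the overflow turns
into extra false positives.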

------
rajathagasthya
Previous discussions on Bloom Filters:

[https://news.ycombinator.com/item?id=12231623](https://news.ycombinator.com/item?id=12231623)

[https://news.ycombinator.com/item?id=12124722](https://news.ycombinator.com/item?id=12124722)

------
krat0sprakhar
Like most devs, I too bit the bullet and wrote my version -
[http://prakhar.me/articles/bloom-filters-for-dummies/](http://prakhar.me/articles/bloom-filters-for-dummies/)

------
cwisecarver
A bit long-winded, but a good, thoughtful explanation. I honestly didn't know
what they were, only how they could be used. Now I know.

~~~
usmanajmal
Indeed. It was a fun read.

------
lanna
I really liked Daniel Spiewak's article about it:
[http://www.codecommit.com/blog/scala/bloom-filters-in-scala](http://www.codecommit.com/blog/scala/bloom-filters-in-scala)

------
ianleeclark
Bloom filters are a really interesting data structure. When I found out about
them I really wanted to use them, so I ended up building a distributed hash
table. I thought Bloom filters could be an interesting way to optimize key
lookups throughout the network: I could use a Bloom filter to tell that a
remote node definitely didn't have a key (so there's no need to send it a
request) or that a node likely had a key (so it's worth sending a request to
get the key's value).
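
The routing idea above could look roughly like this in Python. This is a
sketch under assumptions, not the actual DHT: `RemoteNode`, `candidate_nodes`,
and `lookup` are hypothetical names, the "network request" is simulated by a
dictionary access, and each node's filter is assumed to be already known
locally.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter used as a compact summary of a node's keys."""

    def __init__(self, m=256, k=3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item):
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))


class RemoteNode:
    """Hypothetical stand-in for a peer; `store` plays the remote key/value data."""

    def __init__(self, name, data):
        self.name = name
        self.store = dict(data)
        self.summary = BloomFilter()  # the part other peers would hold locally
        for key in self.store:
            self.summary.add(key)


def candidate_nodes(nodes, key):
    # Nodes whose filter says "definitely not" are skipped without a request.
    return [node for node in nodes if node.summary.might_contain(key)]


def lookup(nodes, key):
    requests = 0
    for node in candidate_nodes(nodes, key):
        requests += 1  # simulated network round trip
        if key in node.store:
            return node.store[key], requests
    return None, requests  # every candidate was a false positive, or none matched
```

The node that really holds a key is always among the candidates, since a Bloom
filter has no false negatives; the filter only saves the requests that would
have gone to nodes it can rule out.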

Naturally, if consensus were established between nodes, using something like
this would be unnecessary, but it turned out to be an interesting way of
optimizing lookups in a DHT.

~~~
yabro
This is why Cassandra works this way.

------
triplesec
One correction of computational irrelevance for this article: Bistromathics
has nothing to do with an Infinite Improbability Drive, as any fule kno.
That's what powered the Heart of Gold, which Zaphod stole. Bistromathics
powers Slartibartfast's ship, which I don't think had a clever name beyond
Starship Bistromath.

[https://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%2...](https://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Heart_of_Gold)

------
triplesec
I have two unlikes for this story:

1\. The promoting of Medium's homepage several times in the first sentences of
this anvilicious story proved annoying to me. I'd rather have a straight-up
explanation of why they used bloom filters for this application and context,
without the condescending PR-sugar-coating which isn't really a useful
addition and only detracts from our information.

2\. We could retitle this: 'Jamie Explains Technology to a Woman'. Come on, if
you're going to add entropy to our data, don't make it reinforce yet another
stereotype.

~~~
newscracker
Talking about dislikes of the article, this was my second time skimming
through it. The first time around, I just didn't get through it fully because
even though it was just a "3 minute read" according to Medium, there was too
much unrelated fluff that didn't keep me engaged in it. It seemed quite
unfocused due to the writer's penchant for seemingly amusing or funny
anecdotes. The second time I read it, I again skimmed through all the fluff
looking for the pieces of value. This article could definitely have been
written a lot better in a focused way and explained Bloom filters a lot better
for the same "reading time". I'm going to search for something like that,
which I'm guessing is already out there.

I generally have a dislike for articles on Medium because many that I've seen
look like lightweight pieces with very little value and too many words. The
way images are displayed is also kind of disturbing to me. Articles either
seem to have some gaudy, blurry, shaky, seizure-causing GIFs or blurry images
that come into focus only after some scrolling. In my experience, Medium was
good and different when it started, and seems to have become worse in quality
and presentation over time. Maybe it's a temporary phase. Maybe it's not.

------
ipunchghosts
How many times on HN must this topic come up?

~~~
_ao789
To understand recursion, you must first understand recursion.

~~~
gumby
While to understand iteration, just keep studying until you're done.

~~~
catnaroek
But you still need to use recursion to prove that, in fact, you will
eventually be done.

~~~
hinkley
Halt! I see a problem.

~~~
pjscott
You should halt if and only if you know you'll never halt.

