
Bloom Filters by Example (2013) - tosh
https://llimllib.github.io/bloomfilter-tutorial/
======
dang
2017:
[https://news.ycombinator.com/item?id=14740032](https://news.ycombinator.com/item?id=14740032)

2013:
[https://news.ycombinator.com/item?id=6626128](https://news.ycombinator.com/item?id=6626128)

~~~
nreilly
And a week ago:
[https://news.ycombinator.com/item?id=19838163](https://news.ycombinator.com/item?id=19838163)
(Understanding Bloom Filters with Pharo)

------
Piezoid
In my field (bioinformatics), there is a trend of using a single hash
function and an oversized bloom filter, which is then compressed with something
like entropy-optimal RRR[1] or RoaringBitmap bit vectors. The resulting bloom
filter ends up taking the same space as a properly sized bloom filter. There
are two advantages: knowing the size of the set in advance is not necessary,
and each query needs only a single hash computation and memory access. However,
updates are not well supported.

[1] Raman, R., Raman, V., & Rao, S. S. (2002, January). Succinct indexable
dictionaries with applications to encoding k-ary trees and multisets. A blog
post: [https://alexbowe.com/rrr/](https://alexbowe.com/rrr/)
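
To make the single-hash idea concrete, here is a minimal Python sketch. The
class name, the choice of BLAKE2b, the 2^20-bit size and the `kmer-` keys are
all assumptions for illustration. The compression step itself (RRR or Roaring)
is not implemented; the code only reports the entropy bound that a compressor
of that kind would be aiming for.

```python
import hashlib
import math


class SingleHashFilter:
    """Bloom-style filter with one hash function and a deliberately large bit array."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _index(self, item):
        # One hash computation per operation (BLAKE2b is an arbitrary choice here).
        digest = hashlib.blake2b(item, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        i = self._index(item)
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, item):
        # One bit probe per query.
        i = self._index(item)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def density(self):
        # Fraction of bits set; with a single hash this approximates
        # the false positive rate.
        ones = sum(bin(b).count("1") for b in self.bits)
        return ones / self.num_bits


def entropy_bits(p, n):
    """Shannon lower bound, in bits, for storing n bits of which a fraction p are set."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))


# Hypothetical workload: 10,000 keys in a filter of about one million bits.
filt = SingleHashFilter(1 << 20)
items = [f"kmer-{i}".encode() for i in range(10_000)]
for it in items:
    filt.add(it)

print("density (approx. false positive rate):", filt.density())
print("uncompressed bits:", filt.num_bits)
print("entropy bound in bits:", entropy_bits(filt.density(), filt.num_bits))
```

The raw array is mostly zeros, which is why an entropy-coded bit vector can
shrink it to roughly the footprint of a conventionally sized filter.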

------
tosh
Does anyone have interesting examples for when a bloom filter came in handy as
a solution to a problem?

I would love to have some examples (beyond the contrived standard ones) to draw
upon, to make it easier for me to intuitively notice when a bloom filter might
be a good fit for a future problem I encounter.

~~~
PaulBGD_
I used one in a BitTorrent-like protocol: each block hash was added to the
filter, which was then sent to peers. Peers could then check with high accuracy
whether a peer has a block without having to communicate with them.

~~~
faceplanted
That's really quite cool. How many blocks could an instance potentially have,
though? It seems unlikely you'd be saving that much space over having a list.

~~~
PaulBGD_
Sorry, I wasn't clear: each peer maintains its own personal bloom filter that
contains the hashes of the blocks it has. The filters are sent when connecting
to another peer and updated every so often.
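
A rough Python sketch of that exchange, not the actual protocol: the
`BloomFilter` class, the 8192-bit size, the five hash functions and the
placeholder block hashes are all assumptions made up for illustration.

```python
import hashlib


class BloomFilter:
    """Plain k-hash bloom filter that can be shipped to a peer as raw bytes."""

    def __init__(self, num_bits=8192, num_hashes=5, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        if bits is not None:
            self.bits = bytearray(bits)
        else:
            self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, item):
        return all(self.bits[i >> 3] & (1 << (i & 7)) for i in self._indexes(item))

    def to_bytes(self):
        return bytes(self.bits)


# Local peer: add the hash of every block it holds (placeholder hashes here).
local = BloomFilter()
my_blocks = [hashlib.sha1(f"block-{n}".encode()).digest() for n in range(3)]
for block_hash in my_blocks:
    local.add(block_hash)

# On connect, the filter bytes are sent; the remote peer rebuilds a read-only view.
payload = local.to_bytes()
remote_view = BloomFilter(bits=payload)

# A hit means "probably has the block"; a miss means "did not have it as of this snapshot".
print(all(h in remote_view for h in my_blocks))
```

On the space question: assuming 20-byte block hashes, a plain list of 1,000
blocks is about 20 KB, while a filter tuned for roughly 1% false positives
needs about 9.6 bits per entry, or around 1.2 KB.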

