
What are Bloom filters? - duck
https://medium.com/the-story/what-are-bloom-filters-1ec2a50c68ff
======
krat0sprakhar
Like everyone else, I too wrote a blog post on Bloom filters:
[http://prakhar.me/articles/bloom-filters-for-dummies/](http://prakhar.me/articles/bloom-filters-for-dummies/).

~~~
scrumper
I preferred yours, but I'm not sure what you had for dinner on your last
birthday.

~~~
krat0sprakhar
Haha! What is that supposed to mean? :D

~~~
Nadya
You didn't include unimportant life details when explaining what a Bloom
filter is.

 _> I sigh. I’m hungry and the main course has just arrived — venison glazed
in honey, served with a sweet potato hash._

I learned more about what the author had for dinner on their last birthday
than I did about bloom filters in the first paragraphs.

So while they prefer your blog - they aren't sure what you had for dinner on
your last birthday...

------
notacoward
If Bloom filters look interesting, you should probably also check out cuckoo
filters.

[https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

The differences are somewhat subtle, but the use cases overlap a lot.

~~~
jamra
Would it be safe to say that cuckoo filters are Bloom filters with some
additional overhead in exchange for the ability to remove items as well as
add them?

~~~
notacoward
Deletion is the big difference, but even without that I think there can be
other significant differences as well. For any given number of items, it can
be pretty unclear which one will give better performance (including the cost
of false positives) for bounded size, or better space efficiency for bounded
performance. Usually you'll need at least an accurate model for both, if not
an actual implementation.

------
bunderbunder
I'm beginning to suspect that if I want to really drive a whack of traffic to
my blog, I should write a post about how to implement Bloom filters using
monads.

~~~
bshimmin
For maximum impact, you'll need to choose your programming language carefully.
I would suggest Elm with a sprinkling of Coq, though I think using either K or
MUMPS would earn you a lot of kudos in some quarters.

------
halayli
trying too hard.

Just go to
[https://en.wikipedia.org/wiki/Bloom_filter](https://en.wikipedia.org/wiki/Bloom_filter)

It's clearer and more informative.

------
jasode
_> To understand Bloom filters, you first have to understand hashing._

As pedagogy, I think this is the wrong approach.

The author _already knows_ Bloom filters, so to them the most logical thing
to talk about first is hash functions, because _that's how it's
implemented_.

Unfortunately, that's not how a person brand new to the concept thinks about
it. The first thing to talk about is _motivations_ and _scenarios_ and _use
cases_.

For that, the first two paragraphs of the Wikipedia article [1] on Bloom
filters are fairly straightforward. They explain _why_ it's an interesting
technique. Imo, those paragraphs are a better introduction than the author's
immediate dive into "hash functions" right after unrelated blurbs like "_I put
my fork down_" and "_my wife shakes her head with a rueful smile._"

[1] [https://en.wikipedia.org/wiki/Bloom_filter](https://en.wikipedia.org/wiki/Bloom_filter)

~~~
baddox
I'm fine with explaining the motivation first, but I struggle to see how you
can explain bloom filters without eventually getting to hashing. Obviously you
don't need to explain any actual real-world hash function, but you need to
explain the concept of deterministically mapping X bits to Y bits where X > Y,
which is essentially what a hash function does.
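
To make the "X bits to Y bits" idea concrete, here is a minimal Python sketch. The function name `bucket` and the choice of SHA-256 are my own assumptions for illustration, not anything from the article; any deterministic hash would do.

```python
import hashlib


def bucket(item: str, num_bits: int = 16) -> int:
    """Deterministically map an arbitrary-length string to a num_bits-bit integer.

    `bucket` is a hypothetical helper name; SHA-256 is just a convenient,
    well-distributed hash from the standard library.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Reduce the 256-bit digest to the range [0, 2**num_bits).
    return int.from_bytes(digest, "big") % (1 << num_bits)
```

The same input always lands on the same value, and because there are far more possible inputs than outputs, distinct inputs must sometimes collide. That collision is exactly where a Bloom filter's false positives come from.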

~~~
FellowMarketer
At the level of the article, I don't think it's inappropriate to assume that
the reader is at least familiar with the term "hashing" and can wait on the
technical details of the particular types of hash functions needed for
Bloom filters until the second or third section.

I honestly just closed the article when he started in on hashing, because I
want to know about bloom filters (which I don't know about), not the basics of
hashing (which I learned as an undergrad, and need to know day-to-day as a
working programmer).

------
shostack
Anyone happen to have a great Bloom filter tutorial in Ruby geared towards
early programmers?

I read the article, but I'm still not grasping the full construct and how it
functions. I'm hoping a hands-on tutorial might give me a better sense.

------
bru_
Is anybody else irritated with the incredibly tryhard tone in this article?
"venison glazed in honey, served with a sweet potato hash"? Seriously??? It's
an article about bloom filters for god's sake, not Leveraged Sell Out for the
tech industry...

~~~
jathu
Much like most TED talks, this is in line with the common Medium article
theme. The article becomes more about ME ME ME, rather than the subject at
hand. It's like some Moses syndrome.

With technical articles this becomes extremely annoying: blogs and papers end
up reading as if they were written for BuzzFeed.

------
bbcbasic
Can anyone give some cases of where Bloom filters are used? Has anyone used
one in their job?

~~~
SCHiM
I've programmed bloomfilters for my job.

A certain product of ours keeps track of certain urls visited. We're talking
millions of (unique) urls. We use bloomfilters to quickly check if a url was
visited or not. If the bloomfilter search is positive a more expensive search
inside a log file begins that gives a conclusive result (since bloomfilters
have a (very) small false positive-rate, but we want to be perfectly sure).
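
A rough sketch of that two-step lookup in Python, for anyone who wants to see the shape of it. This is not our production code: `BloomFilter`, `was_visited`, the double-hashing scheme, and the in-memory set standing in for the log file search are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k derived hash positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing (an assumed scheme): derive k positions from two
        # 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def was_visited(url: str, bloom: BloomFilter, visited_log: Set[str]) -> bool:
    """Cheap filter check first; conclusive (expensive) lookup only on a hit."""
    if not bloom.might_contain(url):
        return False  # a Bloom filter never gives false negatives
    # Stand-in for the expensive log file search that settles false positives.
    return url in visited_log
```

The point of the design is that the common case (a URL that was never visited) is almost always answered by the filter alone, so the expensive search only runs for real hits and the occasional false positive.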

~~~
mortehu
> bloomfilters have a (very) small false positive-rate

One of the neat things about Bloom filters is that you can choose your own
false positive rate, by tuning the number of hashes and the storage size.
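
As a sketch of how that tuning works, the standard textbook approximations can be written in a few lines of Python. The function names here are hypothetical, and the formulas assume idealized, independent hash functions.

```python
import math
from typing import Tuple


def bloom_parameters(expected_items: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) for a target false positive rate.

    Uses the usual approximations:
      m = -n * ln(p) / (ln 2)^2    bits of storage
      k = (m / n) * ln 2           hash functions
    """
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


def estimated_false_positive_rate(num_bits: int, num_hashes: int, items: int) -> float:
    """Approximate false positive rate after `items` insertions: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-num_hashes * items / num_bits)) ** num_hashes
```

For example, a million items at a 1% target works out to roughly 9.6 bits per item and 7 hash functions.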

~~~
Freaky
Calculator: [http://hur.st/bloomfilter](http://hur.st/bloomfilter)

