
How to write a Bloom filter in C++ - schmatz
http://blog.michaelschmatz.com/2016/04/11/how-to-write-a-bloom-filter-cpp/
======
nly
I feel the use of vector<bool> is an iffy choice.

The Bitcoin codebase has a simple Bloom filter implementation that has been in
use for some time, which you can take a look at:

[https://github.com/bitcoin/bitcoin/blob/master/src/bloom.h](https://github.com/bitcoin/bitcoin/blob/master/src/bloom.h)

[https://github.com/bitcoin/bitcoin/blob/master/src/bloom.cpp](https://github.com/bitcoin/bitcoin/blob/master/src/bloom.cpp)

~~~
bmohlenhoff
std::bitset is also a good choice if the number of desired bits is known at
compile time.
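
A minimal sketch of what that might look like, assuming string keys. The name
`FixedBloom`, its template parameters, and the `h1 + i * h2` indexing built on
`std::hash` are all made up for illustration and are not from the article:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical fixed-size filter: Bits is a compile-time constant, so the
// storage can be a std::bitset instead of a heap-allocated vector<bool>.
template <std::size_t Bits, std::size_t NumHashes>
class FixedBloom {
    static_assert(Bits > 0 && NumHashes > 0, "filter needs bits and hashes");

public:
    void add(const std::string& key) {
        for (std::size_t i = 0; i < NumHashes; ++i) bits_.set(index(key, i));
    }

    // false means "definitely not added"; true means "possibly added".
    bool possiblyContains(const std::string& key) const {
        for (std::size_t i = 0; i < NumHashes; ++i)
            if (!bits_.test(index(key, i))) return false;
        return true;
    }

private:
    // Assumption: the i-th index is derived from two std::hash values.
    // A real filter would probably use a stronger hash function.
    static std::size_t index(const std::string& key, std::size_t i) {
        assert(i < NumHashes);
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + "#");
        return (h1 + i * h2) % Bits;
    }

    std::bitset<Bits> bits_;
};
```

The trade-off is that `FixedBloom<1024, 3>` and `FixedBloom<2048, 3>` are
different types, so the size cannot be chosen from runtime input.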

~~~
schmatz
I think this would probably be the best choice after templating the filter.

------
nate_martin
This example works well for raw data but not for complex types. You could make
the filter a template, taking the key and a "hasher" function as template
args.

~~~
schmatz
Great suggestion; I wasn't sure of the idiomatic way to template this. Thanks
for letting me know!

~~~
nate_martin
Probably something like this:

template< class Key, class Hash = std::hash<Key> > class BloomFilter;
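
Fleshed out a little, that declaration could become something like the sketch
below. Everything beyond the template signature is an assumption: the member
names, the `vector<bool>` storage, the way one hash value is stretched into
several indexes, and the `Point` / `PointHash` example types are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

template <class Key, class Hash = std::hash<Key>>
class BloomFilter {
public:
    BloomFilter(std::size_t numBits, std::size_t numHashes, Hash hash = Hash())
        : bits_(numBits, false), numHashes_(numHashes), hash_(hash) {
        assert(numBits > 0 && numHashes > 0);
    }

    void add(const Key& key) {
        const std::uint64_t h = hash_(key);
        for (std::size_t i = 0; i < numHashes_; ++i) bits_[nthIndex(h, i)] = true;
    }

    // false means "definitely not added"; true means "possibly added".
    bool possiblyContains(const Key& key) const {
        const std::uint64_t h = hash_(key);
        for (std::size_t i = 0; i < numHashes_; ++i)
            if (!bits_[nthIndex(h, i)]) return false;
        return true;
    }

private:
    // Assumption: mix the single hash value into a second (odd) value and
    // combine them as h + i * h2 to simulate numHashes_ hash functions.
    std::size_t nthIndex(std::uint64_t h, std::size_t i) const {
        const std::uint64_t h2 = ((h ^ (h >> 33)) * 0xff51afd7ed558ccdULL) | 1;
        return static_cast<std::size_t>((h + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t numHashes_;
    Hash hash_;
};

// Hypothetical "complex type" key with a caller-supplied hasher.
struct Point { int x; int y; };
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>{}(p.x) * 31 + std::hash<int>{}(p.y);
    }
};
```

With this, `BloomFilter<std::string>` picks up `std::hash<std::string>` by
default, while `BloomFilter<Point, PointHash>` uses the custom hasher.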

~~~
bradleyjg
I don't use C++, so I'm not sure how std::hash works or gets implemented, but
the way that Guava (Google's Java library) does it is by passing in a key and
a funnel object. The funnel object is essentially responsible for decomposing
the object into a byte stream. The advantage of doing it this way, rather than
making the caller specify their own hash, is that the filter can use whichever
hash function its author thinks has the best properties for a Bloom filter,
such as murmurhash3.
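
A rough C++ rendering of the funnel idea might look like the sketch below.
This is not Guava's API: `ByteSink`, `FunnelBloomFilter`, `User`, and
`UserFunnel` are hypothetical names, and seeded FNV-1a stands in for
murmurhash3 only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using ByteSink = std::vector<unsigned char>;

// Stand-in for the filter's own hash (FNV-1a with a seed, not murmurhash3).
inline std::uint64_t fnv1a(const ByteSink& bytes, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;
    }
    return h;
}

// The caller supplies only a funnel; the filter decides how to hash bytes.
template <class Key, class Funnel>
class FunnelBloomFilter {
public:
    FunnelBloomFilter(std::size_t numBits, std::size_t numHashes,
                      Funnel funnel = Funnel())
        : bits_(numBits, false), numHashes_(numHashes), funnel_(funnel) {
        assert(numBits > 0 && numHashes > 0);
    }

    void add(const Key& key) {
        for (std::size_t idx : indexes(key)) bits_[idx] = true;
    }

    bool possiblyContains(const Key& key) const {
        for (std::size_t idx : indexes(key))
            if (!bits_[idx]) return false;
        return true;
    }

private:
    // Decompose the key into bytes once, then hash with numHashes_ seeds.
    std::vector<std::size_t> indexes(const Key& key) const {
        ByteSink sink;
        funnel_(key, sink);
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < numHashes_; ++i)
            out.push_back(static_cast<std::size_t>(fnv1a(sink, i) % bits_.size()));
        return out;
    }

    std::vector<bool> bits_;
    std::size_t numHashes_;
    Funnel funnel_;
};

// Hypothetical key type and its funnel: append each field's bytes in order.
struct User { std::string name; std::uint32_t id; };
struct UserFunnel {
    void operator()(const User& u, ByteSink& sink) const {
        sink.insert(sink.end(), u.name.begin(), u.name.end());
        for (int shift = 0; shift < 32; shift += 8)
            sink.push_back(static_cast<unsigned char>((u.id >> shift) & 0xff));
    }
};
```

The point of the design is that swapping `fnv1a` for a better hash later would
not require touching any funnel written by callers.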

------
sokoloff
Learned something today. Thanks for the article.

Minor nit: it will save readers time if you call out that "p is the false
positive error rate". (You reference the error rate, but don't attach a
variable name to it.) I had to go to an external reference to figure that out,
which meant I learned something else of course.
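
For anyone else who had to look it up: the usual sizing formulas are
m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, where n is
the expected number of elements and p the target false positive rate. I am
assuming these are the ones the article relies on. A small sketch, with
hypothetical names `BloomParams` and `optimalParams`:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct BloomParams {
    std::size_t numBits;    // m
    std::size_t numHashes;  // k
};

// n: expected number of inserted elements.
// p: target false positive rate, strictly between 0 and 1.
inline BloomParams optimalParams(std::size_t n, double p) {
    assert(n > 0 && p > 0.0 && p < 1.0);
    const double ln2 = std::log(2.0);
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    const double m = std::ceil(-static_cast<double>(n) * std::log(p) / (ln2 * ln2));
    // k = (m / n) * ln 2, rounded to the nearest integer, at least 1.
    const double k = std::round(m / static_cast<double>(n) * ln2);
    return {static_cast<std::size_t>(m), static_cast<std::size_t>(k < 1 ? 1 : k)};
}
```

For p = 0.01 this works out to roughly 9.6 bits per element and 7 hash
functions.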

~~~
schmatz
Great suggestion, thanks! I've updated the article accordingly :)

------
barsonme
Or in C[0] or in Go[1]...

[0] -
[https://github.com/EricLagergren/bloom-c](https://github.com/EricLagergren/bloom-c)
[1] -
[https://github.com/EricLagergren/bloom](https://github.com/EricLagergren/bloom)

------
j_s
Is there a standard implementation "everybody uses"? Bloomd seems popular as a
bloom filter server.

[https://github.com/armon/bloomd](https://github.com/armon/bloomd)

------
m00dy
I have also experimented with Bloom filters in Python:
[https://github.com/erenyagdiran/BloomFilter](https://github.com/erenyagdiran/BloomFilter)

------
nvcken
What are your laptop's specs, please?

~~~
schmatz
Intel Core i7-4980HQ

~~~
nvcken
What about the SSD and RAM? I think they might affect performance.

~~~
schmatz
RAM is 1600MHz DDR3. I do have an SSD, but as the memory usage of the program
was ~32MB, I doubt it would have an effect, haha.

