
Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters - rbanffy
https://dl.acm.org/doi/fullHtml/10.1145/3376122
======
devj
I have a requirement to check group membership, where a group may contain
more than 10K members. Each member is an 8-byte ID. Do you think Xor filter would be a
better fit compared to Bloom here? Or am I looking at it the wrong way?
Thanks.

~~~
gopalv
> Do you think Xor filter would be a better fit compared to Bloom here?

An 8-byte key is the only scenario where you should consider XorPlus (i.e., 8
bytes mapped to a long).

The lookup properties of the Xor filter are better in that case, but the
real question is whether you have the entire collection available when you
start building the bitset.

Building the sketch isn't incremental: there is no add(k) once it has been
built.

So you can't add data once it is built, whereas Bloom filters do support
adding entries after the fact (in fact, you can merge one Bloom filter into
another, rather than sending all the new keys).

And both of those approaches are missing an unset operation.
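
To make the add/merge point concrete, here is a minimal Python sketch of a Bloom filter. The class name, the sizes `m` and `k`, and the SHA-256 double-hashing scheme are assumptions chosen for illustration, not taken from any particular library. Note that it has `add` and `union`, but nothing that removes a key.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits, k positions per key (illustrative only)."""

    def __init__(self, m=1 << 16, k=7):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, key: int):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.to_bytes(8, "little")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: int) -> None:
        # Adding after construction is always possible: just set more bits.
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: int) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Merging is a bitwise OR, valid only if both sides share m, k and hashing.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


a = BloomFilter()
b = BloomFilter()
a.add(1001)           # 8-byte IDs, represented here as Python ints
b.add(2002)
merged = a.union(b)   # ship b's bits instead of b's keys
print(1001 in merged, 2002 in merged)  # True True
```

The union is why a node can send its filter rather than its new keys. Clearing bits to "remove" a key would risk breaking other keys that share those bits.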

~~~
devj
Sorry.. Forgot to mention that.

Yes, insertion/deletion is required, and the frequency may be higher. The
exact use case is that group membership depends on some conditions. We
schedule this check every 15 minutes and, based on the result, we add or
delete members.

~~~
cakoose
Xor filters require all the members of the set be provided up front.

Bloom filters allow adding members, but not removing them.

Cuckoo filters allow removing members:
[https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

And just for completeness, storing everything losslessly in a btree, radix
tree, or hash table would probably be under 300 kB. (80 kB just for the IDs
plus, say, an additional ~2x overhead.)
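
The arithmetic behind that estimate, as a rough Python sketch. The 10K count and 8-byte IDs come from the question. The 2x overhead is the guess above. The filter lines assume a false-positive rate of 2^-8 and use the usual approximations (about 1.23 x fingerprint bits per key for an 8-bit Xor filter, and log2(1/rate) / ln 2 bits per key for an optimally sized Bloom filter), so treat them as ballpark figures only.

```python
import math

n_members = 10_000   # "more than 10K members" from the question
id_bytes = 8         # each member is an 8-byte ID

raw_bytes = n_members * id_bytes   # the IDs alone
exact_bytes = raw_bytes * 3        # IDs plus an assumed ~2x structural overhead

# Approximate filters at an assumed false-positive rate of 2**-8 (about 0.4%).
fp_bits = 8
xor8_bytes = 1.23 * fp_bits * n_members / 8
bloom_bytes = n_members * fp_bits / math.log(2) / 8

print(f"raw IDs:         {raw_bytes / 1000:.1f} kB")
print(f"exact structure: {exact_bytes / 1000:.1f} kB")
print(f"xor8 filter:     {xor8_bytes / 1000:.1f} kB")
print(f"bloom filter:    {bloom_bytes / 1000:.1f} kB")
```

At this scale the exact structure costs a few hundred kB at most, and it supports both insertion and deletion with no false positives.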

~~~
throwaway_pdp09
Safe removal is only possible if you know the item is already in the filter.
I feel the 'you can delete from cuckoo filters' bit is often oversold.
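
A toy Python sketch of why: the filter stores only short fingerprints, so deleting a key that was never inserted can remove the fingerprint of a real member. Everything here (the names, 16 buckets, 8-bit fingerprints, SHA-256 hashing, and the lack of eviction) is an assumption made to keep the example small. It is not a faithful cuckoo filter implementation.

```python
import hashlib
from itertools import count

NUM_BUCKETS = 16   # power of two so the XOR step stays in range
BUCKET_SIZE = 4


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def buckets_for(key: int):
    """Return (fingerprint, bucket 1, bucket 2) for an 8-byte ID."""
    raw = key.to_bytes(8, "little")
    fp = _h(b"fp" + raw) % 255 + 1            # 8-bit fingerprint, never 0
    i1 = _h(b"ix" + raw) % NUM_BUCKETS
    i2 = (i1 ^ _h(bytes([fp]))) % NUM_BUCKETS  # partial-key alternate bucket
    return fp, i1, i2


class ToyCuckooFilter:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key: int) -> None:
        fp, i1, i2 = buckets_for(key)
        for i in (i1, i2):
            if len(self.buckets[i]) < BUCKET_SIZE:
                self.buckets[i].append(fp)
                return
        raise RuntimeError("both buckets full; a real filter would evict")

    def __contains__(self, key: int) -> bool:
        fp, i1, i2 = buckets_for(key)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, key: int) -> bool:
        fp, i1, i2 = buckets_for(key)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)   # cannot tell whose fingerprint it is
                return True
        return False


member = 42
f = ToyCuckooFilter()
f.insert(member)

# Search for an ID that was never inserted but collides with the member.
target = buckets_for(member)
impostor = next(k for k in count(1) if k != member and buckets_for(k) == target)

print(impostor in f)        # True: a false positive
print(f.delete(impostor))   # True: the delete "succeeds"
print(member in f)          # False: the real member is now a false negative
```

So deletion is only safe when the caller already knows, from some exact source, that the key was inserted.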

~~~
cakoose
Oh wow, I only read the paper's abstract and was fooled. Thanks for the
correction.

------
mabbo
I'm trying to read through this and understand it, but I'm missing it; too
little coffee, probably.

Can someone succinctly explain the key idea for how this beats standard
bloom/cuckoo filters?

~~~
buckhx
The filters are more efficient in space and query time at the cost of being
immutable after being constructed. There is no Add() operation as there is
with the others.
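
For a feel of why there is no Add(), here is a compact Python sketch of an 8-bit Xor filter. It is a simplified reading of the construction, not the paper's reference code: the class name, the BLAKE2b hashing, the seed retry limit, and the exact table sizing are assumptions. Each key maps to three slots, and the build step solves for slot values so that the XOR of a key's three slots equals its fingerprint. That solving step needs the whole key set at once.

```python
import hashlib


def _hash(key: int, seed: int) -> int:
    data = seed.to_bytes(8, "little") + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "little")


class XorFilter8:
    def __init__(self, keys):
        keys = list(set(keys))                       # duplicates would break peeling
        self.block = (int(1.23 * len(keys)) + 32) // 3 + 1
        for seed in range(1, 100):                   # retry with a new seed on failure
            self.seed = seed
            order = self._peel(keys)
            if order is not None:
                break
        else:
            raise RuntimeError("construction failed for every seed tried")
        self.table = [0] * (3 * self.block)
        # Assign in reverse peel order; table[slot] is still 0 when it is set.
        for key, slot in reversed(order):
            fp, (a, b, c) = self._fp_and_slots(key)
            self.table[slot] = fp ^ self.table[a] ^ self.table[b] ^ self.table[c]

    def _fp_and_slots(self, key):
        h = _hash(key, self.seed)
        fp = h & 0xFF
        # One slot in each of three disjoint blocks of the table.
        slots = tuple(
            i * self.block + ((h >> (8 + 32 * i)) & 0xFFFFFFFF) % self.block
            for i in range(3)
        )
        return fp, slots

    def _peel(self, keys):
        size = 3 * self.block
        counts = [0] * size
        xor_keys = [0] * size            # XOR of the keys currently in each slot
        for key in keys:
            for s in self._fp_and_slots(key)[1]:
                counts[s] += 1
                xor_keys[s] ^= key
        queue = [s for s in range(size) if counts[s] == 1]
        order = []
        while queue:
            s = queue.pop()
            if counts[s] != 1:
                continue                 # its key was already peeled elsewhere
            key = xor_keys[s]            # the only key left in slot s
            order.append((key, s))
            for t in self._fp_and_slots(key)[1]:
                counts[t] -= 1
                xor_keys[t] ^= key
                if counts[t] == 1:
                    queue.append(t)
        return order if len(order) == len(keys) else None

    def __contains__(self, key):
        fp, (a, b, c) = self._fp_and_slots(key)
        return fp == self.table[a] ^ self.table[b] ^ self.table[c]


members = set(range(10_000))
f = XorFilter8(members)
print(all(k in f for k in members))   # True: no false negatives for built keys

# Any membership change means rebuilding from the full set.
members.add(123_456)
members.discard(7)
f = XorFilter8(members)
```

A query is three table reads and two XORs, which is where the speed comes from. A change to a single key can alter many slot values, hence the rebuild.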

