
A General-Purpose Counting Filter: Making Every Bit Count [pdf] - gbrown_
http://www3.cs.stonybrook.edu/~rp/tech_reports/sbcstr-c6ff764fdd8f9d2b5ea3b31972a787bc/report.pdf
======
FreakLegion
This is misleading for some aspects of the cuckoo filter.

 _> The cuckoo filter can also support a small number of duplicates of some
items. In the authors’ reference implementation, each slot can actually hold 4
values, so the system can support up to 8 duplicates, although its not clear
how this will impact the probability of failure during inserts of other items.
One could attempt to add counting to the cuckoo filter by associating a
counter with each fingerprint, but this would increase the space usage._

Cuckoo filters support buckets of arbitrary size. (They can also support an
arbitrary number of candidate buckets per item, but that requires a different
scheme for the partial-key cuckoo hashing. It also reduces how much
compression is possible, as described in the note below.)

The upside of larger buckets is better load factors; the downside, in theory,
is the need for more bits per item and longer query times. Cuckoo filters are
amenable to compression via integer coding within buckets, though, so in
practice reasonably large buckets are free.

 _> The space savings from these optimizations make the RSQF more space
efficient than the Bloom filter for false-positive rates less than 1/64 and
more space efficient than the cuckoo filter for all false-positive rates. In
contrast, the original quotient filter is less space efficient than the cuckoo
filter for all false-positive rates and the Bloom filter for false-positive
rates larger than to 2^-36._

The reference semi-sorted cuckoo filter is more space-efficient than the RSQF
at +2 bits per item, vs. +2.125 for the RSQF. The same can be true of a cuckoo
filter with compressed buckets, e.g. Golomb coding gets you down to ~+2.54[1]
regardless of bucket size, while also allowing the load factor to approach 1
thanks to larger buckets. For low false positive rates, that makes the cuckoo
filter quite a bit smaller.

1\. Assuming two candidate buckets per item. Specifically it's +(1 -
Bth_root(1 - ε))^(-1) bits per item, where ε is the desired false positive
rate and B is ε * the number of candidate buckets.

