
The opposite of a bloom filter - Anilm3
https://www.somethingsimilar.com/2012/05/21/the-opposite-of-a-bloom-filter/
======
pents90
I would argue that the opposite of a bloom filter doesn't really exist, at
least not in a satisfying way. For a given number of items, a bloom filter's
size depends only on the desired false positive rate, whereas its opposite
must depend on the size of the data. (And don't be fooled by data that can be
represented by a primary key; that's not as general as a bloom filter.) I
tried, with limited success, to explain my point of view in this answer on
StackExchange:
https://cstheory.stackexchange.com/questions/6596/a-probabilistic-set-with-no-false-positives/14455#14455
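The size argument can be made concrete with the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n items at false-positive rate p, with k = (m/n) ln 2 hash functions. The minimal Python sketch below (function names are hypothetical) shows that the bit count grows with the number of items and the target rate, but the size of each item never enters the formula. A structure with no false positives has no such escape, because it must keep enough of each item to rule out a wrong "yes".

```python
import math

def bloom_filter_bits(n_items: int, fp_rate: float) -> int:
    """Optimal bit count m for n_items at false-positive rate fp_rate.
    Note that the size of an individual item does not appear anywhere."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))

def optimal_hash_count(n_items: int, m_bits: int) -> int:
    """Optimal number of hash functions k = (m / n) * ln 2, at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))

m = bloom_filter_bits(1000, 0.01)
print(m, optimal_hash_count(1000, m))  # prints: 9586 7
```

That works out to roughly 9.6 bits per item at a 1% false positive rate, whether the items are 8-byte IDs or multi-kilobyte events.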

~~~
frankmcsherry
This probably runs afoul of your "at least not in a satisfying way"
constraint, but:

It is pretty easy (an exercise) to implement the "opposite of a Bloom filter"
if you start from a summary of the complete set of events and support
deletion, rather than starting from the empty set and supporting addition.

What makes everything seem hard is the (often unstated) requirement that you
start from an empty set and support addition, which is roughly as hard as
implementing a Bloom filter that starts from the complete set and supports
deletion. Neither of the links makes this requirement explicit (though it is
implicit in their "motivation" sections).
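frankmcsherry doesn't spell out a construction, so the following is only one plausible answer to the exercise, assumed here: start from the complete set and record deletions in an ordinary Bloom filter, then answer membership by negating it. A false positive in the deletion filter becomes a false negative in the set, and a "present" answer is never wrong. The class name and the SHA-256-based hashing are illustrative choices, not from the thread.

```python
import hashlib

class DeletionFilter:
    """Hypothetical sketch: a set that starts full and supports deletion.
    Deleted items go into a plain Bloom filter; membership is its negation."""

    def __init__(self, m_bits: int = 1024, k: int = 4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def delete(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def contains(self, item: str) -> bool:
        # "Present" unless every position is set (i.e. the item may be deleted).
        # A deleted item always has all its bits set, so it is never reported present.
        return not all(self.bits[p] for p in self._positions(item))

s = DeletionFilter()
print(s.contains("event-42"))  # True: nothing has been deleted yet
s.delete("event-42")
print(s.contains("event-42"))  # False: deletions are never missed
```

Starting from the empty set and supporting addition would require the opposite behavior, which is exactly the hard direction described above.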

------
DenisM
TLDR: a cache with LRU eviction, but only storing the keys, not the values.
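A minimal sketch of DenisM's suggestion, assuming exact key comparison (the class and method names are hypothetical): a fixed-capacity LRU set built on `collections.OrderedDict`. A "seen" answer is exact, so there are no false positives; keys that were evicted to make room are reported as unseen, which is where the false negatives come from.

```python
from collections import OrderedDict

class RecentKeys:
    """Hypothetical LRU cache that stores only keys, no values."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._keys = OrderedDict()  # key -> None, ordered oldest to newest

    def check_and_add(self, key) -> bool:
        """Return True if key is currently remembered, then mark it most recent."""
        if key in self._keys:
            self._keys.move_to_end(key)
            return True
        self._keys[key] = None
        if len(self._keys) > self.capacity:
            self._keys.popitem(last=False)  # evict the least recently used key
        return False

seen = RecentKeys(capacity=2)
seen.check_and_add("a")  # False: first sighting
seen.check_and_add("b")  # False
seen.check_and_add("c")  # False, and "a" is evicted
seen.check_and_add("a")  # False again: a false negative
```

Memory is bounded by the capacity times the key size, which matches pents90's point that the cost scales with the data being stored.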

~~~
chewbacha
I was thinking the same thing

------
ww520
"A Bloom filter is a data structure that may report it contains an item that
it does not (a false positive), but is guaranteed to report correctly if it
contains the item (“no false negatives”)."

I'm afraid that is not how it works. A Bloom filter can tell you that an item
may be in the set (possibly a false positive), but it can say definitively when
an item is NOT in the set (no false negatives).

~~~
ScottBurson
It's correct; you've misread it — in fact, you're agreeing with it. What it's
saying is, if the item is in the set, the Bloom filter is guaranteed to report
that it is in the set. What you're saying is, if the filter says the item is
not in the set, then it is guaranteed not to be in the set. Those two
statements are equivalent (being contrapositives: "A implies B" is equivalent
to "not B implies not A").

~~~
magicalhippo
The article states a Bloom filter is "guaranteed to report correctly if it
contains the item", however a Bloom filter cannot do this. The bits set by the
various hash functions could very well be set due to some other key. What the
Bloom filter _can_ say is that if none of the bits are set, then clearly the
key was never inserted.

~~~
kata
I think it might just be a strangely formed sentence where the part "if it
contains the item" is not the subject of the report, but a condition:

If it contains the item -> reports that it contains the item (correctly)

If it doesn't contain the item -> may report that it contains the item
(incorrectly)
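The conditional reading above, ScottBurson's contrapositive, and magicalhippo's point about bits set by other keys can all be seen in a minimal Bloom filter. This is an illustrative Python sketch; the deliberately tiny 64-bit array and the SHA-256-derived positions are arbitrary choices made so that false positives show up quickly.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k positions per item in an m-bit array."""

    def __init__(self, m_bits: int, k: int):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m_bits=64, k=3)
added = [f"key{i}" for i in range(30)]
for key in added:
    bf.add(key)

# If it contains the item, it reports so: no false negatives.
assert all(bf.might_contain(key) for key in added)

# If it doesn't contain the item, it may still say yes, because other keys
# set the same bits. So only a "no" answer is conclusive.
fp = sum(bf.might_contain(f"other{i}") for i in range(1000))
print(fp, "false positives out of 1000 never-added items")
```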

~~~
kakarot
Nothing wrong with being pedantic when dealing with definitions.

~~~
egocentric
Since we're on the topic of being pedantic, "well, you know, that's just,
like... your opinion, man". I'd argue that there's nothing excessive about
arguing whether A may, in fact, be the opposite of A :)

~~~
kakarot
Certainly not, I'm all for some arguing on the internet.

------
ahazred8ta
"It's a cache." The use case is deduplication in an event-stream environment.
This calls for exact matching without hash collisions.
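One way to get exact matching in bounded memory, assumed here rather than taken from the thread: a fixed array of slots indexed by a hash of the event, where each slot holds the full event last stored there. A "seen" answer requires an exact byte-for-byte match, so a hash collision can never produce a false positive; a colliding event simply overwrites the slot, which produces a false negative instead. The class and method names are hypothetical.

```python
import hashlib
from typing import List, Optional

class ExactDedupFilter:
    """Hypothetical fixed-size dedup structure with no false positives."""

    def __init__(self, size: int):
        self.slots: List[Optional[bytes]] = [None] * size

    def _index(self, item: bytes) -> int:
        digest = hashlib.sha256(item).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def contains_and_add(self, item: bytes) -> bool:
        """Return True if item was seen and not yet overwritten; store it either way."""
        i = self._index(item)
        seen = self.slots[i] == item  # full comparison, never a hash-only match
        self.slots[i] = item
        return seen

dedup = ExactDedupFilter(size=1024)
events = [b"e1", b"e2", b"e1"]
fresh = [e for e in events if not dedup.contains_and_add(e)]
```

Because every slot stores a whole event, memory grows with the size of the events rather than only with the target error rate, which is the trade-off pents90 describes at the top of the thread.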

------
supermatt
Low memory version:

`return false;`

~~~
supermatt
Not sure why it's getting downvoted so much. On large datasets it's just as
effective...

