
Opposite of a Bloom Filter - netvarun
https://github.com/jmhodges/opposite_of_a_bloom_filter
======
ComputerGuru
Guys, this is real. [http://www.somethingsimilar.com/2012/05/21/the-opposite-
of-a...](http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a-bloom-
filter/)

Yes, it's a hash table without conflict resolution. No, that's not the joke.

From the article:

The simplest data structure that fits the description of an “opposite of a
Bloom filter” is a hash map. With a large corpus like the one we were dealing
with, the memory consumed by a hash map is far too large and its growth is
bound only by corpus size. However, from this simple idea a more fruitful one
comes out.

The design I pitched uses an array, a hash function, and the ability to swap
ids in and out of the array concurrently. The hash function and the array are
used in the same way they are inside of a hash map implementation. Unlike a
typical hash map implementation, the array size is bounded and hash collisions
are not avoided.
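
The quoted design can be sketched in a few lines of Java (hypothetical names, not the repo's actual API): a fixed-size array of slots indexed by hash, with no collision resolution. A colliding id simply overwrites whatever was there, so the structure can return a false negative ("haven't seen it" when it actually has) but never a false positive:

```java
import java.util.Objects;

// Minimal single-threaded sketch of the "opposite of a Bloom filter"
// (hypothetical names, not the repo's actual API). A fixed-size array of
// slots indexed by hash; collisions are not resolved, they just overwrite.
// containsAndAdd may return a false negative (a colliding id evicted the
// old entry) but never a false positive.
final class OppositeOfBloomFilter {
    private final String[] slots;

    OppositeOfBloomFilter(int size) {
        this.slots = new String[size];
    }

    /** Returns true iff id was definitely seen before; records id either way. */
    boolean containsAndAdd(String id) {
        int idx = Math.floorMod(id.hashCode(), slots.length);
        boolean seen = Objects.equals(slots[idx], id);
        slots[idx] = id; // no conflict resolution: just overwrite the slot
        return seen;
    }
}
```

A "yes" answer is exact (the slot holds that very id), which is the mirror image of a Bloom filter's guarantee.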

~~~
martinced
As I understand it there's more to it than simply being a hashmap without
collision resolution: performance.

They mentioned hundreds of thousands of events per second. At that level,
typical Java objects and locks simply do not cut it anymore. That is why, for
example, LMAX wrote their disruptor pattern using giant primitive arrays and
fast, lock-free CAS operations.

So here they're using a very fast "hashmap" updated with CAS operations,
which do not lock.
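
A lock-free variant along those lines could be sketched with `AtomicReferenceArray` and compare-and-set (my own illustration under that assumption, not their actual code):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of a lock-free variant (my own illustration, not their code):
// each slot is swapped in with compareAndSet instead of under a lock,
// so concurrent writers never block each other.
final class ConcurrentOppositeFilter {
    private final AtomicReferenceArray<String> slots;

    ConcurrentOppositeFilter(int size) {
        this.slots = new AtomicReferenceArray<>(size);
    }

    /** True iff id was definitely seen before; records id without locking. */
    boolean containsAndAdd(String id) {
        int idx = Math.floorMod(id.hashCode(), slots.length());
        String prev = slots.get(idx);
        if (id.equals(prev)) {
            return true; // exact match in the slot: definitely seen
        }
        // CAS: only replace the value we actually read. If another thread
        // raced us, losing the race is harmless -- at worst one id goes
        // unrecorded, which only produces an extra false negative.
        slots.compareAndSet(idx, prev, id);
        return false;
    }
}
```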

That said, why would it be a joke!? There are use cases for Bloom filters, so
there probably are for what they created here too, no!?

------
andrelaszlo
Isn't this just an implementation of the filter described by Wilt and Wither
back in 2003?

~~~
pretoriusB
I'm amused by the above comment, and all the wild assumptions it makes.

Whereas a Bloom filter is relatively well known (in compsci circles), how many
people know:

1) Who Wilt and Wither are? 2) What they "described back in 2003"?

Are they scientists? Did they write a paper in 2003 describing something
similar? Have they or anybody else implemented anything based on their
description?

The use of the word "just", especially, is, err, misguided.

Can you point to other implementations? If not then it's not "just an
implementation", it's the FIRST FRIGGING IMPLEMENTATION!

~~~
othermaciej
I think it's a joke - "Wilt" and "Wither" both being opposites of "Bloom".

~~~
pretoriusB
Sorry, but I'm not a native English speaker, so the joke went over my head.

(And while I now recognise "wither", I didn't know the word "wilt" at all)

~~~
othermaciej
It was an overly subtle joke, took me a while to get it myself.

------
dexen
Discussed before: (meta: how's _that_ for reverse bloom filtering? ;-)

<http://news.ycombinator.com/item?id=4251536>
<http://news.ycombinator.com/item?id=4251759>

tl;dr:

 _> Specifically, it functions the same as a cache with a random removal
policy, (...)_

 _> you might as well have an LRU valueless cache if locality is going to play
a role._

------
seanmcdirmid
It took me a while to get the joke. It's simply a unique mapping dictionary
with a fixed number of buckets and no hash-key conflict resolution.

------
gojomo
Previously posted and discussed:

<http://news.ycombinator.com/item?id=4251313>

------
CJefferson
Out of interest, is there any difference between this, and a cache?

It is still nice to see a Bloom filter as the opposite of a cache, of course.
In effect, a cache contains a subset of the things it has seen, and a Bloom
filter keeps a superset.

I have used something like this to great effect in computations that make
many complex trigonometric calls, which often end up reusing a small set of
input values. Something similar can be used in front of any expensive pure
function call where you expect to see the same values repeatedly, but don't
want to use unbounded space or have to write a complicated cache-management
system.
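
That pattern, a bounded memo table in front of an expensive pure function where collisions simply evict the old entry, might look something like this (hypothetical sketch; `Math.sin` stands in for the expensive call):

```java
// Bounded memo table in front of an expensive pure function (hypothetical
// sketch; Math.sin stands in for the expensive call). Collisions overwrite
// the old entry, so memory stays fixed and there is no cache-management
// logic at all.
final class SinMemo {
    private final double[] keys;
    private final double[] vals;

    SinMemo(int size) {
        this.keys = new double[size];
        this.vals = new double[size];
        java.util.Arrays.fill(keys, Double.NaN); // NaN marks an empty slot
    }

    double sin(double x) {
        int idx = Math.floorMod(Double.hashCode(x), keys.length);
        if (keys[idx] == x) {
            return vals[idx]; // hit: reuse the cached result
        }
        double v = Math.sin(x); // miss: compute, then overwrite the slot
        keys[idx] = x;
        vals[idx] = v;
        return v;
    }
}
```

Because the function is pure, an eviction costs only a recomputation, never a wrong answer.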

------
jergosh
A Moolb filter?

~~~
htp
I was thinking "Gloom filter", what with the possibility of reporting false
negatives.

------
martinced
I responded to a first comment but then read the others...

I think you guys are all missing the point: they're talking about crazy high
performance.

They've implemented it using lock-free algorithms, like LMAX's disruptor
pattern (giant primitive arrays, CAS operations).

It's not a "joke" to be able to process 12 million events per second on a
single core in Java (that's where the LMAX disruptor pattern is at right now,
and it should go higher once Oracle fixes the JVM so that it uses a faster CAS
operation when possible [it's possible on many mainstream CPUs and there are
RFEs for this]).

The point here is that it is _fast_. As in "faster than anything anyone
calling this a joke could ever have built".

~~~
seanmcdirmid
Why should I take seriously something we all learn in computer architecture
101? Ya, it's crazy fast, but someone in the '50s or '60s invented it, and we
should all know this class of algorithms like the back of our hands.

Bloom filters are interesting because they reverse what we typically do;
reversing that again is then not that interesting. So I thought this was a
joke.

