Yes, it's a hash table without conflict resolution. No, that's not the joke.
From the article:
The simplest data structure that fits the description of an "opposite of a Bloom filter" is a hash map. With a large corpus like the one we were dealing with, the memory consumed by a hash map is far too large and its growth is bound only by corpus size. However, from this simple idea a more fruitful one comes out.
The design I pitched uses an array, a hash function, and the ability to swap ids in and out of the array concurrently. The hash function and the array are used in the same way they are inside of a hash map implementation. Unlike a typical hash map implementation, the array size is bounded and hash collisions are not avoided.
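The quoted design can be sketched in a few lines of Java. This is a minimal illustration, not the article's actual code: the class name, the sizing, and the hash mix are my own, and I assume ids are non-zero `long`s (so an empty slot can be represented as 0). The key move is `getAndSet` on an `AtomicLongArray`, which swaps an id into its slot atomically with no lock; collisions simply evict.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Sketch of the structure described above: a fixed-size array indexed
 * by a hash of the id, with no collision resolution. A new id simply
 * evicts whatever hashed to the same slot. containsAndAdd() never
 * reports a false positive, but it may return false for an id that
 * was seen earlier and then evicted.
 *
 * Assumes non-zero ids, since empty slots are initialized to 0.
 */
class OppositeOfBloomFilter {
    private final AtomicLongArray slots;
    private final int mask;

    OppositeOfBloomFilter(int sizePowerOfTwo) {
        this.slots = new AtomicLongArray(sizePowerOfTwo);
        this.mask = sizePowerOfTwo - 1;
    }

    /** Returns true iff id was definitely seen before; records it either way. */
    boolean containsAndAdd(long id) {
        int index = hash(id) & mask;
        long previous = slots.getAndSet(index, id); // atomic swap, no lock
        return previous == id;
    }

    private static int hash(long id) {
        long h = id * 0x9E3779B97F4A7C15L; // cheap multiplicative mix
        return (int) (h ^ (h >>> 32));
    }
}
```

Because the array size is fixed and every operation is a single CAS-style atomic swap, both memory use and per-event cost stay bounded regardless of corpus size.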
As I understand it, there's more to it than simply being a hashmap without collision resolution: performance.
They mentioned hundreds of thousands of events per second. At that level, typical Java objects and locks simply do not cut it anymore. That is why, for example, LMAX built their Disruptor pattern on gigantic primitive arrays and fast lock-free CAS operations.
So here they're using a very fast "hashmap" updated with CAS operations, which do not lock.
That said, why would it be a joke!? There are use cases for Bloom filters, so there probably are use cases for what they created here too, no!?
I'm amused by the above comment, and all the wild assumptions it makes.
Whereas a Bloom filter is relatively well known (in compsci circles), how many people know:
1) Who Wilt and Wither are?
2) What they "described back in 2003"?
Are they scientists? Did they write a paper in 2003 describing something similar? Have they or anybody else implemented anything based on their description?
The use of the word "just" is, err, especially misguided.
Can you point to other implementations? If not then it's not "just an implementation", it's the FIRST FRIGGING IMPLEMENTATION!
Out of interest, is there any difference between this, and a cache?
It is still nice to see a Bloom filter as the opposite of a cache, of course. In effect, a cache contains a subset of the things it has seen, and a Bloom filter keeps a superset.
I have used something like this to great effect in computations that make many complex trigonometric calls, which often end up using a small set of input values. Something similar can be used in front of any expensive pure function call where you expect to see the same values repeatedly, but don't want to use unbounded space or have to write a complicated cache-management system.
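The idea above, a fixed-size, collision-evicting cache in front of an expensive pure function, can be sketched as a direct-mapped memoizer. This is an illustrative example under my own naming and sizing assumptions, not the commenter's actual code:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.DoubleUnaryOperator;

/**
 * Bounded memoizer for an expensive pure function: a direct-mapped
 * cache where a hash of the input picks the slot and collisions
 * simply overwrite. No eviction policy or cache management needed;
 * memory stays fixed no matter how many distinct inputs arrive.
 */
class DirectMappedMemoizer {
    private static final class Entry {
        final double input, output;
        Entry(double input, double output) { this.input = input; this.output = output; }
    }

    private final AtomicReferenceArray<Entry> slots;
    private final int mask;
    private final DoubleUnaryOperator fn;

    DirectMappedMemoizer(int sizePowerOfTwo, DoubleUnaryOperator fn) {
        this.slots = new AtomicReferenceArray<>(sizePowerOfTwo);
        this.mask = sizePowerOfTwo - 1;
        this.fn = fn;
    }

    double apply(double x) {
        int index = mix(Double.hashCode(x)) & mask;
        Entry e = slots.get(index);
        if (e != null && e.input == x) {
            return e.output;                   // cache hit
        }
        double y = fn.applyAsDouble(x);        // expensive pure call
        slots.set(index, new Entry(x, y));     // collision? just overwrite
        return y;
    }

    private static int mix(int h) {           // spread low-entropy hash codes
        h ^= h >>> 16;
        h *= 0x45d9f3b;
        return h ^ (h >>> 16);
    }
}
```

Because the function is pure, an evicted entry costs only a recomputation, never a wrong answer, which is what makes the "don't bother resolving collisions" trade-off safe here.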
I responded to a first comment but then read the others...
I think you guys are all missing the point: they're talking about crazy high performance.
They've implemented it using non-locking algorithms, like LMAX's Disruptor pattern (gigantic primitive arrays, CAS operations).
It's not a "joke" to be able to process 12 million events per second on a single core in Java (that's where the LMAX Disruptor pattern is at right now, and it should go higher once Oracle fixes the JVM so that it uses a faster CAS operation when possible [it's possible on many mainstream CPUs, and there are RFEs for this]).
The point here is that it is fast. As in "faster than anyone commenting here that this is a joke could ever have built".
Why should I take it seriously when it's what we all learn in computer architecture 101? Yeah, it's crazy fast, but someone in the '50s or '60s invented it, and we should all know this class of algorithms like the back of our hands.
Bloom filters are interesting because they reverse what we typically do; reversing that again is not that interesting. So I thought this was a joke.
I hope this won't come off as tactless, but is there something more significant going on here than CAS and writing to the database if it's a different value? Because I do non-blocking programming as a hobby and I'd love to know.