Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Opposite of a Bloom Filter (github.com/jmhodges)
38 points by netvarun on Jan 25, 2013 | hide | past | favorite | 18 comments


Guys, this is real. http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a...

Yes, it's a hash table without conflict resolution. No, that's not the joke.

From the article:

The simplest data structure that fits the description of a “opposite of a Bloom filter” is a hash map. With a large corpus like the one we were dealing with, the memory consumed by a hash map is far too large and its growth is bound only by corpus size. However, from this simple idea a more fruitful one comes out.

The design I pitched uses an array, a hash function, and the ability to swap ids in and out of the array concurrently. The hash function and the array are used in the same way they are inside of a hash map implementation. Unlike a typical hash map implementation, the array size is bounded and hash collisions are not avoided.


As I understand it there's more to it than simply being a hashmap without collision resolution: performance.

They mentioned hundreds of thousands of events per second. At that level typical Java objects and locks simply do not cut it anymore. That is why, for example, LMAX wrote their disruptor pattern using gigantic primitives array and fast CAS, no locking, operations.

So here they're using a very fast "hashmap" updated by using CAS operation which do not lock.

Why would it be a joke that said!? There are use cases for Bloom Filters, so there probably are for what they created here too no!?


Isn't this just an implementation of the filter described by Wilt and Wither back in 2003?


I'm amused by the above comment, and all the wild assumptions it makes.

Whereas a Bloom filter is relatively well known (in compsci circles), how many people know:

1) Who Wilt and Wither are? 2) What they "described back in 2003"?

Are they scientists? Did they wrote a paper in 2003 describing something similar? Have them or anybody else implemented anything based on their description?

The use of the word "just" especially, is, err, especially misguided.

Can you point to other implementations? If not then it's not "just an implementation", it's the FIRST FRIGGING IMPLEMENTATION!


I think it's a joke - "Wilt" and "Wither" both being opposites of "Bloom".


Sorry, but I'm not a native english speaker, so the joke went over my head.

(And while I now recognise "wither" I didn't knew at all about the "wilt" word)


It was an overly subtle joke, took me a while to get it myself.


Sorry, my sense of humor doesn't work well before I have coffee. If ever.


Damn it gets even more amusing!

A Google search for "Wilt Wither 2003" doesn't return anything relevant (at least in the first 2-3 pages).


Discussed before: (meta: how's that for reverse bloom filtering? ;-)

http://news.ycombinator.com/item?id=4251536 http://news.ycombinator.com/item?id=4251759

tl;dr:

> Specifically, it functions the same as a cache with a random removal policy, (...)

> you might as well have an LRU valueless cache if locality is going to play a role.


It took me awhile to get the joke. Its simply a unique mapping dictionary with a fixed number of buckets and no hash key conflict resolution.


Previously posted and discussed:

http://news.ycombinator.com/item?id=4251313


Out of interest, is there any difference between this, and a cache?

It is still nice to see a bloom filter as the opposite of a cache of course. In effect a cache contains a subset of the things it has seen, and a bloom filter keeps a superset.

I have used something like this to great effect when doing calculations which do many complex trigonometric calculations, which often end up using a small set of input values. Something similar can be used before any expensive pure function call, where you expect to see the same values called repeatedly, but don't want to use unbounded space, or have to write a complicated cache management system.


A Moolb filter?


I was thinking "Gloom filter", what with the possibility of reporting false negatives.


I responded to a first comment but then read the others...

I think you guys are all missing the point: they're talking about crazy high perfomances.

They've implemented it using non-locking algorithms, like LMAX's disruptor pattern (gigantic primitives arrays, CAS operation).

It's not a "joke" to be able to process 12 millions events per second on a single core in Java (that's where the LMAX disruptor patter is at right now, and it shall go higher once Oracle fixes the JVM so that it uses a faster CAS operation when possible [it's possible on many mainstream CPU and there are RFEs for this]).

The point here is that it is fast. As in "faster than anyone commenting here that is a joke could have ever build".


Why should I take what we all learn in computer architecture 101 seriously? Ya, it's crazy fast, but someone in the 50s or 60s invented it, and we should all know this class of algorithms like the back of our hands.

Bloom filters are interesting because they reverse what we typically do, the reverse again is then not that interesting. So I thought this was a joke.


I hope this won't come off as tactless, but is there something going on in there more significant than CAS and writing to the database if it's a different value? Because I do non-blocking programming as a hobby and I'd love to know.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: