
Handling big data with probabilistic data structures.  PyCon 2011. (video) - herdrick
http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-handling-ridiculous-amounts-of-data-with-probabilistic-data-structures-4899047
======
imurray
Request: could submitters of long videos add a few words in a comment to say
what the video is about and for whom it will be worth watching?

From listening to the audio while doing other things:

This talk was largely about shotgun assembly of genetic sequence graphs using
Bloom filters. There is some detail on Bloom filters, with Python code, and
how to piece together overlaps of strings using them. There is high-level
discussion of how it would be nice to chunk up biological problems to make
them map-reducible, but (understandably) not much detail.
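
For anyone who hasn't met the data structure, here is a minimal sketch of the idea. This is my own toy version, not the code from the talk: the names `BloomFilter`, `kmers` and `extensions`, the default sizes, and the salted SHA-256 hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means definitely absent; True only means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(sequence, k):
    """Yield every length-k substring of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def extensions(bloom, kmer, alphabet="ACGT"):
    """Return candidate next k-mers: shift by one character, keep those in the filter."""
    return [kmer[1:] + c for c in alphabet if (kmer[1:] + c) in bloom]


bf = BloomFilter()
for km in kmers("ACGTAC", 3):
    bf.add(km)
```

Asking `extensions(bf, "ACG")` then returns the overlapping k-mers the filter claims to have seen. A Bloom filter never gives a false negative, but it can give false positives, so an occasional spurious overlap is the price of the small memory footprint.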

The talk's pretty slow for anyone who has seen a few uses of Bloom filters,
but it's a clear and nicely motivated talk for anyone who hasn't seen that
flavor of data structure.

