

BlinkDB: Queries with Bounded Errors and Response Times on Very Large Data - lightcatcher
http://blinkdb.org/

======
benhamner
I'm thrilled that AMPLab and CSAIL are building this.

For the vast majority of analytics problems and projects I've worked on,
approximate numbers are just as good as exact results. One of the biggest
productivity blockers can be queries and analytics that take hours or days
to run instead of seconds to minutes, as these dramatically decrease the
number of iterations you can execute and ideas you can test.

We commonly work on sub-sampled versions of datasets to enable interactive
queries and analytics - it's really great to see someone formalizing this
process and handling the details in a simple and principled manner.
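
To make the sub-sampling idea concrete, here is a minimal sketch of estimating
a mean from a uniform random sample and attaching an approximate error bound.
It is not BlinkDB's method or API: the function name `approximate_mean`, the
synthetic data, and the sample size are all assumptions chosen for
illustration, and the interval relies on a plain normal approximation.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation, z=1.96).
    """
    rng = random.Random(seed)
    # Uniform sample without replacement.
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the mean; ignores the finite-population correction.
    margin = z * statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, margin


# Hypothetical data set: 200,000 synthetic measurements.
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

# Look at only 1% of the rows.
estimate, margin = approximate_mean(population, sample_size=2_000)
exact = statistics.fmean(population)
print(f"approx: {estimate:.2f} +/- {margin:.2f}, exact: {exact:.2f}")
```

The margin shrinks with the square root of the sample size, so halving the
error costs roughly four times as many rows, which is why a small sample is
often enough for exploratory work.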

------
ksikka
Very cool idea - glad it's being production-ized.

You might benefit from a different name though. "[word]DB" is starting to
become a pattern in people's internal spam filters. And it reminds me of
CouchDB, MongoDB, RethinkDB, etc.

------
Scaevolus
This is an excellent solution to long latency tails, which become more and
more noticeable at large scales. Here's a blog post discussing Google's
experience with it:
[http://highscalability.com/blog/2012/3/12/google-taming-the-...](http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html)

I expect it will be especially helpful for businesses analyzing data: they
can get useful results from massive datasets without massive hardware
expenses.

~~~
PaulHoule
I've often dealt with "big data" by using sampling and stratified sampling,
and it is nice to see they're building something that can automate this process.

