

Druid – Distributed, real-time, analytical data store - misframer
http://druid.io/

======
gleenn
Who "uses this in production" and how is it better than MySQL and PostgreSQL?
They need a little more show instead of tell.

~~~
wesm
It's used by Metamarkets in production
([http://metamarkets.com/](http://metamarkets.com/)); I've seen talks about
it at Strata and they're getting really impressive query times on terascale
problems.

Shameless plug: if you're interested in fast in-memory analytics and more of
the PyData bent, I'm building a system with some similarities to Druid at my
company ([http://www.datapad.io/](http://www.datapad.io/)) but more focused on
optimizing single-node performance (i.e. minimizing EC2 expenses) on medium
data (single columns of tables typically fit in memory) than high scalability
/ real time ingest.

------
kposehn
This is quite a great piece of kit. Definitely going to look into it for my
own RTB uses.

One thing that strikes me as an immediate use is real-time impression bus
aggregation for behavioral analysis. We ended up building our DSP primarily in
Ruby+Redis/Postgres and having this would have allowed some significant
architectural changes.

------
mratzloff
Would it be fair to say that Druid is more suitable for known questions that
are queried programmatically and mostly updated by time segment?

~~~
xvrl
Druid was actually built to answer arbitrary questions that could not be
answered with pre-aggregated data (because it can become computationally
intractable), but still needed to be answered fast enough to power interactive
dashboards. At Metamarkets, almost all queries are user driven through our
interface. Currently Druid is indeed tailored for time-based event data, but
it can shard on more than just time. Disclaimer: I am a Druid committer and
work at Metamarkets.
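
To make the pre-aggregation point concrete, here is a rough back-of-the-envelope
sketch. It is not Druid code: the dimension names and cardinalities are made up
for illustration, and the row count is only a worst-case upper bound that
assumes every combination of values actually occurs.

```python
from itertools import combinations
from math import prod

# Hypothetical dimensions and distinct-value counts (illustrative only,
# not taken from Druid or Metamarkets).
CARDINALITIES = {
    "publisher": 1000,
    "advertiser": 500,
    "country": 200,
    "device": 10,
    "hour": 24,
}


def rollup_count(n_dims):
    """Number of distinct dimension subsets a query could group by."""
    return 2 ** n_dims


def worst_case_rows(cards):
    """Upper bound on rows needed to pre-aggregate every possible rollup.

    For each subset of dimensions, the rollup can hold at most the product
    of the cardinalities in that subset (the empty subset is the grand total).
    """
    names = list(cards)
    total = 0
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            total += prod(cards[d] for d in subset)
    return total


if __name__ == "__main__":
    print("rollups to maintain:", rollup_count(len(CARDINALITIES)))
    print("worst-case pre-aggregated rows:", worst_case_rows(CARDINALITIES))
```

Both numbers grow multiplicatively with every dimension added, which is the
sense in which answering arbitrary questions from pre-aggregated data alone
becomes intractable.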

