A modern cloud-native, stream-native, analytics database
Druid is designed for workflows where fast queries and ingest really matter. Druid excels at instant data visibility, ad-hoc queries, operational analytics, and handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases.
recent: https://news.ycombinator.com/item?id=22868286 and https://news.ycombinator.com/item?id=22739461
a bit from 2014 https://news.ycombinator.com/item?id=7128091
a bit from 2012 https://news.ycombinator.com/item?id=4693224
Introduced in 2011 https://news.ycombinator.com/item?id=2501160
This space can be divided into a few categories:
1. OLTP, things that require frequent updates
2. Distributed key-value store
3. OLAP, analytical, batch
4. ETL, stream processing
5. OLAP, analytical, low-latency
There are a bunch of other auxiliary projects that make deploying those things at scale feasible, such as ZooKeeper, HDFS, and so on.
But why do so many projects depend on ZooKeeper? What does it provide that couldn't be done through an embedded library? A lot of databases don't seem to really need it. Is it worth the extra network dependency and operational complexity?
One answer is that coupling the consensus part of the system with the parts that do active work creates harmful resource contention, which can cause consensus algorithms to fail or take much longer to return answers. For example, JVM clocks misbehave when the process can't get enough CPU, which can cause systems like ZK to lose quorum.
Hadoop ecosystem legacy. Most companies adopting tech like Druid were already running Hadoop and therefore already had ZooKeeper. It probably made sense to take advantage of a reliable, or at least well-known, system.
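For context on what ZooKeeper actually provides: its core primitives are ephemeral and sequential znodes (plus watches), out of which recipes like leader election are built — an ephemeral node vanishes when its client session dies, so "lowest-numbered node wins" gives automatic failover. Here's a toy in-memory sketch of that idea; `ToyZk` and the session names are made up for illustration, and this is not a real ZooKeeper client:

```python
# Toy in-memory model of ZooKeeper-style coordination primitives:
# ephemeral sequential znodes + lowest-sequence-wins leader election.

class ToyZk:
    def __init__(self):
        self.counter = 0
        self.nodes = {}  # znode path -> owning session id

    def create_ephemeral_sequential(self, prefix, session):
        """Create a node tied to a session; it vanishes if the session dies."""
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = session
        return path

    def session_expired(self, session):
        """Simulate a client crash: all its ephemeral nodes disappear."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

    def leader(self, prefix):
        """The session owning the lowest-numbered node is the leader."""
        candidates = sorted(p for p in self.nodes if p.startswith(prefix))
        return self.nodes[candidates[0]] if candidates else None

zk = ToyZk()
zk.create_ephemeral_sequential("/election/n-", "coordinator-1")
zk.create_ephemeral_sequential("/election/n-", "coordinator-2")
print(zk.leader("/election/n-"))   # coordinator-1
zk.session_expired("coordinator-1")
print(zk.leader("/election/n-"))   # coordinator-2 takes over
```

The point of having this live in a separate, dedicated ensemble is exactly the resource-contention argument above: the process tracking sessions and deciding liveness shouldn't also be the one doing heavy ingest or query work.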
sounds good to me