
Apache Druid 0.18 - wochiquan
https://imply.io/post/introducing-apache-druid-0-18-0
======
SomeHacker44
In case you are wondering what it was, after several clicks I found this:

A modern cloud-native, stream-native, analytics database

Druid is designed for workflows where fast queries and ingest really matter.
Druid excels at instant data visibility, ad-hoc queries, operational
analytics, and handling high concurrency. Consider Druid as an open source
alternative to data warehouses for a variety of use cases.

------
dang
See also:

recent:
[https://news.ycombinator.com/item?id=22868286](https://news.ycombinator.com/item?id=22868286)
and
[https://news.ycombinator.com/item?id=22739461](https://news.ycombinator.com/item?id=22739461)

2016
[https://news.ycombinator.com/item?id=11400681](https://news.ycombinator.com/item?id=11400681)

a bit from 2014
[https://news.ycombinator.com/item?id=7128091](https://news.ycombinator.com/item?id=7128091)

a bit from 2012
[https://news.ycombinator.com/item?id=4693224](https://news.ycombinator.com/item?id=4693224)

Introduced in 2011
[https://news.ycombinator.com/item?id=2501160](https://news.ycombinator.com/item?id=2501160)

------
megaman821
Apache seems to have 100 databases and data processing projects under its
umbrella. Is there a cheat-sheet on what they do (and sometimes how they
differ) and how popular they are?

~~~
2bethere
Yeah, only a few are widely used with solid feature support.

They can be divided into a few categories:

1. OLTP, things that require frequent updates: HBase, CouchDB
2. Distributed key-value store: Cassandra
3. OLAP, analytical, batch: Hive, Impala
4. ETL, stream processing: Spark
5. OLAP, analytical, low-latency: Druid

There are a bunch of other auxiliary projects that make deploying those
things at scale feasible, such as ZooKeeper, HDFS, and whatnot.

------
rb808
This seems to be the right product for loads of systems. I haven't seen much
uptake, though.

~~~
logicslave
It has significant operational load and complexity: five node types, all
communicating with ZooKeeper.

~~~
mrits
With Kafka's removal of ZooKeeper, I wonder if Druid could piggyback off of
that to simplify its architecture. At least adopting some of the Kafka
terminology would be a step in the right direction. I think even a hard
dependency on Kafka would be better for the adoption rate than their current
design.

~~~
doliveira
I don't know much about it other than what I've learned through the struggle
of trying to set up a cluster.

But why do so many projects depend on ZooKeeper? What does it provide that
couldn't be done through an embedded library? A lot of databases don't seem
to really need it. Is it worth the extra network dependency and operational
complexity?

~~~
tomnipotent
> But why do so many projects depend on Zookeeper?

Hadoop ecosystem legacy. Most companies adopting tech like Druid were already
running Hadoop and had Zookeeper as a result. Probably made sense to take
advantage of a reliable, or at least well-known, system.
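Part of what ZooKeeper provides is ephemeral membership: a node's registration
vanishes the moment its session dies, so peers can detect crashes. The hard part
is agreeing on liveness *across machines* despite partitions, which is why
projects outsource it rather than embed a library. A toy in-process sketch of
just the ephemeral-node semantics (all names hypothetical; the real
cross-machine consensus is exactly what this toy omits):

```python
import threading


class EphemeralRegistry:
    """Toy in-process stand-in for ZooKeeper's ephemeral znodes:
    entries exist only while the owning session is alive."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}  # path -> owning session id

    def register(self, session_id, path):
        with self._lock:
            if path in self._nodes:
                raise ValueError(f"node already exists: {path}")
            self._nodes[path] = session_id

    def close_session(self, session_id):
        # When a session dies, all of its ephemeral nodes vanish --
        # this is how peers notice a crashed node.
        with self._lock:
            self._nodes = {p: s for p, s in self._nodes.items()
                           if s != session_id}

    def members(self):
        with self._lock:
            return sorted(self._nodes)


reg = EphemeralRegistry()
reg.register("sess-a", "/brokers/1")
reg.register("sess-b", "/brokers/2")
reg.close_session("sess-a")  # "broker 1" crashes
print(reg.members())         # ['/brokers/2']
```

In one process a lock suffices; distributed, every step above needs a quorum to
agree on who is still alive, which is the part that's genuinely hard to ship as
an embedded library.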

~~~
2bethere
Have an upvote.

------
advisedwang
Official release notes at
[https://github.com/apache/druid/releases/tag/druid-0.18.0](https://github.com/apache/druid/releases/tag/druid-0.18.0).

------
2bethere
Product manager at Imply for Druid. AMA...

~~~
frankmcsherry
Your example join output at
[https://imply.io/post/introducing-apache-druid-0-18-0](https://imply.io/post/introducing-apache-druid-0-18-0)
has incorrect answers for the join.

------
pachico
I thought Druid lost the battle against ClickHouse long ago. Am I wrong?

~~~
gilbetron
Incorrect. They both have their strengths and weaknesses:

[https://medium.com/@leventov/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7](https://medium.com/@leventov/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7)

~~~
rb808
> ClickHouse is simpler and has less moving parts and services.

sounds good to me

~~~
gilbetron
Having administered Druid for several years, I find ClickHouse's supposed
simplicity definitely appealing were I to start a new project with similar
requirements. Then again, back then I needed petabyte scale and >1 million
inserts/sec, and ClickHouse couldn't do it.

~~~
pachico
I managed to do 5 million inserts/sec on a single server. Something must be off.

~~~
gilbetron
Depends on how big your structs are coming in, how you are holding on to the
data, and what else you are doing with it on ingestion.
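One simple way to see why reported inserts/sec varies so wildly between setups:
model each batch as paying a fixed round-trip cost plus a per-row cost. The
numbers below are illustrative assumptions, not measurements of any particular
system:

```python
def throughput(rows_per_batch, per_row_cost_us=1.0, per_batch_cost_us=500.0):
    """Rows/sec under a simple cost model: each batch pays a fixed
    round-trip cost plus a per-row cost (both in microseconds, assumed)."""
    batch_cost_us = per_batch_cost_us + rows_per_batch * per_row_cost_us
    return rows_per_batch / batch_cost_us * 1e6  # rows per second


print(round(throughput(1)))       # ~2,000 rows/s: round-trip dominated
print(round(throughput(10_000)))  # ~950,000 rows/s: near the per-row limit
```

With the same per-row cost, single-row inserts are hundreds of times slower
than large batches, so row size, batching, and whatever else runs on ingest
can easily account for order-of-magnitude gaps between benchmarks.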

