
A Comparison of Time Series Databases and Netsil’s Use of Druid - fangjin
https://blog.netsil.com/a-comparison-of-time-series-databases-and-netsils-use-of-druid-db805d471206#.4k4xtf8xz
======
bbrazil
Great to see an analysis of the various options.

A few corrections on Prometheus:

> an AlertManager which handles generating and transmitting alerts

Alerts are generated in Prometheus; the Alertmanager then groups, deduplicates
and transmits them. See
https://www.robustperception.io/prometheus-and-alertmanager-architecture/

> Data Storage Model: Prometheus uses LevelDB as the storage backend

LevelDB is only used for metadata (we used to use LevelDB for everything, that
didn't work out too well). The primary timeseries storage is a custom format,
which you can find more details about in
https://www.youtube.com/watch?v=HbnGSNEjhUc&index=6&list=PLoz-W_CUquUlCq-Q0hy53TolAhaED9vmU

> It has an active community but no enterprise support.

There are several providers of support, my company being one of them. See
https://prometheus.io/community/

And some others:

> InfluxDB supports ingesting data via a CLI, client libraries and an HTTP
> API.

It also accepts data via a custom TCP protocol, the Graphite protocol and the
OpenTSDB protocol.
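
For illustration, a write over the HTTP API mentioned above looks roughly like
this (a minimal sketch; the host, database and measurement names are made up):

    # Minimal sketch: write one point to InfluxDB's HTTP API using the
    # line protocol. The database and measurement names are hypothetical.
    import requests

    line = "cpu_load,host=server01,region=us-west value=0.64"
    resp = requests.post(
        "http://localhost:8086/write",
        params={"db": "mydb", "precision": "s"},
        data=line,
    )
    resp.raise_for_status()  # InfluxDB replies 204 No Content on success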

> Influxdb .. supports GroupBy but not TopN queries.

https://docs.influxdata.com/influxdb/v0.13/query_language/functions/#top
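
For instance, a TOP() query over the same HTTP API (database and measurement
names are again made up):

    # Minimal sketch: an InfluxQL TOP() query, i.e. a TopN-style question.
    import requests

    resp = requests.get(
        "http://localhost:8086/query",
        params={
            "db": "mydb",
            "q": 'SELECT TOP("value", 3) FROM "cpu_load"',
        },
    )
    print(resp.json())  # the three largest values of the "value" field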

~~~
smb06
Thanks for your comments. They have been incorporated in the blog.

------
dataloopio
A broader comparison of the facts:

https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit

Also a blog comparison:

https://blog.dataloop.io/top10-open-source-time-series-databases

From reading benchmarks and various articles on how to set up Druid for
real-time analytics, it isn't great. Druid was designed for batch workloads:
feed the database slowly, then, once loading is complete, perform analytics
over a large data set. Real-time streaming analytics are dreadful to set up
and use.

~~~
fangjin
While Druid was initially built for batch loads, the architecture has evolved
substantially as the project has matured. Today, Druid supports exactly-once
streaming ingestion from Kafka, and large production deployments routinely
stream millions of events per second into Druid.
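
For the curious: with the Kafka indexing service, streaming ingestion is
configured by POSTing a supervisor spec to the overlord. A rough sketch, not a
complete production spec (the datasource, topic, host and column names are
placeholders):

    # Rough sketch: submit a Kafka supervisor spec to Druid's overlord.
    # All names below are placeholders, and tuning options are omitted.
    import json
    import requests

    spec = {
        "type": "kafka",
        "dataSchema": {
            "dataSource": "metrics",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {"column": "timestamp", "format": "auto"},
                    "dimensionsSpec": {"dimensions": ["host", "service"]},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "MINUTE",
            },
        },
        "ioConfig": {
            "topic": "metrics",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
        },
    }

    requests.post(
        "http://overlord:8090/druid/indexer/v1/supervisor",
        data=json.dumps(spec),
        headers={"Content-Type": "application/json"},
    )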

~~~
dataloopio
Can you point me to a source for 'routinely stream millions of events per
second into druid'?

While it is true that Druid is great at querying billions of rows per second,
it's not very good at ingress. Here is a mailing list discussion for some
background.

https://groups.google.com/forum/#!searchin/druid-user/benchmark%7Csort:relevance/druid-user/90BMCxz22Ko/73D8HidLCgAJ

~~~
cheddar
What kind of ingestion numbers are you working with? The thread you link to
shows that Druid can ingest ~27.5k events/sec per node, which is roughly
2.376bn events a day per node.

While you can claim bias here too, we have multiple clusters ingesting in the
high hundreds of thousands of events/second, and our largest cluster does
close to 2M events/sec. That's definitely scaled horizontally across multiple
nodes.

If you are suggesting there is a system out there that can ingest millions of
messages a second on a single node, I'd love to hear about it :).

edit: Ah, I see from the spreadsheet you linked that there are systems out
there that claim 2.5-3.5M writes per second per node. That's really quite
amazing; it would be awesome if you could provide the methodology used to
collect those numbers. For example, if you are sending in 500-byte events (a
rather common size for what we do) then, if my calculations are correct, you
are sustaining 14 Gbps, which means those benchmarks were done on some beefy
hardware. Can you link to a blog post that details the methodology?
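
The back-of-the-envelope arithmetic, for anyone who wants to check it:

    # Checking the numbers quoted above.
    per_node = 27500                        # events/sec, from the linked thread
    print(per_node * 86400)                 # 2,376,000,000 events/day per node

    claimed = 3500000                       # writes/sec, upper spreadsheet claim
    event_bytes = 500                       # a rather common event size
    print(claimed * event_bytes * 8 / 1e9)  # 14.0 Gbps sustained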

~~~
dataloopio
Most benchmarks in the spreadsheet are given a colour rating for reliability
and a link so they can be reproduced.

~~~
cheddar
Ah, cool, I chased down what you are doing and figured out that you are making
an apples-to-oranges comparison.

As described in your benchmark description:

https://gist.github.com/sacreman/b77eb561270e19ca973dd5055270fb28

You are running 200 agents emitting 6000 metrics apiece, using Haggar to
generate load:

https://github.com/dalmatinerdb/haggar

The specific thing of interest is how you are generating your data: it looks
like you have a single set of dimensions with 6000 metrics dangling off of it.
The loop that populates all of the "metrics" is at:

https://github.com/dalmatinerdb/haggar/blob/master/main.go#L51-L56

And the thing that actually populates the bytes is at:

https://github.com/dalmatinerdb/haggar/blob/master/util.go#L21-L37

So, if we take this to an apples-to-apples comparison, you have 200 agents
sending a single event every second with 6000 metrics in it. That means that
you are successfully ingesting 200 events per second in the way that we would
measure event ingestion for Druid.
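
Spelled out as a trivial sketch, counting the same Haggar workload both ways:

    # Two ways of counting the same workload.
    agents = 200
    metrics_per_event = 6000

    print(agents * metrics_per_event)  # 1,200,000 "metrics/sec" as benchmarked
    print(agents * 1)                  # 200 events/sec, counted Druid-style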

Note, also, that the thread you link to is ingesting 17 independent dimensions
with each and every event that flows in. From the Dalmatiner docs, it looks
like you put all dimension data into Postgres and don't expect any large-scale
deployment to ever need more than a single Postgres node:

https://gist.github.com/sacreman/9015bf466b4fa2a654486cd79b777e64

Look under "Setup Postgres".

We routinely have billions of unique combinations of dimension values per day
flowing into our system. Delegating the lookup of the right keys to a
relational database for such operations is going to be cost-prohibitive; not
to mention, you are going to have to materialize hundreds of millions of keys
in order to do a simple aggregate over the day.

So, I guess this is just another case where you should never trust benchmarks
that you didn't do yourself or that don't follow a standard pattern like
TPC-H. It's too easy for the same words to be used with different meanings.

~~~
dataloopio
DalmatinerDB, InfluxDB, Prometheus and Graphite each claim their numbers based
on similar benchmark methods. The results range from 500k/sec to a couple of
million metrics/sec. Druid, comparatively, would be closer to 30k/sec on the
same benchmark. If that's factually wrong, please post some details and we can
update the spreadsheet.

Expanding the benchmark to cover cardinality and other aspects would indeed be
comparing apples to oranges.

In terms of benchmarking DalmatinerDB with billions of unique combinations
indexed in Postgres... I think we know what will happen there :) That's what
it's designed for. We can also shard in the query engine, or use any of the
multi-master Postgres options, but I doubt that would even be necessary.

~~~
fangjin
The databases listed above, to the best of my knowledge, are commonly used for
devops metrics data and share similar terminology. Druid, on the other hand,
draws much of its terminology from the OLAP world. As cheddar clarified in his
post above, the benchmarks for Druid are misleading because they are not an
apples-to-apples comparison (I suspect the benchmarks for ES also suffer from
this problem). A single Druid event may consist of thousands of metrics.

------
marknadal
Any comparison of the cost to run those systems?

Disclosure: the reason I ask is that we're a competitor, though currently less
focused on timeseries (we don't have nearly all those interesting features
yet). Our latest load test saved 100M+ records, over 100GB of data, for
$10/day [1], which is a pretty disruptive price. Next up is TB scale! So it
would be useful to see price comparisons of other systems.

[1] https://www.youtube.com/watch?v=x_WqBuEA7s8

------
alex_hirner
Not to forget chronix.io, which was shown to have favorable performance for
more complex queries and space efficiency at last week's
http://munich-datageeks.de/dadada/ (slides yet to come).

------
mej10
What amount of data are you ingesting per day and what is the cost?

How much time do you spend maintaining Druid?

Do you have dedicated SRE/devops people managing it or is it something a dev
could do with minimal time investment?

Answers to any of these would be super helpful -- it is hard to find details
on these aspects of real-world uses of Druid.

~~~
gillh
We are packaging our product as self-hosted machine images
(www.netsil.com/download). We are using the DC/OS runtime to orchestrate all
our internal services and have written various health check scripts to make
sure the data pipeline components, such as our stream-processor, Kafka and
Druid, stay operational. This blog post describes Netsil's architecture:
https://blog.netsil.com/listen-to-your-apis-see-your-apps-a91ece5aa5bd

In terms of data being ingested: we are stream-processing service
interactions (network packets!) in real time, storing interaction logs in
memory and rolling them up to generate multidimensional timeseries data
emitted every second. Therefore, the actual timeseries ingestion rates into
Druid vary across deployments.
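
A simplified sketch of that rollup step (the field names are illustrative
only, not our actual schema):

    # Illustrative only: roll raw interaction logs up into one
    # multidimensional timeseries point per (second, dimension combination).
    from collections import defaultdict

    def rollup(interactions):
        buckets = defaultdict(lambda: {"count": 0, "latency_sum": 0.0})
        for i in interactions:
            key = (int(i["ts"]), i["src_service"], i["dst_service"],
                   i["status"])
            buckets[key]["count"] += 1
            buckets[key]["latency_sum"] += i["latency_ms"]
        for (ts, src, dst, status), agg in buckets.items():
            yield {
                "timestamp": ts,
                "src_service": src,
                "dst_service": dst,
                "status": status,
                "requests": agg["count"],
                "avg_latency_ms": agg["latency_sum"] / agg["count"],
            }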

------
mattdeboard
Am I completely alone in thinking ElasticSearch is out of place here? Am I
misunderstanding something about ES? It's a NoSQL doc store, but is there
something about its indexing or storage tech (Lucene, right?) that
specifically makes it optimal for time-series data?

~~~
lobster_johnson
ES is quite good at time-series data, actually.

Lucene is effectively a sparse column store. Its storage format is quite
compact, and they do some clever things to balance between speed and memory
usage. Lucene uses append-only index files (segments) that are compacted
(merged) after a while; not too dissimilar from how Cassandra or LevelDB work.

The aggregation API in ES is quite extensive, too. Date histograms, top-K,
etc., all composable and extremely fast, particularly when parallelized over
multiple shards (though you have to prepare for how sharded aggregation can
skew some aggregation calculations).
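
For example, a per-minute date histogram with a nested top-K terms
sub-aggregation (index and field names invented):

    # Sketch: composed ES aggregations -- a date histogram with a top-10
    # terms sub-aggregation. Index and field names are invented.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    resp = es.search(index="metrics-*", body={
        "size": 0,  # aggregations only, skip the raw hits
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "@timestamp", "interval": "1m"},
                "aggs": {
                    "top_hosts": {"terms": {"field": "host", "size": 10}},
                },
            },
        },
    })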

ES is also impressively fast at bulk indexing.

There are some downsides. ES has a global field cache of some sort (I forget
the details) that makes it increasingly bloated as you add more unique fields,
which can be a problem for some applications, particularly if you federate
many different apps in one cluster. In general, ES -- being on the JVM -- is
RAM-hungry, and it's a little too easy to cause it to run out of memory, even
with a huge heap and all caches etc. configured with hard limits.

ES's schema system (the "mappings") also leaves a lot to be desired, but this
problem is less applicable to an analytics app.

Lastly: If you ever want to compact historical data by merging events to a
lower granularity (is there an industry term for this?), ES doesn't give you
any tools to do it. You will have to write the logic yourself.
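
Something like this, purely as an illustration of the shape of that logic:

    # Purely illustrative: merge fine-grained events into hourly buckets,
    # keeping count/sum/min/max per series.
    from collections import defaultdict

    def compact_to_hourly(events):
        hours = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                     "min": float("inf"),
                                     "max": float("-inf")})
        for e in events:
            b = hours[(e["ts"] // 3600 * 3600, e["series"])]
            b["count"] += 1
            b["sum"] += e["value"]
            b["min"] = min(b["min"], e["value"])
            b["max"] = max(b["max"], e["value"])
        return hours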

~~~
iampims
> If you ever want to compact historical data by merging events to a lower
> granularity (is there an industry term for this?)

I believe this is called a roll-up.

