
MetricsDB: TimeSeries Database for storing metrics at Twitter - akulkarni
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/metricsdb.html
======
wolf550e
A blog post by Dan Luu ('luu) about using this to save Twitter a lot of money
on RAM for JVMs: [https://danluu.com/metrics-
analytics/](https://danluu.com/metrics-analytics/)

------
kasey_junk
Still waiting for someone to open source an observability db that uses
distributions as the core storage abstraction instead of just single values.

I thought Circonus’ IronDB was open source but they’ve either brought it back
closed source or I was mistaken.

In any case it seems weird that everyone is just rehashing the same space with
metrics dbs and not moving to where Google/Circonus have been for a while.
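The core primitive here would be a mergeable histogram per (series, time window) instead of one number per timestamp. A toy Python sketch of the idea; the log-linear bucketing loosely mimics what Circonus describes, but the bucket scheme and function names are illustrative, not IronDB's actual format:

```python
import math
from collections import Counter

def bucket(value: float) -> float:
    """Map a sample to a log-linear bucket: two significant
    digits at any power of ten, e.g. 0.0234 -> 0.023, 1234 -> 1200."""
    if value == 0:
        return 0.0
    exp = math.floor(math.log10(abs(value)))
    return round(value, -exp + 1)

def record(hist: Counter, value: float) -> None:
    hist[bucket(value)] += 1

def merge(a: Counter, b: Counter) -> Counter:
    """Histograms merge losslessly across nodes and windows --
    the property that stored single-value percentiles lack."""
    return a + b

def quantile(hist: Counter, q: float) -> float:
    """Approximate quantile: walk buckets until rank is covered."""
    total = sum(hist.values())
    rank = q * total
    seen = 0
    for b in sorted(hist):
        seen += hist[b]
        if seen >= rank:
            return b
    return max(hist)
```

The point of the distribution-first design is the `merge` step: you can roll up latency histograms across hosts or hours and still compute any percentile afterwards, which is impossible once you've stored only a p99 number.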

~~~
ekabod
ClickHouse has aggregate functions (Pearson correlation coefficient,
quantiles, etc.) [0].

Combined with its materialized view and live view features, you can achieve
observability [1].

[0][https://clickhouse-docs.readthedocs.io/en/latest](https://clickhouse-
docs.readthedocs.io/en/latest)

[1][https://www.altinity.com/blog/2019/11/13/making-data-come-
to...](https://www.altinity.com/blog/2019/11/13/making-data-come-to-life-with-
clickhouse-live-view-tables)

Edit: fixed broken link.

~~~
hodgesrm
That second link seems to be broken. The base URL is:
[https://www.altinity.com/blog/2019/11/13/making-data-come-
to...](https://www.altinity.com/blog/2019/11/13/making-data-come-to-life-with-
clickhouse-live-view-tables)

~~~
ekabod
fixed. Thanks.

------
polskibus
Is this open source? How does it differ from TimescaleDB or InfluxDB?

~~~
redis_mlc
MetricsDB appears to be HDFS-based. TimescaleDB is Postgres-based and InfluxDB
is memory and log-based. Prometheus is file-based.

~~~
squarecog
While HDFS is indeed used for exporting old data and storing some partition
mapping metadata, it's clear from the blog post that MetricsDB is much more
reliant on BlobStore as well as MetricsDB-specific services.

> The servers checkpoint in-memory data every two hours to durable storage,
> Blobstore. We are using Blobstore as durable storage so that our process can
> be run on our shared compute platform with lower management overhead.

------
etaioinshrdlu
This applies to all time series databases... What is special about time series
that makes unique databases attractive? Is it the assumption that only one
main index is created (time)? Is it that the data is append-only?

In particular, I'd love to know if there's anything major that generic RDBMSs
could do better here.

~~~
dima_vm
Yes, time ordering and numeric-only values allow a lot of tricks that can't be
applied to generic data.

For example, double-delta compression, or Gorilla compression for floating
point numbers. For more, take a look at our open source VictoriaMetrics
database, which uses all of these tricks.
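For intuition, the Gorilla float trick XORs each value's IEEE-754 bit pattern with the previous one; adjacent samples are usually close, so most bits cancel and only a short run of meaningful bits has to be stored. A stripped-down Python illustration (the real codec also bit-packs leading-zero counts and block lengths, which is omitted here):

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret an IEEE-754 double as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_deltas(samples):
    """XOR each value's bit pattern with its predecessor's.
    Near-constant series produce mostly-zero deltas."""
    prev = 0
    out = []
    for x in samples:
        bits = float_bits(x)
        out.append(bits ^ prev)
        prev = bits
    return out

def meaningful_bits(delta: int) -> int:
    """Bits that would actually need storing: 64 minus leading zeros."""
    return delta.bit_length()
```

A flat gauge (the common case for monitoring data) XORs to zero after the first sample, so each repeat costs roughly one bit instead of eight bytes.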

~~~
citrin_ru
ClickHouse lets you combine delta, double-delta, and Gorilla codecs with LZ4
or ZSTD for column compression. But it's not often used as a DB for monitoring
metrics, so something else is probably expected from a time series DB.

~~~
redis_mlc
It would work, but Clickhouse is a Russian (Yandex) thing, and SQL isn't
really needed for most monitoring and alerting use cases.

If I didn't want to use MySQL or Postgres, I'd rank Prometheus #1 and
Clickhouse #2.

The killer thing for Clickhouse is that Percona supports it, so if you want to
outsource the installation, mgmt. and support, you can just write a check and
get good results.

Also, Clickhouse is a column store with SQL, so you could use an instance for
monitoring and another to replace Vertica or Greenplum or whatever so long as
it has the client libraries you need.

~~~
rcatcher
ClickHouse is licensed under Apache License 2.0 and Yandex is incorporated in
the Netherlands. What are your concerns with it being developed by Russians
(other than xenophobia)?

~~~
redis_mlc
No, you're the xenophobe.

I'm concerned with there being a support issue, as well as a smaller user base
also affecting support in the long term.

------
abeppu
I'd be curious to see a couple of organizations that have taken different
approaches in this area discuss how they made decisions around investing in
observability systems. When metrics gets to be a meaningful fraction of your
total data size, and when it gets to be an operational burden on its own, I
think it's tempting to ask "Do we really need to emit and store all this?"
especially if you acknowledge that only ~2% of writes ever get read.

In some sense, the organization is making an expected value of information
estimate, where there's a complex interaction between the data you have and
actions you can take, especially to preempt or resolve issues.

~~~
hodgesrm
I've seen two approaches. (We have a lot of customers who deal with this exact
problem.)

1.) Down-sample data to aggregates. Keep the aggregates but drop the source
data after a certain period of time. You can see historical trends down to the
sampling interval but cannot drill down to individual observations.

2.) Use tiered storage. Keep recent data on NVMe SSD and migrate to high
density HDD or object storage over time. This allows you to keep more data.
Not all DBMS support this out of the box.
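Approach 1 boils down to a group-by on truncated timestamps. A minimal Python sketch of the idea; the interval and aggregate field names are illustrative, not any particular DBMS's rollup format:

```python
from collections import defaultdict
from statistics import mean

def downsample(points, interval_s=300):
    """Roll raw (timestamp, value) points up into per-interval
    aggregates; afterwards the raw points can be dropped."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {"min": min(vs), "max": max(vs),
                "avg": mean(vs), "count": len(vs)}
        for start, vs in buckets.items()
    }
```

Once the rollup is written, queries over old ranges hit the small aggregate table, which is why retention of the raw points can be short without losing trend visibility.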

------
DSingularity
1.2 PB saved using Gorilla compression! Didn’t realize how sparse this data
was.

~~~
cbsmith
Not necessarily sparse. Just lacking in entropy.

You can compress an arithmetic series down to a handful of bytes...
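For instance, take second differences of an arithmetic series and everything past the first two values becomes zero, which any entropy coder then squeezes down to almost nothing. A quick Python illustration:

```python
def double_delta(values):
    """Second differences: a constant-step (arithmetic) series
    collapses to all zeros, which compresses to almost nothing."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Regularly spaced timestamps are exactly this kind of series.
timestamps = list(range(0, 600, 60))  # one sample every 60 s
```

This is the same delta-of-delta idea Gorilla applies to timestamps: metric data arrives on a near-fixed cadence, so it is low-entropy rather than sparse.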

------
manigandham
And yet another company that builds its own (time-series) database. Why does
this keep happening for such a narrowly defined use-case at similar scales?

At 1.5 petabytes, they could just dump this in Google's BigQuery for the cost
of an engineer and have full SQL power.

~~~
rco8786
Bigquery does not have anything close to real time query support like you
want/need for observability data. And plus the cost is more like one engineer
per query. That shit is crazy expensive.

~~~
manigandham
What do you mean by real-time? BQ can scan a petabyte in less than a minute,
but partitioning and clustering will get most simple queries back in a few
seconds. If you mean data freshness, then BQ supports streaming ingest and
will show those rows in the next query.

SQL also lets you do a lot of aggregations in the database instead of pulling
back raw metrics. And BQ has flat-rate pricing and discounts for large
clients.

~~~
rco8786
Less than a minute is not real time. Something like < 100ms for most queries
would be real time.

And then do that 10s of 1000s of times per minute, every minute, 24 hours a
day.

You can see how the query costs add up...

I know it's easy to come on HN and play armchair architect but this is one
area where you are just incorrect in your assumptions. Did Twitter need to
build a(nother) in-house database? I don't know for sure. But I do know that
BQ is not well suited to observability use cases.

~~~
manigandham
As I said, that's for scanning a petabyte. Selective queries would be much
smaller and faster, costs can be handled with flat-rate pricing, and there's
the BI Engine (in-memory cache) feature too.

The details in the article (1.5PB logical data, 3x replication factor, only 2%
of metrics ever read) and the fact that Twitter already runs infra in Google
Cloud seem to align well with BQ in my experience (10+ years in adtech
building even more complex backends).

Please tell me what assumptions are incorrect so I can reconsider.

~~~
rco8786
My experience is ~2.5 years as a senior engineer on the Observability team at
Twitter (from 2013-2015). I was part of the migration to Manhattan (mentioned
in the post) from Cassandra, new alerting infra, query language design, among
many other things. Your adtech experience should make you particularly
sensitive to query latencies, so I find it interesting that you're glossing
over that.

Yes, only 2% of metrics are ever read. That was the same back then too. The
kicker is that you can't be sure which will be read, so the systems are built
to be able to read any of the metrics with the same SLA. This is especially
critical during an outage, where engineers need to _quickly_ read metrics that
in many cases were written only a few seconds ago, that they wouldn't
otherwise read, and that are not configured as part of an ongoing alert.

Additionally, the alerting infrastructure that runs on top of the TSDB is
configured with 10k+ queries that run every minute. So even at 2% read, you're
querying them over and over and over again because you always need the latest
data plus whatever trailing data is needed to fulfill the alerting needs
(trailing 10m, hour, day, month, etc). This also makes it a particularly hard
caching problem.

I can't speak to the state of GCP at Twitter. I know they had started to
migrate some things but when I was there it was all colo.

 _Could_ they use BQ? Sure, it could probably be tooled with
partitions/caching/etc. to work, I guess, but at the end of the day it's not
what BQ is designed for, which is data warehousing and BI. _Could_ they have
used something that wasn't a custom-built in-house database? Yeah, almost
certainly. (No secret that there was some NIH going on at Twitter well before
this.)

This happens all the time on HN so I'm not here to call you out
specifically...but it's really easy to read a company's blog post about their
infrastructure decisions and immediately scoff and say "psh why didn't they
just use X?"...as if you now have the same context that the engineering team
has by reading a single blog post.

------
nitred
Could someone tell me the disadvantages of solving the problem in the article
using S3? I am actually curious.

~~~
redis_mlc
They're inserting 5 billion events per second (for some reason), and AWS
throttles everything (the following link mentions a limit of 3,500/second):

[https://aws.amazon.com/about-aws/whats-
new/2018/07/amazon-s3...](https://aws.amazon.com/about-aws/whats-
new/2018/07/amazon-s3-announces-increased-request-rate-performance/)

Also, if they're planning to do alerting, S3 would not help with that.

PUTs are 1 cent per 2000:

[https://aws.amazon.com/s3/pricing/](https://aws.amazon.com/s3/pricing/)

~~~
buremba
5 billion events per second? That seems too much for monitoring data even at
Twitter's scale. The article says that they store the raw data and use rollup
tables for aggregate queries; I wonder how much the ELT process costs in
total.

~~~
dan-robertson
Indeed, the article mentions 5b metrics per minute
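Even at per-minute rather than per-second rates, the point about S3 request throttling stands. A back-of-envelope check, assuming one PUT per metric (a real pipeline would batch, so this is an upper bound):

```python
metrics_per_minute = 5_000_000_000       # figure from the article
writes_per_second = metrics_per_minute / 60   # ~83 million/s sustained

# Per-prefix PUT rate from the AWS announcement linked above.
s3_puts_per_prefix = 3_500
prefixes_needed = writes_per_second / s3_puts_per_prefix  # ~24k prefixes
```

Tens of thousands of hot prefixes just to absorb writes, before paying per-PUT pricing, is the kind of mismatch that pushes these systems toward batched checkpoints to blob storage instead of per-metric object writes.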

------
deppp
If there is interest, we're working on a time-series data labeling tool. Sneak
peek at it here:

[https://htx-pub.s3.amazonaws.com/Screen+Shot+2020-06-17+at+1...](https://htx-
pub.s3.amazonaws.com/Screen+Shot+2020-06-17+at+17.56.23.png)

If you think it might be useful and you want to try it out drop me an email
michael @ heartex.net

------
ssv33
Can someone offer up how this is different from InfluxDB? Also, a bunch of
machine-log folks are jumping on the observability 'marketing' bandwagon -
i.e. Elasticsearch, Splunk, Devo, SumoLogic, etc. Interestingly, I don't see
them here..

