
Why Not to Build a Time-Series Database - dgildeh
https://www.outlyer.com/blog/why-not-to-build-a-time-series-database/
======
tuukkah
TLDR: "Why Not to Build a Time-Series Database? Because we're building one and
you should pay us."

> _Hopefully our story will make you think twice before trying to build your
> own TSDB in house using open-source solutions, or if you’re really crazy,
> building a TSDB from scratch. Building and maintaining a TSDB is a full time
> job, and we have dedicated expert engineers who are constantly improving and
> maintaining our TSDB, and no doubt will iterate the architecture again over
> time as we hit an even higher magnitude of scale down the line._

> _Given our experience in this complex space, I would sincerely recommend you
> don’t try and do this at home, and if you have the money you should
> definitely outsource this to the experts who do this as a full time job,
> whether it's Outlyer or another managed TSDB solution out there. As so many
> things turn out in computing, it’s harder than it looks!_

~~~
cmroanirgo
Hmmm. I used to be part of a team that handled market data at crazy rates and
we took exactly the opposite approach to these guys.

When I see:

"You Can Lose a Few Datapoints Here and There"

I see that these guys are barking up the wrong tree.

1\. We used a single thread per network card. (Yes, we architected
clusters/failovers, etc... but not once were they required because of data
rates)

2\. The server could handle a fully saturated gigabit network at <50% CPU (per
core)

3\. Data was NEVER thrown away (but we had allowances in our API to let the
client reading the data drop updates and get sub-second aggregates instead --
e.g. OHLC or summation; see the sketch after this list)

4\. Data was stored in basically flat file systems.

5\. Our calculation engine was run 'downstream' toward the client ends, or on
the client end, away from data collection. If needed (i.e. the calcs were
expensive to run), these could feed back into the server for long-term
storage.
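
Roughly, the idea in point 3 looks like this (a minimal sketch, not the
actual API described above; the one-second bar interval and field layout are
invented for illustration):

    from collections import OrderedDict

    def ohlc_bars(ticks, interval=1.0):
        """Collapse (timestamp, price) ticks into per-interval OHLC bars.
        The server keeps every tick; a slow client asks for bars instead."""
        bars = OrderedDict()
        for ts, price in ticks:
            bucket = int(ts // interval) * interval
            bar = bars.get(bucket)
            if bar is None:
                bars[bucket] = [price, price, price, price]  # open/high/low/close
            else:
                bar[1] = max(bar[1], price)  # high
                bar[2] = min(bar[2], price)  # low
                bar[3] = price               # close
        return bars

    ticks = [(0.10, 101.0), (0.35, 101.5), (0.80, 100.9), (1.20, 101.1)]
    for bucket, (o, h, l, c) in ohlc_bars(ticks).items():
        print(bucket, o, h, l, c)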

This was mid 2000. I'm sure this is not rocket science for modern day
timeseries guys.

~~~
wmfiv
Right. For those interested, OpenHFT has created a really nice set of open-
source solutions to do this.

[https://github.com/OpenHFT/Chronicle-Queue#design](https://github.com/OpenHFT/Chronicle-Queue#design)

~~~
pvitz
Do you know of any article or book outlining the architecture of a full HFT
system, i.e. from market data consumption to pricing to trading? Thanks in
advance!

------
manish_gill
As I was reading through the post I kept wondering why they weren't using some
warehousing technique for older data - either dump it to S3 or, better yet,
Google BigQuery, which is amazingly fast at that scale. They only did that
after lots of fire-fighting and running per-tenant clusters.

Clickhouse would also be a good option for the aggregation queries that TSDBs
are mostly used for.

One of my wishlist items in the data space is a Managed Clickhouse offering.
:-)

------
bra-ket
We used a combination of Kafka + HBase + Phoenix
([http://phoenix.apache.org/](http://phoenix.apache.org/)) for a similar
purpose. It takes some effort to set up the initial HBase cluster, but once
you've done it manually once and automated it with Ansible/systemd, it's
pretty robust in operation.

All our development was around a query engine using plain JDBC/SQL to talk to
HBase via Phoenix. Scaling is as simple as adding a node to the cluster.

~~~
PopeDotNinja
Does Kafka have timestamps? I didn't see any when I looked, but I was working
with an older client version & didn't get far into it.

~~~
bra-ket
We didn't rely on it, or on the ordering of messages received from Kafka.
Timestamps and transaction IDs were generated by the client app/Kafka
publisher and were part of the message put onto the Kafka topic. When we
consumed a message with one of the parallel Kafka consumers and saved a row in
an HBase table, the original timestamp + transaction ID became part of the
rowkey string, the other parts being attributes we wanted to index (secondary
indices are supported in HBase/Phoenix but we didn't use them much; the
composite rowkey is basically the index). Then when querying, HBase works as a
parallel scanning machine and can do a time-range scan + filtering +
aggregation very fast.

On a separate note, we didn't use joins even though they are supported in
Phoenix; data was completely denormalized into one big table.
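
Roughly, such a composite rowkey might look like this (a minimal sketch with
invented field names and layout, not the actual schema described above);
because HBase keeps rows sorted by rowkey, a time-range query becomes a scan
between a start and stop key sharing the same prefix:

    def make_rowkey(tenant, metric, ts_ms, txn_id):
        """Composite rowkey: indexed attributes first, then timestamp + txn id.
        Zero-padding the timestamp keeps lexicographic order == time order."""
        return f"{tenant}|{metric}|{ts_ms:013d}|{txn_id}".encode()

    def scan_range(tenant, metric, start_ms, end_ms):
        """Start/stop keys for an HBase scan covering [start_ms, end_ms)."""
        start = f"{tenant}|{metric}|{start_ms:013d}|".encode()
        stop = f"{tenant}|{metric}|{end_ms:013d}|".encode()
        return start, stop

    print(make_rowkey("acme", "cpu.load", 1541030400000, "tx-0042"))
    print(scan_range("acme", "cpu.load", 1541030400000, 1541116800000))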

------
camel_gopher
In case anyone is interested in a video on this, here's the talk presented at
the local San Francisco monitoring group, monitorSF
([https://www.meetup.com/MonitorSF/](https://www.meetup.com/MonitorSF/))

[https://youtu.be/lA85vs6e3UA](https://youtu.be/lA85vs6e3UA)

------
marshf
Time-series data handling/storage seems a mostly solved problem in the mining,
oil, and manufacturing industries. Deployed in the field since the '80s:

[https://www.osisoft.com/about-osisoft/#more-about-pi-system](https://www.osisoft.com/about-osisoft/#more-about-pi-system)

Disclosure: industry user, now an OSIsoft employee

~~~
cheriot
A solved problem at what scale? A kid with pencil and paper can solve the
problem when it's small enough.

~~~
seanhunter
I don't have personal knowledge, but my understanding is that timeseries data
in the oil industry is insane scale. Think 10s or 100s of wells in a field,
sensors at regular distances in every well each taking multiple readings
(temperature, flow etc) on a sub-second cadence. And a big oil company will of
course have many fields.

It's in the same league as timeseries data in financial markets, which I have
more direct experience of. Financial timeseries data probably has more
metadata (trade/order flags etc) but in terms of raw data it's similar.

~~~
cheriot
What the article is talking about is an order of magnitude larger. Think
millions of data points per minute coming out of a medium-sized data center.
It gets harder when you look at the cardinality of the data (explained better
by the article), which corresponds to the size of the indexes.

------
statictype
Nice article.

> it's not uncommon for some of our customers to send us millions of metrics
> every minute

What kind of customers/services generate millions of points a minute?

~~~
varikin
I used to work at a Fortune 50 retailer on the cloud platform (a lot of
tooling around CI/CD for the teams that manage the website). We had a large
problem with keeping the metrics pipeline current. A major issue is that, by
default, Spring Boot publishes about 500 different metrics every 10 seconds.
Allowing every application to pump out that many default metrics, most of
which are never used, means that it takes only 334 instances to get one
million metrics per minute (1,000,000 / 6 / 500 = 333 1/3). I would guess we
had a couple thousand instances in production on a normal day. In a couple of
weeks, they have Black Friday. Any team that hasn't been able to fix
performance problems is given the go-ahead to just throw money at it and scale
horizontally in obscene ways.

Of course, our metrics were all handled in house. From talking to the teams
that handled the metrics pipeline, the vendors were great for smaller
companies, but there was no off-the-shelf solution for a company that large
with that volume. But I did very little with that myself, other than look into
the fact that Spring Boot published way too many default metrics. Who needs
P50, P70, P75, P80, P85, P90 - P99 on all web requests?! Just set a default
that is small and worthwhile and let the developers adjust as needed.

~~~
djrobstep
Folks need to resist the inclination to just gather maximum data for the hell
of it.

If you're pumping out a million metrics per minute, almost none of those are
ever going to actually be used to generate meaningful insight.

~~~
LukeShu
I used to work at a startup that made physical robots. The robot generated
several GBs of data every time it turned on. You're correct, most of that data
wasn't looked at most of the time. But every now and then, someone would say
"Hey, I saw a robot do something funny the other day, what the hell happened?"
And having all that data usually made it possible to figure out what happened.
To me, "maximum data for the hell of it" isn't about generating insight by
looking at trends, it's about generating insight during incident analysis.

~~~
djrobstep
That is a very particular use case, where very high res data is critical. I
note that even here, you're interested in data from "the other day", not years
ago.

In most cases, time spent maintaining terabytes of rapidly aging time series
data would be better spent elsewhere.

~~~
all2
I think that really depends on the case.

A particularly good high-frequency trader might be interested in terabytes of
minutiae when they're trying to sort out what caused yesterday's spike and
crash of ticker XYZ.

Systems and sales analysts that are looking at web store front ends (and back
ends, if there are issues) would be interested in large volumes of data,
specifically corner cases (users who don't follow a statistically significant
path), when trying to sort out a UI/UX redesign.

Traffic and transit analysts might want terabytes of data (especially with
date and weather indicators) when considering what kind of freeway interchange
to add to a growing area.

I suppose I could go on...

------
manigandham
"time-series database" is some of the most overhyped nonsense since noSQL.

Time-series is just data with time as a primary component. It comes in all
shapes and volumes, but if you have a lot of data and are running heavy OLAP
queries, then we already have an entire class of capable databases.

Use any modern distributed relational column-oriented database, set primary
key to metric id + timestamp, and you'll be able to scale easily with full SQL
and joins. You can keep your other business data there too, along with JSON,
geospatial, window functions, and all the other rich analytical queries
available with relational databases.

We have trillion row tables that work great. No special "TSDB" needed.
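
A minimal sketch of that shape, using Python's built-in sqlite3 purely as a
stand-in (the real thing would be a distributed columnstore, and the table and
column names here are invented): the primary key is (metric_id, timestamp) and
roll-ups are ordinary GROUP BY queries.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE metrics (
            metric_id INTEGER NOT NULL,
            ts        INTEGER NOT NULL,   -- epoch seconds
            value     REAL,
            PRIMARY KEY (metric_id, ts)
        )
    """)
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(1, 0, 0.5), (1, 30, 0.7), (1, 90, 0.6), (2, 15, 12.0)],
    )

    # Per-metric, per-minute average: the kind of OLAP roll-up being described.
    for row in conn.execute("""
        SELECT metric_id, (ts / 60) * 60 AS minute, AVG(value)
        FROM metrics
        WHERE ts BETWEEN 0 AND 3600
        GROUP BY metric_id, minute
        ORDER BY metric_id, minute
    """):
        print(row)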

~~~
ghc
While I understand your point, you are quite mistaken if you think that time
is just another key. Dealing with time properly requires a concept of point
distance, similar to GIS systems requiring 2d distance understanding. You
cannot do joins on time with SQL databases unless you want to throw away
important data.

As an example, in the industry I work in, you may have no readings for days or
weeks, and then hundreds of readings from the same sensor. Why? Many systems
in industrial environments send new readings only "on-change", and assume the
underlying data storage architecture will forward fill to in-between times.
This is why the practically ancient time series architecture of data
historians still dominates in these use cases.

In fact, for many time series functions you actually have to throw away the
notion of relational joins to be able to efficiently perform time-preserving
joins. Window functions only work in basic use cases with relatively small
amounts of data where you're aggregating.

~~~
manigandham
I'm not sure where the complexity is that you claim, nor what it has to do
with data historians?

So what if there are missing rows? This doesn't affect the database and any
aggregations will work fine. Databases don't "fill-in" data, but you can
definitely write whatever SQL you need to create averages and buckets to
smooth out results.

From reading your website, it seems you're talking about the "last value
recorded" as of a certain time, which doesn't seem to be a common query but is
totally possible. KDB+ has "asof" joins and others can handle it with window
functions using last_value().

We run queries on a table containing 2.7+ trillion rows of data that has no
set pattern and infinite cardinality, and results return within seconds.
Window functions and joins work without issue. Have you actually tried using a
columnstore?
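
For the curious, the "last value recorded as of a time" lookup can be phrased
in plain SQL along these lines (sketched with sqlite3 for portability; the
table names are invented, and kdb+'s asof join or last_value() window
functions are the more specialized routes mentioned above):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("temp", 100, 20.1), ("temp", 500, 22.4), ("temp", 900, 21.7)],
    )

    # As-of lookup: the latest reading at or before each requested timestamp.
    # An on-change sensor may not have a row anywhere near the asked-for time.
    conn.execute("CREATE TABLE asks (ts INTEGER)")
    conn.executemany("INSERT INTO asks VALUES (?)", [(250,), (600,), (1000,)])

    for row in conn.execute("""
        SELECT a.ts,
               (SELECT r.value FROM readings r
                 WHERE r.sensor = 'temp' AND r.ts <= a.ts
                 ORDER BY r.ts DESC LIMIT 1) AS value_asof
        FROM asks a ORDER BY a.ts
    """):
        print(row)   # (250, 20.1), (600, 22.4), (1000, 21.7)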

~~~
ghc
Relational databases represent a column/row-oriented architecture. Data
historians are a specialized, non-relational, _time_ -oriented architecture.
Using time as a key in a relational index implies that only ordering is
important, but that is not the case. Distance between points in time is
extremely important because time operates on a continuous 1d line and data
points are represented at varying distances between each other on that line.
Data historians are architected to both preserve this temporal relationship
and take advantage of it by eliminating duplicate data, employing temporal
compression techniques to be able to store millions of readings per second
for years' worth of data.
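
The simplest form of the "eliminate duplicate data" idea is a dead-band
filter: a reading is stored only when it moves outside a tolerance around the
last stored value. A sketch with made-up numbers (production historians use
more sophisticated variants, such as swinging-door compression):

    def deadband(readings, tolerance=0.5):
        """Keep a (ts, value) reading only if it differs from the last *stored*
        value by more than the tolerance; in-between values are implied."""
        stored = []
        last = None
        for ts, value in readings:
            if last is None or abs(value - last) > tolerance:
                stored.append((ts, value))
                last = value
        return stored

    raw = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 23.0), (4, 23.1), (5, 20.0)]
    print(deadband(raw))   # [(0, 20.0), (3, 23.0), (5, 20.0)]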

> From reading your website

My website doesn't have much to do with this because Sentenai isn't a time
series database system. I did, however, spend most of my time in research
working on temporal data systems, and have been fortunate to collaborate with
or learn from researchers who have spent decades solving the unique problems
that temporal data presents. What you might consider uncommon for your use
cases is extremely common in manufacturing, defense and other areas.

There's a decades-old industry around database systems that handle time
natively. And while many support SQL as a lingua franca, and some are column
stores, they're not relational by any means as they either extend SQL to
support time, or limit non-temporal joins to ensure performance. StreamBase,
Kdb, Aurora and many other specialized architectures exist because one size
does not fit all. Michael Stonebraker, whose work has included StreamBase,
Vertica, Tamr, Postgres, Aurora, and many others, famously published this
paper about the very problem:
[https://cs.brown.edu/~ugur/fits_all.pdf](https://cs.brown.edu/~ugur/fits_all.pdf).

Further reading that might be illuminating:

    
    
      1. http://cs.brown.edu/research/aurora/vldb03_journal.pdf
      2. http://www.cs.rochester.edu/u/james/Papers/AllenFerguson-events-actions.pdf
      3. https://books.google.com/books?id=BK6oCAAAQBAJ&pg=PA9&source=gbs_toc_r&cad=4#v=onepage&q&f=false (excerpt)

~~~
hardwaresofton
I appreciate your links to further reading, and I'm trying to read the Aurora
paper right now, but after reading the abstract and the intro (I'm still in
progress), I can't find a case that is uniquely fit/perfect for data
historians... I know this is already asking a lot, but would you mind giving
me _one_ go-to use case that really made you think "this is what purpose-built
data historian-style databases are good for"?

Every issue mentioned in the abstract/intro (which are meant to motivate the
paper) seems like it can be solved as an add-on to existing application
databases (albeit with their most recent developments/capabilities in mind).
The very description of HADP vs DAHP systems seems silly, because it's just a
question of write load, and that's fundamentally only solved with batching and
efficient IO, or by giving up durability; it doesn't seem inherent to the data
model. There are also assertions like:

> Moreover, performance is typically poor because middleware must poll for
> data values that triggers and alerters depend on

But like, with postgres you're free to define a better/more efficient
LISTEN/NOTIFY-based trigger mechanism; for example, you can put highly
optimized code right in the DB... Thinking of some of the cases called out in
the paper, here's what I think in my head:

\- Change tracking vs only-current-value -> just record changes/events; as far
as tables getting super big goes, partitioning helps (timescaledb does this)

\- Backfilling @ request time -> a postgres extension could do this

\- Alerting -> postgres does have customizable functions/procedures as well as
LISTEN/NOTIFY (see the sketch after this list). If the paper is right (?)
about TRIGGERs not scaling then this might be their most reasonable point.

\- Approximate query answering is possible with postgres with stuff like
HyperLogLog, but the paper is certainly right in that it is not implemented by
default.
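
A rough sketch of that LISTEN/NOTIFY route (hypothetical table, channel and
threshold; assumes psycopg2 and a running Postgres with a pre-existing
metrics(metric, value) table, so treat it as illustrative rather than a
drop-in alerter):

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=metrics")  # hypothetical DSN
    conn.autocommit = True
    cur = conn.cursor()

    # Trigger side: push a notification whenever an inserted value crosses a
    # threshold, instead of having middleware poll the table.
    cur.execute("""
        CREATE OR REPLACE FUNCTION notify_high_value() RETURNS trigger AS $$
        BEGIN
            IF NEW.value > 100 THEN
                PERFORM pg_notify('metric_alerts', NEW.metric || '=' || NEW.value);
            END IF;
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;

        DROP TRIGGER IF EXISTS metric_alert ON metrics;
        CREATE TRIGGER metric_alert AFTER INSERT ON metrics
        FOR EACH ROW EXECUTE PROCEDURE notify_high_value();
    """)

    # Listener side: no polling, just wait on the connection's socket.
    cur.execute("LISTEN metric_alerts;")
    while True:
        if select.select([conn], [], [], 5) == ([], [], []):
            continue  # timeout, nothing to do
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print("alert:", note.payload)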

Maybe I'm mistaking the extensibility of postgres for the redundancy of the
paradigm, akin to thinking something like "lisp is multi-paradigm so why would
I use Haskell for its enhanced inference/safety".

I'm still reading the paper so maybe by the end of it it will dawn on me.

~~~
ghc
So Aurora isn't a historian, but a complex event processing system. It's an
entirely different beast that solves very specific problems around high-speed
queries that could theoretically require scanning through all historically
stored data.

I'm not a huge fan of historians (I've spent too much of my career working
with them), but I can definitely tell you where they make sense. The scenario
is this:

Imagine you have a large facility with thousands of machines, each with a
programmable logic chip for controls and monitoring. These machines create
lots of data and so often employ data reduction semantics by reducing data to
on-change rather than sampling sensors at thousands of hertz. A single machine
may have dozens or hundreds of variables to track. These tags might be
hierarchical: Machine 1, subsystem 5, variable b. If you say there's 100,000
total tags to track in the facility, and they're on average sampled at 10hz,
you need a system capable of writing a million durable timestamped values per
second. Now that's child's play for, say, google, but if you're a
manufacturer, you can't afford to spend massive amounts of money on cloud
systems, and usually want to do this all on a single server on the factory
floor, because you need realtime monitoring that can display the current value
in time for every single tag
([https://www.ws-corp.com/LiveEditor/images/SLIDES/10/3.jpg](https://www.ws-corp.com/LiveEditor/images/SLIDES/10/3.jpg)).
Ideally, in a single-node scenario, you want compression. It's not uncommon to
store 100 billion timestamped values per day and keep them for a year or more
for audit purposes if something goes wrong. Today, for the sake of predictive
maintenance, data retention policies of up to 10 years are becoming more
common.

So what would you sacrifice to be able to do efficient realtime monitoring and
ingestion of millions of data points per second? You can't use queueing
semantics to protect an RDBMS because logging can't take more than a 10th of a
second per point. If you think about the use case, what you'd sacrifice is
transactional queries and row-level joins, because you just don't need them.
At the same time, this data is really sparse when you look at it from a
table's perspective, so you'll want something like a column store to underlie
the data storage.

So what we do is throw out transactional guarantees, choose a storage system
that is good at compression (roll-ups in some historians will store a formula
approximating the data instead of raw data itself over a window), and
prioritize speed of point retrieval for most recent "hot data" by caching it
in-memory.

You can of course extend Postgres to achieve many of these things, but having
done it myself, in practice it's sub-optimal in the exact same way that using
bubble sort for all your programmatic sorting needs is sub-optimal.

One thing you might want to keep in mind is that many of the people involved
in Aurora are the authors of Postgres. They're not arguing you can't do things
in Postgres, they're arguing that in practice the RDBMS's guarantees are
theoretically incompatible with high performance in the area of Complex Event
Processing, because alignment between different simple events (recorded as
rows in a database) can drift so far that memory requirements become
prohibitive if you don't use a stream-processing architecture.

Also keep in mind that Aurora is from 2002 and many of the ideas have been
implemented elsewhere over time. The great thing about Postgres is that it's
perfect scaffolding on which you can build other stuff.

~~~
hardwaresofton
Thanks for taking the time to give such a detailed answer and explanation on
historian systems.

------
Daneel_
I’m surprised no one has brought up Splunk in here (that I could see at a
cursory glance).

They manage to do time-series storage on a pretty large scale (over 5PB/day
for their largest customer).

~~~
viraptor
I wouldn't really say that Splunk is time-series in the same way they wanted.
It's mainly log indexing, not the kind of aggregation you'd want for numeric,
labelled metrics.

~~~
dominotw
> aggregation you'd want from numeric, labelled metrics.

We use it exactly for this, works great.

~~~
viraptor
Do you use the new metrics store? I just learned they have one now in the new
versions.

~~~
dominotw
No we don't use metrics store.

~~~
viraptor
That doesn't really sound like a cost-effective way of doing metrics... The
amount of data you license from Splunk is likely quite expensive compared to a
dedicated metrics system, right?

------
objektif
Not very knowledgeable in the area, but can someone please explain how kdb
fits within this class of time-series DBs and whether there are any
alternatives available to kdb?

~~~
unixhero
Kdb is a proprietary DB that runs using the processor's level-2 cache as
memory. Therefore it is crazy fast. For the same reason, it does not handle
large datasets.

~~~
unixhero
At least in the past.

------
dgildeh
As the blog author, great to see the discussion and feedback, so appreciate
it!

Without going through the comments one by one: for the main ones about this
being a solved problem, or there already being solutions out there that do
this, I would just say those comments remind me exactly of the type of
conversations I had years ago with my team. We all thought it would be much
easier, or thought there would be something off the shelf that could do
everything, and after several years of fire-fighting, the reality was that the
problem looks much simpler than it really is, by a long mile.

Now that we've been doing this for a few years, and spoken directly with
creators of many other TSDBs, we take a very skeptical view of all claims made
about any database. They all sound amazing when you first read about them,
maybe even work great in testing, till you hit scale and then you find all the
limitations. If there was a perfect TSDB out there, everyone would be using it
and there wouldn't be a new one announced on a weekly basis!

I think the one comment on query loads being different sums things up - I've
no doubt all the other options thrown out there work well for data historians,
but for monitoring tools with loads of concurrent users, loading dashboards
with 10s or 100s of queries each, and alerting systems polling every few
seconds in parallel, the query load can get very high very quickly. Making
those queries fast while still writing metrics in at scale is a hard problem,
and I don't think any individual TSDB has really solved that properly, which
is why we ended up building our own distributed architecture ourselves.

------
OldHand2018
It's pretty ridiculous that "Time-Series Database" has come to mean ingesting
massive amounts of streaming data. They've been around a long time and have
many use cases.

They're a great way to store data efficiently; accessing specific data is very
fast and simple if you know the time range you are looking for, and you can
roll your own in a few dozen lines of C if that's what you want to do. If
that's all you need, why not?

~~~
davidjc1
That may be a perfectly good solution if you have a very static infrastructure
and narrow use case.

As a thought exercise, for the most trivial solution, you could create a
single append only flat file. This may work well for writes, but what happens
when you want to read the datapoints for only a single series in time order?
This would result in an expensive scan over the whole file. An improvement
could be to create a file per series, but this becomes problematic when
writing many small datapoints across each different file. The problem worsens
in the case of a dynamic containerised infrastructure which produce a unique
number of timeseries over very short intervals, which was the catalyst for the
development of Prometheus TSDB v2, as the prior version stored a file per
timeseries.
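
To make the trade-off concrete, here's a toy sketch of the file-per-series
approach (an invented layout with fixed-width records, not Prometheus's actual
format): writes append to one file per series, and a time-range read only
scans that series' file, but every active series now costs its own small
write.

    import os
    import struct

    RECORD = struct.Struct("<dd")  # (timestamp, value) as two doubles

    def append(series_dir, series, ts, value):
        """Append one datapoint to the series' own file."""
        with open(os.path.join(series_dir, series + ".bin"), "ab") as f:
            f.write(RECORD.pack(ts, value))

    def read_range(series_dir, series, start, end):
        """Scan only this series' file for points in [start, end]."""
        out = []
        with open(os.path.join(series_dir, series + ".bin"), "rb") as f:
            while chunk := f.read(RECORD.size):
                ts, value = RECORD.unpack(chunk)
                if start <= ts <= end:
                    out.append((ts, value))
        return out

    os.makedirs("/tmp/tsdb-demo", exist_ok=True)
    append("/tmp/tsdb-demo", "cpu.load", 1.0, 0.42)
    append("/tmp/tsdb-demo", "cpu.load", 2.0, 0.55)
    print(read_range("/tmp/tsdb-demo", "cpu.load", 0.0, 1.5))  # [(1.0, 0.42)]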

As the post states, there is a balance between the read and write pattern -
achieving that with a few lines of C for a general purpose case is a difficult
task, if not impossible.

~~~
OldHand2018
To be clear, my post was to state that there are many use cases for time-
series databases and bemoan the fact that most current development centers
around a specific use case. That is in fact what I wrote.

I have a hard time believing that "a dynamic containerised infrastructure
which produce a unique number of timeseries over very short intervals" is the
superset of all time-series use cases, but perhaps it is so.

