
A Review of Time Series Databases - slyall
https://blog.dataloop.io/top10-open-source-time-series-databases
======
skrebbel
Dear HN, recently time series DBs appear to have become very popular on HN but
I'm confused. What are they for?

I understand the idea for, say, physics experiments. Lots of parameters
sampled thousands of times per second, gotta store that stuff somewhere. But
this is HN, not an experimental physics subreddit, so I must be missing
something.

What do people on HN who care about this stuff use time series databases for?
Why are there 20+ competitors?

~~~
perlgeek
Google basically does all its monitoring through time series databases.

Basically they record response times, error rates, and internal health metrics
from all kinds of services and service instances all as time series, and then
define rules on top of these to define when to generate alerts.

Other things you typically want to record are all kinds of resource usage
metrics, like disk usage, RAM usage, number of processes, network traffic and
saturation etc.
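
To make the "rules on top of time series" idea concrete, here is a minimal sketch, assuming an in-memory list of samples. The names `Sample` and `should_alert`, the five-minute window, and the 5% threshold are all hypothetical and not taken from any particular monitoring system:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch
    value: float      # e.g. error rate as a fraction of requests


def should_alert(series: List[Sample], now: float,
                 window_s: float = 300.0, threshold: float = 0.05) -> bool:
    """Fire when the mean value over the last window_s seconds exceeds threshold."""
    recent = [s.value for s in series if now - window_s <= s.timestamp <= now]
    if not recent:
        # No data in the window; a real system might treat this as its own alert.
        return False
    return sum(recent) / len(recent) > threshold


# Hypothetical error-rate samples for one service instance.
series = [Sample(0.0, 0.01), Sample(100.0, 0.02),
          Sample(400.0, 0.09), Sample(500.0, 0.08)]
alert = should_alert(series, now=600.0)
```

A real setup would evaluate rules like this inside the database or alerting layer rather than in application code, but the shape (a window, an aggregate, a threshold) is the same.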

~~~
a3n
Interesting to uncover data that's not _directly_ related to a requirement:
get some search results. Of course it's humans sitting at the laptop, so time
matters. But the other data you mention also has the nature of "the thing we
care about" along with "time associated with the thing we care about."

I wonder what other extra-dimensional (totally made up term on the fly) data
we might care about. Power per query. Temperature per query. Water consumed
per query. Water consumed per current activity level of trending topics.

~~~
perlgeek
Money earned per... is usually a pretty big thing :-)

------
TheHippo
Dataloop develops DalmatinerDB. Guess which database won first place.

~~~
marianoguerra
it's developed by @heinz_gies from Project FiFo, they may pay him to do
changes on it, but it wasn't started there and AFAIK heinz doesn't work there
full time.

~~~
Licenser
Mariano is correct, DalmatinerDB is built and maintained by Project-FiFo.
Dataloop (the authors of the blog post) is currently the biggest user (as far
as I know at least), using it in their SaaS.

They have however been excellent open source citizens and contributed back
improvements, bug reports and suggestions.

------
dintech
KDB+ is the number 1 time-series database, no question. Per-core, it's 10
times faster than any other time-series database. Seems like an odd omission.

[http://kparc.com/q4/readme.txt](http://kparc.com/q4/readme.txt)

~~~
wlesieutre
As noted in the scope section along with some other omissions:

> Only free and open source time series databases and their features have been
> compared. Therefore if someone asks “have you tried Kdb+ or Informix?” the
> answer will be no. They are probably awesome though.

Article could have been titled "Review of the Top10 FOSS Time Series
Databases" since that's what it is.

~~~
gaius
The 32-bit version is free.

I agree that any comparison skipping it is meaningless, it is the "standard"
in this space, to the extent there is one.

~~~
evanpw
> The 32-bit version is free.

Only for non-commercial use:
[https://kx.com/2015/09/19/32-bit-kdb-for-non-commercial-use-...](https://kx.com/2015/09/19/32-bit-kdb-for-non-commercial-use-only/)

------
coredog64
> It seems there is a never ending supply of time series databases being
> written on top of Cassandra. Unless any of them are significantly better
> than KairosDB I'm not going to change the top 10.

I really appreciate the work Brian Hawkins has done with KairosDB. We settled
on KairosDB at work after having a bunch of trouble with InfluxDB. At the same
time, it's very much stuck in the Cassandra 2.1 world*. There have been some
significant advancements with Cassandra for this particular use case (new
storage engine, TWCS) and KairosDB is missing out.

*Yes, it can be run on 2.2.x by enabling the now-disabled-by-default Thrift, but that feels like a last gasp.

~~~
JamesBarney
I'm evaluating InfluxDB for work and was wondering what issues you ran into?

~~~
Licenser
Blind guess: lack of clustering and a history of changing file formats, so you
lose your data every few months.

------
sixhobbits
> I decided to pen a magnum opus of my own opinions

"magnum opus (noun) a large and important work of art, music, or literature,
especially one regarded as the most important work of an artist or writer."

Considering that this review has a word count of 3708, describing it as a
magnum opus (something I associate with a work that takes close to a lifetime
to complete) rankles.

~~~
64bitter
I only counted 3705 words. I might need to count them again, damn fingers.

Did you include the comments? I didn't myself, but that's sure to bump up the
count. We should investigate what the cutoff limit is, maybe 10,000 words
would qualify?

I'm struggling with whether to classify the work as art or literature, maybe both?
It's definitely a seminal piece.

------
triplesec
To be more precise, this is a review of more than 10 free and open source time
series databases, wherein the author chooses 10 favourites, by some metrics.

~~~
threeseed
What do you think is missing ?

Because those are probably the 10 most popular and well known.

~~~
polyfractal
Well, for one thing, this isn't even attempting an objective comparison. The
article basically grabs whatever third party benchmark numbers are available
and pretends all the metrics are comparable. In many cases, the external
benchmarks are testing entirely different things, so it's meaningless to give
those numbers in a "review" of all the options.

It's really just a "top ten list according to my arbitrary thoughts", which is
fine. But it's not really a useful comparison at all.

------
vslira
Question:

Does it make sense to use a time-series database if you're not too sure that
you can trust the "time" in the series?

For example, let's say you're logging users' locations on a jogging app and
using the system's clock as the time record (I understand this may not be
ideal). Someone could log a run 2 years from now, for example.

~~~
chaotic-good
It is usually a good idea to discard late writes. It should be possible to
write data with some time-stamp skew but if skew is too large data should be
discarded as erroneous. It totally depends on the application, though.
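
A minimal sketch of that kind of skew check, assuming the server's clock is the reference. The function name `accept_point` and both limits are hypothetical; sensible values depend entirely on the application:

```python
import time
from typing import Optional

MAX_PAST_SKEW_S = 24 * 3600   # hypothetical: accept points up to a day late
MAX_FUTURE_SKEW_S = 300       # hypothetical: tolerate five minutes of clock drift


def accept_point(point_ts: float, server_now: Optional[float] = None) -> bool:
    """Return True if the point's timestamp is within the allowed skew."""
    if server_now is None:
        server_now = time.time()
    skew = point_ts - server_now  # positive means the point claims to be in the future
    return -MAX_PAST_SKEW_S <= skew <= MAX_FUTURE_SKEW_S


now = 1_000_000.0
two_years_s = 2 * 365 * 24 * 3600
ok_recent = accept_point(now - 3600, now)          # an hour late: kept
ok_future = accept_point(now + two_years_s, now)   # "a run 2 years from now": dropped
```

For the jogging app case, a run logged two years in the future would fail this check, while a phone that synced an hour late would still be accepted.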

------
lormayna
What is a good TSDB for collecting data from home automation sensors that
can run on a little PC like a Raspberry Pi?

~~~
threeseed
InfluxDB is perfect for these use cases. It's a single Go binary with
excellent performance and use of resources.

[http://www.aymerick.com/2015/10/07/influxdb-telegraf-grafana...](http://www.aymerick.com/2015/10/07/influxdb-telegraf-grafana-raspberry-pi.html)

------
RyanHamilton
For a similar comparison that also encompasses commercial databases, see
here:
[http://www.timestored.com/time-series-data/column-oriented-d...](http://www.timestored.com/time-series-data/column-oriented-databases)

------
fnl
What about BTrDB from Berkeley? [http://btrdb.io/](http://btrdb.io/)

~~~
dataloopio
It's in the spreadsheet. I've not tried it and there wasn't much info
available at the time of writing the blog.

------
socmag
I'm on my phone and my eyes are not so good as it is.

Does he say what the architecture of the platform he was testing this on was?

~~~
socmag
Never mind, I found it in the Gist...

[https://gist.github.com/sacreman/b77eb561270e19ca973dd505527...](https://gist.github.com/sacreman/b77eb561270e19ca973dd5055270fb28)

1 x DalmatinerDB server GCE n1-standard-16 (16 cpu, 60GB memory, 1 x 375G
local SSD disk)

I would assume that is the same setup they used for all the other tests,
though not clear about that yet

Anyone know the network bandwidth on these instance types?

~~~
dataloopio
Hi, I'm the original author. I think that GCE instance has 2Gb per core but
I'd need to check. It wasn't bottlenecked on network bandwidth though.

The other benchmarks are linked in the spreadsheet to their respective
details. It's not an absolute, direct, fair comparison. However, we wanted to
start somewhere with information available right now and try to collect better
results over time as the respective interested parties benchmarked and blogged
about their databases.

I'd like to spend more time benchmarking every database in the spreadsheet but
it feels like something the project owners should do themselves. I'd probably
only get the setup wrong.

~~~
socmag
Thanks for the clarification, good to know.

------
xmatos
What specific features does a time series database offer over Mongo or Redis?

~~~
gaius
Well one advantage compared to Mongo is they actually keep your data safe.

~~~
detaro
I wouldn't trust all of them to do that reliably. If you look at a large
group of database systems, some of them are going to have bugs or bad defaults
as well.

