
Ask HN: Best time series database currently - mads
I am looking for a database to store time series data from sensors and found CrateDB. It looks very interesting. Does anyone have experience with it?

I need something that will scale well horizontally, which CrateDB seems to be able to do, and CrateDB also has an MQTT broker built into its enterprise version.

Or maybe some other alternative that will scale well?

I am also looking at InfluxDB and Prometheus, but the MQTT broker in CrateDB really appeals to me.
======
ehllo
Just have a look at Timescale. It's an open source PostgreSQL extension, and
it can sit side by side with other stuff. Scaling PostgreSQL should be no
problem for you.

[https://www.timescale.com/](https://www.timescale.com/)

~~~
jurgenwerk
I don't think PostgreSQL scales nicely horizontally. I do use TimescaleDB,
though, and it scales really nicely vertically, but it would probably be hard
to make it run on multiple machines.

~~~
ehllo
I see Timescale more as an alternative option, since the OP also asked for
this. And you're right: since Timescale is a PostgreSQL extension, it also
has the same benefits/problems as PostgreSQL itself in this respect.

~~~
stalller
Thanks for the recommendation of Timescale :)

Here are some more details on our future plans for clustering. We do have
horizontal scale-out clustering on our roadmap and it's hard to say exactly
when it will be released, but we are aiming for the 2nd half of 2018.

That said, we often find that there are multiple reasons why people ask about
"clustering" or say they need scale-out:

A. Because you want to scale the amount of available storage - (we allow you
to elastically add disks to scale up the capacity of a single hypertable, and
we have had customers scale a single hypertable to 500B rows)

B. Because you want high availability - (we support this today, via PostgreSQL
streaming replication and will be documenting this further)

C. Because you want to support more concurrent queries - (supported today
across primary replicas)

D. Because you want to support high ingest rates - (depending on your use
case, we have users doing 100-400k rows / second)

E. Because you want to parallelize individual queries (that touch a lot of
data) - (some support for parallelization today, more to come)

So we do meet the needs of many today without support for full scale-out
clustering (scaling vertically, as jurgenwerk points out). If your
requirements are closer to millions of rows per second of inserts and storing
100s of TBs / PBs of data, we can't support this yet, but we're working
towards it!

------
johann8384
OpenTSDB scales well: 10 million data points per second (DPS) of writes on a
36-node cluster, for example. It's not easy to get working _well_ at scale out
of the box. That is something we really need to work on in that community.
There is a project called Splicer (github.com/turn/splicer) that shards
incoming queries into 1-hour blocks and caches those blocks; it also sends
each query to the node where the region server hosting the data runs. This
makes the queries VERY fast.
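To make the Splicer idea concrete, here is a minimal Python sketch of splitting a range query into whole-hour blocks and caching each block. It is not Splicer's code: `hour_blocks`, `HourBlockCache`, and the `fetch` callback (a stand-in for the real OpenTSDB query) are hypothetical names, and routing each block to the node that hosts its region is left out.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator, List, Tuple

# A data point is (timestamp, value); this shape is an assumption for the sketch.
Point = Tuple[datetime, float]
BLOCK = timedelta(hours=1)


def hour_blocks(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield the start of every whole-hour block that overlaps [start, end)."""
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor < end:
        yield cursor
        cursor += BLOCK


class HourBlockCache:
    """Answers range queries from cached 1-hour blocks (hypothetical, not Splicer's code)."""

    def __init__(self, fetch: Callable[[str, datetime, datetime], List[Point]]):
        # fetch(metric, block_start, block_end) stands in for the real backend query.
        self.fetch = fetch
        self.cache: Dict[Tuple[str, datetime], List[Point]] = {}
        self.misses = 0

    def query(self, metric: str, start: datetime, end: datetime) -> List[Point]:
        points: List[Point] = []
        for block_start in hour_blocks(start, end):
            key = (metric, block_start)
            if key not in self.cache:
                # Only blocks not seen before hit the backend.
                self.misses += 1
                self.cache[key] = self.fetch(metric, block_start, block_start + BLOCK)
            # Trim the partial edge blocks back to the requested range.
            points.extend(p for p in self.cache[key] if start <= p[0] < end)
        return points
```

One caveat on this design: a real cache would need to skip or expire the current, still-filling hour, since the sketch keeps whatever a block held on its first read.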

------
Maven911
How about kdb, the in-memory time series database? It has a long history in
financial services.

------
dyeje
We use DynamoDB at my place. We had to make some tooling around it and figure
out some growing pains, but now we have an extremely scalable solution.

------
kevindqc
I don't know anything about this, but it might help if you provide information
on what you want to do with the data.

~~~
mads
My main concern for now is to be able to store/write the data collected from a
million IoT devices in a scalable fashion, so there is not really a
requirement yet for what to do with it. Once the data is there, it will be
analyzed on an ad hoc basis, and then we will see what we can do with it.

~~~
brianwawok
That is going to be hard to achieve.

To know how to store the data, you need to know how you will query it.

If you know how you will query it, you can, say, devise a way to store it in
Cassandra, which will scale up to PBs.
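As an illustration of designing storage around a known query, the sketch below assumes the query is "all readings for one device on one day" and models a Cassandra-style table in plain Python. The CQL schema in the comment, the day-sized bucket, and every name here are assumptions for the example, not something proposed in the thread.

```python
import bisect
from collections import defaultdict
from datetime import date, datetime
from typing import DefaultDict, List, Optional, Tuple

# Mirrors a hypothetical CQL table designed around "one device, one day" queries:
#   CREATE TABLE readings (
#       device_id text, day date, ts timestamp, value double,
#       PRIMARY KEY ((device_id, day), ts));
PartitionKey = Tuple[str, date]
Row = Tuple[datetime, float]


class DayBucketedStore:
    """In-memory stand-in: rows grouped by (device_id, day), sorted by ts."""

    def __init__(self) -> None:
        self.partitions: DefaultDict[PartitionKey, List[Row]] = defaultdict(list)

    @staticmethod
    def partition_key(device_id: str, ts: datetime) -> PartitionKey:
        # Bucketing by day keeps any single partition from growing without bound.
        return (device_id, ts.date())

    def write(self, device_id: str, ts: datetime, value: float) -> None:
        rows = self.partitions[self.partition_key(device_id, ts)]
        bisect.insort(rows, (ts, value))  # keep rows ordered, like a clustering column

    def read(self, device_id: str, day: date,
             start: Optional[datetime] = None,
             end: Optional[datetime] = None) -> List[Row]:
        # The query names the full partition key, so it touches exactly one partition.
        rows = self.partitions.get((device_id, day), [])
        # (t,) sorts before any (t, value), so bisect_left finds the first ts >= t.
        lo = 0 if start is None else bisect.bisect_left(rows, (start,))
        hi = len(rows) if end is None else bisect.bisect_left(rows, (end,))
        return rows[lo:hi]
```

The catch is the one the comment describes: a query that does not name the device and day (say, "every device above 30 degrees last month") has no single partition to go to, which is why the query pattern has to be known up front.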

If you just throw in a PB of data, I am not aware of ANY system that is going
to let you drop ad-hoc queries at the data and get fast answers. You
effectively need to load most / much of the data off disk to process it.

If you only want to store it and not query it yet, store it off in flat files.
When you decide how you will query it, load it from the flat files and switch
writes to your new system.
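A minimal sketch of the store-now, load-later approach, assuming hourly newline-delimited JSON files on local disk; the directory layout, record fields, and function names are hypothetical. `replay` reads the files back in time order so they can be bulk-loaded once a database has been chosen.

```python
import json
import os
import tempfile
from datetime import datetime
from typing import Any, Dict, Iterator


def hour_file(root: str, ts: datetime) -> str:
    # One newline-delimited JSON file per hour: root/YYYY/MM/DD/HH.jsonl (layout is an assumption).
    return os.path.join(root, ts.strftime("%Y"), ts.strftime("%m"),
                        ts.strftime("%d"), ts.strftime("%H") + ".jsonl")


def append_reading(root: str, device_id: str, ts: datetime, value: float) -> None:
    path = hour_file(root, ts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"device_id": device_id, "ts": ts.isoformat(), "value": value}) + "\n")


def replay(root: str) -> Iterator[Dict[str, Any]]:
    """Yield archived readings hour by hour, e.g. to bulk-load a new database later."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in files if n.endswith(".jsonl"))
    # Zero-padded path components make lexicographic order chronological.
    for path in sorted(paths):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                record["ts"] = datetime.fromisoformat(record["ts"])
                yield record


if __name__ == "__main__":
    # Usage: archive two readings out of order, then replay them by hour.
    with tempfile.TemporaryDirectory() as root:
        append_reading(root, "dev1", datetime(2018, 1, 1, 11, 5), 21.0)
        append_reading(root, "dev1", datetime(2018, 1, 1, 10, 55), 20.0)
        for record in replay(root):
            print(record["device_id"], record["ts"], record["value"])
```

Order is only guaranteed per hour file (append order within it), which is usually enough for a bulk load into a time series store.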

~~~
mads
I recently built a system capable of handling this load, but it was based on
MongoDB, and that's not an option in this project because of hardware
constraints.

Cassandra may be an option. I will look into it.

~~~
ehllo
Maybe this is a better option for you.

[http://www.scylladb.com/](http://www.scylladb.com/)

------
IpV8
It may be worth looking into what AWS has to offer. Their IoT offerings seem
to be pretty solid.

