

Ask HN: What's the best way to store time series data in a relational database? - thewarrior

We currently have a write-heavy load, with about half a million records being added every day. This could grow in the future, so having one record for every minute with an index would slow down the writes.

Is it OK to have one record per hour, with sixty columns for each minute?

Or is it better to use a non-relational database?
======
bjourne
> Is it ok to have one record per hour with sixty columns for each minute ?

No. That's wasteful and makes many types of queries very hard to write.

A good way of storing the data is to store only change events. So for example,
each time a stock ticker changes you store a new row with the timestamp and
the new value for the stock. This achieves a very high level of "compression",
since most stock values don't change every minute. But it depends on what
kind of data you have.
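As a minimal sketch of the change-event idea (using Python's stdlib sqlite3 purely for illustration; the table and function names are made up):

```python
import sqlite3

# Hypothetical change-event storage: a row is written only when the
# value actually differs from the last stored one.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ticks (
        symbol TEXT NOT NULL,
        ts     INTEGER NOT NULL,   -- epoch seconds
        value  REAL NOT NULL,
        PRIMARY KEY (symbol, ts)
    )
""")

def record(symbol, ts, value):
    """Insert a row only if the value changed since the last event."""
    row = conn.execute(
        "SELECT value FROM ticks WHERE symbol = ? ORDER BY ts DESC LIMIT 1",
        (symbol,),
    ).fetchone()
    if row is None or row[0] != value:
        conn.execute("INSERT INTO ticks VALUES (?, ?, ?)", (symbol, ts, value))

# A mostly flat series of minute quotes collapses to two stored rows.
for minute, price in enumerate([10.0, 10.0, 10.0, 10.5, 10.5]):
    record("ACME", minute * 60, price)

stored = conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
print(stored)  # 2 rows instead of 5
```

Reconstructing the value at an arbitrary time is then a "latest row at or before t" query.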

~~~
mandeepj
> since most stock values don't change every minute.

Are you serious? Stock data changes every second, and sometimes faster than
that.

~~~
bjourne
No, most stock prices don't change that frequently.

------
euroclydon
Don't. Use an industrial historian.

[https://en.wikipedia.org/wiki/OSIsoft](https://en.wikipedia.org/wiki/OSIsoft)

[http://software.schneider-electric.com/products/wonderware/p...](http://software.schneider-electric.com/products/wonderware/production-information-management/historian/)

[https://en.wikipedia.org/wiki/Operational_historian](https://en.wikipedia.org/wiki/Operational_historian)

------
fsk
Your index isn't clustered, is it? For a regular index, insert is O(log
n). For a clustered index, insert is O(n).

You can also bulk insert records, rather than one at a time.

We handle 2k operations per second on our AWS RDS.
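A minimal sketch of bulk inserting, assuming Python's stdlib sqlite3 for illustration (the same idea applies to any driver's `executemany` or a multi-row INSERT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")

rows = [(i, float(i)) for i in range(10_000)]

# One statement, one transaction: the database commits once per batch
# instead of once per row.
with conn:
    conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count)  # 10000
```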

~~~
pjungwir
Definitely agree re bulk inserting, say every minute or hour or whatever.

I was playing around recently with Amazon's Kinesis for this, but if you don't
want the vendor lock-in you could easily roll your own system that bundles up
events and stores a bunch at once.
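Rolling your own bundling can be as small as an in-memory buffer that flushes in batches; a hypothetical sketch using stdlib sqlite3 (the `EventBuffer` name and flush threshold are made up, and a real version would also flush on a timer):

```python
import sqlite3

class EventBuffer:
    """Buffers events in memory and bulk-inserts every `max_size` events."""

    def __init__(self, conn, max_size=100):
        self.conn = conn
        self.max_size = max_size
        self.pending = []

    def add(self, ts, value):
        self.pending.append((ts, value))
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self):
        # One transaction per batch instead of one per event.
        if self.pending:
            with self.conn:
                self.conn.executemany(
                    "INSERT INTO events VALUES (?, ?)", self.pending)
            self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, value REAL)")
buf = EventBuffer(conn, max_size=100)
for i in range(250):
    buf.add(i, i * 0.5)
buf.flush()  # push the final partial batch

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 250
```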

~~~
fsk
If for every record you do:

open database connection

do one insert

close database connection

then that overhead adds up fast.

At work, I've been very disappointed in the performance of our Amazon RDS. We
had unexplained master/slave replication issues.

You can also put memcached in front of the database.

------
brudgers
If you're storing time series data as events, then it's essentially a log, and
a relational database would seem to be better suited as a secondary structure
for batch analysis. I found this article useful for understanding logs and the
possibilities for their use:

[https://engineering.linkedin.com/distributed-systems/log-wha...](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)

------
kasey_junk
You've mentioned a few things about your write side, but not the critically
important parts: what do you want to store, and what resolution does your time
series need? You hinted that it only needs minute resolution.

Further, you didn't mention your read requirements at all. Without those, any
recommendation you could possibly get would be useless.

~~~
thewarrior
We're storing floats at one-minute resolution. We could have thousands of
writes a second at peak load. Reads are not that heavy for now.

~~~
kasey_junk
The read volume is much less important than the _kinds_ of queries you want to
run on the data set. But from the sounds of it you don't have a SQL problem:
the data doesn't have/need varying query styles/indexes, it probably degrades
in importance quickly over time, and it isn't interrelated. Use one of the
umpteen time series databases that were written for this problem.

That said, any decently tuned RDBMS/SQL app can easily handle thousands of
writes a second, especially if you are just dumping them into giant buckets at
your target resolution.
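"Dumping into buckets" can be sketched with plain SQL; a toy example with stdlib sqlite3, collapsing epoch-second readings into minute buckets (table and column names are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (ts INTEGER, value REAL)")  # ts in epoch seconds

# Ten seconds of readings spanning two distinct minutes.
conn.executemany("INSERT INTO raw VALUES (?, ?)",
                 [(t, 1.0) for t in range(55, 65)])

# Collapse raw rows into minute buckets with integer division on the timestamp.
buckets = conn.execute("""
    SELECT ts / 60 AS minute, COUNT(*) AS n, AVG(value) AS avg_value
    FROM raw GROUP BY minute ORDER BY minute
""").fetchall()
print(buckets)  # [(0, 5, 1.0), (1, 5, 1.0)]
```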

------
sjg007
Use a NoSQL database like Cassandra. For your use case you can easily do time
series. Cassandra is optimized for write-heavy workloads.

~~~
higherkindly
Look at Khronus; it relies on Cassandra and Akka Cluster:
[https://github.com/despegar/khronus](https://github.com/despegar/khronus)

