

Ask HN: How do you manage time series data in your medium-data application? - bsmith

We're a small team currently using a rolled-our-own MySQL/Rails solution for time series data, for an application that records ~100,000 raw data points per day. Previously, we tried a solution based on HBase (OpenTSDB, to be exact), but a horizontally scalable solution is overkill at the moment and we can't justify the operations overhead of running a service that complex. The API, built-in rollups and caching were nice, however.

At the moment, we're in the middle of trying to expand our client-facing features and spruce up the backend (including using some form of caching on our time series queries), and have reached a bit of an impasse on how best to proceed. I'm wondering what kind of solutions HN readers have seen/employed in crossing the bridge between an unoptimized RDBMS-based solution and the seemingly overkill BigTable-esque options.
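
To make "rollups" concrete, here is a minimal sketch of the kind of fixed-window averaging meant above. It is not OpenTSDB's implementation; the function name `rollup`, the one-hour default bucket, and the `(epoch_seconds, value)` point shape are all assumptions for illustration.

```python
from collections import defaultdict


def rollup(points, bucket_seconds=3600):
    """Average (epoch_seconds, value) points into fixed-width time buckets.

    Hypothetical helper: returns a dict mapping each bucket's start
    timestamp to the mean of the values that fell inside that bucket.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [total, count]
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)
        acc = sums[bucket_start]
        acc[0] += value
        acc[1] += 1
    return {start: total / count
            for start, (total, count) in sorted(sums.items())}


if __name__ == "__main__":
    sample = [(0, 1.0), (15, 3.0), (3600, 10.0)]
    print(rollup(sample))
```

Results like these could be written to a summary table or kept in a cache keyed by bucket start, so that repeated queries over long ranges do not rescan the raw rows.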
======
valarauca1
100,000 points per day? If a point is just a 64-bit timestamp and a 64-bit
float value, that results in ~122 GB per day of data (or 44.5 TB per year).

I'd suggest looking into a horizontally scaling solution if that is actually
your everyday, day-to-day usage.

:.:.:

The traditional solution to this problem when you're working with raw DAQ
communication/values is to store your data logs on a cold storage cluster
(lots of slow, cheap HDDs).

Meanwhile, you store the 'meta-data' (when the data was collected, by whom,
on what device, on what day, plus a pointer/path to the original data) in a
more active SQL database.
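
A rough sketch of that split, using Python's built-in `sqlite3` as a stand-in for the "more active" SQL database. The table name `acquisitions`, its columns, and the helper functions are all hypothetical; the raw logs themselves would live wherever `raw_path` points.

```python
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open (or create) a small metadata catalog. The schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS acquisitions (
            id           INTEGER PRIMARY KEY,
            collected_at INTEGER NOT NULL,  -- UNIX epoch seconds
            operator     TEXT    NOT NULL,  -- who collected the data
            device       TEXT    NOT NULL,  -- which device produced it
            raw_path     TEXT    NOT NULL   -- pointer to the log on cold storage
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_acq_device_time "
        "ON acquisitions (device, collected_at)"
    )
    return conn


def register_log(conn, collected_at, operator, device, raw_path):
    """Record where a raw log lives; the log itself is not stored here."""
    cur = conn.execute(
        "INSERT INTO acquisitions (collected_at, operator, device, raw_path) "
        "VALUES (?, ?, ?, ?)",
        (collected_at, operator, device, raw_path),
    )
    conn.commit()
    return cur.lastrowid


def find_logs(conn, device, start, end):
    """Return raw-log paths for a device with start <= collected_at < end."""
    rows = conn.execute(
        "SELECT raw_path FROM acquisitions "
        "WHERE device = ? AND collected_at >= ? AND collected_at < ? "
        "ORDER BY collected_at",
        (device, start, end),
    ).fetchall()
    return [row[0] for row in rows]
```

Queries hit only the small catalog; the slow disks are touched only when a specific raw log is actually needed.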

:.:.:

I'm speaking from live data acquisition experience. May not 100% align with
your use case. Feel free to email if you need more info.

~~~
bsmith
We actually see more on the order of a few tens of MB per day (the data points
are fairly small, and we've done some optimization to de-duplicate
timestamps). Our entire database is currently less than 10 GB.

~~~
valarauca1
If you de-duplicate timestamps, how can you dynamically regenerate the data?

I.e., receive a sine wave and give back the same sine wave?

~~~
bsmith
We collect data from a number of sensors at once, at an interval of 15 seconds
(nothing super high speed or precision, here), so a row contains only one
timestamp (UNIX epoch), but several sensor readings. It's for an electricity
metering application.
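
For a rough sense of scale, here is a back-of-the-envelope sketch of that row layout. It assumes 8-byte timestamps and 8-byte readings (the sizes suggested upthread, not our actual column types) and counts payload bytes only, ignoring row overhead and indexes, so real on-disk sizes will be larger.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 15      # one row every 15 seconds
POINTS_PER_DAY = 100_000    # raw readings per day, from the original post
TS_BYTES = 8                # assumed 64-bit timestamp
VALUE_BYTES = 8             # assumed 64-bit reading

# Rows per day for a single stream sampled every 15 seconds.
rows_per_stream_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S


def narrow_bytes(points):
    """Payload if every reading carries its own timestamp."""
    return points * (TS_BYTES + VALUE_BYTES)


def wide_bytes(points, rows):
    """Payload if readings taken together share one timestamp per row."""
    return rows * TS_BYTES + points * VALUE_BYTES


if __name__ == "__main__":
    print(rows_per_stream_per_day)                               # 5760
    print(narrow_bytes(POINTS_PER_DAY))                          # 1600000
    print(wide_bytes(POINTS_PER_DAY, rows_per_stream_per_day))   # 846080
```

The last figure treats all ~100,000 points as if they came from one set of sensors sampled together, which is a simplification; with more independent streams the row count, and so the timestamp overhead, goes up.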

------
bennyp101
I'm using InfluxDB at the moment; a quick back-of-a-beer-mat estimate is
~288,000 points a day. It works great, and I like the SQL-like querying. Not
sure on size here, as I'm in the pub.

More a testing thing at the mo, but it's been running fine for the last few
months.

