
Time-series data: Why (and how) to use a relational database instead of NoSQL - cevian
https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c
======
LogicX
Great overview of the landscape.

Any thoughts on Influx's upcoming release and its reduced memory footprint?
[https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/](https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/)

------
cmccollough
What might the transition out of a Cassandra-based tech stack look like if I
wanted to go relational?

~~~
RobAtticus
It's tough to know exactly, because I think schema transformation would be the
most time-consuming part and it's hard to say exactly how that goes.

But once that's settled, for migrating your data, probably the most
straightforward approach would involve exporting to CSV and then using that to
re-import into TimescaleDB. Then TimescaleDB would just fit into place where
your Cassandra instance used to be.
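To make the CSV route concrete, here's a rough sketch. The table and column names are made up for illustration; the only TimescaleDB-specific piece is `create_hypertable`, since everything else is standard PostgreSQL `COPY`.

```python
import csv
import io

# Hypothetical rows exported from Cassandra (e.g. via cqlsh's COPY TO,
# or a driver-level scan). Column names are illustrative only.
rows = [
    ("2021-01-01T00:00:00Z", "sensor-1", 21.5),
    ("2021-01-01T00:01:00Z", "sensor-2", 19.8),
]

# Step 1: dump to CSV.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["time", "device", "temperature"])
writer.writerows(rows)
csv_data = buf.getvalue()

# Step 2: re-import into TimescaleDB. Since TimescaleDB runs inside
# PostgreSQL, the standard COPY machinery applies, roughly:
#   CREATE TABLE conditions (time timestamptz, device text, temperature float);
#   SELECT create_hypertable('conditions', 'time');
#   \copy conditions FROM 'export.csv' WITH (FORMAT csv, HEADER true)
```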

~~~
cmccollough
Gotcha. This may be a bit of a tangent to my above question but it came to
mind from the topic of transitioning:

Do you all discuss event sourcing and how it fits within an overall
TimescaleDB strategy? Anything more than a peripheral concept?

~~~
RobAtticus
I'm not entirely sure what you mean by event sourcing. Can you elaborate?

~~~
cmccollough
It's a design pattern that's mostly relevant in the context of traditional DBs
handling non-time series data - if we cut down to it, it's really about
keeping a transactional DB of all events taken on the data. Microsoft seems to
have some decent high-level documentation on the concept:

[https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing](https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing)

When talking time series, there certainly seems to be less of a use case for
it in the sense that data is mostly immutable with few updates to old records,
just new entries as data streams in. That being said, I wonder if there are
use cases where the event sourcing concept may bring value to time series DBs
- maybe I ingested some bad data that I need to go back and clean up, maybe
the structure of my data changed requiring a change in my DB schema, etc. Was
just curious if this is something you all have put any thought into at a high-
level.
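For anyone unfamiliar with the pattern being described, here's a minimal sketch (names and structure are mine, not from the Azure doc): state is never updated in place; corrections are appended as new events and current state is derived by replaying the log.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    entity: str
    field: str
    value: float

log = []

def record(event):
    # Events are only ever appended, never updated or deleted.
    log.append(event)

def current_state():
    # Rebuild state by replaying the full event log in order.
    state = {}
    for e in log:
        state.setdefault(e.entity, {})[e.field] = e.value
    return state

record(Event("sensor-1", "temperature", 21.5))
# A correction to bad data: a new event, not an UPDATE of the old row.
record(Event("sensor-1", "temperature", 22.0))

state = current_state()
```

This maps onto the "clean up bad ingested data" case above: the fix is itself an event, so the audit trail of what changed and when is preserved.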

~~~
RobAtticus
Not sure it's come up too often, but it's something we can think about going
forward.

------
RobAtticus
Earlier discussion about our launch (covering some of the points here):
[https://news.ycombinator.com/item?id=14035416](https://news.ycombinator.com/item?id=14035416)

I'm around if there are any questions.

~~~
dgacmu
This may be a question as much to other prometheus users as you, sorry. :) I'm
using Prometheus to record both "metric" data (temperature of CPUs, etc.) as
well as "event" data (program produced result x). It's seemed awesome for
metrics and poor for events. (a) Is there a way to coerce Prometheus into
doing what I want? (b) Is this something fundamentally easier or harder in
TimescaleDB, and why? aka - should I consider switching even though you're
beta-ish, and is it a compelling reason, or just something I'm doing wrong?

(Yeah, I know, I'm asking the Internet to do my job for me... thank you,
Internet!)

~~~
cevian
Hey,

Not sure if this is part of the question, but if you are querying the event
data by something other than time (like a property of the event itself --
maybe the program return code in your example), then Prometheus requires a
full table scan since it does not have secondary indexes. With TimescaleDB you
could add indexes on properties of the event data itself.
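To illustrate the secondary-index point (using SQLite here just so the example is self-contained; in TimescaleDB it would be an ordinary Postgres `CREATE INDEX` on the hypertable, and the column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, program TEXT, return_code INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("2021-01-01T00:00:00Z", "job-a", 0),
     ("2021-01-01T00:01:00Z", "job-b", 1),
     ("2021-01-01T00:02:00Z", "job-a", 1)],
)

# Secondary index on a property of the event, not on time.
# TimescaleDB equivalent would be roughly:
#   CREATE INDEX ON events (return_code, time DESC);
conn.execute("CREATE INDEX idx_events_rc ON events (return_code)")

# Filtering by return_code can now use an index lookup rather than
# scanning every row, which is what the full-table-scan comment is about.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE return_code = 1"
).fetchone()
failures = conn.execute(
    "SELECT COUNT(*) FROM events WHERE return_code = 1"
).fetchone()[0]
```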

~~~
dgacmu
Ahh, I think this answers my question. Painful - thank you!

------
strivedi
We have to deal with a lot of historical TS data as well as recent. Is
Timescale a solution we should look into if we have to backfill a lot of data
or is a NoSQL option more ideal?

~~~
cevian
TimescaleDB is actually a lot better for backfilled/out-of-order data than a
lot of NoSQL options. This is because in TimescaleDB data is fundamentally
organized by data time instead of insert time (as it is for LSM trees, for
instance).
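A toy model of what "organized by data time" means (the chunking scheme here is a simplification I made up, not TimescaleDB's actual implementation): each row is routed to a time partition by its own timestamp, so a backfilled row lands next to its neighbors in time rather than at the tail of an insert-ordered log.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CHUNK = timedelta(days=1)       # one partition ("chunk") per day
EPOCH = datetime(2021, 1, 1)

chunks = defaultdict(list)

def insert(ts, value):
    # The chunk is chosen by the row's data timestamp, not arrival order.
    chunk_id = (ts - EPOCH) // CHUNK
    chunks[chunk_id].append((ts, value))

# Arrival order is jumbled; placement is not.
insert(datetime(2021, 1, 3, 12), 10.0)   # recent row
insert(datetime(2021, 1, 1, 6), 7.5)     # backfill, two days old
insert(datetime(2021, 1, 3, 18), 11.2)   # recent again
```

In an insert-time-ordered store, that middle backfill row would end up physically far from the rest of Jan 1's data; here it simply joins chunk 0.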

------
jterrace
What kind of compression ratio do you get?

~~~
cevian
(TimescaleDB developer here)

Yeah, so on-disk compression is one area where we aren't as competitive with
NoSQL column stores.

However, two things to note:

1) Often many of those column-oriented DBs, based on LSM trees, actually need
to consume a lot more memory to index all of their disjoint SSTables. So it's
a tradeoff of memory vs. disk.

2) There are various things we have on our TODO to test, like just running
Postgres on ZFS. We'll write up the results when we do.

