On servers with a very high CPU load, where backups were done directly on the primary rather than by continuously streaming to a second server and backing up from there, something during the backup (I have stopped using Timescale, so I can't tell you exactly what) caused a spike in load and IO. That hurt read and write performance on the primary, triggered a cascading failure of processes due to timeouts, and eventually took the server down through swap issues and an OOM-triggered reboot.
So we stopped doing backups. Actually, that's how we started using ClickHouse: for cold storage, as the files in /var/lib/clickhouse took far less space and caused fewer issues. Eventually the same data was sent to both TimescaleDB and ClickHouse, as a poor man's backup. Finally, TimescaleDB was removed.
> As with any DB you do have to size and configure your database correctly (which these days isn't hard).
Thanks for supposing we didn't try. We did not end up with 256 GB of RAM per server for no reason.
All I'm saying is that Timescale totally has a place, but not beyond a certain scale and complexity.
> We've never seen anybody claim that ClickHouse offers significantly better compression than we do overall
Altinity does, and so do a few others. manigandham above says that you are now at 70% of what ClickHouse does. I'm not saying you're not improving. It was just one of too many issues we had to fight.
Also, you only recently introduced compression - good, but I'm not aware that you offer anything like DateTime CODEC(DoubleDelta, LZ4), or a choice of compression algorithms. LZ4 can be slow, so having a choice between various alternatives matters.
For example, T64 calculates the max and min values of the encoded range, then strips the unused higher bits by transposing a 64-bit matrix. Sometimes that makes sense. ZSTD is slower than T64 but compresses better, so there is less data to scan, which makes up for it. Sometimes that makes more sense.
Large databases need more flexibility.
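To illustrate the kind of flexibility I mean, here is a minimal sketch of per-column codec selection in ClickHouse (the table and column names are hypothetical):

    -- Hypothetical table: each column gets the codec that fits its data.
    CREATE TABLE sensor_readings
    (
        ts    DateTime CODEC(DoubleDelta, LZ4),  -- delta-of-delta timestamps, fast LZ4 on top
        id    UInt32   CODEC(T64, LZ4),          -- T64 strips unused high bits before LZ4
        value Float64  CODEC(Gorilla, ZSTD(3))   -- XOR-based float encoding, ZSTD for ratio
    )
    ENGINE = MergeTree
    ORDER BY (id, ts);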
> If you are processing all of your data for all your queries then yes, click house sequential scans may be better
I confirm, it is better.
And for some workloads, continuous aggregates make no sense.
> We've seen customers successfully use our single-node version with 100s of billions of rows so claiming that we are just for small use-cases is simply untrue, and especially with the launch of multi-node TimescaleDB
I have about 50 TB of data per server. Anything below 1 TB is what I call a "small use case".
> I understand people may have different preferences and experiences, but some of these felt a bit off to me.
When I was trying to use TimescaleDB and reported weird issues, I got the same response: my use case and bug reports felt "off" to the person I reported them to.
Maybe that is why they weren't addressed - or only much later, once more clients reported them?
Personally, I have no horse in this race. If you become better than ClickHouse for my workload, and if the license changes to allow me to deploy to a cluster of AWS servers (just in case we ditch our own hardware), I will consider Timescale again in the future.
For now, I'm watching it evolve and slowly address the outstanding issues, like disk usage and performance. By your own admission and benchmarks, you are now at 70% of what ClickHouse does - in my experience, the actual difference is much larger.
But I sincerely hope you succeed and catch up, as more software diversity is always better.
- I believe the Altinity benchmarks [0] are from 2018, on TimescaleDB 0.12.1. TimescaleDB has gotten much better since then (it is now on version 1.7.2) and, most notably, now offers native compression (it did not then).
- I believe manigandham's 70% comment is more of an offhand estimate than a concrete benchmark. But perhaps he can weigh in. :-)
- Re: compression algorithms, TimescaleDB now employs several best-in-class algorithms, including delta-of-delta, Gorilla, and Simple-8b RLE - much more than just LZ4. [1] (A short sketch of how this looks follows the list.)
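As a minimal sketch of enabling native compression (the table and column names are hypothetical, and the policy function shown is the 1.x-era API, renamed in later releases):

    -- Hypothetical hypertable: enable native compression, segmenting
    -- by device and ordering by time within each compressed chunk.
    ALTER TABLE sensor_readings SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'ts DESC'
    );
    -- Compress chunks older than 7 days
    -- (1.x API; later versions rename this to add_compression_policy).
    SELECT add_compress_chunks_policy('sensor_readings', INTERVAL '7 days');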
Overall, I don't think anyone has done a real storage comparison between TimescaleDB and ClickHouse since we launched native compression. It's on our to-do list, but we also welcome external benchmarks. Based on what we've found versus similar systems, I suspect our storage usage would be quite similar.