For example, they mention that their automated data retention is achieved with a single SQL command, that DELETEing records is a very costly operation, and that "even if you were using Postgres declarative partitioning you’d still need to automate the process yourself, wasting precious developer time, adding additional requirements, and implementing bespoke code that needs to be supported moving forward".
There's zero mention anywhere of pg_partman, which does all of these things for you just as simply and is a fully open-source, free alternative.
I get that it's a PG extension that competes with their product, and I know that TimescaleDB does a few other things that pg_partman does not. But I can't help but find its (seemingly) purposeful omission from these otherwise very thorough blog posts misleading.
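To make the comparison concrete, here's a minimal sketch of what retention looks like in each (the table name and retention window are hypothetical; `add_retention_policy` is the TimescaleDB 2.x name, and the pg_partman configuration interface differs somewhat between major versions):

```sql
-- TimescaleDB: retention is a single policy call on a hypertable
SELECT add_retention_policy('conditions', INTERVAL '3 months');

-- pg_partman: set retention in the config table, then let the
-- background worker (or a scheduled job) drop expired partitions
UPDATE partman.part_config
   SET retention = '3 months',
       retention_keep_table = false
 WHERE parent_table = 'public.conditions';
SELECT partman.run_maintenance();
```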
Shameless self-plug: we're also building database technology, but for embedding vectors (https://milvus.io) rather than time-series data - when doing query performance comparisons, we get several orders of magnitude of performance improvement over traditional databases. It's an unfair comparison, so we generally avoid it.
We have published many, many benchmarks versus other database systems. IIRC all of them also made the front page of Hacker News.
Here are some of them, for your reading pleasure :-)
TimescaleDB vs. InfluxDB:
TimescaleDB vs. ClickHouse:
TimescaleDB vs. Timestream:
Maybe I need to do my own comparison at some point.
That's a fair question.
We find that most developers storing time-series data on Postgres are doing so without pg_partman. So we first wanted to provide a benchmark that would be useful to most developers.
This benchmark was also the result of months of dedicated work. So the team did spend a lot of time on this. Unfortunately, they ran out of time to cover pg_partman. But that comparison is on our list of things to do soon.
So the relevant question isn't what a typical PostgreSQL user is doing, but whether someone wanting to optimize their storage should look at a PostgreSQL extension, or an upstart database like TimescaleDB.
TimescaleDB is a PostgreSQL extension, just to be clear.
Though I didn't write this post, I'd imagine at least part of it is that the post is already nearly 4,000 words and a 15-minute read, and we just didn't want to add another set of things to it, to be perfectly honest.
`pg_partman` is cool! I haven't used it in a while, but because it uses declarative partitioning, it has some locking issues that we address with our own partitioning scheme. That said, implying that it is OSS and we're not in terms of things like data retention features is a bit misleading as well: the `drop_chunks` command used for data retention is in the Apache 2 licensed portion of Timescale.
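For reference, a minimal sketch of that call (the hypertable name is hypothetical, and this is the TimescaleDB 2.x calling convention):

```sql
-- Drop all chunks whose data is entirely older than 3 months
SELECT drop_chunks('conditions', INTERVAL '3 months');
```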
Just to clarify: nothing in Timescale is closed-source. It is all source-available, all on GitHub. Some of it is Apache 2 licensed, some of it is Timescale Licensed. And it is all free.
While some people on HN feel that this is an impurity they can’t live with, I personally think it’s a small price to pay to enable development of TS to continue. In my opinion, claiming that it’s closed source is somewhat dogmatic. Many open source licenses have some kind of restrictions on use; the GPL comes to mind.
This license turns out to be very difficult to use for almost any developer.
Do you want amazing things? Everything can't be "free as in beer". WTF does that even mean? I don't get free beer from anywhere.
That's what "free as in beer" means -- it's a well-established phrase meaning "zero monetary cost": https://en.wikipedia.org/wiki/Gratis_versus_libre
In the case of the non-Apache-licensed version of TimescaleDB, you're allowed to use the software without payment, and you can distribute unmodified copies. But you're essentially forbidden from letting users define their own schemas, or from modifying it or reusing components unless your modified version imposes that same restriction. (The exception is if you agree to transfer ownership of your changes back to Timescale.)
Nobody's saying that Timescale can't build a non-open-source database, only that they should be clear about which parts are actually open. In my opinion, describing it on the homepage as an "open-source relational database" and then promoting it by benchmarking the proprietary version is at least a little bit misleading.
Easy fix - we can do a benchmark comparing TimescaleDB to pg_partman.
Longer reply: pg_partman does address many of the same developer-experience items we do, but it doesn't offer things like compression or modified query plans. It will perform roughly the same as declarative partitioning (because that's what it uses), and I'm guessing we would see results similar to the last large table (in the Declarative Partitioning section).
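To sketch the difference in setup (all names illustrative, and the two examples are alternatives, not meant to be run together): with declarative partitioning you create and manage partitions yourself, which is the step pg_partman automates, while a hypertable creates chunks internally.

```sql
-- Plain declarative partitioning: each partition is created by hand
-- (pg_partman automates exactly this step)
CREATE TABLE conditions (
    time   timestamptz NOT NULL,
    device int,
    temp   float8
) PARTITION BY RANGE (time);

CREATE TABLE conditions_2021_06 PARTITION OF conditions
    FOR VALUES FROM ('2021-06-01') TO ('2021-07-01');

-- Alternatively, with TimescaleDB (on a plain, unpartitioned table):
-- chunks are created automatically as data arrives
SELECT create_hypertable('conditions', 'time');
```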
This could be done by just calculating the start date in code too.
> When hypertables are compressed the amount of data that queries need to read is reduced, leading to dramatic increases in performance of 1000x or more.
At my workplace we recently experimented with storing time-series data in an array in a Postgres row. The array gets compressed via TOAST and can store thousands of ints in just a few DB pages (i.e., loading it costs about the same as an index scan). We also use Timescale for a different service, mind you. I'm sure this format is more efficient than the Timescale format too. In Timescale you would need rows containing, for example, (date, user_id, time_on_site), one row per day. The Postgres array format (start_date, user_id, time_on_site_by_date), where index 0 = start_date, index 1 = start_date + 1, and so on, is about 1/3rd the size uncompressed. And yeah, even if something is compressed, you still have to put the uncompressed version in memory somewhere.
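A minimal sketch of that layout (names and the monthly bucketing are illustrative; note that Postgres arrays are 1-indexed by default, so element i + 1 holds the value for start_date + i days):

```sql
-- One row per user per month; one array element per day
CREATE TABLE time_on_site_rollup (
    user_id              bigint NOT NULL,
    start_date           date   NOT NULL,   -- first day of the bucket
    time_on_site_by_date int[]  NOT NULL,   -- seconds per day
    PRIMARY KEY (user_id, start_date)
);

-- Fetch a single day's value by computing its offset into the array
SELECT time_on_site_by_date[('2021-06-15'::date - start_date) + 1]
  FROM time_on_site_rollup
 WHERE user_id    = 42
   AND start_date = '2021-06-01';
```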
We are kindred spirits, I think! I did this too a while back at a previous company, and it actually served as part of the inspiration for our compression work! It's fun, but a bit difficult to query at times. Our compressed columns also get TOASTed and stored out of line.
I'm not sure it's going to be much more efficient than the Timescale format once it's compressed; we have some pretty good compression algos, but I might be missing something about your case. We generally achieve close to 10x compression. That said, right now you can't write directly compressed data, so you would save on the write side, I suppose.
It is true that you need to put the uncompressed version into memory at some point, but we try to limit that, and in many cases you end up IO-limited more than memory-limited. We're also thinking about doing some work to push processing down towards the compressed data; that's still in the "glint in our eye" stage, but I think it has a lot of promise.
(As a side note, TOAST is still the best acronym around ;) ).
Looks like these are quite different in features.
Yes, 100%. We deliberately chose the "+" symbol instead of "vs." for this blog title. We love PostgreSQL. :-)
Or hybrid databases like StarRocks or TiDB?
Timescale apparently lags pretty far behind modern columnstore engines.
As with anything, it depends on what you want to do.
If you have an OLAP-heavy workload with long scans, etc. (which is the type of query prominent on the ClickHouse page - e.g., Q0 is "SELECT COUNT(*) FROM hits;"), then I would highly recommend systems other than Timescale. (Although we are also working on this ;-) )
But if you have a time-series workload, or even if you love Postgres and are building a time-series and/or analytical application, then I would recommend TimescaleDB.
ClickHouse is great. I just believe in using the right tool for the right job. :-) There are many areas where column store engines beat TimescaleDB. But nothing comes for free - everything has a tradeoff.
Do you have some experience with it?
What's the best approach to get excellent compression? Can I exploit this redundancy within the column somehow? I don't even know the right search terms for this.
TimescaleDB compression will work only for integer columns, right? And in any case, this is not a time series.
- Gorilla compression for floats
- Delta-of-delta + Simple-8b with run-length encoding compression for timestamps and other integer-like types
- Whole-row dictionary compression for columns with a few repeating values (+ LZ compression on top)
- LZ-based array compression for all other types
So as to your question, just turn on compression; it's very common to see 94-97% reduction in storage.
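For example, a minimal sketch of turning it on (assuming TimescaleDB 2.x; the table and column names here are hypothetical):

```sql
-- Enable native compression on a hypertable; segmentby groups rows
-- that compress well together, orderby sorts values within a batch
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Automatically compress chunks older than 7 days
SELECT add_compression_policy('metrics', INTERVAL '7 days');
```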
I can't understand whether HA relies on the standard PostgreSQL tooling or whether you have to pay for some kind of enterprise license to get it.
The code is source-available; the license philosophy is explained in this blog post: https://www.timescale.com/blog/building-open-source-business...
That includes the HA implementation, and I think someone else shared the docs for that. Hope this helps.
Update/delete of current chunk data is blocked during compression of old data.