Timescale, an open-source time-series SQL database for PostgreSQL (timescale.com)
358 points by pbowyer 162 days ago | 100 comments



Could you contrast this with the approaches mentioned in the series of blog posts starting here: https://grisha.org/blog/2015/09/23/storing-time-series-in-po...

That blog post grew to be tgres http://github.com/tgres/tgres https://grisha.org/blog/2017/03/22/tgres-0-dot-10-dot-0b-tim...


To my understanding, Tgres is really more of a "middleware" layer that collects metrics and performs aggregations on them that are stored back into Postgres (e.g., generates aggregate rates for evenly spaced time intervals a la RRDTool), rather than being a scalable time-series DB itself.

That's useful in many dashboard-based server monitoring applications, but time-series DBs have many other applications (and can benefit from more complex queries even in monitoring).

Tgres and Timescale are actually a bit complementary, and you might even be able to use Timescale as a better backend for Tgres.


As the author of Tgres, I can chime in here - Dr. mfreed is correct.

While Tgres is an application layer, the main motivation behind developing it was to answer the question "can TS be stored in Postgres efficiently, ideally without requiring an extension". Not that there is anything wrong with custom extensions, but I wanted to keep the requirements to the absolute minimum.

I always had issues with people saying that relational databases are fundamentally not suitable for TS storage, and Tgres debunks this by demonstrating that you can sustain very high rates of incoming data by simply organizing the data in a more creative way.

The graphite functionality that Tgres emulates is just there to prove the point - as in, look it does all these things, and it's all in the database.

Hypothetically Tgres could work on top of Timescaledb with a few changes, I just only have so much spare time to tinker with this experimental stuff that I haven't tried it.

Another interesting thing I came across is PgPointCloud [1]. It's designed for LIDAR data but is perfectly suitable for time series as well. It is a C extension. Its performance advantage comes from storing large numbers of data points in a variety of compact/compressed formats.

[1] https://github.com/pgpointcloud/pointcloud


It is still not very clear at this point whether the Tgres approach is sufficient, and whether Timescale adds a significant performance advantage over a native array-based Postgres implementation.


I think the broader point is that Tgres is generally focused on computing regular aggregations. Once you do that aggregation, you lose significant information about the relationships between data collected at the same time, which eliminates your ability to ask a variety of questions. For some basic dashboarding/monitoring applications you don't need this; for other applications you absolutely do.

So, it's pretty common in Timescale to store raw data in one hypertable (with a shorter data retention policy), and aggregations in a separate table (with a longer data retention).
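
To make that concrete, a rough sketch (hypothetical table/column names; the drop_chunks call is illustrative and its exact signature may vary by version):

    -- Raw data: short retention
    CREATE TABLE measurements_raw (
        time      TIMESTAMPTZ NOT NULL,
        device_id TEXT NOT NULL,
        value     DOUBLE PRECISION
    );
    SELECT create_hypertable('measurements_raw', 'time');

    -- Per-minute aggregates: kept much longer
    CREATE TABLE measurements_1min (
        minute    TIMESTAMPTZ NOT NULL,
        device_id TEXT NOT NULL,
        avg_value DOUBLE PRECISION
    );
    SELECT create_hypertable('measurements_1min', 'minute');

    -- Retention then becomes periodic drop_chunks calls with different horizons
    SELECT drop_chunks(interval '1 month', 'measurements_raw');
    SELECT drop_chunks(interval '1 year',  'measurements_1min');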

Regarding native storage, the insert rate really goes down as your table gets large, e.g., Timescale gets 20x higher throughput than PG: https://blog.timescale.com/timescaledb-vs-6a696248104e

I don't see a reason this wouldn't apply to Tgres' use of native storage as well...but once you do aggregations (say, per minute), your tables are just much smaller (only 525K minutes / year), so it perhaps matters less.


> Regarding native storage, the insert rate really goes down as your table gets large, e.g., Timescale gets 20x higher throughput than PG: https://blog.timescale.com/timescaledb-vs-6a696248104e

I actually looked at this benchmark briefly, but couldn't find what kind of PostgreSQL schema you used there. Did you use an array-based schema similar to the one described by the Tgres author in his post? https://grisha.org/blog/2015/09/23/storing-time-series-in-po...


We used a pretty basic approach for both: 12 columns - timestamp, hostname, 10 CPU metrics.
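
Something roughly of this shape (column names here are illustrative, not the exact benchmark schema):

    CREATE TABLE cpu (
        time             TIMESTAMPTZ NOT NULL,
        hostname         TEXT NOT NULL,
        usage_user       DOUBLE PRECISION,
        usage_system     DOUBLE PRECISION,
        usage_idle       DOUBLE PRECISION,
        usage_nice       DOUBLE PRECISION,
        usage_iowait     DOUBLE PRECISION,
        usage_irq        DOUBLE PRECISION,
        usage_softirq    DOUBLE PRECISION,
        usage_steal      DOUBLE PRECISION,
        usage_guest      DOUBLE PRECISION,
        usage_guest_nice DOUBLE PRECISION
    );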


The not-so-obvious difference in the Tgres approach (my blog posts might not be doing a great job of explaining it) is that sometime in Feb 2017 I significantly revamped the storage approach into what I dubbed "vertical" storage, whereby a time slot stores an array of points in which every array element belongs to a different series.

So it went from:

    series1, array[val1, val2, val3 ...] --> time direction
    series2, array[val1, val2, val3 ...]
    ...
to

    slot1, array[series1_val1, series2_val1, ...] |
    slot2, array[series1_val2, series2_val2, ...] |
                                                  ^ time dir
With this structure, a single row insert writes a data point for each of n series (where n is the array length).

Thus, if I have 10,000 series, and my arrays are 1000-long, I can insert a data point for each of the 10K series in only 10 row inserts. This only works if the data points for all series for the same slot arrive at approximately the same time, which in a monitoring-like scenario they usually do, but in other situations might not be the case.
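
A sketch of that layout in plain SQL (names made up; the real Tgres schema differs in the details):

    -- One row per (bundle of series, time slot); vals[i] belongs to series i of the bundle
    CREATE TABLE ts_vertical (
        bundle_id INT NOT NULL,
        slot      TIMESTAMPTZ NOT NULL,
        vals      DOUBLE PRECISION[]
    );

    -- A single row insert records a data point for every series in the bundle
    INSERT INTO ts_vertical (bundle_id, slot, vals)
    VALUES (1, '2017-07-20 12:00:00+00', ARRAY[0.1, 0.2, 0.3]);  -- ... up to ~1000 values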

The flip side of this approach is that querying the data then becomes less efficient because to read one data point of a series you end up reading an array-length of data points you might not care about for this particular query.

Also, Tgres takes the round-robin approach versus the timed-partition approach, and that's completely apples and oranges when it comes to performance. The round-robin approach also works only if the data points are evenly spaced (or transformed to be evenly spaced on the fly, which is what the Tgres Go code does), and again, it's hard to judge whether that's fundamentally "good" or "bad".

I can see how readers of this thread may be looking for which technique is faster, but it's just not that simple, and very much depends on what the actual requirements are. Round-robin versus timed partitioning is also not mutually exclusive; you can combine the two, which may or may not be faster, not sure, the devil is in the details.


> to read one data point of a series you end up reading an array-length of data points you might not care about for this particular query.

But DBs and FSs operate on pages of data and not individual records, so you will be reading that row anyway, and likely much more.


> But DBs and FSs operate on pages of data and not individual records, so you will be reading that row anyway, and likely much more.

Yes, when dealing with database performance understanding this goes with the territory.

The "game" here is to organize data in such way that the stuff you read inadvertently is something that you will need eventually (as in in a few microseconds). This is where things like CLUSTER and BRIN indexes become important, and this is also why partitioning is a win.


The guy wrote something about TimescaleDB as well: https://grisha.org/blog/2017/07/13/timescaledb/


There are some points of confusion in that post -- perhaps understandable, as trying to deduce things from code review alone is tricky -- but Grisha and I have been communicating and hopefully it'll be updated.

Overall like the conclusion though :)


A project I work on has time series stats in postgres--it's essentially an interval, a period, a number of fields that make up the key, and the value. There's a compound index that includes most of the fields except for the value. It works surprisingly well, for tens of thousands of upserts per second on a single postgres instance. Easy app integration and joins are a huge plus. I'm really curious to check this out and see how it performs in comparison.
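
Roughly this shape, in case it helps (all names invented):

    CREATE TABLE stats (
        period_start TIMESTAMPTZ NOT NULL,  -- start of the interval
        period       INTERVAL NOT NULL,     -- e.g. '1 minute', '1 hour'
        account_id   INT NOT NULL,          -- key field
        metric       TEXT NOT NULL,         -- key field
        value        BIGINT NOT NULL
    );

    -- Compound index over everything but the value; also backs the upserts
    CREATE UNIQUE INDEX stats_key_idx
        ON stats (period_start, period, account_id, metric);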


In a funny bit of coincidence -- we didn't post this link to HN :) -- we just published a blog post today comparing Timescale vs. native Postgres:

https://blog.timescale.com/timescaledb-vs-6a696248104e

tl;dr: 20x higher inserts at scale, faster queries, 2000x faster deletes, more time-oriented analytical features


And actually, if you want to run the benchmarks yourself:

https://github.com/timescale/benchmark-postgres


OK, and if one is partitioning by date and dropping partitions instead of running deletes in vanilla Postgres, how does it compare?


The delete performance will probably be similar, but standard partitions in postgres have a bunch of current limitations.

For example, the insert pipeline is still quite a bit slower, partition creation is still manual, can't do as good constraint exclusion at query time, can't do certain query optimizations we've built in, can't support user-defined triggers, can't handle UPSERTs, doesn't support various constraints, can't do VACUUMing across the hierarchy, etc.

We plan to write a blog post comparing against PG10 partitioning in the future to expand on this a bit.

All this said, we do love Postgres and realize that it's trying to provide a more general-purpose solution, so don't mean this as criticism. We can just build something more targeted at the time-series problem.


Thank you for the explanation looks like you have dedicated a good bit of effort to making a good ts solution, will be testing it out :)


Would this be good for event-sourcing storage? Fast event inserts and fast reads?


Yep, it can be used for either "irregular" events or "regular" time-series like monitoring data.

For event sourcing, just make sure you index on the proper user/session/thing (Docs or Slack for more info).
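
E.g., something along these lines (names hypothetical):

    -- Keep "all events for session X in the last hour" an index-range scan
    CREATE INDEX events_session_time_idx ON events (session_id, time DESC);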


Curious about your comment about "upserts" WRT time series data. When and why would you update a record in such a table?


Not sure about parent's use case, but we've seen scenarios where users want to synchronize data from downstream (say, they are collecting data on a hub in an IoT setting, and even using Timescale both on the hub and in the cloud).

But because they don't want to keep track of exactly which batches they've uploaded already (in a fault-tolerant way), they want to execute the insert to the cloud DB as an UPSERT.

So most of the time it'll actually just be inserting, but in the rarer case that the data has already been merged, the 'ON CONFLICT' side of things (in Postgres speak) can take over: DO NOTHING, DO UPDATE, etc.
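
A sketch of that pattern (table/columns made up; assumes a unique constraint on the conflict target):

    INSERT INTO readings (time, device_id, value)
    VALUES ('2017-07-20 12:00:00+00', 'hub-42', 21.5)
    ON CONFLICT (time, device_id) DO NOTHING;
    -- or: ON CONFLICT (time, device_id) DO UPDATE SET value = EXCLUDED.value;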

As an aside, it turns out the constraints you'd need for upserts aren't supported by Postgres table inheritance (the typical way you do sharding), nor in PG 10 partitioning. But, we did add special support for this in our latest release :)


Why do you usually advertise the write performance? Let's say that I have "100+ billion rows" (the number on your landing page); how long does it take to run a simple GROUP BY query?

The benchmark repo doesn't actually include the performance comparison between Timescale and Postgres: https://github.com/timescale/benchmark-postgres#benchmark-qu...

This blog post (https://blog.timescale.com/timescaledb-vs-6a696248104e) has some query benchmarks, and the main benefit is that the hypertable will partition the data smoothly; if we query the table by filtering on the timestamp column, it will be fast, since Timescale uses partitioning as an indexing method.


Write performance is a much simpler metric than query performance, which is HIGHLY dependent on the actual query being performed. Plus, in many time-series settings, you actually need to support high-write rates, which vanilla RDBMS tables can't support.

On the query side, we find that most queries to a time-series DB actually include a time predicate, LIMIT clause, etc. It's pretty rare that you do a full table scan over the 100B rows. (And for these types of broad scans, performance depends on # disks and use of query parallelization.)
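
E.g., queries typically look more like this (names hypothetical) than like a full scan:

    SELECT time, device_id, value
    FROM readings
    WHERE time > NOW() - INTERVAL '6 hours'
    ORDER BY time DESC
    LIMIT 100;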

Not sure I understand the comment that the benchmark repo doesn't include the performance comparison? That repo is meant to accompany a blog post, which discusses the results (https://blog.timescale.com/timescaledb-vs-6a696248104e), while the repo allows you to replicate our results.


I mentioned the benchmark repo because I wanted to learn why you usually advertise the write performance instead of query performance. The benchmark repo shares the results for write performance but not query performance. Later on, I saw the query benchmarks in your blog post, which was great.

I agree that a full-table scan is not common in time-series use cases, and you can't improve the performance in that case unless you use a different storage format. The confusing part for me is that if I have 100B rows, I would probably use a distributed (multi-node) solution, unless the dataset includes 50 years of data and I want to query the last week, because PostgreSQL is not good enough when aggregating huge amounts of data.

Do you have any plan to release distributed version (the chunks may be distributed among the nodes in cluster) or implement columnar storage format?


Yes, we're working on a distributed version of Timescale as you describe.

But two clarifications:

1. It can aggregate better than you might think. We've had people run single-node Timescale with 20+ disks, then couple that with query parallelization, and you can do pretty good aggregation over larger datasets.

Plus, because of the way the data is partitioned, a GROUP BY will actually get good localization over the disjoint data (i.e., groups can be local to a chunk) and generate more efficient plans given the smaller per-chunk indexes.

(And the various cloud platforms make it really easy to attach many disks to a single machine. Our published benchmarking is on network-attached SSDs.)

2. You can use read-only clustering today, i.e., with standard Postgres synchronous or asynchronous replication. So you can scale your query rates with the replicas as well.


Thanks for the clarification.

1. Do you use PostgreSQL 9.6 query parallelization (https://www.postgresql.org/docs/9.6/static/parallel-plans.ht...) or your own method for processing chunks in parallel? When we have >1B rows with >20 columns, the I/O usually becomes a huge bottleneck in our experience. If you use multiple disks and parallelize the work among different CPU cores, it would help I guess.


We currently support 9.6 query parallelization, and are also considering extending it with some of our own methods on chunks.

(Timescale supports multiple disks either through RAID or tablespaces. Unlike PG, you can add multiple tablespaces to a single hypertable.)

Happy to also go into more details on Slack (https://slack-login.timescale.com) or email.


Great to see what you all are doing.

Are there any plans to move Timescale to be an extension as opposed to a fork? We've found at Citus that maintaining an extension lets us more easily stay up to date with current releases. Would love to see the same applied to Timescale.

Edit: Looks like it is already one, just was unclear in the docs on the setup steps to me. Well done all.


Hi Craig, Timescale person here.

noir-york is correct. TimescaleDB was always an extension, never a fork. So all installations are just a `CREATE EXTENSION` and upgrades between versions just via `ALTER EXTENSION` commands.
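
I.e., roughly (exact options can vary a bit by version):

    CREATE EXTENSION IF NOT EXISTS timescaledb;
    -- and on upgrade:
    ALTER EXTENSION timescaledb UPDATE;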

But you're absolutely right -- way better than doing a fork!

(Say hello to Ozgun for us.)


It looks like it is indeed an extension: http://docs.timescale.com/getting-started/installation?OS=li...


Please stop tormenting me. This looks like exactly what we need (I was looking into manually partitioning the other day); it's just so annoying there is not yet Amazon RDS support.


Customer requests help the process with RDS along :)

https://github.com/timescale/timescaledb/issues/65


I've come to rely heavily on Cassandra, but I miss good old SQL and ad-hoc functionality. Systems like Cassandra bring other requirements when you need flexible data (Spark, for example), and technical debt is always a worry for me.

I want to give this a go for sure!


Hi, I'm curious what limitations you're running into with Spark SQL?


No limitations. Spark works and does a good job, it has many features that I can see us use in the future too.

With that said, it's yet another piece of tech that bloats our stack. I would love to reduce our tech debt: We are much more familiar with relational databases like MySQL and Postgres, but we fear they won't answer the analytics problems we have, hence Cassandra and Spark. We use these technologies out of necessity, not love for them.


Ah, I see -- so a thing that I'm curious about is, what do you miss about relational databases? Are they mainly aspects on the operational side, or the usability/API side?

Ultimately, the question that I'm interested in trying to answer is: would it help if there were more ways to make Spark feel like a traditional relational database? (e.g. being able to interact with the Spark driver using MySQL or Postgres wire protocol)


Spark already does a good job at that, imo. It's increasingly easy to query information, at this point we are basically writing SQL-like queries with it. BUT, Spark isn't a relational db, or even a storage solution. What I miss is just having the one piece of technology that deals with both storing and querying: Actual relational databases.

It's interesting. 10 years ago I would probably have said something like "relational DBs will just get better as data grows", but quite the opposite happened... Relational has been pushed to the side and we now have to learn a lot of new technologies, in my case: Cassandra, Spark, Pandas (Python). This whole stack used to be just MySQL :)... And I miss those days!


Ah, yeah. I have so many mixed thoughts on this. I also think that the open source world copying Google's super-decoupled GFS-Bigtable-MapReduce-Dremel-etc... architecture was really not great for operational complexity. Few teams can operate like Google, and maintain so many moving parts in production all at the same time.

At the same time, of course there are some very good points to be made for this sort of storage agnosticism -- mainly from an efficiency standpoint (i.e. being able to choose the storage format for the occasion). I'm really not quite sure if this argument is strong enough for completely sacrificing the simplicity of a traditional database.

Sometimes I think that MPP engines like Spark should take the philosophy of "batteries included but replaceable" -- that is, basically serving as an all-in-one "database" that provides a default storage engine (e.g. a basic columnstore and a basic rowstore), but still letting the user plug in other data sources to join, only if they want.


Have you looked at memsql.com? It's the "better" relational database you're talking about, especially for data warehousing.


Further evidence of how PostgreSQL is eating NoSQL. Every good concept first implemented in a custom NoSQL solution eventually becomes an extension in Postgres.


I also do the same when developing: Proof of concept in NoSQL, then go wild with the schema, then watch it become a horrible mess, then refactor into SQL (That being pg or mssql depending on the needs/politics)


We have a requirement of saving 100 million data points every 5 minutes. What options should we explore for a real-time system covering the last 15 days of data and an archival system for the last 3 years of data?


We've had users at that scale, at least on the real-time side (e.g., 100M/5min, 500B rows), although it requires some care.

Might be easiest to discuss more on Slack (https://slack-login.timescale.com/) or email (mike at timescale) if you're interested.


I don't have any experience with this type of thing, so that sounds like an incredibly large amount of data. What are you doing that requires it? What type of useful queries are you even able to perform over 432 billion records?


300k+ sensors sampling at 1 hertz would get you there.


What do you do with that data? Just store or do some analysis on it (how? With what kind of resources?)?


Alright, I'm excited to check this out. Been teetering on InfluxDB for a while, but not something I wanted to just introduce into corporate. Great work guys!


While nice, it suffers from the same problem as storing time series in any SQL database: you have to predefine your tables. For a fixed and known set of metrics that's all fine, but if you look at the possible outputs of, for example, Telegraf, it becomes a bit more tricky to pre-create all the tables/schemas...


Well...sorta. Postgres/Timescale have pretty rich support for JSON these days (and its more efficient "binary" JSONB), so there are a whole range of options you can do that make it feel much more schema-less than before.
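
For example, something as simple as this gives you a mostly schema-free metrics table (names are just illustrative):

    CREATE TABLE metrics (
        time   TIMESTAMPTZ NOT NULL,
        name   TEXT NOT NULL,
        value  DOUBLE PRECISION,
        labels JSONB               -- arbitrary per-metric tags land here
    );

    -- Query on tags you never pre-declared as columns
    SELECT time, value
    FROM metrics
    WHERE name = 'cpu_usage'
      AND labels @> '{"host": "web-1"}';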

In fact, last month we released a beta version of a Prometheus connector for Timescale/Postgres that allows you to store arbitrary Prometheus metrics without pre-defining all these varied schemas:

https://github.com/timescale/pg_prometheus

The same approach should work for Telegraf, we just haven't yet tried to generalize this plugin.


Well, JSONB has a performance impact, or AFAIK a pretty significant disk-usage impact, if you use GIN indexes. It's pretty nice stuff, but I'm not sure about using it for time series. I haven't seen any benchmarks though; this could be useful.

One advantage specialized DBs like Influx have is specialized/optimized storage layers for this type of data, while Timescale seems to use normal Postgres tables behind the scenes.

That Prometheus extension and adapter however look nice! They seem to be a good drop-in replacement for whatever storage, and the missing link for anything that's able to talk to Prometheus (which is quite a lot, including Telegraf).


A common approach we find for storing devops-related time-series data (as you'd find with Influx) is not to denormalize the "labels" or "tag set" into the main metrics table.

This obviously saves significant space just by avoiding denormalization, ignoring the indexing overhead as well. You can see that in our Prometheus extension, btw:

https://github.com/timescale/pg_prometheus#user-content-norm...
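
Schematically, the normalized layout looks something like this (names illustrative; the actual pg_prometheus schema differs in detail):

    -- Each distinct label set is stored once...
    CREATE TABLE label_sets (
        id     SERIAL PRIMARY KEY,
        labels JSONB NOT NULL UNIQUE
    );

    -- ...and samples reference it by id instead of repeating the labels per row
    CREATE TABLE samples (
        time         TIMESTAMPTZ NOT NULL,
        label_set_id INT NOT NULL REFERENCES label_sets(id),
        value        DOUBLE PRECISION
    );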

Regarding performance against Influx, it really depends. We're working on releasing more complex benchmarks soon.

But overall, they have a performance edge if it's a single column scan that precisely matches their particular architecture (e.g., WHERE clause or GROUP BY by a very specific label set). But, we've found that Timescale actually gets higher query performance across a whole set of pretty natural queries (sometimes ridiculously so, as Influx doesn't index numerical values). Plus higher insert rates and full SQL. Not only is the latter point important for enabling many more types of complex queries, but it means that any viz or reporting tool that already speaks to Postgres should just work with us.


> But overall, they have a performance edge if it's a single column scan that precisely matches their particular architecture

To be honest, if they didn't, stuff like Influx wouldn't really have a reason to exist. I think 99.9% of the operations in a devops environment will be pretty simple and predictable; it's a trade-off they make. Queries are mostly there to draw graphs quickly.

I quite like the simplicity of Influx, and while our current use is pretty limited and far from hitting its limits, for reliability and maintenance reasons I'd prefer Postgres, for which we have quite extensive in-house know-how and tools in place. Prometheus is one of the tools we are considering adding to our monitoring stack, and having the option of storing its data transparently in Postgres could be very interesting.


Postgres has JSONB, which allows for fully schema-free operations. Postgres+JSONB is a very viable alternative to conventional NoSQL like MongoDB.


Isn't mongodb still favored though for its scaling capabilities?


You can scale PostgreSQL quite well with https://www.citusdata.com/ (other methods are available). Given the amount of horror I've seen with MongoDB over the years, I'm surprised people still choose it (regardless of whether it passes Jepsen tests now).


We have been using Postgres for a smaller event time series database (millions of rows) with good success. Tables are partitioned.

Some user reports (aggregations) take ~5 seconds, so we currently pre-generate them in batches.

Eager to look into this to replace pre-generated reports with real-time reports.


This is something I've been meaning to look into for a personal project that has a lot of time-series data. It'll be interesting to see what they eventually come up with to make time-series data not take quite as much space.


Hi Tostino, please let us know how things work.

Regarding compression: While we haven't yet built in any native compression, we regularly run on ZFS and typically get 3-4x compression using that. (Plus with ZFS, insert rates are actually a bit faster, at least when using a single disk (often 25%). It's definitely something to consider.)

Another thing to consider is that Timescale supports easy data retention policies, which can also vary by hypertable (i.e., keep raw data for 1 month, aggregated data for 1 year).

It also supports many disks per server, either via RAID or through Postgres tablespaces. But now you can have multiple tablespaces in a single "hypertable", rather than just one as with a normal table. So especially in cloud settings, it's pretty easy to just add more and more disks to even a single machine.


Really interesting to hear you use ZFS for those benefits. Are you using ZFS on Linux or a Solaris variant?


We used it on an Ubuntu machine on Azure. We didn't do any real tuning yet, it was just an initial test to ballpark the benefit.


Whoa, fantastic!

I have managed to design a vanilla PostgreSQL solution, with partitions and BRIN indices, but there are too many hoops to jump through. I am excited to check if it will work out of the box. 100 billion rows per server sounds exciting!


Great work! I was curious about a few things:

1) Are you planning on using citus for clustering? Or will you have your own clustering implementation separate from Citus?

2) Can you still use barman, wal-e, etc for backups?

3) What are you guys using to generate docs.timescale.com? :)

4) Do you use any sort of custom on disk format?

5) Do you plan on implementing any sort of delta compression?

6) Is there/do you plan to have support for creating roll up/aggregation tables?

Cheers!


Thanks, great questions!

1) We are currently exploring all options for clustering, though we are likely to try something on our own. No final decisions made yet though.

2) One of the next tutorials we'd like to do is how to set up Timescale with wal-e for backups (we use this in a hosted service we have). Generally we should work with tools that work with PostgreSQL, we just want to make sure we cover all the caveats.

3) It's a custom solution we've built sort of organically that converts Markdown files (with some custom syntax) into HTML. :)

4) Currently we do not.

5) We have had high-level talks about various ways to better compress data, including delta compression, but nothing definitive yet. We do find just running on ZFS gives 3-4x compression, so that's already a nice win if compression is a priority.

6) This is definitely on our roadmap but again is also in the early stages.

Thanks!


This looks cool. I love things that get rid of extra dependencies. Influxdb is nice but then I have to support it, get stuff into it and get stuff out of it.

Timescale isn't currently supported by RDS/Aurora though, so it looks like more influx for me wooohooooo!


We've been talking with Amazon, and you can help the process along:

https://github.com/timescale/timescaledb/issues/65


Nice!!!

Edit: emailed them, hopefully I matter a teensy bit.


How would this work together with something like Stolon? https://github.com/sorintlab/stolon


Generally speaking, a TimescaleDB database just looks like Postgres to admin tools. So this looks interesting, and we are already internal Kubernetes users.

Will have to look into it more.


Can this be queried with Grafana or some other visualization tool?


If the viz tool can speak to Postgres, it works with Timescale. Tableau, Superset, SQLPad, Mode, Plot.ly, etc.

We internally use Grafana through a REST interface to a timescale backend. (And in fact, that's how we visualize the Prometheus data we store in Timescale: https://github.com/timescale/pg_prometheus )

But, Grafana Labs is still working on a native Postgres connector (their MySQL connector was released earlier this year). They promise us soon :)


Is the business model to charge for the clustering release?


Timescaler here. Although we haven't done any official announcements w.r.t. clustering, our plan is for this to be open source, like the single-node version.


Learning from the RethinkDB 'situation' [1] - do you have a viable business plan and are you monetising the database already?

1. https://news.ycombinator.com/item?id=12649414


We wouldn't do this if we didn't believe we could make a business out of it. That said, we are at a pretty early stage at this moment and are looking at many different options. As you may know, and if you've been following the discussions around business models for open-source projects, there are many approaches as well as challenges. All I can say is that we have been following the discussions and we are drawing lessons from past failures and successes.


At a higher level, is this the same concept as Datomic?


Not at all, although there are some similarities at a lower level (First class notion of time). Datomic has a much tighter focus on immutability, auditability, and database-as-a-value. Datomic isn't a good fit for use as a general purpose time series db where you have billions of observations.


So this is based on PostgreSQL. How does it compare to other solutions that are written from scratch to be a time-series DB, like InfluxDB?


We are working on benchmarks comparing ourselves to other solutions so hopefully we'll have concrete numbers on those soon. (Incidentally, we did a blog post on us vs plain PostgreSQL today: https://blog.timescale.com/timescaledb-vs-6a696248104e)

At a high level though, we do find that having native support for full SQL is a big win. Also, if you already store metadata or other relational data that you want to combine with your time-series data, it's great to be able to use one DB instead of separate solutions. Performance-wise we believe we are competitive, and in some cases much better, and we have the 20 years of stability from PostgreSQL to build on.


I look forward to a KDB comparison. Please don't forget them.


Me too.

At first blush KDB is orders of magnitude faster, especially if using a GZIP card.

But Timescale is open source and not core locked.

¯\_(ツ)_/¯


> At first blush KDB is orders of magnitude faster

Are there actual benchmarks that show KDB being orders of magnitude faster than Timescale? How many orders of magnitude are we talking about?


What is the extra size on disk as a result of using this? I'm guessing there's some overhead?


There's not really any meaningful extra size for Timescale compared to standard Postgres.

I mean, there's a little extra information about each chunk (table name, its constraints, triggers, etc), and we cache this information in memory to speed up the query/insert side of things. But it's pretty common for these chunks to be on the order of 100s MB to GB, so this is just noise compared to the underlying data/indexing size.

So on the real storage size, the only potential difference is index size: say 50 indexes over 2GB data each vs 1 index over 100GB data? Haven't really looked into this for all different index types, but seems rather modest. Can try to dig up some more data.


Looking at the tables we used for recent benchmarking comparing Postgres vs. Timescale (https://blog.timescale.com/timescaledb-vs-6a696248104e):

PG 100M rows: 30.98GB (1 table)

TS 100M rows: 30.93GB (1 hypertable, 6 chunks)

Same results for the 1B table, just 10x larger (and TS has 10x more chunks).


Wow, impressive. I would have thought there would be more overhead.


How does Timescale fit Postgres maintenance patterns (replication, backup)?


Basically just looks like a postgres database on the admin side.

Replication works (we aren't munging with the WAL), docs for backup/restore (http://docs.timescale.com/api#backup), and you can just use pgAdmin.


How does this compare to Vertica?


Why no redirect to https here?


Any support for RDS? :)


We're talking with them...and you can help: https://github.com/timescale/timescaledb/issues/65


I submitted a request; my company's not huge but I hope it helps.

Timescale looks like the most promising replacement to InfluxDB on the market. Influx has been a source of pain, data corruption and other various issues; what a world it would be if we could use timescale!

The main blocker for us is Grafana support actually. I know Grafana is working on a Postgres connector; I am quite excited about this.


You might have experienced data corruption in an old version of InfluxDB, but we haven't had any reports of that kind of thing in over two years. There are certainly still things to be improved, which we're doing all the time, but we take data integrity seriously.


I shouldn't have said data corruption, I just meant time corruption: https://github.com/influxdata/influxdb/issues/8424

I was never able to fix this.


Any SQL database can do time series well, with more functionality than the specialized stuff like InfluxDB, which doesn't really have much reason to exist at this point.

Citus is another good alternative, and SQL Server and MemSQL also have in-memory and columnstore options if you need the performance and scalability.


Not really true. I can point whatever tool speaks something like the Influx protocol at it with the right credentials, and it writes whatever metrics it wants. No need to manage/pre-create all your tables for every single possible metric out there.

I seriously dislike NoSQL databases for most purposes, and am absolutely a Postgres fan - but time series is the only thing I've encountered that benefits from a dedicated schema-less database engine.


How's that any different than something that talks SQL? It's the most universal data language there is.

Why do you have to make new tables? It's one table with timestamp, name, and value to store all your metrics, and you can use an array or JSON column if you have extra non-structured data. Add in SQL joins and analysis and you get a much better tool for time series.
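
E.g., with one metrics table plus a JSONB column for the unstructured tags, joining against ordinary relational metadata is straightforward (the hosts table and all names here are hypothetical):

    SELECT h.datacenter,
           date_trunc('hour', m.time) AS hour,
           avg(m.value)               AS avg_cpu
    FROM metrics m
    JOIN hosts h ON h.hostname = m.tags->>'host'
    WHERE m.name = 'cpu_usage'
      AND m.time > NOW() - INTERVAL '1 day'
    GROUP BY h.datacenter, hour
    ORDER BY h.datacenter, hour;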


> How's that any different than something that talks SQL? It's the most universal data language there is.

To use SQL you need to know the schema you're working against. Every single tool has to agree on a specific schema - and there are tons of existing tools that push infrastructure/system metrics into time-series databases. For them it's simple: Influx uses one API, OpenTSDB uses another, Prometheus uses yet another - but they're all pretty simple to use. If you pointed them at a Postgres database, on the other hand, they wouldn't have a clue what fields to insert.


Time series are heavily schema-oriented, unless you are talking about logs, and then you aren't really talking about the classical notion of a TS database where aggregates are important. There isn't a NoSQL database that does them well.



