Timescale Cloud: Multi-cloud, fully-managed time-series in AWS, Azure, and GCP (timescale.com)
200 points by bithavoc on Aug 12, 2020 | 65 comments



TimescaleDB is great, not just for time-series data, but also if you're trying to scale your Postgres database (with some caveats, obviously).

One often has a small number of tables, often a single table, with a high write workload. Changing it to a TimescaleDB hypertable, provided you can work with the additional restrictions that brings, can instantly 10x your write throughput (or more with multi-node TimescaleDB). That's a lot cheaper than trying to scale your database write IO by 10x.
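
To make that concrete, here's a minimal sketch of the conversion (schema and chunk interval are illustrative):

    -- A plain Postgres table...
    CREATE TABLE metrics (
      time      TIMESTAMPTZ NOT NULL,
      device_id INTEGER     NOT NULL,
      value     DOUBLE PRECISION
    );

    -- ...becomes a chunked hypertable with one call:
    SELECT create_hypertable('metrics', 'time',
                             chunk_time_interval => INTERVAL '1 day');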

I like multi-cloud services like this because it reduces the vendor lock-in that happens with these public clouds.


(Timescale person) This is also true for Timescale Cloud -- once people are using it for TimescaleDB (which, as you probably know, is implemented as a low-level extension to Postgres), we often see them start to migrate their more traditional Postgres workloads over to Timescale Cloud as well.

One database platform for all their data, one (great) support team to speak with, etc.

And thanks for the compliment!


TimescaleDB is also great if you want to drop data after a retention period. Deleting millions of rows from a standard Postgres table can take a long time and requires a lengthy vacuum to recover the space; with TimescaleDB it's almost instant, since behind the scenes it simply deletes the files (chunks) that fall within the time range.
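
Roughly, it's a one-liner (sketch below uses the 1.x-era argument order, which varies by version, and an illustrative table name):

    -- Drop every chunk holding only data older than three months:
    SELECT drop_chunks(INTERVAL '3 months', 'metrics');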


Why wouldn't you use built-in partitioning for this?


I think it's a time/money tradeoff. I recall a comment along those lines (paraphrased) the last time I asked about TimescaleDB. It's the fundamental calculation of SaaS/Cloud.


I don't feel there is a significant time investment in using the built-in way.

Timescale has some neat features for handling time series data, but for just sharding out data for scalability I can't see it offering anything substantial. And in that vein, I feel the marketing is somewhat disingenuous in the performance claims. When comparing to vanilla PostgreSQL it is comparing against a very naive usage pattern. People have been getting the same kinds of performance increases from partitioning out time series data for a loooong time.
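
For reference, a minimal sketch of the built-in approach on stock Postgres 10+ (names illustrative):

    CREATE TABLE metrics (
      time  TIMESTAMPTZ NOT NULL,
      value DOUBLE PRECISION
    ) PARTITION BY RANGE (time);

    -- Each partition is created by hand (or via cron / pg_partman):
    CREATE TABLE metrics_2020_08 PARTITION OF metrics
      FOR VALUES FROM ('2020-08-01') TO ('2020-09-01');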


The advantage on inserts with TimescaleDB is the automation. The built-in way on Postgres still requires a lot of manual maintenance, especially as data patterns change. TimescaleDB adjusts automatically.

(Nothing against Postgres - it's just that Postgres needs to design for the general use case, while we can optimize for time-series.)

But TimescaleDB is much more than automatic partitioning. You also get:

* 15x native compression (95%+ storage savings) [0]

* Faster queries for time-series (because of our optimizations at the query planning / execution level)

* Automatic data retention policies (for automatically deleting old data) [1]

* Continuous aggregates (real-time materialized views, automatically refreshed - sketched below) [2]

(and lots more)
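
To give a flavor of continuous aggregates (sketched here with the 1.x syntax and illustrative names):

    -- A per-day rollup that TimescaleDB keeps refreshed automatically:
    CREATE VIEW metrics_daily
    WITH (timescaledb.continuous) AS
      SELECT time_bucket('1 day', time) AS day,
             device_id,
             avg(value) AS avg_value
      FROM metrics
      GROUP BY day, device_id;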

Happy to answer any more questions!

[0] https://docs.timescale.com/latest/using-timescaledb/compress...

[1] https://docs.timescale.com/latest/using-timescaledb/data-ret...

[2] https://docs.timescale.com/latest/using-timescaledb/continuo...


> 15x native compression

That is a rather remarkable compression ratio. Do you have a link to technical details? The lightweight compression outlined in the Gorilla TSDB and implemented in Uber's m3db is 11x which I thought was best in class: https://cs.brown.edu/courses/csci2270/papers/gorilla.pdf


Yep, absolutely.

Here's a blog post outlining our general approach, which applies type-specific compression based on a column's datatype. (While m3db only supports floats, TimescaleDB supports a whole range of 40+ data types.)

https://blog.timescale.com/blog/building-columnar-compressio...

For example, by default we employ:

- Gorilla for floats (+ some optimizations for faster decoding)

- Delta-of-delta + Simple-8b with run-length encoding compression for timestamps and other integer-like types

- Whole-row dictionary compression for columns with a few repeating values (+ LZ compression on top)

- LZ-based array compression for all other types

And the architecture is extensible to support new/updated compression algorithms for different data types. The above are all lossless algorithms; obviously one could drop precision to achieve higher rates.
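
In practice, enabling it is a couple of statements (a sketch; the exact policy-function name varies by version, and the schema is illustrative):

    -- Opt the hypertable into native compression, segmenting by device:
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id'
    );

    -- Compress chunks once they are older than seven days:
    SELECT add_compress_chunks_policy('metrics', INTERVAL '7 days');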

We also wrote a post describing these various approaches in a bit more detail for folks interested in learning (though it's more of a summary compared to the Facebook paper): https://blog.timescale.com/blog/time-series-compression-algo...

Obviously, your compression rate depends heavily on workload, but the 15x number comes from the median performance users see in the field against their real databases, not just a synthetic or single workload.


And...if anybody is interested in helping us build the best cloud service for time-series data, we're hiring cloud engineers!

https://www.timescale.com/careers/4128985002-cloud-engineer

Timescale is a remote-first company with amazing folks all over the world.

We're also actively looking for technical support engineers with strong Postgres backgrounds: https://www.timescale.com/careers/4615287002-technical-suppo...


I would love to see something like this with pricing that also competed on the lower end with various RDS and Cloud SQL offerings.

Just using Timescale instead of RDS or Cloud SQL PostgreSQL would be an easy default, but with the starting price points so much higher, the decision to START with Timescale is a lot harder.


Agreed. Having your lowest tier offering priced at over $500/mo is cost prohibitive in a lot of situations. Even their "dev" tier is $130/mo for a single instance. It would be much nicer if they just let you choose how much storage, CPU, and memory you need rather than only letting you select from a limited set of offerings.


The main difference on Timescale Cloud between dev (starting at $115) and production is just the resource availability and a slightly longer PITR recovery period for non-dev options. Both offer the same full TimescaleDB experience, console, ability to upgrade/downgrade, fork your database, etc.

If you do want a lower priced option, we actually recently also released a second managed cloud offering in public preview, Timescale Forge: https://timescale.com/forge.

Timescale Cloud provides great flexibility with multi-cloud and 75+ regions, security-conscious options like VPC peering and SOC2/ISO/HIPAA compliance, and a more traditional DBaaS experience.

Timescale Forge is an innovative platform with capabilities like decoupled compute and storage (so you can select exactly the configuration you want, starting at $49/month), instant pause/resume, native support for our Prometheus integration, and more coming...


Eek, Forge price starts at $49 but the next tier up is $166, after which it incrementally increases. That’s quite a jump. If my budget is in that price range, then such a jump is likely a huge deal for me.


Thanks for the feedback. I'm curious - what kind of pricing would you expect? Also, is this for an R&D workload, or for something that's already in production?


Just something that scales more smoothly. The price difference between the lowest (by price) and next lowest is $117, but the difference between that and the third lowest is only $19. I know that the difference isn't the same (more CPU and RAM, vs only more disk space), but it just seems like a big jump.

I could very much imagine that I might outgrow the $49 option, but not to the point where I'm willing to pay triple that amount.

I'm not necessarily saying you should change it; maybe it's just messaging and targeting. I guess on HN you just get a load of us who are interested in using it for smaller projects (where the $49 option might be perfectly fine, of course).


Appreciate the feedback - thank you!


You're welcome. I've played with the OSS version of TimescaleDB in the past and am a big fan. Keep up the good work. Perhaps in time, I'll be in a position to use it more seriously.


I think people are thinking of something similar to Digital Ocean's managed databases or Amazon's RDS, but that's quite a different class of service from what Timescale seems to be trying to provide with Timescale Cloud.

If people want to play around with TimescaleDB for cheap, DigitalOcean claims that their managed Postgres supports the Timescale extension, but I've never tried it.


Yep, Digital Ocean (as well as Azure, Rackspace, and others) offer TimescaleDB with their managed Postgres offerings, but they can only offer the Apache-2 version.

So you don't get native compression, continuous aggregates, downsampling, etc. - many of which actually make the Timescale Cloud version less expensive when you consider capabilities (e.g., 50GB on Timescale Cloud can store roughly what 1TB stores elsewhere, with faster queries; continuous aggregates mean dashboards run faster with less CPU; etc.).


I see Timescale Helm charts for Kubernetes: https://github.com/timescale/timescaledb-kubernetes. Are you using Kubernetes to host the managed service?


I brought Timescale to my current job, and I have to plug their Helm charts. They have made deployment, backup, scaling, and config so easy on k8s for Timescale. Probably the best postgres chart that exists, period. Great work TSDB team.


Yep, we like to eat our dog food =)

Timescale Forge, which I mentioned elsewhere, is based all around Kubernetes (including StatefulSets), which enables its decoupled compute/storage and instant pause/resume.

For that we employ Helm charts, but have additionally built k8s operators + more internally to handle the dynamic operation and reconfiguration you want from a "full lifecycle"-managed database service, while the Helm charts themselves are slightly more static.


> You can now incrementally scale up or down in the region of your choice, with CPU options ranging from 2 to 72 CPUs and storage ranging from 20GB to 10TB

What does that mean? The pricing calculator on https://www.timescale.com/cloud still has the same predefined storage configurations.

Also, you might want to remove this customer opinion from the post, standard Postgres won't break a sweat at 100M rows:

> We have over 100 million rows of data in development and production, and we needed a database that could handle this volume


Depends what you are doing with those 100M rows. We have a database with about that many rows, and we need to sum a few hundred thousand rows at a time, as well as group those rows into daily buckets and graph them in real time. With just regular Postgres these operations are pretty slow (> ~5 seconds), but with Timescale continuous aggregates and hypertables they run in milliseconds.
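
For a flavor of the kind of query involved (schema illustrative):

    -- Daily buckets over the last month; pointed at a continuous
    -- aggregate, this reads precomputed rows instead of raw data:
    SELECT time_bucket('1 day', time) AS day, sum(value) AS total
    FROM metrics
    WHERE time > now() - INTERVAL '30 days'
    GROUP BY day
    ORDER BY day;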


In the Timescale Cloud portal, once you create a database, it's one-click "upgrade" to move between any of these configurations, and it happens with ~zero downtime.

That's also true for multi-cloud migration: you just select that you want to migrate from a 100GB instance on AWS to a 200GB instance on GCP, and the migration happens behind the scenes.

(Technically, it does so by spinning up a replica with the new configuration / location and continuously migrating the database, including any online changes happening concurrently. Then, at the last minute, it "swaps" control from the primary to the replica and retires the old primary. This is all hidden behind DNS, too.)


That's my quote :) In hindsight I should have said "hundreds of millions"; we're at about a third of a million _sensor packets_ in production, some of which are dozens of rows across several tables.

Yeah, you could absolutely stick this in a vanilla postgres instance, but as a sibling post said, you wouldn't get the same performance on the queries we care most about.


oops, a third of a _billion_, not million


I was wondering about that ;)


We've used Timescale at ShiftLeft since before 1.0. Recommend 100%


Thank you Vlad for the kind words :-)


You have one of the smartest and nicest heads of Marketing there. I worked with Prashant at AWS (more than 10 years ago!) and have the fondest memories.

Besides this: congrats on the launch. I tried your product a while ago and was impressed. Big cloud providers are the obvious perfect channel for you. Just get ready for some nasty competition from the same guys :)


Aw, I agree! Will let Prashant know.

Re: Cloud providers - I also agree, it's a complicated world ;)


I was interested in TimescaleDB recently but found it extremely expensive, and you can only get all the good features in their cloud version as PaaS.

I would also suggest looking at Citus for scaling PostgreSQL. With the Azure reserved discounts, Citus is less than half the price of TimescaleDB. Citus also helps tremendously with multi-tenant scenarios.

Of course, depends on what you are trying to do.


I'm curious what you mean by this - the community version has most of the features in it... and if you are willing to pay for something, you can run the enterprise version on-prem... my experience with that is that it's quite cheap, and you get support as well.

Timescale also has a multi-node version now that distributes your data across a cluster, sort of like Citus.


I want a hosted PaaS solution. I don't want to self host it. That is the pricing I am comparing.


Citus and Timescale are apples and oranges.

Citus is just scale-out Postgres. TimescaleDB is Postgres for time-series.

There are a long list of capabilities that we've built for time-series that Citus does not have: native 15x compression, data retention policies, continuous aggregates (real-time materialized views refreshed automatically), interpolation, etc.

Nothing against Citus (we are friends with that team) - but they are designed to solve a different problem than we are.


I am currently using MongoDB to store Ethereum tick price data. I'd like to switch to Timescale, but wondering if there is a supported Node.js API client?

EDIT: Looks like the documentation recommends using Sequelize ORM.


ORMs that work for Postgres should "just work" for TimescaleDB. So while we mention Sequelize in some tutorials, if you have a favorite Node.js API client, you should just go for it.
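
Everything TimescaleDB adds is plain SQL, so any Postgres driver can run it unmodified - for instance (schema is just a sketch):

    -- Runs as-is through node-postgres, Sequelize, psql, etc.:
    CREATE TABLE ticks (
      time   TIMESTAMPTZ NOT NULL,
      symbol TEXT        NOT NULL,
      price  NUMERIC,
      volume NUMERIC
    );
    SELECT create_hypertable('ticks', 'time');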

As an aside, you might enjoy this developer Q&A that a community member published a few days ago, which talks about his experience using TimescaleDB to build a crypto trading bot.

https://blog.timescale.com/blog/how-i-power-a-successful-cry...

Also, an earlier tutorial:

https://blog.timescale.com/blog/analyzing-bitcoin-ethereum-a...


Also, we have a very active slack community if you want any other opinions: https://slack.timescale.com/

I'm mike there - come say hi.


Does this come pre-configured for high availability (e.g. replication to a standby node)? Given the pricing I would have assumed "yes", but I don't see any mention of it in the blog post?


I meant to ask about backup with point-in-time restore too - is that provided out of the box?


Yes, there are a number of great options here provided out of the box:

1. All plans (including the lowest-cost dev tier) include point-in-time recovery (PITR). You can also "fork" any current database to a second instance (e.g., for dev & test), also at any recent point in time.

2. Once you create any instance, you can later create a "read replica" with one click, which creates an asynchronous replica of your primary (e.g., so ad-hoc queries don't interfere with the primary, or just to load-balance reads).

3. On the HA side, our "pro" plans include automatic synchronous replication between primary and replica, so that if the primary crashes, it fails over to the replica without downtime. You also get DNS-named endpoints for replicas, again for load-balancing reads.


On the Basic tier, I presume you have to pay for those read-only replicas? Can they be used for automatic failover, or only for scaling reads?

I don't know what VM SKUs you use on Azure, but at a glance it looks like your solution would cost 300-400% of the cost of the VM, just for your Basic tier.

As a fan of TimescaleDB, I really want to like this, but I'm just not seeing enough value-add for the price. Maybe I'm missing something, please sell me on the benefits over managing VMs myself?


I'm curious when folks say they expect a dev plan offering at a pretty low price point to come out of the box with HA - are they comparing this to an Oracle High Availability solution?

Because $75 or $150 for HA on oracle is not going to happen - period.

It's just interesting how the price points are so wildly divergent.


I don't know what you are on about - I didn't mention Oracle, and would never compare TimescaleDB to a product that wants to bankrupt you and own your soul.

Using the price calculator linked in the article, on Azure the cheapest "Dev" tier (which I obviously wouldn't expect to have HA based on the name "Dev" alone) is $160/m - that's a lot for 1 vCPU and 4GB RAM. I want to give TimescaleDB money, but I just can't justify that.

The cheapest in the Basic tier on Azure is $715/m - yes, you can bet I expect HA at that price for 4 vCPUs and 16GB RAM.

I'm a big fan of TimescaleDB, but if they are competing with cloud providers' own managed solutions (which, AFAIK, all come with HA and PITR), it needs to ship with similar guarantees, and/or the pricing needs to be revised.


So their dev plan is $116 in US West on AWS.

I don't know Azure but do know AWS. PITR restores are not in seconds but in hours and you lose data.

"... a user-initiated point-in-time-restore operation will be required. This operation can take several hours to complete, and any data updates that occurred after the latest restorable time (typically within the last five minutes) will not be available."

Failover with AWS is generally 1-4 minutes? Someone fill in the blanks here?

* The smallest multi-AZ with failover MSSQL you can get runs $1,503 on AWS.

* Undifferentiated PostgreSQL is pretty cheap - you should be able to get failover for about $110 or so there.

I'm just pointing out that for a commercial / differentiated product offering (which is what this is), as opposed to basic PostgreSQL, HA costing so LITTLE would actually be somewhat unusual.

In enterprise sales these prices for HA are so low no one would even know how to contract for them. What happens is once you need this level of uptime you can afford it (in most cases).

Quick feedback from my side - $150/month to try something and then $500/month in production is not unreasonable for, I think, a fair number of use cases. Making the "try it" pricing as cheap as possible and feature-limiting it is a good thing, I think, so the onboarding step is easier.

Folks with money to spend are used to sometimes TERRIBLE support despite spending big money on big players. One thing these smaller places have, if you are giving them $10K/month - they usually answer the phone or email! A secret benefit.


> I don't know Azure but do know AWS. PITR restores are not in seconds but in hours and you lose data.

You're talking about something different. I mean that if the node running Postgres goes down, your storage is simply connected to a new node (all this behind the scenes, of course).

> Quick feedback my side - $150/month to try something and then $500/month in production is not unreasonable for I think a fair number of use cases

$150/m just to play is way too much - IMO, $50/m would be much more reasonable. I also believe that $500/m for a tiny, single instance without HA is too much.


=]

And getting a true technical resource on the phone at AWS requires an enterprise support contract, which starts at an additional $15,000 / month.


Pricing differs on Timescale Cloud based on the cloud & region, to reflect variances in the underlying costs of running infrastructure in those regions.

Azure Postgres and AWS RDS do _not_ include HA replication in their base price -- you need to pay extra for actual replicas. What they do provide is decoupled storage that lets them automatically recreate the database in the case of failure (which we do as well).

But that's not the same thing as ~zero downtime from migrating to a synchronous replica, which you can also use for scaling read queries.


I realise the HA solution included with Azure doesn't have replication, but it does provide a 99.99% SLA and suggests any downtime would be in the order of seconds. This is done, as you say, by recreating the database on a new node, but they have nodes standing by and ready. For a lot of use cases, that is good enough, with read-only replicas useful at an extra cost for scaling reads.

My point is that TimescaleDB's pricing is high enough that I'd expect replication to be provided, or at least some other kind of HA guarantee.


I haven't done experiments to measure downtime on Azure myself. SLAs are tricky once you dig very closely into the details. If I recall correctly, years ago when Google Cloud came out, they defined "uptime" as "responding at least once within a 10-minute period", so a service that's online for 1 minute, then offline for the next 9 minutes... is still considered to have 100% availability from an SLA perspective.

Moreover, your actual costs from the cloud vendors are often tricky and muddled, given they have so many different ways to charge you. For managed databases, this often includes things like backup storage costs, bandwidth costs for ingress/egress, etc. All of that is "included" in our flat pricing.

But the biggest thing: Timescale Cloud is not offering an undifferentiated Postgres offering. It's the only place to get the best version of TimescaleDB.

And in many cases, this will /save/ you money. For example, TimescaleDB's native compression typically gets a 94% storage reduction, so 500GB on Timescale Cloud is equivalent to paying for roughly 8TB on Azure or AWS. And that's before all the other TimescaleDB features, including the performance improvements from executing queries against compressed data.


I actually think this is a false comparison. With TimescaleDB you get higher insert rates, up to 1000x faster queries (depending on your query, of course), 95% storage savings from native compression, etc., compared with vanilla Postgres.

You also get really useful features like continuous aggregates, data retention policies, interpolation, etc.

So the comparison isn't really 1-to-1. Depending on your workload, you could easily get better performance out of 1 vCPU on Timescale Cloud than 4 vCPU on any vanilla Postgres service (not to mention the 95% cost savings on storage).

(I work at Timescale.)


You're preaching to the converted on TimescaleDB's features - I love it.

I think it makes no sense to compare your managed offering to vanilla Postgres though - surely the comparison is vs TimescaleDB hosted in other ways.

In theory I can use TimescaleDB on cloud managed DBs already (if providers didn't offer such old versions of TimescaleDB).

Given your offering deploys to cloud VMs, perhaps the best comparison is vs deploying Postgres with TimescaleDB yourself on cloud VMs. And at a glance, it looks like your Basic tier is 300-400% more than the cloud VM - so what's the value-add? You get PITR, but beyond that?


Ha! Glad to hear :-)

Other cloud providers can only offer the Apache-2 version of TimescaleDB. So you wouldn't get native compression, continuous aggregates, downsampling, etc.

So again, the value add in Timescale Cloud are those features (all of which took a long time to build!). Also: we fully manage the database, are responsible for keeping it online, are available for support via email and Slack, etc.

The only real alternative to Timescale Cloud is to run it yourself, where you get the features but not the services/support. But you are welcome to do that if you'd like :-).


> Other cloud providers can only offer the Apache-2 version of TimescaleDB. So you wouldn't get native compression, continuous aggregates, downsampling, etc.

Ah yes - I was forgetting that!

> So again, the value add in Timescale Cloud are those features

Hmm, not really; those are the value-add from TimescaleDB itself, not Timescale Cloud.

I guess I'm going to remain unconvinced for now, but I really hope that, with time, the Timescale Cloud team works on adding more value vs self-hosting, as personally I don't think it brings enough value to justify the pricing as-is.


No problem at all, if you prefer self-hosting then we encourage that.

(And as you can see, we are continuing to invest in our self-hosted community [0])

[0] https://blog.timescale.com/blog/multi-node-petabyte-scale-ti...

Btw - always happy to chat directly if you'd like. Feel free to reach out! ajay (at) timescale.com


Would love to hear from the HN community: what are some of your favorite features in managed database or infra platforms, or what do you wish more platforms offered that is still hard to find?


I wonder: could GNU/Linux change its license (in a substantial way) to allow it to get revenue from Amazon for deployments like AWS, similar to what Timescale did? It would seem fair for Linux development to get a good chunk of that revenue.

p.s.: could Postgres change its license to make Timescale share their revenue? :D


With AGPL, if you block AWS, you also block other users unless you offer additional licenses for your software, like Mongo and Timescale have to, and I'm not sure people would want to deal with that when free alternatives like MySQL exist.

With Amazon, they were able to make their own Linux flavor so they could pay Red Hat less. I think if Linux itself sold enterprise licenses, we'd end up with a much more fragmented or corporate batch of OSes taking its place, and I think that's very negative for the ecosystem in general. No one wants to deal with that :/

I'm surprised people want to deal with MongoDB at this point.


Why would they do that? These guys are generating revenue by offering a managed service, not by selling open-source software.


For the same reasons MongoDB and Redis changed their licenses to avoid AWS et al offering it as a managed service?


That's a different story. AWS was competing for revenue with the for-profit companies behind MongoDB and Redis. Postgres' publisher does not use a managed service as a source of income.


Of course, but I was just offering a reason for why they would want to do that. It’s not as if it’s unreasonable if they did that.


No



