I've been ingesting stock market trading data into MongoDB and using Metabase[1] to visualize it. It is essentially date, price, volume, ticker symbol, exchange. Around 400M documents so far.
Queries out of Metabase take 2-5 minutes to run, even for simple questions like:
Plot the average price of Apple for the last 5 days grouping by minute.
Would Timescale Cloud be a better replacement in terms of performance? Is there a nice GUI visualization platform like Metabase for it?
We did benchmarks a while back comparing TimescaleDB to Mongo[1], and TimescaleDB was quite a bit better. So I think you'd definitely see much improved query time.
Also it appears that Metabase supports a PostgreSQL connection[2], so you could probably continue to use it.
Yeah - TimescaleDB comes with a time_bucket function that allows you to group things by minute, and specify a where clause that queries for just the last 5 days. You can build indexes that include the ticker, and also reorder data on disk to optimize how much data you scan on disk. So, TLDR - you should definitely try it! I did some quick googling, and it looks like Metabase supports PostgreSQL, so it should work with Timescale. We would love to hear how it goes!
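For illustration, here is a rough sketch of what the "average price of Apple for the last 5 days, grouped by minute" question might look like. The table name (trades), the column names (time, ticker, price), and the index name are assumptions made up for this example, not something taken from the original poster's schema, and it assumes trades has already been turned into a hypertable.

```sql
-- Hypothetical schema: trades(time timestamptz, ticker text, exchange text,
--                             price numeric, volume bigint)

-- Composite index (name is illustrative) so that per-ticker, time-ordered
-- scans only touch the relevant rows.
CREATE INDEX trades_ticker_time_idx ON trades (ticker, time DESC);

-- Average price per one-minute bucket over the last 5 days for one ticker.
SELECT time_bucket('1 minute', time) AS bucket,
       avg(price)                    AS avg_price
FROM trades
WHERE ticker = 'AAPL'
  AND time > now() - interval '5 days'
GROUP BY bucket
ORDER BY bucket;

-- Optional: reorder chunks on disk to follow the index above. Whether this
-- is available depends on the TimescaleDB version and license in use.
-- SELECT add_reorder_policy('trades', 'trades_ticker_time_idx');
```

The time predicate lets TimescaleDB skip chunks outside the 5-day window, and the (ticker, time) index narrows the scan to a single symbol. How much this helps in practice depends on chunk sizing and data distribution.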
"Powered by Aiven" + nearly identical interface, so this is kind of a reselling arrangement? Aiven supports timescale on postgres, so what additional features does Timescale Cloud provide?
Aiven Postgres supports the open-source version of TimescaleDB.
But Timescale Cloud is the only hosted service where you can get the full TimescaleDB experience, which includes our community and enterprise features (e.g., interpolation, data retention policies, continuous aggregates, data reordering, etc.). [1]
It also includes special machine types and plans more suited for time-series data that we co-developed with Aiven.
It is also the only hosted service directly staffed by the TimescaleDB development team.
One main difference in these sorts of arrangements is vertically-integrated support.
If your Aiven database is having Timescale architecture problems, your support contact is someone working for Aiven who would need to turn around and reach out to Timescale about the bug (or suggest that you do so.)
If your Timescale Cloud database is having Timescale architecture problems, your support contact is someone working at Timescale who can just call over the guy who wrote the code with the bug in it.
(On the other hand, if your problem is with the Aiven backing cluster, it'll presumably take slightly longer for Timescale Cloud to resolve, given that they'd have to bounce that forward.)
In this case we (Timescale) are your main line of support. Our job is for you to have a 100% positive experience on Timescale Cloud. The buck, as it were, stops with us.
Is there any autoscaling or pay-for-what-you-use pricing? It's not 100% clear, but it looks like you essentially choose the instance type you want when using Timescale Cloud.
I understand why you might not want to call that out specifically in these promotional materials, but it's an important consideration when choosing which managed DB to use and when evaluating cost.
What specifically does this mean in practice - "Grow, shrink and migrate your workloads between configurations and plans with ease."
Growing, shrinking, and migrating involve moving to a different instance type, so you have to select a different instance type. That being said, there is very little downtime (on the order of 3-5 seconds while the DNS resolves).
Interesting point of view - it's certainly always a bit hard to find the right verbiage that everyone can understand, but hopefully this discussion clarified things!
Last time I used a traditional hosting provider, I could get a new bare metal server setup in under half an hour. I would hardly call them "pay what you use" even though I could start and stop servers and change the plan I'm on and still be only two to three times slower than doing the same on AWS.
Certainly - I've been seeing a bunch of usage-based pricing models that charge on a different metric (like metrics per second).
Regardless, with Timescale Cloud, if you get a machine, you pay the price for that machine for as long as you use it. So I guess to avoid the confusion, we can call this just paying for the machine :)
By the way, I've recently started using TimescaleDB (past month or two) for processing cryptocurrency trading information and I'm liking it a lot so far. I love that I can use Postgres as normal, but have efficient time-based queries.
My first ever test query was to generate minutely OHLC+volume from time,price,quantity trades. It was pleasantly easy to do:
select time_bucket('1 minutes', time) as minutely,
       max(price) as high,
       min(price) as low,
       first(price, time) as open,
       last(price, time) as close,
       sum(quantity) as volume
from trades
group by minutely
order by minutely;
We haven't done a formal price comparison, since it's a bit hard to compare apples to apples when the two databases are architected differently. Definitely something we should consider doing! Thanks for the idea.
It seems disingenuous to call it the first. AWS has its own time-series DB. In terms of open source, Apache Druid has a managed cloud variant that imply.io runs.
Q: Regarding multi-cloud. Say AWS has an outage: will Timescale Cloud fall back to GCP or Azure?
Can something like this be provided? Not sure if the network latency between different cloud providers would allow doing a multi-master replication scheme.
You select the public cloud vendor you want your machine spun up on. So no, if AWS has a full outage, it won't fall back to a different cloud. Failover is done at an availability zone level.
Since TimescaleDB is also open-source, if you want that kind of replication scheme, you can always install on VMs across clouds. However, as you rightly pointed out, network latency is a definite concern and impacts the feasibility of RPO and RTO.
Yeah! CockroachDB is also a really cool multi-cloud DB. That being said, they are really more for transactional workloads, and less purpose built for time-series.
I guess there are always trade-offs in the software world.
I think the quickest comparison is SQL vs NoSQL. We haven't done performance benchmarks against Druid yet, but do know of several users who have switched because they want to use PostgreSQL instead.
[1] https://metabase.com/