Hacker News
Amazon Timestream is generally available (amazon.com)
91 points by dsflora on Sept 30, 2020 | 39 comments



The pricing page seems off: $0.03 per GB-hour for the magnetic store, while the unreleased SSD store is $0.0004167 per GB-hour?

Comparing with S3 prices, surely the magnetic store is closer to $0.03 per GB-month?
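
A quick sanity check of what $0.03/GB-hour would actually come to per month (assuming ~730 hours in a month):

    # What do the listed hourly rates imply per GB-month?
    HOURS_PER_MONTH = 730                            # ~30.4 days
    magnetic_per_gb_hour = 0.03
    ssd_per_gb_hour = 0.0004167
    print(magnetic_per_gb_hour * HOURS_PER_MONTH)    # ~21.90 per GB-month, way above S3
    print(ssd_per_gb_hour * HOURS_PER_MONTH)         # ~0.30 per GB-month, plausible for SSD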


The pricing page is now updated to be more accurate: https://aws.amazon.com/timestream/pricing/


I'm pretty sure that is $0.03/GB/month (rated hourly) for magnetic storage as per the example. It's just written in a confusing way.


I saw the examples, but those were even more confusing, because I think the maths is wrong in almost all the calcs:

From the example: "Writes cost: $ 8.64 per month. This is computed as: ( 2 writes * 100 EC2 instances * 60 minutes * 24 hours * 30 days) * $0.50/ 1 MM writes"

But I get $4.32 from ((2 * 100 * 60 * 24 * 30) * 0.5) / 1M

From the example: "Memory store cost: $ 3.55 per month. This is computed as: ( 2 KB per minute per instance * 100 instances * 60 minutes * 6 hours * 30 days) * $0.036 / GB-hour."

For starters I think there needs to be another "* 24 hours" in there - the 6 hours is to account for the retention period, but they're storing that 24 hours a day...

But even with that I get $1.87 for the memory store from ((2 * 100 * 60 * 6 * 24 * 30) / 1000000) * 0.036 (the div by 1M is to convert KB to GB)

From the example: "Magnetic store cost: $ 5.93 per month. This is computed as: (2 KB per minute per instance * 100 instances * 60 minutes * 24 hours * 30 days * 12 months) * $0.03 /GB-month."

But I get $3.11: ((2 * 100 * 60 * 24 * 30 * 12) / 1000000) * 0.03

From the example: "Query cost (alerting queries): $ 7.03 per month. This is computed as (10 MB per alerting query * 100 queries per hour * 24 hours * 30 days * $0.01 / GB). Alerting queries process 5 minutes of data. 5 minutes of data is 1 MB (2 KB per minute per instance * 100 instances * 5), which gets rounded up to 10 MB minimum charge for queries."

But I get $7.20 from (10 * 100 * 24 * 30 * 0.01) / 1000 (MB -> GB)

It's only on the "Query cost (ad-hoc queries)" that I can make the answer agree.

(They have now updated the magnetic price, as noted by others)
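
Putting the recalculations above into a quick Python sketch (using decimal units, i.e. 1,000,000 KB per GB, which is what I assumed):

    # Re-running the example's arithmetic with decimal KB/MB/GB
    writes   = (2 * 100 * 60 * 24 * 30) * 0.50 / 1_000_000        # 4.32  (example says 8.64)
    memory   = (2 * 100 * 60 * 6 * 24 * 30) / 1_000_000 * 0.036   # 1.87  (example says 3.55)
    magnetic = (2 * 100 * 60 * 24 * 30 * 12) / 1_000_000 * 0.03   # 3.11  (example says 5.93)
    alerting = (10 * 100 * 24 * 30) / 1_000 * 0.01                # 7.20  (example says 7.03)
    print(writes, memory, magnetic, alerting)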


For posterity, the above is no longer correct. They updated their example to use 200 EC2 instances, which accounts for most of the disparity, and it seems they do actually use 1024 KB in a MB (and 1024 MB in a GB), while I had in my head from elsewhere in AWS that they don't do that...


Yeah I would expect the magnetic to be $0.00035


Despite 2 years of preview, this GA launch looks like the most half-baked I've seen from AWS.

No SSD Store? Only available in US regions? Pricing page does not have the usual per-region dropdown and instead says "The pricing in Europe (Ireland) Region will be 13.1% more than the price below."

edit: also just noticed the Go SDK is uniquely split into 2 packages for some reason: "service/timestream-query" and "service/timestream-write".
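
For what it's worth, the split presumably mirrors the service itself, which exposes separate write and query endpoints; from Python/boto3, for example, you likewise end up with two clients. A minimal sketch:

    import boto3

    # Timestream is effectively two services: one for ingestion, one for queries
    write_client = boto3.client("timestream-write", region_name="us-east-1")
    query_client = boto3.client("timestream-query", region_name="us-east-1")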


From my limited experience with time series, not being able to insert data older than X seems like a big disadvantage that may bite later down the road. Bugs happen.


Was just going to post the same. This seems great for rolling analytics on software or VMs, as a backing for monitoring infrastructure and for pairing with Grafana.

However, for IoT or hardware time series I'm left with a few questions... A few things that seem not to be possible:

- An IoT device with a limited network connection that batches data points while offline

- Manually loading time series data from disk at a later date on an offline device such as a weather station

- Loading data “in the future” (data points with bad/erratic timestamps, memoized predictions)

I guess the hardware / IoT use case is pretty small compared to infrastructure monitoring, where Timestream’s design choices make a lot of sense.


>I guess the hardware / IoT use case is pretty small

OSIsoft was bought earlier this year by AVEVA for $5 billion[1]. OSIsoft's offering is centered around storing industrial control system data (the buzzword version is IIoT, though that is a bit broader). They're also not the only player in this space. Many of the industrial automation vendors also have their own offerings.

[1] https://www.bloomberg.com/news/articles/2020-08-25/aveva-to-...


Take a look at vmagent [1]. It is designed to run on IoT and edge devices, collect monitoring data, and then upload it to remote storage in short bursts whenever a connection to the storage can be established. There are remote storage systems, such as VictoriaMetrics, which support loading historical data out of the box [2].

[1] https://victoriametrics.github.io/vmagent.html

[2] https://victoriametrics.github.io/#backfilling


Yeah, I currently work for a company that does a time series DB for industrial applications (think power generation, mining, etc).

Backfilling/inserting old data is common. Control network connectivity is not always great, so sometimes you need to fill in old data after the fact.

Forecasts are also a common use case (e.g. forecasting grid demand/capacity), so not being able to handle future data is another limitation.


Yeah, we're not talking about buggy telemetry-publishing code; we're talking about systems with connectivity so poor that they can only publish data once every 10 days (but with all samples included). Seems like a massive oversight.


The other surprising factor is that a lot of heavy industry will have data going back decades. If they can't get all of that data into Timestream, that's a massive mark against it.

I'd heard that it was initially developed for a major industrial operator (sounded like it was in the resources industry though I never found out who it was for), so the lack of support for late arriving data is a big surprise. That being said, there aren't any of those companies on the customers page - maybe it was just an incorrect rumour.


BTW, which solution do you use for this case? Did you look at vmagent? It supports collecting data from industrial appliances and uploading it to a central time series database whenever a connection can be established [1].

[1] https://victoriametrics.github.io/vmagent.html#use-cases


You can insert old data up to the in-memory retention period you choose (which can be 1 - 8766 hours).

You can't insert future data more than 15 minutes into the future though.
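
For example, from Python/boto3 a write with a timestamp outside that window comes back as a rejected record (a minimal sketch; the database/table names are made up):

    import time
    import boto3

    write = boto3.client("timestream-write", region_name="us-east-1")
    try:
        write.write_records(
            DatabaseName="iot",                 # hypothetical names
            TableName="sensor_readings",
            Records=[{
                "Dimensions": [{"Name": "device_id", "Value": "weather-station-7"}],
                "MeasureName": "temperature",
                "MeasureValue": "21.4",
                "MeasureValueType": "DOUBLE",
                # Older than the table's memory-store retention window
                # (or more than ~15 minutes in the future): rejected.
                "Time": str(int(time.time() * 1000) - 12 * 3600 * 1000),
                "TimeUnit": "MILLISECONDS",
            }],
        )
    except write.exceptions.RejectedRecordsException as e:
        print("rejected:", e.response.get("RejectedRecords"))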


Sure, but if you configure your database for, let's say, 4 hours in memory, and you have a network outage of 12 hours (this is not uncommon on mine sites), you lose 8h of data and cannot put it into your "master store".


If you expect a 12 hour outage, you should probably spec for 12 hours.

Memory-based storage is expensive, but not ruinously so.

The reason why this needs to be in-memory (I'm guessing) is a combination of dedupe checks and efficient ordered logs in the on-disk journals.

Additionally, you can scale the memory retention policy up and down as you see fit. So if you notice a network outage, you can scale up the retention policy for the duration and then reduce it back down again after.
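
In API terms that's just an UpdateTable call on the retention properties, something like this from boto3 (database/table names made up):

    import boto3

    write = boto3.client("timestream-write", region_name="us-east-1")
    # Temporarily widen the memory-store window during a known outage,
    # then shrink it back once the backlog has been ingested.
    write.update_table(
        DatabaseName="iot",                     # hypothetical names
        TableName="sensor_readings",
        RetentionProperties={
            "MemoryStoreRetentionPeriodInHours": 24,
            "MagneticStoreRetentionPeriodInDays": 3650,
        },
    )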


>If you expect a 12 hour outage, you should probably spec for 12 hours.

Industrial facilities don't always expect their outages.

> Additionally, you can scale the memory retention policy up and down as you see fit. So if you notice a network outage, you can scale up the retention policy for the duration and then reduce it back down again after.

Better, though still a hassle to manage.


I believe Timescale has a similar issue if you've enabled compression (and it's usually too expensive not to). That's perhaps the main reason we're sticking with Influx for the moment.


(Timescale person) You can backfill or update data that lives in a compressed chunk by first decompressing that chunk. [0]

But we are working on making it easier. :-)

[0] https://docs.timescale.com/latest/using-timescaledb/compress...
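
Roughly, the workflow in [0] is decompress, backfill, recompress. A sketch via Python/psycopg2 (the chunk and table names here are made up; in practice you'd look the chunk up with show_chunks()):

    import psycopg2

    conn = psycopg2.connect("dbname=metrics")   # hypothetical connection string
    with conn, conn.cursor() as cur:
        # Decompress the chunk that covers the rows you need to backfill...
        cur.execute("SELECT decompress_chunk('_timescaledb_internal._hyper_2_2_chunk');")
        # ...insert or update the old rows...
        cur.execute(
            "INSERT INTO conditions (time, device, temperature) VALUES (%s, %s, %s)",
            ("2020-06-01 12:00:00+00", "dev-42", 19.7),
        )
        # ...then compress it again.
        cur.execute("SELECT compress_chunk('_timescaledb_internal._hyper_2_2_chunk');")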


We have compression enabled for data older than 3 months. This gives us enough wiggle room to insert/modify anything recent.


Take a look also at VictoriaMetrics - it supports loading historical data without any restrictions [1]. It supports the Influx line protocol for data ingestion [2], and it usually requires less RAM / CPU / disk space compared to InfluxDB [3].

[1] https://victoriametrics.github.io/#backfilling

[2] https://victoriametrics.github.io/#how-to-import-time-series...

[3] https://medium.com/@valyala/insert-benchmarks-with-inch-infl...
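
Backfilling that way is just a normal Influx line protocol write with explicit (old) timestamps. A sketch, assuming a VictoriaMetrics instance listening on the default port 8428:

    import requests

    # Influx line protocol: measurement,tags field=value timestamp_ns
    # VictoriaMetrics accepts arbitrarily old timestamps here, which is
    # what makes backfilling work.
    start_ns = 1_577_836_800_000_000_000          # 2020-01-01T00:00:00Z
    lines = "\n".join(
        f"temperature,device=weather-station-7 value=19.7 {ts}"
        for ts in range(start_ns, start_ns + 3_600_000_000_000, 60_000_000_000)
    )
    requests.post("http://victoriametrics:8428/write", data=lines)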


Do they have no historical backfill at all?


Only 2 years since the original announcement. :D

A bit strange that only memory & magnetic stores are currently available. I'm curious as to why the SSD option isn't available yet.


Yeah, we signed up for the beta shortly after that announcement but never heard back from AWS except for an automated reply. We even asked our account rep this summer about the status of Timestream and he had no idea. Now it's GA!


We didn't hear back until 4 months ago… and only after continually pestering our account manager.


Doesn't magnetic make more sense, since I assume the data (being a time series) is laid down in a serial fashion?


SSDs may help when a query needs to process data from many time series over a big time range. Data for a single time series can be laid down in a serial fashion, but when reading data from many time series, many disk seeks are usually needed to jump between them.

This can be optimized for the common case where a query touches related time series. Such optimizations are implemented in VictoriaMetrics [1], so it usually outperforms other time series databases on high-latency magnetic disks with low IOPS.

[1] https://victoriametrics.github.io/


AWS looked at Timescale's new license and named their db Timestream. Haha, classic.


Yes, and we (Timescale) are flattered ;-)

Unfortunately for Timestream, Timescale Cloud is 10x-70x cheaper to use than Timestream. [0]

Timescale Cloud is also in 76 regions across AWS, Azure, and Google. Timestream only in 4 regions, 1 cloud. [1]

[0] https://docs.google.com/spreadsheets/d/1Nb9wTLqlWB_uch_VKuIm...

[1] https://blog.timescale.com/blog/fully-managed-time-series-da...


Well, in AWS' defence, Timestream was announced at re:Invent in 2018.


Timescale was announced in April 2017. The name isn't a coincidence. :-)

https://blog.timescale.com/blog/when-boring-is-awesome-build...


I see some references to Presto, but does anyone know what this engine is based on? Seems vaguely similar to Druid but it's hard to tell.


We use Timescale for IoT telemetry, but we're also fully in GCP. I wonder if GCP is also planning a fully fledged time series database?


(Timescale person) Thanks for using the service. Feel free to send us any product feedback, we are all ears :-)


Keep up the great work :)


Thank you :-)

Will share with the rest of the team!


200 years LOL. That's some impressive hubris.



