I saw the examples, but that was even more confusing because I think the maths is wrong on almost all the calcs:
From the example: "Writes cost: $ 8.64 per month. This is computed as: ( 2 writes * 100 EC2 instances * 60 minutes * 24 hours * 30 days) * $0.50/ 1 MM writes"
I get $4.32 from ((2 * 100 * 60 * 24 * 30) * 0.5) / 1M
From the example: "Memory store cost: $ 3.55 per month. This is computed as: ( 2 KB per minute per instance * 100 instances * 60 minutes * 6 hours * 30 days) * $0.036 / GB-hour."
For starters I think there needs to be another "* 24 hours" in there - the 6 hours is to account for the retention period, but they're storing that 24 hours a day...
But even with that I get $1.87 for the memory store from ((2 * 100 * 60 * 6 * 24 * 30) / 1000000) * 0.036 (the div by 1M is to convert KB to GB)
From the example: "Magnetic store cost: $ 5.93 per month. This is computed as: (2 KB per minute per instance * 100 instances * 60 minutes * 24 hours * 30 days * 12 months) * $0.03 /GB-month."
But I get $3.11: ((2 * 100 * 60 * 24 * 30 * 12) / 1000000) * 0.03
From the example: "Query cost (alerting queries): $ 7.03 per month. This is computed as (10 MB per alerting query * 100 queries per hour * 24 hours * 30 days * $0.01 / GB). Alerting queries process 5 minutes of data. 5 minutes of data is 1 MB (2 KB per minute per instance * 100 instances * 5), which gets rounded up to 10 MB minimum charge for queries."
But I get $7.20 from (10 * 100 * 24 * 30 * 0.01) / 1000 (MB -> GB)
It's only on the "Query cost (ad-hoc queries)" that I can make the answer agree.
(They have now updated the magnetic price, as noted by others)
For posterity, the above is no longer correct. They updated their example to use 200 EC2 instances, which was most of the disparity, and it seems they do actually use 1024-based units (1024 KB in a MB), while I had in my head from elsewhere in AWS that they don't do that...
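If anyone wants to sanity-check it, here's a quick Python sketch using the updated example's inputs (200 instances, 2 writes and 2 KB per instance per minute, 6-hour memory retention, 12-month magnetic retention, 100 alerting queries/hour billed at the 10 MB minimum) and 1024-based unit conversions; it reproduces the published $8.64 / $5.93 / $7.03 and lands within a cent of the $3.55 memory figure:

    # Recompute the AWS example with 200 instances and 1024-based units.
    INSTANCES = 200
    MIN_PER_MONTH = 60 * 24 * 30        # minutes in a 30-day month
    HOURS_PER_MONTH = 24 * 30
    KB_PER_GB = 1024 * 1024

    writes = 2 * INSTANCES * MIN_PER_MONTH                        # 2 writes/min/instance
    print(f"writes:   ${writes * 0.50 / 1_000_000:.2f}")          # $8.64

    mem_gb = 2 * INSTANCES * 60 * 6 / KB_PER_GB                   # 6 hours of data resident
    print(f"memory:   ${mem_gb * HOURS_PER_MONTH * 0.036:.2f}")   # $3.56 (AWS quotes $3.55)

    mag_gb = 2 * INSTANCES * MIN_PER_MONTH * 12 / KB_PER_GB       # 12 months retained
    print(f"magnetic: ${mag_gb * 0.03:.2f}")                      # $5.93

    query_gb = 10 * 100 * HOURS_PER_MONTH / 1024                  # 10 MB minimum * 100 queries/hour
    print(f"queries:  ${query_gb * 0.01:.2f}")                    # $7.03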
Despite 2 years of preview, this GA launch looks like the most half-baked I've seen from AWS.
No SSD Store? Only available in US regions? Pricing page does not have the usual per-region dropdown and instead says "The pricing in Europe (Ireland) Region will be 13.1% more than the price below."
edit: also just noticed the Go SDK is uniquely split into 2 packages for some reason "service/timestream-query" and "service/timestream-write".
From my limited experience with time series, not being able to insert data older than X seems like a big disadvantage that may bite later down the road. Bugs happen.
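To make that concrete, here's a minimal boto3 sketch (the database/table names are made up, and it assumes credentials/region are already configured) of the kind of backfill write that, as I understand it, Timestream rejects when the timestamp falls outside the table's memory store retention window:

    import time
    import boto3

    # Hypothetical database/table names.
    write_client = boto3.client("timestream-write")

    week_old_ms = int((time.time() - 7 * 24 * 3600) * 1000)

    try:
        write_client.write_records(
            DatabaseName="sensors",
            TableName="telemetry",
            Records=[{
                "Dimensions": [{"Name": "device_id", "Value": "station-42"}],
                "MeasureName": "temperature",
                "MeasureValue": "21.7",
                "MeasureValueType": "DOUBLE",
                "Time": str(week_old_ms),
                "TimeUnit": "MILLISECONDS",
            }],
        )
    except write_client.exceptions.RejectedRecordsException as e:
        # Records older than the table's memory store retention are rejected,
        # so there's no way to backfill them later.
        print("rejected:", e.response["Error"]["Message"])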
Was just going to post the same; this seems great for rolling analytics on software or VMs, as a backing for monitoring infrastructure and for pairing with Grafana.
However, for IoT or hardware time series I'm left with a few questions... A few things, it seems, are not possible:
- An IoT device with limited network connection that batches data points while offline
- manually loading time series data from disk at a later date on an offline device such as a weather station
- loading data “in the future” (data points with bad/erratic timestamps, memoized predictions)
I guess the hardware / IoT use case is pretty small compared to infrastructure monitoring, where Timestream’s design choices make a lot of sense.
>I guess the hardware / IoT use case is pretty small
OSIsoft was bought earlier this year by AVEVA for $5 billion[1]. OSIsoft's offering is centered around storing industrial control system data (the buzzword-version is IIoT, though that is a bit more broad). They're also not the only player in this space. Many of the industrial automation vendors also have their own offerings.
Take a look at vmagent [1]. It has been designed to run and collect monitoring data on IoT and edge devices and then upload it to remote storage in short time frames when connection to the storage can be established. There are remote storage systems such as VictoriaMetrics, which support loading historical data out of the box [2].
Yeah, we're not talking about buggy telemetry publishing code; we're talking about systems with poor connectivity that can only publish data once every 10 days (but include all samples). Seems like a massive oversight.
The other surprising factor is that a lot of heavy industry will have data going back decades. If they can't get all of that data into Timestream, that's a massive mark against it.
I'd heard that it was initially developed for a major industrial operator (sounded like it was in the resources industry though I never found out who it was for), so the lack of support for late arriving data is a big surprise. That being said, there aren't any of those companies on the customers page - maybe it was just an incorrect rumour.
BTW, which solution do you use for this case? Did you look at vmagent? It supports the case for collecting data from industrial appliances and uploading it to a central time series database when connection to it can be established [1].
Sure, but if you configure your database for, let's say, 4 hours of in-memory retention, and you have a network outage of 12 hours (this is not uncommon on mine sites), you lose 8 hours of data and cannot put it into your "master store".
If you expect a 12 hour outage, you should probably spec for 12 hours.
Memory-based storage is expensive, but not ruinously so.
The reason why this needs to be in-memory (I'm guessing) is a combination of dedupe checks and efficient ordered logs in the on-disk journals.
Additionally, you can scale the memory retention policy up and down as you see fit. So if you notice a network outage, you can scale up the retention policy for the duration and then reduce it back down again after.
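For what it's worth, changing retention on an existing table is a single UpdateTable call; here's a minimal boto3 sketch (hypothetical names, assumes credentials/region are already configured):

    import boto3

    write_client = boto3.client("timestream-write")

    # Temporarily raise memory store retention during a known outage window,
    # then call this again with the normal value once connectivity is back.
    write_client.update_table(
        DatabaseName="sensors",
        TableName="telemetry",
        RetentionProperties={
            "MemoryStoreRetentionPeriodInHours": 12,
            "MagneticStoreRetentionPeriodInDays": 365,
        },
    )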
>If you expect a 12 hour outage, you should probably spec for 12 hours.
Industrial facilities don't always expect their outages.
> Additionally, you can scale the memory retention policy up and down as you see fit. So if you notice a network outage, you can scale up the retention policy for the duration and then reduce it back down again after.
I believe Timescale has a similar issue if you’ve enabled compression (and it’s usually too expensive not to). Perhaps the main reason we’re sticking with Influx for the moment.
Take a look also at VictoriaMetrics - it supports loading historical data without any restrictions [1]. It supports Influx line protocol for data ingestion [2] and it usually requires lower amounts of RAM / CPU / disk space comparing to InfluxDB [3].
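As a rough sketch (assuming a single-node VictoriaMetrics on its default port 8428), backfilling a week-old point over the Influx line protocol is just an HTTP write:

    import time
    import requests

    # One Influx line protocol point with a week-old timestamp (nanoseconds).
    ts_ns = int((time.time() - 7 * 24 * 3600) * 1_000_000_000)
    line = f"plant_sensors,device=pump-7 temperature=68.4 {ts_ns}"

    resp = requests.post("http://localhost:8428/write", data=line, timeout=5)
    resp.raise_for_status()   # historical timestamps are ingested as-is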
Yeah we signed up for the beta shortly after that announcement but never heard back from AWS except an automated reply. We even asked our account rep this summer as to the status of timestream and he had no idea. Now it's GA!
SSD may help in the case where a query needs to process data from many time series over a big time range. Data for a single time series can be laid out serially on disk, but when reading data from many time series, many disk seeks are usually needed to jump between them.
This can be optimized for the common case where the query touches related time series. Such optimizations are implemented in VictoriaMetrics [1], so it usually outperforms other time series databases on high-latency magnetic disks with low IOPS.
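As a rough illustration (the IOPS figure and one-seek-per-series are assumptions, not benchmarks), seek time alone can dominate a wide query on a magnetic disk:

    # Back-of-envelope seek-time model; 100 IOPS is an assumed figure for a
    # single magnetic disk, and one seek per series touched is the best case.
    DISK_IOPS = 100

    def seek_seconds(series_touched: int, seeks_per_series: int = 1) -> float:
        return series_touched * seeks_per_series / DISK_IOPS

    print(seek_seconds(1))        # ~0.01 s: one series, essentially sequential after the seek
    print(seek_seconds(10_000))   # ~100 s spent just seeking for a query over 10k series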
Comparing with S3 prices, surely the magnetic store is closer to $0.03 per GB-month?