Is it a lot harder to get predictable performance out of this at scale?
Do you optimize the db the same way you would optimize a single Postgres instance, or do you need to learn a whole new set of rules for the Aurora query planner? Getting a well-known single-node DB to perform great at scale is already a fairly specialized skill; what does it mean to do the same for a proprietary black-box multi-node cluster that only exists in the cloud?
Are there any success stories of people making the switch at scale that one can read more about?
I don't think you need to worry about tuning things for the planner. Aurora Postgres and Postgres use the same planner, executor, and storage engine. The two differ in how write-ahead logging, and consequently replication, works.
Slide 20 has a good technical comparison. TL;DR: RDS replicates data/logs to a standby instance and EBS. Aurora replicates logs to a highly replicated storage engine.
This slide also hints at why Aurora MySQL highlights performance improvements and Aurora Postgres doesn't. RDS MySQL replicates a lot more data and logs to the standby. Postgres has a better replication engine and therefore doesn't consume nearly as many resources. (red and yellow arrows in Slide 20)
On a personal note, I was curious about Aurora's architecture to understand how it compared to other solutions, such as Citus. Aurora focuses on better replicating your logs/data and scaling storage. Citus scales out through sharding/distributing the underlying data and spreading the work across instances. I also love that Citus is open source. Then again, I'm biased. :)
We've definitely heard of Citus many times as well; it will be interesting to see what competition in this space looks like soon.
How does one get access to try this out? The article mentions it's available in us-east-1 (N. Virginia); however, I checked the RDS tab in us-east-1 and I don't see it listed. Only the MySQL version is available.
The "... you can sign up now for access!" link points to https://pages.awscloud.com/amazon-aurora-with-postgresql-com... but it just redirects to https://aws.amazon.com.
But there's nobody there that offers managed PostgreSQL (unless you count Postgres Enterprise, which is a single node that runs an odd "enterprise" version of PostgreSQL; no replication!). In fact, most of the solutions there aren't managed at all.
The whole thing seems rather half-hearted.
Since they said "planning to," I assume these aren't available yet, so if you use any extensions or stored functions you'll probably need to hold back until they are.
They need to disable some features to make concurrent readers on the same network disk work. I'm sure it's harder than I'm making it sound, but I don't think it's a new storage engine beyond the disk level.
If so, though, hopefully there's a silver lining of faster incorporation of new pgsql features into the Aurora/PG service.
Citus users typically have a working set of a few hundred GB or more and run into memory and CPU limitations on a single machine on both the write and the read side.
Provided your data has a natural sharding dimension (e.g. tenant ID in a multi-tenant application), Citus can distribute tables across many servers to scale out memory, CPU and storage. It can also parallelise queries, index creation, aggregations, and deletion across the cluster, and it can horizontally scale out write throughput. This gives you the necessary tools to deal with very large data volumes.
e.g. 500k writes/sec https://www.citusdata.com/blog/2016/09/22/announcing-citus-m...
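To make the sharding idea concrete, here's a hypothetical Python sketch of tenant-based hash routing. The shard count, worker names, and hash function are all made up for illustration; Citus handles this internally via hash-distributed tables, so you'd never write this yourself.

```python
# Hypothetical sketch of tenant-based hash sharding.
# SHARD_COUNT, WORKERS, and the use of Python's hash() are
# illustrative placeholders, not Citus internals.

SHARD_COUNT = 32
WORKERS = ["worker-0", "worker-1", "worker-2", "worker-3"]

def shard_for(tenant_id: int) -> int:
    """Map a tenant to one of SHARD_COUNT hash shards."""
    return hash(tenant_id) % SHARD_COUNT

def worker_for(shard: int) -> str:
    """Spread shards round-robin across worker nodes."""
    return WORKERS[shard % len(WORKERS)]

# All rows for one tenant land on the same worker, so a
# per-tenant query touches a single node, while the cluster
# as a whole scales out memory, CPU, and storage.
tenant = 42
print(worker_for(shard_for(tenant)))
```

The key property is co-location: because every table is distributed on the same tenant key, joins and transactions scoped to one tenant stay on one node.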
: For all practical purposes for mortals.
Aurora is scalable to 64TB. For 10 million users, that's about 6.4MB per user.
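As a quick sanity check on that figure (assuming decimal TB and MB, which is presumably what the 64TB cap uses):

```python
# Back-of-the-envelope: Aurora's 64 TB storage cap spread
# evenly over 10 million users, in decimal units.
total_bytes = 64e12          # 64 TB
users = 10_000_000
per_user_mb = total_bytes / users / 1e6
print(per_user_mb)
```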
We’re now previewing the addition of PostgreSQL compatibility to Amazon Aurora. This edition has the benefits customers have come to expect from Aurora, including high durability, high availability, and the ability to quickly deploy low latency read replicas while supporting the full SQL dialect and functionality of PostgreSQL 9.6.
db.r3.8xlarge: RDS Aurora $4.640/h, RDS MySQL $3.780/h, RDS PostgreSQL $3.980/h
> db.r3.8xlarge: RDS Aurora $4.640/h, RDS MySQL $3.780/h, RDS PostgreSQL $3.980/h
Based on your numbers, the difference in pricing between Aurora and Postgres is ($4.64 - $3.98) x 24 x 30 = $475/mo. To the company using a db.r3.8xlarge, which has 32 cores and 244 GB of RAM, that's not even a rounding error.
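For what it's worth, checking that arithmetic (assuming on-demand pricing, running 24/7 for a 30-day month):

```python
# Monthly cost delta between Aurora and RDS PostgreSQL
# for a db.r3.8xlarge, from the hourly rates quoted above.
aurora, postgres = 4.640, 3.980   # $/hour
delta_monthly = (aurora - postgres) * 24 * 30
print(f"${delta_monthly:.2f}/mo")
```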
That alone is worth the difference imo.
"We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL."
RDS Postgres currently supports PostGIS, so this should as well.
"Q: How does Aurora improve recovery time after a database crash?
Unlike other databases, after a database crash Amazon Aurora does not need to replay the redo log from the last database checkpoint (typically 5 minutes) and confirm that all changes have been applied, before making the database available for operations. This reduces database restart times to less than 60 seconds in most cases. Amazon Aurora moves the buffer cache out of the database process and makes it available immediately at restart time. This prevents you from having to throttle access until the cache is repopulated to avoid brownouts."
Q: What happens during failover and how long does it take?
Failover is automatically handled by Amazon Aurora so that your applications can resume database operations as quickly as possible without manual administrative intervention.
If you have an Amazon Aurora Replica, in the same or a different Availability Zone, when failing over, Amazon Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary. Start-to-finish, failover typically completes within a minute.
If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.
Your application should retry database connections in the event of connection loss.