Hacker News: Amazon Aurora Update – PostgreSQL Compatibility (amazon.com)
296 points by jasonlotito on Nov 30, 2016 | 73 comments



What are some downsides one should keep in mind when considering moving from RDS Postgres to RDS Aurora? What's the catch?

Is it a lot harder to get predictable performance out of this at scale?

Do you optimize the db the same way you would optimize a single Postgres instance, or do you need to learn a whole new set of rules for the Aurora query planner? Getting a well-known single-node DB to perform great at scale is already a fairly specialized skill; what does it mean to do the same for a proprietary black-box multi-node cluster that only exists in the cloud?

Are there any success stories of people making the switch at scale that one can read more about?


(Ozgun from Citus Data. I compiled the following from Aurora talks and tutorials I've attended. Happy to edit it with more info.)

I don't think you need to worry about tuning things for the planner. Aurora Postgres and Postgres use the same planner, executor, and storage engine. The two differ in how write-ahead logs and consequently replication works.

http://www.slideshare.net/AmazonWebServices/amazon-aurora-am...

Slide 20 has a good technical comparison. TL;DR: RDS replicates data/logs to a standby instance and EBS. Aurora replicates logs to a highly replicated storage engine.

This slide also hints at why Aurora MySQL highlights performance improvements and Aurora Postgres doesn't. RDS MySQL replicates a lot more data and logs to the standby. Postgres has a better replication engine and therefore doesn't consume nearly as many resources. (red and yellow arrows in Slide 20)

On a personal note, I was curious about Aurora's architecture to understand how it compared to other solutions, such as Citus. Aurora focuses on better replicating your logs/data and scaling storage. Citus scales out through sharding/distributing the underlying data and spreading the work across instances. I also love that Citus is open source. Then again, I'm biased. :)


Nice, thank you for the links and the writeups! What would you say are the downsides then of going from RDS PG to this? Sounds like there are basically none, but I'm still skeptical. Should I not be?

We've definitely heard of Citus many times as well, it will be interesting to see what competition in this space will look like soon.


At this point it's largely guesswork what the downsides will be. On the pages for Aurora they mention they're working on support for extensions and various PLs, but it's unclear whether those are supported in the beta at this time. It's also unclear whether they had to fork off in some way; if they did, it will most likely lag behind core Postgres at each new release. The trade-offs should become much clearer once it reaches GA.


Got it, thanks for clarifying. Hopefully if anybody from AWS sees this they can perhaps do extra nudging on their end about making SLAs more transparent to end users.


(Hopefully jeffbarr will read this)

How does one get access to try this out? The article mentions it's available in us-east-1 (N. Virginia) however I checked the RDS tab in us-east-1 and I don't see it listed. Only the MySQL version is available.

The "... you can sign up now for access!" link points to https://pages.awscloud.com/amazon-aurora-with-postgresql-com... but it just redirects to https://aws.amazon.com.


Below is the correct link to sign up for the preview of the PostgreSQL compatible edition of Amazon Aurora; we are fixing the link in Jeff Barr's blog post now, and thanks for letting us know!

https://pages.awscloud.com/amazon-aurora-with-postgresql-com...


Wonderful. Thanks!


I just read it - million to one chance, but I saw your comment! We fixed the link.


Well that sure is a killer feature. AWS is far ahead of Google Cloud for postgres users.


What's odd is that GCP does have a marketplace for third-party integrations through their "Cloud Launcher" [1] UI, where the products are either free prefabs (e.g. start N number of Cassandra nodes) or paid, managed VMs.

But nobody there offers managed PostgreSQL (unless you count Postgres Enterprise, which is a single node that runs an odd "enterprise" version of PostgreSQL; no replication!). In fact, most of the solutions there aren't managed at all.

The whole thing seems rather half-hearted.

[1] https://console.cloud.google.com/launcher


GCP is working on it, it's one of their most requested features.


I wish Azure offered managed Postgres hosting.


I wish every cloud company had a managed Postgres offering. The potential pain of managing my own database is what keeps me locked in, and I'd be able to switch at a moment's notice if that weren't the case.


So this looks pretty amazing. As a heavy postgres user, is there any reason why I shouldn't be using this? Or actually it's strictly superior in all regards?


From the article: On the stored procedure side, we are planning to support Perl, pgSQL, Tcl, and JavaScript (via the V8 JavaScript engine). We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL.

Since they listed "planning to" I assume these aren't available yet, so if you use any extensions or stored functions you'll probably need to hold back until these are available.


So weird that you're not supporting python for stored procedures.


RDS doesn't support untrusted languages which includes Python [1].

1. https://www.postgresql.org/docs/9.5/static/plpython.html


This is actually a TIL for me. I've always just used pl/pgsql for sprocs, as I've never had a need for anything beyond what it provides. I didn't realize the 'u' in plpythonu was for 'untrusted'.


It will cost a lot more than a single DB, and a bit more than a pair of DBs.


I'm curious how much of the underlying pgsql storage engine was rewritten since Aurora/MySQL uses a completely different engine than the default InnoDB.


I doubt much. Aurora looks more like those engines running on an EFS disk than a major rewrite of the storage engines themselves.

They need to disable some features to make concurrent readers on the same network disk work. I'm sure it's harder than I'm making it sound, but I don't think it's a new storage engine beyond the disk level.


I suspect you're right. I wonder if that's part of the reason why they claim a lower performance improvement of 2x over "standard" pgsql installations rather than the 5x improvement of Aurora/MySQL over standard MySQL installations.

If so, though, hopefully there's a silver lining of faster incorporation of new pgsql features into the Aurora/PG service.


And this is how software becomes closed-source in the cloud era.


And why SaaS companies hate the AGPL.


Netflix must be biting their fingers :)

https://news.ycombinator.com/item?id=11950811


Sorry to see for Enterprise DB and Citus. Tough biz to be in.


Don't worry ;)

Citus users typically have a working set of a few hundred GB or more and run into memory and CPU limitations on a single machine on both the write and the read side.

Provided your data has a natural sharding dimension (e.g. tenant ID in a multi-tenant application), Citus can distribute tables across many servers to scale out memory, CPU and storage. It can also parallelise queries, index creation, aggregations, and deletion across the cluster, and it can horizontally scale out write throughput. This gives you the necessary tools to deal with very large data volumes.

e.g. 500k writes/sec https://www.citusdata.com/blog/2016/09/22/announcing-citus-m...
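The tenant-ID sharding idea above can be sketched as a simple hash-based router. This is purely illustrative: the shard count, worker names, and routing scheme here are made up and are not how Citus actually places data.

```python
# Illustrative sketch of hash-based shard routing by tenant ID.
# Not Citus's actual distribution logic; shard/worker counts are arbitrary.

NUM_SHARDS = 32
WORKERS = ["worker-1", "worker-2", "worker-3", "worker-4"]

def shard_for_tenant(tenant_id: int) -> int:
    """Map a tenant ID to one of NUM_SHARDS shards."""
    return hash(tenant_id) % NUM_SHARDS

def worker_for_shard(shard: int) -> str:
    """Assign shards to workers round-robin."""
    return WORKERS[shard % len(WORKERS)]

def route(tenant_id: int) -> str:
    """All rows for one tenant land on one worker, so
    single-tenant queries never cross the network."""
    return worker_for_shard(shard_for_tenant(tenant_id))
```

The key property (which Citus does share) is that co-locating all of a tenant's rows on one shard lets joins and transactions scoped to a tenant run on a single node.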


Any chance that multi-master will be open sourced ?


Many of the underlying bits that make Citus MX possible will in time be rolled into the open source version. At the same time, some of the ways we make it possible, such as managing the streaming replication and high availability, may take longer to productize, and it's unclear in what form they'll evolve. For now Citus MX is available on Citus Cloud, where we're able to take care of all of those things for you.


Does this include the full set of json/jsonb features and benefits of GIN indexes?


yes.


Infinitely scalable[1] Postgres? Yes please!

[1]: For all practical purposes for mortals.


DynamoDB and S3 are infinitely scalable.

Aurora is scalable to 64TB. For 10 million users, that's about 6.4MB per user.
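The per-user figure works out as follows (using decimal terabytes; binary TiB would give a slightly larger number):

```python
# Back-of-the-envelope: Aurora's 64 TB storage cap spread over 10M users.
TB = 10**12
cap_bytes = 64 * TB
users = 10_000_000
per_user_mb = cap_bytes / users / 10**6
print(per_user_mb)  # 6.4
```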


if anyone from Google Cloud is reading this, are there any plans for adding Postgres support on GC SQL?


they already have an alpha in the works... try emailing support or posting in the discussion forums


What's the advantage to using this over RDS?


(From AWS Aurora's website)

We’re now previewing the addition of PostgreSQL compatibility to Amazon Aurora. This edition has the benefits customers have come to expect from Aurora, including high durability, high availability, and the ability to quickly deploy low latency read replicas while supporting the full SQL dialect and functionality of PostgreSQL 9.6.


Storage cost, performance and auto-scaling are also big advantages.


Auto scaling in RDS? That'd require a multi-master cluster. As far as I know, that's impossible on RDS, Aurora or not.


I should have been more clear. I was referring to storage auto-scaling. Aurora gives you access to up to 64TB of disk billed in 10GB increments. You don't need to preallocate and manage the scaling of your disk as you do with RDS.
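That billing model can be sketched as below. The per-GB-month price is a placeholder, not Aurora's actual rate; check the AWS pricing page for real numbers.

```python
import math

INCREMENT_GB = 10           # Aurora bills storage in 10 GB increments
MAX_GB = 64 * 1000          # 64 TB cap
PRICE_PER_GB_MONTH = 0.10   # placeholder price, not Aurora's actual rate

def billed_storage_gb(used_gb: float) -> int:
    """Round used storage up to the next 10 GB increment, capped at 64 TB."""
    if used_gb <= 0:
        return 0
    return min(math.ceil(used_gb / INCREMENT_GB) * INCREMENT_GB, MAX_GB)

def monthly_storage_cost(used_gb: float) -> float:
    """You pay only for what you use, rounded up to the increment,
    rather than preallocating a fixed volume as with standard RDS."""
    return billed_storage_gb(used_gb) * PRICE_PER_GB_MONTH
```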


Autoscaling of storage at least (up to 64 TB).


nitpick: Aurora is RDS (an available engine alongside standard MySQL/Postgres/etc)


I'm also interested in whether this ends up being competitive with AWS Redshift for large-scale data warehousing.


It could be a very performant transparent caching layer when combined with the fdw extension[1]. You can push the true analytics queries down to Redshift, but cache the result sets, rollups, fact tables, etc., in materialized views. Then you have the full power of modern Postgres available for exposing your data warehouse. The more predictable performance and auto-scaling storage make it much better than standard RDS for that setup.

[1] https://aws.amazon.com/blogs/big-data/join-amazon-redshift-a...


Better performance, lower jitter, easy scalability. There's really no reason to not use Aurora.


Aurora is more expensive, more so on the high-end, so it's not entirely clear that it is the best choice for all applications:

db.r3.8xlarge: RDS Aurora $4.640/h, RDS MySQL $3.780/h, RDS PostgreSQL $3.980/h


> Aurora is more expensive, more so on the high-end, so it's not entirely clear that it is the best choice for all applications:

> db.r3.8xlarge: RDS Aurora $4.640/h, RDS MySQL $3.780/h, RDS PostgreSQL $3.980/h

Based on your numbers the difference in pricing between Aurora and Postgres ($4.64 - $3.98) x 24 * 30 = $475/mo. To the company using a db.r3.8xlarge, which has 32-cores and 244 GB of RAM, that's not even a rounding error.
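The arithmetic, spelled out:

```python
# Hourly price gap between Aurora and RDS Postgres (db.r3.8xlarge),
# compounded over a 30-day month. Prices from the parent comment.
aurora_hourly = 4.640
postgres_hourly = 3.980

hours_per_month = 24 * 30
monthly_premium = round((aurora_hourly - postgres_hourly) * hours_per_month, 2)
print(monthly_premium)  # 475.2
```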


For many installations, an Aurora read replica can eliminate the need for a multi-AZ master instance which cuts the cost of your master instance in half.


The failover guarantees with RDS Aurora are substantially better as long as you have 1+ read replicas.

That alone is worth the difference imo.


Very cool, can anyone tell if it supports PostGIS?


From the blog post, they say that's a planned feature.

"We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL[1]."

[1] https://aws.amazon.com/rds/postgresql/


> We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL.

RDS Postgres currently supports PostGIS, so this should as well.


PostGIS is absolutely supported.


Well...that's game changing.


I'd like to do some cost projections. When will this (along with the db.t2.medium instances) be added to the simple monthly calculator?


What sort of transactional support is there? Are Serializable transactions supported?


Aurora will support the same isolation levels as community PostgreSQL: up to Serializable on the writer, up to Repeatable Read on the replicas.


From the article I am not 100% sure whether support for AWS Lambda calls from stored procedures is planned, does anyone else also interpret that it is?


Had really hoped they would announce full mysql 5.7 support.


For Aurora, you mean? They support 5.7 for RDS, but a consequence of forking MySQL for Aurora is that it's unlikely that new MySQL features will quickly be incorporated into Aurora/MySQL.


For Aurora, yes. I understand the tradeoff and happy to stay on Aurora. Found this forum post stating that 5.7 compat will come next year


They already announced a year ago that it's something they're working on, but also that they would not provide a timeline.


How does this compare to an offering similar to Postgres on Compose.com?


is Aurora different from https://aws.amazon.com/rds/postgresql/ ?


No need for multi AZ. Faster, more cost effective. Downtime of less than a minute in the event of a crash. Geographically distributed storage that autoscales up to 64TB.


This isn't true; you still need a read replica if you want it to fail over in under a minute. Otherwise it'll try to create another DB.


From the FAQ

"Q: How does Aurora improve recovery time after a database crash?

Unlike other databases, after a database crash Amazon Aurora does not need to replay the redo log from the last database checkpoint (typically 5 minutes) and confirm that all changes have been applied, before making the database available for operations. This reduces database restart times to less than 60 seconds in most cases. Amazon Aurora moves the buffer cache out of the database process and makes it available immediately at restart time. This prevents you from having to throttle access until the cache is repopulated to avoid brownouts."

https://aws.amazon.com/rds/aurora/faqs/


Also from the FAQ (you are both right)

=========================================================

Q: What happens during failover and how long does it take?

Failover is automatically handled by Amazon Aurora so that your applications can resume database operations as quickly as possible without manual administrative intervention.

If you have an Amazon Aurora Replica, in the same or a different Availability Zone, when failing over, Amazon Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary. Start-to-finish, failover typically completes within a minute.

If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.

Your application should retry database connections in the event of connection loss.
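That last line of the FAQ can be sketched as a generic backoff wrapper. The `connect` argument here is a stand-in for whatever zero-argument connect function your driver provides (e.g. a wrapper around psycopg2.connect); the attempt counts and delays are arbitrary.

```python
import random
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Retry a connection function with exponential backoff plus jitter.

    `connect` is any zero-argument callable that returns a connection
    or raises on failure. During an Aurora failover the CNAME flips to
    the new primary, so retrying for up to a minute or so usually
    rides out the interruption without manual intervention.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            # Backoff: base_delay, 2x, 4x, ... with a little jitter.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```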


No more VACUUM then?


Vacuum is still there. Though in some cases it is significantly faster.



Thank you, we've updated the link from https://aws.amazon.com/rds/aurora.



