A warning about Aurora: It's opaque tech. I've been on a project that switched to it on the recommendation of the hosting provider, and had to switch away because it turns out it does not support queries requiring temporary storage, i.e. queries whose working set exceeds the memory of the instance.
It manifested as the Aurora instances using up their available (meagre) memory, then starting to thrash and taking everything down. Apparently the instances did not have access to any temporary local storage. There was no way to fix that, and it took some time to understand. After reading what little material I could find on Aurora, my personal conclusion is that Aurora is perhaps best thought of as a big hack. I think it's likely there are more gotchas like that.
We moved the database back to a simple VM on SSD, and Postgres handled everything just fine.
We’ve generally been happy with Aurora, but we run into gotchas every so often that don’t seem to be documented anywhere and it’s very annoying.
Example: in normal MySQL, “RENAME TABLE x TO old_x, new_x TO x;” allows for atomically swapping out a table.
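(For readers who haven't used this pattern, here's a rough sketch of how the swap typically goes; the repopulation step is just illustrative:)

    -- Build the replacement off to the side
    CREATE TABLE new_x LIKE x;
    INSERT INTO new_x SELECT * FROM x;   -- or repopulate/transform however you like

    -- Stock MySQL documents a multi-table RENAME TABLE as a single atomic operation
    RENAME TABLE x TO old_x, new_x TO x;
    DROP TABLE old_x;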
But since we moved to Aurora MySQL, we very occasionally get things landing in the bug tracker with “table x does not exist”, suggesting this is not atomic in Aurora.
Is this documented anywhere? Not that I’ve been able to find. I’m fine with there being subtle differences, especially considering the crazy stuff they’re doing with the storage layer, but if you’re gonna sell it as “MySQL compatible” then please at least tell me the exceptions.
I believe this issue is (or was) real. There are important differences in how Aurora treats temporary data. Normal Postgres and RDS Postgres write it into the main data volume (unless configured otherwise). Aurora, however, always separates shared storage from local storage, and it's not entirely clear to me what this local storage physically is for non-read-optimized instance types. The only way to increase it is to increase the instance size. [1][2] This is indeed frustrating, because with Postgres or RDS Postgres you just increase the volume and that's it.
Luckily, since November 2023 there are also r6gd/r6id classes with local NVMe drives for temp files. [3] This should in theory solve the problem, but I haven't tried it yet.
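For anyone trying to pin down this class of problem, the spill to temp files is at least visible through standard Postgres views; a quick sketch (these are core views, so I'd expect them on Aurora too, though I haven't verified what it reports there):

    -- How much each database has spilled to temp files since stats were last reset
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
    FROM pg_stat_database
    ORDER BY temp_bytes DESC;

    -- Log every temp file over 10MB so the offending queries show up in the logs
    ALTER SYSTEM SET log_temp_files = '10MB';
    SELECT pg_reload_conf();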
I think Aurora has to go through the same development process as every database. They changed essential patterns in the database, and there are severe side effects that need to be addressed. You can see the same with Aurora Serverless and the changes in V2; there were some quite quirky issues in the first versions.
Calling it a hack is pretty unfair. The log storage engine is a huge innovation; in my experience it makes large MySQL/pg clusters much more reliable and performant at scale in a variety of different ways.
It has a couple of quirks, but on balance it feels like the future - the next evolution of what traditional rdbms are capable of.
But if you don’t have scale or resiliency needs it probably doesn’t matter to you.
Isn't Aurora mainly about its unique handling of logging and replication, which leads to high availability and fast recovery? If you switch to a VM, how do you handle availability in multiple locations, and backups? If database checkpoints are good enough for you, it sounds like Aurora was overkill in the first place.
I’m going to preface this with: I’m not an ops person. But yes, multi-region. The main writer instance is in us-east-1, and it performs excellently when hitting that region. We have read replicas in us-west-2 and some in Europe/EMEA and Asia/Pacific.
When hitting one of those with a write, you end up with massive delays. The delays of 7 seconds and below tend to be from us-west-2, and the higher numbers are from our Japanese users.
Our ops team has struggled to figure out why the delays happen. There are some code fixes we could probably do (i.e. always write to the writer), but as team lead for the development side, the deadline is too close and I don’t want to rewrite core parts of the app to split reads and writes. They’ve engaged AWS support, so I’m hoping something is just misconfigured, or maybe this just isn’t the use case for Aurora.
Oh. Yeah, the log filesystem, and thus the consistent replica latency, is local to a region.
It sounds like you might be using global Aurora with write forwarding? That’s pretty new and not something I have experience with, sorry. AFAIU, though, it’s a whole different thing under the hood.
> It sounds like you might be using global aurora with write forwarding
Yes I believe this is what they chose. Honestly I’m going to leave it up to them and aws support. I have other fish to fry to get the functionality finished.
Having previously been on several managed PostgreSQL providers and now on AWS Aurora -- Aurora has been pretty great in terms of reliability, performance with large row counts, and upsert performance.
However, Aurora isn't cheap and is at least ~80% of our monthly AWS bill. I wonder how it is cheaper than Heroku's previous offerings? Is it Aurora Serverless v2 or something like that to reduce cost? Aurora billing is largely around IOPS, and Heroku's pricing doesn't seem to reflect that.
Heroku Postgres has always been priced on platform convenience with very high margins. It's been many years now so I don't remember the exact numbers, but I moved a few databases from Heroku to AWS and reduced my DB costs ~90% (magnitude ~$900/mo --> ~$100/mo) for roughly the same specs. They probably have a lot of margin to eat into before they need to adjust prices.
We're using the highest tier Postgres instance at my work for one of our legacy Heroku apps and it costs thousands over what we'd pay for the equivalent on AWS directly.
I'm assuming this means that they are not providing any sort of guarantee on the amount of RAM available and packing these instances as tightly as they can.
“Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application's needs.”
Everything on Heroku is billed with a huge margin, plus, as they're probably a partnered customer by now, their pricing is a fraction of the average AWS customer's pricing. I've been at companies on both sides of the partner pricing list and the difference is huge.
Aurora has treated us well. We make a self-hosted product that requires Postgres; our sales/customer engineering folks just started telling people to use Aurora, and it hasn't caused any problems despite the fact that all of our tests run against stock Postgres. Can't complain. Though a VM with Postgres would be plenty for our needs, and cost thousands of dollars less a month. But, HA is nice if you want to pay for it.
Yeah, that’s what we use as well but I don’t think that addresses the underlying instance cost? I’m not familiar with Serverless v2 though, if that’s what this is using.
Just curious, does Aurora scale down at all in price, i.e. if I have a test instance that's hardly ever used, does it ever end up being cheaper than a classic RDS instance?
Xata is (like Heroku) based on Aurora, but offers database branching and has different pricing. That should be ideal for lightly-used test instances, because you only pay for storage, and 15GB are included in the free tier.
You're thinking of Aurora Serverless, but the typical Aurora customer isn't using the Serverless offering. Additionally, the original version of Aurora Serverless scaled to 0, but v2 doesn't.
Came here to say this. Aurora is good but also very expensive. If your queries are not very well tuned you will pay through the nose for I/O that would be unnoticeable in other Postgres implementations. For a very modest installation I saw DB bills go down from $3,000 a month on Aurora to about $100 for self managed Postgres.
“Not hard” is very relative. Is it hard to run a Postgres database? No.
Is it hard to set up monitoring, automatic failover (without data loss), managed upgrades, tx wraparound protection, clustering and endpoint management, load balancing… and probably a bunch of other things I’m not thinking of? Yes.
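(Just to illustrate the wraparound item: the check itself is a couple of lines of standard Postgres; the work is in automating the monitoring, autovacuum tuning, and response around it.)

    -- Transaction ID age per database; alert long before this approaches ~2 billion
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;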
So figure it out. I don’t understand why “ugh this is hard, I’ll pay someone else” has become the norm. You’re working in one of the most technically advanced fields in the world; act like it.
Most people aren't doing anything advanced. Also this has nothing to do with not wanting to do "hard things", that's ridiculous, it's a postgres cluster, you're not doing a PhD in math. People do it because there's limited time and no business advantage to operate postgres clusters. Use the time on what your business actually does.
> that's ridiculous, it's a postgres cluster, you're not doing a PhD in math.
It's not as difficult as a PhD (I assume; I only got as far as an MS), but based on what I've witnessed, it's up there in complexity. There are dozens of knobs to turn – not as many as MySQL/InnoDB to be fair, but still a lot – things you have to know before they matter, etc.
> People do it because there's limited time and no business advantage to operate postgres clusters. Use the time on what your business actually does.
I've seen this argument countless times for SaaS anything. I don't think it's accurate for a database. Hear me out.
For most companies, the DB is the heart. Everything is recorded there, nearly every service's app needs it (whether it's a monolith or microservice-oriented DBs), and it's critically important to the company's survival. Worse, the skills necessary for operating your own DB overlap heavily with those for using a DB well, by which I mean that if you're good at things like DB backup automation, chances are you're also good at query optimization, schema design, etc.
It's that latter part that seems to be missing from many engineering orgs. "Just use Postgres," people say; "just add a JSONB column and figure out the schema later," but later never comes. If your business uses a DB, then you do not have the luxury of running one poorly. Spend a few days learning SQL, it's an easy language to pick up. Then spend a few days going through the docs for your DB, and try the concepts out in a test instance. Your investment will be rewarded.
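To make that concrete, here's the sort of five-minute exercise I mean, against a hypothetical orders table (the names are made up):

    -- Does the query use an index, or scan the whole table?
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders
    WHERE customer_id = 42 AND created_at > now() - interval '7 days';

    -- If it's a sequential scan, a composite index usually fixes it
    CREATE INDEX CONCURRENTLY orders_customer_created_idx
        ON orders (customer_id, created_at);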
You're at a more surface level than what I'm talking about. Your advice at the end is just common sense advice for anyone using any tool. It doesn't mean you should spend time implementing your own custom backup process with ability to go back to a specific point in time, configurable in 1 minute. The amount of work needed to operationalize postgres in the same way and expose it to other teams in a company will take long enough that you won't get an Aurora-like experience in less than a quarter with a full team. What could they be doing instead to your product?
All I'm saying is it has nothing to do with difficulty. In my job for example we self-hosted HBase which is a beast compared to postgres, implemented custom backups etc, all because there was no good vendor for it. Postgres is much simpler and we always just used RDS and then switched to Aurora for the higher disk limits when it was launched. If there's a good enough vendor, you're just stroking your ego re-implementing these things when you could move on to the actual thing the business wants to release.
I've also seen senior engineering leads "proving" that self-hosting "saves money", but then of 2 companies working on the same type of problem in the same industry with a similar feature set, one had 5 people maintaining what took the other 6 teams of 4-8 people. So it depends on whether you'd like a lot of your labor focused on cutting costs or on increasing revenue. And they never include the cost of communicating with 5 extra teams and the increased complexity and slowness to release things this creates, while also being harder to keep databases on current versions, more flimsy backup processes, etc.
PS: we got rid of HBase; do yourself a favor and stay away.
> Your advice at the end is just common sense advice for anyone using any tool.
Common sense isn't so common. I've met a handful of devs across many separate companies who care at all how the DB works, what normalization is, and will read the docs.
> It doesn't mean you should spend time implementing your own custom backup process with ability to go back to a specific point in time, configurable in 1 minute.
If by implement you mean write your own software, no, of course not. Tooling already exists to handle this problem. Off the top of my head, EDB Barman [0] and Percona XtraBackup [1] can both do live backups with streaming, so you can back up to a specific transaction if desired, or to a given point in time.
Or, if you happen to have people comfortable running ZFS, just snapshot the entire volume and ship those off with `zfs send/recv`. As a bonus, you'll also get way more performance and storage out of a given volume size and hardware thanks to being able to safely disable `full_page_writes` / `doublewrite_buffer`, and native filesystem compression, respectively.
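To spell out the Postgres side of that, a sketch (only do this when the filesystem genuinely prevents torn page writes, as copy-on-write ZFS does; on ext4/xfs leave it alone):

    -- Safe only on filesystems with atomic page writes (e.g. ZFS); otherwise keep it on
    ALTER SYSTEM SET full_page_writes = off;
    SELECT pg_reload_conf();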
> If there's a good enough vendor, you're just stroking your ego re-implementing these things when you could move on to the actual thing the business wants to release.
Focusing purely on releasing product features, and ignoring infrastructure is how you get a product that falls apart. Ignoring the cost of infrastructure due to outsourcing everything is how you get a skyrocketing cloud bill, with an employee base that is fundamentally unable to fix problems since "it's someone else's problem."
> Ps: we got rid of hbase, do yourself a favor and stay away
HBase and Postgres are not the same thing at all. If you need the former you'll know it. If people convince management that they do need it when they don't, then yeah, that's gonna be a shitty time. The same is true of teams who are convinced they need Kafka when they really just need a queue.
My overall belief, which has been proven correct at every company I've worked at, is that understanding Linux fundamentals and system administration remains an incredibly valuable skill. Time and time again, people who lack those skills have broken things that were managed by a vendor, and then were hopelessly stuck on how to recover. But hey, the teams had higher velocity (to ship products with poor performance).
Have you ever been paid to do work before? There’s a price at which a business will prefer to pay to have SaaS/PaaS solve a problem. Allocating engineering hours to setting up and maintaining a Postgres cluster has a cost. You’ll want someone senior on it. Their time could be well over $100/hour. And that’s assuming your business is small enough to only need one DBA part time. A business that’s spending a ton on Aurora might need 3 specialists. Now you’re talking about hundreds of thousands of dollars per year. It could be better to just pay AWS.
However, at large scales cloud won’t make sense anymore. They do have a markup and eventually what you’re paying in markup could instead buy you a few full time employees.
Yes, many times, which is why I've developed this opinion.
> However, at large scales cloud won’t make sense anymore. They do have a markup and eventually what you’re paying in markup could instead buy you a few full time employees.
The issue is once you've finally realized this stuff matters, and have hired a DB team, I can practically guarantee that your schema is a horror show, your queries are hellish, and your product teams have neither the time nor inclination to unwind any of it. Your DB{A,RE}s are going to spend months in hell as they are suddenly made the scapegoats for every performance problem, and are powerless to fix anything, since their proposals require downtime, too much engineering effort, or both.
Hence my statement. Learn enough about this stuff so that when you do hire in specialists, the problems are more manageable.
You need to do things that are appropriate for a small company when you’re a small company. And then if you become a large company you change things to suit your new scale.
All of the troubles you described sound like bad management. I’m sorry if you’ve had to go through that. DBAs that are setting up a replacement are going to need time to do that right and expectations need to be set that this is a tricky problem.
You clearly don’t know what Aurora is or does if you think people can just run their own. It’s not a regular Postgres setup, and nothing equivalent exists for self-hosting.
Heroku product here: the Essential 0 and 1 plans replace the older row-limited Heroku Postgres mini/basic plans at the same price points, but with better perf in a lot of scenarios and a storage limit instead of a row limit - forcing people to denormalize for row count wasn't ideal under the old mini/basic limits. The Essential-2 plan is a new option for a larger pre-prod/test/small-scale DB above what we offered before.
We're expanding the Aurora-backed offerings to include larger dedicated DBs in the relatively near future as well.
I was for quite a few years - moved to Heroku in Q3 last year for an interesting opportunity. Apex is in good hands with Daniel Ballinger (and I stay in touch with a bunch of team if that helps).
These "Essential" tiers are bare bones instances for toys/mvps, they're much different than the bigger ones. No replication, 99.5% uptime target, no maintenance windows etc.
I'm curious how the Essential plans work, given that Aurora pricing starts higher than that in monthly costs. It is probably databases in a shared multi-tenant Aurora instance, and then the single-tenant plans that are currently in pilot give you the full Aurora instance. That also explains some of the limitations and the low connection limits.
We do, although we're in the middle of moving our entire Heroku Postgres spend over to Crunchy Data [1].
We were getting close to one of the big jumps on the standard pricing of Heroku Postgres, and we would have had to basically double our monthly cost to lift the max data we could store from 1.5TB to 2.0TB. On Crunchy Data, that additional disk space will be like 1% more rather than 100% more.
While investigating Crunchy, I ran some benchmarks, and I found Crunchy Bridge Postgres to be running 3X faster than Heroku Postgres.
Heroku seems to be working on some interesting new things, but I feel burned by the subpar performance and the lack of basically any new features over many years. I don't know if the new Aurora-based database will be faster than Crunchy, but the benchmarks they're talking about sound like they're finally about to catch up to them. We also get better features on Crunchy, such as logical replication, which is still not available on Heroku.
The experience for deploying apps and having add-ons is still pretty easy, but we'll see how that improves. HTTP2 support is still in beta.
My experience with going from Heroku Postgres to Crunchy Data (specifically Crunchy Bridge) has been really good. Their product has been absolutely rock solid, but what really made the difference was their support. They provided a huge amount of pre-sales support while I planned the move (and even suggested mitigations for the problems I was having with Heroku Postgres to make moving less urgent). Post-sales support has been just as good, though mostly I don’t even have to think about the database hosting anymore.
I also moved my app hosting to NorthFlank from Heroku and have been really happy with that as well. It’s got the features I always wanted on Heroku (simple things like grouping different types of instances together into projects really helps) plus again excellent responsive support.
Our experience of moving from Heroku to CrunchyBridge has been very similar - excellent help with the migration including jumping on a call with us during the switchover to resolve a broken index.
Would strongly recommend them to anyone looking to move off Heroku.
I was a bit concerned about the cut-over from the old database on Heroku, really wanted to minimise downtime. So they helped me produce a step by step plan, test as much of it as possible, then had an engineer join me on Zoom while I made the switchover. They were even able to accommodate doing it in the early morning in my timezone to minimise the impact. Ended up with maybe 5 mins of downtime, which I was very happy with.
I'm working in a new startup, and I tried several "easy" solutions: AWS Lightsail, Heroku, Crunchy.
I settled on AWS ECS :)
My main issue with Heroku was that they have not changed anything in _years_. No support for gRPC, no IPv6, and simple VPC peering costs $1200 a month.
Yeah, the lack of HTTP/2 support has been a long-standing issue with Heroku.
They just shipped HTTP/2 terminated at their router [0], and have it on their roadmap [1] to support HTTP/2 all the way through. But it seems like it's at minimum a few months off.
(As for VPC peering: the moment you need that, it sorta feels like Heroku is no longer the right place to be, even ignoring the costs.)
Update to this: we've switched over our staging database, and the call to do that couldn't have been more productive or more pleasant.
I got to talk to someone who was knowledgeable about Postgres, who answered various questions I had, who offered a few pieces of insight that I wouldn't have thought of, etc.
Compared to every single support interaction I've had with Heroku over 10 years, this was light years more friendly, informative, and productive.
Despite interesting competition, my feeling is that the Heroku of 2024 remains... Heroku.
I feel this way even though -- depending on how you segment -- the list of "interesting" competitors is quite long at this point: Render, Railway, Northflank, Fly.io, Vercel, DO App Platform, etc.
I revisit Heroku alternatives every ~6 months and I am shocked at how unergonomic they still are. I switched to a DO VPS + Ansible Container & GitHub Actions for any project that doesn't need infinite scale after Salesforce paused Heroku development, but I'd go back to literally any Heroku clone.
It's crazy how the ergonomics still just aren't there.
Yes, completely agree; I'm equally surprised by the poor DXes.
(And: bugs. I'm also surprised by the kinds of issues I run into on some of those sites in my list -- problems that, even if not show-stopping, feel like revealing indicators of quality.)
> Yes, completely agree; I'm equally surprised by the poor DXes.
Any specifics/examples? I find it hard to imagine those "big name" companies/platforms you just mentioned don't have entire teams dedicated to hyper-optimizing experience.
Can you elaborate a bit more on why Render is good? We are on Heroku and I have evaluated alternatives every 6 months since the Heroku/GitHub outage 2 years ago [1]. But I don't see how Render is better. 2 years ago Render Postgres did not have PITR; now they have built it, but Render's Postgres offering is even more expensive than Heroku's, and queries run a bit slower on similarly specced machines based on my tests. I also don't like that Render charges per seat in addition to infra cost.
10yo+ B2B SaaS company we're still on Heroku. I think the value prop is particularly good for B2B SaaS and probably less so for consumer products. Our margin per customer is so much higher than the infra cost it just never makes sense to spend money on devops instead of building features.
That said it does feel a bit like a ghost town, I'm always happy to hear when someone is doing something over there.
A few years ago I was considering Heroku for something new. But then I learned that Heroku Postgres's HA offering used async replication, meaning you could lose minutes of writes in the event that the primary instance failed. That was a dealbreaker.
That was very surprising to me. Most businesses that are willing to pay 2x for an HA database are probably NOT likely to be ok with that kind of data loss risk.
(AWS and GCP's HA database offerings use synchronous replication.)
Noticed this too. The master failover is marketed as a strict upgrade, and the "async" part is only in the fine print. Many would actually prefer the downtime over losing data. A user who's experienced with DBs should think to check on this, but still.
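If your provider exposes enough to run it, the replication mode is easy to verify from standard Postgres views; a quick sketch (managed platforms may restrict these):

    -- 'async' in sync_state means acknowledged commits can still be lost on failover
    SELECT application_name, state, sync_state FROM pg_stat_replication;

    -- The primary-side setting that controls synchronous replication
    SHOW synchronous_standby_names;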
I’ll just share our experience with Hetzner from earlier this week to spare everyone the learning experience. Their prices are indeed incredible. However, we spun up a VPS in their Hillsboro, OR data center only to find out that our IP address was blocked by Cloudflare, so there was no way for us to connect to our error logging or transactional email providers. We also found this thread indicating that their entire IP range for that DC is widely blocked [0]. So, not really acceptable for professional use, IMO, unless you just need compute.
So in that case both Mailgun and Sentry were blocking our IP. When we googled the error message, it appeared to be a Cloudflare message of some kind. But anyway, I wouldn’t touch them with a ten-foot pole for anything web related.
Probably those who don't want to set up uptime alerts, fine-tune configs, or set up backups and restores (which are essential, because sooner rather than later someone always deletes a few rows/tables), and who want to focus on the business.
It's really easy to be SOC 2 compliant for a small SaaS on Heroku. We'd need to grow in customers and dev resources to pull it off on raw AWS. I am looking for options though, because Heroku is increasing their prices.
me. tried to move to render but it's been a headache for some key things that i need. my heroku setup is dialed in so it makes it a no brainer versus the time i've wasted trying to get render to fit my use case. right now i'm using both for two different services, and will consider moving off both once i get enough customers.
Nothing sophisticated. My woes might be because I'm bad at devops.
But I spent several hours fighting with a DNS change, trying to host my marketing website as Cloudflare pages site from my root domain (with DNS managed by Cloudflare), and then wildcard subdomains routed to a Render server. I couldn't get it to work no matter what configs I tried. My root domain marketing site is proxied through Cloudflare and I was trying to get the wildcard subdomains as DNS-only, and I suspect this was the problem but idk. In other words, the Cloudflare pages marketing site is https://bookhead.net and I wanted my customer's subdomains to route to the Render server like https://forlornbooks.bookhead.net/ (I still get the error since I haven't finished my migration to Heroku). The subdomains worked with no problem until I tried to setup a separate marketing site at the root domain.
Also, I had a hard time setting up SSH with a containerized server. It was a weird DX that was a bit confusing to document so I can remember later. Can only imagine how confusing it might be if I ever have teammates. The Render CLI looks promising, though.
These are only the most recent issues. Seems like y'all have addressed the headaches I ran into the last time I tried.
RDS is such a depressing database option. It does not matter how much money you throw at it, its performance will always be limited by the awful disk IOPS. Luckily these days you can easily run PG on EC2 (or simply use CRDB).
Weird for Heroku to ignore this huge efficiency opportunity.
io2 is generally better than io1, one advantage is that you can scale storage size and IOPS independently. That being said, RDS with io2 is still worse than an ec2 instance with nvme (a lot worse)
It provides a lot of benefits to the user and also a ton more to the service provider. Specifically, you don't overprovision storage or compute. Plus, at least theoretically, you can provide infinite IO throughput at the storage level.
Happy Aurora Postgres Serverless customer here. Be sure to use pgbouncer (self-hosted, but it needs minimal babysitting) if you intend to use it in a serverless environment (and even if you're not, the benefits of not having to worry about connection pool exhaustion are still worth it). AWS's proxy doesn't work too well with prepared statements under connection pooling (it runs into something known as connection pinning).
After Heroku pulled their stunt in India when the RBI changed some credit card rules, Heroku is pretty much history for me. They screwed small customers, getting rid of them by saying they couldn't charge credit cards under the new regulations. However, they continued to entertain large customers from India.
I'm not sure what you mean by "running your own on native NVMe" Are you talking about using the managed AWS Relational Database Service or something else? Aurora can also use instance types with NVMe.
Anyway, that's also ignoring the features that Aurora offers, which is why people pay more for it. The ability to have multi-AZ deployments and auto-scaling of (what can be cross-region) read replicas make it very resilient and it's dead simple to operate what would normally be considered advanced features of a DB cluster.
If you just need a managed Postgres or MySQL traditional single instance and none of those extra features, then obviously you would not need to pay the premium for Aurora. RDS exists for that reason.
I was referring to running your own DB on hardware with NVMe drives. Obviously you lose every nicety of managed services, but tooling exists to replace it, and you gain stupid amounts of performance.
RDS Multi-AZ Cluster gives you much of the advantages of Aurora, but with higher performance and more tuning capabilities, though you are limited to 3 nodes. Tbf 3 nodes is almost certainly enough for most companies. A few hundred thousand QPS would be easily handled by that.
Re: cross-region read replicas, eh… if you’ve somehow managed to ensure that every single aspect of your app is capable of withstanding the loss of an entire region – including us-east-1, since most of the control plane functions are there – then sure, maybe. But do you need it? If an entire AWS region drops out, half of the internet goes with it, and you can just blame that. I doubt the small possibility of higher uptime is worth the literal doubling in monthly costs.
To be clear, my comment stated RDS is better "in almost every aspect." Aurora is better at one [0] thing – storage scaling. You do not have to think about it, period. Adding more data? You get more storage. Cleaned out a lot of cruft? The storage scales back down.
Aurora splits out the compute and storage layers; that's its secret sauce. At an extremely basic level, this is no different from, for example, using a Ceph block device as your DB's volume. However, AWS has also rewritten the DB storage code (both MySQL/InnoDB and Postgres). InnoDB has a doublewrite buffer, redo log, and undo log. Postgres has a WAL. Aurora replaces all of this [1] with something they call a hot log. Writes enter an in-memory queue, and are then durably committed to the hot log, before other asynchronous actions take place. Once 4/6 storage nodes (which are split across 3 AZs) have ACK'd hot log commit, the write is considered persisted. This is all well and good, but now you've added additional inter-process latency and network latency to the performance overhead.
Additionally, the storage scaling I mentioned brings with it its own performance implications. If you're doing a lot of writes, you'll encounter periodic performance hits as the Aurora engine allocates new chunks of storage.
Finally, even for reads, I do not believe their stated benchmarks. I say this because I have done my own testing with both MySQL and Postgres, and in every case, RDS matched or beat (usually the latter) Aurora's performance. These tests were fairly rigorous, with carefully tuned instances, identical workloads, realistic schema and queries, etc. For cases where pages have to be read from disk, I understand the reason – the additional network latency of the Aurora storage engine seems to be higher than that of EBS. I do not understand why a fully-cached read should take longer, though.
As a further test, I threw in my quite ancient Dell servers (circa 2012) for the same tests. The DB backing disk was on NVMe over Ceph via Mellanox, so theoretical speeds _should_ be somewhat similar to EBS, albeit of course with less latency since everything is in a single rack. My ancient hardware blew Aurora out of the water every single time, and beat or matched RDS (using the latest Intel instance type) almost every time.
[0]: Arguably, it's also better at globally distributed DB clusters with loose consistency requirements, because it supports write forwarding. A read replica in ap-southeast-1 can accept writes from apps running there, forward them to the primary in us-east-1, and your app can operate as though the write has been durably committed even though the packets haven't even finished making it across the ocean yet. If and only if your app can deal with this loosened consistency, you can dramatically improve performance for distant regions.
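From memory, the client opts into a consistency level per session for forwarded writes on the Aurora MySQL side; treat the exact variable name below as an assumption and check the current docs (the table is made up):

    -- Hypothetical session on a secondary-region reader with write forwarding enabled;
    -- 'SESSION' = read-your-own-writes after forwarding, 'EVENTUAL' is faster but looser
    SET aurora_replica_read_consistency = 'SESSION';   -- variable name from memory, verify in the docs
    INSERT INTO carts (user_id, item_id) VALUES (123, 456);   -- forwarded to the primary region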
Google has 2 Postgres implementations: Cloud SQL and AlloyDB. How do they compare against AWS Aurora, for the heroku scenario, i.e., multi-tenant database?
Here is the English translation: "Amplify Might Be AWS's Worst Service, Bar None"
Confusing documentation, a mix of old and new systems, and it made a mess of my AWS account.
To put it simply, over the past two days, I attempted to deploy a full-stack assignment on AWS services. The front end was written in React, using Vite as the framework. For such Single-Page Apps (SPAs), I personally prefer using specialized services like Netlify or Cloudflare Pages for deployment, as these services offer very robust CI/CD services, allowing for one-click deployment and automatic updates, saving a lot of hassle.
Initially, I planned to manually deploy on AWS using the S3 + CloudFront model (since it was just a one-time assignment), but later I discovered that AWS has a service very similar to Netlify called Amplify, which also offers CI/CD one-click deployment services. Amplify goes even further by including user directory services, allowing for one-click registration and login via related components.
It sounds great, but you only realize how problematic it is after using it. After some research, my initial deployment method was to upload the code to GitHub and then click the deploy button in the Amplify interface. This is also the deployment method I use most often with Netlify.
However, I later found something wrong. The key issue was that applications deployed this way using Amplify couldn't directly use Amplify's UI components to access Cognito user directory services. After much searching, I found that Amplify has an Amplify CLI initialization command to create a new CI/CD project in the Amplify service, which also deploys additional resources like Cognito.
It seemed feasible, so I did it. Then I found some issues. The initial "issues" were just on the AWS account management level: after deploying the project via Amplify CLI, my AWS account quickly filled up with a bunch of "things"—the reason "things" is in quotes is that Amplify created a lot of fragmented resources, including but not limited to CloudFormation, IAM roles, etc., even creating two Cognito identity pools for me—it's hard not to call them "junk." Moreover, most of these resources have names that are impossible for humans to remember or distinguish, and there are no explanations or grouping features to tell you what these things are for.
If it were just like this, it wouldn't seem to impact the development process, right? The biggest problem is that the local debugging and production environment apparently don't use the same configuration files, and when I was cleaning up the automatically created resources in my AWS account earlier, I somehow deleted the roles calling the Cognito user pool in the production environment, causing the production environment to be unable to access the two user pools created by Amplify, constantly throwing 400 errors.
After several rounds of "deploy-delete-redeploy-redelete," I decided to start over and look for the related documentation again. Later, I found that Amplify has a set of documentation outside of AWS's own documentation system, and this documentation recommends a deployment method: clicking the deploy button on the GUI webpage—yes, you heard it right, the same deployment method I used initially.
So, how do you deploy additional components/services like Cognito this way? Amplify's answer is configuration files. As long as you create a folder for configuration files in the root directory of your project and write the corresponding configuration files in it, the cloud will automatically create the resources you need in AWS once it reads them.
It sounds reasonable, right? Then you go to find the part about configuration files in the documentation... What's going on? Why can't I find anything in the search box in the documentation? There's not even a sample configuration file! Algolia indexing service can't be this bad, right?
Searching for "defineAuth" in the Amplify official documentation returns mostly irrelevant information.
Is my search method incorrect? I entered keywords like "site.amplify.aws defineAuth" in the Kagi.com search engine but couldn't find any examples or explanations of the configuration file items. At this point, I'm completely convinced that the Amplify documentation is garbage. Fortunately, the API documentation of the Amplify framework is quite good, at least reducing my urge to buy a ticket to the US and blow up Amazon's headquarters while guessing at the configuration file items...
Also, Amplify has a UI that is completely different from, and more modern than, other AWS services. The discrepancy itself is a minor issue; the main problem is that if you create a project using the (slightly outdated) Amplify CLI and then try to configure the back-end services like Cognito that it deployed via the web page, you'll land in an old interface. That is, once you click in, you see a slightly ugly but familiar interface, yet it feels completely disconnected from the previous Amplify interface...
So now I understand why I hadn't heard of Amplify before—it's really hard to use. Complete integration is indeed an advantage, but even being born with a silver spoon doesn't excuse Amplify's messiness, simply throwing everything together and telling users "it just works." Users look at it, wondering what on earth all these things are, and then you hand them a manual that looks fancy but has zero information. Users, flipping through this tome with no useful information, can only throw this pile of stuff into the historical junk heap behind them in frustration.
I don't know if it's still the case but a few years ago all major cloud providers were easily giving away thousands of dollars in cloud credits. I expect them to stop this soon since smaller cloud players build on top of them and offer better dx and startups prefer to work with these smaller companies despite free credits from larger players.
As a startup boy we are happily chewing through hundreds of thousands in GPU credits across all major cloud platforms + lambda labs.
And once those credits run out we are planning to expand our owned training hardware. Currently we just have 3x L40S but would expand to 32x L40S. I’m excited to now be a sys admin in addition to a full stack web dev.
Largely enterprise spend. Startups are a different market segment. They initially have small budgets, and eventually fail or grow large enough to move to different products.