PostgreSQL upgrades are hard (scherbaum.la)
237 points by omnibrain on June 25, 2022 | 120 comments



I upgraded my development database from 12 to 15-beta this week. It was super painful, even with pg_upgrade. I can't understand why newer Postgres releases can't include compatibility for older storage formats. This is the number 1 weakness in PostgreSQL. I prefer not to upgrade major versions for years because it's so painful.


That seems strange; the only incompatibilities between major versions are generally the catalogs.

I don't remember any version whose storage format cannot be read by the next version (none since 2010). Could you add some more info about what issues you had?


I'm not the person you responded to, but I've never been able to get `pg_dump -Fc` and restore to work across versions. It always errors out for me, saying it's an invalid/broken backup. Super annoying, because now the workflow is: start the old version just to restore, remove the old version, and start the new version for the upgrade.


You need to use pg_dump from the new version, not from the old version.


I think you found a common stumbling block. The tutorials people follow may just say to run pg_dump, and many people would assume they should run the one from the old version.


Wait, what? Explain please.

OK upon further thought, what you're saying is to use the pg_dump tool that comes with the pg version you are upgrading to.

I read it as "use pg_dump instead of pg_restore"...


Yes. Put concisely: Use the client tools from the newest version involved for all steps.
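
For example, a rough sketch (hostnames and database name are made up, and this assumes the new version's client binaries are the ones on your PATH):

  pg_dump --version                     # confirm this really is the new version's pg_dump
  pg_dump -Fc -h old-server -d mydb -f mydb.dmp
  pg_restore -h new-server -d mydb mydb.dmp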


Isn't that impractical for most Linux distros? You'd need to manually install the new client.


On Debian (and probably most of its derivatives), the package manager will not remove any running PostgreSQL versions and will let you have multiple versions side by side. It also has its own tool pg_upgradecluster, which can perform the upgrade using pg_dump+pg_restore or pg_upgrade (optionally with link or clone options).
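
Roughly like this (a sketch; "main" is the default cluster name on Debian, and the exact options are documented in pg_upgradecluster(8)):

  pg_lsclusters                                       # list installed clusters and their versions
  sudo pg_upgradecluster -m upgrade --link 12 main    # upgrade 12/main via pg_upgrade with hard links
  sudo pg_dropcluster 12 main                         # drop the old cluster once you're happy with the result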


Probably? Certainly if you rely on distro packaging for it, yes.

I was just trying to simplify the guidance here. :)


Yes, and unfortunately it usually means you need to use the full binary path (because the Debian wrappers in the package select the version based on your cluster on port 5432).
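
e.g. calling the versioned binary directly instead of the wrapper (version and paths here are illustrative):

  /usr/lib/postgresql/15/bin/pg_dump -Fc -p 5432 -d mydb -f mydb.dmp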


I've used dumps as backups and never experienced this issue, although they would stop prematurely from time to time. The solution was just to check the string at the end and repeat the dump if it wasn't there.


Considering that -Fc is a binary format, how would I check a string at the end of it?


If you do an SQL dump, it will end with "-- PostgreSQL database dump complete".

If you use -Fc, you can make an SQL file from the dump using pg_restore like this:

  pg_restore -f db.sql db.dmp
Or straight to stdout like this:

  pg_restore -f - db.dmp | grep "database dump complete"
Although I guess if it's corrupted pg_restore won't work.

It is really fast, but might not be practical if your database is too big.


My issue was indeed the mishmash of versions, as the restore succeeded with the mentioned steps (installing old version, restoring, upgrading).

Nonetheless, an interesting way to validate a backup, though it would probably take too long on most dumps.


I'm curious as to what could have gone wrong. A few weeks ago I upgraded from 9 to 14 using pg_upgrade and had no issue whatsoever.


I think the issue is if you want zero-downtime (or close to zero downtime). pg_upgrade can take upwards of 10 minutes to run even for relatively small databases. It can potentially be hours for large ones.


If you use --link, it uses the existing data files instead of copying, which is significantly faster (though I'm not sure what your definition of relatively small is).
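
A minimal invocation looks roughly like this (paths and versions are illustrative, and both clusters have to be stopped first):

  pg_upgrade --link \
    -b /usr/lib/postgresql/12/bin  -B /usr/lib/postgresql/15/bin \
    -d /var/lib/postgresql/12/main -D /var/lib/postgresql/15/main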


Doesn't this entail going 9.4-10, 10-11, 11-12, 12-13 and 13-14? (and possibly across various 9.x versions)

That would be enough to qualify as "super painful" for me


No. To be precise, it was 9.6.6 or 9.5.3 (I don't recall which one exactly because both were installed) to 14.0. And then I upgraded to 14.4, but that only required a compilation.


The upgrade path using logical replication is critical for minimizing downtime. The others should not be considered for production workloads.

As far as RDS goes, it amazes me that, even with replicas, there are NO options for doing a zero-downtime upgrade, which is a major gotcha.

By this point they should buffer the writes during the critical section when the switchover happens and make this seamless for users and operators.


It's basically impossible to do a zero-downtime upgrade with an RDS. Systems that have zero-downtime upgrade "solve" this problem by not being relational (because it pushes the fault-tolerance to the query developer when they have to build their own relations across systems that are defined by API contract to not always be available).


> "solve this problem by not being relational"

That has nothing to do with this. All databases are fundamentally the same regardless of data model. The standard process is to start a new version, replicate the existing database, then cutover once the replica is caught up. Advanced systems can seamlessly switch the primary so it'll redirect new queries to the new primary upon that cutoff.

This has been done for decades through external tooling across many systems, including Postgres. However it would be easier if the database included this functionality internally.
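
In Postgres terms that's logical replication plus a cutover; a rough sketch (connection strings and object names are hypothetical, and you still have to copy the schema and sync sequences yourself):

  # copy the schema to the new-version server first
  pg_dumpall --schema-only -h old-host | psql -h new-host

  # publish on the old primary, subscribe from the new one
  psql -h old-host -d app -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"
  psql -h new-host -d app -c "CREATE SUBSCRIPTION upgrade_sub CONNECTION 'host=old-host dbname=app' PUBLICATION upgrade_pub;"

  # once the subscription has caught up, stop writes briefly and point clients at new-host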


Hypothetically it should be possible to make an entirely new RDS cluster as a replica at a new version and fail over to it, with a similar error rate to a normal replica failover.

Setting up the infrastructure to manually manage your own cluster failover would kinda go against the spirit of using RDS and letting AWS manage infrastructure for you, though.


Yes, it's an obvious gap in their offering. I assume they wrap pg_upgrade in their RDS upgrade process, but of course it is in-place and requires downtime. Their multi-AZ replication is a high-availability solution, but the primary and secondary must both be the same version of PostgreSQL, so it's useless for major upgrades. For our 9.6 -> 13.4 upgrade, we had the added complication that we used PostGIS, which added a kink to the upgrade path. A reliable, caveat-free, simple AWS-provided logical replication solution for zero-downtime cluster upgrades was sorely missed, and the downtime on the recommended path was painful.


As a counterexample, there's COMDB2, which has zero-downtime upgrades, because the client (library) maintains the transaction state and can replay it when the server comes back online.


Yeah, there isn't much publicly available. We wrote a tool for this at Instacart; it relies on some internal tooling, but much of it is generalizable (assuming you run pgbouncers). This is a good reminder for us to write a blog post or something.
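
One generic pattern with pgbouncer (leaving our internal details aside; port, database name and admin user are placeholders, and the user has to be in pgbouncer's admin_users) is to pause traffic around the cutover so clients see a brief stall instead of errors:

  psql -p 6432 -U pgbouncer pgbouncer -c "PAUSE mydb;"    # hold new queries, let in-flight ones drain
  # repoint mydb's entry in pgbouncer.ini at the new primary, then:
  psql -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"
  psql -p 6432 -U pgbouncer pgbouncer -c "RESUME mydb;"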


Right now as we speak, our DBAs are applying patches to our SQL Servers. SQL Always On seems to solve the problem pretty well. Granted, this is a minor upgrade, but even so, we've not had any database downtime for any of the updates we've applied in the last year+ since we got this set up.


Some time ago I wanted to update my self-hosted PostgreSQL containers (for stuff like Nextcloud, and Gitea) and did not find a good solution to running pg_upgrade in such an environment. To that end I created a little container [0] that tries to upgrade the database from one version to another. Maybe this is useful to some of you :)

[0] https://github.com/alexandrospanagiotidis/postgres-upgrader


What's the point of running a database in a container?

There are well tested distro packages for databases which get security updates automatically and none of the weirdness of container environments.

Some people need to realize using the buzzword of the moment isn't necessary where it isn't.


I wanted <feature> (now lost to the mists of time), but was on Ubuntu 18.04, which only has postgresql-10. So I installed Postgres (13 or 14, I believe) from a PPA and happily used it.

When I finally went to upgrade Ubuntu, do-release-upgrade literally bailed out because Postgres was installed[1]. So my choices were: uninstall Postgres for the upgrade, or do something like containerize it. I decided I had had enough problems with non-standard repos and Ubuntu upgrades (e.g. a ton of pain with upgrades of Unifi Controller + MongoDB + Ubuntu), and containerized it. In comparison to my upgrade to 20.04, my upgrade to 22.04 was much cleaner.

If you're going to tell me I should have used VMs or some hosted services, yes, thank you, I don't have the hardware to do so at home and I didn't want to spend more money on the cloud.

There are use cases for containers that have nothing to do with chasing buzzwords.

[1] Not my mail, but representative: https://www.postgresql.org/message-id/28768e4f-f6f7-ad78-d0d...


It's easier to run multiple clusters, to do testing, development and more elaborate networking.

Some people need to realize that distro packages aren't good in all use-cases.


Unbundling OS updates from service updates is a huge benefit. No one wants to have to juggle updating the OS and migrating databases at the same time.


For one thing, the isolation. I also run a “home lab”, and started off as a newbie running everything on bare metal. I quickly ran into dependency issues, and recreating the setup was painful, even if I had used something like Ansible.

Nowadays my setup is a very minimal server install: set up “raw” services like SSH, basic monitoring, mDNS. Then install Docker and “up” all my compose yamls, pointing to persistent directories on disk.

It’s easy enough that migrating from Centos Stream to Ubuntu server took 24hrs for the meat of it
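
For reference, one of those compose files is roughly equivalent to this docker run (image tag, password and host path are placeholders):

  docker run -d --name postgres \
    -e POSTGRES_PASSWORD=changeme \
    -v /srv/postgres/data:/var/lib/postgresql/data \
    -p 5432:5432 \
    postgres:15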


> Some people need to realize using the buzzword of the moment isn't necessary where it isn't.

Please edit swipes out of your HN posts. Your comment would be fine without that last sentence.

https://news.ycombinator.com/newsguidelines.html


Consistency with dev and prod, and getting scheduling and management for free with Nomad. I've been running a 2 TB Postgres database that averages 2000+ QPS in a container for many years now with no issues. What kind of weirdness are you referring to?


PostgreSQL upgrades are a piece of cake. pg_upgrade just works. Or you could just dump and load the database (maybe even pipe it from one to the next). Minor upgrades are even simpler: forgot to compile a module? make and move the .so to lib/.
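
The pipe variant, as a sketch (assuming the old cluster listens on 5432, the new one on 5433, and you run the new version's binaries):

  pg_dumpall -p 5432 | psql -d postgres -p 5433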

Oracle upgrades are the real nightmare.


> Or you could just dump and load the database

This is sometimes not so easy if the database is huge and you can't have a downtime. The article is also mostly focused on things related to running a replicated setup which makes things a lot harder than pg_upgrade that you "just" run on your production database.

PG upgrades work well and the pg_upgrade tool works well but it's not just something you run on a Friday evening if it's bigger than a side project with 2 users.


If you "can't" have any downtime at all then no matter if it's Postgres of MariaDB, it's going to require a replicated setup. You can't ry on a single instance.

It's not very fair to say "upgrades on Postgres are hard" if that's practically always true in your use-case, that's all.


> It's not very fair to say "upgrades on Postgres are hard" if that's practically always true in your use-case, that's all.

That is just nonsense. If it's hard then it's hard. It's not easier because it's practically always hard.

What's next, it's not fair to say that flying to the moon is hard because flying to LEO is tricky? Oh my.


I guess the point they're trying to make is that phrasing it as "Postgres upgrades are hard" is not fair, because it gives the impression it's a weakness in Postgres, but the use case would be hard in any database.

It would be more fair to say "Zero-downtime database upgrade is hard"


I wouldn’t say that MySQL is a better database, but upgrades with it are very easy: you back up, then you upgrade the packages, and the new packages run the upgrade command.


Yea, exactly as on PG.

What about replicated and zero-downtime (the current thread context)? That's also hard with MySQL.


It's actually not too difficult in MySQL, in terms of the mechanical steps required. MySQL's built-in replication has always been logical replication, and they've made major ease-of-use improvements in recent years, for example:

* efficient binary data clone is a single command

* starting replication with GTID positioning is a single command

* data dictionary / metadata upgrades happen automatically when you start a newer-version mysqld

The hard part of major-version upgrades in MySQL is testing your application and workload, to ensure no deprecations/removals affect you, and checking for queries with performance regressions. Tools like Percona's pt-upgrade and ProxySQL's mirroring feature help a lot with this.
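
To make the clone/GTID steps above concrete, a rough sketch from memory, run on the new-version replica (the clone plugin has to be set up on both sides, user names and passwords are placeholders, and the exact syntax varies across 8.0 point releases):

  mysql -e "CLONE INSTANCE FROM 'clone_user'@'old-primary':3306 IDENTIFIED BY '...';"
  mysql -e "CHANGE REPLICATION SOURCE TO SOURCE_HOST='old-primary', SOURCE_USER='repl', SOURCE_PASSWORD='...', SOURCE_AUTO_POSITION=1;"
  mysql -e "START REPLICA;"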


Haven't used MySQL in a couple of years, but its replication (all methods) used to have a whole page of gotchas and idiosyncrasies with various corner cases.

They also introduced binary file format changes even with minor and patch-level version number changes, and downgrading stopped being supported; AFAIK in that case you had to restore from backup.

It's just the exact opposite of Postgres' upgrade guarantees.


It's hard to respond without any specifics, but in my personal view it's a pretty solid replication implementation in modern versions. Sure, there are some gotchas, but they're not common.

Realistically, every relational database has corner cases in replication. There are a lot of implementation trade-offs in logical vs physical, async vs sync, single storage engine vs pluggable, etc. Replication is inherently complex. If the corner cases are well-documented, that's a good thing.

I do totally agree the lack of downgrade support in MySQL 8 is problematic.

Postgres is a really great database, don't get me wrong. But no software is perfect, ever. Consider the silent concurrent index corruption bug in pg 14.0-14.3 for example. If something like that ever happened in MySQL, I believe the comments about it here would be much more judgemental!


I thought MySQL could do replication across a major version? If so, that's a big differentiator.



Meh. That's an argumentation which is just a waste of time. This is a post about Postgres. Thinking about whether or not the title is "fair" towards the software itself is ridiculous and a waste of everyone's time. The title is correct and the software won't feel bad about it. Move on.


Not that I want to defend Oracle but I've been through the process of installing a new version, starting it up whilst loading the same database files currently being served by an older patch level, and seeing it fail over to the new version without dropping in flight queries. They've always done pretty well in this space in my view.

Of course, you've still got to obtain a patch. I usually see the licensing in the name of some senior manager who doesn't know how to use a keyboard and who is also the only person allowed to download patches.


You have to do that because Oracle has a nasty habit of changing what’s in the download and then changing the licensing scheme.

I’ve been burned by this before, and gone as far as to require a 3rd party audit for Oracle installations on a quarterly basis.


Are you referring to updating one node at a time of a RAC cluster or is that some other method?


Yes, that is what I'm referring to. But note Oracle has the concept of a single-node cluster, so you can have a completely standalone deployment and still perform this technique, hot-adding a new "node" when it's the same server.


They're hard if you can't tolerate downtime, which most production users can't.

While that's true for a lot of database systems, some more recently designed are much better. [1]

[1]: https://www.cockroachlabs.com/docs/stable/upgrade-cockroach-...


My experience is the opposite. Most productions users can tolerate downtime.

It's usually the company leadership that can't tolerate it to have downtime.

And in fact, they are hit with unplanned downtime of one service or another every month or so because of an outage. They are used to it.

So if you plan for it, explain it, and limit the scope and time of it, it usually goes very well, unless you are a Fortune 500, a hospital, or something similar.


I think you really underestimate the number of dependencies (documented and otherwise) that exist in even a medium-sized company.

I once caused a production outage in a retail company that caused all cash registers to stop working. The team that worked on that had pushed in a last-minute change and didn't test whether it handled going offline gracefully.

Right now I'm on call for an identity provider used by people in various timezones, including logistics workers in places that operate 24/7. Even when we do weekend upgrades, we still cause quite a bit of collateral damage: 10 minutes of downtime, multiplied by the number of employees affected, adds up fast.


I’d say a system not designed to support maintenance is not properly architected. You will need maintenance windows for an individual subsystem to perform OS upgrades, DB upgrades are no different and Postgres upgrades using pg_upgrade are relatively painless, provided you test them first in CI or against production snapshots, something the author’s company seemingly lacks.
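
Concretely, the rehearsal can be as simple as restoring a recent snapshot somewhere disposable and dry-running the upgrade there (a sketch; the *_BIN/*_DATA variables are placeholders for your old and new binary and data directories):

  pg_upgrade --check -b "$OLD_BIN" -B "$NEW_BIN" -d "$OLD_DATA" -D "$NEW_DATA"      # compatibility checks only
  time pg_upgrade --link -b "$OLD_BIN" -B "$NEW_BIN" -d "$OLD_DATA" -D "$NEW_DATA"  # measure the real downtime window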


"a system not designed to support maintenance is not properly architected"

Indeed, but who cares about the system design anymore? How many companies/teams can honestly claim they even had a person with proper DBA competency, while feature after feature was added in sprints, doing the minimum required to get each feature shipped as soon as possible (usually one DB schema change with the feature, and then one or more to add indexes due to performance regressions)? DBA competency is only sought when the DB schema has been fubar'd to the extent that frequent outages are the norm, or the version in use has been EOL'd by at least a few months. And by that time the people who "designed" the system are gone, never having documented why a given decision was made.

And that is how, my friend...

DB upgrades are hard.


Indeed, you can get away with a lot now by just paying (a lot) more money; it feels like design is no longer needed as long as it works. It is how I make some money; people come to me with: ‘we run some trivial online shop made by our team, and with 100k uniques a month we pay $15k+ for RDS, is that normal?’. So I go in and fix it. Usually it is bad or no design of the DB schemas, which was countered by picking heavy RDS instances. Fun times, considering I expect this to be the case in a very high % of all RDS deployments (all that I have seen so far, so for me it’s 100%), not only the ones that asked me for help. When asked, the story is the usual ‘people are more expensive than hardware’ blah. It usually takes me around $1000 in fees to cut half the costs, so that is blatantly false. Not to mention that the human costs are one-off; if someone pays me 20k to bring 15k/mo down to 1k/mo (which is typical), it is worth it. Unfortunately that’s not the break-everything-and-go-bankrupt way of working, I guess! Still, I notice that in the current harder financial times, I do get more requests.


How happy are you when your global bank has downtime in the middle of the day?


You wouldn't schedule downtime for the middle of the day. You schedule downtime during your period of least usage.


You design the complete system so it does not have dependencies on a single component, that way each subsystem can have proper maintenance schedules. It takes a lot more up-front work, along with defining what levels of service degradation are acceptable during a window, but that's the difference between a professionally run service and one run by amateurs. Look up Google's SRE methodology and their concept of error budget.


You mentioned hospitals yourself. There are also alarm management solutions for alarm receiving centres, and security and emergency services. Can't really have downtime there too. Of course there are less busy hours, but an alarm can come at any moment and needs to be handled right then.


Blue-green deployment, while it requires a professional, is not particularly complex or unusual.

Postgres is a tool. It can do many things, but the tool cannot run your entire business.


"Postres is a tool. It can do many things, but the tool cannot run your entire business."

This is something the majority of decision makers pretend not to understand (and sometimes really don't).


zero-downtime-postgres-migration-done-right:

https://engineering.theblueground.com/blog/zero-downtime-pos...


Would you consider that a piece of cake? I have doubts. The page you linked to describes how, if you follow the instructions on many other blogs, you'll have data loss. Not a symptom of a piece-of-cake process.


If you need zero downtime, you're already in a field where NOTHING is a piece of cake. Not your network, not your computing, not your storage, and I haven't even talked about the human aspect of "zero downtime".


Alright. But the GP comment claimed upgrading PG was a piece of cake. You claim it is not a piece of cake. So it sounds like you agree that the claim that upgrading PG was a piece of cake was misleading.


> the GP comment claimed upgrading PG was a piece of cake. You claim it is not a piece of cake. So it sounds like you agree that the claim that upgrading PG was a piece of cake was misleading.

GP claimed upgrading was a piece of cake, not that zero downtime upgrades are a piece of cake. The two claims aren’t interchangeable. The simple upgrade path is always available, though it may have downtime consequences you personally are unwilling to accept. And the complex upgrade path is complex for reasons that have nothing to do with PostgreSQL - it’s just as complex to do a zero downtime upgrade in any data store, because in all cases it requires logical replication.

So if anything it feels like you’re the one being misleading by acting as though GP made a more specific claim than they actually did, and insisting that the hard case is hard because of PG instead of difficulty that’s inherent to the zero downtime requirement.


So if I tell you that upgrading pretty much all databases is a piece of cake but not include the criteria "unless you want to keep your data" you would say that is a fair statement?

If you claim that process X is trivial one has to make some assumptions, right? Otherwise I could claim that going to the moon is trivial but leave out "assuming you have a rocket, resources, people and anything else you may require".

Claiming that something is a piece of cake as a broad statement without any details is meaningless at best.


> So if I tell you that upgrading pretty much all databases is a piece of cake but not include the criteria "unless you want to keep your data" you would say that is a fair statement?

Incredibly bad-faith comparison, this.

Many, many datastore deployments can tolerate 10 minutes of downtime every 4 or 5 years when their PG install finally transitions out of support. Data loss isn’t even in the same universe of problem. It’s reasonable to talk about how easy it is to upgrade if you can tolerate a tiny bit of downtime every few years, since most people can. It’s utterly asinine to compare that to data deletion.


CockroachDB and others come with huge downsides in performance and features. I keep trying to see if these databases are a good fit for me, but just the sheer time it takes to import my database makes me shudder. I've used the AWS cloud-hosted free tier of CockroachDB but ran into transaction timeouts on very moderate loads. There is a reason these databases aren't seeing massive uptake despite their lofty promises.


Yeah CockroachDB suffers from the same problem as PG: the defaults kinda suck for certain workloads. CockroachDB has an additional problem that their documentation isn't as detailed as PG so it can be hard to know what to tune. TiDB and Spanner defaults are much better for a wider range of workloads.


Can you say more about workloads which suck for either of the two databases at their default settings?


Yeah, super easy. Let me just shut down my database server for 30 hours so I can do a dump and load.


Have you looked into `pg_upgrade --link`?


I know it, and it's the way to go after making backups. It's not that hard, but not trivial either. You have to install the old & new postgres versions alongside each other, shut the database down, etc. I've done it many times, it's not hard, but I wouldn't call it a "piece of cake". And it does require some downtime.

Anyway, my comment was simply responding to the parent's:

> Or you could just dump and load the database (maybe even pipe it from one to the next).


So PostgreSQL upgrades are a piece of cake if your users don't care whether your system is available to them. Is this some parody of the older claim that MongoDB is a great system if your users don't care whether their data is lost?


Do you actually have such an uptime requirement, or do you just think you have such an uptime requirement? How are you handling it with other databases?


Yes, if the software I work on doesn't work for 5 minutes, we get a ton of tickets from customers. We have tens of thousands of customers who pay for it. Not being able to shut down your system for an hour isn't exactly a unique requirement. Technically our SLA is 10 minutes to perform jobs, but our customers don't wait that long before creating tickets.

We pay Microsoft to perform upgrades transparently for us. They have multiple servers and they shuffle transaction logs and traffic between them to ensure availability during outages and upgrades. There are 6 copies of each database in different regions. Not sure how that is relevant, though?


It is relevant, reread and understand the parent comment. Get a managed pg and you'll have the same thing.


That's the point.

It's not a weakness in Postgres.

Managing upgrades in a highly available, close to 100% uptime database is hard.

If you want a piece of cake, outsource this service and be happy enjoying the database features working as you please.


Yes, that's what I was saying :)


I believe you are the one who needs to reread the thread. The person I replied to claimed that "Postgresql upgrades are a piece of cake."

But it is not. It's complex to do properly, just like with most if not all other databases. Claiming that it is easy is just ignorant, and spreading such misinformation is bad.


I never claimed it's easy; I just said get a managed PG, offload it to someone else, and then yeah, it's easy. I'm wondering the same as the other commenter above me: what's the difference with other unmanaged DBs?


even four 9's give 50 minutes a year to do maintenance work.

I am a fan of planning for maintenance. Planned downtime is definitely accepted by most of the population; I mean, what are you gonna do?

Much better a planned downtime than an outage. Leaving a system to rot just because the architecture is "don't touch it while it's working" is a sure recipe for a disaster of some kind.


> even four 9's give 50 minutes a year to do maintenance work.

Just the security patching of the host OS will likely consume a bunch of those minutes.

Not sure what point you were trying to make apart from that. I am not advocating that people should leave systems to rot.


Upgrades are the biggest pain of Postgres in my opinion. Especially when you want to do zero-downtime upgrades.


I see quite a few opinions here detailing the pain of PgSQL upgrades. From those in the know, if you've time to bestow: how does it compare with the main rival relational db engine, MySQL, or its offshoot MariaDB? (I know I could ddg but a few anecdotes always sweeten the deal)

I'm looking to select a relational DB for a web application, most of my experience is MS-SQL but I'm looking to go open source/Linux this time.


I don't know about how postgres compares to others, but if you're looking for an easy-to-manage relational DB, there are a handful of new cloud-native databases that have come out in the last few years. The ones I know of are Planetscale and Cockroach DB.

https://planetscale.com/

https://www.cockroachlabs.com/


CockroachDB has so many changes in each version that updates are bound to break something. PostgreSQL is much more stable in that regard.


> A good example is the recent release of PostgreSQL version 14.4. It is a minor upgrade, so no new features. But it requires extensive work on B-tree indexes.

I find it wrong to call out B-tree indexes -- any index that is (re)indexed concurrently in PG14 <14.4 might have had corruption. Just because PG14 only has a way to detect it for b-trees does not mean it doesn't exist in other index methods.

The author also seems to confuse upgrade complexity with the complexity of verification and decorruption tasks (which you should do if you have run concurrent (re)index in your cluster in PG 14.0-14.3). The thing is that you are not at all required to do that during the upgrade, because you might not have run (re)index concurrently, or because fixing the corruption is also possible while still on <14.4, by using non-concurrent reindex, or by doing the manual snapshot control that I detailed in [1].

[1] https://news.ycombinator.com/item?id=31688332
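
(For reference, the b-tree verification in question is done with the amcheck machinery that ships with PG 14; something along these lines, with a hypothetical database name:)

  pg_amcheck --install-missing --heapallindexed -d app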


The concurrent reindex operation only impacted btree indexes.

There's no known impact to brin or gin/gist indexes as these haven't seen major changes in recent versions.


The corruption issue is with concurrent (re)index operations, which can be done for all index types. That is, the corruption is best described as 'some table rows may be skipped when scanning the table during reindex, so the index can be incomplete', and is unrelated to index methods, as any index can be reindexed concurrently.

btree is only the default index type, but otherwise unrelated, with the sole exception being that it is easy to verify whether it's corrupted - but that doesn't mean that other index methods are not affected.

Absence of evidence is not evidence of absence.


BRIN was implemented in a lazy fashion already, so concurrent reindexing isn't a thing.


BRIN lazily summarizes blocks _at the end of the table_. When you build the index (by either creating or reindexing, concurrently or not) it will generate summaries for all full block ranges of the table. As such, it is possible that even a BRIN index was corrupted by concurrent reindex in PG14 < 14.4 because it didn't receive all tuples that it should have summarized.


I'm currently a casual user, I use Postgres for personal docker services, and I wish it would auto upgrade itself when there is a new release, like how MariaDB does it.


Same. Of all the containerized apps that I run, Postgres is probably the most painful to upgrade. Everyone else (eg Nextcloud) has invested the time to figure out how to auto-trigger an upgrade when starting a container with a new version. Why can't Postgres do the same???


Ask the author of whatever container you're using?


It's the official one; Postgres has an official container image.


https://www.postgresql.org/ where it's referenced from?


This is why I love SQLite. Upgrades are not an issue at all. And backups in the event of upgrades are just copy-paste.


Postgres is great, but frankly this is where Amazon RDS shines for us. Outsourcing the pain of maintenance, upgrades, mirroring ...


Major version upgrades on RDS are even more of a pain because you can't reliably plan the downtime you'll need for them to finish. You push a button and then you pray that it doesn't take much longer than your test runs with restored snapshots...

And then you still have to upgrade extensions and run analyze manually.


And realistically... they always take longer, because your production database storage is fragmented from use.


I test on dev/stg and multiply the time by four.


Aurora still can't do zero downtime upgrades though, which surprised me. You get closer with vanilla RDS PostgreSQL. And if you use RDS proxy or pgbouncer it's better.

At a previous company I had to tend to a large herd of MySQL clusters, and we did all sorts of zero-downtime stuff there. I've found MySQL replication far easier to work with.


It still doesn’t do zero downtime upgrades though, does it?


I hate PostgreSQL. It's overly arcane and feels like something from the 1980s or early 1990s. Everything is clunky and manual and "mainframey." (Is that a word?) Unfortunately there's not a lot that's that much better. The commercial options are not even much better really. Maybe a little more polished.

CockroachDB is the biggest one I've been watching but unfortunately it still seems to have some limitations around performance with complex queries that rule it out for a lot of things.


> feels like something from the 1980s or early 1990s

Initial release was in 1996.


... but that was only the "Postgres" release. The project now known as postgres was started in 1985 or so, with early demos and releases in the 1980s.

And it probably traces actual code back to Ingres in 1982.

https://en.wikipedia.org/wiki/PostgreSQL


I wonder what people think of Kubernetes operators for Postgres? They promise painless zero-downtime upgrades.


I don't keep up to date on all the k8s changes, but a couple of years ago wasn't the recommendation not to have your databases in k8s at all?

What has changed between now and then where putting postgres into k8s is an option?

Does the Postgres operator handle all the statefulness requirements and scheduling things that might be unique to PG?


StatefulSets were added to handle things like this. It always seemed like an afterthought to me, appeasing people trying to run things that Kubernetes was never designed to handle.

At $BIG_CORP our executives became very excited about "The Cloud" and many teams were forced to migrate their applications over. Didn't matter if it was a proper 12 factor application or a monolithic vendor JAR that required 64GB of ram. You were punished for using VMs, so teams spent an inordinate amount of time shoving square pegs into round holes.


Personally I tried, but decided to keep the production Postgres in a dedicated VM. This allows for more attention, more resources, and fewer risks.

For smaller setups and testing, no problem in K8s.


Any chance you have enough experience to suggest a good option? This is something I guess I'm going to have to deal with soon


https://www.kubegres.io/ has worked well for me. You can use custom images, provided they follow the conventions of the official postgres image. I haven't tried a major-version upgrade yet - my impression is that it will probably be a bit painful and involve some downtime and manual ops.

Some of the more advanced operators might better handle the upgrade problem? But in general, I've been happy with Kubegres for a ~10GB database with low traffic.


I used this one recently and it worked pretty well: https://github.com/zalando/postgres-operator


I did some research and the most popular ones seem to be Zalando and Crunchy. I have zero experience, but I'm going to deploy one in a test environment to see how it goes.



