Because the actual process of upgrading Postgres is terrible.
I say this as someone who absolutely loves using it, but the actual process of upgrading Postgres takes significant downtime, is error-prone, and you're often better off just dumping all the databases and re-importing them into a new data directory. (A good idea in general, since doing so vacuums and compacts indexes as well, combining a few maintenance routines into your upgrade path.)
It requires having the previous version of Postgres installed, something which can mess with a number of distro policies (not to mention docker, which is the most popular way to deploy software that will typically rely on Postgres), and unlike most software with that issue, Postgres is software you want to be managed by your distro.
Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.
MS SQL Server does better on both of these fronts: upgrading, when running MS SQL Server in a container, is just restarting the database with the new database engine version. The DBMS sees the old files, upgrades them in place, and starts the DB completely seamlessly, 100% of the time.
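In container terms that amounts to roughly the following (a sketch; the container name, volume name, password, and image tag are illustrative):

docker stop mssql && docker rm mssql
docker run -d --name mssql \
    -e ACCEPT_EULA=Y -e MSSQL_SA_PASSWORD='...' \
    -v mssql-data:/var/opt/mssql \
    mcr.microsoft.com/mssql/server:2022-latest
# data lives on the named volume, so it survives removing the old container;
# the newer engine opens the files under /var/opt/mssql and upgrades them in place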
The MS SQL Server protocol, TDS, supports native named parameters and executes the T-SQL procedural language inline with SQL. TDS also supports inline query cancellation because the protocol is framed, and both T-SQL and TDS support returning multiple result sets from a single query.
But yes, whenever I see the PG docs saying that major versions require an export and re-load, I shudder.
As someone who has a decade of professional postgres experience (and built my company on it), and 6 years of SqlServer experience, I couldn't agree more with your comment. I really, really like the TDS wire protocol and what it enables.
I also really miss their query optimizer. It was just so damn smart.
I've only done it once, but doing an "rpm -Uvh" on the SQL Server binaries in Linux, then starting up the database did an immediate upgrade of the database files.
This has been solved by logical replication a few versions ago.
1. Leave your old database running
2. Start your new database with new version
3. Run logical replication and let it catch up
4. Decide you want to switch over.
5. Using a script: Stop your apps writing to the database (or switch to read only mode), let logical replication catch up one final time, stop logical replication, run a procedure to fix your sequences, and switch your apps over to the new database.
You are now done. Sure, it requires a one-time investment to work out the exact steps needed for your situation, but it's easy to do with minimal downtime.
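Here's a rough sketch of steps 1-5 in SQL (publication, subscription, connection string, and sequence names are made-up placeholders; the old cluster needs wal_level = logical, and the schema has to be loaded into the new cluster first, e.g. with pg_dump --schema-only):

-- on the old cluster (publisher)
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- on the new cluster (subscriber), after the schema has been loaded
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-db port=5432 dbname=app user=replicator'
    PUBLICATION upgrade_pub;

-- at cutover, once writes have stopped and the subscriber has caught up:
-- logical replication does not carry sequence state, so bump each sequence, e.g.
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));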
Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
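For reference, the offline path is roughly this (a minimal sketch; paths and version numbers are illustrative, and the new cluster has to be initdb'ed first):

pg_ctl -D /srv/pg15/data stop
/usr/local/pgsql16/bin/initdb -D /srv/pg16/data
/usr/local/pgsql16/bin/pg_upgrade \
    -b /usr/local/pgsql15/bin -B /usr/local/pgsql16/bin \
    -d /srv/pg15/data         -D /srv/pg16/data \
    --link    # hard-link data files instead of copying: fast, but you can't go back to the old cluster afterwards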
> Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
I think this is the right answer for 99% of companies. Yes, everyone wants zero downtime, but I see incredibly few cases where it is actually worth the cost and effort.
This has not been solved by logical replication. There are features in PostgreSQL not supported by logical replication. One of the issues I'm dealing with is that our product switched from Oracle to PostgreSQL, the ORM layer loves LOBs, and, as you know, LOBs can't be logically replicated.
Financial incentives, likely, because PG is developed by companies that make money by providing support. It doesn't have to be an insidious plot; there's just little reason to fix something you know well and that makes you money. MSSQL wants people to pay to upgrade, so it behooves Microsoft to make it seamless.
I have always compiled from source so that I can have more than one version in place. The installation takes less than 1GB and is quite easy and fast to compile and I've never had any issues with upgrades. If you forget to compile an extension, you can do it later, copy the .so to the right folder, and you don't even have to restart the cluster.
If you don't want to use pg_upgrade, you can dump from one cluster and pipe directly into another, with no need for a temporary file. It couldn't be easier.
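Roughly like this, assuming the old cluster listens on 5432 and the new one on 5433:

pg_dumpall -p 5432 | psql -p 5433 -d postgres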
Can't say the same for Oracle. Even applying a patch can ruin several weekends.
MySQL, for all its faults, actually handles this quite well. Generally speaking, assuming there aren't any major backwards-incompatibilities (which there haven't often been in my experience), you can start up the new server in place of the old one and you're good to go. Some table-rebuilding can be done in-place if you need to do things like recreate indices, but _generally_ you're good to go from there.
For more complex deployments, cases of backwards-incompatibilities in the data structure, or where I had reason to rebuild many large tables, you set up a replication secondary, do pre-upgrade migrations (e.g. in the case of deprecated functionality), upgrade the secondary, do post-upgrade migrations (e.g. in the case of new functionality), and wait for replication to catch up 100%. Then, after sending traffic to it for a bit to ensure it works correctly, you swap the primary and secondary and do it again.
Yeah, but what if your dataset is partitioned horizontally? It's very smooth until it's not. For example, if you are using the PostGIS extension and something changes in the new version, that could be a showstopper.
We have customers with databases that aren't quite at the TB level, but are at least several hundred GB.
Upgrading the database server is as simple as shutting down the service, installing new executables and restarting the service. Downtime is measured in minutes.
When upgrading major versions, one might not get full functionality of the new features unless one upgrades the on-disk format (unload-reload), but you can run the new server version with older on-disk format just fine.
I worked at a place that was running an 8-year-old install of Oracle 8i because upgrading was just too painful. Oracle 10g was out by that point. It's been over 15 years since... I wonder if they ever upgraded?
This!
I already mentioned this a while back and was basically berated (by someone who seemed like a dev, or close to one) that the current setup is just fine and that, because of Postgres's complexity (extensions), it has to be done that way... and while I like Postgres a lot, it's quite annoying that the upgrade is such a terrible experience... :|
Very true. At a prior company, we rolled out a simpler and easier to use admin console for our enterprise product and all the customer admins hated it.
Because now it was easy to use hah.
We ended up rolling it back because the hate was so consistent.
(And no, this wasn’t us guessing or making this up - a number of them were quite explicit that we were threatening their roles, and to stop it.)
I'll confess - I have a project that uses Heroku's managed Postgres and my preferred upgrade method is to set the maintenance window to the middle of the night, create a backup, and be awake at 1am to make sure that nothing is broken after they force the upgrade. Their auto-upgrade process hasn't failed me so far, but there's no way to manually trigger it.
> Postgres is software you want to be managed by your distro.
Why?
For production systems I generally try to avoid using anything not built internally. When it comes to PostgreSQL, for example, if you want to submit benchmarks or report bugs, an important (and sometimes mandatory) part of the report is the compilation flags. Also, since every kind of storage can behave differently, tuning higher-level storage programs seems inevitable. And even though PostgreSQL has a lot of settings that can be configured on the compiled program, some (e.g. block size) are compile-time options that can only be changed by recompiling.
Depends on your scale. If you're a startup or even just a small side service, performance isn't going to be a bottleneck, and you/the org want the thing to be fire-and-forget, including bug fixes and especially security patches. A distro generally takes care of all of that and makes sure the dependencies get the same care.
This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
> This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
At least on Ubuntu (and likely Debian), your existing Postgres version is never dropped until you are ready to manually upgrade yourself.
It's very rare that the client protocol changes dramatically and in an incompatible way, so typically nothing changes and no one notices.
I've seen a few rare cases where an update to the client library is required for improvements to the protocol, e.g. when MySQL switched to the new password/authentication format, but that was also backwards-compatible: the old format was used until an account's password was changed, password changes could be told to use the legacy format instead, and you could set the legacy format by default. The only issue that occurred (and it did occur) was old client libraries + new server password hashes, but, again, in this case it didn't affect existing installations/users/passwords.
Upgrading the OS in a production environment... this reminds me of a joke about a police academy intake test:
The participants were offered a board with a round and a square hole, and two pegs of matching shape. After the test, the new recruits were sorted into two groups: the very smart and the very strong.
So, yeah, you ought to be either very smart or very strong to upgrade the OS in a production environment, and then also discover changes to your database as you go along.
That's not to say that upgrades don't happen at all... but you'd be upgrading in a testing environment ten times before you try that "for real", and at that point you would have probably solved the question of whether you need to (re)install the database and how :)
The idea here is more that you need to adjust these flags for the benchmark to make sense.
Take for example the aforementioned block size. Say you have storage with 4x the block size of the one used by PostgreSQL. If you run a benchmark over such storage, you'll have insane write amplification. Nobody will consider the results of such a "benchmark" because that's simply a misconfiguration of the program you are trying to measure.
More generally, Linux distros tend to compile the binaries they distribute with "safe" defaults that run on most hardware users might have, and this means optimizing for the lowest common denominator. Looping back to PostgreSQL, the default block size was 4k for a long time, and iirc today it's 8k. This is fine if we are talking about a plain SSD / HDD, but with enterprise SDS these are "rookie numbers"; even the maximum supported by PostgreSQL (32k) is still a "rookie number", but it's four times better than the default!
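Changing it means building your own binaries, something like this (the install prefix is illustrative; note that a cluster initdb'ed with one block size cannot be opened by a build using another):

./configure --with-blocksize=32 --prefix=/opt/pgsql-32k
make -j"$(nproc)" && make install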
And considering how many people use some sort of abstraction layer like an ORM, have databases installed as part of some deployed application, or just do basic CRUD stuff, it probably would have worked on a postgres install from the 90s.
I keep an eye out for vulnerabilities, and that's about it.
My opinion is Postgres was designed by software developers for software developers. The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set, and the sysadmins having to support maintenance and production availability preferring MySQL.
One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.
This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near total expense of sane systems tooling folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject that were utterly horrified to find we were running MySQL for a given service.
Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it's thankless "sideways" work that won't result in many accolades or personal satisfaction for most SWEs.
The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.
That has not been my experience at all. The sysadmins I have worked with have strongly preferred PostgreSQL over running MySQL, while the developers have cared, but less so. The reason is that when something goes wrong, PostgreSQL is much easier to diagnose. So while some tasks like upgrades take less manual effort with MySQL, PostgreSQL is more stable and gives better error messages for when you get paged in the middle of the night.
PostgreSQL has prioritized correctness and stability, which has made certain features more clunky to use or slower to arrive, but when you get paged in the middle of the night you get the time you spent back. PostgreSQL has also spent a lot of effort on improving the DBA experience, so mixed DBA/sysadmins also usually prefer PG.
This simply can't be true. There are endless stories of people moving from Postgres to MySQL because of reliability and operational issues. Postgres has only had a mature replication solution for 7 years; MySQL had it in 1999.
> The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set
I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.
> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.
Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
> none of them could articulate what part of its feature set they actually needed to use.
Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.
Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.
Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.
Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.
Unnest: converting arrays into a set of rows. Inverse of array_agg(...).
Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.
Materialized views: seriously, how does MySQL not have this yet?
Statement-level triggers: another option from per-row.
Row-level security: setting data visibility based on configurable policies.
I can cite specific use cases I've deployed to production for each of these and more.
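To make the range type + exclusion constraint point concrete, here's a minimal sketch (table and column names are made up; btree_gist is needed so the plain-equality column can participate in the GiST index):

CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE booking (
    room_id int NOT NULL,
    during tstzrange NOT NULL,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)  -- no two overlapping bookings for the same room
);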
That is a well-thought out list, and you’re clearly aware of and take advantage of the DB’s capabilities. Seriously, congrats. Especially RETURNING – it’s always baffling to me why more people don’t use it (or its sad cousin in MySQL that lets you get the last inserted rowid if using an auto-increment).
Most devs I’ve worked with don’t know about aggregations beyond COUNT and GROUP BY, and do everything in the app. I’ve pointed these out before, and am always told, “we don’t want to have additional logic in the DB.” So you want a dumb bit box then, got it – why are you using an RDBMS?
> Transactional DDL
I know this is a thing, and I’ve used it in Postgres, but I’ve also never found myself in MySQL being upset that I didn’t have it. Everything should be thoroughly tested in staging before prod.
> RLS
Yes, amazing feature if you use it correctly.
I will give MySQL a couple of nods in its favor: well, three.
1. Clustering index. If you design your schema around this fact, range queries can be WAY faster. Dropbox does this, IIRC.
2. Generated Virtual Columns. Not sure why Postgres still doesn’t support this.
3. ON UPDATE CURRENT_TIMESTAMP. It’s so nice to have the DB automatically track update times for you, IMO.
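For 2 and 3, a minimal MySQL/MariaDB sketch (names are illustrative):

CREATE TABLE price (
    net DECIMAL(10,2) NOT NULL,
    tax DECIMAL(10,2) NOT NULL,
    gross DECIMAL(10,2) AS (net + tax) VIRTUAL,  -- generated virtual column, computed on read
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP  -- auto-maintained on every UPDATE
);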
I use transactional DDL all the time, even during development. It's nice to not have to fully reset your schema every time you test a migration file locally. With transactional DDL, you run the whole list, and if any fails, it rolls back to where you started. You look at the error, edit your migration, and try again. It really is a time saver. There is a peace of mind always knowing your schema is in a consistent state not unlike the peace that comes from a good set of unit tests.
1. Yep, I definitely miss clustering indexes in Postgres sometimes. I can sometimes fake it with covering indexes when all I want are an extra column or two along with the primary key or similar without seeking to the main table, but you're right about that MySQL/MariaDB win here.
2. The dynamic computed column is an easy workaround with immutable functions that take the record as a param.
CREATE TABLE foo ( a int, b int, c int );
CREATE FUNCTION d(entry foo) RETURNS int LANGUAGE sql IMMUTABLE AS $$
SELECT entry.a + entry.b + entry.c;
$$;
SELECT a, b, c, d(foo) FROM foo;
It's not part of the table schema when doing a SELECT *, but it is just as efficient as a computed column in MySQL/MariaDB and only slightly more verbose.
3. ON UPDATE CURRENT_TIMESTAMP works in Postgres with a trigger function, which you can reuse if all your tables use the same name for your "last_modified" column (probably a good idea anyway). Not as convenient as the declarative syntax, but it's a fairly trivial workaround.
CREATE OR REPLACE FUNCTION update_last_modified() RETURNS TRIGGER AS $$
BEGIN
NEW.last_modified = now();
RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER foo_last_modified BEFORE UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
CREATE TRIGGER bar_last_modified BEFORE UPDATE ON bar
FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
One function, many triggers. You also get to choose between "when transaction started" (now() or CURRENT_TIMESTAMP), "when statement started" (statement_timestamp()), or "right now" (clock_timestamp()).
I don't mind workarounds so much as functionality that simply cannot be replicated. For example, I miss real temporal table support in Postgres like what you can find in MariaDB or MS SQL Server. The painful kludges for Postgres's missing PIVOT support (which MS SQL Server has) are another one.
You never know how much you need deferred foreign key constraints until you don't have them anymore. Or a materialized view.
> It's nice to not have to fully reset your schema every time you test a migration file locally
In terms of dev flow, this is only a problem with imperative migration systems. Declarative schema management tools solve it by being able to transition any live database state into the desired state, which is expressed by a repo of CREATE statements.
If something fails, you fix the bad CREATE and run the tool again, and it effectively picks up where it left off. And well-designed declarative tools catch many problems pre-flight anyway by running a suite of linters, running the emitted SQL statements in a sandbox first, etc.
If the tool's diff returns clean, you know your schema is in the right state on the DB.
Ironically, lack of transactional DDL actually makes declarative schema management more straightforward in MySQL/MariaDB: you can't mix DDL and DML there anyway, so it's more natural to handle schema changes vs data migrations using different tools/pipelines.
That's a good list [1]. A handful of these are already doable in modern MySQL and/or MariaDB though.
JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).
MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.
MariaDB supports RETURNING.
[1] Edit to add: I do mean that honestly, it's an accurate and insightful list of nice Postgres features, most of which aren't in MySQL or MariaDB. Honestly baffled as to why I'm being downvoted.
This is like storing UUIDs as text. You lose type information and validation. It's like storing your array as a comma-delimited string. It can work in a pinch, but it takes up more storage space and is far more error prone.
> convenience types for ipv4, ipv6, and uuid.
That's nice to see. A shame you have to decide ahead of time whether you're storing v6 or v4, and I don't see support for network ranges, but a definite improvement.
> MariaDB supports RETURNING.
That's honestly wonderful to see. Can these be used inside of CTEs as well for correlated INSERTs?
Regarding using JSON for arrays, MySQL and MariaDB both support validation using JSON Schema. For example, you can enforce that a JSON column only stores an array of numbers by calling JSON_SCHEMA_VALID in a CHECK constraint.
Granted, using validated JSON is more hoops than having an array type directly. But in a pinch it's totally doable.
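A rough sketch of that (MySQL 8.0.17+; table and column names are made up, and this assumes the rule is "array of numbers"):

CREATE TABLE measurements (
    readings JSON,
    CHECK (JSON_SCHEMA_VALID(
        '{"type": "array", "items": {"type": "number"}}',
        readings))
);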
MySQL also stores JSON values using a binary representation, it's not a comma-separated string.
Alternatively, in some cases it may also be fine to pack an array of multi-byte ints into a VARBINARY. Or for an array of floats, MySQL 9 now has a VECTOR type.
Regarding ipv6 addresses: MariaDB's inet6 type can store ipv4 values as well, although it can be inefficient in terms of storage. (inet6 values take up a fixed 16 bytes, regardless of whether the value is an ipv4 or ipv6 address.)
As for using RETURNING inside a writable CTE in MariaDB: not sure, I'd assume probably not. I must admit I'm not familiar with the multi-table pipeline write pattern that you're describing.
WITH new_order AS (
INSERT INTO orders (po_number, bill_to, ship_to)
VALUES ('ABCD1234', 42, 64)
RETURNING order_id
)
INSERT INTO order_item (order_id, product_id, quantity)
SELECT new_order.order_id, vals.product_id, vals.quantity
FROM (VALUES (10, 1), (11, 5), (12, 3)) AS vals(product_id, quantity)
CROSS JOIN new_order
;
Not super pretty, but it illustrates the point. A single statement that creates an order, gets its autogenerated id (bigint, uuid, whatever), and applies that id to the order items that follow. No network round trip necessary to get the order id before you add the items, which translates into a shorter duration for the transaction to remain open.
In this specific situation, the most common MySQL/MariaDB pattern would be to use LAST_INSERT_ID() in the second INSERT, assuming the order IDs are auto-increments. Or with UUIDs, simply generating the ID prior to the first INSERT, either on the application side or in a database-side session variable.
To avoid extra network calls, this could be wrapped in a stored proc, although a fair complaint is that MySQL doesn't support a ton of different programming languages for procs/funcs like Postgres does.
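A rough sketch of the LAST_INSERT_ID() variant (assuming orders.order_id is AUTO_INCREMENT and order_item has no auto-increment column of its own):

START TRANSACTION;
INSERT INTO orders (po_number, bill_to, ship_to) VALUES ('ABCD1234', 42, 64);
INSERT INTO order_item (order_id, product_id, quantity)
VALUES (LAST_INSERT_ID(), 10, 1),
       (LAST_INSERT_ID(), 11, 5),
       (LAST_INSERT_ID(), 12, 3);
COMMIT;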
This has always been possible, for example using the BINARY(16) column type if you want to be efficient. Or in MariaDB 10.7+ you can now use the dedicated UUID column type, which is equivalent to BINARY(16) under the hood, but provides a human-readable hex value when queried.
UUIDs are fixed-length. Blobs are not the appropriate type for that.
Sorry, I was mistaken and I already do have that set up and working using binary. There’s something else I’m trying but failing to remember that isn’t possible with binary keys.
> My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
Yep, that's exactly what I've seen too :). I still overall prefer this distributed database model - yes, you spend more and people make mistakes (and learn). But if you can afford it, you get higher velocity and more incentive-aligned ownership than the old central gatekeeping DBA-team model.
I have mixed feelings about this. On the one hand I agree that ownership should be shared. On the other, app developers really don't consider their data structures as carefully in SQL as they do in-memory. It's odd. The right data structure matters more than a good algorithm since algorithms are easier to change. Once you settle on a list vs a set vs a queue, you're stuck once code is built around it.
The same is doubly true for the database schema. Lack of planning and knowledge of expected access patterns can turn an otherwise fast database to mud in no time flat. Once your data is in there, changing the schema is exponentially harder.
"I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
– Linus Torvalds (2006)
What is your database but a bunch of data structures and relationships? I get why the gatekeeping occurred. I don't agree with it, but I understand it. Far too many folks consider expertise in data stores to be optional as developers.
100% on all points. I’m a fan of gatekeeping things that are hard to get right, and hard to undo. If you can prove that you know what you’re doing, by all means, own your stuff. But until then, yes, I’d like to watch over your shoulder, and hopefully teach you how to do it right.
The fact that DBs are data structures but are ignored has always irritated me. If I pushed a PR using lists for everything regardless of their practicality, I’d be rightly told to try again. But push a PR with a schema using similar suboptimal choices, and no one blinks an eye.
I agree but also I think I could be clearer about the key advantage of the distributed many dbs model… it’s that I don’t have to care if people are good at anything that you said :). If teams want to do dumb things with their schema, fine, that’s on them and they’ll have to deal with the consequences. If it matters, they’ll learn and get better. If they want to buy their way out of it with their latency and infra budget instead, that’s honestly fine too.
With many smaller databases owned separately, the blast radius of bad db decisions is small/local.
I don't mind the model IFF the team has an interest in learning how to do it correctly. My biggest complaint as both an SRE and now a DBRE has been that dev-managed infrastructure inevitably means that during an incident I had nothing to do with, I'll be paged to fix it anyway. Actually, that's not the problem; the problem is later, when I explain precisely how and why it broke and how to avoid it in the future, there's rarely any interest in doing so.
“You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”
“Or we can just buy a bigger instance.”
“…”
Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn’t care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it’s not usually the devs fault.
UUID (version does not matter for storage, only for generation and distribution) is basically a 128-bit unsigned int, so a double "word" on 64-bit platforms, and it's natively supported by Postgres since at least 8.3 (earliest version with docs up).
While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.
Unless you used a string field, but there is a quick workaround for that with Postgres too (make an expression index on `(field::uuid)`, and look it up with `field::uuid = value::uuid`).
That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.
Who said this was Postgres? MySQL (with the default InnoDB engine) and MSSQL both use clustered indexes; they store tuples physically ordered by the PK. For a UUIDv4 PK – or anything else non-k-sortable, for that matter – this results in a massive amount of B+tree bloat from the random inserts.
But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.
Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.
Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.
Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”
> That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases
I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.
Agreed. My preferred PK in descending order is: natural key if it makes sense and would speed up queries, integer of an appropriate size, UUIDv7. I only rank it below integers because at best, they’re 16 bytes, and even a BIGINT is only 8 bytes.
Contrary to intuition, bigint ends up only 25% smaller on disk than UUIDv7 and no improvement in speed. Honestly in 2024, 32-bit ints are generally a poor fit. Either the table is small enough to be considered a lookup table (16-bit int) or could grow to unknown bounds given enough time (64-bit int). Sequential ids are guessable (which could be a security issue), cannot be generated on the client, and can hinder sharding now that Postgres supports bidirectional replication (multiple writers).
My PK preference in descending order: natural key if it makes sense and would speed up queries, 16-bit for unambiguous lookup tables with a hard finite limit of values, UUIDv7, 64-bit sequence, UUIDv4, and then finally 32-bit sequence.
UUIDv7 for performance, reasonable non-guessability, client-generation, and potential for sharding. 64-bit for cases that really need a sequence. UUIDv4 when randomness is required for security. 32-bit when there's a hard finite limit that can guaranteed to be below 2 billion even given the most optimistic of projections.
If the database is small, it doesn't matter. If it's large, it is 99.999% unlikely the primary key will be anything more than a rounding error in storage, access time, or cost. No one ever said, "We would have made payroll if only we had used 32-bit primary keys."
To really save space, extract the timestamp in the UUIDv7 and use an expression index to use as the creation date. Then you're not "wasting" any storage space. Personally I don't think the storage optimization is necessary. Premature optimization being evil and all that.
> Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world for example. And pretty much no one but actual CS graduates knows anything deeper about databases.
Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.
You would be very surprised to see the workflow of DB-heavy development teams. Some Oracle devs have entire backends coded in PL/SQL.
My last company had an absolute ton of pl/pgsql written to support hundreds of ELT pipelines, migrations were all hand written and managed with liquibase.
There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.
> Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools
Your experience does not match mine. Tools like ORMs make horrible schemas in my opinion that cater to the lowest common denominator of SQL engine functionality. This means leaving a lot of performance and scalability on the floor. In order to make the ORMs generate decent schema definitions, you need to know the underlying engine and therefore SQL. At that point, you might as well use SQL.
Ever try changing a column's data type from a table with hundreds of millions of rows with an ORM definition file? Hope you like downtime.
> My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks
That's a true shame considering how easy it is to make a read replica of any size and then fail over to it as the new primary. Definite skill issues.
It’s more like a “not knowing how fast something should be” in the case of under-provisioning, and “not knowing or caring to look at metrics” for over-provisioning.
I once was examining some queries being generated via Prisma, and found it was using LIMIT/OFFSET for pagination. I pointed this out to the devs, who replied that the query times were acceptable for their SLOs. I guess if you don’t know that a simple SELECT can and should often be in the sub-msec range, you might not be concerned that it’s taking 100 msec.
The other is just the normalization of cloud expenditure. Every large org has some kind of pricing agreement with their provider, and so are rarely incentivized to cut costs, since they have a minimum spend.
When all you know is an ORM, you tend to treat SQL databases like dumb bit bucket add-ons to your app server. It's amazing how much potential performance and scalability are left on the floor because app developers can't shift their mindset when needed. Objects/structs cannot be assumed to map 1:1 with relations. What a world we'd live in if devs spent even 1/10 the effort examining their relational schema design that they spend arguing over whether a set, a list, or a queue is better for a given situation. It's like thoughts on Big-O stop cold at the database driver interface.
MySQL feels more "modern" (in the context of the early 2000s when I started using it), in that it seemed to know it existed in an environment with other tools; for example, authentication was handled internally and was unrelated to the system user (other than the clients usually using your system username as the default database username if you didn't specify one).
Compare that with Postgres, which seemed very "old school", going so far as to assume it was the only thing a given server was doing. Connecting to postgres authenticated as your own user; creating a user was done with the `createuser` command (or similar, I don't remember what it was actually called), and not some namespaced `pg_createuser` command that would make it clear what it did.
I also remember setting up MySQL replication with almost no effort whatsoever, and then in the same year trying to set up Postgres replication - which it didn't have. I was told by other postgres admins to "just set up a script to rsync the database over and over to the other server; then if your first server dies just start the second server up and it'll recover". This seemed like a wildly cavalier attitude towards uptime and reliability, not to mention generating a ridiculous amount of I/O and network traffic for minimal benefit.
As someone who operates both I much prefer to run pg_upgrade every few years rather than dealing with mysqldump, replication issues and babysitting Orchestrator.
> All failure, rebuild, and reindex cases will be reported by pg_upgrade if they affect your installation; post-upgrade scripts to rebuild tables and indexes will be generated automatically.
Not true. Those scripts only cover problems caused by PostgreSQL itself, not, for example, a change in collations that will silently break your indexes, such as the one in Debian 12.
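The usual manual remedy after such a collation change is to rebuild the affected indexes yourself (the database name is illustrative; the REFRESH line is PostgreSQL 15+ only):

REINDEX DATABASE app;  -- rebuild indexes whose sort order depended on the old collation
ALTER DATABASE app REFRESH COLLATION VERSION;  -- record the new collation version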
There is a generic way to do it that works with SQLite and other databases, including Postgres. On Linux, take a coherent snapshot of the filesystem or block device, take your backup of all the SQLite files from the snapshot, then delete the snapshot.
The app or service continues to run without disruption, and the backup is a coherent database image. Perhaps you have other state in addition to the database, or multiple databases. For example cache files. This covers them too, if they are all in the same snapshot.
There are many ways to take a snapshot: ZFS and btrfs offer a filesystem command, and any Linux filesystem, such as ext4, can be snapshotted with LVM or LVM-thin. Well known cloud providers like AWS, GCP, Azure also provide ways to snapshot block devices, through their APIs. However, to ensure a coherent image, it may be necessary to use the Linux `fsfreeze` command around API calls.
The database backup files can have incomplete transactions, but if the files are restored it will be as if the OS was abruptly stopped at the moment of the snapshot. SQLite and other good databases are designed to recover well from this sort of abrupt stop, without corrupting the database. They clean up incomplete transactions on recovery.
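As a concrete sketch with classic LVM (volume group, mount points, and sizes are made up):

lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data
# creating the snapshot briefly freezes the mounted filesystem, so the copy is crash-consistent
mount -o ro /dev/vg0/data-snap /mnt/snap
rsync -a /mnt/snap/ /backup/db-$(date +%F)/
umount /mnt/snap
lvremove -y /dev/vg0/data-snap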
To avoid having to slow down development of new PostgreSQL features. Improving upgrades in a way where PostgreSQL does not need to either maintain multiple versions of parts of the code or lock down internal interfaces (which today can change freely every major version, and locking them down would mean they could no longer be refactored and improved) is not a trivial task, maybe even an impossible one. Even just the existence of pg_upgrade has to some degree limited what can be done to improve PostgreSQL. Obviously pg_upgrade is worth it, but hindering development even further might not be popular.
The PostgreSQL team simply does not have the resources to do this. At least not without significantly slowing down development of everything else which there is no political will for. Maybe someone will come up with a genius idea which solves this but I am doubtful. Usually there is no free lunch.
Maybe some core dev will correct me, I am quite familiar with the PostgreSQL project but not one of the core devs. :)
Nope, that is totally unrated. To support upgrade in place without an old version of PostgreSQL:
1. The new version of PostgreSQL would need to able to read all old catalog table formats and migrate them.
2. The new version of PostgreSQL would need to support all old versions of the parse tree to migrate views.
3. Likely a bunch of more things that I do not know of. I for example doubt it is trivial to just read an old catalog without having a fully up and running cluster which supports almost everything in that old cluster. The catalog has TOAST tables and indexes for example.
Right now 1 and 2 are implemented in pg_dump plus by having pg_dump call functions in a running old version of PostgreSQL.
It is a PITA, but I've written scripts that pg_dump just the schema, load the schema into the new db with the new version, set up logical replication between the two, wait for them to sync, reset all the sequences, and rebuild indexes before doing the handover.
It works with basically no downtime but I agree this kind of thing should definitely be easier, even turnkey.
I think that sums up PG pretty well. It seems there are a lot of things that lean into the Unix "just use another tool" philosophy, which ends up making management more difficult.
Yes, and it does it by starting an instance of the old version and running pg_dump against it. And that was one thing the original poster complained about.
It says it works "without the data dump/restore" and...
> Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files
Regardless, I suppose it is that reliance on the internal data format not changing which limits what refactors can do.
I think one really BIG factor is that built-in logical replication wasn't introduced until PostgreSQL 10 in 2017. Before that you only had physical replication for master-slave setups, and iirc that didn't work between versions, so doing a "hot" upgrade was more or less impossible without third-party tools.
So even if it's available these days, the amount of people still subjected to upgrades from older version still leaves an impression that it's really bad.
I don't care much about schema changes as I can setup logical replication just for the major version upgrade. The main issue is the lack of support for large objects, which the code bases I'm dealing with uses heavily.
Lots of dogmatism in this discussion, it seems. A couple of things:
1. Most psql deployments are not exposed to the interwebz, they are typically only accessible to the applications that need them by virtue of network setup (firewalls etc). This limits the attack vector to whatever the application does. Good.
2. Distro vendors (RHEL et al) often stick to one major psql release for the lifecycle of the OS version. If the OS lives longer than the psql major version, they take on the responsibility of backporting fixes for critical security issues.
3. While upgrades aren't hard, they're not easy either.
4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
> 4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
To drive this a little further, "latest and greatest" doesn't always apply. I've chosen software - even databases - for greenfield deployments one or two releases behind for their known characteristics.
Stability doesn't imply perfection, but rather, predictability.
> 3. While upgrades aren't hard, they're not easy either.
And in my experience, apart from ease of use there's also a major trust issue here. If you're upgrading your app server framework/language, it's easy enough to do a rollback. With databases, people are worried that they might not notice errors right away and then you have to merge the data accumulated since the upgrade with the last backup in case of a rollback.
Not saying that this is entirely rational...
Also, new features on the SQL level are hard to sell if all you're doing is lowest common denominator ORM ("New window functions and faster lateral joins? But we're doing all that in our code!").
It requires a lot of work, planned downtime, or some way to smear updates across the estate.
The cost of any failure is very high. The benefit of any major upgrade is also vanishingly small. Unless you need a specific feature, it's just not worth it.
5. If your IT department is spread thin already and that old version is running fine, the incentive to potentially create more work for yourself is not gigantic.
The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.
Amongst other reasons:
- Performing regular upgrades is an easy and cheap way to maintain a healthy knowledge base about the system. It's always easier to fix a system that is well understood than a black box which nobody has touched in 20 years. Upgrading regularly also leads to people being more comfortable with the upgrade process and it is likely to become a "regular thing" with checklists and other process improvements that make maintenance safer and more reliable.
- Doing preemptive maintenance means YOU get to choose when the system is unavailable and then you can plan accordingly, perhaps by planning it during a period with low usage or even arranging for a redundant system to take up the load. The alternative is leaving it up to chance when a system will fail, and that's frequently at high-load moments when you can least afford it.
- As a corollary to the previous point: a team which is in control of its systems can have more predictable output of feature work, since there will be fewer "unexpected" events.
Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
> The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.
and
> Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
Those are pithy assertions, but I don't think they're universally factually correct. This applies at some scales, but not at others. One size does not fit all, nor does one maintenance strategy fit all. I do believe I specifically called out the dogmatism in this discussion, and yet here we go piling on more of the same. I don't think that's terribly helpful. I assume that everyone is well aware of all the points you raised, as I suspect that most people running PostgreSQL are not in fact incompetent.
Not everyone has a team which "is in control of its systems", nor can everyone afford such a team. It may well be cheaper for some people to run something old until it burns only to have The Database Consultant come in one day to raise it from the dead.
> The entire field of maintenance engineering would like a word. Over longer periods of time it's vastly cheaper to regularly fix things even before they break, and software is no exception.
I mean I think it's because maintenance is so unglamorous. So when it happens and everything doesn't collapse, nobody remembers and starts asking why we even do it (and then sysadmins and middle management suddenly aren't a thing, and companies and IT become exactly as fragile as anyone with _actual_ understanding of management would expect).
Meanwhile when regular maintenance fails in progress, it often ends up in the news with a body count attached.
One of my favourite podcasts has a running joke that you should never do maintenance (because so many industrial disasters happen during it). Of course the bias is the point of the joke - and usually the reason things went bad is because either the maintenance was neglected until that point, or the engineering picked Postgresql, I mean, didn't consider the necessity of maintenance.
- If you're scared to perform upgrades, this is a good indicator that you lack confidence in your backup/recovery/reversion policies and procedures, probably for good reason, so maybe fix that.
Maintenance engineering is in full agreement with that. You are maintaining the old version, backporting fixes etc. instead of tearing down the whole thing and replacing it with new shiny all the time.
One of the first laws of the universe that an experienced engineer learns is that "do not fix what is not broken" never actually applies, and is only brought up by people invulnerable to consequences.
That doesn't mean "upgrade recklessly," but it does mean you should know _why_ you're either upgrading or _NOT_ upgrading. That's your job, much more than the act of upgrading itself.
Unpublished vulnerabilities in old software are not a hypothetical. And very old packages are usually broken, just coped with at the expense of significant opportunity cost - or because the failure is a combination of rare and impactful such that once it happens, everyone is out of a job anyway.
Seriously, I've yet to encounter a sysadmin who used that old, silly adage at me and didn't later have to admit I was right.
Edit: so no, you don't stay on an ancient version of the database because "it's not broken." You're staying on it because _the upgrade process itself_ is so broken you're terrified of it.
I generally follow "if it's not broken, fixes need to be carefully planned." I can't tell you how many times I thought I'd quickly do an upgrade and things went wrong, like all of my home automation stopping working right before bed.
It leads to a lot of old software which is not going to be upgraded ever. Then the entire project dies and gets rewritten from the scratch, because nobody wants to work with Windows 2003 server running Delphi 7, Java 1.4 and Oracle 9i in 2020 (personal experience).
Old software is not necessarily broken, but it is always a tech debt. And you can't live in debt forever, our IT does not work this way.
I do agree, however I think it’s often easier to upgrade iteratively and deal with smaller issues that arise as opposed to upgrading a huge version diff and struggling to understand and fix all the failing parts.
I think there's a balance to be struck there. On large databases, upgrade time can be very slow. Is it worth upgrading iteratively at great expense if there are no benefits to be gained (e.g. no bugfixes that affect you, no critical security issues)? Maybe, maybe not.
This is solid advice, however I would caveat that you can't know whether it is broken if you are not checking. E.g. if your old-as-hell database has a known vulnerability that can be exploited, unless you can rule out that it can be used in your setup, it is broken by all definitions of the word.
> 3. While upgrades aren't hard, they're not easy either
I guess it depends on scale? I was surprised how easy it was on Ubuntu. There was an automatic migration script, and it worked. Took less than 5 minutes to upgrade.
Sure, there was downtime, but I think most applications out there can live with scheduled downtime of a few minutes.
For SRE (site reliability engineering) the term of art is 'error budget'
To others reading, be mindful: database upgrade times depend greatly on the data stored within them/where/who is using them. Your development environment doesn't represent production. If the distinction even exists!
A five minute upgrade can become indefinite with a single lock [row/table, depending on the storage engine/etc]
I think this is a really good take. It all boils down to "if it ain't broke don't fix it."
No matter how easy it is, it takes more effort and thought to do an upgrade than it does to not do an upgrade at all, and for most users the upside is basically invisible if it exists at all.
"What the application does" may not be what you think of, as it is dependent on how secure the application or the layers beneath it are. This is how people get everything pwned step by step. The database server may then reveal credentials to other apps etc.
Sure. Defense in depth is important. But I hope that your application is only able to talk TCP/5432 to psql. No amount of psql upgrading will protect you against SQL injections in shitty application code.
Sure, but then chances are it's hosted on a nas with other data which you dont want ransomware'd, has access to other parts of the network, etc. - it's easy to underestimate the potential impact
I've used postgresql in most of my previous jobs, but using mariadb in my current one, and I must say it's a joy to administer.
The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer.
It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it. Not a dozen options like in the postgresql ecosystem, each with its own quirks and drawbacks.
Also, upgrades are a dream. I did 4 major long term stable release upgrades so far and everything was fully automatic and smooth sailing all around.
All of that with about 10 seconds downtime only for each upgrade, despite being a chunky 6TB database, thanks to the user friendly replication process.
I respect postgresql a lot, but mariadb allows me to sleep much better at night thanks to replication, smooth upgrades and no VACUUM to worry about.
To be fair to PostgreSQL, it has multiple replication setups because they have different inherent properties. For instance, it comes with built in streaming and logical replication setups. Which should you use? It’s impossible to say without knowing your own exact use case.
By analogy: should a database cluster optimize consistency or availability? Answer: there’s no way its author can guess which is more important to how you want to use it.
MySQL/MariaDB have binary and logical replication too. They also support much more advanced topologies like multi-master out of the box. It's just that what you need for 99% of situations relies on a well trodden path and requires no configuration at all.
I have suspected for a long time that most people who criticize MySQL have never actually worked with it, or have done so a couple of decades ago. It's often the default choice if you don't need some of the PostgreSQL features (like PostGIS) and can work with either one.
> They also support much more advanced topologies like multi-master out of the box.
This is the one thing in My/MariaDB that I miss in Pg: Galera. So handy for in-house stuff that needs some kind of HA: either a three-server setup, or a two-server+arbitrator.
> The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it.
Upgrades are hard. There was no replication in the before times. The original block-level replication didn't work among different major versions. Slony was a painful workaround based on triggers that amplified writes.
Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
> At least when you cannot spare more than a few minutes of downtime.
I think it boils down to this. We used to constantly be running the obsolete pg version until it became an emergency mostly because upgrading with the tooling available at the time was very painful. Today however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue green deployments on databases with almost no down time.
"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."
"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."
They’re not wrong. If you’ve ever spent meaningful time administering both, you’ll know that Postgres takes far more hands-on work to keep it going.
To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.
I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.
Pretty sure that even in 2012 MySQL had very easy-to-use replication, which Postgres didn't have until well into the late 2010s (does it today? It's been a while since I've run any databases).
> I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.
Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?
In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).
MyISAM in the olden days could/would magically lose data. InnoDB has been the de facto standard for a while and I haven't seen data loss attributed to it.
In 2012 MySQL had several flavors of replication, each with its own very serious pitfalls that could introduce corruption or loss of data. I saw enough MySQL replication issues in those days that I wouldn't want to use it.
But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off the network and/or machines, things broke in ways that were not recoverable. I'm guessing most people who set up MySQL replication didn't actually verify that it worked well when the SHTF.
> pitfalls that could introduce corruption or loss of data
sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.
MVCC is overrated. Not every row in a busy MySQL table is your transactional wallet balance. But to upgrade a DB you have to deal with every field, every row, every table, and data keeps changing, which is a real headache.
Fixing a range of broken data, however, can be done by a junior developer. If you rely on an RDBMS as your single source of truth you are probably fucked anyway.
My experience has been exactly the opposite. Being able to VACUUM is good. MySQL doesn't free up space taken by deleted rows. The only option to free up the space is to mysqldump the db and load it again. Not practical in most situations.
VACUUM rarely reclaims space from the OS' perspective, if that's what you meant. It can in certain circumstances, but they're rare. VACUUM FULL is the equivalent of OPTIMIZE TABLE - both lock the table to do a full rewrite, and optimally binpack it to the extent that is possible.
EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I've been burned in the past by foreign key constraint metadata locks essentially turning it into a blocking operation.
That helps a lot, thanks. Will summarize it quickly for those who come later: MySQL (InnoDB really) and Postgres both use MVCC, so they write a new row on update. InnoDB, however, also writes a record marking the old row for deletion.
To do a cleanup, InnoDB uses the records it kept to delete the old data, while Postgres must do a scan. So InnoDB pays a record-keeping price as part of the update that makes it easier to clear old data, while Postgres instead pays the price of occasional scanning.
I don't know how VACUUM works, I couldn't tell you about the differences.
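Even without knowing the internals, the debt Postgres accumulates is easy to see from its stats views, and paying it down is one command. A rough sketch (table name made up):

    -- dead tuples waiting to be cleaned up, per table
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;

    -- reclaim dead rows inside the table's existing files (runs online)
    VACUUM (VERBOSE, ANALYZE) my_big_table;

    -- rewrite the table and give space back to the OS (takes an exclusive lock)
    VACUUM FULL my_big_table;

Autovacuum normally does the first VACUUM for you; the pain people describe upthread is mostly about tuning it so it keeps up.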
OPTIMIZE TABLE works almost exclusively as an online DDL operation. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)
Not in around 15 years. You're thinking of when MyISAM was the default storage engine for MySQL. It has been InnoDB for over a decade. InnoDB is very reliable - I've never had a single data loss incident in all that time, and I've managed some very large (PB-scale) and active databases.
Postgres is definitely more difficult to administer.
MySQL used to have horrible and very unsafe defaults for new installations that persisted well after the introduction of InnoDB. Those went unfixed for a very long time.
I recall this being the case A LOOOONG time ago but I haven't heard of, read about, been warned to look out for or personally seen such a thing in forever. Have you?
* I'm running a lot of MySQL stuff and such a topic might be of interest to me
Yes, it is messy when you want your MySQL databases to be mission critical in production, e.g. handling a large amount of customer data. Historically, MySQL's High Availability architecture has had a lot of design and implementation issues because it was an afterthought. Dealing with a large amount of critical data means you need it to be performant, reliable and available at the same time, which is hard and requires you to deal with caching, sharding, replication, network issues, zone/resource planning, failovers, leader elections and semi-sync bugs, corrupted logs, manually fixing bad queries that killed the database, data migration, version upgrades, etc. There is a reason why big corps like Google/Meta have dedicated teams of experts (like the people who actually wrote the HA features) to maintain their mission critical MySQL deployments.
From what I can tell, MySQL is supposed to be safe since 2018 if you have no data from before 2010.
The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
Some places still have columns declared as utf8 instead of utf8mb4, and there's a special place in hell for the authors of MySQL's general clusterfuck regarding encodings - it was all nice and great if you didn't care about anything other than latin1 or ASCII, but go outside that before the utf8 option existed and it was a horror that even experienced operators managed to fuck up (I have a badge from a Google conference in 2017 with the nicely visible effect of "we have mixed up one of the three separate encoding settings in MySQL and now you have mojibake in your badge").
And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.
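The fix is mechanical, just tedious. Roughly this per table (schema, table and collation are made-up choices; it rewrites the table, so test on a copy and watch the index key-length limits):

    -- find columns still on the 3-byte "utf8"
    SELECT table_name, column_name, character_set_name
    FROM information_schema.columns
    WHERE table_schema = 'app'
      AND character_set_name IN ('utf8', 'utf8mb3');

    -- convert a table to real UTF-8
    ALTER TABLE app.reviews
        CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;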
In fairness, reasoning about collations is like peering into the abyss. I get why they’re required to have so many levels of detail, and the Unicode Consortium has done a fantastic job, but to say they’re complicated is putting it mildly.
Oracle also didn't support Boolean data types for a long time, and had a 20 some odd year public thread arguing that no one needed a Boolean data type (https://asktom.oracle.com/ords/f?p=100:11:0::::P11_QUESTION_...). They finally added it in Oracle 23 which is nice, but I wouldn't consider it to be in good company to be lacking something Oracle also lacks.
Not having a boolean data type is IMHO just an annoyance, not comparable to the lack of transactional DDL.
But to the point, people often use this point to claim that MySQL is a toy database, not usable for real world production use. I use Oracle as a counterpoint, which also has a lot of warts but is pretty much an archetype of an enterprise-grade DB engine.
Early MySQL versions made egregious design choices like quietly ignoring missing foreign keys and enum typos, truncating long strings, and randomly choosing rows from groups.
Yeah, it was bad. What kills me is SQLite has its own absurd set of gotchas [0] yet is seen as amazing and wonderful by devs. PKs can have NULLs? Sure! Strings can have \0 in the middle of them? Why not? FKs aren’t enforced by default? Yeah, who needs referential integrity, anyway?
My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.
That change from LGPL to GPL affected only the client library (the server was always GPL(+commercial)), and the MySQL company reacted relatively quickly with a FOSS exception to the GPL and by providing a reimplementation of the client library under the PHP license (mysqlnd) to serve that market.
(I joined MySQL shortly after that mess, before the Sun acquisition)
They also didn't like updating software - too likely that an update to PHP or MySQL or something would break some bad script by a customer, who'd then complain to the host.
I am a database specialist and have worn the DBA hat for many years. I have run MySQL and Postgres in production, both self-hosted and using managed services. Postgres wins on every single dimension that matters, every time. Yes, MySQL is easier to set up for non-experts. That counts for nothing.
If you are sticking up for MySQL in this thread... I just don't even know, man.
I think so. The PostgreSQL logo is an elephant, and the community calls him "Slonik", probably derived from the Russian word for elephant. There is also a node.js driver that is called "slonik"
I've always wondered why Postgres is so insanely popular. I mean it has some nice things like very powerful support for a very comprehensive subset of SQL functionality, but most apps don't need all that.
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
> I've always wondered why Postgres is so insanely popular.
In no particular order, my preference for postgres is driven by:
* Date / time functions that don't suck
* UTF-8 is really UTF-8
* 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files (sketched at the end of this comment)
* Really comprehensive documentation
* LTREE and fuzzy string match extensions
* Familiarity from using it for years
MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough, and I haven't encountered a compelling use case for changing my preference.
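For the live-backup bullet above: the built-in way to get the rsync-the-datadir-plus-WAL effect without hand-rolling it is pg_basebackup, which takes a consistent copy of a running cluster and streams the WAL it needs into the backup. Host, user and paths here are made up:

    pg_basebackup -h db1 -U replicator \
        -D /backups/base-$(date +%F) \
        -X stream -P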
UTF-8 is what made me switch. It's insane that MySQL has something called UTF-8 that isn't really UTF-8, but does have a type UTF8MB4 that actually is correct. This means that if you use UTF-8 in MySQL, you can't use emoji, for example.
Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
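To make those workarounds concrete in MySQL syntax (table and column names made up):

    -- index only the first 191 characters; lookups use the prefix entries
    -- and then recheck the full value in the row
    CREATE INDEX idx_pages_url ON pages (url(191));

    -- or skip collation entirely and get the full 3072-byte key limit
    ALTER TABLE pages
        ADD COLUMN url_bin VARBINARY(2048),
        ADD INDEX idx_pages_url_bin (url_bin);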
> So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally the reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
You should not be indexing 768 characters in any circumstance I can imagine. Go ahead and try it. Spin up two tables, fill them with a few million rows, and slap an index on them. Give one a reasonable prefix limit, and let the other go wild. Make sure you ANALYZE each, then run queries in a loop and check the times.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated by page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
A URL, for instance, can't be safely stored in 768 characters, but it can be stored safely in 2704. If you then wanted to sort those URLs so that all URLs for each domain and path within that domain are adjacent, you need an index. Especially if you want to paginate over them with a cursor. Doing that without an index on the raw value is a royal pain in the ass.
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
To be honest, indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
> indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.
This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.
> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.
Sorry, I was considering short things like dates and numbers as identifiers. I realize that's not quite right -- what I should have said was that indexes are designed for short things period (short identifiers being one of those things). Thanks.
> This fails when you need to insert new values into the table.
Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.
But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.
Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URL's, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URL's for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.
> But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box.
You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.
It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.
And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.
In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.
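A sketch of that hash-plus-full-value pattern in MySQL (names made up; the hash is computed in SQL here, but the app can do it just as well):

    CREATE TABLE urls (
        id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        url     TEXT NOT NULL,            -- full value, unindexed
        url_sha BINARY(32) NOT NULL,      -- SHA-256 of url
        KEY idx_url_sha (url_sha)         -- short, non-unique lookup key
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

    -- exact-match lookup: hit the hash index, disambiguate collisions on the full value
    SELECT id
    FROM urls
    WHERE url_sha = UNHEX(SHA2('https://example.com/some/long/path', 256))
      AND url = 'https://example.com/some/long/path';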
No, Postgres doesn't. 2730 bytes is not long enough to hold all URL's encountered in the wild. But also, your performance will suffer if you use that whole length. You generally don't want to be doing that.
The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.
You might just load someone else's data, and the index is desirable in general for speeding up analytic queries. It's possible to work around that, of course. But depending on what you do, it can make writing efficient queries against the data more difficult. That's just a distraction because most of the time, those long columns won't matter anyway.
I won't defend that utf8 brain damage, but the defaults have been sane since 2018 - you don't need to set the encoding, it's set to proper utf8 out of the box. MySQL 8 cleaned up a lot of this legacy stuff.
Good to hear they saw the light but after I switched to Postgres I never had a single regret.
In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.
Ok so if you are doing sentiment analysis of user product reviews you want to silently truncate emoji because you don’t like them? That’s a good idea how?
Maybe they're thinking of TIMESTAMP in MySQL, which IIRC would auto-update its value on any update to the row. Which was useful for updated_at-like columns. Though I think they later limited it to only the first TIMESTAMP column in a table.
No, it works for both [0] types. The first TIMESTAMP thing you’re referring to is that if a specific variable isn’t set, the first TIMESTAMP column automatically gets auto updates applied on creation and update, unless you explicitly defined it to not. This was the default behavior in 5.7, but has since been changed.
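For reference, the explicit spelling, which works on both types (table and column names made up):

    CREATE TABLE widgets (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name       VARCHAR(255) NOT NULL,
        created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                                     ON UPDATE CURRENT_TIMESTAMP
    );

Declaring it explicitly like this means you don't depend on the old implicit first-TIMESTAMP behaviour at all.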
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database is a string. So you need something in your app to convert those strings to the equivalent data structures.
You're thinking only in terms of application. Types in the db save storage space, allow for better validation than plain strings, can be correlated cleanly with other columns with the same type, etc.
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
It's not just DDL that isn't transactional; there's a whole bunch of other things that aren't. And they break the transactionality silently. It's like an obstacle course where bumping into something might be fatal.
What specific non-DDL things are you referring to here?
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
That hardly seems equivalent. Why do you need to e.g. reconfigure replication inside of a transaction in the first place?
The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.
I worked for a company that migrated from mysql to postgres, but then got big enough they wanted to hire fulltime database experts and ended up migrating back to mysql because it was easier to find talent
Ugh. I worked with MySQL earlier in my career (until about 10 years ago.) All the companies since have been Postgres. All my personal projects are Postgres. I can't imagine going back.
It requires a ton of somewhat arcane maintenance at scale: vacuum shenanigans, index fragmentation requiring manual reindexing, txid wraparounds. I like Postgres but it's definitely way more work to maintain a large instance than MySQL. MySQL just kinda works.
In complex environments it is not just a one off task. I dealt with it by automating my infrastructure with ansible, but without some tooling it sucks.
I started with MySQL in 2006 for my personal projects, but what first won me over to psql was those commands.
Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
> At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
In MySQL either `?` or `help` or `\?` will show you the help...
I assume this is really what it comes down to. If psql added those verbose-but-descriptive commands a whole bunch of people comfortable with mysql would be a lot happier using postgres.
> I've always wondered why Postgres is so insanely popular.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
For me Postgres is 100% predictable and reliable. It's neither clunky nor arcane in my experience. I don't need to think about it, I just SQL it and that's about it. It quietly works in the background. At some scale there might be some issues, but there is always a known path to solve things.
That's what MariaDB is for, right? I'm surprised to hear people recommend the Oracle fork of MySQL (still called MySQL because they own the trademark) rather than the original project (now called MariaDB)
You might as well ask, why does anyone run an older version or anything? The reasons will be largely the same.
Most of the software on my machines are "old" because they are part of a Linux distribution that (aside from security issues) was frozen in time a year or two ago so that it could be tested, released, and maintained. I am quite happy to have a system that I know is not going to break (either itself, or my workflow) when I apply security updates.
People who MUST HAVE the latest version of everything I feel either have some deeper FOMO issues to work out, suffer from boredom, or look at their computers as hobbies themselves rather than tools. (Which is fine, just be honest about what it is.)
That said, much of my career has been spent working at companies that got so busy shipping features that upgrading infrastructure never makes it above the fold. You can tell the managers that working around old software adds costs that scale with the age of the infrastructure, but they don't always listen. I currently work at a company that still has loads of CentOS 7 hosts in production, and only fairly recently began upgrading them to RHEL 8. (Not 9!)
> These are the companies you want to be at IMHO. Provided the compensation is adequate, slow and stable > fast and pivot-y.
Absolutely...not.
Slow does not mean stable. Slow means the floor is rotting out from under you constantly.
Being prudent about when and where to upgrade is a very active, intentional process that the typical company simply doesn't have the stomach or skill for.
Yeah, eventually you will have to upgrade and deal with all of the accumulated debt. You don’t have to be on the bleeding edge but you should still be updating regularly.
Not chasing shiny is important, but when tech debt builds up this high, life is generally hell in terms of outages, being unable to accomplish basic tasks, and dealing with a bunch of people who have NIH syndrome.
Which is why you build on a platform like Alma/Redhat that gives you 10 years of support. You can call it outdated I guess but I prefer "supported." Make everyone else work out the new bugs before upgrading-- it used to be the rule not to update to a .0 release but being a hip modern developer means moving the 0 to the other side and yoloing it.
The problem is that software ends up not getting touched for all those years, but eventually needs an upgrade when it's at end of support. And at that point you end up having to make changes to a system where no one remembers how it works or how to deploy it. Keeping software up to date to me is similar to how you practice disaster recovery scenarios. You do it regularly so no one forgets the processes and so any issues can be dealt with while you're not under a short time limit.
older versions can also mean deprecated packages everyone's too scared to touch, failure to invest in maintenance and tech debt reduction, or use of old technologies which stopped receiving security updates
I don’t necessarily need to be on the latest version, but I prefer to take many small upgrades rather than one big upgrade at least when it comes to databases. Frequent upgrades also forces an organization to get good at upgrading and managing the risk.
That and capitalism doesn’t reward slow and steady, keeping things stable and well oiled. It rewards delivering shoddy features at break neck speed even if they need to be recalled after two weeks. That’s what Wall Street and the shareholders want. Hence why senior management rarely cares about addressing tech debt.
“Show me the incentive, I’ll show you the outcome”
Once your version doesn’t receive security fixes you’re one CERT advisory away from having your whole week pre-empted by an emergency upgrade.
I’ve been there with products that were still internal at the time. I can only imagine how much fun that is with a public product. But then I do have a pretty vivid imagination. We changed to periodic upgrades after that to avoid the obvious problem staring us in the face.
Upgrading when multiple versions behind is significantly more risky than doing it when the update is relatively fresh.
Additionally, actions done frequently are less risky than actions done rarely, since you develop skills in performing that action as an organization - see high deployment frequency as a strategy of managing deployment risk.
This adds up to continuous upgrading being the least risky option in aggregate.
We are planning to upgrade from 11 to 17 soon. Even thinking about it is giving me ulcers. Our infra provider said we actually need to upgrade to 13 first, and then to 17. They did not provide a reason.
I went through a postgres 10 > 16 upgrade recently. What made it easier was just doing a test run of the upgrade process.
Did a restore to a stage environment, worked on my upgrade scripts until I was happy (deployed to VMs with ansible, so manual work to write the upgrade process for me), restored again and ran the upgrade process fresh, and then tested my application, backup scripts, restores, etc. Had everything working entirely smoothly multiple times before pulling the trigger in production.
Why? If the implemented featureset meets your needs, and there are no unresolved bugs or security vulnerabilities relevant to your use cases, what further "maintenance" do you need?
Because when the maintainers have stopped patching that version against all known security vulnerabilities, that doesn't stop the bad guys from looking for more vulnerabilities. When they find one, it will get exploited. So you either wake up to an email from Have I Been Pwned to say all your customer data has been exfiltrated [0], or (if you're lucky) you have a mad scramble to do that update before they get you.
[0] Probably including those passwords you didn't hash, and those credit card numbers you shouldn't be storing in the first place because, what the heck, it meets your needs.
I agree. It helped me completely bypass any discussion from management about “not high enough priority”. Amazon definitely did me a favour in many ways.
In Oracle, ALTER TABLE MOVE in 8i was a godsend, finally enabling a table reorganization without export/import.
My timid management forbade an upgrade from Oracle 7.3.4 until 2013. It was agony to remain on that museum piece for as long as we did.
I am upgrade-minded, but my management is not. I always lose.
I am retiring in two years. I will not miss their problems, not at all.
Edit: Oracle 10g was the last release that (for us) brought must-have features. Sure, upgrading to 19 or 23 would be great, but it doesn't bring anything that I really want.
Because it's not automatic, I leave it; I leave it until it's so unsupported that I must upgrade the whole system, then I build a new system with a new Postgres and migrate the old to the new.
I want, so badly, for Postgres to just automatically update itself, that a new binary just works with the data directory of an old version, and that if required it does an in-place upgrade to those data files when it can (i.e. if it can detect the last version was the same major as the current version, upgrade the files transparently to the admin).
My databases are all backed up each night, and these are single server Postgres with no replication or other trickery, an automatic upgrade for a single-server Postgres should be possible.
As it's not done... I assume (incorrectly?) that there be dragons and risks, and I mitigate that by never upgrading, just waiting and migrating. Migrating puts all of the risk on me, human error, and I am definitely fallible, so I can best handle this risk by just not doing it until I must.
Last migration I performed was from Postgres 7 > Postgres 14 in October 2021... I guess I have quite a few years of Postgres 14 ahead of me still. I would take downtime, my systems aren't zero downtime critical, just give me a super simple one-command no-questions upgrade as I really dislike migrations.
Unless you want support for the MERGE command (v15), bidirectional replication (v16), and some behind the scenes optimizations, you're not really missing anything.
Whether you go from v14 to v17 or v14 to v20, I doubt it'll make a difference in migration strategy. You've still got a fair amount of time before EOL. No need to stress or FOMO.
Thanks. There's a small group of us (PG DevOps type people) who have been working on it for over a year now, and it's come together pretty well.
It doesn't yet automatically upgrade people's PG extensions, but that's on the ToDo list and has initial code in a PR. So that'll likely start happening in a few weeks too. :)
I used to do the old pg_dump for backups, but after using pgbackrest at work for years I see no reason not to use it for every cluster I want to back up.
I like pointing it at an S3 bucket so I have another full backup repo (with its own retention strategy) away from my homelab, as well as to my local NAS.
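For anyone curious, a minimal S3-repo setup is roughly this (bucket, region, paths and stanza name are all made up, credentials omitted - check the pgbackrest docs for your version):

    # /etc/pgbackrest/pgbackrest.conf (sketch)
    [global]
    repo1-type=s3
    repo1-s3-bucket=my-pg-backups
    repo1-s3-endpoint=s3.eu-central-1.amazonaws.com
    repo1-s3-region=eu-central-1
    repo1-path=/homelab
    repo1-retention-full=2

    [main]
    pg1-path=/var/lib/postgresql/16/main

    # then, once per cluster:
    #   pgbackrest --stanza=main stanza-create
    #   pgbackrest --stanza=main --type=full backup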
Databases tend to be "stickier" than other parts of any large software system. Largely because database migrations are costly. You can't just tear down an old database and rebuild a new one, you have to figure out how to move all that data across too.
The consequence is that things in database-land tends to move slower than other types of software. This I think is the major reason why we still use SQL.
A dump/reload of the database or use of the pg_upgrade application is required for major upgrades.
Now, it seems one can mostly use pg_upgrade, which only rewrites the system tables[2] so is fairly quick. But if the on-disk format has changed, it seems you're forced to dump and reload.
At work we've mainly been using SQLAnywhere, which would just disable new functionality for databases using old on-disk format. So upgrading major versions has usually been fairly painless and quick.
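For reference, the pg_upgrade route looks roughly like this (paths and versions are made up; --link hard-links the data files instead of copying them, at the cost of not being able to start the old cluster afterwards):

    # old and new binaries installed side by side, new cluster already initdb'd
    pg_upgrade \
        --old-datadir /var/lib/postgresql/14/main \
        --new-datadir /var/lib/postgresql/17/main \
        --old-bindir  /usr/lib/postgresql/14/bin \
        --new-bindir  /usr/lib/postgresql/17/bin \
        --link --check

    # --check only runs the compatibility checks; drop it to do the real upgrade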
Well if it's self-hosted you have to do it yourself. You can either backup your databases from the old version and restore it to the new version once installed, or you can use pg_upgrade to upgrade/copy a old version data directory to the new version.
I don't think this is done automatically when you simply install a new postgres version, but I'm not certain of that.
JSON - er, JSON-based document storage - documents with unique identifiers, and the ability to define and set schemas for the JSON, and ... we're back to a relational database.
The PostgreSQL team ships an incompatible on-disk format every year. Do they really need to break the data format every year? In my opinion, the release model for PostgreSQL should be refactored. Breaking stuff is no fun for users; please stop doing that if you are a software developer. And if you have to, make it automatic for the user.
If you want performance to steadily increase rather than steadily decrease, the answer to your question is yes. That doesn't mean tools couldn't be better, but there are clear technical advantages to their on-disk format updates.
Database engines (every single one) are notorious for incompatibilities between major versions, upgrading mission critical stuff means updating and re-testing entire applications, which in some cases can be a multi-million dollar process, before going into production.
Even if you deeply know/think that there's no problem upgrading, if something does fail in production after an upgrade and it's mission critical..
This. I was tasked with upgrading Postgresql from a very old version (I think 9?) to one that was still supported a couple of years ago. Backwards compatibility is paramount and from my experience upgrading MySQL/MariaDB I know that changes in versions can break it.
For this reason, I chose to upgrade to version 11 because it was only a couple of versions apart and still had repositories available at the time.
So the first thing I do is stop the VM and take a snapshot. Then I start it back up and go check for database corruption before I dump them... wait there's no utility to check for corruption...? Yep that's right! You basically have to YOLO the whole thing and hope it works. OK...
So I dump the databases and back up the directory. I shut down the old version and then install the new version from the repo. I start to import the databases and notice in the scrolling logs that there's some incompatibility... Oh F*$&. I google the error and spend a good hour trying to figure it out. Apparently there are external plugins for postgres that were installed in the old version. I search for the plugins online and they are long discontinued. OK, so let's just copy them over from the old version. I stop postgres, copy them over, and start it back up. It starts up ok. I reimport the databases and no more errors. Yay! I start the applications and pray to the SQL gods. So far so good, everything seems to work.
Thankfully the applications tested well and worked post upgrade.
All of this was done for a critical multi-million dollar healthcare platform. There were no official guides. Nothing. I had to find some random admin's blog for guidance. MySQL on the other hand has full documentation on just about every aspect of everything. The whole process was super hackish and not having any way to check database integrity would have been a show stopper for me had I designed this configuration.
My personal reason: While I haven't had to deal with a Postgres update at work yet, I've been running a pretty large Postgres 12 database in my homelab for a few years now.
My homelab projects mostly center around a "everything is an MQTT message" idea. Zigbee sensors, Tasmota power readings, OwnTracks locations, surveillance camera events, motion sensors for light switches, currently active app on my PC, status of my 3D printer, whatever my vacuum robots are up to and so on. It all gets recorded into a Postgres db. From there I can use it for data mining experiments, but mostly as a source for Grafana. I tried counting the rows but that query didn't even complete while I was writing this comment.
I like trying out all kinds of dockerized oss services, and I keep them updated using watchtower. I run a gitlab instance, which is usually the most annoying service to update because there's an upgrade path and post-start migrations. With my Postgres instance, which is isolated from the internet, I'll have to figure out what the fastest way is to move all that data around, not leave a huge gap in the record, and so on. Sounds like at least a day of work - and since it's technically all for "fun", it'll have to wait until it actually is that.
A good approach for this is to use pg_upgrade in-place, which should give you a downtime of a few minutes at most. (I have 800GB at work and would expect 1-2 minutes for this.)
I recommend installing PG12 on a temporary VM, duplicating the existing database, and testing the upgrade in isolation.
And since I have backups, I might not even need the testing step, considering the low risk. Might do it anyway just out of curiosity at how long it would take to duplicate.
>Postgres 17.0 has been out for a bit and it’s awesome, but here’s the reality: most Postgres users won’t upgrade right away. Most probably aren’t even on 16.4 or 16.anything —they’re probably still using Postgres 15 or an even older version.
Here's how we did it at OneGraph (RIP), where we not only upgraded versions without downtime, but we also moved hosting providers from GCP to Aurora without downtime.
1. Set up logical replication to a new database server. We used https://github.com/2ndQuadrant/pglogical, but maybe you don't need that any more with newer versions of postgres?
2. Flip a feature flag that pauses all database queries and wait for the queue of queries to complete.
3. Wait for the query queue to drain and for replication to catch up (one way to check this is sketched after the list).
4. Flip a feature flag that switches the connection from the old db to the new db.
5. Flip the flag to resume queries.
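For step 3, one way to check "caught up" with the built-in logical replication in newer versions (slot name made up; pglogical has its own equivalents):

    -- on the old (source) database: how far behind is the slot?
    SELECT slot_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS bytes_behind
    FROM pg_replication_slots
    WHERE slot_name = 'migration_slot';

    -- cut over once bytes_behind stays at (or near) zero with writes paused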
It helped that we were written in OCaml. We had to write our own connection pooling, which meant that we had full control over the query queue. Not sure how you would do it with e.g. Java's Hikari, where the query queue and the connection settings are complected.
We also had no long-running queries, with a default timeout of 30 seconds.
It helped to over-provision servers during the migration, because any requests that came in while the migration was ongoing would have to wait for the migration to complete.
In many orgs out there, the version that's picked when a project is started will stick around for a while.
Suppose you join a project and see that the PostgreSQL version used is pretty old.
Do you:
A) convince people to migrate it for some nebulous benefits (features that aren't currently used, performance improvements that will be difficult to measure in the absence of tooling and aggregated statistics) while also taking on the risks of breaking everything or, even worse, data loss (since you won't always be able to provision multiple instances, and a backup restore might take a while and you could still mess that up)
B) or just leave it as it is and focus on anything else
Many will prefer to not get burned even if there shouldn't be that many risks with upgrading your average PostgreSQL install, which is why you'll get minor/patch releases at best, alongside whatever upgrades the runtime environment gets.
Containers and bind mounts make all of this way easier, but then again, many places don't use containers.
So I've got a small but important app that I run on PG14 via a container. I literally just put the pgdata dir outside the container, so I can keep the host OS totally fine and have an isolated db environment. It's a very nice way to not worry about the db.
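Concretely, that setup is just something like this (host path, password and published port are made up):

    # data lives on the host; the container itself is throwaway
    docker run -d --name appdb \
        -e POSTGRES_PASSWORD=change-me \
        -v /srv/appdb/pgdata:/var/lib/postgresql/data \
        -p 127.0.0.1:5432:5432 \
        postgres:14

Worth noting that a major-version upgrade is still not just bumping the image tag - the data directory format is per major version, so it's pg_upgrade or dump/restore as usual.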
Some might disagree (believing that the system package manager should be responsible for the PostgreSQL install), but I agree that the approach you’re using is one of the better ones.
That way you can update the database version when you want, while still installing the base system updates quite frequently. Plus, I think it’s really nice to separate the runtime from the persistent data, which such setups make obvious.
Yeah, upgrading to PostgreSQL 17 now would be weird unless you have some very specific feature you need in it and spent resources testing your application on the betas and rcs.
My team has upgraded several dozen databases from 16.x to 17.3. Went entirely smoothly. The thing is that we're running on a process of upgrading all dependencies every Friday, and then promoting to prod on Monday unless there are specific issues, so our definition of "would be weird" is the reverse from what you say.
(Granted, we have rather small DBs and simple applications where FOR UPDATE SKIP LOCKED is about the most fancy feature we use.)
Postgres is the only thing on my Debian that doesn't seamlessly automatically upgrade across dist-upgrades, but instead leaves old versions around for me to deal with manually... which I seem to never get around to.
No, what I meant is that the install path for Postgres on Debian involves installing versioned packages. It's the only approved way of installing Postgres from the Debian repos that I'm aware of.
I think it's more about avoiding downtime (I just upgraded a pg with 1TB of data from v11 to v16 and I didn't notice any breaking changes). In an ideal world, every client of the DB should be able to handle the case where the DB is down and patiently wait for the DB to come back to keep doing its job. But from my experience, it's rarely the case, there is always at least 1 micro service running somewhere in the cloud that everybody forgot about that will just crash if the DB is down, which could mean losing data.
Related, but Sandstorm is an app-hosting platform/solution that's very different in design from other solutions like Docker, and one of the reasons is that it's actually designed from the ground up for easy and effective use on homelab-style setups, because among other things (not limited to this, and 100% from memory):
1. Sandstorm apps need to always come back cleanly from a raw SIGKILL at any moment.
2. Sandstorm apps must be able to always upgrade ANY previous version of stored data, with no intervention, automatically, when newer versions are deployed.
These are tough technical constraints and have huge implications for the whole design. For example, the client/protocol layer needs to possibly be multi-version aware for clean rollouts, client connection failover needs to be built in, etc. But the reality is that if you do not have these two constraints, your software will never really work in a turnkey, forget-about-it way where someone just runs an instance and then actually forgets about it, and it can be safe and secure. This kind of stuff is imperative to every actual computer user who isn't a programming nerd.
This is why slapping Docker on a lot of existing pieces of software like databases doesn't really work in the grand scheme. Yes, it gets you started faster, which people value to a high degree. But it often lacks or has the same problems as other solutions on day 2, day 3, ... etc. So, you need to solve all the same problems anyway just in a different form (or externalize them onto some other solution.)
Realistically, the solutions have to be designed in from the ground up. That's very difficult to do but necessary.
Someone else mentioned in here that SQL Server always 100% works when upgraded in place from an old version. That's also my (limited, homelab-ish) experience and a good reason to like it.
Huh, I never really thought of that as a unique property of Sandstorm, but you're absolutely right, this is essential to any software intended to be operated by a non-technical end user, and Sandstorm is aggressive about it (e.g. apps are always shut down by abrupt SIGKILL and whenever someone requested that we add some sort of clean shutdown notification, I refused).
Ironically, Sandstorm itself is still stuck running on Mongo version 2.6 (a decade old!) because Mongo can't necessarily be updated in an unattended way. Of course, we could never ask every Sandstorm user to do some manual process to upgrade it.
Meanwhile, newer versions of the Mongo Node.js client library don't support such old Mongo, which means Sandstorm cannot actually update that package nor any of the packages that depend on it anymore. And this is why Sandstorm is now stuck in time and no longer receiving updates. :(
It was clearly a big mistake for Sandstorm to use Mongo. We chose it because we built the UI on Meteor, which was really only designed to work with Mongo. In retrospect I wish we'd used SQLite, whose backwards-compatibility guarantee has now reached twenty years.
Often lost in these discussions is how much more difficult upgrading is at scale. The article talks about the challenges with upgrading a 4TB database. In my world - that’s a small database.
Trying to setup a logical replica of a much larger, high write volume database is an entirely different story with its own set of challenges. In some cases it’s not even possible to do even with tricks like dropping and restoring indexes.
Logical replication still struggles to keep up with high write loads. When something like a vacuum freeze kicks off, it's not uncommon to see logical replication lag for significant periods.
Then there are things like the lack of DDL replication. While this can mostly be worked around, it adds complexity. And remember DB user management is DDL - so if you change a DB password on the primary it won't replicate to the logical replica.
When you have CDC systems using logical decoding from the systems you are upgrading, you have to deal with resync conditions related to the fact that the logical replication slot will not be replicated and the new replica will lose its place in the replication stream.
Most non-trivial instances have multiple streaming replicas for read offloads which need to be coordinated at cutover. While not difficult it increases the complexity.
In addition - there are no guarantees of application compatibility. While this is rarely an issue in my experience - PG 14 (IIRC) changed the function signatures of a bunch of array functions, which was particularly painful.
That said - Postgres is improving the story here. PG 17’s ability to convert a streaming replica to a logical one and be able to upgrade it will be a game changer for standing up the logical replica. If they can get DDL replication into logical it will improve things even more.
That said - it’s still way behind how modern databases like CockroachDB and Yugabyte handle things here.
right? we're on 13 and only now starting to consider upgrading to 16, and I don't think we're very abnormal and I don't consider 13 ancient. We have customers still using 9.3. (That last one does actually count as ancient.)
Same exact boat. Logical replication is nicer than our 'in place' upgrades we did before, but still leaves lots of other issues.
We are constantly making changes to our schemas (adding tables, columns, etc). Its never an issue on physical standby's, it just gets created, but logical replication, we have to manually run the changes on the subscriber.
We have lots of instances where we create a new table for a feature coming, and alter another to add a column.
If those get missed on the logical subscriber, you don't even know until someone tries to write data to that new table or new column.
I know logical is supposed to be flexible, but I wish there was a setting to get a straight replica via logical, so I can handle upgrades more easily.
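For new tables specifically, the dance on the subscriber ends up being roughly this (names made up):

    -- run the same DDL on the subscriber first
    CREATE TABLE new_feature_events (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );
    ALTER TABLE orders ADD COLUMN promo_code text;

    -- then tell the subscription to pick up newly published tables
    ALTER SUBSCRIPTION app_sub REFRESH PUBLICATION;

Forget either half and, as said above, nothing complains until writes start landing.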
If I run my application/code v1 right now, I generate data. I expect that if I move to application/code v2, I can leave my data in place and it will automatically apply changes to my data.
I do not get that with postgres. If I am on postgres 16, and I want to upgrade to postgres 17, I want to leave my data folder untouched. When I then start postgres 17, it should just work (tm).
It should also work with code that assumes postgres 16, so I can upgrade my database separate from my application. I can not wait 10 days for a large database to be migrated from 16 to 17 without being able to run it. However, I can wait 10 days before updating my code to support features in 17.
The current upgrade process does not give me such confidence in restoring data and uptime. So I don't upgrade until I really have to.
I'll tell you why from my end. I installed Postgres 14 via homebrew many years ago on my Apple M1 mac mini. I searched a lot on how to "upgrade" this installation but found nothing. I have a few databases running with data on it which I can't afford to lose if something goes down for more than 1 hour. I wish someone would guide me on how to actually install a newer postgres such as v17 without breaking an existing v14 install or losing data, since I am not an expert by any means.
And this, kids, is a good example of why homebrew is not a real package manager, and why macOS is not the best at running production services.
I hope you have a backup somewhere else, not on the Mac Mini. One option would be to restore it on another machine on Postgres 14, follow the upgrade path to 17 (idk if you can jump directly but a few Google searches will cover this), verify the data, back up the new db, somehow upgrade Posgtres on your Mac, and restore the new back up. Done.
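To make that concrete for homebrew, the dump-and-restore route looks roughly like this. Formula names and paths are assumptions (Apple Silicon prefix, versioned formulae), so adjust to however yours was installed, and take that separate backup first:

    # install the new version alongside the old one
    brew install postgresql@17

    # dump everything while 14 is still running
    /opt/homebrew/opt/postgresql@14/bin/pg_dumpall > ~/pg14-dumpall.sql

    # swap the running service and restore into 17
    brew services stop postgresql@14
    brew services start postgresql@17
    /opt/homebrew/opt/postgresql@17/bin/psql -d postgres -f ~/pg14-dumpall.sql

Keep the 14 data directory around until you've verified everything; this way the old install is never touched.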
I don't see how this is homebrew's fault? Homebrew lets you install multiple versions of PostgreSQL at once, which you need for pg_upgrade, just like you can do with APT
Forget about something as major as Postgres, I have trouble updating packages (that aren't installed via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason.
Hell, I have a hard time even telling the version of some built-in system binaries.
A few months ago, I had trouble unzipping a file which turned out to be AES-encrypted. Some answers on SO [1] said I should update my `unzip` to a newer version, but I can't find any updates for my distro, and I have no idea (still don't, so feel free to teach me) how to update it manually to make my `unzip` support AES. And all the versions, the good and the bad, say they're "version 6.0.0" despite behaving obviously differently.
> I have trouble updating packages (that aren't install via pip/npm/cargo) on Linux all the time as a newbie. The experience is worse than Windows for some reason
If you haven't installed them via your programming language's package manager, you either installed them manually or via the OS package manager. The first one you'd know how to upgrade, and for the second you can ask it what version it is and what version is available to upgrade to (for compatibility reasons it might not be the latest, or latest major, unless you use the software vendor's own package manager repositories).
It's actually much easier than in Windows, because you have a piece of software (package manager) that is your one stop shop to knowing what is installed, update it, check versions available, etc. unless you've manually installed stuff.
In Windows you... google and download random .exes? Cool. As good as the worst possible option on Linux.
Not exactly random. It's not hard to tell which website is the official 7-zip website. Also choco and scoop exist on Windows.
> As good as the worst possible option on Linux.
I understand this is not a fair comparison, but in practice they're not as easy. When using Windows, I usually use it with a proper GUI, so popping up a browser and downloading the newest installer for a piece of software from its official website takes me less than 1 min.
Doing similar for my Linux VPS with only a terminal is much more complicated.
Of course, if the package you need isn't available in the standard repos, then you'll need to look elsewhere (e.g. PPAs or third party repos). There's also options like Flatpak and AppImage if you want something that's a bit closer to how you'd choose to install new releases on Windows.
If I wanted to update all of the installed software I have on my Windows install, there'd basically be no way for me to do this, outside of shady update manager software.
At the same time, I get the appeal of being able to just download a new release and install it, both AppImage on Linux and the way you install software on macOS (just drag the file into Applications) seem similarly pleasant to me in that regard.
To expand on the latter (the response got deleted), you can very much do something like https://peazip.github.io/peazip-linux.html (I wish 7-Zip was available on Linux natively, but as far as GUI software goes, PeaZip is pretty nice) however that's not the most common approach. You should generally prefer using the package manager when you can.
On average, the experience of upgrading/managing packages is obviously much better than Windows.
I meant to say that in certain cases (like the `unzip` example I mentioned above), when the system's built-in package manager fails, I can't seem to find alternatives like what I'd do on Windows (just find the binary I want and manually install it). To this day I still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.
The author in this answer clearly has a version of unzip that can detect "AES_WG". Unfortunately they only vaguely said (in one of the comments) "Since then the main Linux distros have added patches to fix various issues" and didn't specify which distro.
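FWIW, the usual escape hatch on Debian-likes is 7-Zip rather than Info-ZIP's unzip - it handles AES-encrypted zip entries (package name can differ slightly by release):

    sudo apt install p7zip-full
    7z x encrypted.zip   # prompts for the password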
I'm replying to the "the worst possible option on Linux", i.e. when the said software is not available in package manager. 7-zip is just a (bad) example; since you can install 7-zip using `choco install 7zip.install` on Windows too.
I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.
> I meant to say when you can't find the software you want in package manager, it's easier to download it manually and install it on Windows than (again, unfair comparison) a terminal-only Linux server.
In that case you would just copy the download link and paste it into your terminal session. It's rarely needed though as most software is available through your distribution's software repositories.
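For example (URL, archive and binary names below are all placeholders), the manual route is usually just:

    # Hypothetical release download; substitute the real link you copied.
    wget https://example.com/releases/sometool-1.2.3-linux-x86_64.tar.gz
    tar -xzf sometool-1.2.3-linux-x86_64.tar.gz
    sudo mv sometool-1.2.3/sometool /usr/local/bin/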
I enjoyed this thread (and the article) as I'm preparing to upgrade several Postgres databases from 14 to 16, which should take about 25 minutes or less.
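For anyone curious, roughly what a pg_upgrade run looks like; the paths below are the PGDG RPM defaults and purely illustrative, so adjust them to your layout, initialize the new cluster first, and take a backup before anything else:

    # Stop both clusters first; the new 16 cluster must already exist (initdb).
    /usr/pgsql-16/bin/pg_upgrade \
        --old-bindir=/usr/pgsql-14/bin \
        --new-bindir=/usr/pgsql-16/bin \
        --old-datadir=/var/lib/pgsql/14/data \
        --new-datadir=/var/lib/pgsql/16/data \
        --link    # hard-link data files instead of copying (fast, but no easy rollback)

    # Debian/Ubuntu packaging ships a wrapper that handles paths and configs:
    # sudo pg_upgradecluster 14 main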
My experience has been that most people (including devs) hardly think about their database at all. Everyone just takes the database for granted until a dev runs a bad query that locks a table. Devs let their ORM manage everything for them and don't take the time to think for themselves.
Also I rarely see teams with a culture that prioritizes maintenance, even for their own software. It's common for teams to become wary of upgrading because they do it infrequently. Like most things, the more often you perform an upgrade the easier it is. Smaller changes are easier to revert. The more often you do something, the better at it you get. Delaying maintenance often just increases the work later.
I'm still on 9.x on some systems and it's running great. I find this kind of sentiment a bit weird anyway: PostgreSQL 17 has been out for a couple of weeks, and I'm certainly not in a rush to upgrade anything unless I need to. "Never touch a running system" is as valid as ever, and on top of that I'm not a full-time DBA itching to upgrade as soon as possible. With containerization it's also more common to have multiple, or outright many, DB instances, and I won't be going through all of them until someone requests it. Security updates are a completely different matter, but major versions? I don't get the rush; the developers will come around sooner or later.
Why? Because it's risky when a lot of business critical data and processes rely on it. I'm just happy I was able to convince my boss to let me upgrade our 9.1 instance to 14 two years ago. Now to upgrade those two pesky 9.4 instances.
I guess I'm lucky and 10 minutes of downtime is not a problem for my customers during off-peak hours, every year or so. Upgrading has always been a painless experience with just a pg_dump and pg_restore on the upgraded db server.
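For reference, a rough sketch of that route (host names, database name and job count are placeholders); dumping in custom format lets pg_restore parallelize the import:

    # Roles and other globals first, since pg_dump does not include them.
    pg_dumpall -h old-db-host --globals-only > globals.sql
    psql -h new-db-host -d postgres -f globals.sql

    # Dump in custom format, then restore with parallel jobs on the new server.
    pg_dump -Fc -h old-db-host -d mydb -f mydb.dump
    createdb -h new-db-host mydb
    pg_restore -h new-db-host -d mydb --jobs=4 mydb.dump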
An ancient version like 15 or 16? I have recently encountered a PostgreSQL 12 instance.
Once the database runs you are inclined to let it keep running until you must do an upgrade (bug fixes, end of support). Upgrading for new features or improved performance is nice, but it can be a pain and is prone to cause downtime. PostgreSQL's upgrade tooling story is also a bit painful, but it has proven to be an awesome way to semi-regularly test my backup infrastructure.
Why don't people upgrade? Why don't we ask why upgrading is necessary? I understand that Postgres is free software, and if you choose to use it, you're signing up for whatever upgrade cadence the developers give you. But really, I wish the developers would spend a bit more time patching "old" versions. Six years is really too short a time before being forced to upgrade, especially if you have many database servers.
I use old postgresql versions because they work perfectly for my use cases- there is no upside to upgrading. In general I don’t upgrade software that works unless there is a good reason to.
Unless I'm mistaken table inheritance is still a thing in current PG versions, in terms of partitioning at least it's just less commonly used in favour of declarative partitioning since it's easier to manage.
It's been a long time since I worked with v9.x in anger, so I could well be forgetting things though
Our problem isn't the inheritance feature itself-- it's the removal of the config knob `sql_inheritance` which lets you change whether inherited tables are included in queries by default or not (behavior controlled manually by the * notation). It's a goofy behavior nobody liked, but we have a mountain of SQL that depends on that knob. It's fixable... Just a pain :)
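A small illustration of what changed (table name hypothetical): since PostgreSQL 10 the bare table name always includes children, so the only way to get the parent-only behavior that sql_inheritance = off used to give it is to spell out ONLY in every query:

    # Hypothetical parent table "measurements" with inherited child tables.
    psql -d mydb -c 'SELECT count(*) FROM ONLY measurements;'   # parent rows only
    psql -d mydb -c 'SELECT count(*) FROM measurements;'        # parent plus children (the * suffix is now redundant)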
(Neon employee) We auto-upgrade minor versions as long as they can be done autonomously. For major versions, you're right it's still manual but we're working on improving that. Here is our version policy: https://neon.tech/docs/postgresql/postgres-version-policy
One of the many things I love about MongoDB is the upgrade process. Shut down the server, delete the old executables, copy in the new ones, and start the server. Done.
Any needed updates to databases are done by the new exe, automatically.
Obviously backup everything before you do the upgrade.
I've never had one fail.
If there is anyone from Neon watching this thread, is there a way to suggest updates to the pgversions website? It currently pins Xata to 15.5, which is true for the shared cluster environment depending on the region, but one can start dedicated clusters up to 16.4 at the moment.
If you're on something like RDS, major version upgrades are pretty easy with Blue Green deployments. You can do it with just seconds of downtime and pretty low risk because it uses replication under the hood and handles the cutover for you.
Ha, I run Arch on my dev machine and they just LOVE upgrading Postgres. It was always a pain until my projects all got their Postgres in a Docker container. I just don't like doing extra steps for the upgrade process between major versions.
pg15 is not ancient at all, and it's part of the latest Debian, hence I don't see an issue. Same with Python or any other dependency. You rarely need the newest shiny these days, and being within the support window is fine.
Because if it isn't broken, don't fix it, and nobody wants to be responsible for the production database going down for any significant period of time, or worse, loss of data.
I wouldn’t upgrade major versions until the x.2 version is out. New major versions come with new bugs and I’d like to wait two minor versions until I start using the image in production.
As a counterpoint a lot of managed postgres providers offer automatic upgrades. Upgrading Aurora Postgres is very easy and we just schedule it in once or twice a year.
Because they already work great, you are unlikely to be forced to upgrade, and they are the part of your system doing the most critical work under the heaviest load.
Honestly, aside from React and Java (8 -> 21 is big, but still not that big), there's very little software where I upgraded and noticed a major step change in the system. Once it works, it's fine.
Postgres and MySQL usually have changes in each new version that are important enough to motivate an upgrade, whether it's new features, better performance, or both. Although it really depends on whether you are using the features they are improving (e.g. if you don't use partitions, that 30% perf improvement on write operations on partitions won't benefit you).
You can check this article about Uber migrating its Mysql from v5 to v8 posted here 3 days ago [1]. Among other things, they observed a "~94% reduction in overall database lock time." The before/after graph is pretty impressive. It also gave them window functions and better JSON support, which are two very big features.
It’s always a trade-off of how much you’ll get from an upgrade, vs. how much time, effort, and pain you’ll have to invest to do the switch. Postgres is at a stage, where a single version can easily outlive the duration of the software you’ve built with it. Let’s be honest here, a vast majority of software doesn’t need a fraction of the requirements we think it does, frankly, because no one uses it to the extent that latest and greatest features would make the experience more pleasant.
If PostgreSQL has replication, why are they talking about "minimal" downtime? Is there no quorum strategy that delivers high availability? I don't know as much as I should.
Writes happen on your primary. At some point, you need to stop accepting writes, wait for the replica to fully catch up, reverse the replication so the replica is the new primary, then direct writes to the new primary. That's hard to do without any downtime.
There's no option where the nodes all accept writes.
That still only has one primary: you can't just start pointing writes at the upgraded standbys. Synchronous replication (besides having its own downsides) just shortens the time needed for the replicas to catch up. You still need to perform a cutover.
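A rough sketch of that catch-up check with logical replication (host and slot names are placeholders): compare the publisher's current WAL position with what the subscriber's slot has confirmed before flipping writes over:

    # On the old primary (publisher), after writes have been stopped:
    psql -h old-primary -d mydb -c 'SELECT pg_current_wal_lsn();'

    # Still on the publisher, check what the subscriber's slot has confirmed
    # ("my_sub_slot" is a placeholder for your replication slot name):
    psql -h old-primary -d mydb -c \
      "SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'my_sub_slot';"

    # When confirmed_flush_lsn reaches the LSN noted above, the subscriber is
    # caught up and the application can be pointed at the new server.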
Exactly this. At my old employer we (the sysadmins/operational DBAs) were pushing for newer, supported versions of Postgres. The push back was always the regression testing cycle, amount of desired change and willingness of the product owner to even engage. The testing cycle was so long that I tried to convince them to test on Postgres beta, because it would well and truly be prod with a few bugfix releases by the time testing was completed (alas, they went with n-1 release of Postgres instead).
I've always found it fascinating that there is a vocal contingent at HN that seems to legitimately hate advertising. But then an article like this turns up that is obvious advertising and is also a good article - we get a nice summary of what the major performance features over different postgres versions are, and some interesting case studies (I'd never even heard of the NOT VALID option although apparently it is nothing new).
This is something I've heard called "permission marketing". The idea is that you show genuinely useful ads to only the few people who will benefit from them, rather than indiscriminately blasting millions of innocent bystanders. Then these few people will actually welcome your marketing efforts.
The classic example is advertising a new improved fishing reel in a fishing magazine. People buy the magazine (well, 20 years ago they did) because they want to know about things like new improved fishing reels.
It's a world away from the overwhelming avalanche of bullshit that is modern advertising/spam. There's nothing at all weird about hating advertising in general but being ok with permission marketing.
If you follow this idea further you'll find that very few people, even the most vocal, genuinely hate advertising. We all want to know about useful products and services. We just don't want to see a million ads a day for Apple, Coke, Pepsi, Nike, erectile dysfunction, fake single women in your area, Nigerian princes...
Because when it reaches a certain scale, and when too many psychological tricks are being played, and everything is always, BRIGHT, BIG, hyper-sexualized, when you can't walk down any street, watch anything, read anything, without seeing people richer, smarter, younger, sexier, happier than you, it goes far beyond just advertising. It's brainwashing. It has to stop because it's extremely unhealthy for our societies, our mental health, our children.
Well said. If I'm reading about fishing, for instance, an ad for a new piece of fishing gear would not be too annoying, as long as it isn't too intrusive (like popping up in the middle of my reading).
But when I'm watching a YouTube video, having the video cut mid-sentence to some hyper-annoying and unrelated noisy ad simply angers me and makes me look for an ad-blocker.
I rarely see much objection to contentful 'advertising' like this. Anyway, the answer really is that it's fully handled by submission/voting/flagging mechanisms, doesn't matter what anyone might say.
Yes but Neon databases is a funder of Postgres development. So I'm interested in hearing what they have to say. If they're advertising then I think helping open source is the right way to go about it. To me it sounds like they just want to make sure people benefit from all the money they're spending.
The RDBMS has not fundamentally changed much since the 80's. It's the same basic design inside and out, with a shitload of tweaks and optimizations. Don't get me wrong - you can get very far with tweaks and optimizations. But the foundations root you to a particular model, and some things in that model will always be painful.
The important question to me isn't why don't people upgrade. It's why do people run Postgres?
Is there no other kind of database? No, there are hundreds of different databases. Is it because Postgres does things fundamentally different and better than anything else? No, lots of things work somewhat like Postgres, with pros and cons. Is it because you can't do things without Postgres? No, there are always alternatives that you can make work. Is it because it's impossible for it to keep working without upgrading? No, any database with long-term support could continue working without upgrading.
So why use Postgres at all? Answer: it's the incumbency, stupid.
When a big fucking honking giant "thing" sucks up all the air in the room, provides for most of the use cases, and is accepted by all the users, then it's extremely hard to justify not using it. Incumbents usually win, even if they're old, disreputable, buggy, and annoying. Even if they're missing some obvious features other things have. Even if people loudly complain that they want change. It's just more annoying not to use them.
We're used to them. We've adapted. Trying to do something else is going to be annoying and hard. So we stick to what we know. But that doesn't mean we have to stroke the balls while working the shaft. Upgrading isn't exactly fun. It doesn't even buy us anything, other than the promise of "support", or "compatibility" with whatever else will eventually require it. So we upgrade, eventually, when we must.
But the constant mind-numbing march away from entropy isn't a fantastic reason to keep using the thing. When you have to convince yourself to stick with it, it's likely you're in a toxic relationship. If you're honest with yourself, you'll agree that it's time for a change. You deserve better.
But switching is full of unknowns. Ominous. Time-consuming. Hard. You know you want to, maybe even need to. But there's too much to lose. So you stick to the familiar, to what you can handle. Maybe something better will come down the pike soon. You even tell yourself you're lucky to be here. But deep down you know you're just comforting yourself. You wish you had something else. Something that brings you joy. Something better. Something... right. But that something isn't here right now. So until Mr. Right gets here, you'll stick with Mr. Right Now.
Time to get ready for the next upgrade..... sigh...
Seems perfectly reasonable to me. The article is about people not upgrading from older versions. One could imagine that PostgreSQL 13.0 is the “pillar” of some company, that their whole system relies upon. The article then goes into detail on what they are missing out on by not upgrading PostgreSQL to a more recent major version, and why it might be that so many stay on ancient versions, and also how you can actually perform major version upgrades of PostgreSQL.
Yeah, I mean it's not hugely imaginative, but it kind of makes sense, if you want to describe pg 13 as "ancient" that you use some kind of Greek/Roman temple as a visual analogy for that. Not particularly well executed - a serif font might have helped to drive the "joke" home - but nobody's trying to win an art contest here.