Hacker News new | past | comments | ask | show | jobs | submit login
Leaving MySQL (sesse.net)
626 points by sammorrowdrums 53 days ago | hide | past | favorite | 341 comments

> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.

Wow, right on the chin with that one.

> More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases)

I only have anecdata of my usages of MySQL and Postgres but I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing.

Yes Postgres can be slower out of the box and yes Postgres has worse connection handling that usually requires a pooler but the actual engine and it's performance makes it worth it in my opinion.

The author was rather focused on the optimizer/scheduler. I can easily believe that it is much less advanced and robust, and messier, in MySQL. I really liked MySQL (and more recently MariaDB), and I have experienced some significant annoyances with Postgres recently, and it's for a completely different set of reasons.

Logical Replication is the big one. MySQL has had logical (and "mixed") replication for a decade (or more?). PostgreSQL has gotten built-in logical replication only recently, and it's still very annoying. It doesn't replicate schema change statements, so those have to be applied independently by some external system, and of course it's not going to be timed exactly right so replication will get stuck. And somehow for very lightly active applications generating just a few MB of writes per day, the stuck replication log can back up 10s of GBs per day. And does logical replication across versions for zero-downtime upgrades work in Postres yet? I dunno.

Connections you mentioned already, Postgres needs an external connection pooler/proxy, MySQL not nearly so soon. Vacuuming is a problem I never had with MySQL/MariaDB of course.

I'm an infrastructure dev, not an DBA or an advanced SQL user, I never used window functions, or wrote a query that filled half the screen, or have a thick web of foreign keys ... If you keep things damn simple so you have confidence in efficiency and performance, then MySQL works pretty damn well in my experience, and Postgres, for all its academic correctness and advanced query feature set, has been pretty damn annoying.

And, you can tell MySQL to use a particular index for a query!

This is about where I'm at. I've used both DBs and adminned sites with them. Sure PG has a ton of nice SQL features that are really handy if you want to do any kind of more advanced analysis. I wouldn't be surprised either to learn that it's much more efficient on executing complex queries. But the bottom line is that the great majority of production usage is basic CRUD queries using simple SQL, and a lot of applications will never need to use advanced SQL. The things MySQL does do better, replication and connections like you mentioned, tend to be more suited to keeping big dumb basic sites up and running reliably.

I'm usually gonna reach for Wordpress first along with MySQL if I need to set up a basic blog for non-technical people to contribute to. Maybe the DB isn't as cool as it could be, but it works fine, and I hardly ever have need to touch the database itself directly.

it's prety much a 90%-10% issue: Mysql is quite good at the 10% of things regular users need to do 90% of the time. A website that performs simple selects and inserts, for a few hundred clients a second, only cares about speed and consistency, and MySQL delivers.

When the client count and the business complexity explode, things start to get hairy fast, but by that time the investment in MySQL is hard to walk away from. You will start to hit all sort of edge cases in other DBs - which MySQL did not exhibit simply because your application grew around its feature set and limitations.

> When the client count and the business complexity explode, things start to get hairy fast, but by that time the investment in MySQL is hard to walk away from. You will start to hit all sort of edge cases in other DBs - which MySQL did not exhibit simply because your application grew around its feature set and limitations.

I'd like to point out that this seems to happen whatever db you move from and to.

Sometime over a decade ago I was part of a team moving from an ancient version of Sybase Adaptive Server Anywhere to the newest Microsoft SQL server running on a beefier server.

We ported everything carefully and tested it out. We also felt good since both where dialects of T-SQL.

...and then we ran right into so large performance problems that we had fall back to the old system.

The reason was that at least older models of had Sybase ASA had implicit indexes.

Feels like there is always something.

I am pretty sure that is also a 90% - 10% issue as well. 90% of user's App complexity dont grow out of those MySQL comfort zone.

I've seen quite a few codebases where the complexity of the problem did but they basically bodged around it in the app tier in order to keep the SQL within what MySQL could handle - and it's -very- easy to not even notice you're doing that because it's a technical debt/death by a thousand cuts sort of experience when it does happen.

Then again, depending on your scenario, introducing a bunch of extra complexity into your app tier code may be less annoying than introducing a bunch of extra complexity (connection poolers etc.) into your infrastructure.

So it ends up that there's a bunch of workloads where mysql is obviously the better option, and a bunch of workloads where postgresql is obviously the better option, and also a bunch where they're mostly differently annoying and the correct choice depends on a combination of your team composition and where your workload is likely to go in the next N years.

A lot of applications don’t use more advanced SQL simply because it’s not available.

Just as an example, there are so many people who assume they need a 3rd party search engine simply because what’s built into their database is so bad compared to the options you get with PG.

You’re welcome to only use your database as a crud data store while you manage a mountain of other tools to work around all of the limitations…but you don’t have to.

> I'm an infrastructure dev, not an DBA or an advanced SQL user, I never used window functions, or wrote a query that filled half the screen, or have a thick web of foreign keys ... If you keep things damn simple so you have confidence in efficiency and performance, then MySQL works pretty damn well in my experience, and Postgres, for all its academic correctness and advanced query feature set, has been pretty damn annoying.

At that point you're effectively using MySQL as a key/value store with an SQL interface, rather than a full RDBMS.

Which is, in fact, pretty much exactly what MySQL was originally designed to be and it's extremely good at it.

> If you keep things damn simple so you have confidence in efficiency and performance

I can have that level of confidence on a minimally tuned (albeit sensibly indexed, you still have to make sure the indices you need -exist- on anything, realistically) postgres at levels of schema/query complexity where getting mysql to handle it -at all- would be ... an experience.

Having used both extensively, I'd say that there's a lot of knowledge when using mysql that's basically "introducing complexity into your application and query pattern to work around the optimiser being terrible at doing the sensible thing" and then there's lots of knowledge when administrating postgresql that's "having to pick and choose lots of details about your topology and configuration because you get a giant bag of parts rather than something that's built to handle the common cases out of the box".

So ... on the one hand, I'm sufficiently spoiled by being able to use multi column keys, complex joins, subqueries, group by etc. and rarely have to even think about it because the query will perform just fine that I vastly prefer postgres by default as an apps developer these days.

On the other hand, I do very much remember the learning curve in terms of getting a deployment I was happy with and if I hadn't had a project that flat out required those things to be performant I might never have gotten around to trying it.

So, in summary: They've chosen very different trade-offs, and which set of downsides is more annoying is very dependent on the situation at and.

> Having used both extensively, I'd say that there's a lot of knowledge when using mysql that's basically "introducing complexity into your application and query pattern to work around the optimiser being terrible at doing the sensible thing" and then there's lots of knowledge when administrating postgresql that's "having to pick and choose lots of details about your topology and configuration because you get a giant bag of parts rather than something that's built to handle the common cases out of the box".

This completely maps to my experience as well. I still believe Postgres' defaults being set to use an absolute minimum footprint, especially prior to PostgreSQL 9, significantly impacted it's adoption. It's better now, although IMO it could still do with an easier configuration where you could flag the install as dev/shared or prod/standalone to provide more sensible settings for the most common situations. Like, gosh, I'd like to be able to stand up an instance and easily tell it to use 90% of the system memory instead of 100 MB or whatever it was.

But the MySQL issues were much worse. Outer join and subquery problems, the lack of functions like ROW_NUMBER(), a a default storage engine that lacked transactions and fsync(), foreign keys that don't do anything, etc. I've met so many people who think MySQL limitations are RDBMS limitations, or don't understand database fundamentals because MySQL just didn't implement them properly. Then again, I also remember when the best reason to use MySQL was the existence of GROUP_CONCAT(), which everything else lacked for religious reasons.

I also vividly remember prior to MySQL 5.5 when new users of PostgreSQL, Oracle, or MS SQL Server would discover that, instead of making a guess for what you wanted, the RDBMS would actually return error messages and expect you to understand and solve them deterministically. And somehow they would be angry about it! Old MySQL (mostly 3.3, but 4.0 and 4.1, too) used to silently truncate data, or allow things like February 31 in a date field, or silently allow non-deterministic GROUP BY, or only reporting warnings when you explicitly asked for them really undermined the perception of MySQL to database folks. This wasn't that long ago, either.

We had tried to do a logical replication-based strategy for Postgres to achieve no-downtime version upgrades during business hours, but there were so many gotchas and things you have to do on the side (stuff like sequence tracking etc), that we just gave up and took the downtime to do it more safely.

I think Postgres is really cool and has a lot of good reliability, but I've always wondered what people do for high availability. And yeah, pgbouncer being a thing bugs,

It feels like we're getting closer and closer to "vanilla pg works perfectly for the common worklfows" though, so tough to complain too much.

I have written an article about upgrading PSQL with no/low downtime with Logical Replication. (More like a note to my future self)

See if you can understand enough of it and consider doing it again for next upgrade. (I have done that for 12 > 13)

So to be honest I see that article (linked below) and I think "Yeah that's probably all right", but given this is for a Real Sytem With Serious People's Data, when faced with "take the downtime and do the offline data upgrade that _basically everyone does_" and "try to use logical replication to get a new database in the right state, knowing that logical replication doesn't copy _everything_ so I need to manually copy some stuff, and if I miss stuff I'm in a bad situation"...

I trust the pg people that tools do what are advertised, but it's a very high risk proposition

Maybe you could give a link to the post

Whats the link?

This is what happen I don't have enough sleep...

I forgot to post the article link!

Thanks for posting it.

IMO logical replication is too slow for Production. At scale you'll be doing physical replication and rarely using logical replication as it is quite slow in comparison still (As of PG 13.2). Logical replication is helpful when doing a Postgres version upgrade primary/replica switch, but not useful many other places at the moment.

You do need to use an external connection pooler (pgbouncer etc), though Postgres 14 has made some large improvements in gracefully handling large numbers of connections, though you can still run into problems.

As for the query planner/optimizer, so far in all of the optimizations and improvements I have worked on, I've only run into 1 or 2 that had a query plan that made my head scratch. There are some extreme edge cases that will prompt the cost function to prioritize the wrong index, but in production I have found that 99%+ of slow queries can be easily improved by a rather simple composite index. One thing I do love about postgres is using partial indexes, which can significantly reduce the amount of space required and also make it extremely easy to create indexes to match predicates of a function, while indexing other columns.

Other than one or two slow queries I've tracked down and worked on, I've never wished that I could "hint" to use a particular index over another.

Now some of the things that people like for Postgres over Mysql, in practice aren't that great at the moment. People like doing atomic DDL operations, when in reality locking the table schema can cause lots of problems, and in production you only add indexes etc concurrently.

You still get the issue of dead tuples in mysql and need to periodically clean them up using OPTIMZE TABLE et al, though postgres and innodb have different designs, but ultimately it needs to happen sometime. Its just that postgres IMO requires more tuning to get the right balance.

> If you keep things damn simple so you have confidence in efficiency and performance...

... then SQLite might be the right choice for you?

In all seriousness, so many use cases are met by SQLite on the low end, PostgreSQL on the high end, and dedicated KV stores like Elastic off to the side, that what is left for MySQL? When you have a write-heavy workload on a multi-TB dataset that needs to be sharded, with a schema that is "simple" but not so simple that it works with an actual KV store?

... unless I want logical replication, a few hundred concurrent connections, and a more strictly typed schema than a KV store ...

> And, you can tell MySQL to use a particular index for a query!

I didn't know this was a thing or even common until this year and it opened my eyes to some major issues when the PG optimizer doesn't get a query right you have no escape hatch.

That being said, and me being weak in MySQL, it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...

> it opened my eyes to some major issues when the PG optimizer doesn't get a query right you have no escape hatch.

For me the biggest issue with that is that the optimizer decision can change basically at any time. If the stats start preferring one index over the other at 2am, you can't just say "no, trust me, use this one instead" - you have to find out exactly which value crossed some threshold and tweak the stats to restore the previous behaviour.

> it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...

I'm not sure how representative my sample is, but in the last 20 years, I've seen a hint used two times and one of them was an app bug (the forced index was not better, the order of columns in it should've been reversed and the hint removed). But I haven't worked with super-complicated reporting queries.

> PostgreSQL has gotten built-in logical replication only recently, and it's still very annoying. It doesn't replicate schema change statements, so those have to be applied independently by some external system

I’ve been using PostgreSQL built-in replication since 9.1 (when it was introduced) and this is definitely not my experience....

In fact it wouldn’t be possible to apply out of replication changes because (unlike MySQL) Postgres replication forces immutability (the replica nodes cannot become r/w).

The "issue" could potentially be that it's async? But I use postgresql a _lot_ and I've never experienced this.

They're referring to logical replication, which is a relatively new feature.

The original mode of replication merely ships the WAL, which describes binary changes to data files. Every change comes in the form "set page 32435 in <file> to <bytes>", with no distinction between the operations that the changes represent.

Logical replication describes the changes logically. A change may be "insert <column values> into <table>" or "add <column name> of <type> to <table>". Since this format isn't tied to the underlying database file format, it offers greater compatibility between PostgreSQL versions. It also allows other applications to process the log and do things like listen for data changes and ingest them into other databases, queues, etc.

> I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing

I seriously trialled and compared Postgres vs MySQL at the start of a major major project, and MySQL had a few clear wins for us (mainly replication) while the features Postgres had weren't in our current roadmap's requirements (the biggest regret that caused me was not having stored procedures). That was in 1998/1999. I now seem to be stuck on a career path where everything is MySQL and switching to Postgres anywhere seems to have huge back pressure from everyone I work with - even though at least half of them know damned well (like I do) that MySQL hasn't been the right choice for a couple of decades almost.


I could tell a similar story from 2015. Worked at a MySQL shop where we had several big clusters with bi-directional replication between two "masters", and used failover between them if anything died. Also replicated to a 3rd node which was read only (used for huge queries, backups etc). Sounds ok, but our developers were not good at writing SQL or designing schemas, and our ops team didn't have any DBAs. So all of the problems were our own fault. But a new senior engineering hire told us that Postgres would fix everything because MySQL was a shitshow. He demonstrated amazing benchmarks on a specific database we were having issues with. But OMG once we tried to get replication working... Bloody awful. Major internal fighting ensued. Another engineer decided to build a clean MySQL and tune it using best practices, and demonstrated comparable performance to the Postgres demo. That was it for Postgres, and a major project kicked off to fix the MySQL problems (switch to innodb, turn on GTIDs, fix bad SQL that was breaking replication, use row based replication, better indexes, monitoring, actual metrics from the underlying hardware etc). Who'd have thought you had to spend time on your databases? Crazy right?

Yeah. Interesting to see the replication results came out the same 15 years apart.

And I feel seen on the devs with no database skills thing. Every single database problem I can recall having to step in and help fix in the last ~5 years has been Java devs leaving everything up the the ORM to deal with, resulting in unindexed full table scans and similar brain damage. I "fixed" a production problem with a query that'd crept up to over 30 seconds without anybody noticing, by adding a single index to a single table. (And then sent the devs to work out how to give Hibernate the hints it needed to create the schema properly and set up more monitoring so once-weekly reporting SQL queries taking 10+ seconds wouldn't be hidden in 95 and 99 percentile alerting...)

Just sticking with your old horse, has its problems, but you wont get surprised by new ones

And the fact that the community and tooling surrounding MySQL is absolutely bonkers massive. It's not that the grass has been trodden down to a path; no there's a 6 lane highway with a rest area and a Cracker Barrel.

LOL amazing metaphor.

The one thing that I've learned recently which surprised me- if you want to cluster MySQL (via partitioned tables or PlanetScale, etc) you cannot use foreign keys.

Some of the Postgres compatible tooling (Citus, Cockroach, etc) appears to support foreign keys across partitions / shards.

I was quite surprised to see this. Makes sense that not allowing FKs makes the data easier to shard but I never imagined that would be a thing you would have to contend with.

I also ran into this recently when looking at SingleStore! I was surprised when they told me I couldn't use foreign keys, and was even more surprised that they were surprised that I do use foreign keys. They acted like I was the only one. And to be fair, given their clientele, I might be the only one!

At "scale" you just can't afford to use FK, it's indeed slowing down everything and preventing good and fast clustering / replication. For some use cases you just have to live with it

Yup for sure, that's what they were saying. It makes sense, but I was still surprised by it. I've never had to manage something at this scale before, so it was news to me.

For reference, we're at about 250 million rows, so not even proper "scale" scale!

citus has some rather severe limitations around foreign keys. Namely, you can’t use them from any distributed tables. See https://docs.citusdata.com/en/v10.2/develop/reference_ddl.ht...

  > Yes Postgres ... has worse connection handling that usually requires a pooler
This is the issue. The vast majority of PHP (Wordpress runs ~30% of the internet or something) and Python (Django) devs are not going to write a database connection pooler. I happen to like PHP and love Python, but the low bar to entry on those two (PHP especially) means that the vast majority of devs would be completely lost without the default tooling provided by e.g. php-mysqlng or other prepackaged database connection modules in their default settings with the most trivial of use cases.

A slow database can be ignored for a while. The need to add additional code in which the database is no longer a black box can not be ignored.

It usually comes down to setting up a pgbouncer in front of the DB, from there you can mostly ignore the pooling layer in your application.

It sure is one more step that for MySQL, but we're talking production level DB, and in my experience it's a small task compared to the rest of what you will be doing to have your architecture properly running.

But pgbouncer comes with it's own set of bugs and behaviours to be familiar with.

Just look at the changelog for the bugs fixed (and yes, there are unfixed bugs as well): https://www.pgbouncer.org/

And the FAQs for the unusual behaviours to be aware of and find workarounds for: https://www.pgbouncer.org/faq.html

I would love nothing more than to be convinced. I find MySQL to be adequate, but far from great. Postgres is described as great by everyone who uses it, however I prefer adequate but reliable over great but leads to downtime.

On your main point, yes, you need to understand how pgbouncer works, it has its quirks and limitations. In particular a dev needs to be aware of how sessions and transactions work if they have some critical transactions. When introducing pgbouncer on one of our rails project we had to fine tune ActiveRecord to play nice, but once it's done there wasn't much to adjust from there.

I'm with on how much it would help if it had a magic mode without these quirks, but in my experience you're still ahead of MySQL and its per session configuration quirks for instance. On a day to day basis, you'll rarely be scratching your head about how to deal with the pgbouncer pooling behavior you chose, an you'll still have the option to directly hit the DB on anything you don't trust the pooler to deal with appropriately.

For the bugs, those are mostly on the security and configuration side. For the few years I've used it, I never hit a behavioral bug TBH.

Terrific, thank you

> Yes Postgres can be slower out of the box

Is it though? I read this post earlier this year and it mirrors some similar tests I did back in 2020 when I was just checking basic perf between PostgreSQL / SQL Server / MySQL on my laptop.


> A simple conclusion: Postgres engine family is about twice as fast as MySQL engine family, except MariaDB.


> Postgres has worse connection handling

Is this addressed in PG14? Or is it still to come in PG15? I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.

> I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.

pg14 improves the performance for large numbers (thousands) of connections. IIRC there is some work to add a primitive connection pooler as an option, but I can't find anything references on that ATM.

So the situation is improving, but the fundamental connection=process model is so baked into Postgres that it's unlikely to ever change.

Naive question (I only use PostgreSQL and SQLite): is there any point using MySQL over MariaDB nowadays?

MySQL 8 has a lot of functionality that MariaDB lacks.

People like to criticize MySQL for some bad engineering decisions made decades ago. For whatever reason, often these folks have a more positive view of MariaDB, but without stopping to consider where all those original MySQL engineers went. FWIW one of the footnotes in the original article touches on this too.

Pretty much only three reasons: you want Enterprise support, you don't want to think about license terms, and you really want to help pay for Larry Ellison's personal island.

Last I checked mariaDB still misses some functiinality around JSON, and i don't know what the story is exactly about compat. With the newer X protocol and its collections.

> Is this addressed in PG14? Or is it still to come in PG15? I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.

This is by design, postgresql doesn't come with a connection pooler. For each connection a separate process is started. If you have a lot of mostly idle connections then you should use use pooler in your application, server side (e.g. pgbouncer) or both (there are trade offs).

These are not typical workloads for databases, and I don't think very useful

Pretty surprised about the substantial difference between MariaDB and MySQL.

Maria has some additional, interesting settings around replication I saw this week.

Optimistic mode for parallel in-order replication



I was at the Perl conference in maybe 1999 or 2000 where there was some guy giving a MySQL talk. It wasn't Monty, but he was in the audience. I asked a question during question time about enabling ssl for encrypted replication, and the guy giving the talk said "Sorry, I don't know if/when that'll happen". Monty stood up and said "I've just done that. I'll check it in tonight, come grab me out in the corridor if you'd like me to mail you the patch."

I then spent the next 7-8 years running replicated MySQL databases in SF/Dallas/London/Singapore/Sydney. It worked really well for us.

All this is showing is that the default settings for MariaDB Docker image are a better fit for this person's particular machine specs than the default settings for the MySQL Docker image. IMO it's not really a useful comparison at all, for any real-world use-case whatsoever.

When working as an SRE every place that I worked at that used MySQL had at least one situation where MySQL database got corrupted itself without any damaging events (i.e. no broken hardware, no disk filling up (but even that shouldn't corrupt data), no machine crash or hard restart).

This exact situation keeps me up at night.

Fear of a db not coming back up in a restart and backups being corrupt as well.

We pull our backups into new instances daily and I think that would catch it.

> I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing.

Yeah, I am one of those people. Needed a database in 2005, and MySQL was the de facto choice. Got used to it and never ran into problems that couldn’t be solved by getting better at schema design and indexing.

I never felt limited by MySQL and I am very comfortable with it, so never felt the need to try anything else. I might be missing something, but there is an opportunity cost in switching without a real motivating reason.

Vaccuuming sounds much more pleasant than dealing with innodb without file per table?

Yes I would rather have the ability / requirement to vacuum than have unused space in large table files especially one large innodb file.

Unless MySQL changed it since I last looked, the one large innodb file is a big PITA, but otherwise you can do OPTIMIZE TABLE to compact tables; apparently this can sometimes be done in place, but I'm used to it writing to a new file so you'd need room for old and new; and at least for MyISAM locks the table for the whole time, so I hope you've got a way to manage that too, but that's about the same as VACUUM FULL.

Innodb can be split into individual per table files and has had this ability for years.

Yeah, but the default used to be one file for everything, and it's hard to split things after you started with that. Especially if your data is sizable.

The default innodb_file_per_table changed to ON in 5.6, released nearly 9 years ago.

It's not really that hard to change if you started with OFF, it just requires a logical dump and reload (typically on a replica, followed by some re-cloning and promotion). Companies with sizable data tend to have database engineers who can handle this fairly easily.

In any case, it's not an urgent maintenance operation like a VACUUM problem...

It's not nearly the problem now it was ~6 years ago when I was stuck with 8TB in a single file at a small startup.

I look around now and you a right. It's really no big deal to move a few terabytes of data around but it sure was a mess back then and I'm once bitten, twice shy.

Having been in a similar situation 11 years ago, from my POV it was waaay easier 6 years ago than it was back then ;)

I was fortunate that the startup was willing to devote sufficient time and resources to help me get our DB problems under control -- including getting expert consulting services from Percona, attending conferences, etc. However I've definitely seen some other startups that didn't take DB tech debt seriously enough, and just kicked the can down the road until a major disaster occurred.

Anyway though, even aside from improvements in hardware and tooling, managing large DBs just gets less stressful once you've spent some time on the problem. Assuming you successfully solved this 6 years ago, why shy away from it in the future? You almost certainly learned a lot in the process, and the good news is it only gets easier from there!

> Yes Postgres can be slower out of the box

Do you have an up to date source for that?

Not just out of the box. There is a good comment at https://news.ycombinator.com/item?id=7488498 that listed a few key features of InnoDB:

1. Change Buffering - If your benchmark result shows that MySQL is faster than PostgreSQL, it is likely due to change buffering. For random read/write workload that doesn't fit into the memory, this feature is fantastic. This is available since the very first commit of InnoDB, and of course also has been causing random corruption during crash recovery throughout the years: https://jira.mariadb.org/browse/MDEV-24449 (fixed in MariaDB, still exists in MySQL)

2. Page-level Compression - PostgreSQL still doesn't support this, but I am fine with relying on ZFS filesystem compression when running either MySQL or PostgreSQL. Arguably it should be better for the database engine to do its own compression, but see this blocker level bug that impacted FreeBSD: https://jira.mariadb.org/browse/MDEV-26537 (MariaDB only, but feel free to search bugs.mysql.com for other compression related bugs)

3. Clustered Index - The main topic in the Uber drama. I think InnoDB has the better design, since scaling write is harder for a RDBMS. I also think that InnoDB implementation causes a severe performance cliff, when it comes to optimistic update versus pessimistic update.

> Page-level Compression ... but I am fine with relying on ZFS filesystem compression

There are compromises there. With page level compression the pages usually remain compressed in memory as well as on-disk and get unpacked on access (and repacked on change). This eats CPU time, but saves RAM as well as disk space which for some workloads is a beneficial trade-off as a larger common working set fits into the same amount of active memory (particularly when IO is slow, such as traditional spinning disks, cloud providers when you aren't paying a fortune, or when competing with a lot of IO or network contention on a busy LAN/SAN).

With filesystem compression you don't get the same CPU hit every time the page is read, and depending on your data you may get better compression for a couple of reasons, but you don't get the RAM saving.

ZFS ARC actually works the same way, i.e. the records would remain compressed in memory.

When running PostgreSQL, I would probably go with `primarycache=all` for the data dir, a largish ARC cache, and a smallish shared buffers, to take advantage of the filesystem compression.

When running MySQL, I would probably go with `primarycache=metadata` for the data dir, a smallish ARC cache, and a largish buffer pool, to still benefit (slightly) from the filesystem compression.

IIRC it's because some of the default settings were chosen back when RAM was measured in MB and low GB and not really updated. work_mem, for example, can typically be safely set to several times its default of 4MB, which then allows individual operations to use more memory before spilling over into temporary files on disk.

Of course, that particular example really only matters if you're actually hitting the 4MB limit in your queries, but I think there were some other settings with similar too-low defaults.

It's because the default Postgres configuration is very basic and targeted at low-end hardware. Besides, in Postgres you have to tune some of the settings that other databases take care of automagically (work_mem being probably the trickiest one).

So the first thing you do after setting up Postgres on a reasonable hardware is tune the config file. There's a lot of advice on the web, and there's pgtune service that makes it very easy to start in the right direction (https://pgtune.leopard.in.ua). Still, you need to know about it to do it, so there's definitely quite a lot of suboptimal Postgres installations out there.

That isn't unique to pg though: just look at the number of performance questions in dba.stackexchange and on web hosting forums that, once you take out those that turn out to be bad queries and/or non-existent/inappropriate indexes, come down to tuning certain settings in my.ini.

I've not used either for a long time so my knowledge may be out of date, but mysql definitely used to be the easiest to get started out of the box which was part of the reason it won out in the shared hosting space. Of course the main reason it won out in that area was because it was faster than pg in many artificial benchmarks because at the time it didn't bother with silly things like transactions (as MyISAM was the default table type), and foreign key constraints, and other necessities…

Not on my phone and i'm headed to bed but I can go educate myself in the morning.

Maybe its all better now?

I have worked with MySQL and every database has its ups and downs.

Thanks, but no, I don't like slower and 1 process per connection.

It is incredibly hard to improve something that your user base doesn’t think needs improving. That part really stands out to me.

When I first moved to Postgres I was fairly reluctant - my impression was that it was some arbitrary flavour of sql, and since I was used to MySQL (the only sql database I’d ever used) it seemed incredibly unnecessary to bother with. I imagine there are tons of people out there like I was who would rather let things be.

Since the first few weeks I don’t think I’ve ever chosen to use MySQL again; Postgres totally changed my career for the better.

I have very strong feelings that a universally pronounceable product name is ESSENTIAL to wide adoption. PostgreSQL never rolls off the tongue, and it's not fun to say.

> I have very strong feelings that a universally pronounceable product name is ESSENTIAL to wide adoption

What? SQL itself is cursed with people thinking there's one right way to pronounce it, and everybody else is stupid, but PostgreSQL has the advantage that you can just say "Postgres" and dodge that mess. Good luck saying "MySQL" without triggering at least one person in your office.

(I think the official MySQL stance is that both pronunciations are acceptable, but that doesn't stop pedants from being pedantic.)

The only right way to pronounce it is my-squeal

It’s pronounced “postgres”, as in “ingress” but with “in” replaced by “post”. If that helps. No one pronounces the “SQL”, as no one can agree on its pronunciation.

Agreed, though most people just call/pronounce it "post gress". Wish it had a better name though.

i both pronounce and spell it postgres. is that not a formal thing?

It’s how nearly everyone pronounces it, and for the most part spells it in common usage. The formal thing IIRC is that the name is PostgreSQL as it was originally a successor to Ingres, and added SQL support later.

Cultural inertia is a strange issue.

Serious question: If the userbase doesn't think it needs improving, then why are you improving it?

Well, several reasons:

1. Your users leverage unintended behaviours, which obstructs your ability to move forward with meeting needs with intended behaviours in your intended target market (unfortunate but very real)

2. You know if you add or improve existing features, you can access more users, or an prospective user worth more than the existing ones

3. It's a passion project and you want it to function/look/whatever a certain way

I've encountered all of these, and each time it becomes a pretty big pain to make progress.

It's a self-fulfilling prophecy. If your userbase doesn't think it needs improving, people with unfulfilled requirements will go somewhere else. Which may be a good thing if you want to limit the scope and serve a specific market well. But it may also leave you behind with shrinking usage and few new users. (since they prefer to start with the overall better option)

Interesting! What were the features or differences that changed your mind after switching?

I quickly found dealing with time series reports dramatically easier. There's the possibility I was just doing it wrong with MySQL. It seemed so much more expressive for things like writing a query based on an interval and perhaps filling empty cells with default values for example. I also found it very intuitive to connect a UI to our API which would convert various parameters to dynamic queries. That was a game changer; almost all reports could be run by anyone in the company without asking someone to write a query for them. The result was remarkably reliable and easy to maintain. the MySQL version I refactored was much harder to deal with, slower, and only getting worse despite our best efforts.

There were very cool things I couldn't find a way to do in MySQL which eventually made the switch an amazing change. It was trivial to set up aggregated time series tables, and very easy to roll that data up in order to speed up queries with a coarser granularity. At the time I recall window functions making this really clean and easy. It was possible in MySQL (and maybe trivial today, too) but it felt like a massive kludge every step of the way.

This is actually what lead us to consulting someone who could implement the best solutions to these issues that they knew. They encouraged us to adopt Postgres before doing any work, we didn't listen, then as they wrapped up the project they once again advised we leave MySQL behind. He then sent some examples of how we might make the migration and why it would be better. Months later we had made the migration entirely and were very, very glad.

That guy was an AWESOME DBA, and he changed the way I value someone who is good with data and databases. He transformed our team's ability to grow our product, dramatically improved our ability to deliver, and instilled a ton of knowledge in a short time. I can only dream of imparting that to other teams!

Overall I'd say that Postgres provided tools that made working with data easier. I'm not sure if there are any specific features besides a cleaner, simpler,

Weird that that got cut off. I think I intended to say "a cleaner, simpler, more efficient set of tools for working with the type of data I had". And of course, right tool for the right job and all, perhaps there are plenty of cases where MySQL is better. I just haven't found them yet, and probably won't any time soon.

Honestly, the post feels a bit bitter.

> ...let me point out something that I've been saying both internally and externally for the last five years...

Life is short, don't get stuck in a job/relationship/situation/etc. you hate for half a decade. IMO the irony is that he's moving to the Chrome team, which has been an absolute shitshow. Woohoo, now you're going to work on fantastic ways of screwing up the web, inserting ads where we don't want them, tracking users against their will, and banning extensions on the Chrome store. Interesting technical challenges, indeed.

Judging by the focus on internal code quality, I suspect the author is not too concerned about the real-world implications of the overall product.

Admittedly i dont write much C these days. But as someone who has integrated chrome/cef across platform (desktop/mobile) i would say that ive never seen such important piece of software have so many edge cases, regressions and gotchas. The CEF guys are in a constant struggling keeping there implementions stable; it takes a company the size of of MS/Apple todo anything more with it. MS is spending 1-2yrs developing there webview2/Chrome99+ macos integration - even though edge already runs on macos.

I got into programming in 2000 and right when I did Tim Perdue wrote a very important article comparing MySQL and Postgres. He concluded that Postgres was the better database.

[edit] Thanks to DocTomoe for finding this:


Some of the discussion of the article survives:


The article was a very big deal at the time, because it was so well researched and so devastating for MySQL.

I figured that article was the death knell for MySQL, but then MySQL surprised me by just going and going and going. It was helped along back then by having a very close relationship with the PHP language. Nowadays, you can use any database you want with PHP, but in 2000 PHP was very much biased in favor of MySQL, and every article written about PHP was written with the assumption you'd be using MySQL as the database. I suspect that MySQL would have died except for the massive life support it got from PHP during those years.

The moral I take from all of this is that sometimes you can have two technologies, and one is clearly better, yet the one that is clearly better can remain under-utilized for 21 straight years.

I know a bunch of HN people will now show up and defend MySQL, or defend a lazy style of programming that accepts defaults even when that means using a poor cousin of something good. But we should stop for a moment and really think about the implications of this. Because it really is remarkable that people have known of the superiority of Postgres for 21 years and yet people still use MySQL.

All these analysis both on HN an outside seem to miss it one logical conclusion:

It doesn’t matter.

At least for ~90% of the projects. People use MySQL because it’s in the tutorial and it works, end of story.

> It doesn’t matter.

I think the issue is that there are so many different reasons projects need any storage. There's a reason why you need persistence, there's a reason why you need ACID, there's a reason you need some form of replication ...

Eg: a cms like WordPress would probably have been better off just storing structured data in xml files on disk (or nfs) - along with media files on disk. Or using Berkeley db/later sqlite.

Your point of sale terminals might not fare so well without central inventory.

Perhaps postgress is not really all that better. Perhaps MySQL speaks to some need that we have not measured. For me, mysql is _easy_. It just works. You don't need a phd to manage a mysql database, but it almost seems like you need one for pg. That is what makes mysql better, even though it may be technically worse. I read everywhere that mysql is supposed to be slow, but in my experience it is plenty fast. Note that I don't do any "big data" things, most I've had is a few million rows.

I am currently wrapping up a project whereupon I am adding pg support to a previously mysql-only large open source application. The footguns that I discovered when doing this (this is a very CRUD-like web app) weren't performance related, but rather the bizarre data integrity issues MySQL ships with out of the box. Did you know, for instance, that the default collation that MySQL installs with is case insensitive as well as accent insensitive? This means that data that contains é, for instance, will match the e character in queries. MySQL is in general much more forgiving about data type coercion than Postgres as well. MySQL makes assumptions about certain things regarding dates and times that Postgres doesn't.

All of the above bodes well for someone trying to quickly hack together an app that does stuff, but for long term data integrity it's much easier than pg to introduce subtle differences and flaws in the data through some application code that works at face value for CRUD operations out of the db but down the line fails due to subtle data differences.

Absolutely. And the non-standard handling of NULl and defaults. I think they might have fixed this, but for the first 15 years I interacted with MySQL, I never knew when a field would default to an empty string or a NULL.

Strict mode has fixed this for over a decade.

Does MySQL still converts `0::timestamp` into the date '0000-00-00'?

I have never seen a MySQL database that didn't have problems with dates. (But I haven't looked in a while.)

It can function like this, yes. There's a flag for the `sql_mode` that enables/disables this functionality, and I believe zero dates are disallowed by default in MySQL 8.0+ IIRC.

"Some opinionated thoughts on SQL databases" (2021): https://blog.nelhage.com/post/some-opinionated-sql-takes/

There are many factors to take into account, but this article focuses on the "operating reliably in production" factor and says that MySQL has fewer surprises:

As for Postgres, I have enormous respect for it and its engineering and capabilities, but, for me, it’s just too damn operationally scary. In my experience it’s much worse than MySQL for operational footguns and performance cliffs, where using it slightly wrong can utterly tank your performance or availability. In addition, because MySQL is, in my experience, more widely deployed, it’s easier to find and hire engineers with experience deploying and operating it. Postgres is a fine choice, especially if you already have expertise using it on your team, but I’ve personally been burned too many times.

> one is clearly better

For some definitions of better.

I started with MySQL. At some point I encountered project based on Postgres. It just came with its own set of quirks you had to deal with. It didn't feel like, wow, this is so much better. It was more like, yup, it's a database, it just COUNTs slowly and needs to be VACUUMed. I had more issues and questions at the time how to do things I knew how to do in MySQL, I figured them out, but those two are the impression that were left.

Then I encountered MsSql Server and it was pleasent enough. For my next project I'd probably choose Postgres but not because it created any kind of sympathy in me. The only thing I reallty liked was EXPLAIN.

MySQL has one advantage in my opinion. At least it had two decades ago. It was simple. It did things that realtionational databases are good at, in fast and easy manner. It didn't have constructs like nested queries, recursive queries, triggers or stored procedures with cursors which databases like Postgres had, because who's gonna stop them, but their performance is horrible and must be horrible because they step out of the realm of fast relational model into the realm of normal programming which algorithmically is way slower (and that's the main reason we use databases at all).

So MySQL in my opinion teaches you about what the databases are good at. It helped me immensly when dealing with slow queries later in life in Postgres, SQL Server and even Oracle.

I do prefer PostgreSQL if I have a choice, but from the practicality standpoint that many people are hitting on, I'm okay with various design decisions (i.e. take a look at some of the flags for MySQL's `sql-mode` option over the years) being phased out via the normal (warn -> deprecate -> throw error -> remove) lifecycle that things like this often go through in software. Once a technology gets wide adoption, no matter how "flawed" or not earlier versions were, you start to prioritize stability and reliability over "correctness" at some point. This leads to the understandably practical approach to many bugs in many enterprise systems where the team supporting a tech stack learns to work around the rough edges, and might even depend on certain "weird" functionality because it's simply more practical in both the short and long term than not doing it.

None of the above means that I don't see MySQL as flawed in some ways. I'm in a group of developers that I suspect make up a sizable portion of the MySQL community who didn't choose MySQL, but must support it, if for no other reason than because we see ourselves as professionals, and that's what professional do: make the employer's application work reliably.

For applications that have already survived past the point of finding product/market fit, a wholesale conversion of DBMS is rarely worth it, and conversions of this type are costly/risky even if it is worth it. I do understand many of the benefits (real and theoretical) of PostgreSQL, and if I'm around at the moment when a project's DBMS is being selected I'm going to recommend _not_ MySQL, but at some level I'm also paid to make the application that my employer is running on top of their DBMS work reliably ... and the fact is even among people who get PostgreSQL - who prefer it, would choose it if they could - many of us are also pragmatic enough not to pull the rug out from under a running application for "reasons".

You might find this (from 1989) interesting: https://dreamsongs.com/RiseOfWorseIsBetter.html

Right, it's important to keep "worse is better" in mind, but also keep in mind that the philosophy fades in importance the longer that a technology has been around. For instance:

"C is a programming language designed for writing Unix, and it was designed using the New Jersey approach. C is therefore a language for which it is easy to write a decent compiler"

Is that still true? Over time, people have wanted to fix the flaws in C, so they have added features, and nowadays writing a compiler for C is less easy.

And there are other ways to measure the declining value of "worse is better." Many people would now argue that memory safety is worth the extra effort. Many would say the world would be a better place if C was banned and everyone switched to something like Rust. C continues to lose market share to those languages that guarantee memory safety, and yet C never fully dies, which is interesting.

Over time, the tax you must pay for the "worse" begins to cost more than what you gain from the simplicity. My point, above, was how long this can take. We think of the tech industry as fast moving, and yet many core technologies have had obvious flaws for 30 or 40 years, and yet little action is taken to move to better technologies.

Unfortunately, the ecosystem of a tool matters more than the tool itself. One could come up with a technically superior alternative to an existing tool (gpg vs age and signify) but unless they have a widely recognised ecosystem, they tend to remain relatively niche products.

For example, Git can sign commits using gpg and x509 and now in version 2.34, OpenSSH keys. Although OpenSSH can be used to sign data, signify is a much better tool for this task.

I am fairly sure you are referring to https://phpbuilder.com/mysql-and-postgresql-compared/

In the live website, that link works but the links to pages 2 to 5 are broken. So here's an archive: https://web.archive.org/web/20170422031621/http://www.phpbui...

I used MySQL ages ago. It was pretty fast out of the box. I never needed to vacuum. You could get up and running with Upsert and connections without trouble. It did feel hacky, but don't discount that it mostly worked for a lot of use cases. It always reminded me of software developed with use cases in mind (vs theory). That's actually a complement (ie, full joins are rare, left joins more common - so MySQL was better at left joins).

Going to postgre - much pickier. I found it slower out of box. You couldn't just throw tons of connections at it (ie, connection buildup / teardown felt slower). I had issues initially with quoting and capitalization etc.

That was a long time ago. Now I enjoy postgres and haven't touched MySQL, but there is a real history where MySQL was the database you could get going with pretty easily (I was pre-Oracle buyout).

The connection story is the primary reason that PHP + MySQL was a big thing. The PHP model of spinning up on every request depended on fast and lightweight connections to the DB. This was the one area that MySQL really excelled IMO.

To some extent, Postgres just follows the path that Oracle did to optimize performance. To see what Postgres needs to do next, just look at what Oracle had to do.

Postgres forks a listener process for each connection, which is relatively heavyweight. Oracle used to do that, but implemented a separate multi-threaded listener process for performance reasons. MySQL is multi-threaded from the beginning.

> Postgres forks a listener process for each connection [...] MySQL is multi-threaded from the beginning.

I thought on Linux that forking had the same cost, more-or-less, as spawning a new thread?

In Linux, the actual cost of the call to create the thread vs the process is negligible (its actually just flags to same function, saying whether to share the address space or COW it).

But then the cost of starting to use the process introduces a second speedbump, because once you start exercising it you start COWing things.

There's also the cost of task switching. I don't know the details, but I wonder if modern side-channel mitigations in kernels flush a lot of stuff when switching processes that they don't need to do for threads?

Postgres uses a lot of shared memory to communicate between the processes, so its really its own implementation of threads. Postgres is how it is because its history and portability from a time before threads.

In any case, if you were starting out today you really ought have a few threads that are shared and use async processing and plenty of io_uring. Modern Linux allows massively better mechanical sympathy but all the big database engines are already written and can't shift their paradigms so easily. This is from 2010 about how to massively speed up databases with syscall batching (in this case, MySQL is used in the study): https://www.usenix.org/legacy/events/osdi10/tech/full_papers...

One advantage of PosgreSQL's design is that you cannot about a query which shell tools. Which has been helpful to me on occasion, although as measure of last resort.

> is that you cannot about a query which shell tools.

yerw0t m8?

Was this generally or specifically when using vfork to call execve?

I'm sure there are brilliant tricks but it's hard to imagine copy-on-write being completely free

That does seem like a somewhat obvious solution to built a relatively simple pooler / more efficient listener in. Anyone working on something like this? Merge with an existing solution?

This was exactly our my case. I did 10 servers going flat out calling into one (beefy at the time) MySQL. I was amazed at how far you could go with that. Then we had replication at the time built in, and so did read replicas. For free and somewhat hacky software the scaling was solid.

When I started playing with postgresql I had no clue why folks liked it. My memory may be bad, but we used UPSERT and replication. On Postgresql at least through 8 you could not upsert or stream replication.

I am curious how people feel about wisdom and ethics of acting in ways that are actively hostile to the interests of a previous employer immediately after quitting?

Regardless of the technical facts, unless there is a very strong ethical argument in favor (for example, say, the employer is outright lying in a highly fraudulent way and customers need to know) .... I find this sort of behavior to be slightly unprofessional. But I am curious if others feel the same ... and does it make a difference that it is Oracle?

I think the peak of professionalism is:

1. At all times, you should tell the truth.

2. As long as you're employed, you either speak the company line, or carefully avoid commenting, in order to avoid breaking rule 1. If you can't neither avoid commenting or lying, you should quit.

3. Once you're no longer employed, you should avoid spilling company secrets, disclosing confidential information, etc. Beyond that, see rule 1.

I'd be concerned if he:

1. Went to conferences and recommended MySQL, despite thinking it was a bad product

2. Went to conferences and trashed MySQL, despite being employed in part to promote MySQL

Both of those would be, in different way, unprofessional. But what he did? Seems fine to me. I certainly don't think he was obligated to share his unvarnished thoughts about MySQL, but I don't think he was obligated not to either.

Why does "unprofessional" matter here? They're not on company time. If it's accurate, and that's their experience, that should be shared. I'm tired of a culture that thinks professional means advocating against yourself and your peers. Talk about employers, talk about customers, talk about salaries, if a company exec is bad to work with, talk about it. That gives insight for other devs so they're not finding themselves in the same position.

Why is it seen as "acting in ways actively hostile" to one's previous employer to... simply speak the truth, as one sees it, about the nuts and bolts the work that one actually did there -- and the quality of the products that emerged as a result?

I find this sort of behavior to be slightly unprofessional.

I find it highly professional, in the sense that he was being 100 percent honest -- and true to his craft. And not in the least vindictive or spiteful in regard to his previous employer.

He commends his employer for being open to total rewrites when necessary

Because he is stirring things up over there right after he walked away so it won’t be affecting him. The post is a big warning sign to anyone considering working with him.

Well, stirring them up while you work there would be grounds for being fired. When do you think people can talk about it? If you shouldn't speak ill of your employer while employed and you also shouldn't speak ill of your employer after you're no longer employed by them, does that mean being hired means you should never speak ill of that company again?

They made it very clear that while they've held these opinions for a while, they've avoided situations where they would be presented with either lying about their opinions or bad-mouthing the company they work for whether out of respect or fear. I think that's as professional as you can expect people to be. If, as a company, you expect people to not speak about their subjective opinions after they are no longer getting paid as an employee, you need to be willing to offer them something for that, because what you're really asking for is an NDA.

People can talk about it whenever they want, I'm only speaking in support of the very narrow point that it is unprofessional. People reading the post will use its contents to form (or modify) their opinions of the author, and I believe it will generally be in the negative direction. Publishing something that has a net-negative impact on your professional reputation strikes me as unprofessional in a very basic way.

Can he do it? Sure. Should he? Maybe, depends on his values. Hopefully he understood how the message would be received and is OK with the possible consequences, and valued self-expression and making an authoritative critique of MySQL more highly than the negative impacts of rubbing a few readers the wrong way as they perceive a breach of generally accepted professional decorum.

It's definitely unprofessional to talk about it while working there. If it's also unprofessional to talk about it after no longer working there, that means it's unprofessional to ever voice your own opinion of your own work experience at a company, and that's ludicrous.

I'm responsible for vetting possible hires to work under me. What this person did falls well within what I would accept as responsible behavior for someone I was considering hiring. The fact that they note they went out of their way to avoid conferences where they would likely be forced to compromise their morals in some way, either through going against the wished of their employer or being untruthful, speaks very well of their character, if it's to be believed as presented. People that have no problem lying for the company, or badmouthing the company to external people while they work there, are both types of people I would desperately try to avoid.

People can talk about it whenever they want, I'm only speaking in support of the very narrow point that it is unprofessional.

Stop mincing words, please.

If it's deemed "unprofessional" to talk about it, then to all intents and purposes -- "if he knows what's good for him" as the saying goes -- he can't.

I think there's "professionalism" as in "behavior that signals to employers that you're safe to hire because you are a good little drone" and then there's "professionalism" as in "does good work and treats people right".

I don't think there's a reason to push the former, as it's got plenty of natural support from the unequal relationship between employers and employees. We may not be able to throw off the shackles entirely, but let's not shame those who rattle them a bit, you know?

Thank you, this thread really hinges on what basket of behaviors and ethics are considered “professional” in the first place.

I completely agree with you. If they are willing to publicly trash a former employer and colleagues, what would stop them from trashing me as a new employer? I would be hesitant to hire someone who did this.

They are entitled to write whatever they want, but not taking into account the feelings of the people they were presumably working close with in posting something like this shows a startling lack of empathy.

I'm sure this developer is a genius, but I would take empathy over brilliance every time.

Edit: To be clear, if there is anything unethical or toxic about the workplace culture, they should absolutely post about it publicly to help fix it if that's the best route to address it. This post did not read that way to me. But that's just my opinion.

> I completely agree with you. If they are willing to publicly trash a former employer and colleagues, what would stop them from trashing me as a new employer?

I mean, presumably your confidence in being a good employer? People don't decide to trash talk their previous employers at random, right? Surely there is some cause-and-effect involved.

> what would stop them from trashing me as a new employer

Wouldn't it be a good incentive as an employer to not do things that would get you trashed by your employees ?

Sure some people could be revengeful for all the wrong reasons, but I don't think they're many, nor that they usually have a huge platform to throw trash, and these thing would also be denied on the spot by other people in the know.

What would stop them from trashing me as a new employer?

Not giving people loads and loads of things to feel bitter about. And an eggshell-strewn environment where it's basically impossible to air these concerns with anyone upstream.

It's really quite simple, actually.

I'm sorry what ? You should really be free to speak your mind about your employer (or really anything else) at any time. It turns out, employers don't like that and add close in contracts to silence their employees.

This happens in every field, even in "tech", even though organised labor could absolutely leverage the insane demand for qualified worker to get a better deal, but I guess that's being a largely depoliticized and generally ignorant workforce for you : you get less.

Anyway you are released from any and all restrictions on your free speech the micro-second your contract doesn't specify it anymore.

If your previous employer sucked, feel free to say it if you want to. The gag order you sign in your contract regarding bad mouthing your employer has nothing to do with professionalism, and everything to do with a power equilibrium between employers and employees.

It's not terrible, but it's not a great look to me. I don't think you need to hold to your previous employer's interests, though.

The problem to me here is, OP dumps on the work of their former colleges and the criticisms of MySQL aren't backed up and seem a little naive.

E.g., only very useful software survives a long time with many users, and such software also very typically has major warts, especially looking from the inside.

Reading between the lines and subtracting OPs tone, MySQL actually looks like it's in better shape than I would have thought: when OP arrived the optimizer was a mess. Now it's in good shape. Better yet, management is fully supportive and investing in major improvements. Awfully good for a long-lived open-source project.

I might be a little skeptical that OP could thrive on Chrome, except that we all mature as our horizons widen.

I think they are right that many people who swear by MySQL have never used another SQL database.

When MySQL hit the scene at the start of the whole LAMP thing, PG was much slower in some use cases than MySQL, and proprietary databases were super expensive. There weren’t too many options. PG was still fairly rough around the edges.

Of course I also needed transactions, and they didn’t come with MySQL - that would hurt performance apparently - so when I was forced to leave Solid due to HP’s acquisition and subsequent pricing hike, PG was the only option for me.

But most developers didn’t need (or didn’t know they needed) transactions, and SQL was the tech de jour, so off they went.

I tried to like MySQL, but the weird not-quite-SQL syntax, lack of transactions and this _weird feeling I got_ put me off.

Over my career I've used Sybase, SQL Server, Oracle, MySQL, Postgres, even a bit of DB2 way back in the day. I wouldn't use MySQL if I had another option, but that said I've managed some web sites (Drupal and other CMSs) that used MySQL and it was ... fine. It worked, and never needed any attention.

MySQL is a simple database that's easy to set up and run. Don't make the mistake of thinking there's nothing better, but for certain use cases, it gets the job done.

At that point your options were either a dog slow database (PG), a database that randomly lost data (MySQL), or big money for Oracle. The first thing everyone did when they got their first VC check was spring for Oracle licenses and Sun boxes to run it.

We found SOLID back in the 90s, which was an awesome, full featured Finnish (I think?) SQL database server. It was proprietary but super cheap. Then HP bought it and jacked the price 10x and we were in trouble cos our product pricing at the time depended on the low cost SQL server (this is before cloud of course).

Before that I ran something called Unify which was ... interesting.

I'm curious - when what true? In the 2000s? Before?

I joined my first startup on 2010 and we had a managed PG database that was good enough for us I reckon.

For some reason I remember Postgresql 8.3 as being the pivotal release that let us adopt it, and that came out in 2008 according to the internet. So yeah we are talking ancient history now.

Looking back at the release notes I can't believe how many of those features are critical to how I use PG today. Really some visionary work there.


> With significant new functionality and performance enhancements, this release represents a major leap forward for PostgreSQL. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the following major features...

I’m thinking of the first dotcom bubble—-late 90s into early 2000s.

Around that date there was akso MSSql server 2000. I remember it was good enough and not as expensive as Oracle.

My recollection is that running Windows’ servers was generally and all or nothing thing. Either you were a Windows’ shop that hired all Windows’ people and had a hefty annual contract with MSFT or you’d never consider MSSql.

It was essentially Sybase 4.9.2 at that time.

definitely true in the late 90s.

> so when I was forced to leave Solid due to HP’s acquisition and subsequent pricing hike

We used the Solid database at my previous company in 1998-2005+. It was a great piece of software. IBM bought it, rebranded it as an in-memory database, marketed it to telecom companies, and now it seems to be completely focused on that: https://en.wikipedia.org/wiki/SolidDB

It's a shame because it was so simple to manage and "solid" as a rock. Worked really well for us and we never had performance problems with it.

As an aside, I was certain that Solid was bought by HP and looking at the wiki page I think we were already using PG by 2007. So it maybe changed hands a few times.

Regardless, it was an awesome database, ahead of it’s time in many ways. And amazing support.

Transactions are rarely useful for a LAMP style setup where you have a stateless web server doing request-response - it's not like you can serve a form to the user and hold a transaction open until they commit it.

Funnily enough I'd say the feeling was what really put me off PostgreSQL. Everything just felt slightly more cumbersome to do; I had to constantly look up all these backslash commands whereas things like SHOW CREATE TABLE might be slower if you spend all day doing database admin, but were a lot easier to remember as a developer who only used them occasionally.

One http request usually does multiple database operations, so why would transactions not be useful?

At the point where you're handling a request the user has already submitted their form, so if you refuse to save what they submitted then that ends up as a pretty terrible user experience (unlike with a rich client application where it might be ok to reject a commit to the server because the user still has a local copy). And if you can't roll back then having a transaction is pretty meaningless.

Transactions are super useful in many contexts, and they certainly aren't about "refusing to save what they submitted". They are about behaving properly when something goes wrong, or when you try to enter data into a database that's invalid.

If you have a form submission that updates multiple tables - say, adds you as a customer, saves your order, and adds it to a queue - then a transaction ensures that either ALL of the tables are updated or NONE of them are. In this way you don't end up with half-orders, or data that you can't use later.

It's far better user experience to fail, than to lie about having succeeded and throw the user's data away.

Transactions are one of the most useful tools in the data toolbox... you can get work done without them, sure, but most of the time you just end up creating something that looks a lot like a transaction, but slower...

1. If the transaction fails due to a conflict it can usually be transparently re-tried (doesn't work if you have external side-effects). So the remaining failures are either due to user error (passing invalid data) or bugs and not related to the use of transactions.

2. Error handling becomes much easier if you can raise an error at any point during request processing and it simply rolls back the whole transaction, since you don't have to duplicate all validation to also run before you first write to the database. One important case is if there is a bug that triggers an assertion failure in the middle of the request processing.

> If the transaction fails due to a conflict it can usually be transparently re-tried

True, but if your transaction is retriable in that way then you might as well write the initial request and then do the processing async afterwards (i.e. event sourcing style).

I mean synchronous retry during the same request on conflicts with concurrent operations. Async processing is usually not an option, because then you can't return the result in the current request. Also retries can fail, which needs to communicated to the user.

that's the exact argument that I've heard in the 90's... nothing has changed :)

In my very humble and naive opinion, MySQL has the whole 'worse is better' thing going for it. Does less, does it maybe less than perfect, but it does it very quickly and reliably.

We've personally scaled MySQL to many tens of thousands of concurrent users without too much trouble and without a dedicated person watching the database.

That said, I've only really worked significantly with MSSQL and MySQL, so my points of comparison are based largely on what I've heard. The majority of my last 15 professional years have been spent with MySQL.

MySQL used to be the "has more features" camp, I'm not sure why you think it does less.

There are many things that MySQL does that Postgres still can't do, like memory tables or pluggable storage engines.

And for anecdatum: I've been serving a million people concurrently on PG without any handholding except the initial setup: but that tells you nothing about what we were actually doing with it.

Would appreciate reading more about your experience, setup, underlying infra, performance (e.g. query speed stats and percentiles)...

Also, how many concurrent connections do you max at Postgres and what do you use to manage it?

> pluggable storage engines.

I can help shed light here. Postgres doesn’t make you choose between two storage engines and their compromises. It’s been years, but last time I looked the trade offs between MySQL storage engines were material and the newer engine was missing some valuable features of the older one. The advice was “pick the right engine for the job” - but I don’t want to, that another decision I have to make, why make me do that? At least until they come out with a new storage engine that’s “objectively better than Postgres”, I will enjoy the simplicity of using Postgres with its single storage engine. Choice isn’t always better.

I searched to see if anything has changed and found that InnoDB doesn’t support full text search - give me break, after it’s been the default for years, you gotta be kidding. https://hevodata.com/learn/myisam-vs-innodb/

You should find more reputable sources for this information. InnoDB has supported fulltext indexes since MySQL 5.6, released almost 9 years ago.

Here's the manual page from 5.6: https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-inde...

That’s good to know!

If the first result is not reputable, that’s still a smell (but a different kind of smell) for MySQL. https://duckduckgo.com/?q=innodb+vs+myisam

I disagree. Search engines put more weight on recent articles, and no reputable source is writing about this topic in years, for one simple reason: there is no valid reason to choose the MyISAM storage engine in 2021. This was true even a decade ago.

Hevodata.com appears to be selling an ETL product. This post you're linking to is effectively SEO / content marketing. It is not written by MySQL experts.

Regarding fulltext support in InnoDB, there are many results when searching for "innodb fulltext".

Thanks! Point taken that my garbage query yielded garbage results.

Coming full circle here, in your opinion, are pluggable storage engines a virtue of MySQL, for practical purposes?

I wouldn't say it is uniformly a virtue, especially for user-facing practical purposes when comparing MySQL to e.g. Postgres. It's a major architecture difference which has pros and cons.

Most MySQL users should just stick with InnoDB for everything. It's an extremely battle-hardened choice with excellent performance for the vast majority of OLTP workloads.

However, if you're a large company and storing a massive amount of relational data, MyRocks has significantly better compression than any other comparable relational database I'm aware of. FB poured a ton of engineering effort into it, because those compression benefits provide ludicrous cost savings at their scale. For most companies and use-cases though, the benefits may not be significant enough to justify using a less common engine.

Aside from these two engines, AFAIK there's currently no other popular general-purpose modern storage engine that is widely used for MySQL. There are a lot of random third-party ones, but historically it's risky to tie your business to an uncommon storage engine.

I'm not a DB internals engineer, but from what I understand, some of MySQL's downsides are a direct result of the added complexity of having a pluggable storage engine architecture. Mixing-and-matching multiple engines in one DB instance can also be risky (it affects crash-safety guarantees of logical replication on replicas). So given that most companies should just use InnoDB anyway, for practical purposes pluggable engines are not a huge advantage for most users today.

But in the future, who knows; it's good to know the flexibility is there. And had MySQL never supported pluggable engines originally and MyISAM had been the only option, MySQL would be dead and forgotten. InnoDB was originally developed by a third-party company that Oracle acquired. Interestingly, Mongo followed a very similar trajectory, replacing their initial engine with WiredTiger, developed by a third-party company they acquired.

InnoDB and Memory are really the only ones widely used these days. After Oracle bought MySQL they threw their full backing behind InnoDB and recommended all MyISAM users migrate. They fixed a number of small problems like the missing full text support.

InnoDB is more fault tolerant and modern feature rich (e.g. Foreign keys) than MyISAM. MyISAM is faster in practice despite Oracle’s continuous claims to the contrary.

MyISAM is largely undeveloped at this point. MariaDB hard forked MyISAM into Aria and have included better fault tolerance and other general improvements. It’s what I use for my personal projects.

Memory tables are strictly in memory and useful for doing very quick processing on things you are certain will fit entirely into system memory. Their data is lost when the system is rebooted or MySQL is restarted.

> MySQL has the whole 'worse is better' thing going for it.

I've used both MySQL and PostgreSQL and I prefer MySQL, it seems to work and scale better "out of the box" and has a more comprehensible permission model than Postgres. I might be biased somewhat as I've used MySQL a lot longer and have even written plugins to interop with it but its still my number 2 go to (after SQLite)

That's a perfectly reasonable assessment. Also, it's certainly a dream product compared to anything on the market 20 years ago. So yeah, it's fair to say the original blog post could have been more nuanced in its holistic assessment of MySQL.

But then again, you can tell he was worn down from his experience working there. The industry has a way of doing that to people.

In general it seems MySql targets start-ups while PostgreSql targets more established companies and projects. It's not necessarily about good-vs-bad, but using the right tool for the job.

Postgres now has UNLOGGED tables which are pretty similar to memory tables.

This strikes me as pretty childish.

If it had read more like "I left because I was frustrated with code quality", and actually took ownership of the feeling, that'd have made a big difference. It reads more like "i left because the code sucks, everyone around me is dumb, and our users are sheep"

I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?

Maybe it'll get more attention on the systems he called out? Probably not. The post just reads as bitter, mostly about the mindset of other contributors, but doesn't outline what they tried to do to change the hearts and mind of those contributors. Or put another way: what did this person do to lift up all of these people? Deriding doesn't actually help anything.

If the goal is just to vent that's fine, but there's really not much else to see here.

It reads more like "i left because the code sucks, everyone around me is dumb, and our users are sheep"

Not in the least. The fact that he doesn't say what you're saying he says ("everyone around me is dumb, and our users are sheep"), but rather simply sticks to brass tacks -- and it's pretty hard to counter his main point (about the quality of MySQL as a product), after all -- belies the characterization you are trying to make of what he said.

The post just reads as bitter,

Anytime anyone, heaven forbid, talks the plain and simple truth about the conditions many of us work under in this industry -- they get characterized as "bitter", "derisive", "just venting", or (especially in the context of describing our past work experiences to prospective employers), "badmouthing". Or as you put it: "childish".

Such is the state of our industry.

Well, readings are subjective, and it's ok that you don't agree with my reading of it.

But I do think that you're either mistaking "plain and simple truth" for callousness, or creating a double standard. Are descriptions such as "bitter" not the "plain and simple" truth for this post? I certainly think it reads bitterly.

They're not really complaining about the work conditions, so I'm not really sure what you're referring to there. It's just code. It doesn't bite. It does ossify. It's unpleasant to deal with, but it's part of the job.

It seemed like they had supportive, albeit corporate management.

Well I didn't mean "conditions" like having to work in a windowless basement, with a pile phonebooks to sit on instead of a chair.

But rather: being asked to work on products we just can't really believe in, to be silent when upper management (though otherwise supportive and presumably in no way outwardly abusive or mean) would prefer that we keep our blinders on, etc.

It's just code.

His main concern was the quality of the product as a whole -- and the lack of awareness in that environment of what, to him, seemed to be simple and obvious facts. The remarks about the "bad code" almost tangential (like he said, "it didn't bother me much").

> Are descriptions such as "bitter" not the "plain and simple" truth for this post? I certainly think it reads bitterly.

Erm, no?

You yourself pointed out: “Well, readings are subjective…”

What real facts did the author present? It's a recounting of their impression of the code base, experience on boarding, etc.

Perhaps I'm being dense - could you highlight the plain and simple truth part?

It’s not about what the author wrote. Of course what they wrote was their own opinion.

It’s about you reading ‘bitterness’ in to what they wrote. The bitterness is in your mind.

The bitterness is in your mind.

Agreed - across this thread, there's been a lot of what imputation of bad or petty intent and/or of a disagreeable emotional state on the part of the blog author that just isn't called for.

Insulting your coworkers and users is childish, and that's exactly what this post is. "More jarring were the people who insisted everything was OK" "was hailed as “efficient” (it wasn't)." In fact that's pretty much all it is. There are plenty of ways to complain without burning every bridge around you.

"More jarring were the people who insisted everything was OK"

I read this as simply being honest (in the "Dutch" sense). It wasn't like he was shit-talking his former co-workers, per se. He's just saying he had a radically different appraisal of technical viability of the flagship product.

Which, again, was simply the truth as he saw it.

Simply being honest doesn't mean you're not unneededly insulting everyone around you. He can say he had a radically different appraisal of the technical viability of mySQL without calling everyone he works with an idiot.

He's not saying, or even insinuating, that they're "idiots".

Really, look at the dictionary definition of the term, please.

Everyone who works on mySQL is going to read this to mean he thinks they're idiots, and that's what matters.

You don't know that -- and even if some of them do, that doesn't mean that interpretation is justified.

> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product.

> (it seems most MySQL users and developers don't really use other databases)

> But perhaps consider taking a look at the other side of that fence at some point, past the “OMG vacuum” memes.

> Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.

> I am genuinely proud of the work I have been doing, and MySQL 8.0 (with its ever-increasing minor version number) is a much better product than 5.7 was


There’s definitely a lot of “our users are sheep” and “everyone else is dumb” going on in this post.

Heck, it may even be true. But you can’t really argue that hybrid author isn’t saying it.

There’s definitely a lot of “our users are sheep” and “everyone else is dumb” going on in this post.

Well, we disagree then. I see his post as making some definitely very harsh critiques -- but still short of the threshold of outright insulting people.

> Such is the state of our industry.

Yeah, but the guy is worth millions, ain't he? I assume he's pulling in 6 figures?

Nah, it's a pretty important post. It's the first guy to publicly say "the emperor is naked". The total object level information hasn't changed much, but now I know others know, and they know others know too. It's out.

And from now on everybody who has issues with the query optimizer won't shamefully look for documentation on how to fix _their_ use case, but publicly stink mysql for having a shit optimizer (which it has, btw).

Which in turn will support those developers inside mysql who want to push things forward. They won't seem like cowboys that want to fix a good thing - they'll have lots of user complaints to help them argue, and also the awareness that maybe, just maybe, mysql will get left behind if they don't move faster.

`It's the first guy to publicly say "the emperor is naked"'

That MySQL has a poor optimizer has been known for as long as MySQL has been a product. The product is legendary for its inability to do even basic RDBMS needs competently, and the rise of NoSQL was largely people assuming MySQL limitations were general RDBMS problems (for instance its painful incompetence doing basic RDBMS tasks like joins).

Having said that, a couple of decades in this industry has me reading this post and immediately sensing oozing bitterness. That maybe he got passed over for a promotion he felt he earned, etc.

When someone does the "it's all crap" exit, it's seldom from a good place. Who could seriously have applied to and joined the MySQL team without knowing that it isn't exactly the pinnacle of database systems?

Having said all of that, it's interesting seeing pgsql being held as the panacea. I like pgsql, and prefer it among open source database systems, but in many ways it is a decade+ behind MSSQL and Oracle.

It might have been more useful if it wasn’t literally the day after he got another job (it’s not like he quit it because MySQL is so terrible and will now look for a new landing spot).

I don’t know the author, but assuming he has publicly made such comments before it would behoove him to link to those posts so it’s clear that he has been raising these complaints publicly for a while and didn’t just wait to dump the consequences of his public post on his now former colleagues the moment he got out.

> This strikes me as pretty childish.

It does because all adults are cynical nowadays and rather than speaking truth they will just sweep it under the rug for political convenience.

> I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?

Spreading the truth? Does it even need a point?

> The post just reads as bitter

Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.

> It does because all adults are cynical nowadays and rather than speaking truth they will just sweep it under the rug for political convenience.

It's possible to speak the truth without being callous. I'm not really sure what political convenience has to do with this.

> Spreading the truth? Does it even need a point?

No, but it should probably have one if it's going to be a discussion on HN.

> Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.

It's great that you're entertained by it. Personally I it just gave me the impression that the person would benefit from therapy.

> I'm not really sure what political convenience has to do with this.

Political convenience is not burning any bridges. The only reason people aren't callous is fear of retribution.

>No, but it should probably have one if it's going to be a discussion on HN.

I really hate the word should, it's almost always used to force opinions on people without justification.

> Personally I it just gave me the impression that the person would benefit from therapy.

He would probably only need therapy if he stayed at oracle.

I'd argue truth is a big enough point, don't you think so?

I dunno. I think the author makes some compelling points. Chief of which is the bad code quality. At least the bit about being unable to do full outer joins without a full rewrite. If the code base got so unwieldy that it made adding features tough all these years after it was started that’s a knock on both Oracle leadership and the project itself which I think is grounds to leave.

I don't think he's wrong from a technical assessment, but most of the problem that he was experiencing was a social problem.

> More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases)

i.e., the problem wasn't that the executor was bad, the problem was that everyone thought it was ok

And maybe they're not a people person, and trying to enlighten isn't what they signed up for -- all good -- I for one certainly don't want anyone working a job that they're unhappy with.

But the post is littered with putdowns --

> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product. At the same time, I was attending orientation and told how the optimizer worked internally, and I genuinely needed shock pauses to take in how primitive nearly everything was

> Don't believe for a second that MariaDB is any better. Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.

I guess I've just seen this attitude enough where it's boring. Shock pauses, very primitive, bad code. Got it. Moving on...

> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product.

Spending years in an environment where a core component of your product was crippled by technical debt and you were surrounded by people who didn't -understand- how crippled it was does seem like a recipe for understandable bitterness.

People saying "yes, we know, but rewriting that isn't the business priority right now" is a different matter - that's often aggravating but the right call - but not even acknowledging the problem is unhealthy.

Note that I've made a couple other comments on this article that do their best to acknowledge how much more of a pain in the ass learning how to setup postgres replication is than mysql, because even though I prefer postgres most of the time it's still -true- and I don't see how hiding from that fact makes anything better for any user of anything.

I wish he be happier

I think what you read out of the post and what he posted are different things.

I mean, that's kinda tautological.

your argument reads like a straw man...

It might be. Just sharing my impression of it, take it as you will.

100%. Such a petty way to resign

From having fought it as a savvy user, I completely agree with the article about the poor MySQL query planner. I don't have the under-the-hood insights of the poster, obviously, but everything he says jives with how it felt as a user.

Of course the planner is just part of a database, and I have some kind words to say about the other technically-impressive bits of MySQL:

A lot of MySQL users are websites and things with pretty CRUD access patterns. That wasn't me.

My history is using it for high-throughput real-time batching and fancy buzzword stuff at reasonably massive scale (big distributed teleco systems), which is where a lot of expensive choices were pitched.

Doing teleco systems with MySQL was staggeringly cheaper and actually quite cheerful and, for all the times I swore at it, I'm actually still a fan.

I went with MySQL for advanced features that, at the time, Postgres was way behind on. MySQL had lots of storage engine choices (including TokuDB, which changed everything for my use-cases) and upserts and compression and things that put it way ahead of Postgres.

Of course MySQL had lots of warts too. The query planner was completely poo, but 99% of uses are simple things that it can handle well, and the most critical times it gets things wrong you end up annotating the sql to force indices and do the planner's job for it etc.

Of course, nowadays, Postgres is reaching parity on these things too (except, perhaps, compression. My understanding is that Postgres is way behind on decent in-engine page-based compression. It will presumably get something decent eventually.)

Postgres still needs tweaking for getting the query planner to do the right thing, and you can't easily tell it what to do.

I ended up proxying my queries with a preprocessor that uses special comment syntax - e.g. `-- $disable_seqscan` - which wraps query execution in sets of enable_seqscan off and on again, to force PG to use the index. All databases can have performance that falls off a cliff when changing statistics make them choose a less optimal join order (join order is normally the biggest thing that affects performance), but PG is particular in not having much flexibility to lock in or strongly hint the plan.

MySQL, on the other hand, is predictably bad and has STRAIGHT_JOIN and other friends which make things much easier to tweak.

You should try pg_hint_plan ;)

The postgres team is advocating that compression should be applied the file system, and they are correct. But this won‘t solve the problem if filesystem compression is not widely available. ZFS and BTRFS are not widely used, the first one because of licenses the last one because of bad pr in the past.

You can add VDO to the LVM stack but thats another layer in the complex ext4/xfs lvm layering approach.

I'm not sure page level compression can get efficiency of LSM-Tree, and frequent page writes will be a write amplification problem if you end up doing any form of copy-on-write and try to do in-place overwrites.

Rare, and utterly refreshing for its frankness. Let's hope this person does well wherever he goes in his career.

I only have the untrained DBA perspective coming from building and hosting MySQL, Maria DB and postgres clusters over the last 14 years and I feel that Galera clustering feels a lot better than anything I've seen in postgres.

Agreed that postgres has amazing performance. I was actually an early convert to postgres back in 2004 because my boss and coworker at the time saved entire businesses by migrating them to postgres when they had enough of trying to scale MySQL with more hardware vertically.

But in the later decade when I myself have used clustering to scale I've found that Galera feels less like a tacked on afterthought than anything I've seen in the postgres world.

It's just a lot more integrated than pgoool and watchdog.

I know there are new solutions now that run the postgres clusters inside kubernetes that I haven't tried yet. But that isn't more integration, that's just more 3rd party abstraction.

Edit: I realize after I wrote this how strange it must seem to a developer. Because galera/wsrep is actually a 3rd party replication product "tacked on" to MariaDB. While postgres replication, afaik, is written into their mainline code. I guess my gripe wasn't about the replication but rather the "clustering" around it, like maxscale/haproxy/pgpool and so forth. And in that sense they seem pretty equal, I just chose to use pgpool for my postgres clusters and that is definitely a hacky, scripty mess compared to HAproxy or Maxscale.

I do significantly more pg administration than mysqlish administration.

Galera I gaze upon with envy, no question.

Does any one know when is MySQL 9.0 coming? It has been a unusually long time between a major version or a dot version release. MySQL 8.0 was in April 2018.

Compared to Postgres which has been shipping features after features.

It‘s not planned. They are shipping new features in patch releases, yes, you heard correct. They had the genius idea to stay at 8.0.x and still add new features with every release.

I have a moderately large table (hundreds of millions of records) with very simple structure but lots of columns and keys. This post is a huge "aha!" moment for me - I keep struggling with how the optimizer is choosing which index to use. It keeps making stupid mistakes and behaves differently from instance to instance - I still have a staging server where I just have to avoid an admin interface because it takes minutes to load, and I haven't invested more than a few hours to debug because it's not production.

I have pieces of half commented code where I do the optimizer's job and try to guess which index is best and chose it manually. Lately I've been considering doing an "explain" first and based on that tweak the query.

So yeah - I love MySQL, and I don't think I'll be able to invest the time to switch, but at least now everybody can see the emperor is naked and it wasn't just them. The optimizer sucks, including in MySQL 8.

Most of the people annoyed with PostgreSQL in these comments are annoyed because it feels like you have to write your own replication and connection management

Most of the people annoyed with MySQL in these comments are annoyed because it feels like you have to write your own optimiser

Welcome to software, I guess :D

You could perhaps fix it by trying to use some index hints in the query to get a bit more consistency, see https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

Yeah, that's what I mean by "trying to do the optimizer's job". But this is meant for edge cases - doing this fully would mean starting to rewrite the optimizer in the app. This is where using the "explain" might help - if it's choosing what seems like a sane index and has a limited range of rows, I send the query "as is". If it fucks up and does a table scan, I'll force a compromise index.

> Generally in MySQL we send queries massaged to a point where optimizer doesn’t have to think about anything.

https://dom.as/2015/07/30/on-order-by-optimization/ - Wisdom from the guy who managed to get MySQL to work at facebook scale.

Thank you

> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.

This is an insider view, meaning that he is probably addressing things like code quality, compile/run/debug workflow, technical design decisions and so on.

As a 20 year MySQL user - starting with version 3.23 if my memory serves me well - and after billions upon billions of inserted/updated/deleted/queried rows, MySQL is not a pretty poor database by any measure. It has served me well: Good performance, near zero maintenance, very few crashes, and a couple of data corruption that resulted in data loss after power outage.

That's the whole thing with the difference between so-called "ACID" databases and MySQL. Corrupted data after a power outage just shouldn't happen. It's one of the things database engines are supposed to protect us against.

In 30 years, it's never happened to me with Postgres or the other two database engines I've used professionally.

If it did happen, then most people who know about databases would say "that's a pretty poor database".

>>> That's the whole thing with the difference between so-called "ACID" databases and MySQL. Corrupted data after a power outage just shouldn't happen. It's one of the things database engines are supposed to protect us against.

I knew I was not running MySQL in ACID mode (via innodb-flush-log-at-trx-commit=0), and I sacrificed safety by performance. I was OK with that in my specific use-case.

MySQL is, by default, ACID. There are storage engines that aren't, but they are rare and not default choices.

MyISAM used to be the default for a long while. I made a bit of extra money when younger tidying up MySQL DBs that were in inconsistent states thanks to MyISAM not enforcing FKs.

InnoDB has been the default for over a decade.

Yep, it is now. My favourite bit was when InnoDB became the default, a lot of people configured MySQL after upgrading to keep defaulting MyISAM "for performance" that they really didn't need.

But the dominance of MySQL and LAMP happened long before InnoDB.

Complete outsider view:

> internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD.

From wikipedia about mariaDB:

> A group of investment companies led by Intel has invested $20 million in SkySQL. The European Investment Bank funded MariaDB with €25 million in 2017. Alibaba led a $27M investment into MariaDB in 2017.

From wikipedia about MySQL:

> MySQL is also used by many popular websites, including Facebook, Flickr, MediaWiki, Twitter, and YouTube.

They certainly have their place. I'm sure some of these companies have considered Postgres.

One thing I'd note is that a lot of these companies made the decision years ago. MySQL's replication story has been good for a lot longer. It wasn't until September 2010 when PostgreSQL landed replication (and you probably didn't want to adopt the first version). If you were looking to create a highly available service like Facebook, Flickr, Twitter, or YouTube before 2010, you were probably looking at MySQL. All those companies were huge before 2010.

That was one of MySQL's killer features. No matter how bad any other part of MySQL might be, it did replication. PostgreSQL had some third-party add-ons to do replication, but it was hard, slow, and reasonably easy to do wrong. Some of them were downright bad ideas where they'd just proxy your request to two independent PostgreSQL instances and hope that nothing went wrong, others used triggers on the tables. Literally, the Postgres core team said this about PostgreSQL: "Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases."

No, those companies wouldn't have considered PostgreSQL and once you get to a certain size, things tend to stick around.

That said, many of these companies aren't using MySQL for a lot of new stuff. YouTube developed Vitess to handle some of their MySQL problems, but from what I've heard they've moved off MySQL since then (correct me if I'm wrong). Twitter has its Manhattan database. Facebook has gone through many databases. MediaWiki is a project that people are meant to be able to run on shared hosts and that means PHP/MySQL. Flickr isn't really a company that has done a lot post-2010. That doesn't mean it's a bad site, but it doesn't seem to be making a lot of new stuff.

Decisions have context. Without the context, it's easy to come to the wrong conclusion.

I don't hate MySQL and PostgreSQL has its problems. I think MySQL's strengths are generally in its current install base and current compatibility. Lots of things work with MySQL. Vitess isn't perfect, but it is a nice project for scaling a relational database. PostgreSQL doesn't have a Vitess. Likewise, many things already work with MySQL like MediaWiki and many things speak the MySQL protocol. However, that's starting to shift. I think we're seeing more things adopt PostgreSQL compatibility. Google's Spanner now has a PostgreSQL layer. RedShift, CockroachDB, and others are going for PostgreSQL compatibility.

The thing is that ecosystems take a long time to shift. If it were 2004-2006 when Facebook, YouTube, and Twitter were created, I'd definitely have grabbed MySQL. You need good replication. PostgreSQL wasn't even talking about bringing replication into core back then, never mind having something available. Times change and software changes.

> Facebook has gone through many databases.

Main use cases (social graph, messaging, ..) are on MySQL (and never left it). Storage engine is different, replication is improved, etc, but it is still tracking upstream MySQL tree.

The thing is - PG still relies on physical replication, and staying with logical one allows to use it for out-of-DBMS change data capture and reuse in other systems.

> MediaWiki is ...

Funny tidbit - I tried to move over Wikipedia to PG ages ago, and did the initial prototyping and made it work to a certain degree. But also I learned about InnoDB more at the time.

What the Mysql folks never say is that these companies never run the off-the-shelf version, but instead have built engines, tooling and expertise that allow them to prevent developers from being too close to the real Mysql.

Find out who has invested in EDB, the leading Postgres vendor in the market. Postgres is a foundation, so there is no central entity to invest in.

> In the end, there's just not enough resources that I could see it turn into a competitive product, no matter how internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD.

It’s refreshing to know that tier 2 cloud companies (Oracle, IBM etc) all have similar internal perspectives, ie leadership insisting that they’re making amazing progress in the cloud while their market share either stagnates or reduces.

What a burn - but not surprising given MySQL's history and things like MariaDB or Percona which exist to fill the gaps. My impression is that many people will choose Postgres over MySQL when starting a new project as it progresses so quickly and has a far richer set of features.

Or just use Auroradb on AWS, with maria as your test setup.

> But it doesn't really explain why I did go looking for that somewhere else in the first place.

I don't have understand why people think they need to provide a reason.

They are just part of economy and should constantly look for better opportunities for themselves because that's what drives the economy towards more efficient state where resources are better utilized. Not to mention they themselves have just one life and have full moral right to live it the best way they can. They don't owe anyone anything they didn't promise and employment is not a promise of dedicating your whole life to a project or an employer.

Ok, I have to admit I LOL at the bottom of this critique of MySQL code quality to hear he was going to work on Google Chrome!

Chrome is almost all open source and Sesse has worked at Google before. I'm pretty sure he knows what he's getting into.

Why is that? I admit I know little of chrome code quality.

But either they choose to work on a high quality product, or they choose to work on a product whose quality they can improve (low hanging fruit).

What is LOL worthy?

I never worked on Chrome directly, but I looked into the Chromium code sometimes when debugging some weird JS issues.

What strikes me first is how much code there is…

There are tons of code, some things copied multiple times, because Chrome nowadays does a lot of things; basically it’s an entire operating system, which accesses USB, runs assembly code, runs WebGL, basically all.

And it’s all in C++, and very verbose Google C++.

But in the end I always found what I was looking for there. And I can’t say how good or bad the C++ is, as I’m not C++ dev. Just there is a LOT of it.

I wonder if there is any piece of software that actually works and yet developers don't call its source code a shitshow. I understand code as akin to biology, full of repetitions and garbage DNA because these things are live and evolve along with the environment, and cannot be "pure" for more than a few months. C++ is very versatile, but i bet that using a verbose makes it much easier for newcomers to find their way around with the code compared to using an uber-abstracted style.

The core of the Linux kernel, in my experience. It's regularly refactored by grizzled oldtimers, which helps as standards rise.

I've heard horror stories from Google engineers, but I don't think they're public so I shan't quote them.

I am tempted to put a reminder in my calendar for a few years' time to see where this engineer is, though...

I was with MySQL AB in its early days, It's codebase and behaviors was much worse than it is now. Yet we had a team of people who loved MySQL and was focused on making it better.

I think this is unfortunate culture change in Oracle land - they hire mercenaries rather missionaries to work on the code, so such "I never loved it, but I worked on it for years" is not a surprise.

MySQL, MariaDB, MongoDB, PostgreSQL all have skeletons in their closet. Yet if PostgreSQL developer would be leaving with similar attitude community would raise much more to defense

If Steinar is responsible for removing query_cache in 8.0 I'm glad he's leaving the MySQL team. Sure, MySQL is not the perfect database heck I'm willing to even say it's not good or efficient, but it is extremely well supported across toolchains and programming languages. It just works and it works pretty darn well for most applications. I've built several startups on top of it some even exited.

There's a huge difference between "you don't need the best infrastructure to build a successful company" and "I left because the place had a culture that refused to acknowledge major product shortcomings". They work in different domains.

> extremely well supported across toolchains and programming languages. It just works and it works pretty darn well for most applications

The same holds true for PostgreSQL.

¯\_(ツ)_/¯ optimizer guy says he did not do good job, shits on optimizer?

There're many reasons to pick MySQL over PG at large scale deployments (economics, replication strategies, etc) - and the fact that some queries will run better on PG may not outweight those benefits.

Don't get me wrong, I know many areas where MySQL sucks, and that is mostly in lacking execution strategies and optimizer problems. Indeed, in many of these ways MySQL is stuck years behind, but in other areas (MyRocks, modern hardware use, etc) it is far ahead of PG.

The thing is, optimizing has costs, and bypassing those costs is useful, if you're looking at economics of your system, and storage engines that pay attention to data patterns are ahead of just naive heaps of data.

It is very simple to diss MySQL when your business does not depend on managing petabytes of data.

P.S. We've migrated a major social network from PG onto our MySQL-based platform and our MySQL DBAs didn't even notice. :-)

> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.

It's not like it's an industry secret either: when I started my CS training in 2007, the first course was about relational databases and one of the first things the teacher told us was that MySQL was pretty sucky.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact