Wow, right on the chin with that one.
> More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases)
I only have anecdata of my usages of MySQL and Postgres but I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing.
Yes, Postgres can be slower out of the box, and yes, Postgres has worse connection handling that usually requires a pooler, but the actual engine and its performance make it worth it in my opinion.
Logical Replication is the big one. MySQL has had logical (and "mixed") replication for a decade (or more?). PostgreSQL has gotten built-in logical replication only recently, and it's still very annoying. It doesn't replicate schema change statements, so those have to be applied independently by some external system, and of course it's not going to be timed exactly right, so replication will get stuck. And somehow, for very lightly active applications generating just a few MB of writes per day, the stuck replication log can back up 10s of GBs per day. And does logical replication across versions for zero-downtime upgrades work in Postgres yet? I dunno.
Connections you mentioned already: Postgres needs an external connection pooler/proxy, MySQL not nearly so soon. Vacuuming is a problem I never had with MySQL/MariaDB, of course.
I'm an infrastructure dev, not a DBA or an advanced SQL user. I never used window functions, or wrote a query that filled half the screen, or have a thick web of foreign keys ... If you keep things damn simple so you have confidence in efficiency and performance, then MySQL works pretty damn well in my experience, and Postgres, for all its academic correctness and advanced query feature set, has been pretty damn annoying.
And, you can tell MySQL to use a particular index for a query!
I'm usually gonna reach for Wordpress first along with MySQL if I need to set up a basic blog for non-technical people to contribute to. Maybe the DB isn't as cool as it could be, but it works fine, and I hardly ever have need to touch the database itself directly.
When the client count and the business complexity explode, things start to get hairy fast, but by that time the investment in MySQL is hard to walk away from. You will start to hit all sorts of edge cases in other DBs - which MySQL did not exhibit simply because your application grew around its feature set and limitations.
I'd like to point out that this seems to happen whatever db you move from and to.
Sometime over a decade ago I was part of a team moving from an ancient version of Sybase Adaptive Server Anywhere to the newest Microsoft SQL server running on a beefier server.
We ported everything carefully and tested it out. We also felt good since both were dialects of T-SQL.
...and then we ran into such large performance problems that we had to fall back to the old system.
The reason was that at least older versions of Sybase ASA had implicit indexes.
Feels like there is always something.
Then again, depending on your scenario, introducing a bunch of extra complexity into your app tier code may be less annoying than introducing a bunch of extra complexity (connection poolers etc.) into your infrastructure.
So it ends up that there's a bunch of workloads where mysql is obviously the better option, and a bunch of workloads where postgresql is obviously the better option, and also a bunch where they're mostly differently annoying and the correct choice depends on a combination of your team composition and where your workload is likely to go in the next N years.
Just as an example, there are so many people who assume they need a 3rd party search engine simply because what’s built into their database is so bad compared to the options you get with PG.
You’re welcome to only use your database as a crud data store while you manage a mountain of other tools to work around all of the limitations…but you don’t have to.
At that point you're effectively using MySQL as a key/value store with an SQL interface, rather than a full RDBMS.
Which is, in fact, pretty much exactly what MySQL was originally designed to be and it's extremely good at it.
> If you keep things damn simple so you have confidence in efficiency and performance
I can have that level of confidence on a minimally tuned (albeit sensibly indexed, you still have to make sure the indices you need -exist- on anything, realistically) postgres at levels of schema/query complexity where getting mysql to handle it -at all- would be ... an experience.
Having used both extensively, I'd say that there's a lot of knowledge when using mysql that's basically "introducing complexity into your application and query pattern to work around the optimiser being terrible at doing the sensible thing" and then there's lots of knowledge when administrating postgresql that's "having to pick and choose lots of details about your topology and configuration because you get a giant bag of parts rather than something that's built to handle the common cases out of the box".
So ... on the one hand, I'm sufficiently spoiled by being able to use multi column keys, complex joins, subqueries, group by etc. and rarely have to even think about it because the query will perform just fine that I vastly prefer postgres by default as an apps developer these days.
On the other hand, I do very much remember the learning curve in terms of getting a deployment I was happy with and if I hadn't had a project that flat out required those things to be performant I might never have gotten around to trying it.
So, in summary: They've chosen very different trade-offs, and which set of downsides is more annoying is very dependent on the situation at hand.
This completely maps to my experience as well. I still believe Postgres' defaults being set to use an absolute minimum footprint, especially prior to PostgreSQL 9, significantly impacted its adoption. It's better now, although IMO it could still do with an easier configuration where you could flag the install as dev/shared or prod/standalone to provide more sensible settings for the most common situations. Like, gosh, I'd like to be able to stand up an instance and easily tell it to use 90% of the system memory instead of 100 MB or whatever it was.
But the MySQL issues were much worse. Outer join and subquery problems, the lack of functions like ROW_NUMBER(), a default storage engine that lacked transactions and fsync(), foreign keys that don't do anything, etc. I've met so many people who think MySQL limitations are RDBMS limitations, or don't understand database fundamentals because MySQL just didn't implement them properly. Then again, I also remember when the best reason to use MySQL was the existence of GROUP_CONCAT(), which everything else lacked for religious reasons.
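Window functions like ROW_NUMBER() are exactly the kind of thing old MySQL forced you to emulate with self-joins or session variables. A minimal sketch of what they buy you, shown with Python's bundled SQLite (3.25+) since it's easy to run — the table and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INT);
    INSERT INTO sales VALUES ('east', 100), ('east', 300), ('west', 200), ('west', 50);
""")

# Rank each sale within its region by amount, highest first -
# one declarative clause instead of a correlated-subquery kludge.
rows = conn.execute("""
    SELECT region, amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
    ORDER BY region, rn
""").fetchall()
print(rows)  # → [('east', 300, 1), ('east', 100, 2), ('west', 200, 1), ('west', 50, 2)]
```

(Modern MySQL 8.0 finally supports this syntax too; the complaint above is about the long years when it didn't.)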
I also vividly remember, prior to MySQL 5.5, when new users of PostgreSQL, Oracle, or MS SQL Server would discover that, instead of making a guess at what you wanted, the RDBMS would actually return error messages and expect you to understand and solve them deterministically. And somehow they would be angry about it! The way old MySQL (mostly 3.3, but 4.0 and 4.1, too) used to silently truncate data, allow things like February 31 in a date field, silently allow non-deterministic GROUP BY, or only report warnings when you explicitly asked for them really undermined the perception of MySQL among database folks. This wasn't that long ago, either.
I think Postgres is really cool and has a lot of good reliability, but I've always wondered what people do for high availability. And yeah, pgbouncer being a thing bugs me.
It feels like we're getting closer and closer to "vanilla pg works perfectly for the common workflows" though, so tough to complain too much.
See if you can understand enough of it and consider doing it again for next upgrade. (I have done that for 12 > 13)
I trust the pg people that the tools do what's advertised, but it's a very high-risk proposition
I forgot to post the article link!
Thanks for posting it.
You do need to use an external connection pooler (pgbouncer etc.), though Postgres 14 has made some large improvements in gracefully handling large numbers of connections; you can still run into problems.
As for the query planner/optimizer, so far in all of the optimizations and improvements I have worked on, I've only run into 1 or 2 that had a query plan that made me scratch my head. There are some extreme edge cases that will prompt the cost function to prioritize the wrong index, but in production I have found that 99%+ of slow queries can be easily improved by a rather simple composite index. One thing I do love about Postgres is using partial indexes, which can significantly reduce the amount of space required and also make it extremely easy to create indexes to match the predicates of a query, while indexing other columns.
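As a sketch of the partial-index idea — shown here with Python's bundled SQLite, which supports the same WHERE clause on CREATE INDEX as Postgres; the table, index name, and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, queued_at INT);
    -- Partial index: only rows the hot query touches are indexed,
    -- so the index stays small even if 'done' rows pile up forever.
    CREATE INDEX idx_pending ON jobs (queued_at) WHERE state = 'pending';
    INSERT INTO jobs (state, queued_at) VALUES
        ('pending', 10), ('done', 5), ('pending', 7), ('done', 1);
""")

# The query's predicate matches the index's WHERE clause, so the
# planner can use idx_pending and also gets the ORDER BY for free.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id FROM jobs WHERE state = 'pending' ORDER BY queued_at
""").fetchall()
print(plan)

rows = conn.execute("""
    SELECT id FROM jobs WHERE state = 'pending' ORDER BY queued_at
""").fetchall()
print(rows)  # → [(3,), (1,)]
```

The same trick in Postgres also lets autovacuum maintain a much smaller index for the queue-polling query than a full index on `queued_at` would be.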
Other than one or two slow queries I've tracked down and worked on, I've never wished that I could "hint" to use a particular index over another.
Now, some of the things that people prefer Postgres over MySQL for aren't, in practice, that great at the moment. People like atomic DDL operations, when in reality locking the table schema can cause lots of problems, and in production you only add indexes etc. concurrently.
You still get the issue of dead tuples in MySQL and need to periodically clean them up using OPTIMIZE TABLE et al. Postgres and InnoDB have different designs, but ultimately it needs to happen sometime. It's just that Postgres IMO requires more tuning to get the right balance.
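For reference, the tuning knobs involved on the Postgres side live in postgresql.conf; a rough sketch with purely illustrative values — tune against your own workload, not these numbers:

```
# Illustrative autovacuum settings (example values only)
autovacuum = on
autovacuum_vacuum_scale_factor = 0.05   # vacuum after ~5% of a table is dead (default 0.2)
autovacuum_vacuum_cost_limit = 1000     # let workers do more I/O per round
autovacuum_naptime = 30s                # check for work more often than the 1min default
```

The trade-off is the usual one: more aggressive settings burn background I/O to keep bloat down; too-lazy settings are how tables quietly grow to several times their live size.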
... then SQLite might be the right choice for you?
In all seriousness, so many use cases are met by SQLite on the low end, PostgreSQL on the high end, and dedicated KV stores like Elastic off to the side, that what is left for MySQL? When you have a write-heavy workload on a multi-TB dataset that needs to be sharded, with a schema that is "simple" but not so simple that it works with an actual KV store?
I didn't know this was a thing or even common until this year, and it opened my eyes to a major issue: when the PG optimizer doesn't get a query right, you have no escape hatch.
That being said, and me being weak in MySQL, it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...
For me the biggest issue with that is that the optimizer decision can change basically at any time. If the stats start preferring one index over the other at 2am, you can't just say "no, trust me, use this one instead" - you have to find out exactly which value crossed some threshold and tweak the stats to restore the previous behaviour.
> it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...
I'm not sure how representative my sample is, but in the last 20 years, I've seen a hint used two times and one of them was an app bug (the forced index was not better, the order of columns in it should've been reversed and the hint removed). But I haven't worked with super-complicated reporting queries.
I’ve been using PostgreSQL built-in replication since 9.1 (when it was introduced) and this is definitely not my experience....
In fact, it wouldn't be possible to apply out-of-replication changes, because (unlike MySQL) Postgres replication forces immutability (the replica nodes cannot become r/w).
The "issue" could potentially be that it's async? But I use postgresql a _lot_ and I've never experienced this.
The original mode of replication merely ships the WAL, which describes binary changes to data files. Every change comes in the form "set page 32435 in <file> to <bytes>", with no distinction between the operations that the changes represent.
Logical replication describes the changes logically. A change may be "insert <column values> into <table>" or "add <column name> of <type> to <table>". Since this format isn't tied to the underlying database file format, it offers greater compatibility between PostgreSQL versions. It also allows other applications to process the log and do things like listen for data changes and ingest them into other databases, queues, etc.
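For the curious, the built-in logical replication flow looks roughly like this — object and connection names are invented, and note the DDL caveat from the thread above:

```
-- On the publisher (requires wal_level = logical in postgresql.conf):
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the subscriber:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=repl'
    PUBLICATION app_pub;

-- Schema changes are NOT replicated: an ALTER TABLE must be applied
-- on both sides by hand (or by external tooling), in the right order,
-- or the subscription errors out and replication stalls.
```

Because the stream is logical rather than page-level, publisher and subscriber can run different major versions, which is what makes the zero-downtime-upgrade story possible at all.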
I seriously trialled and compared Postgres vs MySQL at the start of a major major project, and MySQL had a few clear wins for us (mainly replication) while the features Postgres had weren't in our current roadmap's requirements (the biggest regret that caused me was not having stored procedures). That was in 1998/1999. I now seem to be stuck on a career path where everything is MySQL and switching to Postgres anywhere seems to have huge back pressure from everyone I work with - even though at least half of them know damned well (like I do) that MySQL hasn't been the right choice for a couple of decades almost.
And I feel seen on the devs-with-no-database-skills thing. Every single database problem I can recall having to step in and help fix in the last ~5 years has been Java devs leaving everything up to the ORM to deal with, resulting in unindexed full table scans and similar brain damage. I "fixed" a production problem with a query that'd crept up to over 30 seconds without anybody noticing, by adding a single index to a single table. (And then sent the devs to work out how to give Hibernate the hints it needed to create the schema properly and set up more monitoring so once-weekly reporting SQL queries taking 10+ seconds wouldn't be hidden in 95 and 99 percentile alerting...)
The one thing that I've learned recently which surprised me- if you want to cluster MySQL (via partitioned tables or PlanetScale, etc) you cannot use foreign keys.
Some of the Postgres compatible tooling (Citus, Cockroach, etc) appears to support foreign keys across partitions / shards.
I was quite surprised to see this. Makes sense that not allowing FKs makes the data easier to shard but I never imagined that would be a thing you would have to contend with.
For reference, we're at about 250 million rows, so not even proper "scale" scale!
> Yes Postgres ... has worse connection handling that usually requires a pooler
A slow database can be ignored for a while. The need to add additional code in which the database is no longer a black box can not be ignored.
It sure is one more step than for MySQL, but we're talking production-level DB, and in my experience it's a small task compared to the rest of what you will be doing to have your architecture properly running.
Just look at the changelog for the bugs fixed (and yes, there are unfixed bugs as well):
And the FAQs for the unusual behaviours to be aware of and find workarounds for:
I would love nothing more than to be convinced. I find MySQL to be adequate, but far from great. Postgres is described as great by everyone who uses it, however I prefer adequate but reliable over great but leads to downtime.
I'm with you on how much it would help if it had a magic mode without these quirks, but in my experience you're still ahead of MySQL and its per-session configuration quirks, for instance. On a day-to-day basis, you'll rarely be scratching your head about how to deal with the pgbouncer pooling behavior you chose, and you'll still have the option to directly hit the DB for anything you don't trust the pooler to deal with appropriately.
For the bugs, those are mostly on the security and configuration side. For the few years I've used it, I never hit a behavioral bug TBH.
Is it though? I read this post earlier this year and it mirrors some similar tests I did back in 2020 when I was just checking basic perf between PostgreSQL / SQL Server / MySQL on my laptop.
> A simple conclusion: Postgres engine family is about twice as fast as MySQL engine family, except MariaDB.
> Postgres has worse connection handling
Is this addressed in PG14? Or is it still to come in PG15? I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.
pg14 improves performance for large numbers (thousands) of connections. IIRC there is some work to add a primitive connection pooler as an option, but I can't find any references on that ATM.
So the situation is improving, but the fundamental connection=process model is so baked into Postgres that it's unlikely to ever change.
People like to criticize MySQL for some bad engineering decisions made decades ago. For whatever reason, often these folks have a more positive view of MariaDB, but without stopping to consider where all those original MySQL engineers went. FWIW one of the footnotes in the original article touches on this too.
This is by design: postgresql doesn't come with a connection pooler, and for each connection a separate process is started. If you have a lot of mostly idle connections, then you should use a pooler in your application, server-side (e.g. pgbouncer), or both (there are trade-offs).
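A minimal pgbouncer.ini sketch of the server-side option — the hostnames, pool sizes, and paths here are placeholders, not recommendations:

```
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432                ; app connects here instead of 5432
pool_mode = transaction           ; server conn returned to the pool at txn end
max_client_conn = 1000            ; many cheap client connections...
default_pool_size = 20            ; ...multiplexed onto a few server processes
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

The trade-off alluded to above: transaction pooling gets the best multiplexing but breaks session-level features (prepared statements, SET, advisory locks held across transactions), which is why some apps must fall back to session pooling or hit the DB directly for those paths.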
Optimistic mode for parallel in-order replication
I was at the Perl conference in maybe 1999 or 2000 where there was some guy giving a MySQL talk. It wasn't Monty, but he was in the audience. I asked a question during question time about enabling ssl for encrypted replication, and the guy giving the talk said "Sorry, I don't know if/when that'll happen". Monty stood up and said "I've just done that. I'll check it in tonight, come grab me out in the corridor if you'd like me to mail you the patch."
I then spent the next 7-8 years running replicated MySQL databases in SF/Dallas/London/Singapore/Sydney. It worked really well for us.
Fear of a db not coming back up in a restart and backups being corrupt as well.
We pull our backups into new instances daily and I think that would catch it.
Yeah, I am one of those people. Needed a database in 2005, and MySQL was the de facto choice. Got used to it and never ran into problems that couldn’t be solved by getting better at schema design and indexing.
I never felt limited by MySQL and I am very comfortable with it, so never felt the need to try anything else. I might be missing something, but there is an opportunity cost in switching without a real motivating reason.
It's not really that hard to change if you started with OFF, it just requires a logical dump and reload (typically on a replica, followed by some re-cloning and promotion). Companies with sizable data tend to have database engineers who can handle this fairly easily.
In any case, it's not an urgent maintenance operation like a VACUUM problem...
I look around now and you are right. It's really no big deal to move a few terabytes of data around, but it sure was a mess back then, and I'm once bitten, twice shy.
I was fortunate that the startup was willing to devote sufficient time and resources to help me get our DB problems under control -- including getting expert consulting services from Percona, attending conferences, etc. However I've definitely seen some other startups that didn't take DB tech debt seriously enough, and just kicked the can down the road until a major disaster occurred.
Anyway though, even aside from improvements in hardware and tooling, managing large DBs just gets less stressful once you've spent some time on the problem. Assuming you successfully solved this 6 years ago, why shy away from it in the future? You almost certainly learned a lot in the process, and the good news is it only gets easier from there!
Do you have an up to date source for that?
1. Change Buffering - If your benchmark result shows that MySQL is faster than PostgreSQL, it is likely due to change buffering. For random read/write workload that doesn't fit into the memory, this feature is fantastic. This is available since the very first commit of InnoDB, and of course also has been causing random corruption during crash recovery throughout the years: https://jira.mariadb.org/browse/MDEV-24449 (fixed in MariaDB, still exists in MySQL)
2. Page-level Compression - PostgreSQL still doesn't support this, but I am fine with relying on ZFS filesystem compression when running either MySQL or PostgreSQL. Arguably it should be better for the database engine to do its own compression, but see this blocker level bug that impacted FreeBSD: https://jira.mariadb.org/browse/MDEV-26537 (MariaDB only, but feel free to search bugs.mysql.com for other compression related bugs)
3. Clustered Index - The main topic in the Uber drama. I think InnoDB has the better design, since scaling write is harder for a RDBMS. I also think that InnoDB implementation causes a severe performance cliff, when it comes to optimistic update versus pessimistic update.
There are compromises there. With page level compression the pages usually remain compressed in memory as well as on-disk and get unpacked on access (and repacked on change). This eats CPU time, but saves RAM as well as disk space which for some workloads is a beneficial trade-off as a larger common working set fits into the same amount of active memory (particularly when IO is slow, such as traditional spinning disks, cloud providers when you aren't paying a fortune, or when competing with a lot of IO or network contention on a busy LAN/SAN).
With filesystem compression you don't get the same CPU hit every time the page is read, and depending on your data you may get better compression for a couple of reasons, but you don't get the RAM saving.
When running PostgreSQL, I would probably go with `primarycache=all` for the data dir, a largish ARC cache, and a smallish shared buffers, to take advantage of the filesystem compression.
When running MySQL, I would probably go with `primarycache=metadata` for the data dir, a smallish ARC cache, and a largish buffer pool, to still benefit (slightly) from the filesystem compression.
Of course, that particular example really only matters if you're actually hitting the 4MB limit in your queries, but I think there were some other settings with similar too-low defaults.
So the first thing you do after setting up Postgres on a reasonable hardware is tune the config file. There's a lot of advice on the web, and there's pgtune service that makes it very easy to start in the right direction (https://pgtune.leopard.in.ua).
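As an illustration of the kind of settings pgtune emits, here's a postgresql.conf fragment for a hypothetical dedicated 16 GB box — the values are illustrative starting points, not prescriptions:

```
# Illustrative memory settings for a dedicated 16 GB server
shared_buffers = 4GB              # ~25% of RAM; the shipped default is a tiny 128MB
effective_cache_size = 12GB       # planner hint about OS page cache, not an allocation
work_mem = 32MB                   # per sort/hash node, per connection - size with care
maintenance_work_mem = 1GB        # speeds up VACUUM and index builds
```

Note that `work_mem` multiplies by connection count and by sort/hash nodes per query, which is exactly the kind of footgun the "dev/shared vs prod/standalone" flag idea above would help with.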
Still, you need to know about it to do it, so there's definitely quite a lot of suboptimal Postgres installations out there.
I've not used either for a long time so my knowledge may be out of date, but mysql definitely used to be the easiest to get started out of the box which was part of the reason it won out in the shared hosting space. Of course the main reason it won out in that area was because it was faster than pg in many artificial benchmarks because at the time it didn't bother with silly things like transactions (as MyISAM was the default table type), and foreign key constraints, and other necessities…
Maybe its all better now?
When I first moved to Postgres I was fairly reluctant - my impression was that it was some arbitrary flavour of SQL, and since I was used to MySQL (the only SQL database I’d ever used) it seemed incredibly unnecessary to bother with. I imagine there are tons of people out there like I was who would rather let things be.
Since the first few weeks I don’t think I’ve ever chosen to use MySQL again; Postgres totally changed my career for the better.
What? SQL itself is cursed with people thinking there's one right way to pronounce it, and everybody else is stupid, but PostgreSQL has the advantage that you can just say "Postgres" and dodge that mess. Good luck saying "MySQL" without triggering at least one person in your office.
(I think the official MySQL stance is that both pronunciations are acceptable, but that doesn't stop pedants from being pedantic.)
1. Your users leverage unintended behaviours, which obstructs your ability to move forward with meeting needs with intended behaviours in your intended target market (unfortunate but very real)
2. You know that if you add or improve features, you can reach more users, or a prospective user worth more than the existing ones
3. It's a passion project and you want it to function/look/whatever a certain way
I've encountered all of these, and each time it becomes a pretty big pain to make progress.
There were very cool things I couldn't find a way to do in MySQL which eventually made the switch an amazing change. It was trivial to set up aggregated time series tables, and very easy to roll that data up in order to speed up queries with a coarser granularity. At the time I recall window functions making this really clean and easy. It was possible in MySQL (and maybe trivial today, too) but it felt like a massive kludge every step of the way.
This is actually what led us to consult someone who could implement the best solutions to these issues that they knew. They encouraged us to adopt Postgres before doing any work, we didn't listen, then as they wrapped up the project they once again advised we leave MySQL behind. He then sent some examples of how we might make the migration and why it would be better. Months later we had made the migration entirely and were very, very glad.
That guy was an AWESOME DBA, and he changed the way I value someone who is good with data and databases. He transformed our team's ability to grow our product, dramatically improved our ability to deliver, and instilled a ton of knowledge in a short time. I can only dream of imparting that to other teams!
Overall I'd say that Postgres provided tools that made working with data easier. I'm not sure if there are any specific features besides a cleaner, simpler,
> ...let me point out something that I've been saying both internally and externally for the last five years...
Life is short, don't get stuck in a job/relationship/situation/etc. you hate for half a decade. IMO the irony is that he's moving to the Chrome team, which has been an absolute shitshow. Woohoo, now you're going to work on fantastic ways of screwing up the web, inserting ads where we don't want them, tracking users against their will, and banning extensions on the Chrome store. Interesting technical challenges, indeed.
Thanks to DocTomoe for finding this:
Some of the discussion of the article survives:
The article was a very big deal at the time, because it was so well researched and so devastating for MySQL.
I figured that article was the death knell for MySQL, but then MySQL surprised me by just going and going and going. It was helped along back then by having a very close relationship with the PHP language. Nowadays, you can use any database you want with PHP, but in 2000 PHP was very much biased in favor of MySQL, and every article written about PHP was written with the assumption you'd be using MySQL as the database. I suspect that MySQL would have died except for the massive life support it got from PHP during those years.
The moral I take from all of this is that sometimes you can have two technologies, and one is clearly better, yet the one that is clearly better can remain under-utilized for 21 straight years.
I know a bunch of HN people will now show up and defend MySQL, or defend a lazy style of programming that accepts defaults even when that means using a poor cousin of something good. But we should stop for a moment and really think about the implications of this. Because it really is remarkable that people have known of the superiority of Postgres for 21 years and yet people still use MySQL.
It doesn’t matter.
At least for ~90% of the projects. People use MySQL because it’s in the tutorial and it works, end of story.
I think the issue is that there are so many different reasons projects need any storage. There's a reason why you need persistence, there's a reason why you need ACID, there's a reason you need some form of replication ...
Eg: a cms like WordPress would probably have been better off just storing structured data in xml files on disk (or nfs) - along with media files on disk. Or using Berkeley db/later sqlite.
Your point of sale terminals might not fare so well without central inventory.
All of the above bodes well for someone trying to quickly hack together an app that does stuff, but for long-term data integrity it's much easier than in pg to introduce subtle differences and flaws in the data through application code that works at face value for CRUD operations against the db, but fails down the line due to subtle data differences.
I have never seen a MySQL database that didn't have problems with dates. (But I haven't looked in a while.)
There are many factors to take into account, but this article focuses on the "operating reliably in production" factor and says that MySQL has fewer surprises:
As for Postgres, I have enormous respect for it and its engineering and capabilities, but, for me, it’s just too damn operationally scary. In my experience it’s much worse than MySQL for operational footguns and performance cliffs, where using it slightly wrong can utterly tank your performance or availability. In addition, because MySQL is, in my experience, more widely deployed, it’s easier to find and hire engineers with experience deploying and operating it. Postgres is a fine choice, especially if you already have expertise using it on your team, but I’ve personally been burned too many times.
For some definitions of better.
I started with MySQL. At some point I encountered a project based on Postgres. It just came with its own set of quirks you had to deal with. It didn't feel like, wow, this is so much better. It was more like, yup, it's a database, it just COUNTs slowly and needs to be VACUUMed. I had more issues and questions at the time about how to do things I knew how to do in MySQL. I figured them out, but those two are the impressions that were left.
Then I encountered MS SQL Server and it was pleasant enough. For my next project I'd probably choose Postgres, but not because it created any kind of sympathy in me. The only thing I really liked was EXPLAIN.
MySQL has one advantage in my opinion - at least it did two decades ago: it was simple. It did the things relational databases are good at, in a fast and easy manner. It didn't have constructs like nested queries, recursive queries, triggers, or stored procedures with cursors, which databases like Postgres had (because who's gonna stop them), but whose performance is horrible, and must be horrible, because they step out of the realm of the fast relational model into the realm of normal programming, which is algorithmically way slower (and that's the main reason we use databases at all).
So MySQL, in my opinion, teaches you what databases are good at. It helped me immensely when dealing with slow queries later in life in Postgres, SQL Server, and even Oracle.
None of the above means that I don't see MySQL as flawed in some ways. I'm in a group of developers that I suspect makes up a sizable portion of the MySQL community: those who didn't choose MySQL, but must support it, if for no other reason than because we see ourselves as professionals, and that's what professionals do: make the employer's application work reliably.
For applications that have already survived past the point of finding product/market fit, a wholesale conversion of DBMS is rarely worth it, and conversions of this type are costly/risky even if it is worth it. I do understand many of the benefits (real and theoretical) of PostgreSQL, and if I'm around at the moment when a project's DBMS is being selected I'm going to recommend _not_ MySQL, but at some level I'm also paid to make the application that my employer is running on top of their DBMS work reliably ... and the fact is even among people who get PostgreSQL - who prefer it, would choose it if they could - many of us are also pragmatic enough not to pull the rug out from under a running application for "reasons".
"C is a programming language designed for writing Unix, and it was designed using the New Jersey approach. C is therefore a language for which it is easy to write a decent compiler"
Is that still true? Over time, people have wanted to fix the flaws in C, so they have added features, and nowadays writing a compiler for C is less easy.
And there are other ways to measure the declining value of "worse is better." Many people would now argue that memory safety is worth the extra effort. Many would say the world would be a better place if C were banned and everyone switched to something like Rust. C continues to lose market share to those languages that guarantee memory safety, and yet C never fully dies, which is interesting.
Over time, the tax you must pay for the "worse" begins to cost more than what you gain from the simplicity. My point, above, was how long this can take. We think of the tech industry as fast moving, and yet many core technologies have had obvious flaws for 30 or 40 years, and yet little action is taken to move to better technologies.
For example, Git can sign commits using gpg and x509 and now in version 2.34, OpenSSH keys. Although OpenSSH can be used to sign data, signify is a much better tool for this task.
Going to Postgres: much pickier. I found it slower out of the box. You couldn't just throw tons of connections at it (i.e., connection buildup/teardown felt slower). I had issues initially with quoting, capitalization, etc.
That was a long time ago. Now I enjoy Postgres and haven't touched MySQL, but there is a real history where MySQL was the database you could get going with pretty easily (this was pre-Oracle buyout).
Postgres forks a backend process for each connection, which is relatively heavyweight. Oracle used to do the same, but implemented a separate multi-threaded listener process for performance reasons. MySQL has been multi-threaded from the beginning.
I thought on Linux that forking had the same cost, more-or-less, as spawning a new thread?
But then the cost of starting to use the process introduces a second speedbump, because once you start exercising it, you start copy-on-writing pages.
There's also the cost of task switching. I don't know the details, but I wonder if modern side-channel mitigations in kernels flush a lot of stuff when switching processes that they don't need to do for threads?
Postgres uses a lot of shared memory to communicate between the processes, so it's really its own implementation of threads. Postgres is the way it is because of its history and its portability goals, dating from a time before threads.
In any case, if you were starting out today you really ought to have a few shared threads doing async processing with plenty of io_uring. Modern Linux allows massively better mechanical sympathy, but all the big database engines are already written and can't shift their paradigms so easily. Here's a paper from 2010 on massively speeding up databases with syscall batching (MySQL is used in the study): https://www.usenix.org/legacy/events/osdi10/tech/full_papers...
I'm sure there are brilliant tricks but it's hard to imagine copy-on-write being completely free
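As a rough, Linux-only sketch of the raw creation costs being debated here: this times only spawn-and-join of no-op workers, and says nothing about the COW faults a busy forked backend then takes as it touches pages.

```python
import os
import time
import threading

def time_thread_starts(n=50):
    """Average seconds to create, start, and join a no-op thread."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n

def time_forks(n=50):
    """Average seconds to fork a child that exits immediately, then reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit without running any Python cleanup
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

thread_cost = time_thread_starts()
fork_cost = time_forks()
print(f"thread start: {thread_cost * 1e6:.0f} us, fork: {fork_cost * 1e6:.0f} us")
```

The absolute numbers vary wildly by machine, which is rather the point: creation cost alone rarely settles the process-vs-thread question.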
When I started playing with PostgreSQL I had no clue why folks liked it. My memory may be bad, but we relied on UPSERT and replication, and PostgreSQL, at least through version 8, had neither upserts nor streaming replication.
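For anyone who hasn't seen the pattern: an upsert folds "insert or update" into one statement. MySQL has long spelled it `INSERT ... ON DUPLICATE KEY UPDATE`; Postgres only gained `INSERT ... ON CONFLICT` in 9.5. A sketch using Python's sqlite3 (SQLite 3.24+ uses the same `ON CONFLICT` spelling; the table and column names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

def bump(name):
    # One round trip: insert the row, or increment it if it already exists.
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET hits = hits + 1",
        (name,),
    )

bump("home"); bump("home"); bump("about")
print(conn.execute("SELECT name, hits FROM counters ORDER BY name").fetchall())
# → [('about', 1), ('home', 2)]
```

Without this, the classic workaround was SELECT-then-INSERT-or-UPDATE, which is both slower and racy under concurrency.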
Regardless of the technical facts, unless there is a very strong ethical argument in favor (for example, say, the employer is outright lying in a highly fraudulent way and customers need to know) .... I find this sort of behavior to be slightly unprofessional. But I am curious if others feel the same ... and does it make a difference that it is Oracle?
1. At all times, you should tell the truth.
2. As long as you're employed, you either speak the company line or carefully avoid commenting, in order to avoid breaking rule 1. If you can neither avoid commenting nor avoid lying, you should quit.
3. Once you're no longer employed, you should avoid spilling company secrets, disclosing confidential information, etc. Beyond that, see rule 1.
I'd be concerned if he:
1. Went to conferences and recommended MySQL, despite thinking it was a bad product
2. Went to conferences and trashed MySQL, despite being employed in part to promote MySQL
Both of those would be, in different ways, unprofessional. But what he did? Seems fine to me. I certainly don't think he was obligated to share his unvarnished thoughts about MySQL, but I don't think he was obligated not to either.
> I find this sort of behavior to be slightly unprofessional.
I find it highly professional, in the sense that he was being 100 percent honest -- and true to his craft. And not in the least vindictive or spiteful in regard to his previous employer.
They made it very clear that while they've held these opinions for a while, they've avoided situations where they would be presented with either lying about their opinions or bad-mouthing the company they work for whether out of respect or fear. I think that's as professional as you can expect people to be. If, as a company, you expect people to not speak about their subjective opinions after they are no longer getting paid as an employee, you need to be willing to offer them something for that, because what you're really asking for is an NDA.
Can he do it? Sure. Should he? Maybe, depends on his values. Hopefully he understood how the message would be received and is OK with the possible consequences, and valued self-expression and making an authoritative critique of MySQL more highly than the negative impacts of rubbing a few readers the wrong way as they perceive a breach of generally accepted professional decorum.
I'm responsible for vetting possible hires to work under me. What this person did falls well within what I would accept as responsible behavior for someone I was considering hiring. The fact that they note they went out of their way to avoid conferences where they would likely be forced to compromise their morals in some way, either through going against the wishes of their employer or being untruthful, speaks very well of their character, if it's to be believed as presented. People that have no problem lying for the company, or badmouthing the company to external people while they work there, are both types of people I would desperately try to avoid.
Stop mincing words, please.
If it's deemed "unprofessional" to talk about it, then to all intents and purposes -- "if he knows what's good for him" as the saying goes -- he can't.
I don't think there's a reason to push the former, as it's got plenty of natural support from the unequal relationship between employers and employees. We may not be able to throw off the shackles entirely, but let's not shame those who rattle them a bit, you know?
They are entitled to write whatever they want, but not taking into account the feelings of the people they were presumably working close with in posting something like this shows a startling lack of empathy.
I'm sure this developer is a genius, but I would take empathy over brilliance every time.
Edit: To be clear, if there is anything unethical or toxic about the workplace culture, they should absolutely post about it publicly to help fix it if that's the best route to address it. This post did not read that way to me. But that's just my opinion.
I mean, presumably your confidence in being a good employer? People don't decide to trash talk their previous employers at random, right? Surely there is some cause-and-effect involved.
Wouldn't it be a good incentive, as an employer, to not do things that would get you trashed by your employees?
Sure, some people could be vengeful for all the wrong reasons, but I don't think there are many of them, nor do they usually have a huge platform for trash-talking, and those claims would also be denied on the spot by other people in the know.
Not giving people loads and loads of things to feel bitter about. And an eggshell-strewn environment where it's basically impossible to air these concerns with anyone upstream.
It's really quite simple, actually.
This happens in every field, even in "tech", even though organized labor could absolutely leverage the insane demand for qualified workers to get a better deal. But I guess that's what being a largely depoliticized and generally ignorant workforce gets you: less.
Anyway, you are released from any and all restrictions on your free speech the microsecond your contract no longer specifies them.
If your previous employer sucked, feel free to say it if you want to.
The gag order you sign in your contract regarding bad mouthing your employer has nothing to do with professionalism, and everything to do with a power equilibrium between employers and employees.
The problem to me here is that OP dumps on the work of their former colleagues, and the criticisms of MySQL aren't backed up and seem a little naive.
E.g., only very useful software survives a long time with many users, and such software also very typically has major warts, especially looking from the inside.
Reading between the lines and subtracting OPs tone, MySQL actually looks like it's in better shape than I would have thought: when OP arrived the optimizer was a mess. Now it's in good shape. Better yet, management is fully supportive and investing in major improvements. Awfully good for a long-lived open-source project.
I might be a little skeptical that OP could thrive on the Chrome team, except that we all mature as our horizons widen.
When MySQL hit the scene at the start of the whole LAMP thing, PG was much slower in some use cases than MySQL, and proprietary databases were super expensive. There weren’t too many options. PG was still fairly rough around the edges.
Of course I also needed transactions, and they didn’t come with MySQL - that would hurt performance apparently - so when I was forced to leave Solid due to HP’s acquisition and subsequent pricing hike, PG was the only option for me.
But most developers didn't need (or didn't know they needed) transactions, and SQL was the tech du jour, so off they went.
I tried to like MySQL, but the weird not-quite-SQL syntax, lack of transactions and this _weird feeling I got_ put me off.
MySQL is a simple database that's easy to set up and run. Don't make the mistake of thinking there's nothing better, but for certain use cases, it gets the job done.
Before that I ran something called Unify which was ... interesting.
I joined my first startup on 2010 and we had a managed PG database that was good enough for us I reckon.
Looking back at the release notes I can't believe how many of those features are critical to how I use PG today. Really some visionary work there.
> With significant new functionality and performance enhancements, this release represents a major leap forward for PostgreSQL. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the following major features...
We used the Solid database at my previous company in 1998-2005+. It was a great piece of software. IBM bought it, rebranded it as an in-memory database, marketed it to telecom companies, and now it seems to be completely focused on that: https://en.wikipedia.org/wiki/SolidDB
It's a shame because it was so simple to manage and "solid" as a rock. Worked really well for us and we never had performance problems with it.
Regardless, it was an awesome database, ahead of its time in many ways. And amazing support.
Funnily enough I'd say the feeling was what really put me off PostgreSQL. Everything just felt slightly more cumbersome to do; I had to constantly look up all these backslash commands whereas things like SHOW CREATE TABLE might be slower if you spend all day doing database admin, but were a lot easier to remember as a developer who only used them occasionally.
If you have a form submission that updates multiple tables - say, adds you as a customer, saves your order, and adds it to a queue - then a transaction ensures that either ALL of the tables are updated or NONE of them are. In this way you don't end up with half-orders, or data that you can't use later.
It's far better user experience to fail, than to lie about having succeeded and throw the user's data away.
Transactions are one of the most useful tools in the data toolbox... you can get work done without them, sure, but most of the time you just end up creating something that looks a lot like a transaction, but slower...
2. Error handling becomes much easier if you can raise an error at any point during request processing and it simply rolls back the whole transaction, since you don't have to duplicate all validation to also run before you first write to the database. One important case is if there is a bug that triggers an assertion failure in the middle of the request processing.
True, but if your transaction is retriable in that way then you might as well write the initial request and then do the processing async afterwards (i.e. event sourcing style).
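A minimal sketch of the all-or-nothing behavior described above, using Python's sqlite3 (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            item TEXT NOT NULL);
    CREATE TABLE queue     (order_id INTEGER NOT NULL);
""")

def place_order(name, item):
    # All three writes commit together, or none of them do.
    with conn:  # begins a transaction; commits on success, rolls back on error
        cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        cur = conn.execute(
            "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
            (cur.lastrowid, item))
        conn.execute("INSERT INTO queue (order_id) VALUES (?)", (cur.lastrowid,))

place_order("alice", "widget")
try:
    # The customer row goes in, then the order insert fails (NULL item),
    # and the rollback removes the half-finished customer row too.
    place_order("bob", None)
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # → 1
```

The error-handling point above falls out for free: raising anywhere inside the `with` block rolls back every write made so far.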
We've personally scaled MySQL to many tens of thousands of concurrent users without too much trouble and without a dedicated person watching the database.
That said, I've only really worked significantly with MSSQL and MySQL, so my points of comparison are based largely on what I've heard. The majority of my last 15 professional years have been spent with MySQL.
There are many things that MySQL does that Postgres still can't do, like memory tables or pluggable storage engines.
And for anecdatum: I've been serving a million people concurrently on PG without any handholding except the initial setup: but that tells you nothing about what we were actually doing with it.
Also, how many concurrent connections do you max at Postgres and what do you use to manage it?
I can help shed light here. Postgres doesn't make you choose between two storage engines and their compromises. It's been years, but the last time I looked, the trade-offs between MySQL storage engines were material, and the newer engine was missing some valuable features of the older one. The advice was "pick the right engine for the job", but I don't want to; that's another decision I have to make. Why make me do that? At least until they come out with a new storage engine that's "objectively better than Postgres", I will enjoy the simplicity of using Postgres with its single storage engine. Choice isn't always better.
I searched to see if anything has changed and found that InnoDB doesn't support full text search. Give me a break; after it's been the default for years, you gotta be kidding. https://hevodata.com/learn/myisam-vs-innodb/
Here's the manual page from 5.6: https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-inde...
If the first result is not reputable, that’s still a smell (but a different kind of smell) for MySQL. https://duckduckgo.com/?q=innodb+vs+myisam
Hevodata.com appears to be selling an ETL product. This post you're linking to is effectively SEO / content marketing. It is not written by MySQL experts.
Regarding fulltext support in InnoDB, there are many results when searching for "innodb fulltext".
Coming full circle here, in your opinion, are pluggable storage engines a virtue of MySQL, for practical purposes?
Most MySQL users should just stick with InnoDB for everything. It's an extremely battle-hardened choice with excellent performance for the vast majority of OLTP workloads.
However, if you're a large company and storing a massive amount of relational data, MyRocks has significantly better compression than any other comparable relational database I'm aware of. FB poured a ton of engineering effort into it, because those compression benefits provide ludicrous cost savings at their scale. For most companies and use-cases though, the benefits may not be significant enough to justify using a less common engine.
Aside from these two engines, AFAIK there's currently no other popular general-purpose modern storage engine that is widely used for MySQL. There are a lot of random third-party ones, but historically it's risky to tie your business to an uncommon storage engine.
I'm not a DB internals engineer, but from what I understand, some of MySQL's downsides are a direct result of the added complexity of having a pluggable storage engine architecture. Mixing-and-matching multiple engines in one DB instance can also be risky (it affects crash-safety guarantees of logical replication on replicas). So given that most companies should just use InnoDB anyway, for practical purposes pluggable engines are not a huge advantage for most users today.
But in the future, who knows; it's good to know the flexibility is there.
And had MySQL never supported pluggable engines originally and MyISAM had been the only option, MySQL would be dead and forgotten. InnoDB was originally developed by a third-party company that Oracle acquired. Interestingly, Mongo followed a very similar trajectory, replacing their initial engine with WiredTiger, developed by a third-party company they acquired.
InnoDB is more fault-tolerant and more feature-rich (e.g., foreign keys) than MyISAM. MyISAM is faster in practice, despite Oracle's continuous claims to the contrary.
MyISAM is largely undeveloped at this point. MariaDB hard forked MyISAM into Aria and have included better fault tolerance and other general improvements. It’s what I use for my personal projects.
Memory tables are strictly in memory and useful for doing very quick processing on things you are certain will fit entirely into system memory. Their data is lost when the system is rebooted or MySQL is restarted.
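SQLite's in-memory databases make a handy stand-in for illustrating that volatility (in MySQL the equivalent is `CREATE TABLE ... ENGINE=MEMORY`; here, closing the connection plays the role of the restart):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # lives entirely in RAM
conn.execute("CREATE TABLE scratch (k TEXT PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO scratch VALUES ('hits', 42)")
value = conn.execute("SELECT v FROM scratch WHERE k = 'hits'").fetchone()[0]
print(value)  # → 42

conn.close()                        # the "restart": everything is gone
conn = sqlite3.connect(":memory:")  # a fresh, empty in-memory database
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # → []
```

That's why MEMORY tables suit caches and scratch processing, never data you can't afford to rebuild.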
I've used both MySQL and PostgreSQL and I prefer MySQL; it seems to work and scale better "out of the box" and has a more comprehensible permission model than Postgres. I might be biased somewhat, as I've used MySQL a lot longer and have even written plugins to interoperate with it, but it's still my number 2 go-to (after SQLite).
But then again, you can tell he was worn down from his experience working there. The industry has a way of doing that to people.
If it had read more like "I left because I was frustrated with code quality", and actually taken ownership of the feeling, that'd have made a big difference. It reads more like "I left because the code sucks, everyone around me is dumb, and our users are sheep".
I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?
Maybe it'll get more attention on the systems he called out? Probably not. The post just reads as bitter, mostly about the mindset of other contributors, but doesn't outline what they tried to do to change the hearts and mind of those contributors. Or put another way: what did this person do to lift up all of these people? Deriding doesn't actually help anything.
If the goal is just to vent that's fine, but there's really not much else to see here.
Not in the least. The fact that he doesn't say what you're saying he says ("everyone around me is dumb, and our users are sheep"), but rather simply sticks to brass tacks -- and it's pretty hard to counter his main point (about the quality of MySQL as a product), after all -- belies the characterization you are trying to make of what he said.
> The post just reads as bitter
Anytime anyone, heaven forbid, talks the plain and simple truth about the conditions many of us work under in this industry -- they get characterized as "bitter", "derisive", "just venting", or (especially in the context of describing our past work experiences to prospective employers), "badmouthing". Or as you put it: "childish".
Such is the state of our industry.
But I do think that you're either mistaking "plain and simple truth" for callousness, or creating a double standard. Are descriptions such as "bitter" not the "plain and simple" truth for this post? I certainly think it reads bitterly.
They're not really complaining about the work conditions, so I'm not really sure what you're referring to there. It's just code. It doesn't bite. It does ossify. It's unpleasant to deal with, but it's part of the job.
It seemed like they had supportive, albeit corporate management.
But rather: being asked to work on products we just can't really believe in, to be silent when upper management (though otherwise supportive and presumably in no way outwardly abusive or mean) would prefer that we keep our blinders on, etc.
> It's just code.
His main concern was the quality of the product as a whole -- and the lack of awareness in that environment of what, to him, seemed to be simple and obvious facts. The remarks about the "bad code" almost tangential (like he said, "it didn't bother me much").
You yourself pointed out: “Well, readings are subjective…”
Perhaps I'm being dense - could you highlight the plain and simple truth part?
It’s about you reading ‘bitterness’ in to what they wrote. The bitterness is in your mind.
Agreed - across this thread, there's been a lot of imputation of bad or petty intent and/or of a disagreeable emotional state on the part of the blog author that just isn't called for.
I read this as simply being honest (in the "Dutch" sense). It wasn't like he was shit-talking his former co-workers, per se. He's just saying he had a radically different appraisal of technical viability of the flagship product.
Which, again, was simply the truth as he saw it.
Really, look at the dictionary definition of the term, please.
> (it seems most MySQL users and developers don't really use other databases)
> But perhaps consider taking a look at the other side of that fence at some point, past the “OMG vacuum” memes.
> Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.
> I am genuinely proud of the work I have been doing, and MySQL 8.0 (with its ever-increasing minor version number) is a much better product than 5.7 was
There’s definitely a lot of “our users are sheep” and “everyone else is dumb” going on in this post.
Heck, it may even be true. But you can't really argue that the author isn't saying it.
Well, we disagree then. I see his post as making some definitely very harsh critiques -- but still short of the threshold of outright insulting people.
Yeah, but the guy is worth millions, ain't he? I assume he's pulling in 6 figures?
And from now on, everybody who has issues with the query optimizer won't shamefully look for documentation on how to fix _their_ use case, but will publicly call out MySQL for having a shit optimizer (which it has, btw).
Which in turn will support those developers inside mysql who want to push things forward. They won't seem like cowboys that want to fix a good thing - they'll have lots of user complaints to help them argue, and also the awareness that maybe, just maybe, mysql will get left behind if they don't move faster.
That MySQL has a poor optimizer has been known for as long as MySQL has been a product. The product is legendary for its inability to do even basic RDBMS needs competently, and the rise of NoSQL was largely people assuming MySQL limitations were general RDBMS problems (for instance its painful incompetence doing basic RDBMS tasks like joins).
Having said that, a couple of decades in this industry has me reading this post and immediately sensing oozing bitterness. That maybe he got passed over for a promotion he felt he earned, etc.
When someone does the "it's all crap" exit, it's seldom from a good place. Who could seriously have applied to and joined the MySQL team without knowing that it isn't exactly the pinnacle of database systems?
Having said all of that, it's interesting seeing pgsql being held as the panacea. I like pgsql, and prefer it among open source database systems, but in many ways it is a decade+ behind MSSQL and Oracle.
I don’t know the author, but assuming he has publicly made such comments before it would behoove him to link to those posts so it’s clear that he has been raising these complaints publicly for a while and didn’t just wait to dump the consequences of his public post on his now former colleagues the moment he got out.
It does because all adults are cynical nowadays and rather than speaking truth they will just sweep it under the rug for political convenience.
> I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?
Spreading the truth? Does it even need a point?
> The post just reads as bitter
Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.
It's possible to speak the truth without being callous. I'm not really sure what political convenience has to do with this.
> Spreading the truth? Does it even need a point?
No, but it should probably have one if it's going to be a discussion on HN.
> Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.
It's great that you're entertained by it. Personally, it just gave me the impression that the person would benefit from therapy.
Political convenience is not burning any bridges. The only reason people aren't callous is fear of retribution.
>No, but it should probably have one if it's going to be a discussion on HN.
I really hate the word should, it's almost always used to force opinions on people without justification.
> Personally, it just gave me the impression that the person would benefit from therapy
He would probably only need therapy if he stayed at oracle.
i.e., the problem wasn't that the executor was bad, the problem was that everyone thought it was ok
And maybe they're not a people person, and trying to enlighten isn't what they signed up for -- all good -- I for one certainly don't want anyone working a job that they're unhappy with.
But the post is littered with putdowns --
> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product. At the same time, I was attending orientation and told how the optimizer worked internally, and I genuinely needed shock pauses to take in how primitive nearly everything was
> Don't believe for a second that MariaDB is any better. Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.
I guess I've just seen this attitude enough where it's boring. Shock pauses, very primitive, bad code. Got it. Moving on...
Spending years in an environment where a core component of your product was crippled by technical debt and you were surrounded by people who didn't -understand- how crippled it was does seem like a recipe for understandable bitterness.
People saying "yes, we know, but rewriting that isn't the business priority right now" is a different matter - that's often aggravating but the right call - but not even acknowledging the problem is unhealthy.
Note that I've made a couple other comments on this article that do their best to acknowledge how much more of a pain in the ass learning how to setup postgres replication is than mysql, because even though I prefer postgres most of the time it's still -true- and I don't see how hiding from that fact makes anything better for any user of anything.
Of course the planner is just part of a database, and I have some kind words to say about the other technically-impressive bits of MySQL:
A lot of MySQL users are websites and things with pretty CRUD access patterns. That wasn't me.
My history is using it for high-throughput real-time batching and fancy buzzword stuff at reasonably massive scale (big distributed telco systems), which is where a lot of expensive choices were pitched.
Doing telco systems with MySQL was staggeringly cheaper and actually quite cheerful, and for all the times I swore at it, I'm actually still a fan.
I went with MySQL for advanced features that, at the time, Postgres was way behind on. MySQL had lots of storage engine choices (including TokuDB, which changed everything for my use-cases) and upserts and compression and things that put it way ahead of Postgres.
Of course MySQL had lots of warts too. The query planner was complete poo, but 99% of uses are simple things that it can handle well, and in the most critical cases where it gets things wrong, you end up annotating the SQL to force indices and do the planner's job for it, etc.
Of course, nowadays, Postgres is reaching parity on these things too (except, perhaps, compression. My understanding is that Postgres is way behind on decent in-engine page-based compression. It will presumably get something decent eventually.)
I ended up proxying my queries with a preprocessor that uses special comment syntax - e.g. `-- $disable_seqscan` - which wraps query execution in sets of enable_seqscan off and on again, to force PG to use the index. All databases can have performance that falls off a cliff when changing statistics make them choose a less optimal join order (join order is normally the biggest thing that affects performance), but PG is particular in not having much flexibility to lock in or strongly hint the plan.
MySQL, on the other hand, is predictably bad and has STRAIGHT_JOIN and other friends which make things much easier to tweak.
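The preprocessor idea above can be sketched in a few lines; note that the `-- $disable_seqscan` marker is this commenter's own convention, not anything PG understands natively:

```python
def preprocess(sql: str) -> str:
    """If the query carries the (made-up) `-- $disable_seqscan` marker,
    bracket it with planner settings so PG won't pick a sequential scan."""
    if "-- $disable_seqscan" not in sql:
        return sql
    return ("SET enable_seqscan = off;\n"
            + sql
            + "\nSET enable_seqscan = on;")

query = "-- $disable_seqscan\nSELECT * FROM orders WHERE customer_id = 42;"
wrapped = preprocess(query)
print(wrapped)
```

The `enable_seqscan` knob is a real PG setting, but it's a blunt instrument: it discourages a whole plan type for the session rather than hinting one query, which is exactly the flexibility gap being complained about.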
You can add VDO to the LVM stack, but that's another layer in the already complex ext4/XFS-on-LVM layering approach.
Agreed that postgres has amazing performance. I was actually an early convert to postgres back in 2004 because my boss and coworker at the time saved entire businesses by migrating them to postgres when they had enough of trying to scale MySQL with more hardware vertically.
But in the later decade when I myself have used clustering to scale I've found that Galera feels less like a tacked on afterthought than anything I've seen in the postgres world.
It's just a lot more integrated than pgpool and watchdog.
I know there are new solutions now that run the postgres clusters inside kubernetes that I haven't tried yet. But that isn't more integration, that's just more 3rd party abstraction.
Edit: I realize after I wrote this how strange it must seem to a developer. Because galera/wsrep is actually a 3rd party replication product "tacked on" to MariaDB. While postgres replication, afaik, is written into their mainline code. I guess my gripe wasn't about the replication but rather the "clustering" around it, like maxscale/haproxy/pgpool and so forth. And in that sense they seem pretty equal, I just chose to use pgpool for my postgres clusters and that is definitely a hacky, scripty mess compared to HAproxy or Maxscale.
Galera I gaze upon with envy, no question.
Compared to Postgres, which has been shipping feature after feature.
I have pieces of half-commented code where I do the optimizer's job, trying to guess which index is best and choosing it manually. Lately I've been considering doing an EXPLAIN first and tweaking the query based on that.
So yeah - I love MySQL, and I don't think I'll be able to invest the time to switch, but at least now everybody can see the emperor is naked and it wasn't just them. The optimizer sucks, including in MySQL 8.
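The explain-then-hint loop looks roughly like this in SQLite, whose `EXPLAIN QUERY PLAN` and `INDEXED BY` play the roles of MySQL's `EXPLAIN` and `FORCE INDEX` (the table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 10, i * 1.5) for i in range(100)])

# Step 1: ask the planner what it intends to do.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 3").fetchall()
print(plan)

# Step 2: if the plan looks wrong, pin the index by hand.
count = conn.execute(
    "SELECT COUNT(*) FROM orders INDEXED BY idx_customer "
    "WHERE customer_id = 3").fetchone()[0]
print(count)  # → 10
```

Pinning indexes by hand works, but it also freezes today's assumptions into the query, which is why people resent having to do it at all.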
Most of the people annoyed with MySQL in these comments are annoyed because it feels like you have to write your own optimiser
Welcome to software, I guess :D
https://dom.as/2015/07/30/on-order-by-optimization/ - Wisdom from the guy who managed to get MySQL to work at facebook scale.
This is an insider view, meaning that he is probably addressing things like code quality, compile/run/debug workflow, technical design decisions and so on.
As a 20 year MySQL user - starting with version 3.23 if my memory serves me well - and after billions upon billions of inserted/updated/deleted/queried rows, MySQL is not a pretty poor database by any measure. It has served me well:
Good performance, near-zero maintenance, very few crashes, and a couple of data corruptions that resulted in data loss after power outages.
In 30 years, it's never happened to me with Postgres or the other two database engines I've used professionally.
If it did happen, then most people who know about databases would say "that's a pretty poor database".
I knew I was not running MySQL in ACID mode (via innodb-flush-log-at-trx-commit=0), and I sacrificed safety for performance. I was OK with that in my specific use case.
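For context, the trade-off that setting makes can be sketched as a my.cnf fragment (the value meanings below are from MySQL's InnoDB documentation):

```ini
[mysqld]
# 1 (default): write and flush the InnoDB log at every commit -- full ACID durability.
# 2: write at commit, flush to disk about once per second -- an OS crash or power
#    outage can lose up to ~1s of committed transactions.
# 0: write and flush only about once per second -- even a mysqld crash can lose
#    up to ~1s of committed transactions.
innodb_flush_log_at_trx_commit = 0
```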
> internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD.
From wikipedia about mariaDB:
> A group of investment companies led by Intel has invested $20 million in SkySQL. The European Investment Bank funded MariaDB with €25 million in 2017. Alibaba led a $27M investment into MariaDB in 2017.
From wikipedia about MySQL:
> MySQL is also used by many popular websites, including Facebook, Flickr, MediaWiki, Twitter, and YouTube.
They certainly have their place. I'm sure some of these companies have considered Postgres.
That was one of MySQL's killer features. No matter how bad any other part of MySQL might be, it did replication. PostgreSQL had some third-party add-ons to do replication, but it was hard, slow, and reasonably easy to do wrong. Some of them were downright bad ideas where they'd just proxy your request to two independent PostgreSQL instances and hope that nothing went wrong, others used triggers on the tables. Literally, the Postgres core team said this about PostgreSQL: "Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases."
No, those companies wouldn't have considered PostgreSQL and once you get to a certain size, things tend to stick around.
That said, many of these companies aren't using MySQL for a lot of new stuff. YouTube developed Vitess to handle some of their MySQL problems, but from what I've heard they've moved off MySQL since then (correct me if I'm wrong). Twitter has its Manhattan database. Facebook has gone through many databases. MediaWiki is a project that people are meant to be able to run on shared hosts and that means PHP/MySQL. Flickr isn't really a company that has done a lot post-2010. That doesn't mean it's a bad site, but it doesn't seem to be making a lot of new stuff.
Decisions have context. Without the context, it's easy to come to the wrong conclusion.
I don't hate MySQL and PostgreSQL has its problems. I think MySQL's strengths are generally in its current install base and current compatibility. Lots of things work with MySQL. Vitess isn't perfect, but it is a nice project for scaling a relational database. PostgreSQL doesn't have a Vitess. Likewise, many things already work with MySQL like MediaWiki and many things speak the MySQL protocol. However, that's starting to shift. I think we're seeing more things adopt PostgreSQL compatibility. Google's Spanner now has a PostgreSQL layer. RedShift, CockroachDB, and others are going for PostgreSQL compatibility.
The thing is that ecosystems take a long time to shift. If it were 2004-2006 when Facebook, YouTube, and Twitter were created, I'd definitely have grabbed MySQL. You need good replication. PostgreSQL wasn't even talking about bringing replication into core back then, never mind having something available. Times change and software changes.
Main use cases (social graph, messaging, ..) are on MySQL (and never left it). Storage engine is different, replication is improved, etc, but it is still tracking upstream MySQL tree.
The thing is, PG still relies on physical replication, while MySQL staying with logical replication allows it to be reused for out-of-DBMS change data capture and consumption in other systems.
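A toy illustration of why row-based (logical) replication is so reusable for change data capture: each event already carries the table name, the operation, and the before/after row images, which downstream consumers can act on directly. Real tools decode these events from the MySQL binlog; this only shows the shape of the data, with invented table and column names.

```python
# Toy CDC event in the shape a row-based replication event provides.
# Table/column names are hypothetical.

def make_cdc_event(table: str, op: str, before, after) -> dict:
    """Package a row change the way a row-based binlog event would."""
    return {"table": table, "op": op, "before": before, "after": after}

# An UPDATE event: old and new row images travel with the event, so a
# cache invalidator or search indexer can consume it without querying the DB.
event = make_cdc_event(
    "users", "update",
    before={"id": 7, "email": "old@example.com"},
    after={"id": 7, "email": "new@example.com"},
)
print(event["after"]["email"])
```

With statement-based or physical (block-level) replication, none of this row-level information is available to systems outside the database.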
> MediaWiki is ...
Funny tidbit - I tried to move over Wikipedia to PG ages ago, and did the initial prototyping and made it work to a certain degree. But also I learned about InnoDB more at the time.
It’s refreshing to know that tier-2 cloud companies (Oracle, IBM, etc.) all have similar internal perspectives, i.e. leadership insisting that they’re making amazing progress in the cloud while their market share stagnates or shrinks.
I don't understand why people think they need to provide a reason.
They are just participants in the economy and should constantly look for better opportunities for themselves, because that's what drives the economy toward a more efficient state where resources are better utilized. Not to mention they have just one life and have every moral right to live it the best way they can. They don't owe anyone anything they didn't promise, and employment is not a promise to dedicate your whole life to a project or an employer.
But either they choose to work on a high quality product, or they choose to work on a product whose quality they can improve (low hanging fruit).
What is LOL worthy?
What strikes me first is how much code there is…
There is a ton of code, some of it copied multiple times, because Chrome nowadays does a lot of things; it's basically an entire operating system that accesses USB, runs assembly code, runs WebGL, basically everything.
And it’s all in C++, and very verbose Google C++.
But in the end I always found what I was looking for there. And I can't say how good or bad the C++ is, as I'm not a C++ dev. There's just a LOT of it.
I am tempted to put a reminder in my calendar for a few years' time to see where this engineer is, though...
I think this is an unfortunate culture change in Oracle land: they hire mercenaries rather than missionaries to work on the code, so such an "I never loved it, but I worked on it for years" attitude is not a surprise.
MySQL, MariaDB, MongoDB, and PostgreSQL all have skeletons in their closets. Yet if a PostgreSQL developer left with a similar attitude, the community would rise much more strongly to its defense.
The same holds true for PostgreSQL.
There are many reasons to pick MySQL over PG for large-scale deployments (economics, replication strategies, etc.), and the fact that some queries will run better on PG may not outweigh those benefits.
Don't get me wrong, I know many areas where MySQL sucks, and that is mostly in lacking execution strategies and optimizer problems. Indeed, in many of these ways MySQL is stuck years behind, but in other areas (MyRocks, modern hardware use, etc) it is far ahead of PG.
The thing is, optimizing has costs, and bypassing those costs is useful, if you're looking at economics of your system, and storage engines that pay attention to data patterns are ahead of just naive heaps of data.
It is very easy to diss MySQL when your business does not depend on managing petabytes of data.
P.S. We've migrated a major social network from PG onto our MySQL-based platform and our MySQL DBAs didn't even notice. :-)
It's not like it's an industry secret either: when I started my CS training in 2007, the first course was about relational databases and one of the first things the teacher told us was that MySQL was pretty sucky.