> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
Wow, right on the chin with that one.
> More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases)
I only have anecdata from my own use of MySQL and Postgres, but I swear people who cut their teeth on MySQL and have never used Postgres just don't know what they are missing.
Yes, Postgres can be slower out of the box, and yes, Postgres has worse connection handling that usually requires a pooler, but the actual engine and its performance make it worth it in my opinion.
The author was rather focused on the optimizer/scheduler. I can easily believe that it is much less advanced and robust, and messier, in MySQL. I really liked MySQL (and more recently MariaDB), and I have experienced some significant annoyances with Postgres recently, and it's for a completely different set of reasons.
Logical Replication is the big one. MySQL has had logical (and "mixed") replication for a decade (or more?). PostgreSQL has gotten built-in logical replication only recently, and it's still very annoying. It doesn't replicate schema change statements, so those have to be applied independently by some external system, and of course it's not going to be timed exactly right, so replication will get stuck. And somehow, for very lightly active applications generating just a few MB of writes per day, the stuck replication log can back up 10s of GBs per day. And does logical replication across versions for zero-downtime upgrades work in Postgres yet? I dunno.
Connections you mentioned already: Postgres needs an external connection pooler/proxy, while MySQL doesn't for a long while. Vacuuming is a problem I never had with MySQL/MariaDB, of course.
I'm an infrastructure dev, not a DBA or an advanced SQL user. I've never used window functions, written a query that filled half the screen, or built a thick web of foreign keys ... If you keep things damn simple so you have confidence in efficiency and performance, then MySQL works pretty damn well in my experience, and Postgres, for all its academic correctness and advanced query feature set, has been pretty damn annoying.
And, you can tell MySQL to use a particular index for a query!
This is about where I'm at. I've used both DBs and adminned sites with them. Sure PG has a ton of nice SQL features that are really handy if you want to do any kind of more advanced analysis. I wouldn't be surprised either to learn that it's much more efficient on executing complex queries. But the bottom line is that the great majority of production usage is basic CRUD queries using simple SQL, and a lot of applications will never need to use advanced SQL. The things MySQL does do better, replication and connections like you mentioned, tend to be more suited to keeping big dumb basic sites up and running reliably.
I'm usually gonna reach for Wordpress first along with MySQL if I need to set up a basic blog for non-technical people to contribute to. Maybe the DB isn't as cool as it could be, but it works fine, and I hardly ever have need to touch the database itself directly.
It's pretty much a 90%-10% issue: MySQL is quite good at the 10% of things regular users need to do 90% of the time. A website that performs simple selects and inserts, for a few hundred clients a second, only cares about speed and consistency, and MySQL delivers.
When the client count and the business complexity explode, things start to get hairy fast, but by that time the investment in MySQL is hard to walk away from. You will start to hit all sorts of edge cases in other DBs - which MySQL did not exhibit simply because your application grew around its feature set and limitations.
> When the client count and the business complexity explode, things start to get hairy fast, but by that time the investment in MySQL is hard to walk away from. You will start to hit all sorts of edge cases in other DBs - which MySQL did not exhibit simply because your application grew around its feature set and limitations.
I'd like to point out that this seems to happen whatever db you move from and to.
Sometime over a decade ago I was part of a team moving from an ancient version of Sybase Adaptive Server Anywhere to the newest Microsoft SQL server running on a beefier server.
We ported everything carefully and tested it out. We also felt good since both were dialects of T-SQL.
...and then we ran right into such large performance problems that we had to fall back to the old system.
The reason was that at least older versions of Sybase ASA had implicit indexes, so queries that had never had explicit indexes created for them still performed well - and on the new system they suddenly didn't.
I've seen quite a few codebases where the complexity of the problem demanded more, but they basically bodged around it in the app tier in order to keep the SQL within what MySQL could handle - and it's -very- easy to not even notice you're doing that, because it's a technical debt/death by a thousand cuts sort of experience when it does happen.
Then again, depending on your scenario, introducing a bunch of extra complexity into your app tier code may be less annoying than introducing a bunch of extra complexity (connection poolers etc.) into your infrastructure.
So it ends up that there's a bunch of workloads where mysql is obviously the better option, and a bunch of workloads where postgresql is obviously the better option, and also a bunch where they're mostly differently annoying and the correct choice depends on a combination of your team composition and where your workload is likely to go in the next N years.
A lot of applications don’t use more advanced SQL simply because it’s not available.
Just as an example, there are so many people who assume they need a 3rd party search engine simply because what’s built into their database is so bad compared to the options you get with PG.
You’re welcome to only use your database as a crud data store while you manage a mountain of other tools to work around all of the limitations…but you don’t have to.
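For example, Postgres's built-in full-text search covers a lot of what people reach for a separate engine to do. A sketch of the usual setup (table and column names hypothetical; the generated column needs PG 12+):

```sql
-- Keep a tsvector in sync automatically via a stored generated column.
ALTER TABLE articles
  ADD COLUMN search tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

-- A GIN index makes the match operator fast.
CREATE INDEX articles_search_idx ON articles USING GIN (search);

-- Query with web-search-style syntax (quotes, OR, minus all work).
SELECT title
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'postgres replication');
```

That's obviously not Elasticsearch, but for "search this site's articles" it's often all you need.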
> I'm an infrastructure dev, not a DBA or an advanced SQL user. I've never used window functions, written a query that filled half the screen, or built a thick web of foreign keys ... If you keep things damn simple so you have confidence in efficiency and performance, then MySQL works pretty damn well in my experience, and Postgres, for all its academic correctness and advanced query feature set, has been pretty damn annoying.
At that point you're effectively using MySQL as a key/value store with an SQL interface, rather than a full RDBMS.
Which is, in fact, pretty much exactly what MySQL was originally designed to be and it's extremely good at it.
> If you keep things damn simple so you have confidence in efficiency and performance
I can have that level of confidence on a minimally tuned (albeit sensibly indexed, you still have to make sure the indices you need -exist- on anything, realistically) postgres at levels of schema/query complexity where getting mysql to handle it -at all- would be ... an experience.
Having used both extensively, I'd say that there's a lot of knowledge when using mysql that's basically "introducing complexity into your application and query pattern to work around the optimiser being terrible at doing the sensible thing" and then there's lots of knowledge when administrating postgresql that's "having to pick and choose lots of details about your topology and configuration because you get a giant bag of parts rather than something that's built to handle the common cases out of the box".
So ... on the one hand, I'm sufficiently spoiled by being able to use multi column keys, complex joins, subqueries, group by etc. and rarely have to even think about it because the query will perform just fine that I vastly prefer postgres by default as an apps developer these days.
On the other hand, I do very much remember the learning curve in terms of getting a deployment I was happy with and if I hadn't had a project that flat out required those things to be performant I might never have gotten around to trying it.
So, in summary: They've chosen very different trade-offs, and which set of downsides is more annoying is very dependent on the situation at hand.
> Having used both extensively, I'd say that there's a lot of knowledge when using mysql that's basically "introducing complexity into your application and query pattern to work around the optimiser being terrible at doing the sensible thing" and then there's lots of knowledge when administrating postgresql that's "having to pick and choose lots of details about your topology and configuration because you get a giant bag of parts rather than something that's built to handle the common cases out of the box".
This completely maps to my experience as well. I still believe Postgres' defaults being set to use an absolute minimum footprint, especially prior to PostgreSQL 9, significantly impacted its adoption. It's better now, although IMO it could still do with an easier configuration where you could flag the install as dev/shared or prod/standalone to provide more sensible settings for the most common situations. Like, gosh, I'd like to be able to stand up an instance and easily tell it to use 90% of the system memory instead of 100 MB or whatever it was.
But the MySQL issues were much worse. Outer join and subquery problems, the lack of functions like ROW_NUMBER(), a default storage engine that lacked transactions and fsync(), foreign keys that don't do anything, etc. I've met so many people who think MySQL limitations are RDBMS limitations, or don't understand database fundamentals because MySQL just didn't implement them properly. Then again, I also remember when the best reason to use MySQL was the existence of GROUP_CONCAT(), which everything else lacked for religious reasons.
I also vividly remember, prior to MySQL 5.5, when new users of PostgreSQL, Oracle, or MS SQL Server would discover that, instead of making a guess at what you wanted, the RDBMS would actually return error messages and expect you to understand and solve them deterministically. And somehow they would be angry about it! The way old MySQL (mostly 3.23, but 4.0 and 4.1, too) used to silently truncate data, allow things like February 31 in a date field, silently accept non-deterministic GROUP BY, and only report warnings when you explicitly asked for them really undermined the perception of MySQL among database folks. This wasn't that long ago, either.
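For anyone who learned SQL on old MySQL and never missed window functions: this is the kind of query they make trivial (table and column names hypothetical; MySQL itself only gained ROW_NUMBER() in 8.0):

```sql
-- Latest order per customer, without self-joins or app-side grouping.
SELECT customer_id, order_id, ordered_at
FROM (
  SELECT customer_id, order_id, ordered_at,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY ordered_at DESC) AS rn
  FROM orders
) ranked
WHERE rn = 1;
```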
We had tried to do a logical replication-based strategy for Postgres to achieve no-downtime version upgrades during business hours, but there were so many gotchas and things you have to do on the side (stuff like sequence tracking etc), that we just gave up and took the downtime to do it more safely.
I think Postgres is really cool and has a lot of good reliability, but I've always wondered what people do for high availability. And yeah, pgbouncer needing to be a thing bugs me.
It feels like we're getting closer and closer to "vanilla pg works perfectly for the common workflows" though, so tough to complain too much.
So to be honest I see that article (linked below) and I think "Yeah, that's probably all right", but given this is for a Real System With Serious People's Data, when faced with "take the downtime and do the offline data upgrade that _basically everyone does_" and "try to use logical replication to get a new database in the right state, knowing that logical replication doesn't copy _everything_ so I need to manually copy some stuff, and if I miss stuff I'm in a bad situation"...
I trust the pg people that the tools do what's advertised, but it's a very high-risk proposition.
IMO logical replication is too slow for Production. At scale you'll be doing physical replication and rarely using logical replication as it is quite slow in comparison still (As of PG 13.2). Logical replication is helpful when doing a Postgres version upgrade primary/replica switch, but not useful many other places at the moment.
You do need to use an external connection pooler (pgbouncer etc), though Postgres 14 has made some large improvements in gracefully handling large numbers of connections. You can still run into problems, though.
As for the query planner/optimizer, so far in all of the optimizations and improvements I have worked on, I've only run into 1 or 2 that had a query plan that made my head scratch. There are some extreme edge cases that will prompt the cost function to prioritize the wrong index, but in production I have found that 99%+ of slow queries can be easily improved by a rather simple composite index. One thing I do love about postgres is using partial indexes, which can significantly reduce the amount of space required and also make it extremely easy to create indexes to match predicates of a function, while indexing other columns.
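For anyone who hasn't used partial indexes, the idea in a nutshell (table and column names made up):

```sql
-- Index only the rows the hot query actually touches.
-- If most rows in `jobs` are 'done', this index stays tiny.
CREATE INDEX jobs_pending_idx
  ON jobs (queued_at)
  WHERE status = 'pending';

-- The query predicate matches the index predicate, so only the
-- small partial index gets scanned:
SELECT id
FROM jobs
WHERE status = 'pending'
ORDER BY queued_at
LIMIT 10;
```

The space savings compound quickly when the filtered subset is a small fraction of the table.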
Other than one or two slow queries I've tracked down and worked on, I've never wished that I could "hint" to use a particular index over another.
Now, some of the things that people like about Postgres over MySQL aren't, in practice, that great at the moment. People like atomic DDL operations, when in reality locking the table schema can cause lots of problems, and in production you only add indexes etc. concurrently.
You still get the issue of dead tuples in MySQL and need to periodically clean them up using OPTIMIZE TABLE et al. Postgres and InnoDB have different designs, but ultimately it needs to happen sometime. It's just that Postgres IMO requires more tuning to get the right balance.
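On the Postgres side, that tuning often ends up being per-table autovacuum settings for the hottest tables, along these lines (table name and numbers purely illustrative):

```sql
-- Vacuum this busy table much more aggressively than the global default:
-- trigger at 2% dead tuples instead of 20%, and throttle the worker less.
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_cost_delay = 2
);
```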
> If you keep things damn simple so you have confidence in efficiency and performance...
... then SQLite might be the right choice for you?
In all seriousness, so many use cases are met by SQLite on the low end, PostgreSQL on the high end, and dedicated KV stores like Elastic off to the side, that what is left for MySQL? When you have a write-heavy workload on a multi-TB dataset that needs to be sharded, with a schema that is "simple" but not so simple that it works with an actual KV store?
> And, you can tell MySQL to use a particular index for a query!
I didn't know this was a thing or even common until this year, and it opened my eyes to some major issues: when the PG optimizer doesn't get a query right, you have no escape hatch.
That being said, and me being weak in MySQL, it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...
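For anyone who hasn't seen the syntax, MySQL's index hints look like this (table and index names made up):

```sql
-- USE INDEX: consider only the listed index(es) when planning.
SELECT *
FROM orders USE INDEX (idx_customer_date)
WHERE customer_id = 42 AND ordered_at > '2021-01-01';

-- FORCE INDEX is stronger: a full table scan is used only if the
-- named index can't satisfy the query at all.
SELECT *
FROM orders FORCE INDEX (idx_customer_date)
WHERE customer_id = 42;
```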
> it opened my eyes to some major issues: when the PG optimizer doesn't get a query right, you have no escape hatch.
For me the biggest issue with that is that the optimizer decision can change basically at any time. If the stats start preferring one index over the other at 2am, you can't just say "no, trust me, use this one instead" - you have to find out exactly which value crossed some threshold and tweak the stats to restore the previous behaviour.
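The closest thing stock Postgres has to an escape hatch is bumping the statistics target on the offending column and re-analyzing, so the planner's estimates cross back over the threshold (names hypothetical):

```sql
-- Sample this column far more finely than the default target of 100,
-- then refresh the stats the planner uses.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```

Which is exactly the "find out which value crossed some threshold" exercise being complained about - it fixes estimates, it doesn't let you just pin a plan.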
> it wasn't obvious to me if this is very common and how often you _have_ to tell MySQL what index to use because the optimizer misses it...
I'm not sure how representative my sample is, but in the last 20 years, I've seen a hint used two times and one of them was an app bug (the forced index was not better, the order of columns in it should've been reversed and the hint removed). But I haven't worked with super-complicated reporting queries.
> PostgreSQL has gotten built-in logical replication only recently, and it's still very annoying. It doesn't replicate schema change statements, so those have to be applied independently by some external system
I’ve been using PostgreSQL built-in replication since 9.1 (when it was introduced) and this is definitely not my experience....
In fact it wouldn’t be possible to apply changes outside of replication, because (unlike MySQL) Postgres replication forces immutability (the replica nodes cannot become r/w).
The "issue" could potentially be that it's async? But I use postgresql a _lot_ and I've never experienced this.
They're referring to logical replication, which is a relatively new feature.
The original mode of replication merely ships the WAL, which describes binary changes to data files. Every change comes in the form "set page 32435 in <file> to <bytes>", with no distinction between the operations that the changes represent.
Logical replication describes the changes logically. A change may be "insert <column values> into <table>" or "add <column name> of <type> to <table>". Since this format isn't tied to the underlying database file format, it offers greater compatibility between PostgreSQL versions. It also allows other applications to process the log and do things like listen for data changes and ingest them into other databases, queues, etc.
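In practice, setting up logical replication is just two statements - which is also where the schema caveat bites, since the subscriber's tables have to exist already and DDL isn't replicated (connection details hypothetical):

```sql
-- On the publisher:
CREATE PUBLICATION mypub FOR ALL TABLES;

-- On the subscriber (matching schema must already be in place):
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=primary dbname=app user=repl'
  PUBLICATION mypub;
```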
> I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing
I seriously trialled and compared Postgres vs MySQL at the start of a major project, and MySQL had a few clear wins for us (mainly replication), while the features Postgres had weren't in our current roadmap's requirements (the biggest regret that caused me was not having stored procedures). That was in 1998/1999. I now seem to be stuck on a career path where everything is MySQL, and switching to Postgres anywhere seems to have huge back pressure from everyone I work with - even though at least half of them know damned well (like I do) that MySQL hasn't been the right choice for a couple of decades almost.
I could tell a similar story from 2015. Worked at a MySQL shop where we had several big clusters with bi-directional replication between two "masters", and used failover between them if anything died. Also replicated to a 3rd node which was read only (used for huge queries, backups etc). Sounds ok, but our developers were not good at writing SQL or designing schemas, and our ops team didn't have any DBAs. So all of the problems were our own fault. But a new senior engineering hire told us that Postgres would fix everything because MySQL was a shitshow. He demonstrated amazing benchmarks on a specific database we were having issues with. But OMG once we tried to get replication working... Bloody awful. Major internal fighting ensued. Another engineer decided to build a clean MySQL and tune it using best practices, and demonstrated comparable performance to the Postgres demo. That was it for Postgres, and a major project kicked off to fix the MySQL problems (switch to innodb, turn on GTIDs, fix bad SQL that was breaking replication, use row based replication, better indexes, monitoring, actual metrics from the underlying hardware etc). Who'd have thought you had to spend time on your databases? Crazy right?
Yeah. Interesting to see the replication results came out the same 15 years apart.
And I feel seen on the devs with no database skills thing. Every single database problem I can recall having to step in and help fix in the last ~5 years has been Java devs leaving everything up to the ORM to deal with, resulting in unindexed full table scans and similar brain damage. I "fixed" a production problem with a query that'd crept up to over 30 seconds without anybody noticing, by adding a single index to a single table. (And then sent the devs to work out how to give Hibernate the hints it needed to create the schema properly, and set up more monitoring so once-weekly reporting SQL queries taking 10+ seconds wouldn't be hidden in 95 and 99 percentile alerting...)
And the fact that the community and tooling surrounding MySQL is absolutely bonkers massive. It's not that the grass has been trodden down to a path; no there's a 6 lane highway with a rest area and a Cracker Barrel.
The one thing that I've learned recently which surprised me- if you want to cluster MySQL (via partitioned tables or PlanetScale, etc) you cannot use foreign keys.
Some of the Postgres compatible tooling (Citus, Cockroach, etc) appears to support foreign keys across partitions / shards.
I was quite surprised to see this. Makes sense that not allowing FKs makes the data easier to shard but I never imagined that would be a thing you would have to contend with.
I also ran into this recently when looking at SingleStore! I was surprised when they told me I couldn't use foreign keys, and was even more surprised that they were surprised that I do use foreign keys. They acted like I was the only one. And to be fair, given their clientele, I might be the only one!
At "scale" you just can't afford to use FK, it's indeed slowing down everything and preventing good and fast clustering / replication. For some use cases you just have to live with it
Yup for sure, that's what they were saying. It makes sense, but I was still surprised by it. I've never had to manage something at this scale before, so it was news to me.
For reference, we're at about 250 million rows, so not even proper "scale" scale!
> Yes Postgres ... has worse connection handling that usually requires a pooler
This is the issue. The vast majority of PHP (Wordpress runs ~30% of the internet or something) and Python (Django) devs are not going to write a database connection pooler. I happen to like PHP and love Python, but the low bar to entry on those two (PHP especially) means that the vast majority of devs would be completely lost without the default tooling provided by e.g. php-mysqlnd or other prepackaged database connection modules in their default settings with the most trivial of use cases.
A slow database can be ignored for a while. The need to add additional code in which the database is no longer a black box can not be ignored.
It usually comes down to setting up a pgbouncer in front of the DB, from there you can mostly ignore the pooling layer in your application.
It sure is one more step than for MySQL, but we're talking production-level DB, and in my experience it's a small task compared to the rest of what you will be doing to have your architecture properly running.
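A minimal pgbouncer setup really is small - something like this sketch (database names, paths, and pool sizes all illustrative):

```ini
; pgbouncer.ini - point apps at port 6432 instead of 5432.
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction   ; server conn returned to pool at COMMIT/ROLLBACK
default_pool_size = 20    ; server connections per user/database pair
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

Note that transaction pooling is what breaks session-level features (prepared statements, SET, advisory locks held across transactions), which is the quirk discussed below.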
I would love nothing more than to be convinced. I find MySQL to be adequate, but far from great. Postgres is described as great by everyone who uses it; however, I prefer adequate-but-reliable over great-but-prone-to-downtime.
On your main point, yes, you need to understand how pgbouncer works, it has its quirks and limitations. In particular a dev needs to be aware of how sessions and transactions work if they have some critical transactions. When introducing pgbouncer on one of our rails project we had to fine tune ActiveRecord to play nice, but once it's done there wasn't much to adjust from there.
I'm with you on how much it would help if it had a magic mode without these quirks, but in my experience you're still ahead of MySQL and its per-session configuration quirks, for instance. On a day-to-day basis, you'll rarely be scratching your head about how to deal with the pgbouncer pooling behavior you chose, and you'll still have the option to directly hit the DB on anything you don't trust the pooler to deal with appropriately.
For the bugs, those are mostly on the security and configuration side. For the few years I've used it, I never hit a behavioral bug TBH.
Is it though? I read this post earlier this year and it mirrors some similar tests I did back in 2020 when I was just checking basic perf between PostgreSQL / SQL Server / MySQL on my laptop.
> A simple conclusion: Postgres engine family is about twice as fast as MySQL engine family, except MariaDB.
Also:
> Postgres has worse connection handling
Is this addressed in PG14? Or is it still to come in PG15? I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.
> I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.
pg14 improves the performance for large numbers (thousands) of connections. IIRC there is some work to add a primitive connection pooler as an option, but I can't find any references on that ATM.
So the situation is improving, but the fundamental connection=process model is so baked into Postgres that it's unlikely to ever change.
MySQL 8 has a lot of functionality that MariaDB lacks.
People like to criticize MySQL for some bad engineering decisions made decades ago. For whatever reason, often these folks have a more positive view of MariaDB, but without stopping to consider where all those original MySQL engineers went. FWIW one of the footnotes in the original article touches on this too.
Pretty much only three reasons: you want Enterprise support, you don't want to think about license terms, and you really want to help pay for Larry Ellison's personal island.
Last I checked, MariaDB still misses some functionality around JSON, and I don't know what the story is exactly about compatibility with the newer X Protocol and its collections.
> Is this addressed in PG14? Or is it still to come in PG15? I haven't kept up with the latest Postgres stuff but I thought they were fixing the connection stuff.
This is by design; postgresql doesn't come with a connection pooler. For each connection a separate process is started. If you have a lot of mostly idle connections, then you should use a pooler in your application, server side (e.g. pgbouncer), or both (there are trade-offs).
I was at the Perl conference in maybe 1999 or 2000 where there was some guy giving a MySQL talk. It wasn't Monty, but he was in the audience. I asked a question during question time about enabling ssl for encrypted replication, and the guy giving the talk said "Sorry, I don't know if/when that'll happen". Monty stood up and said "I've just done that. I'll check it in tonight, come grab me out in the corridor if you'd like me to mail you the patch."
I then spent the next 7-8 years running replicated MySQL databases in SF/Dallas/London/Singapore/Sydney. It worked really well for us.
All this is showing is that the default settings for MariaDB Docker image are a better fit for this person's particular machine specs than the default settings for the MySQL Docker image. IMO it's not really a useful comparison at all, for any real-world use-case whatsoever.
When working as an SRE, every place that I worked at that used MySQL had at least one situation where the MySQL database corrupted itself without any damaging event (i.e. no broken hardware, no disk filling up (though even that shouldn't corrupt data), no machine crash or hard restart).
> I swear people that cut their teeth on MySQL and have never used Postgres just don't know what they are missing.
Yeah, I am one of those people. Needed a database in 2005, and MySQL was the de facto choice. Got used to it and never ran into problems that couldn’t be solved by getting better at schema design and indexing.
I never felt limited by MySQL and I am very comfortable with it, so never felt the need to try anything else. I might be missing something, but there is an opportunity cost in switching without a real motivating reason.
Unless MySQL changed it since I last looked, the one large InnoDB file is a big PITA, but otherwise you can do OPTIMIZE TABLE to compact tables. Apparently this can sometimes be done in place, but I'm used to it writing to a new file, so you'd need room for both old and new; and at least for MyISAM it locks the table for the whole time, so I hope you've got a way to manage that too. But that's about the same as VACUUM FULL.
Yeah, but the default used to be one file for everything, and it's hard to split things after you started with that. Especially if your data is sizable.
The default innodb_file_per_table changed to ON in 5.6, released nearly 9 years ago.
It's not really that hard to change if you started with OFF, it just requires a logical dump and reload (typically on a replica, followed by some re-cloning and promotion). Companies with sizable data tend to have database engineers who can handle this fairly easily.
In any case, it's not an urgent maintenance operation like a VACUUM problem...
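For reference, the setting in question is a one-liner in my.cnf - and it's already the default on any 5.6+ install:

```ini
[mysqld]
# Each InnoDB table gets its own .ibd file instead of everything living
# in the shared system tablespace, so OPTIMIZE TABLE can actually
# return space to the OS per table.
innodb_file_per_table = ON
```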
It's not nearly the problem now it was ~6 years ago when I was stuck with 8TB in a single file at a small startup.
I look around now and you are right. It's really no big deal to move a few terabytes of data around, but it sure was a mess back then, and I'm once bitten, twice shy.
Having been in a similar situation 11 years ago, from my POV it was waaay easier 6 years ago than it was back then ;)
I was fortunate that the startup was willing to devote sufficient time and resources to help me get our DB problems under control -- including getting expert consulting services from Percona, attending conferences, etc. However I've definitely seen some other startups that didn't take DB tech debt seriously enough, and just kicked the can down the road until a major disaster occurred.
Anyway though, even aside from improvements in hardware and tooling, managing large DBs just gets less stressful once you've spent some time on the problem. Assuming you successfully solved this 6 years ago, why shy away from it in the future? You almost certainly learned a lot in the process, and the good news is it only gets easier from there!
1. Change Buffering - If your benchmark result shows that MySQL is faster than PostgreSQL, it is likely due to change buffering. For random read/write workload that doesn't fit into the memory, this feature is fantastic. This is available since the very first commit of InnoDB, and of course also has been causing random corruption during crash recovery throughout the years: https://jira.mariadb.org/browse/MDEV-24449 (fixed in MariaDB, still exists in MySQL)
2. Page-level Compression - PostgreSQL still doesn't support this, but I am fine with relying on ZFS filesystem compression when running either MySQL or PostgreSQL. Arguably it should be better for the database engine to do its own compression, but see this blocker level bug that impacted FreeBSD: https://jira.mariadb.org/browse/MDEV-26537 (MariaDB only, but feel free to search bugs.mysql.com for other compression related bugs)
3. Clustered Index - The main topic in the Uber drama. I think InnoDB has the better design, since scaling write is harder for a RDBMS. I also think that InnoDB implementation causes a severe performance cliff, when it comes to optimistic update versus pessimistic update.
> Page-level Compression ... but I am fine with relying on ZFS filesystem compression
There are compromises there. With page level compression the pages usually remain compressed in memory as well as on-disk and get unpacked on access (and repacked on change). This eats CPU time, but saves RAM as well as disk space which for some workloads is a beneficial trade-off as a larger common working set fits into the same amount of active memory (particularly when IO is slow, such as traditional spinning disks, cloud providers when you aren't paying a fortune, or when competing with a lot of IO or network contention on a busy LAN/SAN).
With filesystem compression you don't get the same CPU hit every time the page is read, and depending on your data you may get better compression for a couple of reasons, but you don't get the RAM saving.
ZFS ARC actually works the same way, i.e. the records would remain compressed in memory.
When running PostgreSQL, I would probably go with `primarycache=all` for the data dir, a largish ARC cache, and a smallish shared buffers, to take advantage of the filesystem compression.
When running MySQL, I would probably go with `primarycache=metadata` for the data dir, a smallish ARC cache, and a largish buffer pool, to still benefit (slightly) from the filesystem compression.
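Spelled out as ZFS dataset settings, that strategy looks roughly like this (dataset names hypothetical; recordsize matched to each engine's page size is the usual companion advice):

```
# Postgres data dir: compress, cache data+metadata in ARC,
# and match the 8K Postgres page size.
zfs set compression=lz4 tank/pgdata
zfs set primarycache=all tank/pgdata
zfs set recordsize=8k tank/pgdata

# MySQL/InnoDB data dir: let the (large) buffer pool do the data
# caching, keep only metadata in ARC, match the 16K InnoDB page size.
zfs set compression=lz4 tank/mysql
zfs set primarycache=metadata tank/mysql
zfs set recordsize=16k tank/mysql
```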
IIRC it's because some of the default settings were chosen back when RAM was measured in MB and low GB and not really updated. work_mem, for example, can typically be safely set to several times its default of 4MB, which then allows individual operations to use more memory before spilling over into temporary files on disk.
Of course, that particular example really only matters if you're actually hitting the 4MB limit in your queries, but I think there were some other settings with similar too-low defaults.
It's because the default Postgres configuration is very basic and targeted at low-end hardware. Besides, in Postgres you have to tune some of the settings that other databases take care of automagically (work_mem being probably the trickiest one).
So the first thing you do after setting up Postgres on a reasonable hardware is tune the config file. There's a lot of advice on the web, and there's pgtune service that makes it very easy to start in the right direction (https://pgtune.leopard.in.ua).
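For instance, a first pass over postgresql.conf on a box with ~16 GB of RAM might look something like this (the parameter names are real; the values are the kind of starting point pgtune suggests, not universal recommendations):

```ini
# postgresql.conf - illustrative starting values for ~16 GB RAM
shared_buffers = 4GB          # default is 128MB; ~25% of RAM is a common rule of thumb
work_mem = 32MB               # default is 4MB; applies per sort/hash, so multiply by concurrency
maintenance_work_mem = 1GB    # speeds up VACUUM and index builds
effective_cache_size = 12GB   # planner hint: OS page cache + shared_buffers
```

Note that `work_mem` in particular is a per-operation limit, not a global one, which is why it's the trickiest of these to size.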
Still, you need to know about it to do it, so there's definitely quite a lot of suboptimal Postgres installations out there.
That isn't unique to pg though: just look at the number of performance questions in dba.stackexchange and on web hosting forums that, once you take out those that turn out to be bad queries and/or non-existent/inappropriate indexes, come down to tuning certain settings in my.ini.
I've not used either for a long time, so my knowledge may be out of date, but MySQL definitely used to be the easiest to get started with out of the box, which was part of the reason it won out in the shared-hosting space. Of course, the main reason it won out in that area was that it was faster than pg in many artificial benchmarks, because at the time it didn't bother with silly things like transactions (MyISAM was the default table type), foreign key constraints, and other necessities…
It is incredibly hard to improve something that your user base doesn’t think needs improving. That part really stands out to me.
When I first moved to Postgres I was fairly reluctant - my impression was that it was some arbitrary flavour of sql, and since I was used to MySQL (the only sql database I’d ever used) it seemed incredibly unnecessary to bother with. I imagine there are tons of people out there like I was who would rather let things be.
Since the first few weeks I don’t think I’ve ever chosen to use MySQL again; Postgres totally changed my career for the better.
I have very strong feelings that a universally pronounceable product name is ESSENTIAL to wide adoption. PostgreSQL never rolls off the tongue, and it's not fun to say.
> I have very strong feelings that a universally pronounceable product name is ESSENTIAL to wide adoption
What? SQL itself is cursed with people thinking there's one right way to pronounce it, and everybody else is stupid, but PostgreSQL has the advantage that you can just say "Postgres" and dodge that mess. Good luck saying "MySQL" without triggering at least one person in your office.
(I think the official MySQL stance is that both pronunciations are acceptable, but that doesn't stop pedants from being pedantic.)
It’s pronounced “postgres”, as in “ingress” but with “in” replaced by “post”. If that helps. No one pronounces the “SQL”, as no one can agree on its pronunciation.
It’s how nearly everyone pronounces it, and for the most part spells it in common usage. The formal thing IIRC is that the name is PostgreSQL as it was originally a successor to Ingres, and added SQL support later.
1. Your users leverage unintended behaviours, which obstructs your ability to meet the needs of your intended target market with intended behaviours (unfortunate but very real)
2. You know that if you add or improve features, you can reach more users, or a prospective user worth more than the existing ones
3. It's a passion project and you want it to function/look/whatever a certain way
I've encountered all of these, and each time it becomes a pretty big pain to make progress.
It's a self-fulfilling prophecy. If your userbase doesn't think it needs improving, people with unfulfilled requirements will go somewhere else. That may be a good thing if you want to limit the scope and serve a specific market well. But it may also leave you behind with shrinking usage and few new users, since newcomers prefer to start with the overall better option.
I quickly found dealing with time series reports dramatically easier. There's the possibility I was just doing it wrong with MySQL, but Postgres seemed so much more expressive for things like writing a query based on an interval, or filling empty cells with default values. I also found it very intuitive to connect a UI to our API which would convert various parameters into dynamic queries. That was a game changer; almost all reports could be run by anyone in the company without asking someone to write a query for them. The result was remarkably reliable and easy to maintain. The MySQL version I refactored was much harder to deal with, slower, and only getting worse despite our best efforts.
There were very cool things I couldn't find a way to do in MySQL which eventually made the switch an amazing change. It was trivial to set up aggregated time series tables, and very easy to roll that data up in order to speed up queries with a coarser granularity. At the time I recall window functions making this really clean and easy. It was possible in MySQL (and maybe trivial today, too) but it felt like a massive kludge every step of the way.
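For flavour, the kind of windowed rollup described above can be sketched even in SQLite (which also grew window functions, in 3.25), so the example below runs standalone; the table and column names are invented for illustration:

```python
import sqlite3

# Toy time-series table; in practice this would be a Postgres table
# with a proper timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# Rolling average over the two most recent rows, ordered by timestamp -
# the sort of coarser-granularity rollup that window functions make clean.
rows = conn.execute("""
    SELECT ts,
           AVG(value) OVER (ORDER BY ts
                            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS avg2
    FROM metrics
    ORDER BY ts
""").fetchall()
print(rows)  # [(1, 10.0), (2, 15.0), (3, 25.0), (4, 35.0)]
```

Without window functions, the same result needs a self-join or a correlated subquery per row, which is exactly the kludge being described.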
This is actually what led us to consult someone who could implement the best solutions they knew to these issues. They encouraged us to adopt Postgres before doing any work; we didn't listen. Then, as they wrapped up the project, they once again advised we leave MySQL behind. He then sent some examples of how we might make the migration and why it would be better. Months later we had made the migration entirely and were very, very glad.
That guy was an AWESOME DBA, and he changed the way I value someone who is good with data and databases. He transformed our team's ability to grow our product, dramatically improved our ability to deliver, and instilled a ton of knowledge in a short time. I can only dream of imparting that to other teams!
Overall I'd say that Postgres provided tools that made working with data easier. I'm not sure if there are any specific features besides a cleaner, simpler,
Weird that that got cut off. I think I intended to say "a cleaner, simpler, more efficient set of tools for working with the type of data I had". And of course, right tool for the right job and all, perhaps there are plenty of cases where MySQL is better. I just haven't found them yet, and probably won't any time soon.
> ...let me point out something that I've been saying both internally and externally for the last five years...
Life is short, don't get stuck in a job/relationship/situation/etc. you hate for half a decade. IMO the irony is that he's moving to the Chrome team, which has been an absolute shitshow. Woohoo, now you're going to work on fantastic ways of screwing up the web, inserting ads where we don't want them, tracking users against their will, and banning extensions on the Chrome store. Interesting technical challenges, indeed.
Admittedly I don't write much C these days. But as someone who has integrated Chrome/CEF across platforms (desktop/mobile), I would say I've never seen such an important piece of software have so many edge cases, regressions, and gotchas. The CEF guys are in a constant struggle to keep their implementations stable; it takes a company the size of MS/Apple to do anything more with it. MS has spent 1-2 years developing their WebView2/Chrome 99+ macOS integration, even though Edge already runs on macOS.
I got into programming in 2000 and right when I did Tim Perdue wrote a very important article comparing MySQL and Postgres. He concluded that Postgres was the better database.
The article was a very big deal at the time, because it was so well researched and so devastating for MySQL.
I figured that article was the death knell for MySQL, but then MySQL surprised me by just going and going and going. It was helped along back then by having a very close relationship with the PHP language. Nowadays, you can use any database you want with PHP, but in 2000 PHP was very much biased in favor of MySQL, and every article written about PHP was written with the assumption you'd be using MySQL as the database. I suspect that MySQL would have died except for the massive life support it got from PHP during those years.
The moral I take from all of this is that sometimes you can have two technologies, and one is clearly better, yet the one that is clearly better can remain under-utilized for 21 straight years.
I know a bunch of HN people will now show up and defend MySQL, or defend a lazy style of programming that accepts defaults even when that means using a poor cousin of something good. But we should stop for a moment and really think about the implications of this. Because it really is remarkable that people have known of the superiority of Postgres for 21 years and yet people still use MySQL.
I think the issue is that there are so many different reasons projects need any storage. There's a reason why you need persistence, there's a reason why you need ACID, there's a reason you need some form of replication ...
Eg: a CMS like WordPress would probably have been better off just storing structured data in XML files on disk (or NFS), along with media files on disk. Or using Berkeley DB or, later, SQLite.
Your point of sale terminals might not fare so well without central inventory.
Perhaps Postgres is not really all that much better. Perhaps MySQL speaks to some need that we have not measured. For me, MySQL is _easy_. It just works. You don't need a PhD to manage a MySQL database, but it almost seems like you need one for pg. That is what makes MySQL better, even though it may be technically worse. I read everywhere that MySQL is supposed to be slow, but in my experience it is plenty fast. Note that I don't do any "big data" things; the most I've had is a few million rows.
I am currently wrapping up a project in which I am adding pg support to a previously MySQL-only large open source application. The footguns I discovered when doing this (it is a very CRUD-like web app) weren't performance related, but rather the bizarre data integrity issues MySQL ships with out of the box. Did you know, for instance, that the default collation MySQL installs with is case insensitive as well as accent insensitive? This means that data containing é, for instance, will match the e character in queries. MySQL is in general much more forgiving about data type coercion than Postgres, and it makes assumptions about dates and times that Postgres doesn't.
All of the above bodes well for someone trying to quickly hack together an app that does stuff, but for long-term data integrity it's much easier than with pg to introduce subtle differences and flaws in the data through application code that works at face value for CRUD operations but fails down the line due to subtle data differences.
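A rough illustration of what an accent- and case-insensitive (ai_ci) collation does when comparing strings - this is a plain-Python approximation for intuition, not MySQL's actual implementation:

```python
import unicodedata

def ai_ci_fold(s: str) -> str:
    # Decompose to NFD, drop combining marks (accents), then case-fold -
    # roughly the equivalence an accent/case-insensitive collation applies
    # before comparing two strings.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# Under such a collation, these compare equal, so a WHERE clause
# matching 'resume' would also return rows containing 'Résumé'.
print(ai_ci_fold("Résumé") == ai_ci_fold("resume"))  # True
```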
Absolutely. And the non-standard handling of NULL and defaults. I think they might have fixed this, but for the first 15 years I interacted with MySQL, I never knew when a field would default to an empty string or a NULL.
It can function like this, yes. There's a flag in `sql_mode` that enables/disables this behaviour, and IIRC zero dates are disallowed by default in MySQL 8.0+.
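For reference (assuming MySQL 8.0; earlier versions shipped with looser defaults), the strict behaviour is controlled via `sql_mode` in my.cnf - the list below is the 8.0 default:

```ini
# my.cnf - 8.0 defaults; removing NO_ZERO_DATE / NO_ZERO_IN_DATE and
# STRICT_TRANS_TABLES brings back the old permissive zero-date behaviour
[mysqld]
sql_mode = "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
```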
There are many factors to take into account, but this article focuses on the "operating reliably in production" factor and says that MySQL has fewer surprises:
As for Postgres, I have enormous respect for it and its engineering and capabilities, but, for me, it’s just too damn operationally scary. In my experience it’s much worse than MySQL for operational footguns and performance cliffs, where using it slightly wrong can utterly tank your performance or availability. In addition, because MySQL is, in my experience, more widely deployed, it’s easier to find and hire engineers with experience deploying and operating it. Postgres is a fine choice, especially if you already have expertise using it on your team, but I’ve personally been burned too many times.
I started with MySQL. At some point I encountered a project based on Postgres. It just came with its own set of quirks you had to deal with. It didn't feel like, wow, this is so much better. It was more like: yup, it's a database; it just COUNTs slowly and needs to be VACUUMed. At the time I had more issues and questions about how to do things I knew how to do in MySQL. I figured them out, but those two quirks are the impression that was left.
Then I encountered MS SQL Server and it was pleasant enough. For my next project I'd probably choose Postgres, but not because it created any kind of sympathy in me. The only thing I really liked was EXPLAIN.
MySQL has one advantage in my opinion; at least it did two decades ago. It was simple. It did the things relational databases are good at, in a fast and easy manner. It didn't have constructs like nested queries, recursive queries, triggers, or stored procedures with cursors, which databases like Postgres had (because who's gonna stop them) but whose performance is horrible, and must be horrible, because they step out of the realm of the fast relational model into the realm of ordinary programming, which is algorithmically far slower (and avoiding that is the main reason we use databases at all).
So MySQL, in my opinion, teaches you what databases are good at. It helped me immensely when dealing with slow queries later in life in Postgres, SQL Server, and even Oracle.
I do prefer PostgreSQL if I have a choice, but from the practicality standpoint that many people are hitting on, I'm okay with various design decisions (i.e. take a look at some of the flags for MySQL's `sql-mode` option over the years) being phased out via the normal (warn -> deprecate -> throw error -> remove) lifecycle that things like this often go through in software. Once a technology gets wide adoption, no matter how "flawed" or not earlier versions were, you start to prioritize stability and reliability over "correctness" at some point. This leads to the understandably practical approach to many bugs in many enterprise systems where the team supporting a tech stack learns to work around the rough edges, and might even depend on certain "weird" functionality because it's simply more practical in both the short and long term than not doing it.
None of the above means that I don't see MySQL as flawed in some ways. I'm in a group of developers that I suspect makes up a sizable portion of the MySQL community: those who didn't choose MySQL but must support it, if for no other reason than that we see ourselves as professionals, and that's what professionals do: make the employer's application work reliably.
For applications that have already survived past the point of finding product/market fit, a wholesale conversion of DBMS is rarely worth it, and conversions of this type are costly/risky even if it is worth it. I do understand many of the benefits (real and theoretical) of PostgreSQL, and if I'm around at the moment when a project's DBMS is being selected I'm going to recommend _not_ MySQL, but at some level I'm also paid to make the application that my employer is running on top of their DBMS work reliably ... and the fact is even among people who get PostgreSQL - who prefer it, would choose it if they could - many of us are also pragmatic enough not to pull the rug out from under a running application for "reasons".
Right, it's important to keep "worse is better" in mind, but also keep in mind that the philosophy fades in importance the longer that a technology has been around. For instance:
"C is a programming language designed for writing Unix, and it was designed using the New Jersey approach. C is therefore a language for which it is easy to write a decent compiler"
Is that still true? Over time, people have wanted to fix the flaws in C, so they have added features, and nowadays writing a compiler for C is less easy.
And there are other ways to measure the declining value of "worse is better." Many people would now argue that memory safety is worth the extra effort. Many would say the world would be a better place if C was banned and everyone switched to something like Rust. C continues to lose market share to those languages that guarantee memory safety, and yet C never fully dies, which is interesting.
Over time, the tax you must pay for the "worse" begins to cost more than what you gain from the simplicity. My point, above, was how long this can take. We think of the tech industry as fast moving, and yet many core technologies have had obvious flaws for 30 or 40 years, and yet little action is taken to move to better technologies.
Unfortunately, the ecosystem of a tool matters more than the tool itself. One could come up with a technically superior alternative to an existing tool (gpg vs age and signify) but unless they have a widely recognised ecosystem, they tend to remain relatively niche products.
For example, Git can sign commits using gpg and x509 and now in version 2.34, OpenSSH keys. Although OpenSSH can be used to sign data, signify is a much better tool for this task.
I used MySQL ages ago. It was pretty fast out of the box. I never needed to vacuum. You could get up and running with upsert and connections without trouble. It did feel hacky, but don't discount that it mostly worked for a lot of use cases. It always reminded me of software developed with use cases in mind (vs theory). That's actually a compliment (i.e., full joins are rare, left joins more common, so MySQL was better at left joins).
Going to Postgres, it was much pickier. I found it slower out of the box. You couldn't just throw tons of connections at it (i.e., connection buildup/teardown felt slower). I had issues initially with quoting and capitalization, etc.
That was a long time ago. Now I enjoy Postgres and haven't touched MySQL, but there is a real history where MySQL was the database you could get going with pretty easily (this was pre-Oracle buyout).
The connection story is the primary reason that PHP + MySQL was a big thing. The PHP model of spinning up on every request depended on fast and lightweight connections to the DB. This was the one area that MySQL really excelled IMO.
To some extent, Postgres just follows the path that Oracle did to optimize performance. To see what Postgres needs to do next, just look at what Oracle had to do.
Postgres forks a backend process for each connection, which is relatively heavyweight. Oracle used to do that too, but implemented a separate multi-threaded listener process for performance reasons. MySQL has been multi-threaded from the beginning.
In Linux, the actual cost of the call to create a thread vs a process is negligible (it's actually just flags to the same underlying function, saying whether to share the address space or COW it).
But then the cost of starting to use the process introduces a second speedbump, because once you start exercising it you start COWing things.
There's also the cost of task switching. I don't know the details, but I wonder if modern side-channel mitigations in kernels flush a lot of stuff when switching processes that they don't need to do for threads?
Postgres uses a lot of shared memory to communicate between the processes, so it's really its own implementation of threads. Postgres is how it is because of its history and portability concerns from a time before threads.
In any case, if you were starting out today you really ought to have a few shared threads using async processing and plenty of io_uring. Modern Linux allows massively better mechanical sympathy, but all the big database engines are already written and can't shift their paradigms so easily. This is from 2010, about how to massively speed up databases with syscall batching (in this case, MySQL is used in the study): https://www.usenix.org/legacy/events/osdi10/tech/full_papers...
One advantage of PostgreSQL's process-per-connection design is that you can kill a misbehaving query with ordinary shell tools, which has been helpful to me on occasion, although as a measure of last resort.
That does seem like a somewhat obvious solution: build in a relatively simple pooler / more efficient listener. Is anyone working on something like this? Or merging it into an existing solution?
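The core of the pooling idea is small; here's a minimal sketch (using SQLite so it runs standalone - real poolers like PgBouncer also handle auth, timeouts, transaction pinning, and much more):

```python
import queue
import sqlite3

class ConnectionPool:
    """Reuse a fixed set of connections instead of opening one per request."""

    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    def acquire(self):
        return self._pool.get()        # blocks if every connection is busy

    def release(self, conn):
        self._pool.put(conn)           # hand the connection back for reuse

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
print(conn.execute("SELECT 1").fetchone())  # (1,)
pool.release(conn)
```

The payoff is that the expensive setup (process fork, auth handshake) happens once per pooled connection, not once per request.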
This was exactly my case. I had 10 servers going flat out calling into one (beefy at the time) MySQL box. I was amazed at how far you could go with that. Replication was built in at the time, so we had read replicas too. For free and somewhat hacky software, the scaling was solid.
When I started playing with PostgreSQL I had no clue why folks liked it. My memory may be bad, but we used UPSERT and replication, and on PostgreSQL, at least through 8.x, you could not upsert or do streaming replication.
I am curious how people feel about wisdom and ethics of acting in ways that are actively hostile to the interests of a previous employer immediately after quitting?
Regardless of the technical facts, unless there is a very strong ethical argument in favor (for example, say, the employer is outright lying in a highly fraudulent way and customers need to know) .... I find this sort of behavior to be slightly unprofessional. But I am curious if others feel the same ... and does it make a difference that it is Oracle?
2. As long as you're employed, you either speak the company line or carefully avoid commenting, in order to avoid breaking rule 1. If you can neither avoid commenting nor avoid lying, you should quit.
3. Once you're no longer employed, you should avoid spilling company secrets, disclosing confidential information, etc. Beyond that, see rule 1.
I'd be concerned if he:
1. Went to conferences and recommended MySQL, despite thinking it was a bad product
2. Went to conferences and trashed MySQL, despite being employed in part to promote MySQL
Both of those would be, in different ways, unprofessional. But what he did? Seems fine to me. I certainly don't think he was obligated to share his unvarnished thoughts about MySQL, but I don't think he was obligated not to, either.
Why does "unprofessional" matter here? They're not on company time. If it's accurate, and that's their experience, that should be shared. I'm tired of a culture that thinks professional means advocating against yourself and your peers. Talk about employers, talk about customers, talk about salaries, if a company exec is bad to work with, talk about it. That gives insight for other devs so they're not finding themselves in the same position.
I completely agree with you. If they are willing to publicly trash a former employer and colleagues, what would stop them from trashing me as a new employer? I would be hesitant to hire someone who did this.
They are entitled to write whatever they want, but not taking into account the feelings of the people they were presumably working close with in posting something like this shows a startling lack of empathy.
I'm sure this developer is a genius, but I would take empathy over brilliance every time.
Edit: To be clear, if there is anything unethical or toxic about the workplace culture, they should absolutely post about it publicly to help fix it if that's the best route to address it. This post did not read that way to me. But that's just my opinion.
> I completely agree with you. If they are willing to publicly trash a former employer and colleagues, what would stop them from trashing me as a new employer?
I mean, presumably your confidence in being a good employer? People don't decide to trash talk their previous employers at random, right? Surely there is some cause-and-effect involved.
> what would stop them from trashing me as a new employer
Wouldn't it be a good incentive as an employer to not do things that would get you trashed by your employees ?
Sure, some people could be vengeful for all the wrong reasons, but I don't think there are many of them, nor do they usually have a huge platform to throw trash from, and these things would also be denied on the spot by other people in the know.
> What would stop them from trashing me as a new employer?
Not giving people loads and loads of things to feel bitter about. And an eggshell-strewn environment where it's basically impossible to air these concerns with anyone upstream.
I'm sorry what ?
You should really be free to speak your mind about your employer (or really anything else) at any time.
It turns out employers don't like that, and they add clauses to contracts to silence their employees.
This happens in every field, even in "tech", even though organised labour could absolutely leverage the insane demand for qualified workers to get a better deal. But I guess that's what being a largely depoliticized and generally ignorant workforce gets you: less.
Anyway, you are released from any and all restrictions on your speech the micro-second your contract no longer specifies them.
If your previous employer sucked, feel free to say it if you want to.
The gag order you sign in your contract regarding bad mouthing your employer has nothing to do with professionalism, and everything to do with a power equilibrium between employers and employees.
It's not terrible, but it's not a great look to me. I don't think you need to hold to your previous employer's interests, though.
The problem to me here is that OP dumps on the work of their former colleagues, and the criticisms of MySQL aren't backed up and seem a little naive.
E.g., only very useful software survives a long time with many users, and such software also very typically has major warts, especially looking from the inside.
Reading between the lines and subtracting OPs tone, MySQL actually looks like it's in better shape than I would have thought: when OP arrived the optimizer was a mess. Now it's in good shape. Better yet, management is fully supportive and investing in major improvements. Awfully good for a long-lived open-source project.
I might be a little skeptical that OP could thrive on Chrome, except that we all mature as our horizons widen.
Why is it seen as "acting in ways actively hostile" to one's previous employer to... simply speak the truth, as one sees it, about the nuts and bolts the work that one actually did there -- and the quality of the products that emerged as a result?
> I find this sort of behavior to be slightly unprofessional.
I find it highly professional, in the sense that he was being 100 percent honest -- and true to his craft. And not in the least vindictive or spiteful in regard to his previous employer.
Because he is stirring things up over there right after he walked away so it won’t be affecting him. The post is a big warning sign to anyone considering working with him.
Well, stirring them up while you work there would be grounds for being fired. When do you think people can talk about it? If you shouldn't speak ill of your employer while employed and you also shouldn't speak ill of your employer after you're no longer employed by them, does that mean being hired means you should never speak ill of that company again?
They made it very clear that while they've held these opinions for a while, they've avoided situations where they would be presented with either lying about their opinions or bad-mouthing the company they work for whether out of respect or fear. I think that's as professional as you can expect people to be. If, as a company, you expect people to not speak about their subjective opinions after they are no longer getting paid as an employee, you need to be willing to offer them something for that, because what you're really asking for is an NDA.
People can talk about it whenever they want, I'm only speaking in support of the very narrow point that it is unprofessional. People reading the post will use its contents to form (or modify) their opinions of the author, and I believe it will generally be in the negative direction. Publishing something that has a net-negative impact on your professional reputation strikes me as unprofessional in a very basic way.
Can he do it? Sure. Should he? Maybe, depends on his values. Hopefully he understood how the message would be received and is OK with the possible consequences, and valued self-expression and making an authoritative critique of MySQL more highly than the negative impacts of rubbing a few readers the wrong way as they perceive a breach of generally accepted professional decorum.
It's definitely unprofessional to talk about it while working there. If it's also unprofessional to talk about it after no longer working there, that means it's unprofessional to ever voice your own opinion of your own work experience at a company, and that's ludicrous.
I'm responsible for vetting possible hires to work under me. What this person did falls well within what I would accept as responsible behavior for someone I was considering hiring. The fact that they note they went out of their way to avoid conferences where they would likely be forced to compromise their morals in some way, either by going against the wishes of their employer or by being untruthful, speaks very well of their character, if it's to be believed as presented. People who have no problem lying for the company, or badmouthing the company to external people while they work there, are both types I would desperately try to avoid.
I think there's "professionalism" as in "behavior that signals to employers that you're safe to hire because you are a good little drone" and then there's "professionalism" as in "does good work and treats people right".
I don't think there's a reason to push the former, as it's got plenty of natural support from the unequal relationship between employers and employees. We may not be able to throw off the shackles entirely, but let's not shame those who rattle them a bit, you know?
> People can talk about it whenever they want, I'm only speaking in support of the very narrow point that it is unprofessional.
Stop mincing words, please.
If it's deemed "unprofessional" to talk about it, then to all intents and purposes -- "if he knows what's good for him" as the saying goes -- he can't.
I think they are right that many people who swear by MySQL have never used another SQL database.
When MySQL hit the scene at the start of the whole LAMP thing, PG was much slower in some use cases than MySQL, and proprietary databases were super expensive. There weren’t too many options. PG was still fairly rough around the edges.
Of course I also needed transactions, and they didn’t come with MySQL - that would hurt performance apparently - so when I was forced to leave Solid due to HP’s acquisition and subsequent pricing hike, PG was the only option for me.
But most developers didn't need (or didn't know they needed) transactions, and SQL was the tech du jour, so off they went.
I tried to like MySQL, but the weird not-quite-SQL syntax, lack of transactions and this _weird feeling I got_ put me off.
Over my career I've used Sybase, SQL Server, Oracle, MySQL, Postgres, even a bit of DB2 way back in the day. I wouldn't use MySQL if I had another option, but that said I've managed some web sites (Drupal and other CMSs) that used MySQL and it was ... fine. It worked, and never needed any attention.
MySQL is a simple database that's easy to set up and run. Don't make the mistake of thinking there's nothing better, but for certain use cases, it gets the job done.
At that point your options were either a dog slow database (PG), a database that randomly lost data (MySQL), or big money for Oracle. The first thing everyone did when they got their first VC check was spring for Oracle licenses and Sun boxes to run it.
We found SOLID back in the 90s, which was an awesome, full featured Finnish (I think?) SQL database server. It was proprietary but super cheap. Then HP bought it and jacked the price 10x and we were in trouble cos our product pricing at the time depended on the low cost SQL server (this is before cloud of course).
Before that I ran something called Unify which was ... interesting.
For some reason I remember Postgresql 8.3 as being the pivotal release that let us adopt it, and that came out in 2008 according to the internet. So yeah we are talking ancient history now.
Looking back at the release notes I can't believe how many of those features are critical to how I use PG today. Really some visionary work there.
> With significant new functionality and performance enhancements, this release represents a major leap forward for PostgreSQL. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the following major features...
My recollection is that running Windows servers was generally an all-or-nothing thing. Either you were a Windows shop that hired all Windows people and had a hefty annual contract with MSFT, or you’d never consider MSSQL.
> so when I was forced to leave Solid due to HP’s acquisition and subsequent pricing hike
We used the Solid database at my previous company in 1998-2005+. It was a great piece of software. IBM bought it, rebranded it as an in-memory database, marketed it to telecom companies, and now it seems to be completely focused on that: https://en.wikipedia.org/wiki/SolidDB
It's a shame because it was so simple to manage and "solid" as a rock. Worked really well for us and we never had performance problems with it.
As an aside, I was certain that Solid was bought by HP and looking at the wiki page I think we were already using PG by 2007. So it maybe changed hands a few times.
Regardless, it was an awesome database, ahead of its time in many ways. And amazing support.
Transactions are rarely useful for a LAMP style setup where you have a stateless web server doing request-response - it's not like you can serve a form to the user and hold a transaction open until they commit it.
Funnily enough I'd say the feeling was what really put me off PostgreSQL. Everything just felt slightly more cumbersome to do; I had to constantly look up all these backslash commands whereas things like SHOW CREATE TABLE might be slower if you spend all day doing database admin, but were a lot easier to remember as a developer who only used them occasionally.
At the point where you're handling a request the user has already submitted their form, so if you refuse to save what they submitted then that ends up as a pretty terrible user experience (unlike with a rich client application where it might be ok to reject a commit to the server because the user still has a local copy). And if you can't roll back then having a transaction is pretty meaningless.
Transactions are super useful in many contexts, and they certainly aren't about "refusing to save what they submitted". They are about behaving properly when something goes wrong, or when you try to enter data into a database that's invalid.
If you have a form submission that updates multiple tables - say, adds you as a customer, saves your order, and adds it to a queue - then a transaction ensures that either ALL of the tables are updated or NONE of them are. In this way you don't end up with half-orders, or data that you can't use later.
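That multi-table pattern can be sketched with any transactional engine; here is a minimal illustration using Python's sqlite3 (the table and column names are invented for the example):

```python
import sqlite3

# Either all the writes for the order land together, or the rollback
# leaves nothing behind - no half-orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
""")

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        cur = conn.execute("INSERT INTO customers (name) VALUES (?)", ("alice",))
        conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                     (cur.lastrowid, "widget"))
        raise RuntimeError("simulated failure mid-request")
except RuntimeError:
    pass  # the transaction rolled back

half_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the simulated failure, both tables are empty: the partial customer row did not survive without its order.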
It's far better user experience to fail, than to lie about having succeeded and throw the user's data away.
Transactions are one of the most useful tools in the data toolbox... you can get work done without them, sure, but most of the time you just end up creating something that looks a lot like a transaction, but slower...
1. If the transaction fails due to a conflict it can usually be transparently re-tried (doesn't work if you have external side-effects). So the remaining failures are either due to user error (passing invalid data) or bugs and not related to the use of transactions.
2. Error handling becomes much easier if you can raise an error at any point during request processing and it simply rolls back the whole transaction, since you don't have to duplicate all validation to also run before you first write to the database. One important case is if there is a bug that triggers an assertion failure in the middle of the request processing.
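Point 1 can be sketched as a generic retry loop (ConflictError here is a stand-in for whatever serialization-failure exception a real database driver raises; real code would BEGIN before calling the work and COMMIT on return):

```python
class ConflictError(Exception):
    """Stands in for the driver's serialization-failure error (hypothetical)."""

def run_in_transaction(work, attempts=3):
    # Transparently re-run the whole transaction on a conflict; any other
    # failure (user error, bugs) propagates immediately.
    for attempt in range(attempts):
        try:
            return work()
        except ConflictError:
            if attempt == attempts - 1:
                raise  # retries exhausted: the failure must reach the user

calls = {"n": 0}
def flaky_work():
    # Conflicts twice, then succeeds - as if two concurrent writers raced us.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConflictError("concurrent update")
    return "ok"

result = run_in_transaction(flaky_work)
```

Note the caveat from the comment above: this only works when the transaction has no external side effects, since the work may run more than once.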
> If the transaction fails due to a conflict it can usually be transparently re-tried
True, but if your transaction is retriable in that way then you might as well write the initial request and then do the processing async afterwards (i.e. event sourcing style).
I mean synchronous retry during the same request on conflicts with concurrent operations. Async processing is usually not an option, because then you can't return the result in the current request. Also, retries can fail, which needs to be communicated to the user.
In my very humble and naive opinion, MySQL has the whole 'worse is better' thing going for it. It does less, and maybe does it less than perfectly, but it does it very quickly and reliably.
We've personally scaled MySQL to many tens of thousands of concurrent users without too much trouble and without a dedicated person watching the database.
That said, I've only really worked significantly with MSSQL and MySQL, so my points of comparison are based largely on what I've heard. The majority of my last 15 professional years have been spent with MySQL.
MySQL used to be the "has more features" camp, I'm not sure why you think it does less.
There are many things that MySQL does that Postgres still can't do, like memory tables or pluggable storage engines.
And for anecdatum: I've been serving a million people concurrently on PG without any handholding except the initial setup: but that tells you nothing about what we were actually doing with it.
I can help shed light here. Postgres doesn’t make you choose between two storage engines and their compromises. It’s been years, but last time I looked the trade-offs between MySQL storage engines were material, and the newer engine was missing some valuable features of the older one. The advice was “pick the right engine for the job” - but I don’t want to; that’s another decision I have to make. Why make me do that? At least until they come out with a new storage engine that’s “objectively better than Postgres”, I will enjoy the simplicity of using Postgres with its single storage engine. Choice isn’t always better.
I searched to see if anything has changed and found that InnoDB doesn’t support full text search - give me a break, after it’s been the default for years, you gotta be kidding. https://hevodata.com/learn/myisam-vs-innodb/
I disagree. Search engines put more weight on recent articles, and no reputable source has written about this topic in years, for one simple reason: there is no valid reason to choose the MyISAM storage engine in 2021. This was true even a decade ago.
Hevodata.com appears to be selling an ETL product. This post you're linking to is effectively SEO / content marketing. It is not written by MySQL experts.
Regarding fulltext support in InnoDB, there are many results when searching for "innodb fulltext".
I wouldn't say it is uniformly a virtue, especially for user-facing practical purposes when comparing MySQL to e.g. Postgres. It's a major architecture difference which has pros and cons.
Most MySQL users should just stick with InnoDB for everything. It's an extremely battle-hardened choice with excellent performance for the vast majority of OLTP workloads.
However, if you're a large company and storing a massive amount of relational data, MyRocks has significantly better compression than any other comparable relational database I'm aware of. FB poured a ton of engineering effort into it, because those compression benefits provide ludicrous cost savings at their scale. For most companies and use-cases though, the benefits may not be significant enough to justify using a less common engine.
Aside from these two engines, AFAIK there's currently no other popular general-purpose modern storage engine that is widely used for MySQL. There are a lot of random third-party ones, but historically it's risky to tie your business to an uncommon storage engine.
I'm not a DB internals engineer, but from what I understand, some of MySQL's downsides are a direct result of the added complexity of having a pluggable storage engine architecture. Mixing-and-matching multiple engines in one DB instance can also be risky (it affects crash-safety guarantees of logical replication on replicas). So given that most companies should just use InnoDB anyway, for practical purposes pluggable engines are not a huge advantage for most users today.
But in the future, who knows; it's good to know the flexibility is there.
And had MySQL never supported pluggable engines originally and MyISAM had been the only option, MySQL would be dead and forgotten. InnoDB was originally developed by a third-party company that Oracle acquired. Interestingly, Mongo followed a very similar trajectory, replacing their initial engine with WiredTiger, developed by a third-party company they acquired.
InnoDB and Memory are really the only ones widely used these days. After Oracle bought MySQL they threw their full backing behind InnoDB and recommended all MyISAM users migrate. They fixed a number of small problems like the missing full text support.
InnoDB is more fault tolerant, more modern, and more feature rich (e.g. foreign keys) than MyISAM. MyISAM is faster in practice, despite Oracle’s continuous claims to the contrary.
MyISAM is largely undeveloped at this point. MariaDB hard forked MyISAM into Aria and have included better fault tolerance and other general improvements. It’s what I use for my personal projects.
Memory tables are strictly in memory and useful for doing very quick processing on things you are certain will fit entirely into system memory. Their data is lost when the system is rebooted or MySQL is restarted.
> MySQL has the whole 'worse is better' thing going for it.
I've used both MySQL and PostgreSQL and I prefer MySQL; it seems to work and scale better "out of the box" and has a more comprehensible permission model than Postgres. I might be biased somewhat, as I've used MySQL a lot longer and have even written plugins to interop with it, but it's still my number two go-to (after SQLite).
That's a perfectly reasonable assessment. Also, it's certainly a dream product compared to anything on the market 20 years ago. So yeah, it's fair to say the original blog post could have been more nuanced in its holistic assessment of MySQL.
But then again, you can tell he was worn down from his experience working there. The industry has a way of doing that to people.
In general it seems MySQL targets start-ups while PostgreSQL targets more established companies and projects. It's not necessarily about good-vs-bad, but about using the right tool for the job.
If it had read more like "I left because I was frustrated with code quality", and actually took ownership of the feeling, that'd have made a big difference. It reads more like "i left because the code sucks, everyone around me is dumb, and our users are sheep"
I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?
Maybe it'll get more attention on the systems he called out? Probably not. The post just reads as bitter, mostly about the mindset of other contributors, but doesn't outline what they tried to do to change the hearts and mind of those contributors. Or put another way: what did this person do to lift up all of these people? Deriding doesn't actually help anything.
If the goal is just to vent that's fine, but there's really not much else to see here.
It reads more like "i left because the code sucks, everyone around me is dumb, and our users are sheep"
Not in the least. He doesn't say what you're claiming he says ("everyone around me is dumb, and our users are sheep"); rather, he simply sticks to brass tacks -- and it's pretty hard to counter his main point (about the quality of MySQL as a product), after all -- which belies the characterization you're trying to make of what he said.
The post just reads as bitter,
Anytime anyone, heaven forbid, talks the plain and simple truth about the conditions many of us work under in this industry -- they get characterized as "bitter", "derisive", "just venting", or (especially in the context of describing our past work experiences to prospective employers), "badmouthing". Or as you put it: "childish".
Well, readings are subjective, and it's ok that you don't agree with my reading of it.
But I do think that you're either mistaking "plain and simple truth" for callousness, or creating a double standard. Are descriptions such as "bitter" not the "plain and simple" truth for this post? I certainly think it reads bitterly.
They're not really complaining about the work conditions, so I'm not really sure what you're referring to there. It's just code. It doesn't bite. It does ossify. It's unpleasant to deal with, but it's part of the job.
It seemed like they had supportive, albeit corporate management.
Well I didn't mean "conditions" like having to work in a windowless basement, with a pile of phonebooks to sit on instead of a chair.
But rather: being asked to work on products we just can't really believe in, to be silent when upper management (though otherwise supportive and presumably in no way outwardly abusive or mean) would prefer that we keep our blinders on, etc.
It's just code.
His main concern was the quality of the product as a whole -- and the lack of awareness in that environment of what, to him, seemed to be simple and obvious facts. The remarks about the "bad code" were almost tangential (like he said, "it didn't bother me much").
Agreed - across this thread, there's been a lot of imputation of bad or petty intent and/or of a disagreeable emotional state on the part of the blog author that just isn't called for.
Insulting your coworkers and users is childish, and that's exactly what this post is. "More jarring were the people who insisted everything was OK" "was hailed as “efficient” (it wasn't)." In fact that's pretty much all it is. There are plenty of ways to complain without burning every bridge around you.
"More jarring were the people who insisted everything was OK"
I read this as simply being honest (in the "Dutch" sense). It wasn't like he was shit-talking his former co-workers, per se. He's just saying he had a radically different appraisal of technical viability of the flagship product.
Simply being honest doesn't mean you're not needlessly insulting everyone around you. He can say he had a radically different appraisal of the technical viability of MySQL without calling everyone he works with an idiot.
> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product.
> (it seems most MySQL users and developers don't really use other databases)
> But perhaps consider taking a look at the other side of that fence at some point, past the “OMG vacuum” memes.
> Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.
> I am genuinely proud of the work I have been doing, and MySQL 8.0 (with its ever-increasing minor version number) is a much better product than 5.7 was
————————-
There’s definitely a lot of “our users are sheep” and “everyone else is dumb” going on in this post.
Heck, it may even be true. But you can’t really argue that the author isn’t saying it.
There’s definitely a lot of “our users are sheep” and “everyone else is dumb” going on in this post.
Well, we disagree then. I see his post as making some definitely very harsh critiques -- but still short of the threshold of outright insulting people.
Nah, it's a pretty important post. It's the first guy to publicly say "the emperor is naked". The total object level information hasn't changed much, but now I know others know, and they know others know too. It's out.
And from now on, everybody who has issues with the query optimizer won't shamefully look for documentation on how to fix _their_ use case, but will publicly call out MySQL for having a shit optimizer (which it has, btw).
Which in turn will support those developers inside mysql who want to push things forward. They won't seem like cowboys that want to fix a good thing - they'll have lots of user complaints to help them argue, and also the awareness that maybe, just maybe, mysql will get left behind if they don't move faster.
> It's the first guy to publicly say "the emperor is naked"
That MySQL has a poor optimizer has been known for as long as MySQL has been a product. The product is legendary for its inability to do even basic RDBMS needs competently, and the rise of NoSQL was largely people assuming MySQL limitations were general RDBMS problems (for instance its painful incompetence doing basic RDBMS tasks like joins).
Having said that, a couple of decades in this industry has me reading this post and immediately sensing oozing bitterness. That maybe he got passed over for a promotion he felt he earned, etc.
When someone does the "it's all crap" exit, it's seldom from a good place. Who could seriously have applied to and joined the MySQL team without knowing that it isn't exactly the pinnacle of database systems?
Having said all of that, it's interesting seeing pgsql being held as the panacea. I like pgsql, and prefer it among open source database systems, but in many ways it is a decade+ behind MSSQL and Oracle.
It might have been more useful if it wasn’t literally the day after he got another job (it’s not like he quit it because MySQL is so terrible and will now look for a new landing spot).
I don’t know the author, but assuming he has publicly made such comments before it would behoove him to link to those posts so it’s clear that he has been raising these complaints publicly for a while and didn’t just wait to dump the consequences of his public post on his now former colleagues the moment he got out.
It does because all adults are cynical nowadays and rather than speaking truth they will just sweep it under the rug for political convenience.
> I don't think anyone's going to suddenly switch to postgres as a result of this blog post...so really, what's the point?
Spreading the truth? Does it even need a point?
> The post just reads as bitter
Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.
> It does because all adults are cynical nowadays and rather than speaking truth they will just sweep it under the rug for political convenience.
It's possible to speak the truth without being callous. I'm not really sure what political convenience has to do with this.
> Spreading the truth? Does it even need a point?
No, but it should probably have one if it's going to be a discussion on HN.
> Yeah it does, I still find it refreshing that this guy is just calling out all the bs as he sees it. He has nothing to gain from writing this, a lot to lose. It's pretty entertaining. Hope it doesn't affect him negatively.
It's great that you're entertained by it. Personally, it just gave me the impression that the person would benefit from therapy.
I dunno. I think the author makes some compelling points. Chief among them is the bad code quality. At least the bit about being unable to do full outer joins without a full rewrite. If the code base got so unwieldy that it made adding features tough all these years after it was started, that's a knock on both Oracle leadership and the project itself, which I think is grounds to leave.
I don't think he's wrong from a technical assessment, but most of the problem that he was experiencing was a social problem.
> More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases)
i.e., the problem wasn't that the executor was bad, the problem was that everyone thought it was ok
And maybe they're not a people person, and trying to enlighten isn't what they signed up for -- all good -- I for one certainly don't want anyone working a job that they're unhappy with.
But the post is littered with putdowns --
> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product. At the same time, I was attending orientation and told how the optimizer worked internally, and I genuinely needed shock pauses to take in how primitive nearly everything was
> Don't believe for a second that MariaDB is any better. Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.
I guess I've just seen this attitude enough where it's boring. Shock pauses, very primitive, bad code. Got it. Moving on...
> Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product.
Spending years in an environment where a core component of your product was crippled by technical debt and you were surrounded by people who didn't -understand- how crippled it was does seem like a recipe for understandable bitterness.
People saying "yes, we know, but rewriting that isn't the business priority right now" is a different matter - that's often aggravating but the right call - but not even acknowledging the problem is unhealthy.
Note that I've made a couple other comments on this article that do their best to acknowledge how much more of a pain in the ass learning how to setup postgres replication is than mysql, because even though I prefer postgres most of the time it's still -true- and I don't see how hiding from that fact makes anything better for any user of anything.
He worked for years on MySQL (writing the executor used in MySQL 8.0 and the upcoming join optimizer) and came to his conclusions based on that time with it.
Can you provide comparable knowledge about MySQL to support your accusation that he wrote "generic statements without any nuance"?
From having fought it as a savvy user, I completely agree with the article about the poor MySQL query planner. I don't have the under-the-hood insights of the poster, obviously, but everything he says jibes with how it felt as a user.
Of course the planner is just part of a database, and I have some kind words to say about the other technically-impressive bits of MySQL:
A lot of MySQL users are websites and things with pretty CRUD access patterns. That wasn't me.
My history is using it for high-throughput real-time batching and fancy buzzword stuff at reasonably massive scale (big distributed telco systems), which is where a lot of expensive choices were pitched.
Doing telco systems with MySQL was staggeringly cheaper and actually quite cheerful, and for all the times I swore at it, I'm actually still a fan.
I went with MySQL for advanced features that, at the time, Postgres was way behind on. MySQL had lots of storage engine choices (including TokuDB, which changed everything for my use-cases) and upserts and compression and things that put it way ahead of Postgres.
Of course MySQL had lots of warts too. The query planner was completely poo, but 99% of uses are simple things that it can handle well, and the most critical times it gets things wrong you end up annotating the sql to force indices and do the planner's job for it etc.
Of course, nowadays, Postgres is reaching parity on these things too (except, perhaps, compression. My understanding is that Postgres is way behind on decent in-engine page-based compression. It will presumably get something decent eventually.)
Postgres still needs tweaking for getting the query planner to do the right thing, and you can't easily tell it what to do.
I ended up proxying my queries with a preprocessor that uses special comment syntax - e.g. `-- $disable_seqscan` - which wraps query execution in sets of enable_seqscan off and on again, to force PG to use the index. All databases can have performance that falls off a cliff when changing statistics make them choose a less optimal join order (join order is normally the biggest thing that affects performance), but PG is particular in not having much flexibility to lock in or strongly hint the plan.
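The comment-driven wrapper described above could be sketched like this (preprocess() and the marker syntax are illustrative names from the comment, not a real library; enable_seqscan is a real Postgres planner setting, and SET LOCAL scopes the change to the current transaction instead of toggling it off and back on explicitly):

```python
import re

# Hypothetical preprocessor: a query tagged with "-- $disable_seqscan"
# gets a SET LOCAL prepended so the Postgres planner avoids sequential
# scans for just that transaction.
def preprocess(query: str) -> list[str]:
    statements = []
    if re.search(r"--\s*\$disable_seqscan\b", query):
        statements.append("SET LOCAL enable_seqscan = off;")
    statements.append(query)
    return statements

stmts = preprocess("SELECT * FROM t WHERE id = 42 -- $disable_seqscan")
```

Untagged queries pass through unchanged, so the hint stays opt-in per query.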
MySQL, on the other hand, is predictably bad and has STRAIGHT_JOIN and other friends which make things much easier to tweak.
The Postgres team's position is that compression should be applied at the filesystem level, and they are correct. But this won't solve the problem if filesystem compression is not widely available. ZFS and BTRFS are not widely used - the first because of licensing, the second because of bad PR in the past.
You can add VDO to the LVM stack, but that's another layer in the already complex ext4/xfs-on-LVM layering approach.
I'm not sure page-level compression can match the efficiency of an LSM-tree, and frequent page writes become a write-amplification problem if you end up doing any form of copy-on-write rather than in-place overwrites.
I only have an untrained DBA's perspective, coming from building and hosting MySQL, MariaDB, and Postgres clusters over the last 14 years, and I feel that Galera clustering feels a lot better than anything I've seen in Postgres.
Agreed that postgres has amazing performance. I was actually an early convert to postgres back in 2004 because my boss and coworker at the time saved entire businesses by migrating them to postgres when they had enough of trying to scale MySQL with more hardware vertically.
But in the later decade when I myself have used clustering to scale I've found that Galera feels less like a tacked on afterthought than anything I've seen in the postgres world.
It's just a lot more integrated than pgpool and watchdog.
I know there are new solutions now that run the postgres clusters inside kubernetes that I haven't tried yet. But that isn't more integration, that's just more 3rd party abstraction.
Edit: I realize after I wrote this how strange it must seem to a developer. Because galera/wsrep is actually a 3rd party replication product "tacked on" to MariaDB. While postgres replication, afaik, is written into their mainline code. I guess my gripe wasn't about the replication but rather the "clustering" around it, like maxscale/haproxy/pgpool and so forth. And in that sense they seem pretty equal, I just chose to use pgpool for my postgres clusters and that is definitely a hacky, scripty mess compared to HAproxy or Maxscale.
Does anyone know when MySQL 9.0 is coming? It has been an unusually long time between major or even dot version releases. MySQL 8.0 shipped in April 2018.
Compare that to Postgres, which has been shipping feature after feature.
It's not planned. They are shipping new features in patch releases - yes, you heard correctly. They had the genius idea to stay at 8.0.x and still add new features with every release.
I have a moderately large table (hundreds of millions of records) with very simple structure but lots of columns and keys. This post is a huge "aha!" moment for me - I keep struggling with how the optimizer is choosing which index to use. It keeps making stupid mistakes and behaves differently from instance to instance - I still have a staging server where I just have to avoid an admin interface because it takes minutes to load, and I haven't invested more than a few hours to debug because it's not production.
I have pieces of half-commented code where I do the optimizer's job, trying to guess which index is best and choose it manually. Lately I've been considering doing an EXPLAIN first and tweaking the query based on that.
So yeah - I love MySQL, and I don't think I'll be able to invest the time to switch, but at least now everybody can see the emperor is naked and it wasn't just them. The optimizer sucks, including in MySQL 8.
Most of the people annoyed with PostgreSQL in these comments are annoyed because it feels like you have to write your own replication and connection management
Most of the people annoyed with MySQL in these comments are annoyed because it feels like you have to write your own optimiser
Yeah, that's what I mean by "trying to do the optimizer's job". But this is meant for edge cases - doing this fully would mean starting to rewrite the optimizer in the app. This is where using the "explain" might help - if it's choosing what seems like a sane index and has a limited range of rows, I send the query "as is". If it fucks up and does a table scan, I'll force a compromise index.
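A minimal sketch of that EXPLAIN-then-decide approach (the table, index name, and helper function are invented for illustration; the "type" column with values like ALL and ref is real MySQL EXPLAIN output, and FORCE INDEX is real MySQL syntax):

```python
# Hypothetical helper: given one parsed EXPLAIN row for the query, send the
# query as-is when the planner picked a sane index, but rewrite it to force
# a known-acceptable index when EXPLAIN reports a full table scan.
def choose_query(base_query, explain_row, fallback_index):
    if explain_row.get("type") == "ALL":  # "ALL" = full table scan
        return base_query.replace(
            "FROM events", f"FROM events FORCE INDEX ({fallback_index})")
    return base_query

q = "SELECT * FROM events WHERE day = '2021-11-01'"
forced = choose_query(q, {"type": "ALL"}, "idx_day")
unchanged = choose_query(q, {"type": "ref"}, "idx_day")
```

The extra EXPLAIN round-trip costs a little latency per query, which is the price of second-guessing the optimizer from the application side.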
> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
This is an insider view, meaning that he is probably addressing things like code quality, compile/run/debug workflow, technical design decisions and so on.
As a 20 year MySQL user - starting with version 3.23 if my memory serves me well - and after billions upon billions of inserted/updated/deleted/queried rows, MySQL is not a pretty poor database by any measure. It has served me well:
Good performance, near-zero maintenance, very few crashes, and a couple of data corruptions that resulted in data loss after power outages.
That's the whole thing with the difference between so-called "ACID" databases and MySQL. Corrupted data after a power outage just shouldn't happen. It's one of the things database engines are supposed to protect us against.
In 30 years, it's never happened to me with Postgres or the other two database engines I've used professionally.
If it did happen, then most people who know about databases would say "that's a pretty poor database".
>>> That's the whole thing with the difference between so-called "ACID" databases and MySQL. Corrupted data after a power outage just shouldn't happen. It's one of the things database engines are supposed to protect us against.
I knew I was not running MySQL in ACID mode (via innodb-flush-log-at-trx-commit=0), and I sacrificed safety for performance. I was OK with that in my specific use-case.
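For reference, that is a real InnoDB option; in an option file the trade-off mentioned might look like this (a sketch of the setting, not a recommendation):

```ini
[mysqld]
# 0 = write and flush the InnoDB redo log roughly once per second instead of
# at every commit; a crash can lose up to about a second of transactions.
innodb_flush_log_at_trx_commit = 0
```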
MyISAM used to be the default for a long while. I made a bit of extra money when younger tidying up MySQL DBs that were in inconsistent states thanks to MyISAM not enforcing FKs.
Yep, it is now. My favourite bit was when InnoDB became the default: a lot of people configured MySQL after upgrading to keep defaulting to MyISAM "for performance" that they really didn't need.
> internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD.
From wikipedia about mariaDB:
> A group of investment companies led by Intel has invested $20 million in SkySQL. The European Investment Bank funded MariaDB with €25 million in 2017. Alibaba led a $27M investment into MariaDB in 2017.
From wikipedia about MySQL:
> MySQL is also used by many popular websites, including Facebook, Flickr, MediaWiki, Twitter, and YouTube.
They certainly have their place. I'm sure some of these companies have considered Postgres.
One thing I'd note is that a lot of these companies made the decision years ago. MySQL's replication story has been good for a lot longer. It wasn't until September 2010 when PostgreSQL landed replication (and you probably didn't want to adopt the first version). If you were looking to create a highly available service like Facebook, Flickr, Twitter, or YouTube before 2010, you were probably looking at MySQL. All those companies were huge before 2010.
That was one of MySQL's killer features. No matter how bad any other part of MySQL might be, it did replication. PostgreSQL had some third-party add-ons to do replication, but it was hard, slow, and reasonably easy to do wrong. Some of them were downright bad ideas where they'd just proxy your request to two independent PostgreSQL instances and hope that nothing went wrong, others used triggers on the tables. Literally, the Postgres core team said this about PostgreSQL: "Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases."
No, those companies wouldn't have considered PostgreSQL and once you get to a certain size, things tend to stick around.
That said, many of these companies aren't using MySQL for a lot of new stuff. YouTube developed Vitess to handle some of their MySQL problems, but from what I've heard they've moved off MySQL since then (correct me if I'm wrong). Twitter has its Manhattan database. Facebook has gone through many databases. MediaWiki is a project that people are meant to be able to run on shared hosts and that means PHP/MySQL. Flickr isn't really a company that has done a lot post-2010. That doesn't mean it's a bad site, but it doesn't seem to be making a lot of new stuff.
Decisions have context. Without the context, it's easy to come to the wrong conclusion.
I don't hate MySQL and PostgreSQL has its problems. I think MySQL's strengths are generally in its current install base and current compatibility. Lots of things work with MySQL. Vitess isn't perfect, but it is a nice project for scaling a relational database. PostgreSQL doesn't have a Vitess. Likewise, many things already work with MySQL like MediaWiki and many things speak the MySQL protocol. However, that's starting to shift. I think we're seeing more things adopt PostgreSQL compatibility. Google's Spanner now has a PostgreSQL layer. RedShift, CockroachDB, and others are going for PostgreSQL compatibility.
The thing is that ecosystems take a long time to shift. If it were 2004-2006 when Facebook, YouTube, and Twitter were created, I'd definitely have grabbed MySQL. You need good replication. PostgreSQL wasn't even talking about bringing replication into core back then, never mind having something available. Times change and software changes.
Main use cases (social graph, messaging, ..) are on MySQL (and never left it). Storage engine is different, replication is improved, etc, but it is still tracking upstream MySQL tree.
The thing is, PG still relies on physical replication, while staying with a logical one lets you use it for out-of-DBMS change data capture and reuse in other systems.
> MediaWiki is ...
Funny tidbit - I tried to move over Wikipedia to PG ages ago, and did the initial prototyping and made it work to a certain degree. But also I learned about InnoDB more at the time.
What the MySQL folks never say is that these companies never run the off-the-shelf version; instead they have built engines, tooling, and expertise that keep their developers from getting too close to the real MySQL.
> In the end, there's just not enough resources that I could see it turn into a competitive product, no matter how internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD.
It’s refreshing to know that tier 2 cloud companies (Oracle, IBM etc) all have similar internal perspectives, ie leadership insisting that they’re making amazing progress in the cloud while their market share either stagnates or reduces.
What a burn - but not surprising given MySQL's history and things like MariaDB or Percona which exist to fill the gaps. My impression is that many people will choose Postgres over MySQL when starting a new project as it progresses so quickly and has a far richer set of features.
> But it doesn't really explain why I did go looking for that somewhere else in the first place.
I don't understand why people think they need to provide a reason.
They are just part of economy and should constantly look for better opportunities for themselves because that's what drives the economy towards more efficient state where resources are better utilized. Not to mention they themselves have just one life and have full moral right to live it the best way they can. They don't owe anyone anything they didn't promise and employment is not a promise of dedicating your whole life to a project or an employer.
I never worked on Chrome directly, but I looked into the Chromium code sometimes when debugging some weird JS issues.
What strikes me first is how much code there is…
There is a ton of code, with some things copied multiple times, because Chrome nowadays does a lot of things; it's basically an entire operating system, which accesses USB, runs assembly code, runs WebGL, basically everything.
And it’s all in C++, and very verbose Google C++.
But in the end I always found what I was looking for there. And I can’t say how good or bad the C++ is, as I’m not C++ dev. Just there is a LOT of it.
I wonder if there is any piece of software that actually works and yet developers don't call its source code a shitshow. I understand code as akin to biology, full of repetitions and garbage DNA, because these things are alive and evolve along with the environment, and cannot be "pure" for more than a few months. C++ is very versatile, but I bet that using a verbose style makes it much easier for newcomers to find their way around the code compared to an uber-abstracted style.
I was with MySQL AB in its early days. Its codebase and behaviors were much worse than they are now. Yet we had a team of people who loved MySQL and were focused on making it better.
I think this is an unfortunate culture change in Oracle land: they hire mercenaries rather than missionaries to work on the code, so such "I never loved it, but I worked on it for years" is not a surprise.
MySQL, MariaDB, MongoDB, and PostgreSQL all have skeletons in their closets. Yet if a PostgreSQL developer left with a similar attitude, the community would rally much more strongly to its defense.
If Steinar is responsible for removing the query cache in 8.0, I'm glad he's leaving the MySQL team. Sure, MySQL is not the perfect database; heck, I'm willing to say it's not even good or efficient. But it is extremely well supported across toolchains and programming languages. It just works, and it works pretty darn well for most applications. I've built several startups on top of it; some even exited.
There's a huge difference between "you don't need the best infrastructure to build a successful company" and "I left because the place had a culture that refused to acknowledge major product shortcomings". They work in different domains.
¯\_(ツ)_/¯ optimizer guy says he did not do a good job, shits on optimizer?
There are many reasons to pick MySQL over PG for large-scale deployments (economics, replication strategies, etc.), and the fact that some queries will run better on PG may not outweigh those benefits.
Don't get me wrong, I know many areas where MySQL sucks, and that is mostly in lacking execution strategies and optimizer problems. Indeed, in many of these ways MySQL is stuck years behind, but in other areas (MyRocks, modern hardware use, etc) it is far ahead of PG.
The thing is, optimizing has costs, and bypassing those costs is useful, if you're looking at economics of your system, and storage engines that pay attention to data patterns are ahead of just naive heaps of data.
It is very simple to diss MySQL when your business does not depend on managing petabytes of data.
P.S. We've migrated a major social network from PG onto our MySQL-based platform and our MySQL DBAs didn't even notice. :-)
> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
It's not like it's an industry secret either: when I started my CS training in 2007, the first course was about relational databases and one of the first things the teacher told us was that MySQL was pretty sucky.
Strange. SQLite runs most of the world but I haven't seen it mentioned in this thread once.
90%+ of software products could easily get by with SQLite as a data persistence mechanism if the developers can be brought to a state of humility regarding the scale of their projects.
I'm seeing comments in here talking about 200 vs 20 milliseconds. You know what's even faster? Putting the DB engine in-proc and dropping the network from the equation altogether. With NVMe storage, you can reliably complete SQLite transactions in tens to hundreds of microseconds with the appropriate journaling mode enabled. You'll never get this kind of latency bound from something that lives in another process or computer.
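As a rough sketch of the in-proc latency claim: the micro-benchmark below times single-row transaction commits against a file-backed SQLite database in WAL mode. The absolute numbers depend entirely on your storage and journaling settings, so treat this as illustrative, not definitive.

```python
import os
import sqlite3
import statistics
import tempfile
import time

# File-backed database; WAL mode needs a real file, not :memory:
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

conn = sqlite3.connect(path, isolation_level=None)  # autocommit; we issue BEGIN/COMMIT ourselves
conn.execute("PRAGMA journal_mode=WAL")      # write-ahead log: fast, durable-enough commits
conn.execute("PRAGMA synchronous=NORMAL")    # fsync the WAL less aggressively than FULL
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

samples = []
for i in range(1000):
    start = time.perf_counter()
    conn.execute("BEGIN")
    conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (i, "payload"))
    conn.execute("COMMIT")
    samples.append(time.perf_counter() - start)

print(f"median commit: {statistics.median(samples) * 1e6:.0f} microseconds")
conn.close()
```

On NVMe storage the median tends to land in the tens-to-hundreds of microseconds range the parent comment describes; a networked database can't get below a round trip no matter how fast its engine is.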
Another big open source DB I'd like to hear about more often is Firebird (https://firebirdsql.org/) which was forked from the Interbase code released by Borland (don't remember what they were called at the time). Four choices of connection model (process-by-connection, thread-by-connection, some weird mix, and in-process), full ANSI SQL, runs on all major platforms, and uses single file databases. Seems to be the best of SQLite (inmem, single file), MySQL (lots of connections fast) and Postgres (standard, robust) to me, in a single package. But for whatever reason it seems to be completely off the radar on HN...
I've had too many anecdotal failures from replication and InnoDB consistency corner cases, and finally being adjacently associated with Oracle killed it for me. Since moving to Postgres, I've been very happy.
I'm super tempted to move from MySQL to postgres, but, we are currently self hosting and MySQL is breeze to run. My only experience with postgres is running a brew update and it completely borking the database and having to track down some cryptic post in order to fix it. Also, as pointed out elsewhere, MySQL seems not to require vacuuming or anything else. It's the unknowns of moving which make me nervous.
For self-hosting one instance, there's little difference.
What matters more is how well the tooling you use with your DB supports it.
Recently I've been on a .NET 6.0 project and the Entity Framework Core MySQL provider was broken for what I was doing. The one for Postgres worked fine so we chose that.
>My only experience with postgres is running a brew update and it completely borking the database and having to track down some cryptic post in order to fix it.
I find containers work wonders for databases, whether it is for using specific versions, segregating instances, keeping your system clean, etc. I keep at least one of MySQL, Mongo, MSSQL and Postgres running at all times. I would never install one natively.
> running a brew update and it completely borking the database
I also ran into this, but to me it looks more like an issue with how Homebrew does Postgres. Also, keeping different versions of Postgres around with Homebrew is a bit of a PITA.
- Works 5 years on something he hates
- No proof (links, anything) that prove what he's stating
- Quits and supports the other vendor, knowing there's an internet quarrel between <insert-db-name> vs <insert-other-db-name>. Nice going there. Real grown up.
- Leaves to work for largest spyware company on the planet
This sounds like one of those people everyone on HN says to avoid in a work environment.
Pretty sure most folks with relation database experience outside of MySQL already knew. It's just jarring to hear it from someone who was "on the inside".
MySQL is a 26 year old database engine. 26 years.
MySQL (not MariaDB) started enforcing CHECK constraints only 2.5 years ago in v8.0.16. Before that it would parse them (to make migrations from other DBs easier) but not actually use them.
Or the fact that DDL changes weren't transactional, so a goof up halfway through leaves your schema in an indeterminate state.
How about making a temporary table for intermediate transaction state/materialization? (Since, after all, MySQL doesn't support materialized views.) Oh! You wanted to use that temp table more than once in the same query through joins? Too bad. So sad. One use per query, sadly. Just create a temp_stuff_2 table and re-run the original query. MariaDB finally fixed that 5 years ago, but it's still an issue in MySQL today.
If MySQL works for you, use it. But I compare it to something like the original Bourne Shell. It works. It's seemingly everywhere. If all you need is something simple, it can work well. And only folks who haven't used anything else ever think it's anything close to the best out there.
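The CHECK constraint point above is easy to demonstrate. As a sketch, SQLite stands in for the database here (it has enforced CHECK constraints for years); per the parent comment, MySQL before 8.0.16 would parse the same constraint and then silently accept the bad row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER CHECK (balance >= 0)"  # the constraint under test
    ")"
)
conn.execute("INSERT INTO accounts (balance) VALUES (100)")  # fine

try:
    conn.execute("INSERT INTO accounts (balance) VALUES (-5)")  # violates CHECK
    enforced = False  # a pre-8.0.16 MySQL would land here: row accepted
except sqlite3.IntegrityError:
    enforced = True   # an enforcing engine rejects the row

print("CHECK enforced:", enforced)
```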
I think MySQL's success comes down to developer-friendliness. I find Postgres a bit tricky to set up. For example, I don't like how Postgres authentication is intertwined with Linux user accounts by default; it causes more problems than it solves, especially for open source projects.
But anyway, it's nice to read an honest piece like this once in a while. Many software projects these days are terrible (to be quite frank) and it's not pleasant to work on such projects as a developer once you understand that it is sub-par and that it will probably remain sub-par because your company doesn't have the talent or right incentives to attract or foster such talent.
If you work at Oracle, it's better to do so as a lawyer, not a technologist. Oracle lawyers are world class cronies. Oracle doesn't even need developers; the letters written by their lawyers can negate the shortcomings of any of their code.
Why even bother writing complex code to solve a technical problem when the lawyer can solve the same problem with a few paragraphs of English legalese?
Postgres auth is intertwined? I mean, yeah, to create the first database and set of users, the local user must be in the postgres group. But after that (or if you edit the pg_hba.conf file)?
https://www.postgresql.org/docs/current/auth-pg-hba-conf.htm...
It uses the current OS user as the default user to connect with. What else would you default to? But you can (and 99% of the time should) specify the role you're connecting with, which is a purely database-controlled role, not an OS username.
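For what it's worth, the pg_hba.conf rules being discussed look something like this minimal, illustrative sketch (methods and addresses are assumptions; adjust to your setup):

```
# TYPE  DATABASE  USER  ADDRESS        METHOD
local   all       all                  peer            # OS user must match the DB role
host    all       all   127.0.0.1/32   scram-sha-256   # ordinary password auth over TCP
```

The first line is the default that trips people up; the second is the "purely database-controlled role" path.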
Honestly I never thought MySQL's method was clearer, but I acknowledge that's subjective. That said, MySQL's solution for a VERY long time in its history was simply to default to superuser/root within the database until you lock it down.
"Simple" turned out to be a security nightmare for a lot of folks who didn't realize what was happening, which was almost all new users.
But that's mostly moot for most folks now. Managed database instances in Azure, GCloud, AWS, DigitalOcean, et al. are rapidly becoming the norm (if they aren't already). In those environments, the connectivity options are really hard to distinguish between Postgres and MySQL.
i hate mysql as well, but it almost runs the world, and onboarded a whole generation of young coders.
postgres is not a panacea (especially on RDS), i might even say we (my team) have more outages because of it than if we were using mysql (but that's just my gut feeling)
the comment about how the team was so oblivious to "genuinely believe in mysql" was in very poor taste.
As someone who isn't remotely interested in the database internals (but open to to being), why should I switch from MariaDB/MySQL (which I've been using for 7 years and never had any major issues with, beyond ONLY_FULL_GROUP_BY being a slight pain) to something else?
There's so many little things that add up that Just Work in terms of SQL usage under pg and are a pain under mysql.
A quick example is array types plus rowtypes (plus postgres having auto-group-by but -only- if you GROUP BY a pk, so it's predictable) letting you do something like this:
SELECT author.*, json_agg(posts)
FROM users author
JOIN posts ON author.id = posts.author_id
GROUP BY author.id;
and you'll get back a json array of the post row objects in your application so you get one-row-per-author-row in your query results but still fetch your 1-to-N relationship efficiently (I used JOIN rather than LEFT JOIN to get only people who'd actually authored a post, note, that was a choice rather than a typo).
Note, yes, there's no reason you can't get the same result without the json_agg and with a bit of code to collapse posts onto the right author as you page through the resultset, but the thing I'm trying to gesture at is that lots and lots of little quality of life improvements add up in a way you don't necessarily even notice that much until you've mostly only been working on pg projects for a couple of years and then try and do the now-seems-obvious thing in something else.
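The "bit of code to collapse posts onto the right author" alternative might look like the sketch below. The column names are assumed from the example query; rows are taken to arrive ordered by author id, as they would with the GROUP BY/ORDER BY above.

```python
from itertools import groupby

# Pretend these rows came from the flat join:
#   SELECT author.id, author.name, posts.post_id, posts.title
#   FROM users author JOIN posts ON author.id = posts.author_id
#   ORDER BY author.id
rows = [
    {"author_id": 1, "name": "ada",  "post_id": 10, "title": "First"},
    {"author_id": 1, "name": "ada",  "post_id": 11, "title": "Second"},
    {"author_id": 2, "name": "brin", "post_id": 12, "title": "Hello"},
]

# Collapse the 1-to-N join back into one record per author,
# the application-side equivalent of json_agg.
authors = []
for author_id, group in groupby(rows, key=lambda r: r["author_id"]):
    group = list(group)
    authors.append({
        "id": author_id,
        "name": group[0]["name"],
        "posts": [{"post_id": r["post_id"], "title": r["title"]} for r in group],
    })

print(authors)
```

It works, but it is exactly the kind of boilerplate that json_agg makes disappear.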
The people complaining about the steep operations learning curve aren't at all wrong though, lots of postgres tooling was written by people who assume you -want- to read the entire manual first and then configure things exactly how you want and even as that sort of person it still took me a while to get up to speed.
So ... maybe you shouldn't want to switch, but the above is why, overall, I usually don't find myself wanting to switch -from- postgres to something else.
(except when I realise we're literally just using the database as a key/value store with an SQL interface, because at that point, screw that, back as far as mysql 3.23 it was amazingly good at that and it's still fantastic at it so in that situation "don't be silly" applies)
In general, I agree, though pg's JSONB is effectively hstore2, so its performance is surprisingly good.
This example didn't involve a JSON column in the database at all though, I was talking about turning native database rows into json data - so given tables like users(id, name) and posts(post_id, author_id, title, body) (apologies for pseudocodish rendering) instead of
Somewhat offtopic, but sesse.net runs a Stockfish instance with the (probably) deepest analysis of live chess games: http://analysis.sesse.net/
I visit it daily for the Chess World Championship match.
Ah refactoring and rewrites. Always seem like such a great idea. And can destroy massive companies and projects. Was it Marc Andreessen who wrote that awesome post on how the rewrite of Netscape 6 seemed like a great idea but failed to consider all the little wisdoms built into the current version? It’s impossible to find now with all the noise around him.
Production software more than a few versions old is a living breathing thing that has had the hell beaten out of it by QA team and users and survived repeatedly by having passionate devs make it stronger. What it needs is love, not pedants.
Some of the most used and popular software in the world is fugly beyond belief but it works, is maintained and loved and continues to defy newcomers.
Having used Netscape 4 extensively, it was in a good place to throw out everything and start over.
Joel is wrong on this in the cited blog post,
> It worked pretty darn well on an awful lot of real world computer systems.
It worked adequately, but had no future - that was a big problem, since IE was so, so much better. They were totally screwed and a rewrite is all that would have saved them.
Netscape 6 totally sucked, but it finally found its feet - and it took a few different versions/browsers based on the codebase to get right. Frankly, I'm amazed it's still around, given the pressure of Safari/Chrome/etc.
That might be true of some complete rewrites, but refactoring - when done right - improves all the good stuff like composability, readability, and simplicity, without changing any logic (unless you count fixing bugs which it will likely uncover).
There’s also a bit of learned helplessness on some teams averse to refactoring, as in “things were always done this way, don’t touch it if it works, the opportunity cost is too large, etc. etc.”. In my experience, this sentiment evaporates when presented with the results of properly done refactoring - when it becomes obvious how much faster the team can move without compromising on quality (and frankly the work just becomes more enjoyable).
Wouldn’t any decision - when done right - improve the good things? I tend to look for situations (since I am never an expert, just a person trying to get the job done) where it feels like I know ahead of time what “done right” means.
I used to be pretty scornful of badly put together software, for various definitions of badly put together. There are flavours of usability, but if a thing is around for a long time and manages to attract a community, and that community manages to build useful things around it, that is something to respect and admire as well.
MySQL works great for a range of use cases and is easier to use for many simpler use cases. Most folk won't need any extra power offered by Postgres, and won't be encumbered by having to tinker with its implementation.
Shameless plug: check out Rosettable (https://github.com/francoisp/rosettable). It brings Postgres triggers and notifies to your MySQL schema, and with it you can organize a smooth transition. Using mysql_fdw you can keep working against your current MySQL schema, and with the above you can know when a client interacted directly with MySQL; all the while you add some nice jsonb and PL/pgSQL and transition out of MySQL incrementally. Don't rewrite, write onward.
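For context, the mysql_fdw wiring hinted at above looks roughly like this (server options and the table definition are illustrative; check the extension's docs for your versions):

```sql
CREATE EXTENSION mysql_fdw;

CREATE SERVER legacy_mysql
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '127.0.0.1', port '3306');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER legacy_mysql
  OPTIONS (username 'app', password 'secret');

-- Expose an existing MySQL table to Postgres queries
CREATE FOREIGN TABLE orders (
  id     bigint,
  status text
)
SERVER legacy_mysql
OPTIONS (dbname 'shop', table_name 'orders');
```

Once the foreign table exists, Postgres-side triggers, views, and jsonb columns can be layered on while MySQL remains the system of record.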
We use mysql in prod at work and frankly the reason I didn't bother bringing up the mysql vs postgresql debate is simply high availability.
We have a multi-master mysql database based on percona's pxc.
And it works beautifully... sometimes a node crashes and it's no big deal.
Meanwhile in pg-land it's quite a minefield of solutions, each of which needs to be studied carefully. And there is a constellation of other solutions (extensions or PostgreSQL distributions) that maybe fit your use case, maybe don't.
So many licenses, including the Business Source License which is non-free license.
Also there's this CockroachDB Community License (CCL), which I guess is a made-up license that's going to include some whatever terms.
I might still accept this mix of licenses however I can't find the definition of "foundational features" -- what do I "lose" by using the OSS version instead of one of their managed offerings?
From a quick overview, it's soft-no from me (in the sense that it probably wouldn't be my first choice).
I've seen a lot of negative reactions to this mic drop. I wouldn't judge him too harshly. He's actually a very good developer, worth the benefit of the doubt.
In my opinion, both Postgres and MySQL have their places. While I chose Postgres for most of my projects, some 3rd party software just work better with MySQL because they are written for MySQL first, then ported to Postgres. For example, Hive Metastore still experiences some edge-case bugs when running with Postgres.
I would not choose MariaDB for any new projects though, since it's no longer 100% compatible with MySQL, yet I don't see it bringing any game-changing advantage.
Whenever I come across these types of almost-rants, I never know: Is he talking about some specific use cases in which Postgres completely dominates MySQL? Or is it that for literally every CRUD call I do in my app I'm having my users wait a pointless 200 ms when Postgres would be doing the same in 20 ms ...
What's the deal? Is the difference that drastic and relevant that I should really switch immediately even for ongoing projects, never to look back on MySQL again ?
Online schema change solutions have been around for over a decade and are commonly used to run ALTER TABLE with no downtime (or with minimal interruption) on the largest deployments of MySQL today.
The two most common solutions are pt-online-schema-change and gh-ost, and if you are running MySQL today and still running direct ALTER TABLE suffering outage, then you're in for a pleasant change.
On top of that, most MySQL ALTER TABLE operations with InnoDB tables support non-blocking, lockless operation as well. My main concern with these is that they're still replicated sequentially leading to replication lags.
MySQL is also slowly adding "Instant DDL", currently still limited to just a few types of changes.
Disclosure: I authored gh-ost (at GitHub), oak-online-alter-table (the original schema change tool) and am a maintainer for Vitess and working on online schema changes in Vitess.
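As a sketch of the native online-DDL syntax mentioned above (table and column names here are made up for illustration):

```sql
-- Native online DDL: InnoDB builds the index without blocking DML
ALTER TABLE orders ADD INDEX idx_status (status), ALGORITHM=INPLACE, LOCK=NONE;

-- "Instant DDL" (MySQL 8.0+): metadata-only change, no table rebuild;
-- currently supported for only a few kinds of changes, e.g. adding a column
ALTER TABLE orders ADD COLUMN note VARCHAR(255), ALGORITHM=INSTANT;
```

If the requested ALGORITHM or LOCK level isn't supported for a given change, MySQL rejects the statement instead of silently falling back to a blocking copy, which makes these clauses a useful safety check.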
Having used both MySQL and PostgreSQL enough to have been annoyed by features of both of them and fall into a bunch of their traps, I feel inclined to point out a few ways in which MySQL is better, most of which relate to the initial learning curve.
1. MySQL has the better CLI tool. psql is full of minor annoyances, like its cryptic and unmemorable backslash commands like \d for schema inspection, or having a pager turned on by default for query results that are more than one screen length.
2. Postgres has a deeply unintuitive concept of "USER"s and "ROLE"s where actually those are the same thing, and commands like CREATE USER and CREATE ROLE are thus basically synonymous (albeit with slight differences in default values). Worse, lots of the docs pages relating to specific commands don't highlight this fact at all. A majority of web devs I've talked to about Postgres have at some point complained about how they don't understand the permissions system or how ROLEs work.
3. While MySQL enforces that a foreign key from column A -> column B requires an index on both A and B, Postgres only requires an index on column B. Maybe this freedom is nice if you know what you're doing, but it's a major footgun otherwise, since it means that deleting or updating rows in B has a time cost that scales linearly with the number of rows in the table A belongs to.
4. MySQL has a handy ON UPDATE current_timestamp shorthand for creating updated_at timestamp columns, which Postgres lacks, requiring you to use a trigger to implement such a column or move the logic to your application layer.
5. MySQL's "online DDL" features allow making lots of kinds of schema changes without ever holding a lock on the table being updated. In Postgres, by contrast, any schema change, even adding an index with the CONCURRENTLY option, needs to at least momentarily hold a totally exclusive lock that blocks all writes and reads on the table. Worse, as soon as an ALTER TABLE statement starts waiting for that lock, it blocks all new reads and writes against the table. This makes all schema changes in Postgres much more dangerous to the naive; even one that's theoretically able to happen concurrently with queries will hang your application if there's already a long-running query going against the table to be modified. It also means that at scale you need to roll a bunch of your own tooling to apply schema migrations safely, where you run them from a connection with a very short lock_timeout and wrap them in some code that retries on failure to acquire the lock. I don't remember any of this crap being necessary in MySQL.
Maybe Postgres is still better; in particular, maybe it's more performant in a way that outweighs all these nuisances. I don't really know because I've never done any head-to-head comparison to see how the performance of the two would differ in a given scenario, and so the performance differences aren't something I've had a chance to witness as an ordinary dev using both databases. But I just want to make clear that there absolutely is another side to the story!
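Sketches of the Postgres-side workarounds for points 4 and 5 above (table, column, and function names are made up for illustration; EXECUTE FUNCTION is the PG 11+ spelling, older versions use EXECUTE PROCEDURE):

```sql
-- Point 4: emulating MySQL's ON UPDATE CURRENT_TIMESTAMP with a trigger
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_touch
  BEFORE UPDATE ON items
  FOR EACH ROW EXECUTE FUNCTION touch_updated_at();

-- Point 5: run migrations with a short lock timeout so a waiting
-- ALTER TABLE can't queue behind a long-running query and block all
-- traffic; the surrounding tooling sleeps and retries on failure.
SET lock_timeout = '2s';
ALTER TABLE items ADD COLUMN note text;
```

The trigger is boilerplate you write once per database, but it is still a step MySQL users never have to think about.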
> In Postgres, by contrast, any schema change, even adding an index with the CONCURRENTLY option, needs to at least momentarily hold a totally exclusive lock that blocks all writes and reads on the table.
It's true that every DDL statement requires a relation-level lock. Though the same is true for even a simple SELECT statement. The important detail is the lock strength for each variety of DDL, and how that affects and interacts with other queries (including other DDL):
CREATE INDEX (even without CONCURRENTLY) will not block reads, even for a moment (it just blocks write DML). CREATE INDEX (without CONCURRENTLY) won't even block other CREATE INDEX statements that run against the same table.
My guess is that your application appeared to exhibit this behavior, but the true problem was actually how several conflicting locks accumulated, which had the effect of blocking SELECTs for an unreasonably long time. A combination of CREATE INDEX and some other conflicting DDL that really does block reads (e.g., certain kinds of ALTER TABLE) can create the false impression that CREATE INDEX blocks reads in general. But that's not really the case at all. At worst, CREATE INDEX is only one part of the "traffic jam" that caused SELECTs to block in this scenario.
Huh. I was sure I was right about this but I tested and realised you are correct. CREATE INDEX (even with CONCURRENTLY) gets blocked by reads while waiting to acquire the lock it needs, but it doesn't block reads during that time.
I was getting confused because I'd seen my attempts to CONCURRENTLY add indexes time out waiting for locks, and had also experienced some of the other kinds of DDL that block reads while waiting for their lock (see https://dba.stackexchange.com/q/293992/18607), and I guess I just fallaciously assumed that index creation therefore also blocked reads without ever having tested it.
~Will edit my prior post to add a note about the error!~ Nope, doesn't seem I'm allowed to edit it any more.
CONCURRENTLY needs to wait for old transactions to go away, including those that haven't (and won't ever) touch the table that you're building an index on. So it's not waiting for a lock as such - it's waiting for older transactions to go away without conflicting in a way that can cause these "traffic jams". This can be a problem for the obvious reason, though generally only for the index build itself.
Tricky problems with relation-level locks tend to come from a combination of one lock request that is "generally non-disruptive but long-lived", and another lock request that is "generally disruptive/blocking but short-lived". I have heard of quite a few problem scenarios where these locks conflict with each other (usually by chance), leading to "generally disruptive/blocking and long-lived" -- which can be very dangerous. But that's fundamental to how lock managers work in general.
The Postgres implementation tries to make it as unlikely as reasonably possible. For example, autovacuum usually notices when something like this happens, and cancels itself.
While I personally favour Postgres, the biggest thing I think MySQL does better is the "undo log" paradigm versus Postgres's vacuum-based MVCC. In a normally running system, aborted transactions should be significantly rarer than successful ones, so making sure the normal path generates less garbage is nice. It means less garbage in the database. Tables with high levels of updates don't need high levels of locking or index updates. It would also make things in Postgres like the BRIN index more useful, since an update wouldn't necessarily need to change the physical table order and cause index fragmentation. It would make using a BRIN index with things like sequential UUIDs on extremely large tables actually viable, versus duplicated indexes that cause insert-speed issues.
I second the complaint about the user and role design in Postgres. The first time I encountered it, I convinced myself I just didn't understand it well enough yet since it didn't make a lot of sense to me that it works like it does.
> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
As someone who has been a long time MySQL and MariaDB "fanboy" I couldn't agree more. A decade ago I switched to Postgres and I was hooked after the first 30 minutes already. There simply is no way back. I check back on MySQL/MariaDB every now and then just out of curiosity and it still gives me the shivers.
I remember a huge Reddit thread about a guy who used to work at Oracle. I can't find it anymore, it probably got deleted as people are leaving the Reddit ship but I digress.
Generally speaking, HN is pretty much one-sided on Postgres vs MySQL. I can count on both hands the number of comments over the years that dare to post an alternative view. And Oracle (MySQL) is evil.
So I am surprised this thread had more comments on reason to choose MySQL than everywhere else on HN combined. It is still a minority. But at least not as one sided as usual.
> Generally speaking, HN is pretty much one sided with Postgres Vs MySQL.
The community here is very pragmatic, and often is speaking through the lens of personal experience. More so than many other communities.
> And Oracle ( MySQL ) is evil.
I think they get about the right level of skepticism with respect to motive that every big tech player gets. MySQL has a complicated story, but honestly, it's a great product.
> But at least not as one sided as usual.
I really enjoy HN because over the years, I've found the community to be pragmatic. There are always responses that tend to favor dogma over practicality, but on the whole, the community really is great about sharing insight and opinion.
This is pretty much the same debate as Linux vs FreeBSD back in the early 2000s. Better code quality doesn't get the most recognition. Usually the devs just go with what's hip, and MySQL simply sounds better on the marketing end (just because of its name) than Postgres (I mean, who can even pronounce the name?).
For us (we build business software distributed to customers), the license is the key factor. MySQL/MariaDB is GPL, and even worse, the JDBC driver is also GPL (which makes any business software built on the MySQL JDBC driver violate the GPL), while Postgres's license is more permissive.
Just to clarify for others (you probably already know this), the GPL is only an issue if you distribute the binaries for your code. If it's e.g. a typical SaaS then the GPL isn't nearly as scary. The AGPL is problematic in that case though.
Not a licensing lawyer, but with a system like MySQL, which is installed as a service (as opposed to a library you link with, like the MySQL Connector for Java), most of the GPL only triggers if you distribute a modified version to a third party.
Key words there are: Distribute - an employee of your company gives the modified binary to a customer; modified - it's not stock mysql, but you've patched something inside it; and third party - that's some customer outside of your own company or legal entity.
If you distribute an unmodified MySQL to your customers, that's fine. If you keep a modified MySQL running in your infrastructure only, without giving anyone outside your company/legal entity access to the binaries, you're legally fine, but a bit of a jerk (this led to the whole MongoDB licensing fun against SaaS vendors). Only if you patch MySQL and distribute it do you have to make your changes available.
This makes GPL licensed systems you don't link any code against fairly safe to use.
For a SaaS maintained by one person, would you recommend MySQL or Postgres? I don't know yet what volume of users or connections I will receive (perhaps zero?), but I do like to keep things simple (this PgBouncer thing one may need with Postgres makes me feel uncomfortable).
I would always start with Postgres, because of its vast flexibility and extremely solid feature set, not to forget the excellent documentation. I've used Postgres in numerous projects and never had the need for PgBouncer, although I am sure there are situations where it might be beneficial. E.g. on the JVM it has been best practice for 20 years to always put a connection pool in front of the JDBC driver. IMHO transactional DDL is the killer feature for a fast-moving project with customers.
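The pooling idea those JDBC libraries implement is itself tiny. A toy sketch in Python (SQLite standing in for a real network driver; a production pool like HikariCP or PgBouncer adds health checks, timeouts, and concurrency hardening on top):

```python
import queue
import sqlite3

class SimplePool:
    """Toy connection pool: open N connections once, then hand them out
    and take them back, instead of paying connection-setup cost per
    request."""
    def __init__(self, factory, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free: this is the back-pressure
        # that keeps the database from drowning in connections.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

pool = SimplePool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

The cap on pool size is the point: the database only ever sees `size` connections, no matter how many application threads ask for one.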
As we are all discussing the pros and cons of various DBs: my favourite MySQL feature has to be the pluggable storage engines. Being able to support both InnoDB- and RocksDB-backed tables in the same database is wonderful. Just today I made use of the MyRocks storage engine's handy TTL [1] feature instead of a tedious delete-and-optimise scheduled job.
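For readers who haven't used MyRocks: its engine-level TTL drops expired rows during compaction, for free. The "tedious" scheduled job it replaces looks roughly like this (hypothetical `events` schema, SQLite standing in; VACUUM plays the role of MySQL's OPTIMIZE TABLE):

```python
import sqlite3
import time

def purge_expired(conn, ttl_seconds):
    """Scheduled-job TTL: delete rows older than the TTL, then reclaim
    the freed space."""
    cutoff = time.time() - ttl_seconds
    conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    conn.commit()
    conn.execute("VACUUM")  # rewrite the file to release dead space

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at REAL)")
now = time.time()
conn.execute("INSERT INTO events (created_at) VALUES (?)", (now - 3600,))  # stale
conn.execute("INSERT INTO events (created_at) VALUES (?)", (now,))         # fresh
purge_expired(conn, ttl_seconds=600)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Besides being extra moving parts to schedule and monitor, the big DELETE and the table rewrite both cost I/O at the worst possible granularity, which is exactly what a compaction-time TTL avoids.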
Basically it's saying the MySQL code base is a mess. Maybe PostgreSQL's codebase is also a mess; without experience inside it, it's hard to really say. Anybody here worked on both for a good while? Most code bases are messes in the real world, I hate to tell ya.
I once dug into the MySQL source code (mysqlbinlog), but I could not understand how it got compiled. Later I found the answer: they added lots of `#include "xxx.c"` lines at the end of the file. From then on, I haven't regarded MySQL as a serious project.
Wait, does MariaDB randomly lose data? If you're writing a message service, is it bad practice to store each message individually, one row per message, or should you clump them together?
> MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
Holy crap! I moved away from MySQL to Postgres way back in 2012 just for the features (at the cost of connection complexity), but this is a plain blow to MySQL fanboys!
Autovacuum doesn't work, vacuum will double your DB size, and upgrading requires reading the entire changelog and manual intervention. But yeah, otherwise I'm sure Postgres is just super amazing, right?
Bear in mind that my first experiences with MySQL are 20 years old now, so I freely admit that some of my disdain stems from holding a grudge toward MySQL AB, the PHP dev team, and the hundreds of bad tutorials that led to SQL injection attacks.
Over the years, MySQL devs claimed that foreign keys were mostly superfluous, silently truncating input without warning was normal, implicit lossy data type conversions were acceptable, and quoting numeric values was fine. Don't get me started on weird default character sets or the variously confusing multibyte UTF-8 choices and interactions.
MySQL would parse CHECK constraints but not actually enforce them until very recently (version 8 point something). Let me repeat: MySQL would accept the column syntax
CHECK (foo BETWEEN 1 AND 10)
but would happily accept -306 without so much as a warning, assuming your code was even checking warnings. Most database engines consider saving/retrieving your data to be a strict contract of sorts. MySQL YOLOs it far too often for my taste.
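To make the contract concrete, here is the same constraint in an engine that enforces it (SQLite in this sketch; Postgres, and MySQL since 8.0.16, behave the same way): the out-of-range insert is rejected with an error instead of being silently stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo INTEGER CHECK (foo BETWEEN 1 AND 10))")
conn.execute("INSERT INTO t (foo) VALUES (5)")   # in range: accepted

try:
    conn.execute("INSERT INTO t (foo) VALUES (-306)")
    rejected = False
except sqlite3.IntegrityError:                    # constraint actually enforced
    rejected = True

stored = [row[0] for row in conn.execute("SELECT foo FROM t")]
```

The application never has to check warnings after the fact; bad data simply cannot land in the table.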
Even now, DDL operations are not transaction safe. Got two CREATE TABLE statements, three ALTER TABLE statements, and a CREATE INDEX? Maybe you have an INSERT right after a CREATE TABLE because it's a lookup table. If any part fails due to whatever reason, you are now in an intermediate state. One table and a new column might be there, but not the rest. How do you roll back from that? You don't. You figure out where it failed and try to re-run the commands that didn't go through. Hopefully it didn't bork a running app.
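For contrast, here is what transactional DDL buys you, sketched with SQLite (which, like Postgres, can roll DDL back): the CREATE TABLE and its seed INSERT vanish together on ROLLBACK, leaving no intermediate state to untangle.

```python
import sqlite3

# isolation_level=None: autocommit mode, we issue BEGIN/ROLLBACK ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("CREATE TABLE lookup (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO lookup (name) VALUES ('draft')")
# Something later in the migration fails -> undo everything at once
cur.execute("ROLLBACK")

tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

After the rollback the schema is exactly as it was before the migration started, which is what makes "re-run the whole migration" a safe recovery strategy.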
But here's a laundry list of features MySQL does(n't) support. Imagine trying to learn English, but the teacher uses a regional patois and limits you to a small subset of normal conversational vocabulary. You wouldn't know any different as a learner until you met someone who spoke one of the major dialects, like General American, Received Pronunciation, or Australian English. Even though those three vary quite a bit, they're a far cry from a dialect like this:
https://youtu.be/0pBOLdZZT6s?t=120
If you've got a particular problem that MySQL happens to solve, good on you. Solving the problem at hand is the most important thing. But I wouldn't count on it as your go-to option. Unless you've got a good and specific reason to use it, I'd steer clear.
You keep quoting this sql-workbench guy as if he were some kind of authority. He's not, and he is biased. A lot of the red in the MariaDB column could be green if he had researched better.
Now, if we look at your laundry list and a "real" database like Oracle:
Oracle - Weird UTF-8 - check. Actually WTF-8: it encodes supplemental characters as 6 bytes, incorrectly, while the original MySQL utf8mb3 simply does not allow them and is otherwise a strict subset of real UTF-8.
Oracle - No transactional DDL - check.
Check constraints - MariaDB got them 5 years ago.
I think you might reconsider your "To get a sense of you're talking about, compare Oracle ..." sentence. Maybe remove Oracle from that list?
Fair enough. I've got no skin in the game for Oracle. Simply omit Oracle from my list.
As for "A lot of red in MariaDB column can be green", please say more. Are you referring to features it supports but is reported wrong or features that simply aren't listed?
Supported but reported wrong. At first glance, off the top of my head, without checking his other claims, these are supported features that are marked red:
Row constructor
SELECT FOR UPDATE ... NOWAIT
IP address datatype
RETURNING clause
Welcome. Be sure to try out the transactional DDL, actual boolean and array types, functional and partial indexes, and other such features that required jumping through hoops in MySQL-land.
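Partial and functional indexes, for instance, are one-liners. Shown here with SQLite, whose syntax for these two essentially matches Postgres (hypothetical `users` table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, email TEXT, deleted INTEGER DEFAULT 0)""")

# Partial index: only the rows the app still cares about get indexed,
# keeping the index small even as soft-deleted rows pile up
conn.execute("CREATE INDEX live_email ON users (email) WHERE deleted = 0")

# Functional index: index the normalized form the lookups actually use
conn.execute("CREATE INDEX email_lower ON users (lower(email))")

indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```

In MySQL-land the usual workaround was an extra generated column (or a duplicated full index) to get the same effect.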
https://www.sql-workbench.eu/dbms_comparison.html
In fact, if you've got the time, I'd highly recommend skimming through the excellent manual. You rarely know what you're missing until it's right in front of you.
https://www.postgresql.org/docs/current/index.html
I mostly come across MySQL for simple web applications that need to store some data. For that it works fine and is cheaper than MSSQL for instance. Not really the prime use case for Postgres.
What makes Postgres unfit for that use? It doesn't have large licensing fees like SQL Server, and Postgres (just like MySQL) can run on a single node and handle small amounts of data just fine.
Getting to a reasonably solid active/passive failover setup for relatively simple uses is substantially easier to learn from zero with MySQL.
And I say this as somebody who defaults to postgres or sqlite over mysql where possible these days just because I expect it to be less of a headache down the road.
The knowledge about running MySQL in production seems to be more widespread. A lot of web-hosting companies offer it. MySQL is the default for some CMS systems, and so on. Postgres is powerful and can be used for pretty much everything, but it takes more knowledge.
For the amateurs like me: remember, this is a personal opinion. If you haven't heard of this guy or aren't familiar with his DB use case (which I am not), you should be smart enough not to jump on the bandwagon.
People who are religious about a stack either have been working with it for ages and testing it to its limits, or are just writing SEO blogs. Follow my stupid plan if you really want to get preachy about DBs: I started with CSV, then SQLite3, then I stopped. I am familiar with the syntax and basic workflow of MySQL and PostgreSQL just so I can get a job. I still use SQLite3 because it works for me, period.
This is like the Seinfeld episode where George complains: what, another gift? Just because he's moving (into a new apartment) he gets a gift? Birthday, this, that, the other; it never ends.
Same here: why do I care that somebody quit/got fired/whatever? Why do we gotta know about it and get involved?
My large employer has a certain low background level of turnover. Work there long enough and it adds up. A dear friend helps run a company where restructuring is going on, with mid-to-high-level people getting cleaned out. This stuff is all over the place.
Look, if you failed to have the forethought to arrange a new position ahead of time, failed to see this event coming, or don't have the contacts for a new position without blabbering on about it publicly ... just come out and say it: I want a position. I'm shopping. My CV is <url-here>. LinkedIn is nice for things like that, I hear.
There's a weird kind of virtue signaling or humble-bragging I'm sick of. Absolutely Fabulous had a nice line on this: you get your dry-cleaning back and it's a revolution. This was said sarcastically (because no, it's not) by Pats ... because doing and improving things at work, while important, is not comparable to the rhetoric that currently passes for normal.