
I see the “Postgres is technically superior” take on HN all the time. Can you actually back that up?

Can you explain why the dozen or so Postgres scale-up solutions don’t use real Postgres?

Or why anyone at scale with Postgres migrates away?

https://www.uber.com/en-US/blog/postgres-to-mysql-migration/

This outage would have been a lot less likely with MySQL because of redo logs, which MySQL has had for a decade. But Postgres replication is underdeveloped. https://about.gitlab.com/blog/2017/02/10/postmortem-of-datab...

Facebook has evaluated every database on the planet and still uses MySQL. https://engineering.fb.com/2021/07/22/core-infra/mysql/

When MySQL has more users, can scale significantly further, runs some of the largest sites on earth, and even lends its storage engine to other databases like Dynamo, you should back up your claims that Postgres is technically superior. Please.




Postgres has a reputation for correctness and does have some more specialized features, column types, and index types. It’s also totally fine and dandy at small-to-medium size. Fully transactional DDL and online schema migrations put it ahead of MySQL at those scales too.
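
A minimal sketch of what transactional DDL buys you (the table and column names here are made up for illustration):

    -- In Postgres, DDL participates in transactions, so a failed
    -- migration rolls back cleanly. In MySQL, ALTER TABLE causes
    -- an implicit commit and can leave a half-applied migration.
    BEGIN;
    ALTER TABLE users ADD COLUMN last_login timestamptz;
    UPDATE users SET last_login = created_at;  -- backfill in the same txn
    -- Any error before COMMIT undoes the schema change as well.
    COMMIT;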

But I agree that MySQL is easier to use at high scale for its superior replication story, third-party tooling, and pragmatism. If you’ve only run small Postgres installations, you don’t know the fear of the query planner suddenly going AWOL at 3am, or of needing to shard your database under duress as you watch the Postgres transaction ID tick closer and closer to overflow because VACUUM won’t finish. You haven’t needed to stress out about the very low practical connection limit and build a cluster of Postgres proxies using PgBouncer or similar.
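
For the wraparound fear specifically, this is roughly the standard catalog query people watch (a sketch; thresholds and alerting are up to you):

    -- Transaction IDs wrap at ~2 billion; autovacuum has to freeze
    -- old tuples before age(datfrozenxid) gets anywhere near that.
    SELECT datname,
           age(datfrozenxid) AS xid_age,
           round(age(datfrozenxid) / 2000000000.0 * 100, 1) AS pct_toward_wraparound
    FROM pg_database
    ORDER BY xid_age DESC;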

Related blog post by a teammate: https://www.notion.so/blog/sharding-postgres-at-notion


Postgres still has a ways to go for scale but has improved substantially in replication options since that 2017 article. Sharding has also seen a lot of upstream patches from EnterpriseDB and other commercial providers since then. Upgrades in particular are still a sore spot.

Also note that folks aren’t running giant vanilla MySQL clusters. They’re running it through Vitess or home-grown tools with similar functionality.

Finally, MySQL gets performance in these large setups by turning off features like foreign keys, ACID guarantees, etc. It’s awesome and powerful that you can do this in MySQL, but it’s not apples to apples here. MySQL has also improved a lot on these dimensions in the last few years.

The other reality here is that on NVMe hardware with a few TB of RAM you can get away with a basic reader/writer replica setup for a long time.


Vanilla MySQL vs vanilla Postgres: MySQL still wins on scale. You can run single MySQL servers with 70,000 connections; you can't get close on Postgres because of its fundamentally broken connection-per-process model. The Neon team suggested changing that and got flamed by the community.

I was at GitHub in 2017 when that GitLab outage happened. We had 60 million users running on vanilla MySQL, no Vitess, at that point. We had 10 million users literally running on 3 MySQL servers. Don't pretend that even now Postgres can do that.


Agreed

However, you can absolutely run that many users on a Postgres cluster of that size today on modern hardware. Connections are absolutely a sore point and probably my number one pain point as a Postgres admin day to day. PgBouncer is incredibly easy to run, though. You rarely actually need 70,000 connections; most will be idle most of the time anyway, and it’s not like MySQL can actually serve 70,000 queries on 70,000 connections simultaneously. But it certainly makes administration and the programming model more difficult.
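
To see how many of those connections are actually doing work at any instant (a standard pg_stat_activity query; the distribution will vary by workload):

    -- On a typical app-server fleet, most connections sit idle between
    -- requests, which is exactly what a pooler like PgBouncer exploits.
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;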


What do you think of Supavisor [0]?

[0] https://supabase.com/blog/supavisor-1-million


70k QPS is easy on a single MySQL.


Same with Postgres. QPS != active connections.


While MySQL being able to scale is nice and all, it's had serious problems with even keeping basic data integrity for the data people hand it... for decades.


This is the root of it all - for years and years MySQL was perfectly capable of doing the needful, but it shipped with default configurations that would do things like use "not_quite_really_UTF8" and "myisam_tables_please_no_acid".

If you knew enough to switch to InnoDB and "mysql_real_utf8_string" or whatever it was, you had a quite performant and stable system.
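
In practice, "whatever it was" ended up being utf8mb4, since MySQL's historical "utf8" is a 3-byte subset that can't store emoji and other 4-byte characters. A sketch of the explicit incantation (table and columns invented for illustration):

    -- Ask for a transactional engine and real UTF-8 explicitly,
    -- rather than trusting the historical defaults.
    CREATE TABLE comments (
        id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        body TEXT NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;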


There have also been numerous horror stories where people discovered (after the fact) that MySQL wasn't storing their data as given, but was instead silently converting it to some other format/type internally and storing that.

People that directly experienced this very thing will probably chime in to give examples (etc). :)
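
A classic instance of the failure mode, for anyone who hasn't been bitten (this shows the old permissive behavior; with the modern strict sql_mode default the INSERT fails instead):

    -- With sql_mode = '' (the old default), the INSERT "succeeds"
    -- and the value is silently truncated to fit.
    SET SESSION sql_mode = '';
    CREATE TABLE t (name VARCHAR(5));
    INSERT INTO t VALUES ('truncated silently');  -- warning, not an error
    SELECT name FROM t;  -- returns 'trunc'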


Yep - ME! Happened to ME! Now some will blame bad development (should have caught it in the application code via testing, etc.).

A 'true' DBMS will provide some guarantees and would rather fail a transaction than do a 'best endeavours' job with it.


You hit the nail on the head, those were my top two complaints, but the rabbit hole goes deep.

It's not an exaggeration to say that trying to keep MySQL from corrupting your data feels like a full-time job. MySQL in many ways, especially as used by massive social media companies early on, was closer to a SQL-ish DSL for a non-transactional KV store than a "real" ACID database.

That's why "but it can do 70k RPS" is just noise to some of us. I don't care how fast it can lose my data.

I'm all for people needing to read the docs, but man, a database burns trust pretty fast when it's so easy to have it pretend to support transactions or pretend to support UTF-8.


That was my biggest gripe with it - at least back in the day. Back then it really felt like MySQL vs Postgres was a trade-off between 'easy but no guarantees on data integrity' and 'correct but operationally harder to manage'.

The gap has for sure closed in many areas but I still feel safer with Postgres today because there's been little reason for me to try MySQL again.


Can we restrict our discussion to the situation now? If current default configs result in data corruption, that's an issue. If an older version from the past decade does, that's of less interest.


[flagged]


MySQL's default storage engine has been InnoDB -- a high-quality OLTP storage engine which is fully ACID compliant -- for thirteen years now.

Meanwhile MySQL's default settings have included strict sql_mode (giving a hard error on data type mismatches, overflows, etc) for eight years.

These have been available as non-default options for 20+ years, and larger MySQL users have long enabled them before they were defaults.

Yes, the defaults certainly should have changed earlier, especially re: strict sql_mode, I won't dispute that. However on the default storage engine side, the historical reasoning makes perfect sense: InnoDB and MySQL were owned by different companies. Oracle purchased the makers of InnoDB in 2005 and didn't purchase Sun (incl MySQL) until 2010. Once both products were owned by Oracle, they made InnoDB the default storage engine in the next major MySQL release that same year.

In any case, your claims of "decades of misleading marketing" about transactions is completely fictional. This simply did not happen for "decades" and has not been a discussion point of any sort for a very, very long time. Serious MySQL shops were never using MyISAM at scale in the first place.


[flagged]


> Yeah, good luck with that.

That's a completely nonsensical reply to the contents of my comment. I'm not the same person you were originally responding to up-thread.


[flagged]


[flagged]


Alternatively, I don't have the energy to deal with people pushing misinformation. :/


You chose to come into a thread about MariaDB, you wrote some verifiably factually incorrect statements about MySQL, and then you complain about not having the energy to discuss further?

I took time out of my Sunday to write a fact-based rebuttal to your false claims in this thread. The dates in my comment are all verifiable on Wikipedia, among other places. And yet your response is childish repetition, followed by a citationless comment claiming that my own comment was misinformation.

Since you're a Golang enthusiast (as am I), perhaps I can reframe this in a more relatable way. Imagine that it's the year 2035, and there's an HN thread about Golang. Some Go-hater chimes in that no one should ever trust Go because of decades of misleading marketing ("no one needs generics!"), GOPATH being stupid, and lack of a module dependency system. What would your response to this be in 2035, knowing full well that these issues were far in the past and that the claim of "decades" is at best an extreme exaggeration?


> you wrote some verifiably factually incorrect statements about MySQL

That's the kind of bullshit I mean. I have no energy left to deal with trolls, and that means you.


MySQL was first released in 1995.

A decade is 10 years. Plural decades means 20 or more years.

You claimed MySQL made decades of misleading marketing about transactions. That would mean they were making such statements in 2015 or later.

That is verifiably factually incorrect, considering that InnoDB (a fully ACID-compliant storage engine) was made the default storage engine in 2010, and prominent MySQL users had already switched to InnoDB en masse from 2005 to 2008, some even earlier. The previous storage engine (MyISAM) was garbage that wasn't even crash-safe, so at scale its deficiency was plainly obvious.

I am not posting "bullshit" or "trolling". Your statement was incorrect and this is simple date math.


But now that strict mode is on by default, the situation has improved substantially.


I have 5 million users running on PostgreSQL: 3 dedicated servers (primary + 2 replicas), fewer than 190 connections, vanilla PostgreSQL, and nothing like PgBouncer.


Right. In 2023. This was possible in 2013 with MySQL.


It was possible with PG in 2013 too, with only 190 connections...


> Finally MySQL gets performance in these large setups by turning off features like foreign keys, acid guarantees etc.

Large MySQL users are generally not turning off ACID guarantees. That is nonsense. What settings are you even specifically referring to with this?

On the durability side, large MySQL shops only mess with innodb_flush_log_at_trx_commit if there's a safe distributed durability mechanism in use (e.g. synchronous or semi-synchronous replication). Or perhaps if the data is ephemeral/recomputable, but that's a similar use-case to running Postgres on a ramdisk.
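
For reference, the knob in question, with the documented trade-offs (whether relaxing it is safe depends entirely on the replication setup around it):

    -- 1 (default): write + fsync the redo log at every commit; fully durable.
    -- 2: write at commit, fsync ~once/second; up to ~1s lost on an OS crash.
    -- 0: write + fsync ~once/second; up to ~1s lost even on a mysqld crash.
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;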

On the isolation side, MySQL supports READ UNCOMMITTED (unlike Postgres) but it is very rarely used. Only useful for some specific situations. fwiw MS SQL Server supports it too, it's not a MySQL specific thing.
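
For completeness, selecting it is a one-liner in MySQL, and notably Postgres accepts the same syntax but silently runs it as READ COMMITTED:

    -- Per-session dirty reads in MySQL; rarely what you want outside
    -- of rough, approximate reporting queries.
    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;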


Also, Vitess has foreign key support shipping very soon.


MySQL has unfixable legacy code for scalability; one of the world's top database engineers gave up and asked users to stop using it. https://www.google.com/amp/s/www.theregister.com/AMP/2021/12...

Facebook was built using PHP and MySQL from the start; by the time they had a billion-dollar valuation, it was too late for them to change.


They have reevaluated multiple times.


Considering the source, take it with a grain of salt.


An Oracle engineer who was tasked with working on MySQL optimizations is not a good source?


A single departing employee's opinion is not definitive, no. Considering both YouTube and Facebook use modified MySQL, I'm skeptical that there's some inherent structural disadvantage.

Not to mention that no technical reasoning is given at all, so yes, take the article with a grain of salt.


Yes, Postgres is a toy database that can't do replication correctly. You are not wrong.


Postgres replication features have lagged behind but were never dangerous.

MySQL has multiple edge cases where things that happen on the master will not happen on the slave, depending on which replication method you've selected and which features you're using. MySQL is the issue here.



