What makes something better than another things is a whole host of things, not just highlighting one really bad case from the other side. I'm sure if the MySQL world wanted they could point out ways that MySQL is so much better than Postgres. Just look back at how long it too for us to get upsert, we're finally getting a better replication story in recent years. To take this one case and highlight it doesn't put the Postgres community in a positive light and it's a shame, because most I know within the community value good engineering work.
MySQL has had a lot of people that contribute to it and has some good things about it, it might be worthwhile for us to pay more attention to where they are better and just focus on improving Postgres and leaving things at that.
Well, I googled around and couldn't find anything conclusive, so I hopped into the / channel for postgres and asked about it. "Is there any straightforward replication solutions for Postgres?"
It was like the early days of open source all over again. The first thing they did was challenge my needs. "Why do you need replication in the first place?" "Replication isn't a panacea you know." "Postgres doesn't have the issues that MySQL does that makes replication necessary." "Replication doesn't work well in the first place."
A ton of people jumped into what had been a quiet channel to make sure I knew that replication, which postgres didn't have, was something I didn't need in the first place, most of which happened before they even knew why I was asking.
After sorting through all that, a few people suggested "solutions". "Just have a cronjob rsync your data directory over to another machine every minute, and if your first server dies then you can just start postgres on the second one." Other solutions were much more convoluted. Patch postgres, add this software, manage it all yourself. And through it all, they kept telling me that I didn't need replication anyway, "but if you INSIST…".
In comparison, the MySQL channel on freenode was staffed with MySQL employees and volunteers who were always happy to help you solve your problem while teaching you what you were missing and giving you resources you needed. They'd answer questions, help your configuration, and even fix your SQL queries for you, all without being condescending about it or making you feel like an idiot.
And that's why I gave up on Postgres: the community. Postgres's was ridiculously hostile and self-important, verging on insecure and defensive. MySQL's was helpful, friendly, and full of resources. To this day I still hear great things about postgres, and I know it does a lot of things better than MySQL, but whenever someone writes a self-superior blog post about it I always think back to how much easier my life was by using a database where I could ask questions of the community and get useful answers.
Not to derail the thread too much, but this is exactly what's going on with Go and generics right now (and I'm a Go fan too.) "Why do you need generics?" "Generics don't solve all your problems" "Generics don't work well" "No language does generics perfectly".
Sorry to hear that your interactions with the community was not great. My experience is quite different - perhaps I was lucky, or maybe it's because I started contributing to the project, not sure. But I stuck with PostgreSQL for exactly the same reason why you left it - the community.
I'm sure there are still people who respond in the "RTFM" or "Try rebooting it" style, but I doubt we're alone in that. One thing that is still generally disliked on the mailing lists are people treating it as unpaid production support ("Hey, I've started using your open source database, and not it failed and the production DB is down. Help me! Pronto!"). And I kinda share that opinion.
Perhaps it's worth giving the community another try? There's loads of new people contributing to it, so hopefully a more diverse audience.
Regarding the replication, before the built-in replication was added to PostgreSQL 9.0, the expectation was that replication will be implemented as a plugin, using some sort of internal API. That's kinda how PostgreSQL does stuff - it's extremely extensible (functions, data types, operators, indexes, ...) so the development community expected that to be the case for replication too.
But the API never materialized, because no one knew how it should look, because no one attempted to implement a plugin using a non-existent API. Chicken-egg problem. That changed though, and nowadays the built-in replication is considered a great feature.
Interestingly enough, the logical replication added gradually over the several recent releases (and particularly in PostgreSQL 10) is implemented using the API approach, and it worked nicely in this case.
Of course, this does not make your past experience any better, but hopefully it illustrates that the development community learned from that.
You probably can't buy support of that caliber, no matter how much money you throw at it.
YM, as always, MV.
You're underestimating how good support can be. Postgres doesn't actually have a primary sponsoring company so good (and equivalent) talent is widely available from multiple vendors.
EDIT: It turns out there is https://www.postgresql.org/docs/9.1/static/warm-standby.html and this seems to accomplish the same goal without actually being replication, so what gives?
As someone who's had to configure warm standby recently, it's fine, but it's not active-active, which infuriated a few Oracle DBA colleagues. The best way to do that at the moment is by using a proxy like PgPool II.
Coincidentally, using those two features together on MySQL can have fun results:
"Because the results of INSERT ... SELECT statements depend on the ordering of rows from the SELECT and this order cannot always be guaranteed, it is possible when logging INSERT ... SELECT ON DUPLICATE KEY UPDATE statements for the master and the slave to diverge. (...) An INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe. (Bug #11765650, Bug #58637)."
> Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using MIXED mode
The problem you're writing about is not related to MySQL itself, it's a problem implicit in statement-based replication. Row/mixed mode replication has been available in MySQL since 5.1 (almost 10 years ago).
DBAs must know how replication works, as RDBMSs are not toys. It's naive to think that replication in other RDBMs itjustworks! - see, for example, the replication bug that caused slave corruption to the famous transportation company.
The choice to raise a warning rather than stopping replication is arguable, but it's far from being "insanity".
I assume you mean Uber, and the replication bug they ran into on 9.2. Isn't it a bit strange comparing a bug (that got fixed promptly after it was reported) to implementing inherently unsafe replication mode?
In any case, there are two issues described in my post.
I've spent a few minutes after a Google search; results follow.
Bug #11765650: this was fixed 5 and half years ago; see https://lists.mysql.com/commits/143383;
Bug #58637: see https://bugs.mysql.com/bug.php?id=58637 -
this is not a bug; statement-based replication is obsolete, and it's superseded by mixed-mode - if users insist on using the former, they're asking for trouble.
Even today, the "16.2.1 Replication Formats" page says nothing about it being obsolete, or provide indication of level of danger in choosing it. It's like walking on a minefield where the markers are camouflaged.
Insanity. Also pretty much the reason I stopped using MySQL. I am not a DBA. A very small part of my job is actually interacting with the database, and I have to trust that statement execution will be consistent in all cases. That's something PostgresQL has been able to give for the past 10 years.
MySQL is still not there yet, despite being backed by a "database" company with over $30 billion in revenue per year. If Oracle, with all their domain experience and reputation incentive can't get MySQl into a sane and stable state, then its fundamentals must be built on sand.
Furthermore, MySQL manual actually has "Restrictions and Limitations" (which is an interesting read too), yet this "unsafe" nature of "INSERT ... ON DUPLICATE KEY UPDATE" is not mentioned there.
I'd say that if PostgreSQL learned something from MySQL, it's "Never accept feature that is incompatible with existing features, even if it's promised to be fixed soon."
I'm pretty sure no database engine guarantees the ordering of un-ordered SELECT statements.
Many here discussed why it wasn't a good argument and that Uber just didn't know what they were doing.
What is the reddit thread that almost broke them? This one I think: https://www.reddit.com/r/pics/comments/92dd8/test_post_pleas...
From memory, they do something different for just that thread, like loading it from a different cache. I am sure that is all sorts of "wrong", but being "right" and not scaling seems worse!
Postgres is good because we have no open bug tracker whatsoever and we're always adding new features!
I can't figure out who this article is for - certainly not DBAs!
The article picks an extreme example and uses insinuation and condescencion to denigrate MySQL based on one example, but it goes further than that. For example:
> Maybe its wrong to start a flame war on this
So he knows that's what he's doing, and he's okay with that.
> but I need to say, at a personal level, that this is exactly the reason why I have spent years contributing to PostgreSQL and never once contributed to MySQL.
Maybe, but if I wanted to contribute to Postgres where would I start? There's no bug tracker to look through. With the MySQL family, there is.
Also, I'm willing to bet that most people who've contributed to MySQL haven't contributed to Postgres and vice-versa, so to act as though you're doing it from some kind of principled stance rather than just practicality makes no sense to me.
This entire blog post is a giant attack ad against MySQL for the purpose of making it look bad. "MySQL left a bug unfixed for 14 years while its users' data slowly corrupted. Vote Postgres for RDBMS, and I promise to keep your auto_increment fields consistent, day in and day out."
> The article picks an extreme example and uses insinuation and condescencion to denigrate MySQL based on one example, but it goes further than that. For example:
Yeah, I share this opinion. I may agree with some points Simon makes in the blog post, but I certainly dislike how it's presented.
> Maybe, but if I wanted to contribute to Postgres where would I start? There's no bug tracker to look through. With the MySQL family, there is.
I really doubt bug tracker is where people start their contributions. I mean, you don't go to a bug tracker, because (a) if it's broken someone else is probably already working on fixing it, and (b) the unsolved issues are rather complex and not quite suitable for new contributors.
There's actually a bunch of pages on the wiki that might help you:
* https://wiki.postgresql.org/wiki/TODO (a nice list of development topics, including some quite simple ones)
I'd recommend picking a feature that matters to you and either add it (if it's missing) or improve it in some way. That's how I started contributing - I improved a bunch of stuff in the internal statistics tracking, because I needed that. I improved a bunch of performance regressions, because they were affecting the systems in our production systems. And so on.
Then start talking to people on pgsql-hackers. Send a patch, review patches from other people in the commitfests.
Sometimes I don't understand the HN community. There is so much good stuff on this page, but at other times things that certainly don't deserve any attention make it to front page.
In the end, felt like click-baited
MySQL has a bug (eg. #199) or some deficiency (no subselects, no transactions, incorrect utf-8 support). The community's solution is a workaround (Here's how to restructure your select so that you don't need a subselect. Maybe you could use UUIDs instead of integer keys). Meanwhile the problem never existed in Postgres, which seemed to have been correctly and completely implemented from the start. After several years the issue is fixed in MySQL and the boosters ask that the score be reset and the clock re-started.
Postgres has its deficiencies and bugs but other than replication they never seemed core to the product in the way that they were for MySQL.
 These deficiencies span several decades and might seem unfair to bring up but I want to show that what the article is highlighting is just another round of the same problem.
The only small advantage of MySQL I can think of (which is a great downside as well) is that it's really tolerant towards all sorts of erroneous usage and data.
The following should be a problem but are accepted by MySQL
- missing group by clauses
- all of the `ignore` commands (insert, alter table,
- incorrect dates
- not enforcing foreign key constraints
If I remember right, though, mostly because their use pattern was more like a key value store, and MySQL worked better for that pattern.
Well, it's also quite a misleading read as the numerous rebuttals will point out:
really uber should not taken as an example, especially since they failed so hard with their databases in the past.
(I'm a Postgres fan, yes, but I think that there a valid reasons to migrate to MySQL (there are several other companies who use mysql, etc but Uber's stuff is just plain FUD))
So... not in all cases
SET FOREIGN_KEY_CHECKS = 0;
INSERT INTO ...
SET FOREIGN_KEY_CHECKS = 1;
begin; alter table t2 disable trigger all;
insert into t2 (t1_id) values (20);
alter table t2 enable trigger all;
SET session_replication_role TO 'replica';
And we'll all try and forget this feature exists when we're done!
My intuition is that, if there's anything MySQL does uniquely well on the "being an ACID datastore" front, then this would let you get the best of both worlds—essentially letting some of your Postgres tables use MySQL's storage engines et al.
FWD are a great thing when you need to occasionally access remote systems, but it's not a drop-in replacement for storage.
For example, how exactly would you do consistent backups?
update foo set a = b, b = a
In MySQL, it logically does the assignments left to right, not evaluating an assignment's right hand sides until it gets to that assignment, and that evaluation uses the updated values from the already evaluated assignments.
In Python terms, it is supposed to be like this:
a, b = b, a
a = b
b = a
This seems to be the canonical list.
The claim that PostgreSQL does not have a bug tracker is not entirely accurate - there is no bugzilla/debbugs/...-style bugtracker (although it's a recurrent topic in discussions), but we do have pgsql-bugs mailing list. Which is mostly fine, as long as you fix bugs promptly and don't keep unsolved bugs lying around for years.
If you need to know what was fixed in a particular release, see either the release notes (high-level summary), or commits in the stable branches (those are typically just bugfixes, and the commit messages usually include a link to the relevant pgsql-bugs thread).
* careful patch review process - Sometimes it's a bit grueling, and it takes time to get stuff in, but it's extremely valuable. Not only for finding bugs but alternative approaches to implementing the feature.
* careful testing - It's a natural part of the review process (Does the patch have tests? Can I come up with something weird that makes it fail?). Of course, there's a bunch of machines with very different configurations (OS, hw resources, ...) that are generally quite good in finding race conditions, incorrect assumptions in the code, ...
* bugfixing - Someone reported a bug on pgsql-bugs? Challenge accepted. Customer reported a bug internally? Better fix it ASAP.
A mailing list hardly seems like an efficient method for tracking bugs unless a major part of the efficiency is reducing signal to noise by make it more difficult for non-developer or non-paying users to file bugs
I can't speak for other devs contributing to the project, but I think the mailing list is a great way to do this. Ultimately, what you want is a simple way to track information about reported issues, discuss the details, make it reasonably simple to search for existing issues, and post patches fixing the issue. A mailing list does all of that, without having to deploy (and maintain and curate) a separate system.
Why do you think it's more difficult to report a bug this way, compared to a bug tracker? There's even a "report a bug" link on the project home page, where you can submit a bug report through a simple web form (you need a community account, but every other bug tracker I'm aware of requires that too).
Can you explain why you think a mailing list is less efficient than a bug tracker?
With a mailing list, I can work on the most recent problems, and with my memory, I can work on the problems that most stand out in my memory (either because they affect me personally, or the process of solving them has attracted my background attention). Neither of those seem like good ways to prioritize things.
But with a bug tracker, I can look for old problems that have not had attention paid to them, and pay attention to them.
I certainly agree a mailing list is a good approach, and many bug-tracking systems operate essentially around mailing lists: RT adds a tracking number to the subject line, and Debbugs essentially creates one mailing list per bug.
As an end-user the first thing I want to do before reporting a bug is search and see if it has already been reported or if it is being worked on and when it is expected to be released. If I was told that I had to dig through mailing list I would probably just stop before I even started.
> Can you explain why you think a mailing list is less efficient than a bug tracker?
As a bug reporter, I want a friendly way efficiently search and filter existing bugs based on status, version, subsystem, etc. I also want to be able to subscribe to a specific bug to get updates. Maybe you guys have some way to get all those features from a mailing list, but new users certainly won't have a clue.
As a developer I want to be able to easily tag, filter, assign and triage bugs. Assign priorities and associate them with releases and pull requests. I also want to be able to do high level reporting.
Reasonable sounding theory. How true is it, how significant?
Does that story describe how 2ndquadrant gets business -- other vendors have customers asking for an SLA on PostgreSQL fixes, which would require PostgreSQL experts to meet?
I wonder how many private forks of MySQL had fixed that and never upstreamed a patch.
So the main point why postgres is better is that they don't use a bug-tracking system? Definitely that makes any qualified comparison impossible of such claims as the better "time to fix" or bug responsiveness. IMHO this post is trying to sell a weakness as a strength.