Hacker News new | comments | show | ask | jobs | submit login

Can anyone provide a sound argument for choosing MySQL over Postgres?



My general sense is that MySQL is battle tested and operationally sane to manage at scale; I'm talking specifically about replication.

I've been a DBA for thousands of MySQL hosts in production and just a handful of PostgreSQL clusters, so I'm not a PostgreSQL expert by any stretch. These PostgreSQL machines were also inherited from a company we acquired, so some of my negative impressions can likely be explained by them being set up poorly, which sadly seems to be the norm.

PostgreSQL's replication has evolved slowly over the years, and as a result it's idiosyncratic, complicated to get right, and hard to generalize about. There aren't a lot of great setup guides and tutorials on the subject either.

Postgres replication works by shipping WAL (transaction/redo) logs to slaves, and the slaves are in a constant state of DB recovery. Streaming replication does so as writes happen, and file based log shipping copies the logs when the 16MB segment is complete. Ours were set up to use streaming replication, without having a separate file-based archive log host to use to fetch older logs. Whenever the write load became too much and the slaves could no longer keep up with the master WAL logs on the master would expire and replication would break entirely, requiring you to rebuild all your slaves from backup. This was my introduction to PG replication. Apparently PG 9.4 has 'replication slots' or something that make this less of a headache.

WAL logs were also applied without checksums or verification which corrupted the entire replication chain, and as I remember we didn't discover this until the corrupt pages got hit by a query.

Software upgrades with PG seemed to be a huge pain as well, we could never figure out how long they were going to take, and even if the on-disk formats didn't change, somehow all the cardinality stats on our tables disappeared, and all our query plans went to shit until we ran some process to rebuild them, which took days.

Operationally there were some things that made our team angry. We couldn't figure out how to reparent a slave without completely re-cloning it from the new parent, even though it was completely up to date from the authoritative master at the time of the reparenting. Also, do you really have to take down the entire cluster to change max connections on a slave? Many such settings seemed to not be dynamic and must remain in sync across the entire chain. One of the things people seem to love about PG is how correct and proper it is respecting the sanctity of your data. As an ops person I'd much rather deal with slightly inconsistent replicas (common in MySQL) than have to fight with how rigid PG is.


Agreed - replication is the key. There are two key perspectives on what architectural purpose a database serves:

1) As a relational query engine first and foremost. Postgres wins handily here, leaving people wondering "why use Mysql."

2) As a reliable replicated datastore, which perhaps happens to be queried via SQL or some rough approximation thereof.

One item you didn't mention is how the replication chain behaves in the event of a node failure. With Postgres, yes, you can promote a slave to master, but last I checked (~2 years ago) promoting a slave to master would break replication to all other slaves, causing a complete loss of redundancy in your entire cluster. I believe this may have been addressed recently; I know folks were working on it.

Going back slightly further in history, Postgres has always lagged far behind Mysql in terms of replication tools and ability. What Postgres has going for it is a far, far superior SQL environment. Which is fantastic; I absolutely prefer it as a developer. But it can be a complete non-starter operationally and architecturally.


With regards to replication and remastering - that was fixed in 9.3 - https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_...


True, but built-in replication is still the whole installation or nothing. I believe Slony doesn't have this limitation; though, being trigger-based has some other quirks of it's own. Pgpool is another solution I've also used for a while. All have quirks which I don't recall running into with Mysql's replication.


1. It's far more popular so easier to find solutions to problems.

2. Used by very popular websites e.g. Twitter, Facebook, Yahoo so you get projects like http://webscalesql.org to utilise.

3. More options for client side tooling although debatable whether these are better than generic options.

4. MySQL Cluster is a more cohesive solution for clustering.

5. The most likely reason: you have bought something that needs it.

Obviously PostgreSQL has plenty of reasons for choosing it over MySQL.


For me the biggest reason not to change to PostgreSQL was the sad state of pgAdmin. In my opinion it doesn't hold a candle against the MySQL Workbench or the old MySQL tools.

Sure there are commercial options available, but it is really nice to have some good free or open source tools for a database.

Other than that PostgreSQL looks like a nice database, but I didn't find the "killer" argument to change to it. All the nice SQL syntax changes don't mean too much to me, because I am mostly using an ORM to access the database.

Another thing which isn't working as well on PostgreSQL is master - master clustering. All the solutions I found for PostgreSQL looked inferior to Galera (which has its own pitfalls). On the other hand I think it is far better to have one / two beefy servers than having a cluster of master databases.


pgAdmin is a disaster, and honestly all the tooling around PostgreSQL is not in the best shape. I am currently working on a new cross-platform database admin for PostgreSQL and MySQL - Datazenit[0]. It tries to solve a lot of problems other DB tools have and provides a nice UI on top of it. If you are interested, check it out.

[0] - https://datazenit.com


Looks pretty nice. I am not too fond of web-based database management tools, but it looks good.

One thing I'd like to see in this software would be export to Excel directly (not CSV). I am no big fan of CSV (e.g. Excel can't handle UTF-8 text in CSV files) and think that exporting to xlsx is a tremendous time saver. Exporting to SQL dumps is another important feature - but I see that you have this in your tool.


Looks neat, why not open it up for the community? Your site design just begs for a "Fork Me on GitHub" link.


> For me the biggest reason not to change to PostgreSQL was the sad state of pgAdmin. In my opinion it doesn't hold a candle against the MySQL Workbench or the old MySQL tools.

Oh I agree, but for many projects its not really an issue.

With Django, you can use the Django admin for GUI changes and the REPL shell for more programatic changes.

On my Django projects, I've never had to actually write SQL.


You never had to copy data from a test database to the productive database or the other way around ?

I am using MySQL Workbench all the time for tasks like this.


The odd point i would add is that Amazon are betting on it with Aurora. While Aurora is not really MySQL, the fact that they decided upon the MySQL API rather than PostgreSQL really says a lot about the market penetration of MySQL and that its truly got some life left in it yet.

Personally, I love PostgreSQL, it's my preference in terms of a db engine...but i have to agree with a lot of the above comments (excluding pt 4 about cluster - thats a world of trouble in my experience)

http://aws.amazon.com/rds/aurora/


From my understanding, Aurora is a modified MySQL with a few components substantially rewritten and a new, log-structured storage engine added that is well integrated with Amazon's storage infrastructure (volumes auto-grow, quorum is handled at the storage layer, replicas are just caching views on top of shared, replicated storage). MySQL has support for multiple storage engines over a relatively simple API, while Postgres is tightly integrated with its storage engine.

In addition, MySQL just has a lot fewer features and points of extensibility, so it would likely be easier to support.


I see a few down-votes (thats cool...) out of curiosity and to better the discussion would anyone down-voting mind adding to the conversation? Generally curious about peoples thoughts on the MySQL V PostgreSQL situation, in particular how the community is receiving Aurora.


6. A lot more people available on the market with mysql familiarity

7. Wider support by libraries in virtually any language

8. Nearly guaranteed availability in any web facing environment (think about WordPress - if it required Postgres, would it be as popular?)

This isn't to argue superiority in any sort of meaningful technical way - sometimes your data store decision isn't all about the technical bits, unfortunately.


> This isn't to argue superiority in any sort of meaningful technical way - sometimes your data store decision isn't all about the technical bits, unfortunately.

Quite often, actually. I have more than few customers demanding Oracle, because you can't have serious business without Oracle, right?

Then they found out the price and started demanding Oracle XE for production use. Sigh.


My biggest complaint about MySQL, there is no equivalent of pgpool or pgbouncer. There was the dreadful mysql proxy, but it's basically dead. All of the other solutions are commercial, terrible, or both.


This question has been answered very ably by a lot of other people. What I really want to know is why we get a version of this every time there's a thread about MySQL here. I used to use MySQL, now I'm mostly using MS SQL, I could give a dissertation about what I don't like and do like about each relative to the other. When I go into a Postgres thread, I'm mostly looking to learn about Postgres. It feels like whenever Postgres users come into a MySQL thread, it's to score points. It's annoying and it gives a really poor impression of the Postgres community.


Thanks for pointing this out, I agree it's tacky. Unfortunately this sort of thing seems to be part of nerd culture. I know it's gotten bad when I have a serious urge to sidetrack PostgreSQL threads with MySQL related trolling, but participating will only make things worse.

I'm sure PHP (and to a lesser extent GNU Screen) enthusiasts know what this feels like. It's as easy as painting a bike shed to point out ways that these tools suck, they all have some historical baggage and warts.


I switched from MySQL to Postgres. Specifically, we are using CitusDB. The performance we are getting is nothing but staggering, due to MPP.

9.5 is coming with Parallel Sequential Scan (http://www.reddit.com/r/programming/comments/30b7f7/parallel...) and lots of other goodies.

So the future is looking very bright for 9.6+ and beyond.

I don't know of any valid reasons why anyone would switch back.


People here are comparing vanilla MySQL to vanilla Postgres. Both are row-oriented, general purpose relational databases. Citus is specifically optimised for analytics, and is not a drop-in replacement for either MySQL or Postgres. From their FAQ:

"it is a great fit for ad-hoc, real-time analytics on timeseries data, but not ideal as the transactional backbone of your credit card processing system"

I've no experience with Citus, but on the MySQL side I have a lot of experience with Infobright. It might be a better application to compare against Citus. It's single-server, massively faster than MySQL for analytics work, but also comes with a number of caveats that make it unsuitable to be a general purpose database. Great for analytics, though.


Couple of things.

1) CitusDB is not single server. It's multiple server. I have queries which destroy a single MySQL or single Postgres server.

2) Postgres 9.5 comes with using multiple cores. That's DEFAULT. Did you miss the link of Parallel Sequential Scan?

Postgres 9.5 Vanilla is going to out-do MySQL just on this alone.

I know some MySQL guys who wanted to go with Citus but couldnt because it was too expensive. Now looking at the Parallel Sequential Scan, it's showing some major successes. Trust me, once they get this right. 9.6/9.7 could be stuff like Postgres XL that is native. That's right, bye bye MySQL vanilla. It's was nice to know you.


For the record, I think Postgres is an all-round better database than MySQL. I've experience with both in large production deployments.

I still think that's not fair to compare MySQL to Citus DB. That's the point I was trying to make. Citus is a multi-server MPP database for analytics. MySQL doesn't have a player in that game. I would certainly be interested to see a (fair) comparison between Citus, Redshift, Vertica, Teradata, Greenplum, Snowflake etc... but that's a different thing.

Parallel Sequential Scan sounds great, but it's not here yet. When it is, we can see whether it makes Postgres a one-stop solution for both traditional, transactional work as well as analytics (which tends to do better on column-oriented stores)


I don't think it's fair to compare Citus DB to MySQL either.

I never did. I just stated my experience and you READ that as a comparison. Due to me mentioning MPP, I then lead with PSS. See how that works?

I mention MPP and PSS to inform people who are on MySQL and looking to change. I don't think MySQL will come with native MPP or PSS because of licensing issues (I may be wrong).

There is shard query with MySQL which is a percona effort. But there are too many moving parts for a production env.

Finally, I'm not too bothered about other offerings in the MPP space. I'm already paying a license fee and reaping the rewards. It's good enough for me and was able to hit the ground running very very quickly.


> I switched from MySQL to Postgres. Specifically, we are using CitusDB. The performance we are getting is nothing but staggering, due to MPP.

You are talking about switching between them and even concluded, "I don't know of any valid reasons why anyone would switch back."

It might not be what you meant, but to me, it also reads as if you're comparing MySQL to CitusDB.


The TokuDB engine for MySQL gives you amazing compression and insert performance compared to Postgres, which is useful for certain workloads.


Finding people with Postgresql experience is still hard in some areas. MySQL is a well understood product, so despite it's quirks you can almost always get help or buy support. You're almost certain that what ever technology you want to use in conjunction with MySQL, there's support for it.

While it does happen, it's also very rare that people lose data because of MySQL.


In-house experience. You can always develop new experience, but if you have a lot of people who have already been cut by MySQL enough times to know it well...


* Third-party frameworks that have been hyperoptimised for MySQL, and the PG support is half a volunteer. e.g. MediaWiki, WordPress, Magento, Drupal.

* that's all.


sometimes you just want to dump a damn easy website or utility, just a few clicks a couple queries and you are done, something for processing data, sometimes you need a little app for you...

I use it for almost everything... Throw navicat, mysql, google refine, python, r, emacs, some curls and terminal magic, and you can do magic mining, curing altering or whatever you want to do with data.


I know only three reasons for that:

1) MySQL is bundled in so many low cost PHP hosting services that it is an almost forced choice if you're on a tight budget and presumibly at a very small scale. But anything would do there, maybe even a SQLite backend. Furthermore you usually don't have to really "know" the database you're using if all you do are basic selects, inserts and updates. Chances are that your code will work on any DB and MySQL will be perfectly fine for that kind of applications.

2) At large scale, setting up a cluster of MySQL servers is a more established practice than setting up a cluster of PostgreSQL servers. This is slowly changing as replication features are added to PostgreSQL's core and BDR will be fully developed for 9.5 or 9.6 but it will take time before you find as many experts of PostgreSQL than you have for MySQL.

3) The diagramming tool of MySQL Workbench is far better that anything you can get for free for PostgreSQL. The closest match is Valentina Studio which is not as usable.

I can provide plenty or arguments for not using MySQL though. I'll leave them to a search for why mysql sucks but I want to add some of mine. I'm migrating a project from PostgreSQL to MariaDB (drop in compatible with MySQL). The customer made the choice not because of technical reasons and I don't think I can get into the details. The annoying facts we are facing:

1) A whole mess with TIMESTAMPS that is getting sorted out only with MySQL 5.6 and MariaDB 10's switches for 5.6 compatibility http://shankargopal.blogspot.it/2013/03/mysql-566-timestamp-... ("In MySQL versions prior to 5.6, behavior of columns with default values for the TIMESTAMP column type is pretty much nonstandard [...]")

2) A pretty horrible procedural language adding complications over the PostgreSQL one and lacking some features we gave for granted. We must do some processing inside the database not to move whole tables to the application server so we have a few stored procedures. What used to be a < 20 lines procedure in PostgreSQL is becoming a 50+ lines little monster in MySQL/MariaDB because, for example, they can't return directly a resultset unless it comes straight from a SELECT. We're doing some processing on those SELECTs so we must create a temporary table, insert into it and select from it at the end. Ah, inserting 1100 records (INT, VARCHAR) into that temporary table took 40 seconds. We googled around and discovered that creating it with ENGINE=MEMORY fixes the problem. But even writing that to disk... 40 seconds?

3) The lack of row_number also gave us some problems in grouping records on some columns and picking the ones with the most recent timestamp for each group. See http://stackoverflow.com/questions/1895110/row-number-in-mys... for examples. We built our query over this answer http://stackoverflow.com/a/9652359 In general, MySQL has a pattern of being the only database among the big relational ones that doesn't have feature X or supports it in a non standard way.

4) MySQL's EXPLAIN doesn't seem as informative as PostgreSQL's EXPLAIN ANALYZE but maybe we're still missing something.

Summing it up, IMHO MySQL is OK for INSERT/SELECT/UPDATE applications with simple queries but this experience is confirming my years long policy of trying to steer clear from it.

Edit:

5) If you're using Ruby, the mysql driver (mysql2) could be slower than the PostgreSQL one (pg). I have a seeding script written in Ruby and it got unbeliveably slow with MySQL (20 times as slow). I opened an issue and you find the details there https://github.com/brianmario/mysql2/issues/623 One of the mysql2 maintainers has been kind and helped me to fix the problem in this particular scenario (grouping inserts together) but it seems that the PostgreSQL driver is more efficient at sending data to the DB (it uses prepared statements, mysql2 doesn't yet) so your application could have a performance boost by using PostgreSQL over MySQL. Be careful: this problem is Ruby specific, most probably not due to the database, and probably it's workload specific (you could say that Ruby's slow anyway, but still, when nearly all of what you do it talking with the database...)


Hiring feasibility.


How about... MySQL runs some of the most popular websites on the internet. I know folks on HN like to nerd rageā„¢ over the newest/obscure technology (see rust, see ocaml, see erlang), but the reality is, what works, what has mindshare, and what is easy to scale with a team.


PostgreSQL predates MySQL by some 10 years. Or 15 years if you start counting from InnoDB, MySQL was really a toy before that.


Erlang isn't really new...


Moreover, HN and the computer industry in general systematically ignored OCaml for many years (I'm happy to see its recent resurgence in popularity [which is to say, people are more likely to have heard the word before], but I do find it kind of puzzling).


>> newest/obscure technology

> Erlang isn't really new...

But is it (comparatively) obscure?




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: