I've been a DBA responsible for thousands of MySQL hosts in production and just a handful of PostgreSQL clusters, so I'm not a PostgreSQL expert by any stretch. These PostgreSQL machines were also inherited from a company we acquired, so some of my negative impressions can likely be explained by them being set up poorly, which sadly seems to be the norm.
PostgreSQL's replication has evolved slowly over the years, and as a result it's idiosyncratic, complicated to get right, and hard to generalize about. There aren't a lot of great setup guides and tutorials on the subject either.
Postgres replication works by shipping WAL (transaction/redo) logs to slaves, which run in a constant state of DB recovery. Streaming replication ships WAL as writes happen; file-based log shipping copies each log once its 16MB segment is complete. Ours were set up to use streaming replication, without a separate file-based archive host to fetch older logs from. Whenever the write load became too much and the slaves could no longer keep up, the WAL logs on the master would expire and replication would break entirely, requiring you to rebuild all your slaves from backup. This was my introduction to PG replication. Apparently PG 9.4 has 'replication slots' that make this less of a headache.
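For what it's worth, replication slots (added in PG 9.4) make the master retain WAL until every attached standby has consumed it, which avoids exactly this expired-WAL failure mode. A minimal sketch, with made-up slot and host names (9.4-era recovery.conf syntax; these settings later moved into postgresql.conf in PG 12):

```
-- on the master:
SELECT pg_create_physical_replication_slot('standby1');

# in the slave's recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
primary_slot_name = 'standby1'
```

The trade-off is that a dead standby left attached to a slot will make the master retain WAL indefinitely, so slots need monitoring of their own.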
WAL logs were also applied without checksums or verification, so corruption propagated down the entire replication chain, and as I remember we didn't discover this until a query hit the corrupt pages.
Software upgrades with PG seemed to be a huge pain as well: we could never figure out how long they were going to take, and even when the on-disk format didn't change, all the cardinality stats on our tables somehow disappeared and all our query plans went to shit until we ran some process (ANALYZE, presumably) to rebuild them, which took days.
Operationally there were some things that made our team angry. We couldn't figure out how to reparent a slave without completely re-cloning it from the new parent, even though it was completely up to date with the authoritative master at the time of the reparenting. Also, do you really have to take down the entire cluster to change max_connections on a slave? Many such settings seemed not to be dynamic and had to remain in sync across the entire chain. One of the things people seem to love about PG is how correct and proper it is in respecting the sanctity of your data. As an ops person, I'd much rather deal with slightly inconsistent replicas (common in MySQL) than have to fight with how rigid PG is.
1) As a relational query engine first and foremost. Postgres wins handily here, leaving people wondering "why use MySQL at all."
2) As a reliable replicated datastore, which perhaps happens to be queried via SQL or some rough approximation thereof.
One item you didn't mention is how the replication chain behaves in the event of a node failure. With Postgres, yes, you can promote a slave to master, but last I checked (~2 years ago) promoting a slave to master would break replication to all other slaves, causing a complete loss of redundancy in your entire cluster. I believe this may have been addressed recently; I know folks were working on it.
Going back slightly further in history, Postgres has always lagged far behind MySQL in terms of replication tools and capability. What Postgres has going for it is a far, far superior SQL environment. Which is fantastic; I absolutely prefer it as a developer. But it can be a complete non-starter operationally and architecturally.
2. Used by very popular websites, e.g. Twitter, Facebook and Yahoo, so you get projects like http://webscalesql.org to utilise.
3. More options for client side tooling although debatable whether these are better than generic options.
4. MySQL Cluster is a more cohesive solution for clustering.
5. The most likely reason: you have bought something that needs it.
Obviously PostgreSQL has plenty of reasons for choosing it over MySQL.
Sure there are commercial options available, but it is really nice to have some good free or open source tools for a database.
Other than that PostgreSQL looks like a nice database, but I didn't find the "killer" argument to change to it. All the nice SQL syntax changes don't mean too much to me, because I am mostly using an ORM to access the database.
Another thing which doesn't work as well on PostgreSQL is master-master clustering. All the solutions I found for PostgreSQL looked inferior to Galera (which has its own pitfalls). On the other hand, I think it is far better to have one or two beefy servers than a cluster of master databases.
 - https://datazenit.com
One thing I'd like to see in this software would be export to Excel directly (not CSV). I am no big fan of CSV (e.g. Excel can't handle UTF-8 text in CSV files) and think that exporting to xlsx is a tremendous time saver. Exporting to SQL dumps is another important feature - but I see that you have this in your tool.
Oh I agree, but for many projects it's not really an issue.
With Django, you can use the Django admin for GUI changes and the REPL shell for more programmatic changes.
On my Django projects, I've never had to actually write SQL.
I am using MySQL Workbench all the time for tasks like this.
Personally, I love PostgreSQL; it's my preference in terms of a DB engine... but I have to agree with a lot of the above comments (excluding pt 4 about MySQL Cluster - that's a world of trouble in my experience).
In addition, MySQL just has a lot fewer features and points of extensibility, so it would likely be easier to support.
7. Wider support by libraries in virtually any language
8. Nearly guaranteed availability in any web facing environment (think about WordPress - if it required Postgres, would it be as popular?)
This isn't to argue superiority in any sort of meaningful technical way - sometimes your data store decision isn't all about the technical bits, unfortunately.
Quite often, actually. I have more than a few customers demanding Oracle, because you can't have a serious business without Oracle, right?
Then they found out the price and started demanding Oracle XE for production use. Sigh.
I'm sure PHP (and to a lesser extent GNU Screen) enthusiasts know what this feels like. It's as easy as painting a bike shed to point out ways that these tools suck, they all have some historical baggage and warts.
9.5 is coming with Parallel Sequential Scan (http://www.reddit.com/r/programming/comments/30b7f7/parallel...) and lots of other goodies.
So the future is looking very bright for 9.6+ and beyond.
I don't know of any valid reasons why anyone would switch back.
"it is a great fit for ad-hoc, real-time analytics on timeseries data, but not ideal as the transactional backbone of your credit card processing system"
I've no experience with Citus, but on the MySQL side I have a lot of experience with Infobright. It might be a better application to compare against Citus. It's single-server, massively faster than MySQL for analytics work, but also comes with a number of caveats that make it unsuitable to be a general purpose database. Great for analytics, though.
1) CitusDB is not single-server, it's multi-server. I have queries which destroy a single MySQL or single Postgres server.
2) Postgres 9.5 comes with support for using multiple cores. That's DEFAULT. Did you miss the link about Parallel Sequential Scan?
Postgres 9.5 Vanilla is going to out-do MySQL just on this alone.
I know some MySQL guys who wanted to go with Citus but couldn't because it was too expensive. Now Parallel Sequential Scan is showing some major successes. Trust me, once they get this right, 9.6/9.7 could offer Postgres-XL-style scale-out natively. That's right, bye bye MySQL vanilla. It was nice knowing you.
I still think it's not fair to compare MySQL to CitusDB. That's the point I was trying to make. Citus is a multi-server MPP database for analytics; MySQL doesn't have a player in that game. I would certainly be interested to see a (fair) comparison between Citus, Redshift, Vertica, Teradata, Greenplum, Snowflake etc., but that's a different thing.
Parallel Sequential Scan sounds great, but it's not here yet. When it is, we can see whether it makes Postgres a one-stop solution for both traditional transactional work and analytics (which tends to do better on column-oriented stores).
I never did. I just stated my experience and you read that as a comparison. Because I mentioned MPP, I then led with PSS. See how that works?
I mention MPP and PSS to inform people who are on MySQL and looking to change. I don't think MySQL will come with native MPP or PSS because of licensing issues (I may be wrong).
There is shard query with MySQL which is a percona effort. But there are too many moving parts for a production env.
Finally, I'm not too bothered about other offerings in the MPP space. I'm already paying a license fee and reaping the rewards. It's good enough for me and was able to hit the ground running very very quickly.
You are talking about switching between them and even concluded, "I don't know of any valid reasons why anyone would switch back."
It might not be what you meant, but to me, it also reads as if you're comparing MySQL to CitusDB.
While it does happen, it's also very rare that people lose data because of MySQL.
* that's all.
I use it for almost everything... Throw in Navicat, MySQL, Google Refine, Python, R, Emacs, some curls and terminal magic, and you can do mining, curating, altering or whatever you want to do with data.
1) MySQL is bundled into so many low-cost PHP hosting services that it is an almost forced choice if you're on a tight budget and presumably at a very small scale. But anything would do there, maybe even a SQLite backend. Furthermore, you usually don't have to really "know" the database you're using if all you do are basic selects, inserts and updates. Chances are that your code will work on any DB, and MySQL will be perfectly fine for that kind of application.
2) At large scale, setting up a cluster of MySQL servers is a more established practice than setting up a cluster of PostgreSQL servers. This is slowly changing as replication features are added to PostgreSQL's core and BDR will be fully developed for 9.5 or 9.6, but it will take time before you find as many PostgreSQL experts as you do MySQL experts.
3) The diagramming tool of MySQL Workbench is far better than anything you can get for free for PostgreSQL. The closest match is Valentina Studio, which is not as usable.
I can provide plenty of arguments for not using MySQL though. I'll leave them to a search for "why mysql sucks", but I want to add some of my own. I'm migrating a project from PostgreSQL to MariaDB (drop-in compatible with MySQL). The customer made the choice not for technical reasons, and I don't think I can get into the details. The annoying facts we are facing:
1) A whole mess with TIMESTAMP columns that is only getting sorted out with MySQL 5.6 and MariaDB 10's switches for 5.6 compatibility: http://shankargopal.blogspot.it/2013/03/mysql-566-timestamp-... ("In MySQL versions prior to 5.6, behavior of columns with default values for the TIMESTAMP column type is pretty much nonstandard [...]")
2) A pretty horrible procedural language that adds complications over the PostgreSQL one and lacks some features we took for granted. We must do some processing inside the database to avoid moving whole tables to the application server, so we have a few stored procedures. What used to be a < 20 line procedure in PostgreSQL is becoming a 50+ line little monster in MySQL/MariaDB because, for example, they can't return a resultset directly unless it comes straight from a SELECT. We're doing some processing on those SELECTs, so we must create a temporary table, insert into it, and select from it at the end. Ah, inserting 1100 records (INT, VARCHAR) into that temporary table took 40 seconds. We googled around and discovered that creating it with ENGINE=MEMORY fixes the problem. But even writing that to disk... 40 seconds?
3) The lack of row_number() also gave us some problems grouping records on some columns and picking the one with the most recent timestamp for each group. See http://stackoverflow.com/questions/1895110/row-number-in-mys... for examples. We built our query on this answer: http://stackoverflow.com/a/9652359
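For readers hitting the same wall: the usual workaround for the missing row_number() is a greatest-n-per-group self-join that keeps only rows with no newer sibling in the same group. A minimal sketch under stated assumptions (table and column names are made up, and SQLite stands in for MySQL; the SQL itself uses only features both support):

```python
import sqlite3

# Hypothetical table: per-sensor readings; we want the latest row per sensor.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL);
    INSERT INTO readings VALUES
        ('a', 1, 10.0), ('a', 2, 11.0),
        ('b', 1, 20.0), ('b', 3, 23.0);
""")

# Self left-join: a row survives only if no row in the same group has a
# larger timestamp, i.e. the join found no match and `newer.ts` is NULL.
rows = conn.execute("""
    SELECT r.sensor, r.ts, r.value
    FROM readings r
    LEFT JOIN readings newer
           ON newer.sensor = r.sensor AND newer.ts > r.ts
    WHERE newer.ts IS NULL
    ORDER BY r.sensor;
""").fetchall()
print(rows)  # → [('a', 2, 11.0), ('b', 3, 23.0)]
```

The same shape works in MySQL; ties on the timestamp return multiple rows per group, so a unique tiebreaker column is needed if that matters.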
In general, MySQL has a pattern of being the only database among the big relational ones that doesn't have feature X or supports it in a non standard way.
4) MySQL's EXPLAIN doesn't seem as informative as PostgreSQL's EXPLAIN ANALYZE, but maybe we're still missing something.
Summing it up, IMHO MySQL is OK for INSERT/SELECT/UPDATE applications with simple queries, but this experience is confirming my years-long policy of trying to steer clear of it.
5) If you're using Ruby, the MySQL driver (mysql2) can be slower than the PostgreSQL one (pg). I have a seeding script written in Ruby and it got unbelievably slow with MySQL (20 times as slow). I opened an issue and you can find the details there: https://github.com/brianmario/mysql2/issues/623
One of the mysql2 maintainers was kind enough to help me fix the problem in this particular scenario (by grouping inserts together), but it seems that the PostgreSQL driver is more efficient at sending data to the DB (it uses prepared statements, which mysql2 doesn't yet), so your application could get a performance boost by using PostgreSQL over MySQL.
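The grouping-inserts fix is worth sketching because it applies beyond Ruby: issuing one statement per row pays per-statement overhead 1100 times, while a batched prepared statement pays it once. A rough illustration (SQLite stands in for MySQL here, and the table name is made up; the real win over a network connection is much larger than in-process SQLite shows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seeds (id INTEGER, name TEXT)")
rows = [(i, "name-%d" % i) for i in range(1100)]

# Row-by-row: the slow pattern the seeding script started with.
for r in rows:
    conn.execute("INSERT INTO seeds VALUES (?, ?)", r)
conn.execute("DELETE FROM seeds")

# Batched: one prepared statement reused for all 1100 rows.
conn.executemany("INSERT INTO seeds VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM seeds").fetchone()[0]
print(count)  # 1100
```

Wrapping either variant in a single transaction (rather than autocommitting each row) is the other common fix for slow seeding scripts.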
Be careful: this problem is Ruby-specific, most probably not due to the database, and it's probably workload-specific (you could say that Ruby's slow anyway, but still, when nearly all you do is talk to the database...)
> Erlang isn't really new...
But is it (comparatively) obscure?