Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What are pros and cons of PostgreSQL and MySQL?
186 points by stanhou on Apr 9, 2017 | hide | past | web | favorite | 210 comments
I searched the web but most of the articles are written several years before. Many cons have been fixed.

From many years of hosting both - MySQL seems easier at first but there are so, so many problems with its internal design that hide problems until its too late. PostgreSQL in my eyes is engineered in a much smarter way and has sensible defaults. PostgreSQL won't implement something unless it's _going to work_ and work properly. MySQL is analogous to software like Drupal and 'hobbiest' projects that have grown organically without proper engineering - and when it's too late you realise you're missing data, MySQL makes it easy to do things the wrong way, PostgreSQL enforces you do things the right way. We've also found that PostgreSQL performance far outshines MySQL as long as you've setup basic things like shared buffer sizes etc correctly.

One thing I've heard said before that always seemed true to me:

PostgreSQL was built with their priorities being: correctness first, then features, then performance.

MySQL was built with their priorities being: performance, then features, then correctness.

I'd agree except that PostgreSQL performance seems to have always won over MySQL in non-synthetic tests regardless, perhaps just through good quality engineering they haven't needed to 'worry' about performance until later.

No, when MySQL didn't have transactions, it was the fastest "database" for writes!

Also, as somebody who's being very careful with which fields I update, keeping in mind indexes, I'm still in shock about learning that PostgreSQL updates all indexes on a table, even for columns not included in that write - that ha to have a cost.

If you have a table that undergoes frequent updates and this is a concern of yours, look into setting a fillfactor on it. This will help Postgres keep writes in the same page, which means it doesn't need to touch indexes for non-indexed columns.

Edit: non-updated columns, I mean.

> when MySQL didn't have transactions, it was the fastest "database" for writes!

`/dev/null` is the fastest "database" for writes ;)

They are working to fix that. It didn't make it into PG10 but the developers are aiming for PG11.

I know, it just was so unexpected. Very few people realised that this is how it worked until some big blog post a few months ago.

That wasn't always the case, but it's easier to start with something correct and make it fast than it is to start with something fast and make it correct.

You can always find (pretty basic) examples where either one is faster, and sometimes by a lot.


Where do you draw the line on synthetic vs. non-synthetic tests, and do you know where I should look to find them?

Performance a priority for MySQL? Just moments ago I spoke with some friends about their database performance problems with big joins on big datasets in MySQL. Switching to PostgreSQL solved the problem. And for a customer, simply upgrading from MySQL to MariaDB improved the performance 5-fold after just a 15 minute install.

> 've also found that PostgreSQL performance far outshines MySQL as long as you've setup basic things like shared buffer sizes etc correctly.

Interesting, I've found the exact opposite to be true the last time I looked. Most comparisons which I've looked at which have shown PG to be faster are usually making really basic mistakes in scaling up the MySQL config (which is, admittedly, not a trivial task, but there are a lot of resources out there).

Can you point to any benchmarks or usage patterns which show PG to be the better performer?

Uber made a pretty good case for mysql and against postgres when it comes to indexing: https://eng.uber.com/mysql-migration/

do you have specific example? (sincerely interested)

- MySQL example: for MySQL there is one thing called Collation, which decides about the correct ordering of chars in table values. The default was `latin1_swedish_ci` for years, not `utf8_unicode_ci` which might be more sensible.

- Postgres Example: to set up Postgres correctly, you need a Postgres user in the OS, who runs the database. That should enforce to not run the database process as root.

I spent this past week learning/solving this very issue. MySQL's utf8 character set doesn't actually include _all_ utf8 characters. Use utf8mb4 instead. Same goes for collation; use utf8mb4_unicode_ci.

The good news is they are changing the defaults in 5.8: http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf...

Thanks for sharing this. I looked it up on SO [1] and since I'm from Germany, I'll stay with utf8_unicode_ci.

> For examples, the Unicode collation sorts "ß" like "ss", and "Œ" like "OE" as people using those characters would normally want, whereas utf8mb4_general_ci sorts them as single characters (presumably like "s" and "e" respectively).

[1]: http://stackoverflow.com/a/766996/104959

Was just dealing with this today:

If some row is deleted due to a foreign key cascade constraint, delete triggers are not executed. When you delete directly, they run obviously.

It's worse than having no trigger support.

I'd start with reading this one: https://grimoire.ca/mysql/choose-something-else

Thanks eddd, I would have probably linked to this also, it's easier to point to people who have done proper write-ups than write my own experiences as I think there's enough out there.

Postgres is like the Apple of RDBMS - just kidding.

so many problems with its internal design that hide problems until its too late

I always got the impression (since I started using it in Prod in the late 90s) that the MySQL team had never worked on an RDBMS before, even as users, and didn't really understand it. Back then they would say, you don't need foreign keys, you can just enforce consistency in your application, you don't need transactions, you can just handle it in your application, and so on and so on. Monty Widenius was very arrogant and thought he knew everything. Eventually they matured a bit and realized that actually, yes, the entire rest of the database community weren't idiots, maybe there is something to these features, but now they needed to find a way to retro-fit them onto what they had and now - 20 years later - they still haven't figured out how to smush it into their architecture.

Whereas the Postgres crew were fresh out of the Ingres project led by the genius Michael Stonebraker, they had a solid foundation and very clear vision and a culture of doing the right thing, not the easy thing. There's a steeper learning curve to Postgres but once you have learned a few things you can easily guess the rest because it's so consistent. For MySQL you really need a "guru" because there are so many dark corners and weird edge cases that make no sense, you just have to "know" them.

> I always got the impression (since I started using it in Prod in the late 90s) that the MySQL team had never worked on an RDBMS before, even as users, and didn't really understand it.

IIRC that's actually correct.

I think some of them might have been involved in miniSQL but it's a long time ago and my memory is hazy now :-)

I was working for an ISP at the time and responsible for a gTLD, my project was called DNSQL, it was as you can imagine a DNS server backed by MySQL so we could edit it with SQL statements rather than manipulating the zonefiles in Perl or sed... Relational constraints and transactions would have made my job a lot easier... A few years later I inherited a 2Tb MyISAM database, which was "fun".

In other words, MySQL is the PHP of RDBMSes.

They practically grew up together.

Here are some issues that I've seen becoming important when scaling to a higher traffic with both MySQL and PostgreSQL (we're running both for different kinds of databases). Some of this is second-hand from my colleagues, so it might not be exact:

* Thread-per-connection (as opposed to process-per-connection with PG) means much lower memory overhead with many concurrent connections; with PG you might need to use something like pgbouncer, which doesn't come without its own issues.

* Logical replication; the MySQL format is compatible between minor release versions (e.g. between all 5.x releases), which enables upgrading individual replicas or master without complete shutdown. With PG this often a becomes way more complex.

* Better caching/buffer management. Essentially, InnoDB has its own implementation of LRU cache while PG relies on kernel page cache. This generally allows finer tuning on the database configuration level and you don't have dive into kernel page cache tuning (which will be always OS-specific).

EDIT: most of this stuff is probably even better described in the Uber article, linked in other comments here.

A note on logical replication. That exists in pg as an add-on right now. And will be in base/core when pg10 ships.

Any ETA on the PG equivalent of mixed replication - row based whenever possible, logical when row-based doesn't work?

Logical replication has the problem that missed messages can have really nasty effects on your data down the road.

fwiw, all 3 of MySQL's binlog formats (statement-based, row-based, mixed) are still actually logical replication:

* Statement-based is logging raw write SQL

* Row-based is logical effects of executing that SQL

* Mixed uses statement-based for everything except statements that are non-deterministic or otherwise unsafe for statement-based replication

Physical replication in MySQL would be storage-engine dependent. For example, in InnoDB physical replication would mean shipping the InnoDB transaction logs rather than the separate binary logs. Alibaba built something to do this, and from what I understand Amazon may be doing something along these lines under the hood in RDS MySQL and/or RDS Aurora.

Uber switched from mysql to postgres in 2013:


but switched back to mysql in 2016 which generated a lot of discussion:


There were a lot of followup posts afterwards arguing both sides if you search for them:


There's​ an excellent analysis of the Uber's post here: http://use-the-index-luke.com/blog/2016-07-29/on-ubers-choic...

It's strange to me everyone touts Postgres but when you finally scale to a medium to large architecture HN has a handful of articles on great start-ups (now full fledge companies) switching FROM Postgres to MySQL. It's more efficient, atm, especially when accounting for labor. I wish the HN community would see that.

There are more MySQL experts out there than there are PostgreSQL experts (though the number is still probably under 10,000 worldwide). MySQL replication and scaling is a well understood solution, and thoroughly battle tested. There are dozens of good solutions out there for handling near-instantaneous failovers, using little more than the tools provided by MySQL.

It has a lot of problems, but when you need to scale, you can do so much more affordably with MySQL.

I thought the reason for migrating over was for MemSQL?

Here is a video "Real Time Analytics at UBER Scale": https://www.youtube.com/watch?v=XSXvPNEykb4

In Uber's case they wanted a dumb K/V store, not a full RDBMS. MySQL is a better choice here, InnoDB is a great K/V store, but PostgreSQL is a better relational database assuming you're using the features it provides.

Regarding InnoDB being a great K/V store, you made a similar claim yesterday, but you haven't really explained your rationale in either thread. Care to do so?

EDIT: I'm referring to https://news.ycombinator.com/item?id=14067406 ; my reply to that comment included more specific questions. Unsure of why receiving downvotes here; being somewhat of an InnoDB expert myself, I am legitimately curious why the parent keeps making assertions about InnoDB being a K/V store.

That's because InnoDB IS a key-value store, the physical on disk structure is such that rows are stored on disk by their primary key, and that's how MySQL is designed. Postresql's heap files are organized much differently, as such it performs worse in some cases like Uber's.

That's a clustered index, but it does not inherently make it a key/value store. It is common practice in SQL Server to use a clustered index for tables; would you also think of that as a key/value store? Most database people don't describe clustered indexes in this way.

Also, you're describing how InnoDB is designed, not how MySQL is designed. Furthermore, you can effectively get a heap in InnoDB if a table has no PK and no unique indexes. InnoDB will internally use a hidden auto-incremental value in this case for the primary lookup, which more or less effectively acts as a row pointer.

ITT: Many people bashing MySQL for the exact things that OP is referring to in "Many cons have been fixed."

MyISAM isn't used anymore. MySQL has ACID transactions and foreign keys. Get over it. The engine is still there for users who want to use it and that's not a reason to knock MySQL.

Almost every one of the correctness and SQL standard flaws in MySQL is handled by the sqlmode flag. It allows users to choose correct behavior, but doesn't suddenly break the millions of MySQL apps when they upgrade. To suddenly break millions of apps to be correct by default without sufficient time for apps to migrate makes no sense.

The fact is, MySQL is significantly faster for simple CRUD applications which is really common in today's REST and Microservice designs.

That being said, Postgres is an amazing RDBMS, and would probably become the #1 database server if administration tasks like SHOW DATABASES weren't accomplished through \l (or if \l where a shortcut).

Postgres users wonder why people use MySQL. They wonder why people use NoSQL when storing JSON in Postgres can be faster. The fact is, the learning curve of Postgres is needlessly higher than these other systems.

    > Postgres is an amazing RDBMS, and would probably
    > become the #1 database server if administration 
    > tasks like SHOW DATABASES weren't accomplished 
    > through \l 
I have a hard time believing that.

The layman administers MySQL through Wordpress installs, cpanel, and phpmyadmin where nobody even runs a single query against the database by hand.

Those are the things that made MySQL popular.

Not \list vs SHOW DATABASES. C'mon.

Product Manager for the MySQL Server here -

I would like to re-enforce a good point you mention: quirks can be a side-effect of making sure you have a good upgrade story. Sure; we would prefer not to have them, but you want a vendor which has a strong commitment to backwards compatibility too. There is a balance: and I think we've gotten better at it in the last 5 years. The default is now STRICT mode on (since 5.7/2015 and new configurations since 5.6/2013).

Upgrade is a feature too :-) Because of the rate of CVEs at the moment, it is important to keep within the vendor EOL policy. Because MySQL Replication is logical, most users do major-version upgrades without downtime.

Quirks are a function of compatibility, yes, but also of how bad your initial design was. Mysql having a lot of quirks doesn't meant it's better at compatibility, it means it just had a worse design to begin with and had to compensate.

> MySQL has ACID transactions and foreign keys. Get over it.

For anybody depending on MySQL transactions, be sure to keep its implicit commit behaviour[0] in mind. Don't assume that rolling back an active transaction will roll back all the stuff you've done since you called BEGIN. MySQL is still weird.

[0] https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html

  > would probably become the #1 database server
  > if administration tasks like SHOW DATABASES weren't
  > accomplished through \l
If you type \?, then psql lists all of the commands, and it's not hard at all to remember the ones you use the most. For the ones you use less often, there's \?.

For the programmer who chafes at such terse command-line syntax, I imagine that they're the type of developer who wouldn't use psql at all, but instead one of the many graphical tools that list databases and so forth on start-up for you.

> The fact is, MySQL is significantly faster for simple CRUD applications which is really common in today's REST and Microservice designs.

Citation needed.

Currently we are using MariaDB for most database needs. The main reason we've continued to use MariaDB is for the multi-master replication. It is simple and easy to setup, we've designed our applications assuming there is always a write master, but aware of possible slag lag. So far having a multi-master setup is not as simple or robust in PostgresSQL.

We also use tokudb storage engine for a large number of servers. This engine is designed for datasets that do not fit into memory, and we've had a great experience with it. We are able to load CSV's in 1/3 to 1/5th the time of innodb. Most queries aggregate queries run faster, there is online index creation and column changes. From a DBA prospective tokudb prints detailed statuses in the show processlist! It is definitely a niche storage engine, and not right for all cases but this has contributed to our continued use of MariaDB of PostgreSQL.

With PostgreSQL 10 things are looking interesting. We tested out 9.6 with it's parallel query and saw some significant improvements on aggregation queries. It was not enough by itself to warrant use moving off of MariaDB but has me very excited going forward.

MariaDB is not sitting still, there are many improvements coming to 10.2 which should be released soon. For most applications I'd use whatever database you are familiar with. If you have a specific case, terabytes of data, millions of incoming writes, requirements for live aggregation, that is where the PostgreSQL vs MariaDB really gets exciting.

This engine is designed for datasets that do not fit into memory

Can you explain a bit more about why this is "a thing" please? Because pretty much every database does this, and has done since the 1970s.

For the last little while, it's been recommended to have enough RAM for your entire working set.[0]

While most engines will still work with frequent disk paging, performance can degrade rapidly. An engine designed for the paging can be smarter about performance, at the cost of performance on smaller datasets.

[0] http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_B...

MySQL works fine if you chiefly interact with your DB through an ORM or using basic queries.

Postgres shines if you need advanced functionality: recursive queries, lateral joins, window functions, foreign data wrappers, partial indexes, advanced types... The list of things MySQL doesn't do as well or at all, be it user facing or under the hood, is fairly long.

There are plenty of examples of both DBs scaling successfully in the wild.

IMO it's a coin toss decision for most CRUD apps. For more advanced apps, you might want to pick Postgres over MySQL for the same reason some companies prefer niche languages over Java or PHP: the pool of competent engineers is smaller but tends to be of higher quality.

> MySQL works fine if you chiefly interact with your DB through an ORM or using basic queries.

I ran into an issue [1] with MariaDB a year and a half ago: 'SELECT * FROM table' failed to return a large fraction of the rows of that table. Selecting them in a number of different queries with ranges of ids worked fine. Nothing fancy, no transactions, on an idle database. I might have been unlucky, but I will pick PostgreSQL over MySQL/MariaDB any day if I have the choice. Having had a cursory look at the MariaDB source code and a more in-depth look at the PostgreSQL source code confirms that preference.

[1] https://jira.mariadb.org/browse/MDEV-9347?page=com.atlassian...

MariaDB != MySQL

They're protocol compatible, but MariaDB has been drifting away from the core storage engines and optimizer for some time. For good and ill (I think you hit the latter).

Personally, I'd reframe this as MySQL being better than Postgres at OLTP workloads, and Postgres being better (much, much better) than MySQL at OLAP workloads.

It isn't really a question of how advanced your app is, unless your app completely relies on some specific advanced pg feature. There are areas where MySQL is substantially more advanced than Postgres, such as replication options, which in turn affects operational maintainability at scale.

> There are plenty of examples of both DBs scaling successfully in the wild.

This is true, but there are far, far more examples of ludicrous-scale MySQL than there are for pg. Compare the list of largest sites/apps using MySQL (see some of my comments from a few months ago) vs a similar list for pg and the discrepancy is quite noticeable.

MySQL has wider adoption historically so of course you will see more large site/apps using it because you have to be around awhile to get so large. I expect that to change because PG has improved replication and performance over the last few years and now a lot of companies are choosing PG for new projects. Those stats will likely change more in favor of PG in 5-10 years.

That's a very good point. I suspect it will depend a lot on what the largest PG users contribute back, in order to make it scale.

For MySQL, the largest users were (and still are) the biggest drivers of scalability-related features... and that's despite MySQL being far less of a community-driven open source effort than PG.

From my perspective, the biggest roadblocks for PG adoption at high-volume OLTP use-cases are its historical weakness in replication options, its process-per-conn model, and its lack of clustered index support.

PG replication is improving at a great rate, especially with the addition of logical replication, but the overall spread of options is still behind MySQL (re: multiple kinds of logical replication; multiple options for async vs semi-sync vs sync replication; large ecosystem of tools to interact with the replication stream).

The other two issues are a bit more fundamental. But given the number of excellent engineers working on PG, I'm sure that solutions will be developed in the near future.

Those statements echo many other "which of the two" I have read over the past eleven years or so.

Background on me; I abuse databases. I currently use pgsql as a k/v store with locking metadata because it's the only database technology that supports being forced onto disk (not just writing into VFS). It's also pushing huge row sizes (1MB)

My team spent many months trying to mangle MySQL into doing a similar thing (which it will do) but eventually we hit a performance stop with MySQL.

We did many things to improve the performance; switched to using fastsockets in the kernel, changed memory allocators (tried two different ones) but at 40cores and 8 pci-e ssds it was spin locking some function in memory.

Not to mention it's history with throwing data away.

Most people use MySQL because they know it already- and I would say that's fair comment. But getting started with PostgreSQL is easier and it scales better, so it's hard for me to agree with using it in 2017.

To get started with a random test db you can do:

mkdir database

pg_ctl -D database initdb

pg_ctl -D database start

psql -h database

I'm reminded of the (long winded) article that popped up on here some time ago about a person who was suggested to modify redis' source code to fix a problem they had: https://medium.com/@paydro/hacking-redis-to-save-money-64723... (slightly scathing discussion: https://news.ycombinator.com/item?id=10835950).

I wonder if redis can be convinced to flush data to disk. AFAIK it does it periodically?

If only SQLite was a tiiiny bit more scalable. It's the only DB where you have to explicitly disable force-flushing of transactions to disk.

Creating a test-db can be even simpler:

   $ createdb mytest
   $ psql mytest

This assumes that you already have a data directory and a running pgsql instance. Mine doesn't. :)

Simply having the binaries is enough- this is something that gives pgsql another edge on mysql.

That's why I said it can be simpler ;)

May I ask what company you are working for?

Ubisoft. :)

How about the performance? Is it a good fit to use postgres as a k/v store?

> But getting started with PostgreSQL is easier and it scales better, so it's hard for me to agree with using it in 2017.

Facebook and Youtube use MySQL, YT via Vitess, in 2017. I am just curious if you could exactly point out what part of PSQL scales better.

To say that Facebook uses MySQL is a bit of a misnomer. No one at Facebook outside of a hundred or so people on the db team uses MySQL directly, it is just an on-disk storage system with a basic SQL front-end that can be used as an intermediate translation layer and even then it was more of a legacy issue than anything inherent to MySQL as a technology. Please don't use it as an example of why someone should consider MySQL, as there is almost no way anyone else is going to be using it in a similar fashion.

Facebook has spent literally thousands of man-years working to make this system scale and perform in the way they need it to, but there is nothing about this effort that required MySQL and in fact a lot of effort was expended working around glaring deficiencies in MySQL. If they had it to do over again I think that no one at Facebook would select MySQL at this time.

Just about everything you've said here is factually incorrect.

There are far, far fewer than 100 people working on MySQL at Facebook. I say this as a former member of the team, and I am still in close contact with several team members.

Additionally, every engineer at FB has the ability to issue ad hoc SQL queries, and many teams do use special-case MySQL DBs that they interact with directly. I was the lead on FB's DBaaS so trust me that I know what I'm talking about here. Over 1000 engineers had used the DBaaS at the time I left the company.

There is no legacy issue involved. FB is changing storage engines in MySQL right now; they have the resources to completely move away from MySQL if there was a compelling reason. Your statement of "If they had it to do over again I think that no one at Facebook would select MySQL at this time" is completely incorrect.

Even if you ignore FB, a much larger percentage of the largest sites on the internet use MySQL than Postgres.

As a former manager on the ops side who worked closely with the managers of the db teams, and the hadoop/hbase teams, and the presto folks, wormhole, scuba, a few other teams that may not be public knowledge, etc. I can say with absolute certainty that the only people in the company who thought it was not a sad joke that we used MySQL down in the bottom layers of the stack were the people on the db team. And why wouldn't that team like MySQL? Knowledge of its intricacies and bugs and how to work around its problems is what got them this high-paying gig.

I will simply state that having also spent a long time inside FB I disagree with your assessment regarding the high regard in which you think MySQL is held within the company.

> I can say with absolute certainty that the only people in the company who thought it was not a sad joke

You can say with absolute certainty that you know the opinion of thousands of engineers at Facebook? This is clearly hyperbole. It does not appear you are interested in having a meaningful discussion on this topic.

> I disagree with your assessment regarding the high regard in which you think MySQL is held

Where did I say it's held in high regard? I simply said there is no compelling reason to move away from it, and also said your numbers were way off base in terms of how many people are working on MySQL at FB, as well as how many people directly interact with MySQL at FB.

I also greatly question how closely you were interacting with the managers of the db teams if you think there were 100 people on these teams!

[EDIT: It's also possible that we just have very different definitions of people "on the db team". For people "working on MySQL" I would only include MySQL Eng, DB Client, MySQL Infra PE, and the majority of Data Performance PE. Arguably you could include some RocksDB people at this point, though their work is used in many non-MySQL ways too, and even then that brings the total to still way under 100 people. Overall I would not include teams like Wormhole or TAO as "working on MySQL", though even if you add them in, still significantly under 100.]

In retrospect my tone should have been softer and less combative here. My apologies. But to give some perspective on why this hits a nerve: By greatly overstating the number of engineers who worked on something, you're directly undermining the effort and ingenuity of the engineers who worked on it. This stings coming from a former coworker.

In your second reply (sibling to this one) you seem to state that Facebook's db engineers are motivated by self-preservation (re: staying with MySQL), and you imply they are over-paid. If self-preservation was a motivation at all, FB wouldn't be migrating major data sets from InnoDB to MyRocks, which has a completely different set of performance characteristics and administrative/operational/automation concerns. As for pay, yes the db teams are very well-compensated because they are literally among the top experts in the field, working on an absolutely massive billion-daily-user OLTP system and all of the insane challenges that entails.

Put yourself in my shoes. You worked in security, yes? If I posted some factually incorrect comments about your team, and the engineering motivations and talents of your team, on a public forum like this, how would you react?

I actually started at FB by managing SRO prior to joining the security team, so I am quite sure that I was over-estimating my total number of people by including the teams in Dublin and probably everyone who needed to hop on to a udb directly for some reason. As you can imagine, from that starting-point in my introduction to MySQL at FB it was all about dealing with various problems and slapping down hundreds of db replication error and similar alarms every hour in the pre-FBAR/pre-db-automation world. I was also probably out of line with the 'it is about job security' bits, sorry about that. Actually, scratch 'probably' from that apology, but having passed through the ops side, infra eng, and security roles at FB I ended up seeing all of the problems and a lot of the solutions developed there seemed in my view to be rather expensive ways to fix problems that either didn't manifest in Postgres or would be easier to fix in a different db.

I still maintain that while MySQL was the right choice in 2006, if you were starting FB today it would be hard to convince someone that MySQL is a better choice than Postgres.

That makes sense, though MySQL certainly had a lot more warts before version 5.6. Respectfully, you may be inadvertently comparing MySQL of 5+ years ago to Postgres of today, which isn't a fair fight :)

> if you were starting FB today it would be hard to convince someone that MySQL is a better choice than Postgres

Not sure I agree. Seems like more of a toss-up at the very least. How would you find or build the right talent pool? With MySQL, Facebook was able to hire the top folks from both Google (which cancelled its MySQL efforts around the exact right time for FB to poach) and MySQL AB / Sun.

I also strongly suspect it would be substantially harder to build something like Wormhole on top of Postgres replication.

That said, I will include my usual disclaimer that Postgres has a lot of compelling features that MySQL lacks, and I can definitely see the other side of the argument.

> I also strongly suspect it would be substantially harder to build something like Wormhole on top of Postgres replication.

Why's that? It seems like there's all the infrastructure needed for that?

MySQL uses logical replication, and traditionally this is statement-based logical replication -- log of raw SQL write queries, including arbitrary SQL comments embedded, which can be used for any purpose. I'm not sure if Postgres has something similar? (entirely possible it does, if so I retract my earlier comment!)

Wormhole relied on smart propagation of MySQL binary logs iirc. Postgres lacks this and the WAL would probably not work in the same way.

Plus there's the whole MyRocks thing too...

The memory locking which I mentioned.

Arguing "x uses y therefore it's good" is a poor metric, I personally spent months with a lot of very expensive hardware finding out if we could go forward with mysql (there was a strong push for mysql), but I found PGSQL not having any of the same single server scaling issues (IE; all 40 cores could be uses, all memory could be used, all disk channels could be used) with no hacking of the kernel or memory allocators.

This doesn't mean anything.

At some point, it's easier to throw resources at your old stack to make it scale instead of migrating to something else.

Example: Facebook and PHP. They decided it was easier to write an entirely new runtime instead of migrating to something else.

Often, this results in significant improvements (MySQL and PHP are great examples).

If you are just looking for something to point an ORM at, you are probably fine with MySQL, you may see performance gains, and administration is probably a little more straightforward (and certainly more googlable).

If you spend actual time in the database using SQL, Postgres offers vastly more functionality, e.g. window functions, broader and more mature GIS support, more data modelling options. Postgres also certainly _can_ outperform MySQL in a variety of situations - it supports different indexing options (i.e. last I checked MySQL didn't have partial indexes), and is moving to more parallelism in its query plans. MySQL's partitioning support is a little more natural than Postgres's but that's changing in 10.0.

So, I would say, if you just want a back end for an app and have never _really_ cared about your RDMBS, use MySQL or whatever you're most familiar with. If you have complex query requirements, time series data, analytics workloads etc, try Postgres.

Don't use a relational DB if you deal with time series data. There's time series databases which are far, far better at that. InfluxDB is a good example.

But you're also making it sound like MySQL is easier to deal with than Postgres and I find that to be completely untrue. Postgres has far better documentation, sane defaults and only one engine whereas the mysql documentation is all over the place and "googlable information" conflicts depending on your engine or hell, even whether you're using MySQL or MariaDB.

In fact if you don't want to have to care about administration, I'll go as far as recommending Amazon RDS Postgres.

I think there are perhaps 'soft' and 'hard' requirements for time series data. I do a lot of analytics based on (mostly offline) data that is sequential, and I benefit greatly from clustered indexes and window functions for many ad-hoc queries. If you're talking inserting millions of readings per minute, then yes, more targeted solutions exist. I think moving to esoteric databases before exhausting what's possible with RDBMSs is generally suboptimal, but other people are smarter than me.

In my experience, the main benefit of influxdb are its built-in functions (percentiles, integration/derivation, etc.) but its scaling ability is over-emphazised : in my last company my coworkers have had issues exceeding 30k insertions per minutes, way less than «millions per minute».

How long ago were you at your last company? I wonder which version of InfluxDB your coworkers were using? There've been some significant performance improvements in v1.0.0 and beyond, compared to the earlier beta releases.

It was in 2015 and early 2016, before version 1.0. You're right that's an important piece of context I should have added.

I disagree about TS databases. It's only if you are dealing with huge amounts of monitoring data that you need a purpose specific store. The rest of the time, you might as well reap the benefits of the relational model.

When I was dealing with time series, the most common query was an 'as of' join. To the best of my knowledge, no common relational database supports this efficiently (which isn't surprising, as it is somewhat incompatible with the relational model in general)

That's a good counterpoint. For most purposes a `last ... over partition by ...` query will do the job, but you are right that it will be not nearly as efficient as the native operation in a timeseries database.

With that in mind, I don't come to a different conclusion though. Just about every database I've worked with has had some timeseries data in it, but separating data out into multiple systems/models is usually something I would avoid until there is no other option.

> Don't use a relational DB if you deal with time series data. There's time series databases which are far, far better at that. InfluxDB is a good example.

Better how? FWIW I'm currently running a 1TB time series DB on Postgres. It runs butter smooth, provides great performance (I'm using table inheritance), great querying tools, plus admin load is minimal.

Relational databases with columnstores (especially some of the distributed versions) are great at timeseries and often better than the actual TS databases by having full SQL query power and joins.

Don't use a relational DB if you deal with time series data. There's time series databases which are far, far better at that.

You're mixing your metaphors there. Vertica is a relational DB which deals with time series just fine...

Disclaimer - I worked professionally with MySQL for several years, and actually don't mind using it. I've also used PosgreSQL professionally, and don't mind using it either.

The best advice I can offer is, "Use what you're more comfortable with." Both can be configured on a single machine to serve a ton of traffic. Both can be backed up to recover when they falling over. Both can be configured to keep your data consistent and safe. Don't worry about scaling, don't worry about sharding or how much better indexing method A is than B. Worry about your application.

Now that the obvious is out of the way, I personally prefer MySQL over PostgreSQL. It will be easier to scale when the need arises (the corollary to "avoid premature optimization" is "design to make optimizations easier"), it performs like a madman when configured properly, and there are simply more people out there who can help you configure it properly.

Tangent time: yes, it is comparatively feature poor. I can't store native JSON documents in it, I can't run Python stored procedures, I can't create nearly the same breadth of indexes. Then again, I have never needed to. JSON maps nicely to objects, and an ORM maps those objects (somewhat less nicely) to native tables and columns. I don't run stored procedures. I use purpose built tools for handling (for example) full text indexing when the built-in searches get too slow.

My biggest question, when everyone points out how many features PosgreSQL offers, is "why are you doing that in the database?" Your database is your biggest chokepoint. It's your biggest point of failure. If you're using it for anything more than "transactionally store and retrieve data", you're asking for even more trouble when it goes down.

Not if, when.

It also makes scaling harder; if you're doing most of your computations in the DB, you have to scale the DB when one machine can no longer handle it. Scaling DBs is inherently harder than scaling just about any other component in your stack; you can't just add another server to the pool.

In summary, if you use your database as a database, and only as a database, either option will stand you in good stead. The biggest advantage MySQL offers at that point is that there are simply more experts out there who can help you with it when you need that help (and will probably cost you less professionally).

I want to rebuke a couple of points here:

> there are simply more people out there who can help you configure it properly.

There are also more people out there who can help you mess it up. MySQL has two engines and three different mainlines. Advice that applies to MariaDB doesn't necessarily apply to MySQL. Advice that applies to MyISAM doesn't necessarily apply to InnoDB.

A lot of people in this thread are underestimating how much of a cognitive burden that is. Postgres has one mainline, one engine, one set of very solid documentation and its community is relentless about strictness. This greatly improves the quality of help you will receive (be it through Google, documentation, or contractors) compared to MySQL.

> "why are you doing that in the database?"

I'm a poor person to answer this but a lot of your post really stems on this question. You don't see the need to do certain things in the db, thus it doesn't bother you that you can't do those things in MySQL. That makes sense. At any sort of load though doing certain things in the database is a requirement. You may have to scale the DB earlier than you otherwise would, but it also sometimes means that application-level code you would spin up 8 dedicated mapreduce instances for requires orders of magnitudes less work and you don't have to deal with scaling as early in the first place!

> My biggest question, when everyone points out how many features PosgreSQL offers, is "why are you doing that in the database?" Your database is your biggest chokepoint.

You shouldn't be doing any CPU intensive calculations in the database if you can avoid it, definitely. But these postgresql features are often about improved efficiency. Take a look at new things coming in postgresql 10[1]. All the new features in there are either performance neutral or big performance wins.

FWIW, I've spent the last few years fixing performance problems in PostgreSQL databases. The solution is most often to de-normalize/pre-compute something, which means using stored procedures and triggers, which are most definitely doing more than "transactionally store and retrieve data".

[1] https://rhaas.blogspot.com/2017/04/new-features-coming-in-po...

Postgres adhered to a philosophy of "make it correct, then make it fast".

MySQL adhered to a philosophy of "make it fast, then if someone complains enough, consider making it correct".

The result of this is predictable: Postgres is now fast and correct in its implementations, while MySQL is fast and burdened with a large number of "gotchas" and legacy incorrectness.

For example:

MySQL's traditional "fast" table type doesn't support transactions. So you have to select a different table type if you want transactions, but then when you do, you discover that you need to usually go "up" one level from what you'd expect, because the REPEATABLE READ level only affects SELECT, not INSERT, UPDATE or DELETE queries. Which in turn means you can have an inconsistency between the set of rows you'd see in a SELECT with a set of conditions, and the set of rows that would actually be affected if you issued an UPDATE with that same set of conditions. You have to go up one level to READ COMMITTED in order to start solving this.

MySQL's Unicode collations have some bugs; when I was at Mozilla, one we ran into was that we had to set up our own custom collation to get, for example, "e" and "é" to be considered distinct characters (if you ever used MDN, and once saw a strange bug where French versions of topic tags showed on an English article? That's why -- MySQL thought "CSS Reference" and "CSS Référence" were the same tag).

MySQL infamously accepted almost any kind of garbage values regardless of declared column type, accepted values larger than the declared column width (i.e., you could insert a 300-character string into a 256-character VARCHAR column), and would silently or near-silently mangle, truncate or just drop the data.

etc., etc.

I cannot at this point imagine using MySQL for any kind of serious project.

> MySQL's traditional "fast" table type doesn't support transactions

InnoDB is the traditional table type now. Myisam was forgotten long time ago.

> we had to set up our own custom collation to get, for example, "e" and "é" to be considered distinct characters

You could just use "varchar CHARACTER SET utf8 COLLATE utf8_bin"

> MySQL infamously accepted almost any kind of garbage values regardless of declared column type

There is a setting that allows strict checking.

> I cannot at this point imagine using MySQL for any kind of serious project.

It is used by many for serious projects.

In regards to charset and strict checking, these are settings which should be defaulted. I think the problem is that people run into these issues then have to seek solutions, which shouldn't be issues to begin with. I don't know why they aren't now defaults.

Pretty much. "By default we mangle and throw away your data, but there's a configuration setting to change that and make it actually store what you wanted to store" is not a useful way to do things.

Strict has been the default since MySQL 5.7. The default character set switches to utf8mb4 in MySQL 8.0.

That's great to hear.

I'm reminded of https://developers.slashdot.org/comments.pl?sid=204433&cid=1... as I read this exchange.

I found the above link in another HN thread some time ago. When I showed it to some PHP devs recently (I unfortunately don't remember exactly who I talked about it to) I was promptly told PHP does not do what is described in the above link (from 2006) and is generally smarter nowadays.

It's sad that languages and runtimes take so long to fix their core insanities (if ever), and it's also sad when systems are built without front-and-center regard to design correctness from the start. But on the flip side, the only people capable of envisioning good designs of this level of scale are often worn out shells of their former selves at the "you'd have to pay me 7 figures" stage... and of course you have to make sure you pick people who aren't completely out of touch with the latest developments in $category science...

> You could just use "varchar CHARACTER SET utf8 COLLATE utf8_bin"

You don't get UTF-8 if you do that though, do you? That was MySQL's first attempt at UTF-8 support. You have to tell MySQL to use utf8mb to get complete support for UTF-8.

This is not a technical reason, but MySQL is now owned by Oracle, which carries a whole host of risks if you're a small fry. A technical comparison should be done between MariaDB and Postgres if you ask me...

For example we have upgraded our MySQL 5.5 to 5.7 and we were hit by a performance issue in few GROUP BY queries. Then we decided to use 5.6 instead of 5.7. I think the Oracle corporation doesn't care much about MySQL.

Edit: In MySQL 5.5 and 5.6 it takes around 1 second but in 5.7 it takes around 70 seconds.

From MySQL 5.6 to MySQL 5.7, the cost model changes significantly. So while most queries will improve - there are edge cases where regressions occur. In anticipation of this occurring (even if you are better 99% of the time; there will be cases where two wrongs made a right), MySQL 5.7 also added new query hints.

I work on the MySQL team; we are 2x the size under Oracle, and have made many improvements. Please check out my list here: http://www.thecompletelistoffeatures.com/


My boss got a call from an Oracle "sales representative" or something about getting our licenses - and we only use the default MySQL that comes with AMI on AWS.

I think we concluded he had entered contact information into one of the forms while trying to download something.

Did Oracle say that your company must buy licenses to continue to use MySQL. Or they just selling some service/products?

IDK the exact wording but think those two options can be a bit hard to see when you have an eager sales representative on the line.

Note I'm not an DB expert. I was trying to evaluate Postgres and Mysql for my side project. I went ahead with Postgres after doing the following example:


create table test (name varchar(3));

insert into test (name) values ("hello world");

>Query affected 1 row // success!!

Even though my varchar size is 3, Mysql didn't even complain about my string size in insert statement. The weird thing is that, when you try to retrieve the value it returns only the first 3 chars which is "hel" (What??). I was expecting an error message but Mysql was happy to eat my error!


create table test (name varchar(3));

insert into test (name) values ("hello world");

>> Error, value too long for the given varying (3)

The Postgres looks doing simple things correct for me.

I've migrated from MySQL to Postgres to use CitusDB (for a company project, many years ago).

For my own startup which the primary use-case is analytics, I have moved back from Postgres to MemSQL.

The four reasons were :-

1) In-memory rowstore, thus eliminating my in-memory nosql dependency.

2) The ability to join in-memory tables with disk to provide up-to the second data is awesome.

3) Multiple read/writer group nodes.

4) The biggest for me. Columnar storage.

Some examples on how columnar storage is so awesome.

Note that I am comparing MariaDB vs MemSQL on a old desktop. 8 cores, 8 GB Ram, HD (non ssd) drive.

- A query with a count of 1 table with 50m rows.

MariaDB - 49.60 sec

MemSQL - 1.12 sec

- A query with a count, sum, where clause and group by, with 50m rows.

MariaDB - 17 min 44.84 sec

MemSQL - 8 seconds


Here are some things to be aware. Prior to migrating over to Citus. We were shopping for MySQL consultants to see whether they could improve on the above figures. They quoted $70-100k to work on the project, without guaranteeing results.

With Citus I was able to see far better results and this is why the company switched over. However fast forward to now, MemSQL beats Citus soundly. 1 desktop beats a cluster.

I wager, that with a MemSQL cluster with decent hardware on the cloud, I will be seeing sub 1 second queries across the board.

Oh and the amount of data? It's tens of billions of rows a month. That's the workload I'm talking about here.


This post isn't to bash postgres. I loved using it. If Postgres 11/12, included columnar storage. I would literally scream! :)

Comparing PostgreSQL with MemSQL is like comparing a steam locomotive with a steam ship.

PostgreSQL ensures on-disk data consistency for each transaction. MemSQL does not implement this kind of complex logic.

VoltDB and MemSQL are of totally different kind of products. There is the CMU database course which explains the details.

MemSQL has full persistence and supports transactions. There is async lag to fsync in-memory data to disk, which is configurable, but obviously this is a trade-off with RAM based operations like any other database. It runs as a distributed cluster so it's faster and safer than typical single-node (with replica) installations.

For primary operational datastores, it's probably not a fit, but it's great for real time data warehouse needs.

VoltDB is on the other end of the spectrum and more exotic, designed for purely real-time in-memory SQL transactions, almost similar to in-memory datagrids like Apache Ignite/GridGain or Hazelcast.

> There is async lag to fsync in-memory data to disk, which is configurable, but obviously this is a trade-off with RAM based operations like any other database.

As a VoltDB dev, I don't accept this. Performance and sync disk persistence for every transaction don't have to be mutually exclusive, they just require the right architecture and engineering work. We get our speed from memory-centric data structures, and by never waiting on disks/users in a transaction. Doing batch commits (fsyncs) to the WAL every few MS means you can keep your throughput very close to an async thoughput without trading any persistence.

I'm not sure how exotic VoltDB is, but I agree it's a different animal than MemSQL. VoltDB is focused on transactional and operational workloads, while MemSQL is more of a pure analytics play. It has "transactions", but with weaker guarantees all-around. We take our guarantees very seriously (see Jepsen https://aphyr.com/posts/331-jepsen-voltdb-6-3).

VoltDB does do "real-time analytics", which we typically define as analytics you can do in milliseconds. This leverages materialized views, special sorted indexes with ranking support, and lots of other tricks that MemSQL doesn't focus on. This enables our users to transactionally respond to an event as it happens in a few milliseconds, all while leveraging gigabytes or terabytes of state.

> Doing batch commits (fsyncs) to the WAL every few MS means you can keep your throughput very close to an async thoughput without trading any persistence.

This is what I said. There's still a lag, regardless of disk speed, batching, append-only parallel writes, etc. It's physics.

MemSQL and VoltDB and other in-memory db with persistence do it the same. Not much of an issue these days with distributed systems anyway since the network is faster and the data is already on another node.

1) There's no lag between VoltDB confirming to clients that a write has been committed and that write being on disk. Zero.

2) The way VoltDB does persistence is actually quite different than MemSQL. VoltDB is able to write deterministic logical ops before the work even starts, while Mem has to write binary logs after the work is finished. The difference in practical terms is dramatic and will show up in almost any benchmark.

> Note that I am comparing MariaDB vs MemSQL on a old desktop. 8 cores, 8 GB Ram, HD (non ssd) drive.

... so not really a good comparison then. You're comparing the speed of your 8GB of RAM to an old, spinning bit of metal on an old desktop. Anyone who would use such a benchmark to drive their tech choices is being rather silly.

You're missing the point entirely.

I'm showing that a columnar store beat out a rowstore. With insane results.

Not only that, but MemSQL on a really crappy desktop beat out a CitusDB cluster of 5 bare-metal servers, each with E5-2620 CPU v2, 128GB Ram, 10x4TB 7200 RPM SATA Drives, RAID 10.

I actually still have the communication with Citus and here's similar workloads they worked on for me:

Query 7: local 1139.0s, distributed 76.2s (~15x improvement)

I could dig up the query, but I'm pretty sure it's similar. But guess what. It took days of consulting and optimising with CitusDB and one of their engineers. With MemSQL I was getting insane results in a matter of hours.

Talking about costs. A MemSQL cluster is going to be significantly cheaper to run than a Citus/Postgres cluster when looking at tens of billions of rows. I know because I'm looking at the CitusDB numbers and the MemSQL numbers when using enterprise products and bare-metal servers.

We use MemSQL and yes it's great, it's really meant as a data warehouse built for speed and realtime inserts/updates and queries.

Postgres has CStore today but even with included columnstore, it still isn't distributed which is one of the issues. Citus is looking into it though so maybe it'll all come together in a few years.

And apparently ClickHouse out-performs MemSQL: https://clickhouse.yandex/benchmark.html

Has anyone here experience with ClickHouse?

No experience, but did a thorough read through of the docs. One thing to keep in mind about clickhouse is that their replication guarantees aren't very strong. From the docs: "There are no quorum writes. You can't write data with confirmation that it was received by more than one replica."

That's pretty troubling, but at least they're open about it. That said their performance claims are pretty spectacular, and it seems solidly engineered. Further if you're not planning on using replication it certainly seems interesting. I'd be curious to hear about someone's production experience as well, since the list of companies running it seems rather thin.

(ClickHouse dev here)

Yes, replication in ClickHouse is asynchronous by default. For intended use cases (OLAP queries aggregating data from many rows) data that is a few seconds stale is usually okay. In a serious production deployment you absolutely should enable replication, otherwise you risk losing all your data, not just last couple of seconds of inserts.

That said, sometimes synchronous replication is necessary despite the latency penalty that comes with it. This feature is actually implemented but not yet considered ready for prime time.

We have several years of production experience with ClickHouse (as a DBMS powering Yandex.Metrica - second largest web analytics system in the world). If you have questions - just ask.

We're using it internally, and also for a relatively small customer-facing analytics API, on the order of 10-30 nodes.

It's been good so far. Good support from the maintainers, too.

PostgreSQL: being written in the sane subset of C instead of being a pile of amateurish C++.

As a result of this design decision the resource management of each major part, like the buffer pool, the lock manager, etc is explicit and straightforward.

This, in turn, leads to indeed stable and predicable runtime behavior during high loads.

Basically, PostgreSQL is one of very few canonical examples of what sanity and attention to details could accomplish.

We just recently switched from MySQL to PostgreSQL. God we're so glad:

- add a null column to a big table => have fun and plan your maintenance accordingly. Good luck estimating it. This is basically a no-op in Postgres

- partial indices; absolutely important for big tables. Think about ticketing systems where in general the number of "open tickets" is much lower than the closed ones

- being able to use advanced SQL features like window functions can easily be underestimated; you can't use potential you don't have

- renaming indexes, just like that. If you're into refactoring and lessen developer surprises, you want to have proper names and fix accidently mistakes or just legacy stuff

- in my experience (but I'm no DBA), the logging of PostgresSQL is much more useful then what I get with MySQL (e.g. deadlocks or slow logs)

- the EXPLAIN (ANALYZE, BUFFERS) output is ... what you get is beyond comparison to MySQL

The only technical downside we experienced with Postgres (but in comparison, these things are benign):

- limit of length of identifiers; even with aliases. This can trip ORMs which try to do some clever naming under the hood

- you cannot change the order of columns, you can only append at the end

To me, everything I do with Postgres feels much more predictable, whereas with MySQL it always felt a game of gamble.

All the talk here and everywhere have convinced me that:

- MySql is the choice for niche workloads and to be used by experts not only in DB but in MySql Quirks.

- PG is the sane choice for everyone. Experts benefit more, but inexperienced developers will be better served from a more solid foundation, and still PG is so good that will be along your side for years to come.

PG maybe will not shine in some niche workloads but we are talking about problems that most DO NOT HAVE.


Years before, I have read (and believed!) that "PHP, MySql, JS" was "better" for inexperienced (or new) developers, because are more "easy" to learn and use.

Then eventually is show how facebook and others used them.

However, after +18 years on the field, I become more and more convinced that is totally the reverse:

- PHP, JS, MySql, etc are tools for experts.

- More strict tools are better for everyone.

- Only expert will use correctly any tool, even if have quirks.

- This is even more notorious if the tool is more dynamic than static, or provide more "freedom"/less "safety", like C VS. Pascal.

- More well designed or strict tools provide equal benefits to experts, but sometimes a tool with not safety-protection allow to do hacky things easier (or possible).

It's funny to me because on one hand you have specialization where larger companies will typically abstract database work to the DBAs and application work to the developers creating some sane interface for interop and edge cases. And then on the other hand you have things like "full stack developers" who leverage new tech like kubernetes to essentially stay "that one guy who does everything".

"Jack of all trades master of none" and all that. What's the natural conclusion to this or does it stay this way forever?

The other day I saw a Google engineer with a shirt that read "MySQL is the Windows ME of databases".

I've seen online far more PostgreSQL fans bashing MySQL than the other way around.

It's like there's an online task force to make MySQL look like shit.

And for most cases it won't matter and one should pick what they're more comfortable with. And if the requirements aren't that simple, people should do their own research.

> It's like there's an online task force to make MySQL look like shit.

Yes, we call them the MySQL development team. For a long time MySQL was simply the default choice for many projects, and this status has persisted long after it was the best choice and is extending into the period where it is no longer even a reasonable choice. If you have a specific reason to use MySQL then go ahead and use it (and 'it is what I know and am comfortable with' is a reasonable justification) but there are few reasons to suggest it to someone else except for the fact that misery loves company.

That's not what I was talking about. I stand by my comment, and you just made my point stronger.

I stand by mine as well. There is no online posse trying to destroy MySQL, just people who have actually used it and been burned by it who have decided to warn others.

It is a mediocre at best technology when there are better options available, and anyone who suggest it to someone else for a greenfield project should be considered an untrustworthy source of technical advice.

Maybe I'm wrong, but last time I checked, adding a column to an existing (big) table in MySQl locked the table for a long time (like hours), and the same operation in posgress was finished in almost no time

This has changed in many cases (though not all): https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-...

There are also a number of tools to work around it for other for other cases: https://githubengineering.com/gh-ost-github-s-online-migrati... https://www.percona.com/doc/percona-toolkit/2.1/pt-online-sc...

I have used PostgreSQL extensively before but now I just use MySQL. It is easy to install, maintain and use - even on a Windows box.

Basically it gives me everything I need in a relational database and let's me worry about other things.

Also I don't use non-standard SQL features so the criterias for me are really familiarity, ease-of-use, speed and here MySQL is certainly good enough.

You don't use LIMIT? Really?

Would you say learning SQL (or relational databases in general) with mySQL is easier than starting out with PostgreSQL?

I have learned SQL with Postgres, provided by my university, and found the "explain analyze" and "explain cost" to be a god-send :-) Especially when I was trying to learn the impact of various things on the query-execution time.

When I used MYSQL, I found its EXPLAIN counterpart much more rudimentary.

But if I were to teach somebody SQL, I would probably just use sqlite because it just uses a simple binary file as a backend and doesn't need a running process :)

I am not really OPS or DB guy, so I can't comment on meirts of the databases in production.

Yes - it seems that MySQL is easier to set started with and there is more information available online. Also a lot of cheap hosting uses MySQL which is plus for hobby projects.

More like, a lot of garbage outdated tutorials available online. Copy and paste to get SQL injection vulnerabilities! Easy!

Seriously, what is "not easy" about Postgres? Just read the official documentation, which is actually readable and good.

About hobby hosting… there is Heroku Postgres, but no Heroku MySQL. (There are third party addons for everything, yeah). Cheap shared Apache hosting is TERRIBLE. It should NOT be used for any new projects.

Yeah. Popularity and the ability to work at cheap horrible apache hosting companies all the way to big bad facebook does speak volumes, doesn't it? :-)

I happened to be talking about this with a couple of friends just moments ago. Their opinion basically amounted to this:

* MySQL is a really awful database. It's usable only for small hobby projects, like a simple Wordpress site or so. For serious datasets, it's a horror. Complex queries on large amounts of data are just broken. Many features don't work or work incorrectly. The design is fundamentally flawed.

* Use PostgreSQL instead.

* If you absolutely have to use MySQL, use MariaDB instead. It's a fork that fixed some of the issues.

* Still better use PostgreSQL. Or even Oracle or SQL Server. They're fine, MySQL is not.

It could be interesting to ask them what are they basing their opinion on and if they can enumerate all the advantages of one database over the other.

> use MariaDB instead. It's a fork that fixed some of the issues.

Which ones exactly? ;)

Check their changelog, but one example I've heard is that a simple 15 minutes upgrade to MariaDB improved performance for a particular query by a factor 5.

Is there something like TokuDB or MyRocks for PostgreSQL? These things allow to reduce disk space usage by several times.

I guess you could get part of the benefit by using e.g. TokuDB via FDW: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

This kind of question is meaningless in my opinion without the workload being defined. For example, is this for transaction processing like for banking or web programming of social networking like with Facebook?

One also needs to consider the entire landscape. For example, the LAMP stack is mature community with out-of-the-box, ready-to-ship web sites. There exists lots of support and tools. Software stacks like Django for Python typically have community involvement for a particular database. If one is using an ORM like SQLAlchemy, what is SQLAlchemy best used with?

Here are just some of the things that should be qualified when asking database choice:. 1. Transaction support. 2. Scale out support. 3. NoSQL and other data store integration. 4. Cloud support if deploying in a cloud. Really depends on your cloud vendor. 5. Location, location, location. Is this an internal application on your company LAN, or a social networking application to be installed around the world? Replication in MySQL quite useful. 6. SQLite? Why did you exclude SQLite? I'm assuming because the person posting this question already knows something about the workload that defeats SQLite? What is that workload? Testing against SQLite is trivial, just create an in memory database. 7. ETL support. Are bulk updates due to ETL processing of files and such common place? Bulk updates happen better when indexes are not attached, etc. 8. Reporting support. 9. Denomalized versus normalized support. It is common place in web applications with data distributed applications around the world to shard a database table, completely denormalized.

The point is that designing a database schema is heavily dependent on the expected workload. The choice of SQLite, PostesSQL , MySQL, SQL server, Oracle or what have you depends heavily on the workload. It is nonsensical to ask what the right tool for the job is without knowing the job.

Mysql/Innodb: index organized table Pg: heap table

depending upon workload this can be significant.


This is a big thing I miss from SQL Server is clustered indexes. SQL server gives you the choice so you can go either way depending on situation.

Was actually kinda of surprised after using SQL server for years then looking at Pg that it didn't have IOT at all. Thought everyone had that option.

Postgres. The main reason I enjoy Postgres is because it is more standards compliant than MySQL, which makes it a bit easier to find documentation because it doesn't need to be specific to Postgres and easier to port to other standards-compliant RDBMSes. I have had a much easier time scaling Postgres with master-slave replication (it's only just come out in version 9 though). Also partitioning in Postgres is super-easy. I don't know how MySQL fairs these days, but Postres is completely transaction ACID compliant. I find user management much easier in Postgres. You create the user and then specify where they are allowed to connect from in a file. MySQL uses a separate user for each host pattern they want to connect from. permalinkembedsavegive gold

I forgot to comment when this question was only a couple hours old :( but hopefully someone sees this anyway.

MySQL has phpMyAdmin, which is (compared to some of the tools out there) incredibly basic (and ever so slightly buggy), but really helpful for getting started and providing a barebones "look at the database" interface. (In all fairness, it does have some pretty decent features.)

pgPhpAdmin... isn't really the same. In my case I couldn't even get it working - I don't remember what errors I got but as a complete beginner I wasn't sure how to respond (I think the errors I was seeing weren't giving any hits on google either).

So, I'm curious what other web-based or Linux-compatible postgres administration tools are out there?

pgAdmin (https://www.pgadmin.org/), of course.

btw, if you want to work with mysql you might want to use mysql workbench instead of phpmyadmin.

Wow, thanks so much!

pgAdmin looks great, and TIL about MySQL Workbench, which looks awesome (wish there was a PostgreSQL Workbench! :P) - this will be great for learning.

Databases are one of those things I know I badly need to learn more about, this is great start. Thanks again.

For a performance: Check the TechEmpower FrameworkBenchmarks - Data updates - check the 'DB' columns

- https://www.techempower.com/benchmarks/#section=data-r13&hw=...

-> Pg - Pg - Pg - Mo - Pg - Pg - Pg - Pg - My - My - My - My


Round14 - Preliminary data!


-> Pg - Pg - Pg - Pg - Pg - My - My - Pg - Pg - Pg - Ny ....

I don't think that's useful data. That could just as well be the code is more optimized for postgres, or the postgres client library in the language being used is more performant for some reason.

It's useful for comparing full stacks, but the queries used are very basic, and most of the update benchmarks don't use transactions or locking.

The 2 pro things that impressed me on both platforms from recent builds are probably under utilized: On PostgreSQL: Logical decoders (https://www.postgresql.org/docs/9.4/static/logicaldecoding.h...) and on MariaDB, the CONNECT engine (https://mariadb.com/kb/en/mariadb/connect/)

Using Postgres in Docker (for development of course!) is interestingly challenging, if you have a Windows host - due to the architecture. All our devs are working with macOS or Linux now.

Using Drupal, MySQL (or MariaDB) simply works much smoother than PostGres. That is because apparently Drupal has been designed around MySQL. You may be quite sure that if you are having a problem with Drupal on PostGres, that the same problem doesn't exist when using MySQL.

In 2011, we started out using Drupal with PostGres. After a year or two, we migrated to MySQL.

I think it's likely that the situation is quite similar with WordPress.

MySQL has BIG problems with views. If the view statement contains aggregate functions, distinct, group by and some others constructs, the results from the view are retrieved into a temporary table, which then is used to execute the statement. If involved tables contain tens of thousands of records, this behaviour will hinder performance. In PostgreSQL you don't have to worry.

> Many cons have been fixed.

it would be easier if you listed what you mean by "cons". Postgres is much much more powerful than MySQL if you are really into SQL and exploiting an RDBMS, MySQL administration is simpler.

This question is a bit too vague to warrant a specific answer.

> Postgres is much much more powerful than MySQL

I would say this used to be true in 5.5 and before but it's less and less of an argument. 5.6 and 5.7 bring a lot of features like JSON or geospatial. Here's a list of the new feature of 5.7:


I've been looking at MySQL's JSON and it's still pretty incomplete compared to PostgreSQL. It's still far superior to SQLServer 2016's abysmal offering of JSON support tho.

(Want to be able to support Marten document store on more than 1 database)

> MySQL administration is simpler


You have more GUI administration tool options with MySQL. For example, most people are used to PHPMyAdmin from the shared host era, and it has usually been "good enough".

I'm not sure how that's any different from pgAdmin, DataGrip, phpPgAdmin, and the many other tools out there for managing PostgreSQl.

For me, a key advantage of MySQL is Sequel Pro. I've never found a GUI for postgresql that can touch it.

If there was one, I will switch in a heartbeat as I will love to have access to some more advanced SQL than what's available in MySQL (like window functions)

I haven't used Sequel Pro, so please forgive me if there are some very advanced features beyond solid DBMS DML and DDL.

I started using Jetbean's DataGrip product last year when they released it. It is built on the IntelliJ IDEA tech stack and provides a fantastic multi-DBMS platform that has everything I need not only for Postgres but MSSQL and others.

If you use IntelliJ, you're already using DataGrip - it's the "Database" plugin.

I was about to ask this question on whats the difference between default IntelliJ database plugin and DataGrip. Thanks for your comment on this.

It's mostly the same, DataGrap is a bit easier/more obvious to use but the features are identical from what I can tell.

Try Valentina Studio - http://valentina-db.com/en/valentina-studio-overview. A lots of people say it is best PotgreSQL GUI Manager. May be mySQL also :)

You should give DataGrip a try. It's a new database IDE by Jetbrains that supports all the major databases. https://www.jetbrains.com/datagrip/

You should give http://dbeaver.jkiss.org/ a try.

Navicat has been around for several years, and you can even build queries using drag and drop. It's pretty feature complete.

I used to like Sequel Pro, but I've come to LOVE Postico (for Postgres) and I absolutely hate sequel pro now when I have to use it... and this is coming from someone who was in the same boat (I loved sequal pro and nothing even came close)

While not as clean and simple as Sequel Pro, I've found that Navicat Premium is pretty good. Especially since I use both MySQL and PostgreSQL.

It can also manage SQLite and a few others.

Agreed, Sequel Pro is far above anything I've seen in Postgres and is the primary reason we are still using MySQL in production - business team just love Sequel Pro too much

Try compare to Valentina Studio. It can do tons things which Sequel Pro do not have.

Just downloaded and will investigate - thanks!

You primary reason to pick a production DB server is the availability of a GUI client? Wow.

We didn't select MySQL on the basis of having SequelPro obviously, that would be insane. However if you are still using MySQL and want to switch over to PostgreSQL (like us) a major reason not to be not having SequelPro. Most other factors about PostgreSQL are better but just not enough for the effort to migrate to be worthwhile.

I really wish there was a GUI that could handle both Mysql and Postgresql, was cross-platform and worked as well as Sequel Pro with a similar polished UI.

Try Navicat Premium. It can handle MySql, Postgres, Sqlite, MSSQL and Oracle.

TOAD for MySQL is also free for those who like it and run Windows.

From my perspective MySQL does secondary index is much better than PostgreSQL. Uber found that out the hard way.

PostgreSQL has implementation of almost every possible indexing technique, unlike MySQL. It has been academic project (of great scholars) after all.

Except clustered indexes.

There are tradeoffs with both design, I know that I use secondary indexes in quite a lot of my queries, so the way Postgres handles them are much better in my case... but in someone else's use case (Uber) it could be terrible for performance.


It seems like Everytime on HNs people say postgre is so much better. Why is MySQL so much more popular?

Legacy systems, tooling + operational experience, better clustering (for now).

PostgreSQL wins at everything else and the gap is closing fast.

The largest MySQL-based companies have the resources to switch to another database if there was a compelling reason, but they do not do so. The db fleet sizes of these companies also dwarfs that of the largest companies using Postgres. The "legacy systems" argument is faulty logic.

Facebook is in the process of changing MySQL storage engines (from InnoDB to MyRocks), which is a massive migration involving a large number of engineers, yet they're still staying with MySQL overall because it's the best fit for their OLTP workload. Since pg doesn't even support pluggable storage engines, it would be a total non-starter, among many other reasons.

Thanks for the insights!

I'm actually using MySQL (MariaDB, actually) in production because it does work fine, does all we need in an OLTP data store, ops tooling is very good and we need reliable multi-master (Galera) for load balancing.

I'm not an expert on databases, but I've heard so many DBAs tell me how PostgreSQL is superior with regards to features + correctness (features that I don't actually need for our use case, but still) except for scaling out.

What's your take on this? If you were to start on a green field, what database would you recommend for a typical SaaS/OLTP workload?

I'd say it depends on your team's level of knowledge on various DB's, the expertise available in your geographic area, and the expected size that your data set would grow to.

I'm not sure there is a "typical" workload in the general case ;)

Don't get me wrong; Postgres is a really really good piece of software with many appealing qualities. But its fanbase is a bit overly rabid, or at least over-eager to trash MySQL for things that haven't been true in years or never were true in the first place. There are some very valid MySQL complaints mixed in there too, but it can take some effort to separate them out...


Our team is definitely going to stick with MySQL (no reason not to) and I have no reason to hate it, but I always like these sort of comparisons since you learn a lot about the technology.

I'm really interested in PostgreSQL for "real time OLAP" use cases with things like Citus - apparently, many companies use Citus in particular to run distributed analytics queries on their main database.

Hmm. Have you seen any wholesale feature comparisons between the two that you'd consider objective and fair, and which debunk old myths in the process?

Unfortunately I haven't. People who are equally well-versed on both databases seem to be the rarest of unicorns.

Postgre has been around for a very long time too so why did postgre never catch on?

Because MySQL looks easier for people who have never done SQL, which allowed to be the M in the LAMP stack and the only suported db for Wordpress.

I had little experience with Oracle SQL at school, and MySQL's bad design became very apparent to me, even when I was a junior dev. I'm afraid devs are repelled from SQL because all they've seen is MySQL. Postgres has everything Oracle has, with an even better design, and minus the price.

I am a big supporter of all open source and would choose pg over oracle for most projects. Oracle has many things pg does not

While PostgreSQL was finding its legs, MySQL was offering a complete solution. PosgreSQL was also missing a replication story for quite awhile, while MySQL was allowing for simple and native replication when it was needed.

PosgreSQL now has its legs, and has a native replication story (incomplete, but improving), but MySQL has a lot more intertia, and has also been improving in that same time. As a result, MySQL just has more (and thus less expensive) experts, which makes a big difference when picking a DB for your company.

PostgreSQL will always be technically superior, and it is a single product.

MySQL is MySQL or MariaDB.

Pros and Cons depend largely on what you're hoping to use your DB for and how you intend to deploy/administer it. Feature-wise, Postgres has a break-neck speed of new feature adoption compared to MySQL. That could be a Pro or a Con depending on your perspective. If you're dying to try out deep indexing of JSON columns in your SQL DB, then you definitely want Postgres. If you need something stable and reliable, then MySQL might be a better choice.

Not to be trite, but I think the biggest high-level Pro/Con comes down to this: Postgres has the more active OSS community, MySQL has Percona. Seriously, if you're in a situation where you have the option of spending money to solve problems, then having the option of hiring Percona to fix your problems alone makes MySQL a sound decision. If you don't have a large cash reserve, but are willing to spend time combing through message boards and blog posts, pick Postgres.

If you need something stable and reliable, then MySQL might be a better choice.

In my experience, this is unfounded. The Postgresql project is far more careful about data corruption and correctness. A few examples:

    Mysql used to truncate records silently (I think this was fixed a while back with strict mode)
    Mysql used to let you insert nonsense like 'notadate' into dates (now fixed I think with strict mode, is it in default?)
    Data corruption - I've seen data corruption in Mysql when a server was rebooted which led to the server being unable to read the database data, and required a repair before it would restart. I've never seen something similar with Postgresql.
Postgresql have added features at a steady rate, they've been quite measured in their approach to new features, particularly in point releases. 'Break-neck' is not a good characterisation of that process.

For stable and reliable I'd recommend Postgresql over Mysql without hesitation.

"you need something stable and reliable, then MySQL might be a better choice."

Thats not what I have heard of and experienced with postgreSQL. Its feature set isn't stable in the sense that they add new features all the time, but it is stable in the sense that features that are present stay. They carefully consider features before committing to adding them (example: upset support)

I also haven't heard of unreliability claims w.r.t. PostgreSQL, certainly not more than with MySQL.

Also, MySQL's deviations from the SQL standard (silently chopping data of long strings to fit them in the database, for example) for me are a cause for concern.

You seem to imply that Postgres isn't stable when in the fact the very opposite is true. Postgres is one of, if not the, most safe and stable relation database systems to ever exist. The fact that they keep adding or improving features does not take away from this, it only means that you have the option to upgrade if you want the new features.

> Postgres is one of, if not the, most safe and stable relation database systems to ever exist.

I would even say that it's one of the most safe and stable software to ever exist.

You're trying to be even handed but you can hire Enterprise DB or 2nd Quadrant to fix Postgres issues if you run into them so I don't think the benefit for MySQL is there.

The benefit of MySQL is that partitioning the database is much easier to do than in Postgres. There are also cons to mysql in how it handles data by default (silent coercion, not checking for invalid data like invalid dates, strings too long get truncated, etc.)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact