PostgreSQL is much better at using multiple cores, but it doesn't matter (at lea...

sounddust · on Jan 7, 2009

PostgreSQL is much better at using multiple cores, and it absolutely does matter for web applications.

When you have 1,000 people who all want to load a page at the same time, you're only going to hit the database once for that.

What you will have instead is 5,000 people who want to go to 1,000 different pages, some of them updating data and some of them retrieving it, and you will have 1,000 different queries that are either selects or updates.

Now, PostgreSQL was designed from the start to be robust enough to handle this type of concurrency - to allow the minimum amount of data to be locked while updating, to free the selects from being blocked by those writes. As a result of being built on such a solid foundation, it has been easy to optimize over the years, and has supported the emergence of commmon multi-core computing quite well.

MySQL, on the other hand, was designed to return queries extremely quickly, with as little getting in the way of doing so as possible. Unfortunately, their model did not scale as well. When you have 500 reads and 1 write, and you lock the entire table for that write, it's not a big deal because the write takes no time and you're unblocked again. But when you have an array of hundreds of simultaneous reads and writes, it becomes a mess if you don't have fine-grained locking capabilities. That having been said, MySQL is progressing as well.

But I think (not trying to start a flame war) that PostgreSQL did things slow and right in the beginning and it's paying off pretty well.

Finally, I disagree that databases are not CPU constrained. Any website with decent traffic will have the database entirely cached in RAM at all times, with the only disk activity being the journaling.

mdasen · on Jan 7, 2009

PostgreSQL was designed "correct" from the start. However, MySQL does many of the same things today.

You start by saying you'll have 5,000 people who want 1,000 different pages with some updates, some selects, etc. Well, once you have more queries than cores, the benefit of splitting work over multiple cores is gone since none of the queries can logically use even a whole single core given the level of concurrency.

You go on to say that PostgreSQL was designed to eliminate unnecessary locking so that selects wouldn't be blocked. MySQL's MyISAM database doesn't support this, that is correct. However, InnoDB does support the same MVCC model that PostgreSQL uses and likewise eliminates the locking issue.

Your argument rests on MySQL locking an entire table to do a write: "When you have 500 reads and 1 write, and you lock the entire table for that write, it's not a big deal. . . But when you have an array of hundreds of simultaneous reads and writes, it becomes a mess if you don't have fine-grained locking capabilities." A great article on this problem is here: http://www.onlamp.com/pub/a/onlamp/2001/05/25/postgresql_mvc.... It's great that PostgreSQL supports that, but MySQL supports it too today (http://wiki.oracle.com/page/InnoDB?t=anon).

There are plenty of things that one can say are issues between the two. MySQL's inferior query planner. The fact that PostgreSQL can suggest indexes based on query history. PostgreSQL's weak replication (Slony-I's communication costs grow quadratically, yuck!). MySQL's acceptance of February 31st as a real date. MySQL's poor subquery optimization. PostgreSQL's more limited data partitioning.

They aren't equal in all ways and PostgreSQL is a wonderful database, but saying that MySQL needs to lock a table to do a write is just wrong in a very partisan manner. I've never really understood such partisanship. Knowing the strengths and weaknesses of multiple products makes you aware of what is good for a project and what isn't. Even better, once you're using one of them, you know what to do and what to avoid with it. Not confronting the reality of how alternative systems work just means that the chance of picking the best system is more luck than information. For what it's worth, I use PostgreSQL in my personal projects. It's great. However, it's also important to understand that MySQL of 2008 is not MySQL of 2001. It's come a long way in the "correctness" camp and the old arguments about Multi-Version Concurrency Control don't apply anymore.

Oh, and from Power PostgreSQL, Disks > RAM > CPU (http://www.powerpostgresql.com/PerfList).

sounddust · on Jan 7, 2009

1) I understand that MySQL (with InnoDB) supports row-level locking today. I was speaking of both databases as they existed in the past, to express how much easier it was for PostgreSQL to optimise and scale over the past decade.

2) I am not trying to slam MySQL. But seven years ago, we had one database which was not ACID compliant and had inconsistent behavior, but was very fast. And we had another which was designed and built properly from the ground up - with the future in mind - but was quite slow. There were advantages and disadvantages to each one. But over the past seven years, PostgreSQL has had time to optimise and stabilize code, and as a result it performs just as well as MySQL in most cases. Whereas in the past seven years, MySQL has worked to implement the essential features that allow it to be a robust database. If you were starting a website in 2009, why would you pick MySQL? I could easily understand why in 2001, but not now.

3) My point is that the "Disks > RAM > CPU" argument is no longer valid. The point of that statement was that you should spend your money on disks first, then RAM, and focus less on CPUs. This is not true in 2009! Now (for web apps), you can easily buy enough RAM to cache your entire database. All of a sudden, RAM and Disks are no longer an issue and your DB becomes CPU limited. All that discussion about "more spindles = better" and "raid 1+0 > raid 5" is not very important anymore; you just need a disk that's fast enough to log your transactions.

mdasen · on Jan 7, 2009

Well, I personally choose PostgreSQL for my personal stuff, but your logic doesn't hold. PostgreSQL was slow, but correct. Now it is fast and correct. MySQL was fast, but wrong. Now it is fast and correct. Most of the reasoning behind picking one or the other has disappeared and we're left scraping the bottom of the nitpick barrel trying to convince people to use one over the other.

As for why someone would choose MySQL, there are a bunch of reasons. There are a lot more people with MySQL experience out there. MySQL has better replication facilities (and I've set up replication with MySQL, Slony-I and PgPool-II). I'd really like to see Mammoth Replicator become the standard in the PostgreSQL community (as well as for 1.8 to be out of beta) as I think it's a considerably better replication design than the other options in PostgreSQL, but right now MySQL replication looks a lot better. Maybe you have a good use for one of MySQL's less used storage engines. MySQL Cluster looks interesting, but I wouldn't trust my data to it today (even if Zillow seems to think it's the best thing since sliced bread).

The differences between the two are really minor today. Choose whichever one you like, but there are definite reasons to choose either one.

Retric · on Jan 7, 2009

Your ignoring the issue of locking, which can effectively transform your multi core CPU in a single core. Secondly databases can be limited depending on what your asking them to do not just what type of work your doing. IMO the largest advantage to multi core CPU's is handing a longer list of requests the the disk controller so it can better optimize the the head's path.