Searching finds surprisingly outdated information.. Lots of articles say things like "For simple read-heavy operations, PostgreSQL can be an over-kill and might appear less performant than the counterparts, such as MySQL." but don't really back them up... 
One of the things that I think would be interesting to show, is how the consistency of performance has improved over time too. There has been a lot of focus on removing any stalls in recent versions of InnoDB, and this something that is incredibly valuable doesn't always show up in benchmarks :( I assume that PostgreSQL has gone through the same evolution in managing garbage collection, etc.
Peter Boros @ Percona has done some nice benchmark visualization, for example:
Slides on how to produce these graphs:
(Disclaimer: I work on the MySQL team.)
The other part of the story are filesystems - back in 8.x days the filesystem was usually ext3, and that handled sync pretty badly (and that made the impact of checkpoints / autovacuum much worse). Today we have xfs/ext4 which do handle that much better.
The LRU is basically the shared_buffers, and that's where most of the 9.2 improvements were (more or less).
As soon as you get into anything fancy, pg eats mysql for lunch. The query optimizer is MUCH smarter.
So, with joins, MySQL uses loop joins. These are often expensive compared to hash or merge joins. PostgreSQL's query planner can choose which of the three join strategies will have the lowest estimated cost. I believe MariaDB has implemented hash joins. I'm sure there are cool visualizations of the different join types that you could watch in action and see how they'd perform differently.
MySQL (with InnoDB) uses a clustered (index-oriented) table layout. Basically, it keeps the table in a B-tree and so querying by the primary key becomes nice and fast since the physical data layout of the table is in that lookup order and successive rows are likely to be in the same disk block. But this means that if you're looking up something in a secondary index, you then have to traverse the primary index (going through two trees). Databases like SQL Server and Oracle let you choose your table layout, but MySQL and PostgreSQL only support one (MySQL giving you clustered and PostgreSQL giving you heap).
MySQL has supported covering indexes for a while, but PostgreSQL has only added this capability recently. A covering index means that you don't have to read data from the underlying table if you only need columns that are included in the index (since all of the data is in the index).
Ultimately with databases, the key is limiting the amount of time wasted waiting for the disk. If you can cut down on the number of disk blocks you need to fetch to fulfil a query, you win! There are lots of things like these that databases do to try and get better performance (some doing more, some doing less, and some that involve trade-offs) and they're often quite reasonable to understand.
One interesting point about this is that the PostgreSQL benchmark tops out at 8 users whereas the MySQL ones start around 8 users and go up from there. Right at the 8 user point, both SQL engines are within the same range, just under 100k QPS on read; with the hardware (# of cores) not changing this value much, as well as the MySQL version 5.5-5.7 not having any effect either. That seems like a limit that these 2 SQL engines (PostgreSQL & MySQL) are hitting; most likely from parsing the SQL. Thus the only way to improve single threaded performance is to use something other than SQL; like the memcache protocol.
Link to the benchmarking tool they used: https://launchpad.net/sysbench. Also a guide on how to use it: http://wiki.mikejung.biz/Benchmarking#sysbench
Also, relational databases are general-purpose query engines, not caching solutions like memcached, with very limited set of commands. It might be possible to invent a more efficient binary protocol to send queries/receive results, but then you still have things like optimization, planning, permission checks, ... that are usually more expensive than parsing the data.
MySQL and Postgresql are insanely different. To a point that it's uninteresting to compare them. MySQL is a keystore with an SQL frontend, which eventually got ACID features bolted on. Postgresql is a full-fledged RDBMS that eventually got refactored with performance in mind. While the end goal seems the same, the results are really different.
All of this text to say something which I think is common knowledge and is found interspersed in the comments here. For any kind of moderately complex query, Postgresql will eat Mysql's lunch. For any kind of concurrent locking, highly parallel, use case, Postgresql is much faster and/or safer. For any kind of very simple query, Mysql will eat Postgresql's lunch. It's a result of different upbringings.
If your use case can be satisfied with memcache, you aren't using a relational database and, yes, Mysql is the best choice, out of these two. Anyhow, in this scenario, today, there is a vast choice of good persistent keystores that are better than mysql.
That's why I included a link to Heikki's FOSDEM talk, discussing the changes in 9.2 in more detail.
The best way to do identify the changes (in general) is probably by looking at release notes:
The large changes should be in major releases, because minor releases are supposed to be just bugfixes, so don't bother with anything except 9.x release notes (effectively 9.x.0).
There's usually a section called "performance", which for 9.2 contains for example these items:
* Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
* Allow group commit to work effectively under heavy load (Peter Geoghegan, Simon Riggs, Heikki Linnakangas)
* Allow uncontended locks to be managed using a new fast-path lock mechanism (Robert Haas)
* Reduce overhead of creating virtual transaction ID locks (Robert Haas)
* Improve PowerPC and Itanium spinlock performance (Manabu Ori, Robert Haas, Tom Lane)
* Move the frequently accessed members of the PGPROC shared memory array to a separate array (Pavan Deolasee, Heikki Linnakangas, Robert Haas)
There are other items, but these seem the most related to the pgbench improvements in this particular release. Each of these changes should be easy to track to a handful of commits in git.
Measuring the contribution of each of these changes would require a much more testing, and would be tricky to do (because some of the bottlenecks were tightly related).
The DL380 machine (the 'main' one) was running Scientific Linux 6.5 with 2.6.32 kernel and XFS.
The i5-2500k machine was running Gentoo (so mostly 'bleeding edge' packages), with kernel 3.12 and ext4.
Update: Actually, I've just noticed it's mentioned in the previous 'introduction' post in the series (http://blog.pgaddict.com/posts/performance-since-postgresql-...).
We also noticed a huge gain in performance from upgrading from Kernel 3.14 to 3.18 especially with regards to storage and network latency.
We (Infoxchange) actually CI our kernel builds daily so we end up with whatever the latest stable kernel is as its released, we also use that as an oppertunity to add the GRSecurity patches to the kernel.
Hmmm, but maybe I could use the kernel from epel. I see there's 3.10 (lt), 3.18 and 3.19 (ml) for el6. It's a bit tricky though, because I have absolutely no access to this machine except for ssh (no KVM or anything), so a botched kernel upgrade means a few days of outage before it's taken care of (it's hosted at a local university for free).
Additionally - there is also a trap where software is deployed that has been developed without the idea of being modernised (and in some cases maintained) over time - Vendors will often charge large amounts of money to modernise and test these applications.
I've observed this as both a means of increasing profitability, out of lack of quality engineering / development skills or practices or by misinforming customers to think that risk aversion and a slow rate of change is the least costly option over time which leads to or is caused by pathological or bureaucratic organisational culture.
That may be generally true of kernels, but when you are guaranteeing compatibility and support for a set of hardware, I would argue that after you've confirmed they work for a kernel, freezing that version and back-porting fixes and features that are worth it with another testing cycle is generally more stable.
I do have my own experience with this, particularly with 3.10 - we've thoroughly evaluated it, yet after a few days on production all the machines crashed because of deadlock in the I/O stack. Very time consuming to debug where the actual problem is (virtualised environment, ...).
So while I run Gentoo on most of my personal machines, I do understand why people are conservative.
I can't easily reinstall the OS on the Xeon machine (it's in a server room that I don't have direct access to), but installing an older kernel on the i5 machine should not be a problem.
Hmmm, a nice topic for another benchmarking quest ;-)