
Postgres 9.2 will feature linear read scalability up to 64 cores - chanks
http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html
======
rosser
As much as I love PostgreSQL (and I do; it's put food on my table for the last
decade), I have to stress that this linear-ish scalability needs both pg >=
9.2 and a Linux kernel >= 3.2. It seems to be a combination of the lseek(2)
changes in the kernel, and the lock contention/handling changes in the db.

That is: if you're running on an older kernel, you probably won't see quite as
much gain.

~~~
lobster_johnson
We upgraded from 2.6 to 3.2 recently and our Postgres has been flying ever
since. The scheduler changes (and presumably the lseek changes) make a huge
difference in load. We have not done any performance timings, though, so we
don't know if the changes translate to better performance.

To be specific, the change reduced read I/O (<http://i.imgur.com/L8NWO.png>)
and load average (<http://i.imgur.com/7793A.png>) both by an order of
magnitude, and the variance is much tighter than before. (The system is a dual
Xeon quad-core X5355/2.66GHz with 32GB RAM and RAID5.)

That improvement was fairly miraculous: a 10x gain just by upgrading a
kernel is not something that happens every day. Still, I would not be
surprised if Postgres 9.2 pushes performance even higher.

~~~
andyzweb
what distros are offering a 3.2 kernel?

~~~
psadauskas
Ubuntu 12.04 will, when it's released later this month.

------
ww520
Database vendors typically charge by the number of cores, and 64 cores can
get really expensive with database licenses. Hardware costs have dropped
drastically over time, but database licensing is still in the dark ages.
Postgres 9.2 has a real competitive advantage here. Hopefully it will force
the other vendors' licensing costs down.

~~~
jonknee
Evidence:

<http://www.oracle.com/us/corporate/pricing/technology-price-list-070617.pdf>

"The number of required licenses shall be determined by multiplying the total
number of cores of the processor by a core processor licensing factor
specified on the Oracle Processor Core Factor Table"

<http://www.oracle.com/us/corporate/contracts/processor-core-factor-table-070634.pdf>

They give you a big discount for buying Sun servers (a .25 factor). Either
way, it's a huge amount of money: the Standard edition costs a cool $17,500
per processor, so with 64 cores and the best-case .25 multiplier you're still
looking at 16 x $17,500, or $280,000, for the DB processor license (and that
doesn't cover support or anything else). The Enterprise edition runs an
astounding $47,500 per CPU, so you can easily run north of a million dollars
per server if you're running a lot of cores.
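
The back-of-the-envelope math, as a quick sketch (list prices taken from the
PDFs above; real-world deals are negotiated):

    cores = 64
    core_factor = 0.25                   # best case, per the Core Factor Table
    licenses = int(cores * core_factor)  # 16 processor licenses

    print licenses * 17500   # Standard edition:   280000
    print licenses * 47500   # Enterprise edition: 760000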

~~~
etrain
Oracle tends to be pretty opaque in their pricing, and part of that is because
with any sale this big, there's always going to be a lot of negotiation.

Sure, it's going to be expensive, but only schmucks pay full price for a
64-core license.

Still, it's good to see the best open source database out there delivering
cutting edge performance. Great work!

~~~
mapgrep
Yes, you can get a big discount. Then when your business booms and you need
to expand, upgrade, or (for some licenses) renew, Oracle gets to decide how
much of your profit to take off the table in licensing fees, up to the full
list price.

But of course I'm speaking out of school here. I have no proof Larry Ellison
or his minions will pursue such ruthless business tactics. Just a feeling :)

------
vladev
To me, Postgres is the most underestimated database. Not sure if this is a bad
thing...

~~~
ryandvm
Agreed. I get that MySQL is incredibly simple to set up, so I can sort of
understand why people use it for pet projects or whatever. But what I never
understood is why PostgreSQL doesn't see better adoption from the big players
(Google, Amazon, Facebook, etc.).

~~~
randomdata
> But what I never understood is why PostgreSQL doesn't see better adoption
> from the big players (Google, Amazon, Facebook, etc.).

An interesting selection of companies given that all three are known for their
use of home-grown databases (BigTable, Dynamo, Cassandra) for their primary
offerings that are not of the SQL variety at all.

Though I think it is still a good question. It may have something to do with
the ease of setting up MySQL when you are a young startup trying to get
something working as quickly as possible; once you've hit the big leagues,
it's often hard to justify a change.

~~~
justinsb
They may be known for their home-grown databases, but they shouldn't be known
for their use of them:

Amazon's primary database is Oracle. Dynamo is used for their shopping carts
i.e. for storing sessions; Memcache would probably work just as well.

Facebook's database is MySQL, with sharding and Memcache. I was under the
impression they stopped using Cassandra entirely?

Google's business (advertising) is built on MySQL, as are many of their sites
e.g. YouTube. I think the newer Google-developed sites (e.g. GMail, Reader
etc) are indeed built on BigTable.

~~~
haberman
> Amazon's primary database is Oracle. Dynamo is used for their shopping carts
> i.e. for storing sessions; Memcache would probably work just as well.

What? Dynamo is a distributed, persistent, highly-available storage system
with incremental scalability and advanced techniques for dealing with slow or
unavailable servers. Memcache is a single-process daemon that vends an in-
memory hash table via TCP. They are not even remotely comparable.

I'm not dissing memcached; it's cool and useful, but its scope is far, far
more limited.

In particular, have you noticed that when you add something to your shopping
cart on Amazon but don't buy it, it's still there months or years later? The
data doesn't disappear just because some process that was holding the data in-
memory crashes.

~~~
justinsb
I can't say I've ever put something in my cart, not logged in to Amazon for
months, gone back to see it still there, and thought "wow - they must use some
sort of webscale storage solution - I'm now going to buy the thing I didn't
want to buy six months ago."

From the point of view of solving the actual business problem, therefore, I
think something based around Memcache would work just fine!

Dynamo is very useful for persuading you that working for Amazon would be
interesting, though.

~~~
haberman
Have you ever worked in an environment at the scale of Amazon? Machines go
down all the time. I used the "six months" example to illustrate that the
shopping cart is persistent, but the machine crash could just as easily happen
the moment before you push "checkout." Losing shopping cart data just because
one machine crashed is totally unacceptable.

Programming distributed systems that must survive machine failure and network
partitions is a completely different ball of wax compared to simple web
programming. If you haven't done it before, you would not believe how much
more complicated it is.

Here is an extremely simple example. Suppose you're a radio station that's
taking phone calls from people and you want to give an award to the 5th
caller. Implementing a program that does this for a single machine is easy,
and could be accomplished with a program something like this:

    
    
      # Python 2. SocketServer.TCPServer handles one request at a time in a
      # single thread, so the shared counter needs no locking.
      import SocketServer
    
      class MyHandler(SocketServer.BaseRequestHandler):
        def handle(self):
          # One "call" per TCP connection: bump the shared counter and reply.
          self.server.caller += 1
          if self.server.caller == 5:
            self.request.sendall("Congratulations, you are the 5th caller!\n")
          else:
            self.request.sendall("Sorry, you're caller #%d\n" % (self.server.caller))
    
      server = SocketServer.TCPServer(("localhost", 9999), MyHandler)
      server.caller = 0  # shared state hangs off the server object
      server.serve_forever()
    

Using this program, you can "call" the program by doing "telnet localhost
9999" and the program will tell you what caller you are. This took me about 5
minutes to write, and I'd never used this Python API before.

Now imagine that you want to implement this same logic, but using a cluster of
machines that could go down at any time. You want the group of machines to
form "consensus" about which number each caller is; consensus in this context
just means that the group of machines arrives at a single answer, and any
machine you ask will give you the same answer.

Finding an algorithm that can do this robustly is so difficult that it was a
major breakthrough when Leslie Lamport described one in 1989. It's called
Paxos and you can read about it here:
<http://en.wikipedia.org/wiki/Paxos_(computer_science)> Even though it has
been known for over 20 years, it is still a complex topic whose details very
few people understand.
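
To make that concrete, here is a minimal, failure-free sketch of single-decree
Paxos (my own illustration, not code from any real system; a real
implementation also needs networking, durable storage, and retry logic):

    class Acceptor:
      def __init__(self):
        self.promised = -1    # highest ballot number promised so far
        self.accepted = None  # (ballot, value) last accepted, if any

      def prepare(self, ballot):
        # Phase 1b: promise to ignore ballots lower than this one.
        if ballot > self.promised:
          self.promised = ballot
          return (True, self.accepted)
        return (False, None)

      def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
          self.promised = ballot
          self.accepted = (ballot, value)
          return True
        return False

    def propose(acceptors, ballot, value):
      # Phase 1a: gather promises from a majority.
      replies = [a.prepare(ballot) for a in acceptors]
      granted = [prior for ok, prior in replies if ok]
      if len(granted) <= len(acceptors) // 2:
        return None  # no quorum; retry with a higher ballot
      # If any acceptor already accepted a value, re-propose the one with
      # the highest ballot instead of our own. This rule is what keeps two
      # competing proposers from choosing different values.
      prior = [p for p in granted if p is not None]
      if prior:
        value = max(prior)[1]
      # Phase 2a: ask everyone to accept; a majority makes it "chosen".
      votes = sum(a.accept(ballot, value) for a in acceptors)
      return value if votes > len(acceptors) // 2 else None

    acceptors = [Acceptor() for _ in range(5)]
    print propose(acceptors, 1, "caller 42 wins")  # -> caller 42 wins

Even this toy version hints at why the topic is subtle: most of the care goes
into the re-proposal rule, not the happy path.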

The point of all of this is just to say: you can't compare a single-process
in-memory cache to a distributed and fault-tolerant system. They are
completely different beasts, and many business problems do indeed need the
latter.

~~~
justinsb
I know about Paxos. I know when to use it, and when not to use it.

Seeing as you came up with the example: using Paxos for an "Nth caller wins"
contest is a really bad idea - the Nth caller to your switchboard likely
won't be the one selected. You probably _need_ a single-server system, like
the example you wrote (but without the threading errors).

~~~
haberman
Then why would you compare Dynamo and memcached? Or suggest that Amazon could
use an in-memory cache as the primary store for shopping cart data? It makes
no sense.

If you want to be particular about ordering, then you can always use Paxos to
elect a master that handles everything serially and fails over when it dies.

~~~
justinsb
I see you work at Google. How about advancing the conversation by telling us
how Google stores its sessions?

~~~
haberman
Let's make a deal. I'll start "advancing the conversation" once you stop
misleading people (i.e. admit that your initial sweeping claims were
incorrect).

~~~
justinsb
Deal. My initial sweeping claims were incorrect.

So: How does Google store its sessions?

~~~
haberman
So I'm not an expert in Google's front-ends and I'm not sure that Google
stores "sessions" in the way you'd generally think of them in web apps. But
usually when we have requirements like what you'd need for sessions (highly
available, low-latency, highly scalable) we use Megastore:

<http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf>

<http://www.readwriteweb.com/cloud/2011/02/megastore-googles-answer-to-no.php>

Megastore is a layer on top of Bigtable that adds indexing, synchronous
replication across data centers, and ACID semantics within small partitions
called "entity groups."

~~~
justinsb
Good deal! I agree that traditional sessions are best avoided, but it's good
to know that Megastore is suitable for session data.

------
robomartin
Is there such a thing as a suite of standardized performance tests for large-
scale, multi-core databases? How are people comparing Cassandra, PostgreSQL,
MySQL, and other options against each other for raw performance?

~~~
jacques_chester
> Is there such a thing as a suite of standardized performance tests for
> large-scale, multi-core databases?

There is: the TPC family[1] and their open-source doppelgangers, the
OSDL-DBT family[2].

I don't think they've been applied to non-relational databases as yet.

[1] <http://www.tpc.org/information/benchmarks.asp>

[2] <http://sourceforge.net/apps/mediawiki/osdldbt/index.php?title=Main_Page>

------
bsg75
With this improvement, how much is Postgres hampered by its lack of a
parallel query processor?

For OLAP work, it seems to be the primary bottleneck.

~~~
einhverfr
A good project to watch in this regard is Postgres-XC. Maybe not quite ready
for production, but it's close.
(<http://sourceforge.net/projects/postgres-xc/>)

------
gtaylor
Yikes, that's a lot of cores. Glad to see the Postgres team keep pushing the
scalability envelope.

~~~
TylerE
Is it? I mean, bargain basement budget desktops have more cores than a typical
server of 10 years ago. A 24 or 32 core server can't really be considered that
exotic these days, can it?

~~~
__alexs
The number of places that run e.g. 4U, $10,000 HP servers with four 16-core
Opterons is reasonably low, I think.

~~~
justauser
Change the branding to Supermicro and cut the cost to $5,000, and suddenly
many dedicated host providers out there have these servers. We're probably
only 2-3 years away from this core count being a $2,000-$3,000 box. Throw in
ARM, Tilera, and whatever AMD is doing with its acquisition of SeaMicro, and
that pace might even accelerate.

------
ww520
That's awesome. I have to admit I have always known Postgres is great and
toyed with it, but I've never used it in a real project, due to the
availability of MySQL or client preference. I'll try to put it into the
current project. The client wants Oracle since they already have an Oracle
license, but I will change the requirement to support Postgres as well.

~~~
j-kidd
If possible, try not to go down the "support Postgres as well" route.

I was in your situation, where the client wanted SQL Server since they
already had the license. During development, I used PostgreSQL instead, to
"support Postgres as well".

In the end, roughly one-third [1] of the _total_ development effort was spent
on overcoming SQL Server's limitations, things that you would never have to
think about in PostgreSQL.

So, try telling the client that they already have a PostgreSQL license as
well, with unlimited future upgrades.

[1] This figure was pulled out of my ass. The actual productivity loss could
be more, for reasons similar to those outlined in
<http://news.ycombinator.com/item?id=3784750>

~~~
xradionut
Maybe the requirement is Oracle or SQL Server because the client has the
resources (DBAs, support contracts, etc.) to support those platforms
in-house?

~~~
j-kidd
Not in my case: the client didn't have the resources, and we didn't have the
expertise. In the end I became somewhat of an expert in SQL Server (i.e. I
know how to start SQL Profiler and read a query plan), at the cost of
important functionality not being delivered to the client on time.

------
verminoth
I'm new to all of this, so does this mean that other databases don't have this
kind of performance?

~~~
gauravk92
Performance is subjective, but let's dig into the optimization in question
here. Postgres calls lseek(2) constantly to check the size of its data files,
and the Linux kernel used to take a lock on every one of those calls, so only
one could proceed at a time. The 3.2 kernel removed that lock because it was
unnecessary, and without a lock creating contention the calls scale
concurrently across cores.
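
As a rough sketch of the call in question (the file path is hypothetical, and
8192 bytes is Postgres's default page size):

    # Find a file's size by seeking to its end. Postgres does this very
    # often to count a table's pages, which is why a kernel-side lock on
    # lseek was such a bottleneck on many-core machines.
    import os

    fd = os.open("/path/to/relation/file", os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)  # byte offset of end-of-file
    print size // 8192                   # number of 8 kB pages
    os.close(fd)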

This kind of performance optimization isn't new; concurrency is the name of
the game. Erlang is a language built around concurrency, and some databases
written in it (CouchDB) scale with more cores thanks to Erlang's inherent
capabilities. So yes, this kind of performance increase has been seen
before.

~~~
jeltz
The bulk of the fixes were in the locking in PostgreSQL, though. Before
those fixes, the lseek problem wasn't visible to most people, since in almost
all cases you hit a bottleneck in the PostgreSQL code before hitting the
Linux kernel one.

~~~
gauravk92
Thank you! Do you have any experience with CouchDB? I'm sure you've at least
heard of Erlang and all the benefits it promises. I'm considering whether to
use it for a future project; I might, because the JSON REST API seems
convenient to use without having to worry about scale. But I haven't done any
testing and could use advice.

