Secrets of Postgres Performance
Measure Everything Before and After
Do not do Dumb Things
Watch your query counts
shared_buffers: set to 25% of available ram, move up/down 5%
to find sweet spot
effective_cache_size: planning hint that tells PG how much RAM it can expect for OS disk cache
work_mem: per process amount of ORDER BY space
wal_buffers: Set to 16MB and forget it
checkpoint_segments: Increase to at least 10
maintenance_work_mem: 50MB for every GB of RAM
synchronous_commit: turn off with data loss risks
Disks. Lots of disks.
Not all disks are created equal. Faster disks make a huge difference.
Configuration of disks matters a lot. RAID-5 is bad. RAID-1+0 is good.
Separate out the WAL onto its own disk. Can 4x write performance.
Worry about CPU after your disk/RAID/filesystem configurations are more or less optimal.
Use pgbouncer to pool connections
Consider putting archive or legacy data onto slower drives/volumes
Separate your data and indices onto different volumes if possible
This applies even on SANs
Best of both worlds with Rackspace and Cloud Connect
EBS volumes and Software RAID is best but scary on AWS.
Choice of filesystem matters. No journaled filesystems for your WAL, it is a journal.
Use XFS for maximum performance.
Avoid queries in loops.
Count() is slow
And look at that - a 35 minute video is compressed into 1 minute of reading and I bet, thanks to the summary, I obtained 75% of the relevant info that I would have got if I had watched the video in it's entirety.
I really despise non-textual media.
I don't know if I'm an edge case here, but if not there might be some kind of opportunity there.
With text, at least I can speed read and ignore stuff that is not important to me.
Call it, "The wiki for getting to the fucking point".
One thing though:
"EBS volumes and Software RAID is best but scary on AWS"
I've managed an EBS RAID10 database for a few years now. I wouldn't touch this with a 10 foot pole.
Do yourself a favor, set up an m1.xlarge (or bigger) instance, put the ephemeral drives in a RAID0 and mirror across multiple machines using hot-standby, slony, londiste, or some other tool. You'll be much happier, your system will perform much better, and you'll have a failover strategy in place.
Or are you focusing on the RAID10 part - but then everywhere RAID10 is touted to be the best RAID solution (right balance between performance and safety)
The ephemeral drives are drives directly attached to the server and to the best of my knowledge are not a shared resource. Their performance characteristics are highly consistent, but if your server goes down all data on those drivers are lost.
EBS sounds nice in theory, but by going to EBS RAID you throw away most of its benefits (such as snapshotting) and take on it's worst aspects.
Use Wal-E to push wal logs to s3.
I enjoyed this video, even though I'm not a PostgreSQL guy, as it has a lot of good generic info. The advice about XFS and noatime was right on the money, although I don't think he strongly stated enough why you don't want to run EBS volumes over software RAID-0. In my experience this is a really bad idea, because just one of those EBS volumes getting picking up a laggy await (happens pretty frequently with volumes I've seen in the wild) will drag down the performance of the entire array. Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers.
" Also I'm told that the old "RAID-5 is always bad" notion is considerably more nuanced nowadays, and that many of the earlier OLTP write performance problems with RAID-5 have been largely mitigated with modern RAID controllers."
I can think of good reasons to switch from mysql to postgres though, and I've a hard time coming up with reasons to switch from postgres to mysql (which is something else all together than choosing mysql over postgres in the first place - although even there, for serious deployments, the only real reasons I can see are in the "it's what we know" department).
Long time Postgres user so I may be a little biased ;-), but still.