1) Turning "enable_seqscan" off is almost never a good idea; it's akin to killing a fly with a sledgehammer. If Postgres isn't using an index that you think it should, figure out the underlying reason why instead.
2) The main thing that determines whether a given index stays in memory is how often that index is used, not its size. The advice about reducing the size of indexes is true (in that a smaller index uses less memory, which is always good), but if you have 10 indexes and only actively use 1 of them, the other 9 aren't going to be taking up much memory. The bigger issue is that they are going to slow down your inserts.
3) Manually vacuuming isn't really necessary for newer versions of Postgres (8.3 and later, I believe, where autovacuum is enabled by default). There are times when you do need to VACUUM, but probably not that often in standard usage.
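A quick way to sanity-check point 2 — which indexes are actually being used versus just slowing down writes — is the statistics collector. A sketch using the standard pg_stat_user_indexes view:

```sql
-- Indexes with low idx_scan are rarely used but still cost memory
-- and insert/update overhead; consider dropping them.
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```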
Here is some simple, practical advice for Postgres users:
1) If you are using an older version of Postgres, upgrade to Postgres 9. It has a lot of huge improvements over previous versions.
2) Purchase & read Greg Smith's PostgreSQL 9.0 High Performance (as mentioned previously by another poster). This book is phenomenal.
2) The idea (sorry if that wasn't clear) was to make sure the indexes you routinely use (the ones that get hit by the site many times a second) are all in memory. In the past, for me, that has generally meant roughly all the indexes (in aggregate) fitting in memory.
3) I have autovacuum on and find my daily vacuuming actually does a lot. Sure, I could probably tune autovacuum to be more effective, but it is easier just to put a daily vacuum in cron (which is also a safeguard in case anything goes way out of whack).
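That cron safeguard can be as simple as one crontab entry, assuming the `vacuumdb` client is on the PATH and the cron user can connect to the cluster:

```
# Nightly at 03:00: vacuum and re-analyze every database in the cluster
0 3 * * * vacuumdb --all --analyze --quiet
```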
On your points, I bought the book :)
Trivia: an fsync call on any file on ext3 forces a sync of not just that file but the whole volume; ext4 fixes that. If you buy large SATA drives and partition them to "short stroke" (using only the outer region of the disk and discarding the slower inner tracks), you can get performance competitive with SAS disks for many workloads.
One of the best low level books I've found in a long time.
His book is a compilation of the internals posts from his blog.
It's a shame the book doesn't have a Kindle edition.
Thanks for the explanation astine :)
"Details" or "specifics" sounds less ... trivial. "Nitty gritty details" ("let's get down to the nitty gritty") is also OK.
Postgres kicks ass.
The fact is, though, that oftentimes a table scan is the most efficient query plan. Yes, indexes can speed things up, but they do so by increasing random IO (leaving aside entirely the write-time penalty you pay for indexing). When your storage medium is spinning rust, increasing random IO will eventually — and much sooner than you think, at that — cost more than buffering an entire table into RAM and sorting/filtering there. Moreover, all that additional random IO will degrade the performance of the rest of your queries.
Date indexing: this was tricky in the beginning, as an expression like ts::date on a timestamp with time zone column cannot be indexed (the cast is not immutable, because its result depends on the session time zone). So we use timestamp without time zone columns and handle zones another way. Then, for a big table with events that you want to keep for a long time, you can CREATE INDEX i_date ON events((ts::date)). Then, in all queries against this table, make sure to add a WHERE ts::date = 'YYYY-MM-DD' clause; this way you will hit the date index and get fast queries even with billions of rows.
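A sketch of the pattern, using the table and column names from the comment (note the extra parentheses Postgres requires around an expression in CREATE INDEX):

```sql
-- ts is timestamp WITHOUT time zone, so the cast to date is immutable
-- and therefore indexable
CREATE INDEX i_date ON events ((ts::date));

-- queries must use the same expression to hit the index
SELECT count(*) FROM events WHERE ts::date = '2011-01-15';
```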
Denormalization triggers: suppose you have a Russian-dolls structure like usergroups, users, events, items. You need to access items, and also most of the time filter by a 'type' value that is properly stored at the usergroups level. The straightforward way is to do a quadruple join, but this can be slow. So we add a denormalized column 'denorm_type' in items, with two triggers: one trigger forbids direct updates of 'denorm_type', another trigger reflects any change made at the usergroups level into the 'denorm_type' column. This helps a lot.
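A sketch of the two triggers. The schema here is hypothetical — assuming items(user_id, denorm_type), users(id, usergroup_id), usergroups(id, type) — and pg_trigger_depth() needs Postgres 9.2+:

```sql
-- 1) Forbid direct writes to the denormalized column; pg_trigger_depth() = 1
--    means a top-level statement, not an update cascading from a trigger.
CREATE FUNCTION items_protect_denorm() RETURNS trigger AS $$
BEGIN
  IF pg_trigger_depth() = 1
     AND NEW.denorm_type IS DISTINCT FROM OLD.denorm_type THEN
    RAISE EXCEPTION 'denorm_type is maintained automatically';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_protect_denorm
  BEFORE UPDATE ON items
  FOR EACH ROW EXECUTE PROCEDURE items_protect_denorm();

-- 2) Fan a usergroup-level change out to the items that belong to it.
CREATE FUNCTION usergroups_sync_denorm() RETURNS trigger AS $$
BEGIN
  UPDATE items i
     SET denorm_type = NEW.type
    FROM users u
   WHERE u.usergroup_id = NEW.id
     AND i.user_id = u.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER usergroups_sync_denorm
  AFTER UPDATE OF type ON usergroups
  FOR EACH ROW EXECUTE PROCEDURE usergroups_sync_denorm();
```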
Materialization is a higher level of denormalization than the one above. It is necessary to keep the 'real' data highly normalized, but most often one needs to access the data in a more user-oriented form, especially for reporting purposes. Views are excellent at that task, but they are computed in real time. Materialization is the process of writing some of these views to disk, and indexing them, for faster access. With proper triggers, it is feasible to keep these materialized views safe (i.e. read-only) and always up to date (any change in the base data triggers an update or insert in the materialized view).
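A minimal trigger-maintained materialization sketch, with a hypothetical events/per-user-counts schema. ON CONFLICT needs Postgres 9.5+; newer versions also offer CREATE MATERIALIZED VIEW, but that is refreshed in bulk rather than kept incrementally up to date like this:

```sql
-- The "materialized view": a plain, indexed table
CREATE TABLE user_event_counts (
  user_id  integer PRIMARY KEY,
  n_events bigint  NOT NULL
);

-- Keep it in sync on every insert into the base table
CREATE FUNCTION bump_event_count() RETURNS trigger AS $$
BEGIN
  INSERT INTO user_event_counts AS c (user_id, n_events)
       VALUES (NEW.user_id, 1)
  ON CONFLICT (user_id) DO UPDATE SET n_events = c.n_events + 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_bump_count
  AFTER INSERT ON events
  FOR EACH ROW EXECUTE PROCEDURE bump_event_count();
```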
Oracle allowed you to disable the indexes on a table (something like 'ALTER INDEX foo DISABLE'), letting them go stale while data in the table was modified. You could do this right before a big import operation, for example, and rebuild them (a different 'ALTER INDEX' syntax) when the operation completed.
PostgreSQL doesn't appear to have an equivalent at the moment.
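The closest Postgres workaround is to drop the index outright and rebuild it after the bulk operation; a sketch with hypothetical table/index names:

```sql
DROP INDEX items_foo_idx;
COPY items FROM '/tmp/items.tsv';  -- bulk load runs without index maintenance
CREATE INDEX items_foo_idx ON items (foo);  -- rebuilt in one pass at the end
-- on a live table, CREATE INDEX CONCURRENTLY avoids blocking writes
```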
RENAME TABLE current TO current_old,
current_new TO current;
alter table current rename to current_old;
alter table current_new rename to current;
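Since DDL in Postgres is transactional, the two renames can be made atomic, so readers never see a moment where the table is missing (a sketch):

```sql
BEGIN;
ALTER TABLE current RENAME TO current_old;
ALTER TABLE current_new RENAME TO current;
COMMIT;
```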
checkpoint_completion_target = 0.9 # up from 0.5 default
So the checkpoints get written out more evenly and cause fewer and shorter "stop the world" events. This was especially painful on Amazon EBS (which is a relatively bad idea by itself, btw) with a large shared memory size; decreasing the shared memory size actually improved performance for us.
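For context, the related checkpoint knobs as a postgresql.conf fragment (values are illustrative, not recommendations; checkpoint_segments was later replaced by max_wal_size in 9.5):

```
checkpoint_completion_target = 0.9   # spread checkpoint writes over 90% of the interval
checkpoint_segments = 32             # allow more WAL between checkpoints (pre-9.5)
checkpoint_timeout = 10min
log_checkpoints = on                 # log how long checkpoints actually take
```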
Off topic, but some PostgreSQL love: after using DB2 on a customer job for the last 5 months, I much more appreciate PostgreSQL, which is the RDBMS I usually use. It seems like PostgreSQL is developed and documented in a way that makes it easy to use, while DB2 is designed and documented to optimize IBM consulting services revenue.
Copy a table from a tab-delimited file with COPY. In MySQL you can use LOAD DATA INFILE, which does the exact same thing.
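Side by side, with hypothetical table and file names (COPY's default text format is tab-delimited):

```sql
-- Postgres (path is read by the server process):
COPY events FROM '/tmp/events.tsv';
-- or client-side via psql: \copy events FROM 'events.tsv'

-- MySQL equivalent:
LOAD DATA INFILE '/tmp/events.tsv' INTO TABLE events;
```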
Keeping indexes in memory is obviously very important in MySQL land too. Also, when using joins, make sure the columns being joined on are indexed.
Using InnoDB with O_DIRECT will prevent Linux from swapping out the mysql process.
MySQL's default in-memory temporary tables are very small, which usually results in it creating on-disk tables instead; to prevent this, increase tmp_table_size and max_heap_table_size. Alternatively, you can specify the MEMORY engine for in-memory tables if you know how big they are going to be.
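As a my.cnf fragment (values are illustrative only — implicit in-memory temp tables spill to disk once they exceed the smaller of these two settings):

```
[mysqld]
tmp_table_size      = 256M
max_heap_table_size = 256M
```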
This is close, but not exactly true. Linux is loath to discard buffer cache; even with swappiness at zero, it will often choose to swap out MySQL code/data instead when faced with downward memory pressure.
Using O_DIRECT helps MySQL avoid being swapped out because it doesn't fill the OS buffer cache with data MySQL has already cached itself, but it doesn't prevent it. A big rsync job, for example, will still load every copied file into the buffer cache and potentially cause swapping.
If you want to prevent the InnoDB buffer pool (or Oracle SGA) from ever being swapped out, use huge pages; they can't be swapped out.
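A sketch of the wiring, with illustrative sizes (on most x86 Linux systems a huge page is 2 MB; mysqld also needs permission to use them, e.g. via hugetlb_shm_group):

```
# /etc/sysctl.conf: reserve a pool of huge pages
# 2048 pages * 2MB = 4GB, sized to the buffer pool you want pinned
vm.nr_hugepages = 2048

# my.cnf: tell mysqld to allocate the InnoDB buffer pool from that pool
[mysqld]
large-pages
```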
I've read that the split-LRU kernel patch (kernels >= 2.6.28) fixes a number of these bad swapping behaviors, but I haven't tried it myself.