The other feature in this list that I really, really like is session tracing. When I used to work with Oracle I implemented a feature that allowed any web page to have its database session logged if you passed in a magic parameter. This was unbelievably useful in debugging performance problems on the live system as they were happening. In fact it proved so useful that we would turn tracing on for one session for some period of time each day, inserting pointless marker queries at the start and end of each web page. We then ran a script over the log to break it up by web page, looked at the slow pages and what they did, and thought about how to make them faster. This exercise suggested optimizations that roughly doubled the capacity of our website on the same hardware.
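The markers themselves can be as dumb as no-op queries whose text encodes the page. Something like this sketch (the page names here are purely illustrative, not what we actually used):

    -- hypothetical no-op markers bracketing one page render, so a script
    -- can later split the trace file per page
    SELECT 'PAGE_START /orders/list' FROM dual;
    -- ... the page's real queries run here ...
    SELECT 'PAGE_END /orders/list' FROM dual;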
This is invaluable data that you simply cannot collect any other way. You certainly can't turn on logging for the whole database - the website would crash. Sure, you can log things at the application level - I did that as well. But that only tells you, "We ran these queries and got these timings." It doesn't shed light on what is happening inside the database.
Bruce Momjian told me on Monday that Robert Haas was leading the index only scan effort, and that this (important) feature should appear in the 9.2 time frame.
Most hints are a bad idea; they represent leaks in the relational abstraction. Further, the DBA's knowledge of the performance characteristics of various tasks is transient, and often quickly mooted by changes in technology, data scale/statistics, or even a simple hardware/software reconfiguration. FIRST_ROWS is a reasonable hint, since it speaks to the logical intent of the query: it says something about how much of the data the query intends to use and how it prioritizes the results. The planner can then work toward the stated optimization goal.
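For instance (the table, columns, and bind variable here are invented), Oracle's FIRST_ROWS(n) form reads as a goal rather than a plan directive:

    -- states the goal "return the first 25 rows fast"; the planner
    -- still picks the access path (illustrative schema)
    SELECT /*+ FIRST_ROWS(25) */ order_id, status
    FROM   orders
    WHERE  customer_id = :cust
    ORDER  BY order_date DESC;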
Assuming that in the author's example it really is the fault of the human who wrote the query with a hint in it, that's still not an argument against allowing hints. The fact that someone might misuse a tool isn't a reason not to create it. It's the same argument the RIAA/MPAA use when they want to ban P2P technologies (i.e. it can be used to infringe copyright, therefore it should be illegal).
Most of the 'hints-vs-no-hints' arguments devolve into two camps:
1) People that want to have control over the query because they don't fully trust the planner (this mistrust could be logical or illogical -- i.e. people that experienced a need for hints, or people that just don't trust the planner even though it's never given them a bad plan)
2) People that put 100% trust in the planner and say that since the planner is deterministic, then it's a human's fault when it comes up with a bad plan. (i.e. They love The Planner. They trust The Planner. They are careful not to taunt or anger The Planner.)
Also: FIRST_VALUE and LAST_VALUE really need IGNORE NULLS to do proper, high-performance rollforwards for things like time series data and periodic analysis.
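For anyone who hasn't hit this: a rollforward in Oracle looks something like the sketch below (the ticks table and its columns are invented for illustration):

    -- carry the last non-null price forward across a sparse time series
    SELECT ts, symbol,
           LAST_VALUE(price IGNORE NULLS) OVER (
             PARTITION BY symbol ORDER BY ts
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS price_ff
    FROM   ticks;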
Anyway, I'd be curious to know what flags the Postgres guys pass to GCC when compiling it. Or do they know better than its "planner"?
As for GCC, it's usually up to the packagers, but the source tree after running ./configure gets compiled with:
-O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv
-O2 means "take more time compiling but produce faster code". This hint will always be valid and useful, regardless of how the codebase changes. Query planner hints can become obsolete as the data distribution changes.
-fwrapv and -fno-strict-aliasing are correctness settings.
The -Wxxx flags are just help for the developers.
I wish there were a way to transparently parallelize operations over more than one core. I've got enough operations that are actually processor bound that I could benefit (some) from it. I could partition the database, but that's a management headache.
(There's a caveat there: my biggest tables tend to be toasted, so there's a decent processor load uncompressing the toasted bits. Also, a power law governs table size - the biggest table is ~50+% of the database, out of ~100 tables. YMMV.)
Granted, this only exists for pg_restore (8.4 onwards); there's no equivalent functionality in Postgres for pg_dump.
What I do is dump the schema, then split it between the table creation and the indexes. The first part gets loaded before the data, the second after. Then all the tables and all the sequences are dumped, optionally teed to disk, and loaded via a psql process into the new location (see the sketch below).
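Roughly this, though the host, database, and file names are placeholders:

    # dump the schema only, then hand-split it into tables.sql + indexes.sql
    pg_dump -s olddb > schema.sql
    psql -h newhost newdb < tables.sql       # bare tables before the data
    pg_dump -a olddb | tee data.sql | psql -h newhost newdb   # stream the data
    psql -h newhost newdb < indexes.sql      # build indexes after the load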
I've been doing this since the 7.x series, so it way predates 8.4. (It also predates in-place upgrades, which are a big thing they've added since I was coming up to speed.)
I've occasionally used Postgres at home, including doing some work at home long ago for a (small, batch-oriented) project to be hosted on Oracle at work. Irony on that project: my crap hardware at home outperformed the server at work, due to neglect by the Oracle DBAs. The DBAs had to really be pushed to index several of the tables; we had to convince them that we really did read some tables and columns many times, rather than generating "write once, read never" data :-) In contrast, the default Postgres setup did well at this particular job running at home on my "2 cans on a string" setup: a 200 MHz Athlon client and a 166 MHz P1 "server", joined by a 96 Kbps serial PPP connection! (I forget why I used 2 machines instead of 1, but computers were small back then - my "Atom" laptop now has more computrons than those boxes.)
I found that MySQL was not even close (at least 10 years ago) to emulating Oracle. At least Postgres was close enough to make the effort, and the poster's comparison, worthwhile.
>1. Query progress
>3. Session Wait Data
>4. Session Tracing
These are all the same feature. Granted, it's an awesome feature, and that's a reasonable division.
>2. Index Only Scans
>5. Multiple Block I/O
These are real valid (and separate) points. There's a good argument for leaving 5 to the OS, but it does mean accepting that the OS might suck at predicting what you want.
>6. Tablespace Quotas
It doesn't seem like this should be so hard -- use a user-space FS with a cap.
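A sketch of that workaround, assuming a size-capped filesystem is already mounted (the paths and names here are made up):

    -- the filesystem's size limit then acts as the tablespace quota
    CREATE TABLESPACE capped LOCATION '/mnt/capped/pgdata';
    CREATE TABLE big_stuff (id bigint, payload text) TABLESPACE capped;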
>7. No Logging
Why are you creating indexes all the time? Large data loads I can see.
>9. Query Hints
Flamebait, but acknowledged as such.
>8. Better Export/Import Tools
>10. More Backup/Restore Options
Seriously, you just used the flamebait between these two so you'd make it to 10. Nice tactic :)
Wouldn't re-ANALYZE-ing the tables in question help fix this problem, where the statistics Postgres has about the tables' contents no longer fit reality?
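Something like this, I mean (table and column names invented):

    -- refresh the planner's statistics for one table
    ANALYZE orders;
    -- or gather finer-grained stats on a skewed column first
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;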
I can sum it up with a smiley :-)
Shipping WAL files over rsync seems a little... hacky to me. Worse still, Postgres has to load the entire WAL on the slave when it wakes, whereas with Oracle you can query the slave before it has finished loading the WAL files, since it can pause and continue.
Disclosure: I'm not a DBA or specialist in this area. I know enough to pick sensible architectures, optimize queries, and maintain good indexes.