with candidate_rows as (
for update nowait
), update_rows as (
set column = value
where candidate_rows.id = table.id
select count(1) from update_rows;
Want to add a column to your "orders" table and populate it without blocking concurrent writes for as long as it will take to rewrite a multi-million row table? Want to re-hash user passwords using something stronger than MD5, but not prevent users from ... you know, logging in for the duration?
CTEs are all that and the bag of chips.
Just read it. Seriously. Literally cover to cover, or nearly so. (Maybe simply scan the C-related sections if you're not into that, but do take note they exist. Similarly for some pl/ sections.) It might take you a few evenings, but you won't regret it. The Postgres docs is one of the best out there...
EDIT: When I wanted to learn about CTEs, for example, I conveniently had a large, ugly materialized view that needed refactoring, and they happened to fit the need perfectly. The previous version was, in places, five or six layers of subqueries deep, many of them repeated several times as they were reused. It hurt to read. Rewriting it to use CTEs made it about eight times faster, and the SQL script was a third the size of the original.
Still, its good to support Markus' effort and if you prefer having a PDF rather than going to the website, just buy it. It's like $15.
Use The Index, Luke is the free web-edition of
SQL Performance Explained. If you like this site,
consider getting the book. Also have a look at the
shop for other cool stuff that supports this site.
Quoting from http://Use-The-Index-Luke.com/ (the main page):
Use The Index, Luke is the free web-edition of SQL Performance Explained. If you like this site, consider getting the book. Also have a look at the shop for other cool stuff that supports this site.
PostgreSQL ALTER TABLE ... ADD COLUMN is an O(1) operation, and requires no data rewrite (as long as you are OK with NULLs).
You can even add an attribute to a composite type, and existing tables using that type will see the extra attribute. Again, O(1), no data rewrite.
This way, you're hitting a limited number of rows per iteration, significantly softening the IO impact (granted, at the expense of wall-clock time); the NOWAIT fails fast on rows that have write locks out against them when you try to grab them; and (if the column you're populating is indexed, obviating HOT updates) leaving a radically smaller number of dead tuples — particularly if you up-tune autovacuum on the table in question while doing this (though that can mitigate some of the reduction in disk IO).
Additionally, however, if you're bulk rewriting the table in one query, autovacuum has no (or very limited) opportunity to mark dead tuples as truly dead — especially if there's any lock contention going on, so other, older xids are waiting for your bulk update to complete — so in the worst case your table is half dead tuples, and twice the size it needs to be.
Isn't this only possible if you do it on user login - unless you are also cracking user passwords... Actually has anybody cracked their own MD5 user passwords to upgrade them?
Take the old hash, and use it as the input to a new stronger hash. Mark some column to indicate you've done this. Then next time the user logs in, you calculate the old hash and then the new hash from that. Once you validate the password use it to calculate a new new hash, put that in the field and clear the update column.
If you do it that way you'll pay a performance cost on every password check forever. I suppose that trade-off might worth it in some cases.
Either way, I've used this idiom at least half a dozen times in production, all without any downtime or user-visible effect, and we have millions of orders, SKUs and users.