
The State of Vacuum in Postgres - craigkerstiens
http://rhaas.blogspot.com/2018/01/the-state-of-vacuum.html
======
rosser
We've found that periodic invocations of pg_repack are necessary to keep
things from bloating beyond the inflection point where vacuuming just can't
keep up. I'd love to see VACUUM somehow able to take care of that, because the
WAL volume of repacking multi-terabyte databases — even when you're just
targeting the worst culprit tables — is absurd (especially when you're keeping
all your WALs around forever, for PITR purposes). I'm just not sure what more
it can do.

~~~
ioltas
For a couple of years now there has been a set of patches in the community to
support REINDEX CONCURRENTLY, and there have been discussions about having a
CLUSTER CONCURRENTLY, but nothing has materialized. Pushing such patches
forward would make sense. Many users are asking for the ability to do tuple
reordering without holding an exclusive lock on the relations involved.

------
gregw2
Where do things stand with vacuum in Redshift, which is derived from Postgres?

Which version of Postgres's vacuum implementation did they derive from? Have
they implemented or incorporated any of these improvements since their fork?
(Obviously Redshift doesn't need the btree index ones since it uses
distribution keys and zone maps.)

I ask because my experience is that Redshift's vacuum implementation is awful on
large heavily used data warehouses, and I wonder if there is any hope for
improvement.

~~~
bigtones
Amazon Redshift is not actually based on Postgres; it just supports a Postgres
interface. It is built on top of technology from the massively parallel
processing (MPP) data-warehouse company ParAccel, which was acquired by Actian
and licensed by Amazon.

~~~
Ralfp
> Amazon Redshift is based on PostgreSQL 8.0.2

Same wikipedia article you are quoting ;)

[https://en.wikipedia.org/wiki/Amazon_Redshift](https://en.wikipedia.org/wiki/Amazon_Redshift)

~~~
scapecast
I can confirm that Redshift is based on Postgres. And I know that straight
from the source: Dave Steinhoff, one of the founders of ParAccel, is on our
team.

[https://www.intermix.io/about-us/](https://www.intermix.io/about-us/)

------
hinkley
A number of GC and other resource reclamation systems, especially soft real-
time ones, have worked on accounting the cleanup costs to the parts of the
system that make the mess in the first place.

In these systems, a function that is very careful with resource allocation
will almost never experience a significant slowdown due to the background
processing, because they either pay no cost or an amortized cost. For
instance, one collector I knew of would garbage collect up to 10 objects on
every allocation. If large allocations are front loaded then a full cycle
almost never happens. In Swift or C++ or Rust you pay for _your_ allocations
as soon as they go out of scope.
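The amortized scheme described above can be sketched in a few lines. This is a toy illustration with invented names (`Arena`, `BATCH`), not any particular collector's implementation:

```python
# Toy sketch of amortized reclamation: each allocation also retires up
# to BATCH dead objects, so cleanup cost is charged to the code doing
# the allocating rather than to a separate background pass.
BATCH = 10

class Arena:
    def __init__(self):
        self.live = set()
        self.dead = []          # objects waiting to be reclaimed

    def alloc(self, obj):
        # Pay an amortized cleanup cost on every allocation.
        for _ in range(min(BATCH, len(self.dead))):
            self.dead.pop()     # "reclaim" one dead object
        self.live.add(obj)
        return obj

    def free(self, obj):
        # Freeing only queues the object; reclamation happens on alloc.
        self.live.discard(obj)
        self.dead.append(obj)
```

A steady allocator here never sees more than a small constant of cleanup work per call, which is the "pay as you go" property being described.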

In the context of an MVCC system, running a long query causes considerably
more versioned data to pile up. If part of the cost of retiring that data was
paid at the end of the transaction, several things happen. One, other
transactions are less impacted by the overhead. Two, you end up throttling the
bad actors, which encourages the problem to be fixed, and in the meantime the
high water mark for vacuuming is reduced.
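A hedged sketch of that idea, with all names invented: each transaction tracks the versions it supersedes and pays part of the reclamation cost at commit, leaving the rest for the background process:

```python
# Toy model: charge each transaction a cleanup "tax" at commit,
# proportional to the dead versions it created, instead of deferring
# all reclamation to a background vacuum.
class Store:
    def __init__(self):
        self.rows = {}
        self.backlog = []   # garbage left for background vacuum

class Transaction:
    def __init__(self, store):
        self.store = store
        self.dead_versions = []

    def update(self, key, value):
        old = self.store.rows.get(key)
        if old is not None:
            self.dead_versions.append((key, old))  # superseded version
        self.store.rows[key] = value

    def commit(self, tax_rate=0.5):
        # Reclaim a share of the garbage this transaction produced,
        # throttling heavy writers rather than everyone else.
        budget = int(len(self.dead_versions) * tax_rate)
        reclaimed = self.dead_versions[:budget]
        self.store.backlog.extend(self.dead_versions[budget:])
        return len(reclaimed)
```

The `tax_rate` knob is the back-pressure dial: raising it slows heavy writers and shrinks the backlog the background vacuum has to absorb.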

~~~
rosser
Are you legit suggesting tying the cost of GC to _transaction commit_ in a
database platform that is very commonly used for OLTP workloads?

~~~
hinkley
Those are exactly the systems that are falling over already, aren’t they?
They’re already tied together whether that’s inconvenient or not. Right now
(or at least, recently) the performance simply falls off a cliff instead of
degrading.

A lot of the tools we have for dealing with down services also can deal with
degraded ones. If, for instance, throttling writes per second keeps the system
usable for everyone else, that has been done before and can be done again.
But you have to do it before the database has a seizure, not after.

[edit: I’m talking about back pressure. This is not a new or crazy concept.
Your incredulity and $5 will buy me a cup of coffee]

~~~
rosser
I wouldn't characterize things as "falling over", except perhaps in cases of
operator incompetence.

There's certainly room for improvement — The Fine Article wouldn't exist
otherwise — and backpressure might be worth exploring as a means of improving
things. If that backpressure causes the system to slow down in a way that
adversely affects customer or end-user interactions, however, it will often
just be turned off, mooting it — or making things _worse_.

~~~
hinkley
Like a lot of people, I was barely aware of VACUUMing before several high
profile OLTP systems had unplanned outages because Postgres' requests per
second dropped off dramatically and suddenly due to this arcane form of
resource exhaustion. That's a performance cliff. I didn't mean that as
hyperbole but as jargon: the system hits a bottleneck, and its rate of change
alters drastically and unmanageably.

~~~
rosser
That sounds like operator error to me. If you (the general "you") use a
database in production without knowing what sorts of maintenance it requires,
that's neither the database's nor its authors' fault.

I guess you could argue the tool is somehow "flawed" for needing maintenance.
I'd counter with the invitation to show me tools that _don't_.

------
devit
Is it really necessary to have a vacuum procedure?

Isn't it possible to just use a data structure to index rows by transaction
ids, and on each transaction commit efficiently find all rows that aren't
visible to any transaction, and add them to a free list?

Seems kind of a bad design to rely on periodic full data scans.
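A toy sketch of the structure being proposed, with invented names and a deliberately simplified visibility rule (this is not how Postgres works):

```python
import heapq

# Dead row versions keyed by the transaction that superseded them,
# reclaimed as soon as no open transaction could still see them.
# Simplification: a version is reclaimable once its superseding xid is
# older than the oldest still-open transaction.
class VersionStore:
    def __init__(self):
        self.active = set()            # xids of open transactions
        self.dead = []                 # min-heap of (superseding_xid, row)
        self.free_list = []

    def begin(self, xid):
        self.active.add(xid)

    def supersede(self, xid, row):
        # Row version replaced by transaction `xid`.
        heapq.heappush(self.dead, (xid, row))

    def commit(self, xid):
        self.active.discard(xid)
        # Reclaim every version older than the oldest open transaction.
        horizon = min(self.active) if self.active else float("inf")
        while self.dead and self.dead[0][0] < horizon:
            self.free_list.append(heapq.heappop(self.dead)[1])
```

Even in this toy form you can see the cost being questioned below: every update and every commit now touches a shared, continuously mutated structure.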

~~~
jandrewrogers
Efficient resource recovery while preserving consistency and minimally
impacting workload throughput is one of THE central problems in database
engine design. You can move the pain around but it never goes away, and a poor
solution will very negatively impact the throughput of normal workloads.
People have been thinking about this problem for as long as we've been
building database engines.

This is not to say vacuum-like mechanisms are necessarily the best method but
it was a common architectural idiom for database engines from the 1990s
because it works well with sequential storage devices. Newer techniques tend
to put resource recovery inline with the workload rather than outside of it
but that has other disadvantages.

Indexing all rows in a database by transaction id would cause an extreme loss
of throughput. That just creates a continuously mutated (and therefore
locking) structure every thread constantly uses that is also paged to disk.
And deletion of individual records is a problem for indexes generally.

~~~
lobster_johnson
Oracle, with its undo log and some clever in-place tuple modification and
space reclamation, seems to have sidestepped the issue to a much greater
degree than Postgres. Oracle is MVCC-like, but sacrifices technical elegance
for performance.

For example, as I recall, Oracle actually writes new row data to the page the
row lives in, overwriting the old data in place on commit. It also writes the
modifying transaction ID to the row header. If a different transaction finds
the row, it will see that the transaction ID doesn't match (it's newer than
itself), which forces the database to go to the undo log to look for the older
data.

I'm not privy to the technical details of how the undo log is implemented or
how it avoids the vacuum problem. I suspect it's related to the fact that the
undo log is separate from the tuple data, so undo bloat doesn't affect table
performance to the same extent as with Postgres.
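A rough sketch of that read path as described above, with invented names and a heavily simplified layout:

```python
# The page holds the newest version in place; a reader whose snapshot
# predates the row's writer follows the undo chain backwards until it
# finds a version old enough to be visible.
def read_row(page_row, undo_log, snapshot_xid):
    """Return the row version visible to `snapshot_xid`."""
    version = page_row                 # newest version, stored in place
    while version is not None and version["xid"] > snapshot_xid:
        # Writer is newer than our snapshot: consult the undo log.
        version = undo_log.get(version["undo_ptr"])
    return version
```

The point of the design is visible here: current readers never pay for old versions, because those live off to the side in the undo log rather than in the table pages.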

~~~
Tostino
That is correct. It also means that things like rollbacks, which are pretty
much instant in Postgres, can take significant time in Oracle.

------
hinkley
Why isn’t the rate of vacuuming in Postgres determined by the rate of churn
instead of things like wall clock time?

These weights and heuristics are either going to punish small databases or
sabotage large ones, unless they factor in a predictor of how much there is to
vacuum.

~~~
anarazel
> Why isn’t the rate of vacuuming in Postgres determined by the rate of churn
> instead of things like wall clock time?

It is determined by the rate of churn.
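Specifically, autovacuum's trigger is a dead-tuple count, not the clock: with the default settings a table is vacuumed once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The check amounts to:

```python
# Postgres defaults: autovacuum_vacuum_threshold = 50,
# autovacuum_vacuum_scale_factor = 0.2 (i.e. 20% of the table).
def needs_vacuum(dead_tuples, reltuples,
                 threshold=50, scale_factor=0.2):
    return dead_tuples > threshold + scale_factor * reltuples
```

The wall-clock knob (`autovacuum_naptime`) only controls how often the launcher wakes up to evaluate this condition.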

------
firasd
Related: Why Uber Engineering Switched from Postgres to MySQL
[https://eng.uber.com/mysql-migration/](https://eng.uber.com/mysql-migration/)

Edit: This is above my technical paygrade so I'll let others be the judge of
whether these are in fact related! I believed so, from my earlier readings
about the issue. Here's an article from the same blog in the OP about the Uber
migration: [http://rhaas.blogspot.in/2016/08/ubers-move-away-from-postgresql.html](http://rhaas.blogspot.in/2016/08/ubers-move-away-from-postgresql.html)

~~~
rosser
Vacuuming is not why Uber flounced off PostgreSQL. They were naïvely using it
as a KV store, and indexing every column, which caused a write-amplification
cascade.

Edit, re: the parent's edit: Quoting that article, "Perhaps the thorniest
problem which Uber raises is that of write amplification caused by secondary
index updates." The word "vacuum" is mentioned exactly once, in the context of
keeping up with index insertions (writes).

~~~
warent
Lol. If that's true (and I'm not disputing it) then how would switching to
MySQL fix that problem?

~~~
dragonwriter
Postgres’s MVCC implementation with immutable tuples means _every_ write to a
table that affects any indexed column touches _every_ index on that table.
indexing will have a performance hit in _any_ DB, but combined with Postgres’s
architecture it’s particularly bad.

~~~
anarazel
> Postgres’s MVCC implementation with immutable tuples means every write (even
> updates to a column not in any indexes) to a table touches every index on
> that table.

The parenthetical comment is not true. If no indexed columns are updated, and
there's space on the same page, a so called "HOT" update is performed.
Basically a redirection is inserted triggering index scans to follow the
update chain to the newest version:

[https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f...](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT&hb=HEAD)
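A simplified sketch of that redirect-following (invented names; real heap pages use line pointers and item headers):

```python
# An index entry keeps pointing at the original slot; the heap page
# redirects it along the HOT chain, within the same page, to the
# newest tuple version.
def fetch_via_index(page, line_pointer):
    item = page[line_pointer]
    # Follow HOT-chain redirects until we reach an actual tuple.
    while item["redirect_to"] is not None:
        item = page[item["redirect_to"]]
    return item["tuple"]
```

This is why HOT updates avoid the index write amplification: the indexes never have to learn about the new tuple version at all.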

~~~
dragonwriter
Edited the comment to correct the error in my original understanding; thanks
for pointing it out!

