
An unexpected journey, a Postgres DBA's tale - mirceasoaica
https://engineering.semantics3.com/2016/07/20/an-unexpected-dba-journey/
======
limaoscarjuliet
There are more surprises to be had around Postgres XIDs:

\- Temp tables cannot be vacuumed by anything other than the connection that
owns them (no way around it) - do not keep temp tables around for too long; a
nightly vacuum or autovacuum will not clear them,

\- Temp tables do not get automatically dropped on recovery if your DB cluster
crashes. They get reused, but if you have many connections before a crash and
few after, some namespaces will linger - you may need to manually cascade-drop
temp namespaces if you see the warning,

\- Full vacuum does not touch XIDs - use freeze or just normal vacuum,

\- Check oldest XID per table:

    
    
      SELECT c.oid::regclass AS table_name,
             greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS age
      FROM pg_class c
      LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
      WHERE c.relkind = 'r'
      ORDER BY age DESC
      LIMIT %s

\- Check oldest XID per db:

    
    
      SELECT datname AS db, age(datfrozenxid) FROM pg_database ORDER BY age DESC
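
The age() these queries report is the circular 32-bit distance from a frozen
XID to the current one. A minimal sketch of that arithmetic (illustrative
only; `xid_age` is a made-up helper, not a Postgres function):

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* age(xid): how many transactions have been consumed since `frozen`.
 * Unsigned 32-bit subtraction wraps, so the result stays correct across
 * XID wraparound -- as long as the true distance is under 2^31. */
uint32_t xid_age(TransactionId current, TransactionId frozen)
{
    return current - frozen;
}
```

The wrapping is the point: even after the counter rolls over, the distance
between two "nearby" XIDs comes out right.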

~~~
anarazel
> \- Full vacuum does not touch XIDs - use freeze or just normal vacuum,

Don't think that's right.

    
    
      copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
                     bool *pSwapToastByContent, TransactionId *pFreezeXid,
                     MultiXactId *pCutoffMulti)
      ...
          /*
           * Compute xids used to freeze and weed out dead tuples and multixacts.
           * Since we're going to rewrite the whole table anyway, there's no reason
           * not to be aggressive about this.
           */
          vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
                                &OldestXmin, &FreezeXid, NULL, &MultiXactCutoff,
                                NULL);

          /*
           * FreezeXid will become the table's new relfrozenxid, and that mustn't go
           * backwards, so take the max.
           */
          if (TransactionIdPrecedes(FreezeXid, OldHeap->rd_rel->relfrozenxid))
              FreezeXid = OldHeap->rd_rel->relfrozenxid;
      

i.e. the cutoff is computed in copy_heap_data(). And then in

    
    
      void
      rewrite_heap_tuple(RewriteState state,
                         HeapTuple old_tuple, HeapTuple new_tuple)
      {
      ...
          /*
           * While we have our hands on the tuple, we may as well freeze any
           * eligible xmin or xmax, so that future VACUUM effort can be saved.
           */
          heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
                            state->rs_cutoff_multi);

~~~
limaoscarjuliet
I'm speaking from experience with 9.1; perhaps this has changed since (5+
years of dev!). I ran an experiment to demonstrate that there is no XID reset
on vacuum full:

    XID age in callback 354015 (0% of max)

    task=# vacuum full callback;

    XID age in callback 354015 (0% of max)

    task=# vacuum freeze callback;

    XID age in callback 11 (0% of max)

~~~
anarazel
Yea, but that's not the same as not freezing at all - it freezes stuff older
than vacuum_freeze_min_age.
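
Roughly: the rewrite freezes only tuples whose xmin precedes current XID minus
vacuum_freeze_min_age (50 million by default). A simplified sketch, not the
actual vacuum_set_xid_limits code; the clamp to the first normal XID is an
assumption about edge handling:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)  /* XIDs 0-2 are reserved */

/* Sketch: tuples with xmin older than this cutoff get frozen during the
 * rewrite; newer tuples keep their XIDs. With the default min age of 50
 * million, a table whose oldest tuple is only ~354k transactions old has
 * nothing eligible -- matching the experiment above, where VACUUM FULL
 * left the age unchanged but VACUUM FREEZE (min age 0) reset it. */
TransactionId freeze_cutoff(TransactionId current, uint32_t freeze_min_age)
{
    TransactionId cutoff = current - freeze_min_age;
    if (cutoff < FirstNormalTransactionId)
        cutoff = FirstNormalTransactionId;
    return cutoff;
}
```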

------
saurik
It is my vague understanding that a lot of these XID vacuum issues are being
massively improved by a patch that landed a few months ago.

[http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html](http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html)

"No More Full-Table Vacuums" \- Robert Haas

------
vinothgopi
I don't think I've ever read a tech blog post that was so gripping, with
twists and turns, AND ends with a cliffhanger (until you click through to
part 2).

~~~
the_duke
Haha, I agree.

For a second I was concerned about the level of my geekiness.

Then I moved on to the next HN article...

------
Femur
I wonder: is there a compelling reason that Postgres does not use 64-bit XIDs?

~~~
saurik
There are four XIDs per tuple, so 64-bit XIDs would make every row 16 bytes
larger. (That's a _lot_ larger.)
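
Beyond on-disk size, the 32-bit scheme only works because XID comparisons are
circular rather than absolute. A sketch of that modulo-2^32 comparison, in the
spirit of Postgres's TransactionIdPrecedes but simplified to ignore the
reserved special XIDs:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t TransactionId;

/* id1 logically precedes id2 if the signed 32-bit difference is negative.
 * This is only unambiguous while the two XIDs are less than 2^31 apart --
 * which is exactly why tables must be frozen before their oldest XID falls
 * ~2 billion transactions behind. */
bool xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t)(id1 - id2);
    return diff < 0;
}
```

Note the wrapped case: an XID near 2^32 still compares as "older than" a small
XID from just after wraparound.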

~~~
stickfigure
...and yet still reasonable for many applications...

------
ivoras
For the elucidation of the audience: one thing I found out can kill
PostgreSQL (and luckily there are few of those) is temp-table over-use.
Imagine a busy app where most page loads create and work on wide temp tables.
What isn't directly obvious is that temp tables, like regular tables, have
their metadata (attributes) stored in the system catalogues, and the
catalogues themselves are apparently very much like (if not identical to)
regular tables. So now there's huge write traffic on system tables, which by
the nature of temp tables generates huge numbers of dead records, as
information on every field of every temp table, etc. is created and deleted.
I'm also pretty sure there's some ugly locking in the mix.

So imagine now that autovacuum was turned off...

The user tables in such a system can survive if they are not updated often,
with an occasional vacuum. However, the situation here was such that for every
GB of user data, another GB had accumulated in _dead records in system
tables_. Half the disk usage had gone to bloat in system tables!

It's easy to fix, of course, with a full vacuum of everything, if you have the
time. Or, more quickly, a full vacuum of just the system tables should be very
fast, right? Right - except that in this particular old version of PostgreSQL
there was a race-condition bug with vacuuming system tables which could lock
up the entire cluster, with a possibility of data corruption for the db in
question. Guess what happened.

A restore from backup.

------
whack
Given the dangers of the XID wraparound, I wonder if it makes sense for
Postgres to auto-launch a Vacuum Freeze, or some minimal variant of it, when
it detects that a XID-wraparound is imminent.

~~~
pilif
It does do that. But it runs at very low priority, so it might take a while.

But on the other hand, it starts warning you, first just in the log file, then
on every query you issue, well before it actually stops working, so if you're
not ignoring warnings in your logs or your client code, there should be ample
time to react.

At least, there was for me when I first ran into this a few years ago.
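
The safety margins behind those warnings can be sketched roughly like this (a
simplification of what Postgres computes in SetTransactionIdLimit; the 1M
stop and 10M warn margins are my recollection of that era's source, and the
skipping of reserved XIDs is omitted):

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* From the oldest unfrozen XID in the cluster, derive (circularly):
 *  wrap: 2^31 ahead, where old rows would suddenly look "in the future"
 *  stop: 1M short of wrap -- new transactions are refused here
 *  warn: 10M short of stop -- every query starts emitting warnings */
void xid_limits(TransactionId oldest,
                TransactionId *wrap, TransactionId *stop, TransactionId *warn)
{
    *wrap = oldest + ((TransactionId) 1 << 31);
    *stop = *wrap - 1000000;
    *warn = *stop - 10000000;
}
```

So by the time queries start complaining, there are still on the order of ten
million transactions of headroom - ample, as long as the warnings are seen.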

------
hobs
200,000 tables in one database? What?

~~~
barrkel
If you have a lot of variable schema datasets that you need to be able to
index and query, you can do a lot worse than create a table per data set.

At my job, in our largest environment we only have 30,000 tables in our
biggest database, but that's only a year or two's worth of data and the rate
of increase is increasing.

~~~
overcast
Can you give me an example of what you're doing exactly? Rapidly expanding the
number of tables in a database doesn't make sense. Sounds like you need a
flexible document store, and not a relational database.

~~~
barrkel
I've looked at what's on the market, and the performance characteristics of
what's available aren't as good as a relational DB's, in terms of queries
that join the rest of our schema, static data, only a handful of warm tables
but quick ramp-up time, etc. Specifically, we don't want a cluster solution
(too expensive in terms of hardware), and our data is almost completely
static once it's loaded (so we don't need something that e.g. keeps stuff in
memory to optimize updates).

The closest is RethinkDB, but its memory requirements per table are too high.

If you have other suggestions for something that can live on a smallish box
(4G) yet support thousands of tables, fairly arbitrary queries, and joins
against a relational DB, I'm open to suggestions. Fast bulk updates and
inserts are a bonus; a relational DML is a requirement, since a good chunk of
our logic is in SQL, and much more readable for it.

~~~
marcosdumay
The idea of a flexible document store is not using thousands of tables.
Instead, you get a flexible "table" that can store a big number of different
data types.

To some extent, Postgres can do that. But I don't know the extent of
flexibility/performance you'll need.

~~~
barrkel
We have 10s to 100s of thousands of tables - literally, tables, with rows and
columns, each column of a consistent type, that need to be sortable and
filterable - with each table having low thousands to low millions of rows.

When browsing any one table, the entire table's contents - more or less -
needs to be sortable on any column and served up in a paginated fashion.

A single table with heterogeneous data, with lookup by ID or some simple key,
is quite far away from what we need. What we need looks almost exactly like an
RDBMS table. It even has foreign keys into tables that are much more
conventional.

------
audleman
This article mirrors my own experience with Postgres. When something goes
wrong, furiously research for 10 days, try a bunch of complex things, then
discover a simple command that gracefully fixes everything =)

------
crb002
Or just change XIDs to 64 bit...

~~~
colanderman
An ironic comment, given yesterday's popular article:
[https://news.ycombinator.com/item?id=12202735](https://news.ycombinator.com/item?id=12202735)

------
approachingtraj
The title is misleading. The post only talks about Amazon's RDS which is not
Postgres.

~~~
vinothgopi
Amazon RDS is the managed version of Postgres (among other flavors of
relational DBs). Maybe you are confusing it with Redshift.

[https://aws.amazon.com/rds/](https://aws.amazon.com/rds/)

~~~
rplst8
Isn't Redshift also based on PostgreSQL? Edit: it is:
[https://en.wikipedia.org/wiki/Amazon_Redshift](https://en.wikipedia.org/wiki/Amazon_Redshift)
and RDS supports many different RDBMS flavors:
[https://en.wikipedia.org/wiki/Amazon_Relational_Database_Ser...](https://en.wikipedia.org/wiki/Amazon_Relational_Database_Service)

~~~
heneryville
Redshift is based on PostgreSQL, but is vastly different. Redshift is a
columnar store, and can't do things like create indexes.
[http://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands...](http://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html)

