
Transaction ID Wraparound in Postgres - zeeg
http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html
======
JimNasby
I think the real takeaway here is that any part of your underlying technology
that you don't thoroughly understand puts you at risk, and that risk goes up
significantly in a large volume environment. It's frequently not possible to
have experts on-hand for _everything_, but you need to make sure you have
them for your critical stuff. They don't necessarily have to be on your
payroll either; really good consulting companies can do a lot for you as well.
(Disclosure: I'm a partner at a consulting company).

When it comes to Postgres and data, I'll just say what I've been saying for
~20 years:

"All databases suck, each one just sucks differently."

~~~
mercurial
> "All databases suck, each one just sucks differently."

That's true for languages and frameworks as well.

------
nierman
Running more aggressive vacuums to clean up "old" tuples/tables during
off-peak times is a good idea. See Josh Berkus' short three-part blog series
on postgres'
vacuum/freeze: [http://www.databasesoup.com/2012/09/freezing-your-tuples-
off...](http://www.databasesoup.com/2012/09/freezing-your-tuples-off-
part-1.html)

and his python script for doing this: [https://github.com/pgexperts/flexible-
freeze](https://github.com/pgexperts/flexible-freeze)

Also, note that long running transactions can prevent cleanup of tuples. Look
for old xact_start values of non-idle queries in pg_stat_activity
(particularly "idle in transaction" connections) and old entries in
pg_prepared_xacts.
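
For reference, a quick sketch of both checks (column names as of PostgreSQL
9.2+; adjust for older versions):

    -- Sessions holding old transactions, including "idle in transaction"
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;

    -- Forgotten two-phase-commit transactions also pin the cleanup horizon
    SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
    ORDER BY prepared;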

------
aristus
After all of that discussion of very Postgres-specific limitations, _they
blame SQL_? SQL is just a notation for relational algebra. It is not the
problem here. There are better, distributed, scalable, write-optimized
implementations of real SQL out there. Disclosure: I work for a company that
makes one.

~~~
the_mitsuhiko
Armin from Sentry here.

> they blame SQL?

We did not mean to blame SQL so much as PostgreSQL specifically. However, in
our case we want to build our infrastructure on Open Source components, since
we are an Open Source project and provide on-premise installations in addition
to our cloud hosted environment, so we do not want to depend on commercial
databases.

------
mangeletti
Wow, this was a great read. Fellow Djangonaut / Pythonista here, and I really
enjoy reading David's blog posts. I've learned from and been inspired by him
since what seems like early 2011 or so. I also didn't know Armin Ronacher
worked for Sentry (re: his comment herein). What a cool company.

Also, great call on truncating that table. I've witnessed situations where
nobody has been willing to accept the loss, and the result is a conference call
that lasts until 4am and a lot of angry customers.

~~~
JimNasby
FWIW, I got the impression that they didn't truncate the table, but just
forced an update to its relfrozenxid to advance the freeze age and allow the
database to restart. That means there's a risk that rows that were visible at
the time would suddenly vanish as the XID advanced and their xmin became "the
future". Eventually the record of those commits would vanish and presumably
the rows would then get removed (though I'm not sure on that without checking
the code).
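
Not that this is what they did, but for anyone curious how close their own
tables are to the freeze horizon, something like this shows per-table
relfrozenxid age:

    -- Tables closest to the freeze horizon, oldest first
    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 20;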

------
kuroyi
I hit this issue all the time and it is never easy or fast to fix once
postgres stops accepting writes. If you have a write-heavy postgres database,
the default autovacuum settings are not sufficient. The entire database needs
to be vacuumed aggressively.
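
For example, a one-off database-wide freeze pass, best run off-peak (the table
name below is made up):

    -- Freeze every table in the current database
    VACUUM (FREEZE, VERBOSE);

    -- Or target the largest/busiest tables individually
    VACUUM (FREEZE, VERBOSE) sensor_readings;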

~~~
dinedal
> I hit this issue all the time

Curious to know your problem space and context? User generated content? Sensor
logs?

~~~
kuroyi
Essentially a large volume of sensor data. Tons of inserts. Some updates in
other tables.

------
EmielMols
PostgreSQL's MVCC model (copy the complete row on any write, clean up old rows
some time in the future) might work decently for read-centric workloads, but
it's unsuitable for write-heavy stuff.

We're running a setup that employs postgres for a write-heavy (but far from
write-only) application, and are experiencing similarly troubling vacuum jobs.
We're considering moving back to good old MySQL because of it. Anyone who's
been here before and has some insight?

~~~
the_mitsuhiko
> Anyone who's been here before and has some insight?

Armin from Sentry here. We talked a bit about where we could scale with
postgres, and part of our problem is that we're quite write- and delete-heavy,
which means our vacuums take a very long time. A potential option would be to
partition some of our time-based tables and put a rolling-window view over
them. Then we could throw entire tables away after the hold period and our
vacuums should at least be faster. We could also move some tables to another
database, which might help, but then we would have to deal with higher
connection counts, unfortunately.

The downside for us is that we also want to keep the setup straightforward for
our on-premise customers, so we need to be careful not to introduce
unnecessary complexity.
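
The rolling-window idea above, roughly sketched with the inheritance-based
partitioning that tools like pg_partman automate (table and column names are
made up):

    -- Parent table plus one child per day, constrained by CHECK
    CREATE TABLE events (
        id      bigserial,
        created timestamptz NOT NULL,
        payload text
    );

    CREATE TABLE events_2015_07_23 (
        CHECK (created >= '2015-07-23' AND created < '2015-07-24')
    ) INHERITS (events);
    -- (inserts are routed to the children by a trigger or by the application)

    -- Retiring old data becomes a cheap catalog operation instead of a huge
    -- DELETE followed by a long vacuum
    DROP TABLE events_2015_07_16;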

~~~
anarazel
> We talked a bit about where we could scale with postgres and part of our
> problem is that we're quite write and also delete heavy which means our
> vacuum takes very long.

Is this still a problem after you tuned vacuum to be actually aggressive? It
sure isn't perfect, but you can go a _long_ way with just adjusting a couple
of settings.
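
A rough sketch of the kind of settings meant here, with illustrative values
only (ALTER SYSTEM needs 9.4+; on older versions edit postgresql.conf
directly):

    -- Throttle autovacuum less and let it trigger sooner
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';   -- default 20ms
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- default 0.2
    ALTER SYSTEM SET autovacuum_max_workers = 6;             -- needs a restart
    SELECT pg_reload_conf();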

~~~
the_mitsuhiko
> Is this still a problem after you tuned vacuum to be actually aggressive?

I need to check how long vacuum takes at the moment, but I think it's in the
high hours.

------
hendzen
So the root issue here is that postgres is using a 32-bit transaction ID?

~~~
Kassandry
From their article:

"Additionally we had been running with far too much delay in vacuuming which
meant lower load on the system, but more idle time in maintenance."

So, vacuum wasn't given the resources to keep up with their load, and it's not
clear whether they were supplementing with manual vacuums during quiet times.
Nor is it clear when they started reacting, since PostgreSQL (as outlined in
the documentation they link) starts squawking about wraparound well before it
shuts down, what monitoring they had in place for whether autovacuum/vacuum
was keeping up with the workload, or how many locks their application was
taking (locks can block autovacuum).

Adjusting the autovacuum settings per table also gives you finer control than
the global postgresql.conf parameters, so the behavior can better match the
workload of specific tables.
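
For instance (hypothetical table name, illustrative values):

    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor = 0.01,      -- vacuum after ~1% dead rows
        autovacuum_vacuum_cost_delay   = 0,         -- don't throttle this table
        autovacuum_freeze_max_age      = 100000000  -- freeze earlier than the
                                                    -- system-wide 200M default
    );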

Partitioning would have also helped make the actively written part of the
table smaller, and older data could be vacuumed with VACUUM FREEZE, or deleted
later. There are extensions to help make that easier.

[https://github.com/keithf4/pg_partman](https://github.com/keithf4/pg_partman)

Josh Berkus of PgExperts has some great advice here:

[https://dba.stackexchange.com/questions/21068/aggressive-
aut...](https://dba.stackexchange.com/questions/21068/aggressive-autovacuum-
on-postgresql)

So does Jim Nasby, here:

[http://bluetreble.com/2014/10/postgres-mvcc-and-a-look-at-
va...](http://bluetreble.com/2014/10/postgres-mvcc-and-a-look-at-vacuuming/)

Regarding monitoring, check_postgres.pl would have given them an earlier
signal that their vacuuming settings needed adjustment, or that their
application's locking needed to change so that it doesn't block autovacuum.

[https://bucardo.org/check_postgres/check_postgres.pl.html#tx...](https://bucardo.org/check_postgres/check_postgres.pl.html#txn_wraparound)

[https://bucardo.org/check_postgres/check_postgres.pl.html#la...](https://bucardo.org/check_postgres/check_postgres.pl.html#last_autovacuum)

[https://bucardo.org/check_postgres/check_postgres.pl.html#lo...](https://bucardo.org/check_postgres/check_postgres.pl.html#locks)
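
Alongside check_postgres, the same wraparound number is easy to graph or alert
on by hand:

    -- How close each database is to autovacuum_freeze_max_age (200M by
    -- default) and, ultimately, the ~2 billion hard limit
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;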

It's easy to mention things after the fact, though, and it's good they got
things up and running again for their customers.

~~~
zeeg
David from Sentry here.

We actually knew about the problem with delay and had been working to improve
it. We were a couple days away from failing over to the new hardware (safely)
and unfortunately we didn't have any early warnings in the logs. I haven't yet
looked at why.

~~~
skehlet
Can you share what you had for your autovacuum_vacuum_cost_delay, and your
approximate rate of transactions?

~~~
zeeg
We had it at 50ms on the previous setup, though I wish I knew why that value
was used. Likely it was a default from the Chef cookbook we forked off of, or
we read something that convinced us at the time it was a good idea.

------
marknadal
Disclosure: I'm the architect for a distributed database. We started building
our database because we had similar problems with not being able to accept
writes when our Master went down. This is never fun, but you guys seemed
pretty well prepared, staying true to your company's motto - thank you for
writing this article.

First off, if your data is primarily logs you should not have to rely upon a
primary Master. PostgreSQL isn't designed to be Masterless, which presents an
obvious barrier. If you set up a Master-Master database then any/all of your
servers could accept writes, even if another one is down. And since they are
logs, you are getting idempotent behavior for free anyway: everything is a
create/insert, so you don't have to worry about concurrent updates to the same
record - which is what Master-based systems are designed for.

I actually DO NOT recommend our database
([http://gunDB.io/](http://gunDB.io/)) because we are not stable or
production-ready in any way, shape, or form. But we're in our seed funding
stage and trying to spend the time exploring problem spaces and designing
solutions for them. If you ever have time to chat, please drop me a line at
mark@gunDB.io so I can learn from you.

