
How long will a 64 bit Transaction-ID last in PostgreSQL? - vinayan3
https://andreas.scherbaum.la/blog/archives/970-How-long-will-a-64-bit-Transaction-ID-last-in-PostgreSQL.html
======
anarazel
There's no chance we go for 64bit transaction ids on the tuples themselves -
the space increase would be far too big. The overhead of tuple headers is
already a problem, and xmin/xmax are a significant portion of that.
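A rough back-of-envelope on that overhead (field sizes follow the fixed on-disk HeapTupleHeaderData layout, which is 23 bytes; the dict below is just illustrative bookkeeping, not postgres code):

```python
# Fixed per-tuple header fields in PostgreSQL's heap, in bytes.
header_fields = {
    "t_xmin": 4,        # inserting transaction id
    "t_xmax": 4,        # deleting/locking transaction id
    "t_cid_or_xvac": 4, # command id / vacuum xid (overlaid)
    "t_ctid": 6,        # current tuple pointer
    "t_infomask2": 2,
    "t_infomask": 2,
    "t_hoff": 1,        # offset to user data
}
header = sum(header_fields.values())  # 23 bytes before any user data
xid_share = (4 + 4) / header          # xmin+xmax are ~35% of the header
widened = header + 8                  # naive 64bit xmin/xmax: 31 bytes
print(header, round(xid_share, 2), widened)
```

So on a narrow table the xids alone are already a third of the fixed header, and widening them naively adds 8 bytes to every single tuple.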

There were patches, however, that kept an 'epoch' (the upper 32 bits of a 64bit
transaction id) at the page level, plus some rewrite logic for when
transactions too far apart to be represented as offsets from a shared base
epoch are about to be present on one page. That'd effectively allow 64bit
xids.

The in-development zheap storage engine basically does something roughly akin
to that, removing the need to perform freezing when a table becomes older than
~2^31 - safety-window transactions.

The transaction id that the system tracks internally is effectively already
64bit, albeit in a somewhat over-complicated way: the epoch is kept track of
separately (there's a patch likely to land in the next version to just use
64bit there). That's why you can see e.g. txid_current() return 64bit
transaction ids.
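A minimal sketch of that split, assuming the usual convention that the 64bit value is the epoch in the upper 32 bits and the on-tuple 32bit xid in the lower (the function name here is made up for illustration):

```python
# Decompose a 64bit transaction id, as returned by txid_current(),
# into the separately tracked epoch and the 32bit xid stored on tuples.
def split_xid64(xid64: int):
    epoch = xid64 >> 32          # upper 32 bits: wraparound epoch
    xid32 = xid64 & 0xFFFFFFFF   # lower 32 bits: what tuple headers hold
    return epoch, xid32

epoch, xid32 = split_xid64((3 << 32) | 123456)
print(epoch, xid32)  # 3 123456
```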

~~~
devit
Why write the transaction ids in the tuples at all?

In most production cases, transactions in flight are going to affect a small
amount of rows overall, so you can just keep the data in memory, and store it
to disk in a separate table if it gets large.

~~~
anarazel
You need to access that data from different connections, so it needs to be
correctly locked etc. Looking purely at the tuple, you need to know where to
find its visibility information. Accessing data stored in some datastructure
off to the side will also have drastically worse cache locality than just
storing it alongside the data. E.g. for a sequential scan these checks need
to be done for every tuple, so they really need to be cheap.
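To make the "checked for every tuple" point concrete, here is a heavily simplified sketch of an MVCC-style visibility test; the real logic (HeapTupleSatisfiesMVCC) handles many more cases, and the snapshot model here is deliberately reduced to a single xid horizon:

```python
# Simplified visibility check: why xmin/xmax must be cheap to reach.
# A tuple is visible if its inserter committed before our snapshot and
# it wasn't deleted by a transaction that also committed before it.
def tuple_visible(xmin, xmax, snapshot_xid, committed):
    if xmin not in committed or xmin >= snapshot_xid:
        return False  # inserter aborted, in flight, or too recent
    if xmax and xmax in committed and xmax < snapshot_xid:
        return False  # deleted before our snapshot was taken
    return True

committed = {100, 101}
print(tuple_visible(100, None, 200, committed))  # True: old insert, live
print(tuple_visible(100, 101, 200, committed))   # False: deleted long ago
```

A sequential scan runs something like this once per tuple, which is why pushing the inputs out to a side structure hurts.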

~~~
londons_explore
Except in most usecases, the tuple is old enough that it was committed long
ago and visible to everyone.

It's only a tiny fraction of tuples that are recently committed and for which
visibility rules come into play. That can be a special-cased slow path.

~~~
anarazel
In a system like postgres' current heap you cannot know whether a tuple was
committed long ago without actually looking in that side table (or modifying
the page the one time you do, to set a hint bit). You pretty fundamentally
need something like the transaction id to do so.

Also, in OLTP workload you often have a set of pretty hotly modified data,
where you then actually very commonly access recently modified tuples and thus
need to do visibility checks.

There are obviously systems with different visibility architectures (either by
reducing the types of concurrency allowed, or by using page-level information
about recency of modification plus something undo-based), but given this post
is about postgres, I fail to see what you're arguing about here.

------
amarshall
This seems to be (with coincidental timing) the cause of Mandrill’s current
outage [1]:

> Mandrill uses a sharded Postgres setup as one of our main datastores. On
> Sunday, February 3, at 10:30pm EST, 1 of our 5 physical Postgres instances
> saw a significant spike in writes. The spike in writes triggered a
> Transaction ID Wraparound issue. When this occurs, database activity is
> completely halted. The database sets itself in read-only mode until offline
> maintenance (known as vacuuming) can occur.

> The database is large—running the vacuum process takes a significant amount
> of time and resources, and there’s no clear way to track progress.

[1]
[https://news.ycombinator.com/item?id=19084525](https://news.ycombinator.com/item?id=19084525)

~~~
ldng
It is probably too late now, but for later reference: when this talk is up,
check it out, it has some answers.

[https://fosdem.org/2019/schedule/event/breaking_postgresql_o...](https://fosdem.org/2019/schedule/event/breaking_postgresql_on_scale/)

------
throwawaymath
The author talks a bit about the architecture of PostgreSQL transactions,
touching on lazy transaction ID consumption and vacuuming. Notably, writes
require IDs but reads do not. So this is focused on write-optimized workloads.

If you want to get the basic tl;dr which answers the headline: these IDs will
last so long it’s almost not worth quantifying. This is an obvious calculation
even if you assume ostentatious performance requirements three orders of
magnitude greater than the author’s:

    
        2^64 / (86,400 * 1,000,000,000) ≈ 213,503.9
    

The author uses 1,000,000 writes/second; I prefer 1,000,000,000 since it’s
more ridiculous. There are 86,400 seconds in a day. It will take you the
better part of a millennium to exhaust those IDs, assuming you consume an
average of one billion every single second.
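Redoing that arithmetic exactly:

```python
# How long 2**64 transaction ids last at a (deliberately ridiculous)
# billion writes per second.
SECONDS_PER_DAY = 86_400
rate = 1_000_000_000                      # transactions per second
days = 2**64 / (SECONDS_PER_DAY * rate)   # ~213,504 days
years = days / 365.25                     # ~584.5 years
print(days, years)
```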

The author didn’t talk about collisions, but those are worth mentioning
because you could even confidently assign these randomly instead of
incrementally. Since a collision will occur (in expectation) after 2^63
transactions, you shouldn’t even have to worry about a single one occuring (on
average) for almost 300 years.

Of course, using 64-bit IDs comes with a nontrivial space increase - the
xmin/xmax fields in every single tuple header would double in size.

EDIT: Original collision estimate is wrong, see corrections. I took (2^n)/2 =
2^(n-1) as the birthday bound instead of 2^(n/2).

~~~
simcop2387
Actually you'd expect a collision with 50% probability after only a much
smaller fraction of the 2^64 space. This is the birthday paradox, and
unfortunately I can't find a calculator or software at the moment that can
handle the factorials near 2^64 needed to calculate it exactly.

~~~
andreareina
Square rooting will get you in the proper ballpark. I imagine that's why UUIDs
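The standard birthday-bound approximation makes the ballpark exact; no huge factorials needed:

```python
import math

# 50% collision probability among random draws from a space of size N
# arrives at roughly sqrt(2 * ln(2) * N) draws -- about 1.18 * 2**32
# for N = 2**64, nowhere near 2**63.
n_half = math.sqrt(2 * math.log(2) * 2**64)
print(n_half / 2**32)  # ~1.177
```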
are 128-bit values.

~~~
garmaine
Correct on both counts.

------
kostaw
Just imagine running `VACUUM` on that table that wrote 1M rows/second for 300
years, and now you need to vacuum quickly because the transaction ids will
wrap around next year...

~~~
Aeolun
Unless I’m missing something, you’d still have some 299700 years for your
VACUUM to complete.

------
aaronbwebber
I highly recommend using flexible-freeze if you run Postgres in production -
does not take very much effort to set up and almost certainly will help you
avoid issues with txnid wraparound:

[https://github.com/pgexperts/flexible-freeze](https://github.com/pgexperts/flexible-freeze)

It just runs `VACUUM FREEZE` when you schedule it (usually daily), starting
with the tables closest to hitting a wraparound.

------
Thorrez
What if we interpret Moore's law to say that transaction speed will double
every 2 years?

~~~
ants_a
15 years for 1B/tx/s, 35 years for 1M/tx/s.

Or more generally:

    
        t_years = doubling_time * (txid_bits - log(initial_tx_rate_per_year * doubling_time / log(2)) / log(2))
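The parent's formula in runnable form (log is the natural log; the rate compounds continuously with the given doubling time):

```python
import math

# Years until txid_bits worth of ids are consumed, if the transaction
# rate starts at initial_tx_rate_per_sec and doubles every 2 years.
def t_years(txid_bits, initial_tx_rate_per_sec, doubling_time=2.0):
    rate_per_year = initial_tx_rate_per_sec * 86_400 * 365.25
    return doubling_time * (
        txid_bits
        - math.log(rate_per_year * doubling_time / math.log(2)) / math.log(2)
    )

print(round(t_years(64, 1e9)), round(t_years(64, 1e6)))  # 15 35
```

This reproduces the 15-year (1B tx/s) and 35-year (1M tx/s) figures above.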

------
hyperman1
What I don't understand is how this only affects postgres. How do
DB2/MSSQL/Oracle handle MVCC? Is their approach superior or is it a case of
trade-offs? Supposing the answer is publicly available.

------
xurukefi
I guess people had similar arguments when designing IPv4.

