There were patches, however, that kept an 'epoch' (the upper 32bit of a 64bit transaction id) at the page level, plus some rewrite logic for when transactions that are too far apart to be represented as offsets from a base epoch are about to be present on one page. That'd effectively allow 64bit xids.
The in-development zheap storage engine does something roughly akin to that, removing the need to perform freezing when a table becomes older than ~2^31 - safety-window transactions.
The transaction id machinery internally already tracks xids in a 64bit manner, albeit in a somewhat over-complicated way, by keeping track of an epoch separately (there's a patch likely to land in the next version to just go for 64bit there). That's why you can see e.g. txid_current() return 64bit transaction ids.
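Roughly what that combination looks like, as a toy sketch (the helper names below are just for illustration, not anything in postgres itself):

```python
# Hypothetical illustration: how a separately tracked 32bit epoch plus a
# 32bit xid can be presented as a single 64bit value, which is roughly the
# shape of what txid_current() exposes.
def to_64bit_xid(epoch: int, xid: int) -> int:
    """Combine a 32bit epoch and a 32bit xid into one 64bit id."""
    assert 0 <= epoch < 2**32 and 0 <= xid < 2**32
    return (epoch << 32) | xid

def split_64bit_xid(xid64: int) -> tuple[int, int]:
    """Recover (epoch, xid) from the combined 64bit value."""
    return xid64 >> 32, xid64 & 0xFFFFFFFF

print(to_64bit_xid(3, 12345))        # 12884914233
print(split_64bit_xid(12884914233))  # (3, 12345)
```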
In most production cases, transactions in flight are going to affect a small number of rows overall, so you can just keep the data in memory, and store it to disk in a separate table if it gets large.
It's only a tiny fraction of tuples that are recently committed and for which visibility rules come into play. That can be a special-cased slow path.
Also, in OLTP workloads you often have a set of pretty hotly modified data, where you then actually very commonly access recently modified tuples and thus need to do visibility checks.
There are obviously systems with different visibility architectures (either by reducing the types of concurrency allowed, or by using page-level information about recency of modification + something undo-based), but given this post is about postgres, I fail to see what you're arguing about here.
That doesn’t hold true when working on multi-terabyte tables or databases, where physical disk space in a chassis is actually a boundary you have to consider.
Often disk-space is less the issue, and it's more about whether things can fit in memory or not...
> Mandrill uses a sharded Postgres setup as one of our main datastores. On Sunday, February 3, at 10:30pm EST, 1 of our 5 physical Postgres instances saw a significant spike in writes. The spike in writes triggered a Transaction ID Wraparound issue. When this occurs, database activity is completely halted. The database sets itself in read-only mode until offline maintenance (known as vacuuming) can occur.
> The database is large—running the vacuum process takes a significant amount of time and resources, and there’s no clear way to track progress.
If you want the basic tl;dr that answers the headline: these IDs will last so long it’s almost not worth quantifying. This is an obvious calculation, even if you assume ostentatious performance requirements three orders of magnitude greater than the author’s:
2^64 / (86,400 seconds/day * 1,000,000,000 tx/second) ≈ 213,504 days ≈ 585 years
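The same arithmetic spelled out (assuming that hypothetical sustained rate of a billion transactions per second):

```python
# Quick back-of-the-envelope check of the exhaustion estimate above.
ids = 2**64
rate_per_day = 86_400 * 1_000_000_000   # seconds/day * tx/second
days = ids / rate_per_day
print(f"{days:,.1f} days ≈ {days / 365.25:,.0f} years")
# 213,504.0 days ≈ 585 years
```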
The author didn’t talk about collisions, but those are worth mentioning because you could even confidently assign these randomly instead of incrementally. Since a collision will occur (in expectation) after 2^63 transactions, you shouldn’t even have to worry about a single one occurring (on average) for almost 300 years.
Of course, using 64-bit IDs comes with a nontrivial space increase - the transaction ids stored in every single tuple double in size.
EDIT: Original collision estimate is wrong, see corrections. I took (2^n)/2 = 2^(n-1) as the birthday bound instead of 2^(n/2).
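For reference, the corrected bound worked out (just the standard birthday-problem approximation, nothing database-specific):

```python
# With random 64-bit IDs a collision becomes likely around 2^(64/2) = 2^32
# draws, not 2^63 as originally stated.
import math

bits = 64
# ~50% collision probability after roughly sqrt(2*ln(2) * 2^bits) random draws.
n_50 = math.sqrt(2 * math.log(2) * 2**bits)
print(f"2^(bits/2) = {2**(bits // 2):,}")           # 4,294,967,296
print(f"~50% collision after ≈ {n_50:,.0f} IDs")    # ≈ 5.06 billion
```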
At the highest transaction isolation levels, you need to be able to perform a topological sort of the transaction dependency graph. That does require that the graph is acyclic, but doesn't preclude the DB engine from pretending a transaction that started later actually started earlier (or even had its first write chronologically earlier). For full transaction isolation, the DB engine just needs to be able to pretend transactions happened in a linear order, but that order isn't dictated by the chronological order of the first operation of each transaction. (It's not even constrained by the chronological order of the first write operation of each transaction in DBs that only track write-write conflicts.)
The easiest way to keep track of this consistency is some kind of monotonically increasing counter "timestamp" (or something like a vector of them in an asynchronous / distributed system ... Lamport clocks, vector clocks, or similar), but this timestamp doesn't need to be identical to a transaction ID, and doesn't have to be unique. It's possible that most databases make unique transaction IDs moonlight as timestamps, but it's not a fundamental constraint.
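As a toy illustration of that point (hypothetical transactions and dependencies, not any real engine's bookkeeping), a topological sort of a dependency graph can place a later-started transaction earlier in the serial order:

```python
# Hypothetical example: three transactions with start times and dependency
# edges meaning "must be serialized before".
from graphlib import TopologicalSorter

start_time = {"T1": 0, "T2": 1, "T3": 2}

# T3 (started last) must appear before T1 in the serial order, e.g. because
# T1 read data that T3 later overwrote and T1 saw the old value.
depends_on = {
    "T1": {"T3"},   # T3 must come before T1
    "T2": {"T1"},   # T1 must come before T2
    "T3": set(),
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)                            # ['T3', 'T1', 'T2']
print([start_time[t] for t in order])   # serial order != chronological start order
```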
A second large reason why we'd not want to go for randomness is that we need to store data for each transaction id, namely whether it committed or not. If we assigned them randomly we'd need to keep around a lot more of that data, and accesses would be a lot more expensive because there'd basically not be any locality.
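A rough sketch of the locality point (the page size and 2-bits-per-xid layout here are just illustrative assumptions about how commit status could be laid out):

```python
# Illustrative only: if commit status is stored as 2 bits per xid in
# fixed-size pages, sequential xids keep hitting the same few pages,
# while random xids scatter across the whole structure.
import random

PAGE_BYTES = 8192
XIDS_PER_PAGE = PAGE_BYTES * 4   # 4 two-bit status entries per byte

def pages_touched(xids):
    return len({xid // XIDS_PER_PAGE for xid in xids})

sequential = range(1_000_000, 1_100_000)                       # 100k consecutive xids
scattered = [random.randrange(2**32) for _ in range(100_000)]  # 100k random xids

print(pages_touched(sequential))  # ~4 pages
print(pages_touched(scattered))   # tens of thousands of distinct pages
```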
Moving to 48-bit seems possible, but as the other HN discussion says, supposedly other real-world systems have wrapped around with that too.
It just runs `VACUUM FREEZE` when you schedule it (usually daily), starting with the tables closest to hitting a wraparound.
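Roughly what such a scheduled job could look like, as a sketch rather than the tool's actual implementation (assumes psycopg2 and a `dsn` connection string):

```python
# Sketch of a "freeze the tables closest to wraparound first" job.
import psycopg2

def freeze_oldest_tables(dsn, limit=10):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    with conn.cursor() as cur:
        # Order tables by how close their relfrozenxid is to wraparound.
        cur.execute("""
            SELECT c.oid::regclass::text
            FROM pg_class c
            WHERE c.relkind = 'r'
            ORDER BY age(c.relfrozenxid) DESC
            LIMIT %s
        """, (limit,))
        tables = [row[0] for row in cur.fetchall()]
        for table in tables:
            cur.execute(f"VACUUM (FREEZE) {table}")
    conn.close()
```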
Or more generally:
t_years = doubling_time*(txid_bits - log( initial_tx_rate_per_year * doubling_time / log(2)) / log(2))
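Plugging in some made-up numbers for illustration (the starting rate and doubling time below are just assumptions):

```python
# Worked example of the formula above.
import math

def t_years(txid_bits, initial_tx_rate_per_year, doubling_time):
    # Approximate years until 2^txid_bits ids are exhausted, if the rate
    # starts at initial_tx_rate_per_year and doubles every doubling_time years.
    return doubling_time * (
        txid_bits
        - math.log(initial_tx_rate_per_year * doubling_time / math.log(2)) / math.log(2)
    )

# e.g. ~31.5e9 tx/year (1000 tx/s) today, rate doubling every 5 years, 64bit ids:
print(t_years(64, 1000 * 86_400 * 365, 5.0))  # ≈ 131 years
```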