Hacker News new | past | comments | ask | show | jobs | submit login

You need to access that data from different connections, so it needs to be correctly locked etc. Looking purely at the tuple you need to know where to look for the tuple visibility information. Accessing data stored in some datastructure off to the side will also have drastically worse cache locality then just storing it alongside with the data. E.g. for a sequential scan these checks need to be done for every tuple, so they really need to be cheap.

Except in most usecases, the tuple is old enough that it was committed long ago and visible to everyone.

It's only a tiny fraction of tuples which are recently committed and visibility rules come into play. That can be a special-cased slow-path

In a system like postgres' current heap you cannot know whether it was committed long ago, without actually looking in that side table (or modifying the page the one time you do, to set a hint bit). You pretty fundamentally need something like the transactionid to do so.

Also, in OLTP workload you often have a set of pretty hotly modified data, where you then actually very commonly access recently modified tuples and thus need to do visibility checks.

There's obviously systems with different visibility architectures (either by reducing the types of concurrency allowed, using page level information about recency of modification + something undo based), but given this post is about postgres, I fail to see what you're arguing about here.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact