An important detail which this guide leaves out: The "actual location of the row" is usually the leaf node of another B-Tree. That is the primary B-Tree and it's indexed by the primary key.
The major consequence is every non-primary index query involves dereferencing N pointers, which could mean loading N pages from disk/SSD to get leaf nodes spread out across the primary tree. Whereas if you query a range in the primary B-Tree directly, the rows are consecutive so you would only load N/M consecutive pages based on the ratio of rowsize to pagesize.
That's why some people use composite keys for primary key, to get better data locality in the primary B-Tree index.
See "How We've Scaled Dropbox"[1] to hear the same thing.
At 48:36 they explain why they changed from PRIMARY KEY (id) to PRIMARY KEY (ns_id, latest, id). ns_id is kinda like user_id. So that change groups journal entries for users together on disk
Specifically. PRIMARY KEY (id) orders things by creation date whereas PRIMARY KEY (ns_id, latest, id) orders things by ns_id primarily.
You're right. Postgres doesn't give any control over primary data locality. That might cause querying 1 row a bit faster in Postgres (no log(N) traversal of the primary B-tree index) but picking out N rows could be a lot slower.
The major consequence is every non-primary index query involves dereferencing N pointers, which could mean loading N pages from disk/SSD to get leaf nodes spread out across the primary tree. Whereas if you query a range in the primary B-Tree directly, the rows are consecutive so you would only load N/M consecutive pages based on the ratio of rowsize to pagesize.
That's why some people use composite keys for primary key, to get better data locality in the primary B-Tree index.
See "How We've Scaled Dropbox"[1] to hear the same thing.
At 48:36 they explain why they changed from PRIMARY KEY (id) to PRIMARY KEY (ns_id, latest, id). ns_id is kinda like user_id. So that change groups journal entries for users together on disk
Specifically. PRIMARY KEY (id) orders things by creation date whereas PRIMARY KEY (ns_id, latest, id) orders things by ns_id primarily.
[1]: https://youtu.be/PE4gwstWhmc?t=2914