Naive question. Above comment suggests using bigserial as internal identifier an...

jpollock · on July 6, 2024

Each object has an external key and an internal key. This separation allows you to migrate to other layouts, technologies, etc. without breaking your customer's links or records.

Internally, your database looks like:

  User
    ID - uint128
    external_id - UUID (of some sort)
    name - string

  Post
    ID - uint128
    UserId - uint128 (User.ID)
    external_id - UUID
    ...

Then you have secondary indices on the external_id columns in both tables, and the UserId column on Post. You can then join from one to the other using the ID column.

e.g.

  SELECT count(*) FROM
   Post JOIN User 
   ON User.ID = Post.UserID
  WHERE 
   User.external_id = <%USERID_FROM_REQUEST%>;

  // Don't forget to bind USER_ID_FROM_REQUEST instead of 
  // concating the string, no Johny-Tables here!

There should be significant performance benefits from using int's for the joining keys (at least compared to strings), but my experience might be old.

dgb23 · on July 6, 2024

A uuid is just an uint128, or a byte array of the same size. A DB can use them as fixed size keys just fine, and they convert quickly from/into their string representaion.

The problem is indexing and generation.

For example sqlite already generates sequential ids whether you use them or not.

Non sequential uuids suck for indexing as the table grows. There are some benchmarks and explanations to look up.

The benefit is you can merge aggregates from different DB instances without a fuss. And you can generate uuids without asking your DB.

TeeWEE · on July 6, 2024

So it can’t use the internal id index, result: slow lookups for external ids.

bfung · on July 6, 2024

> secondary indices on the external_id columns in both tables

pick your poison, slower lookup or more disk usage

masklinn · on July 6, 2024

There are other fun ideas to explore e.g. if your external id is a hash you can rebuild, it might be an option to scavenge the full hash but keep the first few bytes of it, indexed. That increases lookup costs if the hash does not exist or has been scavenged but lowers storage.

In fact if your external ids are evenly distributed you can do that anyway, only index the few leading / trailing bytes of the id. That will increase lookup costs a bit but significantly decrease storage.

teddyh · on July 6, 2024

> no Johny-Tables here!

It’s “Bobby tables”: <https://xkcd.com/327/>