Naive question. Above comment suggests using bigserial as internal identifier and uuid as public facing ID.
Now let's say there's a user table and post table. Both will have only uuid available in the APIs. So every time API requests a post of the user or user of the post, we will find the the relevant row using uuid right? Since uuid will be sent by the public facing APIs?
How would bigserial be used here?
I don't know much databases in general BTW..
Edit: Can I use uuidv4 as PK since it doesn't leak timestamp data? Why bigserial + uuid7 is better?
Each object has an external key and an internal key. This separation allows you to migrate to other layouts, technologies, etc. without breaking your customer's links or records.
Internally, your database looks like:
User
ID - uint128
external_id - UUID (of some sort)
name - string
Post
ID - uint128
UserId - uint128 (User.ID)
external_id - UUID
...
Then you have secondary indices on the external_id columns in both tables, and the UserId column on Post. You can then join from one to the other using the ID column.
e.g.
SELECT count(*) FROM
Post JOIN User
ON User.ID = Post.UserID
WHERE
User.external_id = <%USERID_FROM_REQUEST%>;
// Don't forget to bind USER_ID_FROM_REQUEST instead of
// concating the string, no Johny-Tables here!
There should be significant performance benefits from using int's for the joining keys (at least compared to strings), but my experience might be old.
A uuid is just an uint128, or a byte array of the same size. A DB can use them as fixed size keys just fine, and they convert quickly from/into their string representaion.
The problem is indexing and generation.
For example sqlite already generates sequential ids whether you use them or not.
Non sequential uuids suck for indexing as the table grows. There are some benchmarks and explanations to look up.
The benefit is you can merge aggregates from different DB instances without a fuss. And you can generate uuids without asking your DB.
There are other fun ideas to explore e.g. if your external id is a hash you can rebuild, it might be an option to scavenge the full hash but keep the first few bytes of it, indexed. That increases lookup costs if the hash does not exist or has been scavenged but lowers storage.
In fact if your external ids are evenly distributed you can do that anyway, only index the few leading / trailing bytes of the id. That will increase lookup costs a bit but significantly decrease storage.
I don't know much databases in general BTW..
Edit: Can I use uuidv4 as PK since it doesn't leak timestamp data? Why bigserial + uuid7 is better?