Hacker News new | comments | show | ask | jobs | submit login

Author here.

For the purpose of the article, adding a user ID column to the items table primarily helps with distributing the data, since all of a user's lists and items can be placed on the same node based on the value in the user ID column, which enables transactions and efficient joins.

Distribution aside, it can be a good idea to separate user data by explicitly adding a user ID column for performance and data integrity reasons. It's difficult/inefficient to guarantee inside the database that item IDs aren't reused (e.g. due to bugs in the application, serial overflow). For row-level security [1], it's also required that all tables have a user column.

Always adding a filter or join by user ID to your queries also allows for more efficient query execution. You often access many or all items of a user at once, in which case it's more efficient for the database to use an index on user ID than doing multiple lookups using an index on list ID alone (this applies generally for multi-tenant apps).

Finally, you might have to scale out one day :).

[1] https://www.postgresql.org/docs/current/static/ddl-rowsecuri...




Second this. I used to keep tenant ID normalized but have found safety and performance in denormalizing it. It's not super fun to do, either, when you're at the point of needing to.


This is something I've ummmed and ahhhed about before. You think it's worth doing?


Somehow I assumed, once you create a distributed table, subsequent queries involving the distributed keys will be magically resolved for distribution.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: