
Scaling out complex transactions in multi-tenant apps on Postgres - Vekz
https://www.citusdata.com/blog/2017/06/02/scaling-complex-sql-transactions/
======
necrodome
The modeling in the example seems weird:

- Why do you need a user_id in todo_items if you already associate lists to
users in the todo_lists table?

- "If users have the same list ID..." Why would they?

~~~
mslot
Author here.

For the purpose of the article, adding a user ID column to the items table
primarily helps with distributing the data, since all of a user's lists and
items can be placed on the same node based on the value in the user ID column,
which enables transactions and efficient joins.

Distribution aside, it can be a good idea to separate user data by explicitly
adding a user ID column, for both performance and data integrity reasons. It's
difficult and inefficient to guarantee inside the database that item IDs are
never reused (e.g. due to application bugs or serial overflow). Row-level
security [1] also requires that every table has a user column to filter on.

Always adding a filter or join by user ID to your queries also allows for more
efficient query execution. You often access many or all items of a user at
once, in which case it's more efficient for the database to use an index on
user ID than doing multiple lookups using an index on list ID alone (this
applies generally for multi-tenant apps).

Finally, you might have to scale out one day :).

[1] https://www.postgresql.org/docs/current/static/ddl-rowsecurity.html
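
The pattern described above can be sketched roughly as follows. This is a
minimal, hypothetical schema, not code from the article: the table and column
names (todo_lists, todo_items, user_id, etc.) are illustrative, and
create_distributed_table is the Citus function for sharding a table by a
distribution column.

```sql
-- Denormalize user_id onto todo_items so both tables share a shard key.
CREATE TABLE todo_lists (
    user_id bigint NOT NULL,
    list_id bigint NOT NULL,
    name    text,
    PRIMARY KEY (user_id, list_id)
);

CREATE TABLE todo_items (
    user_id bigint NOT NULL,
    list_id bigint NOT NULL,
    item_id bigint NOT NULL,
    body    text,
    PRIMARY KEY (user_id, list_id, item_id),
    -- The composite foreign key keeps item rows tied to the right tenant,
    -- guarding against cross-user list ID mix-ups.
    FOREIGN KEY (user_id, list_id) REFERENCES todo_lists (user_id, list_id)
);

-- With Citus, distributing both tables by user_id co-locates a user's
-- lists and items on the same node, enabling local joins and transactions.
SELECT create_distributed_table('todo_lists', 'user_id');
SELECT create_distributed_table('todo_items', 'user_id');
```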

~~~
sb8244
Second this. I used to keep the tenant ID normalized, but have found safety
and performance in denormalizing it. Retrofitting it later, once you're at the
point of needing to, isn't much fun either.

~~~
aidos
This is something I've ummmed and ahhhed about before. You think it's worth
doing?

------
elvinyung
Is support for multi-shard transactions on the roadmap?

~~~
ozgune
(Ozgun from Citus)

TLDR: We're thinking about multi-shard transactions and we'll share more
updates as our timelines become concrete. In the 7.0 release cycle, we're
looking at prototyping work on distributed deadlock detection, a prerequisite
for multi-shard transactions.

A quick question: we find that multi-shard transactions fall into several
categories. Which one did you have in mind?

1. An UPDATE / DELETE over multiple shards

2. BEGIN; tx on one shard from a distributed table; tx on another shard; COMMIT

3. BEGIN; tx on a distributed table; tx on a reference table; COMMIT

We currently offer master_modify_multiple_shards() or Postgres' 2PC as
potential workarounds, and know that there's definitely more work to be done
in this area.
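
The two workarounds mentioned might look like the following sketch. The table
and column names are hypothetical; master_modify_multiple_shards is the Citus
function referenced above, and the 2PC variant uses Postgres's built-in
PREPARE TRANSACTION / COMMIT PREPARED, coordinated by the application.

```sql
-- Workaround 1: run a single UPDATE/DELETE across all shards of a
-- distributed table (category 1 above).
SELECT master_modify_multiple_shards(
    'DELETE FROM todo_items WHERE done = true');

-- Workaround 2: application-driven two-phase commit. Run the matching
-- transaction on each node involved, prepare both, then commit both.
BEGIN;
UPDATE todo_items SET done = true WHERE user_id = 42;
PREPARE TRANSACTION 'tx_user_42';
-- ...once the corresponding transaction on the other node has also
-- prepared successfully:
COMMIT PREPARED 'tx_user_42';
```

Note that with manual 2PC, the application (or an external coordinator) is
responsible for resolving any transactions left prepared after a crash.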

