
How Citus Executes Distributed Transactions on Postgres - ibotty
https://www.citusdata.com/blog/2017/11/22/how-citus-executes-distributed-transactions/
======
mattb314
Does anyone know why they use deadlock detection rather than deadlock
avoidance (the simplest avoidance algorithm being simply to always acquire
locks in the same order)? I can't find a good reference right now, but I was
under the impression that deadlock detection, especially distributed
detection, is extremely costly and basically precludes high contention
workloads, but maybe I'm confused here.

The only two reasons I could think of were 1. You can't know all the locks you
want in advance (because the postgres api doesn't require it), or 2. Citus
doesn't expect customers to have high contention workloads where deadlocks are
likely (or at least they thing they can be reduced primarily to single node
deadlocks).

Am I overestimating the cost of deadlock detection?

~~~
ozgune
It's actually both of these reasons, with reason (1) being the primary one.

PostgreSQL allows interactive transaction blocks (meaning you don't have to
submit all commands within a transaction block upfront). Citus extends
Postgres and needs to provide the same semantics.

That said, we regularly evaluate different techniques on distributed deadlock
detection and avoidance. We have an FAQ that discusses deadlock avoidance
methods in the context of Postgres. In the link below, the last question on
"How can a distributed database prevent distributed deadlocks?" provides more
detail:

[https://www.citusdata.com/blog/2017/08/31/databases-and-
dist...](https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-
deadlocks-a-faq/)

~~~
infogulch
Does (can?) Citus optimize the case where the whole batch is visible up front?

~~~
mslot
Not practically, except when it is a single-statement transaction.

Where deadlock prevention becomes useful/necessary is single UPDATE/DELETE
statements that span across multiple nodes. When those are executed in
parallel they could deadlock against each other due to non-deterministic
execution order.

Citus currently uses predicate locks to avoid these deadlocks, but there's
probably some room for improvement there. On the other hand, for Citus use
cases UPDATE and DELETE across shards are mainly batch operations (e.g.,
delete old data), so there's not a strong need for it yet.

------
jinqueeny
TiDB and CockroachDB also support distributed transactions. The transaction
model in TiDB
([https://github.com/pingcap/tidb](https://github.com/pingcap/tidb)) is
inspired by Google Percolator, It’s mainly a two-phase commit protocol with
some optimizations. More info on the blog:
[https://github.com/pingcap/blog/blob/master/_posts/2016-10-1...](https://github.com/pingcap/blog/blob/master/_posts/2016-10-17-how-
we-build-tidb.md#transaction)

------
curiousDog
Sounds like they're using the idea described in this Jim Gray, Lamport paper:
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159.6749&rep=rep1&type=pdf)

Azure SQL DB has had the same HA coordinator built in for a couple of years
now in their distributed transactions. And so has Spanner.

~~~
platform
and so has MarkLogic, I think since 2003 (may be a bit later)

\---

[http://cdn.marklogic.com/wp-content/uploads/2016/09/ACID-
Tra...](http://cdn.marklogic.com/wp-content/uploads/2016/09/ACID-Transactions-
Datasheet.pdf)

------
IamHWengineer
So Citus is now ACID compliant? Do you guys have a whitepaper where you
describe how you proved ACID compliance?

Thanks

------
conqrr
Are you guys hiring? Wondering what kind of experience you require. I am a
backend engineer with experience using these tools but not much building them
but would love to work on such things.

~~~
ozgune
Yes, we certainly are:
[https://www.citusdata.com/jobs](https://www.citusdata.com/jobs)

When you're sharing your resume, please feel free to mention this thread.

