
Databases and Distributed Deadlocks: A FAQ - fdr
https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq/
======
burgerdev
Reading this makes me wonder in which situations locking will actually help.
If you have two distributed updates like "UPDATE table SET value = 6 WHERE key
= 'world'" and "UPDATE table SET value = 5 WHERE key = 'world'", isn't there
some kind of design (or usage) issue? Are immutable tables transformed with
monads going to be a thing in distributed relational databases?

~~~
mslot
Consider UPDATEs that depend on each other:

    UPDATE table SET value = value + 2 WHERE key = 'world';
    UPDATE table SET value = value * 1.1 WHERE key = 'world';

When these UPDATEs are executed concurrently, PostgreSQL guarantees that they
are serialised such that the value becomes either (value+2) * 1.1 or (value *
1.1)+2, i.e. there’s no lost update. To achieve this, PostgreSQL blocks the
second update until the first one is committed using a row-level lock.
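A quick Python sketch (a toy model of the arithmetic, not Postgres internals) of why that matters: if both updates read the same snapshot before either commits, one write clobbers the other; run one after the other, the result is a valid serialisation.

```python
# Lost update: both "transactions" read the same starting snapshot.
value = 5.0
read_a = value
read_b = value
value = read_a + 2    # first UPDATE writes 7.0
value = read_b * 1.1  # second UPDATE overwrites it with 5.5 -- the +2 is lost
lost = value

# Serialised (what the row-level lock enforces): the second UPDATE
# only reads after the first one has committed.
value = 5.0
value = value + 2     # 7.0
value = value * 1.1   # (5 + 2) * 1.1, approximately 7.7 -- nothing lost
```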

An alternative mechanism could have been to create a log of changes. The order
in which the changes are added to the log then determines the order of
execution. That way, we don’t have to block the UPDATE, right?
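As a toy sketch of that hypothetical log (nothing Citus or Postgres actually implements), the append order fixes the execution order, and replaying the log yields the current value:

```python
# Hypothetical append-only change log: appending never blocks, and the
# append order determines the order of execution.
log = []
log.append(lambda v: v + 2)    # UPDATE ... SET value = value + 2 ...
log.append(lambda v: v * 1.1)  # UPDATE ... SET value = value * 1.1 ...

def current_value(initial, log):
    """Replay every logged change, in order, over the initial value."""
    value = initial
    for change in log:
        value = change(value)
    return value
```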

Unfortunately, things are a little more complicated in an RDBMS. Each of the
UPDATEs could be part of a bigger transaction block that could (atomically)
roll back. Moreover, the RDBMS provides consistency which means that the new
value should be visible immediately after the UPDATE. However, the new value
cannot be determined until all preceding UPDATEs have been committed or rolled
back. So in the end, the UPDATE will have to block until all preceding
transactions that modified the same rows are done and the new value can be
computed, just like what would happen with row-level locks, except with more
bookkeeping (and serialisation issues).
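The blocking behaviour described above can be sketched with an ordinary mutex standing in for the row-level lock (again a toy model, with invented delays for illustration): the second writer cannot read the row until the first has "committed".

```python
import threading
import time

row_lock = threading.Lock()  # stands in for the row-level lock
value = 5.0

def run_update(fn, work_seconds):
    global value
    with row_lock:             # second writer blocks here until first commits
        snapshot = value       # read the row under the lock
        time.sleep(work_seconds)  # simulate work inside the transaction
        value = fn(snapshot)   # "commit" the new value

t1 = threading.Thread(target=run_update, args=(lambda v: v + 2, 0.05))
t2 = threading.Thread(target=run_update, args=(lambda v: v * 1.1, 0.0))
t1.start()
time.sleep(0.01)               # ensure t1 holds the lock first
t2.start()
t1.join()
t2.join()
# value is now (5 + 2) * 1.1, approximately 7.7 -- no lost update
```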

If you strip out some of the functionality of an RDBMS such as multi-statement
transaction blocks or consistency, then you could potentially avoid locks, but
Citus distributes PostgreSQL, which does provide this functionality.

------
marknadal
This is definitely a challenge and something you have to address when building
on top of Postgres because of its architecture. But I want to strongly
encourage people: you can avoid having to deal with this stuff if you approach
it with CRDTs (go google them, they are fun!), like Cassandra does (edit:
disclaimer, I work on a competing system). I did a tech talk (
[http://gun.js.org/distributed/matters.html](http://gun.js.org/distributed/matters.html)
) a few years ago in Berlin on the things to avoid and the possible solutions
(and their tradeoffs / dangers).

~~~
Twisol
I'm actually very interested in CRDTs and related approaches, and I've been
exploring active research directions in the area, but I'm not sure they share
many usage patterns with data stores like Postgres, which support very general
access and update semantics.

The primary restriction for anything involving CRDTs is that no operation can
"take back" something added by a previous operation, so the data in question
can only evolve monotonically. It's great when you can get away with it, but
it's unclear how often you can actually get away with it.
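The monotonicity restriction is easy to see in one of the simplest CRDTs, a grow-only set (G-Set): merge is set union, so replicas converge no matter the message order, but nothing can ever be removed.

```python
# G-Set: state can only grow, merge is union. Commutative and
# idempotent, so replicas converge -- but there is no "take back".
def merge(a: set, b: set) -> set:
    return a | b

replica_1 = {"apple"}
replica_2 = {"banana"}

# Both merge orders give the same converged state:
assert merge(replica_1, replica_2) == merge(replica_2, replica_1) == {"apple", "banana"}
```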

~~~
marknadal
Yes, very true. Kyle Kingsbury was at the conference and I talked to him about
this, and his conclusion from his research is that Postgres is still all
around the best database out there. Especially if you need to do things that
CRDTs aren't yet equipped to handle.

Have you made a compilation of your thoughts from the various research you've
done? Would love to see them!

