Hacker News new | comments | show | ask | jobs | submit login
Databases and Distributed Deadlocks: A FAQ (citusdata.com)
114 points by fdr on Aug 31, 2017 | hide | past | web | favorite | 9 comments

Reading this makes me wonder in which situations locking will actually help. If you have two distributed updates like "UPDATE table SET value = 6 WHERE key = 'world'" and "UPDATE table SET value = 5 WHERE key = 'world'", isn't there some kind of design (or usage) issue? Are immutable tables transformed with monads going to be a thing in distributed relational databases?

Consider UPDATEs that depend on each other:

    UPDATE table SET value = value + 2 WHERE key = 'world';
    UPDATE table SET value = value * 1.1 WHERE key = 'world';
When these UPDATEs are executed concurrently, PostgreSQL guarantees that they are serialised such that the value becomes either (value+2) * 1.1 or (value * 1.1)+2, i.e. there’s no lost update. To achieve this, PostgreSQL blocks the second update until the first one is committed using a row-level lock.

An alternative mechanism could have been to create a log of changes. The order in which the changes are added to the log then determines the order of execution. That way, we don’t have to block the UPDATE, right?

Unfortunately, things are a little more complicated in an RDBMS. Each of the UPDATEs could be part of a bigger transaction block that could (atomically) roll back. Moreover, the RDBMS provides consistency which means that the new value should be visible immediately after the UPDATE. However, the new value cannot be determined until all preceding UPDATEs have been committed or rolled back. So in the end, the UPDATE will have to block until all preceding transactions that modified the same rows are done and the new value can be computed, just like what would happen with row-level locks, except with more bookkeeping (and serialisation issues).

If you strip out some of the functionality of an RDBMS such as multi-statement transaction blocks or consistency, then you could potentially avoid locks, but Citus distributes PostgreSQL, which does provide this functionality.

This is definitely a challenge and something you have to address when building on top of Postgres because of its architecture. But I want to highly encourage people that you can avoid having to deal with this stuff if you approach it with CRDTs (go google them, they are fun!), like what with Cassandra does - (edit: disclaimer, I work for a competing system) I did this tech talk ( http://gun.js.org/distributed/matters.html ) a few years ago in Berlin on the things to avoid and the possible solutions (and their tradeoffs / dangers).

I'm actually very interested in CRDTs and related approaches, and I've been exploring active research directions in the area, but I'm not sure they share many usage patterns with data stores like Postgres, which support very general access and update semantics.

The primary restriction for anything involving CRDTs is that no operation can "take back" something added by a previous operation, so the data in question can only evolve monotonically. It's great when you can get away with it, but it's unclear how often you can actually get away with it.

Yes, very true. Kyle Kingsbury was at the conference and I talked to him about this, and his conclusion from his research is that Postgres is still all around the best database out there. Especially if you need to do things that CRDTs aren't yet equipped to handle.

Have you made a compilation of your thoughts from the various research you've done? Would love to see them!

Warning: Cassandra counters are not backed by CRDTs [1].

[1] https://aphyr.com/posts/294-jepsen-cassandra

Except they are. Catalogued as δ-CRDT, under Lexicographic Counter in https://arxiv.org/pdf/1603.01529.pdf

edit: Although to be fair that blog post is ancient and covers Cassandra 2.0, now no longer supported. Counters were rewritten in Cassandra 2.1.

It's generally good form here to put in a disclaimer when you're plugging your own product.

Please stop spamming this.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact