Hacker News
Correctness Anomalies Under Serializable Isolation (dbmsmusings.blogspot.com)
36 points by evanweaver on July 8, 2019 | 8 comments



Part of the problem here is the confusion in terms between the databases world and the distributed systems world. "Consistency" has very different meanings in each, and "atomic" frequently has different practical meanings, for example. These terms even differ in local systems: look at what the LLVM docs and C++ spec imply about "atomic" (they use it to mean something close to 'linearizable'), while the database folks use a very different definition. Words are hard.

I think a lot of people coming from the distributed world assume that Serializable implies Strict Serializable, and don't think about the re-orderings that databases are allowed to do that break linearizability. That causes a lot of confusion, and even more confusion when databases become distributed.


Could you point to which part of the LLVM docs use "atomic" to mean "linearizable"? (I'm not even sure what you mean by that, but if you mean "sequentially consistent", that's explicitly its own subcategory under atomicity in the docs... it's definitely not what they mean by atomicity in general. [1])

[1] https://llvm.org/docs/Atomics.html#optimization-outside-atom...


Sorry, I think I was confusing LLVM's 'monotonic' with 'atomic'. The docs (https://llvm.org/docs/LangRef.html#ordering) say:

> In addition to the guarantees of unordered, there is a single total order for modifications by monotonic operations on each address. All modification orders must be compatible with the happens-before order. There is no guarantee that the modification orders can be combined to a global total order for the whole program (and this often will not be possible). The read in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the value in the modification order immediately before the value it writes.

Am I wrong to read that as being equivalent to linearizability, in the case of atomic operations with monotonic ordering? I'm not an expert on compiler internals, so it could be that I'm reading the docs wrong.

If there's a better reference, I'd love to see it.


Could you define explicitly what you mean by linearizability?


The definition from Herlihy and Wing (https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf), page 471. More informally, aphyr's definition from https://aphyr.com/posts/313-strong-consistency-models works well:

> each operation appears to take effect atomically at some point between its invocation and completion.

Here, operations are on single values (the paper calls them objects). So they are things like "add 1 to x" and not things like "read y and add its value to x".


If that's what you mean, then linearizable can't be the same as monotonic, since monotonicity is weaker than sequential consistency, not stronger.

If I understand LLVM's docs correctly, the difference between monotonicity and sequential consistency is that the former orders operations on a single address, whereas the latter orders operations across all addresses. The docs would have been clearer if they had said "that the modification orders for different addresses can be combined to a global total order for the whole program".


I feel like this article is using strange definitions - ones that may be common, but are also incorrect. There's an example, about halfway through, of a customer withdrawing $20 twice in different regions from the same bank account. The definition of "serializable" that the author is using includes the following flow:

    A          |          B
    Reads $50  |
               | Reads $50
    Writes $30 |
               | Writes $30
This was a bug, and a huge one, in either the application or the database. On one hand, the application should have sent SQL statements of the form:

    A           |           B
    Acct =      | Acct =
    Acct - $20  | Acct - $20
Or the whole thing should have been wrapped in a transaction, so no interleaving of A and B would have been permitted.

Either is fine. If the database had not dealt with either, it would have been incorrect.

Of course, even better, and actually a requirement of proper financial systems: write the whole thing as a ledger and archive the results. SQL databases actually make this easy, if you archive your write-ahead logs. NoSQL makes this easy to write, and easy to write verification tools for. The argument about which is better belongs to implementers, and to fools: the fools who do not verify the correctness of their data regularly.


That's part of the point he's making, though. To reject this case in a distributed database needs some additional concepts. From the article:

> Whichever transaction is second --- when it reads the balance --- it must read the value written by the first transaction. Attar et al. named this guarantee “one copy serializability” or “1SR”, because the isolation guarantee is equivalent to serializability in an unreplicated system with “one copy” of every data item.

Your point about archiving logs assumes there is one log of transactions, which isn't true in general in distributed databases (although something equivalent to one log needs to exist for 1SR).



