
Correctness Anomalies Under Serializable Isolation - evanweaver
http://dbmsmusings.blogspot.com/2019/06/correctness-anomalies-under.html
======
mjb
Part of the problem here is the confusion in terms between the databases world
and the distributed systems world. "Consistency" has very different meanings
in each, and "atomic" frequently has different practical meanings, for
example. These terms even differ in local systems: look at what the LLVM docs
and C++ spec imply about "atomic" (they use it to mean something close to
'linearizable'), while the database folks use a very different definition.
Words are hard.

I think a lot of people coming from the distributed world assume that
Serializable implies Strict Serializable, and don't think about the re-
orderings that databases are allowed to do that break linearizability. That
causes a lot of confusion, and even more confusion when databases become
distributed.

~~~
mehrdadn
Could you point to which part of the LLVM docs use "atomic" to mean
"linearizable"? (I'm not even sure what you mean by that, but if you mean
"sequentially consistent", that's explicitly its own subcategory under
atomicity in the docs... it's definitely not what they mean by atomicity in
general. [1])

[1] [https://llvm.org/docs/Atomics.html#optimization-outside-
atom...](https://llvm.org/docs/Atomics.html#optimization-outside-atomic)

~~~
mjb
Sorry, I think I was confusing LLVM's 'monotonic' with 'atomic'. The docs
([https://llvm.org/docs/LangRef.html#ordering](https://llvm.org/docs/LangRef.html#ordering))
say:

> In addition to the guarantees of unordered, there is a single total order
> for modifications by monotonic operations on each address. All modification
> orders must be compatible with the happens-before order. There is no
> guarantee that the modification orders can be combined to a global total
> order for the whole program (and this often will not be possible). The read
> in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the
> value in the modification order immediately before the value it writes.

Am I wrong to read that as being equivalent to linearizability, in the case of
atomic operations with monotonic ordering? I'm not an expert on compiler
internals, so it could be that I'm reading the docs wrong.

If there's a better reference, I'd love to see it.

~~~
mehrdadn
Could you define explicitly what you mean by linearizability?

~~~
mjb
The defininition from Herlihy and Wing
([https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)),
page 471. More informally, aphyr's definition from
[https://aphyr.com/posts/313-strong-consistency-
models](https://aphyr.com/posts/313-strong-consistency-models) works well:

> each operation appears to take effect atomically at some point between its
> invocation and completion.

Here, operations are on single values (the paper calls them objects). So they
are things like "add 1 to x" and not things like "read y and add its value to
x".

~~~
mehrdadn
If that's what you mean then linearizable can't be the same as _monotonic_
since monotonicity is weaker than sequential consistency, not stronger.

If I understand LLVM's docs correctly, the difference between monotonicity and
sequential consistency is that the former is for a single address whereas the
latter is for all addresses. They would've been more clear on this if they had
said "that the modification orders _for different addresses_ can be combined
to a global total order for the whole program".

------
GauntletWizard
I feel like this article is using strange definitions - ones that may be
common, but are also incorrect. There's an example, about halfway through, of
a customer withdrawing $20 twice in different regions from the same bank
account. The definition of "serializable" that the author is using includes
the following flow:

    
    
        A         |         B
        Reads $50 |
                  |Reads $50
        Writes $30|
                  |Writes $30
    

This was a bug, and a huge one in either the application or the database. On
one hand, the application should have sent the SQL statements:

    
    
        A         |         B
        Acct =    |Acct =
        Acct - $20|Acct - $20
    

Or the whole thing should have been wrapped in a transaction, so no
interleaving of A and B would have been permitted.

Either is fine. If the database had not dealt with either, it would have been
incorrect.

Of course, even better, and actually a requirement of proper financial
systems: Write the whole thing as a ledger and archive the results. SQL
databases do actually make this easy, if you archive your Write-ahead-logs.
NoSql makes this easy to write and write verification tools for. The argument
about which is better belongs to implementers, and to fools - the fools who do
not verify correctness of their data regularly.

~~~
mjb
That's part of the point he's making, though. To reject this case in a
distributed database needs some additional concepts. From the article:

> Whichever transaction is second --- when it reads the balance --- it must
> read the value written by the first transaction. Attar et. al. named this
> guarantee “one copy serializability” or “1SR”, because the isolation
> guarantee is equivalent to serializability in an unreplicated system with
> “one copy” of every data item.

Your point about archiving logs assumes there is _one log_ of transactions,
which isn't true in general in distributed databases (although something
equivalent to one log needs to exist for 1SR).

