
Hermitage: Testing the “I” in ACID - martinkl
http://martin.kleppmann.com/2014/11/25/hermitage-testing-the-i-in-acid.html
======
Animats
A key concept here is that real databases have transactions, that there is
some concurrency between transactions, and that in the more efficient modes,
transactions can deadlock, fail, and be rolled back.

In MySQL, a statement within a transaction can return "ERROR 1213 (40001):
Deadlock found when trying to get lock; try restarting transaction". This can
be forced by the following sequence of events. (This assumes an InnoDB table
in Repeatable Read mode.)

    
    
        Process A does a START TRANSACTION.
        Process B does a START TRANSACTION.
        Process A does a SELECT which reads row X.
        Process B does a SELECT which reads row X.  
        Process A does an UPDATE which writes row X.
        Process B does an UPDATE which writes row X.  - Deadlock error.
    

Process B gets a report that the transaction failed, and everything done in
B's transaction is rolled back. The entire transaction has to be retried.
Transactions are atomic if they commit, but can fail in a deadlock situation.

A SELECT does lock parts of the database. "Repeatable read" means that if you
read the same data item twice within the same transaction, you get the same
result, even if someone else is changing the data. This requires locking. If
you try to update the data in conflict with another process, you'll get a
deadlock error, but if you COMMIT a select-only transaction, you won't. You do
have to COMMIT select-only transactions, or you'll fill memory with locks and
stall out updates.

Ref: [http://dev.mysql.com/doc/refman/5.1/en/innodb-lock-modes.html](http://dev.mysql.com/doc/refman/5.1/en/innodb-lock-modes.html)

~~~
martinkl
What you describe is the Lost Update (P4) anomaly. It's in the test suite:
[https://github.com/ept/hermitage/blob/master/mysql.md#lost-update-p4](https://github.com/ept/hermitage/blob/master/mysql.md#lost-update-p4)

MySQL in repeatable read mode actually doesn't prevent this anomaly (it
doesn't deadlock). You need to be in serializable mode to get the deadlock.

~~~
Animats
You're right. MS SQL Server does reject that on repeatable read, but MySQL
does not.

Should MySQL have the same behavior? It detects the problem and blocks, but
then does the update when the other transaction finishes, losing one update. I
tried this for different values of "value" in each process, and it still
fails. (The test sets the same value from each process, so you can't see who
wins the race or if the database treated the update as a no-change
transaction.)

~~~
martinkl
I don't know what behavior it "should" have. IMHO there's scope for different
databases to implement things differently — otherwise there would be no room
for innovation. The important thing is just that we understand precisely which
guarantees we're getting and which we're not, so that we can write
applications which behave correctly under a given isolation level. And that's
the whole point of Hermitage.

~~~
Animats
That's not "implementing differently" or "innovation". That's very differing
semantics for the same named mode of operation.

Since you have a test suite, try getting that to the SQL committee for the
next revision of the standard.

------
jeffdavis
"...able to withdraw more money than they had in their account...Most so-called
ACID databases — for example Postgres, MySQL, Oracle or MS SQL Server — would
not have prevented this race condition in their default configuration."

The author didn't really show that this was true. He obviously understands
isolation well, so I would tend to believe him, but it would be nice to see an
example.

In a simple case (not sure that it matches the exchange's case), postgresql in
any configuration will prevent this problem:

    
    
        CREATE TABLE account(
          id int8,
          balance numeric, check (balance >= 0)
        );
    

No matter what concurrent activity you have going on, it's impossible (as far
as I know) to end up seeing a balance less than 0. That's actually true in
_any_ isolation mode that postgres supports (read committed, snapshot
isolation, and truly serializable).
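A quick way to see the constraint holding (sqlite3 is used here as a
stand-in only because it ships with Python; the table and values are
illustrative, but Postgres enforces CHECK the same way on every write path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account(id INTEGER, balance NUMERIC CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 50)")

try:
    # Any write that would leave balance < 0 is rejected by the database
    # itself, no matter what activity produced the attempted value.
    conn.execute("UPDATE account SET balance = balance - 90 WHERE id = 1")
except sqlite3.IntegrityError:
    print("withdrawal rejected")

print(conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0])
# 50 -- the balance is untouched
```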

Does someone have a counterexample, or more details about the case at the
exchange that would not be solved by postgres?

I'd also like to point out that postgres's implementation of true
serializability performs quite well, and there isn't much of a cost to using
it over snapshot isolation.

~~~
jrapdx3
The key phrase in the quoted line is "default configuration". According to the
PostgreSQL docs, the default isolation level is "READ COMMITTED".

Your example check constraint indeed appears to work as advertised. Trying

    
    
        UPDATE account SET balance = -1.00 where id = 1;
    

gives an ERROR message re: new row for relation "account" violates check
constraint ...

Still it's not clear exactly what the author sees re: where postgres fails.
I'll have to read the article again. With any luck someone more knowledgeable
than I am will shed light on the question.

~~~
jasonwatkinspdx
In short: prior to version 9.1, postgres's "serializable" level still allowed
some transaction patterns that would be rejected by a strictly serial
execution. Later versions prevent this.

This paper:
[http://dl.acm.org/citation.cfm?id=1376690](http://dl.acm.org/citation.cfm?id=1376690)
outlines the general issue. This paper discusses Postgres's solution:
[http://drkp.net/papers/ssi-vldb12.pdf](http://drkp.net/papers/ssi-vldb12.pdf)

------
jrullmann
Very cool - would love to see tests of FoundationDB!

FYI I'm an engineer at FDB, happy to help.

~~~
martinkl
Awesome, I'd love a pull request! I've been looking at FoundationDB, but
haven't had time to test it. Porting the tests to another database is
(hopefully) a mostly mechanical exercise.

~~~
wwilson
Here's one way you could run this test vs. FoundationDB:

[https://gist.github.com/MMcM/f00108c5943e919f73d1](https://gist.github.com/MMcM/f00108c5943e919f73d1)

------
wpietri
Interesting! One minor quibble:

> The idea of isolation is that we want our database to be able to process
> several transactions at the same time (otherwise it would be terribly slow)

Not necessarily true. Things like Prevayler and LMAX provide isolation by
processing transactions one at a time, and they're very fast. They manage this
by keeping everything relevant hot in RAM. LMAX, for example, can do 6 million
TPS for a financial trading platform. You can read Martin Fowler writing about
LMAX here:
[http://martinfowler.com/articles/lmax.html](http://martinfowler.com/articles/lmax.html)
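The single-writer idea is simple enough to sketch (a toy model only — LMAX's
actual Disruptor machinery is far more elaborate): all state is owned by one
thread, clients submit transactions to a queue, and the worker applies them
strictly one at a time, so no interleaving is possible and isolation comes
for free.

```python
import queue
import threading

state = {"balance": 100}   # all mutable state owned by the worker thread
inbox = queue.Queue()      # clients submit transactions here

def worker():
    while True:
        txn, done = inbox.get()
        if txn is None:    # shutdown sentinel
            break
        txn(state)         # one at a time: serial execution == isolation
        done.set()

t = threading.Thread(target=worker)
t.start()

def submit(txn):
    done = threading.Event()
    inbox.put((txn, done))
    done.wait()            # block until the transaction has been applied

# Two "concurrent" withdrawals can never interleave mid-transaction.
submit(lambda s: s.update(balance=s["balance"] - 30))
submit(lambda s: s.update(balance=s["balance"] - 30))
inbox.put((None, None))
t.join()
print(state["balance"])  # 40
```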

------
a-priori
In case it's interesting to people, here's a blog post I made earlier this
year on the topic of what exactly transaction isolation means:

[http://www.michaelmelanson.net/2014/03/20/transactions/](http://www.michaelmelanson.net/2014/03/20/transactions/)

~~~
martinkl
Nice example of a bug caused by weak isolation. FWIW, Postgres has an
interesting implementation of "serializable" which takes far fewer locks than
MySQL, so may give you better performance while retaining the same isolation
level.

~~~
fdr
Yes: it is the relatively new approach by Cahill et al in 2008, known as SSI:
[https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf](https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf)

As far as production-common database implementations go, that is lightspeed
for adopting new academic work (9.1, the first version with the feature, was
released in 2011).

------
fintler
I wonder what approach you could use for global safe timestamp snapshot
transaction checking.

[https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41344.pdf](https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41344.pdf)

------
jeffdavis
I strongly rebut the following claim, which is central to the article:

"Internet commenters, in their infinite wisdom, were quick to point out that
if you’re dealing with money, you had better use an ACID database. But there
was a major flaw in their argument. Most so-called ACID databases — for
example Postgres, MySQL, Oracle or MS SQL Server — would not have prevented
this race condition in their default configuration."

I hesitate because I don't really understand the details of the situation the
exchange faced. But, going by the linked references here:

[https://bitcointalk.org/index.php?topic=499580](https://bitcointalk.org/index.php?topic=499580)
[http://www.reddit.com/r/Bitcoin/comments/1wtbiu/how_i_stole_...](http://www.reddit.com/r/Bitcoin/comments/1wtbiu/how_i_stole_roughly_100_btc_from_an_exchange_and/)

it appears that the pattern in question, if translated very unnaturally to
SQL, is something like:

    
    
        CREATE TABLE account(id int8, balance numeric);
        ...
        BEGIN;
        SELECT balance FROM account WHERE id = 123;
        -- application sees 100, subtracts 90, sees that
        -- it's still positive and does:
        UPDATE account SET balance = 10 WHERE id = 123;
        COMMIT;
    

Technically speaking, running that in postgres in the default configuration
(read committed) is prone to a race, and you'd need to use repeatable read or
serializable mode to protect you.

But that's ridiculous. Anyone using SQL would instead do:

    
    
        CREATE TABLE account(id int8, balance numeric, check(balance >= 0));
        ...
        UPDATE account SET balance = balance - 90 WHERE id = 123;
    

And that is _not_ prone to a race. Try it in any version of postgres, in any
configuration. You can't get double withdrawals (where only one takes effect),
and you can't get it to go below zero.
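The contrast with the read-then-write version is easy to stage by hand. A
sketch (sqlite3 over a single connection, so the interleaving is simulated in
one thread rather than produced by real concurrency; table and ids are
illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(id INTEGER, balance NUMERIC)")
conn.execute("INSERT INTO account VALUES (123, 100)")

# "Process A" reads the balance into application memory...
(bal_a,) = conn.execute("SELECT balance FROM account WHERE id = 123").fetchone()

# ...meanwhile "Process B" runs the same read-then-write pattern to completion.
(bal_b,) = conn.execute("SELECT balance FROM account WHERE id = 123").fetchone()
conn.execute("UPDATE account SET balance = ? WHERE id = 123", (bal_b - 90,))

# A's stale check still passes (100 - 90 >= 0), and its write clobbers B's.
if bal_a - 90 >= 0:
    conn.execute("UPDATE account SET balance = ? WHERE id = 123", (bal_a - 90,))

final = conn.execute("SELECT balance FROM account WHERE id = 123").fetchone()[0]
print(final)  # 10 -- yet 180 was handed out: B's update was lost
```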

So, the author is _technically_ right: (a) if you translate the NoSQL-isms
into SQL in an unnatural way; _and_ (b) if you don't bother to use SERIALIZABLE
mode, which costs very little in most situations.

I agree with the author that isolation is tricky, and developers should not be
expected to understand the nuances. And I applaud the development of a testing
framework to really understand the various kinds of isolation and how they
apply to different products. But the example is a bad one, because it actually
does work just fine in postgres, and probably many other systems.

~~~
martinkl
Your first ("very unnatural") example is what I had in mind. And since read
committed is the default isolation level in most RDBMS, it is prone to the
lost update anomaly. (In MySQL, repeatable read is the default, but its
implementation of repeatable read doesn't prevent lost updates.) Note I did
point out that I'm referring to the default configuration, not the strongest
supported isolation level.

Although your second example is probably what a human would write, an ORM
framework would very likely generate a transaction looking like your first
example.

Another example would be inserting a transaction into a table, and summing the
transactions in the account in order to calculate the account balance. Making
that safe requires preventing phantom reads, which means requiring
serializability.

You say serializable costs very little in most situations. I can't claim to
know what most situations are like, but all I know is that I've seen many
people who have tried serializable and found it too slow for them. User
a-priori gives an example elsewhere on this thread:
[http://www.michaelmelanson.net/2014/03/20/transactions/](http://www.michaelmelanson.net/2014/03/20/transactions/)

My point is that weak isolation is very subtle, easy to get wrong, and you
don't know that you got it wrong until it's too late. We need better
understanding and better tools so that concurrency is less easy to screw up.

~~~
jeffdavis
Your example of poor SERIALIZABLE performance is from a MySQL user. Have you
tried SERIALIZABLE in PostgreSQL? The implementation is based on fairly recent
research, and performs quite well for a lot more use cases.

And the point about ORMs is valid, but still not enough to back up your
unqualified claim that using postgres would not have solved the problem. It
_might_ not have, but postgres offers a lot of tools to solve this problem (we
didn't even discuss SELECT ... FOR UPDATE), and even inexperienced users are
at least _more likely_ to have stumbled into one of those solutions.

I very much agree that isolation issues are subtle traps for many users, even
in SQL. I would like to see SERIALIZABLE become more common, and eventually
the default, in postgres. I also like the fact that you're writing real tools
to check up on these in a formal way.

But please be a little careful when making statements like that, because it
can turn people away from the systems most likely to help them.

------
talles
There have been so many drug articles on HN lately that I literally expected
some LSD story here...

------
jackhulsom
Nice! Very interesting.

------
amixofpersons
An interesting read.

