Hacker News
Demystifying Database Systems: An Introduction to Transaction Isolation Levels (fauna.com)
202 points by freels 18 days ago | 22 comments

Glad to see this post on HN. I'm the author and happy to respond to questions in this thread.

> If any of my readers are aware of any real lawsuits that came from application developers who believed they were getting a SERIALIZABLE isolation level, but experienced write skew anomalies in practice

Doubt this ever really happened. Can hardly imagine debating serializability in a court of law. Oracle v. Google was bad enough.

A quick search shows data integrity lawsuits do happen: https://www.healthcareinfosecurity.com/1-billion-lawsuit-foc...

But I'd like to hear from engineers who have seen write skew bugs and other transaction anomalies cause business issues.

This is some nice work on the issue from Peter Bailis: http://www.bailis.org/blog/understanding-weak-isolation-is-a...

It seems grossly irresponsible to encourage the use of serializable without even mentioning deadlocks. I guess you won't have deadlocks if your db interprets serializable as "lock the whole database on transaction start", but that means literally no concurrency whatsoever, which also seems like a grossly irresponsible recommendation.

I dunno, maybe I didn't get the memo that the database of the future will be single-threaded.

As mentioned in the post: "There are several ways to achieve [serializability] — such as via locking, validation, or multi-versioning."

Deadlock happens under some, but not all implementations of serializability via locking. There have been several database systems developed in my lab that use locking to achieve serializability, but yet never deadlock. Examples include:

(1) Calvin: http://www.cs.umd.edu/~abadi/papers/calvin-sigmod12.pdf

(2) Orthrus: http://www.cs.umd.edu/~abadi/papers/orthrus-sigmod16.pdf

(3) PWV: http://www.cs.umd.edu/~abadi/papers/early-write-visibility.p...

Bottom line: serializability does not necessarily mean deadlock. Deadlock can be avoided via non-locking implementations, or even in well-designed locking implementations.
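One way a well-designed locking implementation avoids deadlock is the Calvin-style approach mentioned above: acquire every lock a transaction needs up front, in a single global order, so no lock-wait cycle can ever form. A minimal sketch in Python (the "rows", lock table, and transfer transaction are all invented for illustration; a real system would derive lock sets from the transaction's read/write sets):

```python
import threading

# Hypothetical in-memory "rows" guarded by per-row locks.
rows = {"a": 100, "b": 100}
locks = {k: threading.Lock() for k in rows}

def transfer(src, dst, amount):
    # Deadlock avoidance: acquire all locks up front, in one global
    # (here: alphabetical) order. No two transactions can hold locks
    # in conflicting orders, so a wait cycle can never form.
    keys = sorted({src, dst})
    for key in keys:
        locks[key].acquire()
    try:
        rows[src] -= amount
        rows[dst] += amount
    finally:
        for key in reversed(keys):
            locks[key].release()

# Opposite-direction transfers that could deadlock under a naive
# "lock src first, then dst" policy.
t1 = threading.Thread(target=transfer, args=("a", "b", 10))
t2 = threading.Thread(target=transfer, args=("b", "a", 5))
t1.start(); t2.start()
t1.join(); t2.join()
print(rows)
```

Either interleaving is a valid serial order here, and neither can deadlock, because both threads contend for the locks in the same global sequence.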

One of the points in the conclusion of the post warrants being reiterated at this point:

"If you find that the cost of serializable isolation in your system is prohibitive, you should probably consider using a different database system earlier than you consider settling for a reduced isolation level."

What I'm talking about here is not academic database technologies in laboratories, but the typical mysql/postgres/oracle/sqlserver production world regular programmers live in. So yes, I know pragmatic strategies for avoiding deadlocks, but in all the databases I've used, you have to proactively employ such a strategy. Many if not most of these programmers have at best a rudimentary knowledge of transactions and/or whether they are even using them (thanks to ORMs). My concern is that someone reads a blog like this, concludes that serializable is best, and proceeds towards disastrous consequences.

You appear to suggest that programmers with “rudimentary knowledge of transactions” should prefer lower isolation levels which sacrifice correctness for performance. If anything is “grossly irresponsible” here, it’s that.

Such isolation levels are notoriously difficult to reason about—even for experienced practitioners—and their misuse can and does introduce persistent data anomalies that can be costly to remediate.

Generally speaking, performance issues are significantly easier to diagnose and resolve than data anomalies, and they may be addressed in a targeted fashion as the need arises.

There’s no substitute for thinking. But if I had to prescribe general advice, it’d be this:

(1) When given the choice, select a modern database system which supports scalable and efficient serializable and snapshot transaction isolation levels.

(2) Use serializable isolation, by default, for all transactions.

(3) In the event that your transactions are not sufficiently performant, stop and investigate. Profile the system to identify the bottleneck.

(4) If the bottleneck is contention due to the transaction isolation level, stop. Assess whether the contention is inherent or whether it is incidental to the implementation or data model.

(5a) If the contention is incidental, do not lower the isolation level. Instead, refactor to eliminate the contention point. Congratulations; you are now done.

(5b) Otherwise, lower the isolation level—only for one or more of the transaction(s) in question—by a single step. Carefully assess the anomalies you have now introduced and the ramifications on the system as a whole. Look for other transactions which could intersect concurrently in time and space. Implement compensatory controls as necessary to accommodate the new behavior.

(6) Repeat only as necessary to achieve satisfaction.
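Step (2) implies the application must be prepared for serializable transactions to abort under contention and retry them. A minimal sketch of that retry loop, assuming a driver that surfaces serialization failures as a distinct exception (the `SerializationFailure` class and `flaky_txn` demo are invented stand-ins; in PostgreSQL this would be the error with SQLSTATE 40001):

```python
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error
    (e.g. SQLSTATE 40001 in PostgreSQL)."""

def run_with_retries(txn, max_attempts=5):
    # Serializable transactions may abort under contention; the
    # application retries with exponential backoff rather than
    # silently losing the write.
    for attempt in range(max_attempts):
        try:
            return txn()
        except SerializationFailure:
            time.sleep(0.001 * (2 ** attempt))
    raise RuntimeError("transaction kept conflicting; investigate contention")

# Demo: a transaction that conflicts twice before succeeding.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure
    return "committed"

print(run_with_retries(flaky_txn))
```

If the retry budget is exhausted, that is exactly the "stop and investigate" signal from step (3): the contention is persistent, not transient.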

As others have commented, most widely deployed DBMSs provide deadlock-free serializability. FaunaDB is a commercially available implementation of Calvin that makes deadlock free serializable transactions performant in a global environment. The technology exists, and I think your concerns are unwarranted. The consequences of performance issues are far less severe than data correctness bugs, so preserving correctness is a much better starting point.

In my experience, novices generally stick with the default. And when they don't, their code breaks because they assumed serializable was somehow magic and they don't handle the inevitable transaction failures (generally on production, when there is load). This is exactly why PostgreSQL drivers that used serializable as a default had to backtrack and default to read committed. And yes, I agree that the blog is encouraging serializable without adequately pointing out the downsides; the extra burden on application developers would be #1 in the caveats in my book, and using serializable when you don't have to will lead to a net increase in bugs.

Those databases all allow and detect deadlocks out of the box as part of their concurrency-control strategy.

In postgresql at least, serializable cannot deadlock. It uses predicate locks to provide serializable consistency with high levels of concurrency. You can have serialization failures though.

I'm not too familiar with postgresql, but isn't that serializable snapshot rather than serializable? I'm pretty sure all RDBMSs have deadlocks in serializable. But in serializable snapshot, a transaction doesn't deadlock, but simply fails.

Serializable Snapshot Isolation (SSI) is an implementation strategy for providing serializable isolation. In PostgreSQL, if you ask for SERIALIZABLE you get SSI.

A transaction system in which it's impossible for transactions to ever fail kind of isn't a transaction system.

See the above comment from abadid: it's entirely possible to impose a global ordering of all transactions, either up front or retroactively, without admitting deadlocks.

At work we use Sybase SQLAnywhere, and we added serializable access to one particular table.

Fine if you have a few dozen rows as we did in testing, but after a while in production the DB struggled hard once the row count went beyond a hundred k... Turned out the serialized access caused it to issue a lock on every single row (for reasons I still don't understand).

The serialized access was used to basically implement a fancy autoinc ourselves. I rewrote the code to use a unique constraint and retry-loop instead, which eliminated the speed issue entirely.
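A sketch of that unique-constraint-plus-retry pattern, using sqlite3 so it's self-contained (table and column names are invented; the real schema would differ, and a real deployment would see actual races between concurrent writers rather than this single-connection demo):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# PRIMARY KEY doubles as the unique constraint on the sequence column.
db.execute("CREATE TABLE tickets (seq INTEGER PRIMARY KEY, payload TEXT)")

def insert_next(payload, max_attempts=10):
    # Read the current max, try to claim max+1. If a concurrent
    # writer claimed it first, the uniqueness constraint fires and
    # we retry with a fresh read -- no serialized table access needed.
    for _ in range(max_attempts):
        (cur_max,) = db.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM tickets").fetchone()
        try:
            db.execute("INSERT INTO tickets (seq, payload) VALUES (?, ?)",
                       (cur_max + 1, payload))
            db.commit()
            return cur_max + 1
        except sqlite3.IntegrityError:
            db.rollback()  # lost the race; re-read and retry
    raise RuntimeError("could not allocate a sequence number")

print(insert_next("first"))
print(insert_next("second"))
```

The optimistic retry only pays a cost when two writers actually collide, which is why it scales so much better than locking the whole table (or every row) pessimistically.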

To implement the serializable isolation level, the database system must track every single row your query touches, even the ones you only read or filter out. (The need to track reads, even for rows filtered out of a select statement, is surprising but necessary.)

Consider a common scenario where you SELECT a set of rows and take a SUM over a column. Suppose your query and another query begin reading from the same committed state of the same table. Suppose that the other query uses an UPDATE command on a set of rows in the table, and that the other query commits before yours does. In order to be consistent, the database system must detect whether the other query updated a row that affects the WHERE filter you used when you scanned the table; otherwise your sum could be incorrect, because the other query's committed state could cause your query to compute a SUM over a different set of rows, or over modified values. The only way for the database system to guarantee there is no conflict is to keep track of every single row your query accesses, even if it is a row passed over by a WHERE clause!
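The scenario above can be made concrete with a toy simulation (plain Python, not a real DBMS; the table contents and column names are invented). Both "transactions" start from the same committed state; T2 moves a row into T1's filter and commits first:

```python
# Committed state both transactions start from.
committed = {1: {"region": "east", "amount": 50},
             2: {"region": "west", "amount": 70}}

# T1: SELECT SUM(amount) WHERE region = 'east'
# Note that T1 examined row 2 and filtered it out.
t1_rows = [r for r in committed.values() if r["region"] == "east"]

# T2: UPDATE row 2 SET region = 'east' -- and commits first.
committed[2]["region"] = "east"

# T1's answer from what it read vs. the answer against T2's result.
t1_sum = sum(r["amount"] for r in t1_rows)
post_t2_sum = sum(r["amount"] for r in committed.values()
                  if r["region"] == "east")
print(t1_sum, post_t2_sum)  # diverge: 50 vs 120
```

T1's result of 50 is only serializable if T1 is ordered before T2. To know that this read-write dependency exists at all, and to check that the resulting ordering is consistent with everything else the transactions did, the system must have recorded T1's read of row 2, even though the WHERE clause discarded it.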

Serializability is a well studied concept. You can find lots of good resources about algorithms for implementing concurrency control and for detecting whether or not two transactions are serializable. The high-level summary is that it takes a lot of operational bookkeeping to guarantee that two queries have no conflicts, especially when you are using real-world examples having many filters and joins.

> The only way for the database system to guarantee there is no conflict is to keep track of every single row your query accesses, even if it is a row passed over by a WHERE clause!

Why is putting a lock on the table itself not an option?

You're correct that a table-level lock is a useful tool to ensure serializability, but it's also a drastic one, since it blocks other transactions requesting a higher access level. The concurrency control system is trying to provide a generic way to both guarantee consistency and maintain high throughput on a variety of workloads, and always locking tables would diminish throughput for many common workloads. For example, one where inserts and reads tend to occur simultaneously at high volume.

In many real world applications using common databases you do not always need “transaction safety”, in particular when reading data for statistical purposes.

Performance gains and not blocking the database for many other readers and writers will in many scenarios outweigh the possibility of reading uncommitted/dirty data.



Lowering the isolation level to READ UNCOMMITTED for such queries (whether you are querying from within a transaction or not) may surprise you by solving many of the performance problems in your application/service.

This advice is highly implementation specific and unnecessarily dangerous. For read only transactions, a well designed system will be able to provide SNAPSHOT isolation with negligible perf impact compared to READ UNCOMMITTED. When combined with SERIALIZABLE write transactions, reads at SNAPSHOT are equivalent to SERIALIZABLE.

For transactions which must write, the entire point of this article is that anything less than SERIALIZABLE leaves you subject to anomalies whose effects on your data can be very difficult to predict beyond trivial scenarios.

If you must lower consistency for performance's sake, then so be it, but I would strongly argue that should be the exception, not the rule, and not be a decision made lightly.

It can certainly solve performance problems, in much the same way as turning off fsync. Even a contractor hired specifically to fix a performance problem shouldn't do this without understanding how it will affect the particular application, or they open themselves to professional malfeasance (if noticed; otherwise they will likely get the contract to fix the data corruption problem, since they did such a good job last time...)
