Strict-serializability, but at what cost, for what purpose?

richieartoul · on Aug 4, 2022

I’ve had this debate with people numerous times and I think the “you don’t need strong consistency” crowd has really gotten it backwards. These same people will also agree with you that premature optimization is bad, but then argue that application developers should be forced to grapple with looser isolation models for vague reasons like “latency” and “costs”. Most applications don’t push their databases that hard, and the ones that do usually have their most expensive queries as large “read only” queries where strict serializability of other transactions adds little additional overhead.

It’s already hard enough to write correct software, we should be pushing hard problems like transaction isolation deep into the database so they can be solved correctly once, instead of 1000s of times incorrectly.

Most applications that depend on strong consistency in their database could be rewritten to require less strict guarantees, but why?

Strict serializability should be the default. Engineers working in performance sensitive settings and who really know what they’re doing can always loosen this constraint in the narrow part of the application where it matters.

It’s crazy to me that anyone would ask a developer to justify why they need strict serializability. It should be the opposite, you should be forced to justify why you’re using a looser isolation level for performance reasons just like we force developers to justify why they’re introducing more performant, but less readable code!

liuliu · on Aug 4, 2022

The post has more leg than you give it credit for. The author thinks serializable is table stake, and linearizability can be implemented cheaply (through sharding). Strict-serializable however, cannot really break "scale-out" barrier with some serious compromises (either time-bounds it (Spanner's 7ms bounds), or central ordering service that cannot scale out). It additionally provides a proof that strict-serializable != serializable + linearizable and then go on to question what you responded: does strict-serializable worth all these troubles?

richieartoul · on Aug 4, 2022

In my opinion the answer is unequivocally, yes, it is worth it, and developers should accept nothing less from modern databases. Even with strict serializability, developers will still be surprised by transactional behavior in distributed environments, but at least strict serializability provides a relatively straight forward framework for developers to reason about with almost no “ifs, ands, or buts” except for some confusing behavior around idempotency and transaction retries. As soon as you drop below strict serializability, the cognitive load of reasoning about your applications correctness increases dramatically. Most developers just accept this, but it’s not the right trade off for most applications IMO.

If this was some theoretical question that we hadn’t figured out if it was practical or possible yet, I would have more sympathy for this line of thinking, but it’s not anymore.

Spanner, CockroachDB, and FoundationDB have all proven that these systems can be scaled, and economically so.

FoundationDB scales incredibly well with great performance characteristics. That’s evidence enough to me that the cost of strict serializability is worth it for 99% of applications.

_benedict · on Aug 4, 2022

CockroachDB is not strict serializable. It is linearizable+serializable+(some other guarantee I forget).

Only Spanner currently offers properly scalable strict serializability, as FoundationDB clusters have a size limit (that is very forgiving, and enough for most use cases).

Apache Cassandra is working on providing scalable (without restriction) strict serializable transactions[1], and they should arrive fairly soon. So far as I am aware, at this time it will be the only distributed database besides Spanner offering fully scalable and fast global transactions with this level of isolation.

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15... (Disclaimer) I'm one of the authors

kbr- · on Aug 4, 2022

What do you mean by "scalable without restriction"?

I understand scalability by being able to handle more traffic through adding more nodes to your cluster. But if you want to handle more traffic, adding more replicas to a single Accord replica set won't help. Any two (fast path) quorums must intersect. And that intersection point for two queries is where you need to spend resources to handle both queries.

So - unless I'm missing something here - in order to scale out to be able to handle more traffic, you must shard the data, there's no way around it. Even assuming 0 failures won't help.

And once you shard your data and use multiple Paxos/Raft/EPaxos/Accord/whatever groups, you lose strict serializability across the shards.

_benedict · on Aug 4, 2022

> you lose strict serializability across the shards

Nope. That's precisely what Accord manages to maintain. It is not the first such protocol to do so, but so far none of the others have left the lab.

kbr- · on Aug 4, 2022

Ok. From skimming through the paper, it seems that for a given transaction, only the replica sets of the shards touched by this transaction need to participate. Thus for two transactions touching disjoint sets of shards, they can run completely in parallel. Hence the scalability. Is that right?

_benedict · on Aug 4, 2022

Pretty much.

richieartoul · on Aug 4, 2022

You’re right about CockroachDB, my mistake.

I think your characterization of Cassandra as “fully scalable” in a way that other systems are not is misleading, but I won’t argue it with you :p

_benedict · on Aug 4, 2022

I'm not talking about Cassandra being more generally scalable than other systems, only that no system as scalable as Cassandra offers strict serializable transactions besides Spanner (including Cassandra, today)

richieartoul · on Aug 4, 2022

What does scalable mean in this context? In general, Cassandra is much more efficient at writes and can handle higher volumes of write throughput assuming identical hardware, but from what I’ve seen FDB is as efficient (if not more so) for read heavy workloads than Cassandra. A small FDB cluster with no tuning running on small VMs will easily saturate each nodes 10gb NICs for read queries without breaking a sweat.

The cluster size limitations are also greatly overstated and mostly a hold over from stale documentation from 2013. You can (and companies do) run FDB clusters with 100s or 1000s of nodes without issue.

Cassandra supports this too but in my experience it gets quite difficult to operate at that point, although I’m sure recent versions have gotten much better about this.

If you only care about raw write throughout I think your statement is mostly accurate, but for general purpose workloads I would argue that FDB is a system “as scalable as Cassandra” that also offers strictly serializable (and interactive!) transactions which makes it much more broadly applicable to a wide variety of use cases.

bitbckt · on Aug 4, 2022

FaunaDB also offers scalable strict serializability.

Disclosure: I work on FaunaDB.

_benedict · on Aug 4, 2022

I believe Fauna inherits some scalability issues from Calvin, in that given NxN communication is necessary for transactions to be processed, a catastrophic failure in one shard can bring the entire database offline. Is that correct?

It may not happen in practice today, but as your clusters grow your exposure to such a failure is increased, and the NxN communication overhead grows does it not?

It's certainly more scalable than other systems offering this isolation today (besides Spanner), but algorithmically at least I believe the proposal we are developing for Cassandra is strictly more scalable, as the cost and failure exposure grow only with the transaction scope, not the cluster.

Not to ding FaunaDB, though. It probably is the most scalable database offering this level of isolation that is deployable on your own hardware - assuming it is? I know it is primarily a managed service, like Spanner.

I'm also not aware how large any real world clusters have gotten with FaunaDB in practice as yet. Do you have any data on that?

bitbckt · on Aug 4, 2022

> I believe Fauna inherits some scalability issues from Calvin, in that given NxN communication is necessary for transactions to be processed, a catastrophic failure in one shard can bring the entire database offline. Is that correct?

No, that's not correct. FaunaDB is inspired by Calvin, but is not a direct implementation of Calvin. Transactions are applied independently on each storage host, though checking OCC locks may involve peers within a single replica.

> assuming it is

We used to offer on-premises delivery, but we are strictly a managed service now.

> I'm also not aware how large any real world clusters have gotten with FaunaDB in practice as yet. Do you have any data on that?

I'm not at liberty to say.

_benedict · on Aug 4, 2022

> Transactions are applied independently on each storage host

I’m not talking about transaction application, but obtaining your slot in the global transaction log. How does a replica know it isn’t missing a transaction from some other shard if that shard is offline come its turn to declare transactions in the log? It must at least receive a message saying no transactions involving it were declared by that shard, no?

This is pretty core to the Calvin approach, unless I misunderstand it.

> I'm not at liberty to say.

I think scalability is something that is a function of both theoretical expectations and practical demonstration.

bitbckt · on Aug 4, 2022

> How does a replica know it isn’t missing a transaction from some other shard if that shard is offline come its turn to declare transactions in the log? It must at least receive a message saying no transactions involving it were declared by that shard, no?

A transaction is committed to the log with sufficient information necessary for the log (and each storage host) to know whether any particular host is involved in that transaction. There's no need for a message to prove a host's absence from that transaction - it has all it needs in the log to determine that it isn't involved. There is some complexity around how that information maps to the cluster's physical topology, but hosts which aren't involved in a transaction don't process that transaction and need not check with any other host to know that fact.

[ETA] That information is derived from the read set when constructing the transaction to propose to the log. Logical partitions that weren't a part of that read set don't need to know that they weren't involved - that fact is obvious from their absence.

> I think scalability is something that is a function of both theoretical expectations and practical demonstration.

I agree with you, though as a commercial product operated as a service, that practical demonstration is naturally limited in this forum.

_benedict · on Aug 4, 2022

You misunderstand my point. A replica may of course safely assume it is not involved in a transaction it hasn’t witnessed so long as it has a “complete” log (as in, the portion involving it).

> that fact is obvious from their absence.

Yes, but absence from what? There must be some message that contains the relevant portion of the log declared by each other shard. If that shard is offline, so it cannot declare its transactions at all, how does the system continue? This absence is one step back from the one you are discussing.

If a shard’s leader is offline, a new leader must be elected before its slot comes around for processing - and until this happens all transactions in later slots must wait, as it might have declared a transaction that interferes with those a later slot would declare.

If no leader can be elected (because a majority of replicas are offline) then the entire log stops, no?

bitbckt · on Aug 4, 2022

It's true: I've had trouble determining what exactly your point is, as you seem to think you know more about FaunaDB's architecture than you do.

You're under the mistaken impression that we have the same data/log architecture as described in the Calvin paper, which requires that every host talk with every other host. That is not true of FaunaDB.

_benedict · on Aug 4, 2022

What you describe is a very important distinction from Calvin, as this mechanism for deriving a global log is core to its design. Have you published this new approach in any public venue?

If not, we’re now in an unfortunate situation where neither your practical nor theoretical capabilities are public knowledge.

> every host talk with every other host

Only every shard

> you seem to think you know more about FaunaDB's architecture than you do

You are perhaps being uncharitable. The only public statements I am aware of about Fauna's capabilities relate to its Calvin heritage. I have only been asking if my understanding is correct regarding both Calvin and its application to Fauna. However, none of the prior issues you responded to were problems with the original Calvin paper either, at least by my reading.

bitbckt · on Aug 4, 2022

> What you describe is a very important distinction from Calvin, as this mechanism for deriving a global log is core to its design.

The salient aspect of the paper is how it derives the log logically, not how that is mechanically implemented in a real system outside of academia.

> However, none of the prior issues you responded to were problems with the original Calvin paper either, at least by my reading.

I wasn't clear what exactly you do (or don't) know of the architecture which would lead you to believe this NxN messaging story, as that relates directly to your contention regarding the scalability (or not) of the architecture. That much is clear to me now.

bitbckt · on Aug 4, 2022

LMGTFY:

https://fauna.com/blog/consistency-without-clocks-faunadb-tr...

_benedict · on Aug 4, 2022

Thanks

> for scalability, this insertion process happens in batches, and the log itself is replicated and partitioned

This is the link's total exposition on this topic, but it is consistent with my reading of the Calvin paper.

If any log partition becomes unavailable, the log in its entirety becomes unavailable until that partition recovers[1].

Anyway, it feels like we’ve wasted enough of each others’ time without making much progress. Might as well leave it there.

[1] ... and every log partition must communicate with every replica, hence NxN (or MxN)

grogers · on Aug 5, 2022

It seems like accord uses neither bounded synchronized clocks nor a shared timestamp oracle. How does it provide strict serializability then?

_benedict · on Aug 5, 2022

The loosely synchronized clocks are not required for correctness (i.e. to enforce strict serializability), only for performance, i.e. for achieving 1 WAN round-trip execution for conflicting transactions whose execution overlaps (all other transactions execute on 1 WAN round-trip anyway).

The system maintains a logical clock, and the logical clock enforces the strict serializability condition globally. The logical clock is just cheaper to maintain if the coordinator proposes a value that is already correctly ordered by the loosely synchronized clocks.

edit: sorry, I misread your post as interpreting the loosely synchronized clocks as being necessary for correctness. I answer (very loosely) how strict serializability is maintained in a sibling comment, but in brief a dependency set of all conflicting transactions that might potentially execute earlier is maintained, and a transaction executes in logical timestamp order wrt these dependencies (and, hence, also their dependencies). If a transaction and its dependencies never interacts with another transaction or its dependencies, then their execution order is commutative, i.e. any ordering is indistinguishable from any other.

evancordell · on Aug 5, 2022

I’m confused by this as well, it seems to only provide “strict serializability” for overlapping transactions, which other databases like Cockroach also provide.

_benedict · on Aug 5, 2022

Accord provides global strict serializability. Loosely speaking, strict serializability requires two things:

1) That the result of an operation is reflected in the database before a response is given to the client

2) That every operation that starts after another operation's response was given to a client has the effect of being executed after.

Accord enforces both of these properties. Accord only avoids enforcing an ordering (2) on transactions that are commutative, i.e. where it is impossible to distinguish one order of execution from another. This requires analysis of a transaction and its entire graph of dependencies, i.e. its conflicts, their conflicts, etc. So, if your most recent transaction operates over the entire database, it is ordered with every transaction that has ever gone before.

Cockroach does not do this. If it did, it would be globally strict serializable.

evancordell · on Aug 5, 2022

This might just be a language issue, because different sources use different words for consistency guarantees.

> Accord only avoids enforcing an ordering (2) on transactions that are commutative

If the committed state in the database records the transactions in a different order than an external observer could have observed them being accepted by the database, then you don't have what Spanner calls "external consistency" - I thought this is what you mean when you say "global strict serializability", but now I'm not so sure.

Cockroach does enforce this property, but only for transactions that touch the same (key-value layer) keys. They talk about this a bit in their article on "life without atomic clocks"[0].

In SpiceDB[1], we take advantage of this property (when Cockroach is selected as the backing datastore) to give us external consistency by forcing all transactions to overlap. This prevents the New Enemy Problem[2], which is what Google calls the problem of transaction ordering when seen from an authorization perspective.

Your description of Accord sounded very similar to how Cockroach works from this perspective, but there may be some subtlety that I am missing. Cockroach exposes snapshots via "as of system time" queries, which can expose "bad" ordering back to the client in future queries - if Accord doesn't allow snapshot queries then I suppose it wouldn't be an issue.

[0]: https://cockroachlabs.com/blog/living-without-atomic-clocks/...

[1]: https://github.com/authzed/spicedb

[2]: https://authzed.com/blog/prevent-newenemy-cockroachdb/

_benedict · on Aug 5, 2022

> If the committed state in the database records the transactions in a different order than an external observer could have observed them

So, property (1) provides this for non-overlapping commutative transactions. If w1 writes c1, then finishes; w2 writes c2 then finishes; then if r1,r2,r3, etc don't overlap either w1 or w2 either, then they must correctly witness their execution order and be "externally consistent".

If transaction executions overlap, we must work harder to ensure we see each others' strict serializable order, and so we simply avoid ordering ourselves with transactions that cannot affect our outcome. If these other transactions cannot affect our outcome, then we can't witness anything externally about them, by definition.

evancordell · on Aug 5, 2022

That makes more sense, thank you for the explanation.

Are there any plans for accord to permit snapshot queries?

_benedict · on Aug 5, 2022

Yes, amongst other things. It's certainly doable, and I expect they will arrive at some point, but I cannot say with any confidence when that might be.

richieartoul · on Aug 5, 2022

I’m not 100% sure, but my reading of the CockroachDB Jepsen test: https://jepsen.io/analyses/cockroachdb-beta-20160829 indicates that it does meet those two requirements, but that it still is not globally strictly serializable due to the presence of an anomaly they call “causal reversal” where: “transactions on disjoint records are visible out of order.”

If you read their Jepsen report and blog posts carefully, the anomaly they suffer from is not caused by a read transaction that started after previous write transactions had completed, but by a read transaction running concurrently with two other write transactions which both commit while the read is still running, and the read sees the writes in the wrong order. The definition you’ve provided above does not cover this case (I think) but its still a violation of strict serializability which makes me think your definition is too loose.

Correct me if I’m wrong though, at this level things get quite confusing.

_benedict · on Aug 5, 2022

This stuff is very subtle, but you have to consider all three transactions when applying the guarantees.

If w1 and w2 overlap their execution then any serializable execution is strict serializable, so w1 must have finished before w2 began. In this case (2) applies and w1 must have the effect of executing before w2, to all observers, including r.

If w1 and w2 are commutative then it doesn’t matter in what order they execute as they’re equivalent. If not, then at some point their dependency graphs intersect and will be/have been ordered wrt each other by the same restrictions.

edit: in the example given in the Jepsen report, with Accord r would have to execute either before w1; after w1 and before w2; or after them both. This is because one of the following must occur:

1) r takes a dependency on both w1 and w2 and executes after them both;

2) r takes a dependency on only w1; w2 takes a dependency on r, so w2 executes after r, and r executes after w1

3) w1 and w2 both take a dependency on r, so that r executes before either

richieartoul · on Aug 5, 2022

Isn’t that ignoring the point that the read is running concurrently with w1 and w2? Your comment about “have the effect” only applies to transactions beginning after a write has completed, but the anomaly cockroach suffers from is for a read running concurrently with two writes with no overlapping execution. In that scenario they fail to be strictly serializable, but so would any system meeting only the two criteria you outlined I think.

Edit: Ah ok I see you edited it. Ok that makes sense if it works.

_benedict · on Aug 5, 2022

yeah, to clarify also textually the "has the effect of" applies to all witnesses, including those that are concurrent. So if r executes after w1 or w2 then r must see that w1 applies before w2, since w1 "has the effect of" applying before w2, whether or not r is concurrent with them.

mad44 · on Aug 5, 2022

Why is it not possible for r to take a dependency on w2 and no dependency on w1?

w2 may hit different nodes than w1 in a sharded database?

Or is the leader-Paxos approach in Accord operate over all nodes in the database regardless of the shards the keys fall into? (I guess that is the only explanation.) But then scalability of these global transactions would be limited, no?

Thank you for your detailed explanations.

_benedict · on Aug 5, 2022

So this is where it can start to get a little complicated, but without going into the nitty gritty, taking the example precisely as given in the Jepsen report:

If r arrives at C1 before w1, w1 must witness it and take a dependency on its execution. If it arrives at C2 after w2, then r takes a dependency on w2’s execution. In this case, w1 will not execute until r does, and r will not execute until w2 does, and so it is simply not the case that w1 happened first - w1’s commit will not have been acknowledged to the client yet, and the actual order of execution will end up w2,r,w1, which is entirely correct to all observers since w1 and w2 will now overlap their execution.

> operate over all nodes in the database regardless of the shards the keys fall into

Nope, it’s all entirely independent for each shard. The (implied) dependency graph is what ties shards together, so only when they interact.

mad44 · on Aug 5, 2022

Thank you for the explanation.

I get it now. It is all thanks to separating the operation as 1. agreeing on WAL, 2. executing after waiting all WAL agreements stabilizing.

There is a separation between the two. This can introduce waits (which could be theoretically unbounded for EPaxos but Tempo and Accord bounds this).

Maybe in some sense this is in the same spirit of Calvin's serializing WAL first and executing it later. But Accord avoids a single entity doing the serialization and achieves it via a leaderless Paxos approach which uses dependency graphs for WAL serialization, and later using another protocol for WAL execution after settlement.

_benedict · on Aug 5, 2022

> Maybe in some sense this is in the same spirit of Calvin's serializing WAL first and executing it later

Yes, it is very similar in effect, with the absence of a global log conferring some important differences with pluses and minuses. I expect to offer interactive transactions eventually, for instance, which may or may not be difficult for Calvin, and faults on a “log shard” are isolated with Accord, but have global impact with Calvin. But snapshot queries are much easier with Calvin.

grogers · on Aug 5, 2022

How do you break dependency cycles across shards (i.e. deadlock detection)?

_benedict · on Aug 6, 2022

There aren’t any dependency cycles, as dependencies must take a strictly lower logical timestamp. Dependency resolution is essentially two step: first every transaction that MIGHT take a lower timestamp is adopted as a dependency, then once those transactions have fixed their timestamp those that take a higher timestamp are simply removed from the dependency set.

Once we offer interactive transactions we’ll have to detect cycles, but that should be relatively straightforward since execution dependencies are managed explicitly.

mamcx · on Aug 4, 2022

Is important to take in context the SIZE.

A lot of Ink is wrongly interpreted because people not think about the SIZE of the (data/problem).

MOST, if not ALL(99%) of the thinking about this stuff is for the very end of the scale: Terabytes, dozen/thousand of machines, insane latency, etc.

MOST, if not ALL(99%) of the ACTUAL data people deal with, the one that matters, is in the low of gigabytes, at best. One, maybe 2 servers, answer in a second(is) is more than ok.

You can fit most databases in an iPhone with spare to work.

Instead, the real trouble that people face is wrong schema/query designs, and introducing bugs and other stuff that the RDBMs thankfully correct/protect against.

Making the primary store MORE strict is a big win to alleviate the real issues.

aidenn0 · on Aug 4, 2022

Is scale-out something most services need though? TFA points out that it's likely only a subset of the data that needs strict-serializable; the flip-side of that is that it's likely only a subset of the data needs scale-out.

jrockway · on Aug 4, 2022

Yeah, I always use strict serializability by default. The reason is because I have no model for reasoning about less strict levels of isolation. For example, with "default" isolation in Postgres, can you transfer money between two bank accounts atomically? The answer, I'm pretty sure, is no! At that point, everything you "know" about databases is out the window. You're basically crossing your fingers and hoping that nothing weird is happening, and if it does, hey it was only one user a day, 99.9% of users are perfectly happy and didn't lose their home and family! That's within our SLA!

I think the article is talking about distributed systems where it's costly to guarantee ordering among multiple entity groups (shards for example), and there are "never" transactions across those entity groups. For example, the database hosting your website and the database backing my website can't talk to each other, so it doesn't really matter that transactions that we execute against our database don't appear in a defined order; we don't even know that the other exists.

I do agree with the sentiment that if those two databases are at the same company, eventually you'll be burned by not being able to do transactions across them. Ever try to make an atomic change to two git repositories? Not possible, but often quite annoying. This is doubly annoying when you start having internal sync processes because two teams used two different database to ultimately build, what looks like to the user, one app.

My philosophy is that if strict serializability ever makes it impossible to run my service, I'm already so rich that I can just sell the service and let someone else deal with the problem. So far, no dice on that.

sitkack · on Aug 5, 2022

> Ever try to make an atomic change to two git repositories?

I know your intent wasn't to discuss this. But you could

1) combine them

2) use tags or git hashes as your URI

3) create a third repo that uses git submodules to combine them by said git hash

Trying to maintain global state by having both gitrepos be compatible at the same wall time isn't as important as it seems. There are lots of ways to meaningfully deal with this.

closeparen · on Aug 5, 2022

The hard part about working at scale is not that you need to use more efficient algorithms or combine computers to handle the load. It is that all the weird and spooky non-reproducible edge case stuff is happening constantly & will actually demand your attention.

The lower your scale, the more you can get away with using a "high scalability" database. Who cares how concurrent transactions are reconciled when you've never seen one? I would wager that most users aren't actually handling the caveats at application layer. They're just not hitting the weird stuff often enough to ever really consider it.

jiggawatts · on Aug 5, 2022

I've wasted a lot of brain cycles dreaming up my "ideal" database engine, and as you've said, a key feature would be strict serializability.

In my opinion the trick to implementing this without too many performance issues is to add the HTTP cookies feature to database engine protocols.

Every transaction with the DB engine should require a client "transaction sequence logical clock" cookie. The returned result set should return the updated cookie, which should then be "threaded" through all the way to a web browser cookie or GUI client local store.

That way, individual users will always be telling the database engine (or cluster!) that they need to see their previous transactions in a linear order, but concurrent users could potentially have a slightly difference perspective on things. Cache entries could also be tagged with the clock cookies and then used to verify if they're still valid or need to be refreshed.

An important feature would be to have a union operator on the clock cookies that provides a new cookie that is "in the future" of all provided clock cookies. This would allow cross-user queries to correctly reflect the required input transactions.

Similarly, it should be possible to use a special "now" cookie, which is identical to typical single-server DB behaviour with serializable transactions.

Last but not least, "now minus t seconds" would produce the equivalent to what you get if you do read-only queries against an asynchronous replica.

This would allow combinations not possible with a naive DB engine that always enforces a single serialized transaction sequence, but the default behaviour would be very safe and consistent.

The computer science for this exists. There are lamport timestamp implementations using compressed tree representations, for example. The timestamps could be tracked per table, which would allow transactions across unrelated tables to be independently serialized.

Essentially, transactions should form a directed acyclic graph much like Git commits instead of a strictly linear linked list. There are many analogies: Git commit hashes are a type of cookie that can be passed around and used for various things, you need to provide a predecessor commit hash for new commits, and the hashes are used to keep caches (local repos) 100% consistent with the central transaction history, etc...

samsquire · on Aug 6, 2022

This is really interesting. Thank you.

I implemented multiversion concurrency control in Java (https://github.com/samsquire/multiversion-concurrency-contro...) and I am yet to get to the point where I'm distributing it between machines for multimachine transactions.

I have a simple read timestamp that is monotonically increasing. If I knew the read timestamps of other shards, distributed by raft log I could implement transactions across shards.

naasking · on Aug 6, 2022

If you want a git-like DAG you might be better off with using a versioning API directly. Take a look at concurrent revisions:

https://www.microsoft.com/en-us/research/project/concurrent-...

siddontang · on Aug 4, 2022

Disclosure: The maintainer of TiDB here.

When we started to build the distributed database TiDB, we had the similar discussion - whether we need to provide a so strong consistence for the customers? Inspired by Google Spanner, we also wanted to let the application programmers to take easy and not struggle with the consistent problems in applications layer by themselves.

So we decided to provide the strong consistency. But unlike Google Spanner, we couldn’t use TureTime API, so we introduced a standalone timestamp service named TSO to allocate timestamp. Using TSO is simple to guarantee all transactions’ order, but this introduces another problem - if the TiDB is deployed crossing different regions, the transaction latency is high.

Another example for consistency is from our realtime analytical processing design. TiDB is a HTAP(hybrid transaction/analytical processing) database. It is normal to keep consistency in the TP part, but for the AP part, is it necessary to keep consistency? If there is an update in the table, does the customer really need to get the latest data from the AP part? Here we still choose “YES”, we want our customers to focus on their business and not worry whether the query data result is consistent of not.

Of course, keeping consistency in the database comes at a cost. Now we have been trying our best to optimize the latency and performance problems :sob:

IMO, we choose a right way. We now have supported strong consistency at first, and then we can provide a loose consistency for performance too, but on the other hand, if we only build a loose consistent database at first, it is hard to provide a strong consistency for the customers.

Gwypaas · on Aug 4, 2022

Don't really have an axe to grind regarding strict-serializability, serializability, snapshot isolation or whatever. Start at the highest level of consistency and consciously step away if you encounter issues.

The Postgres default of "Read committed" gives me shivers on the other hand...

> If I were to make a concurrent map data structure whose operations are not linearizable and then put it on a library and give it to my users, any user of that library would come back to me and say that my library has a bug. They may not be able to tell me exactly what it is, but they'll understand that there is a bug in it.

> However, if I take the same concurrent map data structure, and put in an application and call that application a "Key-Value store" or a "Database" (DBMS) and give it to the typical database users, it seems they may certainly use it for a several decades without ever complaining that this "Database" is not linearizable (or serializable as the DB folks call it).

https://concurrencyfreaks.blogspot.com/2020/06/you-say-read-...

WJW · on Aug 4, 2022

I like my strictness just as much as anyone else, but this seems like a "works in practice" vs "works in theory" issue to me. Linearizability is just not as important as some people make it out to be.

(On a slightly meta level, you can see this happening for a great many issues in programming. Memory safety is undeniably nice, but approximately none of the worlds databases or kernels are written in memory safe languages. Type safety is nice but a huge majority of frontend code is untyped. Don't even get me started on the value of unit testing vs its absence in academic code)

Gwypaas · on Aug 4, 2022

> Linearizability is just not as important as some people make it out to be.

That's my entire point. Start with strong models and weaken when you decide you need it. Thus rule out entire classes of bugs for 99% of your code and closely scrutinize the 1% with weakened guarantees to know that it works. That is kind of the entire premise behind Rust. Make it extremely visible where the dragons may exist, because you have the key to unlock that door.

Regarding memory safety, 70% of CVEs at Microsoft and in Chrome are memory safety bugs. I would say the choice of languages without memory safety is more based on legacy than some kind of insightful choice.

https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...

https://security.googleblog.com/2022/03/whats-up-with-in-wil...

WJW · on Aug 4, 2022

> I would say the choice of languages without memory safety is more based on legacy than some kind of insightful choice.

You say this as if legacy choices are not some of the most persistent choices that exist in technology. I don't disagree that 70% of CVEs is memory safety related, but still my previous point stands that none of the popular kernels or databases are written in Rust.

A new database developer with a bright idea is basically confronted with the choice to either:

1. Write it in C and get it merged into Postgres relatively quickly. (Within the year or so)

2. Write it in Rust and also have to implement the rest of a RDBMS, then have to find a user base.

When confronted with that kind of choice, most devs would choose option 1 every time and I can't blame them for that even though I love memory safety.

naasking · on Aug 6, 2022

They'd probably be better off writing a Rust-to-C converter than writing another DB engine.

liuliu · on Aug 4, 2022

Read this post a few days ago. I am trying to articulate my answer properly, mostly from a client-side dev's perspective. Here is my half-baked thoughts:

Strict-serializability pretty much guarantees that you don't need any extra business logic to resolve logical states you know you won't, but turns out end up with. If your changes go out in-order over the wire, you know it will be go in with that order on the other end, either that end is temporal or spatial away.

However, strict-serializability doesn't solve either business logic errors or collaboration issues. If you have wrong transaction wrapping (things should happen in one transaction got split), or you have race condition locally, SS doesn't help on that. SS doesn't solve collaboration issues because the real-time ordering means both of you will end up in the same state, but if naively implemented, that same state may not be what you would expect (you still need to solve conflicts, and how you solve the conflict has to match the user expectation).

Then it goes back to what I originally said, for client-side devs, what they are looking for is to not write any extra business logic to handle states they know they won't end up with when their requests to the server went out in order. For many of these business logics, SS is too strong. A combination of serializability and linearizabily, as the author suggested, could be sufficient. It is just hard to articulate what combination that will be for broad spectrum of business applications.

truncate · on Aug 4, 2022

I feel is kind of similar to static vs dynamic type system argument. Dynamic Type systems can work great if you have strong contracts and everyone understands how to define/maintain that, but as you grow up the team, you'd have people having gaps in understanding those contracts and break them eventually leading to bugs. Enforcing rules sometimes just keeps the rules alive in everyone's head as it is harder to get around that usually.

jwr · on Aug 4, 2022

I've spent enough time dealing with problems caused by lack of strict serializability. Theoretical discussions are great, but eventually your data model isn't just a bunch of separate key-value pairs. The usual approach is to avoid problems by redesigning your application and doing gymnastics with your data. But what's the point if we have FoundationDB (which I'm in the process of migrating to)?

adamnemecek · on Aug 4, 2022

What's crazy to me is that the idea of linearizability is essentially the same as the idea of fixed-points or dimensionality reduction.

I have been collecting some links on this topic

https://github.com/adamnemecek/adjoint

kevincox · on Aug 4, 2022

The TL;DR is that every anomaly that your database can experience is one thing that every developer needs to consider for every change they make. This is a huge overhead. Most importantly every developer won't correctly consider these anomalies for every change because humans aren't perfect. This means that you will encounter bugs from time to time.

So that is the purpose. If you solve this at the database layer you relieve your developers of this burden and reduce bugs. You can argue that the cost is too high (it may be) but there is clearly a benefit to balance it out. Especially if the cost keeps dropping (as it seems to be doing) the balance tips every in favour of stricter consistency models.

deathanatos · on Aug 4, 2022

I'm just sitting here wanting "read-your-writes" from systems.

richieartoul · on Aug 4, 2022

Funny enough, this actually cuts to the core of why strict serializability is so important.

“Technically speaking, every read-only transaction could return the empty-set (which was the initial state of the database) and not be in violation of the serializability guarantee --- this is legal since the system can bring read-only transactions “back in time” and get a slot in the serial order before any transactions that write to the database.”

https://fauna.com/blog/serializability-vs-strict-serializabi...

someon3 · on Aug 4, 2022

So you can serialize stuff ?