Requiring consistency in a distributed system generally leads to designs that reduce availability.
Which is one of the reasons that bank transactions generally do not rely on transactional updates against your bank account. "Low level" operations as part of settlement may use transactions, but the bank system is "designed" (more like it has grown by accretion) to function almost entirely by settlement and reconciliation rather than by holding onto any notion of consistency.
The real world rarely involves having a consistent view of anything. We often design software with consistency guarantees that are pointless because the guarantees can only hold until the data has been output, and are often obsolete before the user has even seen it.
That's not to say that there are no places where consistency matters, but often it matters because of thoughtless designs elsewhere that end up demanding unnecessary locks, killing throughput, failing if connectivity to some canonical data store happens to be unavailable, etc.
The places where we can't design systems to function without consistency guarantees are few and far between.
Also, the banking system does have the system of compensating transactions (e.g. returned checks) when constraints are violated, as well as an extensive system of auditing to maintain consistency.
Do most distributed systems have the ability to detect inconsistency? How do they manage to correct problems when found? Is it mostly manual? I'm genuinely interested in hearing what people have done.
I've certainly experienced inconsistencies which were in very low-stakes online situations that were still annoying. For example, a reddit comment that was posted on a thread, but never shows up on my user page. Or when I stopped watching a movie on Netflix on my PC to go watch something else with my wife on our TV, but Netflix won't let us because it thinks I'm still watching on the PC.
Come to think of it, there was also something that used to happen frequently with Amazon video, though I haven't seen it happen in a while. When I would resume a video, it gave me two choices of times to resume, one based on the device and one based on the server. The odd thing was that I had only ever used one device to watch the video but it was always the server-stored time that was right. Just a bug, I guess, but that does indicate the kind of inconsistencies that have to be resolved somehow.
They do if you build them to have them. Your bank account works without requiring consistency because every operation is commutative: You never do anything that says "set account to value x". You have operations like "deposit x" and "withdraw x".
For operations that require you to move money from one account to another, you need to guarantee that both steps or neither get durably recorded, and you may (or may not, depending on your requirements) insist on local ordering, but even there global consistency is unnecessary.
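A minimal sketch of what "commutative" buys you here (all names are illustrative): because deposits and withdrawals are deltas rather than "set balance to x", applying the same set of operations in any order converges to the same balance.

```python
from itertools import permutations

def apply_ops(start, ops):
    """Apply a list of signed deltas (deposit +x, withdrawal -x)."""
    balance = start
    for op in ops:
        balance += op
    return balance

ops = [+100, -50, +25]
# Every ordering of the same operations yields the same final balance.
results = {apply_ops(0, p) for p in permutations(ops)}
print(results)  # {75}
```

Contrast with "set balance to x": two replicas applying `set 100` and `set 50` in different orders end up with different balances, which is exactly why banks avoid expressing operations that way.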
> How do they manage to correct problems when found? Is it mostly manual? I'm genuinely interested in hearing what people have done.
You design systems to make them self-correcting: log events durably first, and then act on the events. If possible, make operations idempotent. If not, either make operations fully commutative, or group them into larger commutative operations and use local transactions (which require only local consistency) to ensure they are all applied at once. The point is not to do away with atomicity, but to localise any guarantees as much as possible, because the more local they are, the less they affect throughput.
On failure, your system needs to be able to determine which operations have been applied, and which have not, and resume processing of operations that have not.
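A toy sketch of the log-then-act pattern described above (all names are made up): events are appended durably before any side effect runs, and recovery simply replays whatever has not been marked applied. Because the handler is idempotent with respect to event ids, replaying is harmless.

```python
import json
import os
import tempfile

def append_event(log_path, event):
    """Durably append an event before acting on it."""
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()
        os.fsync(f.fileno())  # event survives a crash from here on

def recover(log_path, applied_ids, handler):
    """Replay the log, applying only events not yet marked applied."""
    with open(log_path) as f:
        for line in f:
            event = json.loads(line)
            if event["id"] not in applied_ids:
                handler(event)
                applied_ids.add(event["id"])

balances = {}

def handler(event):
    balances[event["account"]] = balances.get(event["account"], 0) + event["delta"]

log = os.path.join(tempfile.mkdtemp(), "events.log")
append_event(log, {"id": 1, "account": "a", "delta": 100})
append_event(log, {"id": 2, "account": "a", "delta": -50})

applied = set()
recover(log, applied, handler)
recover(log, applied, handler)  # re-running after a crash is a no-op
print(balances)  # {'a': 50}
```

A real system would also persist the applied-ids set (or derive it from the downstream store), but the shape is the same: determine what has been applied, resume what has not.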
> I've certainly experienced inconsistencies which were in very low-stakes online situations that were still annoying. For example, a reddit comment that was posted on a thread, but never shows up on my user page. Or when I stopped watching a movie on Netflix on my PC to go watch something else with my wife on our TV, but Netflix won't let us because it thinks I'm still watching on the PC.
This type of thing is often down to copping out on guarantees when a system includes multiple data stores, and is often fixed by applying a local transactional event log used to trigger jobs that are guaranteed to get processed. E.g. consider the Reddit example - if you want to be able to effortlessly shard, you might denormalize and store data about the comment both tied to a thread, and tied to a user. If these are in two different databases, you need some form of distributed transactional system.
A simple way of achieving that is to write a transaction log locally on each system (as long as operations are designed not to need coordination) that ensures both "attach post to thread" and "attach post to user page" will get logged successfully before giving a success response to the end user. You can do that by using a commit marker in your log. Then you "just" need a job processor that marks jobs done when all operations in a transaction have completed (and that can handle re-running the same ones).
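An illustrative sketch of the commit-marker pattern (names and structures are made up): every operation in a transaction is appended to a local log, followed by a COMMIT record; success is reported to the user only after the marker is written. The job processor applies only committed operations, and re-running it is safe because each apply is idempotent (a set insert).

```python
log = []  # stand-in for a durable local transaction log

thread_posts = set()  # stand-in for the thread-sharded database
user_posts = set()    # stand-in for the user-sharded database

def log_transaction(txn_id, operations):
    """Log every operation, then the commit marker. Only after the
    marker is durably written do we report success to the user."""
    for target, item in operations:
        log.append(("OP", txn_id, target, item))
    log.append(("COMMIT", txn_id, None, None))

def process_jobs():
    """Apply all operations belonging to committed transactions.
    Idempotent, so crash-and-retry is safe."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for kind, txn_id, target, item in log:
        if kind == "OP" and txn_id in committed:
            target.add(item)  # re-applying changes nothing

log_transaction("t1", [(thread_posts, ("thread9", "post42")),
                       (user_posts, ("user7", "post42"))])
process_jobs()
process_jobs()  # safe to re-run after a partial failure
print(thread_posts, user_posts)
```

An uncommitted transaction (ops logged but no COMMIT record, e.g. a crash mid-request) is simply never applied, which is the atomicity you wanted, achieved with only local guarantees.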
But this is something people often fail to do, because it's easier to just write out the code for each step sequentially and deal with the rare instances where a server dies in the middle of processing a request.
So the basic pattern for ditching global consistency is to write transaction logs locally, representing each set of operations you need to apply, and put in place a mechanism to ensure each one is applied no matter what. Then use that "everywhere" including in the many places where you should've had guarantees like that, but don't because developers have been lazy.
You can't do this in general, because we often want to build applications where the user expects their actions won't be reordered. People often forget this: the user is sort of the unstated extra party in a distributed system. A canonical example is privacy settings on a social network. If I want to hide my profile from a user by unfriending them and then post something unkind about them, it doesn't really work for me that the application is free to reorder my actions.
At some point your desire to have HA via a weakly consistent system conflicts with intuition and usability of an application.
The key being they don't expect their actions to be reordered. In most systems this is trivially handled by logging event time as seen from the user's system and ensuring the user sees a consistent view of their own operations.
There are exceptions, naturally, but the point is not that you should be able to do this "in general", but that expecting global consistency is a relatively rare exception.
If I post a comment here, I expect my comment to be visible when I return to the page. But there is no reason for me to notice if there is a strict global ordering to when posted comments first appears in the comment page. If someone presses submit on their post 100ms before me, but the comment first appears to me the next time I reload, while it appeared the opposite way for the other poster, it is extremely unlikely to matter.
> A canonical example is privacy settings on a social network. If I want to hide my profile from a user by unfriending them and then post something unkind about them, it doesn't really work for me that the application is free to reorder my actions.
This is a design issue. Your privacy settings are a local concern to your account; they most certainly do not need to be globally consistent. All you need to do to avoid problems with that is to ensure consistency in routing requests for a given session, and enforce local consistency of ordering within a given user account or session depending on requirements.
As long as you do that, it makes no difference to you or anyone else whether the setting and the post were replicated globally in the same order or not.
I think you're misunderstanding what the problem is here. I want global consistency (really I want two party consistency) to prevent the said user from witnessing the unkind thing I posted about them. It's necessary that my own actions appear in their original order to me, but also to anyone else who is plausibly affected by their reordering.
The full example also involves posting on a mutual friend's wall and having that post hidden by unfriending. You could route all requests to fetch posts to the owning poster's pinned set of servers, but this would be extremely expensive. Sharding solutions also weaken availability guarantees, because you necessarily shrink the set of nodes able to service a request.
You don't want global consistency and you're saying it yourself.
What you want is to be presented with a view that is consistent with the actions you've taken. (That's really basics 101 for user experience.)
The internals of the system don't matter to you. They only matter to us, as the builders. We can make the system internally 'inconsistent' yet have it always present a 'consistent' view to the user.
I don't think the word consistent describes the situation well. Nothing is ever strictly, perfectly consistent.
Sequential consistency is basically the most lax model you can get away with in this case (and have it generalize to other similar operations). This is a very strong consistency model, and weakening linearizability to it provides basically no advantage for availability. There is a proof that anything stronger than causal consistency cannot provide "total availability", which I'd argue is somewhat mythical anyway (except when you're colocated with where the data is being stored).
My point here is that this is clearly a case where many of the strongest ordering guarantees are required to get the desired behavior, and subsequently availability must be sacrificed.
> consistency... consistency... consistency
What if I am not aiming for consistency in the first place?
We both understand that "total availability" is mythical. So why not target partial availability in the first place?
No, it's necessary that the effects appear in the right order, and in the example case what matters is that the privacy setting is correctly applied to the subsequent post and stored with it.
> The full example also involves posting on a mutual friend's wall and having that post hidden by unfriending
Even if that is a requirement - and it's not clear it would be a good user experience, as it would break conversations (e.g. any reply; and what do you do about quoted parts of a message?) and withdraw conversation elements that have already been put out there - it does not generally require global consistency. It does require prompt updates.
It sounds like you're saying that the friend-state for a given post is part of the permissions for the post, and therefore part of the post itself. In which case, when a user posts, I'd include the version number of their friend-state in the post's permissions. Then on view attempts, I make sure that the friend-state I'm working with is at least as new as the friend-state referred to by the post.
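A sketch of the scheme as described (names are made up): each post records the friend-state version in force when it was created, and a viewer only evaluates the post's permissions against a friend-state at least that new; otherwise it must fetch a fresher one.

```python
def can_evaluate(viewer_friend_state_version, post):
    """Refuse to decide visibility from a friend-state older than the
    one the post was created under; the caller should fetch a fresher
    friend-state and retry."""
    return viewer_friend_state_version >= post["friend_state_version"]

post = {"author": "alice", "friend_state_version": 3}
print(can_evaluate(2, post))  # False: stale friend-state, must refresh
print(can_evaluate(3, post))  # True: fresh enough to evaluate permissions
```

Note this only gates *when* a permission check may run; as the reply below points out, "at least as new" is not the same as "current", so it does not by itself cover friend-state changes made after the post.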
Your friend-state solution only protects the visibility of the post if the friend-state between the two users has not changed more recently than both the post and your version.
Friend state 0 is we're friends.
Friend state 1 is we're not friends.
Friend state 2 is we're friends.
Friend state 3 is we're not friends.
I have friend state 2, and the post is at state 1. I accept my version incorrectly and am able to view the post, despite the fact that we're no longer friends.
Pretty sure the subtext is that developers are making their lives harder than they need to by not understanding the choice they are making.
aka snapshot consistency.
(Thoroughly enjoyed your "the user is sort of the unstated extra party in a distributed system", btw.)
balance = 0
balance += 100
balance -= 50
You can't swap the order of addition and subtraction, because that would mean withdrawing from an empty account, and afaik you can't do that. Is there something I'm missing?
It is entirely up to the bank whether or not an operation will be allowed to complete, but regardless of the bank's response, the end result of the operations will be correct. It may, however, be different (it will be different if the bank rejects an operation in one order but accepts all of them when presented in a different order).
You are right that they are therefore technically not strictly commutative, in the sense that the outcome may differ in edge cases.
But the point is that what will not happen, is that you will get incorrect results. E.g. money won't "disappear" or "appear" spontaneously.
Most commonly things like this are done by somehow reifying the operation as data and attaching some information to determine ordering. One way would be to attach a timestamp to bank transactions. By itself, that would be enough to ensure bank transactions commute (but without bounds on clock synchronization, wouldn't guarantee that you see those operations in the order you issue them). You could also require a single consistent read to an account to get a state version number and attach an incremented version of that in your transaction. Ties in both cases could be resolved by account id of the person initiating the transaction.
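A sketch of the reify-and-order idea above (field names are illustrative): each operation is data carrying a (timestamp, actor) key, and every replica sorts by that key before applying. Any arrival order then converges to the same state, with ties broken deterministically by the initiating account id.

```python
def apply_in_canonical_order(ops):
    """Apply reified operations in (timestamp, actor) order so that
    every replica converges regardless of arrival order."""
    state = {}
    for op in sorted(ops, key=lambda o: (o["ts"], o["actor"])):
        state[op["account"]] = state.get(op["account"], 0) + op["delta"]
    return state

ops = [
    {"ts": 2, "actor": "b", "account": "acct1", "delta": -50},
    {"ts": 1, "actor": "a", "account": "acct1", "delta": 100},
    {"ts": 2, "actor": "a", "account": "acct1", "delta": 25},  # ts tie, broken by actor
]
print(apply_in_canonical_order(ops))        # {'acct1': 75}
print(apply_in_canonical_order(ops[::-1]))  # same state from reversed arrival order
```

As the comment notes, without bounds on clock skew this guarantees convergence, not that the canonical order matches the order the user actually issued operations in.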
> Log events durably first, and then act on the events.
Nothing has changed my view of how to build systems more than the log-then-act paradigm. So many problems go away when I start from that perspective.
I think it took me so long to see it because so many common tools work against it. But those tools were built in an era when most systems had "the database", a single repository of everything. That paradigm is very slowly dying, and I look forward to seeing what tools we get when the distributed-systems paradigm becomes dominant.
For example: addition is commutative, because 1+2 = 2+1
Subtraction, however, isn't.
Since a transfer moves money from one account to another, it couldn't possibly be commutative in a mathematical sense.
I suspect there might be a better term for the property you are trying to name.
"Atomic" seems adequate to me...
In math (say abstract algebra / group theory), the operations on your database form a group under composition. Given two operations f,g (say f is "subtract $10" and g is "subtract $20"), the composition of those operations could be written "f•g" (depending on notation this can mean g is done first or f is done first, typically the former convention is used).
(So, instead of numbers and plus (+) in your example, we're using functions and composition (•) as our group.)
This group, of operations under composition, is said to be commutative if f•g = g•f for all f and g. (That is for all combinations of your possible operations.)
For addition/subtraction operations you're good to go.
If you consider operations on a hash table to be things like 'set the value for key "a" to 10', the order in which the operations are executed does matter, and the group of those operations, under composition, is not commutative.
The subject of CRDTs in computer science is an attempt to create data types whose operations are commutative.
One reason to want to deal with commutative operations is that you don't have to figure out the order in which the operations officially occurred.
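A toy demonstration of the point, treating operations as functions composed with • (here, a `compose` helper applying the left function first): delta operations commute under composition, while "set key to value" operations on a hash table do not.

```python
# f•g composed so that f runs first, then g
compose = lambda f, g: lambda x: g(f(x))

# Delta operations on a number: commutative under composition.
add_10 = lambda n: n + 10
sub_20 = lambda n: n - 20
print(compose(add_10, sub_20)(100))  # 90
print(compose(sub_20, add_10)(100))  # 90: f•g == g•f

# "Set" operations on a hash table: NOT commutative.
set_a_1 = lambda d: {**d, "a": 1}
set_a_2 = lambda d: {**d, "a": 2}
print(compose(set_a_1, set_a_2)({}))  # {'a': 2}
print(compose(set_a_2, set_a_1)({}))  # {'a': 1}: order changes the result
```

This is why a last-writer-wins register needs extra ordering metadata to behave like a CRDT, while a counter of deltas needs none.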
When talking about the operations being commutative, we're generally talking about each individual account, though it may apply wider as well.
In any case, the difference is that we are not talking about the elements of a given operation. You are right that if you see the operations as being "balance = balance + x" or "balance = balance - y", then the subtraction in the latter is not commutative.
But the operations seen as indivisible units are commutative.
Consider the operations instead as a list, such as e.g. "add(x); sub(y)" (in reality, of course, we'd record each transaction into a ledger, and store a lot more information about each).
See that list as an expression with ";" as the operator. This is commutative - the operands to ";" can be reordered at will. That ability to reorder individual operations is what we're talking about in this context, not the details of each operation.
One big reason for that is that the settlement and reconciliation process is easier to manage when the parties involved can at least be confident that their own view of the situation is internally consistent. Gremlin-hunting is a lot less expensive if you can minimize the number of places for gremlins to hide.
1. Bank transactions are easy to express as a join-semilattice, which means it's intrinsically easy to build a convergent system of bank accounts in a distributed environment. Many problems are not easily expressible this way, and usually require trading off an unacceptable amount of space to formulate as a join-semilattice that still works with complicated types of transactions.
2. A bank transaction has a small upper bound on the types of inconsistency that can occur: at worst you can make a couple overdrafts.
3. A bank transaction has a trivial method for detecting and handling an inconsistency. Detection is easy because it's just a check that the transaction you're applying doesn't leave the account with a negative balance. Handling is easy, because the bank can just levy an overdraft charge on an account holder (and since they have the ability to collect debts on overdrawn accounts, this easily works for them). Complicated transactions often don't have a trivial method of detecting inconsistency, and handling inconsistency is rarely as simple as being able to punish the user. In fact corrective action (and likewise, inconsistency) is often non-intuitive to users because we expect causality in our interaction with the world.
> The real world rarely involves having a consistent view of anything
I strongly disagree with this. Most of our interaction with the world is perceived consistently. When I touch an object, the sensation of the object and the force I exert upon the object is immediate and clearly corresponds to my actions. There is no perception of reordering my intentions in the real world.
We expect most applications to behave the same way, because the distributed nature of them is often hidden from us. They present themselves as a singular facade, and thus we typically expect our actions to be reflected immediately and in the order we issue them.
Furthermore, this availability vs consistency dichotomy is an overstatement of Brewer's theorem. For instance, linearizable algorithms like Raft and Multi-Paxos DO provide high availability (they'll remain available so long as a majority of nodes in the system operate correctly). In fact, Google's most recent database was developed to be highly available and strongly consistent.
That may be true for a given set of transactions in a very abstract way, but that has some serious practical problems. First, like it or not, banks have a history of making transactions non-associative and non-commutative; there are many reported cases from the wild of banks deliberately ordering transactions in a hostile manner so that the user's account briefly drops below the fee threshold, even though there is also an ordering of those transactions in which that doesn't happen. So as a developer, telling your bank management that you are going to use a join-semilattice (suitably translated into their terms) is also making a policy decision, a mathematically strong one, about how to handle transactions, and they are liable to disagree with it.
There is also the problem of variable time of transmission of transactions meaning that you may still need to have some slop in the policies, in a way that is not going to be friendly to the lattice.
The nice thing about join semilattices and the similar mathematical structures is that you get some really nice properties out of them. Their major downside, from a developer point of view, is that typically those properties are very, very, very fragile; if you allow even a slight deviation into the system the whole thing falls apart. (Intuitively, this is because those slight deviations can typically be "spread" into the rest of the system; once you have one non-commutative operation in your join-semilattice it is often possible to use that operation in other previously commutative operations to produce non-commutative operations... it "poisons" the entire structure.)
I think you're badly underestimating the difficulty of the requirements for anything like a real bank system.
Whether or not it's actually implemented with something like CRDTs in the real world isn't really my point; my point is that it's a cherry-picked example of a system that clearly does reorder operations and has weaker-than-usual concerns about witnessing inconsistency.
Never tried to type something by thinking only of the word (rather than specifically the sequence of key strokes) only to have your fingers execute the presses in slightly the wrong order -- giving you something like "teh" instead of "the"? I have, a lot actually, and if I'm in a hurry or distracted, it can even pass by unnoticed, leaving me with the impression I typed the correct word while having actually executed the wrong sequence. (Hurry or distracted I think would be the most analogous to systems which don't take the time to register aggregate feedback and verify correct execution after the fact.)
Humans have at least one experience of trying to execute a distributed sequence of actions and having the output be mangled by the actual ordering of events, so common they invented a word for it -- the "typo".
It's insanely easy to explain to people that sometimes really complicated systems make a typo when they're keeping their records, because it's such a shared, common experience.
To be clear: there's an economic tradeoff to make here that isn't identical at all scales. At a certain point, lost revenue due to user dropoff from unavailability is more expensive than users who stop using a site due to a frustrating experience (and sometimes that scale itself is the compelling feature able to stave off a user from giving up on your app completely). But in most cases, I think it's advisable to start with stronger consistency semantics and only weaken them when a need presents itself. Hell, Spanner is the foundation for Google's ad business; if that's not living proof that you can preserve strong consistency semantics and keep your system sufficiently available, I'm not sure what is.
The typo model is merely meant to explain why it happened. You'd be no more happy if you were locked out of a door because a locksmith fat-fingered the code on setting a pin. Not all typos are harmless, and that wasn't the point of my analogy.
Helping people understand why these technology mistakes happen -- through intuitive models like typos -- helps us discuss when mistakes are and aren't acceptable, and how we're going to address the cases where they're not. It simply gives me a way to discuss it with non-technical people: under what situations would a person doing the computer's job making a typo be catastrophic instead of merely annoying?
Most managers or clients can understand that question, at an intuitive level.
> But in most cases, I think it's advisable to start with stronger consistency semantics and only weaken them when a need presents itself.
Completely agree! Unless you have a specific reason it's necessary for you, you likely don't have to actually address the inconsistency directly and are better off letting the database handle that for you.
I wanted to communicate through a fairly succinct and understandable example that there are potentially high stakes, and I wanted to show a 'NoSQL' example rather than a micro-services one. It was the first one that came to mind.
Almost every transaction is of the form: 1. estimate authorization; 2. present request for settlement; 3. settle during batch processing; 4. any additional resolution steps (eg, returned checks)
The estimator used in 1 need not be fully consistent because a) as long as the periods of inconsistency are short (eg, < 10 sec) it won't affect the usage patterns of most accounts, and b) the bank only needs to quantify the risk to their system (and keep it below the cost of improving the system), rather than make it risk-free. Risk management techniques like preventing rapid, high-value transactions on top of a basic estimated value like "available balance" are fine for the estimation used in the authorization phase.
The actual settlement is run as a special batch process during hours when the bank can't receive new transactions, and thus is a special case where it's really easy to implement consistency of the outcome. It's acceptable if the settlement leaves some number of accounts negative or requiring additional steps (such as denying the transaction to the presenting bank), as long as the number isn't too high, i.e. the estimator in step 1 is right most of the time.
Basically, the only requirements the banks have are that a) they can estimate how inconsistent or wrong their system is until the batch settlement where they resolve the state, and b) the number of accounts which will be in error when settled remains low. Then they just hire people to sort out some of the mess and eat some amount of losses, as that's easier than trying to make the system perfect.
The fact is that banks make better use of what databases are and aren't than most designs, because they're very clear about when and where they want consistency (weekdays, 6am Eastern in the US) and about the (non-zero!) rate at which the system can be in error when it resolves the state. Any application that can be clear about those can get good usage out of databases.
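The steps above can be sketched as a toy authorize-then-settle pipeline (all names and thresholds are illustrative): a possibly-stale "available balance" estimate gates authorization, and a closed-world batch run computes the true outcome, flagging the hopefully few accounts that ended up negative for compensating action.

```python
def authorize(estimated_available, amount):
    """Step 1: gate on a possibly-stale estimate. The bank only needs
    the risk from staleness to be bounded, not zero."""
    return estimated_available >= amount

def settle(opening_balance, settled_txns):
    """Step 3: batch settlement while no new transactions arrive, so
    computing the true balance is trivially consistent. Accounts that
    end up negative are flagged for step 4 (compensating actions)."""
    balance = opening_balance + sum(settled_txns)
    needs_followup = balance < 0
    return balance, needs_followup

print(authorize(120, 100))      # True: passes the estimate
print(settle(90, [-100, -40]))  # (-50, True): overdraft caught at settlement
```

The interesting property is that correctness lives entirely in the settlement step; the authorization step is pure risk management and can be arbitrarily stale as long as the error rate stays affordable.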
A lot of transactions that need to be reconciled happen without involving 1, e.g. checks and offline card transactions. Even ATMs can often continue to dispense cash when their network is offline, though arguably 1 applies there in the form of maximum withdrawal limits.
But in general I agree with you - their system is built around having accepted, calculated and accounted for the risks rather than trying to mitigate every possible risk. Which makes sense, because a large proportion of a bank's business is to understand and manage risk, and some risks are simply cheaper to accept than to mitigate via technology or business processes (and the ones that aren't, or aren't any more, are where you'll see banks push new technology, such as making more settlements happen faster and fully electronically).
You misunderstand me, and I appreciate that I should have been clearer: a single observer in a single location that is both the sole observer and creator of events will have a consistent view. But this isn't a very interesting scenario - the type of interactions that matter to us are operations that require synchronisation between at least two different entities separated in space.
Consider if you have two identical objects, on opposite sides of the world, that represents the same thing to you and another person. When you move one, the other should move identically.
Suddenly, consistency requires synchronisation limited by the time it takes to propagate signals between the two locations, to ensure only one side is allowed to move at a time.
This is the cost of trying to enforce consistency. If you lose your network, suddenly both sides, or at least one (if you have a consistent way of determining who is in charge), are unable to modify, and possibly even to read, the state of the system, depending on how strict a consistency model you want to enforce. In this specific case, it might be necessary.
But most cases rather involves multiple parties doing mostly their own things, but occasionally needing to read or write information that will affect other parties.
E.g. consider a forum system. It requires only a bare minimum amount of consistency. You don't want it to get badly out of sync, so that e.g. there's suddenly a flood of delayed comments arriving, but a bit of delay works just fine. We have plenty of evidence in the form of mailing lists and Usenet, which both can run just fine over store-and-forward networks.
Yet modern forum systems are often designed with the expectation of using database transactions to keep a globally consistent view. For what? If the forum is busy, the view will regularly already be outdated by the time the page has updated on the user's screen.
> Furthermore, this availability vs consistency dichotomy is an overstatement of Brewer's theorem. For instance, linearizable algorithms like Raft and Multi-Paxos DO provide high availability (they'll remain available so long as a majority of nodes in the system operate correctly). In fact, Google's most recent database was developed to be highly available and strongly consistent .
They'll remain available to the portion of nodes that can reach a majority partition. In the case of a network partition that is not at all guaranteed, and often not likely. These solutions are fantastic when you do need consistency, in that they will ensure you retain as much availability as possible, but they do not at all guarantee full availability.
Spanner is interesting in this context because it does some great optimization of throughput by relying on minimising time drift: if you know your clocks are precise enough, you can reduce the synchronisation needed, because operations whose timestamps are further apart than your maximum clock drift can be ordered globally without coordination, which helps a lot. But it still requires communication.
In instances where you can safely forgo consistency, or forgo consistency within certain parameters, you can ensure full read/write availability globally (even in minority partitions) even in the face of total disintegration of your entire network. But more importantly, it entirely removes the need to limit throughput based on replication and synchronisation speeds. This is the reason to consider carefully when you actually need to enforce consistency.
My point in bringing that up is that applications don't present the appearance of entities separated in space; that's why weak consistency is unintuitive. We expect local realism from nearby objects, the same as we expect an application to exhibit the effects of our immediate actions.
> E.g. consider a forum system. It requires only a bare minimum amount of consistency. You don't want it to get badly out of sync, so that e.g. there's suddenly a flood of delayed comments arriving, but a bit of delay works just fine. We have plenty of evidence in the form of mailing lists and Usenet, which both can run just fine over store-and-forward networks.
Most threaded or log-like things are examples that fare well with weak consistency (because they're semi-lattices with lax concerns about witnessing inconsistency). I'm not saying that there aren't examples where it's a tradeoff worth making; I'm saying that many applications are built on non-trivial business logic that coordinates several kinds of updates where it is tricky to handle reordering.
> They'll remain available to the portion of nodes that can reach a majority partition. In the case of a network partition that is not at all guaranteed, and often not likely. These solutions are fantastic when you do need consistency, in that they will ensure you retain as much availability as possible, but they do not at all guarantee full availability.
Weakly consistent databases can't guarantee full availability either when a client is partitioned from them. In fact there's no way to guarantee full availability of internet services to an arbitrary client. I think the importance of network partitions is somewhat overstated by HA advocates anyways. They're convenient in distributed systems literature because they can stand in for other types of failures, but link failures (and other networking failures, though link hardware failures are disproportionately responsible for networking outages) happen an order of magnitude less often than disk failures.
> In instances where you can safely forgo consistency, or forgo consistency within certain parameters, you can ensure full read/write availability globally (even in minority partitions) even in the face of total disintegration of your entire network. But more importantly, it entirely removes the need to limit throughput based on replication and synchronisation speeds. This is the reason to consider carefully when you actually need to enforce consistency.
I'm all for giving careful consideration to your model. And I agree, there are several cases where weakening consistency makes sense, but I think they are fairly infrequent. Weakening consistency semantics per operation to the fullest extent can add undue complexity to your application by requiring some hydra combination of data stores, or come at a large cost by requiring you to roll your own solution to recover some properties like atomic transactions. You called this "laziness" earlier, but I think this is actually pragmatism.
It's a slimy abuse of language to call a service "available" when it's giving incorrect output and corrupting its persistent state.
I can write a program with lots of nines that is blazingly faster than its competitors if its behavior is allowed to be incorrect.
E.g. a search result from a search engine is outdated the moment it has been generated - new pages come into existence all the time, and besides, the index is unlikely to be up to date.
What matters is whether "close enough" is sufficient, and a large proportion of the time it is. E.g. for the search engine example, I don't care if I get every single page that matched my query right at the moment I pressed submit. I care that I find something that meets my expectations.
Getting "good enough" results and getting them at a reasonable time beats not getting results or waiting ages because the system is insisting on maintaining global consistency.
Yes, there are cases where "good enough" means you need global consistency, but they are far fewer than people tend to think.
Congratulations! You just understood distributed systems, or, put more simply, the real world of app development.
Most of the programs in the real world ARE allowed to be incorrect at times.
Corollary: Most of the programs in the real world do NOT need to be correct every single time.
These statements should be at the top of system design and requirements-gathering classes.
That said, if the problem has been thought about, I'm happy. My frustration is with people not understanding the trade-offs.
Enforcing a globally consistent view basically requires coordination. Coordination requires that a signal traverse a distance and be acted on, and the serialised decision point can run no faster than the instruction throughput of a single CPU core.
When the sources of operations are geographically spread out, you will be limited based on the fastest communication possible between those locations in addition to the processing latency.
Ultimately this means that there are scaling limits to the throughput of a fully serialised system, and those limits cause real problems all the time.
Sometimes a single global truth is still fast enough. Very often it isn't, when e.g. your users are spread out globally. That's why it's important to consider avoiding consistency when it is not necessary.
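A rough back-of-the-envelope sketch of that throughput ceiling (the latency numbers below are illustrative assumptions, not measurements):

```python
# Upper bound on throughput of a fully serialised system where every
# operation must wait for one coordination round trip before the next
# can commit: throughput is capped at 1/RTT.

def max_serialised_ops_per_sec(round_trip_ms: float) -> float:
    """Operations per second if each one costs a full round trip."""
    return 1000.0 / round_trip_ms

# Same-datacenter replicas, assuming ~1 ms RTT:
print(max_serialised_ops_per_sec(1.0))    # 1000.0
# Cross-continent replicas, assuming ~150 ms RTT:
print(max_serialised_ops_per_sec(150.0))  # ~6.7
```

Real systems pipeline and batch to do better than this naive bound, but the speed of light still sets the floor on per-operation coordination latency.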
The way I like to think about it (and mind you, I'm an early-stage kind of guy, so I rarely work on huge systems) is that consistency guarantees give you a lot of leverage. While consistency is almost unattainable in the real world, in the computing world we have extremely powerful CPUs and networks that allow communication between any two points on the globe in less than a second. There is frankly a huge number of applications that can be built on an ACID database and never have to worry about the offline or distributed use cases that would require building a proper distributed system.
The inconsistency is an absolute PITA for the consumer. The delays are a terrible, terrible user experience, and the consumer has to spend more money in the end.
It's extremely expensive, both in money the bank has to spend on auditing and in the legal framework that has to be put in place.
The bank example is the perfect argument for consistency, not against.
But the shorter you make the processing time, the less benefit you get from introducing consistency requirements. At some point the thing that is important to address is the remaining cases where processing transactions takes time, not the consistency (at least as long as banks also don't take the piss by exploiting the lax consistency to charge unwarranted overdraft fees).
There is no single "truth".
Under the unfortunate consequences of the CAP theorem and the speed of light, consistency grinds to a halt as the number of observers increases. Time itself would have to stand still in the limit if at each time step every particle in the universe had to agree on the state of the system.
What we need to preserve is mostly strong "causal consistency", which means that from a single observer's perspective, the system "appears consistent".
This article is really good and isn't obscured by NoSQL vs SQL arguments.
Thankfully, the author's premise that "NoSQL == inconsistency" is wrong in at least one case: Google Cloud Datastore offers ACID compliance. I use it to great effect (I run a subscription SaaS with usage quotas).
Picking a pattern, and sticking to it is fundamental to systems flow.
Yes, having "randomness" is important, but cultivating useful chaos out of utter disorder is something that, historically speaking at least, people are not good at collectively.
Chaos functions best when it's on the edge of a system, not embedded in it.
The truly nefarious aspect of NoSQL stores is that the problems that arise from giving up ACID often aren't obvious until your new product is actually in production and failures that you didn't plan for start to appear.
Once you're running a NoSQL system of considerable size, you're going to have a sizable number of engineers who are spending significant amounts of their time thinking about and repairing data integrity problems that arise from even minor failures that are happening every single day. There is really no general fix for this; it's going to be a persistent operational tax that stays with your company as long as the NoSQL store does.
The same isn't true for an ACID database. You may eventually run into scaling bottlenecks (although not nearly as soon as most people think), but transactions are darn close to magic in how much default resilience they give to your system. If an unexpected failure occurs, you can roll back the transaction that you're running in, and in almost 100% of cases this turns out to be a "good enough" solution, leaving your application state sane and data integrity sound.
In the long run, ACID databases pay dividends in allowing an engineering team to stay focused on building new features instead of getting lost in the weeds of never ending daily operational work. NoSQL stores on the other hand are more akin to an unpaid credit card bill, with unpaid interest continuing to compound month after month.
You can have non-ACIDic SQL databases and ACID NoSQL databases.
While it is true that many KV stores do not present a fully consistent view over distributed data, there are some that do.
It is very important that people shed this idea that only SQL databases can have strong consistency, that's for sure.
Why? In practice, people are going to be building things using the same common paths: Postgres, Mongo, MySQL, Redis, etc., and from a purely pragmatic standpoint, the SQL databases that you're likely to use come with strong ACID guarantees while the NoSQL databases don't.
If you can point me to a NoSQL data store that provides high availability, similar atomicity, consistency, and isolation properties provided by a strong isolation level in Postgres/MySQL/Oracle/SQL Server, and has a good track record of production use, I'm all ears.
I use it in production for a game that supports over 100,000 concurrent users and it's been around for a very long time.
It's fully ACID by default but you get the ability to drop elements of ACID in exchange for greater performance in a per table or transaction level. It also supports two different isolation approaches on a per table level. Locking or MVCC (like postgres).
It's a really great NoSQL database from the days before the term NoSQL existed.
Still there are plenty of highly distributed stores to go around, and it's good to know that people are actively using them with great success.
It's very nice to have fine control over consistency vs performance even if you do have to think about it as a developer, but then that's our job :-)
LMDB has taken its place in a lot of projects that used to use Berkeley.
which was inspired by this paper:
which Google made available for all in the form of its Cloud Datastore:
"We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions." 
I haven't used it in a few years, but from what I remember you could have ACID-like operations on grouped entries, because grouping them colocates their physical storage, so an ACID transaction becomes a localized operation.
Consistency is always relative to what the DBMS can enforce, which is in turn always relative to what the DBMS's schema language can express. A KV store can be trivially consistent in that, between writes, it always presents the same view of the data. But it won't be consistent in the sense of presenting a view of the data that's guaranteed not to contradict your business rules. The latter is what users of relational databases expect.
On the other hand, SQL makes it hard to maintain the state definition; I had to develop a mechanism to upgrade the DB from any possible state to the latest state.
Still, this allows me to define very accurately what is a valid row or entry in the database, which means my Applications need to make far fewer assumptions.
I can just take data from the DB and assume with certainty that this data has certain forms, formattings and values.
MongoDB makes none of these guarantees.
A lot of developers don't want to do that, because they want to think of data integrity as this tangential concern that's largely a distraction from their real job. A lot of developers also have a habit of cursing their forebears on a project for creating all sorts of technical debt by cutting corners on data integrity issues, while at the same time habitually cutting corners themselves in the name of ticking items off their to-do list at an incrementally faster pace.
This isn't to say that NoSQL systems make your system inherently less maintainable. But I do think NoSQL projects gained a lot of their initial market traction by appealing to developers' worst instincts with sales pitches that said, "Hey, you don't even have to worry about this!" and deceptively branding schema-on-read as "schemaless". So a reputation for sloppiness might not be deserved (or at least, that's not an issue I want to take a position on), but, in any case, it was very much earned.
Reading your comment reminds me of ESR's point in The Art of Unix Programming that getting the data structures right is more important than getting the code right, and if you have good data structures, writing the code is a lot easier. The database is just a big data structure, the foundation everything else is built on, and the better-designed it is, the easier your code will be.
When I get data from a well validated DB, I don't have to check that it is valid or has certain formats, that is ensured by the DB.
I can remove checks from the entire CRUD stack: if the data is invalid, it does not get created and won't make it into an update.
The easiest way to check data is then to just try the INSERT or UPDATE; if it works, it's valid.
Otherwise I'd need to validate all fields, see if any contain errors, if any are missing, etc.
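A minimal illustration of that "just try the INSERT and let the DB validate" approach, using SQLite and a made-up `users` table (the table and constraints are assumptions for the sake of the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (email LIKE '%@%'),
        age   INTEGER NOT NULL CHECK (age >= 0)
    )
""")

def try_insert(email, age):
    """Let the database do the validation: a failed INSERT means invalid data."""
    try:
        with conn:  # implicit transaction, rolled back on exception
            conn.execute("INSERT INTO users (email, age) VALUES (?, ?)",
                         (email, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("alice@example.com", 30))  # True
print(try_insert("not-an-email", 30))       # False: CHECK constraint failed
```

The application code never re-implements the validation rules; it only has to handle the one failure path.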
And I love it to death for it.
It also allows me to do some nifty little tricks.
I've developed my own Migration Tool that works by exploiting this data integrity, it actually aids me to identify and upgrade the database dynamically without worrying about which updates are applied and which not.
I certainly get the feeling NoSQL just exposed the issue and became synonymous with it.
Still, it's funny how banking seems to be the canonical example for why we need transactions given that most banking transactions are inconsistent (http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on...).
It's a useful example because it shows how there can be serious consequences for getting it wrong.
I think something a lot of people miss is that the universe itself is not strongly consistent. This is Einstein's theory of relativity. It fundamentally takes time for information to travel.
So if strong consistency is not viable, even at a physics level, what can we do instead? That is why our team here at http://gun.js.org/ believes in CRDTs. What are CRDTs? They are data structures that are mathematically proven to produce the same results on retries - so even if the power goes out, or the network fails, you can safely re-attempt the update.
This means you fundamentally don't need "transactions" or any of these jargon words that people often throw out. Sadly CRDTs are becoming another one of those jargon terms, despite how simple they are in reality.
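For the curious, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (a generic illustration, not gun.js's actual implementation): merge takes the per-node maximum, so replicas converge regardless of message ordering or duplication.

```python
class GCounter:
    """Grow-only counter CRDT: each node increments its own slot.
    Merge takes the per-node maximum, so merges are idempotent,
    commutative, and associative, and replicas converge."""
    def __init__(self):
        self.counts = {}

    def increment(self, node_id, amount=1):
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter(), GCounter()
a.increment("node-a", 3)
b.increment("node-b", 2)
a.merge(b)
a.merge(b)  # duplicated delivery is harmless
print(a.value())  # 5
```

Retrying a merge after a network failure changes nothing, which is exactly the safety property being claimed above.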
Which is why Google's hack in Spanner and F1 RDBMS of simply using or waiting out the known time delay was so clever. Also relies on a tool that was a test of Einstein's theory. ;)
"That is why our team here at http://gun.js.org/ believe in CRDTs."
Thanks for the link. I'll look into it. Plus the liberal license that gives it a chance of making a dent in commercial sector or in combo with BSD/Apache projects.
For example, imagine someone doing the same thing year after year diligently. (S)he'd increase his or her skill say 10% a year (have no clue what realistic numbers are). Would that mean that the compound interest effect would occur?
I phrased it really naively, because while the answer is "yes" in those circumstances (1.1 ^ n), I'm overlooking a lot and have no clue what I'm overlooking.
I know it's off-topic, it's what I thought when I read the title and I never thought about it before, so I'm a bit too curious at the moment ;)
Then, there is the rise of microservices to consider. In this case, I also agree with the author that it becomes crucial to understand that the number of states your data model can be in can potentially multiply, since transactional updates are very difficult to do.
But I feel like on the opposite side of the spectrum of sophistication are people working on well-engineered eventually consistent data systems, with techniques like event sourcing, and a strong understanding of the hazards. There's a compelling argument that this more closely models the real world and unlocks scalability potential that is difficult or impossible to match with a fully consistent, ACID-compliant database.
Interestingly, in a recent project, I decided to layer in strict consistency on top of event sourcing underpinnings (Akka Persistence). My project has low write volume, but also no tolerance for the latency of a write conflict resolution strategy. That resulted in a library called Atomic Store.
In these environments you atomically create objects in your application's "local" storage and have a reconciliation loop for creating objects in other services or deleting these orphan "local" objects.
At the service integration level, we can leverage reliable message queue middleware to make sure a task is eventually delivered and handled (or put in a dead-letter queue so we can do a batch cleanup).
Also as a general principle, I would make each of those sub-transactions to be idempotent, so that retrying multiple times won't hurt, and there would be a natural way of picking the winner if there are conflicting ongoing commit / retry attempts.
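One common way to make each sub-transaction idempotent (the names here are made up for illustration) is to tag every operation with a unique ID and record which IDs have been applied, so a retried message is a no-op:

```python
class IdempotentLedger:
    """Applies each operation at most once, keyed by a caller-supplied
    operation ID, so a retried message cannot double-apply."""
    def __init__(self):
        self.balance = 0
        self.applied = set()

    def apply(self, op_id, delta):
        if op_id in self.applied:   # already seen: retry is a no-op
            return self.balance
        self.applied.add(op_id)
        self.balance += delta
        return self.balance

ledger = IdempotentLedger()
ledger.apply("op-1", +100)
ledger.apply("op-1", +100)  # retry after a timeout: ignored
ledger.apply("op-2", -30)
print(ledger.balance)  # 70
```

With this in place, the message queue only has to guarantee at-least-once delivery; the dedup set turns that into effectively-once application.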
Edit: I wasn't familiar with the terminology, but it sounds like what I have written about here: http://kevinmahoney.co.uk/articles/immutable-data/
There are scenarios where a two-phase commit is used to ensure that invariants across aggregates are maintained.
We use an ACID compliant database to store our domain events and we project our events into a relational schema. When projecting events, we use transactions to make sure the database updates from a specific event are either all applied or not at all.
- would you have a single aggregate for "all bank accounts" so that one financial transaction equals one event, or is there a good reason to have multiple aggregates ?
- is command execution transactional, or is it possible for another thread to generate (possibly conflicting) events between the time you check a precondition and the time you write the intended events ?
My team uses an ES implementation with very strong guarantees: one aggregate per macro-service, and command execution is transactional, but I'm fairly certain that this is not the way CQRS/ES is described or recommended.
Command execution is not transactional, but writing to the domain event store is. That means that two writers cannot write the same domain event version number to the event store. Say an aggregate is at version 20. The next action would put it at 21. If two threads generate version 21 of the aggregate, only one of them can write it to the event store and the other gets an exception. Domain events are written in a transaction batch to the event store so we cannot get any writes in the middle. So, the first to write the next version wins. The other one loses.
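That write path amounts to optimistic concurrency control on the aggregate version. A minimal in-memory sketch (the class and method names are assumptions, not the actual implementation described above):

```python
class ConflictError(Exception):
    """Raised when a writer's expected version is stale."""

class EventStore:
    """Append-only store where an append must name the version it
    expects; a concurrent writer that lost the race gets an error."""
    def __init__(self):
        self.streams = {}  # aggregate_id -> list of events

    def append(self, aggregate_id, expected_version, event):
        events = self.streams.setdefault(aggregate_id, [])
        if len(events) != expected_version:
            raise ConflictError(
                f"expected version {expected_version}, stream at {len(events)}")
        events.append(event)
        return len(events)  # the new version

store = EventStore()
store.append("acct-1", 0, "Opened")
store.append("acct-1", 1, "Deposited 10")
try:
    store.append("acct-1", 1, "Deposited 20")  # stale writer loses
except ConflictError as e:
    print("conflict:", e)
```

The loser then reloads the aggregate at its new version and decides whether to retry or reject the command.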
We also partition our command and event processors by aggregate id. The same aggregate cannot have its events or commands processed more than one at a time because the processors are synchronous per partition.
If I understand you correctly then you have one instance of a very large aggregate per bounded context? If so, then no, it is not in the spirit of CQRS/ES. If you are handling commands in transactions, then the designers of your system may have chosen to favor consistency instead of the availability that CQRS provides.
I am not sure I understand why having multiple aggregates would provide availability. Unless the idea is that the events from different aggregates are stored on separate servers, so that a single-machine outage would only take down some aggregates ? If so, I agree that is the case in theory, though with limited practical applicability in our situation.
Since each command produces a new version of the bank and many commands are coming in at the same time, most of them will fail when writing to the event store. Do they keep retrying until they succeed? Either way, this is not efficient and could affect the availability of your servers.
If you have multiple aggregates, the consistency boundaries are smaller and therefore you have a smaller state that you have to maintain consistency across. There are fewer operations on a single aggregate and fewer opportunities for contention arising from simultaneous operations.
Also, if you are using queues for your commands and events (we are, but I suspect you are not), then you can partition your queues such that you can process your workload N ways without worrying about things happening out of order. Each aggregate processes serially within a partition. If you have just one large aggregate then you would have to jump through a lot of hoops to be able to partition the work while maintaining ordering guarantees.
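Partitioning by aggregate ID so each aggregate's commands are processed serially can be as simple as a stable hash (a sketch; the function name is made up):

```python
import hashlib

def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Stable hash so the same aggregate always lands on the same
    partition, preserving per-aggregate ordering while allowing
    N partitions to be processed in parallel."""
    digest = hashlib.sha256(aggregate_id.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# All commands for one aggregate map to the same partition:
p1 = partition_for("account-42", 8)
p2 = partition_for("account-42", 8)
print(p1 == p2)  # True
```

Using a cryptographic hash rather than the language's built-in `hash()` matters here, since the mapping has to be stable across processes and restarts.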
I can really only guess at the details of your implementation, but I am guessing that your design has accounted for all of the above in some way. If there is one thing I've learned, it's that there are many, many ways to implement CQRS. If your software works correctly under load, then I am not sure it would matter if it fits squarely into the definition of CQRS. In fact, you may have something new altogether that presents a better way to achieve the same goals as CQRS without the same cognitive overhead.
Our architecture prevents us from parallelizing the execution of commands within a bounded context. The ability to execute commands on any number of servers is for availability (transparent failover if an instance dies) rather than performance, since the event stream acts as a global lock anyway.
In practice, our system clocks in at a comfortable 1000 commands per second under stress tests; during our peak hours and on our busiest aggregate, we only have to execute one command per second, so we can afford at least a x100 increase before we need to consider changing our architecture (and we're no longer an early-stage start-up, so x100 would mean a lot of revenue).
Similarly, the full state for our largest aggregate (not the command model, but the entire state of all projections) fits snugly in a few hundreds of megabytes, so all instances can keep it in-memory.
Basically you make the transfer of money an event that is then atomically committed to an event log.
The two bank accounts then eventually incorporate that state.
But I agree that often life is easier if you just keep things simpler. If you require strong consistency like with the user/profile don't make that state distributed. If you do make it distributed you need to live with less consistency.
From a more religious perspective, every action a user takes is sacred data that you should not lose, unless you deliberately want to lose it for privacy reasons. You can rely on your software to have no bugs and never confuse your user, but again, that's a game you eventually lose. I would rather keep that action log so I have a place to rebuild from when things are lost. Otherwise your only choice is to reverse engineer your data. Data which, if not technically corrupt, is corrupted from a human standpoint.
And if you want undo you need a log and playback anyway.
To me data consistency at the database level is not a real solution to the problem. It is a good tool, but it only solves a very narrow slice of predictable bugs. It doesn't help at all with the inevitable bugs you can't predict. A log based approach gives you a powerful tool in all kinds of tough situations.
Create the profile with a generated uuid. Once that succeeds, then create the user with the same uuid.
If you build a system that allows orphaned profiles (by just ignoring them) then you avoid the need to deal with potentially missing profiles.
This is essentially implementing MVCC. Write all your data with a new version and then as a final step write to a ledger declaring the new version to be valid. In this case, creating the user is writing to that ledger.
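A sketch of that pattern (the two dicts stand in for two hypothetical separate stores): write the profile keyed by a fresh UUID first, and only then write the user record that points at it. A crash in between leaves only an orphan profile that nothing ever references.

```python
import uuid

profiles = {}  # stand-in for the "local" profile store
users = {}     # stand-in for the ledger that makes data visible

def create_user(name):
    profile_id = str(uuid.uuid4())
    profiles[profile_id] = {"settings": {}}   # step 1: orphan until step 2
    users[name] = {"profile_id": profile_id}  # step 2: the "ledger" write
    return profile_id

pid = create_user("alice")
# A user is only visible once both writes succeeded; an orphaned
# profile (step 1 without step 2) is simply never referenced.
print(users["alice"]["profile_id"] == pid)  # True
```

The ordering is the whole trick: the second write is the commit point, so readers never observe a user without a profile.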
If you can relax the requirements, you can relax the constraints.
I've stopped using bank transfers as an example for Acid transactions, and instead talk about social features:
- if I change a privacy setting in Facebook or remove access to a user, these changes should be atomic and durable
- transactions offer good semantics with which to make these changes. They can be staged in queries, but nothing takes effect until after a commit.
- without transactions, durability is hard to offer. You would essentially need to flush each query to disk, rather than each transaction. Much more expensive.
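A small illustration of that "staged until commit" point, using SQLite (the table and values are made up): changes made inside a transaction vanish entirely on rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acl (user TEXT, resource TEXT)")
conn.execute("INSERT INTO acl VALUES ('bob', 'photos')")
conn.commit()

# Stage a privacy change; nothing is durable until COMMIT.
conn.execute("DELETE FROM acl WHERE user = 'bob'")
conn.execute("INSERT INTO acl VALUES ('bob', 'photos-friends-only')")
conn.rollback()  # a failure mid-change: both staged writes vanish

rows = conn.execute("SELECT resource FROM acl").fetchall()
print(rows)  # [('photos',)]
```

Readers never see the half-applied state where the old permission is gone but the new one isn't in place yet.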
account[a] = -8,
account[b] = 8
You are using mysql, you make a transaction with say deposit and withdraw.
What happens on the mysql machine if you pull the plug exactly when mysql has done the deposit but not the withdraw?
The ONLY difference between SQL transactions and NoSQL microservice transactions is the time between the parts of a transaction.
Personally I use a JSON file with state to execute my NoSQL microservice transactions, and it's a lot more scalable than having a pre-internet era SQL legacy design hogging all my time and resources.
There are no "parts of a transaction" because a transaction is definitionally atomic.
Transactional file modification is a fairly tricky problem, and I'd be surprised if you'd actually implemented a safe system in that manner. What's certain though is that spinning up MySQL or Postgres and using it to store simple records is essentially a zero-cost setup task - so I doubt it's ever going to "hog all your time and resources".
Please refer to the source code to prove your argument.
Yes, the database either rolls the changes back or re-applies them:
> If, after a start, the database is found in an inconsistent state or not been shut down properly, the database management system reviews the database logs for uncommitted transactions and rolls back the changes made by these transactions. Additionally, all transactions that are already committed but whose changes were not yet materialized in the database are re-applied. Both are done to ensure atomicity and durability of transactions.
I'm not digging through source code for you; transactional atomicity is a well-known and thoroughly researched problem.
Transactions in an ACID system are atomic by definition. They're designed in such a way that either an entire transaction occurs, or no part of a transaction occurs – in other words, it's not possible to partially apply a transaction, by design.
There are a bunch of different approaches to implementing this. I'd guess that MySQL does something like writing the complete set of modifications in a transaction to disk as a separate buffer, and only once the entire transaction is complete, updating some associated metadata to add the transaction to the database. A power failure at any point will result in an uncommitted transaction, which will have no effect on the database.
Here's the appropriate Wikipedia article for further reading - https://en.wikipedia.org/wiki/Atomicity_(database_systems) - let it suffice to say that if you are rejecting the idea that atomicity exists, then I don't know what else to tell you. It's a core concept in information systems.
I'm not rejecting anything, all I'm asking is; where is the code and how does it work? If you don't know then how come you are so confident it does work?
I'm pretty sure the "write some status to a file and roll back if the transaction is incomplete on startup" code is pretty horrible in all SQL systems. And on top of that, it doesn't scale. The status quo is always worth questioning.
Nothing happens. The client discovers that the connection to the DB was closed before the two-phase commit was done, and gives an error to the user.
Nothing bad -- the deposit will never be seen. It sounds like you need to read up on what ACID means.
_What happens on the mysql machine if you pull the plug exactly when mysql has done the deposit but not the withdraw?_
MySQL guarantees that this cannot happen. That's what transactions are all about.
For example you can put into the log the two operations and then an end of transaction marker. When the system comes back after a crash you can replay those transactions from the log that have the end marker, and roll back those that don't.
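A toy sketch of that log-with-commit-marker recovery scheme (purely illustrative, not any particular database's implementation):

```python
def recover(log):
    """Replay only transactions whose COMMIT marker made it to the log;
    anything after the last COMMIT (an interrupted transaction) is dropped."""
    state, pending = {}, []
    for record in log:
        if record == ("COMMIT",):
            for key, value in pending:   # transaction fully logged: apply it
                state[key] = value
            pending = []
        else:
            pending.append(record)       # buffered until we see COMMIT
    return state  # records left in `pending` are implicitly rolled back

# Power was pulled after the deposit was logged but before the
# withdrawal and its commit marker: the whole transfer is rolled back.
log = [("a", 100), ("b", 100), ("COMMIT",), ("a", 92)]
print(recover(log))  # {'a': 100, 'b': 100}
```

This is why "pull the plug between the deposit and the withdraw" is harmless: the half-logged transaction never has an end marker, so recovery discards it.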
A transaction means that the amended state is not visible to any readers until it has been fully committed. Pulling the plug means the transaction is lost, and the writer knows that it was not committed (and so can try again).