That puts an upper (EDIT: corrected from lower) bound on the throughput of many types of operations, for example if you need to guarantee strict ordering of transactions (EDIT: in the worst case, where transactions are being issued from multiple nodes; best case you can optimize in a variety of ways).
So while we won't have to deal with interplanetary latencies, it still matters greatly for throughput and consistency guarantees.
You're right that there are operations whose throughput you can increase, and I was imprecise: you can indeed order them "after the fact", but that's not very interesting. The scenario I had in mind was where clients talking to different nodes are issuing transactions that depend on the same items of data.
In that case you either need to obtain a lock, in which case you need to wait at least the latency interval (plus a margin for the largest possible clock skew) to see if the other side wants a lock on the same object, or you need to fire off transactions optimistically, in which case the best outcome is that your transaction lands just before the remote side starts operating on the same object, they fire off another transaction right after, and so on. In reality there would be slowdowns.
If you're dealing with uncontended objects, you can do much better on average, but you can't guarantee better than the latency between nodes.
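A rough back-of-the-envelope sketch of that floor (the RTT and skew numbers here are made up purely for illustration):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical numbers: cross-region round trip and worst-case clock skew.
	rtt := 100 * time.Millisecond
	maxSkew := 5 * time.Millisecond

	// Best case for a contended item with locking: each strictly ordered
	// transaction has to wait out at least one round trip plus the skew
	// margin before it can be sure no other node wanted the same lock.
	perTxn := rtt + maxSkew
	fmt.Printf("per-transaction floor: %v\n", perTxn)
	fmt.Printf("max strictly ordered txns/sec on one item: %.1f\n",
		float64(time.Second)/float64(perTxn))
}
```

With those (invented) numbers you land around 10 strictly ordered transactions per second on a single contended item, no matter how fast each node is locally.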
There are certainly tons of special cases where you can optimize.
The answer for any distributed consistent database is always the same: we gave up on one of the letters in CAP. In this case, it was 'A', availability.
And hey, cool, that's certainly an option. But I'd prefer if they had simply made that the subtitle.
For more information on this perspective you can read _Spanner, TrueTime and the CAP theorem_ by the fella that coined the term CAP: https://static.googleusercontent.com/media/research.google.c...
Yammer concurred with this perspective, stating:
"At Yammer we have experience with AP systems, and we’ve
seen loss of availability for both Cassandra and Riak for
various reasons. Our AP systems have not been more
reliable than our CP systems, yet they have been more
difficult to work with and reason about in the presence of
inconsistencies. Other companies have also seen outages
with AP systems in production. So in practice, AP systems
are just as susceptible as CP systems to outages due to
issues such as human error and buggy code, both on the
client side and the server side."
This claim is much stronger than just saying AP systems sometimes fail, so observing some AP system failures is not enough to prove it. For the statement to hold true, AP systems would have to fail just as often as CP systems, and that claim is unfounded: the article doesn't link to any research showing AP systems have the same (or worse) availability as CP systems.
On the other hand, most CP systems, even centralized ones, are not truly consistent as required by ACID either. Most RDBMS systems run at Read Committed level for performance reasons, which is ACD, not ACID.
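For example, PostgreSQL defaults to Read Committed and you have to ask for serializable isolation explicitly. A minimal sketch using Go's database/sql (the connection string, driver choice, and accounts table are placeholders, not anything from the article):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical driver choice
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/bank?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The default is Read Committed; full serializability is an explicit opt-in
	// and carries a performance cost.
	tx, err := db.BeginTx(context.Background(), &sql.TxOptions{
		Isolation: sql.LevelSerializable,
	})
	if err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec(`UPDATE accounts SET balance = balance - 10 WHERE id = $1`, 1); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err) // serialization failures surface here and need a retry
	}
}
```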
It also ignores the fact that most systems are neither AP nor CP, and that this distinction is not very useful for describing database systems.
I'm not sure about the claim about CP systems, but I probably have a different opinion on "common" CP systems (etcd, Consul, Spanner-likes, which do offer consistency in the sense of ACID). But it's definitely worth considering things weaker than that - e.g. Azure Blob Storage and Google Cloud Storage (S3-likes) offer strong consistency but no cross-key transactions. These databases are very applicable, but weaker than Spanner-likes like the one from TFA.
Agreed that AP vs. CP isn't a good way to describe databases :)
"AP systems are not 100% available in practice"
Nothing is 100% available; it's like absolute zero: you can get close but never reach it.
I am not sure if the intent of this was to justify the uptime problems that Yammer had this year, or just to self-justify engineering choices.
The real flaw is ignoring the horses-for-courses reality of system needs. ATM transactions are not strongly consistent, yet we manage to get by.
Spanner is a "shared nothing" architecture, which has advantages in availability and recoverability. But you do pay a price for the high level of consistency.
Each write is mediated by a leader and has to be committed on at least two of the three regional nodes, which does have a cost.
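A toy sketch of what that majority commit costs (the region names and simulated latencies are mine, not Spanner's): the leader can only acknowledge once two of three replicas have accepted the write, so commit latency is set by the second-slowest replica.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// replicate simulates sending a write to one replica and waiting for its ack.
func replicate(region string, acks chan<- string) {
	time.Sleep(time.Duration(10+rand.Intn(90)) * time.Millisecond) // fake WAN latency
	acks <- region
}

func main() {
	regions := []string{"us-east", "us-west", "europe"}
	acks := make(chan string, len(regions))
	start := time.Now()

	for _, r := range regions {
		go replicate(r, acks)
	}

	// Wait for a majority (2 of 3) before acknowledging the commit to the client.
	for i := 0; i < 2; i++ {
		fmt.Printf("ack from %s after %v\n", <-acks, time.Since(start))
	}
	fmt.Printf("commit acknowledged after %v\n", time.Since(start))
}
```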
Yammer (and I guess rayokota) chose to use PostgreSQL, which is not a horrible choice for a traditional RDBMS these days, but we all know the pain of a master node failover, re-sync, etc. And with most streaming replication methods there will be a serious issue reconciling the nodes that may have been in a partition with the original master against the fail-over node that is promoted.
Distributed systems are hard, but I am having a hard time really finding much value in this post outside of a fairly opaque opinion that there is no perfect product for all needs, which will always be true.
But as for why I chose to respond to your comment, jacobparker: the thing to remember about strong consistency is that locks, two-phase commits, single mediators, etc. will all limit the theoretical speedup you can get by adding resources.
While not a perfect analogy, this means you should engineer with Amdahl's law in mind and hope for Gustafson's law.
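To make that concrete, here is Amdahl's law applied to a couple of illustrative numbers (the parallel fractions and node counts are made up): even a small serial fraction, which is what locks and commit coordination become, caps the speedup regardless of how many nodes you add.

```go
package main

import "fmt"

// amdahl returns the theoretical speedup for a workload where a fraction p
// is parallelizable and the rest is serial (coordination, locks, commits).
func amdahl(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, p := range []float64{0.90, 0.99} {
		for _, n := range []int{4, 16, 256} {
			fmt.Printf("parallel fraction %.0f%%, %3d nodes: %.1fx\n", p*100, n, amdahl(p, n))
		}
	}
}
```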
As a practical example of how this becomes an issue with scaling, consider the following article about the limitations of single disk queues in the Linux kernel on systems with multiple CPUs.
Note the extreme falloff in Figure 4.
This doesn't mean that you can't start out with a simple strong-consistency model, but if you don't at least try to avoid the need for ACID-style transactions, it will be very hard to scale later. Worse, most teams I have been on try to implement complex and fragile sharding schemes which in effect replicate a BASE-style datastore. If we think managing distributed systems is hard, writing them is much more difficult.
But really there are no universal truths in this area, and all decisions should be made on the use case, not by selecting the product before defining the need (which seems to be our most common method).
I think all that would need to be true is that there exists one CP system with uptime figures comparable to AP systems. Then you could say that targeting AP is pointless, when you could either just use that system, or engineer your own CP system to do the same things it does to achieve that uptime.
Can you imagine every ATM having redundant network connectivity and two independent power lines?
There is no free lunch. Comparing just availability figures is therefore not enough. Still, an AP system may be preferable if it has the same availability at a fraction of the cost of that unicorn available-and-consistent system.
This is different from an AP system, because an AP system has to be built from the ground up to assume that stupid corruptions will happen during netsplits and that it'll have to re-integrate them; while the CP system can just assume it'll all go down, and not have to have any of that extra code or infrastructure.
A good example of a high-uptime CP system that I'm aware of, is telecom switches. They're usually architected as simple pairs: one live, and one hot standby. And—even in platforms that don't have "hot upgrade" functionality—you can still upgrade both nodes without downtime by just intentionally causing connections to fail-over back and forth between the nodes.
Telecom switches have very little downtime. But they do go down, rather than becoming inconsistent. And the resulting architecture is much cheaper (in hardware/networking costs, ops costs, and software development/maintenance costs) than the equivalent architecture you'd need to serve calls under AP guarantees.
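A toy sketch of the pair idea (not real telecom code, just the shape of it): one node serves and heartbeats, the standby takes over the moment the heartbeat stops, and the same mechanism lets you drain one node for an upgrade.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	heartbeat := make(chan struct{})

	// Standby: promote itself as soon as the live node stops heartbeating.
	go func() {
		for {
			select {
			case <-heartbeat:
				// live node is healthy
			case <-time.After(200 * time.Millisecond):
				fmt.Println("standby: heartbeat lost, promoting to live")
				return
			}
		}
	}()

	// Live node heartbeats for a while, then "fails" (e.g. taken down for upgrade).
	for i := 0; i < 5; i++ {
		heartbeat <- struct{}{}
		time.Sleep(50 * time.Millisecond)
	}
	time.Sleep(500 * time.Millisecond) // give the standby time to notice
}
```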
Consider a bank account. I can consistently service writes that add to or subtract from the balance as long as I can guarantee durability and am willing to blindly update without a consistent view of the balance.
I can only service consistent reads when I can guarantee that I have seen every committed transaction.
In this case we opt for availability over consistency when presenting the current balance, but for consistency when e.g. cutting statements (through the brute-force method of just waiting until all settlement for the relevant dates has occurred).
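A minimal sketch of that split (the type and method names are mine, just to illustrate the idea): writes are blind, durable deltas that never need to read the balance; the "current balance" read may miss in-flight writes; the statement is only cut once everything for the period has settled.

```go
package main

import (
	"fmt"
	"sync"
)

// Account accepts blind balance deltas (always writable) but only produces
// a statement once every pending write has been applied.
type Account struct {
	mu      sync.Mutex
	pending []int64 // log of deltas not yet settled
	settled int64   // balance as of the last statement
}

// Write appends a delta; no consistent view of the balance is required.
func (a *Account) Write(delta int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.pending = append(a.pending, delta)
}

// ApproxBalance is the "available" read: it may miss writes still in flight elsewhere.
func (a *Account) ApproxBalance() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	b := a.settled
	for _, d := range a.pending {
		b += d
	}
	return b
}

// CutStatement is the "consistent" read: only call it once you know every
// transaction for the period has settled (the brute-force wait described above).
func (a *Account) CutStatement() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	for _, d := range a.pending {
		a.settled += d
	}
	a.pending = nil
	return a.settled
}

func main() {
	acct := &Account{}
	acct.Write(100)
	acct.Write(-30)
	fmt.Println("approximate balance:", acct.ApproxBalance())
	fmt.Println("statement balance:  ", acct.CutStatement())
}
```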
But there are lots of consistent systems that can service reads but not writes during a disruption. Databases with synchronous replication, for example, would typically continue to handle reads but fail writes during a partition.
It's very easy to imagine this scenario. All you need is a read replica of your CP RDBMS.
For most systems, strong consistency is only required for write transactions, and eventual consistency for read-only transactions is acceptable.
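The common shape of that is read/write splitting; a minimal sketch (connection strings, table, and method names are placeholders): writes go to the primary and fail during a partition, while possibly stale reads keep being served from a replica.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical driver choice
)

// Store routes strongly consistent writes to the primary and
// possibly stale reads to a replica.
type Store struct {
	primary *sql.DB // unavailable for writes during a partition
	replica *sql.DB // keeps serving (eventually consistent) reads
}

func (s *Store) Credit(id, amount int64) error {
	_, err := s.primary.Exec(`UPDATE accounts SET balance = balance + $1 WHERE id = $2`, amount, id)
	return err
}

func (s *Store) Balance(id int64) (int64, error) {
	var b int64
	err := s.replica.QueryRow(`SELECT balance FROM accounts WHERE id = $1`, id).Scan(&b)
	return b, err
}

func main() {
	primary, err := sql.Open("postgres", "postgres://primary/bank?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	replica, err := sql.Open("postgres", "postgres://replica/bank?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	s := &Store{primary: primary, replica: replica}
	_ = s // wire into request handlers
}
```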
I think the same may end up being true of globally distributed databases that try to provide generic ACID transactions. Will they actually ever achieve the performance and reliability needed to serve massive scale global work loads?
Or will this be a Mongo situation where all of the scale promises turn out to be unfulfilled once you actually reach the scale where you need them?
edit: oh, and it's written in Go, my main language, which is definitely a plus if you're a gopher
Confirmed Development (I see you already reference this issue above):