I can't find any documentation on how, exactly, ACID is achieved across all these partitions. It's impossible to reason about performance characteristics in the absence of documentation.
For example, if they use two-phase commit, what is the granularity of locks (table, row, etc.) during an open transaction? Table locks during two-phase commit with, say, 50ms coast-to-coast latency would be an immediate deal-killer.
Or, if it's eventually consistent, as most asynchronously distributed DBs are, they can't claim the A and C in ACID.
I really wish Cockroach had better documentation - not so much how to use it, but how it works so as to let engineers reason about performance characteristics.
Meanwhile, I acknowledge that CockroachDB does not yet convey its performance model well enough for engineers to build an intuition about it. There is documentation, but it's either too detailed (like the design doc) or too high-level (most of the docs site). This calls for a new sort of documentation, which we are working on.
I don't believe that is correct, and it's not a good meme to promote. "Successor" implies the deprecation of Paxos, which is certainly not the case. For the architectural sweet spot where both protocols can be used, it is fair to say Raft provides the more accessible approach, particularly for those who insist on rolling their own consensus mechanism.
Paxos and Raft serve distinct architectural concerns in the shared context of providing distributed state with formally defined consistency guarantees. And to the point, it is entirely reasonable to see a distributed system that utilizes both to address component-specific concerns.
CockroachDB, ironically (in the context of your OP), could in fact benefit from vendor lock-in, were the candidate cloud provider to offer atomic clocks and GPS cards.
A lot of synchronization problems are solved with access to a very accurate synchronized clock. E.g., with one you can determine a reliable "happens-before" relationship between all in-flight transactions, among other useful things.
Literally all GPS does is provide a way to determine the time very accurately in relation to a bunch of synchronized atomic clocks in orbit. (As a side benefit, knowing the time like this will also give you your position.)
Windows for concurrent operations are bounded by clock uncertainty. If you say you did something at 2:00 and I say I did something at 3:00, you probably did your thing before me. But if it's 1:57 vs 1:59, there's strong possibility my clock is fast or yours is slow. If I know both our clocks are within 5 seconds of the correct time, then I can say your action happened before mine. So accurate clocks allow you to provide ordering to more events.
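To make that concrete, here's a minimal sketch in Go (the 5-second bound and the function names are just illustrative, not from any particular database):

    package main

    import (
        "fmt"
        "time"
    )

    // maxSkew is the assumed bound on how far any node's clock can drift
    // from true time (illustrative, matching the 5-second example above).
    const maxSkew = 5 * time.Second

    // happensBefore reports whether event a definitely happened before
    // event b. Each timestamp comes from its own node's clock, so the true
    // time of each event lies somewhere in [t-maxSkew, t+maxSkew]; ordering
    // is only certain when those uncertainty windows don't overlap.
    func happensBefore(a, b time.Time) bool {
        return a.Add(maxSkew).Before(b.Add(-maxSkew))
    }

    func main() {
        t := time.Now()
        fmt.Println(happensBefore(t, t.Add(time.Hour)))     // true: windows disjoint
        fmt.Println(happensBefore(t, t.Add(2*time.Second))) // false: windows overlap, order unknown
    }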
Additionally, if you do identify operations as concurrent, you can explicitly deal with them. Dealing with them is doable, but expensive. If your clocks are in sync within a small margin of error, the rate at which you have to do that expensive conflict resolution is low.
Google's Spanner database uses atomic clocks to achieve a max clock skew of 7ms. That eliminates most concurrency conflicts, and dealing with the rest is fast because a conflict can be recognized quickly. Basically, it waits 7ms on every write to see whether any other node has a conflicting write.
CockroachDB can operate in "spanner mode", but without atomic clocks it uses a max clock skew of 250ms. This is a much bigger window, and identifying conflicts on write takes a lot longer.
CockroachDB recommends that you don't run in "spanner mode" because of the performance hit. Instead, the default mode does lazy conflict detection: rather than waiting after every write, it will sometimes wait to read data until the clock-skew window has passed.
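Roughly, my mental model of the two strategies looks like this (a sketch with made-up names and the 250ms figure from above, not CockroachDB's actual code):

    package main

    import (
        "fmt"
        "time"
    )

    const maxClockSkew = 250 * time.Millisecond // illustrative, per the figure above

    // commitWait models "spanner mode": after picking a commit timestamp,
    // hold the acknowledgment until the uncertainty window has passed, so
    // no other node can later produce a conflicting write with an earlier
    // timestamp. Every write pays the full skew as extra latency.
    func commitWait(commitTS time.Time) {
        time.Sleep(time.Until(commitTS.Add(maxClockSkew)))
    }

    // uncertainRead models the lazy default: writes return immediately, and
    // a read only has to retry when it encounters a value whose timestamp
    // falls inside the reader's uncertainty window, i.e. when the ordering
    // is genuinely ambiguous. It returns the timestamp to (re)read at.
    func uncertainRead(readTS, valueTS time.Time) time.Time {
        if valueTS.After(readTS) && !valueTS.After(readTS.Add(maxClockSkew)) {
            return valueTS // ambiguous: retry the read at the later timestamp
        }
        return readTS // unambiguous: proceed as-is
    }

    func main() {
        start := time.Now()
        commitWait(start)
        fmt.Println("write acked after", time.Since(start)) // roughly 250ms
        // A value written 100ms "in the future" is inside the window, so
        // the read must retry at that later timestamp.
        fmt.Println(uncertainRead(start, start.Add(100*time.Millisecond)).After(start)) // true
    }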
The trend I've been seeing is "AP with eventual C". It comes down to the idea that we have a planet, with datacenters in various locations on said planet. Light down fiber or copper only goes so fast.
The "eventual consistency" part is simply acknowledging that we can't beat C when our databases are in North America, South America, Europe, Africa, Australia, and Asia. One-way comms takes time. 2 way comms, even more so.
The tradeoff is that changes can come in and will 'eventually' propagate to the rest of the DB nodes. That doesn't sound ideal at first, but this mode of operation also allows a local DB running as AP to sync when it comes back online.
CockroachDB and Spanner are "CP with a high probability of A". There are situations where the DB (or possibly a subset; I haven't read too deeply into the papers) can become unavailable due to a loss of quorum, but these situations are highly unlikely.
I'm not convinced this helps much with lock-in. Assuming you want to do things like deployments, scheduled backups, monitoring, etc., you're either:
a) Using "lock-in" features like CloudFormation, Lambda, CloudWatch, etc., to do so, such that you use some of the "value" of the cloud.
==or==
b) Rolling your own solutions for these peripheral functions (deployment, backup, monitoring, etc) and running them on the bare VPS instances. In which case...why are you on this expensive cloud? You can get a VPS with lower bandwidth costs elsewhere.
Basically being sort of "half in" on the cloud, where you use it as a glorified VPS makes no sense to me. Either be all the way in, or all the way out.
Alternatively, you can use tools like Kubernetes and Terraform as abstractions around cloud environments while still being able to utilize them fairly effectively. Kubernetes is a clever product for Google to release, because it makes migrating loads to GCP much easier if you already run them in Kubernetes.
Similar decision for me though. If you want k8s, use Google or Azure, where they take care of it for you.
K8s on AWS, at least right now, means you are in the business of being a plumber. For example, if I remember right, the solid k8s ingress controller option means ELB, which has issues like the need for cache warming. ALB is better, but has only bleeding-edge support as an ingress controller in k8s. It's a similar story for getting it installed on AWS in the first place.
Rackspace or DigitalOcean or similar is a cheaper, better place if you want to roll your own k8s solution.
I'm hosting k8s on AWS in production. kops takes care of most of the details, though I _do_ understand the low-level details of what's going on. Gotta be able to debug somehow.
Google Cloud Platform is definitely cheaper than AWS, and their hosting of Kubernetes seals the deal for me. The only problem? Legacy. If I want to link two virtual machines across cloud environments, it introduces somewhat unacceptable latency, and throughput will be less reliable. So connecting to things like extremely busy Redis or AMQP clusters is not really as safe and downtime is more likely.
Still, I don't feel like Kubernetes is terrible on AWS. It works fairly well. I'm not using Ingress resources right now, just a bunch of services with their own ELBs.
Sure, I concede it can run well. It just feels like renting a furnished house and storing the supplied furniture in the basement so you can use your own.
Why not just rent a cheaper unfurnished home if you insist on bringing your own...
I'm not saying I entirely disagree. I'm mostly suggesting that moving toward systems like Kubernetes and Terraform help you to reduce lock-in so that you can pick the best tool for the job. 7 years ago, it made sense to use AWS and not look back. But I'm sitting here now, and a lot of software in our stack is pretty tied to AWS when I'd much rather use GCP.
The furniture analogy is a bit flawed. I can get comfortable in a new, furnished house, but furnishings don't come with vendor lock-in. Cloud services naturally do, at least the highly proprietary ones. I'm not saying put the furniture in the basement, I'm just saying throw a slipcover over it so we don't touch it directly. :)
DigitalOcean, meanwhile, is a lot more limited. The new stuff they've added, like firewalls and load balancers, is nice, but the tooling with AWS is more complete, and I can use a fair bit of it from within Kubernetes, including things like Amazon's certificate provisioning.
Basically, to be clear, I'm saying embracing GCP and Azure may seem like a solid plan now, but years down the road you might want some more mobility. If you're already using things you can bring with you to the next provider, you're going to be ahead of the curve.
It's like developing your app to work with MySQL, PostgreSQL, SQLite, Oracle, SQL Server...
Unless that is a business requirement, don't do it.
It reduces you to the lowest common denominator, and the chances that you will actually switch from one DBMS to another are low; doing so would probably require lots of work regardless.
I think this is a poorly written blog post that completely ignores the challenges and difficulties that arise when going multi-cloud. Multi-cloud introduces huge amounts of latency, requires additional load balancing and routing logic, requires twice as much administration, etc.
Multi-cloud is a huge decision that needs lots of planning and application-level support. I'd go so far as to say it is an order of magnitude easier to design an application that can be moved between clouds easily than it is to design a multi-cloud application, and that approach still defeats vendor lock-in.
I think CockroachDB is interesting tech, but this blog post reads like a pretty empty marketing piece.
It’s true that operating your own cluster (of any technology) spanning cloud providers is not for the faint of heart. However, at FaunaDB we’ve found performance of multi-cloud globally consistent operations to be acceptable. We currently run across AWS and GCP, with Azure on the way, and we have customers running their own clusters in hybrid configurations.
Is cloud vendor lock-in a serious concern for most businesses? It always feels like a requirement that originates from the technical side rather than the business side, and never like it's actually that important.
We have a bunch of work that requires large, process-heavy calculations, so yes, vendor lock-in is a concern for us: it can mean huge cost increases if the vendor raises pricing for the required instances, or huge savings if a competitor can do the same computations for less. We solve this problem by making sure we can trash servers and spin them up elsewhere if need be (mostly via Terraform, etc.).
I don't think this should apply to anyone running some hosts for web applications though.
It's an overblown concern for many, considering that the major clouds are massive companies that will likely be around far longer than most of their customers (or will acquire them). Every new client actually contributes to the longevity of a cloud provider. There's also lots of competition to offer similar features, and many services can be mixed and matched easily (e.g., we run on GCP but use Azure Event Hubs).
My main concern with vendor lock-in in hosting (not just cloud) is that it easily turns business problems into ops problems. Credit card suspended because of (possibly spurious) "suspicious activity"? Cloud account suspended for non-payment. Trouble with hosting company X's invoicing? Same result. Spurious DMCA takedown filed? Maybe a TOS violation; account suspended. Etc.
It's not so much that most of these would be permanent, but it does shift downtime risk from the "not ops" domain into the ops domain.
And multiple providers isn't a universal fix - but it might mitigate some risk.
I think that vendor lock-in on a cloud is more about reliance on technologies designed for a specific cloud: App Engine, DynamoDB, and CloudFormation being examples. To avoid lock-in, use technologies like Kubernetes, MongoDB, or Terraform instead. That way, jumping clouds is at least possible without a rewrite.
I've been through datacenter power outages, refusal to renew, and difficulty in getting circuits in a timely manner in multiple Tier 4 datacenters.
Self-hosting is only as good as your planning, strategy, and execution, which is much the same as the cloud world. You're probably safer running a single EC2 instance than running a single dedicated server in some random colo, but certainly running your own geographically distributed infrastructure is likely better than relying on a single AWS availability zone (for example).
However, it's a great idea to open people's eyes to how being on a major cloud provider can make you collateral damage in an attack that never would have targeted you if you were in DIY mode.
What I would be curious about is how much faster the DB would be if it were in C++/Rust instead of Java/Go. I know Bigtable claims 3x faster than HBase, Scylla claims ~10x faster than Cassandra (though also a different architecture), Trinity ~2x compared to Lucene, etc.
I did a performance evaluation for a project recently, and CockroachDB, while awesome to set up and operate, was painfully slow. I couldn't get more than a couple thousand writes per second on a cluster of more than half a dozen nodes, all in the same datacenter. There seemed to be a huge amount of write amplification (nearly two orders of magnitude), and the load distribution across nodes wasn't really even either.
Sadly had to give up on CockroachDB. They've got good engineers on their team and have some nice funding as well so I'm hopeful that they'll make big improvements in this department eventually. But it'll take quite a lot of work to bring it up to speed - excuse the pun :)
Oh and I don't think Go is any limiting factor here. It's more of an architectural/engineering issue.
I would argue that data structure and system architecture play much more of a role than language choice. Scylla's claims only make sense for datasets that are completely in-memory. Cassandra explicitly targets much larger than memory datasets, and so your bottleneck should be disk throughput not CPU. If you plan to keep everything in memory and just snapshot to disk, you should probably change your choice of data structures.
We're also talking about wide-area storage. Any latency from Go over C++ should be negligible: the Go GC is sub-ms latency now and will only get better [1], while for inter-DC network latency you can expect 50 to 200-300 ms. The GC just doesn't matter here. Go also gives you control over memory layout and has an ever-improving compiler [2]. C++ would only really be necessary for sub-ms tail-latency systems, which are really rare.
I think the gap between Bigtable and HBase is likely due to Google spending more time optimizing Bigtable. They use it for basically everything not built on Spanner.
[1] See recent activity on https://github.com/golang/go/issues/10958
[2] Go has a ways to go to match GCC, but you can work around the language for now to get close to C. If you want to improve things here, maybe contribute to gccgo or gollvm so Go users can get all of LLVM's and GCC's optimizations.
Note that the underlying data store where a lot of the byte churning happens is RocksDB, written in C++. Also the Go/C++ interface in CockroachDB is coarse to reduce overheads.
For some reason, nobody is using LMDB in distributed DBs (ActorDB is the only one I know of), while for single-node systems there are many cases that have switched and come out better.
I know, and that you take special care to batch as much work as possible across it for lower overhead. But then how much of the speed is eaten by Go? Say Go is 2x slower and the DB is bottlenecked on CPU/RAM; then you're using 2x the servers. Whereas if it's bottlenecked on disk, that's RocksDB, which you can't make faster by using a different language.
Even TiDB (MySQL protocol, Go) uses TiKV (a KV store written in Rust, on RocksDB) to make it faster.
Agreed. I love Go, but a database is one of the things in the stack that ought to be as optimized as possible in my opinion. C++ still seems preferable in these types of applications. At the same time, if the underlying architecture is enough of an improvement over older systems, it's still an upgrade.
You can't optimize anything with C++ if your database is stuck waiting for consensus with nodes 100 ms away. Close to the metal and low level languages are really poor choices here, even Go is too low level for CP databases.
Yes, but won't you have lower overhead on each query/request and more concurrency? For each query the bottleneck is the network, which you can't fix, but you can fix the second, third, etc. bottlenecks, which are inside one machine (CPU/RAM/disk).
Yeah...I'm not really aware of a higher level language that mystically makes networking faster. As far as I'm aware, the most efficient network protocols tend to be implemented at a low level, too.
Well we already have metrics on what the languages themselves can do. And given that performance will definitely matter in this domain, it's not a premature optimization at all. A database isn't something that can be written easily enough to simply rewrite it in another language later as an optimization.
Yup. And the amount of work it takes to be competent in just one (such as AWS) will make it nearly impossible / not worth it for most people to run cross-cloud.
It's a valid concern for business. A lot of my customers go to some pretty interesting lengths to make sure they're not locked into a single vendor's stack.
FWIW, CRDB should be a good fit for people worried about database lock-in. You'd only be "locked in" if you discover that we're much better than the alternatives :)
CRDB uses the standard SQL language. Also, CRDB speaks the PostgreSQL wire protocol; for using CRDB you'd be using the Postgres drivers for different languages. So, more or less, any application that can work with us should also work with Postgres (and vice versa). Moreover, the Postgres drivers generally respect other higher interfaces (odbc, jdbc) so, more or less, a CRDB application should also work with any other established SQL database (and vice versa).
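For example, this is roughly all it takes from Go, using the standard database/sql package with the stock Postgres driver (the connection details are illustrative; 26257 is just CockroachDB's default port):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/lib/pq" // plain Postgres driver; no CRDB-specific client needed
    )

    func main() {
        // Point the same code at a vanilla Postgres instance by changing
        // nothing but the URL.
        db, err := sql.Open("postgres",
            "postgresql://myuser@localhost:26257/mydb?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        var version string
        if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
            log.Fatal(err)
        }
        fmt.Println(version)
    }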
CRDB can also export your data in formats easily importable by other SQL databases.