How interchangeable with Postgres was it?
How is performance?
Performance for OLTP queries with high selectivity (primary-key or index lookups) is pretty good. It won't be as fast as PostgreSQL because getting the latest data requires consensus, although it now supports timestamped queries, so you can read slightly stale data from replicas for faster reads. Complex queries can slow down or stall the server. Deletes are handled with tombstones, so heavy update/delete workloads cause queries to slow down progressively until compaction runs; it needs tuning for your workload.
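The timestamped-query idea can be sketched with a toy MVCC store. This is a hypothetical illustration (the `MVCCStore` and `read_at` names are made up), not CockroachDB's actual implementation; in real CRDB you'd use `SELECT ... AS OF SYSTEM TIME ...` to read at an older timestamp.

```python
# Toy MVCC store illustrating timestamped ("stale") reads: a replica
# can answer a read at an old-enough timestamp without coordinating,
# because versions at or below that timestamp can no longer change.
import bisect

class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> sorted list of (timestamp, value)

    def write(self, key, value, ts):
        self.versions.setdefault(key, []).append((ts, value))
        self.versions[key].sort(key=lambda tv: tv[0])

    def read_at(self, key, ts):
        """Return the newest value written at or before ts."""
        versions = self.versions.get(key, [])
        timestamps = [t for t, _ in versions]
        idx = bisect.bisect_right(timestamps, ts) - 1
        return versions[idx][1] if idx >= 0 else None

store = MVCCStore()
store.write("balance", 100, ts=1)
store.write("balance", 90, ts=5)
print(store.read_at("balance", ts=3))  # prints 100: stale but consistent
```

The point of the trade: a read at `ts=3` sees an older value, but it never blocks on consensus.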
It treats all the servers as one giant global cluster rather than multiple sub-clusters per region, so writes will have very high latency if you're running in multiple regions. I recommend sticking to a single region with multiple zones instead, or very close regions like US East + Central.
Overall it's a good, solid database choice for core operational data that you want kept safe. Skip it if you need fast, low-latency key/value access (use Redis or Scylla) or large analytical queries (use one of the numerous data warehouse options).
Yes, used it in 2017 for a fairly small dataset (order of GBs) and 5-7 nodes running in Kubernetes.
> What do you like about its killer features?
Easy admin and good at scaling horizontally.
> How interchangeable with Postgres was it?
Wire compatibility helps, but there were many minor differences at the syntax level. I tried for as long as possible to maintain compatibility with Postgres syntax, and it was painful.
> How is performance?
Back then it was garbage, probably 10-100x slower than Postgres. This was pre-2.0, mind you; I hear things have improved noticeably since then. Most of the problems can be circumvented by not doing stupid things and by optimizing your queries/data model. You might want to throw a cache in front of it anyway.
I tried running CockroachDB a few years ago too, also on Kubernetes. It worked, but today we just use Postgres. It turned out we didn't really have the problem CockroachDB is designed to solve; we just needed a reliable, performant SQL database.
Killer features: change data capture (to feed ELK for full-text search and to act as an event bus for microservices), follower reads, horizontal scaling, ease of management/operation, and compliance by letting users decide where their data should reside (row-level partitioning).
Before choosing CRDB we investigated the people involved, the activity on GitHub, the quality of the process/PRs, and their contributions to the Go ecosystem. It's incredible how high the quality bar is for these folks.
The first version wasn’t that good but since 2.0 things changed A LOT.
Really enjoying it
We ran into a few panics (crashes), but support was great and they managed to fix the issues within a few days. Ultimately we stopped using it after multiple cascading failures that ended up bringing down the whole cluster. Granted, we were running 4-core nodes with high CPU utilization, so I'm sure it would've performed better with more cores, but that wasn't in our budget.
I think CockroachDB will be amazing when cloud providers start offering APIs for high-resolution clocks as a service, or even better, if I could opt in to the system clock being synced to high resolution, much like in GCP you can upgrade your instance's network to the premium tier on the fly.
> What happens if I'm monitoring and see an elevated clock-skew?
If you have monitoring that can detect that, NTP should also be able to run and correct it, apart from some very weird failure modes of the network itself (like traffic dropping in one direction).
> I think CockroachDB will be amazing when cloud providers start offering APIs for High-Resolution Clocks as a service
With the hybrid logical clocks that Cockroach uses (basically wall time plus a logical counter), clocks don't need to be all that precise. The trade-off it makes is giving up linearizability (aka strict serializability).
Spanner gets that by making every transaction wait out the maximum clock skew (7ms, thanks to their atomic and GPS clock systems). But in order to run on commodity hardware, Cockroach forgoes it in favor of tolerating a larger skew.
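A hybrid logical clock can be sketched in a few lines. This is a minimal illustration of the general HLC scheme (a wall-clock component plus a logical counter that breaks ties when the physical clock hasn't advanced), with made-up names; it is not Cockroach's actual code.

```python
# Minimal hybrid logical clock (HLC) sketch. Timestamps are
# (wall_time, logical) pairs; causality is preserved even when
# physical clocks are skewed between nodes.
class HLC:
    def __init__(self, physical_clock):
        self.now = physical_clock   # callable returning wall time
        self.wall, self.logical = 0, 0

    def tick(self):
        """Generate a timestamp for a local or send event."""
        pt = self.now()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        pt = self.now()
        rw, rl = remote
        if pt > self.wall and pt > rw:
            self.wall, self.logical = pt, 0
        elif rw > self.wall:
            self.wall, self.logical = rw, rl + 1
        elif self.wall > rw:
            self.logical += 1
        else:  # equal wall components
            self.logical = max(self.logical, rl) + 1
        return (self.wall, self.logical)
```

For example, a node whose physical clock reads 100 that receives a message stamped (105, 3) from a fast node jumps its wall component to 105 and bumps the logical counter, so the reply is still ordered after the message despite the skew.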
One question though, doesn't CockroachDB also have a holdout period of 250ms (default) when reading across shards, kind of similar to Spanner's 7ms holdout on writes? I remember reading something vaguely similar to that somewhere.
If you have a workflow that consistently exercises the same ranges, leaseholders will be moved to the nodes that you’re connected to, in a process called “follow the workload”.
If you run a load generator against a cluster you’ll see that - for the first 30-90 seconds or so it’ll have some latency, but after some period of time, the leases should be moved closer to your generator and you should see latencies drop.
When you’re looking at data access patterns and partitioning you should keep this in mind, as it can be a big performance improvement if properly considered.
Until it doesn't. Low probability large impact doesn't directly translate to minor risk imo
Yugabyte is a proprietary C++ document-store engine called DocDB running on top of RocksDB. It offers access through multiple APIs: Redis, Cassandra CQL, and now PostgreSQL. Each dataset can use only a single API, which you pick when creating it. The PG part is getting the most focus right now and is very capable, since it takes the planning layer from PG itself; it supports more advanced queries than CRDB. There are more moving pieces again, with separate Master and Tablet layers. Great UI with cloud integration and multi-regional management built in. They also recently made all enterprise features free for open source.
Citus is an extension, basically an automatic sharding system for PG that works across servers. It's bottlenecked on a single master node coordinating multiple workers, but there's an experimental branch called Citus MX that makes every node master-capable. Since all servers are just running PG, you have all the expected functionality and extensibility. All normal tables in a database live on the master (which can be limiting) and only "distributed" tables are stretched across worker nodes. If you have a good scheme for sharding (like TenantID), then all data for tables sharing the same shard key can live on the same server and get 100% query functionality. Distributed SQL queries are not as good and can fail in strange ways; there was a big recent update to improve this, but you will still have issues with very complex queries and joins. It's an interesting option for OLAP workloads that need OLTP-like updates with indexes and constant writes. Microsoft uses this for a project and now owns the company, so it's available on Azure.
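The TenantID sharding idea boils down to a stable hash from shard key to node. The sketch below is purely illustrative (the worker names and the `route()` helper are invented; Citus does this internally via hash-distributed tables), but it shows why co-locating every table on the same key keeps joins local.

```python
# Stable-hash routing: every table sharded on tenant_id sends a given
# tenant's rows to the same worker, so per-tenant joins never cross nodes.
import hashlib

WORKERS = ["worker-0", "worker-1", "worker-2", "worker-3"]

def route(tenant_id: str) -> str:
    """Map a tenant to a worker deterministically."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return WORKERS[int.from_bytes(digest[:8], "big") % len(WORKERS)]

# The same tenant always lands on the same worker, regardless of table:
assert route("acme") == route("acme")
```

A cross-tenant query, by contrast, has to fan out to every worker, which is exactly where the "distributed SQL queries are not as good" caveat bites.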
No experience with NuoDB, but another mention is MemSQL, a distributed relational data warehouse with a unique model: an in-memory rowstore plus an on-disk columnstore, for fast OLTP and OLAP query support. It has the best support for complex SQL of all of these and will probably run anything you send it. It's MySQL compatible and one of the most polished database products. It's expensive and proprietary, although there's now a free limited community edition, and it's extremely fast for analytics while maintaining usability.
Kubernetes (using TiDB-Operator): https://pingcap.com/docs/dev/how-to/deploy/orchestrated/kube...
Terraform on AWS: https://github.com/pingcap/tidb-operator/tree/master/deploy/...
What's esoteric about it though? I consider this to be standard knowledge if you're going to be investing into a database.
The incomplete list of TiDB adopters: https://pingcap.com/docs/v3.0/adopters/
and some of their adoption stories:
If you read Mandarin, here are more:
If you are starting a new project and are willing to build on it, it looks pretty awesome. But there are zero migration capabilities, which in my mind should be the core focus of a product that pitches drop-in scalability.
Did this not work?
The MySQL import does not work for a lot of edge cases. They are aware of this and have open bugs that don't seem to be moving quickly. :/
But you can definitely find some case studies from here - https://www.cockroachlabs.com/community/tech-talks/
Firstly, if you're using SQLAlchemy as anything more than a query builder, rethink your approach and act accordingly. That said, maybe rethink the use of an ORM at all; that's my pet peeve, it might not be yours. SQLAlchemy in particular, at least in versions 2.x, just doesn't play well with the retryable transaction approach that CRDB takes. We've had to modify SQLAlchemy in our application to get it to cooperate. Had we been using stored procedures or raw SQL, this would not have been a problem.
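For context, the client-side retry loop CRDB expects (and which ORMs can get in the way of) looks roughly like this. `run_transaction` and `RetryableError` are illustrative stand-ins; a real driver would surface CockroachDB's retryable conflicts as an error carrying SQLSTATE 40001 (serialization_failure).

```python
# Sketch of a CRDB-style retryable transaction loop: the callback is
# re-run from scratch whenever the server reports a retryable conflict.
import random
import time

class RetryableError(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""

def run_transaction(txn, max_retries=5):
    for attempt in range(max_retries):
        try:
            return txn()  # the whole transaction body, retried as a unit
        except RetryableError:
            # exponential backoff with jitter before retrying
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
    raise RuntimeError("transaction gave up after %d retries" % max_retries)
```

The catch with an ORM is that the transaction body has to be safely re-runnable, which session-level state tends to break.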
We started with v2.1.x and are now on v19.1 with a commercial license and overall are happy with the support: we have direct access to the engineers via slack, a ticket escalation process with quick turnaround times, and monthly meetings -- all that help us with the edge cases one is likely to hit with a new database.
Even the smallest requests get heard: I had been using one of the betas to check out what would be coming in the admin UI, didn't like the SQL query plan at the time, and mentioned it in passing. A product manager reached out and picked my brain, and my input, along with others', made it into the shipping release. The turnaround and the professionalism were impressive.
Our ops team loves CRDB. We set it up in K8s and it just works. It heals itself if a node goes down, and replication (a huge pain in the ass with MySQL) is seamless. That said, they have a managed CRDB option, which, if we didn't insist on running our own infrastructure, I would have signed our company up for instead of rolling it ourselves.
I'd say in our use case, properly set-up schemas with good indexes, running queries that are OLTP rather than OLAP in nature, perform very well.
We went with CRDB about 6 months ago because it had the best Kubernetes support at the time and was much easier to get started with.
We recently did a MySQL 5.7 to CRDB migration (actually a number of them, for our partners) and it went smoothly. We did many trial runs; I wrote some MySQL dump and import scripts in Python and we just waited for them to complete. The documentation helped a lot there.
It's not perfect; nothing is. We had hoped for a simple drop-in replacement for a single-node Postgres instance, but one that could scale and be "cloud native". It is sort of like that. With more nodes you have the overhead of consensus, but performance scales almost linearly as you add nodes, and it's robust: we can kill a node or two (your replication factor determines how many nodes you can afford to lose) and bring them back up and it's fine. There are novel approaches like interleaving that help performance, the documentation is great, and so is the support.
Best thing to do is try it out -- you can play around with a paid license for free -- and try it on your data and workloads.
So you should be using MySQL or Postgres for the source of truth (SoT) and things like Cassandra or CockroachDB for specialized use cases (caches, end-user metrics with TTL, etc.). But I would never put important data in a distributed database, since there are too many moving parts and it's impossible to verify that the input matches the output over a decade.
If you need multi-master for failover today, I would pick Percona XtraDB Cluster (I've used it in production for 5 years) or MySQL 8.
If you don't have paying clients, then it really doesn't matter what tools you choose.
An example of the above is MongoDB, irrationally beloved by HN. Until recently it had a global write mutex and a POS (MyISAM-class) storage engine. After most companies migrated off of it for production, there are now two InnoDB-class storage engines (native and WiredTiger) that add credibility ... after a decade of data loss and poor write performance. Those new engines will be ready for heavy-duty use in ... about 5 years.
Postgres is a surprisingly capable, rock-solid service that is always my first choice. I also used Riak in production for several years, and the biggest problem with it was that the operations team forgot the root password because they hadn't needed to log in for a long time. :) I have heard great things about Percona XtraDB but I've never used it. RocksDB is also interesting, but it's just a data store, not a full-fledged service; built on top of it there is one service called TiDB that I hear is very reliable.
Because of the write amplification issue of RocksDB, we built a RocksDB plugin, Titan. Here are its design and implementation details: https://pingcap.com/blog/titan-storage-engine-design-and-imp...
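The write-amplification fix Titan applies is WiscKey-style key/value separation. Here is a toy sketch of the idea only (all class and method names are invented, not Titan's real API): large values live in an append-only blob log, and the LSM tree keeps just small pointers, so compaction rewrites far fewer bytes.

```python
# Key/value separation sketch: big values go to a blob log; the "LSM"
# (a dict here) stores only (offset, length) pointers for them.
class BlobLog:
    def __init__(self):
        self.data = bytearray()

    def append(self, value: bytes) -> tuple:
        offset = len(self.data)
        self.data += value
        return (offset, len(value))  # small pointer stored in the LSM

    def get(self, ptr: tuple) -> bytes:
        offset, length = ptr
        return bytes(self.data[offset:offset + length])

class KVStore:
    BLOB_THRESHOLD = 32  # values bigger than this go to the blob log

    def __init__(self):
        self.index = {}   # stand-in for the LSM tree
        self.blobs = BlobLog()

    def put(self, key, value: bytes):
        if len(value) > self.BLOB_THRESHOLD:
            self.index[key] = ("blob", self.blobs.append(value))
        else:
            self.index[key] = ("inline", value)

    def get(self, key):
        kind, payload = self.index[key]
        return self.blobs.get(payload) if kind == "blob" else payload
```

Since compaction only moves the pointers, a 4 KB value contributes a few bytes per rewrite instead of 4 KB, at the cost of an extra read hop and separate garbage collection for the blob log.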