Hacker News
Ask HN: Anyone Using CockroachDB in Production?
121 points by sergiotapia on July 18, 2019 | 47 comments
What do you like about its killer features?

How interchangeable with Postgres was it?

How is performance?

Tried it before. Compatibility depends on how complicated your SQL is. There's a detailed list of supported syntax, which is getting better every release, but it's best for simpler OLTP scenarios. [1] Horizontal scaling and reliability are great. Security setup and certificate management are horrible, and if you don't use certificates then you also can't set passwords for any user, so it's all or nothing for some reason. [2]

Performance for high-selectivity OLTP queries using primary keys/indexes is pretty good. It won't be as fast as PostgreSQL, because consensus is required to read the latest data, although it now supports timestamped queries so you can read stale data from replicas for faster reads. Complex queries can slow down or stall the server. Deletes are handled with tombstones, so heavy update/delete workloads cause increasing slowness in queries until compaction happens; it needs tuning for your workload. [3]
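For reference, the timestamped-read feature mentioned above is exposed in SQL as `AS OF SYSTEM TIME`. A minimal sketch of what such a query looks like (the `accounts` table is a made-up example):

```python
# Hypothetical example: serve a slightly stale read from a nearby replica
# instead of waiting on the leaseholder/consensus round trip.
# follower_read_timestamp() is a CockroachDB built-in (v19.1+) that picks
# the closest timestamp guaranteed to be servable by follower replicas.
sql = """
SELECT id, balance
FROM accounts AS OF SYSTEM TIME follower_read_timestamp()
WHERE id = 42
"""
```

You trade a few seconds of staleness for not having to contact the leaseholder of the range.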

It treats all the servers as one giant global cluster instead of multiple sub-clusters per regional location, so there will be very high latency for writes if you're running in multiple regions. I recommend sticking to a single region with multiple zones instead, or very close regions like US East + Central.

Overall it's a solid database choice for core operational data that you want kept safe. Skip it if you need fast, low-latency key/value access (use Redis or Scylla) or large analytical queries (there are numerous data warehouse options).

1. https://www.cockroachlabs.com/docs/stable/detailed-sql-suppo...

2. https://github.com/cockroachdb/cockroach/issues/32448

3. https://github.com/cockroachdb/cockroach/issues/32522

> Anyone Using Cockroach DB in Production?

Yes, used it in 2017 for a fairly small dataset (order of GBs) and 5-7 nodes running in Kubernetes.

> What do you like about its killer features?

Easy admin and good at scaling horizontally.

> How interchangeable with Postgres was it?

Wire compatibility helps, but there were many minor differences at the syntax level. I tried as long as possible to maintain compatibility with Postgres syntax, and it was painful.

> How is performance?

Back then it was garbage, probably 10-100x slower than Postgres. This was pre-2.0, mind you; I heard things have improved noticeably since then. Most of the problems can be circumvented by not doing stupid things and by optimizing your queries/data model. You might want to throw a cache in front of it anyway.
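"A cache in front" can be as simple as a read-through layer with a TTL. A toy sketch of the idea (names are made up; a real deployment would use Redis or memcached rather than an in-process dict):

```python
import time

class ReadThroughCache:
    """Minimal read-through cache: check the cache first, fall back to
    the database fetch function on a miss, and remember the result."""

    def __init__(self, fetch, ttl_seconds=30):
        self.fetch = fetch          # stand-in for the real DB query
        self.ttl = ttl_seconds
        self.entries = {}           # key -> (value, expiry deadline)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]           # fresh cached value, skip the DB
        value = self.fetch(key)     # miss or expired: hit the database
        self.entries[key] = (value, time.monotonic() + self.ttl)
        return value
```

Repeated reads of hot keys then never touch the database until the TTL expires, which takes load off a slow store.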

This is what I love about HN - a simple question about a somewhat rare technology and a detailed answer.

I tried running CockroachDB a few years ago too, also on Kubernetes. It worked, but today we just use Postgres. It turned out we didn't really have the problem CockroachDB is designed to solve; we just needed a reliable, performant SQL database.

Using it in production since early 2018 as our data backbone. Other systems are fed via CDC.

Killer features: change data capture (to feed ELK for full-text search and as an event bus for microservices), follower reads, horizontal scaling, easy to manage/operate, and compliance by letting users decide where their data should reside (row-level partitioning).

Before choosing CRDB we did a deep investigation into the people involved, the activity on GitHub, the quality of the process/PRs, and their contributions to the Go ecosystem. It's incredible how high the quality bar is for these folks.

The first version wasn't that good, but since 2.0 things have changed A LOT.

Really enjoying it

We used it in production for a few months at around 4k QPS with a P99 of ~100ms. We were specifically using it because it has a "follow the workload" feature that moves ranges of data to the server/region where they're most used.

We ran into a few panics (crashes), but support was great and they managed to fix the issues within a few days. Ultimately we stopped using it after multiple cascading failures that ended up bringing down the whole cluster. Granted, we were running 4-core nodes at high CPU utilization, so I'm sure it would've performed better with more cores, but that wasn't in our budget.

We have used it in production for over a year in a highly threaded environment, with multiple terabytes of data, and have had virtually no issues. The support is excellent, but it practically takes care of itself. The Postgres compatibility is very good, and we rarely run into compatibility issues. If you have a highly concurrent OLTP workload, I would suggest using CockroachDB.

I wanted to try it before, but the clock synchronization thing scares me too much to dare. How do you even keep the clocks in line? What happens if I'm monitoring and see elevated clock skew? Can I even do anything, or do things just start to fail mysteriously?

I think CockroachDB will be amazing when cloud providers start offering APIs for high-resolution clocks as a service, or even better, if I could opt in to the system clock being synced to a high-resolution source, much like in GCP you can upgrade your instance's network to premium tier on the fly.

NTP is plenty accurate for Cockroach to work correctly.

> What happens if I'm monitoring and see an elevated clock-skew?

If you have monitoring that can detect that, NTP should also be able to run and correct it, apart from some very weird failure modes of the network itself (like traffic dropping in one direction).

> I think CockroachDB will be amazing when cloud providers start offering APIs for High-Resolution Clocks as a service

With the hybrid logical clocks that Cockroach uses (basically wall time + a logical counter), the clocks don't need to be all that precise. The trade-off it makes is giving up linearizability (a.k.a. strict serializability).

Spanner gets that by making every transaction wait out the maximum clock uncertainty (7ms, with their atomic-clock and GPS systems). But in order to run on commodity hardware, Cockroach forgoes it in favor of allowing a larger acceptable skew.
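For the curious, the wall-time-plus-counter idea can be sketched in a few lines. This is a simplified illustration of a hybrid logical clock, not CockroachDB's actual implementation:

```python
import time

class HybridLogicalClock:
    """Toy HLC: timestamps are (wall, logical) pairs that stay monotonic
    even when the wall clock stalls or another node's clock runs ahead."""

    def __init__(self, now=time.time_ns):
        self.now = now       # injectable clock source, for testing
        self.wall = 0
        self.logical = 0

    def tick(self):
        """Timestamp for a local event or outgoing message."""
        wall = self.now()
        if wall > self.wall:
            self.wall, self.logical = wall, 0
        else:
            self.logical += 1   # wall clock hasn't advanced; bump counter
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        wall = self.now()
        r_wall, r_logical = remote
        if wall > self.wall and wall > r_wall:
            self.wall, self.logical = wall, 0
        elif r_wall > self.wall:
            self.wall, self.logical = r_wall, r_logical + 1
        elif self.wall > r_wall:
            self.logical += 1
        else:  # equal wall components: take the larger counter and bump
            self.logical = max(self.logical, r_logical) + 1
        return (self.wall, self.logical)
```

The logical counter is what lets nodes order causally related events without requiring their wall clocks to agree exactly.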

I'll read up on that, thanks for pointing these things out.

One question, though: doesn't CockroachDB also have a holdout period of 250ms (default) when reading across shards, kind of similar to Spanner's 7ms holdout on writes? I remember reading something vaguely like that somewhere.

As far as I'm aware there is no explicit holdout period on reads; however, on read you do need a quorum of the range holders for consistency's sake. The exception is if you're talking to the "leaseholder" of that data, in which case it can respond to you directly.

If you have a workflow that consistently exercises the same ranges, leaseholders will be moved to the nodes that you’re connected to, in a process called “follow the workload”.

If you run a load generator against a cluster you'll see this: for the first 30-90 seconds or so it'll have some latency, but after a while the leases should move closer to your generator and you should see latencies drop.

When you’re looking at data access patterns and partitioning you should keep this in mind, as it can be a big performance improvement if properly considered.

NTP works fine; this is a very minor risk. VMs on cloud providers come preconfigured to use NTP with the providers' own time servers, so they are very reliable [1][2], and you can always add your own monitoring [3]. Also, CRDB is tested with clock drift, and you can read the Jepsen report [4].

1. https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-ti...

2. https://developers.google.com/time/

3. https://docs.datadoghq.com/integrations/ntp/

4. https://jepsen.io/analyses/cockroachdb-beta-20160829

>NTP works fine. This is a very minor risk.

Until it doesn't. Low probability, large impact doesn't directly translate to minor risk, IMO.

I'm terrified of meteors too!

Great; so have you used CockroachDB in production? I bet you haven't!

Yes, read my other comment at the top of this page.

Can we expand this question to include other so-called NewSQL database technologies like TiDB, YugaByte, NuoDB, Citus, etc.?

TiDB is MySQL-compatible, written in Go, and runs on top of TiKV, a custom key/value store written in Rust. It's rapidly developing and serves simpler, horizontally scaled OLTP workloads well. It can handle OLAP queries better than CRDB, but neither is great at them. It also has more moving pieces, with the Placement Driver, TiKV, and TiDB layers, so installation is harder.

Yugabyte is built around a proprietary C++ document-store engine called DocDB running on top of RocksDB. It offers access through multiple APIs: Redis, Cassandra CQL, and now PostgreSQL. Each database can only use a single API, which you pick at creation time. The PG part is getting the most focus right now and is very capable, since it takes the planning layer from PG itself. It supports more advanced queries than CRDB. More moving pieces again, with separate Master and Tablet layers. Great UI, with cloud integration and multi-region management built in. They also recently made all the enterprise features free and open source.

Citus is an extension, basically an automatic sharding system for PG that works across servers. It's bottlenecked on a single master node coordinating multiple workers, but there's an experimental branch called Citus MX that makes every node master-capable. Since all servers are just running PG, you get all the expected functionality and extensibility. All normal tables in a database live on the master (which can be limiting), and only "distributed" tables are stretched across worker nodes. If you have a good sharding scheme (like TenantID), then all data for tables sharing the same shard key can live on the same server and get 100% query functionality. Distributed SQL queries are not as good and can fail in strange ways. There was a big recent update to improve this, but you will still have issues with very complex queries and joins. It's an interesting option for OLAP workloads that need OLTP-like updates with indexes and constant writes. Microsoft uses it for a project and now owns the company, so it's available on Azure.
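The shard-key idea in that last point can be illustrated with toy hash routing. This is only a sketch of the concept (made-up shard count, and not Citus's actual hash function): every row with the same TenantID maps to the same shard, so single-tenant queries stay on one worker.

```python
import hashlib

def shard_for(tenant_id: int, num_shards: int = 32) -> int:
    """Deterministically route a tenant to a shard: hash the key and
    take it modulo the shard count, so co-sharded tables for the same
    tenant always land on the same worker node."""
    digest = hashlib.sha256(str(tenant_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Cross-tenant queries, by contrast, have to fan out to every shard, which is why the distributed-query path is the weaker one.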

No experience with NuoDB, but another mention is MemSQL, a distributed relational data warehouse with a unique in-memory rowstore plus on-disk columnstore model for fast OLTP and OLAP query support. It has the best support for complex SQL out of all of these and will probably run anything you send it. It's MySQL-compatible and one of the most polished database products. It's expensive and proprietary (although there's now a free, limited community edition), but extremely fast for analytics while maintaining usability.

Impressively informative! Regarding the install/deployment of TiDB: we have built open-source tools such as TiDB-Ansible for on-prem and TiDB-Operator for Kubernetes. Here is a list of deployment instructions:

Ansible: https://pingcap.com/docs/dev/how-to/deploy/orchestrated/ansi...

Kubernetes (using TiDB-Operator): https://pingcap.com/docs/dev/how-to/deploy/orchestrated/kube...

Terraform on AWS: https://github.com/pingcap/tidb-operator/tree/master/deploy/...

Docker: https://pingcap.com/docs/dev/how-to/deploy/orchestrated/dock...

Binary: https://pingcap.com/docs/dev/how-to/deploy/from-tarball/prod...

Adding on: there's also Vitess for anyone looking for automatic sharding for MySQL. It's not an extension but middleware that operates MySQL instances for you, and it has great support on Kubernetes. It has lots of big companies as users and originated at YouTube. The system is written in Go and has unique features like table materialization and resharding.

Vitess is far too complex to run, IMO, unless you want to become a Vitess admin. We tried for a good six weeks to get a working PoC, then moved to CRDB and had one working in hours.

Curious: how/why do you have what seems to be pretty esoteric knowledge/experience?

It's all personal experience and research from using these database products. I've also talked to the devs for each one, including here on HN sometimes.

What's esoteric about it though? I consider this to be standard knowledge if you're going to be investing into a database.

Disclaimer: I work for TiDB.

The incomplete list of TiDB adopters: https://pingcap.com/docs/v3.0/adopters/ and some of their adoption stories: https://pingcap.com/success-stories/

If you read Mandarin, here are more: https://pingcap.com/cases-cn/

Sure - but I asked because I'm specifically interested in crdb.

I tried to use CockroachDB for a side project. I gave up after a month, on and off, of trying to migrate a x00 GB MySQL database to it.

If you are starting a new project and willing to build on it, it looks pretty awesome. But there are zero migration capabilities, which in my mind should be a core focus for a product that pitches drop-in scalability.

It's Postgres-compatible, so MySQL will be harder to migrate seamlessly, but they do have documentation detailing the process, plus functionality to restore a database dump from object storage: https://www.cockroachlabs.com/docs/stable/migrate-from-mysql...

Did this not work?

Sure - but the majority of customers with cash who want to migrate to something better are not going to be coming from Postgres.

The MySQL import does not work for a lot of edge cases. They are aware of this and have open bugs that don't seem to be moving quickly. :/

If you want a MySQL-compatible NewSQL database, maybe you can try TiDB.

Used in DC/OS for the Identity and Access Manager https://docs.mesosphere.com/1.13/overview/architecture/compo...

Not directly related.

But you can definitely find some case studies from here - https://www.cockroachlabs.com/community/tech-talks/

Does anyone know how much the support costs?

Yeah, without at least a few lower-tier support pricing examples, I just assume it's too much and move on when I've looked at it... the backup limitations for community installs are a no-go for me.

We are using it in production on Kubernetes. As a former MS SQL Server DBA, here are some thoughts:

Firstly, if you're using SQLAlchemy as anything more than a query builder, rethink your approach and act accordingly. That said, maybe rethink the use of an ORM at all -- but that's my pet peeve; it might not be yours. SQLAlchemy in particular, at least in versions 2.x, just doesn't like the retryable-transaction approach that CRDB takes. We've had to modify SQLAlchemy in our application to get it to cooperate. Had we been using stored procedures or raw SQL, this would not have been a problem.
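The retryable-transaction approach mentioned above boils down to a client-side loop that retries when the database reports a serializability conflict (SQLSTATE 40001). A hedged sketch, where `SerializationError` stands in for the real driver exception and `txn_body` for your transaction function:

```python
import random
import time

class SerializationError(Exception):
    """Stand-in for the driver's retryable-transaction error
    (e.g. a psycopg2 error with pgcode '40001')."""
    pgcode = "40001"

def run_transaction(txn_body, max_retries=5):
    """Run txn_body, retrying on serialization conflicts with
    exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return txn_body()
        except SerializationError:
            # Another transaction won the conflict; back off and retry.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
    raise RuntimeError("transaction retries exhausted")
```

The pain point with an ORM is that `txn_body` has to be safely re-runnable from the top, which ORM session state can make awkward.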

We started with v2.1.x and are now on v19.1 with a commercial license, and overall we're happy with the support: we have direct access to the engineers via Slack, a ticket escalation process with quick turnaround times, and monthly meetings -- all of which help with the edge cases one is likely to hit with a new database.

Even the smallest requests get heard: I had been using one of the betas to check out what was coming in the admin UI, didn't like the SQL query plan view at the time, and mentioned it in passing; a product manager reached out and picked my brain, and my input, along with others', made it into the shipping release. The turnaround and the professionalism were impressive.

Our ops team loves CRDB. We set it up in K8s and it just works. It heals itself if a node goes down, and replication (a huge pain in the ass with MySQL) is seamless. That said, they have a managed CRDB option, which I would have signed our company up for if we didn't insist on running our own infrastructure.

I'd say in our use case, properly set up schemas with good indexes, serving queries that are OLTP rather than OLAP in nature, perform very well.

We went with CRDB about 6 months ago because it had the best Kubernetes support at the time and was much easier to get started with.

We recently did a MySQL 5.7 to CRDB migration (actually a number of them, for our partners) and it went smoothly. We did many trial runs; I wrote some MySQL dump and import scripts in Python, and we just waited for it to complete. The documentation helped a lot there.

It's not perfect; nothing is. We had hoped for a simple drop-in replacement for a single-node Postgres instance, but one that could scale and be "cloud native". It is sort of like that. With more nodes you have the overhead of consensus, but performance scales almost linearly with more nodes, and it's robust -- we can kill a node or two (your replication factor determines how many nodes you can afford to lose) and bring them back up, and it's fine. There are novel approaches, like interleaving, that help performance; the documentation is great, and so is the support.
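The "how many nodes can I lose" arithmetic is just majority quorum: with a Raft replication factor rf, a range stays available as long as a majority of its replicas survive. A quick sketch:

```python
def max_node_failures(replication_factor: int) -> int:
    """Number of replica failures a range can tolerate while a
    majority (quorum) of its replicas is still alive."""
    return (replication_factor - 1) // 2
```

So the default replication factor of 3 tolerates one node down; bumping it to 5 tolerates two, at the cost of more write fan-out.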

The best thing to do is try it out -- you can play around with a paid license for free -- on your own data and workloads.

What other solutions are you evaluating?

As a DBA, I can say that it takes 5 to 10 years for a new database or file system to be ready (reliable, few global mutexes, third-party tools, programming language support, etc.) for production with paid customers.

So you should be using MySQL or Postgres as the Source of Truth (SoT) and things like Cassandra or CockroachDB for specialized use cases (caches, end-user metrics with TTLs, etc.). I would never put important data in a distributed database, since there are too many moving parts and it's impossible to verify that the input matches the output over a decade.

If you need multi-master for failover today, I would pick Percona XtraDB Cluster (I've used it in production for 5 years) or MySQL 8.


If you don't have paying clients, then it really doesn't matter what tools you choose.

An example of the above is MongoDB, irrationally beloved by HN. Until recently it had a global write mutex and a POS (MyISAM-class) storage engine. After most companies migrated off it for production use, there are now two InnoDB-class storage engines (native and WiredTiger) that add credibility... after a decade of data loss and poor write performance. Those new engines will be ready for heavy-duty use in... about 5 years.

Exactly. I also migrated companies off MongoDB after a total data loss that even the vendor could not recover from or explain.

Postgres is a surprisingly capable, rock-solid service that is always my first choice. I also used Riak in production for several years, and the biggest problem with it was that the operations team forgot the root password because they hadn't needed to log in for a long time. :) I've heard great things about Percona XtraDB but never used it. RocksDB is also interesting, but it's just a data store, not a full-fledged service; one service based on it is TiDB, which I've heard is very reliable.

TiDB is not based directly on RocksDB; it runs on top of TiKV, a custom distributed key/value store written in Rust.

TiKV is built on top of RocksDB; see our chief engineer's talk on how we use RocksDB in TiKV: https://pingcap.com/blog/2017-09-15-rocksdbintikv/

Because of RocksDB's write-amplification issue, we built a RocksDB plugin, Titan. Here are its design and implementation details: https://pingcap.com/blog/titan-storage-engine-design-and-imp...

Excellent thank you.

One small data point: our small MongoDB cluster serves around 2 GB/s of traffic at times. Much more reliable than it was three years ago.

Found your post interesting. That's it :-)

Also, how "exciting" is it?

Meaning what exactly?

"Exciting" technologies tend to keep you up on the weekends. The opposite of: BoringSSL.

Something like: I'd like to pretend people don't make technology choices out of boredom, but I'm tired of pretending.

Of course not. They make tech choices for their CV.
