CockroachDB 19.2 (cockroachlabs.com)
141 points by irfansharif on Nov 12, 2019 | hide | past | favorite | 60 comments

Some comments I have regarding the documentation. It seems like there is a lot describing how to set up a cluster and how to do various SQL operations that most people probably know how to do.

I think the information that should be presented much more clearly are:

1.) How do we have to partition our data/what restrictions are in place? Basically, what considerations are necessary when designing the data model due to the constraints of the technology?

2.) What functionality that we expect from RDBMS do we give up when working across partitions? Can foreign keys exist across partitions? Can joins work? Inner/Right joins?

Head of Docs and Training at Cockroach Labs here. Thanks for this feedback, ralusek. It's spot on, and we're planning to create much more direct and prescriptive guidance and best practices for working with CockroachDB across multiple regions. That's when placement of data (via geo-partitioning or other approaches) is crucial for reducing network latency.

In case you haven't seen them, for now, we have docs on some data placement patterns for multi-region deployments: https://www.cockroachlabs.com/docs/stable/topology-patterns..... We believe the first and third are best for most cases. This tutorial also walks through the impact of using those patterns in a cluster spread across 3 regions of the US: https://www.cockroachlabs.com/docs/stable/demo-low-latency-m...

But we'll definitely continue working on better guidance here.

Last month I tried to deploy cockroachdb in Kubernetes and I felt the documentation wanted to treat me like a 5-year-old.

I don't think your product documentation has to explain what Kubernetes is or its terminology[1].

The worst part is that it does not tell cluster admins what needs to be set up in the Kubernetes cluster at all; instead, it wraps a bunch of `kubectl` commands in a Python script and says "just download the script and run it"[2]. I know a simple `python setup.py` command is easier for novices to just try out. But it can really give seasoned Kubernetes admins a headache... Our clusters enforce GitOps and nothing can bypass pull requests. Running some random `kubectl` commands is simply impossible. I ended up spending my day reading and translating the script into Kubernetes object definitions by hand, and I didn't enjoy it...

1. https://www.cockroachlabs.com/docs/stable/orchestrate-cockro... 2. https://github.com/cockroachdb/cockroach/blob/master/cloud/k...

Appreciate this feedback, and sorry those docs didn't help you much. If you'd be willing to open a github issue with some details about your need to translate the commands in our docs into something more useful for your use case, we'll look into this further: https://github.com/cockroachdb/docs/issues. The input would be greatly appreciated.

I'm curious why you would want to deploy CockroachDB in a K8 cluster. Not being critical...genuinely curious. Since it has its own idea of a cluster, it sounds complex to me. Especially since the typical geographically wide CockroachDB cluster would likely span outside a region centric k8s cluster.

(Cockroach Labs developer here) In CockroachDB parlance, a "cluster" is just some number of cockroach binaries that have local storage and can connect to one another via TCP/IP. It's entirely feasible to run a multi-node cluster on a single laptop by just starting several instances of cockroach bound to different port numbers.
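As a sketch of that laptop setup (flag names per the v19.x docs; the port numbers and store directories here are arbitrary choices):

```shell
# Start three cockroach processes on one machine, each with its own
# store directory and ports, all pointed at the same join list.
cockroach start --insecure --store=node1 \
  --listen-addr=localhost:26257 --http-addr=localhost:8080 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node2 \
  --listen-addr=localhost:26258 --http-addr=localhost:8081 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node3 \
  --listen-addr=localhost:26259 --http-addr=localhost:8082 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background

# One-time initialization, pointed at any node:
cockroach init --insecure --host=localhost:26257
```

From there any of the three SQL ports behaves like a normal Postgres-wire endpoint.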

The Kubernetes concept of a "cluster" is a much broader term, encompassing the compute nodes and a lot of control-plane software to make all of the magic happen.

Fundamentally, running CockroachDB on a Kubernetes cluster abstracts away the process of getting the cockroach binary running and offers a lot of convenience to the human operator vis-a-vis reliability and service discovery.

We like to say that CockroachDB is "Kubernetes native" in that you can easily build a CRDB cluster using only the basic k8s building blocks, without requiring a separate operator program to manage the deployment.

You can `kubectl apply` this config and get a fully-functioning cluster. https://github.com/cockroachdb/cockroach/blob/master/cloud/k...
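As a sketch, assuming you have saved that config locally as `cockroachdb-statefulset.yaml` (the URL above is truncated, so use the real path from the repo), standing the cluster up is roughly:

```shell
# Create the StatefulSet and Services from the published config
# (local filename is an assumption).
kubectl apply -f cockroachdb-statefulset.yaml

# Watch the pods come up:
kubectl get pods -l app=cockroachdb

# One common approach to the one-time cluster init is exec'ing into
# the first pod (the docs also ship a separate init job for this):
kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --insecure
```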

Utilities like Helm et al. are certainly easier than managing a bunch of YAML configs, but they are entirely optional.

Some other CockroachDB+Kubernetes synergies to consider:

1) When using a StatefulSet and PersistentVolumes, a CockroachDB node will easily survive being rescheduled off of its underlying host (e.g. due to maintenance or hardware failure) with no human effort needed.

2) All cockroach instances are, from the perspective of a client, homogeneous. That is, a client can send a SQL query to any member of a CockroachDB cluster and get a meaningful response. This maps exactly onto the k8s Service abstraction.

3) Federated k8s clusters and multi-region network fabrics do exist, although they're not exactly common yet. CockroachDB can maintain its clustering across "non-uniform network architectures" that exist within- and cross-region.
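To illustrate point 2, since any node answers SQL, clients can simply target the k8s Service. A sketch (the Service name `cockroachdb-public` is the one used in the published configs; treat it as an assumption for your deployment):

```shell
# Forward the Service's SQL port to the local machine...
kubectl port-forward service/cockroachdb-public 26257:26257 &

# ...and talk to whichever node the Service routes to:
cockroach sql --insecure --host=localhost:26257 -e "SELECT 1;"
```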

Is the primary advantage of CockroachDB over FoundationDB primarily in supporting SQL out of the box? The multi-region support seems pretty banger at first glance.


... and I had a tirade here about the licensing information being largely absent on their FAQ or product pages, but I did find this after a google before posting:


which... from what I can tell at my cursory inspection, seems okay to me. I'd be pretty pissed too assuming a giant SaaS company came by, forked their product, and then resold it with improvements they didn't upstream. Mostly, because it's just fucking rude. One could have a good business relationship with another company with both prospering, but instead you take the short-n-shitty road for no reason...

With that said, it'd be nice if you put that in yer FAQ.

> Is the primary advantage of CockroachDB over FoundationDB primarily in supporting SQL out of the box?

That + FoundationDB pulled the rug out from under all their users at the time Apple purchased them. Cannot trust that team / product again; it would be incredibly and blasphemously foolish. At least CDB likely won't pull any such stunt.

Disclaimer: I was personally affected by the FDB team making this choice back in 2015 when events unfolded. Still leaves a bitter taste.

The situation is completely different these days. Back then they were a small startup. The DB space is incredibly challenging especially because engineers tend to want things for free and are pretty demanding. An acquisition seemed like a likely outcome. To partially offset that risk they offered source code escrow to prospective customers.

Now it's an opensource project backed by Apple. And from what I can tell the team is massively different these days. I'm not sure why the obvious conclusion is "we can't trust them".

I've heard this narrative from people that got burned by fdb's purchase so I'll never use it again multiple times. It's largely confusing to me, as it doesn't feel connected to the current realities.

Cockroach Labs PM here. During the rollout of the new license, we put together a quick license FAQ you can find here: https://www.cockroachlabs.com/docs/stable/licensing-faqs.htm...

Noted that this is not quite discoverable. I'll push to get that fixed on our end. We'll also include a brief overview of our motivation for the change.

I have not heard a good explanation of foundationdb's consistency model. The claims have always seemed magical.

Here’s an explanation from the lead developer on the team at Apple who has been working on FoundationDB for a very long time: https://youtu.be/EMwhsGsxfPU

Can we actually backup the database with the community edition yet?

Last I checked this was impossible.

Backups are still an enterprise feature, but doing a dump is still supported in the non-paid version.
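A sketch of that dump-based approach in the core (free) version, as of the 19.x CLI (database name and paths are placeholders):

```shell
# Write a logical dump (schema + data as SQL statements) of one database:
cockroach dump mydb --insecure --host=localhost:26257 > mydb_backup.sql

# "Restore" later by replaying the SQL through the client:
cockroach sql --insecure --host=localhost:26257 --database=mydb < mydb_backup.sql
```

As the follow-up comments note, replaying SQL like this doesn't scale to large datasets the way the enterprise BACKUP/RESTORE does.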

Yeah, that is not a real solution for a database of any real size, which is the only reason to use CockroachDB in the first place.

I know. It’s a bummer.

I’ve not tried to do block level backups for our setup in production.

I have - it doesn't work sadly. :(

Can we lock a replica for writing (stop syncing) and do a disk-level snapshot of the disk that it's running on? As a way of doing backups?

Agree with the OP, no way of doing backups is kind of a dealbreaker even for hobby use :/

The problem is that no single node necessarily holds all of the cluster's data, so you couldn't, say, take one node offline and do a block-level backup of that node's storage.

Excuse me if that has been discussed numerous times, but I distinctly remember CockroachDB 2.0 coming out, and it being a big deal. Where are releases 3.0 to 19.0? Was the versioning scheme changed? How? Why?

Other than that, I really wish I could use CockroachDB at "$COMPANY". Our architects, unfortunately, deem it “too new” and “unproven”. Bah.

They switched to calendar-based versioning earlier this year with the 19.1 release, for reasons explained in this blog post: https://www.cockroachlabs.com/blog/calendar-versioning/

Thanks for the link!

> We wanted to find a solution that would minimize frustration (and time-consuming meetings) internally, while setting the correct expectations with users around quality and stability.

Honestly, this doesn't explain much. If anything, hearing “FooDB 19.1” makes me think of stuff like React 16 or Chrome 69. That is, of “hip dudes” who “live fast and bump major versions”. On the other hand, hearing “FooDB 2.16” would make me think “Yep, this thing seems stable as Perl or Linux”. The meetings point also doesn't explain anything. Go simply does a minor version bump every six months. Why couldn't Cockroach Labs just do that?

Oh well, who cares, really, as long as the product is great.

Pro-tip: if at all possible do not run this on top of Ceph. Bare metal SSDs or VMs with SSDs.

That's normal advice for any database or system that does its own data replication. Having multiple distributed storage layers never works out well.

From experience I take it? Was Ceph running on any of those?

Yes it is how we run it in production and sadly it’s not great.

On my own laptop I’ve found it to be maybe 85% or so as quick as a normal single-node Postgres. That said, things aren’t great when the storage layer is doing consensus and replication and then the database is as well.

Glad to see Cockroach Cloud, I signed up for the beta.

I'm only really interested in an elastic, managed program with transparent pricing. As a contractor with a fair amount of exposure to various startups around the bay, I find these to be relatively ubiquitous criteria.

CockroachCloud PM here -- we hear you on the transparent pricing. Cloud pricing is transparent, see here https://www.cockroachlabs.com/docs/cockroachcloud/stable/coc...

Glad to hear you signed up for the beta. We've been letting people in in batches and hope you can use CockroachCloud for your apps! If you have any feedback on the beta product, do let us know!

Is this the new version that has the new license terms that prevent amazon doing an "elastic" on them?

Waiting for the comments about the name.

On a more serious note, looking forward to trying this new release out. I found previous versions really straightforward to get up and running with on Kubernetes but performance was always lacklustre.

I respect the name at this point. It won't go away - just like a real Cockroach and a distributed fault tolerant database system.

You can find our more recent performance numbers on https://www.cockroachlabs.com/docs/v19.2/performance.html

Is that saying that you are comparing Cockroach running on 81 c5d.9xls vs Aurora running on 2 r3.8xls? I get that part of what you are trying to show is that Cockroach will scale far past what any single-master system can, but it feels pretty lame to run a test comparing transaction throughput on 2900 cores vs 64 cores and a data set size comparison on 81 hosts vs 2.

The sysbench metrics seem like a much fairer comparison, and CockroachDB looks great in those metrics as well, so I don't really get why you are leading with a comparison that looks really sketchy at first glance.

Hi- product manager from CRL here. You are right--we hope to demonstrate that CockroachDB is built to scale horizontally. Deployments of CockroachDB can grow by easily adding more nodes to the cluster which in turn linearly scales throughput. We have posted the most recent published Aurora numbers as a comparison to demonstrate how architecture can influence scale.

We also hear your point about efficiency as tpmC (throughput) alone isn't sufficient to compare systems without taking hardware into account. TPC-C asks users to provide a tpmC per dollar amount. We conducted this price comparison previously in this blog post https://www.cockroachlabs.com/blog/cockroachdb-2dot1-perform.... These results are even lower in 19.2 because we can achieve greater tpmC with fewer nodes.

This page wasn't meant to be competitive; we simply showed Aurora as a reference point. Since we want to focus on CockroachDB, we will remove the Aurora comparison.

Performance in what sense? Searching online for benchmarks gives some impressive numbers, specifically this: https://www.cockroachlabs.com/blog/cockroachdb-2dot1-perform....

Others have commented about how the language that CockroachDB was written in causes performance issues due to Go using a GC. Can't say that this is the main reason for sure, but I guess it could be a possibility.

Try it out on your stack and with your data and see.

How does CockroachDB compare to YugabyteDB?

CockroachDB is built from scratch in Go, uses Raft + RocksDB as a distributed key/value store, and then implements the Postgres data structures and wire protocol on top. Focuses on strong consistency and durability (which was the origin of its name). Can run active/active in multiple regions. Lacks full compatibility with Postgres.

Yugabyte is a multi-model database built in C++, uses Raft + RocksDB as a distributed key/value store, and then implements a proprietary document-store model on top called DocDB. This is exposed as a Redis API, a Cassandra API, and now a PostgreSQL RDBMS interface that uses the actual Postgres code for the SQL layer. Much higher compatibility with Postgres, with all data types, most advanced SQL, and even some extensions that work without problems. Not as advanced on the distributed side and uses more of a primary/replica setup.

Both are great options for a distributed scalable RDBMS with Postgres.

Thanks for this overview. Is Cockroach an open-source version of Google's Spanner as well?

Not technically open-source (BSL-licensed), but yes, source available on https://github.com/cockroachdb/cockroach/.

They used to be completely open-source and have changed the license similar to Redis Labs, MongoDB, etc.

And yes, you can consider it as an alternative to Spanner with more usability and running on commodity hardware.

First google hit: https://www.yugabyte.com/yugabyte-db-vs-cockroachdb/ My understanding is that CDB is a bit more focused on correctness before focusing on performance.

Heh, they didn't seem to test their University website[1] before publishing it. There is a gaping:

  <!-- End Google Tag Manager -->
  Additionally, paste this code immediately after the opening <body> tag:
  <!-- Google Tag Manager (noscript) -->
In the source code. That “<body>” isn't escaped either, which makes the whole thing borked-up.

This is why we need XHTML.

[1] https://university.cockroachlabs.com/catalog

Thank you, we didn't get that resolved beforehand. Should be fixed now.

Wow, that was quick. Thank you :-)

Awesome to see a new release! We've followed Cockroach for a while; it would be a good fit for our workload.

However, I think it's sad that their pricing is so opaque.

Even after a few emails back and forth with their sales people it still feels fairly arbitrary, and since it's not transparent we don't know if they're just making numbers up based on what they think we're willing to pay.

I find this sales-tactic off-putting. In theory it allows them to "not leave any money on the table", as they would probably frame it, but I'm curious how many deals they will lose simply because people don't want to be held hostage, or don't want to put up with the sales-hassle.

With AWS and Google Cloud (their main competitors), we know that prices won't be jacked-up.


An interesting example of this mistake is Mapbox.

They had a limited free tier, where most use-cases would be "talk to sales for pricing". We had a frustrating 2 months talking with their sales people, finally getting a reasonable deal. But we were very close to just ditching them for Google Maps (even though I liked Mapbox product better and was willing to pay more for it), just because the price negotiation and all the cheap sales-tactics was so time-consuming.

Then, 6 months later, Mapbox CEO posted this blog post:


    [excerpt, from Mapbox CEO announcing pricing/sales rethink]
    How we price our tools has a significant impact on how those tools get used 
    and what gets built with Mapbox. We were pricing some of our APIs wrong — 
    making things confusing and restrictive. Rolling out new pricing took 
    close to five months; informed by the stories of our builders 
    whose curiosity and insight gave us a lot of honest feedback. 
    Here’s what we were doing wrong:
    - Unpredictable pricing at scale
    - Development slowed by price modeling and negotiating volumes
    - Unintuitive billing units
    - Hard to compare measurement to our friends at Google
    - Confusion around when commercial plans are needed

    Our goal with this change is to reduce the friction to build with 
    our tools and to allow our team to help developers use maps and locations
    more creatively. As we designed this new pricing, we kept our key 
    pricing principles in mind:
    - Predictable and aligned with metrics our customers already measure
    - Clearly defined discount tiers as businesses scale with no surprises
    - Product usage measured in a way that’s clear to all involved
    - Generous free tier to encourage building and make it easier to get started
I experienced all those downsides, which almost pushed us to leave Mapbox, so very happy to see them reconsider (for their own sake).

Perhaps CockroachDB could unlock similar benefits by making their pricing/sales more open.

This is so true, I had the exact same experience last year (back then they said it was something like 700 USD per core yearly for the startup plan), decided to table the offer and use their dump backups until I was bigger. Then they released regional replicas as an enterprise feature, and I realized their core offering was complete and all the new cool stuff was going into their paid enterprise offering. Then they became proprietary and lost me completely. If I ever need distributed read replicas I can do it myself with Postgres or just use AWS RDS. My money is on Yugabyte now.

Agree. My current firm has a policy of not working with companies that don't post public prices, which sadly writes off CockroachDB for us. It's a shame because we have a major use case for it.

CockroachCloud PM here. CockroachCloud pricing is transparent. You have the option of three different node sizes in GCP and AWS: https://www.cockroachlabs.com/docs/cockroachcloud/stable/coc...

Somewhat related but why not have an object store like ceph but expose the guarantees of the data (cap or whatever) and just interact with objects? Bonus points for making the API pluggable.

Anyone use the cross data center clustering in the real world? Got a bunch of people considering this - a cluster spanning two data centers as the way to do transactional updates.

CockroachDB seems to be getting a lot of attention and marketing cycles lately... What has made it the focus of late?

Because we blogged about our new vectorized sql engine[1] & our new parallel commit protocol[2] recently, two "big"-ish engineering lifts that were packed into this one release.

[1]: https://www.cockroachlabs.com/blog/how-we-built-a-vectorized...

[2]: https://www.cockroachlabs.com/blog/parallel-commits

It has a lot of promise to provide Spanner-like functionality to people outside of Google.


Probably has to do with their decision to name it after the cockroach. I mean, if they had called it MaggotDB or BotFlyDB or TapewormDB, people might find it unappealing.

Those aren't nearly as resilient though. Now, if you made a version of sqlite that replicated itself inside your program then you could use botfly ...

The successor will have to be named TardigradeDB.

You can't even kill it with Boring Company's flamethrower! Tardigrades are virtually indestructible as going into space and coming back won't always kill them.

Love the product name!
