Ask HN: What Is the Value of CockroachDB?
47 points by lokiju 15 days ago | 29 comments
I don't mean this sarcastically. I just keep hearing how great and distributed this technology is, but I can't really see a use case for it in today's enterprises.

Is there something I am missing? Why would someone simply switch their backend databases to something that does not even offer data warehousing or other data management tools?

Disclaimer: I am from PingCAP, the company that built TiDB.

CockroachDB is well suited for applications that require reliable, available, and correct data with millisecond response times, regardless of scale. It belongs to the NewSQL category, which promises to combine the benefits of an RDBMS (strong consistency) with those of NoSQL (scalability), mainly through new architectural patterns and efficient SQL storage engines. TiDB and CockroachDB are two of the leading NewSQL databases, and each has its own take on how to ensure strong consistency with a scalable architecture.

Here are some signs that a switch may be worth considering:

1. Your RDBMS is becoming the performance bottleneck of your backend service.

2. The amount of data stored in the database is overwhelming.

3. You want to run complex queries on large amounts of data that cannot fit on one machine without manual sharding.

4. Your application needs full ACID transactions for data distributed across multiple machines.

There are multiple NewSQL choices, too, and engineers often compare TiDB with CockroachDB. Here are some considerations before you decide.

TiDB is compatible with the MySQL protocol, while CockroachDB is compatible with PostgreSQL. You can connect directly to a TiDB server using any MySQL client and benefit from the MySQL ecosystem. Plus, TiDB is widely adopted and trusted by 1000+ large-scale Internet companies and banks.

Currently, CockroachDB is not suitable for Online Analytical Processing (OLAP), while TiDB is a Hybrid Transactional and Analytical Processing (HTAP) database that supports both OLTP (Online Transactional Processing) and OLAP workloads. So with TiDB, a typical ETL (Extract, Transform, Load) process that moves data to a different database for analysis is no longer required, making it easier and faster to create new value for your users.

Hope the above can help you a bit.

Thanks, very helpful.

I have a feeling these technologies will be much more mainstream in 2-3 years

No problem at all:)

Indeed! With everything going on right now, digitalisation has accelerated and datasets are growing to large scale, so we see huge demand for services like this.

It's relational, it scales horizontally, and it's free. That combination is surprisingly hard to find.

i'm not a database person, but does postgres not scale horizontally?

No. Read replication approaches linear horizontal scaling (though global latencies can lead to weird edge cases).

Write replication requires external products.

Specifically, Postgres extensions that add write sharding are generally either (a) non-FOSS commercial products or (b) not very well maintained.
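For a sense of what doing write sharding yourself involves, here is a minimal sketch of application-side hash routing across several independent Postgres primaries. The shard names and the `shard_for` / `moved_keys` helpers are hypothetical, and no real database is contacted; the point is only that placement, and any resharding when you add a shard, becomes application code.

```python
import hashlib

# Hypothetical shard identifiers; in practice each would be a separate
# Postgres primary that the application writes to directly.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def moved_keys(keys: list[str], old: list[str], new: list[str]) -> int:
    """Count keys whose shard changes when the shard list changes."""
    return sum(1 for k in keys if shard_for(k, old) != shard_for(k, new))


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    print(shard_for("user-42"))
    # Adding a fifth shard under modulo placement relocates most keys,
    # and the application has to migrate those rows itself.
    print(moved_keys(users, SHARDS, SHARDS + ["pg-shard-4"]))
```

With simple modulo placement, going from four to five shards relocates roughly 80% of keys, which is one reason resharding by hand is painful.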

From my experience, CockroachDB is super simple to set up. You can pretty much follow their tutorial and get a working cluster in no time. So I would say it's a good solution for non-experts. Now if you are already running Postgres, then I don't know why you would want to switch.

The locality features are huge. Being able to do database operations near where your users are is a big win for latency in global environments.

Pairing that with edge compute like fly.io is a killer combination. The fact that you can use most of your normal Postgres libraries with it makes it an easy transition.

I see, thanks. The uses for locality seem a little niche, though, no? Are the benefits so much greater that users would notice?

You can see hundreds of ms of latency per request based purely on distance.

Tokyo - NYC is ~175 ms, and that’s a good route. Multiply that by all the requests your app makes.
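A back-of-the-envelope version of that compounding, treating the quoted ~175 ms as a round-trip time and assuming the app issues its queries one after another (no batching or pipelining). The 5 ms "nearby" figure is a hypothetical comparison point, not a measurement.

```python
FAR_RTT_MS = 175.0  # Tokyo <-> NYC figure quoted above, assumed to be round-trip
NEAR_RTT_MS = 5.0   # hypothetical in-region round trip, for comparison only


def db_wait_ms(sequential_queries: int, rtt_ms: float) -> float:
    """Network time spent waiting when each query pays one full round trip."""
    return sequential_queries * rtt_ms


if __name__ == "__main__":
    for n in (1, 5, 10):
        far = db_wait_ms(n, FAR_RTT_MS)
        near = db_wait_ms(n, NEAR_RTT_MS)
        print(f"{n:>2} queries: {far:.0f} ms remote vs {near:.0f} ms local")
```

Ten sequential queries already add up to well over a second of pure network wait at that distance, which is the gap locality features try to close.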

“Shard your Postgres DB without the typical devops headaches.”

Keep in mind, this is the only part of that elephant I've interacted with. The key things for us were that it was transactional and had cross-data-center replication. You could update a node in data center A and have it replicate to the other data center with minimal latency.

NOTE: I am on the product marketing team at CockroachLabs.

Cockroach Labs has been building CockroachDB for over five years and has reached a level of maturity where many enterprise and community customers are gaining value from the database.

We consider it a cloud-native database because it was architected on the same principles as a distributed platform like Kubernetes. It is built for scale and resilience, is shared-nothing, and runs as a single binary on every node with a consistent API. We have also gone to great lengths to make it wire-compatible with PostgreSQL, and it implements a large portion of standard SQL syntax. It is used as a relational database.

The key points we typically talk about fall into four areas, which we lovingly refer to as CRLS (an acronym for Cockroach Labs):

Consistency - the database implements serializable isolation across distributed nodes within a cluster, with acceptable latencies in both local and globally dispersed deployments.

Resilience - the database replicates data across nodes in the cluster, and you can configure survivability at the table level so that data withstands the loss of a node, a zone, or even a region. It is an active-active system that is always on, and data is always available. Further, you can perform online schema changes and rolling upgrades in production without downtime.
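
To make the survivability point concrete: CockroachDB replicates each range using the Raft consensus protocol, so a range stays available only while a majority of its replicas is reachable. The sketch below is just that majority arithmetic. `max_failures_tolerated`, `survives`, and the replica layouts are hypothetical illustrations, and the sketch ignores how the database actually chooses where replicas live.

```python
def max_failures_tolerated(replicas: int) -> int:
    """With majority quorums, a range stays available while a strict
    majority of its replicas is reachable, so n replicas tolerate
    floor((n - 1) / 2) simultaneous replica losses."""
    return (replicas - 1) // 2


def survives(replicas_per_domain: list[int], failed_domain: int) -> bool:
    """True if losing one entire failure domain (a node, zone, or region,
    given as an index into replicas_per_domain) still leaves a majority."""
    total = sum(replicas_per_domain)
    remaining = total - replicas_per_domain[failed_domain]
    return remaining > total // 2


if __name__ == "__main__":
    print(max_failures_tolerated(3), max_failures_tolerated(5))
    # 3 replicas spread over 3 zones: any single zone can be lost.
    print(all(survives([1, 1, 1], z) for z in range(3)))
    # 3 replicas over only 2 regions: losing the region holding 2 is fatal.
    print(survives([2, 1], 0))
    # 5 replicas spread 2/2/1 over 3 regions: any single region can be lost.
    print(all(survives([2, 2, 1], r) for r in range(3)))
```

The takeaway is that surviving a whole zone or region depends on both the replica count and how the replicas are spread across failure domains.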

Scale - scaling the database is accomplished by simply spinning up a node and pointing it at the cluster. The database takes care of redistributing replicas to incorporate the new node. This is basically auto-sharding, and it adheres to the survivability policy described above.
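
A toy illustration of "add a node and the database redistributes replicas". This greedy loop is an assumption made for illustration, not CockroachDB's actual allocator: it moves one replica at a time from the most-loaded node to the least-loaded one until replica counts differ by at most one, and never places two replicas of the same range on the same node. `replica_counts` and `rebalance` are hypothetical names.

```python
from collections import Counter


def replica_counts(placement: dict[int, set[str]], nodes: list[str]) -> Counter:
    """Number of replicas held by each node (nodes with none count as 0)."""
    counts = Counter({n: 0 for n in nodes})
    for replicas in placement.values():
        counts.update(replicas)
    return counts


def rebalance(placement: dict[int, set[str]], nodes: list[str]) -> dict[int, set[str]]:
    """Greedily move replicas from the busiest node to the idlest one."""
    while True:
        counts = replica_counts(placement, nodes)
        most = max(nodes, key=lambda n: counts[n])
        least = min(nodes, key=lambda n: counts[n])
        if counts[most] - counts[least] <= 1:
            return placement
        # Find a range on the busy node that has no replica on the idle node yet.
        for replicas in placement.values():
            if most in replicas and least not in replicas:
                replicas.remove(most)
                replicas.add(least)
                break
        else:
            return placement  # no legal move left


if __name__ == "__main__":
    # Six ranges, replication factor 3, on a three-node cluster.
    placement = {r: {"n1", "n2", "n3"} for r in range(6)}
    nodes = ["n1", "n2", "n3", "n4"]  # a fourth node joins the cluster
    rebalance(placement, nodes)
    print(replica_counts(placement, nodes))
```

Each range keeps three replicas on three distinct nodes throughout; only their placement changes, which is the sense in which adding a node is "basically auto-sharding".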

Locality - unique to CockroachDB is the ability to tie data to any particular node in the cluster. You define this at the table level. The database uses a key-value (KV) store at the storage layer to accomplish this: each table is represented as ordered KV pairs, with the table's PK as the key. You work the location into the key, and the distribution policy ensures data is written to explicit nodes. This is used to ensure low-latency access to data and/or to tie data to a location for compliance and privacy requirements. With online schema changes, you can modify the PK in production and rearrange where data is physically stored in a live cluster.
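
A toy model of the locality idea described above: put a location column first in an ordered primary key, and let a placement policy map each location's key span to a node. This is not CockroachDB's real key encoding or its zone-configuration syntax; plain Python tuples stand in for the ordered byte-encoded keys, and `encode_key`, `home_node`, and `REGION_HOME` are hypothetical names.

```python
# Hypothetical placement policy: which node owns each region's key span.
REGION_HOME = {"eu-west": "node-dublin", "us-east": "node-virginia"}


def encode_key(table: str, region: str, user_id: int) -> tuple[str, str, int]:
    """Order-preserving key: table, then the location column, then the id.
    Because region comes first in the primary key, each region's rows
    form one contiguous span of the sorted keyspace."""
    return (table, region, user_id)


def home_node(key: tuple[str, str, int]) -> str:
    """Route a key to the node assigned to its region prefix."""
    _table, region, _user_id = key
    return REGION_HOME[region]


if __name__ == "__main__":
    keys = sorted(
        encode_key("users", region, user_id)
        for user_id, region in [(1, "us-east"), (2, "eu-west"),
                                (3, "us-east"), (4, "eu-west")]
    )
    for key in keys:
        print(key, "->", home_node(key))
```

Sorting groups the rows by region, so a whole region's data can be pinned to one place; changing the PK to move the location column is what rearranges that physical layout.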

There are numerous other capabilities in the database. We have a complete UI, distributed backup/restore, and change data capture, and we have implemented a cost-based optimizer that uses locality.

It is a relational database that you can deploy as a service on Kubernetes and/or in any cloud.

There are countless more capabilities and we invite you to check out one of our features we are most proud of… our documentation!

Also, please check out our customer page. Yes, it's high level, but there are some stories there.

Thanks for the info, very helpful. Are there any specific use cases you can point to? I don't doubt that the tech is robust, just that the use cases aren't too obvious right now.

There are quite a few case studies / customer stories on the website: https://www.cockroachlabs.com/customers/

It's an OLTP database, not a warehouse.

Every time CockcroachDB has been on HN, the conversation has always been about its name ("what a shitty name for an enterprise software" vs. "who cares if it works well"), but I have also been wondering about the same question.

Are they getting adoption? And what's their unique value proposition?

> "what a shitty name for an enterprise software" vs. "who cares if it works well")

I think you mean “who cares, if it works well?”. Or, if you don't want that much disambiguating load being held up by a comma, “if it works well, who cares?”

I think the name is probably hurting adoption.

The creature it refers to is pretty universally considered repulsive. I'm no psychologist or anything, but I would expect that to have an undue influence on even the most rational person.

I don't know anything about it, but my first impression is that it's as robust as a cockroach, able to take a lot of damage.

The main metaphor I'm familiar with is that cockroaches always come back even as you try to kill more and more of them, rather like zombies, and never in a positive way.

i'm pretty sure that's what they're going for, except that the rest of the world doesn't think so.

It will definitely not help any CIO or anyone in an organization who tries to lobby for their org to adopt cockroach db.

I would love to see more of the common mistypes of the product name.

how about PeriplanetaDB?

what "data warehousing"/"data management" tools are missing?

In the case of OLAP queries, it's performance[1], which is not surprising since it's an OLTP database AFAIK.

I don't know what he means by "data management"


What I mean is (and I get that these are somewhat different technologies) that Snowflake seems to offer a more comprehensive data platform, aggregating data and focusing the product on that aggregation for future use.

Compared to CockroachDB, which seems like a niche SQL replacement?

As a DBA, I would not use a database product without 5 years of reported production success. Additionally, most companies don't need yet another product to do training and support on.
