
CockroachDB 2.0 released - nate_stewart
https://www.cockroachlabs.com/blog/cockroachdb-2-0-release/
======
caleblloyd
I am genuinely interested in using CockroachDB as a primary datastore but feel
like I have been burned too many times by hopping on board with a young
database.

I tried Lucene-based databases that offered amazing search capability but were
riddled with data corruption issues. Then there was RethinkDB which was very
promising but ran out of funding.

I am skeptical that a networked database with multiple nodes can match the
performance of a single master database such as MySQL, PostgreSQL, or SQL
Server. I did a quick benchmark of CockroachDB 1.x and MySQL last year and
found that CockroachDB was 5-10x slower on simple CRUD queries:
[https://github.com/caleblloyd/MySqlCockroachBench/wiki/Concu...](https://github.com/caleblloyd/MySqlCockroachBench/wiki/Concurrency-Benchmark-Results)
Are there any good independent benchmarks of performance?

According to crunchbase, Cockroach Labs has raised $53.5m (RethinkDB had only
raised $12m). Is there evidence that Cockroach Labs is on track to make money
and is going to survive?

If the company is healthy and the performance is verified as close to a major
RDBMS, I would be comfortable trying it out as a primary datastore. If those
questions can't be definitively answered right now, I'll probably continue to
wait for the company and the tech to mature.

~~~
darksaints
> I am skeptical that a networked database with multiple nodes can match the
> performance of a single master database such as MySQL, PostgreSQL, or SQL
> Server.

It likely can't unless there is some black magic going on. Single node speed
will always be faster. But once you get to that point where a single node
chokes on the amount of data or query throughput that you have, you don't
really have a choice anymore.

My personal plan is to start with Postgres RDS. Grow until RDS doesn't work
anymore, then move to ultra beefy bare metal servers in colocation with AWS
Direct Connect. If I ever outgrow a 4x24-core server with 3TB of memory on a
RAIDed NVMe disk cluster, I might move to Citus.

For the various distributed database companies out there, I believe the one
that will win in the marketplace is the one working or partnering to develop
specialized hardware and networking, and then optimizing for it.

~~~
p0rkbelly
Why not bare metal on AWS? Going to DX is going to give you a conservative
10ms on your calls...

~~~
darksaints
That might be a good intermediate option, but they still don't have instances
with 4x24 cpu, or >488GB ram.

~~~
openasocket
Did you look at the x1 options? Up to 128 CPUs and 3,904GB of RAM on the
x1e.32xlarge.

~~~
darksaints
Those aren't bare metal. And in terms of database scalability, I would jump
for bare metal before beefier servers, due to the fact that the disk is local
and you don't have to deal with EBS noisy neighbors.

~~~
openasocket
The x1e.32xlarge has 4TB of SSD in addition to dedicated EBS bandwidth.

~~~
saosebastiao
I haven't tried it, but IIRC that's only network bandwidth to the block store.
Better than the alternative for sure, but I would assume disk access is still
noisy unless you have a dedicated drive.

~~~
welder
The max IOPS for provisioned EBS is less than 10% the max IOPS of dedicated
hardware SSDs.

------
pat2man
I think the combination of Cockroach and Kubernetes is going to be a game
changer. Modern orchestration tools really are still struggling with single
points of failure and Cockroach really shines there. The fact that both tools
are maturing at around the same time is a real win for Cockroach. Additionally,
its use of certificate authentication just fits in with the rest of
Kubernetes.

~~~
erikpukinskis
How does Kubernetes help with failover?

~~~
lilbobbytables
It's distributed, with no single point of failure. The machines or VMs that
manage the cluster should be separate from those running the containers.

Ideally you set up 3 master nodes, with a distributed etcd cluster for state,
and enough machines to run your services with replication.
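
The reason for an odd number like 3 is quorum arithmetic: etcd uses Raft, which needs a majority of members alive to make progress. A small sketch of that arithmetic (the function names are made up for illustration):

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


# Print cluster size, quorum size, and tolerated failures for small clusters.
for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Going from 3 to 4 members adds no extra fault tolerance, which is why 3 or 5 are the usual choices.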

All of which is not bad at all to get started with if you use something like
Kops.

Some useful docs:
[https://kubernetes.io/docs/admin/high-availability/building/](https://kubernetes.io/docs/admin/high-availability/building/)

------
michaelmior
Geo-partitioning seems to be the killer feature here especially with GDPR
looming. Perhaps I'm wrong, but I'm not aware of any other DB which makes
physically locating records in a particular region so easy.
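
For anyone unfamiliar with the idea, here is a toy model of what geo-partitioning amounts to: rows are grouped into partitions by a column value, and each partition is pinned to a region. This is only a conceptual sketch, not CockroachDB's API or syntax; the partition names, country codes, region names, and the fallback region are all made up.

```python
# Hypothetical partition definitions: partition name -> country codes it covers.
PARTITIONS = {
    "europe": {"DE", "FR", "NL"},
    "north_america": {"US", "CA"},
}

# Hypothetical placement rules: partition name -> region its replicas must live in.
PLACEMENT = {
    "europe": "eu-west",
    "north_america": "us-east",
}

DEFAULT_REGION = "us-east"  # assumed fallback for rows matching no partition


def region_for_row(row):
    """Return the region where a row would be stored, based on its country column."""
    for name, countries in PARTITIONS.items():
        if row["country"] in countries:
            return PLACEMENT[name]
    return DEFAULT_REGION


print(region_for_row({"id": 1, "country": "DE"}))
```

The appeal for GDPR-style requirements is that placement follows from the data itself (here, the `country` column) rather than from application-level routing.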

~~~
synthmeat
You can create geo-zones for your MongoDB shards. [1] It has pretty powerful
geo-spatial queries too, but that's unrelated.

[1] [https://docs.mongodb.com/manual/tutorial/sharding-segmenting...](https://docs.mongodb.com/manual/tutorial/sharding-segmenting-data-by-location/)

------
udia
I find it very interesting that although CockroachDB itself is stable and
production ready, all of the client drivers are still 'beta'.

[https://www.cockroachlabs.com/docs/stable/install-client-dri...](https://www.cockroachlabs.com/docs/stable/install-client-drivers.html)

That being said, I am really excited to try this out. 2.0 adding JSON support
is what converted me to try this out for a project.

~~~
manigandham
It uses the PostgreSQL interface, so any Postgres-compatible client should
work. The status just refers to esoteric features that may not be supported.
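
As a hedged illustration: since the interface is PostgreSQL's, a standard `postgresql://` connection URL is typically all a driver needs. The sketch below builds one with the Python standard library only. The host, user, and database names are placeholders; 26257 is CockroachDB's default SQL port, and `sslmode=disable` assumes a local insecure test cluster.

```python
from urllib.parse import quote, urlencode


def cockroach_url(host, database, user="root", port=26257, sslmode="disable"):
    """Build a PostgreSQL-style connection URL for a CockroachDB node.

    All argument values are illustrative; adjust them for a real cluster.
    """
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(database)}?{query}"


url = cockroach_url("localhost", "bank")
print(url)
```

A URL like this can then be handed to a PostgreSQL client library such as psycopg2, or to `psql`; whether every driver feature works is a separate question, as noted above.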

------
bit_4l
Congrats on the great advancement. It's also a coincidence that TablePlus just
started to support CockroachDB yesterday. In case someone needs a good GUI:
[https://github.com/TablePlus/TablePlus/issues/374](https://github.com/TablePlus/TablePlus/issues/374)

------
jinqueeny
Congrats on a major achievement, CRDB! It’s been really hard-core and cool
stuff! Super excited to see that one more use case is coming in, the one with
the Blockchain! In case someone is interested, TiDB released a case study with
Mobike (with a daily data growth at ~30TB) just one day before:
[https://www.pingcap.com/blog/Use-Case-TiDB-in-Mobike/](https://www.pingcap.com/blog/Use-Case-TiDB-in-Mobike/)

------
bogomipz
Tangentially related - I would be interested in hearing anyone's experience so
far running CockroachDB 1.x in production.

------
lacker
Question on the efficiency of inverted indices for Cockroach json tables - if
you do a query for identity on two of the fields, like searching for all json
documents containing `{a: 1, b: 2}`, how efficient is that? Will it
essentially only use one of the indices on a and b, or will it work like a sql
multi column btree index?

~~~
foldU
Good question! In 2.0 we only use one of the fields for the index lookup and
then use the rest as a filter. For 2.1 we’re planning on augmenting our
execution engine to enable making use of all the fields in a more efficient
way. We have an RFC[0] describing the way our inverted indexes are designed,
though not everything outlined in that document is implemented yet.

[0]:
[https://github.com/cockroachdb/cockroach/blob/master/docs/RF...](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171020_inverted_indexes.md)
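
To illustrate the 2.0 behavior described above (one field drives the index lookup, the rest are applied as a filter), here is a toy Python model. The `InvertedIndex` class and its methods are hypothetical and are not CockroachDB's implementation; the sketch only shows the lookup-then-filter shape on flat documents.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index over flat JSON-like dicts (hypothetical, not CockroachDB code)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> document
        self.postings = defaultdict(set)  # (key, value) -> set of doc_ids

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for key, value in doc.items():
            self.postings[(key, value)].add(doc_id)

    def contains(self, query):
        """Return sorted ids of documents containing every key/value pair in query."""
        if not query:
            return sorted(self.docs)
        # Step 1: use only ONE field of the query for the index lookup.
        first_pair = next(iter(query.items()))
        candidates = self.postings.get(first_pair, set())
        # Step 2: apply all query fields as a filter on the candidates.
        return sorted(
            doc_id
            for doc_id in candidates
            if all(k in self.docs[doc_id] and self.docs[doc_id][k] == v
                   for k, v in query.items())
        )


index = InvertedIndex()
index.insert(1, {"a": 1, "b": 2})
index.insert(2, {"a": 1, "b": 3})
index.insert(3, {"a": 2, "b": 2})
result = index.contains({"a": 1, "b": 2})
print(result)
```

In this model the cost depends on how many documents match the first field alone. An executor that intersected the posting lists for both fields would avoid scanning those extra candidates, which is roughly the kind of improvement a multi-field lookup would bring.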

------
notheguyouthink
Here's a different take on databases: which is easiest to manage for a shop
with minimal DB knowledge?

That seems like a scary situation... and you're correct. It doesn't change
the reality though, heh.

So how does Cockroach compare to Postgres, or Maria, TiDB, etc on ease of
management? Any thoughts?

~~~
manigandham
If you want rock-solid stability? Just use Postgres. Decades of production
use, experts easily available, and scales very high on a single node.
Replication and backup are solved and there are lots of extensions for more
functionality. You can also try the hosted options which all cloud vendors
have, or I recommend Aiven.

Don't use any distributed relational database unless you actually have a need
for it, like horizontal scalability for massive data, multi-regional access,
or 100% uptime guarantees.

~~~
notheguyouthink
Appreciate the insight

------
didibus
Can someone explain the inner workings and thus tradeoffs it makes? And how it
would impact when and how you'd want to use it?

~~~
tshannon
The FAQ on the website is a good place to start:

[https://www.cockroachlabs.com/docs/stable/frequently-asked-q...](https://www.cockroachlabs.com/docs/stable/frequently-asked-questions.html)

When is CockroachDB a good choice? CockroachDB is well suited for applications
that require reliable, available, and correct data regardless of scale. It is
built to automatically replicate, rebalance, and recover with minimal
configuration and operational overhead. Specific use cases include:

Distributed or replicated OLTP; multi-datacenter deployments; multi-region
deployments; cloud migrations; cloud-native infrastructure initiatives.

When is CockroachDB not a good choice? CockroachDB is not a good choice when
very low latency reads and writes are critical; use an in-memory database
instead.

Also, CockroachDB is not yet suitable for:

Heavy analytics / OLAP

~~~
didibus
I guess I was hoping for the engineering expert FAQ.

What level of reliability, how, what tradeoff in my table design will I need
to make, same thing for availability, correctness and scale.

None of the info in this FAQ allows me to know that CockroachDB is the right
choice for me against the competition which advertises the same generic DB
marketing terms.

Edit: And yes, I can go and read the documentation and deep dive into the
internals myself, but I don't care enough to do so, because I already have DBs
that fulfill those use cases that I know of, and I would hope therefore that
CockroachDB would make it very quick, easy and in my face to find the info
that will make me go: Ah Ha, this is the distinguishing factor and the reason
why I might want to favor and care to use CockroachDB the next time I've got
such a use case.

------
truth_seeker
Is the community edition build capable of multi-DC?

~~~
pat2man
Yes, but for multi DC you will probably want many of the tools in the
enterprise edition:
[https://www.cockroachlabs.com/pricing/](https://www.cockroachlabs.com/pricing/)

~~~
teraflop
There's a noticeable lack of pricing on that "pricing" page...

------
irfansharif
For those in the area, the CockroachDB folk are hosting a meetup demoing some
of the new 2.0 niceties:
[https://twitter.com/CockroachDB/status/978701442050592768?s=...](https://twitter.com/CockroachDB/status/978701442050592768?s=20)

------
kevindqc
I thought the name was weird and found this:

The name was chosen in 2012, two years before the open source project was
started. I had just gotten done with an exhausting and ultimately frustrating
survey of OSS database products for the backend of a new private photo sharing
service called Viewfinder. I'd tried and found wanting MySQL, Postgres, AWS
SimpleDB, Hbase, Cassandra, and Riak.

I was annoyed. Why wasn't there a scalable, survivable, consistent database
with transactions? I was even willing to drop transactions as a requirement –
a terrible sacrifice. The frustration led me to write a manifesto. What would
the "right" database look like?

I imagined it would be composed of symmetric nodes, require no external
dependencies, spread itself naturally across availability zones for survival.
Each node would autonomously replicate and repair data. These were the
capabilities that led me to the name "cockroach", because they'll colonize the
available resources and are nearly impossible to kill.

\- Spencer Kimball

~~~
hashkb
Can mods just delete comments about the name of this project? It's getting
absurd.

~~~
rjpr
One could argue that if the creators didn't want a constant discussion around
the name every time they tried to promote it, they would have chosen a better
name.

I don't think we should be censoring a discussion just because you've already
had that particular discussion.

~~~
Gigablah
I think it’s a brilliant decision. It serves as bait to separate the vapid,
superficial commenters from those who are actually interested in the
technology.

~~~
menacingly
I'd bet good folding money that they change it at a certain growth point, and
I'll take it as a good sign when they do.

~~~
erikpukinskis
How much you want to bet? I’ll take 50/50 odds.

------
doall
Is there any good abbreviation or a nickname for this DB?

Initially I thought I could get used to it, but after many years of watching
on HN I haven't succeeded yet.

Perhaps you cut the word in half, but it just creates another two strong words
and shows the survival power of the word :(

Calling it CDB doesn't click for me either.

~~~
loiselleatwork
We call it CRDB internally

------
mkhalil
Cue the comments about its name...

Congratulations to the CockroachDB team. I have been using this on a few
projects. I find the ease of scalability and redundancy very nice.

Also, the ability to use SQL commands on a transactional DB is REALLY helpful.

Keep it up.

------
MockObject
I wouldn't use it because I just don't want to see and write its name daily.

~~~
shepardrtc
People are downvoting you, but there's something to be said about it. What
happens if you try to pitch CockroachDB to your non-techie upper management?
They're going to look at you like you're an idiot. Names aren't important
unless they're bad because first impressions do matter.

~~~
is_true
I think the opposite. The name makes it easier to describe what it does to
people who don't understand the technical vocabulary.

~~~
hunterjrj
In this case, with the connotations that the name "cockroach" is associated
with, you'd be wrong:

[https://www.huffingtonpost.com/molly-reynolds/why-brand-name...](https://www.huffingtonpost.com/molly-reynolds/why-brand-names-are-so-im_b_11930994.html)

~~~
megaman22
The only connotations I've got with cockroaches are resiliency, and maybe
hissing?

~~~
rhencke
I guess that does explain the custom support for

    
    
        PRAGMA hissing_noise();

------
davidy123
I think names are important. CockroachDB and GunDB both chose names that will
hurt their acceptance. I often wonder how intentional it is for open source
projects to choose strange names (GIMP, etc). Is it a defense against having
to fit in with "mainstream" perspectives?

Also, I wonder why Apache doesn't change their name. It would be a good
opportunity for rebranding.

------
misterbowfinger
Reposting this to see if anyone could comment (last time, I promise):

So. For me, personally, I don't care about the name. I generally care that
it's great tech, and it clearly has a great team behind it. However....

If I worked at CockroachDB, and I saw the negative feedback around the name,
I'd take it to heart. At the end of the day, the name is marketing for the
hard work of their engineers, and marketing for the engineers that want to use
this DB (remember, they need to sell it to their managers who may not be
technical).

This issue can show up in unexpected ways. For example, for cloud providers
like Compose (IBM company), would they be comfortable with putting
"CockroachDB" on the front page? They might if it's good enough, but it's at
least a consideration (i.e. another meeting, another stakeholder to convince).

Or how about an enterprise company that's going through due diligence, and
when their client asks them about their tech stack do they say "CockroachDB"
or do they obfuscate the name by saying "It's a high-performance distributed
database". That's a crucial moment to market CockroachDB, and it could get
lost. As sad as it is, saying that you're using MySQL "because Oracle" is a
point of leverage for some sales people.

Is the name worth it? Asking honestly.

~~~
zzzcpan
> I saw the negative feedback around the name, I'd take it to heart.

You shouldn't. Marketing is not about your personal feelings or feedback on
your marketing. The Cockroach name is clearly superior to every other database
name: look how memorable it is and how much buzz it generates.

