
CockroachDB 1.0 - hepha1979
https://www.cockroachlabs.com/blog/cockroachdb-1-0-release/
======
dis-sys
I really like the fact that the CockroachDB team recently did a detailed
Jepsen test with Aphyr. The follow up articles from both CockroachDB and Aphyr
explaining the findings are very interesting to read. For those who might be
interested -

[https://www.cockroachlabs.com/blog/cockroachdb-beta-
passes-j...](https://www.cockroachlabs.com/blog/cockroachdb-beta-passes-
jepsen-testing/)

[https://jepsen.io/analyses/cockroachdb-
beta-20160829](https://jepsen.io/analyses/cockroachdb-beta-20160829)

~~~
dmix
> CockroachDB is a distributed, scale-out SQL database which relies on hybrid
> logical clocks

I was curious what "hybrid logical clocks" meant and found the linked paper a
bit over my head. I found this more layman description:

[http://muratbuffalo.blogspot.ca/2014/07/hybrid-logical-
clock...](http://muratbuffalo.blogspot.ca/2014/07/hybrid-logical-clocks.html)

Apparently Google used GPS/atomic clocks to keep time synced:

>> To alleviate the problems of large ε, Google's TrueTime (TT) employs
GPS/atomic clocks to achieve tight-synchronization (ε=6ms), however the cost
of adding the required support infrastructure can be prohibitive and ε=6ms is
still a non-negligible time.

And CockroachDB created more of a hybrid version that works on commodity
hardware.

Distributed systems programming sounds endlessly challenging as you are always
balancing trade-offs.
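
For anyone else who found the paper heavy going, here is a minimal sketch of the hybrid logical clock update rules as described in the HLC paper. It is an illustration only, not CockroachDB's implementation; the class and method names are made up, and the physical clock is injected as a plain function so that skew is easy to simulate.

```python
from typing import Callable, Tuple

# A timestamp is (wall-clock component, logical counter); tuples compare in that order.
Timestamp = Tuple[int, int]


class HybridLogicalClock:
    """Illustrative HLC; names are hypothetical, not taken from any real library."""

    def __init__(self, physical_clock: Callable[[], int]):
        self._now = physical_clock  # returns the local physical time as an integer
        self.l = 0                  # highest physical time seen so far
        self.c = 0                  # logical counter that breaks ties within the same l

    def tick(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._now())
        # If physical time did not advance past l, bump the counter instead.
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        remote_l, remote_c = remote
        prev = self.l
        self.l = max(prev, remote_l, self._now())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

The point of the counter is that a node whose physical clock lags behind still hands out timestamps that are ordered after any message it has already received.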

~~~
irfansharif
You might find our post[1] on atomic clocks, or rather on having to do without them,
partially interesting.

[1]: [https://www.cockroachlabs.com/blog/living-without-atomic-
clo...](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/)

~~~
socmag
Hey guys, I'm a fellow developer of distributed systems here.

First of all I think what you are doing is great.

My question is what's the point of clocks at all? The current time is a very
subjective matter and I'm sure you know this, the only real time is at the
point when the cluster receives the request to commit. Anything else should be
considered hearsay.

Specifically the time source of any client is totally meaningless since as you
say further in the discussion that client machine times can be off by huge
margins.

If you accept that then one has to accept the fact that individual machines
within the cluster itself are prone to drift too, although one can attempt to
correct for that I appreciate.

Wouldn't you think though that what is more important is that the order is
more based on the bucketed time of arrival (with respect to the cluster).

I don't see how given network delays anyone can be totally sure A is prior to
B, atomic clocks or not.

What is important is first to commit.

[edit] Yes would love to talk privately about this topic @irfansharif

~~~
moe
_My question is what's the point of clocks at all?_

I would highly recommend reading the link by irfansharif. It's probably the
best primer ever written on the subject.

~~~
andy_ppp
Yes, I really enjoyed it!

------
Svenskunganka
Pardon the nature of my question, but I'm really interested in what your
experience has been so far building a database with Go? Has its runtime (the
GC for example) posed any issues for you so far? Looking at other RDBMS's,
languages with manual memory management like C or C++ seem to be the go-to
choice, so what were the reasons you chose Go?

I'm quite frankly amazed that Go's runtime is able to support a database with
such demanding capabilities as CockroachDB!

~~~
arjunnarayan
We have a post on why we chose Go, from a year and a half ago:
[https://www.cockroachlabs.com/blog/why-go-was-the-right-
choi...](https://www.cockroachlabs.com/blog/why-go-was-the-right-choice-for-
cockroachdb/)

More technically, here's a somewhat random set of thoughts on the subject:

The Go GC is performant and predictable, unlike the JVM GC. We do have some
very memory-allocation-conscious code patterns to minimize the performance
impact of working in a garbage-collected language runtime, but in the end it's
not as bad as you might expect if your expectations are coming from the JVM
world.

Library support is good. To quote our CEO, "Most of us on the team have done
extensive work with C++ and Java in the past. At Google, C++ was the standard
for building infrastructure and there are a lot of good reasons for that. It's
fast and predictable. It would be a good choice for Cockroach, except that in
the world outside of Google, in open source land, the supporting libraries for
C++ are either terrible, incredibly heavyweight, or non-existent. We didn't
want to rebuild everything which you take for granted at Google from scratch.
It turns out that Go has many of the necessary libraries, and they're
straightforward and very well written."

Basically, if Google's internal C++ libraries, tooling, style guides (and the
tooling to enforce them) were available externally, we might have gone with
C++.

Some of us are fans of Rust, but Rust sadly did not exist in a stable state
when CockroachDB started. I'm not sure we would pick Rust were we to start
today (tooling is still a concern there), but it would certainly be part of
the discussion.

The native support for concurrency in Go is a huge plus. We use thousands of
goroutines in CockroachDB, and that's been a huge blessing.

I can answer any more specific questions if you have them.

~~~
Others
Why do you think the Go GC is better than any of the JVM options? From what I've
seen, while the Go GC is well tuned for low latency, by picking the right JVM
GC parameters you can on balance get a better throughput/latency tradeoff. I'm
just wondering if you have any reliable benchmarks or evidence to support what
you're saying? I don't use either language for work, so I think you might have
better information than I.

~~~
bdarnell
I talk about this in the presentation I linked in another subthread
([https://www.cockroachlabs.com/community/tech-
talks/challenge...](https://www.cockroachlabs.com/community/tech-
talks/challenges-writing-massive-complex-go-application/)). The key to getting
good performance out of any GC is to generate as little garbage as possible,
and in our experience Go's better use of stack allocation and value types
keeps many objects out of the garbage-collected heap. We've found that
idiomatic Go programs tend to produce less garbage than similar Java programs,
and in the presentation I discuss some tricks we use to get that even lower in
critical paths. Admittedly, we're not JVM tuning wizards so maybe there's more
that could have been done on the JVM side.

------
rantanplan
In an era where hot air and hip DB technologies prevail, I'd like to emphasize
the fact that the CockroachDB engineers are consistently honest and down to
earth, in all relevant HN posts.

This builds up my confidence in their tech, so much so that even though I had
no real reason to try this new DB, I'm gonna find one! :D

~~~
nicwagenaar
Exactly! The confidence that the devs inspire by taking the time to explain
the choices behind the tech, makes me want to find a project to test it out
on.

------
wmfiv
Are there published benchmarks for multi-key operations and more complex
SELECT statements? I apologize if I missed them.

I'm trying to determine whether there's a place for Cockroach within what I
think are the constraints in the database space.

* Traditional SQL Databases
  - Go-to solution for every project until proven otherwise.
  - Battle tested and unmatched features.
  - Hugely optimized with incredible single node performance.
  - Good replication and failover solutions.

* Cassandra
  - Solved massive data insert and retention.
  - Battle tested linear scalability to thousands of nodes.
  - Good per node performance.
  - Limited features.

It seems like many new databases tend to suffer from providing scale out but
relatively poor per node performance so that a mid-size cluster still performs
worse than a single node solution based on a traditional SQL database.

And if you genuinely need huge insert volumes, because of the per node
performance you'd need an enormous cluster whereas Cassandra would deal with
it quite comfortably.

~~~
arjunnarayan
[Cockroach Labs engineer here working on performance benchmarking]

We have load generators for YCSB (just raw key-value ops in a firehose) and
TPC-H (very complicated read-only queries) running right now, and we're about
to start running TPC-C queries (moderately complex queries in large volume) as
well. You can follow along on our progress here:
[https://github.com/cockroachdb/loadgen](https://github.com/cockroachdb/loadgen)

In the context of your dichotomy, we want to bridge that gap. We want the
linear scalability of your second group along with the full feature-set of the
first group.

We will be publishing our performance numbers, but we haven't so far because
the product has improved rapidly, and our numbers have been quickly obsoleted,
but rest assured, we will be publishing a series of blog posts very soon.
Anecdotally, our beta customers are not finding that they need very many more
CockroachDB nodes than their existing database solutions, even with something
as high-performant (but inconsistent) as Cassandra.

~~~
wmfiv
That's great. Thanks for the response and I'll keep an eye out for the blogs.

------
sixdimensional
How does Cockroach efficiently handle the shuffle step when data is on many
nodes on the cluster and has to move to be joined? Does Cockroach need high
capacity network links to function well?

I always see companies making the claim of linear speedup with more nodes but
surely that can't be the case if the nodes are geographically disjointed over
anything less than gigabit links? Perhaps linear speedup with more nodes is
only possible over high speed connections? How high is that exactly?

Congratulations to the team on the release! Introducing this kind of database
is no easy task - thank you and great job, keep up the good work!

~~~
ww520
15 years ago I was working on a similar distributed DB product. At the time,
the idea was to send the query execution plan to each node to execute any
filtering criteria to trim down the candidate row set. Then compute a Bloom
Filter on the joining keys on the node with the largest candidate set (using
some heuristic statistics), and ship the Bloom Filter to the other nodes with smaller
data sets to greatly reduce the non-matching rows. The rows that survive the Bloom
Filter are highly likely to be joinable and are shipped back to the main joining
node to perform the final join. A Bloom Filter is the perfect compromise between
size and speed.

I'd imagine CockroachDB is doing something similar for distributed join.
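
To make the idea concrete, here is a rough single-process sketch of the Bloom-filter pre-filtering step. Everything here is an assumption for illustration: the filter size, the hash scheme, the `(key, payload)` row shape and the function names. The two "nodes" are just two Python lists.

```python
import hashlib
from typing import Any, Iterator, List, Tuple

Row = Tuple[Any, Any]  # (join key, payload); a made-up row shape for this sketch


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: Any) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: Any) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: Any) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_join(large_rows: List[Row], small_rows: List[Row]) -> List[Tuple[Any, Any, Any]]:
    # 1. The node holding the larger candidate set builds a filter over its join keys.
    bloom = BloomFilter()
    for key, _ in large_rows:
        bloom.add(key)
    # 2. The other node ships only the rows that pass the filter.
    shipped = [row for row in small_rows if bloom.might_contain(row[0])]
    # 3. The exact join on the main node discards any false positives.
    index = {}
    for key, payload in shipped:
        index.setdefault(key, []).append(payload)
    return [(key, lp, sp) for key, lp in large_rows for sp in index.get(key, [])]
```

Only the filter's bit array and the surviving rows cross the network in this scheme, which is where the savings come from when most rows do not match.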

~~~
irfansharif
haven't come across this idea before, interesting - will definitely have to
give it some more thought. our 'distributed joins', so to speak, run through
our distributed query execution model (distsql) setting up incremental
'stages' of computation with the results pipelined and plumbed through
individual computes. viewing it through this model our implementation more
closely resembles the Grace Hash Join[1] algorithm. you might be interested in
the PR[2] that landed this changeset, there's a cool visualization in one of
the comments[3] showing the query execution plan.

[1]:
[https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join](https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join)

[2]:
[https://github.com/cockroachdb/cockroach/pull/12221](https://github.com/cockroachdb/cockroach/pull/12221)

[3]:
[https://github.com/cockroachdb/cockroach/pull/12221#issuecom...](https://github.com/cockroachdb/cockroach/pull/12221#issuecomment-267084941)
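
for readers who haven't met Grace hash join, a toy single-process sketch of the partition-then-join idea follows. this is not the distsql code: the `(key, payload)` row shape, the function names and the partition count are assumptions made for illustration, and each "partition" here would be a separate node or stage in a real system.

```python
from collections import defaultdict
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (join key, payload); hypothetical row shape


def partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Route each row to a partition by hashing its join key."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, payload in rows:
        parts[hash(key) % num_partitions].append((key, payload))
    return parts


def grace_hash_join(left: List[Row], right: List[Row],
                    num_partitions: int = 4) -> List[Tuple[Any, Any, Any]]:
    out = []
    left_parts = partition(left, num_partitions)
    right_parts = partition(right, num_partitions)
    # Matching keys always land in the same partition index on both sides,
    # so every partition pair can be joined independently of the others.
    for left_part, right_part in zip(left_parts, right_parts):
        table = defaultdict(list)
        for key, payload in left_part:        # build phase
            table[key].append(payload)
        for key, payload in right_part:       # probe phase
            for left_payload in table.get(key, []):
                out.append((key, left_payload, payload))
    return out
```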

~~~
ww520
The Grace Hash Join approach ships the entire joining key set across the network.
Even if each node just gets one partition of it, the aggregate network traffic
is the entire set. For a small table, it's fine. A large table is going to really
tax the network.

------
vtomasr5
I think this is the DB Project of the year in the open source community.
Cockroach Labs has made an incredible effort to develop and test a new database
and these guys are giving it away for free (I read about the series B raise too
;)), for us to use.

Thanks for doing this. You're very much appreciated. (BTW I love the name and
the logo!!)

------
toddmorey
There was a great session with Spencer Kimball (CockroachDB creator) and Alex
Polvi (CoreOS) at the OpenStack Summit. It's a good overview and demo:
[https://youtu.be/PIePIsskhrw](https://youtu.be/PIePIsskhrw)

~~~
irfansharif
there's a second part to this presentation[1] running cockroachdb across 16
(!) cloud vendors.

[1]:
[https://www.youtube.com/watch?v=nBXXLNIwAoo](https://www.youtube.com/watch?v=nBXXLNIwAoo)

------
daliwali
CockroachDB looks like a great alternative to PostgreSQL, congrats to the team
for doing so much in such a short time. The wire protocol is compatible with
Postgres, which allows re-using battle-tested Postgres clients. However it's a
non-starter for my use case since it lacks array columns, which Postgres
supports [0]. I also make use of fairly recent SQL features introduced in
Postgres 9.4, but I'm not sure if there are major issues with compatibility.

[0]
[https://github.com/cockroachdb/cockroach/issues/2115](https://github.com/cockroachdb/cockroach/issues/2115)

~~~
jordanlewis
I'm an engineer on the SQL team at CockroachDB. We're very aware of our
missing support for array column types - and in fact beginning to add support
for arrays is one of my team's priorities for the next release cycle.

What kind of other recent SQL features introduced in Postgres 9.4 do you use?
Postgres has a ton of features, as I'm sure you're aware, and while we strive
for wire compatibility with Postgres it's not a goal of ours to implement
support for every Postgres feature out there.

~~~
daliwali
I double checked my codebase and it looks like it's just JSONB, which
CockroachDB also doesn't support [0]. Sorry to bother about missing features,
but there are really some things that prevent a smooth transition from
Postgres.

[0]
[https://github.com/cockroachdb/cockroach/issues/2969](https://github.com/cockroachdb/cockroach/issues/2969)

~~~
jordanlewis
Yep, JSONB is on our roadmap as well, although it won't come before array
column type support. Thanks for the feedback - I'd personally love to see
migrations from PostgreSQL to CockroachDB become seamless for more complex use
cases as we continue development.

~~~
greenshackle2
It occurred to me to migrate Odoo ERP to CockroachDB; scaling up the DB is one
of our biggest challenges with some of our clients.

However Odoo leans heavily on Postgres, migration would be a lot of work I
imagine. The first snag I've hit with CockroachDB is the lack of 'CREATE
SEQUENCE'.

Plus, Odoo uses REPEATABLE READ + a hand-rolled system of locks for
consistency, I'm not sure how that would play out with CockroachDB. In my
experience some of the performance issues come more from long lived locks in
the app than from sheer DB performance.

------
v_elem
It looks like there is still no mechanism for change notification, which in
our particular case is the only missing feature that prevents using it as a
postgresql replacement.

Does anybody know if this feature is planned in the short or medium term?

[https://github.com/cockroachdb/cockroach/issues/6130](https://github.com/cockroachdb/cockroach/issues/6130)
[https://github.com/cockroachdb/cockroach/issues/9712](https://github.com/cockroachdb/cockroach/issues/9712)

~~~
arjunnarayan
This feature is planned, but I cannot give you a concrete timeline. We want to
do this right, and we need other parts in place to do this with high
performance, in a transactionally consistent fashion, in the face of high
contention, and for arbitrarily complicated "views".

I will say that this is the single feature that _I_ personally am most
invested in at the company, so it will happen.

------
sergiotapia
Is Cockroach DB intended for just "big-data" companies? Would a small project
run really well with Cockroach DB?

Of course a small database probably won't need a lot of the unique features,
but is this aiming to replace PG/MySQL in the small/mid-size projects?

~~~
zokier
I can't speak for others, but at least for me the main attraction of
CockroachDB is getting foolproof HA straight out of the box. That is something
I think anyone can appreciate regardless of their dataset size.

Note that I haven't actually run CockroachDB yet, so I can't confirm if it
really delivers on that promise, but I'm hopeful.

~~~
raybb
What is HA?

~~~
the_duke
High Availability

~~~
nebabyte
Here we see a cornerstone of HA: redundancy

------
nik736
What advantages do I have using Cockroach compared to Postgres, Cassandra,
Rethink or MongoDB? (I know that all of them are completely different, that's
part of the question)

~~~
irfansharif
We have a comparison page[1] that might be what you're looking
for.

[1]: [https://www.cockroachlabs.com/docs/cockroachdb-in-
comparison...](https://www.cockroachlabs.com/docs/cockroachdb-in-
comparison.html)

~~~
elmalto
Do you have any performance comparisons as well?

~~~
arjunnarayan
So performance is complicated. Right now, we’re performance testing
CockroachDB regularly, and everything is out in the open. Everything we do is
tracked with a GitHub issue with the “perf:” prefix, if you want to follow
along.

Here are all our issues that track performance:
[https://github.com/cockroachdb/cockroach/issues?utf8=%E2%9C%...](https://github.com/cockroachdb/cockroach/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20perf%3A)

Here’s our open source repository where we keep our load generators:
[https://github.com/cockroachdb/loadgen](https://github.com/cockroachdb/loadgen)

A blog post (well, many) is in the works outlining our performance
benchmarking. The situation on the ground is changing fast - our performance
has improved rapidly over the past months, and each time we sit down to write
a blog post, it gets quickly obsoleted. So, trust that we will have a blog
post talking about performance very soon.

Anecdotally, our customers are not finding performance to be a bottleneck. I
encourage you to set up a Cockroach cluster, and try the various load
generators (we've got the standards and a couple other homegrown ones in the
repository).

------
apognu
I've been following CockroachDB for quite a while. Great job on 1.0.

I've had a question for quite some time though (and I think there is an RFC
for it on GitHub): do we still need to have a "seed node" that is run without
the --join parameter, or can we run all the nodes with the same command line,
with the cluster waiting for quorum to reconcile on its own?

~~~
bdarnell
Currently, you need to run one node without --join for the initial
bootstrapping (as soon as this bootstrapping is complete, you can and should
restart it with --join to get everything into a homogenous configuration). I
was hoping to make some changes here so you could start every node with --join
from the beginning, but it was trickier than anticipated so it didn't make the
cut for 1.0. Watch for improvements here in a future release.

~~~
apognu
Thank you for your answer.

That's okay, for now, I run a simple StatefulSet where each pod checks whether
the Service is reachable on port 26257 to determine if it should join or init
the cluster.

It's not as nice as if it was handled by Cockroach itself, but it does the
job.
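
For the curious, the check boils down to something like the sketch below. It is only an illustration of the idea described above: the function names are invented, the service hostname is whatever your Service resolves to, and the only `cockroach` flag used is `--join`. It does not handle the case where several pods start at the same moment and all see the Service as unreachable.

```python
import socket
from typing import Callable, List

COCKROACH_PORT = 26257  # the port mentioned above


def tcp_reachable(host: str, port: int = COCKROACH_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def start_args(service_host: str,
               probe: Callable[[str, int], bool] = tcp_reachable) -> List[str]:
    """Pick the command line: join an existing cluster, or bootstrap a new one."""
    if probe(service_host, COCKROACH_PORT):
        return ["cockroach", "start", f"--join={service_host}:{COCKROACH_PORT}"]
    # Nothing is answering yet, so this pod starts without --join.
    return ["cockroach", "start"]
```

The probe is passed in as a parameter so the decision logic can be exercised without a network.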

~~~
bdarnell
This bootstrapping problem is tricky. We publish kubernetes templates at
[https://github.com/cockroachdb/cockroach/tree/master/cloud/k...](https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes)
that contain our current best solution for the join/init problem.

------
therealmarv
Does this theoretically work interplanetary (just asking, for science)?

~~~
arjunnarayan
No. Once your latency goes beyond single digit seconds, performance will
probably collapse. Too many subsystems would time out. In theory it could be
made to work (with terrible performance, and extremely long commit-waits due
to having to wait until the remote planets get back to you), but I wouldn't
architect a planetary spanning distributed database this way. We probably
would have to go back to the drawing board and start from scratch.

~~~
therealmarv
Thanks for the long answer. Much appreciated. The question came into my mind
when reading some of the graphics and specifications.

------
misterbowfinger
Can someone give a brief pros/cons between Cockroach DB Core and Google Cloud
Spanner?

~~~
bpicolo
Open source vs not open source. Cockroach is still in its infancy vs Spanner.
I'm sure there are a variety of things here, but they mostly aim to solve a
similar problem with a slightly different approach.

Some of the big details relate to not requiring atomic clocks:
[https://www.cockroachlabs.com/blog/living-without-atomic-
clo...](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/)

Here's their comparison chart, though naturally it's biased for things-
cockroach-does: [https://www.cockroachlabs.com/docs/cockroachdb-in-
comparison...](https://www.cockroachlabs.com/docs/cockroachdb-in-
comparison.html)

(I guess you can't write to Spanner with SQL? That seems like a big
difference. No INSERT/UPDATE?)

~~~
greenshackle2
I'm confused. What's the difference between 'Yes' and 'Optional' in the
'Commercial Version' row on the comparison chart? To me 'Yes' suggests there
is _only_ a commercial version, but clearly that's not true for CockroachDB.

~~~
dianasaur323
Thanks for pointing that out! We will fix that to optional for us :)

------
ericb
Can Cockroach be plugged into a Rails app where mysql was?

I'd be interested in hearing:

- the backup story

- the replication/failover story

- horizontal scaling story (is it plug and play)

~~~
arjunnarayan
I have ported a MySQL-based ActiveRecord Rails app that was somewhat
complicated to Postgres, and then on to CockroachDB. It works pretty well, so
I'd give it a go. We're also committed to supporting ActiveRecord via the
Postgres connector, so if you run into any bugs, we would do our best to fix
them. I am personally invested in ActiveRecord support myself. At this point
ORM support on CockroachDB is driven mostly by usage so please try it!

Your other questions are better answered on the blog post, but quickly:

* CockroachDB core comes with a `dump` command to back up your databases. CockroachDB Enterprise has blazingly fast _incremental_ cloud backup and restore, the kind that you might want for a very large deployment.

* Replication is managed under the hood by sharding the data into many ranges that are each 64 MB in size. Each range is replicated using Raft, and if a node goes down, the other replicas scattered across the cluster seamlessly take over and upreplicate a new replica to "heal" the cluster.

* The horizontal scaling is indeed plug and play - just add more nodes to the cluster and they'll automatically rebalance replicas across the cluster with no downtime and no additional configuration.
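
As a back-of-the-envelope illustration of what the range sharding above means in numbers, here is a small sketch. The 64 MB figure comes from the description above; the replication factor of 3 and the function names are assumptions for this example, and real ranges will not all be full.

```python
RANGE_SIZE_BYTES = 64 * 1024 * 1024  # 64 MB per range, as described above


def estimate_ranges(dataset_bytes: int, range_size_bytes: int = RANGE_SIZE_BYTES) -> int:
    """Smallest number of full-size ranges that can hold the dataset (at least one)."""
    return max(1, (dataset_bytes + range_size_bytes - 1) // range_size_bytes)


def estimate_replicas(dataset_bytes: int, replication_factor: int = 3) -> int:
    """Total range replicas in the cluster; replication_factor=3 is an assumption."""
    return estimate_ranges(dataset_bytes) * replication_factor


def average_replicas_per_node(dataset_bytes: int, num_nodes: int,
                              replication_factor: int = 3) -> float:
    """Average replica count per node if replicas are spread evenly."""
    return estimate_replicas(dataset_bytes, replication_factor) / num_nodes
```

For example, under these assumptions a 2 TiB dataset works out to 32768 ranges and 98304 replicas, which is the pool the cluster rebalances as nodes come and go.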

------
gred
Very interesting. I have to admit I've seen the product name a few times, but
never took the time to have a look. I do have a few questions, though, if any
of the engineering team are still around watching the discussion :-)

From the high availability page [1] in the docs:

> Cross-continent and other high-latency scenarios will be better supported in
> the future.

Do you have a specific timeline in mind? I've been working on an application
that needs to be highly-available, and which uses Oracle right now. It seems
like you can add all sorts of tools to the mix (RAC, DataGuard, etc), but
there are always significant caveats around the capabilities of the resultant
system. We're talking 1 to 2 TB of data total, tables of up to 100 million
rows with 1 million rows added per day, distributed across three data centers
(US, EU, Asia).

And regarding high availability in the context of application deployments, is
there any documentation on the locking characteristics of DDL statements? I'm
interested in the ability to modify the schema during an application
deployment without having to bring down the system or implicitly locking users
out. Apologies if I missed it somewhere on the website!

[1] [https://www.cockroachlabs.com/docs/high-
availability.html](https://www.cockroachlabs.com/docs/high-availability.html)

~~~
radub
I don't have a specific timeline but it is something we will be focusing on in
the following releases.

Regarding DDL statements, this blog post [1] has details. In a nutshell,
online schema changes are possible; the changes become visible to transactions
atomically (a concurrent transaction either sees the old schema, or the fully
functional new schema).

[1] [https://www.cockroachlabs.com/blog/how-online-schema-
changes...](https://www.cockroachlabs.com/blog/how-online-schema-changes-are-
possible-in-cockroachdb/)

------
Gurrewe
Congratulations to the team on the release!

Everything under "The Future" really excites me, especially the geo-
partitioning features. That is something that I'm really looking forward to
using!

~~~
jazoom
That might end up being an enterprise feature though.

------
v3ss0n
Will there be a RethinkDB-style realtime changefeed or something like
PostgreSQL's LISTEN/NOTIFY?

~~~
ralusek
I'd also like to know this. PG notify and triggers in general. Any equivalent
to DB link?

------
nathell
I read the announcement, got all excited, then clicked "What's inside
CockroachDB Core?" and got rewarded with a 404. Ouch! This itches.

~~~
orangechairs
[cockroachdb here] Yeah, we're experiencing some caching issues.

------
gog
Slightly offtopic, but what do you use for your blog and documentation pages?

~~~
mjibson
The blog and other non-docs pages use hugo
([http://gohugo.io/](http://gohugo.io/)) and the docs use jekyll, but will be
ported to hugo soon. We use github pages for hosting with cloudflare in front
(for https on a custom domain).

------
api
About nine months ago we made the decision to go with RethinkDB for our
infrastructure in place of PostgreSQL (at least for live replicated data), but
if this existed at the time we'd have seriously taken a look. We're pretty
happy with RethinkDB but I plan on still taking a look at this so we have a
backup option.

~~~
dianasaur323
[cockroachdb here] We are big fans of RethinkDB, but also glad to hear that
you'll explore CockroachDB. Let us know how it goes, and definitely file any
issues / feature requests in our GitHub repo!

------
MichaelBurge
It probably scales but how is the performance? If I need to load a couple
billion rows and do a dozen joins in some analytics, is that one machine, a
dozen, or 100?

Is it more for web apps, analytics, or what? When would I consider switching
from e.g. Postgres to CockroachDB?

~~~
arjunnarayan
[Cockroach Labs engineer here]

For just a couple billion rows and a dozen joins, a single node will suffice
(with the caveat that you really want at least 3 nodes because CockroachDB is
built for replication and fault-tolerance and you're not getting that with a
single node cluster), but you'll get linear speedup as you add more machines.

Your performance on a single node should be on the same order of magnitude as
doing this in Postgres right now. We are rapidly closing that gap, and intend
to close it completely for TPC-H style queries, while retaining the linear
performance speedup with more nodes.

The reason this gap isn't already closed is we've been focused on
transactional performance in distributed, fault-tolerant situations rather
than analytics performance, for 1.0. There is a lot of low-hanging optimization
fruit in analytics scenarios that we haven't focused on yet and are
just getting started on.

~~~
gflarity
Hi Cockroach Labs Engineer here,

On the feature FAQ joins are described as 'functional' which doesn't inspire a
lot of confidence but maybe it's just a perception thing. What exactly does
functional mean?

A SQL db without joins sounds a lot like just a NOSQL db with a familiar query
dialect.

~~~
arjunnarayan
If you are using Joins in an OLTP setting, everything should work absolutely
as you might expect.

"Functional" is our caveat that if you run Joins across your data in an OLAP
setting, it will work, but it may not be the most performant Join possible.
For example, our query planner does not currently plan Merge-joins even if the
appropriate secondary indices exist. So after a point (joining ~billions of
rows of data) it no longer is as performant as it could be. Now we expect to
roll out this particular fix within 6 months. However, optimizing 4 or 5-way
nested Joins in OLAP-cube style settings isn't something we're going to be
performant at for years. We need a lot more infrastructure built up before we
start solving the kinds of problems revealed by, say, the Join Order Benchmark
paper
([http://www.vldb.org/pvldb/vol9/p204-leis.pdf](http://www.vldb.org/pvldb/vol9/p204-leis.pdf)).
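
For context on the Merge-join remark, here is a generic textbook sketch of a merge join over two inputs that are already sorted by the join key (which is what a suitable index would provide). It is not CockroachDB's planner or executor code; the `(key, payload)` row shape and the function name are assumptions.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (join key, payload); hypothetical row shape


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Join two inputs that are both sorted ascending by key, in a single pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit the cross product of the two runs, then skip past them.
            for _, left_payload in left[i:i_end]:
                for _, right_payload in right[j:j_end]:
                    out.append((left_key, left_payload, right_payload))
            i, j = i_end, j_end
    return out
```

Because both inputs are consumed in order, no hash table over either side has to be built, which is why it is attractive when sorted indices already exist.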

------
bfrog
Should've gone with tardigrade instead as a name, those little bastards can
live in space!

------
bish2
I'm struggling to understand how this company has raised $50 million dollars
when db companies with paying customers like RethinkDB and FoundationDB had to
shut down.

They are gonna earn back $50 million by selling...a backups tool?

~~~
swsieber
I think one major difference is that it's a drop in replacement for certain
SQL products, plus a major selling point of NoSQL - good horizontal scaling.

RethinkDB and FoundationDB are great, but require a paradigm shift I think.

------
v3ss0n
Congrats Ben Darnell and team! I am a fan of his work on Tornado web server!

~~~
bdarnell
Thanks v3ss0n!

------
nhumrich
Does the replication work cross-region, say US-East and US-West? or even cross
continent? It sounds like the timing requires very short latency and might not
work in these scenarios

~~~
dis-sys
Jepsen test results basically show that latency caused by replica distance
won't screw your data. On the other hand, clock drift can stop your system, or
even potentially corrupt your data, depending on how fast such an incident can be
detected/handled and on your workload.

------
doanerock
Since CockroachDB reads are eventually consistent, how would that affect my
SaaS multiuser application? How long on average would I have to wait for them
to become consistent?

~~~
a-robinson
CockroachDB reads are strongly consistent, not eventually consistent. You
don't have to wait at all.

------
singularjon
How does the speed compare to that of Postgresql and MongoDB?

------
wtf_is_up
Does CockroachDB have a streaming API a la RethinkDB changefeeds? This is a
killer feature, IMO.

~~~
arjunnarayan
Not yet, but it's on our roadmap.

~~~
ralusek
Just out of curiosity, do you mind elaborating a little bit on why not? It
strikes me as something that would be very easy to implement in a database, is
there a reason why so few databases have a mechanism to do this?

If it's about maintaining an open connection in order to notify the client,
that part makes sense, but at the very least the changefeed itself should be
toggleable and easy to query in any DB.

~~~
state_machine
One of the challenges for us in implementing something like LISTEN/NOTIFY
comes from our distributed nature: since a table is likely broken up across
many nodes, you somehow need to aggregate changes from all of them back into a
single change feed wherever the listener is, and in such a way that it doesn't
create a single point of failure.
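
As a toy illustration of the aggregation half of that problem (not the fault-tolerance half, and not how CockroachDB does or will do it), merging per-node change streams into one feed might look like the sketch below. It assumes each node already emits its changes in timestamp order as `(timestamp, key, value)` tuples; that shape and the function name are made up.

```python
import heapq
from typing import Any, Iterable, Iterator, Tuple

Change = Tuple[int, str, Any]  # (timestamp, key, new value); hypothetical shape


def merge_feeds(*node_feeds: Iterable[Change]) -> Iterator[Change]:
    """Lazily merge per-node feeds, each sorted by timestamp, into one ordered feed."""
    return heapq.merge(*node_feeds, key=lambda change: change[0])
```

Note that whichever process runs this merge is exactly the kind of single point of failure described above, so this only shows why the ordering part is the easy part.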

------
amq
Can someone explain how it is (or can be) better than MariaDB Galera or MySQL
Group Replication?

~~~
dis-sys
You can't deploy your MariaDB Galera/MySQL Group Replication systems across
the Pacific and then expect it to further scale from there.

------
acd
Congrats on bringing out 1.0! Been following the project and look forward to
trying it out!

------
doanerock
Say you scaled up to 100 nodes for the holiday season, is there any way to
tell how many/much storage/nodes you have to keep running in order to keep 3
backups and maintain your new post holiday load?

~~~
BramG
We don't have any auto scaling for either up or down scaling, but if you're
using a deployment tool such as Kubernetes, I don't see why it wouldn't be
fairly easy. And it might be a good idea to add a message in the admin UI if
all of your nodes are experiencing a high load.

By just looking at your max load over the last 24h or perhaps week, it would
be pretty easy to see when to down scale.

That being said, as long as you remove the cockroach nodes one at a time,
it's pretty easy to down scale a cockroach cluster.
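
A back-of-the-envelope version of that "look at your max load" rule might look
like the Python sketch below. Everything in it is an assumption for
illustration: `nodes_to_keep`, the per-node capacity figure and the 70%
headroom are made up, not numbers or tooling from cockroach.

```python
import math


def nodes_to_keep(peak_qps: float, per_node_qps: float,
                  replication_factor: int = 3, headroom: float = 0.7) -> int:
    """Rough node count: enough for the observed peak, never below the
    replication factor (hypothetical helper, not a cockroach API)."""
    if per_node_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_node_qps must be > 0 and headroom in (0, 1]")
    # Only plan to use a fraction (headroom) of each node's capacity.
    for_load = math.ceil(peak_qps / (per_node_qps * headroom))
    return max(replication_factor, for_load)


# Hypothetical post-holiday numbers: peak over the last week vs. what one
# node handles comfortably.
print(nodes_to_keep(peak_qps=20_000, per_node_qps=2_000))
print(nodes_to_keep(peak_qps=1_000, per_node_qps=2_000))
```

The floor at the replication factor is the part that matters for the "keep 3
backups" question: with three replicas you need at least three nodes no matter
how low the load drops. Storage would need the same kind of check and is left
out here.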

------
raarts
On a three node cluster will it survive two nodes going down?

~~~
irfansharif
short answer: nope. cockroachdb replicates data for availability and in order
to guarantee consistency across the replicas, it uses Raft[1] internally. Raft
necessitates a majority of the replicas remain available in order to operate.
it ensures that a new 'leader' for each group of replicas is elected if the
former leader fails, so that transactions can continue and affected replicas
can rejoin their group once they're back online.

[1]: [https://raft.github.io/raft.pdf](https://raft.github.io/raft.pdf)

~~~
novembermike
What are the recommended configurations then? If I want to survive multiple
node failures could I have 9 replicas?

~~~
irfansharif
raft is premised on overlapping majorities, so to speak. in order to tolerate
up to `n` node failures you'd need to run `2n + 1` instances (for nine nodes
you'd tolerate up to four node failures).
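
to make the arithmetic concrete, here is a minimal sketch in Python (the
function names are made up for illustration, they are not anything from
cockroachdb): a group of `r` replicas needs a majority of `r // 2 + 1` to make
progress, so it can lose whatever is left over.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a group with this many replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while a majority is still up."""
    return replicas - quorum(replicas)


def replicas_needed(failures: int) -> int:
    """The 2n + 1 rule: replicas required to tolerate n failures."""
    return 2 * failures + 1


for r in (3, 4, 5, 9):
    print(f"{r} replicas: quorum {quorum(r)}, "
          f"tolerates {tolerated_failures(r)} failure(s)")
```

this also shows why even replica counts don't buy anything: four replicas
tolerate one failure, same as three.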

------
brightball
How does it compare to Couchbase with its N1QL?

~~~
ansible
The main difference is the consistency model:

[https://blog.couchbase.com/10-things-developers-should-
know-...](https://blog.couchbase.com/10-things-developers-should-know-about-
couchbase/)

Whereas CockroachDB aims to be strongly consistent. This makes life for the
application developer much easier.

------
daxfohl
Curious why Mac is better supported than Windows. This is obviously something
you'd run on a server. Do orgs run Mac servers? Is it just to support dev work
for people too lazy to launch a VM? Sorry, Windows/Linux ops person here with
very little awareness of Mac ecosystem.

~~~
jpgvm
It's not so much a matter of Mac > Windows but rather Mac+Linux+*nix >
Windows.

This just comes down to the fact that Windows is a special snowflake that does
everything differently. Sometimes for good reasons, but usually not for good
reasons.

------
ncrmro
Any support for postgres trigram searches?

------
xmichael99
Now if we could get a 1.0 of TiDB ???

~~~
ngaut
Almost there.

------
newsat13
Very disappointed with HN turning into a 4chan/reddit style trolling board
about the name. Guys, we get it that you don't like the name. Can we please
stop bike shedding and move on? The people at cockroachdb have obviously seen
all your messages but decided it's worth keeping the name. What more is there
to talk about? Why not talk about the relative technical merits of this DB?

~~~
SilasX
It's not bikeshedding when the bikeshed's color will actually have concrete
effects on adoption. Most people -- i.e. in procurement, management, finance,
and others you need to appeal to -- don't want anything to do with
cockroaches. The idea disgusts them at a gut level, not something you can talk
away.

HN users are giving vital advice, for free. Those who ignore it will have only
themselves to blame.

As I say every time this comes up, would you be so dismissive about critics of
naming a product PubesDB? Or GonorrheaDB? Or [n-word]DB? Then you agree that
disgust-invoking connotations of the name matter, and we're just haggling over
the details.

Ubuntu, Mongo, Swagger (edit: Hadoop also) ... they're _weird_ , sure, but
they don't evoke the visceral feeling of disgust that cockroaches do.

~~~
rubyn00bie
Not to be an ass but...

It so far appears not to be hurting them. In the slightest.

This "warning" comes from the HN crowd every time something is posted about
CockroachDB. I think it's time to LET IT GO.

I for one, completely disagree with you but that's because I have a different
understanding of the relationship between the business side and engineering.
We are already looked at as eccentric and strange people, rarely if ever has
an absurd technology name caused issue.

Someone talking about "cockroach" is equivalent to talking about "unicorns" or
"git." It's considerably less offensive than talk of "masters" and "slaves." If
you think this is such a problem for you, then work on your salesmanship as I
wouldn't hesitate to talk to other departments or investors about this
product.

I was a CTO up until I took medical leave this past October and I cannot
stress how important salesmanship is to the role. I think your examples of
other databases are hyperbole and not the point. You want them to be
equivalent but they aren't. This comes down to what you can sell in your
organization and if there is merit to it, then selling it should not be a
problem.

One last point is other departments don't give a shit what the database
technology is called unless it's something to put on their CV. Just call it
the "database" as they most certainly will.

~~~
mburst
> It so far appears not to be hurting them. In the slightest.

I feel like that is tough to judge because the public has only known them by
one name as far as I know. If they switched to this name from another name and
saw no difference then we could surmise that the name has had no effect.

~~~
rubyn00bie
I disagree that it's tough to judge but that's because they've raised a
considerable amount of capital ($53 million over three rounds):

[https://www.crunchbase.com/organization/cockroach-
labs#/enti...](https://www.crunchbase.com/organization/cockroach-labs#/entity)

~~~
eridius
The end goal of a company is not to raise venture funding. So you cannot use
"they raised capital" as proof that their name isn't a problem. Their name
absolutely will hurt their adoption. Maybe the product is good enough that
they'll still be successful, but if so, you would expect them to be even more
successful if they didn't have such an off-putting name.

~~~
rubyn00bie
Did I say it was the end goal? It's merely a metric for a young company. What
it means is that enough people have decided that there is a future, and that
current revenue, growth, and expectations are being met or are substantial.
Raising $53 million isn't easy. So I can say capital raised is a
metric on which to base a judgement.

Your statement that it "absolutely will hurt adoption" is unqualified and
nothing but opinion. And what exactly is "more successful?"

The handful of people who won't try this because of the name won't matter to
their bottom line. If it's good enough, then even a large majority of those
will end up using it anyway.

~~~
eridius
> _And what exactly is "more successful?"_

Pretty much any reasonable definition will do. For example, higher adoption is
one metric that can be used to define success.

> _Your statement that it "absolutely will hurt adoption" is unqualified and
> nothing but opinion._

It's an opinion that a lot of people share, judging from the HN threads I've
seen about CockroachDB. And really, I shouldn't need to defend the idea that
having a name that disgusts people will hurt adoption. It's just common sense.
The only real question is how much damage will the name do? The better the
product is, the more people will forgive things like bad names, but there will
definitely be at least some level of damage.

In addition, if there's multiple products in the same category that are fairly
close in quality, then subjective things like names will matter more. Maybe
CockroachDB is significantly better than the alternatives right now (I really
have no idea; this product category isn't something I know anything about),
but if so, surely it won't remain "significantly better" forever. Other
products will catch up, or other products will be created to compete, and
we'll end up with several products that are similar, and once again, naming
will become more important.

And finally, you're completely ignoring the fact that a lot of decisions about
tech stack aren't actually made by technical people. They're frequently made
by managers rather than engineers. And when the decision is made by non-
technical people, marketing (e.g. name) is very important. Heck, even when the
product is made by engineers, marketing is important, because that's how you
convince the engineers to spend the time investigating the product to see if
it lives up to its claims or does what they need.

Speaking as an engineer, if tomorrow I suddenly have the need for a cloud-
native NewSQL database, I'm probably not even going to look at CockroachDB,
simply based on the name, unless someone else convinces me that it's clearly
superior. I find the name very off-putting and I'd rather not be confronted
with the mental imagery of cockroaches any time I use the product.

------
anthonylebrun
Since there's a little side riff about the name going on I thought I'd throw
in my 2 cents. Personally I love the name. I think it does a great job of
conveying the spirit of the project and provides unlimited pun opportunities.
Plus it's memorable, just like a real life roach encounter. Unfortunately I'm
sure some people will discriminate against your DB on the basis of name alone.
That's ludicrous, but that's our species for ya.

~~~
thraway2016
I see it as technical people on HN who appreciate the metaphor, versus
marketing/business people who can only think of "image".

It's to be expected with the massive infestation of HN by suits and khakis in the
last few years.

~~~
ixtli
I think the problem is worse: marketing / business people have convinced the
worker that this surface level analysis is all we can expect of anyone. As
said by other commenters: if the name of the DB solution influences your
choice then you're probably gonna get what you deserve.

(Within reason. Someone on here actually said this argument is reasonable to
have "because what would you do if they named it 'n-word'DB." Seriously.)

------
johnwheeler
I think the name "Cockroach" was a really poor decision from a marketing
standpoint. The team intended to convey durability, since cockroaches can live
through anything. But when I think of a cockroach, I think, gross, disgusting,
etc.

~~~
scandox
It's memorable. So if the product is really excellent and is needed by
customers - then I think it could be a boon.

I mean Mongo has very bad associations for me in terms of childhood taunts and
Blazing Saddles...but now the name really relates more to the product than to
the original meaning.

~~~
gm
The difference is that "mongo" does not have a universally-known meaning.
Cockroaches are known throughout the world, and are disgusting throughout the
world.

~~~
1_player
In Italy "mongo" conjures up the slur used for mentally challenged people. I
had a friend smirk when I mentioned MongoDB once.

~~~
vec
I dunno, a reference to the mentally challenged that is more than a little
obscene in some circles seems to really capture the essence of MongoDB.

------
sandstrom
I think it's an excellent name!

Also, biologists would argue that the cockroach is a magnificent creature,
highly adaptable and very fit (in 'survival of the fittest' terms).

I would pay for and deploy a cockroach db — because of its name.

------
ccallebs
First, this is awesome! Congrats to the team for reaching this milestone.

Secondly, I think the name is memorable and conveys exactly what it should. If
I were ever on an engineering team that chose not to use CockroachDB due to
being "grossed out" by the name, I wouldn't be on that engineering team for
long. Perhaps someone can explain the knee-jerk reaction to it for me.

~~~
gervase
I might be an interesting case.

I had previously been a big supporter of their name, agreeing with some other
posters that it promotes the durability of the system.

However, after a move last year, I was forced to live with cockroaches for
approximately 6 months, after never encountering them prior to that.

Since then, I've completely switched camps. Can't see the name without being
skeeved out. The reality of cockroaches is so absolutely repulsive that it
completely changed my view 180°.

I moved out of that place in November, and haven't seen one since; I'm
curious if my aversion will fade over time.

------
triangleman
Name doesn't bother me. It's memorable and I'd definitely consider using it,
whether in a startup or enterprise. Better than "Postgres" \-- how do you even
pronounce that?

------
cwisecarver
Cue the comments stating that no one will use this because the name is bad.

~~~
jabl
Just like, err, GIMP?

What is it with Spencer Kimball and naming things that gets people so upset?
It's not like other company or product names are that good; we're just used to
them.

Some high profile tech companies:

\- Google: some propellerhead big number joke (hey, I have a PhD and I don't
know offhand how big a googolplex is...)

\- Alphabet: Really? That out of ideas?

\- Amazon: Some hot snake and insect-infested jungle? Why should I go there?

\- Microsoft: At least it gives a hint what the company does, but really...
(cue the penis jokes)

\- Yahoo: WTF, some slang term I've never heard of before..

\- Apple: Mmmm, are they organic and locally produced? Oh, they sell computers
and phones? WTF?! (Yes, I've heard the backstory about Alan Turing and the
poisoned apple which I guess puts me in a _very_ small minority)

~~~
cocktailpeanuts
There's bad name, and then there's repulsive name.

All the examples you mention fall under "bad name", and it's not even
objectively bad, I actually think they're great names, so it's subjective. And
NONE of them are repulsive.

Then again, if you insist cockroaches are lovable creatures I have nothing
more to say.

~~~
jabl
> All the examples you mention fall under "bad name", and it's not even
> objectively bad, I actually think they're great names, so it's subjective.
> And NONE of them are repulsive.

My argument was not that they are good or bad, but rather that we've come to
associate positive things with the companies in question, and then we post-hoc
come up with explanations why the names are good etc.

> Then again, if you insist cockroaches are lovable creatures I have nothing
> more to say.

I don't think they are lovable, no. But they are an evolutionary success
story; they've been around for hundreds of millions of years, long before
humans. And they'll be here after we humans have extincted ourselves in some
nuclear holocaust/massive environmental disaster/pick your favorite
apocalyptic scenario.

And if you manage to squish one, there's hordes of em left; just like I'd like
my DB to be, so actually I think it's a very good name! :)

~~~
Retra
There's no post-hoc for some of those names. Some of them were picked
_because_ they actually _were_ good. Even something as bland as 'Microsoft'
fit right in with the culture that spawned it. And the rest were picked
because they were simple, neutral, and had the potential to be iconic brands.

Cockroach is not something someone picks because it is good. That's a name you
pick to make a statement that your name doesn't 'technically' matter beyond
the fact that it is memorable and associative.

------
deferredposts
In a couple of years, I suspect that they will rebrand their name to just
"RoachDB". It conveys the same meaning, while not being that awkward to
discuss with users/clients

~~~
chimeracoder
> "RoachDB". It conveys the same meaning, while not being that awkward to
> discuss with users/clients

"Roach" has some other connotations as well[0], which may not help with
selling to larger and enterprise clients.

[0]
[https://www.urbandictionary.com/define.php?term=roach](https://www.urbandictionary.com/define.php?term=roach)

------
whatnotests
/me forks the damned repo, renames it, wins the Internet.

~~~
knz42
[https://github.com/tschottdorf/bikesheddb](https://github.com/tschottdorf/bikesheddb)

------
socmag
Clocks are meaningless under load.

The higher frequency the transactions the more you get into quantum physics.

In reality, nobody cares if T-Mobile debited your account 0.01ms before
WalMart.

[edit] what is important is isolation and consistency of the transactions.

~~~
socmag
Instead of just downvoting, how about refuting my claim?

I'm seriously curious what is the disagreement. These guys already established
atomic clocks are unnecessary. Very interested in which use cases require
them.

~~~
nocman
Am I wrong in remembering that the HN guidelines _used_ to say that you should
not downvote someone's comment simply because you disagreed with it?

I went looking, and I don't see that in the current guidelines. I could be
wrong about it being there before, but I was almost certain that it was at one
point.

Seems like it used to say that you should only downvote comments that you
think don't contribute anything of value to the conversation.

Just curious, because it seems to me that for quite a while now there have
been a lot of comments that appear to get downvoted just because people don't
agree with what the person said (and often there are no responses to counter,
the person just gets downvoted).

~~~
dec0dedab0de
I think you're thinking of somewhere else. The up/down votes are a way of
agreeing or disagreeing without cluttering up the comments with a bunch of "me
toos" or "nuhuhhs"

~~~
nocman
I'm certain that I'm _not_ thinking of somewhere else. I'm completely open to
the possibility that I just remember it wrong, but I'm sure that it was HN
that I was thinking of, and not another site.

~~~
tw04
You're thinking of Reddit.

~~~
cachvico
Reddit, we did it again.

~~~
nocman
No, you did not "do it again". I almost never read anything on Reddit. So that
was not the source of confusion.

------
Perignon
Name still sucks and is disgusting af.

------
niceperson
>Cockroach

What were they thinking?

~~~
dkersten
Cockroaches are highly resilient creatures. The name, I assume, is alluding to
the goal of this database being a highly resilient system. What's the problem?

~~~
camus2
Cockroaches are disgusting, the name hurts the product. I'm confident they
will acknowledge that fact sooner or later and change the name.

