
CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore - sokrates
https://github.com/cockroachdb/cockroach?hn=true
======
jacquesm
That's a funny name for a database. At least you'll know that whoever uses it
does not use it for its buzzword value.

How will this handle partitioning of the network? The readme has a lot of info
about bits of the far-flung cluster failing but nothing about how it would
deal with the whole cluster being chopped up into roughly equal halves. That's
one of the harder problems to deal with for solutions aimed at this space.

~~~
simpsond
Cockroaches are resilient and hard to kill.

~~~
jacquesm
I'm aware of that.

------
warfangle
Riak, when you use LevelDB behind it, certainly has indexes. As many as you
want (within reason), through secondary indexes[0]. While it doesn't have
joins specifically, you can link data blobs and walk the links[1]. For when
that isn't quite enough, you can always run a compiled Erlang MapReduce job
across a given dataset[2].

I don't quite see how CockroachDB offers anything Riak doesn't.

Riak, while not offering true locking transactions (it doesn't look like
CockroachDB does either - imagine how long it would take to perform a locked
transaction across sixteen data centers in as many countries, two of which
have gone dark due to power outages and giant robots), offers you the option
of resolving data version conflicts when you read the record[3]. (ed. Many
times if doing a partial update of a record, you need to read before writing
anyway. This resolves a conflict before you write to a potentially conflicted
record chain. Typically this is done with a pre-commit hook. [4])

(ed.: The major differences seem to stem from the snapshotting system CDB uses
to provide external consistency across data centers. This comes at the cost of
a delay in write verification, which is potentially huge, especially if two
clusters lose connection with each other but not with clients.

Riak, on the other hand, would still allow writes - and would resolve any
conflicts when the datacenters connect again. It's a hairy problem to fix,
especially in a general manner.

It all depends on what kind of data you're storing.)

0\. [http://docs.basho.com/riak/latest/dev/using/2i/](http://docs.basho.com/riak/latest/dev/using/2i/)

1\. [http://docs.basho.com/riak/latest/dev/using/link-walking/](http://docs.basho.com/riak/latest/dev/using/link-walking/)

2\. [http://docs.basho.com/riak/latest/dev/using/mapreduce/](http://docs.basho.com/riak/latest/dev/using/mapreduce/)

3\. [http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/](http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/)

4\. [http://docs.basho.com/riak/latest/dev/using/commit-hooks/](http://docs.basho.com/riak/latest/dev/using/commit-hooks/)
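
To make the read-time conflict detection mentioned in this comment more
concrete, here is a minimal sketch of vector-clock comparison in Python. It is
not Riak's implementation: the dict-based clock representation and the names
`descends` and `compare` are assumptions chosen for illustration only.

```python
from typing import Dict

# A vector clock maps an actor (node or client) id to a counter.
# This representation is an assumption for illustration, not Riak's format.
VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """Return True if clock `a` has seen every event that clock `b` has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify two versions as equal, ordered, or concurrent."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    # Neither clock dominates: the writes were concurrent, so both
    # versions have to be kept until something resolves the conflict.
    return "concurrent"
```

When `compare` reports `"concurrent"`, a store in this style would hand both
versions to the reader, which is the point at which the application gets to
resolve the conflict.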

~~~
maaku
Fully ACID transactions are a big deal.

~~~
dgrnbrg
They're based on Raft, which is not a consensus protocol designed for
multi-datacenter operation. I suspect you'll run into reliability and
throughput issues fairly quickly, just as you see with multi-datacenter
ZooKeeper.

Google's approach to this kind of problem: multi-datacenter transactions are
rare, so they are optimized for reliability rather than latency, and they
tend to use two-phase commit (2PC), as it's easier to get right with
unpredictable WAN latencies.
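
For readers unfamiliar with 2PC, the following is a toy, in-process sketch of
a two-phase commit coordinator in Python. It does not reflect how Google or
CockroachDB implement it: the `Participant` class and its `prepare`, `commit`,
and `abort` methods are hypothetical names, and a real system would add
durable logging, timeouts, and crash recovery.

```python
from typing import List


class Participant:
    """Hypothetical participant; a real one would persist its vote durably."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): a unanimous yes commits, anything else aborts.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The sketch shows only the decision rule. The latency cost comes from the two
round trips to every participant, which is why it tends to be reserved for
transactions that are rare.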

------
limsup
I assume it's called this because a cockroach can supposedly survive a nuclear
attack. But it's a bad name. It does not evoke good feelings.

~~~
taternuts
I have to agree - maybe 'RoachDB' would be better

~~~
jc_dntn
I prefer "CockDB".

~~~
jjoergensen
It reminds me of a female programmer that I once worked with. By mistake she
had originally named one of our busiest databases "ClickCuntDB"

It was a database for counting the clicks on our website :-)

------
orasis
Change the name. I get the joke, but it has an emotionally negative
connotation that bosses will hate.

~~~
bdevine
That's the first thing I thought. If anybody from the team is looking, how
about something like BlattoDB[0]?

[0]
[http://en.m.wikipedia.org/wiki/Blattaria](http://en.m.wikipedia.org/wiki/Blattaria)

~~~
notduncansmith
Sounds a lot like "blotto", which is the last state I want my DB in.

~~~
bdevine
True, true. But the point still stands that there are options which hint at
the durability of cockroaches without evoking disgust!

Actually this all does remind me of research on the tangible effect of disgust
on products -- see [0]. That work studied physical contact, but it's easy to
extrapolate from there.

[0]
[https://faculty.fuqua.duke.edu/~gavan/bio/GJF_articles/conta...](https://faculty.fuqua.duke.edu/~gavan/bio/GJF_articles/contagion_jmr_07.pdf)

------
dang
Given that most of the comments are merely about the name, and that the author
has implied that the software doesn't work [1], it seems there's little to
discuss here. We're going to demote this submission [2].

1\.
[https://twitter.com/andybons/status/472458545154494465](https://twitter.com/andybons/status/472458545154494465).
The answer to that question, btw, is yes. Reposts of stories that have had
significant attention are treated as dupes for about a year.

2\. That's not a criticism of the submitter. We want to see original work on
HN. But there ought to be some substance to it, as well as to the resulting
discussion.

------
rb2k_
How would one communicate with this DB?

I'd love to see some API examples.

~~~
Meai
Also benchmarks comparing it to RethinkDB and MongoDB, and SQL examples
(supposedly this is NewSQL, so how much SQL does it even support?). These
questions are important.

------
candybar
As for the name, which I agree is problematic as is, how about EntomoDB for
entomos (insect)?

Edit: It's not problematic if success is not an objective. But if it is,
choosing a name with such strong established negative connotations is not
wise.

~~~
maaku
How is the name problematic? I knew exactly what they were saying and why when
I saw it. If it were me I would have shortened it to RoachDB, but that's just
marketing.

~~~
enraged_camel
>>How is the name problematic?

Most people are disgusted by cockroaches. I think that's a good enough reason
to change the name, at least if you want the product to be taken seriously.

~~~
mahkoh
Friendly reminder that "Mongo" is a very offensive word in German. A "Mongo"
is a person suffering from Down syndrome. CockroachDB is a walk in the park
compared to MongoDB.

~~~
strangemonad
And in French.

------
nawitus
How does it handle replication and the resulting conflicts?

~~~
teraflop
It uses strongly consistent replication, so there are no conflicts.

~~~
nawitus
So it doesn't support "proper" replication, e.g. the kind where the databases
are not perfectly connected 100% of the time? And I wonder how they can
prevent conflicts due to latency. Even if there's a 50ms latency, is the other
database going to wait 50ms between every write or something?

~~~
teraflop
Well, that's a matter of terminology. It uses quorum replication, so it can
make progress as long as a majority of replicas are online and communicating.
I would consider that "proper" replication in the sense of a replicated state
machine.

You're right that it's different from, say, master/slave replication in an SQL
database. There's no distinction between an authoritative master and a slave
that provides stale data. Each machine either gives you consistent reads and
writes, or is unavailable.

As far as latency goes, the gory details are in the design document. You need
to talk to at least N/2 other replicas; there's no way around that without
giving up consistency. But that doesn't mean you can only do one transaction
every 50ms; they can be pipelined, and non-conflicting transactions can
proceed simultaneously.
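
The majority arithmetic above can be sketched in a few lines of Python. This
is a generic illustration of quorum replication, not CockroachDB code;
`majority` and `can_make_progress` are hypothetical helper names.

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return n_replicas // 2 + 1


def can_make_progress(n_replicas: int, n_reachable: int) -> bool:
    """A replica group can commit only while a majority is reachable.

    `n_reachable` counts the replicas that can talk to each other,
    including the one handling the request.
    """
    return n_reachable >= majority(n_replicas)
```

Under this model, a five-replica group split 3/2 stays available only on the
three-replica side, and an even split of a four-replica group leaves neither
side with a majority.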

~~~
nawitus
Okay, so there will be conflicts, which brings us back to the original
question.

>I would consider that "proper" replication in the sense of a replicated state
machine.

When I think about proper replication, I'm thinking about master-master
replication which doesn't fail if the connection between peers is sometimes
down, even for very long periods (e.g. what CouchDB can handle). I'm of course
not saying that other kinds of replication are somehow inherently bad, but
multi-master replication without active connections is what I'm after and what
a lot of modern applications can benefit from.

Once you have two databases that are not connected all the time, you need to
handle conflicts. You can move the conflict handling entirely to the client
side, but it must be implemented somewhere. I think that's such a common use
case that the database should provide basic interfaces and a basic
implementation for it. If nothing else, it greatly reduces boilerplate code.
Of course no database can resolve conflicts fully, as some of the resolution
always depends on the business domain.
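
As a rough sketch of what such "basic interfaces" could look like, a database
might ship a default resolver and let applications override it per data type.
Everything here is hypothetical: the `Version` type, the `last_writer_wins`
default, the `union_of_sets` policy, and the `resolve` entry point are
illustrative names and do not correspond to the CouchDB or CockroachDB APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Version:
    """One divergent copy of a record (hypothetical shape)."""
    value: Any
    timestamp: float  # write time; assumed comparable across replicas
    replica: str


Resolver = Callable[[List[Version]], Any]


def last_writer_wins(siblings: List[Version]) -> Any:
    """Default policy: keep the newest write; ties break on replica id."""
    return max(siblings, key=lambda v: (v.timestamp, v.replica)).value


def union_of_sets(siblings: List[Version]) -> Any:
    """Domain-specific policy, e.g. for a shopping cart stored as a set."""
    merged = set()
    for v in siblings:
        merged |= set(v.value)
    return merged


def resolve(siblings: List[Version],
            resolver: Resolver = last_writer_wins) -> Any:
    """Collapse divergent versions into one value using the given policy."""
    if not siblings:
        raise ValueError("nothing to resolve")
    return resolver(siblings)
```

The default discards one side's write, which is exactly the part that depends
on the business domain; passing `union_of_sets` instead keeps both.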

