
CockroachDB Skitters into Beta - orangechairs
https://www.cockroachlabs.com/blog/cockroachdb-skitters-beta/
======
Spiritus
I don't really get all the whining about the name, I even kind of like it. It
fits well with their narrative.

Besides, they could've called it PoopDB and I still would use it if it
survived a nuclear blast and had joins. I couldn't care less about the name.
But apparently I'm in a minority...?

~~~
skoczymroczny
I think it's more about the senior management and people like marketing, who
aren't technical. That could go well: "Yeah so we're ditching MS SQL Server
Pro and instead will go with Cockroach from now on"

~~~
gizzlon
Can you just call it something else internally? It's not like any of those
people know what databases do and do not exist. Maybe CR, CRDB or like someone
here suggested, RoachDB.

------
kodablah
Been watching this DB for a while. Glad to see it enter beta. I recommend
including
[https://github.com/mauricio/postgresql-async](https://github.com/mauricio/postgresql-async)
on
[https://www.cockroachlabs.com/docs/install-client-drivers.ht...](https://www.cockroachlabs.com/docs/install-client-drivers.html)
(it's my favorite Postgres client, it's very fast).

~~~
jseldess
Hey kodablah, just opened an issue for CockroachDB devs to investigate this
driver and possibly add it to that page. Thanks!

------
lobster_johnson
I've been watching this project for a while. This is very cool. I think
they've made a lot of very good design choices.†

Oh, and the client drivers page should probably also list pgx
([https://github.com/jackc/pgx](https://github.com/jackc/pgx)), which is a
nice, fast PostgreSQL client that doesn't use the not-as-nice database/sql
package.

___

† The name being the only exception; I don't have bug phobia at all, but
cockroaches are just icky.

~~~
jseldess
Opened an issue for CockroachDB devs to investigate this driver and possibly
add it to that page. Thanks!

------
DanielBMarkham
Looks neat!

I noticed this: CockroachDB is not suitable for use cases involving joins
(yet)

Well guys, that's the kicker. If you've got joins, you've got SQL. The
selection language isn't so important if you can't do set relations.

I'm sure you're going to rock this. Can't wait to see how you do it!

~~~
ericfrenkiel
Agreed - a lack of joins means it's not real SQL.

------
krenoten
This is one of the most exciting DBs coming into the world, and I've had a
lot of fun watching progress from the sidelines! Keep kicking ass!

~~~
assface
> This is one of the most exciting DBs coming into the world, and I've had a
> lot of fun watching progress from the sidelines!

I'm going to have to disagree. The German HyPer DBMS has a lot more
interesting ideas. Cockroach is just distributed MVCC from the 1980s on top of
a key/value store.

MemSQL also has a new LLVM architecture written by the HipHop VM inventor that
they poached from Facebook.

~~~
ngrilly
HyPer and MemSQL are in-memory databases. CockroachDB is not. This is a big
difference.

And according to the HyPer home page, HyPer doesn't look distributed, which is
another big difference.

~~~
ericfrenkiel
MemSQL used to be just in-memory. It added a second storage engine designed to
hold data on disk using a column store.

------
gregwebs
Great news! I still don't understand why HyperDex (which has similarities in
design and goals) has existed for years and nobody seems to know about it. Is
this because it doesn't support SQL?

[http://hyperdex.org/](http://hyperdex.org/)

~~~
lobster_johnson
HyperDex is interesting. It has a lot of pluses — fast, written in C, easy to
run, seems robust.

On the other hand, the author (or authors) seem completely unresponsive about
some things. Their Homebrew package's Ruby support is still broken a year
after reporting it [1]. The Homebrew package is also ancient (1.4.4 instead of
1.8.1); I did a PR [2], which requires the authors to roll a release of
their fixes to HEAD of libpo6 [3]. All of this has been ignored for some time.

I'm just worried that the project, which seems to be maintained by just one
developer, is languishing. Very little commit activity, lots of open issues,
lack of attention, etc.

[1]
[https://github.com/HyperDex/homebrew-hyperdex/issues/10](https://github.com/HyperDex/homebrew-hyperdex/issues/10)

[2]
[https://github.com/HyperDex/homebrew-hyperdex/pull/15](https://github.com/HyperDex/homebrew-hyperdex/pull/15)

[3]
[https://github.com/rescrv/po6/issues/9](https://github.com/rescrv/po6/issues/9)

------
nnq
I still don't get why people get so excited with every new even-more-scalable
DB yet at the same time completely ignore what's happening with things like
_graph DBs_ or _hybrid-document-graph DBs_ that seem so f cool!

Also... why would you want to live WITHOUT ANY JOINS nowadays?! Unless the
scale requirements actually prohibit it, and let's face it, 99% of what people
are doing is not even close to "big data" on modern hardware. No, your 1TB DB
is not big data, and you can handle it with Postgres or a powerful graph DB
just fine on a pretty cheap VDS if you stay away from AWS/Azure etc., which get
nowhere near the performance/price of renting "dedicated metal" on mid-term
deals. I'm getting closer to thinking along the lines of: _not even plain
JOINs are enough, I want at least Postgres-like recursive joins if not
full-fledged infinite-depth graph traversals! I know the budget/hardware allows
it, so give it to me, stop the whole "scalability" bullshit!_
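
For readers unsure what a "recursive join" looks like in practice, here is a minimal sketch. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for Postgres, since both support `WITH RECURSIVE`; the `edges` table, its contents, and the `reachable` helper are made up for illustration.

```python
import sqlite3

# Hypothetical example data: a tiny directed graph stored as an edge table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")],
)


def reachable(conn, start):
    """Return every node reachable from `start`, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                -- anchor row: the starting node
            UNION                   -- UNION drops repeats, so cycles terminate
            SELECT e.dst
            FROM edges AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    ).fetchall()
    return [node for (node,) in rows]


print(reachable(conn, "a"))
```

A single query walks the graph to arbitrary depth, which is the kind of traversal that plain fixed-depth JOINs cannot express.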

~~~
ssmoot
For me it's because there are very few databases (you can count them on one
hand) that check the following boxes:

* Document or SQL Storage
* Integrated Full Text Search
* Can transparently survive a node going down (load balancer or client driver support)
* Highly consistent by default, or allow operation level quorum settings
* DR options with minimal lost writes
* Transaction support
* Hosted
* Affordable for small businesses with low average load, but including the occasional peaky or seasonal issues (large bulk imports, auto/consumer/product/etc-show traffic spikes for a few days, etc)

Cloudant meets everything but Transactions. But that's a big one.

Most databases you might think of don't satisfy the Hosted option. And
considering how easy it is to lose other items on the list through
misconfiguration or maintenance issues, that's a lot more of a core feature
than might be obvious for a small shop. It's great if node failure is
transparent in theory, but if it doesn't work in practice it may as well not
exist.

Most of the rest don't integrate full-text-search. That's huge. It means a ton
of concerns move from the database vendor, to you, the application developer.
Any database that doesn't (at least plan to) integrate full-text-search is one
with a much higher development and operations cost.

And the number of databases that promise to (eventually) deliver distributed
SQL? There was 1 a few years ago (AFAIK) that met most of those: FoundationDB.
Never to be heard from again. :-)

I don't need a GraphDB for my CMS-like sites. That's just not a very good fit.
And working without JOINs (though it sounds like they're on the roadmap here)
is not as onerous as you'd imagine for many sites, and it often brings along
the side-effect of forcing you to write a much more efficient/fast product
against denormalized data.

So any database that promises to check most of those boxes is worth keeping an
eye on (IMO). Though the things that kill most solutions' candidacy for me are:

    
    
      - lack of integrated Search
      - lack of Hosted/Managed option
    

Everything else is pretty flexible.

~~~
lobster_johnson
We're building something that satisfies all of those criteria.

We call it a data layer, not a database, because it's a higher-level document
data model layered on top of Postgres (with pluggable data stores that can be
added at runtime; we hope to support other backends such as Redis and
Cassandra and maybe CockroachDB if it's a good fit) and Elasticsearch (also
intended to be pluggable), similar to how TitanDB is implemented.

It supports transactions — atomic multi-document updates within a single data
store that supports this — fine-grained CRDT-style document patching, update
streaming, versioned schemas and a fine-grained permissions system. The query
language isn't SQL, but our own relationship-oriented, vaguely GraphQL-like
query language that allows querying on anything that Elasticsearch can
express, as well as complex joins and aggregations. Since Elasticsearch is
eventually consistent and all queries go through it, we aim to provide some
consistency-tolerance semantics when a client does need strict consistency in
combination with queries.

We're not quite ready to publish the code, but drop me a line (email in profile)
if you want to be notified when it becomes available.

------
thebiglebrewski
Honestly, as a New Yorker, the name of your company and product and the title of
this blog post make me cringe!

~~~
troymc
The "mouse" (computer hardware) has an equally bad name, and yet...

~~~
PhasmaFelis
You think mice and cockroaches evoke the same emotions in the average human?

~~~
troymc
No, but many humans will jump up on a chair, or run from the room if a mouse
runs across the floor.

Calling a product a "mouse" must have seemed like a strange decision at the
time. Now we don't think twice about it. My point is that people get
accustomed to things over time.

------
endymi0n
Like both idea and traction, but I'm taking bets how long it takes for the
investors to "convince" the team of a rebranding. I mean imagine anyone
suggesting "CockroachDB" in an enterprise setting... Guess we'll see the rise
of the all new and shiny HydraDB soon, mythological creatures always go well
:)

~~~
erikn
I'm an investor in CDB. I love the name and there's always an acronym for
those with Acarophobia.

------
pbarnes_1
Any chance we'll see something like functional indices or Pg-like UPSERT?

~~~
ericfrenkiel
Check out MemSQL Community Edition which has these features

~~~
pbarnes_1
Thanks Eric, MemSQL is pretty awesome but I can't really begin to consider it
when:

1. Backing up your data is somehow an Enterprise feature?

2. No pricing anywhere on the site, which usually means:

2.1. Sales calls. Nothing I hate more in this world than sales calls.

2.2. $$$$$$$ (which is fine, but I like to know roughly how much $$$$$ before
2.1.)

As far as functional indices, I don't see that here:

[http://docs.memsql.com/docs/create-index](http://docs.memsql.com/docs/create-index)

What I mean is specifying a WHERE clause on the index.
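
An index with a WHERE clause is what Postgres calls a partial index. As a rough sketch of the idea, assuming SQLite via Python's `sqlite3` module rather than Postgres or MemSQL, and with a made-up `orders` table and index name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(1, "open"), (1, "closed"), (2, "open"), (2, "closed"), (3, "closed")],
)

# Partial index: only rows with status = 'open' get an index entry.
conn.execute(
    "CREATE INDEX idx_open_orders ON orders (customer_id) WHERE status = 'open'"
)

# The query repeats the index's WHERE term so the planner can prove it applies.
query = "SELECT id FROM orders WHERE customer_id = 1 AND status = 'open'"
open_ids = [row[0] for row in conn.execute(query)]

# The last column of EXPLAIN QUERY PLAN is a human-readable description.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(open_ids)
print(plan)
```

The index stays small because closed orders never enter it, which is the usual reason to want this feature.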

~~~
ankrgyl
If you are referring to indices on expressions, MemSQL definitely supports
them. In MemSQL you do it by creating a computed column
([http://docs.memsql.com/v5.0/docs/persistent-computed-columns](http://docs.memsql.com/v5.0/docs/persistent-computed-columns)) and
creating an index over one of those. Computed columns turn out to give you a
significant perf boost since you only have to run the expression once (at
INSERT time), and never while executing a SELECT query or seeking in the
index.
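
To illustrate indexing on an expression, here is a small sketch. It does not use MemSQL's computed columns; it assumes SQLite's index-on-expression feature (via Python's `sqlite3`) as a stand-in for the same idea, where the expression is evaluated when a row is written rather than for each row during a lookup. The `users` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("Ann@Example.com",), ("bob@example.com",)],
)

# Index the lowercased email; each index entry is computed at write time.
conn.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")

# The query must use the same expression for the index to be considered.
query = "SELECT id FROM users WHERE lower(email) = 'ann@example.com'"
ids = [row[0] for row in conn.execute(query)]

# The last column of EXPLAIN QUERY PLAN is a human-readable description.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(ids)
print(plan)
```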

------
krylon
I am just beginning to realize how many delicious puns are hiding in the name.
"There is a bug in your database" - "No, the database _is_ the bug." ;-)

Seriously, though, this looks like a very interesting project. Until now, I
had not been aware it supports SQL. I think I have to give CockroachDB a try
in my next toy project.

It sounds like they're trying to build something very ambitious. To my knowledge
none of the commercial database vendors (except maybe Tandem back in the day?)
have succeeded in building a distributed/replicated SQL-based database engine
with strong consistency and high availability. I know MSSQL supports hot-
standby/failover, but AFAIK that does not cover all the scenarios CockroachDB
seems to want to address.

------
zimbatm
What are the current performance characteristics in terms of number of selects
and inserts per second per node? Does anyone have synthetic benchmarks
published?

~~~
orangechairs
CockroachDB here. We haven't published benchmarks yet and performance work is
ongoing. As this is a beta product, the current performance is not indicative
of where it will eventually be.

------
bogomipz
Is this an implementation of Google's Spanner? I looked at the project's GitHub
page but didn't see much about the project's origins.

~~~
bru
The design is compared against Spanner's several times in the interesting
design doc:
[https://github.com/cockroachdb/cockroach/blob/master/docs/de...](https://github.com/cockroachdb/cockroach/blob/master/docs/design.md)

------
e12e
Aside from everything else, it's nice to see what appears to be sane support
for using TLS right there in the command line, and in the beta:

[https://www.cockroachlabs.com/docs/secure-a-cluster.html](https://www.cockroachlabs.com/docs/secure-a-cluster.html)

(Much thanks to Go's extensive standard library:
[https://github.com/cockroachdb/cockroach/blob/master/securit...](https://github.com/cockroachdb/cockroach/blob/master/security/certs.go)
)

I've lost count of the number of projects that leave "securing the server" as
an exercise to the reader (about as useful as those error prompts that
helpfully ask you to "contact the server administrator" (I AM THE
ADMINISTRATOR!)).

I understand why not using TLS doesn't give an error, although I'd prefer it
if it was more work to set up an insecure instance (eg: --force-no-tls
--force-no-auth) -- but what cockroach does here is pretty good, and AFAIK
best-in-class from those which it is natural to compare to (in all fairness,
projects like postgresql a) don't do too badly, and b) have a lot of legacy
cruft -- the old (current) assumption is of course that if you want TLS,
you'll use your secure in-house CA for everything. Which even when running
Microsoft AD is in my experience way too complex for most to bother with).

Another (bad) example here is openssh, that has had decent support for
certificates rather than keys for a long while now, and yet I've yet to see
anyone who appears to use ssh certs in anger (myself included, it's high up on
the infinite todo-list).

One other great example of making the best of what awful tools are available
for securing communications is the Caddy web server, that comes with built-in
support for letsencrypt: [https://caddyserver.com/](https://caddyserver.com/)

That said, while I think the docs are pretty good here, a note that leaving an
unencrypted ca-key lying around is a bad idea might be worth a mention even
in the quick-setup. At least as it stands, it's reasonably easy for someone
with a working knowledge of TLS/certificates to guess which parts should be
secret (all the keys) and which parts should be kept air-gapped (the ca-key).

Maybe a link to how cockroach parses/verifies certs would be nice too, for
those that _do have_ a working internal CA -- along with a little info on
how/if certs integrate with authenticating nodes (eg: will my
printer.example.com x509-cert allow my hacked printer to join the cluster, if
it is signed by the same CA?).

Despite all the comments, I'm really liking what I see so far -- I'll be
keeping an eye on this project!

~~~
e12e
On a tangential note, I used to think the openssh developers were a little
crazy for not using (a subset of) x509 certs, but now, seeing how ssh is a
part of go, I'm not sure if using ssh certs[1] for things like intra-server
auth/authz might not be a good idea.

TLS is ok if you need to secure something that speaks HTTP (or IMAP etc) --
but I wonder if ssh certs might not be the lesser evil for something like what
cockroachdb needs. Perhaps especially now as self-signed/"home grown" x509
certs are being relegated to the background by modern browsers -- so you'd end
up needing a "real" cert for the status-server to present securely to a web-
client, which might not match the wish for every node to have its own keypair.

[1]
[https://godoc.org/golang.org/x/crypto/ssh#Certificate](https://godoc.org/golang.org/x/crypto/ssh#Certificate)

------
diskcat
But why the name?

~~~
mwambua
Cockroaches have the ability to survive some of the most difficult
situations... they've been around for the past 320 million years.

That's what they expect from their db... the ability to survive huge failures
whilst maintaining data integrity.

------
shash7
Nice seeing a new SQL database, but who comes up with names like this?

~~~
RohithMeethal
Ahm it is a good choice, since the name implies its survivability. They say
"cockroaches can survive a nuclear explosion".

~~~
coltonv
Why not just RoachDB? Same meaning but rolls off the tongue a lot easier.

~~~
veidr
Plus it has the happy alternative connotation of something you might actually
be _happy_ to find when tidying up the kitchen the morning after a party...

------
greggh
I have serious problems with the name of this, even opening this post is
giving me real trouble. I can't believe someone thought this was a good idea
for a name. No matter what the tech is, or how good it is, I will NEVER be
able to use this.

~~~
oldmanjay
This is a serious question even if it seems like it isn't; how do you handle
the real world with such a strong adverse reaction to a mere word?

------
marknadal
Full disclosure: I work for a non-competing (EC vs SC) database - which also
has an ugly name "gunDB".

Super excited to hear about CockroachDB's progress - and their decision to add
SQL support I think was smart.

However, I can't go without critiquing their marketing of "Scalable" and
"Strongly Consistent", especially when they reference horizontal scaling.
Strongly Consistent systems bottleneck around a centralized master, or around
complex consensus algorithms like Paxos or Raft. CockroachDB has previously
written this off with "we're using Google's Spanner algorithm" but this makes
them vulnerable to a Split-Brain failure (which contradicts their other goal
of "Survivability"), for anybody interested in this line of research I
recommend googling Kyle Kingsbury "Aphyr" "Call Me Maybe" Jepsen Tests. With
that said, you can't just toss horizontally "Scalable" and "Strongly
Consistent" into the same blog because they are fundamentally at odds with
each other - the physics just don't work. To achieve this you only have two
options... either (A) vertically scale the Master, or (B) make incredibly
technical sharding decisions (where no two pieces of data that depend upon
each other can be in separate shards). The first (A) is not the definition of
being scalable, because a single machine can only scale so much (limits of the
physical system). And (B) cannot be generalized, which would require every
company building on top of CockroachDB to either have the expert knowledge of
predicting the exact sharding schema in advance (maybe why they are moving to
SQL?), or have to depend upon expensive consulting options (perhaps provided
by CockroachDB, after a startup has already built itself around using the free
open source product?). The third option is that they are over-generalizing
their database trade-offs, and that you cannot actually get horizontal
"Scalability" and "Strong Consistency".

That said, I know the CockroachDB guys are smart, so I'd love to hear the more
nuanced/detailed explanation. I'm just trying to keep your guys' marketing and
promises in check with the reality.

~~~
teraflop
CockroachDB's replication protocol is based on Raft, which is not susceptible
to split-brain. (Neither is Spanner, but Spanner requires specialized hardware
for clock synchronization.) And contrary to your assertion, it allows
consistent snapshots and transactions that cross shard boundaries.

The rest of your comment ignores a couple decades or so of _very_ extensive
research into scalable distributed transactions.

Design documentation is here if you care to read it:
[https://github.com/cockroachdb/cockroach/blob/master/docs/de...](https://github.com/cockroachdb/cockroach/blob/master/docs/design.md)

~~~
marknadal
Raft isn't, but that is extremely different from whether an implementation of
Raft is or isn't. Kyle Kingsbury's work shows this to be the case quite often
- see [http://jepsen.io/](http://jepsen.io/) .

I have read the design document, and know how Spanner works. This isn't the
first time I've asked about this, I had the 2nd highest upvoted comment on the
initial CockroachDB announcement (
[https://news.ycombinator.com/item?id=9660339](https://news.ycombinator.com/item?id=9660339)
) which means it is many many more people than just me that are concerned
about these claims.

Unless you can show me exactly what research contradicts my critique, your
response has no merit and doesn't address my concerns. You can't handwave
magic appeals to authorities without citing them. So my questions still stand.

~~~
phpnode
If an implementation of Raft is susceptible to split brain then it is not
Raft, it's something else. Please read the literature rather than dismissing
it. No one owes you any further explanations, and no one is handwaving or
appealing to authority.
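
The core of that argument is the majority quorum: a leader needs votes from more than half of the full cluster, and two disjoint groups of nodes cannot both be more than half. The sketch below is a deliberately simplified model, not Raft itself and not CockroachDB's code. It ignores terms, logs, and timing, only considers two-way network partitions, and all function names are made up for illustration.

```python
from itertools import combinations


def majority(cluster_size):
    """Smallest number of votes that is more than half the cluster."""
    return cluster_size // 2 + 1


def can_elect_leader(partition, cluster_size):
    """A side of a partition can elect a leader only with a majority of ALL nodes."""
    return len(partition) >= majority(cluster_size)


def partitions_with_two_leaders(cluster_size):
    """Enumerate every two-way split and collect those where both sides could elect."""
    nodes = set(range(cluster_size))
    bad = []
    for k in range(cluster_size + 1):
        for side in combinations(sorted(nodes), k):
            a = set(side)
            b = nodes - a
            if can_elect_leader(a, cluster_size) and can_elect_leader(b, cluster_size):
                bad.append((a, b))
    return bad


for n in range(1, 8):
    print(n, majority(n), len(partitions_with_two_leaders(n)))
```

In this model no split ever lets both sides elect a leader, so at most one side keeps accepting writes. Whether a given real implementation upholds that property is a separate question, which is what the Jepsen-style testing mentioned upthread is for.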

