
CockroachDB 1.1 Released: Production Made Easy - nate_stewart
https://www.cockroachlabs.com/blog/cockroachdb-1dot1/
======
mschaef
If you ever wondered what the creators of the Gimp were up to these days, this
is your answer:

[https://www.cockroachlabs.com/about/](https://www.cockroachlabs.com/about/)

~~~
jdoliner
Is the Cockroach Labs team actually made up of the creators of The GIMP? I
checked the team page for GIMP and there don't seem to be any overlaps.

Maybe this is just a joke about both having off-putting names that's going
over my head?

~~~
d4l3k
The GIMP was originally created by Spencer Kimball and Peter Mattis who are
respectively the CEO and VP of Engineering of Cockroach Labs.

[https://en.wikipedia.org/wiki/GIMP](https://en.wikipedia.org/wiki/GIMP)

[https://www.cockroachlabs.com/about/](https://www.cockroachlabs.com/about/)

~~~
orangechairs
It's true. These two guys have been working together in OSS for a long time -
[https://www.gimp.org/news/2015/11/22/20-years-of-gimp-
releas...](https://www.gimp.org/news/2015/11/22/20-years-of-gimp-release-of-
gimp-2816/)

~~~
hnkimb3558
Pete chose the name GIMP (we were mulling calling it "XIMP", for X11 Image
Manipulation Program, but Pulp Fiction had recently been released, so...).

I (spencer) chose the name CockroachDB. I guess we have questionable tastes in
OSS project naming.

~~~
jdoliner
Sorry if my previous comment came off as yet another joke about Cockroach's
name, I'm sure those get old. Thanks for your work on both projects. I've been
a long-time GIMP user, haven't had a use for Cockroach yet but someday.

Also, does this mean that Pete's on deck for naming the next OSS project you
work on?

~~~
hnkimb3558
I think the algorithm is whoever comes up with the most egregious and sticky
name wins.

------
manigandham
Certificate management is still a major pain. I would rather have standard
auth options while using encrypted traffic between nodes instead of generating
certs for every node.

What's the use of checking node name and ip with the certificates? I already
have full control of all the nodes so a key/secret match inside a secure
connection would work just as well, since that's how SQL clients connect
anyway. It seems like this issue is why the kubernetes deployment also uses
insecure mode?

~~~
bdarnell
What "standard auth options" do you have in mind for node-to-node auth? The
ones that come to my mind are even more of a pain to set up than TLS certs.

It's true that setting up a secure cluster is kind of annoying right now. But
the kubernetes templates
([https://github.com/cockroachdb/cockroach/tree/master/cloud/k...](https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes))
do support secure mode now, and the plan is to provide more like this so it's
not something that everyone has to solve by hand.

If you know or can predict the addresses or hostnames you'll be using, then
it's possible to generate one cert and reuse it for multiple nodes. This isn't
ideal from a security perspective since you lose the ability to revoke
individual certs, but since we don't (yet) support CRLs/OCSP it's not much of
a loss.

Adding an option to skip hostname checks for node certs might make this less
of a pain (it would then be trivial to share one cert for all nodes if that's
what you want to do). We'll consider that and see if it compromises any
important security properties.
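
To make the distinction concrete, here is a minimal sketch of chain verification with and without the hostname check. It uses Python's standard `ssl` module purely for illustration; it is not CockroachDB's (Go) implementation, and the helper name `make_node_context` is hypothetical.

```python
import ssl

def make_node_context(check_hostname: bool) -> ssl.SSLContext:
    """Build a client-side TLS context that always verifies the peer's
    certificate chain and optionally also verifies the hostname."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # PROTOCOL_TLS_CLIENT already defaults to CERT_REQUIRED with hostname
    # checking on; both are set explicitly here for clarity.
    ctx.check_hostname = check_hostname
    ctx.verify_mode = ssl.CERT_REQUIRED
    # In real use you would also load the cluster CA here, e.g.
    # ctx.load_verify_locations("ca.crt")  # hypothetical path
    return ctx

# Strict: the cert must chain to the CA *and* name the host being dialed.
strict = make_node_context(check_hostname=True)

# Relaxed: the cert must still chain to the CA, but any name is accepted,
# which is what would let one cert be shared by every node.
shared = make_node_context(check_hostname=False)
```

In the relaxed context the connection is still encrypted and still authenticated against the CA; what is lost is the binding between a particular cert and a particular host.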

~~~
manigandham
Ah, didn't see that on the K8S docs page which only lists insecure mode... but
this secure mode requires manual intervention to approve the certificates,
which is the antithesis of easy automated scaling and availability.

Hostname checks make sense for websites since user agents can't trust anything
but I don't see the advantage for managing a db cluster. When would you need
to revoke an individual cert and why wouldn't that be better handled by just
shutting down the VM or container instead?

I'd prefer nodes using self-signed certs to securely connect, then user/pass
or other secret to authenticate to cluster - but yes, if you remove the
hostname check then the shared certs can do double duty as encryption/auth to
the cluster, although this now brings up maintenance issues with rolling
certs. Either way passing secrets (password or cert file) is easy when it's
the same across the cluster.

It seems like CRDB could easily run in a simple replicaset with no maintenance,
but that requires either running insecure or following a rather convoluted
manual process. Something in the middle would be much better.

~~~
bdarnell
> When would you need to revoke an individual cert and why wouldn't that be
> better handled by just shutting down the VM or container instead?

You revoke a cert when it's somehow been compromised and something other than
the VM/container that's supposed to have it gets a copy of it.

~~~
manigandham
I can't see how that cert would be lost where the node/container hasn't already
been compromised. At that point there is a critical security issue where
revoking a single certificate doesn't seem to cover much since the entire
cluster should be reviewed and secured again with new keys.

Either way, I'd much prefer to make that decision as the admin rather than be
forced into either extreme. Removing hostnames and having an easy way to roll
certificates would go far towards operational simplicity and security.

------
Alir3z4
When would PostgreSQL features be compatible with the Django ORM?

Are these still the case:

* [https://github.com/cockroachdb/cockroachdb-python/pull/14](https://github.com/cockroachdb/cockroachdb-python/pull/14)

* [https://gist.github.com/pirate/f2931acd97d52242756d85d52b42e...](https://gist.github.com/pirate/f2931acd97d52242756d85d52b42e8bd#blocking-issues)

~~~
bdarnell
Some of those have been done, but not all, so it's still not possible to use
CockroachDB with the Django ORM. The biggest blocker (IIRC) is that the
default set of migrations that Django performs on a new database uses ALTER
COLUMN TYPE.

Some of the features from those lists we _have_ implemented include ALTER
COLUMN SET DEFAULT, pg_table_is_visible(), UUID, extract(), and (some) schema
changes in transactions. 1.2 will add (at least) INET types and sequences.

~~~
Alir3z4
Thanks for updating.

Supporting an ORM would boost the usage in my opinion. We're using Django with
PostgreSQL for a social blogging platform that needs to hold posts, comments,
likes/votes, recommendations, analytics, metrics, billing, payment
transactions and many other pieces of information, large and small. We're good
with PostgreSQL at the moment, and we're thinking of keeping posts in a
Cassandra database later in order to scale. We could use the Django ORM
multi-database router to keep this data in another database and still have all
the beauty of the Django ORM.

If CockroachDB could start supporting Django ORM properly we'd start using it
alongside our PostgreSQL database.

Can't wait to see CockroachDB getting Django ORM supported completely.

------
spooneybarger
Congrats on the release! I love watching cockroach grow and develop.

~~~
JonCox
Eww

------
muxator
1) Apart from interleaved tables [0], what are the possible solutions for
multi tenancy?

2) Is there a published benchmark on performance on join-heavy workloads?

[0]: [https://www.cockroachlabs.com/docs/stable/interleave-in-
pare...](https://www.cockroachlabs.com/docs/stable/interleave-in-parent.html)

~~~
rwu1997
I’m currently a developer at Cockroach working on interleaved tables.

If by multi tenancy you mean sharding by a particular column (or set of
columns) to groups of nodes, interleaved tables are probably not what you’re
looking for. Table partitioning is currently on the roadmap and you can find
the RFC here
[https://github.com/cockroachdb/cockroach/blob/master/docs/RF...](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170208_sql_partitioning.md).

~~~
muxator
Thanks rwu1997, I am referring to this advice given by Ben Darnell:
[https://forum.cockroachlabs.com/t/multi-tenancy-best-
practic...](https://forum.cockroachlabs.com/t/multi-tenancy-best-practices-
sharding/666/2)

~~~
rwu1997
Interleaving does try to colocate the interleaved tables by taking advantage
of the fact that the underlying SSTable storage sorts the data by their keys.
It doesn't however guarantee that a sufficiently large interleave hierarchy of
tables will be colocated on the same node(s) due to range splits (each range
has a default maximum size of 64MB).

I'm looking forward to the partitioning work being done right now since one
could in theory have a top-level (or root) table (with some tenantID) and
tenant-specific tables interleaved, then easily partition on the tenantID to
have tenancy isolation on all tenant-specific data.
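
A toy model of the point about key ordering and range splits, as a rough sketch only: keys are Python tuples, the table and tenant names are made up, and ranges are split by key count as a stand-in for the byte-size limit described above.

```python
from typing import List, Tuple

def interleaved_keys() -> List[Tuple]:
    """Parent rows are keyed by (tenant_id,) and child rows by
    (tenant_id, table_name, row_id). All names are hypothetical."""
    parents = [(1,), (2,)]
    children = [
        (2, "orders", 10),
        (1, "orders", 7),
        (1, "invoices", 3),
        (2, "invoices", 4),
    ]
    # Sorting by key places each tenant's child rows right after its parent.
    return sorted(parents + children)

def split_into_ranges(keys: List[Tuple], max_keys: int) -> List[List[Tuple]]:
    """Cut the sorted keyspace into contiguous ranges of at most max_keys keys
    (a simplification: the real limit is on bytes, not on key count)."""
    return [keys[i:i + max_keys] for i in range(0, len(keys), max_keys)]

def ranges_for_tenant(ranges: List[List[Tuple]], tenant_id: int) -> List[int]:
    """Indexes of the ranges that hold at least one row for tenant_id."""
    return [i for i, r in enumerate(ranges) if any(k[0] == tenant_id for k in r)]

keys = interleaved_keys()
ranges = split_into_ranges(keys, max_keys=2)
print(ranges_for_tenant(ranges, 1))  # tenant 1 spans ranges 0 and 1
```

Even though tenant 1's rows are adjacent in sort order, a small enough range limit spreads them over more than one range, and each range may live on a different node.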

------
manigandham
One of the most exciting products in a long time. Array data type is cool to
see, if they can get JSON in the next release then it'll be everything we
need.

~~~
MycroftH
It's on our roadmap and is actively being worked on right now. Should be
available for version 1.2. See:
[https://github.com/cockroachdb/cockroach/blob/master/docs/RF...](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170925_jsonb_scope.md)

------
continuations
> Baidu... are using CockroachDB to automate operations for applications that
> process 50M inserts and 2 TB of data daily.

Does this mean CockroachDB is not the database that does the 50M inserts
daily? Is it instead used to automate the deployment of those apps, which
actually run on a different database that's not CockroachDB?

In that case what are the reasons behind using CockroachDB for deployment but
not for the apps themselves?

By the way what kind of performance can I get from CockroachDB? I know it's
going to be slower than a Postgresql or MySQL running on replication. But how
much slower?
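
For a sense of scale, the quoted daily figures can be turned into average rates. This is back-of-the-envelope arithmetic only: it assumes load spread evenly over the day and decimal terabytes, neither of which is stated in the announcement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

inserts_per_day = 50_000_000
bytes_per_day = 2 * 10**12  # assumes "2 TB" means decimal terabytes

# Averages only; real traffic is bursty, so peak rates would be higher.
avg_inserts_per_sec = inserts_per_day / SECONDS_PER_DAY
avg_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6

print(f"{avg_inserts_per_sec:.0f} inserts/s, {avg_mb_per_sec:.1f} MB/s")
# prints "579 inserts/s, 23.1 MB/s"
```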

------
stingraycharles
What are the people of the HN community using CockroachDB for? In what kind of
scenarios is this a good tool to choose?

~~~
forgot-my-pw
If you need horizontally scaled DBMS.

More in the FAQ: [https://www.cockroachlabs.com/docs/stable/frequently-
asked-q...](https://www.cockroachlabs.com/docs/stable/frequently-asked-
questions.html#what-is-cockroachdb)

~~~
stingraycharles
Yes I understand that, but I was looking more to better understand what kind
of projects require this kind of tech. I can think of adtech as one industry,
are there any others?

~~~
ansible
We've got some preliminary plans to use it for a monitoring application.

We want something with high uptime and resiliency that's going to be
relatively easy for us to learn and deploy. We've used PostgreSQL and
MySQL/MariaDB more in the past, but those were for smaller applications that
didn't have significant uptime requirements.

We could figure things out with PostgreSQL and decide on a fault-tolerant
setup, deal with sharding, etc. But it seems like CockroachDB will be an
easier path forward, and also make scaling much simpler.

------
seangrogg
I was considering running two horizontally-scaling non-vendor databases: one
for handling high volumes of read/write operations that can handle eventual
consistency and one for handling lower volumes of strong
consistency/transactions.

Would CockroachDB be a good contender for the strong consistency case (and who
would be good competitors and why)? Also, I've so far considered Cassandra and
HBase for the eventual-consistency option; any recommendations there?

~~~
mjibson
Yes, CockroachDB excels at strong consistency at mid-low workload volumes.
There are already some production deployments doing this kind of work.

~~~
seangrogg
Awesome, thank you!

------
samstave
Can someone ELI5: is deploying CRDB to AWS EC2 instances beneficial over, say,
Redshift? Or are they completely different solutions?

How easy is deployment/scaling/sharding with cockroach?

Aside from cost, where is the best use case for cockroach over anything else??

~~~
manigandham
There are 2 distinct use-cases: OLAP (data warehouse) vs OLTP (operational
database).

If you want a data warehouse to run SQL analytics over lots of data, stick
with RedShift, BigQuery, Snowflake, MemSQL, MapD that use column-stores,
vectorized operations, compiled queries and other features for fast
performance. Also more options like Apache Drill, Dremio, SnappyData, MapR
DB,... if you want SQL but over unstructured/random files stored somewhere.

For an operational database to hold your core active data and to do small/mid-
data analysis, CRDB is an option along with PostgreSQL, MySQL, MariaDB, SQL
Server, and others. Advantage is cloud-native and distributed architecture so
you can get high-availability, multi-master, and easy scaling out of the box,
even across multiple data-centers.

All the other traditional RDBMS either don't support this or require 3rd-party
software to come close, although some will never match the same level of
distributed functionality. In exchange, CRDB doesn't support the same
performance, data types, and rich feature set of a single node PostgreSQL, for
example. If you can tolerate downtime, need complex SQL statements, or already
have solid tooling with the existing systems, then stick with them.

If you're running a globally distributed app, or a new app that can work with
simpler SQL statements, or need availability and scaling with less work, then
CRDB is a good fit. If you're looking for something in the middle, then
CitusDB (postgres extension) allows for automatic sharding across multiple
instances. Keep most of the PostgreSQL functionality with single-region
horizontal scaling, but more work and complexity without the distributed
features of CRDB.
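
A toy sketch of why the two workloads favor different storage layouts. The data and function names are made up, and real engines are far more involved; this only shows the access patterns.

```python
# Row-oriented layout: each record is stored together (typical for OLTP).
rows = [
    {"id": 1, "region": "us", "amount": 30},
    {"id": 2, "region": "eu", "amount": 12},
    {"id": 3, "region": "us", "amount": 8},
]

# Column-oriented layout: each column is stored together (typical for OLAP).
columns = {name: [row[name] for row in rows] for name in rows[0]}

def lookup_row(row_id: int) -> dict:
    """OLTP-style access: fetch one whole record by its key."""
    return next(r for r in rows if r["id"] == row_id)

def total_amount() -> int:
    """OLAP-style access: aggregate one column across all records,
    without touching the other columns."""
    return sum(columns["amount"])

print(lookup_row(2))   # the whole record for id 2
print(total_amount())  # 50
```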

~~~
samstave
Thank you - what a great freaking explanation!

~~~
StavrosK
Yeah, that was pretty damn comprehensive, thank you.

------
NuSkooler
Great news, and thanks for including the customer use cases!

------
jzelinskie
Congrats on the release! What's the status of online DDL? I remember it not
being available for 1.0.

~~~
manigandham
They've always had it: [https://www.cockroachlabs.com/docs/stable/alter-
table.html](https://www.cockroachlabs.com/docs/stable/alter-table.html)

[https://www.cockroachlabs.com/blog/how-online-schema-
changes...](https://www.cockroachlabs.com/blog/how-online-schema-changes-are-
possible-in-cockroachdb/)

~~~
jzelinskie
Oh, awesome! I swear that Spencer said this didn't work at the NYC 1.0 meetup.
I considered this a hard requirement for production readiness, glad to see it's
available.

------
mayank
Great job! The last Jepsen analysis for CockroachDB seems to have been done in
February 2017, and suggests transactions can have stale reads in some
situations — is this still the case?

~~~
irfansharif
> Kyle’s testing found two new bugs: one in CockroachDB’s timestamp cache
> which could allow inconsistencies whenever two transactions are assigned the
> same timestamp, and one in which certain transactions could be applied twice
> due to internal retries. Both of these issues were fixed.

Source: [https://www.cockroachlabs.com/blog/cockroachdb-beta-
passes-j...](https://www.cockroachlabs.com/blog/cockroachdb-beta-passes-
jepsen-testing/)

If you're speaking to "Linearizability violations such as stale reads can
occur when the clock offset exceeds CockroachDB’s threshold." from section 2.1
in Aphyr's analysis[0], this too is touched upon in the post above:

> loose clock synchronization is necessary [...] to guarantee serializability.
> CockroachDB servers monitor the clock offset between them [...] although
> this monitoring is imperfect and may not be able to react quickly enough to
> large clock jumps. All CockroachDB deployments should use NTP to synchronize
> their system clocks [...] the default of 500ms (increased since Kyle’s
> testing, when it was 250ms) to be reasonable in most environments, including
> virtualized cloud platforms.

[0]: [https://jepsen.io/analyses/cockroachdb-
beta-20160829](https://jepsen.io/analyses/cockroachdb-beta-20160829)
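
As a rough illustration of what monitoring clock offset against a threshold means, here is a minimal sketch. Comparing each node to the cluster median is a simplification chosen for this example, not CockroachDB's actual algorithm; only the 500ms default comes from the quote above.

```python
from statistics import median
from typing import Dict, List

MAX_OFFSET_MS = 500  # the default quoted above

def nodes_exceeding_offset(clock_ms: Dict[str, float],
                           max_offset_ms: float = MAX_OFFSET_MS) -> List[str]:
    """Return the names of nodes whose clock reading differs from the
    cluster median by more than max_offset_ms."""
    mid = median(clock_ms.values())
    return sorted(name for name, t in clock_ms.items()
                  if abs(t - mid) > max_offset_ms)

# Hypothetical readings in milliseconds: n3 is 700ms ahead of the median.
readings = {"n1": 1000, "n2": 1100, "n3": 1800}
print(nodes_exceeding_offset(readings))  # ['n3']
```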

------
plainOldText
I admit I’m a bit ignorant of how CockroachDB works, but how is performance
when doing table joins spanning multiple data centers?

~~~
sandstrom
I don't know CockroachDB too well, but I'd guess in most scenarios you'll
replicate the whole database between DCs, so you won't do table joins across
DCs.

------
jinqueeny
Super cool work from Cockroach Labs! Wish to see more use cases and benchmarks.

------
BlackjackCF
CockroachDB and TiDB are two projects I've been keeping my eye on.

~~~
xmichael99
TiDB is really cool, however my experience in reporting bugs to them didn't go
well... I was trying to report that a generic sql file produced from mysqldump
failed to import into tidb. I simply provided a sample table and a comment of
the issue, and they told me to go F*** myself. There seems to be no shortage
of hostile comments from the TiDB developers if you google around a bit.

~~~
andreimatei1
I can all but promise you that no CRDB dev will tell you to go F* yourself :P

~~~
StavrosK
Not even if I ask nicely?

