
Migrating to CockroachDB - latch
https://www.openmymind.net/Migrating-To-CockroachDB/
======
brianpgordon
> if you want password-authentication, you must configure your cluster to
> communicate using certificates. You can start each node in "insecure" mode,
> but then can't use password authentication. There's an issue about this. Like
> the people posting there, our network is already secured, so this
> requirement is just an unnecessary nuisance.

Eh, I'm not sure how I feel about this. Obviously you should secure your
network as best you can, but how confident are you really that a bad actor
will _never_ find a foothold anywhere in your network? I think I would
advocate for your services to all communicate securely (TLS et al), even
internally, and if your database supports mutual client/server auth like it
sounds like Cockroach does then you should use that as well. Particularly if
you're depending on at-rest encryption handled transparently by the DBMS to
protect your users' data - that won't do you much good if someone can just
sniff/MITM network traffic and wait until a bunch of your data has been
queried.

~~~
bionsystem
See, I do infrastructure work in companies that often don't even expose their
services. They just want a database and a frontend for a very small internal
app that will never get out. And yet, a lot of the tools are designed "for the
web" and, rightfully so, enforce https, ssl, certs, you name it.

Now the issue is that these are a real PITA to implement in some of those
networks, where you have to use tons of private registries and repositories
and get everything working with certificates. Because those companies are
not used to issuing certs, it often takes them months to do so, as they
have a very rigorous process, so everyone wonders why this tiny app never goes
to "production" and why we are blocked by such restrictions.

Now I'm not even accounting for the fact that you sometimes actually need to
go online to install some of that stuff. A recent example was node-sass, a
dependency of a composer package, which you would expect to just install
from the composer registry; but no, it has a setup script that goes to
nodejs.org or something, which of course you cannot fake because it will have
to check certs as well. So again, something designed for the web that'll never
work seamlessly elsewhere. (There are workarounds of course, but I'd love to
spend my time elsewhere.)

To give you an idea, the cloud I'm working on right now is pretty much
entirely offline, for both egress and ingress. There is a Nexus mirror-proxy
to Docker Hub and that's it, and even that is on the LAN. So really, having to
configure cockroachdb certs (or anything else for that matter, like etcd...)
is useless, time-consuming, frustrating, and counterproductive.

~~~
dkersten
> very small internal app

Is cockroachdb the right DB for this use case, though? A more traditional
Postgres/MySQL or even SQLite (depending on needs) is often plenty good for
a small internal app, without requiring this extra security.

> the cloud I’m working on right now is pretty much entirely offline

I feel like “cloud” has lost all meaning it may have once had.

~~~
bionsystem
> Is cockroachdb the right DB for this use case, though? A more traditional
> Postgres/MySQL or even SQLite (depending on needs) is often plenty good
> for a small internal app without requiring this extra security

It's not, but the app I described is just an example. Some use cases involve
bigger dataset and resilience.

It's not cockroachdb specifically but any of those "web" tools that, used
internally in a closed environment, become a PITA to manage. Language tooling,
repositories, dbs... Etcd for example is used as an internal storage for
tenants (for example to store DNS information) and the team that manages it
cannot upgrade right now because they have to figure out first how to
implement ssl.

> I feel like “cloud” has lost all meaning it may have once had

Well, what would you call an on-prem install of OpenStack? It's their internal
cloud. Cloud has not much to do with the web, or with being publicly exposed;
it's a tool to manage infrastructure. And to be perfectly honest, "cloud"
isn't a precise term to begin with. Years ago when the word came out, one of
my professors said they basically renamed "grid" to "cluster" and then
"cloud". They used to administer hundreds of machines, and many times that
many jobs, using pssh. They didn't wait for kube, and they didn't need SSL
between each and every API and service.

Now don't get me wrong, the public cloud provides a great service, and tooling
coming out now should be web-ready and web-safe. Just, please, pretty please,
give us a --insecure and --offline, even if it means having it all over our
ansible code, so we can get the job done without spending our days working
around those tools.

~~~
irfansharif
> Just, please, pretty please, give us a --insecure

Have you taken a look at [0]? (All caveats around running an insecure cluster
apply)

[0]: [https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-premises-insecure.html](https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-premises-insecure.html)
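
For reference, the linked guide boils down to starting each node with --insecure and then initializing the cluster once. A minimal local sketch (store names, hostnames, and ports here are illustrative):

```
# Start three nodes in insecure mode (no certs, no password auth).
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 \
  --http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 \
  --http-addr=localhost:8081 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node3 --listen-addr=localhost:26259 \
  --http-addr=localhost:8082 --join=localhost:26257,localhost:26258,localhost:26259 --background

# One-time cluster initialization, against any node.
cockroach init --insecure --host=localhost:26257
```

On a real deployment each node gets its own machine and --join lists the peers' addresses, but the shape is the same.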

------
sandstrom
The lack of usable backups in the open-source version is confounding to me.
They are effectively limiting adoption of both their free and paid versions.

If they had proper backups in the free version, more people would start using
it, and then discover that they need some of the premium features (data
locality, role-based access control, or follower reads).

Also, why is there no public pricing page? How will I know that pricing won’t
change every year? Or be arbitrary based on how much they can squeeze us?
(less likely with public, listed prices)

Why would a company with low marginal cost (software) limit their growth like
this?

~~~
StavrosK
This is a problem with Postgres as well (for different reasons). I currently
have a server sitting without proper DB backups (only full backups and dumps),
because figuring out how WAL-E (or whatever the latest tool is) works is more
effort than "here's my bucket and encryption key, please start backing
everything up".

~~~
candiddevmike
WAL archiving is really helpful for (any) point-in-time restores and for
backing up large databases. If neither of those apply, pg_dump is great!

Also WAL-E has been replaced by WAL-G (written in Go):

[https://github.com/wal-g/wal-g](https://github.com/wal-g/wal-g)
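
For the curious, the setup is roughly two pieces: tell Postgres to ship each completed WAL segment via wal-g, and take periodic base backups. A sketch (the bucket name and credential values are illustrative; see the WAL-G README for the full set of storage options):

```
# postgresql.conf: ship every completed WAL segment to object storage
archive_mode = on
archive_command = 'wal-g wal-push %p'

# environment for wal-g (e.g. exported in the postgres service unit)
WALG_S3_PREFIX=s3://my-backup-bucket/pg
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# take a full base backup; point-in-time restore replays WAL on top of it
wal-g backup-push "$PGDATA"
```

Restore is the mirror image: wal-g backup-fetch plus a restore_command of 'wal-g wal-fetch %f %p'.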

------
jturpin
I'm glad for the honest writeup of CockroachDB. I like CockroachDB, and hope
they continue it in the future, but the performance and backup capabilities
are something to consider. Still, I'd rather deploy it to Kubernetes over
Postgres whenever I can, to ensure that I can do rolling upgrades and whatever
else. At least the Helm chart they provide lets you use cert-manager with it,
so the TLS isn't as much of a pain.

------
isoos
> But going from our single PostgreSQL instance to a 3 node cluster,
> performance is worse. This is true even for simple queries involving 1 node
> (say, getting a record by id or running locally with a single node). Still a
> few extra milliseconds of latency is a fair price for better availability.
> But the performance characteristics aren't the same as a relational
> database.

I think this should be true of any distributed database: you can't really beat
optimized single-machine software with a networked app, no matter how hard
you try.

My experience with CockroachDB showed that increasing the cluster size from 3
to 6 and beyond provided more benefit than I expected. 1 -> 3 seems to be a
meaningless comparison for such a setup, given a replication factor of 3.

------
Rapzid
There is no free lunch with databases. I often argue that master-master is
overrated, because you have to design your application around its constraints
to see the benefits and avoid the pitfalls.

Same thing with CockroachDB's PostgreSQL "wire protocol support". The fact
that it only supports the "serializable" isolation level, when PostgreSQL uses
Read Committed out of the box, should be a screaming red flag to just about
anyone that you are in for some challenges on the performance front. Some
well-worn modeling techniques are simply infeasible under typical workloads at
that isolation level, for one.

Starting with bog-standard RDBMS you have to be deliberate to get great
performance. Adding in extra, fancy scaling properties means even more
considerations to take into account to maintain consistency while achieving
gains.

------
Nican
High availability is always my concern when self-hosting MySQL or PostgreSQL.
I always worry about the failover failing, in case any step in the setup was
not properly configured. Let me know if someone has a magical formula for easy
replication.

I still find CRDB's replication interesting, since all data is replicated
3-way on every commit, and data also auto-rebalances as needed. With some
exceptions, as long as you are connected to any node, you should have access
to all data.

~~~
SteveNuts
We used Percona's XtraDB Cluster for HA and have had very few issues.

Just fronting it with haproxy to read/write to a single node at a time
(setting others to "backup") has worked well for us, since we're not doing a
huge number of queries.
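
For anyone curious, the single-writer pattern described above is only a few lines of haproxy config; marking the other nodes as backup means haproxy sends traffic to them only when the primary's health check fails (addresses and the check user are illustrative):

```
listen galera
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check backup
    server node3 10.0.0.3:3306 check backup
```

Writing to one node at a time also sidesteps most Galera certification conflicts, which is part of why this setup is so low-drama.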

~~~
muffelsong
I concur, but consider ProxySQL instead of HAProxy. Percona with XtraDB/Galera
failover fronted by ProxySQL is very resilient and performant.

------
adevx
I've tried CockroachDB as well as YugabyteDB. A distributed ACID transactional
database sounds like the holy grail, but it comes at a price. For CockroachDB
the performance penalty was high, even though I'm sure I could have squeezed
out more performance if I had designed my DDL specifically for CRDB.

There was an inconsistency issue when deleting a ton of (self-referencing
foreign key) rows in rapid succession: a "select count(*) from table" returned
different results than a "select count(id) from table". After some delay the
results came back in sync.

As I was trying to shoehorn an existing PostgreSQL database into CRDB, I had
to find solutions for SELECT .. FOR UPDATE, finding alternatives for triggers,
issues with the query builder I use (knex.js). In the end it was too much, not
even considering lack of backups and the risk of a VC funded company going
belly up. I do hope they gain a solid foothold and these NewSQL databases gain
more traction.

~~~
irfansharif
We take our claims of correctness very seriously. The behavior you're
describing, depending on exactly what you're doing, is likely an allowed
serializable history, which is the isolation level we claim to support. See
[0] for a deeper dive on the subject. Either way, you should show us what you
found (if you haven't already).

As for your performance analysis, do share your results and methodology. We
published an in-depth, reproducible comparison with YugaByte here [1] in
addition to publishing our TPCC-100k numbers[2]. If you're seeing performance
that doesn't line up with the above, let us know.

[0]: [https://www.cockroachlabs.com/blog/consistency-model/](https://www.cockroachlabs.com/blog/consistency-model/)

[1]: [https://www.cockroachlabs.com/blog/unpacking-competitive-benchmarks/](https://www.cockroachlabs.com/blog/unpacking-competitive-benchmarks/)

[2]:
[https://www.cockroachlabs.com/blog/tpcc-100k/](https://www.cockroachlabs.com/blog/tpcc-100k/)

~~~
adevx
Thanks, "serializable history" could well have been the root cause. I did
discuss this in the forum but was unable to resolve it at that time.

------
sergiotapia
We use Elixir and Postgres at work. I didn't realize cockroachdb didn't have
geospatial features; unfortunately that makes it a non-starter for us. We need
it to track pathing from mobile phones (iOS and Android).

Thank you for writing this up and putting your experiences out there, clear as
day. It's very valuable.

------
johnnyAghands
What a world of difference good tests and coverage can make. Great write up.

------
pjmlp
What new class of RDBMS?

Cluster scalability has always been a thing with Oracle, SQL Server, Sybase,
Informix, DB2.

~~~
klodolph
Scalability is relative. To some people a 10TB database is large, but some
teams I work with regularly see databases in the 10PB range. At that kind of
scale you tend to see different design tradeoffs.

~~~
pjmlp
Those kinds of data loads were already typical 15 years ago in the
telecommunications domain, for example Nokia NetAct report processing for a
whole operator network, based on HP-UX with Oracle OLAP support.

Or the CERN infrastructure to deal with non-stop handling from accelerator
data and storage for posterior data analysis.

Sometimes this industry feels like a continuous reboot.

~~~
ed_balls
I'm not sure how OLAP relates to CockroachDB. The use case for CockroachDB is
quite simple. You have millions of applications that run Postgres. A large
chunk of them have to process more and more data than a single master can
handle, and there is no easy way of scaling. Any option you pick requires a
lot of changes to the codebase. This is where CockroachDB should come in
handy. The dream is to have a drop-in replacement for Postgres that scales, so
all SELECT .. JOIN queries still work.

~~~
pjmlp
Because that specific example made use of Oracle's distributed cluster
scalability, already 15 years ago.

Postgres still has a couple of tricks to learn from the big boys' databases.

------
arjunbajaj
Does CockroachDB work with Ecto in Elixir? Can it be used as a drop-in
replacement for PostgreSQL?

~~~
latch
The Postgrex driver works with CockroachDB as-is, and they're open to making
tweaks as necessary (I made a small patch to Postgrex to support the way
CockroachDB encoded empty arrays, and it was quickly accepted).

I don't think Ecto works as-is, you'd have to check.

For simple apps, it might be a drop-in replacement, or require very few
tweaks. In addition to the issues mentioned in the OP, the cockroachdb
documentation has a detailed list of differences (1).

(1) [https://www.cockroachlabs.com/docs/stable/detailed-sql-support.html](https://www.cockroachlabs.com/docs/stable/detailed-sql-support.html)

~~~
andreimatei1
Would you mind providing a link to the empty arrays patch? Is this something
CRDB should fix?

~~~
latch
It was fixed in CRDB also:
[https://github.com/cockroachdb/cockroach/issues/42942](https://github.com/cockroachdb/cockroach/issues/42942)

------
jsilence
Anyone have experience with rqlite, and how it compares to CockroachDB?

------
ed_balls
Has anyone tried to run CockroachDB with GCP TrueTime?

~~~
irfansharif
TrueTime is not a publicly available GCP service.

------
daxfohl
Seems like overkill, and just a shiny new toy to play with, for this use case.
Stackoverflow.com uses SQL Server. You aren't exceeding Stack Overflow. In a
couple of years they'll migrate back, or to something else.

The whole point of CDB is to enable things like Google scale. But all I ever
see of it is random devs spinning up three nodes and being so proud.

Which is too bad because I'd love to see someone pushing its limits, and
betting mission critical services on it that nothing else could handle.

~~~
latch
Sorry if I wasn't clear, but the goal of the migration wasn't for scale, it
was for HA.

I find it difficult to set up PostgreSQL without having a single point of
failure, and without introducing downtime for things such as upgrades.

We aren't doing life changing stuff, true, but a downtime in our system can
result in employees (often at the bottom of the economic ladder) not being
able to get to work. That gnaws at me. I guess the better alternative is to go
for a hosted approach, but that isn't without its own complications and
challenges.

~~~
aprdm
Out of curiosity, what do you find so hard to set up with PostgreSQL?

Having a master with a read replica, and then promoting the read replica to
master if the master goes down, seems to be a very well-known methodology. You
can use a virtual IP with both servers behind it (keepalived) or a 0-TTL DNS
solution (consul). Is your case more complex than that? With master-to-master
replication I feel there are a couple of gotchas, and you need to design your
data to avoid conflicts as much as you can. A solution like BDR seems to be an
off-the-shelf option that is proven.
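
The keepalived half of that is a short config: the VIP follows whichever node passes the health check, with priorities breaking ties. A sketch (the interface, VIP, and check script are illustrative):

```
vrrp_script chk_pg {
    script "/usr/bin/pg_isready -q"   # non-zero exit marks this node unhealthy
    interval 2
    fall 2
}

vrrp_instance PG_VIP {
    state MASTER            # the peer runs the same config as BACKUP, lower priority
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.50/24
    }
    track_script {
        chk_pg
    }
}
```

Note this only moves the VIP; actually promoting the replica (pg_ctl promote, or a tool like repmgr/Patroni) still has to happen separately, which is exactly the step people worry about getting wrong.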

I haven't really upgraded Postgres in a rapid fashion; I usually just keep
whatever version comes with the OS package manager forever, or whatever Docker
image I started it with. I guess the same approach used for HA can be used to
upgrade one node at a time; I'm assuming Postgres doesn't break backwards
compatibility very often.

You can either automate the failover or have a system SMS you when it happens.
I like to do it manually, personally, since I have seen automated failover
fail more times than the manual procedure.

I have probably managed over 30 Postgres instances in my services, which have
grown up to 1/2 PB of data, and I've never had a single issue with them...
Postgres, Redis, RabbitMQ, and Consul are technologies where you RTFM, set
them up, and they just work!

I wouldn't trust a technology that isn't established and widely adopted, which
is how I see CockroachDB at the moment.

~~~
AgentME
>Having a master with a read replica and then making the read replica the
master in case the master goes down seems to be a very well known methodology.
You can use a virtual ip with both servers behind it (keepalived) or a 0 ttl
DNS solution (consul).

>You can either automate the failover or have a system to SMS you when it
happens.

I don't want to be writing code to make things do automatic failover. I'm not
a database engineer. I want a database where its standard setup does this
stuff for me.

(I would find it really interesting if there were a tightly-integrated
"distribution" of PostgreSQL plus related services that had a setup process as
straightforward as CockroachDB's, with automatic read replicas and failover.)

~~~
aprdm
You usually don’t have to write stuff; it is all pretty much yum install and
configuring a file.

------
seibelj
I’m sorry, but the name of the database needs to change. It’s like if I named
my company “Disgusting Bug Company” and then expected everyone to ignore the
fact that I named my company after a gross creature. Call it “RockSolidDB” and
you get the same point across without invoking gross bugs.

~~~
seisvelas
I used to agree, but the name actually seems to have an ironic marketing
benefit: Everyone ascribes Cockroach's success to its technical merit. I've
never heard Cockroach criticized as another dumb web dev trend caused by
marketing and fads.

~~~
m0zg
I propose that they further strengthen this effect by dropping the word
"roach" from their name, and rebrand as just "Cock DB". /s

