CRDB is compatible with the PostgreSQL wire protocol, so you can use existing da...

thanatos_dem · on June 4, 2019

> But usually you want a snapshot of the entire database at some (self-consistent!) point in time

This is actually super simple to do in Cockroach, as it supports time travel queries - https://www.cockroachlabs.com/blog/time-travel-queries-selec...

So you can run a script at, say, 12:05 AM every morning that saves the table state “AS OF SYSTEM TIME 12:00” (simplifying the syntax). That gives you a fully consistent snapshot of all tables, as long as your backup script takes less time to execute than the configured table TTLs.
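
To make the "simplifying syntax" part concrete, here is a minimal sketch of how such a script might build its queries. It only generates SQL strings; executing them is left to whatever PostgreSQL-compatible driver you already use. The function name `snapshot_queries` and the table names are hypothetical, and the sketch assumes the timestamp literal is read as UTC. The point is that every table is read at the same instant, which is what makes the result self-consistent.

```python
import re
from datetime import datetime, timezone

# Conservative check so table names can be interpolated safely (hypothetical rule).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def snapshot_queries(tables, as_of):
    """Return one SELECT per table, all pinned to the same system time."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Assumption: the literal is written in UTC without an explicit offset.
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    queries = []
    for table in tables:
        if not _IDENT.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        queries.append(f"SELECT * FROM {table} AS OF SYSTEM TIME '{ts}'")
    return queries


if __name__ == "__main__":
    midnight = datetime(2019, 6, 4, 0, 0, tzinfo=timezone.utc)
    for query in snapshot_queries(["users", "orders"], midnight):
        print(query)
```

Because the timestamp is computed once and reused, it does not matter how long the individual reads take, provided the whole run finishes before the old row versions are garbage collected.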

napsterbr · on June 4, 2019

This makes more sense, thanks. I just realized I mistook cockroachdb for timescaledb.

olavgg · on June 4, 2019

Would ZFS snapshots work? It is what I use to back up PostgreSQL; it is much easier and faster than an ordinary backup.

benesch · on June 4, 2019

Not unless you like pain. CockroachDB is distributed and uses Raft consensus under the hood, so you can't take a snapshot of just one node; you need to take a snapshot of all the nodes in your cluster, and you need to take those snapshots at the same logical instant. If the snapshots are from slightly different moments, when you boot the cluster up from the backup, the nodes will panic because you'll have corrupted the Raft state or otherwise caused the multiple replicas of the data to become inconsistent with one another.

There is one safe way to make this work: turn off your cluster, back it up with ZFS snapshots, then turn it back on. (Some of Cockroach's production tests do exactly this, in fact, because restoring from ZFS snapshots is so unbelievably fast.) But if you have the flexibility to power cycle your production database off and on in order to back it up... you probably don't need a distributed database, because fault tolerance likely isn't among your requirements.

(I'm a former CockroachDB engineer.)
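
The offline procedure above depends entirely on ordering: every node must be stopped before any snapshot is taken, and no node restarts until all snapshots exist. A rough sketch of that ordering follows, as a dry run that only builds the command list. The host names, the `cockroach` systemd unit, the dataset name, and the use of `ssh` are all assumptions for illustration; only `zfs snapshot dataset@name` is the standard ZFS command form.

```python
def offline_snapshot_plan(nodes, dataset, label):
    """Build the ordered commands: stop all nodes, snapshot all, start all."""
    # Assumption: each node runs CockroachDB as a systemd unit named "cockroach".
    stop = [["ssh", n, "systemctl", "stop", "cockroach"] for n in nodes]
    # Snapshots are taken only after every node is down, so no Raft state can change.
    snap = [["ssh", n, "zfs", "snapshot", f"{dataset}@{label}"] for n in nodes]
    start = [["ssh", n, "systemctl", "start", "cockroach"] for n in nodes]
    return stop + snap + start


if __name__ == "__main__":
    # Hypothetical hosts and dataset; print the plan instead of running it.
    for cmd in offline_snapshot_plan(["node1", "node2", "node3"], "tank/cockroach", "nightly"):
        print(" ".join(cmd))
```

Interleaving the phases (stop, snapshot, and restart one node at a time) would reintroduce exactly the cross-node inconsistency described above, which is why the plan is built as three complete passes.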