
Nested Transactions in CockroachDB 20.1 - orangechairs
https://www.cockroachlabs.com/blog/nested-transactions-in-cockroachdb-20-1/
======
andy_ppp
We started using CockroachDB at a company I worked at, and people could not
get date-ordered queries working quickly. In the end they switched to
Postgres, but it did give me pause: while I'm sure Cockroach is fantastic,
maybe for most people's use cases they should stick to Postgres until they
really need to switch.

~~~
vvern
I'll pile on to what mjibson said that we have dramatically improved our PG
compatibility with time types: [https://www.cockroachlabs.com/blog/time-data-
types-postgresq...](https://www.cockroachlabs.com/blog/time-data-types-
postgresql/).

If their concern was actually query performance due to writing time-ordered
data, we also have recently added hash-sharded indexes which can help with
scalability there: [https://www.cockroachlabs.com/blog/hash-sharded-indexes-
unlo...](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-
linear-scaling-for-sequential-workloads/)
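For context, a rough sketch of the idea behind hash-sharded indexes (hypothetical code, not CockroachDB's actual implementation): prefix each key with a small hash-derived shard number, so sequential inserts spread across several key prefixes (and hence ranges) instead of all appending to one hot range.

```python
# Rough sketch of the hash-sharded index idea: prefix each sequential
# key with shard = hash(key) % N so consecutive timestamps land on N
# different key prefixes instead of all appending to one hot range.
NUM_SHARDS = 8

def sharded_key(ts: int) -> tuple:
    shard = hash(ts) % NUM_SHARDS   # stand-in for the DB's internal hash
    return (shard, ts)              # the index sorts by (shard, ts)

# Consecutive timestamps now interleave across up to NUM_SHARDS prefixes,
# at the cost of fanning out a range scan over all shards.
keys = sorted(sharded_key(t) for t in range(100))
```

The trade-off is that an ordered scan now has to merge results from every shard, which is why this is an opt-in feature rather than the default.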

~~~
andy_ppp
Okay! That’s cool to know. Where are the bodies currently buried, such that I
might just randomly get slow performance? It’s more likely I’d choose Cockroach
if there was some info on this floating around...

~~~
knz42
[https://www.cockroachlabs.com/docs/stable/make-queries-
fast....](https://www.cockroachlabs.com/docs/stable/make-queries-fast.html)

~~~
andy_ppp
This is not what I asked for; to be even clearer, I want a list of queries
that are fast in Postgres and slow (currently) in Cockroach. You sent me a
guide on query optimisation. They are different things, aren’t they?

~~~
vvern
I'm not sure that there's a simple answer here. One place where people often
run into trouble is that they deploy cockroach with a global topology and
then are surprised when things are much, much slower than in postgres. I'm
going to assume that you're only talking about a single-region deployment.

One place cockroach radically differs from postgres is that it stores
historical versions of rows and it stores them sequentially. This means that
if you have an update-heavy workload and you do not lower your GC TTL for that
data, you can see dramatic performance degradation that you would not see with
postgres (see [1]).

I suspect that there are some cases where our optimizer will not make the same
or as good of choices as postgres, especially in cases of joins across large
numbers of tables. Our optimization of CTEs is relatively primitive compared
to PG IIRC.

Another place where we do quite poorly, which doesn't come up in production
but commonly comes up in testing, is schema changes. Cockroach schema changes
are online and do not block concurrent writes. That being said, they require
coordination and phases. Running schema changes in tight loops will yield
relatively low throughput, and performance deteriorates because some of the
coordination cost is a function of the total number of schema elements. This almost
always comes up in the context of testing things like ORMs where schema
elements are created and destroyed in every test. I doubt this is the sort of
thing you're looking for but it's definitely a thing that comes up.

As noted in the hash-sharded index post, cockroach is not well suited to scale
write traffic incident on a single key. In these cases the in-memory, local
synchronization of postgres is likely going to be more efficient. Sequential
workloads suffer from a similar problem that doesn't exist in postgres. In
fact, sequential workloads are quite well suited to postgres's B+-Tree. That being
said, this comes up relatively infrequently and when it does we have tools to
mitigate it.

Our temp table implementation likely violates people's intuition. It primarily
exists for compatibility to enable a broader suite of testing. Temp tables are
as expensive as regular tables.

In general some of our data-intensive queries are going to be less efficient:
partially due to the efficiency of our execution engine, partially due to the
efficiency of our KV layer, and partially because the KV layer is implemented
in a distributed way, requiring a certain amount of data copying and
synchronization, whereas postgres can get by just passing pointers around in
memory. I suspect postgres's ability to achieve access locality for some
workloads is immense and leads to wins. We have done a lot of work recently to "vectorize"
our execution engine and make it more efficient (see [2]) and are in the
process of replacing our storage engine, in some small part to eventually
improve our scan speed (see [3]) and more generally improve our efficiency.

Maybe others can chime in, but in my experience there isn't so much a general
class of queries where we're orders of magnitude slower than postgres. Often
when people see orders of magnitude differences it has to do with index
selection in a plan for a specific query but this tends not to be a
generalized problem and we're pretty active about rectifying these reported
problems. We've also been focusing on adding tools to better understand query
plans and performance.

More often it's the case that on a per-core basis, cockroach is less efficient
in terms of throughput than a single instance of postgres but in return for
that you get a number of benefits, one of which being that you can scale the
thing with your load. If you're looking to optimize $/query in a high-
throughput setting and don't care about the HA or other properties, cockroach
probably isn't the answer. Often when you're trying to run a cost-optimized
$/query system, it's because the scale is very large so postgres might not be
the answer either. It may make a lot of sense if it's a very update-heavy
workload where the total data set size is small.

Hope that was helpful. Maybe others who have had experience with cockroach and
found it to be pathologically slower in some cases can chime in. We'd also
love to hear about those cases in
[https://github.com/cockroachdb/cockroach/issues](https://github.com/cockroachdb/cockroach/issues).

[1]
[https://github.com/cockroachdb/cockroach/issues/17229](https://github.com/cockroachdb/cockroach/issues/17229)
[2] [https://www.cockroachlabs.com/blog/how-we-built-a-
vectorized...](https://www.cockroachlabs.com/blog/how-we-built-a-vectorized-
sql-engine/#)

~~~
anarazel
[2] seems to be dead.

> One place cockroach radically differs from postgres is that it stores
> historical versions of rows and it stores them sequentially. This means that
> if you have an update-heavy workload and you do not lower your GC TTL for
> that data, you can see dramatic performance degradation that you would not
> see with postgres (see [1]).

Postgres isn't that different on that aspect. PG after all is an in-heap MVCC
store. It's not quite sequential because we reuse free space in the middle of
a table. But if there's none it'll be sequential.

The biggest difference is probably that we have an "on access" cleanup
procedure. Dead tuple versions inside the table can be removed without an
external process (well, shrunk - we need a small tombstone-like thing until
VACUUM comes around, otherwise indexes with older pointers would yield wrong
results). That turns out to be crucial for performance in update-heavy
workloads.

~~~
vvern
[2] should be [https://www.cockroachlabs.com/blog/how-we-built-a-
vectorized...](https://www.cockroachlabs.com/blog/how-we-built-a-vectorized-
execution-engine/)

Good note on the use of MVCC in Postgres. That on-access cleanup dramatically
helps with the explosion of the data size and seems generally nice. Cockroach
promises to keep data in a time window rather than just what's needed for
concurrent operations. The user controls the time window but often people
don't change it and even if they do, the build-up can be dramatic.

The sequential nature of the data storage and its implications for performance
are pretty different in postgres and cockroach as I understand it. In
cockroach we have fewer opportunities to exploit parallelism of write
operations acting on the same "range" (a cockroach level concept for a raft
group). The sequential nature of the workload is a problem not because of how
the data ultimately gets laid out on disk but rather how it gets processed
and sequenced for replication. In particular, all of the writes will go to the
same "range" which owns the tail of the log. Postgres, if anything, is happy
with sequential workloads as they touch the fewest interior blocks of the
B+-Tree.

Cockroach effectively can't offload the work of GC (as we call it) to
foreground or already-existing tasks, mainly because it makes maintaining
consistent stats between replicas hard. See an attempt here:
[https://github.com/cockroachdb/cockroach/pull/42514](https://github.com/cockroachdb/cockroach/pull/42514).

------
maxpert
I have read so much about CockroachDB but I am still waiting for that one hero
app that puts it on the map. The hard part of convincing your company or
somebody to put their serious stuff on something like CDB is the fear folks
have: what if this company/technology is gone (hint hint RethinkDB)? I believe
even with all the cool stuff and the hype there are gonna be some paper-cut
scenarios. Now these cuts might be fatal for someone, but for others they
won’t be a big deal.

~~~
knz42
The technology is open source though. (Its BSL license automatically switches
to APL after a year or two.)

------
nightowl_games
I'm technically using CockroachDB through Nakama[1], and it's been good to me.
I integrated it into metabase under the guise of postgres.

[1]: [https://heroiclabs.com/nakama-
opensource/](https://heroiclabs.com/nakama-opensource/)

------
memexy
> labeling every data update in a transaction with a marker that identifies
> the nested transaction.

> upon ROLLBACK, mark the corresponding nested transaction marker as "to be
> ignored"

> upon any read of the data, skip over any data updates with a (nested)
> transaction marker known to be ignored.

Seems like a good trick. They go on to explain the main performance issue,
which is traversing the ignore list: each ROLLBACK grows the list, and every
subsequent read in the transaction pays a small overhead proportional to its
length.

> The expense in this approach is the extra work performed by data reads when
> there has been at least one ROLLBACK prior. ROLLBACKs add markers to an
> “ignore list” and cause it to grow, and the performance of each subsequent
> read operation is slightly slowed down by the increasing ignore list.
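As a rough illustration (hypothetical code, not CockroachDB's actual implementation), the marker / ignore-list scheme quoted above could be sketched like this: each write is tagged with a sequence number, rolling back to a savepoint adds that range of sequence numbers to an ignore list, and reads skip any version whose tag falls in an ignored range.

```python
# Hypothetical sketch of the marker / ignore-list scheme: writes are
# tagged with sequence numbers, ROLLBACK marks a range as ignored, and
# reads scan newest-first, skipping ignored versions.

class Txn:
    def __init__(self):
        self.seq = 0                # current write sequence number
        self.ignored = []           # list of (lo, hi) ignored seq ranges
        self.versions = {}          # key -> list of (seq, value)

    def write(self, key, value):
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))

    def savepoint(self):
        return self.seq             # remember position for a later rollback

    def rollback_to(self, sp):
        if self.seq > sp:           # ignore everything written since sp
            self.ignored.append((sp + 1, self.seq))

    def read(self, key):
        # newest-first scan; each read pays O(len(ignored)) per version,
        # which is the growing cost the post describes
        for seq, value in reversed(self.versions.get(key, [])):
            if not any(lo <= seq <= hi for lo, hi in self.ignored):
                return value
        return None
```

For example, writing `a=1`, taking a savepoint, writing `a=2`, and rolling back leaves a read of `a` returning 1, with the `a=2` version still physically present but skipped.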

It would be interesting to get a list of various databases and how they handle
transaction logic for rollback operations. If this list also had examples of
how the logic plays out in a concrete example then it would be a really useful
resource. Engineers could use this list to figure out whether their use cases
and performance requirements were compatible with their chosen database's
transaction isolation properties.

------
kardianos
> Even though nested transactions are part of the SQL standard, PostgreSQL
> supports them and many of our users have demanded their inclusion in
> CockroachDB, we do not recommend their use in new applications.

I agree with this. People want nested transactions in Go's database/sql, but
the abstraction is fundamentally leaky. New applications should not be using
nested transactions. Find a different way.

------
sandstrom
Interesting read! I like the depth of CDBs technical blog posts!

I do wonder when they’ll add the BACKUP feature to the community edition. Dump
isn’t very useful for larger datasets and frequent backup snapshots.

~~~
namibj
Archive changefeeds, and replay later. Treat them like a WAL, and you should
get the loss window down to under 15 seconds with ease.

~~~
latch
Not sure that I'd want my job to depend on me doing this correctly.

1. The feature is still flagged as experimental.
2. If you fail to sync within the TTL window (25 hours by default), you'll lose data.
3. There are cases (which might not be an issue for you) where it simply won't work (multiple column families, data imports, ...).

Much _much_ simpler and safer to use PG+barman.

~~~
namibj
1. Yes.
2. No, you just lose the ability to roll back to an arbitrary position in that window. It might also be more expensive to re-start, but it's been a few months since I looked at the code.
3. I am aware of multi-column-families not working (but that should get tackled?):
[https://github.com/cockroachdb/cockroach/issues/28667](https://github.com/cockroachdb/cockroach/issues/28667)

I know PG is easier in that respect, but the performance ceiling is much
lower.

------
EGreg
Nested transactions can easily be done at the app layer, with a counter. As
long as any rollback rolls back all outer layers.
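A minimal sketch of that counter scheme (hypothetical code; `FakeConn` stands in for a real database connection): only the outermost begin issues a real BEGIN, and any inner rollback dooms the whole outer transaction.

```python
class FakeConn:
    """Minimal stand-in connection that just records calls."""
    def __init__(self):
        self.calls = []
    def begin(self):    self.calls.append("BEGIN")
    def commit(self):   self.calls.append("COMMIT")
    def rollback(self): self.calls.append("ROLLBACK")

class NestedTx:
    """App-layer nested transactions via a depth counter."""
    def __init__(self, conn):
        self.conn = conn
        self.depth = 0
        self.doomed = False

    def begin(self):
        if self.depth == 0:
            self.conn.begin()       # real BEGIN only at the outermost level
        self.depth += 1

    def rollback(self):
        self.depth -= 1
        self.doomed = True          # any inner rollback rolls back everything
        if self.depth == 0:
            self.conn.rollback()

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            if self.doomed:
                self.conn.rollback()
                raise RuntimeError("inner rollback doomed the transaction")
            self.conn.commit()
```

Note the limitation the author hints at: unlike savepoints, this scheme cannot roll back an inner scope while keeping the outer one alive, which is exactly what the savepoint protocol provides.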

~~~
knz42
Author here - yes, there are schemes to do it client-side. However, there are
enough legacy code and frameworks out there that use nested transactions via
pg's savepoint protocol that some compatibility with them is warranted.

------
jlokier
I love CockroachDB's blog; it's one of my favourites. But the part of this
article that argues why nested transactions are a bad idea left me with more
questions than answers.

> Finally, nested transactions can run afoul of correctness in distributed
> apps. In fact, the idea of multi-component transactions in client code
> really evokes the idea of a bull in a china shop. As long as all is well and
> the transaction is due to commit, the idea somewhat makes sense. However,
> what happens when the database (and not the client) decides the transaction
> is un-committable and must be aborted, for example because of a
> serializability conflict or a node restarting for maintenance? It is not
> just the database state that must be rolled back; all the possible side
> effects performed by the components holding the transaction must also be
> rolled back. Think about payment transactions. Think about email
> notifications. For one, the programmer is unlikely to think about this
> properly before the app is deployed. Also, as the number of components
> involved grows, the chance they are collectively performing unabortable
> external side effects (logging, payments, notification emails or SMS, REST
> queries to external HTTP APIs, etc) increases dramatically.

Is this really an argument against _nested_ transactions? It seems like an
argument against _all_ kinds of transactions that are linked to irreversible
side effects, regardless of whether any are nested.

> Nowadays, such tight coupling has a bad rep. This is because two additional
> decades of software engineering have taught us that implicit global state
> really, really does not play well with distributed services where some
> components may fail even as part of normal load. It makes it hard to fail
> over a component gracefully by transferring its responsibility to a hot
> spare. It makes it hard to reason about the current state of a component and
> troubleshoot it by just looking at the log of its inputs and outputs.

This is convincing, but I would love to know what the author (or others) think
the better alternative is, which solves the hot spare and "normal load
failure" problems. Is it highly-available message queues? (And assume they
never fail or all fail at once, but never fail piecemeal because that would
cause half-updated state?) Is it (distributed/microservices) application code
stuck in (distributed) retry loops when there are failures, to get its job
done without half-updated state?

> Additionally, nested transactions can amplify performance anomalies. The row
> locks established by a large transaction whose latency extends across
> multiple components/services are more likely to incur transaction conflicts
> which, at best, increase the average latency of queries and, at worse,
> increase the error rate quadratically with the current query load.

Is this not a CockroachDB-specific performance limitation, rather than an
argument against nested transactions per se?

~~~
knz42
> It seems like an argument against all kinds of transactions that are linked
> to irreversible side effects, regardless of whether any are nested.

Correct. The best design is small transactions without side effects. The
problem with the _existence_ of nested transactions is that they encourage
large transactions across multiple components, where it's too easy to sneak
some side effects in.

> I would love to know what the author (or others) think the better
> alternative is.

Small transactions with a retry loop using exponential back-off and, depending
on application requirements, a max latency deadline with fallback/error.
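That pattern could be sketched like so (hypothetical code; `run_txn` and `RetryableError` are placeholders for whatever your driver raises on a serialization conflict):

```python
import random
import time

class RetryableError(Exception):
    """Signals a transient conflict (e.g. a serialization failure)."""

def with_retry(run_txn, max_latency=5.0, base_delay=0.01):
    """Run a small transaction with exponential back-off and a deadline."""
    deadline = time.monotonic() + max_latency
    delay = base_delay
    while True:
        try:
            return run_txn()
        except RetryableError:
            if time.monotonic() + delay > deadline:
                # fall back / surface an error once the latency budget is spent
                raise TimeoutError("transaction did not commit before deadline")
            time.sleep(delay * random.random())  # jittered sleep
            delay *= 2                           # exponential back-off
```

The jitter keeps concurrent retriers from re-colliding in lockstep, and the deadline bounds worst-case latency per the application's requirements.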

> > Additionally, nested transactions can amplify performance anomalies.

> Is this not a CockroachDB-specific performance limitation

Nested transactions encourage cross-component, and thus long-lived
transactions. Long-lived txns holding locks cause performance problems in
every SQL engine, not just crdb.

~~~
jlokier
Nice answers, thanks!

------
blackrock
Ahh.. what’s in a name?

Not much. Except when you name yourself something stupid.

