
CockroachDB 2.0 Performance Makes Significant Strides - awoods187
https://www.cockroachlabs.com/blog/2-dot-0-perf-strides/
======
atombender
Looks very promising! We've looked at Cockroach for a particular project, and
we've been concerned that performance wasn't good enough.

Cockroach performance seems to scale linearly, but single-connection
performance, especially for small transactions, seems rather dismal. Some
casual stress testing against a 3-node cluster on Kubernetes showed that small
transactions modifying a single row could take as much as 7-8 seconds, where
Postgres would take just a few milliseconds.

The documentation recommends that you batch as many updates as possible, but
obviously that doesn't work for low-latency applications like web frontends
that need to be able to do small, fine-grained modifications.

~~~
jnordwick
> low-latency applications like web frontends

...

~~~
atombender
We have a collaborative, Google Docs-like application that currently issues a
write every time someone types into a text field. Now, clearly it's suboptimal
and something that should be optimized to batch the updates, but on the other
hand, with Postgres we've had zero incentive to make such an optimization,
because it's able to handle thousands of writes per node in real time with no
queuing happening on the client. I don't expect this from Cockroach, but I
would definitely want low latency.

~~~
thanatos_dem
Lordy, relational databases are not the way to go for that problem... With a
single shared resource (document), you're going to be encountering write
conflicts left and right.

Have you explored implementing a CRDT based solution like WOOT instead?

~~~
Groxx
Why write conflicts? Contention, sure, but contention isn't an issue until you
have literal tons of it.

~~~
thanatos_dem
From OP's description:

> issues a write every time someone types into a text field
With more than a handful of people, this is getting into conflict territory
pretty rapidly, especially if the document is structured as a single row
(hopefully it's more granular than that). Time for some back of the envelope
maths:

Assuming that an average person types at around 200 words per minute (number
pulled from [https://www.livechatinc.com/typing-speed-
test/#/](https://www.livechatinc.com/typing-speed-test/#/)), that's a write
every 300ms on average. With 10 people editing the document, that's a write
every 30ms on average, which can easily lead to conflicts if they're all
trying to update the same resource.
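The arithmetic above, sketched out (the typing speed and 10-editor count are the example's assumptions, not measured numbers):

```python
# Back-of-the-envelope: average interval between writes when several
# people edit the same document at once.
WORDS_PER_MINUTE = 200   # assumed typing speed per editor
EDITORS = 10             # assumed number of concurrent editors

seconds_per_write = 60 / WORDS_PER_MINUTE            # 0.3 s per editor
interval_all_editors = seconds_per_write / EDITORS   # 0.03 s across the group

print(f"one write every {seconds_per_write * 1000:.0f} ms per editor")
print(f"one write every {interval_all_editors * 1000:.0f} ms overall")
```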

~~~
dalore
Perhaps it's event sourcing based. Every time someone types into a field it
writes a row that something was typed which is a record of what was typed.
Play it back and you have the full document with no conflicts.
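A minimal sketch of that replay idea, with a hypothetical event shape (not from any particular framework): each keystroke is appended as an immutable event, and the full document is rebuilt by folding over the log in order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Typed:
    """One immutable event: a character inserted at a position."""
    pos: int
    char: str

def replay(events):
    """Rebuild the full document by applying events in log order."""
    doc = []
    for e in events:
        doc.insert(e.pos, e.char)
    return "".join(doc)

log = [Typed(0, "h"), Typed(1, "i"), Typed(1, "e")]
print(replay(log))  # "hei"
```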

~~~
gmueckl
With the caveat that the stored events are not conflicting with each other. So
a central instance needs to check each event for validity before allowing it
into the log. The check cannot be parallelized easily without introducing
races between event insertions.

~~~
dalore
No event sourcing architecture I know of advocates that at all.

One of the strengths of event sourcing is that you can fix things after the
fact. Say you got a wrong event, like you suggested, one that conflicts. You
don't check it before allowing it into the log; you notice the wrong event,
delete it, and replay the log.

You can always replay the log to get back to your current point in time.
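The fix-after-the-fact workflow described above, as a toy sketch (the event shape is hypothetical): drop the offending event from the log and replay the rest.

```python
def replay(events):
    """Fold a list of (pos, char) insert events into a document string."""
    doc = []
    for pos, char in events:
        doc.insert(pos, char)
    return "".join(doc)

log = [(0, "h"), (1, "i"), (1, "x")]    # (1, "x") is the bad event
print(replay(log))                       # "hxi" -- wrong document

# Notice the bad event, delete it, and replay the remaining log:
fixed_log = [e for e in log if e != (1, "x")]
print(replay(fixed_log))                 # "hi" -- state recovered
```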

------
segmondy
I like what Cockroach is doing, I'm rooting for them to grow and survive.
Unfortunately the only time I hear about it is when they post blogs. I never
hear about it from other people.

~~~
AlimJaffer
_raises hand_ we're using them extensively. They're our database of choice
that we've paired with Nakama[1] which is an open-source, distributed server.
Have nothing but great things to say about the database itself in terms of
growing performance and the team behind it :). They've been great to us since
day-1.

[1] github.com/heroiclabs/nakama

~~~
netghost
What kind of workload are you using it for? What's been your biggest win while
using it?

~~~
AlimJaffer
A couple of our use-cases include: good KV access (stored user data etc.) and
listing blocks of data that have been pre-sorted on disk at insert time
(leaderboard records, chat message history etc.). The clustering technology
is also particularly useful at scale. We work in the games space with some
very large games in production, and CockroachDB lets us spread the load
across multiple database nodes while offering peace of mind regarding
redundancy.

~~~
tomerbd
What do you mean by KV access? Isn't it a relational data store? Do you store
the value as a blob? JSON? If so, how do you then run queries against the
value's contents?

~~~
staz
IIRC it's a KV store which offers a relational interface on top for
usability/compatibility.

------
wilbeibi
The thing I really don't get is why CockroachDB avoids benchmarking against
its rival TiDB
([https://github.com/cockroachdb/docs/issues/1412](https://github.com/cockroachdb/docs/issues/1412)).
TiDB is already pretty mature and used in many big companies (say, Didi, which
operates at a similar data scale to Uber, and several banks).

Even if I like CockroachDB's pg sql more, it would be helpful to have the
comparison/benchmark to show something more.

~~~
atombender
TiDB looks promising, but it doesn't have serializable transactions at all,
which makes it something of an apples-to-oranges comparison at the moment when
it comes to OLTP.

TiDB has a weird kind of variation on "read committed" where you get phantom
reads (though they're not called that in the documentation, which is actually
ambiguous on this point). This is a problem for apps that expect consistency.

~~~
siddontang
Siddon Tang from TiDB here. Thanks for your interest in TiDB! TiDB uses
Snapshot Isolation
([https://en.wikipedia.org/wiki/Snapshot_isolation](https://en.wikipedia.org/wiki/Snapshot_isolation)),
which is similar to Repeatable Read in MySQL. It doesn't allow phantom reads
but can't avoid write skew. IMO, this works well for OLTP in most cases. If
you really care about serializability, you can use `select for update` to
promote the read to a write explicitly, as other databases do.

TiDB also supports a READ COMMITTED isolation level, which is not the same as
MySQL's; it is designed for some special cases internal to TiDB and is not
recommended for external users.
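A toy illustration of the write-skew anomaly mentioned above, simulated in plain Python rather than against any real database: two transactions each read a consistent snapshot, write disjoint rows, and both commit, yet the combined invariant (x + y >= 1) is broken.

```python
# Simulate two snapshot-isolation transactions. Invariant: x + y >= 1.
db = {"x": 1, "y": 1}

# Both transactions start from the same committed snapshot.
snap_t1 = dict(db)
snap_t2 = dict(db)

# T1: "if x + y > 1, I may zero out x" -- check passes against its snapshot.
if snap_t1["x"] + snap_t1["y"] > 1:
    t1_write = ("x", 0)

# T2: same logic on y -- also passes against its (identical) snapshot.
if snap_t2["x"] + snap_t2["y"] > 1:
    t2_write = ("y", 0)

# Write sets {x} and {y} are disjoint, so snapshot isolation lets both commit.
db[t1_write[0]] = t1_write[1]
db[t2_write[0]] = t2_write[1]

print(db, "invariant holds:", db["x"] + db["y"] >= 1)  # invariant violated
```

A `select for update` on both rows would have forced the two transactions' write sets to overlap, making one of them abort or wait.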

~~~
atombender
Thanks, that's helpful!

------
etaioinshrdlu
Project idea: globally hosted / managed CockroachDB that lets developers
quickly start building small apps cheaply or free using this database.

This database has the potential to dethrone Spanner in a major way.

~~~
joris
That’s on their roadmap:
[https://www.cockroachlabs.com/docs/stable/frequently-
asked-q...](https://www.cockroachlabs.com/docs/stable/frequently-asked-
questions.html#does-cockroach-labs-offer-a-cloud-database-as-a-service)

~~~
SoulMan
My org/team is too conservative to use this if they have to hire ops, and too
frugal to use Spanner.

~~~
kevincox
Why do you think that hosted cochroachdb would be cheaper then hosted spanner?
Google has been optimizing spanner performance for years so I would expect
that it will be cheaper to run for quite a while. Of course the markup can be
different but I wouldn't expect it to make a huge difference.

~~~
tyingq
Maybe training? Spanner's lack of UPDATE/INSERT/DELETE requires a team that's
trained pretty specifically on how it works.

------
qaq
How is this meaningful without a detailed setup description?
[http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false...](http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false&orderby=tpm&sortby=desc)
Looking at this list of results, one wonders what they actually mean.

~~~
pinars
I think you can still derive some insights. I clicked on the TPC-C results you
shared and read their executive summaries.

The Oracle on SPARC cluster (at the top, 2010) performs 30.2M qualified tx/min
vs the 16K tx/min in this blog post. The Oracle cluster also costs $30M, which
is clearly higher than the Cockroach cluster's cost.

That said, the TPC-C benchmark is new to me. Happy to update this comment if
I'm misreading the numbers.

(Edited to incorporate the reply below.)

~~~
arjunnarayan
A short note that the total cost of that SPARC cluster was $30 million. You're
not misreading those numbers, but they require a little context.

We're focusing today on our improvements over CockroachDB 1.1, using a small-
ish cluster. We'll be showing some more scalability with larger clusters in
the coming weeks. If you've found CockroachDB performance slow in the past,
you will be pleasantly surprised with this release!

~~~
pinars
Sure thing. I was primarily answering the question above - in terms of how the
numbers in the TPC-C benchmark fit in. I updated my comment to reflect the
cost.

I think what's interesting with TPC-C is that you can sort the results based
on performance or price/performance. On the price/performance metric, SPARC
looks expensive. Dell has a $20K SQL Anywhere cluster that can do 113K tx/min.

I wonder if anyone has tried to run these benchmarks in the cloud, and how
one would calculate total cost of ownership there now.

~~~
qaq
You do realize it's ancient hardware that's $300-400 USD on eBay now?

~~~
tyingq
[http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=11...](http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=110120201)

Yeah, but 1700 cores worth. That's still a lot of $300 boxes. Like qty 53
Sparc T3-2's for example. Which seem to be $1200 to $2k on eBay. And
unsupported, end of life, etc.

I'd compare CockroachDB's number to some more recent result with a similar
number of cores. (If you can find one)

~~~
guipsp
Minor correction: They used 54 SPARC T3-2. You can see exactly which
components they used in
[http://c970058.r58.cf2.rackcdn.com/fdr/tpcc/Oracle_SPARC_Sup...](http://c970058.r58.cf2.rackcdn.com/fdr/tpcc/Oracle_SPARC_SuperCluster_with_T3-4s_TPC-
C_FDR_120210.pdf)

~~~
tyingq
Not feeling too bad about my back of the napkin guess being off by one server
at 53 vs 54. :)

------
Rafuino
How much and what kind of memory and storage (SATA SSD, NVMe SSD, HDD?) is
included in the 3 nodes used for testing? This benchmarking is really
interesting, but the next level is to understand the cost per tpmC measured.
Memory, and especially storage, is a big component of cost these days.

~~~
arjunnarayan
Short answer: 3 n1-highcpu-16 GCE VMs with Local SSDs attached. We're working
on a complete disclosure document, with comprehensive reproduction steps to
replicate all our numbers. This document should be out in a couple of weeks.
We want to walk you through, command by command, how to reproduce these
numbers and verify the results for yourself.

~~~
Rafuino
Thanks for the short answer. It would still be good to know how many local
SSDs are attached for the 850-warehouse scenario, though. The TPC-C
documentation says each warehouse maintains 100,000 items in its stock, but I
can't surmise from that how much storage is required to hold 850 warehouses'
worth of data. I'm impatient though, so let me try to work through the
numbers myself. I'm using GCP's monthly reserved pricing in the US-Iowa
region as a reference as of today's pricing.

An n1-highcpu-16 GCE VM costs $289.84/month. Local SSDs are added at 375GB per
drive, and they cost $30/month at $0.08 per GB. I highly doubt you could fit
the ~1250 warehouses (what got you the peak tpmC) on one 375GB local SSD, but
I have to make assumptions here! So now you're paying $319.84 per instance
per month, or $959.52 for 3 of these instances.

At 16,150 tpmC, you're paying roughly $0.06 per tpmC, or, looking at it the
other way, you're getting 16.83 tpmC per dollar spent each month. Is that
good? I don't know!
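The back-of-the-envelope above, sketched out (prices are the ones quoted in this comment, which may well be stale):

```python
VM_MONTHLY = 289.84        # n1-highcpu-16, monthly reserved (as quoted)
SSD_MONTHLY = 375 * 0.08   # one 375 GB local SSD at $0.08/GB -> $30
NODES = 3
TPMC = 16_150              # peak throughput from the blog post

per_instance = VM_MONTHLY + SSD_MONTHLY   # 319.84
total = per_instance * NODES              # 959.52
print(f"${total:.2f}/month, ${total / TPMC:.4f} per tpmC, "
      f"{TPMC / total:.2f} tpmC per dollar")
```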

Now, the really interesting question is: is tpmC/$ on CRDB 2.0 actually
better than tpmC/$ on CRDB 1.1? The answer lies in how many local SSDs you
have to provision to reach peak throughput. Peak is at ~1300 warehouses on
CRDB 2.0, and ~800 warehouses on CRDB 1.1.

Does anyone with more knowledge here know how much storage you need per
warehouse in the TPC-C test?

~~~
jordanlewis
Each warehouse requires about 80 megabytes of storage, unreplicated. 1250
warehouses * 80 MB * 3-way replication = 300 GB, which comfortably fits in a
3-node cluster with 1 local SSD each.
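Checking that arithmetic (warehouse size and replication factor per the comment above):

```python
WAREHOUSES = 1250
MB_PER_WAREHOUSE = 80     # unreplicated, per the comment
REPLICATION = 3
NODES = 3
LOCAL_SSD_GB = 375        # one GCE local SSD per node

total_gb = WAREHOUSES * MB_PER_WAREHOUSE * REPLICATION / 1000  # 300 GB
per_node_gb = total_gb / NODES                                 # 100 GB each
print(total_gb, per_node_gb, per_node_gb <= LOCAL_SSD_GB)
```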

~~~
Rafuino
Thank you! My Google-fu couldn't find me the answer

------
baconomatic
I'd love to hear from someone who has implemented this in production. Seems
like really cool tech, but haven't had a chance to use it on a project yet.

~~~
welder
Using it in production currently, with dual-write and dual-read to compare
performance. I'll do a write-up showing how Cockroach compares to Citus and
Cassandra for my use case.

~~~
qeternity
We use Citus and Memsql (big data analytics use cases). How does Cockroach
handle joins and other OLAP style queries?

~~~
manigandham
You're not going to get better performance for OLAP than MemSQL's columnstore
and in-memory rowstore for reference tables to join.

Citus is great if you want the Postgres interface but is still using standard
rowstore tables. CockroachDB is similar with rowstore performance but with
added distributed consensus overhead. They are both much better for OLTP and
sharding. CockroachDB also provides easy high-availability and replication.

~~~
ams6110
MemSQL is one of those "ask for a quote" products. What are some rule of thumb
estimates for what it costs?

~~~
manigandham
Licensed by total RAM of all nodes. $25k/year minimum license now, but you
should still talk to them if you're a small company. Regardless of price, I
highly recommend the product as one of the most polished data warehouses
available for on-prem/self-managed operations.

------
evrydayhustling
Great stuff. I appreciated being educated about TPC-C, and the whole spirit of
not focusing on vanity benchmarks!

~~~
itsdrewmiller
Same here, but in educating myself more I found that TPC-C seems to be a
somewhat obsolete metric compared to TPC-E (see
[https://stackoverflow.com/questions/9246939/what-is-the-
diff...](https://stackoverflow.com/questions/9246939/what-is-the-difference-
between-tpc-c-tpc-e-and-tpc-h-benchmark)). Why use the old one here?

edit: Looking into it even further, I agree with the co-author's response here
that TPC-C is still an appropriate metric. TPC-E is different and newer but
still not as widely used.

~~~
arjunnarayan
I don't think it's true to claim that TPC-C is obsolete and subsumed by TPC-E.
They are different OLTP benchmarks with different characteristics: TPC-C is
more write-heavy, TPC-E far more read-heavy. It's true that TPC-E is newer,
but that doesn't deprecate TPC-C (the way TPC-A, for instance, is now
deprecated).

We chose TPC-C because it's far better understood than TPC-E in 2018. We
wanted to provide understandable benchmarks that can be put into context with
other databases. Other databases report TPC-C numbers, so we chose to do so
as well.

~~~
tyingq
It seems it isn't used much anymore. Follow that link
([http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false...](http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false&orderby=tpm&sortby=desc))
and sort by either score or price/performance. The vast majority of top
results are a decade old or more. I couldn't find anything less than 5 years
old without going to the second or third pages.

And the top results are usually crazy high number of cores clusters. The Sun
example was over 1700 cores.

~~~
makmanalp
The problem, I think, is that submitting results costs money and red tape, so
vendors run their own, and you kind of have to take their word for it or
reproduce the results yourself.

~~~
tyingq
That makes sense. TPC-C probably died off after Oracle basically killed
Sybase and Informix: no more well-funded competition to keep up the pace, and
no multitude of RISC vendors trying to fend off Linux/x86.

The open source databases didn't play that game, so TPC-C became irrelevant.

Too bad there isn't a good way to directly compare the healthy survivors.

------
sheeeep86
I don't like it when companies are not transparent about the pricing of their
product. If you have a pricing page, show the price, so that I can decide
whether this is relevant for me or not...

~~~
ahmedalsudani
The only thing the Enterprise offering gives you is priority access.

~~~
mjibson
Enterprise allows access to various features like distributed backup and
restore.

~~~
ahmedalsudani
Ah, my mistake. I stand corrected.

------
skybrian
I wonder how far apart those three nodes are and how much the latency between
them matters?

------
d0ugie
Hadn't heard of Cockroach, but based on the article, this thread, and the
rest of their site, it sounds at least worth installing on a few hobby nodes,
if only to get familiar with its behavior and configuration should a need
arise. It reminds me of Cassandra years ago: I had already learned the gist
of it on my own, sort of a road not taken relative to my then-firm's usual
prescriptions (MySQL and Mongo), and it turned out to be perfect for my
team's needs (the paperwork to get permission to use it notwithstanding).
Thanks for posting and good luck!

------
hellofunk
Nice pun there. Cockroaches do indeed have a habit of making significant
strides.

------
Asdfbla
Does someone have more information about how they implemented serializable
isolation in such a way that, as they claim, performance isn't negatively
impacted? That seems pretty hard to achieve.

------
elvinyung
Since you only have 3 nodes, doesn't that mean every range is replicated to
every node? Doesn't that make joins trivial (i.e. no different from non-
distributed joins)?

~~~
d4l3k
Yeah, though from what I understand this benchmark is measuring both
transactional read and write performance rather than just join performance.

Transactional writes are likely the slowest thing since they need to talk to
all replicas.

~~~
elvinyung
Actually hmm, do reads need to talk to all replicas in this case (serializable
isolation)?

~~~
ComputerGuru
As I understand it, reads only need to talk to the leaseholder, which
bypasses Raft, since write consensus guarantees atomicity after completion of
write intents. Cockroach tries, on a best-effort basis, to have the
leaseholder and the Raft leader be the same node.

------
some_account
Congratulations to the cockroach team for putting out an awesome product :)

Would be great to see how it compares against postgres in similar scenarios.

------
strict9
Before clicking the comments link, I always know what to expect in the HN
comment section for a CDB post announcing their latest milestone or feature:

a lot of congrats and excitement, questions about who uses it in a production
environment, very specific use-case questions, and of course remarks about
the name.

It's weird how predictable the response to one company/tech always is.

~~~
latenightcoding
People complaining about the name and how they are never going to be able to
use it in production because of how gross cockroaches are is definitely the
most recurring point. I think it worked well for them, since everyone
remembers the name, specially with all the distributed stores coming out
lately.

~~~
ngsayjoe
Sometimes I wonder whether the evolution of cockroaches' grossness has
something to do with their high survivability.

------
pieterhg
Great stuff but this name really doesn’t work. Make it a name with positive
connotations.

~~~
expliced
I think it's a great name once you get the.. uh.. pun behind it.

~~~
arbitrage
What is the pun?

------
api
What drug was the person who drew that graphic on?

~~~
evrydayhustling
Heavy doses of Hieronymous Bosch?
[https://en.wikipedia.org/wiki/Hieronymus_Bosch](https://en.wikipedia.org/wiki/Hieronymus_Bosch)

------
johnmarcus
I will not use this product based on its name alone; it gives me the jeepers.
Petty? Damn straight it's petty. Doesn't make it any less real though.

