Wow, that's...not fast.
Jepsen is essentially a worst-case stress test for a consistent database: all the transactions conflict with each other. CockroachDB's optimistic concurrency control performs worse with this kind of workload than the pessimistic (lock-based) concurrency control seen in most non-distributed databases. But this high-contention scenario is not the norm for most databases. Most transactions don't conflict with each other and can proceed without waiting. Without contention, CockroachDB can serve thousands of reads or writes per second per node (and depending on access patterns, can scale linearly with the number of nodes).
And of course, we're continually working on performance (it's our main focus as we work towards 1.0), and things will get better in both the high- and low-contention scenarios. Just today we're about to land a major improvement for many high-contention tests, although it didn't speed up the Jepsen tests as much as we had hoped.
I've been testing with node.js's pg client (not even using the 'native' option), and on 1 node I get > 1,000 single inserts/sec. I'm seeing this scale linearly too. NB I'm testing on a pretty overloaded MacBook with only 8GB RAM. On a decent-sized cluster with production-grade hardware and network, performance is really unlikely to be a bottleneck. Also, it'll significantly improve during this year, one imagines.
All databases tend to struggle under contention. If you are an application author building a high-scale app, you need to avoid contention by design. That single counter row you were planning to increment on every visit? Bad idea; maybe insert a new visit record instead.
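A minimal sketch of that design shift, using SQLite purely for illustration (the table and column names are made up): writers insert independent rows instead of all fighting over one counter row, and the count is computed at read time.

```python
import sqlite3

# Instead of a single hot counter row that every visit must UPDATE
# (forcing all writers to conflict on one key), each visit INSERTs
# its own row, so concurrent writers never touch the same data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (page TEXT, visited_at TEXT)")

# Contention-prone version (avoid):
#   UPDATE counters SET n = n + 1 WHERE page = 'home'

# Contention-free version: one independent insert per visit.
for _ in range(5):
    db.execute(
        "INSERT INTO visits (page, visited_at) "
        "VALUES ('home', datetime('now'))"
    )

# The total is aggregated at read time instead of maintained in place.
(count,) = db.execute(
    "SELECT COUNT(*) FROM visits WHERE page = 'home'"
).fetchone()
```

The trade is more work on reads for conflict-free writes; if reads dominate, the aggregate can be maintained asynchronously instead.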
The contention in this example is an aspect of the test design, not database design. You wouldn't try to build an app this way.
a.) this is a beta product where the team has been focusing on completeness and correctness before performance
b.) I was testing Cockroach from August through October--Cockroach Labs has put a good deal of work into performance in the last three months, but that work hasn't been merged into mainline yet. I expect things should be a good deal faster by 1.0.
c.) Jepsen does some intentionally pathological things to databases. For instance, we choose extremely small Cockroach shard sizes to force constant rebalancing of the keyspace under inserts. Don't treat this as representative of a production workload.
When there are basic expectations of behavior that aren't being met, many of us would reject the idea that this code is 'correct'.
IIRC, in the case that aphyr refers to for these specific numbers, the reads are scans that span multiple shards, while the writes are writes to single shards.
Even though aphyr says it's just a hundred rows, the tables are split into multiple shards because aphyr was specifically testing our correctness in multi-shard contention scenarios. In production you wouldn't see multi-shard reads until you were scanning tables hundreds of thousands of rows in size. The performance difference is easier to picture if you imagine full table scans spanning multiple shards on multiple machines while the underlying rows are being rapidly mutated by contending write transactions: the transactions get constantly aborted to avoid giving a non-serializable result, and performance suffers. We agree that the numbers in this contention scenario are too low, and we are actively working on high-contention performance (and performance in general) leading up to our 1.0 release.
Specifically, we break up into multiple shards when a single shard exceeds 64MB in size.
You can follow along on one of the PRs that address this specific performance issue here: https://github.com/cockroachdb/cockroach/pull/13501
The benchmark isn't quite apples to apples. I'm used to people doing put/get benchmarks, not put/search workflows. Merging results from multiple machines is nothing to sneeze at, especially if you have replication between nodes.
I worked for a search engine dotbomb, and I had a constant worry that the core engineering team had not solved a very similar problem. But since the funding round failed I'll never know for sure what they had planned.
This was IMO the big innovation of TrueTime in Spanner: contention-free reads at either now, or at some time in the past.
Reads per second is ultimately a measure of how many operations can be retired per second, not how long the operation takes. Anyone who has spent fifteen minutes learning about capacity planning should know that. 5 reads per second doesn't mean each read takes 200ms. It might mean 1 second per read spread across 5 threads of execution, or 3 seconds per read across 15 workers.
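To spell out that arithmetic (a toy calculation, not a benchmark): sustained throughput is concurrency divided by per-operation latency, so the same 5 reads/sec is consistent with very different latency profiles.

```python
def throughput(workers: int, latency_s: float) -> float:
    """Operations retired per second = concurrent workers / per-op latency."""
    return workers / latency_s

# All three profiles retire the same 5 reads per second:
assert throughput(1, 0.2) == 5.0   # one worker, 200 ms per read
assert throughput(5, 1.0) == 5.0   # 1 second per read across 5 threads
assert throughput(15, 3.0) == 5.0  # 3 seconds per read across 15 workers
```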
Optimistic locking systems inherently perform poorly under contention. But they also perform better than pessimistic concurrency systems overall because in the real world we design applications to avoid contention.
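A toy sketch of that retry dynamic (illustrative only; this is not CockroachDB's actual implementation): each transaction reads a record's version, and its commit succeeds only if the version is still unchanged. Without contention the first attempt commits; under contention, attempts are burned on aborts and retries.

```python
# Toy optimistic concurrency control: a commit succeeds only if the
# record's version hasn't changed since it was read; otherwise the
# transaction aborts and the caller must retry.
store = {"counter": {"value": 0, "version": 0}}

def read(key):
    rec = store[key]
    return rec["value"], rec["version"]

def try_commit(key, new_value, expected_version):
    rec = store[key]
    if rec["version"] != expected_version:
        return False  # a rival committed first: abort, caller retries
    rec["value"] = new_value
    rec["version"] += 1
    return True

value, version = read("counter")
# A rival transaction commits before we do:
assert try_commit("counter", 100, version)
# Our commit now fails its version check and is aborted:
assert not try_commit("counter", value + 1, version)
# Retry against the fresh version; this time it succeeds:
value, version = read("counter")
assert try_commit("counter", value + 1, version)
```

With one writer per key, every commit lands first try; with N writers hammering one key, most attempts end in that abort branch, which is exactly the pathological case the Jepsen tests create.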
As an example, the Google App Engine datastore runs zillions of QPS across petabytes of data in a massive distributed cluster. But if you build an app that does nothing but mutate a single piece of state over and over, you'll top out at a couple transactions per second. This is painful if you're trying to build a simple counter, but with minimal care you can build a system that scales to any sized dataset and traffic volume.
For instance, you wouldn't expect a single user to make 5 comments or upvotes per second, so storing data about recent activity with the user isn't a bottleneck you need to design for. Storing data about responses with an item might also be okay as long as you don't plan to be HN or Reddit (GitHub, for example, would be just fine). But if you want to track activity globally (e.g., managing the watch-list notifications in GitHub), you will need to design around that number.
Yes, the general advice for users of the GAE datastore is to build Entity Groups around the data for a single user. That isn't absolute though, and it doesn't cause problems for watch lists; the Watch can be part of the User's EG rather than the Issue's EG. Or it can be its own EG. In practice this doesn't require as much consideration as you probably imagine.
FWIW, you're also describing Cassandra and it seems to do fine in the marketplace.
The test exercises the worst-case scenario where everything needs locking. CockroachDB uses an optimistic "locking" approach, which makes this case bad. But if your use case is strictly linearizable high-throughput reads, good luck finding anything that is not a single-machine database.
All this assumes you have only so much data that can fit in one machine. If that's not the case you're in need of a database that spans more than one machine.
One had some perf issues.
First he just moved the DB to avoid having it on the same machine as the web server. Saved him a year.
Then as the user base grew, he just bought a bigger server for the db.
Right now it works really fast at 400k u/d, and there are still 2 offers of servers more powerful than the one he currently has. By the time he needs to buy them, 2 new, even more impressive offers will probably exist. Which is hardly a problem, because more traffic means more money anyway.
Having a cluster of nodes to scale is a lot of work. So unless you have 100 million users doing crazy things, what's the point?
There are many very small startups doing work in IoT, analytics, social, finance, health, etc. who have ridiculously challenging storage needs. Trying to vertically scale is simply not an option when you have a 50-node Spark cluster or 1M IoT devices all streaming data into your database. Vertically scaling your DB doesn't work because it is largely an I/O challenge, not a CPU/storage one.
But hey, it's cute and lucky that you get to work on small projects. Many of us don't, and have to deal with these more exotic database solutions.
Plus, if you only manage 50 writes/sec on this DB, I don't see how you plan to scale anything with it. You are back to using good old sharding with load balancers.
So by all means, scale up your single DB server. But if you ever get the happy issue of even more growth, and needing to scale beyond a single server, then you "should" be able to migrate your data to a cockroachDB cluster, and not have to change much, if anything, in your application.
These databases are designed for use-cases which require the consistency of a relational database, but cannot fit on a single instance.
The problem is not just performance; it's that distributing has a huge cost in terms of servers and maintenance.
If you can write only 50 times a second, your data set won't get big enough to justify distributing it.
Put your millions of row in one server and be done with it. Cheaper, faster, easier.
There is a tendency nowadays to make things distributed for the sake of it.
Distribution is a constraint, not a feature.
Can you really not see why distributed databases are needed? High availability, (geo) replication, active/active, oversized data, concurrent users, and parallel queries are just a few of the reasons.
Distribution will always come with a cost, but the tradeoffs are for every application to make. We use a distributed SQL database and while it's faster than any single-node standard relational DB would be, speed isn't the reason we use it.
If the DB can only handle 50 ops/sec then the point you are making here is valid.
But see https://news.ycombinator.com/item?id=13661349; that's a pathological worst-case number. You should find more accurate performance numbers for real-world workloads before drawing sweeping conclusions.
Your original comment was:
> Even if they manage to multiply this by 100 on the final release, it's still way weaker than a regular sql db
This is what my comment, and the sibling comments, are objecting to, and I don't think you've substantiated this claim. 100x perf is probably well in excess of 10k writes/sec/node, which is solid single-node performance (though you'd not run a one-node CockroachDB deployment). Even a 10x improvement would get the system to above 1k writes/sec/node, which would allow large clusters (O(100) nodes) to serve more data than a SQL instance could handle.
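Back-of-the-envelope math behind those figures (toy numbers: the ~100 writes/sec/node contended baseline is an assumption for illustration, and linear scaling with node count is assumed):

```python
# Assumed contended baseline, in writes/sec/node (illustrative only):
baseline = 100

per_node_10x = baseline * 10     # a 10x improvement
per_node_100x = baseline * 100   # a 100x improvement

assert per_node_10x == 1_000     # "above 1k writes/sec/node"
assert per_node_100x == 10_000   # "well in excess of 10k writes/sec/node"

# Even at only the 10x figure, an O(100)-node cluster in aggregate:
assert 100 * per_node_10x == 100_000  # writes/sec across the cluster
```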
Obviously I'd prefer to be able to outperform a SQL instance on dataset size with 10 nodes, but for a large company, throwing 100 (or 1000) nodes at a business-critical dataset is not the end of the world.
Performance is only loosely correlated with dataset size, and even less so in distributed databases like CockroachDB.
When the project can no longer fit onto a developer's box it changes a bunch of dynamics and often not for the better. Lots of regressions slip in, because developers start to believe that the glitches they see are caused by other people touching things they shouldn't.
With Mongo we had to start using clustering before we even got out of alpha, and since this was an on-premises application now we had to explain to customers why they had to buy two more machines. Awkward.
- it's implicit, and you have to inspect the db and the code to understand it.
- there is no single source of truth, so any changes had better be backed up by unit tests. And training is awkward. If some fields/values are rarely used, you can easily end up not knowing about them while "peeking at the data".
- the constraints are delegated to the devs, so code making checks is scattered all around the place, which makes consistency suffer. I almost ALWAYS find duplicate or invalid entries in any Mongo storage I have to work on. And of course, again, changes and training are a pain. "Is this field mandatory?" is a question I asked way too often in those Mongo gigs.
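The contrast is easy to demonstrate with SQLite (table and column names are made up for illustration): a declared constraint is a single source of truth enforced for every writer, instead of checks scattered across application code.

```python
import sqlite3

# With a declared schema, constraints live in one place and are
# enforced for every code path that writes; no scattered app checks.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE users (
        email TEXT NOT NULL UNIQUE,  -- no duplicate entries, ever
        name  TEXT NOT NULL          -- mandatory field, by declaration
    )
""")
db.execute("INSERT INTO users VALUES ('ann@example.com', 'Ann')")

# A duplicate entry is rejected by the database itself:
try:
    db.execute("INSERT INTO users VALUES ('ann@example.com', 'Imposter')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# So is a row missing a mandatory field:
try:
    db.execute("INSERT INTO users (email) VALUES ('bob@example.com')")
    missing_accepted = True
except sqlite3.IntegrityError:
    missing_accepted = False

assert not duplicate_accepted
assert not missing_accepted
```

In a schemaless store, both bad writes would silently succeed unless every single writer remembered to check.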
Things like redis suffer less from this because they are so simple it's hard to get wrong.
But Mongo is great software. It's powerful. You can use it for virtually anything, whatever the size, quantity, or shape. And so people do.
The funny thing is, the best MongoDB users I've met come from an SQL background. They know the pros and the cons, and use the flexibility of Mongo without making a mess.
But people often choose mongo as a way to avoid learning how to deal with DB.
But I didn't want to turn the Cockroach scalability discussion into another roast of Mongo so I very purposefully only talked about the horizontal scaling vis a vis 'immediately have to scale' versus 'very capable of scaling' horizontally.
If you gave me a DB that always had to be clustered but I could run it as two or three processes on a single machine without exhausting memory, I would be fine with that as long as the tooling was pretty good. As a dev I need to simulate production environments, not model them exactly.
But if the whole thing only performs when it's effectively an in-memory DHT, there are other tools that already do that for me and don't bullshit me about their purpose.
The issue is that they are annoying to use. You have to create a schema, you have to manage the evolution of that schema (great, another tool), you have to build out these complicated relational models, and finally you have to duplicate that schema in code.
And so, IMHO, that's why people move towards MongoDB. It's amazing for prototyping, and then people just, well, stick with it, as there isn't a compelling enough reason to switch.
It's been so long since I touched SQL that I'm forgetting how. Which means that my coworkers haven't touched it either, and some of them who were junior are probably calling themselves 'senior' now, having hardly ever touched SQL. It's no wonder NoSQL can waltz into that landscape.
But what's the old rule? If your code doesn't internally reflect the business model it's designed for, then you'll constantly wrestle with impedance mismatches and inefficiencies, and eventually the project will grind to a halt.
So what I'd rather see is a database that is designed to work the way modern, 'good' persistence frameworks advertise them to work. That would represent progress to me.
Or run ToroDB Stampede and look at the generated tables manually or via other tools like SchemaSpy.
(warning: I'm a ToroDB dev)
Lots of data, parallel queries, high-availability, replication, multi-master, etc are all instances that require true distributed scalability.