FFS, even RoachDB would've been a better name.
It's only a couple jumps away from their original intent.
Maybe a better question would be "Who would'a thunk you could name a record label 'Virgin' and get it off the ground?"
Anyway, my manager also liked to write his letters in Lotus 1-2-3. Which is a roundabout way of pointing out that if management isn't going to evaluate technical decisions using technical criteria, then there's no escaping the fact that pointy hair is as pointy hair does. If you're pitching the name, not the solution, the name isn't the problem.
It sounds a little more palatable, whilst still keeping the spirit of the original.
I've been looking for a database that does sharding and replication automatically, without giving up consistency and transactions, so I figure I'm likely to use this in the future. I've struggled to find any others meeting these criteria.
They've been lying. Never trust them.
Some datastores can actually do this, but performance per beefy server is less than you'd expect. You can use Riak but you have to write proper CRDTs. You can use zookeeper or etcd but those are for small amounts of configuration data, not for large amounts of customer data.
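On the "proper CRDTs" point: the simplest example is a grow-only counter (G-Counter), where each replica increments only its own slot and merging takes an element-wise max, so concurrent updates commute and replicas converge without coordination. A minimal Python sketch of the idea (not Riak's actual API):

```python
# Minimal G-Counter CRDT sketch (illustrative only, not Riak's API).
# Each replica increments only its own slot; merge takes the
# element-wise max, so merges are commutative, associative, and
# idempotent -- concurrent increments never conflict.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Safe to apply in any order, any number of times.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(3)  # replica "a" takes 3 increments
b.increment(2)  # replica "b" takes 2, concurrently
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # replicas converge
```

The catch the parent alludes to: this only works for data whose updates can be expressed as a CRDT (counters, sets, etc.); arbitrary read-modify-write logic doesn't fit this mold.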
For all the datastores that claim to do everything automatically and still have great performance, we can thank Aphyr for providing proof that they don't live up to their promises, rather than mere suspicion.
I'd suggest trying to use a simpler model, and understand and accept its failure modes. Maybe your app has to go into read-only mode for a few hours if there's a server failure, etc.
I'm fine with failure modes like that. I just want it to be automated. I don't want to come home from a trip and find that my database master has fallen over and the database slave has been patiently waiting for me to manually promote it for the last few days. I could probably rig up some cron jobs and shell scripts to automate this, but that's exactly the kind of thing I'm looking for something else to do for me, hopefully written by people smarter than me.
There's a reason Apple bought them.
Netflix is the largest user, and they are well-known for their "Chaos Monkey" strategy of taking down servers randomly.
(full disclosure: I work at DataStax and work on improving Cassandra every day)
Consistency: you want your whole cluster to have the same data.
Availability: you want to be able to lose one or more nodes when something goes wrong and keep serving.
Partition tolerance: in case of a net-split (think IRC), the split parts of your cluster can keep working in isolation, then re-heal when the link is up again.
When someone sells you auto-healing and cluster reliability, they are selling you AP, which means you lose the C, something we all take for granted. Cassandra is one of those. Think of what you can't do when all your nodes can have different data for the same record.
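To make the "losing C" point concrete, here's a toy last-write-wins merge — the kind of conflict resolution an AP store might apply once a partition heals (hypothetical code, not any particular store's implementation). Note how one replica's accepted write silently disappears:

```python
# Two replicas both accept writes for the same key during a partition.
# Values are stored as (value, timestamp) pairs; explicit timestamps
# are used here instead of wall clocks to keep the example deterministic.
replica1 = {"user:42": ("alice@old.com", 100)}
replica2 = {"user:42": ("alice@old.com", 100)}

# Partition: each side independently accepts a different write.
replica1["user:42"] = ("alice@new.com", 105)
replica2["user:42"] = ("alice@work.com", 107)

def lww_merge(a, b):
    """Last-write-wins merge: for each key, keep the highest timestamp."""
    merged = {}
    for key in a.keys() | b.keys():
        candidates = [e for e in (a.get(key), b.get(key)) if e is not None]
        merged[key] = max(candidates, key=lambda e: e[1])
    return merged

healed = lww_merge(replica1, replica2)
# The write at timestamp 105 was acknowledged to a client, but it's gone:
assert healed["user:42"][0] == "alice@work.com"
```

Both clients got a successful ACK; only one write survived. That's the tradeoff hiding behind "auto-healing".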
Sorry for the useless explanation if you already knew that.
This is pretty hyperbolic. Netflix does perfectly fine with this model, given that they run Cassandra at its lowest consistency level. If they can reliably store watch histories and run recommendations, settings, and playlists on this model, I'm wondering what you have in mind when you say "think of what you can't do". Besides, it's not like large AP systems are a new thing; have you ever overdrawn your account?
There are a multitude of failure cases in which it cannot replicate those writes. Ultimately, your choice of database has to be based on the availability and consistency needs of your use case, period. "Trust" should never come into the discussion at all; you should be well aware of what your tradeoffs mean in the worst case.
"Today, we’re launching CockroachDB for everyone. Use it. Build on it. Contribute to it!"
Can someone (preferably from the team) clarify the current situation?
PS.: CockroachDB is the only distributed DB that I would bet on going forward and being a solid base for a big distributed DB.
So since you guys started working on the structured data layer, does that mean CockroachDB is going into Beta?
I can't wait to get my hands on a test setup even though it's probably going to take a long time before I can deploy it into production.
It feels like going back in time.
Well that's not remotely true, is it? Not even close. Is it really a good idea to lead with something so obviously untrue? If you're trying to convince me of something (i.e. that this product is good), putting such a jarring, obvious falsehood right at the start is a bad idea. I'm wondering if they're deliberately spoofing their own seriousness, but I see nothing else in there to support that.
Data is the beating heart of every business in the world.
I guess this is interesting, but distributed hard consistency pure K-V stores have been done before, Zookeeper, etcd, etc. It seems like the vast majority of the hard work is left to do. I don't want to get into naming arguments, but I wouldn't really call this a 'database' yet. It doesn't sound like you can do anything but a key lookup or range query currently, which is incredibly limiting for most real world applications.
I somewhat question the approach, though. E.g., why not figure out the hard part first? I.e., build the `SQL and data layers` on top of ZooKeeper or etcd, then replace the backend to scale better? I would think this would get a lot more early adopters. As is, the alpha fills a very niche use case.
If you look at the documentation (e.g., ), the design has been rather carefully thought out; it's just that they're implementing it from the bottom up.
According to their roadmap, they're aiming for KV functionality in 1.0 and aren't aiming for SQL until past version 2.0 (it's currently alpha).
Given the backgrounds of the technical people involved (including Google, as this project is inspired by Spanner), they should have a lot of experience with what they're trying to accomplish.
As for "done before", a core feature of Cockroach is true ACID transaction support, including snapshot isolation, something no distributed NoSQL database I know about supports. (ArangoDB does support transaction, but is mostly NoSQL in the sense of implementing a different query language than SQL.)
ZooKeeper has ACID transactions which I believe are linearizable (which trumps SI). The downside is the memory-only working set, but given how cheap memory is, I'd still rather have a memory-only ZooKeeper with a rich query interface than a large on-disk KV store with a minimal query interface.
> ArangoDB does support transaction, but is mostly NoSQL in the sense of implementing a different query language than SQL
What is your definition of NoSQL?
> What is your definition of NoSQL?
I don't have one, and I think the term isn't terribly useful. But the whole idea of NoSQL started as an attempt to break free of the relational aspect of SQL, because things like joins, strict schemas, foreign keys, and normalization were perceived as getting in the way of distribution. ArangoDB supports joins (but not foreign keys, because it's schemaless) and an SQL-like query language, which makes it a lot closer to an SQL database than something like Redis or Cassandra.
"The highest level of abstraction is the SQL layer (currently not implemented)."
I can always scan-and-switch when it's ready.
With that said, they seem to be assuming that their clock skew (ε) has a fixed maximum bound, which is incredibly disconcerting to me, as it implies that in certain (rare and anomalous) network partitions they'll get data corruption and fail.
I can see how they, coming from a Spanner background with atomic clocks, might assume this. But this assumption requires that their database cluster is always connected within some heartbeat interval (which they mention), such that they can trust there exists a maximum bounded skew ε.
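For context on why the ε bound matters: Spanner (with TrueTime) handles clock uncertainty via "commit wait" — after picking a commit timestamp, it waits out the uncertainty interval before acknowledging, so the timestamp is unambiguously in the past on every node's clock. A toy Python illustration of that idea (not CockroachDB's actual mechanism; the whole scheme presumes ε really is bounded, which is exactly the assumption being questioned here):

```python
import time

# Assumed upper bound on clock skew between any two nodes, in seconds.
# The correctness of commit wait rests entirely on this bound holding.
MAX_CLOCK_SKEW = 0.01

def commit(write_fn):
    """Apply a write, then wait out the uncertainty interval before ACKing."""
    commit_ts = time.time()
    write_fn(commit_ts)
    # Commit wait: don't acknowledge until every node's clock (within
    # MAX_CLOCK_SKEW of ours) has definitely passed commit_ts, so no
    # later reader can see this write as being "in the future".
    remaining = (commit_ts + MAX_CLOCK_SKEW) - time.time()
    if remaining > 0:
        time.sleep(remaining)
    return commit_ts

ts = commit(lambda t: None)  # the write itself is a no-op here
# By the time commit() returned, the uncertainty window had elapsed:
assert time.time() >= ts + MAX_CLOCK_SKEW - 1e-4
```

If a partition (or a VM pause, or NTP misbehavior) pushes real skew past the assumed ε, the wait is too short and the timestamp ordering guarantee quietly breaks — which is the corruption scenario the parent is worried about.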
So while it seems like a dumb question, I honestly must ask a very trivial one: how does CockroachDB handle basic network partitions? I assume they have a good answer to this, but it needs to be clarified in order to answer the more important issue of anomalous partitions, like split brain. This might rip the cockroach in half, quite literally, meaning that all the other "guarantees" they give, like linearizability and global consistency, get thrown out the window.
For a more in-depth explanation of the above, see https://gist.github.com/tschottdorf/57bcccc379b151456044.
Without a global clock you basically have to give up uncontended snapshot reads and linearizability for cross-shard transactions. That would be a completely different system from Spanner and CockroachDB.
As far as network partitions go, a consensus must exist for reads or writes. If you don't have 3 out of 5 working correctly and talking to each other, then you are down.
Now why they would think to make that image a clickable link is beyond me...
Why it's a Wordpress default is beyond me...