
Cassandra is a scalable key/columnfamily database designed for supporting low-latency applications with vast amounts of data partitioned across many machines. (Facebook has 40TB on a 150-machine Cassandra cluster.)

CouchDB is a document database that supports two-way replication so you can re-sync after taking part of the data offline, but it's still designed around the concept of a single master that holds all the data.

Completely different animals, in other words.



> CouchDB is a document database that supports two-way replication so you can re-sync after taking part of the data offline, but it's still designed around the concept of a single master that holds all the data

To clarify, CouchDB's replication is peer-based and ad hoc; there is no "master" replica. You can take part or all of the data offline, or spread it geographically.

I think by "single master" you mean that CouchDB doesn't support partitioning a single logical database across machines, which is true (there are projects for large-scale partitioning with CouchDB, but nothing in the core yet).


Does a Cassandra cluster stay write-available in the event of a network partition?

If so, how does it reconcile writes when the partition heals? Last I looked, Cassandra doesn't use vector/logical clocks - doesn't that potentially cause data loss when the partition heals if you're using a simple last-write-wins based on physical timestamps for a reconciliation policy? Does Cassandra use merkle trees for anti-entropy?

From what I can tell, although Cassandra claims to be write-fault-tolerant, the dependence on physical timestamps and the lack of the self-healing properties that merkle trees provide make me nervous about data loss and inconsistency when deploying it at scale.


> Does a Cassandra cluster stay write-available in the event of a network partition?

The client can specify whether it wants consistency (refuse writes if not enough write targets are there) or availability.

If it chooses availability, then Cassandra sends extra copies to nodes it _can_ reach, with a tag that specifies who the "real" destination is. When that node is reachable again it will be forwarded. ("Hinted handoff.")
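The mechanism above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Cassandra's actual code; the `Node`, `write`, and `replay_hints` names are made up for the example:

```python
# Hypothetical sketch of hinted handoff, not Cassandra's real implementation.
# If an intended replica is down, the write goes to a reachable node along
# with a "hint" naming the real destination; the hint is replayed later.

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value actually stored on this node
        self.hints = []   # (intended_node, key, value) held for others

def write(key, value, replicas, cluster):
    """Write to each intended replica, or hand a hint to a live node."""
    live = [n for n in cluster if n.up]
    for target in replicas:
        if target.up:
            target.data[key] = value
        else:
            # any reachable node can hold the hint for the down replica
            holder = next(n for n in live if n is not target)
            holder.hints.append((target, key, value))

def replay_hints(cluster):
    """When a node comes back up, forward the hints destined for it."""
    for node in cluster:
        remaining = []
        for target, key, value in node.hints:
            if target.up:
                target.data[key] = value  # handoff completes
            else:
                remaining.append((target, key, value))
        node.hints = remaining
```

So a write issued during a partition still succeeds (availability), and the missing replica catches up as soon as it is reachable again.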

> how does it reconcile writes when the partition heals?

As you said, last-write-wins. The experience with Dynamo showed that most apps don't want to deal with explicit conflict resolution, and don't need it. (But, I suspect we will end up adding it as an option for those apps that do. In the meantime, if Cassandra isn't a good fit, we're not trying to hard-sell anyone. :)
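For concreteness, here is what timestamp-based last-write-wins looks like, and why physical clocks make people nervous. This is a hypothetical sketch (the `reconcile` helper is invented for the example), not Cassandra's code:

```python
# Sketch of last-write-wins reconciliation on physical timestamps.
# Each write carries the client's timestamp; on conflict, the highest
# timestamp wins outright and the other versions are discarded.

def reconcile(versions):
    """versions: list of (timestamp, value) seen on different replicas."""
    return max(versions, key=lambda tv: tv[0])

# The hazard: if clocks are skewed, a logically newer write stamped by a
# slow clock silently loses to an older one.
survivor = reconcile([(1000, "old"), (990, "newer-but-slow-clock")])
# survivor is (1000, "old") -- the newer value is dropped
```

Vector/logical clocks avoid that failure mode by detecting true concurrency instead of ordering by wall time, at the cost of pushing conflict resolution back to the application.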

> Does Cassandra use merkle trees for anti-entropy?

Not yet, but my co-worker Stu Hood is working on this. Should be part of the 0.5 release.
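The idea behind merkle-tree anti-entropy can be sketched briefly. This is a toy illustration under assumed names (`build_tree`, `diff_ranges`), not the code Stu Hood is writing: each replica hashes its key ranges into a tree, and replicas compare hashes top-down so they only ship data for ranges that actually diverge.

```python
import hashlib

# Toy merkle-tree comparison over fixed key ranges (hypothetical sketch).

def leaf_hash(items):
    h = hashlib.sha256()
    for k, v in sorted(items):
        h.update(f"{k}={v};".encode())
    return h.hexdigest()

def build_tree(data, ranges):
    """ranges: list of (lo, hi) key ranges. Returns (leaf hashes, root)."""
    leaves = [leaf_hash([(k, v) for k, v in data.items() if lo <= k < hi])
              for lo, hi in ranges]
    root = hashlib.sha256("".join(leaves).encode()).hexdigest()
    return leaves, root

def diff_ranges(tree_a, tree_b, ranges):
    """Return the key ranges where two replicas disagree."""
    leaves_a, root_a = tree_a
    leaves_b, root_b = tree_b
    if root_a == root_b:
        return []  # fully in sync: one hash comparison settles it
    return [r for r, la, lb in zip(ranges, leaves_a, leaves_b) if la != lb]
```

The payoff is that two in-sync replicas exchange a single root hash instead of streaming their data, and out-of-sync replicas repair only the divergent ranges.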

> the dependence on physical timestamps and the lack of the self-healing properties

Whether the first is an issue is app-specific. As to the latter, I'm excited to get the merkle tree code in, too.

In the meantime, Cassandra _does_ do read repair and hinted handoff, so in practice it's what I would call "barely adequate." :)


> Facebook has 40TB on a 150-machine Cassandra cluster

So why did people stop using mainframes again?


People haven't stopped using mainframes. In fact, sales of mainframes have been growing at a very healthy clip.

That said, commodity hardware is a lot cheaper than niche hardware, thanks to increased competition and economies of scale. So any problem that can reasonably be tackled on commodity hardware is generally cheaper to solve that way.

However, not all problems are a good fit for commodity hardware. Mainframes win if you need strong reliability guarantees, and win again for sustained high-volume IO, where commodity hardware doesn't keep up. Similarly, there are computational problems that require a lot of IO and communication between processing nodes; supercomputers beat clusters of commodity hardware for those.

But if you've got a computational problem and it doesn't fall into one of those narrow categories, commodity hardware will be cheaper.



