

HBase to Cassandra: why we switched - jbellis
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved

======
cx01
"[..] and Cassandra being more suitable for real time transaction processing
and the serving of interactive data."

Does Cassandra actually support transactions?

"For example, adding a new node to the system becomes as simple as
bootstrapping its Cassandra process and pointing it at a seed node (an
existing node within your cluster)."

You could easily have this in a distributed system that has a single master
(implemented as a distributed state machine), without all the disadvantages of
a gossip-protocol.

"Secondly I have come to the conclusion that Cassandra’s P2P architecture
provides it with performance and availability advantages. Load can be very
evenly balanced across system nodes thus maximizing the potential for
parallelism, the ability to continue seamlessly in the face of network
partitions or node failures is greatly increased, and the symmetry between
nodes prevents the temporary instabilities in performance that have been
reported with HBase when nodes are added and removed"

None of these features require a P2P system. Actually, a P2P system will in
most cases be slower than a hierarchical one.

~~~
jbellis
> Does Cassandra actually support transactions?

He means in the sense that databases have typically been divided into
"transaction processing" (doing a small set of operations over and over with
large concurrency) and "analytics" (doing potentially monstrous ad-hoc queries
w/ very low concurrency).

> You could easily have this in a distributed system that has a single master
> (implemented as a distributed state machine), without all the disadvantages
> of a gossip-protocol.

Sure, but then you have all the disadvantages of a single master system. :)

For most systems the single master system and its potential for catastrophic
downtime if failover goes badly (which it _always_ does eventually; if you
claim otherwise you are a novice or selling snake oil) is the worse choice.

> a P2P system will in most cases be slower than a hierarchical one.

I call BS. An O(1) routing p2p system like Cassandra has no inherent speed
disadvantage over a hierarchical system.
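
To make the "O(1) routing" point concrete, here is a minimal sketch of a one-hop hash ring, assuming every node already holds the full membership list. The `Ring` class, the `token` helper, and the use of MD5 are illustrative assumptions, not Cassandra's actual code.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: each node owns the keys up to its own token."""

    def __init__(self, nodes):
        # Sort nodes by their token so ownership can be found by binary search.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around to 0.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._entries[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

Any node can compute the owner locally, so a request needs no trip to a master and at most one forward to the owning node. The binary search is local work over the token list, not extra network hops.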

Case in point: Cassandra is substantially faster than HBase, its closest
hierarchical competitor. There's also Hypertable, but to a first approximation
nobody uses it so I don't know of any benchmarks.

~~~
cx01
> For most systems the single master system and its potential for catastrophic
> downtime if failover goes badly (which it _always_ does eventually; if you
> claim otherwise you are a novice or selling snake oil) is the worse choice.

I don't know what you're talking about. If implemented correctly, a
hierarchical system is extremely unlikely to fail. I mean, if your network is
split into 3 partitions, the master will be unavailable, but in that situation
you're going to have worse problems than availability, because your web
servers are unlikely to even be reachable from outside the datacenter.

> I call BS. An O(1) routing p2p system like Cassandra has no inherent speed
> disadvantage over a hierarchical system.

Nope. If you have a master that stores the dictionary, then all lookups are
also O(1). Even better, you can randomly distribute keys across the nodes and
are not bound by the hashing algorithm.
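
As a rough sketch of what I mean, assuming the master keeps the key-to-node dictionary in memory (the `MasterDirectory` class and its method names are made up for illustration, not taken from HBase or any real system):

```python
import random


class MasterDirectory:
    """Hypothetical master that records which node holds each key."""

    def __init__(self, nodes, seed=None):
        self._nodes = list(nodes)
        self._placement = {}              # key -> node, kept by the master
        self._rng = random.Random(seed)   # seed only to make runs repeatable

    def assign(self, key):
        # Placement is arbitrary (random here); no hash function dictates it.
        if key not in self._placement:
            self._placement[key] = self._rng.choice(self._nodes)
        return self._placement[key]

    def lookup(self, key):
        # A single dictionary access: O(1) on average.
        return self._placement[key]


directory = MasterDirectory(["node-a", "node-b", "node-c"], seed=1)
directory.assign("user:42")
print(directory.lookup("user:42"))
```

The lookup itself is constant time, and the master is free to place or move a key anywhere. The cost is that every uncached lookup depends on the master being reachable.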

I don't know about the performance between HBase and Cassandra; I'm strictly
talking about theoretical performance.

~~~
jbellis
> If implemented correctly, a hierarchical system is extremely unlikely to
> fail.

You should go show Google how they're doing it wrong so they can keep App
Engine up.

