
Cassandra Internals – Reading - r11t
http://www.mikeperham.com/2010/03/17/cassandra-internals-reading/
======
ableal
Those interested in the topic may also want to read this:

"Why we’re using HBase (at Adobe)": <http://hstack.org/why-were-using-hbase-part-1/>

It is a fine "war-story" of picking new technology and making it work without
losing data.

(It was submitted yesterday by the author here:
<http://news.ycombinator.com/item?id=1196382>, but got killed with 5 points,
which baffles me. I found it when puzzling out why my submission today was
instantly killed, with a different item id ...)

[P.S. minor bug report: my 'dead' item has a working link to the article,
which it perhaps shouldn't. <http://news.ycombinator.com/item?id=1200833>]

------
jbellis
The reason uncached reads are slower in Cassandra is not because the sstable
is inherently io-intensive (it's actually better than b-tree based storage on
a 1:1 basis) but because in the average case you'll have to merge row
fragments from 2-4 sstables to complete the request, since sstables are not
update-in-place.
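
A minimal sketch of that merge step, for illustration only. It assumes each sstable can be modeled as a plain dict from row key to a fragment, and that each column carries a write timestamp so the newest value wins. The names `merge_row_fragments` and `read_row` are hypothetical, not Cassandra's actual API.

```python
from typing import Dict, List, Tuple

# Assumed model: a row fragment maps column name -> (value, write timestamp).
Fragment = Dict[str, Tuple[str, int]]


def merge_row_fragments(fragments: List[Fragment]) -> Fragment:
    """Combine fragments of one row, keeping the newest value per column."""
    merged: Fragment = {}
    for fragment in fragments:
        for column, (value, timestamp) in fragment.items():
            current = merged.get(column)
            if current is None or timestamp > current[1]:
                merged[column] = (value, timestamp)
    return merged


def read_row(key: str, sstables: List[Dict[str, Fragment]]) -> Fragment:
    """Collect the fragments for `key` from every sstable that has it, then merge."""
    fragments = [table[key] for table in sstables if key in table]
    return merge_row_fragments(fragments)


# Three immutable "sstables"; the row user:1 is spread across two of them.
sstables = [
    {"user:1": {"name": ("alice", 1), "email": ("a@old.example", 1)}},
    {"user:1": {"email": ("a@new.example", 5)}},
    {"user:2": {"name": ("bob", 3)}},
]
row = read_row("user:1", sstables)
```

Because nothing is updated in place, the read has to touch every sstable that holds a piece of the row, which is where the extra cost of an uncached read comes from.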

------
suhail
A little misleading, IMO. While Cassandra has eventual consistency, reads are
not necessarily slow. With the cache settings (KeysCached/RowsCached) tuned
correctly and enough available memory, Cassandra actually performs quite well.
Cassandra is virtually worthless without those cache features, much as MySQL
is without indexes. Reads are slower than writes, but I think it would have
been more appropriate, and more interesting, to talk about how the cache
works.

As with any database (MySQL, Postgres, etc.), understanding how to make it
work well is a dark art.

~~~
jbellis
Right. Digg dropped memcached entirely from their architecture when we added
RowsCached to Cassandra.
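
A rough sketch of the row-cache idea discussed in this subthread, assuming a simple read-through cache with least-recently-used eviction. The `RowCache` class and its `load_row` callback are hypothetical stand-ins; this is not how Cassandra implements RowsCached.

```python
from collections import OrderedDict
from typing import Callable, Dict

Row = Dict[str, str]


class RowCache:
    """Read-through cache holding whole rows, evicting the least recently used."""

    def __init__(self, capacity: int, load_row: Callable[[str], Row]):
        self.capacity = capacity
        self.load_row = load_row  # stand-in for the slow, uncached read path
        self.rows: "OrderedDict[str, Row]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Row:
        if key in self.rows:
            self.hits += 1
            self.rows.move_to_end(key)  # mark as most recently used
            return self.rows[key]
        self.misses += 1
        row = self.load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # drop the least recently used row
        return row


# A dict stands in for on-disk storage in this example.
disk = {"a": {"v": "1"}, "b": {"v": "2"}, "c": {"v": "3"}}
cache = RowCache(capacity=2, load_row=lambda key: disk[key])
for key in ["a", "a", "b", "c"]:
    cache.get(key)
```

With a capacity of two rows, the second read of `a` is served from memory, and loading `c` evicts `a` as the least recently used entry.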

------
CWIZO
Cached (by Google) text only version:
[http://209.85.129.132/search?q=cache:http://www.mikeperham.c...](http://209.85.129.132/search?q=cache:http://www.mikeperham.com/2010/03/17/cassandra-internals-reading/&hl=en&strip=1)

------
ra
Cassandra isn't as easy to learn as, say, CouchDB. But Couch uses JSON (an
awesome choice, BTW), and Cassandra uses Thrift.

Cassandra is kinda difficult to pick up because there is no SQL equivalent:
there are no relationships, joins, or "where"s.

So, basically, it's an engine without user-friendly controls. But it's
probably the most awesomely powerful storage engine yet available as open
source.

Imagine if Google released a server image of one of their storage nodes...
ostensibly, that's what Facebook did with Cassandra.

