

How Do I Cassandra? - tjake
http://www.slideshare.net/rbranson/how-do-i-cassandra

======
dpritchett
Attended Rick's presentation last night. It was _so_ choice. If you have the
means, I highly recommend taking one in.

The MD5 hashing of keys to shards around a keyspace ring and the read/write
quorums were particularly interesting. Definitely going to be browsing the Reddit source
(<https://github.com/reddit/reddit>) for ideas on how to use Cassandra.
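
For anyone who wasn't there, here is a rough sketch of the ring idea as I understand it: hash the row key with MD5, treat the result as a position on a ring, and walk clockwise to the first node token at or past that position. The ring size of 2**127, the modulo reduction, and every function name below are assumptions made for illustration, not Cassandra's actual code.

```python
import bisect
import hashlib

# Assumption: tokens live in 0 .. 2**127 - 1. Reducing the digest with a
# modulo is a simplification, not Cassandra's exact token computation.
RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    """Map a row key to a position on the ring using MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(names):
    """Evenly spaced (token, node name) pairs, sorted by token."""
    count = len(names)
    return [(i * RING_SIZE // count, name) for i, name in enumerate(names)]


def owner_index(token: int, ring) -> int:
    """Index of the first node whose token is >= the given token.

    A node owns the range (previous token, its own token], so a position
    past the last token wraps around to the first node.
    """
    tokens = [t for t, _ in ring]
    return bisect.bisect_left(tokens, token) % len(ring)


def replicas_for_key(key: str, ring, replication_factor: int):
    """The owner plus the next nodes clockwise, one per replica."""
    start = owner_index(token_for(key), ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = build_ring(["node0", "node1", "node2", "node3"])
    print(replicas_for_key("user:42", ring, 3))
```

Placing replicas on the next nodes clockwise is the simplest scheme I know of; rack-aware or datacenter-aware placement would pick nodes differently.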

~~~
pgr0ss
You should read the Dynamo paper:
[http://www.allthingsdistributed.com/2007/10/amazons_dynamo.h...](http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html)

The ring is first explained in Section 4.2 Partitioning Algorithm.

------
rbranson
FYI -- the presentation says "temporal data" is a bad use case for Cassandra,
but that's a misuse of the word temporal. "Ephemeral data" is a better way to
say that.

------
mikeryan
<< META COMMENT >>

I'm always surprised when someone's raw slide deck for an interesting
presentation makes it to the front page of HN. I get so little from just an
85-slide deck, especially decks with a slide like

DATA MODEL

I gotcha column family ratttt heeeeyyyaahhh!

~~~
rbranson
Yup. Slides are best used as an illustrative tool.

~~~
checker
I got a lot out of your slides. Sometimes it can be difficult to discern the
point of the slide without audio, but I felt that your slides communicated the
point pretty well. Thanks for sharing!

------
koobe
Does Consistency in Cassandra require client clocks to be in perfect sync?

Y u no vector clocks?

~~~
rbranson
Timestamps were chosen over vector clocks intentionally.

"...the primary use case for classic vector clocks of merging non-conflicting
updates to different fields w/in a value is already handled by cassandra
breaking a row into columns."

See <https://issues.apache.org/jira/browse/CASSANDRA-580>
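
To make the quoted point concrete, here is a minimal sketch of timestamp-based reconciliation done per column rather than per row. The row layout (column name mapped to a timestamp and value pair), the function name, and the tie-breaking rule are assumptions for illustration; the real reconciliation logic lives in the Cassandra source.

```python
def merge_rows(left, right):
    """Merge two versions of a row, column by column.

    Each row maps column name -> (timestamp, value). For every column the
    entry with the higher timestamp wins. Ties are broken by comparing the
    values, which is this sketch's own choice so that the result does not
    depend on argument order.
    """
    merged = dict(left)
    for column, entry in right.items():
        if column not in merged or entry > merged[column]:
            merged[column] = entry
    return merged


if __name__ == "__main__":
    replica_a = {"name": (1, "Rick"), "city": (5, "Memphis")}
    replica_b = {"name": (3, "Richard"), "email": (2, "rick@example.com")}
    print(merge_rows(replica_a, replica_b))
```

Because the two writers touched different columns (city versus email), both updates survive the merge without any vector clock. Only a true conflict on the same column falls back to comparing timestamps, which is where clock skew between writers can matter.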

------
ajtaylor
Finally, I get it! This go-around, the idea of ColumnFamilies stuck. Thanks
for the slides. Any chance the video is available to go along with them?

------
deweller
Is there an audio or video archive of this presentation available on the web
perchance?

~~~
thomaslangston
No. Unfortunately audio/video recording equipment seems to be scarce in the
Memphis user group communities. As far as I know we only have one group that
does it regularly. <http://www.justin.tv/launchmemphis>

If anyone has a good, cheap solution for recording/streaming user group
presentations, please share.

------
nowarninglabel
I'm used to attending and giving presentations that are a bit outlandish, but
this is gimmicky enough that I started losing track of what I was supposed to
be learning from it.

I'm sure it's better in-person or with a recording though.

~~~
bkmontgomery
I was at the talk, and it was very good in person... I really wish we'd
recorded it :(

------
rvenugopal
How does the ring range work with nodes going down and new nodes being
introduced into the cluster? As per slide 38, it appears to be a static range.

~~~
rbranson
The ranges are assigned by a per-node token. With 4 nodes like the cluster
shown in the slides, you'd space the tokens apart appropriately for 25% of the
range on each node.

When you add a node, you need to decide if you want to rebalance the cluster
by moving every node's token a bit so that every node's range consumes 20% of
the keyspace, or if you just want to quickly alleviate load by splitting a
single node's traffic in half by assigning the new node's token half way into
an existing node's range.

Decommissioning a node works the same way, but in reverse.

More detailed info:
[http://www.datastax.com/docs/1.0/operations/cluster_manageme...](http://www.datastax.com/docs/1.0/operations/cluster_management#adding-capacity-to-an-existing-cluster)
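
The arithmetic behind the two options can be sketched roughly like this. The 2**127 ring size and the helper names are assumptions for illustration; the linked docs are the authority on how to compute and assign tokens for a real cluster.

```python
# Assumption: tokens live in 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int):
    """Evenly spaced tokens, so each node owns 1/node_count of the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def bisecting_token(tokens, index: int) -> int:
    """Token halfway into the range owned by sorted(tokens)[index]."""
    ordered = sorted(tokens)
    end = ordered[index]
    start = ordered[index - 1]  # index 0 wraps around to the last token
    span = (end - start) % RING_SIZE or RING_SIZE
    return (start + span // 2) % RING_SIZE


def ownership(tokens):
    """Fraction of the ring owned by each token (distinct tokens assumed).

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around at the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        span = (token - ordered[i - 1]) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    four = balanced_tokens(4)
    print(ownership(four))                  # 25% each
    print(ownership(balanced_tokens(5)))    # rebalanced: about 20% each
    hot = 1                                 # pretend the second node is overloaded
    print(ownership(four + [bisecting_token(four, hot)]))
```

With the bisecting option, the new node and the node it relieved each end up with 12.5% of the ring while the other three stay at 25%. No other token has to move, which is why it is the quick fix, but the ring is left unbalanced.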

------
idan
Y U NO SPEAKERDECK?

Seriously, it's a much more pleasant slide-consuming experience for your great
slides.

------
grout
You don't, if you're smart.

[http://chip.typepad.com/weblog/2011/08/why-cassandra-is-unfi...](http://chip.typepad.com/weblog/2011/08/why-cassandra-is-unfit-for-production.html)

~~~
rbranson
That's a 24-hour-news-network-pundit grade straw man he's created there. The
replication strategy used by Cassandra is sourced from the Amazon Dynamo
design (shared with Riak and Voldemort) and is intentional. It's not a bug;
it's the way it's supposed to operate. There are many large scale distributed
systems built on this replication strategy that have excellent durability
track records -- Amazon S3 is one of them.

The back pressure argument is laughable. In Cassandra, a write consistency of
greater than ONE will ensure the data is replicated to more than one node
before acknowledging to the client.
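
As a rough illustration of what the consistency level buys you, here is the acknowledgement arithmetic in a few lines. The function names are made up for this sketch, and it only counts replicas; it says nothing about how the coordinator actually waits for responses.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed before the client gets a response."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    if level not in levels:
        raise ValueError(f"unknown consistency level: {level}")
    needed = levels[level]
    if needed > replication_factor:
        raise ValueError(
            f"{level} needs {needed} replicas, but only "
            f"{replication_factor} exist"
        )
    return needed


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    writes = required_acks(write_level, replication_factor)
    reads = required_acks(read_level, replication_factor)
    return writes + reads > replication_factor


if __name__ == "__main__":
    rf = 3
    for level in ("ONE", "TWO", "QUORUM", "ALL"):
        print(level, required_acks(level, rf))
    print(reads_see_latest_write("QUORUM", "QUORUM", rf))
    print(reads_see_latest_write("ONE", "ONE", rf))
```

With a replication factor of 3, QUORUM means 2 acknowledgements, so a QUORUM write is on at least two nodes before the client hears back, and a QUORUM read is guaranteed to touch at least one of them.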

~~~
grout
You use argument from authority ("It's based on Dynamo, which is made by
Amazon, so you know it's good!") while accusing me of a strawman for reporting
my own personal experiences.

Good thing there is no god, or you'd have to watch out for lightning.

~~~
rbranson
Actually, this is empirical evidence of the validity of this approach gained
through observation of successful, real-world implementations. I'm not saying
it's good purely because it's from Amazon or because any one person said it
was good, but that there are many successful large-scale systems built using
these principles that do not suffer the durability issues from your
hypothesis.

While not proof in the scientific sense, this type of evidence is used as a
basis for scientific theories that explain much of what we know about the
world.

However, I still look forward to your detailed, rigorous rebuttal of my
arguments in the previous comment.

