Comments are denormalized and stored twice: once in a plain ColumnFamily (for random access) and once in a SuperCF (for sequential access).

The plain CF ("Comments") uses the comment ID (a timestamp plus a salt) as the row key. The fields of the comment are columns in the row (username, text, date_created, etc.).

The SuperCF ("StoryComments") uses the story ID as the row key. Each row contains one SuperColumn per comment. The SC name is the comment_id, and the columns are the fields of the comment.
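To make the two layouts concrete, here is a rough in-memory sketch using plain Python dicts. It is not a Cassandra client and does not talk to a cluster. The names `make_comment_id` and `add_comment`, and the exact ID format (zero-padded microsecond timestamp plus a random hex salt), are assumptions for illustration only.

```python
import random
import time

# In-memory stand-ins for the two column families (not real Cassandra calls).
comments = {}        # "Comments":      comment_id -> {field: value}
story_comments = {}  # "StoryComments": story_id -> {comment_id: {field: value}}


def make_comment_id(created_at):
    """Hypothetical ID scheme: fixed-width timestamp first, then a random salt."""
    micros = int(created_at * 1_000_000)
    salt = random.randrange(16 ** 4)
    return f"{micros:020d}-{salt:04x}"


def add_comment(story_id, username, text, created_at=None):
    """Write the same comment to both layouts (the denormalization step)."""
    created_at = time.time() if created_at is None else created_at
    comment_id = make_comment_id(created_at)
    fields = {"username": username, "text": text, "date_created": created_at}
    # Plain CF: one row per comment, keyed by comment ID.
    comments[comment_id] = dict(fields)
    # SuperCF: one row per story, one SuperColumn per comment.
    story_comments.setdefault(story_id, {})[comment_id] = dict(fields)
    return comment_id


cid = add_comment("story-1", "alice", "First!", created_at=1_000.0)
```

After the call, the comment can be fetched directly by `cid` from `comments`, or found under `story_comments["story-1"]` alongside the story's other comments. Because the timestamp leads the ID and is fixed width, the IDs sort chronologically when compared as strings.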

So, say you want the first 50 comments for a given story: you'd do a get_slice on StoryComments, passing the story ID as the row key, "" as the start column, and a count of 50. You get back up to 50 SuperColumns, each of which contains one comment.
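The slice itself can be mimicked on an in-memory row, just to show the idea. This is a sketch: the `get_slice` function below is a local stand-in, not the Thrift call or its real signature, and it assumes SuperColumn names sort as plain strings. The sample row and its four-digit comment IDs are made up.

```python
def get_slice(row, start="", count=100):
    """Return up to `count` (name, value) pairs with name >= start, in name order."""
    names = sorted(name for name in row if name >= start)
    return [(name, row[name]) for name in names[:count]]


# Hypothetical StoryComments row for one story: comment_id -> comment fields.
story_row = {
    f"{i:04d}": {"username": f"user{i}", "text": f"comment {i}"}
    for i in range(120)
}

# First 50 comments: empty start column, count of 50.
first_page = get_slice(story_row, start="", count=50)

# Next page: start from the last name seen. In this sketch the start is
# inclusive, so ask for one extra and drop the overlapping first entry.
last_seen = first_page[-1][0]
second_page = get_slice(story_row, start=last_seen, count=51)[1:]
```

The same pattern pages through the whole row: keep passing the last name you saw as the next start column until you get back fewer entries than you asked for.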

Cassandra is extremely good for sequential reads, not just random lookups. Just about any list of things can be efficiently stored and retrieved with its batch insertion and slicing operations.