
Yeah, I've considered other options. Riak seems really nice, actually - I'm going to be trying that out next. I do like MongoDB's support for Windows, though. I'm also hesitant to use Cassandra, considering all the trouble that Reddit's had with it [1].

[1] http://blog.reddit.com/2010/05/reddits-may-2010-state-of-ser...

Reddit's initial problems came from being woefully underprovisioned, which they acknowledged.

Here's what David King [ketralnis] said six months later: "Running any large website is a constant race between scaling your user base and scaling your infrastructure to support it," said David King, Lead Developer at Reddit. "Our traffic more than tripled this year, and the transparent scalability afforded to us by Apache Cassandra is in large part what allowed us to do it on our limited resources. Cassandra v0.7 represents the real-life operations lessons learned from installations like ours and provides further features like column expiration that allow us to scale even more of our infrastructure."


Reddit also just uses it as a dumb K/V store and doesn't take advantage of the column-store model. In theory, all of the comments for a story could be stored as columns in a single row of a column family. That would cap write throughput for a single story at something like ~1000/second, because partition granularity is at the row level. In exchange, all the columns in a CF within a row are stored sequentially on disk, and only a single node would need to be interrogated to get the data, rather than having to broadcast a massive multi-get across the entire cluster.

EDIT: the column keys are also returned in sorted order, so this could be used to pre-sort the comment list by its optimal access pattern.
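
To make the layout concrete, here's a rough sketch in Python. This is not the Cassandra client API, just a toy in-memory stand-in for one column family: one row per story, one column per comment, columns kept sorted by column key. The names WideRowStore, insert and get_slice, and the (-score, comment_id) column key, are all made up for illustration; a real schema would pick whatever key matches the access pattern you care about.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy stand-in for a column family: row key -> columns sorted by column key.

    Hypothetical illustration only; this is not a real Cassandra API.
    """

    def __init__(self):
        # row_key -> list of (column_key, value) tuples, always kept sorted
        self._rows = defaultdict(list)

    def insert(self, row_key, column_key, value):
        # Every write for a story lands in the same row, which is why a single
        # story's write throughput is bounded by the node that owns that row.
        bisect.insort(self._rows[row_key], (column_key, value))

    def get_slice(self, row_key, count=None):
        # One row lookup returns the columns already in sorted order, so no
        # multi-get across many keys is needed.
        columns = self._rows.get(row_key, [])
        return list(columns if count is None else columns[:count])


def comment_column_key(score, comment_id):
    # Assumed access pattern: highest-scored comments first. Negating the
    # score makes ascending column order equal descending score order.
    return (-score, comment_id)


if __name__ == "__main__":
    store = WideRowStore()
    store.insert("story:1", comment_column_key(5, "c1"), "first comment")
    store.insert("story:1", comment_column_key(12, "c2"), "second comment")
    store.insert("story:1", comment_column_key(7, "c3"), "third comment")
    for column_key, body in store.get_slice("story:1", count=2):
        print(column_key, body)
```

The tradeoff is the same one described above: reads for a story become a single sorted slice, but every comment write for that story goes to the same row.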
