Hacker News

TBH, we didn't seriously pursue Cassandra when we considered distributed database systems b/c the vast majority of the "back-reference checks" we did on the YC network and other area startups came back as "stay away."

Some people whose opinions on databases I take very seriously gave us very frank advice to stay away, and that included reports from within FB.

Having said that, I cannot claim to have firsthand proven or disproven anything about Cassandra.

For a long time Facebook didn't use the vastly (and I mean vastly) improved open source version of Cassandra, opting instead for their internal fork. Rather than moving to the open source version, I believe they have now switched to HBase, mainly for its easier consistency model. So I would take their advice with a grain of salt, because it's probably based on their experiences with an old fork.

There are a few people (YC companies even, alas) who are very vocally negative about Cassandra, but I also saw some of those same people ignoring direct advice given to them in #cassandra on IRC, and then turning around and bashing it when it didn't work as planned. Simply following the advice could have made for a completely different story.

I suppose the lesson to learn is that you need to develop software in a way that simply won't allow developers to shoot themselves in the foot, because people never want to blame themselves for doing it; they blame the gun.

Eric, Jonathan Gray said it very clearly in his talk at Berlin Buzzwords: Facebook is now using HBase instead of Cassandra. http://berlinbuzzwords.de/content/realtime-big-data-facebook... You can find a lot of info about the process FB went through in choosing HBase over Cassandra. This one, for example: http://facility9.com/2010/11/18/facebook-messaging-hbase-com...

That's exactly what I'm talking about: that facility9 blog post explaining why they chose HBase had many factual errors about Cassandra when it was posted, and had to be revised after several respected people in the space contacted the author.

Quora's decision not to use Cassandra, and Adam's answer regarding it, lead me to the same conclusion (http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use...). Evidently few from Facebook are advocating it.

Quora on MySQL failed outright when AWS EBS failed; companies on AWS using Cassandra, like SimpleGeo and Netflix, did not. To their credit, Facebook were clear enough on their reasons for using HBase over MySQL and Cassandra, such as wanting to double down on their current Hadoop system/knowledge and having easily obtainable ordering guarantees on messages. It's also clear they've invested in making HBase good enough.

At large loads and footprints, imvho, Riak, Cassandra and HBase present viable options. But there are some factors to consider that don't seem to get mentioned in the pop tech press:

- What are you able to operate in production?

- What are you able/willing to debug and patch?

- What hardware options do you have?

- What are your workloads?

- Which variable of C.A.P, when you lose it, most damages your business?

- Will your company's choices be evaluated in the press?

- Do your board/investors have capital tied up in businesses that are using something else?

- What architecture tradeoffs and styles sit well with you?

- What kind of data access and consumption patterns make you money?

- Can you pay for help?

The right choice is context sensitive, and I'm fairly sure for this class of systems at this point in time, there's no free lunch. That means you have to do the legwork for yourself and make your own choices and commitments; doing what you heard worked for someone else is a cargo cult.

Or maybe it's wisdom. "A fool learns from his mistakes, but the truly wise learn from the mistakes of others." -- Otto von Bismarck
