

Unhappy Programmer: Whining about Thrift and Cassandra - cassie

Earlier this year I embarked on a project where we had the opportunity
to experiment with a lot of new things.  We had been given a lot of
freedoms that most projects do not enjoy.<p>The first technology we looked at was Google Protocol Buffers for
representing data.  Protobuffers are done really well and they are
easy to use in a multitude of languages we care about.  The problem is
that there is one critical piece missing:  Google never bothered
releasing their RPC mechanism, and word around the campfire is that it
is never going to happen due to the spaghetti of deep
interdependencies this RPC layer has with lots of other stuff in the
Google codebase.  Also, no clear open source project has filled the
spot in any convincing manner.<p>So we dropped protobuffers and had a look at Thrift. On the surface of
things Thrift fits the bill. Thrift isn't quite as nice as
protobuffers but it is close. More importantly, it comes with an RPC
mechanism. Or shall we say, it comes with various RPC mechanism options.<p>Okay, we have an RPC layer.  On to the storage system.<p>We had a look at CouchDB, Voldemort, Cassandra and HBase and decided
to give Cassandra a chance.<p>Now Cassandra has been getting a lot of press lately and it is indeed
a relatively sweet system. There are a few problems though.<p>First off, would it have killed the designers to use nomenclature that
makes sense to people? Call a table a table. Call a row a row. And who
the hell figured it would be a good idea to refer to tuples and maps
as various types of columns? Most people have certain expectations of
what a "column" is and it ain't what the Cassandra designers think.<p>The reason why you have types such as Set, Map and List in Java is
because these have defined meanings in mathematics.  The designers
didn't just assign new meanings to words that already had meaning.<p>Second, neither Thrift nor Cassandra is available through any official
Maven repositories yet even though it has been <i>forever</i> since they
were released as open source. That right there is a big warning sign.<p>It means that people need to fiddle around embedding dependencies in
their projects. Even Cassandra is using an older version of Thrift,
which it has to embed in the build -- so if you were thinking of using
a relatively new version of Thrift in your Cassandra-backed Thrift
service you have to think again (or go through the pain of making two
versions of the same library play nice within the same JVM).<p>Third, some of the attitudes really stink.<p>While reading through the mail archives I came across an issue that I
myself experienced. I had a Thrift service running and all of a sudden
it crashed with an OutOfMemoryError. I tracked down the bug to the
framed transport implementation and to my horror discovered that it is
a fairly naive implementation: it'll just read a sequence of bytes off
the wire, interpret them as a number and then try to allocate a buffer
of that size. There was not even a comment in the code that this is
relatively poor design and that it might be an idea to implement a
chunked framed transport (so you can move large objects while still
discovering frame errors without committing lots of resources). But I
digress. One of the responses I found in the mail archives amounted
to "don't do that then. thrift is expected to run in a trusted
environment". Huh!?  Have these people even worked for an Internet
company before?  There's a <i>lot</i> of stuff going on in a datacenter and
you CANNOT have a critical system go down just because some program
erroneously connects to the wrong port.<p>Fourth, while trying to develop for Cassandra I needed to implement
proper unit tests. This proved to be amazingly fiddly since embedding
Cassandra is a sheer nightmare. I looked at how one of the client
library designers had done it and ... I got a bit sad. Not an elegant
solution.  The short version is:  if there is any chance you'll be
running more than one instance of Cassandra at the same time in the
same JVM you are fucked.  I guess someone didn't get the memo on how
to design singletons properly.<p>Fifth, I am really amazed by the fact that Facebook just threw Thrift
and Cassandra over the fence and then never bothered to make sure that
things progressed to a usable state in a timely manner. Right now,
wide adoption of Cassandra and Thrift is gridlocked. I see people play
with it for a while and then ditch it.  Why did Facebook open source
it in the first place if they never intended to drive the project
forward? (Same goes for Google, why on earth did they just dump
Protobuffers into open source and then largely abandon it?)<p>As mentioned earlier, it has been <i>forever</i> since Thrift was open
sourced, and the thing is still not available through any official
Maven repositories. Which means that EVERY downstream project is
affected. Including Cassandra.<p>I didn't want to write this piece. What I wanted was to sit down,
learn the codebase thoroughly and see where I could contribute to
pushing these things to where they should already be (even though I've
been told "well, good luck with getting any patches accepted"). 
I went to my managers and said that I would like to take a quarter out
of our current project to make Cassandra and Thrift usable for
everyone. Unfortunately we do not have the budget or the time for
that.<p>I think Cassandra has a huge potential, but that it is slowly being
wasted.  Not enough people care about things that <i>really</i> matter to
developers and unless this is recognized and addressed I think
Cassandra is going to have a really bleak future.<p>Yes, this is a whiny piece, but I am at my wits' end.
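For what it's worth, the framed transport complaint above boils down to a missing bounds check. Here is a minimal sketch in Java of what a bounded frame read could look like, assuming a 4-byte big-endian length prefix on each frame. The class name BoundedFrameReader, the readFrame method and the 16 MB limit are made up for illustration; this is not Thrift's actual code or API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedFrameReader {
    // Hypothetical limit: a real service would derive this from its
    // largest legitimate message.
    static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

    // Reads one length-prefixed frame, refusing to allocate a buffer
    // for a declared length that is negative or above maxFrameBytes.
    static byte[] readFrame(InputStream in, int maxFrameBytes) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // 4-byte big-endian length prefix
        if (length < 0 || length > maxFrameBytes) {
            // Fail this one connection instead of trusting the wire.
            throw new IOException("Rejecting frame with declared length " + length);
        }
        byte[] frame = new byte[length];
        data.readFully(frame); // throws EOFException if the peer sends less
        return frame;
    }

    public static void main(String[] args) throws IOException {
        // A stray HTTP client hitting the wrong port: the bytes "GET "
        // read as an int declare a frame of more than a gigabyte.
        byte[] stray = "GET / HTTP/1.1\r\n".getBytes("US-ASCII");
        try {
            readFrame(new ByteArrayInputStream(stray), MAX_FRAME_BYTES);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A chunked transport would go further, but even this check means a program that connects to the wrong port gets a closed connection instead of taking the service down with an OutOfMemoryError.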
======
isnoteasy
Well, you say you don't have the time or the budget to contribute to Cassandra
and Thrift now. Don't despair, perhaps in the future you could contribute to
those projects.

