
Thrift, Scribe, Hive, and Cassandra: Open Source Data Management Software - prakash
http://www.cloudera.com/blog/2008/10/24/thrift-scribe-hive-and-cassandra-open-source-data-management-software/
======
KirinDave
This is interesting stuff (especially Cassandra), but I have to confess I've
had nothing but trouble with Thrift, and right now I'd say it's the weakest
link in this structure.

Thrift is fantastic when you're working in a C++-ish mode where CORBA-style
IDL was as good as it got. But when it comes time to build something more
flexible in terms of the protocol, or to interface with scripting languages
like Python, which depend on a less strictly typed architecture, you're going
to hit a lot of growing pains.

For example, I work on a project (the public code is on my GitHub, under
Fuzed) for which we'd like to provide a generic Thrift interface. But so far
the Thrift infrastructure, despite being entirely capable of it, doesn't seem
to show much interest in embracing flexible or generic protocols. Everything
needs to be big-banged out up front, and this just doesn't jibe with larger
meta-projects building out fault-tolerant and flexible infrastructure whose
needs Hadoop doesn't meet.

This post's intent is not to say, "Thrift is terrible." Indeed, it is awesome
at what it currently does. But if you want to go beyond that you're going to
have to invest significant time and resources to get the protocol that binds
all these amazing services up to snuff.

------
shailesh
Interesting stuff. I also noticed another article that explains how to
configure and use Scribe for Hadoop log collection:

[http://www.cloudera.com/blog/2008/11/02/configuring-and-usin...](http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/)

The e-mail address at the end of the article goes to a support address, which
is a good way to attract customers while in startup mode.

------
DenisM
Amazing stuff.

I am particularly interested in Scribe:

 _A Thrift service for distributed logfile collection. Scribe was designed to
run as a daemon process on every node in your data center and to forward log
files from any process running on that machine back to a central pool of
aggregators. Because of its ubiquity, a major design point was to make Scribe
consume as little CPU as possible._
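
To make the shape of that design concrete, here is a rough sketch of the
pattern the quote describes: a per-node forwarder that buffers log entries and
ships them in batches to a central aggregator. This is not Scribe's actual
API. `LocalForwarder`, `Aggregator`, `batch_size`, and the in-process `send`
callable are all made-up names for illustration, and a real deployment would
send batches over a Thrift connection rather than a function call.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A log entry is a (category, message) pair. Assumption for this sketch only.
LogEntry = Tuple[str, str]


class Aggregator:
    """Hypothetical central collector: groups received messages by category."""

    def __init__(self) -> None:
        self.by_category: Dict[str, List[str]] = defaultdict(list)

    def receive(self, batch: List[LogEntry]) -> None:
        for category, message in batch:
            self.by_category[category].append(message)


class LocalForwarder:
    """Hypothetical per-node daemon: buffers entries and forwards in batches.

    Batching keeps per-message work on the node small, which is one plausible
    way to keep CPU use low. It is not necessarily how Scribe does it.
    """

    def __init__(self, send: Callable[[List[LogEntry]], None],
                 batch_size: int = 3) -> None:
        self.send = send
        self.batch_size = batch_size
        self.buffer: List[LogEntry] = []

    def log(self, category: str, message: str) -> None:
        self.buffer.append((category, message))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer out before sending so it is emptied exactly once.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(batch)


aggregator = Aggregator()
node = LocalForwarder(aggregator.receive, batch_size=3)
for i in range(4):
    node.log("web", f"request {i}")
node.flush()  # push the one entry left over after the first full batch
```

The forwarder only appends to a list on each `log` call and does the more
expensive send once per batch, which is the trade-off the quoted design point
hints at.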

