

4 Months with Cassandra at Cloudkick (YC W09) - pquerna
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

======
petercooper
I know Twitter's just done the dirty with Cassandra, but it seems Cassandra is
getting a massive PR boost this week, despite being one of the older (where
old is 2008!) NoSQL systems.

Can anyone explain/posit an idea as to why MongoDB and CouchDB have been
stealing the thunder for the past year rather than Cassandra? It just seems
odd.

~~~
fizx
Cassandra is by most accounts a pain to set up and administer. Mongo has good
docs.

~~~
ddispaltro
It's actually very easy to set up. Install it and run ./bin/cassandra -f;
nothing like the troubles with HBase.

Administration is pretty hands-off for most tasks. For instance, you want three
copies of each piece of data? Set the replication factor to 3. Other parts are
difficult to "get" if you don't have an understanding of the code. My best
advice is to make sure you approach Cassandra from a developer's perspective.
You can't treat it like a black box quite yet.
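To make "set the replication factor to 3" concrete: with a token ring, each key hashes to a position, the first node at or after that position holds the first copy, and the next nodes clockwise hold the rest. The Python below is a minimal sketch of that placement rule, not Cassandra's code; the MD5-mod-2**127 hashing, the `replicas` helper, and the node names are illustrative assumptions, and it ignores rack/datacenter-aware strategies.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space, in the spirit of RandomPartitioner


def token_for(key: str) -> int:
    # Hypothetical key-to-token hash: MD5 digest reduced onto the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % RING_SIZE


def replicas(key, ring, replication_factor=3):
    """Return the nodes that hold copies of `key`.

    ring: list of (token, node_name) pairs sorted by token.
    The first replica is the first node whose token is >= the key's token
    (wrapping past the end of the ring); the others are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Five nodes with evenly spaced tokens (hypothetical names).
ring = [(i * RING_SIZE // 5, f"node{i}") for i in range(5)]
print(replicas("user:42", ring))  # three consecutive nodes on the ring
```

Raising the replication factor only changes how many successors are taken, which is why it is a one-setting change from the operator's point of view.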

------
mahmud
This stood out, negatively:

 _Since Cassandra uses Apache Thrift as the default RPC mechanism, exposing
the Thrift layer to any non-controlled data can be dangerous. We use firewalls
on our nodes to make sure our Thrift ports are only exposed to a very small
set of machines, because even just telnetting into the port and typing "hello"
can cause the JVM to OOM._

--

I use Redis and heavily guard its telnet-accessible port, but it doesn't OOM.
This issue should have been fixed before public release, imo. You wouldn't want
something as simple and common as a port scan to shut down your data layer.

~~~
pquerna
This is an issue specifically with Thrift -- not with Cassandra.

You can see the opinion of the Cassandra developers, which is pretty negative
towards Thrift, in the linked ticket:
<https://issues.apache.org/jira/browse/THRIFT-601>

This is one of the many reasons the Cassandra developers are looking at
replacing Thrift with Avro: <http://hadoop.apache.org/avro/>
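One plausible mechanism for the "hello" OOM, assuming the server reads the first four bytes off the socket as a big-endian 32-bit length prefix (as Thrift's binary protocol does for frame, message-name, and string lengths) and then allocates a buffer of that size: the bytes "hell" decode to a length of about 1.6 GiB. The sketch below shows that decoding plus a defensive cap; `naive_frame_size`, `checked_length`, and the 16 MiB limit are hypothetical names and values, not Thrift's API.

```python
import struct

MAX_LENGTH = 16 * 1024 * 1024  # hypothetical cap on any length read off the wire


def naive_frame_size(first_bytes: bytes) -> int:
    # Interpret the first four bytes as a big-endian signed 32-bit length,
    # the way a binary-protocol reader does before allocating a buffer.
    (size,) = struct.unpack(">i", first_bytes[:4])
    return size


def checked_length(first_bytes: bytes, max_length: int = MAX_LENGTH) -> int:
    # Reject negative or oversized lengths instead of trying to allocate them.
    size = naive_frame_size(first_bytes)
    if size < 0 or size > max_length:
        raise ValueError(f"refusing to allocate {size} bytes")
    return size


print(naive_frame_size(b"hello"))  # 1751477356, i.e. roughly 1.6 GiB
```

A reader that trusts the decoded size asks the JVM for a ~1.6 GiB array on the spot; a bounded reader fails fast with an error instead, which is the kind of check a hardened RPC layer needs on untrusted ports.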

~~~
rmanocha
Can someone explain why Protocol Buffers
(<http://code.google.com/apis/protocolbuffers/>) is not being used more
widely? Is it because of the limited language support (Java/C++/Python only)
or some other reason?

~~~
jhammerb
Google also kept the RPC side of Protocol Buffers, which is called Stubby, as
an internal project.

------
viraptor
I thought Cassandra used only keys for selects - but in this post I see you
can also use slices of from..to values. Are there any other predicates that
one can use? Like ones implementing 'LIKE string%' or 'LIKE %string%'?

It doesn't look like that from the API wiki, but maybe someone knows if that's
possible, or planned.

~~~
jbellis
All the from..to predicates can take a prefix in 'from' (i.e., LIKE foo%, but
not LIKE %foo%).

('to' is optional, and technically, so is 'from'.)
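The prefix trick works because columns are kept sorted by name: a slice that starts at the prefix and stops at the first name that no longer matches is exactly `LIKE foo%`, while `%foo%` would need a full scan. Cassandra's Thrift API expresses slices as a `SliceRange` (start, finish, count); the Python below only mimics that behavior on an in-memory sorted list, and `get_slice` / `prefix_slice` here are hypothetical helpers, not the client API.

```python
import bisect


def get_slice(columns, start="", finish="", count=100):
    """Return up to `count` names in [start, finish] from a sorted list.

    An empty start means "from the first column"; an empty finish means
    "to the last column", mirroring the optional from/to bounds.
    """
    lo = bisect.bisect_left(columns, start) if start else 0
    hi = bisect.bisect_right(columns, finish) if finish else len(columns)
    return columns[lo:hi][:count]


def prefix_slice(columns, prefix, count=100):
    # LIKE 'prefix%': begin the slice at the prefix, then stop at the first
    # name that no longer starts with it (sorted order guarantees no later match).
    out = []
    for name in get_slice(columns, start=prefix, count=len(columns)):
        if not name.startswith(prefix) or len(out) == count:
            break
        out.append(name)
    return out


cols = sorted(["apple", "foo", "foo:1", "foo:2", "foobar", "fop", "zeta"])
print(prefix_slice(cols, "foo:"))
```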

------
js2
The post doesn't mention the environment in which you're running Cassandra.
Any chance you're running it in the cloud (EC2?), or are you running it on
real h/w?

~~~
ddispaltro
We are running in a virtualized environment. We've done a pretty good job of
adding capacity for performance and space reasons ahead of demand. Load
balancing is pretty hands-off; it just takes some time and a fundamental
understanding, as there are still ways to shoot yourself in the foot with this
system.
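One common way to shoot yourself in the foot is uneven token assignment: each node owns the range from the previous node's token (exclusive) up to its own (inclusive), so dropping a new node into an arbitrary spot leaves some nodes owning far more data than others. A minimal sketch, assuming a 0..2**127 token space as with RandomPartitioner; `balanced_tokens` and `ownership` are hypothetical helpers for illustration, not Cassandra tooling.

```python
RING_SIZE = 2 ** 127  # assumed token space, as with RandomPartitioner


def balanced_tokens(node_count):
    # Space tokens evenly so each node owns 1/node_count of the ring.
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    # Fraction of the ring each node owns, in sorted token order: the span from
    # the previous token (exclusive) to its own token (inclusive), wrapping around.
    tokens = sorted(tokens)
    if len(tokens) == 1:
        return [1.0]
    return [((t - tokens[i - 1]) % RING_SIZE) / RING_SIZE for i, t in enumerate(tokens)]


print(ownership(balanced_tokens(4)))              # four equal shares
print(ownership([0, 2 ** 125, 2 ** 126]))         # a node added without rebalancing
```

In the second case one node still owns half the ring, so moving tokens toward the evenly spaced values is what keeps load balanced as capacity is added.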

------
aliasaria
Anyone know what tool is used to draw the graph in the post? Is that an
open-source JavaScript library?

~~~
ddispaltro
It's actually amCharts, a Flash graphing tool.

<http://amcharts.com/>

