

Cassandra at Twitter Today - abraham
http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html

======
seldo
I don't know about you, but to me this seems like a big blow to the reputation
of Cassandra. I know a bunch of people (including myself) whose primary
motivation for investigating Cassandra as a large-scale data store was
"Twitter is using it, therefore it's got to be fine for what we want".

In my experience this was not the case -- after quite a lot of prototyping, we
moved away from Cassandra. It scales brilliantly, but its query time has a
pretty solid floor that I was unable to shift to speeds approaching what I
needed: it never went up, but nor did it go down.

Of course, I am a total noob at distributed systems -- perhaps it's simple to
get it to go faster, but Twitter's move away suggests not. (Facebook's use-
case is quite different, and in any case they still use an awful lot of MySQL
too) The other systems they cite as currently using Cassandra in production
are nowhere near the scale of the primary tweet store.

~~~
alex1
I don't think they're avoiding Cassandra as their tweet store because it's not
capable of performing well. It's probably more cause it would be a really long
process migrating every tweet into Cassandra. I'm sure if they could, they
would go back to the first day Twitter launched, and use Cassandra as the
tweets store.

As for query time, maybe your problem was I/O throughput? I've read Cassandra
can be really slow if you have poor I/O throughput. This happens a lot in the
cloud, where I/O is usually shared.

~~~
jacobbijani
Don't they not show / hide & archive old tweets anyway? Seems like they could
really simple transition Cassandra in if they wanted to.

~~~
mrduncan
On a small scale, yes. On Twitter scale (in February they were averaging 50
million tweets/day [1]), it's probably not so simple.

[1] <http://blog.twitter.com/2010/02/measuring-tweets.html>

Edit: As of mid June, it was about 65 million per day.

[http://blog.twitter.com/2010/06/big-goals-big-game-big-
recor...](http://blog.twitter.com/2010/06/big-goals-big-game-big-records.html)

~~~
jacobbijani
My point was that they aren't importing legacy data at all if they aren't
showing old stuff. Just start writing the new ones to the new DB, right?

------
antirez
I think that apart from the speculations about why twitter does not want to
move to a new store for tweets, the important "take off" here is that what
they (and other companies) do is experimenting with different platforms, and
then using what they feel it's right for them.

Don't use what others are using or suggesting. Download a few stores, take
your time to understand how they work, and then do your pick. Time consuming?
Not as time consuming as starting a big project with a system that is not a
good fit.

~~~
jacobbijani
Or you could just use MySQL.

------
schubertzhang
1\. Cassandra is very young! Especially, the design and implementation of
local storage and local indexing are junior and not good.

2\. Pool read-performance is also due to the poor local storage
implementation.

3\. The local storage, indexing and persistence structures are not stable.
They need to be re-designed /re-implemented. If Twitter move data to current
Cassandra, they should do another move later for a new local storage, indexing
and persistence structure.

4\. There are many good techniques in Cassandra and other open-sourced
projects (such as Hadoop, HBase) etc. But, they are not ready for production.
Understand the detail of these techniques and implement them in your
projects/products.

Schubert Zhang

------
code_duck
Every time I use Twitter, it is malfunctioning in a very noticeable and
bizarre new way. I don't care about scale, how difficult it is to be twitter,
etc - I would not take their word about any technical matter without a tsp of
salt.

~~~
code_duck
Ha ha. Everything I say on this site is down voted recently. As if I fucking
care.

Twitter :

\- can't find my @ replies. Shows zero. Not correct.

\- has been showing 2-5 copies of each message in my own history for the past
week

\- gave me 30 error screens in a row, then an hour later on their status blog
"we've noticed elevated rates of errors". Like a constant stream of errors is
normal for them, but elevated rates - time to do something!

\- can't get their widgets/external code straight, redirects people to
twitter.com/undefined and twitter.com/null

Etc, etc. Give me a break. Twitter is hopeless, and has always been hopeless
in the tech department.

------
tszming
At the middle of Hype cycle, they are now getting real.

------
niekmaas
Interesting to see that they use two different db systems. I'm a bit worried
about this. I believe it is just postponing an important decision. In the
future they will have to choose for one db system. With the growth of Twitter
wouldn't it be better to make a decision now and focus all resources on
optimizing one system?

~~~
c3o
They're also using their own FlockDB (<http://github.com/twitter/flockdb>) to
store their social graph.

~~~
alex1
FlockDB isn't a data store itself. They use FlockDB on top of MySQL AFAIK.

