

At Twitter we're working on using Cassandra to store tweets - whalesalad
http://n2.nabble.com/Cassandra-users-survey-td4040068.html#a4040439

======
antirez
I read: "We evaluated a lot of things: a custom mysql impl, voldemort, hbase,
mongodb, memcachedb, hypertable, and others"

I wonder whether Redis was among the others. If so (and somebody at Twitter is
listening), I would love to know what you think Redis's biggest weaknesses
are, and whether in the short term it's more important to have a
fault-tolerant cluster implementation (the redis-cluster project) or virtual
memory to support datasets bigger than RAM in a single instance. Thanks in
advance for any reply.

~~~
jcapote
Redis requires that you have enough RAM to hold your entire dataset in
memory...which seems impossible for something as large as Twitter.

~~~
briansmith
Does Redis really require the entire dataset to be in RAM, or does it just
require enough virtual memory to hold the dataset? In other words, couldn't
you just let it swap?

~~~
jcapote
From their FAQ:

Do you plan to implement Virtual Memory in Redis? Why not just let the
operating system handle it for you?

Yes, in order to support datasets bigger than RAM there is a plan to
implement transparent Virtual Memory in Redis, that is, the ability to
transfer large values associated with rarely used keys to disk, and reload
them transparently into memory when these values are requested in some way.

So you may ask why we don't let the operating system's VM do the work for us.
There are two main reasons. First, in Redis even a large value stored at a
given key, for instance a one-million-element list, is not allocated in a
contiguous piece of memory. It's actually very fragmented, since Redis uses
quite aggressive object sharing and reuses allocated Redis Object structures.

So you can imagine the memory layout as composed of 4096-byte pages that
actually contain different parts of different large values. Moreover, many
values that are large enough for us to swap out to disk, like a 1024-byte
value, are just one quarter the size of a memory page, and likely the same
page holds other values that are not rarely used. Such a value will never be
swapped out by the operating system. This is the first reason for
implementing application-level virtual memory in Redis.
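
The page-sharing argument above can be made concrete with a toy model (this
is illustrative arithmetic, not Redis code): the OS can only evict whole
pages, so one hot value pins every cold value that happens to share its page.

```python
# Toy model: why OS paging can't reclaim a cold 1024-byte value
# when hot values share its 4096-byte page.
PAGE_SIZE = 4096
VALUE_SIZE = 1024  # four values fit per page

# Each inner list is one page holding 4 values; True = recently used ("hot").
pages = [
    [True, False, False, False],   # one hot value pins the whole page
    [False, False, False, False],  # fully cold page: the OS could swap this
]

# The OS swaps whole pages, so a page is evictable only if every
# value on it is cold.
os_swappable = sum(PAGE_SIZE for p in pages if not any(p))

# Application-level VM swaps individual values, so every cold value
# is evictable regardless of its neighbours.
app_swappable = sum(VALUE_SIZE for p in pages for hot in p if not hot)

print(os_swappable)   # 4096: only the fully cold page
print(app_swappable)  # 7168: all seven cold values
```

With just one hot value per page, OS-level swapping reclaims nothing from
that page, while value-granular swapping reclaims three quarters of it.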

There is another reason, as important as the first. A complex object in
memory, like a list or a set, is something like ten times bigger than the
same object serialized on disk. You have probably already noticed how Redis
snapshots on disk are much smaller than Redis's memory usage for the same
objects. This happens because data in memory is full of pointers, reference
counters and other metadata. Add malloc fragmentation and the need to return
word-aligned chunks of memory, and you have a clear picture of what happens.
Letting the OS swap would therefore mean roughly ten times the I/O between
memory and disk than is otherwise needed.
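
The in-memory-versus-serialized gap is easy to observe in any managed
runtime. A rough analogue in Python (this is CPython overhead, not Redis
internals, but the mechanism is the same: boxed objects, pointers, headers):

```python
import struct
import sys

# A thousand small integers, as an in-memory container and as flat bytes.
values = list(range(1000))

# In memory: the list's pointer array plus one boxed int object per element.
in_memory = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Serialized: 4 bytes per integer, no pointers, refcounts, or headers.
serialized = len(struct.pack(f"{len(values)}i", *values))

# On a 64-bit CPython the in-memory form is roughly an order of
# magnitude larger than the serialized form.
print(in_memory, serialized)
```

Swapping the in-memory representation wholesale would move all of that
metadata through the disk; serializing values before swapping them, as
Redis's application-level VM planned to do, moves only the payload.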

------
chanux
Digg was considering Cassandra too <http://blog.digg.com/?p=966>

~~~
uggedal
Digg is in the process of converting all of its data to Cassandra (mentioned
in their NoSQL East presentation).

------
technoweenie
Evan Weaver blogged about Cassandra
([http://blog.evanweaver.com/articles/2009/07/06/up-and-
runnin...](http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-
cassandra/)) awhile ago and used Twitter as a sample schema to show how it
worked.

~~~
alexro
From that post: "Don't use EC2 or friends except for testing; the virtualized
I/O is too slow" kind of restricts Cassandra usage for an average Web 2.0
startup.

Added: just to clarify: when starting small, the last thing you want to do is
either buy expensive infrastructure or hack something cheap together instead
of doing it properly. If Cassandra doesn't perform well in a virtualized
environment, it will be necessary to use some other means of storing data
first and "go Cassandra" later if needed.

~~~
antirez
But everything resembling a DB is slow on EC2 since I/O is slow and memory
accesses are slow, so I think the problem is EC2 and not Cassandra, MySQL, ...

~~~
alexro
Hence the hope that it won't be much slower.

------
rbranson
Glad to hear they decided against writing their own in Scala.

------
anApple
I love how Twitter is constantly trying to reinvent the wheel for a simple
message passing service.

~~~
diego
A highly-scalable, multicast, real-time, distributed message passing system
with full-text search, archiving, deletions, dynamic subscription changes,
access restrictions, open APIs with quotas, abuse detection mechanisms, spam
fighting algorithms, different authentication protocols, etc.

Still think it's so simple?

~~~
evgen
Okay, I'll take the bait...

> highly-scalable

Everything I take from our set of off-the-shelf (OTS) components will have
live, verifiable examples of running at scale.

> multicast, real-time, distributed message passing system

For this the choice really comes down to RabbitMQ or ejabberd. An XMPP
solution is appealing for the obvious benefits of having a "presence" concept,
but an AMQP solution keeps us closer to the current reality of Twitter.

So we start with a large distributed rabbit setup. A few clusters scattered
across the world, connected via shovel pipes and using some nested
queue/exchange plumbing to wire it all together.
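
The core AMQP primitive this setup leans on is the fanout exchange: every
message published to it is copied to every bound queue. A toy model of those
semantics (the consumer names are hypothetical; a real deployment would use
RabbitMQ exchanges and the shovel plugin, not this sketch):

```python
from collections import defaultdict, deque


# Toy in-process model of AMQP fanout semantics, for illustration only.
class FanoutExchange:
    def __init__(self):
        self.queues = defaultdict(deque)

    def bind(self, queue_name):
        # Touching the defaultdict creates the queue binding.
        self.queues[queue_name]

    def publish(self, message):
        # A fanout exchange copies every message to every bound queue.
        for q in self.queues.values():
            q.append(message)


tweets = FanoutExchange()
for consumer in ("search", "archive", "timeline"):
    tweets.bind(consumer)

tweets.publish("first tweet")
print(len(tweets.queues["search"]))  # 1
print(tweets.queues["archive"][0])   # first tweet
```

Each downstream service (search indexing, archiving, timeline delivery) then
drains its own queue at its own pace, which is what makes the rest of the
sketch below composable.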

> full-text search

Some queues dump into a full-text search (FTS) setup that keeps recent
messages in RAM and migrates them to disk as they age. SOLR is probably a
better solution here, but I know the Sphinx delta/full-index model better and
would reach for it first.
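
The Sphinx-style delta scheme mentioned above boils down to two inverted
indexes: a large main index rebuilt rarely, and a small delta index that
absorbs recent documents and is rebuilt often; queries consult both. A
minimal sketch of that idea (the document contents and rebuild cadence are
illustrative, and this is not the Sphinx API):

```python
# Build a tiny inverted index: word -> set of document ids.
def build_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


main_docs = {1: "cassandra at twitter", 2: "redis virtual memory"}
main_index = build_index(main_docs)    # large; rebuilt rarely (say, nightly)

delta_docs = {3: "digg moves to cassandra"}
delta_index = build_index(delta_docs)  # small; rebuilt every few minutes


def search(word):
    # A query merges results from both indexes, so recent documents
    # are searchable without rebuilding the main index.
    word = word.lower()
    return main_index.get(word, set()) | delta_index.get(word, set())


print(sorted(search("cassandra")))  # [1, 3]
```

Periodically the delta is merged into the main index and emptied, keeping
index rebuilds cheap in proportion to the write rate rather than the corpus
size.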

> archiving

Dump a firehose queue to disk and send copies/updates out to various services
that want them. Pushing the archive into HDFS would probably be the first
choice I would look at, just because from there it is easy to get to various
Hadoop analytics.

> deletions

Nothing is ever deleted, it just becomes invisible. This is just a flag to add
to the FTS indexes and archives.
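
A minimal sketch of that soft-delete idea (field names are hypothetical):
deletion flips a flag, and search/display filters flagged messages out, while
the archive keeps the record.

```python
# Toy message store where "deletion" is a tombstone flag, not removal.
messages = {
    101: {"text": "hello world", "deleted": False},
    102: {"text": "regrettable tweet", "deleted": False},
}


def delete(msg_id):
    # Flip the flag; the record itself is never removed.
    messages[msg_id]["deleted"] = True


def visible():
    # Search and timelines filter out tombstoned messages.
    return [m["text"] for m in messages.values() if not m["deleted"]]


delete(102)
print(visible())        # ['hello world']
print(102 in messages)  # True: the record still exists in the archive
```

The same flag propagates to the FTS indexes and archives, so "deletion" is
just an update like any other message in the pipeline.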

> dynamic subscription changes, access restrictions

Handled completely by the message queues.

> open APIs with quotas, abuse detection mechanisms, spam fighting algorithms,
> different authentication protocols, etc.

None of these is particularly difficult once you have the general framework
set up, and the structure of the system actually makes it pretty easy to wire
in things like real-time abuse and spam prevention once you get rolling.
There is only one hard thing about building a better Twitter: getting the
userbase that makes it worthwhile. If Twitter were in a different market
where the network effect was not so strongly self-reinforcing, it would have
been cloned and re-implemented better back when the fail whale was our
constant companion, but that is not the case, so making a "better" Twitter is
of little value.

