
Twitter At Scale: Will It Work? - davidw
http://www.techcrunch.com/2008/05/22/twitter-at-scale-will-it-work/
======
tonystubblebine
All this interest in how Twitter could scale really highlights the opportunity
for someone who knows what they're talking about to write an intelligent
analysis. After ten or more tries, clearly TechCrunch doesn't employ that
person.

Almost every analysis I read, including this one, points to Rails as the major
bottleneck but then can't say anything concrete to justify it. This article
says that Rails is no good for processing intensive tasks and that you should
use C. Rails is a framework for serving web pages, it isn't doing any of the
processing or message queueing. I'm sure that Rails won't work out of the box
with whatever backend you would need for a stable twitter, but I haven't heard
anyone address the actual technical issues. Is it just a matter of having to
tweak Rails or is there a reason that Rails is the wrong architecture?

Blaine released Starling, the queue system that Twitter uses. What do people
think of that? Was it the right architecture? Is there some out of the box
proprietary solution that would have worked better? What were the tradeoffs of
writing that in Ruby?
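To make the queue question concrete, here is a minimal sketch of the producer/consumer shape a Starling-style queue gives you. The in-process `Queue` stands in for the real thing (Starling speaks the memcached protocol over the network); all names here are illustrative, not Twitter's actual code:

```ruby
require "thread"

# Stand-in for a Starling queue: a thread-safe FIFO. Starling itself is a
# network daemon, but the producer/consumer shape is the same.
queue = Queue.new

# Producer: the web tier enqueues a status update and returns immediately,
# instead of fanning out to followers inside the request cycle.
producer = Thread.new do
  3.times { |i| queue << { user: "alice", text: "status #{i}" } }
  queue << :done
end

# Consumer: a background worker drains the queue and does the slow work
# (fan-out, SMS delivery, etc.) outside the request path.
delivered = []
consumer = Thread.new do
  while (msg = queue.pop) != :done
    delivered << msg
  end
end

[producer, consumer].each(&:join)
puts delivered.size # => 3
```

The point of the shape, whatever language the queue is written in, is that the request cycle only pays the cost of one enqueue.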

People seem to be catching on that one of the problems is a small operations
team. Some people say there's just one person but I'm pretty sure there's two.
Either way, I can't imagine that with just two people they've had enough time
to put in fully redundant systems, let alone a real staging environment. What
should operations look like for something Twitter's size?

On the Gillmor Gang, Blaine mentioned that the reason Twitter's track feature
only works on the phone and not the web is that it's much easier for them to
do broadcast than to do the lookups necessary for historical display. Yet I
use Summize and Tweetscan to give me a web-based track feature. What are the
real issues there?
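One plausible reading of the broadcast-vs-lookup distinction: matching each message against tracked keywords as it arrives is a cheap streaming operation, while a web "track" page needs an index over message history. A toy version of the streaming side (the keyword table and function name are invented for illustration):

```ruby
# Tracked keyword => users who asked to be notified when it appears.
tracked = { "rails" => ["alice"], "scaling" => ["bob"] }

# Broadcast-style track: inspect one message as it arrives and return the
# users to notify. No historical storage or index is consulted.
def notify_on_match(message, tracked)
  hits = []
  message.downcase.split(/\W+/).each do |word|
    (tracked[word] || []).each { |user| hits << user }
  end
  hits.uniq
end

puts notify_on_match("Rails scaling woes", tracked).inspect
# ["alice", "bob"]
```

The web version can't work this way: showing "everything that matched while you were gone" means querying history, which is exactly what Summize and Tweetscan build a search index for.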

I'm sure there's lots to get into here if someone with the chops to get into
it would speak up. But given the lack of in depth commentary, maybe there
really aren't very many qualified people.

~~~
diego
I'm sure by now the Twitter team must be very aware of what their bottlenecks
are. I don't know what the typical Twitter user does. I personally send
1-5 messages per day but refresh the web timeline all the time. If most people
are like me, then I would want to keep the last 24-48 hours worth of messages
in memory. This shouldn't be a problem, as we are talking about 1M messages
times 200 bytes or so, uncompressed.
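The arithmetic behind that estimate, using the comment's assumed figures (~1M messages per day at ~200 bytes each):

```ruby
# Back-of-envelope check of the in-memory message table described above.
messages_per_day  = 1_000_000
bytes_per_message = 200

one_day_mb  = messages_per_day * bytes_per_message / 1_000_000.0
two_days_mb = one_day_mb * 2

puts "24h window: ~#{one_day_mb.round} MB"  # ~200 MB
puts "48h window: ~#{two_days_mb.round} MB" # ~400 MB
```

A few hundred megabytes, uncompressed, comfortably fits in RAM on a single 2008-era server.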

In order to generate a page, you'd need a "follow" matrix. Given a user, you
want to know who this person is following and pick up enough recent messages
to generate a page view from the memory table above. This "follow" matrix
would be relatively sparse (I imagine). It must be persisted often but given
that it's queried all the time it would have to be in memory as well, at least
partially. Assuming that Twitter has 2M users and the average user follows 40
people (wild guess), the entire matrix would take up 80M ids (a few hundred
megabytes in a hashmap or something).
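A toy sketch of that design, with the follow matrix as a hash of id arrays and a page assembled by filtering the in-memory message table (ids and helper names are made up for illustration):

```ruby
# "Follow" matrix: user id => ids of the people that user follows.
follows = {
  1 => [2, 3],
  2 => [3],
}

# In-memory message table, oldest first: [author_id, text] pairs.
messages = [
  [2, "hello from bob"],
  [3, "hello from carol"],
  [4, "unrelated"],
  [3, "another from carol"],
]

# Generate a page for a user: pick recent messages by followed authors.
def timeline(user, follows, messages, limit: 20)
  followed = follows.fetch(user, [])
  messages.select { |author, _| followed.include?(author) }.last(limit)
end

puts timeline(1, follows, messages).length # => 3
```

At real scale the `select` over a flat array would be replaced by per-user indexes, but the data shapes (sparse follow map plus recent-message window) are the ones the comment describes.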

Also, I'd cache personal timeline requests for a minute or so, to avoid being
killed by people hitting Ctrl-F5.
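A minimal sketch of that per-user cache with a ~60-second TTL; a real deployment would put this in memcached, so this only illustrates the policy:

```ruby
# Per-user timeline cache to absorb reload storms (Ctrl-F5 abuse).
class TimelineCache
  def initialize(ttl: 60)
    @ttl = ttl
    @store = {} # user_id => [expires_at, page]
  end

  # Return the cached page if still fresh, otherwise regenerate it
  # via the block and cache the result for the TTL.
  def fetch(user_id)
    expires_at, page = @store[user_id]
    return page if page && Time.now < expires_at
    page = yield
    @store[user_id] = [Time.now + @ttl, page]
    page
  end
end

cache = TimelineCache.new(ttl: 60)
calls = 0
cache.fetch(1) { calls += 1; "rendered page" }
cache.fetch(1) { calls += 1; "rendered page" }
puts calls # => 1 (second hit served from cache)
```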

Of course the devil is in the details. All that information needs to be
persisted and crash-recoverable. For the numbers above everything might work
on a single beefy server, but probably not for long, so the system has to be
distributable.

It's not an easy problem, but a competent team should be able to solve it
(perhaps not in the most elegant and cleanly documented way) in a matter of
weeks given the right motivation. Getting the system to be stable enough for
production would be a different matter.

~~~
jonknee
> Assuming that Twitter has 2M users

Crazy as it sounds, Twitter is far from that popular. It has mindshare in the
tech sector, but is far from widely adopted.

[http://www.readwriteweb.com/archives/summize_twitter_trends....](http://www.readwriteweb.com/archives/summize_twitter_trends.php)

There are a couple hundred thousand active users per week. It's probably close
to that per day, but still only a couple hundred thousand.

------
wavesplash
Classic. The guy behind Omnidrive's inability to stay functional is claiming
to have a clue about twitter's scaling issues without interviewing anyone on
the team? Isn't that like asking George W. Bush to write the handbook on
Mideast Diplomacy?

------
yan
I always wondered if it'd make sense to run a contest by the community and
judged by the community on how to make twitter scale. There have been a lot of
suggestions in informal blog posts and the like, but nothing official.

In one way it would be admitting defeat; in another, a way to crowdsource a
difficult problem, with solutions created and judged by the community. Judging
from the interest it has attracted already, I imagine a few intelligent and
interested people would submit ideas.

~~~
xirium
> run a contest by the community and judged by the community on how to make
> twitter scale.

There's already been more effort discussing the problem than fixing the
problem. Most of the discussion has been based on very little factual data.
For example, armchair architects can safely suggest Blub, where Blub != Ruby,
because it's a claim that is unlikely ever to be disproved and extremely
unlikely to be tested in isolation.

------
randy
Limited Understanding and Stupid Advice: Will It Work?

------
redorb
They have $15mm in cash and an $80mm valuation; let's see if throwing money at
the problem can solve it. I still have trouble seeing something without a
business model get a valuation of $80mm ;/ just me?

~~~
transburgh
It is the way of the Valley.

------
Excedrin
Is using Twitter really very different from the Gale messaging system? It
seems like the user experience is fairly similar, but Gale's architecture is
much more interesting to me (and it scales! wow!).

~~~
vegai
Gale lacks the hype.

------
nir
If you read TechCrunch for technical insight (or any kind of insight, for that
matter) you're part of the problem! :)

------
smhinsey
In some sense, isn't Twitter email with message length limitations and
multiple interfaces? Is there really a need to store this in a highly
relational single storage way? I'm not enough of a power user to have seen the
nuances around the more advanced features to know for sure. Can anyone
enlighten me?

------
samwise
They need to have two teams: one to maintain the current POS and another
building a new, scalable one.

There is no reason why that service cannot be built in this day and age.

------
bluelu
Imagine a Twitter with groups. They would need at least another 5 engineers to
program this! ;)

------
axod
This is just getting ridiculously boring now. Twitter is broken. They have no
clue. We get it.

