
Why Can't Twitter Scale? Blaine Cook Tries To Explain - paulsb
http://www.alleyinsider.com/2008/5/why_can_t_twitter_scale_blaine_cook_tries_to_explain
======
aschobel
The meat in the comments of Blaine Cook's blog entry:

[http://romeda.org/blog/2008/05/scalability.html#140141155247...](http://romeda.org/blog/2008/05/scalability.html#1401411552478169860)

"Scaling Twitter as a messaging platform is pretty easy. See Mickaël Rémond's
post on the subject. Scaling the archival, and massive infrastructure concerns
(think billions of authenticated polling requests per month) are not, no
matter what platform you're on. Particularly when you need to take complex
privacy concerns into account."

Sounds like they are having the kinds of problems that Friendster had years
back.

How come sites like MySpace don't have these issues? They also seem to have a
pretty complex social graph.

~~~
gcv
MySpace does have massive scaling problems, most directly related to the
difficulty of partitioning a relational database.

[http://www.baselinemag.com/c/a/Projects-Networks-and-Storage...](http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/Inside-MySpacecom/)

------
patrickg-zill
I have to admit that I am lacking in clue about Twitter.

Are they handling more than 64GB of user-generated data per hour? If not, why
not just store everything into RAM on a big 128GB RAM server and query that?
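
A back-of-envelope check suggests the answer is almost certainly no (all
numbers below are made-up guesses; nobody outside Twitter knows the real
rates):

```python
# Back-of-envelope with hypothetical numbers: even a generous guess at
# Twitter's 2008 message rate yields far less than 64GB/hour of new data.

messages_per_second = 200     # made-up guess, not a real figure
bytes_per_message = 140       # the tweet body itself
overhead_per_message = 360    # guessed metadata: ids, timestamps, indexes

bytes_per_hour = messages_per_second * 3600 * (bytes_per_message +
                                               overhead_per_message)
print(f"{bytes_per_hour / 2**30:.2f} GiB of new data per hour")  # ~0.34 GiB
```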

~~~
Retric
My guess is they have a really stupid architecture if they are having
significant issues. This is probably due to reusing legacy code when they
should have started from scratch. I am honestly tempted to code something up
this weekend to see if I can find out what their issue is.

~~~
mtts
Using Ruby on Rails is an issue, I would think. You do _not_ need a framework
for storing and sending out 140-character messages, but if you do use one, its
overhead will actively hurt performance.

The fact that they chose to use RoR anyway hints that they may not be one
hundred percent technically competent.

~~~
raganwald
Not sure I understand how to integrate what you are saying with what I read in
the post. If RoR is dog-slow (say, 100x slower than something you would code
in a weekend) but scales, then what they can serve with 100 servers equals
what you can serve with one server. To double capacity, you add one server and
they go from 100 to 200. And so it goes.

Scalability is a second-order issue: if you go from 1 server to 2, but they
need to go from 100 servers to 400 servers to handle the same load, then not
only are they slower than you are, but they can't scale as well as you can.
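
A toy calculation makes the distinction concrete (using the hypothetical
numbers above, not anyone's real fleet):

```python
# Hypothetical numbers from the comment above: you serve the load on one
# server; they need 100 because their stack is 100x slower. If both
# systems scale linearly, doubling load doubles both fleets: the 100x
# factor is a constant cost, not a scaling problem.

def servers_needed(load, servers_per_unit_load):
    return load * servers_per_unit_load  # linear: constant cost per unit

for load in (1, 2, 4):
    print(f"load={load}: you={servers_needed(load, 1)}, "
          f"them={servers_needed(load, 100)}")

# A true scaling problem is superlinear: if their fleet must grow with the
# square of the load (100 -> 400 servers for 2x the load, as above), no
# constant hardware multiplier keeps up.
```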

I get why using a framework for the UI and business logic could make them 100x
slower than something custom-coded. And from the original post, I get why a
conventional database+read cache may not be appropriate for a messaging
application.

But what I don't get is the connection between RoR and scalability. Unless you
are speaking of its default configuration, namely RoR+ActiveRecord+MySQL.
Which speaks more to the architecture choice (tables, rows) than to the
framework choice (views, models, controllers).

Or am I missing something?

~~~
mtts
There is no connection. The fact that they're using RoR to build what is
really a very, very simple web site suggests their problem is not knowing
which tools to use, which is a problem you cannot solve by throwing more
hardware at it. Talk of scalability is a red herring in this case. The real
issues are elsewhere.

~~~
raganwald
Ah, so it's a case of "I believe their choice of Tool A is wrong for solving
Problem B, thus although I cannot see what they have done with Architecture C
to solve Problem D, I don't have a lot of confidence they made the right
decisions."

Thanks for explaining your reasoning.

------
tdavis
I am having an impossible time coming up with an unbiased and substantiated
opinion on this issue. On the one hand, we know nothing about Twitter's
architecture other than that they're (to some extent) using a language with a
notoriously under-optimized interpreter and a framework with reported scaling
issues (unless you do a lot of hacking on it).

On the other hand, there's gotta be something fishy going on in Twitter-ville.
Although I agree "the idea that building a large scale web application is
trivial or a solved problem is simply ridiculous," by now Twitter should have
enough performance data to know exactly which part of the process is causing
the high-load issues they're having. If we assume that after each outage they
at least "throw more hardware at it," then it follows that this is a problem
horizontal scaling cannot solve, and that the issue is deep-rooted in the
system, somewhere in its basic architecture.

Is it Ruby? RoR? Poorly optimized queries? Improper caching? Lack of domain
knowledge? Leprechauns? I don't know, and neither does anyone else outside of
Twitter... but I guess speculation can be entertaining.

------
jrockway
I don't get why Twitter doesn't scale. It's just webmail, but with smaller
messages and a simpler UI. Here's how twitter should work: every user should
have a list of users following them. When they tweet, each follower gets a
copy of that message in their personal inbox. A copy is also attached to the
tweeter's account, so new followers can suck that copy in when they start
following them.

That's it. Now, sending a message takes O(n) (n=followers) time, which is
really cheap. On my machine, it takes about a second to create and sync 40,000
files (there's not much data, so replicating this via NFS wouldn't be that
expensive either). With that out of the way, all you have to do is ls your
"twitter directory" to see all of your friends' messages. This is another
incredibly cheap operation. It's easy to distribute, and there's no locking.

Anyway, just look at the mail handling systems at huge universities and
corporations. They scale fine, and they're much more complicated than twitter.
Twitter is just a subset of e-mail, so it should be implemented that way, not as
a "SELECT * FROM tweets WHERE user IN (list, of, followers) ORDER BY date".
That is the wrong approach because it makes reads (very common) expensive and
writes (very uncommon) cheap. That's why twitter doesn't scale.
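
Here's a minimal sketch of that fan-out-on-write inbox model (in-memory
Python with hypothetical names, meant to show the shape of the reads and
writes rather than a production design):

```python
from collections import defaultdict, deque

# Writes fan out one copy per follower; reads are a plain inbox lookup.
followers = defaultdict(set)   # user -> set of users following them
inboxes = defaultdict(deque)   # user -> received tweets, newest first
archive = defaultdict(list)    # user -> everything they ever tweeted

def follow(follower, followee):
    followers[followee].add(follower)

def tweet(author, text):
    archive[author].append(text)          # copy kept on the author's account
    for f in followers[author]:           # O(n) in followers, cheap per copy
        inboxes[f].appendleft((author, text))

def timeline(user, limit=20):
    # Reading is just listing your own inbox: no joins, no locking.
    return list(inboxes[user])[:limit]

follow("alice", "bob")
tweet("bob", "hello, world")
print(timeline("alice"))  # [('bob', 'hello, world')]
```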

------
sutro
Of course all of you Hacker News Monday morning quarterbacks have taken a
website from launch to Twitter's current level of traffic, right? If so,
please let us know which site that was so we can compare your experience and
your decisions against those of the Twitter team. If not, perhaps you should
go and do that first before sounding off on Twitter. No, I'm not a friend of
Blaine Cook's, nor am I a Twitter apologist -- I don't even use the service.
But I do respect startup founders and builders far more than their critics,
and I know that website architecture -- like most things -- is always easier
in hindsight. For perspective, please read the Hot or Not story in "Founders
at Work," then read Teddy Roosevelt's "Man in the Arena" quotation that
Arrington is so fond of. (And yes I realize that reference is ironic given
that Arrington started the pile-on against Blaine Cook. Arrington should try
to remember that "it is not the critic who counts.")

------
dmose
I doubt it's the language. Sure, you might get more efficient throughput
with another framework or language... but I think the core problem lies in how
many polling API connections they have. If their API hits their core DB
instead of polling off a read-only copy, they have a serious design flaw.

API connections, given that they have to do a security check each time,
should read from a near-realtime read-only slave copy of the DB (or even
allow a potentially dirty read, but who cares, it's for their API). The
security lookup should be cached, since you rarely change API security
accounts more than once; if you do, then you update the caches, etc.
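
A rough sketch of that cached-credentials idea (all names are hypothetical;
this is not Twitter's actual setup):

```python
import time

# Hypothetical sketch: authenticate polling requests against a small
# in-process cache instead of hitting the primary DB on every poll,
# and invalidate the entry on the rare credential change.

AUTH_TTL_SECONDS = 300
_auth_cache = {}                        # username -> (password_hash, cached_at)
_replica = {"alice": "hash-of-secret"}  # stand-in for a read-only slave DB

def _fetch_credentials_from_replica(username):
    # Stand-in for a query against the read-only slave, not the master.
    return _replica.get(username)

def check_auth(username, password_hash):
    entry = _auth_cache.get(username)
    if entry is None or time.time() - entry[1] > AUTH_TTL_SECONDS:
        entry = (_fetch_credentials_from_replica(username), time.time())
        _auth_cache[username] = entry
    return entry[0] == password_hash

def credentials_changed(username):
    # Called when a user changes their password/API key: drop the stale entry.
    _auth_cache.pop(username, None)

print(check_auth("alice", "hash-of-secret"))  # True, and now cached
```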

Granted, I'm no expert, but it just sounds like they're overloading their DB
with polling.

Their web pages for each user should be cached, since they're not really hit
as much as the API or RSS.

I don't care what anyone says: having THAT many API connections constantly
polling their DB, along with what Blaine said about having to do
authentication requests on every single hit, is taxing on any setup you can
put together.

Oh, and if they have a single table holding all their tweets... big issues
there. I'd have 26 "tweet" tables, one for each letter of the alphabet, and
my switch in the business layer. Then simply ship off tweets older than a
month, or partition based on that.
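
A toy version of that alphabet-sharding switch might look like this
(hypothetical; a real shard key would usually be a hash, since first letters
of usernames are far from uniformly distributed):

```python
# Toy version of the "26 tweet tables" switch described above. Routing
# lives in the business layer: the first letter of the username picks
# the table.

def tweet_table_for(username):
    first = username[0].lower()
    if not first.isalpha():
        return "tweets_other"    # digits, underscores, etc.
    return f"tweets_{first}"

def insert_tweet_sql(username, body):
    table = tweet_table_for(username)
    # Parameterized SQL; the table name comes from our own whitelist above.
    return (f"INSERT INTO {table} (username, body) VALUES (%s, %s)",
            (username, body))

print(tweet_table_for("jack"))                           # tweets_j
print(insert_tweet_sql("jack", "just setting up")[0])    # INSERT INTO tweets_j ...
```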

------
gleb
The common usage of the word "scale" is that things continue to work well as
you add load. Whether that's done by making the software fast enough to handle
everything on one computer, adding servers, or adding pet monkeys is not
important to the user.

Responding to reasonable complaints by using a different definition of the
word "scale" makes for a weak argument.

~~~
ks
The author also says that it's perfectly acceptable for something to be 10x
slower, because "a well-capitalized company" can afford 10x more servers... I
am thinking of how to respond to that, but I don't know where to start.

~~~
raganwald
Here's a suggestion: compare it to using 10x as many programmers to write
something in J2EE because it's easier to hire 10 J2EE drones than one RoR whiz
(where "whiz" is understood to mean "person who has experience solving RoR
problems").

It's all a question of what you believe to be the scarce resource. He
obviously believes that for a well-funded company, server CPU cycles are not
a scarce resource. This implies that he believes that RoR addresses some issue
raised by something else being the scarce resource.

~~~
xirium
There's also the issue of the timescale. Start with one programmer and server.
You have the finances to purchase 10 additional servers or hire one additional
programmer.

You can get 10 servers bought, delivered, installed and configured in one
week. These servers can be deployed to do anything from load balancing, web
caching, database caching, being database read slaves or application servers.
Regardless of the quality of your architecture, 10 servers will probably add
some scalability to your system. Furthermore, they can be re-purposed as the
situation demands.

Hiring a better programmer (or just another warm body) takes longer.
Furthermore, the improved code they produce isn't as flexible as surplus
hardware.

Of course, the situation changes when you've got thousands of servers.

------
ivankirigin
The web frontend could just be a static list of your contacts, with ajaxy
grabbing of the tweets from each. This way, you could cache each user's tweets
and avoid the issue of each user's page being different.

That's just one idea. I'm not really an expert.

------
systems
I have several questions, because I feel there should be a better systems
approach to this twitter problem.

Is there any publicly available documentation for twitter's architecture?

Did they use consultant help? Did they contact SUN, IBM, Oracle or any other
respected consultant when they started facing those problems?

I recommend you watch this video: [http://www.infoq.com/presentations/qcon-voca-architecture-sp...](http://www.infoq.com/presentations/qcon-voca-architecture-spring)

Really... we need to have a look at twitter's architecture before we discuss
this further.

~~~
raganwald
> SUN, IBM, Oracle or any other respected consultant

Are you implying that Sun, IBM, and Oracle are respected for the prowess of
their consulting organizations? I'm sure they have some Fine People working
there; however, the view that "writing a seven-figure cheque to these
companies guarantees a positive outcome" is not universally held.

------
goodkarma
Unless you know what you need to scale to, you can't even begin to talk about
scalability. How many users do you want your system to handle? A thousand?
Hundred thousand? Ten million? Here's a hint: the system you design to handle
a quarter million users is going to be different from the system you design to
handle ten million users.

[http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y....](http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html)

------
bluelu
If he can't scale a simple message passing system, what can he scale then?
It's not that they are doing rocket science at twitter. He probably was the
wrong person for the job.

In the good old days, every message over ICQ was sent over their servers, and
they probably had more messages to handle than twitter does nowadays. And
there wasn't a single problem then.

~~~
quellhorst
ICQ didn't archive a copy of every message sent.

~~~
bluelu
This doesn't increase the complexity much...

I haven't thought long about it (5 minutes), and I'm not going much into
detail (I don't use twitter, just had a quick look at it), but this is how I
would do it:

1) One replicated system where you can fetch messages by id (complete body
with who sent it, to whom, time, etc.)

2) User page: list of ids.

3) Private messages: list of ids.

4) When a new message comes in, write it to a queue. Process those messages
and append the ids to the different pages. Multiple processes do this (each
process has a subset of users, with the complete list of people following
those users). One could add another layer to do bulk inserts.

This could easily be done in memcachedb. One page view takes x + 1 memcachedb
requests (x = the number of items on the page). One can still optimize this by
caching (static HTML pages which are deleted when a page is updated for a
user). When inserting, replace the existing data by appending the new ids.

Everything is nicely separated. (E.g. pages for users 1-10000 are on server 1,
etc. Messages can be nicely separated as well.)
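
A rough in-memory sketch of that id-based design (the two dicts stand in for
memcachedb, the deque for the queue; all names are hypothetical):

```python
import itertools
from collections import defaultdict, deque

messages = {}                 # id -> full message body + metadata
pages = defaultdict(list)     # user -> list of message ids, newest first
queue = deque()               # incoming message ids awaiting fan-out
following = defaultdict(set)  # author -> users following that author
_next_id = itertools.count(1)

def post(author, text):
    mid = next(_next_id)
    messages[mid] = {"from": author, "text": text}
    queue.append(mid)         # fan-out happens asynchronously
    return mid

def fanout_worker():
    # One of several such processes, each owning a subset of users.
    while queue:
        mid = queue.popleft()
        author = messages[mid]["from"]
        for user in following[author]:
            pages[user].insert(0, mid)

def render_page(user, limit=20):
    ids = pages[user][:limit]          # 1 request for the id list...
    return [messages[i] for i in ids]  # ...plus x requests for the bodies

following["bob"].add("alice")
post("bob", "hello")
fanout_worker()
print(render_page("alice"))  # [{'from': 'bob', 'text': 'hello'}]
```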

Any thoughts on this? To twitter: hire me not him ;)

------
jdavid
I think the problem with twitter is that they did not use a real-time
architecture, like email, IRC, or some other messaging platform. Instead, I
think they render each page dynamically off of a DB call based on who
subscribed where; i.e., you would have to start from scratch to scale it. I
think the problem is that now that twitter has let the cat out of the bag,
they are chasing problems on an old architecture, trying to scale that, while
also trying to build a new version. I suppose it's like running two tech
companies.

