

Ask YC: How hard is it to keep a service like twitter online - mixmax

There have been so many posts about Twitter downtime lately that it must be pretty embarrassing for them, and do some serious damage to their reputation. The latest one is on the front page now: http://news.ycombinator.com/item?id=476910

Not having worked with high-traffic websites, I've been wondering why they're having such problems. I would imagine that if you have money (which they do) you could just throw more servers and a good SQL consultant at the problem. Is there a problem specific to their architecture that is hard to scale? If so, what is the constraint?

I'm just puzzled that they haven't solved it, and hope they do.
======
forkqueue
Not all high-traffic websites are equal. YouTube has way more visitors than
Twitter, but is a far simpler site to scale.

Twitter is a particularly difficult problem, because unlike most high-traffic
sites the number of writes is pretty close to the number of reads. The latency
expected by users is also very low - people are carrying on conversations over
it, after all - and these two factors combined mean that there's very little
that can be done in the way of caching. Add to this the complex relationships
between all the different users and you've got a difficult site to scale to
the sort of volumes they're experiencing.
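
To make the write amplification concrete, here is a toy fan-out-on-write
sketch in Python (purely hypothetical, nothing to do with Twitter's actual
code): one logical write becomes one physical write per follower, and any
cached timeline goes stale the moment someone you follow posts.

    from collections import defaultdict, deque

    followers = defaultdict(set)    # user -> users following them
    timelines = defaultdict(deque)  # user -> tweets delivered to them

    def follow(follower, followee):
        followers[followee].add(follower)

    def post_tweet(author, text, max_timeline=800):
        tweet = (author, text)
        # One write from the author fans out into one write per follower.
        for user in followers[author]:
            timelines[user].appendleft(tweet)
            while len(timelines[user]) > max_timeline:
                timelines[user].pop()

    follow("alice", "celebrity")
    follow("bob", "celebrity")
    post_tweet("celebrity", "hello world")
    print(timelines["alice"])  # deque([('celebrity', 'hello world')])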

If the guys at Twitter want to offer me a job, I've got a few ideas though ;)

~~~
moe
_Twitter is a particularly difficult problem_

Excuse me? Twitter, at its core, is a large-scale pub/sub system with a bunch
of frontends. Such systems are well understood and, frankly, quite trivial as
far as the (non-existent) constraints of the Twitter app are concerned. After
all, no ordering guarantees or fancy routing are needed, and obviously they
don't give a damn about fault tolerance or latency either.

Their constant failure to keep that thing online just screams incompetency
very loudly...

There are plenty of mature building blocks available in the OSS world to
implement such a system. RabbitMQ and the Spread toolkit come to mind.
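
For illustration, the standard fanout pattern covers the basic broadcast case.
This is a hypothetical sketch using RabbitMQ via the pika client, not a claim
about what Twitter should actually run:

    import pika

    # Publisher: broadcast a tweet on a fanout exchange.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.exchange_declare(exchange="tweets", exchange_type="fanout")
    channel.basic_publish(exchange="tweets", routing_key="",
                          body=b"@user: hello world")
    conn.close()

    # Subscriber: each consumer binds its own queue to the exchange.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.exchange_declare(exchange="tweets", exchange_type="fanout")
    q = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="tweets", queue=q)
    channel.basic_consume(queue=q,
                          on_message_callback=lambda ch, m, p, body: print(body),
                          auto_ack=True)
    channel.start_consuming()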

Or, if you have the dough, you could buy a shrinkwrapped solution from TIBCO,
Solace or the like. The latter will even come with fancy functionality and
availability guarantees in writing, because those are normally deployed in
"mission critical" industrial systems such as logistics, banking, and stock
exchanges.

Now, Twitter is not SWIFT and doesn't need five nines of uptime. But saying
that this is a "particularly difficult problem" is, sorry, ridiculous.

~~~
gaius
There is an old saying, _those who were asleep in CS 101 are doomed to be
mocked on the Internet forever_ :-)

------
azharcs
Here is more about Twitter's architecture; I guess it has changed a lot by
now. Also check out the presentation by Blaine Cook, architect of Twitter.

[http://highscalability.com/scaling-twitter-making-twitter-10...](http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)

<http://www.slideshare.net/Blaine/scaling-twitter>

------
yan
They essentially deal with millions of real-time messages, each of which has
a fairly complicated relationship tree. These issues aren't so much a database
problem as a message passing and queuing problem.
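
Roughly what that looks like (a hypothetical toy in Python, not Twitter's
design): the web request only enqueues the tweet, and background workers
expand it against the follower graph.

    import queue, threading

    deliveries = queue.Queue()
    follower_graph = {"celebrity": ["alice", "bob"]}  # assumed toy data

    def enqueue_tweet(author, text):
        deliveries.put((author, text))  # cheap, done inside the web request

    def delivery_worker():
        while True:
            author, text = deliveries.get()
            for follower in follower_graph.get(author, []):
                # a real system would write to the follower's timeline store
                print(f"deliver to {follower}: {author}: {text}")
            deliveries.task_done()

    threading.Thread(target=delivery_worker, daemon=True).start()
    enqueue_tweet("celebrity", "hello world")
    deliveries.join()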

~~~
mixmax
Why can't you just pass this to X number of servers?

~~~
gaius
The simple answer is: in the beginning they thought Twitter would be more like
LiveJournal and architected it accordingly (not that LJ is the pinnacle of
good architecture or anything, but even so). Instead it turned out to be more
like IM, and rather than starting again they decided to hack it into working.

~~~
gruseom
Although the downtime must be frustrating for users and the temptation to
build the shiny-new-system-that-will-solve-everything nearly irresistible for
developers, I think they probably made the right decision not to do this. They
have a runaway hit on their hands. That makes the risks associated with a
rewrite way higher (and they're already way high). One thing that people often
do is start a "next-gen" project with an "architecture" team to figure out the
new thing while a maintenance team (often lower-status within the
organization) works on the existing product. This is so consistently bad an
idea, it's pretty much a red flag all by itself. If I were ever going to
rewrite a successful system I'd insist that it be the original team doing it.

I also wonder whether Twitter's downtime doesn't psychologically reinforce the
sense that everybody's using it so it must be valuable. Scarcity and all that.

One thing about Twitter does make me pause. Haven't they hired tons of
programmers? What are they all doing?

------
presty
Hm... isn't FriendFeed similar to Twitter? They don't seem to get (any)
downtime. Also, there's some Indian Twitter-like service which apparently has
higher traffic volumes and no downtime either.

If all this is correct, then it's probably not "that hard" to keep something
like Twitter running with "acceptable" uptime.

------
rokhayakebe
I once launched a Facebook IM application using AjaxIM. Everything was
beautiful for about 10 days. I went from 90 users to 18K users in 21 days and
the whole application came to a HALT. IMs were taking 5 seconds to get to the
other end. I, for sure, did not have the technical ability to make it work and
I decided to not invest money into it. We were dumping the DB of IMs every 3
hours, but with IM you are dealing with hundreds of thousands of real-time
messages. This is extremely hard. Add to this the fact that each Twitter
account has a different graph and your messages go to several people.

