
A Detailed Five Step Twitter Scaling Plan - pchristensen
http://whydoeseverythingsuck.com/2008/05/detailed-five-step-twitter-scaling-plan.html
======
menloparkbum
Paying attention to half-baked advice like this is half the reason Twitter is
fucked.

------
xirium
The suggested plan seems reasonable. Yes, if you denormalise users' inbound and
outbound messages then you store the data twice. Yes, shards give almost
limitless scalability. However, could it be that during peak load, the number
of messages to be written, plus indexes to be maintained, exceeds the total
available disk seek operations?

~~~
wmf
So shard more, or get more disks, or get flash or whatever. (Yes, I don't know
what I'm talking about.)

~~~
xirium
> So shard more, or get more disks, or get flash

Your intuition is excellent. These are all valid techniques to alleviate disk
seek limitations. They are likely to be successful. Unfortunately, they're
expensive options.

You can partially overcome disk seek limitations by performing database batch
inserts. So, rather than inserting messages individually and updating the
indexes for each message, you insert 100 messages at a time and perform fewer
disk seeks by updating the indexes once per batch. Unfortunately, it can be
hard to reliably merge data into suitable batches.

~~~
wmf
Hank's proposed solution is expensive; we should just admit that up front. But
is it more expensive than being down and fighting fires all the time?

~~~
xirium
The economics of downtime are a very important consideration. I read that three
minutes of downtime reduces the value of a website by 1%, though that may be a
figure bandied about by vendors of high-availability equipment. It is also
generally acknowledged that high availability is achieved through strict
processes and sheer redundancy of hardware. Therefore, it gets exponentially
more expensive to minimise downtime.

So, you've got two non-linear curves, cost and benefit. If you've got an
exceptional team, a great architecture and good finances then getting 99.999%
uptime should be easy. Of course, this assumes that you don't have unforeseen
circumstances, glaring omissions in redundancy - or exponential growth and
utilisation without a revenue stream.

Even with generous financing, exponential growth can be deadly. A small but
sustained increment in load can be enough to cause a backlog in requests that
never recovers. If you're growing fast then it can occur anywhere in your
system at any time. I've been a DBA for a renderfarm with thousands of cores,
and this situation is stressful.

We've also had this situation while working on search. Thankfully, it occurred
on a smaller scale and most thankfully it occurred before launch. We had two
database servers, two app servers, no redundancy, on unreliable hardware. We
didn't make much headway with efficiency improvements because long tests were
likely to encounter hardware failure. It was demoralising.

This slowed the development and testing of a database failover feature in the
database wrapper. Now this has been implemented, progress is much better. A
test that previously took three hours now takes five minutes and we've had
much more time to migrate to reliable hardware and introduce redundancy. We've
also got a setup which can be used to test resilience to real modes of
failure.

~~~
wmf
My guess is that Twitter's downtime is due to overload, not hardware failures.
Thus a scalable system (even one with a high slope) reduces the
overload/downtime problem to "buy more servers -- now!" instead of the
"analyze bottleneck, rewrite code" loop that it sounds like Twitter is in now.

------
davidu
So wrong it's hard to know where to start...

~~~
hank777
It's fascinating how you are so smart you can't even deign to share your
wisdom on this. If I am wrong, tell me where. Given that I have spent the last
several years studying and developing distributed databases, if I just don't
know what I am doing I'd certainly like someone as smart as you to tell me.

~~~
KirinDave
Hank, your design is so high level as to be almost useless to anyone who'd
actually benefit from a proposed design. It's not clear what issues you feel
Twitter has, or how exactly this architecture would solve them. Usually your
blog posts are more cooked than this one. I'm a regular reader, and at first I
honestly thought your post was satirical.

As I read your design, one of the biggest problems is that building up the
infrastructure I think you are describing (inferring from my own experience
designing these kinds of systems) is a massive engineering effort in and of
itself. You basically end up rewriting a realtime-optimized BigTable or HBase,
and that's much too much to expect of Twitter's small engineering team.

People seem to forget that scalability is a tradeoff, like any other aspect of
a service. You devote an amount of resources to the problem that makes sense
to your business. Despite Twitter's downtime, people don't seem to be leaving
the service, even though there are several competitors. The service goes down,
but it isn't down so much that it's unusable.

For a site that's grown at such a rapid pace and has such a small engineering
team, I think Twitter's scaling plan is probably very reasonable. They're
growing their capacity at a rate they can afford, and a rate that they know
will retain most of their userbase.

~~~
hank777
Well, perhaps it's a matter of staring at something so long it becomes simple,
but I really don't think what I am suggesting is that hard. The main thing
that requires some thought is shard splitting. That is not a big piece of
code, but it does admittedly take some thought to get right. Aside from that,
everything else is provided out of the box by someone else, like Amazon Web
Services or Terracotta. I am not sure how I could have been more precise
without source code.

It is certainly reasonable to suggest that Twitter is good enough as it is
because people still use it, though the grumbling in the market suggests that
that may not be something they can bank on forever.

~~~
KirinDave
One thing you don't address: where do messages go while a resource is self-
splitting? How much redundancy do you build in to prevent this from being a
problem? How much backlog do you expect to handle?

And have you checked latency serving out of EC2? Our experience is that the
connections are fairly fast but the latency establishing the connection is
significant, especially from the west coast where twitter is based.

~~~
hank777
I addressed this in another answer here but happy to do it again. The way
shard splitting works is that the initial shard keeps working while the new
shard is being built. During this time, all messages to the old shard are
written to the old shard _and_ the new shard, so, unless I don't understand
your question I dont see there being a backlog.

Regarding EC2, latency isnt really an issue because Twitter is far from real-
time. That said, haven't really noticed any problems with EC2 connection
latency, but we are not live yet so I would defer to you on that.

------
bluelu
His approach isn't so good, as it creates a copy of the same message for each
user it is sent to.

Better to save the messages in a central store by id and only write the ids to
the different pages.

<http://news.ycombinator.com/item?id=188080>

~~~
hank777
I considered this, but given that the messages are small and the primary issue
is scaling, not storage, it does not seem to me to be a good solution. The
entirety of everything ever written on Twitter is probably not big enough to
worry about storage size. The issue is performance. By the way, this
architecture is essentially how email works: all your inbound messages are
stored in one place, separate from the sender's outbound messages. This
allows for infinite horizontal scaling.

------
edw519
These design techniques could very well make a nice impact on performance.
Then again, they might not.

Why? They are proposals to solve a problem that _has not yet been defined_.

Fact is, you _don't know_ what the problem is.

So proposing a database solution to what could just as easily be a memory
management or throughput problem is premature.

~~~
hank777
This is true. Though I can say that if you were building Twitter and you
didn't have either a design like the one I suggested or a design with some
kind of distributed caching, then you are going to have exactly the kinds of
scaling problems they have had. In other words, if you use traditional
techniques for something like Twitter, you are screwed. However, you are
absolutely right that (as I said in my piece) they may already, for example,
have thought of all of this and either implemented it or dismissed it because
they have a better design, and I have no way of knowing.

~~~
menloparkbum
This isn't true. You don't need a distributed sharded setup in order to scale
(though you probably do if you are limiting yourself to MySQL). I built
financial applications way back in the day that had a lot more reads and
writes than Twitter, and we just used ONE gigantic, super expensive database
server.

------
sutro
Hank, good article, and good defense of your article here on HN. Regarding the
knee-jerk negative responses you've received here, I can only say that you are
a more patient man than I. Best of luck with your current project.

------
axod
This has been solved many times, many years ago. It's not a hard problem. A
one step scaling plan would be "write it properly this time".

~~~
chaostheory
many things are easier said than done.

~~~
axod
Of course. But there are solid examples from web1.0 and before that did more
than twitter, and scaled to more users. Twitter is pretty simple in terms of
functionality.

~~~
chaostheory
climbing Mt Everest/K2 has been done many times already, but it's still hard;
and not something taken lightly

I just don't like gloating on someone else's mistakes and failures
(essentially kicking someone who's already on the ground), especially when
I'll probably have my own set of spectacular f-ups to deal with related to my
own startup

------
andrewparker
This post is little more than link bait (and I guess I'm falling for it by
commenting).

~~~
hank777
With all due respect, Andrew, why don't you share your technical analysis of
what exactly I am saying that is wrong here. Btw, have you ever actually
_worked_ on any database scaling problems? Just checking.

~~~
davidu
I have. I have managed systems that handle millions of clicks a day in
databases and billions of dynamic (PHP) pageviews on a daily basis.

I am currently managing a system that serves over 100,000 DNS queries per
second, each of which is dynamically evaluated as to how we respond to it. And
each of which is logged, amounting to over 7,000,000,000 log lines a day, all
of which get processed, parsed and in many cases handed over to a database
store.
Yep, we're dealing with BILLIONS of rows of MySQL per day.

~~~
Xichekolas
Do you have anything online about your architecture? I think it'd be
interesting to see how those 100k queries per second get served.

Hey thanks for OpenDNS by the way, I'm a long time user, and I love it.

~~~
davidu
I'll be speaking about it for the first time at the Velocity Conference in
June. I'll have slides available.

------
mtw
imho, it's rails. rails (and any other ruby framework) is not suited to a
real-time communication platform like twitter.

also, you forgot about message queues.

~~~
KirinDave
Why would you say that? It's not like the Ruby code is handling that many
messages. Even Rails sites don't use Ruby for their concurrency and messaging;
they push down to a database or to another concurrency layer like ActiveMQ.
Above them lives another piece of code or hardware to do load balancing.

This isn't even about Ruby, it's about the Rails architecture.

