
The Architecture Twitter Uses to Deal with 150M Active Users - aespinoza
http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html
======
apaprocki
Since tweets are very much like financial ticks (fixed size, tiny), we can put
the numbers in perspective. Let's compare Twitter vs say, OPRA (North American
options), a single "exchange" feed.

Twitter: 300K QPS, firehose 22 MB/sec, 400 million tweets/day

OPRA: 12.73M MPS, firehose 3.36 GB/sec, 26.4 billion messages/day

[http://opradata.com/specs/2012_2013_Traffic_Projections.pdf](http://opradata.com/specs/2012_2013_Traffic_Projections.pdf)

edit: Also worth noting, the _diff_ between OPRA 1/1/13 & 7/1/13 is ~256
MB/sec and 2.0 billion messages/day. So in just 6 months the existing firehose
grew by roughly 10x Twitter's entire output in bandwidth, and by roughly 5x its
messages/day.

~~~
thezilch
Tweets are fanned out to more than a single feed and, in the "most important"
cases, millions of feeds -- 31 million feeds for Lady Gaga. You're comparing
300K reads, which might not even include the sub-queries, to 12.7M writes. A
single tweet from a single user would trump the writes; it's unclear whether
it could be done in the same one-second window.

Twitter already does 30B Timeline deliveries a day, compared to the 26.4B of
OPRA. Again, there is no telling what can be implied by a "Timeline delivery;"
does it include pulling user and other secondary and tertiary objects? An HTML
renderer? It doesn't say much about the capabilities and focuses on what they
do.

There's also little comparison to be made on what powers OPRA. If Twitter can
simply add nodes to their architecture, are they doing it wrong?

~~~
apaprocki
Securities work in a similar way -- when you're looking at your portfolio
you're only directly monitoring the individual fields for the securities in
your portfolio, not the entire firehose. The OPRA feed is the equivalent of the
raw Twitter firehose, only much fatter. Once you integrate it into the last-
mile display to the users, you're doing all the same things. This is typically
done with multicast topic-style subscriptions by ticker / type of data. It
makes sense that if Lady Gaga writes a tweet you only want it to be "1
timeline delivery" (multicast) as opposed to "31 million timeline deliveries",
which is making your infrastructure do a lot more work. Granted, you're kind
of limited by what the browser can do in this regard, so you're kind of stuck
with the socket model.
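
To make the topic-subscription idea a bit more concrete, here's a rough sketch
using Redis pub/sub from Python (redis-py). The channel names and the local
Redis instance are purely illustrative assumptions -- this isn't how OPRA or
Twitter actually deliver data, just the shape of the model:

    # Minimal sketch of topic-style delivery with Redis pub/sub (redis-py).
    # One publish per message, however many subscribers are listening.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Publisher side: a single publish per tweet/tick, regardless of how many
    # clients follow the topic -- the broker handles the delivery.
    r.publish("tweets:ladygaga", "new song is out!")

    # Subscriber side: each client subscribes only to the topics it cares
    # about (the accounts it follows, or the tickers in its portfolio).
    sub = r.pubsub()
    sub.subscribe("tweets:ladygaga", "tweets:katyperry")
    for message in sub.listen():
        if message["type"] == "message":
            print(message["channel"], message["data"])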

~~~
jacques_chester
Securities feeds are, I suspect, more write-dominant and have a much lower
fanout rate. They also don't have the outlier-retweeting-outlier or outlier-
tweeting-@-outlier problems.

~~~
apaprocki
Yes, much more write-dominant. Apps which are built on top of the feeds can
create issues, though. E.g., an obscure ticker pasted into a chat room with
500 people instantly starts being monitored in all their windows -- not just
the static price (equiv to a RT), but the live feed. That would be as if A RT'd B
and suddenly all of A's followers added B to their timeline. The degree to
which that happens depends on how many features like that are integrated into
the app.

~~~
granitepail
I have no idea why you're getting so much pushback. Products like those
offered by Bloomberg ingest and distribute massive amounts of data in real
time, as well. The comparison is completely suitable.

While their products for investment managers are mostly run off of persistent
databases, the trader terminals rely on a high-volume, nebulous fan out. For
many traders, a five second latency is unacceptable.

Incredibly interesting talk and a good write up. Twitter continues to impress!
I was surprised to see Redis playing such a critical role, too.

~~~
jacques_chester
Sometimes the pushback is right, sometimes it's wrong. He's been very good at
explaining the different and heavy requirements of financial data feeds.

------
aroman
> _Twitter no longer wants to be a web app. Twitter wants to be a set of APIs
> that power mobile clients worldwide, acting as one of the largest real-time
> event busses on the planet._

Wait, then why are they actively destroying their third-party app
ecosystem...?

~~~
mey
The two are not the same. It sounds like they are becoming API driven to
support multiple interfaces and to put clear boundaries between systems.

~~~
zanny
Sounds like html. Or qt. Not 140 characters.

------
NelsonMinar
This article cites its source as a talk by Twitter VP Raffi Krikorian. 38
minutes of video, audio, and slides are at
[http://www.infoq.com/presentations/Twitter-Timeline-
Scalabil...](http://www.infoq.com/presentations/Twitter-Timeline-Scalability)

~~~
aespinoza
Thank you for sharing it.... I was wondering about the video.

~~~
jmstout
I also enjoyed this presentation:
[http://www.infoq.com/presentations/Timelines-
Twitter](http://www.infoq.com/presentations/Timelines-Twitter)

It goes into more depth about how they handle timelines.

------
davidw
> Your home timeline sits in a Redis cluster and has a maximum of 800 entries.

Wow, that's pretty cool. Congrats to antirez - it must be a nice feeling
knowing that your software powers such a big system!
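
Out of curiosity, here's roughly what that capped home-timeline structure could
look like with plain Redis lists via redis-py. The key names and storing bare
tweet ids are my own simplifications, not necessarily what Twitter stores:

    # Rough sketch of fanout-on-write into capped per-user timelines, using
    # redis-py against a local Redis. Key names and bare tweet ids are
    # simplifications for illustration.
    import redis

    r = redis.Redis(host="localhost", port=6379)
    TIMELINE_CAP = 800  # the talk's stated maximum per home timeline

    def fan_out(tweet_id, follower_ids):
        """Push a tweet id onto each follower's home timeline, trimmed to the cap."""
        pipe = r.pipeline()
        for follower_id in follower_ids:
            key = "home_timeline:%d" % follower_id
            pipe.lpush(key, tweet_id)
            pipe.ltrim(key, 0, TIMELINE_CAP - 1)  # keep only the newest 800 entries
        pipe.execute()

    def read_timeline(user_id, count=20):
        """Reading the home timeline is now a single cheap range query."""
        return r.lrange("home_timeline:%d" % user_id, 0, count - 1)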

~~~
nasalgoat
Redis is probably the most useful tool powering the internet after nginx. It
really is an amazing piece of engineering.

~~~
wdewind
I have not used nginx, and have never heard it spoken about in such glowing
terms. Can you maybe compare/contrast it with apache?

~~~
jeffasinger
The biggest difference is that nginx uses evented I/O and handles many
connections per thread/process, whereas Apache can only handle one at a time.
This allows nginx to use far less memory per connection and lets it perform
reasonably under very high loads. It also has very low latency, even for small
loads, and is relatively easy to install and configure on most platforms.
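
If it helps to picture the evented model, here's a toy single-threaded server
using Python's asyncio -- obviously not nginx itself, just an illustration of
one event loop multiplexing many connections instead of a thread or process per
connection:

    # Toy illustration of the evented model: one thread, one event loop, many
    # concurrent connections. This is not nginx -- just the same I/O pattern.
    import asyncio

    async def handle(reader, writer):
        # Each connection is a lightweight coroutine rather than an OS thread
        # or a forked process, so per-connection memory stays small.
        data = await reader.readline()
        writer.write(b"echo: " + data)
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()

    asyncio.run(main())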

nginx does an excellent job as a reverse proxy for applications. There are
many configurations where nginx acts as a load balancer and serves static
content, while everything else is passed off to a real application server.
It's also useful if you're running on a tiny VPS with very little RAM.

However, Apache has a few features that nginx lacks, like embedding languages
via modules such as mod_php, and per-directory configuration placed in the
directory itself.

~~~
logic
FYI:
[http://httpd.apache.org/docs/current/mod/event.html](http://httpd.apache.org/docs/current/mod/event.html)

------
jacques_chester
Figuratively speaking, Twitter have switched from a design rooted in Databases
101 (OLTP) to a design more rooted in Databases 102 (OLAP).

That is, they moved processing from query time to write time. And that's a
perfectly legitimate strategy; it's the basis of data warehousing.

OLTP is about write-time speed. It's great for stuff like credit card
transactions, where you only really care about the tally at the end of the
banking day. The dominant mode of operation is writing.

OLAP is about read-time speed. You do a bunch of upfront processing to turn
your data into something that can be queried quickly and flexibly.

One thing that's problematic about the teaching of database technology is that
the read/write balance isn't explicitly taught. Your head is filled with
chapter and verse of relational algebra, which is essential for good OLTP
design. But the problems of querying large datasets are usually left for a
different course, if they're treated at all.

~~~
joshuaellinger
Maybe at a high level, but OLAP involves precomputing aggregations and bit
indexes. It's a pretty different beast.

OLAP is very rarely under real-time constraints and, when it is, it tends to
push the heavy lifting out to OLTP.

~~~
jacques_chester
Oh, I'm handwaving, not talking about the underlying details of star schemata,
clever column representations and whatnot.

But I think the analogy is still correct.

Every such system has two functional requirements:

1. Store data.

2. Query data.

And every system has the same pair of non-functional requirements:

1. Storage (write) should be as fast as possible.

2. Queries (read) should be as fast as possible.

However, per an observation I made a while back, complexity in the problem
domain is a conserved value.

Insofar as your data requires processing to be useful, that complexity cannot
be made to go away. You can only decide _where_ the complexity cost is paid.

You can pay it at write time and amortise it across reads. You can pay it at
read time and keep writes cheap. Or you can pay it in the middle with some sort
of ETL pipeline or processing queue.

But you must always pay it. The experiences of data warehousing made that
bitterly clear.

So really, the job of a software architect is to take the business
requirements as a non-functional requirement (an -ility) and then pick the
architecture that fits that NFR. That includes dropping other nice-to-have
non-functionals.

Twitter's non-functional requirement is that they want end-to-end latency to
be 5 seconds or less, under conditions of very low write:read ratio. This
suggests paying the complexity cost up front and amortising it over reads. And
that's what they've done.
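
To make the two cost models concrete, here's a toy sketch with in-memory Python
dicts standing in for real storage; the structure and names are mine, not
Twitter's:

    # Toy contrast of where the complexity cost is paid. A "tweet" here is a
    # (timestamp, text) tuple so sorting newest-first works.
    from collections import defaultdict

    tweets_by_author = defaultdict(list)   # author_id -> [tweet, ...], newest last
    home_timelines = defaultdict(list)     # user_id -> [tweet, ...], newest first

    def post_fanout_on_write(author_id, tweet, followers_of):
        # Pay at write time: O(followers) work per tweet posted...
        tweets_by_author[author_id].append(tweet)
        for follower_id in followers_of[author_id]:
            home_timelines[follower_id].insert(0, tweet)

    def read_fanout_on_write(user_id, count=20):
        # ...so each read is a cheap slice of a precomputed timeline.
        return home_timelines[user_id][:count]

    def read_fanout_on_read(user_id, follows, count=20):
        # Or pay at read time: merge every followed author's tweets on each query.
        merged = []
        for author_id in follows[user_id]:
            merged.extend(tweets_by_author[author_id])
        return sorted(merged, reverse=True)[:count]  # newest timestamps first

With reads vastly outnumbering writes and a 5-second latency budget, the first
pair is the one worth paying for, which is exactly the choice described above.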

------
cwt137
This near hour-long video is a deep look into Twitter's backend, especially
into the Firehose feature, Flock, etc. They go into detail on how they use
Redis and even show one of the actual data structures they store in Redis. A
must see video for anyone into high scalability.

[http://www.infoq.com/presentations/Real-Time-Delivery-
Twitte...](http://www.infoq.com/presentations/Real-Time-Delivery-Twitter)

------
mikemoka
I am playing the armchair architect and my question is probably wrong in
infinite ways, but I might learn something: why does the service have to write
a tweet to two million timelines? Wouldn't it be cheaper to let the client
build the page on its own via RESTful APIs?

~~~
harryh
That is actually a very interesting question and it turns out that whether
it's better to fanout on write or on read depends on a few different things.
There's a very widely read paper on the subject you might enjoy:

[http://research.yahoo.net/node/3203](http://research.yahoo.net/node/3203)

~~~
sumzup
The link doesn't seem to work (error: "Unable to connect to database server");
do you have an alternate link?

~~~
sheldoan
Not a link to the full paper, but a summary is presented here:
[http://highscalability.com/blog/2012/1/17/paper-feeding-
fren...](http://highscalability.com/blog/2012/1/17/paper-feeding-frenzy-
selectively-materializing-users-event-f.html)

------
brown9-2
Here is a similar talk that Jeremy Cloud gave at QCon NY a few weeks ago:
[http://www.infoq.com/presentations/twitter-
soa](http://www.infoq.com/presentations/twitter-soa)

 _Jeremy Cloud discusses SOA at Twitter, approaches taken for maintaining high
levels of concurrency, and briefly touches on some functional design patterns
used to manage code complexity._

~~~
mistertrotsky
I would absolutely trust someone named Jeremy Cloud on this subject.

------
joshuaellinger
The surprise for me is that the core component is Redis.

My first guess would have been custom C code. Yeah, you have to do everything
yourself. Yeah, it would be hard to write. But you'd control every little bit
of it.

Obviously, I must not fully understand the problem and what Redis buys.

Sam Puralla (if you are reading) -- do you know why Twitter didn't go with a
fully custom system at its heart?

Josh

~~~
bitbckt
I'm probably a better person to answer this than Sam - I'm a former lead on
this project - so I'll take a swing:

We chose Redis because it gave us the specific, incremental improvements over
our existing memcached-based system that we required, without requiring us to
write (yet another) component. There was enough to do, and this choice has
turned out to be good enough, I think.

As the project progressed, though, we treated Redis much as if we did own it. We
altered the wire protocol, changed the eviction strategy, and reduced protocol
parsing overhead, for example. Much of that work has long since made it
upstream.

[edit] grammar

~~~
joshuaellinger
Sounds like good engineering choices. I think of Twitter as having unlimited
resources but, of course, that can't be true.

Two follow-on questions:

1. Did your changes make it back into open source or were they only relevant
to Twitter? When you say upstream, do you mean Redis itself or earlier in the
Twitter pipeline?

2. How much is Redis on the critical path? Is it 90% of the processing work
in the large fanout cases?

~~~
bitbckt
1. Yes, most everything we changed is in the open source Redis code base.
That's what I'm referring to as upstream, above.

2. Redis is in the critical path for a majority of API requests. I can't
provide a specific percentage.

------
smandou
Well... It seems that highscalability doesn't scale... Error from Squarespace
on that page.

------
ampersandy
No one else seems to have noticed that these details are out of date. Twitter
has publicly stated that they currently have "well over 200M active users".
The stats are also misleading in that -- I'm pretty sure -- the 300k reads and
6k writes per second only refer to tweets. Flock, Twitter's graph database,
handles more than 20k writes and 100k reads per second on its own (peak numbers
available from the two-year-old README on GitHub).

[https://blog.twitter.com/2013/celebrating-
twitter7](https://blog.twitter.com/2013/celebrating-twitter7)

------
webwanderings
> Twitter knows a lot about you from who you follow and what links you click
> on.

No kidding. But we don't care as we live in the glass house of a celebrity
culture.

PS: downvote the quoted text if you must. My point is not obvious.

~~~
StavrosK
You realize it's public data, right? When I tweet something, I don't expect it
to be private. HN knows a lot about me too.

~~~
webwanderings
Yup, I mentioned glass houses.

------
jebblue
>> it can take up to 5 minutes for a tweet to flow from Lady Gaga’s fingers to
her 31 million followers

Why not break the load up among a farm of servers? 5 minutes to deliver a
single message? It's too bad multicast can't be made to work for this use
case.

At least analyze to see if there's a pattern of geographic concentration of
her followers and optimize for where their datacenters are.

Use peer to peer, let the clients help distribute the messages.

------
tschellenbach
Their setup is very similar to what we use at Fashiolista (though of course
we only have millions and not hundreds of millions of users). We've open
sourced our approach and you can see an early example here:
[https://github.com/tschellenbach/Feedly/](https://github.com/tschellenbach/Feedly/)

------
pesenti
Does anybody know how that compares to Facebook? I believe they do not use
write fanout but instead rely on the search federation model (the model
claimed not to have worked for Twitter).

~~~
sheldoan
Facebook uses a fan-out-on-read approach

[http://www.quora.com/Facebook-Engineering/Did-Facebook-
devel...](http://www.quora.com/Facebook-Engineering/Did-Facebook-develop-a-
custom-in-memory-database-to-manage-its-News-Feeds)

------
kushti
Scala is an awesome language for implementing scalable things, like Twitter services.

~~~
swah
Why?

------
bpicolo
They have a lot of RAM. Dang.

~~~
buro9
Not as much as I would have thought.

2TB can be one server:
[http://www.supermicro.co.uk/products/system/5U/5086/SYS-5086...](http://www.supermicro.co.uk/products/system/5U/5086/SYS-5086B-TRF.cfm)

An expensive server for sure... but it's just one server.

~~~
vidarh
In case people have a hard time finding prices for this, my vendor gives ~57k
GBP ($85k) as the lower-end price for configurations with that SuperMicro
cabinet, 64 cores (8x 8-core CPUs) and 2TB of RAM. Dropping to 1TB lets you
pick from a lot of cheaper cabinets and so the price drops by quite a bit more
than half for the basic configurations.

------
papsosouid
I really question the current trend of creating big, complex, fragile
architectures to "be able to scale". These numbers are a great example of why,
the entire thing could run on a single server, in a very straight forward
setup. When you are creating a cluster for scalability, and it has less CPU,
RAM and IO than a single server, what are you gaining? They are only doing 6k
writes a second for crying out loud.

~~~
davidgerard
I do consider it a pathology when tiny services, or tiny apps in a corporate
structure, act like they have the problems of Google.

You are not Google.

You do not have Google's problems.

You do not have scaling issues.

For you, N is small and _will stay small_.

Stop giving me this delusional resume-padding garbage to implement. For you,
here, it is delusion and lies.

~~~
jacques_chester
The point here is that Twitter really is one of the cases where vertical
scaling is not, on the balance of non-functional requirements to engineering
overhead, the right decision. They really do need to pay the complexity piper.

~~~
davidgerard
Oh, yeah. I'm speaking to everyone else :-)

