
7 Lessons Learned While Building Reddit to 270 Million Page Views a Month - rjim86
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
======
arkitaip
I love reading about how companies scale their BigHuge data but it bothers me
that we still haven't reached the point where scalability is a commodity
instead of a patchwork of technology that every actor solves in their own
way.

~~~
nostrademons
It's because "how to make a website scale" depends heavily upon _which_
website, what it does, and how big it needs to be. Making a messaging queue
like Twitter or G+ scale is very different from making Google Search scale.
Hell, making the _indexing system_ of Google search scale is a very different
problem from making the _serving system_ scale.

You can't really avoid having a patchwork of technology, because it's a
patchwork of problems. Instead, there're a bunch of tools at your disposal, a
few "best practices" which are highly contextual, and you have to use your
judgment and knowledge of the problem domain to put them together.

~~~
batista
> _It's because "how to make a website scale" depends heavily upon which
> website, what it does, and how big it needs to be. Making a messaging queue
> like Twitter or G+ scale is very different from making Google Search scale.
> Hell, making the indexing system of Google search scale is a very different
> problem from making the serving system scale._

For most websites it's not THAT different.

Actually, most have pretty similar needs, and you can sum those up in 3-5
different website architectural styles anyway.

There is far more duplication of work and ad-hoc solutions to the SAME
problems than there are "heavily different" needs.

~~~
nostrademons
What would be those 3-5 different website architectural styles?

~~~
batista
News/Magazine/Portal-like (read heavy), Game site (evented, concurrent users,
game engine computations), Social Platform (read-write heavy), etc.

Most needs are bog standard. If you really look at most successful sites, they
mostly use same-ish architectures, only with different
components/languages/libs.

Basically all high-volume sites use something like the notions behind Google
App Engine and the services it offers. The various AWS tools are similar too
(S3, their table-style datastore, etc).

~~~
nostrademons
I think you're missing a lot of the complexity in the considerations that
actually go into implementing any of the above. I can think of 3 subsystems
within Reddit alone (reading, voting, and messages) that all have different
usage patterns and (if they're doing it right) require different approaches to
scaling.

Where's e-commerce on your list? The approaches for scaling eBay are
completely different than for scaling Reddit or YouTube, because eBay can't
afford eventual consistency. You can't rely on caching to show a buyer a page
whose price is an hour out-of-date.

Here's something else to think about: why do (the now-defunct) Google real-
time search and GMail Chat have completely different architectures, despite
both of them having the same basic structure of "a message comes in, and gets
displayed in a scrolling window on the screen"? The answer is latency. With
real-time search, a latency of 30 seconds is acceptable, since you aren't
going to know when the tweet was posted in the first place. With GChat, it
_has_ to be immediate, because it's frequently used in the context of someone
verbally saying "I'll ping you this link" and it's kinda embarrassing if the
link doesn't arrive for 30 seconds. Real-time search also has to run much more
computationally-intensive algorithms to determine relevance & ranking than
GChat does.

I've personally worked on Google Search, Google+, and Google Fiber. I can tell
you that they do _not_ all use something like the notions behind Google
AppEngine. There's no way you could build Google Search on AppEngine, and G+
would be a stretch.

~~~
batista
> _I can think of 3 subsystems within Reddit alone (reading, voting, and
> messages) that all have different usage patterns and (if they're doing it
> right) require different approaches to scaling._

Yes, and those would be the same in all social-bookmarking-type sites, and in
similar (voting, etc) components of a social site a la Facebook, G+, etc.

> _The answer is latency. With real-time search, a latency of 30 seconds is
> acceptable, since you aren't going to know when the tweet was posted in the
> first place. With GChat, it has to be immediate, because it's frequently
> used in the context of someone verbally saying "I'll ping you this link" and
> it's kinda embarrassing if the link doesn't arrive for 30 seconds._

Yes, so that's one use case for one architecture (low latency message queue),
and the other is another. I gave the example of a Game site that also has
similar latency concerns.

> _Where's e-commerce on your list? The approaches for scaling eBay are
> completely different than for scaling Reddit or YouTube, because eBay can't
> afford eventual consistency._

My list wasn't supposed to be exhaustive -- I spoke of 3-5 common styles and
only mentioned a few. That said, the approaches behind eBay might not resemble
Reddit or YouTube, but they will resemble others like Amazon, Etsy, etc.

> _I've personally worked on Google Search, Google+, and Google Fiber. I can
> tell you that they do not all use something like the notions behind Google
> AppEngine._

I don't think we mean the same things. For one, nobody makes search engines or
Google competitors, so what it takes to do Google Search is a moot point when
discussing common architectural patterns behind big sites.

I meant high-level stuff on one hand, like denormalized data, map-reduce,
workers, shared-nothing, sharding and such, and common infrastructure on the
other hand, like the relational db, memcached, a BigTable-like datastore, an
abstract filesystem (S3, BlobStore, GridFS, etc), ElasticSearch, Hadoop, Node,
Redis, message queues, etc.

Google Search or Facebook might have needs way beyond those, but the above are
shared by 99% of big sites out there.

A common system to build on top of them that is higher level than Heroku (and
more expansive and accommodating than GAE) should exist, and it would cater to
more than 80% of big website needs. Of course each site will need some custom
stuff, but not 80% custom stuff.

------
ndemoor
"Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a
Thing: users, links, comments, subreddits, awards, etc. Things keep common
attribute like up/down votes, a type, and creation date. The Data table has
three columns: thing id, key, value."

I hope they introduced some NoSQL sweetness by now.
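
To make the layout concrete, here's a minimal sketch of that Thing/Data
(entity-attribute-value) design in Python/sqlite3 -- the table and column
names are illustrative, not Reddit's actual schema:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per "thing", holding only the attributes common to all types.
        CREATE TABLE thing (
            id         INTEGER PRIMARY KEY,
            type       TEXT,      -- 'user', 'link', 'comment', ...
            ups        INTEGER,
            downs      INTEGER,
            created_at TEXT
        );
        -- Everything type-specific goes here as key/value rows.
        CREATE TABLE data (
            thing_id INTEGER REFERENCES thing(id),
            key      TEXT,
            value    TEXT
        );
    """)
    conn.execute("INSERT INTO thing VALUES (1, 'link', 10, 2, '2010-05-17')")
    conn.execute("INSERT INTO data VALUES (1, 'url', 'http://example.com')")
    conn.execute("INSERT INTO data VALUES (1, 'title', 'Example')")
    # Reassembling a thing means pivoting its key/value rows back into a dict.
    attrs = dict(conn.execute("SELECT key, value FROM data WHERE thing_id = 1"))
    print(attrs)  # {'url': 'http://example.com', 'title': 'Example'}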

~~~
encoderer
Exactly.

They turned their RDBMS into a NoSQL database that's still slowed down by all
the relational machinery.

Though I'm sure this was an informed choice at the time, I seriously hope
nobody finds this advice actionable anymore. Use Cassandra. (Or your NoSQL DB
of choice)

~~~
jebblue
The only problem with NoSQL is what happens when one day _you need to relate
data_?

~~~
encoderer
It's an all-of-the-above strategy.

I remember when the Bigtable paper was released. It was very early in my
career and I remember it sounding so alien to me. Sure, I had Memcached in my
stack, but no SQL? It seemed like something they had to trade off to be able
to build the kind of services they offer. I felt the same way after reading
Dynamo.

Sure, I thought a lot about data design. I thought about usage patterns to
inform how we denormalize. And I grew into using, e.g., Gearman to pre-compute
dozens of tables every night. I evolved, a bit.

But a few years ago, a little before this OP was written, I had a great
experience with some Facebook engineers and had an a-ha moment that has made
me a much better software engineer. Basically, I realized that I needed to let
my data be itself. If I have inherently relational data, then it should be in
a relational database. But I've built EAVs, queues, heaps, lists, all of these
on top of MySQL and Postgres. Let that data be itself.

We have more options now than ever before. K/V stores, column stores, etc. I
use a lot of Cassandra. A lot of Redis. Some Mongo. And I put a lot more in
flat files than I ever thought I would.

I know a lot of people who are smarter than me left the womb knowing these
things. But for me it was transformational and has made me much happier. I
realized how much energy I had wasted fighting my own tools.

~~~
notimetorelax
Does it mean that in a single project you may use two different storage
solutions? SQL DB for relational data and NoSQL for queues, heaps, etc.?

I'm asking because for me it's a kind of paradigm shift. I have used plain
files and a SQL DB in the same project before, but never two different
databases.

~~~
encoderer
The other two commenters here beat me to it, but yes, absolutely. But be sure
to right-size. If you don't have scalability issues, then your life is easier.
Postgres has a k/v store option. Use that. Get more complex if you need to.
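
(As a minimal sketch of that hstore option -- Python/psycopg2, with made-up
table and connection details:)

    import psycopg2
    from psycopg2.extras import register_hstore

    conn = psycopg2.connect(dbname="app")   # assumes a reachable Postgres
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
    register_hstore(conn)   # adapt hstore columns to/from Python dicts
    cur.execute("""
        CREATE TABLE IF NOT EXISTS user_prefs (
            user_id integer PRIMARY KEY,
            prefs   hstore
        )
    """)
    # Arbitrary key/value pairs, no schema migration per new key.
    cur.execute("INSERT INTO user_prefs VALUES (%s, %s)",
                (42, {"theme": "dark", "lang": "en"}))
    cur.execute("SELECT prefs -> 'theme' FROM user_prefs WHERE user_id = %s",
                (42,))
    print(cur.fetchone()[0])  # dark
    conn.commit()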

It's also perfectly acceptable to duplicate data -- store everything in
Postgres or MySQL as your "master", cache all query results in Redis to avoid
hitting the database whenever possible, and then build out
lists/indexes/copies in a NoSQL database. Because yeah, corruption happens and
it's great to be able to wipe and rebuild.

A common pattern would be... I dunno... think of a newsfeed.

* Each item is one entry in an "items" table, maybe with a "likes" int column.

* For each item, you have a sorted set in Redis of likers.

* For every user, you have a sorted set for their newsfeed, with the IDs of each item.

* When they hit the page, you load that sorted set, then query for the first 20 items from your database by ID (so it's a primary-key lookup, and it probably won't even need to hit the database because your Redis cache layer will see it has that ID in cache already).

* You'd use Redis's MGET (multi-get) feature to reduce round-trip calls.

* When somebody "likes" something, you write a job to a queue, which will eventually add the liker to the bottom of the sorted set, then probably store the details of the Like itself (timestamp, user, etc) in a slower datastore, which could even be a transactional log that you'd never read again unless you had some significant issue.

I've never actually built a newsfeed so there are certainly issues here, but
maybe it will be helpful as a general idea (a rough sketch in code follows
below). Remember, somebody, can't remember whom, said "The only truly hard
things in computer science are cache invalidation and naming things." Cache
invalidation (knowing when to go back to source data because your cache is
stale) is still an "analog" problem. By that I mean there are a thousand
shades of grey there and you have to pick a strategy that works for you.
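
Very roughly, the read path above might look like this (Python with redis-py;
every name here is made up):

    import redis

    r = redis.Redis()  # assumes a local Redis

    def load_item_from_database(item_id):
        # Stub standing in for a primary-key lookup in Postgres/MySQL.
        return b"item-body-for-" + item_id

    def newsfeed_page(user_id, start=0, count=20):
        # The feed is a sorted set of item IDs, newest first.
        item_ids = r.zrevrange("feed:%d" % user_id, start, start + count - 1)
        if not item_ids:
            return []
        # One MGET ("MultiGet") fetches all cached bodies in a single round trip.
        bodies = r.mget(["item:%s" % i.decode() for i in item_ids])
        page = []
        for item_id, body in zip(item_ids, bodies):
            if body is None:                          # cache miss
                body = load_item_from_database(item_id)
                r.set("item:%s" % item_id.decode(), body)
            page.append(body)
        return page

    def like(user_id, item_id):
        # Don't write inline; enqueue a job that a worker drains later.
        r.rpush("jobs:likes", "%s:%s" % (user_id, item_id))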

~~~
notimetorelax
Thank you for the example, it's interesting to see k/v storage used for
caching. I'll keep it in mind!

------
chaz
Their most recent infrastructure blog post was from January 2012, and shows
that they're using Postgres 9, Cassandra 0.8, and local disk only (no more
EBS). I'm curious if the recently-announced provisioned IOPS would enable them
to go back to EBS.

[http://blog.reddit.com/2012/01/january-2012-state-of-servers...](http://blog.reddit.com/2012/01/january-2012-state-of-servers.html)

~~~
mdellabitta
Probably the better move would be to go to SSD-backed High I/O instances.
Netflix did: [http://techblog.netflix.com/2012/07/benchmarking-high-perfor...](http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html)

------
ecaron
Only disagreement (although I feel like I'm arguing w/ Linus about git) is:
don't memcache session data (lesson 5). Memcached's 1MB max block size
(exceeding it removes too many performance perks to be considered viable)
introduces a "I need to constantly worry about my sessions getting too big"
mental overhead that isn't worth it.

Go with Redis for storing session data.

~~~
encoderer
Even better: Don't use sessions. If there's data you're caching in a session,
pre-compute it and store it in Redis (or other suitable NoSQL database).

Going stateless wherever possible almost always wins. (And I only say
"almost" because I'm not smart enough to know if it really _always_ wins. But
for me, it has.)

~~~
irahul
> Even better: Don't use sessions. If there's data you're caching in a
> session, pre-compute it and store it in Redis (or other suitable NoSQL
> database).

What do you mean by _data you are caching in a session_? A session is for
maintaining state, viz. logged in or not, language preference, etc.

How is storing in Redis (or any other data store) any different from storing
in Memcache?

> Going stateless wherever possible almost always wins.

Using sessions with a data store which can be accessed from multiple
machines (no files or in-memory stores) is stateless. The simplest
implementation will have a signed cookie with a session id. Your application
will have a hook to load the user session from the backing data store before
processing the request.
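
A bare-bones sketch of that hook (the signing scheme and store interface here
are illustrative, not any particular framework's):

    import hashlib
    import hmac

    SECRET = b"change-me"  # server-side secret used to sign session ids

    def sign(session_id):
        mac = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()
        return "%s.%s" % (session_id, mac)      # value to set as the cookie

    def verify(cookie_value):
        session_id, _, mac = cookie_value.rpartition(".")
        expected = hmac.new(SECRET, session_id.encode(),
                            hashlib.sha256).hexdigest()
        return session_id if hmac.compare_digest(mac, expected) else None

    def load_session(cookie_value, store):
        # "store" is anything with a get(): memcached, Redis, a dict in tests.
        session_id = verify(cookie_value)
        if session_id is None:
            return {}        # anonymous or tampered: no session to load
        return store.get("session:%s" % session_id) or {}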

~~~
encoderer
Once you move from storing session data in-memory on the webserver, and add a
network call, why not just store the data alongside the user's other data in a
fast datastore? That could be Redis, Cassandra, whatever.

This isn't babble -- it's honestly a pretty common technique. If your site
sees millions of users, storing a session for visitors that aren't logged in
is prohibitive and unnecessary. You can store UI customizations and basic
memoization in a client-side cookie if you need to.

If you're building something small or basic, then you probably won't have
multiple webservers and you can use fast in-memory sessions without concern.
This only applies once you need to worry about scale.

~~~
irahul
> Once you move from storing session data in-memory on the webserver, and add
> a network call,

I don't move from storing session data in-memory; I never store session data
in-memory. Well, I do indirectly, but that's always some distributed memory
system. And there isn't necessarily a network call. A single webserver
accessing memcache running on the same machine is as fast as accessing an
in-memory cache. When there is more than one memcache server, there has to be
a network call, but I have yet to see an application where that is an
overhead. Compared to disk access, it's a rounding error.

> why not just store the data alongside the users other data in a fast
> datastore? That could be Redis, Cassandra, whatever.

What advantage would I have from storing the session alongside the user data?
My datastore isn't Redis or Cassandra; it's PostgreSQL. Redis is my cache/data
structure server. The only NoSQL solution I would consider for user data is
Mongo.

> This isn't babble -- it's honestly a pretty common technique.

Other than your insistence on using Redis, I don't see how your technique is
any different from mine.

> If your site sees millions of users, storing a session for visitors that
> aren't logged in is prohibitive and unnecessary.

If the user isn't logged in and doesn't have any user data, there is no
session to be loaded. When the user logs in, set the secure cookie with the
session id. When a new request comes in, if the secure cookie has the session
id, load it. For a not-logged-in user, there is no session id and nothing is
loaded.

> You can store UI customizations and basic memoization in a client-side
> cookie if you need to.

The only thing I will store in the cookie is the session id. Cookie length
constraints and the network payload on every request are more overhead than
loading the session on the server side.

> If you're building something small or basic, then you probably won't have
> multiple webservers and you can use fast in-memory sessions without concern.
> This only applies once you need to worry about scale.

For sessions, session id in cookie and memcache/redis as session store works
for all scale.

~~~
encoderer
If you've got a single webserver then we're speaking a different language.
That's not an insult or anything, it's just that I'm talking about techniques
we use to support millions of pageviews a day.

There is no one way to build a system, of course. I'd love to hear more about
your lessons learned, but this really isn't a good forum for that. Though even
at your scale you should consider Redis as a drop-in replacement for memcache.
It benchmarks faster in many cases, and has support for data structures
(lists, sorted sets, etc) that make your life easier.

And see my comment below clarifying what you said about cookies.

> For sessions, session id in cookie and memcache/redis as session store works
> for all scale.

Kind of a bold statement? GL with that.

~~~
irahul
> If you've got a single webserver then we're speaking a different language.
> That's not an insult or anything, it's just that I'm talking about
> techniques we use to support millions of pageviews a day.

And where did I talk about a single webserver? You said "Once you move from
storing session data in-memory on the webserver, and add a network call," to
which I said I never do in-memory sessions, even for a single webserver. Get
over yourself - you aren't a special snowflake who deals with more than one
webserver.

> Though even at your scale you should consider redis as a drop-in replacement
> for memcache. It benchmarks faster in many cases, and has support for data
> structures (lists, sorted sets, etc) that make your life easier.

"Though even at your scale "

As I said, get over yourself. All you have offered is that somehow storing
sessions in Redis makes you scale.

How on earth would you know what scale I am talking about? I don't remember
mentioning it.

And I know what Redis does. Cargo-cult mentality, viz. "redis is really better
than memcached", is the main reason behind fucked-up systems.

> And see my comment below clarifying what you said about cookies.

Yes, I saw your comment. Local storage is not the solution for the coming year
or so. I am paid to design systems that work, not systems that might work.

> For sessions, session id in cookie and memcache/redis as session store works
> for all scale.

>> Kind of a bold statement? GL with that.

I don't know where you are getting your numbers from, but a million
pageviews/day, as you mention again and again, is a very nominal number for a
generic webapp (unless you are very cache-unfriendly, viz. reddit). That isn't
something you even have to think about. A standard Rails app sitting behind
nginx with 4-5 webservers will do it just fine.

And for the last time, "session id in cookies and then load the session before
the request" fucking works for everyone, including Facebook and Google. The
only thing that differs is the choice of session store, and no, Redis isn't
the catchall solution. For most cases, memcache is faster.

~~~
encoderer
Wow, ego issues?

You have taken this all very personally. I apologize for offering a different
take on system design than what you apparently believe in very strongly? I've
said a few times there's "no one way."

The systems we've built have served over-billion-page-view months. That's not
common or easy. HN is a site that values a back-and-forth about that kind of
experience, and I'd have loved to hear your tips thrown out there too. But
this has become some strange ego thing for you, so it's time for me to bow
out.

~~~
irahul
> Wow, ego issues?

You are in way over your head.

> You have taken this all very personally. I apologize for offering a
> different take on system design than what you apparently believe in very
> strongly?

No, you haven't offered anything other than "you should use Redis". You
started with "if you are caching in sessions"; I don't know where you got the
idea that sessions are caches.

> The systems we've built have served over-billion-page-view months. That's
> not common or easy.

Good for you. But have you actually used local storage for storing
user-specific data? And have you compared using Redis and memcached for
session storage? I don't have a problem with a difference in opinion - it's
just that your opinions aren't valid.

------
citricsquid
Article is from 2010; if I remember correctly, their architecture has changed
substantially since then.

------
dotborg
How are PHP, RoR, node.js, Perl/CGI etc. engineers supposed to build offline
processing? Crontab?

~~~
CWIZO
We use gearman (and PHP, but it works with many other languages). It's great.
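
For a rough idea of the pattern (a generic sketch using a Redis list as the
queue, in Python for brevity -- not Gearman's actual API, though Gearman's
client/worker split works similarly):

    import json
    import redis

    r = redis.Redis()  # assumes a local Redis

    def submit(job_type, payload):
        # Called from the web request; returns immediately.
        r.rpush("jobs", json.dumps({"type": job_type, "payload": payload}))

    def worker():
        # A separate long-running process, outside any web request.
        while True:
            _, raw = r.blpop("jobs")   # blocks until a job arrives
            job = json.loads(raw)
            print("processing", job["type"], job["payload"])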

~~~
dotborg
yet another piece of software to maintain?

~~~
CWIZO
If you need background processing then you really can't afford to not
introduce some new stuff in your stack.

------
surferbayarea
that's just ~12 queries/sec... not that huge a deal...

~~~
jonknee
There are 2,592,000 seconds in a 30-day month, so that's actually more like
104 page views per second (270,000,000 / 2,592,000 ≈ 104). But that also does
not count peaks (peak is likely 250+), and each page view requires multiple
queries.

Either way, no need to hate on the traffic figures of what everyone knows is a
very large website.

~~~
masklinn
Also, it was back in 2009[0]. In December 2011, they were up to 2.07bn page
views (~750 pages/sec on average)

[0] according to [http://blog.reddit.com/2012/01/january-2012-state-of-servers...](http://blog.reddit.com/2012/01/january-2012-state-of-servers.html), December 2010 saw 829 million page views

~~~
ralfd
Traffic peaks are also interesting. The IAMA from President Obama a few days
ago really stress-tested Reddit:

<http://blog.reddit.com/2012/08/potus-iama-stats.html>

> At the peak of the IAMA reddit was receiving over 100,000 pageviews per
> minute.

> In preparation for the IAMA, we initially added 30 dedicated servers (~20%
> increase) just for the comment thread. This turned out not to be enough, so
> we added another 30 dedicated servers to the mix. At peak, we were
> transferring 48 MB per second of reddit to the internet. This much traffic
> overwhelmed our load balancers, which caused a lot of the slowness you
> probably experienced on reddit.

------
petercooper
It's pretty cool how nowadays Redis can cover several of those concerns
quickly and simply (open schema, caching, replication...).

