

Reddit: 2012 State of the Servers - blutonium
http://blog.reddit.com/2012/01/january-2012-state-of-servers.html

======
gameshot911
Having no experience with database/website administration myself, I'm struck
by just how _little_ I'm able to translate the work and concepts in this post
into a sense of the actual, manual labor involved.

For each and every thing that Jason talked about...upgrading Cassandra, moving
off EBS, embarking on self-heal and auto-scale projects...what took the reader
a few seconds to read and cognise undoubtedly represented hours and hours of
work on the part of the Reddit admins.

I guess it's just the nature of the human mind. I don't think I could ever
fully appreciate the amount of work that goes into _any_ project unless I've
been through it myself (and even then, the brain is awesome at minimizing the
memory of pain). So Reddit admins, if you're reading this, while I certainly
can't fully appreciate the amount of labor and life-force you've dedicated to
the site, I honestly do appreciate it, and I wish you guys nothing but success
in the future!

------
markerdmann
It's interesting to see that they're sticking with Cassandra, and that they're
having a much better experience with 0.8. I've been hearing so many fellow
coders in SF hate on Cassandra that I had stopped considering it for projects.
Has anybody worked with 0.8 or 1.0? Would you recommend Cassandra?

I got to work with Riak a lot while I was at DotCloud, but the speed issue was
pretty frustrating (it can be painfully slow).

~~~
rbranson
This is because people came to the table with unrealistic expectations. They
were used to dealing with mature software based on decades-old, proven ideas,
then came into very experimental territory expecting a smooth experience.

Cassandra has enabled Reddit to manage a highly scalable distributed data
store with a tiny staff. This is not to say it has been trouble free, but it
has enabled them to do something that would have been infeasible without
pioneers in this space (Cassandra, Riak, Voldemort, etc) making these tools
available.

~~~
onemoreact
I respect the Reddit team, but I don't think they need to use Cassandra at
their scale. I mean they only have 2TB of data in total. They should easily be
able to use a simple caching system to keep the last 2 weeks of data in RAM
and basically never read from the database.

That said, they may be freaked out by their growth curve and simply be
thinking ahead.

~~~
techscruggs
They said that they had 2TB in Postgres, not 2TB of total data. I imagine all
of their data is probably about an order of magnitude larger. Additionally,
the challenges are not so much about how much data you have as about how you
want to access that data (indexes).

~~~
rbranson
Indeed. It boils down to their need for a durable cache. It's simply too
expensive to try to cache every comment tree in RAM, and Cassandra's data
model and disk storage layout are a really good fit for the structure of their
data.

~~~
onemoreact
You don't need every comment tree in RAM, just the last few days' worth plus a
few older threads that get linked back to. They are currently using 200
machines, so let's say 10 of them are used to cache one week's comments. 30 GB
of RAM * 10 machines = 300GB of cache. I would be very surprised if they
generate 200GB/week or 10TB of comment data a year.

Edit: For comparison, Slashdot ran for a long time on just 6 less powerful
machines vs. the 200+ Reddit is using. Reddit may have more traffic, but not
40x as much. And, last I heard, HN just uses one machine.

PS: The average comment is small, and they can compress most comments after a
day or so. They can probably get away with storing a second copy of most old
threads as a blob of data in case people actually open them, which costs a
little space but cuts down on processing time.
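The back-of-envelope math above can be checked in a few lines. All of the figures here are the commenter's own guesses (10 cache machines, 30 GB of RAM each, a guessed 200GB/week of new comments), not actual Reddit numbers:

```python
# Back-of-envelope check of the cache-sizing argument above.
# Every figure is the commenter's assumption, not a measured value.
machines = 10
ram_per_machine_gb = 30
cache_gb = machines * ram_per_machine_gb          # 300 GB of cache

est_new_comments_gb_per_week = 200                # guessed upper bound
headroom = cache_gb / est_new_comments_gb_per_week

print(f"{cache_gb} GB cache vs ~{est_new_comments_gb_per_week} GB/week "
      f"-> {headroom:.1f}x headroom")
```

On these assumptions the cache would hold about a week and a half of comments, which is the crux of the "you don't need Cassandra" argument.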

~~~
rbranson
Please. Reddit does 2 billion page views per month.

~~~
onemoreact
Yeah, and Cassandra was built for a company serving 500 times that.

------
thought_alarm
It reminds me of Slashdot circa 1998/99, back when we watched those guys grow
their then-new-found popularity out of a dorm-room Linux box, at a time when
the web was a mere fraction of the size it is today.

Godspeed, reddit. You're on the right track.

------
joevandyk
They say they moved off EBS and onto local storage for Postgres and saw a big
increase in reliability and performance.

I did the same for my site last year and it was great.

This is one of the reasons why I haven't moved my Postgres databases to
EnterpriseDB or Heroku: they use EBS.

~~~
x3c
But how do you achieve data persistence in case of a server crash? Snapshots
are not reliable for that, and slave DB servers aren't foolproof either.

~~~
fleitz
You just copy the WAL log to another server and replay it. It takes a day to
set up and test. Once that is set up you have two options: async replication
(which means you'll lose about 100ms of data in the event of a crash) or sync
replication, which means the transaction doesn't commit until the WAL log is
replicated on the other server. (That adds latency but doesn't really affect
throughput.)

I'm not exactly sure how the failover system works in Postgres; the last time
I set up replication on Postgres it would only copy the WAL log after it was
fully written, but I know they have a much more fine-grained system now.
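For anyone wanting to try this, the pieces described above map onto a handful of settings in Postgres-9.1-era streaming replication. A minimal sketch; the hostname and standby name are placeholders, not anything Reddit uses:

```
# primary: postgresql.conf
wal_level = hot_standby                 # write enough WAL for a standby to replay
max_wal_senders = 3                     # allow streaming connections from standbys
synchronous_standby_names = 'standby1'  # drop this line for async replication

# standby: recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 application_name=standby1'
```

With `synchronous_standby_names` set, commits wait for the named standby to acknowledge the WAL, which is the sync mode (and the latency trade-off) described above.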

If you use SQL Server you can add a 3rd monitoring server, and your
connections fail over to the new master pretty much automatically as long as
you add the 2nd server to your connection string. The setup with a 3rd server
can create some very strange failure modes, though.

~~~
notaddicted
Another possibility on AWS is to put the logs on S3, which offers high
durability. Heroku has published a tool for doing this,
<https://github.com/heroku/WAL-E>, which they use to manage their database
product.
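WAL-E hooks into Postgres's `archive_command`; a sketch based on its README, with the env directory and data path as illustrative placeholders:

```
# postgresql.conf on the primary: push each finished WAL segment to S3.
# envdir supplies the AWS credentials and WALE_S3_PREFIX from /etc/wal-e.d/env.
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
```

A periodic `wal-e backup-push` of the data directory then provides the base backups that the archived segments are replayed on top of during a restore.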

------
cluda01
I'm unfamiliar with hosting costs, or really any of the costs of running a
site as popular as reddit. Anyone with experience in this area have a ballpark
figure for how much it would cost per month to run this sort of setup?

~~~
rdouble
$300K

~~~
someone13
Where do you get this estimate from? (Not disbelieving you, just curious)

~~~
bru
A year and a half ago, it was calculated and then confirmed by an admin[1]
that the monthly cost was around 22K/month, or roughly 270K/year. jedberg
added that they were projecting to be around 350K/year by the end of 2010.

Supposing that the cost increased linearly with the number of users (which
sounds like a bad hypothesis, but is a start), the cost at the end of 2011
could be around 1M/year... That's impressive, but nowhere near the 300K/month
proposed by rdouble.

So I would say that the monthly cost of reddit's infrastructure is around 90K.
Which is really impressive.
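The arithmetic behind that guess, spelled out. The 22K/month and 350K/year figures come from the linked 2010 comment; the linear-growth extrapolation is the commenter's own rough model:

```python
# Figures from the 2010 admin comment; the growth model is a guess.
monthly_2010 = 22_000
yearly_2010 = monthly_2010 * 12            # 264K/year, i.e. "around 270K"
projected_2010_yearly = 350_000            # jedberg's end-of-2010 projection

# Assume roughly linear growth pushes cost to ~1M/year by end of 2011.
projected_2011_yearly = 1_000_000
projected_2011_monthly = projected_2011_yearly / 12

print(round(projected_2011_monthly))       # roughly 83K/month, hence "around 90K"
```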

1:
[http://www.reddit.com/r/blog/comments/ctz7c/your_gold_dollar...](http://www.reddit.com/r/blog/comments/ctz7c/your_gold_dollars_at_work/c0v8yby&context=2)

~~~
rdouble
You're probably right as I calculated with expensive instances. Also, when I
made my estimate I was guessing at image storage costs, forgetting that the
images are coming from image sharing sites.

------
ypcx
Wondering how much of that 2TB dataset is necessary for the common daily
functionality of reddit. Probably less than 1%, with the rest being historical
data accessed by almost no one, except perhaps by the submission-dupe-checking
algorithms and similar?

~~~
rplnt
Are you suggesting moving that down the ladder and not having all that data
everywhere? I.e., if someone wants to see an old post, there would be one
extra step required to load the data (so CDN, Cassandra, now a subset of
Postgres with not-so-old data, "full" Postgres). I think Facebook does
something similar, but they really have to, considering their size.

~~~
nbm
In terms of status updates (i.e., stories which may mention check-ins or
photos or similar, but not Facebook Messages), before Facebook's Timeline
launch there were multiple stores of data depending on age. With Timeline, all
the different versions of data over all ages were put back together into a
single (logical) store. More about that process at:

    https://www.facebook.com/notes/facebook-engineering/building-timeline-scaling-up-to-hold-your-life-story/10150468255628920

------
brador
Could we get a public backup of the database already? Make it a torrent if
bandwidth is an issue, but let's back that amazing resource up.

~~~
obtu
Terabytes are starting to be expensive to mirror, in terms of bandwidth and
storage.

------
zerostar07
Those are staggering numbers, glad i invested my time in reddit last year. We
must be cautious of overheating though, signs of a bubble or a possible
subreddit crisis.

------
ctekin
Does anyone know what kind of hardware those 240 servers have? I wonder how
much they cost.

~~~
davej
They're EC2 large and x-large instances.

------
Ecio78
What about IndexTank? They don't talk about it in this blog post. Have they
stopped using it?

~~~
eco
They still use it. Whenever you search you get the "Powered by IndexTank" logo
in the corner.

------
fleitz
Running a DB on a single spindle, and they have performance problems?

I couldn't imagine why.

2 TB? OMG, that's almost a decent-sized SQL Server instance. Yeah, it should
take about an hour or two to replicate. I'm assuming they have 10Gb Ethernet
on their DB server.

