Reddit costs $33k/month to operate? (reddit.com)
125 points by there on July 26, 2010 | 158 comments

That's actually not terribly surprising. Reddit's always been extremely conservative with ad space, and it seems like they could offset a lot of that if they started running more "traditional" ad slots.

By their own numbers, they do about 14.3m pageviews/day. Let's assume half of that is AdBlocked. A single 300x250 at an exceptionally low $0.05 CPM would yield $3,575/day in ad revenue, or $107,250/month. I'll guarantee that reddit could pull far, far higher CPMs. Like, an order of magnitude or two higher.

Edit: I'm off by an order of magnitude, see below. Point stands, though, that there is a lot of ad money they're leaving on the table.

I agree with your main point but your math is off by an order of magnitude:

(14,300,000 x $0.05 x 0.5)/1000 = $357.50/day
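The corrected arithmetic is easy to check with a short sketch. The function and parameter names are made up for illustration; the inputs are the figures quoted in this thread, and a 30-day month is assumed.

```python
def daily_ad_revenue(pageviews_per_day, cpm_usd, block_rate, ad_units=1):
    """Estimate ad revenue per day; CPM is the price per 1,000 impressions."""
    # Only impressions that are not ad-blocked can earn anything.
    visible_impressions = pageviews_per_day * (1 - block_rate) * ad_units
    return visible_impressions / 1000 * cpm_usd


# Figures from the thread: 14.3M pageviews/day, $0.05 CPM, 50% blocked.
per_day = daily_ad_revenue(14_300_000, cpm_usd=0.05, block_rate=0.5)
print(f"${per_day:,.2f}/day, ${per_day * 30:,.2f}/month (30-day month assumed)")
```

That comes to about $357.50/day, or roughly $10,725 over a 30-day month, which is a tenth of the figure in the original comment.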

I still think Reddit could easily pay their costs by running ads, not sure why they are not doing so.

Their user base is outright hostile to the idea.

Users are actively hostile to ads in general. Reddit's userbase, in particular, has been spoiled by the staff's similar attitude towards ads. The fact remains that computing power, bandwidth, and storage takes money, which is easy for your average hostile-to-ads user to forget.

If they dropped in AdSense units, there would be an uproar for about a week, then people would adapt and life would go on with a significantly bolstered revenue stream. Worst that happens is that more users would still be adblocking, but given their page view volume compared to their expenses, I can't imagine that even that would be a net loss.

> The fact remains that computing power, bandwidth, and storage takes money, which is easy for your average hostile-to-ads user to forget.

Uh... therefore reddit gold. That's the point.

Reddit gold will probably bring in 2% of what adsense would bring in (If the userbase was 'better')

I think you are completely wrong about that. Even at a very low conversion rate subscriptions pull in an awful lot of money. Typically my subscription income outweighs my ad income by about 5 to 1 in a given month. And my site is far smaller than reddit.

The key to this is user retention, and if reddit users are loyal to reddit (which they seem to be, especially those willing to pull their credit cards) I wouldn't be surprised to see retentions on the order of 6 months or more.

What do you base the 2% on?

If you look at the internet as a whole, the number of $ revenue from 'subscriptions' vs number of $ from advertising, then the % is probably far lower. <1%. Also my personal experience has been that it's far far easier to make money from advertising. Especially for something like Reddit where there is little incentive for people to pay.

For Reddit, yes you'll get a number of die-hard fans who want to subscribe. But I don't believe that there is a big enough value proposition for the userbase to do that en-masse. They may as well just set up their own reddit since the code is open source (And wouldn't be that hard to recreate if it wasn't).

It might be different if:

  A). Reddit wasn't spending crazy money on servers
  B). Reddit wasn't owned by a multinational corporation
  C). Reddit hadn't pandered to, and cultivated a staunchly
      anti advertising userbase.
Right now though, it's looking like they're going to try a subscription based news service. Which fails time and time again. People don't want to pay for online news.

Also from their latest blog post, it looks like they're just spending the 'reddit gold' money on more hardware! Instead of fixing the underlying issues.

I agree that their hardware strategies are bordering on the insane, I can host much (and really, like 10 times or so) cheaper by simply getting dedicated servers with fat pipes than I could ever do using EC2, that part makes no sense at all to me. Scaling issues aside, if they ran an efficient shop serverwise I think they could easily operate the whole thing from their subscription potential. Typically you can count on between 0.5 and 2% of your users signing up for a 'gold' service, provided you give them some extra goodies on top of the free product.

The 'news' angle is a silly one, but they could definitely think up features that people would pay for that are not available right now in the free product.

The real issue with reddit making money from advertising (aside from the ad blocking) is that the CPMs that are quoted here (between $2 and $9) are not realistic for their number of pageviews. By the time all the unsold inventory is taken out you probably end up with a $0.05 ECPM or maybe 10 cents per click (and that would be pretty good).

I'm really interested in how much they were paying before for their bandwidth and hosting if going to EC2 actually lowered their costs, they must have had the worst deal on the net for that to be true.

Right now, $30K / month buys you 20 (very) fat servers and 20 Gbps flat rate, managed hosting.

I'd really like to see someone make the case they can get that kind of performance out of EC2 for a similar cost.

>Also from their latest blog post, it looks like they're just spending the 'reddit gold' money on more hardware!

And hiring a designer! Because, you know, that'll fix their load issues.

Joking aside, a good designer can save you bandwidth money if they know what they are doing. (Please note I used the word "good".)

From elsewhere, only 10% of the cost is bandwidth. 90% is CPU?!!! Because Reddit is so computationally expensive apparently...

Considering reddit's current bare-bones design, a designer would have a hard time coming up with something eating less bandwidth, lest he removed all colors.

You can waste a lot of bytes on a bare-bones design if done poorly. A couple of years ago there was a blog post (can't find it now) about a designer redoing slashdot.org using CSS, and that redesign had a pretty large % change in size.

Slashdot was tables-based back in the day, which can't help but be verbose. :)

(The point stands, though, that "basic" doesn't mean "light").

They have conditioned their user base to be hostile to the idea.

Maybe, but I'd guess a significant portion arrived that way. Advertising on the web is often extraordinarily obnoxious, and makes few friends.

It'll be interesting to see whether they're more hostile to advertisements or not having a site.

I'm not entirely sure that's true. Most active redditors I know, including myself, have reddit whitelisted on AdBlock.

Whitelisting solves nothing. It's still a 'bad' userbase. It's been cultivated to be anti consumerist anti advertising etc.

Whitelisting on adblock will let through a tiny bit of CPM based revenue, but people won't click in the numbers that 'normal' people do.

How do you cultivate a user base in a particular direction, when all the content is submitted by the users?

The only way I can think of to cultivate an anti-ad userbase is to not have ads, but Reddit has had advertising for a long time. For a while it was even a bit obnoxious.

Right. You need to set the tone for a community while it is growing.

I was thinking that my numbers seemed high, but mentally chalked that up to the fact that they do $overflow pageviews/month. Thanks.

Still though, I agree. At something like a $2 CPM (which is NOT that hard to get for someone their size) they'd still be pulling in insane money for just one ad unit.

Using _pi's figure of $8 CPM, we get 160 times this: $57,200?

No one seems to be mentioning other revenue streams: user stats, customised reddits, merchandise sales, ...?

There's not a chance in the world that reddit could get $8 CPMs. They might be able to sell ads on one or two subreddits for that, but I doubt it.

They could get more than $8 because there would only be a single ad on the page. Roadblock rates (i.e. when a single advertiser buys out every ad slot on the site) are $65+ CPM on sites like Gawker and Techcrunch. If you assume a 50% fill rate on reddit, it would be very realistic for the site to achieve a $20+ ECPM.

The users now have an option to not see ads (ie. buy a gold subscription), so they can't really complain about having a single ad slot on the page for non-gold users.

To go further, they could not show the ad slot at all if there isn't a campaign running on the site (ie. don't show 'backfill') which would further increase the value of the slot.

$100k+ a month from that ad slot would be very very easy to achieve, and with minimal sales work (they could outsource it to Doubleclick or another network). Sites like Techcrunch and Gawker make millions from ad revenue with a fraction of the traffic - reddit could hit 6 figures a month with a single unit that isn't even displayed all the time.

It would be a perfect balance of retaining the style of the site along with bringing in revenue

TBH, when CN bought reddit, this is what I thought they would do

You can't compare content sites to aggregators. Even if you could, the Gawker properties have really good demographics (as far as advertisers are concerned).

Do you really think that they could pull $8 CPM?

What's digg getting? A bit of googling shows that most people who get front-paged see ECPM of <$1...

Why do you think that reddit would be so much higher?

(I agree, they should be showing more ads)

It's really their rate, check the link under my comment thread.

$130k/year with your numbers.

That'll cover the servers, easy, but what about salaries?

That's with a 5-cent CPM and a 50% block rate. Those are laughably conservative numbers. The actual numbers they could pull would be massively larger.

Just wondering, thanks for letting me know.

I have no idea what constitutes normal ad rates for a site because all the projects I've worked on have been pathological cases in either direction.

What would some realistic numbers for something like Reddit be?

Maybe a $0.50 CPM * 4 ad units/page * (1 - 0.25 adblock rate) * 14.3M/1000 = $21,450/day or ~7.83M/yr
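Another way to look at estimates like this is to ask what CPM would be needed just to cover the servers. This is a rough sketch, not anyone's actual model: the function name is hypothetical, a 30-day month is assumed, and the inputs are the $33k/month and 14.3M pageviews/day figures from this thread.

```python
SERVER_COST_PER_MONTH = 33_000   # from the submission title
PAGEVIEWS_PER_DAY = 14_300_000   # reddit's own figure, as quoted in the thread


def break_even_cpm(monthly_cost, pageviews_per_day, block_rate,
                   ad_units=1, days=30):
    """CPM (USD per 1,000 impressions) at which ad revenue equals monthly_cost."""
    thousands_of_impressions = (
        pageviews_per_day * days * (1 - block_rate) * ad_units / 1000
    )
    return monthly_cost / thousands_of_impressions


# One ad unit with half the impressions blocked.
one_unit = break_even_cpm(SERVER_COST_PER_MONTH, PAGEVIEWS_PER_DAY, 0.5)
# Four ad units with a quarter of the impressions blocked.
four_units = break_even_cpm(SERVER_COST_PER_MONTH, PAGEVIEWS_PER_DAY, 0.25,
                            ad_units=4)
print(f"break-even CPM: ${one_unit:.3f} (1 unit), ${four_units:.3f} (4 units)")
```

Under these assumptions a single unit needs roughly a $0.15 CPM to pay the server bill, and four units need under $0.03 each. Salaries and unsold inventory are not accounted for.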

At 33k/mo in server costs it doesn't even remotely cover the servers.

Reddit's actual CPM is $8. One of the admins said that Conde wants to keep it a premium brand and they don't actually set the CPM. I'll try to dig up a link. It's also why there's tons of cross-reddit advertising, and paid adverts are only seen for a tiny bit of time.

I'm really interested in hearing more about this. Were you able to find a link?

Pretty depressing seeing comments like "Also there is no way a website has a $8CPM" and wondering why people who don't know what they're talking about keep on talking. I sell at double that rate for smaller sites on a daily basis ;/

Coincidentally, I'm running a site and I'm having lots of trouble finding people to pay more than peanuts. I just got my first deal for $2 CPM for a pretty damn targeted site.

http://www.thathigh.com - got any tips for me? how do you usually pursue direct advertisers?

I believe a Reddit blog post quoted AdBlock ratio at something closer to 30%. 50% is extremely conservative.

That's before the hypothetical backlash, though.

Might want to consider how many of their users use adblock of some sort.

Re-read the post - I did, and at a very conservative number. :)

A comment or two down, jedberg explains they use reserved instances from Amazon, so their hosting fees are more like $22,390.37 a month.

Actually they're even lower, since the 3 year instances are so much cheaper than the 1 year over time.

  (1820 * 57) + (910 * 23) = $124,670/yr in reservation fees
Changes to

  (2800 * 57) + (1400 * 23) = $191,800/3yr in reservation fees
Which, when spread over the 3 years, brings the cost down by $5,061.39/mo, making the final cost there actually $17,328.98.
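For anyone who wants to re-run the reservation arithmetic, here is a minimal sketch. The variable names are illustrative; the fees, the instance counts (57 and 23) and the $22,390.37/month starting point are the numbers quoted above.

```python
# Up-front reservation fees for the two instance groups (57 and 23 machines).
one_year_fees = (1820 * 57) + (910 * 23)     # paid again every year
three_year_fees = (2800 * 57) + (1400 * 23)  # paid once per three years

# Spread each fee over the months it covers.
one_year_monthly = one_year_fees / 12
three_year_monthly = three_year_fees / 36

saving = one_year_monthly - three_year_monthly
final = 22_390.37 - saving  # starting from the 1-year estimate quoted above

print(f"1-year fees:  ${one_year_fees:,}/yr")
print(f"3-year fees:  ${three_year_fees:,}/3yr")
print(f"saving:       ${saving:,.2f}/mo")
print(f"final cost:   ${final:,.2f}/mo")
```

This only re-does the fee amortization. It assumes hourly usage charges are the same under both reservation terms, which the comment above does not address.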

Of course this assumes they're doing 3 year reservations, but I don't see any reason not to on their XL instances (I could be wrong though, not sure how open Conde Nast is to logic like this).

We use one year reserved instances because our budgets are done annually.

Why not just get some dedicated servers? :/ I still don't understand why anyone with any amount of load or traffic would use the most expensive hosting available.

We had dedicated servers. It saved us 30% a month to move to EC2.

I can't understand how that would be the case. Did you run virtualization on the dedi servers so you had multiple 'machines' on them to play with? (If you need that sort of thing for your architecture)

What was the limiting factor on dedicated? CPU? ram? bandwidth?

Maybe you just had a really bad dedicated server deal? Were they reasonably priced?

Even VPSs are cheaper than Amazon.

I started out on VPSs and saved a lot moving to dedicated servers once it made sense to. For the dedi servers I use I don't have to pay for RAM monthly, and I get bandwidth at a good price.

The kind of money you're paying on servers is just obscene. It wouldn't matter if Reddit had the revenue of course...

There is no point suggesting they use dedicated servers, as many already have. They are 100% sure dedicated servers cost more, so let them keep their EC2s and let them fail!

I wonder why google, yahoo and facebook don't run their site on ec2... if it's cheaper.

I think failure is looking more and more likely the more I read of their blog and details of how they run things.

It's a very interesting case to analyze though - perhaps acquired too soon, never had enough pressure on monetization until it was too late... Questions over how well the site was architected. Sounds like they're using a lot of 'new' unproven 'hip' things. Cassandra? :/

Seems like the founders and YC have been extremely quiet about the problems... It'd be interesting to hear their take on things.

Also can't imagine how Conde Nast could be happy with things.

Google et al. do essentially run their servers on EC2. However, it's their own version of EC2, in their datacentres. They have a whole pile of virtualised servers they can turn on and off by the minute. They have internal accounting systems so that each team is 'charged' based on what they use. Google is so big that it's cheaper for them to make their own datacentre than to use someone else's (i.e. Amazon's).

EC2 only came about because Amazon run their servers on it, and they had so much spare capacity, so they sell it.

> EC2 only came about because Amazon run their servers on it, and they had so much spare capacity, so they sell it.



>Google et al. do essentially run their servers on EC2.

If by EC2 you mean "bunches of servers", sure.

On GoGrid you can buy cloud servers, or you can rent dedicated servers (and you can intermix the two). The latter are quite a bit less expensive for a given quantum of resources, while the former obviously offer greater dynamic flexibility (with a significant premium).

Actually considering the terrible I/O rate of services like EC2, dedicated often offers a dramatic advantage.

>They have a whole pile of virtualised servers they can turn on an off by the minutes.

But they don't. So they use none of the upside, and have all of the downside. Yay!

> But they don't. So they use none of the upside, and have all of the downside. Yay!

That's not true at all. We shut down machines when we are over capacity (rarely) and we often have to bring up a bunch of new machines where there is a traffic spike.

> I wonder why google, yahoo and facebook don't run their site on ec2... if it's cheaper.

Well, for one, they are A LOT bigger than us. But you'd be surprised who DOES run on EC2. The biggest one I'm allowed to tell you about is Netflix. Their entire streaming service is run off EC2. I guess they're idiots too, huh?

They have a business model and revenue. It doesn't matter quite so much what they use. $1000 hosting costs vs $10k hosting costs for them is 'meh'.

For startups, and Reddit, the difference does(should) matter.

Well that's hardly a fair point. It's going to depend on the scale of your traffic, the type of site, the networking/sysadmin manpower available, tolerance for failure, etc...

EC2 certainly isn't always cheaper, but it also certainly can be cheaper.

I don't think Reddit is doing it now (since they said their boot process is not automated) but there's a lot to be said for spinning servers up and down according to demand. I'd be curious what Reddit's utilization graph looks like. There may be times when, given smart load monitoring algorithms, they could run at 2/3 the number of servers, or even fewer.

Even without that, keep in mind dedicated servers have investments in hardware management. That's a huge cost. Plus, when you're a fast-paced company, the ability to move quickly is invaluable, which EC2 definitely does allow, but dedicated does not.

Even without that, managed dedicated servers are still often more expensive. Rackspace costs $420/mo[1] for their cheapest dedicated, which is roughly equivalent to 2 small EC2 instances (~$140/mo). The Planet has a similar(ish) machine for $184/mo[2].

[1] - http://www.rackspace.com/managed_hosting/configurations.php [2] - http://www.theplanet.com/dedicated-hosting.aspx

Rackspace's dedicated hosting is not a great comparison, they're definitely at the high end of pricing.

That said, don't forget that both the Rackspace and The Planet boxes come with 2 terabytes of transfer which would be $300 from Amazon. When you factor that in, suddenly Rackspace becomes competitive and The Planet vastly cheaper.

Softlayer will sell you a quadcore box with 8GB of ram and 4 terabytes of transfer a month for $200.

I meant that we used to have our own datacenter.

Yes, the dedicated servers might be less. But when one of them breaks, I have to wait for the provider to fix it. On EC2, I can replace it in 5 minutes.

> I started out on VPSs and saved a lot moving to dedicated servers once it made sense to. For the dedi servers I use I don't have to pay for RAM monthly, and I get bandwidth at a good price.

EC2 doesn't charge for RAM either and the bandwidth is at a great price.

> The kind of money you're paying on servers is just obscene.

It's really not that much more than other hosting providers, and they offer features that the other ones don't. The two biggest being the speed with which I can add new machines and the speed I can add more disk.

>> "and the bandwidth is at a great price."


  * Layered Tech: $169 (2 TB/month)
  * The Planet: $149 (1.5 TB/month)
  * Superb Hosting: $119 (2 TB/month)
  * Hostway: $99 (2 TB/month)
  * Server Beach: $75 (1.2 TB/month)
  * Cari.net: $60 (1.3 TB/month)

  * Amazon EC2: $244.40
So I disagree the bandwidth is at a great price. I think it's ridiculously expensive. Maybe you have money to burn though.
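To put the dedicated offers above on a common footing, here is a small sketch that works out dollars per included terabyte. The dictionary just restates the list; the EC2 line is left out because the comment does not say what transfer volume the $244.40 corresponds to. Note that these monthly prices include the server itself, so the per-TB figures overstate what the bandwidth alone costs.

```python
# (monthly price in USD, included transfer in TB/month), as listed above.
offers = {
    "Layered Tech": (169, 2.0),
    "The Planet": (149, 1.5),
    "Superb Hosting": (119, 2.0),
    "Hostway": (99, 2.0),
    "Server Beach": (75, 1.2),
    "Cari.net": (60, 1.3),
}

# Price of the whole box divided by the transfer that comes with it.
per_tb = {name: price / tb for name, (price, tb) in offers.items()}

for name, usd in sorted(per_tb.items(), key=lambda item: item[1]):
    print(f"{name:15s} ${usd:6.2f} per included TB")
```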

If a dedicated server dies, just spin up some VPSes while you order a new dedicated server or get it fixed :/ It's not a great problem for the rare hardware failure.

The dedicated servers I use get me 5TB transfer for around $100/month. That would cost me around $1,000 on Amazon.

I'm too lazy to do the research, but what happens when you scale that up to the 23TB a month that we are doing (which doesn't include the 49TB of cross datacenter traffic that is super cheap)? Amazon cuts us a price break at the higher tiers. Do the other guys?

Come on. The other guys are like 10% the cost of Amazon. Data center internal traffic is usually free. Are you talking about datacenter to other datacenter traffic?

In any event. You're wasting money. Reddit could easily be hosted for $3k/mo or so.

You probably had a bad dedicated server deal.

I think you could get 75 Dell R410s with 8 cores each and 16G of memory for under $300k. A top of the line colo will run you about $2k a month for a rack and a half with power and room to run all 75 boxes. Where you run into trouble is if the company you buy from charges you for the box cost every year. Getting charged as if you are re-buying every year would work out to something like $27k a month. Those boxes will easily last 3 years and if you span it out over 3 years you are looking at $10k a month. That is half what you are paying for EC2.
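The amortization in that estimate can be sketched in a few lines. The function name is made up; the $300k hardware price and $2k/month colo fee are the round numbers from the comment, and staff time, spares and financing are ignored.

```python
HARDWARE_COST = 300_000   # 75 servers, per the estimate above
COLO_PER_MONTH = 2_000    # rack space and power, per the estimate above


def monthly_cost(amortization_months):
    """Hardware spread evenly over its assumed lifetime, plus the colo fee."""
    return HARDWARE_COST / amortization_months + COLO_PER_MONTH


print(f"written off in 1 year:  ${monthly_cost(12):,.0f}/mo")
print(f"written off in 3 years: ${monthly_cost(36):,.0f}/mo")
```

That gives $27,000/month on a one-year write-off and about $10,333/month over three years, in line with the "$27k" and "$10k" figures in the comment.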

I meant that we used to have our own datacenter.

Yes, the dedicated servers might be less. But when one of them breaks, I have to wait for the provider to fix it. On EC2, I can replace it in 5 minutes.

It probably depends on the datacenter environment and the quality of machine you are buying. I'm not talking about logging on to godaddy and asking for 75 random dedicated servers. I'm talking about going to Dell and buying 75 machines and putting them in a rack. You will know exactly what you are getting. We use a datacenter that is also a Dell service center so they have parts on hand for any failures and they can service any Dell machine that comes with the base warranty.

With 75 good quality machines I bet you wouldn't see more than three hardware failures a year, if that. All 3 of those would probably be drive failures. I imagine the redundancy of the software would handle that without a problem.

I agree that the convenience of EC2 is very nice. We use EC2 a lot, but in a much more "elastic" way. We have found it is cheaper to colo boxes if the demand is constant.

I understand the budgeting logic but you're telling me you can't stagger the 3-year buys so that they work within your budget? It's a 15% savings long-term, seems definitely worth some number futzing.

I don't want to commit for that long to a single type of instance.

Conde Nast isn't involved in Reddit's operations.


> Actually they're even lower

Yeah I like that you declare knowing the costs of Reddit's servers.

1) Conde Nast owns them. To say they don't have some involvement in all of their decisions (even if not directly) is a bit naive.

2) You're right, I mis-spoke and should have said "they may be" instead of "they're". That being said, I would hope any frequent internet dweller wouldn't argue semantics and would give me the benefit of the doubt with respect to tone and inflection.

I think all the math is based on the known costs of EC2 services at various rate plans. All the people posting dollar figures here and on reddit are drawing from these known costs.

> All the people posting dollar figures here and on reddit are drawing from these know costs.

The others are quite clearly marked as estimates, this one not really, and the second estimate from the top in the reddit thread (yielding an estimate of $22,390.37/mo) got the following comment by one jedberg:

> Yes, once again, you are totally accurate. That is almost exactly what it costs to run reddit, as of today. However, with our projected growth, we're looking to be closer to 350K by the end of the year.

Note that the comment based that estimate on 1-year reserved servers, which jedberg's comment suggests is broadly correct, rather than on 3-year reservations.

Wow...this is hard to believe. Numbers like that make me wonder if the cost of going with ruby or python (in reddit's case) or insert-your-slower-dynamic-language-here outweighs the benefits in the long run. How can anyone bootstrap a company and scale to the size of Reddit without quickly running into a "must fund or sell" situation?

If it's truly a python issue, it should not be taken lightly by entrepreneurs. I hope it's not...I'm an avid user of both ruby and python, and want to believe they can both be used to create successful and maintainable (in terms of effort and cost) sites.

14.3 million pageviews per day comes to 163 per second, on average. Reddit's traffic, like most sites of its kind, probably peaks during lunch hour during the workday and probably peaks at 5x that number. So we're looking at 815 pageviews per second. Given 80 servers total, we're looking at 10 pageviews per second per server.

Let's compare this with Facebook. In October 2009, Facebook announced they had 30k servers[0]. In September 2009, there was a rough estimate that Facebook served 200 billion pageviews per month[1]. That implies 73k pageviews per second, or 2 pageviews per second per server. Clearly the pageviews are a rough estimate, but even if facebook served 1 trillion pageviews per month they still wouldn't be beating reddit for efficiency.

I have a feeling if you run the numbers for any other highly dynamic site at scale, you'll find that amortized over every server in use, you won't get a lot better than 10 pageviews per second.

[0] http://www.datacenterknowledge.com/archives/2009/10/13/faceb... [1] http://www.businessinsider.com/googles-estimates-for-faceboo...
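Here is a quick sketch of the per-server arithmetic, for anyone who wants to plug in different guesses. The inputs are the figures from the comment; the 5x peak factor is the commenter's guess and the 30-day month is an added assumption. The results land close to, though not exactly on, the rounded numbers in the comment.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Reddit: 14.3M pageviews/day on 80 servers, peak assumed to be 5x average.
reddit_avg = 14_300_000 / SECONDS_PER_DAY
reddit_avg_per_server = reddit_avg / 80
reddit_peak_per_server = reddit_avg * 5 / 80

# Facebook: ~200 billion pageviews/month on ~30k servers (rough estimates).
facebook_avg = 200e9 / (30 * SECONDS_PER_DAY)
facebook_per_server = facebook_avg / 30_000

print(f"reddit average:   {reddit_avg:,.0f} pageviews/sec")
print(f"reddit per server: {reddit_avg_per_server:.1f} avg, "
      f"{reddit_peak_per_server:.1f} at peak")
print(f"facebook average: {facebook_avg:,.0f} pageviews/sec, "
      f"{facebook_per_server:.1f} per server")
```

Note that the Reddit figure of about 10 per server is a peak estimate while the Facebook figure is a monthly average; the like-for-like Reddit average is closer to 2 per server.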

Comparing with facebook isn't a good idea IMHO. Facebook also have way too many servers. But they have a business model - they can afford it.

10 pageviews per second is pretty lame IMHO. Crazy crazy server costs. Amazon is crazy expensive! Why use them? Either Reddit need to drastically change the way they do things, or they deserve to die.

That's terribly misleading, though. Facebook has huge clusters for performing offline business analytics. Therefore comparing them to Reddit, which presumably has a much smaller offline analytics backend, is not an apples to apples comparison in terms of efficiency.

It's an educated guess, but I suspect comparing the entire infrastructure of a site like reddit with the entire infrastructure of a site like Facebook is pretty fair. Neither of them are using their entire server capacity to serve traffic. reddit's at a smaller scale than Facebook, obviously, but it's a pretty apt comparison.

You think the functionality of Facebook compares equally to that of Reddit computationally? I mean, I know neither is the most complicated thing ever, but does Reddit even allow photos anywhere besides thumbnails? Does Reddit find things only specific to your account and show you a list of friends and their content you are allowed to see (or even something vaguely similarly complicated)? I also highly doubt that Facebook counts all of the ajax-goodness it delivers as pageviews. I could really go on for a long time (widgets, Facebook connect, Facebook Apps API...) about the huge differences that you glossed over since it's using PHP for the frontend.

No, I think looking at Facebook's numbers, figuring roughly 2 pageviews/server, and then looking at reddits numbers (at roughly 10 pageviews/server) and saying "reddit's setup seems reasonable" makes it a good comparison. It's all just hand waving, but I don't mind using that comparison to put reddit's numbers in perspective.

All of the math is just wacky in the post though - he's comparing peak pageviews/sec/server to Facebook's pageviews/sec/server for a month. Reddit's pageviews/sec/server is 2 using 163 pageviews/sec and 80 servers.

Facebook also doesn't count all webserver HITS as pageviews, whereas most of Reddit's do count.

Beyond that, pageviews for Facebook may eat more resources (CPU/RAM) than Reddit on average due to photo uploading and other misc. things on Facebook. This means a server is working harder at 2 pageviews/sec/server for Facebook.

> but does Reddit even allow photos anywhere besides thumbnails?

How is that even relevant? Images cost in bandwidth, not in computing power.

> Does Reddit find things only specific to your account

Uh yes, on every single submission and comment. The admins clearly stated that what used the most resources was the voting system. As well as every single user (whether you marked them as friends) and the list of links itself, which is extracted from the user's own list of reddits.

There is barely anything on a logged in user's page which isn't at least in part specific to that user's account.

> I could really go on for a long time

No, I don't think so, you've been 0/everything so far, it's time to stop.

> How is that even relevant? Images cost in bandwidth, not in computing power.

Sounds like you need to do some STFU yourself, having clearly never come anywhere near this problem domain. Using standard tools to process and resize user-uploaded images can easily soak up all the CPU time and memory you could throw at it.

I recently spent a bunch of time rewriting a client's image processing pipeline from the usual O(n)-space ImageMagick crap that loads the full uncompressed image into memory a few times over to be O(1)-space, doing streaming downsampling by exploiting the compression implementations of JPEG and PNG. It was more than worth it: even with the PNG implementation being a bit more CPU intensive now, it's a lot nicer having it operate in constant space without fear of swap and the OOM-killer.
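To illustrate the general idea of streaming downsampling (not the commenter's actual pipeline, which worked inside the JPEG and PNG decoders), here is a minimal sketch. It assumes a hypothetical source that yields already-decoded grayscale rows one at a time, and applies a plain box filter. Memory use is proportional to one row, regardless of image height.

```python
from typing import Iterable, Iterator, List


def downsample_rows(rows: Iterable[List[int]], factor: int) -> Iterator[List[int]]:
    """Box-filter a grayscale image by `factor`, consuming one row at a time.

    Only one input row and one row of running sums are held at once,
    so the full image is never loaded into memory.
    """
    acc: List[int] = []  # running column sums for the current band of rows
    seen = 0             # rows accumulated in the current band
    for row in rows:
        out_width = len(row) // factor
        if not acc:
            acc = [0] * out_width
        for x in range(out_width):
            # Add this row's contribution to output pixel x.
            acc[x] += sum(row[x * factor:(x + 1) * factor])
        seen += 1
        if seen == factor:
            # A full band of `factor` rows is done: emit the averaged row.
            yield [total // (factor * factor) for total in acc]
            acc, seen = [], 0
    # A trailing partial band (height not divisible by factor) is dropped.


# Tiny 4x2 example, halved in both directions.
small = list(downsample_rows([[0, 2, 4, 6], [2, 4, 6, 8]], factor=2))
print(small)
```

A real implementation would feed this from a decoder that produces rows incrementally and would handle colour channels and edge pixels; those parts are left out here.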


Facebook's complexity is much bigger.

Your personalized news feed and what you can see depends on the settings of all your friends and those friends.

Reddit is just a simple forum site...

That's ridiculous.

Facebook have revenue, and profit. They can afford to buy more servers and have them sitting idle.

Reddit should be trying to run on an optimal number of servers. Firstly, cheap hosting (which is NOT Amazon), and secondly, fewer servers. 80 is just crazy.

If they truly have to handle a massive amount of traffic with only four people, a service like Amazon is a pretty good deal. It makes it trivial to create, clone, archive, or upgrade servers, and you never have to worry about drive/CPU/motherboard/RAM failures bringing down your site at 3 AM.

As someone who's worked in a data centre before, I can say hardware problems only increase as you scale up. We had around 350 servers with an average of three drives each, and we probably went to the data centre to swap dead hard drives on average three times a week - SAS, SATA, or SCSI. Nevermind the time I spent testing motherboards, swapping CPUs out, etc.

Owning your own servers means dealing with maintenance and downtime. Virtualizing 10 servers on one physical server is great until that one physical server's RAM starts acting flaky, and then you have to take those 10 servers offline, or shift the VMs onto your other hardware.

Renting your servers typically means paying a monthly cost for the hardware, including RAM and disk, long after the fees have paid the DC back their costs.

For Reddit, being able to bring N extra nodes for X purpose (mysql, cassandra, web serving, etc.) with a few clicks in a few minutes is likely a huge draw for them. It means they can grow gradually (instead of having to shell out another $10-20k for a new server), and it means they can dynamically adjust to meet their needs. For example, if Monday is a busier day than Saturday by a significant stretch (and if their architecture allows it), they can bring more nodes online early Monday morning to handle the load. Take them offline until Thursday evening to handle the 'It's Friday, to hell with doing work' rush I'm sure they get, and so on.

Bad architecture or poor design crosses languages. It's entirely possible and done every single day, to emit a broken system using even the most optimal code.

That said, this looks, sounds and smells like a database scaling issue more than anything. It is interesting to ponder the long-term costs of running a site like this on EC2 versus your own maintained server farm.

At least a few months ago, here's Reddit's architecture: http://blog.reddit.com/2010/03/and-fun-weekend-was-had-by-al...

I am not familiar with the cost of running a high profile website, is the price tag for Reddit considered a lot, or little?

A very lot.

Actually, I bet you it's a "we hit the database on every pageload" problem.

I know the reddit guys, and I can tell you it's definitely not that. Hitting the DB is extremely rare - almost everything comes straight out of cache (either memcache, or memcachedb for caches which are more expensive to rebuild). Execution cost is dominated by the python app servers.

The overwhelming majority of reddit's pageviews are users that are not logged in, which is very very cacheable.

About 20% of the users are logged in, and they represent about 50% of the page views.
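Taken at face value, those two percentages imply a per-user engagement gap that is easy to work out. This is a back-of-the-envelope sketch with made-up variable names, using only the 20% and 50% figures above.

```python
logged_in_share_of_users = 0.20
logged_in_share_of_views = 0.50

# Relative pageviews per user in each group (share of views / share of users).
views_per_logged_in = logged_in_share_of_views / logged_in_share_of_users
views_per_anonymous = (1 - logged_in_share_of_views) / (1 - logged_in_share_of_users)

ratio = views_per_logged_in / views_per_anonymous
print(f"a logged-in user views about {ratio:.1f}x as many pages as an anonymous one")
```

In other words, if the figures hold, a logged-in user generates roughly four times the pageviews of an anonymous visitor, and about half of all pageviews need per-user rendering.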

I am pretty sure this is false. While the overwhelming number of unique visitors to reddit are not logged in, those users are far less engaged than average. Logged in users make up a huge percentage of the pageviews because of their higher engagement level.

The database is rarely accessed in request. Almost all DB access comes from batch processes.

Yep. That's why you shouldn't have pylons noobs running a site like reddit.

Here's an anecdote for you: our corporate intranet site gets a peak spike of a few thousand users every day, around lunch. (People browsing the web over lunch and the computers in-house all have the site as their fixed homepage.)

That web server is running separate from databases and other resources. It never sees load. We're using a static language, but even if ruby was 10x slower, it wouldn't matter.

On the other hand, change the contents of a stored procedure so that it doesn't line up with indexing properly, and page load goes from <0.1s to >4s.

How did you get from "It costs this much" to "Dynamic languages are bad"?

Because a server with these languages has high overhead in memory and CPU to serve requests, thus having lower throughput per server, thus needing more servers to sustain a certain site-wide SLA, and thus costing more money overall.

Do you have evidence that reddit (or any site using a dynamic language) would reduce overall operating costs by switching, or is this an assumption on your part? What about development costs? What about time to deploy new features? What about the cost to make platform revisions? Do you have evidence that a non-dynamic language would really be so much faster that it would require less hardware? Perhaps the problem is not the language, but the chosen architecture?

What I'm really trying to say here is... prove it.

I don't have evidence Reddit would reduce operating costs since they haven't switched yet, and I don't have access to their code/architecture/etc to decide one way or the other. However, based on my personal experience, statically typed languages are much faster than dynamic languages and use far fewer resources. They used substantially fewer servers to maintain the same SLA throughput. You don't have to take this advice; you can keep paying the high cost of hardware. There's a belief that developer cost is much higher than hardware cost and thus it's justified. However, when scaling out, hardware cost is much higher than developer cost. Developer cost is a fixed sunk cost at initial development. Afterward it's just maintenance and can be scaled down, but the hardware operation cost is ongoing, increasing year after year.

> and I don't have access to their code/architecture/etc

Actually, you do! :) http://code.reddit.com

But to help you out, I'll tell you that a good chunk of the expensive loops are written in C.

Proof that statically typed languages are much faster than dynamic languages. Reddit has resorted to using C for speedup.

Proof that in this single instance, Reddit developers decided this was a suitable optimization for their codebase.

This also calls into question the original assumption, that posters believed Reddit had far too many wasteful servers to handle their service, and that it was because they use a dynamic language.

Was there ever a question about that? I thought that was pretty much a fact. Of course static typed languages are faster.

Difficult to prove, but an observation with precedent. 37 Signals is famous for justifying Ruby by saying that hardware is cheap, developer time is not, so use RoR and just throw servers at your app till it meets your performance needs. Hardware is cheap, but maybe not cheap enough to run a site like Reddit with intentionally scant ad revenue.

Hardware is cheap. Amazon hosting is not.

Do you have evidence that dynamic languages have lower overall development costs?

Because when all you have is a hammer, every problem looks like a nail...

I ran a web app with identical pageview numbers using Ruby on Rails and EC2. My server costs were 10x cheaper than what reddit is paying.

Scaling is caching and architecture, not writing your app in Java.

It's a bold boast, but honestly, it doesn't mean much if the problem domain isn't similar. Reddit in particular runs into issues because so little of what they do is widely cachable. The site is heavily customized per user, and experiences an awful lot of cache churn. I don't disagree with your sentiment (design matters more than choice of language), but I could serve 16 million static pageviews/day off of a $20/month Linode. That doesn't mean that reddit could do the same.

Reddit has 8 million unique visitors, but only 300k subscribers to /r/pics, the top auto-subscribed subreddit. Generously assuming half of people manually unsubscribe, that's at most 7% of their user base that are logged in. I know full well about cache churn with logged in users, but that's a lot of lurkers.
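For anyone who wants to check the back-of-the-envelope estimate above, here is a rough sketch. The visitor and subscriber counts are the ones quoted in the comment; the 50% unsubscribe rate is the comment's own assumption, and the variable names are just for illustration. Note that it divides accounts by unique visitors, so it is only a ballpark.

```python
unique_visitors = 8_000_000
pics_subscribers = 300_000      # /r/pics, the top auto-subscribed subreddit
assumed_unsubscribe_rate = 0.5  # the "generous" assumption from the comment

# If half of all accounts unsubscribed, the subscribers are the other half.
estimated_accounts = pics_subscribers / (1 - assumed_unsubscribe_rate)
share_logged_in = estimated_accounts / unique_visitors

print(f"~{estimated_accounts:,.0f} accounts, {share_logged_in:.1%} of unique visitors")
```

That lands at about 7.5%, in line with the rough figure quoted above.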

I'm also not sure what makes a Facebook game's domain more "widely cachable". 100% of users are logged in. The vast majority of actions taken are changing state. Any app touchpoint is writing to the database. Page caching is nearly impossible.

It's a lot less different than you think.

I can't claim to know the specifics of your application, but consider that a single upvote invalidates however many thousand copies of a given page. An action in a Facebook game might invalidate a page for 10 or 20 or 50 users. An upvote, (or a comment, or an edit, or an upvote on a comment) invalidates that page for upwards of 300k+ users. Combine that with the rate at which reddit's pages change, and you're talking an obscene amount of CPU time spent building fragments/pages to stick into your cache. Scope of visibility of a change means a lot.

You obviously know how to scale an app if you were pulling 16 million pageviews a day, and I don't intend to discount that at all. I just mean to point out that while fundamentally the same problem, reddit has to deal with a version of that particular problem that most applications don't begin to approach.

>a single upvote invalidates however many thousand copies of a given page

I never understood this. Does reddit really need to spend the capital making sure I see a stranger's upvote the moment it occurs? A 60 second delay to refresh pages in batches seems perfectly reasonable. Perhaps with a client side script to mark my own upvotes so the system doesn't look like it is losing my selections.
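A minimal sketch of the "refresh at most every 60 seconds" idea, to make the trade-off concrete. This is not how reddit does it; `TimedPageCache`, the `render` callback and the fake clock are all hypothetical names made up for the example, and a plain dict stands in for a real cache.

```python
import time


class TimedPageCache:
    """Serve a rendered page for up to ttl_seconds before re-rendering it."""

    def __init__(self, render, ttl_seconds=60, clock=time.monotonic):
        self.render = render      # function: page_id -> rendered page
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the demo can fake time
        self.pages = {}           # page_id -> (rendered_at, html)
        self.renders = 0          # how many times we paid for a render

    def get(self, page_id):
        now = self.clock()
        entry = self.pages.get(page_id)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]       # still fresh enough: no render
        html = self.render(page_id)
        self.renders += 1
        self.pages[page_id] = (now, html)
        return html


votes = {"story-1": 0}
fake_now = [0.0]
cache = TimedPageCache(
    lambda pid: f"{pid}: {votes[pid]} points",
    clock=lambda: fake_now[0],
)

first = cache.get("story-1")   # rendered with 0 points
votes["story-1"] += 500        # 500 upvotes arrive; nothing is invalidated
stale = cache.get("story-1")   # same cached copy, votes not visible yet
fake_now[0] = 61.0             # a bit over a minute later
fresh = cache.get("story-1")   # re-rendered once, now showing 500 points
print(cache.renders)
```

Five hundred votes cost one extra render instead of five hundred invalidations, at the price of readers seeing scores that are up to a minute old.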

It would be nice if you elaborate a little.

I'm not sure what you want to hear, so I'll elaborate a lot.

Warbook was a Facebook application written in Ruby on Rails I ran by myself in late 2007 - 2008. It grew to over 16 million pageviews a day. At the time it was more pageviews than Twitter.

I scaled it using the following stack: Perlbal for load balancing, lighttpd for static assets, Mongrel for dynamic requests, Memcached for caching, and MySQL for relational data storage.

I used two medium instances for load balancing, one medium instance for asset hosting, 15 small instances for Mongrel, one XL instance for memcached, and one XL instance for MySQL.

I used memcached as a "write-through" cache. Everything in cache was considered fresh. Every write of a cachable object would write to both MySQL and memcached. Every read of a cachable object would start with memcached and fall back to MySQL on a miss. This reduced reads on the database by 95%.
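A toy sketch of that write-through pattern, for readers who haven't seen it. Plain dicts stand in for memcached and MySQL, and `WriteThroughStore` and its method names are invented for the example; the real app was Ruby on Rails, so this only illustrates the read and write paths.

```python
class WriteThroughStore:
    """Toy write-through cache: dicts stand in for memcached and MySQL."""

    def __init__(self):
        self.cache = {}      # stand-in for memcached
        self.database = {}   # stand-in for MySQL
        self.db_reads = 0    # how often a read had to go to the database

    def write(self, key, value):
        # Every write goes to both stores, so the cache is always fresh.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key):
        # Try the cache first; go to the database only on a miss.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # repopulate so the next read is a hit
        return value


store = WriteThroughStore()
store.write("player:42", {"gold": 100})
store.read("player:42")   # served from cache
store.cache.clear()       # simulate a cache restart or eviction
store.read("player:42")   # miss: falls back to the database
store.read("player:42")   # hit again after repopulation
print(store.db_reads)
```

Because writes keep the cache current, reads never need an invalidation step; the database is only read after an eviction or restart.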

Total hosting costs were ~$2,000 a month.

If you don't mind me asking, what happened to it? Facebook now reports about 5,500 monthly users. I'm guessing you were acquired by SGN but I don't see why that would cause traffic to drop so much.

SGN switched their core focus from Facebook games to iPhone games in mid 2008. At that time, they dropped support for all of their Facebook properties except (fluff)Friends. Games on the Facebook platform are doomed to rapid traffic loss without constant adaptation and viral tuning. Even with it, user retention is hard.

Facebook giveth and Facebook taketh away.

16M/24/60/60 = 185 pages/sec. 185/15 = 12 pages/sec/server. That's not much different than the Reddit number.
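Spelled out, assuming traffic is spread evenly over the day (real peaks would be higher) and counting only the 15 small Mongrel instances mentioned earlier:

```python
pageviews_per_day = 16_000_000
app_servers = 15  # the 15 small Mongrel instances

pages_per_second = pageviews_per_day / (24 * 60 * 60)
pages_per_second_per_server = pages_per_second / app_servers

print(f"{pages_per_second:.0f} pages/sec overall")
print(f"{pages_per_second_per_server:.1f} pages/sec per app server")
```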

could you set up something similar not on EC2?

I don't know what the "writing your app in Java" bit is about... Care to elaborate?

Mibbit is written in Java (Custom written framework and server), and handles traffic just fine. Java is insanely efficient for network IO.

It's an apples to frogs comparison, but if you measure 'page views' then Mibbit does bajillions, on a handful of servers.

> I ran a web app with identical pageview numbers

That's cute, but was your web app as hard to cache as Reddit?

Scaling a fully static website is trivial, and the more dynamism you include, the harder it becomes.

There is barely anything static on a logged in Reddit user's page.

Isn't every single subreddit cachable? Only the starting page isn't 100%, but I'm sure that many users have the same combination of subreddits (e.g. the default ones). Would be interesting to see if they have certain "user classes". Something like:


> Isn't every single subreddit cachable?

For logged in users, you're still going to need the voting status for the current user on every single submission as well as that submission's hidden status for the current user to decide whether or not a submission should be displayed in the listing.
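One way to picture this, purely as an assumption and not a description of reddit's code: the subreddit listing is shared and cacheable, and each request overlays the current user's votes and hidden flags on top of it. The function and field names below are hypothetical.

```python
# Shared, cacheable part: identical for every visitor of the subreddit.
cached_listing = [
    {"id": "a1", "title": "First story"},
    {"id": "b2", "title": "Second story"},
    {"id": "c3", "title": "Third story"},
]


def personalize(listing, user_votes, user_hidden):
    """Overlay one user's votes and hidden flags on a shared listing."""
    page = []
    for item in listing:
        if item["id"] in user_hidden:
            continue  # hidden submissions are dropped for this user only
        # Copy the entry so the shared cached listing is never mutated.
        page.append({**item, "my_vote": user_votes.get(item["id"], 0)})
    return page


page = personalize(
    cached_listing,
    user_votes={"a1": 1, "c3": -1},
    user_hidden={"b2"},
)
print(page)
```

Even in this form, every logged-in pageview still needs a per-user lookup for votes and hidden items, which is the extra traffic the comment is pointing at.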

I believe they almost never hit the DB directly, so these are probably recached immediately (or submitted to both the cache and the DB at the same time), but that still means quite a lot of traffic.

The cache is invalidated after every vote.

I run 2 Reddits on my calculator watch using C++ and CouchDB so clearly the technology is at fault here.

That's just the server instance cost. What about bandwidth and S3 cost? Bandwidth would cost a bundle.

The bandwidth is less than 5% of the total cost. It's about $2500 a month.

I haven't studied their arch, but I could imagine scenarios where spot instances would reduce the bill some, but yeah. Bandwidth. Ouch.

So... how much does HN cost? :)

And who/what pays for that?

I'm pretty sure YC pays for it. It's one big server, so it's probably like $1k/month or something. But HN isn't anywhere near 8M active users, and it's pretty slow much of the time and has brief downtime often (multiple times per day, I think) so they can reboot the server to speed it back up.

Very little compared to reddit. PG said recently that HN has 60K uniques/day. Reddit has around 8 million.

In addition to that, the software behind HN doesn't do nearly as much as the reddit software: there is no automatic checking of new messages, there are no subreddits you need to work on, there are only so many votes to be processed, etc.

I have no clue about who pays for what, but I guess PG pays.

I would assume though it would be even easier here than on reddit to get people to donate towards server cost should the need ever arise.

The users of reddit are downright hostile toward anyone trying to make money. There was a thread last week about paid accounts. Out of 2000 comments, I think I only found one or two people supporting the idea. Most of the comments were somewhere along the lines of "paid accounts creates two classes of users and everyone should be equal" and "it's not my job to figure out how reddit makes money" (advertisements are actively blocked by 30% of users).

When your userbase is filled with college kids and anti-capitalists, you are going to have a tough time making money.

There had been a donation drive the week prior to that, which was very well supported. I think people were saying that they thought the idea of 'paid accounts' just wasn't going to work as well as the donation drive had. The idea being that once people are paying for an actual service they expect more for their money.

After all, they're aiming for 2% of the userbase subscribing. That's a large number when they're offering basically nothing of value in return. I know other 'freemium' businesses get 1-2% subscribers, but surely that's when they're offering a real improvement in the service, such as Dropbox's 2GB -> 20GB.

Being anti-advertising is not the same as being anti-capitalist.

There is a good argument that advertising actively moves you away from the EMH which is one of the strongest arguments in favor of capitalism.

"There is a good argument that advertising actively moves you away from the EMH which is one of the strongest arguments in favor of capitalism."

I said anti-capitalist because many people were against advertisements and the gold accounts. While there may be many people lurking on reddit that are not anti-capitalist, it's a sentiment I see in almost every topic related to making money on that site.

Yep. I think it's a valuable business lesson: unless you know what your plan to make money is, don't announce that your attitude is "ads are bad". Although I think they'll figure out how to make money, this situation could've been avoided if they ran ads from early on and weren't loudly against them.

Also keep in mind that out of the 2000 comments, I imagine a large number of gold members didn't respond. I know I didn't.

The users of reddit are wide and varied. There is a significant number of redditors who have demonstrated their willingness to pay reddit, me among them, and don't regret it.

In my experience, the first rule to learn about reddit is "when not to comment/reply". So what you say rings true to me: People know better than to comment on threads like that because the vocal "i hate this" types may decide to harass you, and who needs that?

Many of us are also gainfully employed/work in the "real world" ;)

You shouldn't. It probably isn't the case.

Amusingly, I can't read it because I get an error: "the service you request is temporarily unavailable. please try again later.".

What is toadjaw? The homepage didn't seem to have any info.

It's my after hours/weekend project. It'll eventually be a bookmarking site. I'm about 75% of the way there before I open it up to the public. Some of the features are nearly completed, like the screenshots and article text viewer.

Along the way, I realized it would be trivial to make a mobile HN reader, so I spent some time doing that http://toadjaw.com/hn

Hey, I use your website every day on my iPhone. Love it, and thanks for the new 'theming' and new buttons, that's better than last week!

Keep up your website. Maybe you should put an ad on it, I would click it once a day to help :)

Thank you, ronnier. Very kind of you.

I wonder what the best money-saving optimisation would be?

Whilst it sounds like a lot of money, remember that the whole site's annual EC2 bill could easily be paid for several years by one of the Newhouse brothers (who own Advance Publications, who own Conde Nast) selling one of their Rubens paintings, or probably just by giving up their bank interest for a month (est. about ¼ million USD per day for the poorer one: 1 day of 2% annual interest on $4.5 billion).
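A quick check of that interest estimate, using the comment's own inputs ($4.5 billion at 2% a year) and the $33k/month hosting figure from the thread title. Simple interest over a 365-day year is an assumption made here to keep it short.

```python
net_worth_usd = 4_500_000_000
annual_interest_rate = 0.02
monthly_hosting_usd = 33_000  # figure from the thread title

interest_per_day = net_worth_usd * annual_interest_rate / 365
months_covered_by_one_day = interest_per_day / monthly_hosting_usd

print(f"${interest_per_day:,.0f} of interest per day")
print(f"one day of interest covers about {months_covered_by_one_day:.1f} months of hosting")
```

So the quarter-million-a-day figure holds up under those inputs.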

Of course the Newhouse brothers could do that, but they could do it precisely for the reason they'd be unlikely to: if it's a bad investment, they shouldn't keep throwing money at it. Not saying that it is a bad investment, just that ignoring the problem because it's owned by a rich person who could easily bail it out ignores the way rich people think.

>Not saying that it is a bad investment, just that ignoring the problem because it's owned by a rich person who could easily bail it out ignores the way rich people think.

I don't want to argue that at all. I'd argue that bailing them out as if they're a charity is wrong given that supporting it would be pocket change for the owners - less than they'd pay for a painting that they probably don't even bother to look at (I know that these paintings are not to look at, humour me).

There is a great talk by an (ex-) developer from Reddit about the technical background, scalability problems, caching, storing redundant data, queued offline computing etc. This might shed light on some questions in the comments here.


People are usually the highest cost for most companies. Reddit for sure needs admins, developers, sales, marketing, etc.

Presumably, they could halve the costs by racking a bunch of SuperMicro boxes.

Renting an empty rack is about 200 EUR/month, and you can fill it with back to back mounted Atoms with SSDs for less than 40 kUSD. 160 cores and 320 GByte RAM with some 16 TByte/s peak throughput for some ~2 kW is not that bad for the price.

Would moving reddit to a cloud make it cheaper?

It's already there.
