Hacker News

14.3 million pageviews per day comes to about 165 per second, on average. Reddit's traffic, like that of most sites of its kind, probably peaks around lunch hour on workdays, perhaps at 5x that number. So we're looking at roughly 830 pageviews per second at peak. Given 80 servers total, that's about 10 pageviews per second per server.
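The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic. The 5x peak factor is the commenter's guess, and the code assumes traffic is spread evenly across all 80 servers, which real load balancing won't do exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pageviews_per_second(pageviews_per_day: float) -> float:
    """Average request rate implied by a daily pageview count."""
    return pageviews_per_day / SECONDS_PER_DAY


def per_server(rate: float, servers: int) -> float:
    """Spread a request rate evenly over every server in the fleet."""
    return rate / servers


# Figures from the comment; the 5x peak factor is a guess, not a measurement.
REDDIT_DAILY_PAGEVIEWS = 14_300_000
PEAK_FACTOR = 5
SERVERS = 80

avg = pageviews_per_second(REDDIT_DAILY_PAGEVIEWS)  # ~165.5/s
peak = avg * PEAK_FACTOR                             # ~827.5/s

print(f"average: {avg:.1f}/s, peak: {peak:.1f}/s")
print(f"per server at peak: {per_server(peak, SERVERS):.1f}/s")    # ~10.3
print(f"per server on average: {per_server(avg, SERVERS):.1f}/s")  # ~2.1
```

Note that the peak per-server figure and the average per-server figure differ by the full peak factor. Comparing them against another site only works if both sides use the same basis.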

Let's compare this with Facebook. In October 2009, Facebook announced they had 30k servers[0]. In September 2009, there was a rough estimate that Facebook served 200 billion pageviews per month[1]. That implies roughly 77k pageviews per second (assuming a 30-day month), or about 2.5 pageviews per second per server. Clearly the pageview figure is a rough estimate, but even if Facebook served three times as many pageviews per month, they still wouldn't be beating reddit for efficiency.
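Here is the same arithmetic for the Facebook figures, as a sketch. It assumes a 30-day month. Like the comment, it divides every pageview across all 30k servers, including machines that never serve web traffic.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month: 2,592,000 s


def monthly_to_per_second(pageviews_per_month: float) -> float:
    """Average request rate implied by a monthly pageview count."""
    return pageviews_per_month / SECONDS_PER_MONTH


FB_MONTHLY_PAGEVIEWS = 200e9  # the September 2009 estimate [1]
FB_SERVERS = 30_000           # the October 2009 server count [0]

fb_rate = monthly_to_per_second(FB_MONTHLY_PAGEVIEWS)
fb_per_server = fb_rate / FB_SERVERS
print(f"{fb_rate:,.0f} pageviews/s, {fb_per_server:.2f} per server")
```

With a 31-day month the result drops slightly, but it stays in the same 2 to 3 pageviews/sec/server range.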

I have a feeling if you run the numbers for any other highly dynamic site at scale, you'll find that amortized over every server in use, you won't get a lot better than 10 pageviews per second.

[0] http://www.datacenterknowledge.com/archives/2009/10/13/faceb...
[1] http://www.businessinsider.com/googles-estimates-for-faceboo...

Comparing with Facebook isn't a good idea IMHO. Facebook also has way too many servers, but they have a business model, so they can afford it.

10 pageviews per second is pretty lame IMHO. Crazy crazy server costs. Amazon is crazy expensive! Why use them? Either Reddit need to drastically change the way they do things, or they deserve to die.

That's terribly misleading, though. Facebook has huge clusters for performing offline business analytics. Therefore comparing them to Reddit, which presumably has a much smaller offline analytics backend, is not an apples to apples comparison in terms of efficiency.

It's an educated guess, but I suspect comparing the entire infrastructure of a site like reddit with the entire infrastructure of a site like Facebook is pretty fair. Neither of them is using its entire server capacity to serve traffic. reddit's at a smaller scale than Facebook, obviously, but it's a pretty apt comparison.

You think the functionality of Facebook compares equally to that of Reddit, computationally? I mean, I know neither is the most complicated thing ever, but does Reddit even allow photos anywhere besides thumbnails? Does Reddit find things only specific to your account and show you a list of friends and the content of theirs you are allowed to see (or anything even vaguely as complicated)? I also highly doubt that Facebook counts all of the Ajax goodness it delivers as pageviews. I could really go on for a long time (widgets, Facebook Connect, the Facebook Apps API...) about the huge differences you glossed over, not to mention that Facebook uses PHP for the frontend.

No, I think looking at Facebook's numbers, figuring roughly 2 pageviews/server, and then looking at reddit's numbers (at roughly 10 pageviews/server) and saying "reddit's setup seems reasonable" makes it a good comparison. It's all just hand-waving, but I don't mind using that comparison to put reddit's numbers in perspective.

All of the math in the post is just wacky, though: he's comparing Reddit's peak pageviews/sec/server to Facebook's average pageviews/sec/server over a month. Using the average of roughly 165 pageviews/sec and 80 servers, Reddit's figure is about 2 pageviews/sec/server.

Facebook also doesn't count all webserver hits as pageviews, whereas most of Reddit's hits do count.

Beyond that, Facebook's pageviews may eat more resources (CPU/RAM) on average than Reddit's, due to photo uploading and other miscellaneous things on Facebook. This means a Facebook server is working harder at 2 pageviews/sec/server.

> but does Reddit even allow photos anywhere besides thumbnails?

How is that even relevant? Images cost in bandwidth, not in computing power.

> Does Reddit find things only specific to your account

Uh, yes: on every single submission and comment (the admins clearly stated that the voting system was what used the most resources), as well as on every single user (whether or not you marked them as a friend) and on the list of links itself, which is extracted from the user's own list of reddits.

There is barely anything on a logged in user's page which isn't at least in part specific to that user's account.

> I could really go on for a long time

No, I don't think so. You've been 0/everything so far; it's time to stop.

> How is that even relevant? Images cost in bandwidth, not in computing power.

Sounds like you need to do some STFU yourself, having clearly never come anywhere near this problem domain. Using standard tools to process and resize user-uploaded images can easily soak up all the CPU time and memory you could throw at it.

I recently spent a bunch of time rewriting a client's image processing pipeline from the usual O(n)-space ImageMagick crap, which loads the full uncompressed image into memory a few times over, to be O(1)-space, doing streaming downsampling by exploiting the compression implementations of JPEG and PNG. It was more than worth it: even with the PNG implementation being a bit more CPU-intensive now, it's a lot nicer having it operate in constant space without fear of swap and the OOM-killer.
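The core idea of streaming downsampling can be sketched in plain Python. This is not the commenter's pipeline. It doesn't touch JPEG or PNG internals, which is where the real savings came from (for example, JPEG decoders such as libjpeg can decode directly at a reduced scale). Instead, it assumes a hypothetical decoder that yields already-decoded 8-bit grayscale rows one at a time, and it averages `factor` x `factor` blocks. Memory is bounded by one accumulator row of output width rather than by image height.

```python
from typing import Iterable, Iterator, List

Row = List[int]  # one row of 8-bit grayscale pixels


def box_downsample(rows: Iterable[Row], factor: int) -> Iterator[Row]:
    """Downsample a grayscale image by `factor` in both directions.

    Rows are consumed one at a time, so only a single output-width
    accumulator is held in memory, never the whole image.
    """
    acc: List[int] = []
    rows_in_band = 0
    for row in rows:
        out_width = len(row) // factor
        if not acc:
            acc = [0] * out_width
        # Sum each horizontal run of `factor` pixels into the accumulator.
        for x in range(out_width):
            acc[x] += sum(row[x * factor:(x + 1) * factor])
        rows_in_band += 1
        if rows_in_band == factor:
            # A full band of factor x factor blocks is summed: emit averages.
            yield [total // (factor * factor) for total in acc]
            acc = []
            rows_in_band = 0
    # A trailing partial band (height not divisible by factor) is dropped.


def gradient_rows(width: int, height: int) -> Iterator[Row]:
    """Hypothetical stand-in for a streaming decoder: one row at a time."""
    for y in range(height):
        yield [(x + y) % 256 for x in range(width)]


small = list(box_downsample(gradient_rows(8, 4), 2))
print(small)
```

Because `box_downsample` is a generator feeding on another generator, an encoder could consume output rows as they're produced. At no point does the full image exist in memory.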


Facebook is much more complex.

Your personalized news feed, and what you can see, depend on the settings of all your friends and of their friends.

Reddit is just a simple forum site...

That's ridiculous.

Facebook have revenue, and profit. They can afford to buy more servers and have them sitting idle.

Reddit should be trying to run on an optimal number of servers: first, cheap hosting (which is NOT Amazon), and second, fewer servers. 80 is just crazy.

If they truly have to handle a massive amount of traffic with only four people, a service like Amazon is a pretty good deal. It makes it trivial to create, clone, archive, or upgrade servers, and you never have to worry about drive/CPU/motherboard/RAM failures bringing down your site at 3 AM.

As someone who's worked in a data centre before, I can say hardware problems only increase as you scale up. We had around 350 servers with an average of three drives each, and we probably went to the data centre to swap dead hard drives three times a week on average, whether SAS, SATA, or SCSI. Never mind the time I spent testing motherboards, swapping CPUs out, etc.
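The anecdote implies a rough annualized drive failure rate, and it shows why the number of swaps grows with fleet size. This sketch makes two simplifying assumptions. It treats the quoted counts (350 servers, 3 drives each, 3 swaps a week) as exact. It also assumes failures scale linearly with drive count. The larger fleet sizes in the loop are hypothetical.

```python
DRIVES = 350 * 3       # ~350 servers x ~3 drives each
SWAPS_PER_WEEK = 3     # "on average three times a week"
WEEKS_PER_YEAR = 52

failures_per_year = SWAPS_PER_WEEK * WEEKS_PER_YEAR  # 156
annual_failure_rate = failures_per_year / DRIVES     # fraction of drives per year


def expected_weekly_swaps(drive_count: int, afr: float) -> float:
    """Expected drive replacements per week at a given annualized failure rate."""
    return drive_count * afr / WEEKS_PER_YEAR


print(f"implied annualized failure rate: {annual_failure_rate:.1%}")
for drives in (1_050, 5_000, 20_000):  # hypothetical fleet sizes
    print(drives, round(expected_weekly_swaps(drives, annual_failure_rate), 1))
```

Whatever the true rate, a fixed per-drive failure rate means a fleet several times larger needs several times as many trips to the data centre.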

Owning your own servers means dealing with maintenance and downtime. Virtualizing 10 servers on one physical server is great until that one physical server's RAM starts acting flaky, and then you have to take those 10 servers offline, or shift the VMs onto your other hardware.

Renting your servers typically means paying a monthly cost for the hardware, including RAM and disk, long after the fees have paid the DC back their costs.

For Reddit, being able to bring up N extra nodes for X purpose (MySQL, Cassandra, web serving, etc.) with a few clicks in a few minutes is likely a huge draw. It means they can grow gradually (instead of having to shell out another $10-20k for a new server), and it means they can dynamically adjust to meet their needs. For example, if Monday is a busier day than Saturday by a significant stretch (and if their architecture allows it), they can bring more nodes online early Monday morning to handle the load. They can then take them offline until Thursday evening, when they're needed again for the 'It's Friday, to hell with doing work' rush I'm sure they get, and so on.
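A time-based scaling schedule like the one described might look roughly like this. It is only a sketch. The node counts, the 6 AM Monday start, and the 6 PM Thursday start are hypothetical, not Reddit's real numbers. The code only decides *how many* web nodes to run. Actually launching or stopping instances through a provider's API is left out.

```python
from datetime import datetime

# Hypothetical capacity figures, for illustration only.
BASE_NODES = 20
EXTRA_NODES = 10


def desired_web_nodes(now: datetime) -> int:
    """Scheduled web-node count following the weekly pattern in the comment."""
    day = now.weekday()  # Monday == 0 ... Sunday == 6
    hour = now.hour
    if day == 0 and hour >= 6:
        # Busy Monday: extra capacity from early morning.
        return BASE_NODES + EXTRA_NODES
    if (day == 3 and hour >= 18) or day == 4:
        # Thursday evening through Friday: the pre-weekend rush.
        return BASE_NODES + EXTRA_NODES
    # Tuesday to Thursday afternoon, and the weekend: baseline only.
    return BASE_NODES


print(desired_web_nodes(datetime(2024, 1, 1, 9)))   # a Monday morning
print(desired_web_nodes(datetime(2024, 1, 6, 12)))  # a Saturday
```

A fixed calendar is the simplest option. A real deployment would more likely scale on measured load, keeping a schedule like this only as a floor.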
