Minecraftwiki serving more traffic than Stackoverflow with 4 servers (and PHP) (reddit.com)
153 points by Keyframe on Oct 15, 2010 | 76 comments

I'm part of the Minecraft forum/wiki team (I created them both originally) so if you guys have any questions, feel free to shoot them at me, or you can join #rsw on irc.esper.net.

(in advance I'm not one of the sys admins, although I guess I'm a php developer, but I suck too much to work on the sites at these traffic levels, I just do the whole community stuff)

Also to clarify, the forum + wiki combined when ignoring our downtime are pushing more traffic, not just the wiki.

Edit: Also to hijack my own comment, if any of you guys with your fancy startups want to advertise to a community of indie gamers feel free to email sam@redstonewire.com :-)

Is this the shape of things to come?

Comparisons of apples to oranges... It's not about traffic, it's about what sort of website you have, how dynamic/static it is, etc

A wiki likely has relatively few writes compared to reads, so caching should work very well.

That said, always nice to see people optimizing properly and using a sane number of servers.

That's kind of the point. Yesterday Spolsky asked why Digg had 500 servers and less traffic than StackOverflow, yet StackOverflow had 5 servers.

False dichotomies huzzah!

Yes, title is kind of a homage to that false dichotomy posted yesterday by Spolsky.

Yes but StackOverflow is very similar to Digg:

You've got articles that have ratings, you've got comments on those articles that have ratings, articles and comments are sorted according to rating, people earn karma from posting good comments / articles, everything has tags.

So personally I don't see a false dichotomy; even if Digg is more dynamic / complex ... WTF are they doing with those 500 servers?

Digg is more like Facebook and Twitter than StackOverflow. Each of Digg's logged in users gets their own "News Feed" based on the users they follow/friended/are most similar to.

StackOverflow on the other hand is much simpler - questions and responses, plus users and voting. AFAIK there's no collaborative filtering going on at SO, be it user or item based.

I don't know why Digg needs so many boxen, but I did find Spolsky's comparison disingenuous.

Well TBH 500 is ridiculous overkill. Even 100 would be pushing it.

Whether it's overkill or not depends entirely on what the servers are being used for. Without actually working there it's hard for one to say for sure.

We know what the service does. We have that much data - we can see what users are allowed to do on the website, and what they cannot do.

And for the level of content, interactivity, media, etc etc, I can say that 500 is silly.

You don't know what each server is doing. They might have 10 servers being used by an internal marketing analytics team, with 40 support servers for development, QA, testing, and disaster recovery of those specific services.

What sort of computing resources do you think it took for Google to develop the autopilot car? Would you have been able to determine that by looking at their homepage and the services they provide? No. That's the point. I'm not suggesting that Digg is doing anything so interesting, but most of those 500 servers are almost certainly NOT being used to support their website directly, they're being used by the business for other things.

The title was supposed to be a serious representation of our traffic numbers (our amount of servers wasn't supposed to be part of the comparison, the poster here is a slippery snake) but we can pretend it was if you like, nobody likes joel so it fits in well.

I liked this submission simply because it demonstrates the ridiculousness of such claims.

Spolsky compared two rather more similar sites though. This one is a wiki where most of the content is static, served out of cache. Still, I agree that such comparisons are on a very weak footing. A single feature could easily kill any valid comparison.

Thankfully it wasn't actually less traffic, it was only 4x (maybe 5x) the traffic, with 10x the servers.

Also, it's MediaWiki, so all the optimisation work was done by other people, to make Wikipedia work.

Mediawiki is fantastic software, we'd be able to operate on a lot less hardware if it was just mediawiki, but because of the forum we had to boost everything up. For anyone who ever considers using phpbb for anything serious: please don't.

Suggestions for good alternatives with good scaling?

We're looking at a couple of alternatives and I can report back with our findings when we know enough, right now we're looking at vbulletin which powers some of the larger forums, it is apparently very good if you're willing to strip out the poor parts (apparently search is terrible).

It looks as if the best approach is to roll your own, phpbb seems to be designed with the smaller user in mind, so while routing every single file through file.php for easy processing might work well for Johnny and his friends forum, once you hit a large scale it becomes rather a pain.

So yeah, no idea, we'll find out soon though, I'll report back when I know :-)

Any chance you could post your configuration files somewhere ? (you know PHP, varnish, etc.) It'd be great to grok those.

Where can I stay posted with your findings?

I think there is an important lesson in this even if it's apples to oranges. Often in my career I've been in a debate with a non-engineer (product person, CEO, etc.) about why a certain feature sounds good to them but a variation of it, which provides most of what they want, is so much better because I can keep the page mostly static. Sometimes I've won that debate and sometimes I've lost. It helps to be able to hold up examples: see, such-and-such wiki serves x million pages with 3 servers because it's mostly static vs ...

I think these types of stories are misleading for startups.

Many startups would do better to add server capacity in the short term, rather than spend lots of time optimizing to cut costs, when this is typically hidden from the user.

For example, a 4GB linode VPS is $160/month, so you can have 34 of those ($5440/month) for the cost of one developer (Salary of $67k based on: http://www.simplyhired.com/a/salary/search/q-php+developer/l...). Also, many startups struggle to recruit good developers, so would it make sense for them to spend all their time optimising code to perform on cheap hardware? rather than improving the product in a visible way to the user?
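
As a rough check of the arithmetic above, here is a minimal sketch. The $160/month and $67k figures come from the comment; the function name is made up for illustration, and the calculation assumes salary is the only cost of a developer (no taxes, benefits, or overhead).

```python
def vps_per_salary(annual_salary: float, vps_monthly_cost: float) -> int:
    """Whole number of VPS instances that one developer's monthly salary pays for."""
    monthly_salary = annual_salary / 12
    return int(monthly_salary // vps_monthly_cost)


# Figures quoted in the comment: $67k/year salary, $160/month per 4GB VPS.
count = vps_per_salary(67_000, 160)
print(count, count * 160)  # number of servers, and their combined monthly cost
```

Under those assumptions the result agrees with the 34 servers and $5440/month quoted above.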

Possibly true, but your accounting ignores recurring vs. one-time costs. If you can pay some external person $10k to tune your setup and save $4000/month, and aren't already swimming in money, obviously that's a smart thing to do.

It also ignores all the very real other side effects of inefficient design (the most common cause of poor performance). For example, bad user experience (if you need to spread out your traffic onto many servers that usually implies a sizeable latency on each page view), engineering drag due to technical debt, and engineering drag due to cumbersome infrastructure and deployment overhead. All of these things matter.

In general a company with a more efficient solution will have an easier time with just about every aspect of development and deployment, which pays huge dividends. However, if you find that engineering such efficiency is too difficult due to fundamental design choices or legacy systems then sometimes it's not worth killing yourself to fix.

Definitely, nobody should take us as an example for their startup/business. We're literally a small forum that inherited huge success and had to rapidly deal with scaling up, we're not a business and money is not our goal, so if someone were to base their business off of what we've done it might not turn out too well.

The current servers we operate were paid for with donations from our users because our ad revenue has yet to arrive heh.

Distributing your app across 34 servers will require a non-trivial development effort itself.

Not to mention, there was a story here just this week about how communication between EC2 nodes may be a lot of reddit's performance trouble. Scaling horizontally is inevitable at a certain scale, but it's no panacea.

For $200 per month you can get a quad core X3220 with 8 GB RAM and 2x 500GB disk with a large amount of bandwidth included: http://www.100tb.com/ .

I don't fully understand the love for large VPSes (that aren't even all that large) compared to dedicated hardware that has a better chance of having higher memory bandwidth, more RAM, and faster disk access; though I do understand that many are very happy with Linode as a business.

You can get a six-core 980x with 24GB of RAM and 4x1.5TB of HDD for $200 per month at http://www.hetzner.de/en/hosting/produkte_rootserver/eq10/

Not 100TB bandwidth though.

The ability to grow with a vps is much easier than with dedicated hardware.

Also on a semi related note, I (like you) suggested 100tb.com but we tried them out (just to test speeds) and they're pretty poor...

May I ask what part of the speeds were poor for you? I am curious.

I'll talk to the guy who actually tested them when he wakes up, but from what I understand network speeds were terrible. I'll get back to you when I know :-)

You really think for $200/month they're going to let you saturate the equivalent of a 300 megabit connection 24/7? You'll get shut off if you approach anything close to that, I'd bet, just like all the "unlimited bandwidth" hosts. That, or the transfer speeds your server gets will be nothing near the 300 megabits that'd be required to use 100 terabytes in a month.
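
The 300 megabit figure can be sanity-checked with a short sketch. It assumes decimal terabytes (10^12 bytes) and a 30-day month, neither of which is stated in the comment; the function name is hypothetical.

```python
def sustained_mbit_per_s(terabytes: float, days: float = 30) -> float:
    """Average rate in megabits per second needed to move `terabytes` in `days`."""
    bits = terabytes * 1e12 * 8          # decimal terabytes -> bits
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6          # bits per second -> megabits per second


rate = sustained_mbit_per_s(100)
print(round(rate))  # sustained Mbit/s required to use 100 TB in one month
```

With those assumptions the required sustained rate comes out slightly above 300 Mbit/s, in line with the estimate above.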

more to the point, for $5K a month you get 2x32GiB ram 8 core 4 disk servers, capacity for 16 of your 4GiB VPSs. so with two months up front cost, then another $200/month you can get the same capacity as those 32 linodes. Even if you have to pay $100/hr for your hardware guy (which is above market) you are saving a ridiculous amount of money.

And then add in the bandwidth and the savings explode several times over...

It depends on the type of optimization. For page caching I definitely agree, not because it's a waste of effort, but two other distinct reasons:

First, because you might be papering over more fundamental performance issues that will still hurt you in the long term, and will indeed be harder to spot once caching is in place.

Second, because cache invalidation is quite often non-trivial, and doing it right may require a somewhat thick layer of code, and a certain discipline moving forward. This will slow down development if you are in rapid pivot mode.

However if your performance fundamentals are already resolved, and the business model is in place, and you expect the code to be around for a while, then putting the effort into caching will be amortized over many years and pay many dividends not only in server costs, but also in user experience due to fast page loads, and also in correctness because you will have time to get the caching right rather than scrambling to add it at scale and potentially serving stale data to millions of people instead of thousands.

A decent php dev can figure out how to hook up a memcached server and install apc. Most of the work is already done, I think it would be worth the couple hours to learn about caching, IMO. A little caching can go very far.
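
To illustrate the idea (in Python rather than PHP, and not the setup anyone in this thread actually runs), here is a minimal cache-aside sketch. `TTLCache` is a hypothetical in-process stand-in for a memcached-style store, and `render_front_page` is a made-up placeholder for an expensive page build.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process stand-in for a memcached-style key/value cache."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]              # fresh cached copy: skip the expensive work
        page = render()                # miss or expired entry: rebuild the page
        self._store[key] = (now, page)
        return page


render_calls = 0


def render_front_page() -> str:
    """Hypothetical expensive page build (database queries, templating)."""
    global render_calls
    render_calls += 1
    return "<html>front page</html>"


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):
    cache.get_or_render("front_page", render_front_page)
print(render_calls)  # how many times the expensive render actually ran
```

A time-based expiry like this sidesteps explicit invalidation at the cost of serving pages that may be up to one TTL out of date, which is often acceptable for read-heavy content.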

Sorry, I don't get the fuss about this.

I've been serving 1M pages/day on average for more than a year on a single server, a dual Opteron 250 at 2.4GHz, with a load average of 0.3...

We just serve mostly static content, and most php content is cached. I just think that this comparison to SO and /. is flawed.

Which wasn't our point, the OP here misrepresented us. We wanted to explain how popular we were and we realised we had as much traffic as stackoverflow, so we used that as a traffic comparison, our point was not to compare servers. As you can see if you read through the linked topic, at no point did we compare ourselves to SO beyond traffic wise, the OP here is at fault :)

Why was your posted title on reddit then: "Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow!" ? I just added a server count and php part which were both in the thread mentioned to signal previous false dichotomy. You guys shouldn't have posted a title like that then, you have compared yourself to SO and Slashdot, not me.

Yes, we compared our traffic levels, not our server count. The title of this post implies that we're saying "we have as much traffic as SO and they use more servers!" which wasn't the point. The point was Stackoverflow is a tangible comparison of traffic, not hardware. Anyway I replied to you above explaining, I misunderstood your intentions.

Your title made our servers and setup the focus, that wasn't the point at all.

I agree. Over here we do 70m (high write ratio) pages per month on 1 server handling all apache/php/mysql. Hardware is really fast these days if you tune it to any degree.

Heck, if we're showing off, here is how we do it...pretty graphs and all. http://www.pinkbike.com/news/pinkbike-speed-efficiency.html

That was a very interesting article, thanks. One question, if I may: When using a reverse proxy, it makes no sense to have keepalives on for Apache, correct? The proxy takes care of the keepalive and leaves Apache free for other requests?

Correct. The reverse proxy pulls from Apache over the fast local network, and then passes the data to the slow clients. Apache is connected for a shorter time. Basically you're trying to minimize the time a "memory expensive" process like Apache is open per client.
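
A back-of-the-envelope way to see the effect is Little's law: the average number of busy workers equals the request rate times how long each request holds a worker. The numbers below (100 requests/s, 2000 ms to drain to a slow client, 50 ms to hand off to a local proxy) are invented for illustration and do not come from the article.

```python
import math


def workers_needed(requests_per_s: int, hold_time_ms: int) -> int:
    """Little's law: average busy workers = arrival rate * time each request holds a worker."""
    return math.ceil(requests_per_s * hold_time_ms / 1000)


# Hypothetical numbers: Apache talking to slow clients directly vs. via a local reverse proxy.
direct = workers_needed(100, 2000)
proxied = workers_needed(100, 50)
print(direct, proxied)
```

The gap between the two results is the memory the proxy saves, since each busy Apache worker carries the full PHP footprint while a proxy connection is comparatively cheap.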

Yep, makes perfect sense, thank you. I've disabled keepalive and increased my mancrush on varnish.

Could you expand on the tuning of tcp slow start? Are you referring to net.ipv4.tcp_slow_start_after_idle?

No, I'm not referring to this parameter in the write-up. net.ipv4.tcp_slow_start_after_idle, which is on by default on most distros, applies to keepalive connections. It causes your keepalive connection to return to slow start after TCP_TIMEOUT_INIT, which is 3 seconds. Probably not what you want or expect. For example, if you have keepalives of say 10s, you'd expect that a request after say 5s would have its congestion window fully open from previous requests, but the default behaviour is to go back to slow start and close your congestion window back down. So you want to tune this to off on your image servers and other keepalive systems.

The tuning which I talk about is to actually increase the default initial congestion window size. The result being an advantage for non-keepalive connections and keepalives. There is no sysctl parameter that will allow for this control. This behaviour is hardcoded into the tcp stack, and hence requires direct modification and a recompiled kernel.
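
Why a larger initial window helps can be seen with an idealized model: no packet loss, one full window sent per round trip, and the window doubling each round trip. The 50 KB response size, the 1460-byte segment size, and the initial windows of 3 and 10 segments are illustrative assumptions, not figures from the write-up.

```python
import math


def round_trips(response_bytes: int, initial_cwnd: int, mss: int = 1460) -> int:
    """Round trips needed to deliver a response under idealized slow start."""
    segments_left = math.ceil(response_bytes / mss)
    cwnd = initial_cwnd
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd   # send one full congestion window per round trip
        cwnd *= 2               # slow start: the window doubles every round trip
        trips += 1
    return trips


for initcwnd in (3, 10):
    print(initcwnd, round_trips(50_000, initcwnd))
```

In this model the larger initial window saves a round trip on a 50 KB response, which matters most for short, non-keepalive connections that never leave slow start.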

This is a really good intro on tuning a high bandwidth site; thx!

If they enabled gzip compression on their CSS/Javascript files, they could cut down their page weight by several hundred kb.

Even just running their pngs through a lossless compression tool like Smush.it would probably be worth it: http://news.ycombinator.com/item?id=1796101

PageSpeed and YSlow yielded many other fruit-bearing ideas.

Update: Added link to Autosmush HN item.

Also a request, if any of you are experienced with it: we're interested in new ad strategies (that retain our "minimalist" approach, allowing expansion without upsetting users) so if anyone either works for an advert company or has experience at our volume (or similar) we'd love to hear from you. sam@redstonewire.com :-)

(Hopefully this is okay, I'm not a regular HN user, mainly lurk, it isn't mentioned in the guidelines but it might be one of those secret rules that are learned as you go along... be gentle!)

Isn't minecraft making like, a house a day in revenue? Is it really worth tossing ads in forums?

Anyway I'd suggest you guys chat with CPMStar - http://www.cpmstar.com, they're the king of gaming-ad-stuff.

Drop me an email (check my profile) and I'll pass you the addresses I know there.

We're not connected with the people behind Minecraft, we're totally separate entities. The forums and wiki are community-run and while we've had brief discussions with the Minecraft company (Mojang) nothing has come of it. The general consensus is that we're best operating as separate entities: it means Mojang can focus on game development and we can focus on growing the community, and it means that we can remain impartial (although whether or not that is an issue, I have no idea). We've never had a penny of Minecraft proceeds :-)

I'll send an email now, it'd be great to talk to some people over at cpmstar!

Oh sorry mate my mistake heh.

I ran a wiki once where I changed the search box to a google search box. Made some good money even with very little traffic. Of course you're taking people away from your site but my website was pretty rubbish so wasn't too worried about that personally.

Just an idea anyway :)

Have you considered affiliate links through Amazon to popular games akin to reddit[1]/ qghy2?

[1] I know you're very active on reddit, so I'm sure you've already thought of this.

I'm running a website with 15% more quantcast visits than stackoverflow and ~200MiB/s traffic from 3 servers, one of which is nearly at 100% idle.

The trick is not to have any of those dynamic page things.

It has been quite a while since we served one million page views a day.... even on public holiday weekends we serve more.

I'm surprised this got upvoted so much, I could easily serve 2 million static pages a day off one server, if I needed to and the pages were static.

The assumption that we are using the same hardware and have similar workloads and so on, is clearly wrong.

We could spend months and months tweaking everything so we need 2/3/4 less servers .... but ... servers are cheap, developer time is expensive.

Also, we happen to have backup servers, we are not running at 100% utilization and we also happen to run chat off the same servers.

I think it is awesome that minecraft are serving lots of traffic, I love nginx, we use haproxy. But the headline is misleading.

Just to clarify, the OP here titled it in a manner that misrepresents what we were saying. Also, we're far from serving static pages. Granted the wiki (which is 50% of our traffic) is pretty static and we could easily run that from a single server, the reason we have such high number of servers is because of the forum, which is the other 30m page views and it's phpbb, it's... well, let's not go there.

This submission is poorly titled, our intention was never to claim we're better than SO (we're very different... just like SO is very different to Digg) it was just a good comparison to make, as in "Joel said they're serving 60m page views a month and SO is huge, well we're doing the same, now you can see how big we are!" not "We serve the same as SO, therefore they suck!".

You really should be using community tracker, my other baby :) http://community.mediabrowser.tv/ I'm so happy I moved off phpbb it was causing nothing but grief.

No hard feelings here, I think you are building an awesome business.

We're in the process of moving, it's just a lot of work at our size, we have to make sure everything works :) We actually started out with fluxbb (my choice) but users got tetchy and as we moved from being "just a forum" to being a "community" we had to go forward with new features, but this was back before we had adverts and the $250+ for "proper" forums wasn't something we wanted to do. Here's an idea of how much we've grown: http://i.imgur.com/eenut.jpg

That software looks interesting although as I'm not a sys admin, all that matters to me is how pretty it is and that doesn't have enough rounded corners ;)

Also, important to note that while 2 million pages a day sounds like a lot, it's around 23 pages/s if distributed evenly. If traffic peaks around double or triple that, it's still very feasible for a single server running a dynamic app to serve that.
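
The conversion is easy to reproduce. This sketch assumes traffic spread evenly over a 24-hour day, as the comment does, and a peak of three times the average as the upper bound mentioned; the function name is made up.

```python
def pages_per_second(pages_per_day: float) -> float:
    """Average request rate if the daily total were spread evenly over 24 hours."""
    return pages_per_day / (24 * 60 * 60)


avg = pages_per_second(2_000_000)
print(round(avg, 1), round(avg * 3, 1))  # average rate, and a 3x peak
```

Even the tripled peak stays under 70 pages/s, which supports the point that a single tuned server can plausibly handle it.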

I don't get how Minecraftwiki is serving "more traffic than StackOverflow." I think we have at least twice their daily traffic. All our numbers are on Quantcast--feel free to check.

As I've said elsewhere, the person who titled this is a silly person, they misrepresented what we said.

Wiki + Forum = 60m page views a month, so combined we serve the same amount of traffic as you, as per this tweet: http://twitter.com/#!/spolsky/status/27244766467

We weren't challenging you or anything, I just noticed that tweet (it was mentioned here) and I thought "hey we're doing the same, we can use them as an example of how big we are!". I'm just a dumb kid who has never had anything he created this successful before and being able to say "We're as big as stackoverflow" is crazy.

> As I've said elsewhere, the person who titled this is a silly person, they misrepresented what we said.

Yes, you've said that several times. The intention was two-part: first, to show how the previous post by Spolsky was a false dichotomy, and second, to point to your success, which almost everyone here understood as such.

However, your original post on reddit (well your sysadmin) was titled: "Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow" - so you can't blame that part on me, just the server count and php part.

I'm not good at the whole English thing (even though it's my only language). I didn't mean to imply you were at fault, just that your title didn't represent what was actually happening. Also I didn't realise that you'd posted a comment here pointing out it was supposed to be a joke, I'm used to reddit where it points out that a comment is by the submitter.

Poor wording and a mistake on my part, sorry! :-)

iirc 4chan is doing radically more traffic on even less hardware. different sites have different performance... so what?

Does anyone have any actual numbers on 4chan's traffic or hardware setup?

I do, but am not at liberty to share.

I like how this thread is next to my post.

This is more a testament to HTTP caching and varnish than PHP, 4 servers or Mediawiki. If you can cache the entire page and serve it out of cache for most of your requests, you're in a very position.

  very position
      ^- insert word here

"good" :)

Is minecraft that big, or is the tech world that small?

No, Minecraft is seriously that big. If you trust Alexa much you'll find that we (forum/wiki) are in the top ~5k for both sites, Minecraft is top 3k last I checked. It's been insane recently... what really hammers it home is that this is a product people have purchased, so it's going to be around for a long while. While we probably won't maintain the current traffic once the game settles down into a normal routine, we sure won't be dying for many years, which is what I love about this.

Minecraft is like Garry's Mod, the game is what you the player want to make it, this will lead to a lot of future success alongside this current success.

Also if you want to see the sales figures, I've been tracking them for the past few months: http://m00d.net/minecraft/sales/ :)

*If you're interested, here's a (not very accurate) list of where Minecraft has been featured: http://www.minecraftforum.net/viewtopic.php?f=3&t=2162 which includes Australian TV, Physical magazines, huge tech blogs, gaming blogs, forums... everywhere! I don't think I'll ever see anything happen like this again in my life (and I'm only young) -- Minecraft is incredibly unique.

> the game is what you the player want to make it

Interesting -- I know nothing about it, but it sounds like it might appeal to people who want to learn video game programming, at least perhaps the ones that don't want to go into hardcore engine programming.

It's more like playing with LEGO than anything else.

I'm just wondering... Why do they use Varnish and HAProxy and nginx? This is quite a redundant setup. It would be a lot more efficient to put nginx on lb01 and leave only PHP on fe* nodes.
