(In advance: I'm not one of the sysadmins. I guess I'm a PHP developer, but I'm not good enough to work on the sites at these traffic levels; I just do the whole community stuff.)
Also, to clarify: it's the forum and wiki combined (ignoring our downtime) that are pushing more traffic, not just the wiki.
Edit: Also to hijack my own comment, if any of you guys with your fancy startups want to advertise to a community of indie gamers feel free to email email@example.com :-)
Comparisons of apples to oranges... It's not about traffic; it's about what sort of website you have, how dynamic or static it is, etc.
A wiki likely has relatively few writes compared to reads, so caching should work very well.
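To see why a read-heavy workload makes caching so effective, here is a minimal sketch of a naive TTL page cache in Python (the page name, TTL, and render function are illustrative, not from the actual setup):

```python
import time

class PageCache:
    """Naive TTL page cache: serve cached HTML until the entry expires."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}   # page_id -> (rendered_html, expiry_timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, page_id, render):
        entry = self.store.get(page_id)
        now = time.time()
        if entry and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        html = render(page_id)  # the expensive part: hit PHP / the database
        self.store[page_id] = (html, now + self.ttl)
        return html

cache = PageCache(ttl_seconds=60)
for _ in range(1000):  # 1000 reads of the same wiki page within the TTL
    cache.get("Main_Page", lambda p: f"<html>{p}</html>")
print(cache.hits, cache.misses)  # 999 hits, 1 miss
```

With reads vastly outnumbering writes, almost every request is a cache hit and never touches the application server.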
That said, always nice to see people optimizing properly and using a sane number of servers.
False dichotomies huzzah!
You've got articles that have ratings, you've got comments on those articles that have ratings, articles and comments are sorted according to rating, people earn karma from posting good comments/articles, and everything has tags.
So personally I don't see a false dichotomy; even if Digg is more dynamic / complex ... WTF are they doing with those 500 servers?
StackOverflow, on the other hand, is much simpler: questions and answers, plus users and voting. AFAIK there's no collaborative filtering going on at SO, be it user- or item-based.
I don't know why Digg needs so many boxen, but I did find Spolsky's comparison disingenuous.
And for that level of content, interactivity, media, etc., I can say that 500 is silly.
What sort of computing resources do you think it took for Google to develop the autopilot car? Would you have been able to determine that by looking at their homepage and the services they provide? No. That's the point. I'm not suggesting that Digg is doing anything so interesting, but most of those 500 servers are almost certainly NOT being used to support their website directly, they're being used by the business for other things.
It looks as if the best approach is to roll your own. phpBB seems to be designed with the smaller user in mind, so while routing every single file through file.php for easy processing might work well for Johnny and his friends' forum, once you hit a large scale it becomes rather a pain.
So yeah, no idea, we'll find out soon though, I'll report back when I know :-)
Many startups would do better to add server capacity in the short term rather than spend lots of time optimizing to cut costs, since that optimization work is typically invisible to the user.
For example, a 4GB Linode VPS is $160/month, so you can run 34 of them ($5,440/month) for the cost of one developer (salary of ~$67k, based on: http://www.simplyhired.com/a/salary/search/q-php+developer/l...). Also, many startups struggle to recruit good developers, so does it make sense for them to spend all their time optimising code to perform on cheap hardware, rather than improving the product in ways visible to the user?
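The arithmetic behind that comparison, spelled out (using only the prices and salary figure quoted above):

```python
vps_monthly = 160          # 4GB Linode VPS, dollars per month
developer_salary = 67_000  # dollars per year, the quoted average

vps_count = 34
monthly_server_bill = vps_count * vps_monthly
annual_server_bill = monthly_server_bill * 12

print(monthly_server_bill)  # 5440
print(annual_server_bill)   # 65280 -- roughly one developer's annual salary
```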
In general a company with a more efficient solution will have an easier time with just about every aspect of development and deployment, which pays huge dividends. However, if you find that engineering such efficiency is too difficult due to fundamental design choices or legacy systems then sometimes it's not worth killing yourself to fix.
The current servers we operate were paid for with donations from our users because our ad revenue has yet to arrive heh.
Not to mention, there was a story here just this week about how communication between EC2 nodes may be a lot of reddit's performance trouble. Scaling horizontally is inevitable at a certain scale, but it's no panacea.
I don't fully understand the love for large VPSes (which aren't even all that large) compared to dedicated hardware, which has a better chance of higher memory bandwidth, more RAM, and faster disk access; though I do understand that many people are very happy with Linode as a business.
Not 100TB bandwidth though.
Also on a semi related note, I (like you) suggested 100tb.com but we tried them out (just to test speeds) and they're pretty poor...
First, because you might be papering over more fundamental performance issues that will still hurt you in the long term, and will indeed be harder to spot once caching is in place.
Second, because cache invalidation is quite often non-trivial, and doing it right may require a somewhat thick layer of code, and a certain discipline moving forward. This will slow down development if you are in rapid pivot mode.
However, if your performance fundamentals are already resolved, the business model is in place, and you expect the code to be around for a while, then the effort put into caching will be amortized over many years. It pays dividends not only in server costs, but also in user experience (fast page loads) and in correctness, because you will have time to get the caching right rather than scrambling to add it at scale and potentially serving stale data to millions of people instead of thousands.
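One common way to "get the caching right" is invalidate-on-write: entries never go stale because every write explicitly evicts the cache keys it affects. A sketch in Python, where the key scheme and helper names are made up for illustration:

```python
class Cache:
    """Simple in-memory cache with explicit invalidation on writes."""
    def __init__(self):
        self.data = {}

    def get_or_render(self, key, render):
        if key not in self.data:
            self.data[key] = render()
        return self.data[key]

    def invalidate(self, *keys):
        for key in keys:
            self.data.pop(key, None)

cache = Cache()

def view_article(article_id):
    return cache.get_or_render(f"article:{article_id}",
                               lambda: f"rendered article {article_id}")

def edit_article(article_id, new_body):
    # Stand-in for the real database write.
    pass
    # The non-trivial part: every cached view that includes this
    # article must be evicted, not just the article page itself.
    cache.invalidate(f"article:{article_id}", "front_page", "feed:latest")

view_article(7)
edit_article(7, "new text")
print("article:7" in cache.data)  # False: the next read re-renders fresh content
```

The discipline mentioned above is exactly this: every new write path must remember to invalidate every key it touches, which is why the layer of code tends to thicken over time.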
I've averaged 1M pages/day for more than a year on a single server, a dual Opteron 250 at 2.4GHz, with a load average of 0.3...
We just serve mostly static content, and most php content is cached.
I just think that this comparison to SO and /. is flawed.
Your title made our servers and setup the focus, that wasn't the point at all.
Heck, if we're showing off, here is how we do it...pretty graphs and all.
The tuning I'm talking about is actually increasing the default initial congestion window size, which benefits both keepalive and non-keepalive connections. There is no sysctl parameter that allows this control; the behaviour is hardcoded into the TCP stack, and hence requires direct modification of the source and a recompiled kernel.
Even just running their pngs through a lossless compression tool like Smush.it would probably be worth it: http://news.ycombinator.com/item?id=1796101
PageSpeed and YSlow yielded many other fruit-bearing ideas.
Update: Added link to Autosmush HN item.
(Hopefully this is okay, I'm not a regular HN user, mainly lurk, it isn't mentioned in the guidelines but it might be one of those secret rules that are learned as you go along... be gentle!)
Anyway I'd suggest you guys chat with CPMStar - http://www.cpmstar.com, they're the king of gaming-ad-stuff.
Drop me an email (check my profile) and I'll pass you the addresses I know there.
I'll send an email now, it'd be great to talk to some people over at cpmstar!
Just an idea anyway :)
I know you're very active on reddit, so I'm sure you've already thought of this.
The trick is not to have any of those dynamic page things.
I'm surprised this got upvoted so much; I could easily serve 2 million pages a day off one server, if I needed to and the pages were static.
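For a sense of scale, 2 million pages a day is a fairly modest request rate once you average it out (real traffic will of course peak at several times the mean):

```python
pages_per_day = 2_000_000
seconds_per_day = 24 * 60 * 60  # 86400

avg_rps = pages_per_day / seconds_per_day
print(round(avg_rps, 1))  # 23.1 requests/second on average
```

Around 23 requests/second of static content is well within reach of a single nginx box.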
The assumption that we are using the same hardware, have similar workloads, and so on is clearly wrong.
We could spend months and months tweaking everything so we need two, three, or four fewer servers... but servers are cheap, and developer time is expensive.
Also, we happen to have backup servers, we are not running at 100% utilization, and we also happen to run chat off the same servers.
I think it's awesome that the Minecraft sites are serving lots of traffic. I love nginx; we use haproxy. But the headline is misleading.
This submission is poorly titled. Our intention was never to claim we're better than SO (we're very different, just as SO is very different from Digg); it was just a good comparison to make, as in "Joel said they're serving 60m page views a month and SO is huge; well, we're doing the same, so now you can see how big we are!" rather than "We serve the same as SO, therefore they suck!"
No hard feelings here; I think you are building an awesome business.
That software looks interesting, although as I'm not a sysadmin, all that matters to me is how pretty it is, and it doesn't have enough rounded corners ;)
Wiki + Forum = 60m page views a month, so combined we serve the same amount of traffic as you, as per this tweet: http://twitter.com/#!/spolsky/status/27244766467
We weren't challenging you or anything. I just noticed that tweet (it was mentioned here) and thought, "hey, we're doing the same, we can use them as an example of how big we are!" I'm just a dumb kid who has never had anything he created be this successful before, and being able to say "We're as big as Stack Overflow" is crazy.
Yes, you've said that several times. The intention was two-part: first, to show that spolsky's previous post was a false dichotomy, and second, to point out your success, which is how almost everyone here understood it.
However, your original post on reddit (well, your sysadmin's) was titled "Minecraftwiki.net and minecraftforum.net now serve more traffic than Slashdot and Stackoverflow", so you can't blame that part on me, just the server count and PHP part.
Poor wording and a mistake on my part, sorry! :-)
^- insert word here
Minecraft is like Garry's Mod: the game is whatever you, the player, want to make it, and that will lead to a lot of future success alongside the current success.
Also if you want to see the sales figures, I've been tracking them for the past few months: http://m00d.net/minecraft/sales/ :)
If you're interested, here's a (not very accurate) list of where Minecraft has been featured: http://www.minecraftforum.net/viewtopic.php?f=3&t=2162 which includes Australian TV, physical magazines, huge tech blogs, gaming blogs, forums... everywhere! I don't think I'll ever see anything like this happen again in my life (and I'm only young). Minecraft is incredibly unique.
Interesting -- I know nothing about it, but it sounds like it might appeal to people who want to learn video game programming, at least perhaps the ones that don't want to go into hardcore engine programming.