If there is one thing I learned from Disqus it is the power of keeping a lightweight stack. Disqus keeps it simple, and proves that the myth that "Django/SQL/whatever doesn't scale" is nonsense.
Even for an app with requests per second in the five-digit range, they do pretty damn well with the basic Django stack and no more than some small tweaks.
No NoSQL. And they use transactions for write queries.
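Transactional writes are worth a quick illustration. This is just a sketch of the idea using Python's stdlib sqlite3 as a stand-in (the thread is about Django/MySQL, but the principle is identical): a multi-statement write either commits as a whole or rolls back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

def post_comment(conn, body):
    # Wrap the write in a transaction: either every statement commits,
    # or the whole thing rolls back on error.
    try:
        with conn:  # sqlite3 connection as context manager = one transaction
            conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
            # Deliberately broken second statement (no such table) to show
            # that the INSERT above is rolled back along with it:
            conn.execute("UPDATE counters SET n = n + 1")
    except sqlite3.OperationalError:
        pass  # whole transaction rolled back

post_comment(conn, "first!")
print(conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0])  # 0
```

In Django the equivalent is wrapping the write in `transaction.atomic()`; the point is the same either way.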
They use Apache (not even nginx!).
Only 25% of their servers are pure (no snapshot) caching servers (not 50% or 75%).
They prefer vertical partitioning over sharding (but they still use sharding).
I understand they are looking at redis for some of their features, but really, their main stack is traditional and proven. Very enlightening.
So, as you can see, for the normal case the recommended setup is fine, but for extreme cases you should use the combination with caution.
YMMV, of course, but if you're pushing your setup to the limit and don't have any options for extra servers, I would heartily recommend uwsgi+nginx or fcgi+lighttpd instead.
The talk is useful - as an overview of what they use - but I know nothing of how to implement a single step.
Which perhaps sounds rude, but it's not meant to be.
This stuff isn't taught per se, you learn it bit by bit as you solve each problem that you face.
I learned about HAProxy when my site load exceeded what a single web server could manage.
I learned about heartbeat when I had to update my HAProxy and it knocked the site offline.
I learned about master/slave replication of databases when a site I worked on had considerably more reads than writes and scaling vertically (buying a bigger box) cost more than scaling horizontally (adding cheap read slaves).
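That read/write split can be sketched as a toy router in plain Python (the class name and backend labels are my own, not anyone's real setup — a real Django `DATABASE_ROUTERS` class would implement `db_for_read`/`db_for_write` along the same lines):

```python
import itertools

class ReplicaRouter:
    """Send writes to the master, spread reads across cheap slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over read slaves

    def db_for_write(self):
        return self.master  # all writes go to one authoritative box

    def db_for_read(self):
        return next(self._slaves)  # reads scale horizontally

router = ReplicaRouter("master", ["slave1", "slave2", "slave3"])
print(router.db_for_write())                      # master
print([router.db_for_read() for _ in range(4)])   # slave1, slave2, slave3, slave1
```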
I learned of sharding when I worked on a graph stored in an Oracle database and performing calculations on the whole graph exceeded what that one physical box could handle.
I learned of one-hop replication to solve the problem of a sharded graph.
I learned of partitioning to solve the problem of having one big database where the compute wasn't maxed out but the storage was.
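The core of sharding is just a stable key-to-box mapping. A minimal sketch, with made-up shard names (real systems also have to handle resharding, which is where it gets hard):

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names

def shard_for(key: str) -> str:
    """Pick a shard by hashing the key, so each row has one stable home.
    md5 keeps the mapping stable across processes, unlike Python's hash()."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every lookup for the same user lands on the same box:
print(shard_for("user:42") == shard_for("user:42"))  # True
```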
I learned of memcached when I wanted to reduce page generation times and realised going to the database was more expensive than keeping it in cheap RAM elsewhere on the network: http://www.buro9.com/blog/2010/11/18/numbers-every-developer...
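The memcached pattern in that sentence is cache-aside: check cheap RAM first, only hit the database on a miss, then store the result for the next request. A sketch with a plain dict standing in for memcached (the get/set shape is the same; function names are mine):

```python
import time

cache = {}  # stand-in for memcached

def expensive_page(page_id):
    time.sleep(0.01)  # pretend this is a pile of database queries
    return f"<html>page {page_id}</html>"

def get_page(page_id, ttl=60):
    """Cache-aside: RAM first, database on a miss, then populate the cache."""
    key = f"page:{page_id}"
    hit = cache.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]                        # cache hit: no database work
    body = expensive_page(page_id)           # cache miss: do the real work
    cache[key] = (body, time.time() + ttl)   # store for the next request
    return body

get_page(1)         # slow: goes to the "database"
print(get_page(1))  # fast: served from RAM
```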
I learned of reverse proxy caches when I wanted to make sure requests for things already served never reached the web layer again.
I learned about Varnish when I considered that most reverse proxies use disk storage for their cache.
We can go on and on here, but the message is that you learn these things one at a time, solving real problems that you come up against. There is always a next hurdle, and when you get there you too will learn how to get past it.
I'd emphasise that you cannot attempt to do this prematurely; that premature optimisation quote really applies well to architecture too. Keep things as simple as they can be, and know that when you do hit a hurdle, someone else has already solved it. You've just got to find out where it's written down (if anywhere), what they used, how they approached it, the upsides, the downsides, and what they'd do differently.
You could try sites like http://highscalability.com/ but I would urge you not to implement things without knowing why you're implementing them. Don't cargo cult ( http://en.wikipedia.org/wiki/Cargo_cult ) this stuff, it's really key to do only what you need to do, when you need to do it.
Just as in Bentley's Programming Pearls book, where he underlines again and again that knowing your problem is more important than knowing the best algorithms: which one will work for you depends on your problem, and only you know that.
Highscalability.com and shared slidedecks act as a community generated set of architectural patterns, but no-one should implement them without knowing what their problems will be.
What is it about scaling an application that can't be reduced to an algorithm - at some level of abstraction at least - so newbies can at least get an idea of how they should start thinking about it?
To put the question another way - could a framework like Django ever come to provide scaling tools out of the box? Or is it just something that fundamentally can't be reduced? Might it be that not enough people have faced scaling problems yet for repeatable patterns to become obvious?
Is the problem their algorithm? Does it spend a lot of CPU time working away? Could it be written a different way?
Is it data access, is the lag caused by queries taking too long?
Is that down to badly formatted queries, inefficient schema, server problems or something else?
Is it even a single problem or a combination of a multitude of minor little niggles that combine into a big headache?
Do you do thousands of little queries, or smaller numbers of big ones? Are your tables narrow or wide? What size and types of data do you store in the fields?
A number of the things that Disqus have done to scale out aren't appropriate for other environments, by nature of the fundamentals of the app. All they can advise on is how to scale your python/Django/MySQL based commenting system, but even then your approach to writing one might be different to theirs.
Quite simply, no one can tell you how to scale your application and its infrastructure, because every application and infrastructure is unique by the very nature of every problem being unique, and every solution more so.
That's not to say there is no value in the information that Disqus has provided. Quite the contrary, there is every bit of value there, and I greatly appreciate them posting it. There is a good chance that whilst some of what they've done won't be of use to you, some of it may be. It may even be of use for other reasons that are entirely different from those that benefit Disqus.
Quick example. You want to load balance web traffic, what do you choose for it? Is software or hardware best? Do you do lots of SSL (would an SSL accelerator be of use?) Do you want the servers to respond directly to the client, or respond through the load balancers?
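Even the balancing strategy itself is a choice. A sketch of least-connections, one of the strategies a software balancer like HAProxy offers (class and backend names are mine, purely for illustration): each new request goes to whichever backend currently has the fewest open connections.

```python
class LeastConnections:
    """Route each new request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # open-connection count per backend

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1

lb = LeastConnections(["web1", "web2"])
a = lb.acquire()      # web1 (both idle; first minimum wins)
b = lb.acquire()      # web2
lb.release(a)         # web1 finishes its request
print(lb.acquire())   # web1 again: it now has the fewest connections
```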
You want to add in caching: Varnish? Apache Traffic Server? And so on! Each software package has its own particular strengths and weaknesses, and it's more a matter of gut instinct and intimate knowledge of the way your code and site work, than anything else, that can help you find the right way to scale.
Building Scalable Web Sites
The Art of Capacity Planning
The Art of Scalability http://www.amazon.com/Art-Scalability-Architecture-Organizat...
Scalable Internet Architectures http://www.amazon.com/Scalable-Internet-Architectures-Theo-S...
Enterprise Cloud Computing http://www.amazon.com/Enterprise-Cloud-Computing-Architectur...
And if you're in doubt, John is now VP of Ops at Etsy and came from Flickr before that:
His blog is interesting too:
So without having read the books, I would shoot for the latter 2 if I wanted to have hard copies around to introduce me to this kind of stuff.
Some of it is taught. I learned about HAProxy second-hand, before any problem had to be solved.
I do agree that much of this has to be learned initially through experience, but I also believe much of it can be taught. Teaching it in a classroom wouldn't work, though, as Ops only exists when there's something to operate.
It's my perpetual dilemma: few startups have enough scale to warrant having a senior enough sysadmin to be a mentor, let alone another one or two more junior ones, but large companies are awkward environments for experimentation.
I try and dump some bits of wisdom/experience to my blog. Shameless plug: http://blog.maxkalashnikov.com
Even Disqus is borderline in terms of traffic. Moore's law, and even its more linear corollary for I/O, can take one a long way.
You can also learn from observation when working with people who are better than you. This is one of the best reasons to have some industry experience prior to raring off into startup land. I am a much, much, much better engineer in 2011 than I was in 2007 partly because I am older and wiser but mostly because I sat next to the second best engineer I've ever met and took notes when he told me that everything I knew was wrong.
Then there is the last option: your lack of expertise with X bites you in the keister, you fix X, you now have expertise with X and hopefully write up your experience somewhere to decrease the net amount of keister-biting in the world.
That is to say, the host count in isolation is not the most interesting figure. I've seen large sites run on 20 machines or on 500, depending on the skills of the management and developer teams, and how much they care about the infrastructure cost in the big picture.
The host count becomes more interesting when you relate it to the request rate. 17k/sec is absolutely a worthwhile workload, even when (as is likely in the Disqus case) reads dominate writes by far.
That said, the ratio of 100 hosts to 17k rps seems reasonable.
However (not meaning to diminish their achievement), the engineer in me can't help but wonder if perhaps even a little more could be squeezed out on the caching front? I was a bit surprised not to see Varnish on the slides; fragment caching at the perimeter can achieve mind-boggling results.
On the general caching front, though, I want to note that Disqus is particularly hard to cache -- there's a very long tail, leading to a relatively high miss:hit ratio per pound of caching.