

Disqus: Scaling the World’s Largest Django Application - ahmicro
http://ontwik.com/python/disqus-scaling-the-world%e2%80%99s-largest-django-application/

======
yuvadam
Disqus is a classic case study when it comes to scalability.

If there is one thing I learned from Disqus it is the power of keeping a
lightweight stack. Disqus keeps it simple, and proves that all the myths that
"Django/SQL/whatever doesn't scale" are nonsense.

Even for an app handling requests per second in the five-digit range, they do
pretty damn well with the basic Django stack and no more than a few small
tweaks.

~~~
St-Clock
That was exactly my thought.

No NoSQL. And they use transactions for write queries.
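
For the unfamiliar, a write wrapped in a transaction in Django looks roughly
like this (the `Comment` model and the view logic are made up, but the
`django.db.transaction` API is the real one):

    from django.db import transaction

    # `Comment` is a hypothetical model, purely for illustration.
    @transaction.atomic  # on Django < 1.6: @transaction.commit_on_success
    def post_comment(thread, author, body):
        # Both writes commit together, or neither does.
        comment = Comment.objects.create(thread=thread, author=author,
                                         body=body)
        thread.comment_count += 1
        thread.save()
        return comment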

They use apache (not even nginx!).

Only 25% of their servers are pure (no snapshot) caching servers (not 50% or
75%).

They prefer vertical partitioning over sharding (but they still use sharding).

I understand they are looking at redis for some of their features, but really,
their main stack is traditional and proven. Very enlightening.

~~~
ghc
I actually find it astonishing that they're using Apache for this. I have had
a lot of problems with Apache behaving weirdly in the past, especially with
mod_wsgi.

~~~
yuvadam
You are aware that Apache + mod_wsgi is the recommended way for deploying
Django apps, right?

~~~
ghc
Yes, I'm aware of that. I've been using mod_wsgi since mod_python was the
recommended way of deploying Django. The problem is that when you're
running a high number of Django instances (via a large number of daemons) you
can get all sorts of problems with Apache itself. Some of these issues have
been due to mod_wsgi, and Graham has, in my past experiences, been very
responsive about fixes. Other issues are simply due to Apache, and when you're
running multiple instances in a memory-restricted environment you're left with
a tough configuration job. I've even had a case where thread contention in a
low-memory setup caused a complete Apache lock-up.

So, as you can see, for the normal case the recommended setup is fine, but for
extreme cases you should use the combination with caution.

YMMV, of course, but if you're pushing your setup to the limit and don't have
any options for extra servers, I would heartily recommend uwsgi+nginx or
fcgi+lighttpd instead.

------
grovulent
Not like this is a problem I have to worry about. But where on earth does one
learn this stuff?

The talk is useful - as an overview of what they use - but I know nothing of
how to implement a single step.

~~~
buro9
It's called experience.

Which perhaps sounds rude, but it's not meant to be.

This stuff isn't taught per se, you learn it bit by bit as you solve each
problem that you face.

I learned about HAProxy when my site load exceeded that which a single web
server could manage.

I learned about heartbeat when I had to update my HAProxy and it knocked the
site offline.

I learned about master/slave replication of databases when a site I worked on
had considerably more reads than writes and scaling vertically (buying a
bigger box) cost more than scaling horizontally (adding cheap read slaves).
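
Roughly, in Django terms: since 1.2 you can register a database router that
sends reads to the slaves and writes to the master. A minimal sketch (the
alias names are invented; `db_for_read`/`db_for_write` are the real hooks):

    import random

    # Aliases assumed to exist in settings.DATABASES; the names are
    # made up for this sketch.
    READ_SLAVES = ['slave1', 'slave2']

    class MasterSlaveRouter(object):
        def db_for_read(self, model, **hints):
            # Spread the many reads across the cheap slave boxes.
            return random.choice(READ_SLAVES)

        def db_for_write(self, model, **hints):
            # All writes go to the master.
            return 'default'

You then point DATABASE_ROUTERS at that class in settings, and the ORM picks
a connection per query.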

I learned of sharding when I worked on a graph stored in an Oracle database
and performing calculations on the whole graph exceeded what one physical box
could handle.
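
The essence of sharding is just deterministic routing on a key. A toy sketch
(the shard names are made up):

    import zlib

    # One entry per physical box; names invented for this sketch.
    SHARDS = ['shard0', 'shard1', 'shard2', 'shard3']

    def shard_for(key):
        # The same key always hashes to the same shard, so you always
        # know which box holds a given node's rows.
        return SHARDS[zlib.crc32(str(key).encode('utf-8')) % len(SHARDS)]

Real setups tend to use consistent hashing instead, so that adding a shard
doesn't reshuffle every key.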

I learned of one-hop replication to solve the problem of a sharded graph.

I learned of partitioning to solve the problem of having one big database
where the compute wasn't maxed out but the storage was.

I learned of memcached when I wanted to reduce page generation times and
realised going to the database was more expensive than keeping it in cheap RAM
elsewhere on the network:
<http://www.buro9.com/blog/2010/11/18/numbers-every-developer-should-know/>
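
In Django terms that's the stock cache API doing cache-aside; a sketch (the
key scheme and the `expensive_query` helper are made up):

    from django.core.cache import cache

    def get_comment_count(thread_id):
        key = 'thread:%d:count' % thread_id
        count = cache.get(key)
        if count is None:
            # Only pay the database round-trip on a miss.
            count = expensive_query(thread_id)  # hypothetical helper
            cache.set(key, count, 60)  # keep it in cheap RAM for 60s
        return count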

I learned of reverse proxy caches when I wanted to make sure requests for
things already served never reached the web layer again.
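
From the application side that mostly means emitting cacheable headers so the
proxy can answer repeats itself. A sketch with Django's stock decorator (the
view and `render_thread` are made up):

    from django.http import HttpResponse
    from django.views.decorators.cache import cache_control

    @cache_control(public=True, max_age=300)
    def thread_page(request, thread_id):
        # The Cache-Control header lets the reverse proxy serve this
        # response to everyone for the next five minutes, without the
        # request reaching the web layer again.
        return HttpResponse(render_thread(thread_id))  # made-up renderer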

I learned about Varnish when I considered that most reverse proxies use disk
storage for their cache.

We can go on and on here, but the message is that you learn these things one
at a time, solving real problems that you come up against. There is always a
next hurdle to clear, and when you get there you too will learn how to get
past it.

I'd emphasise that you should not attempt any of this prematurely; that
premature optimisation quote really applies to architecture too. Keep things
as simple as they can be, and know that when you do hit a hurdle, someone
else has already cleared it; you've just got to find out where it's written
down (if anywhere), what they used, how they approached it, the upsides, the
downsides, what they'd do differently, etc.

You could try sites like <http://highscalability.com/> but I would urge you
not to implement things without knowing why you're implementing them. Don't
cargo cult ( <http://en.wikipedia.org/wiki/Cargo_cult> ) this stuff; it's
really key to do only what you need to do, when you need to do it.

~~~
grovulent
Seriously dude... write that book...

'Cause I can't find it anywhere...

~~~
sciurus
Can anyone speak to how close these books come to this? Or recommend other
books?

Building Scalable Web Sites <http://oreilly.com/catalog/9780596102357>

The Art of Capacity Planning <http://oreilly.com/catalog/9780596518585>

Web Operations <http://oreilly.com/catalog/0636920000136>

~~~
buro9
Both of John Allspaw's books (the latter two on your list) look good from
their tables of contents.

And if you're in doubt, John is now VP of Ops at Etsy and came from Flickr
before that: <http://www.kitchensoap.com/about-me/>

His blog is interesting too: <http://www.kitchensoap.com/>

So without having read the books, I would shoot for the latter 2 if I wanted
to have hard copies around to introduce me to this kind of stuff.

~~~
ruckusing
I've found "Scalable Internet Architetures" by Theo Schlossnagle to also be
quite valuable. It contains general advice on how to approach problem solving
when it comes to building uh, scalable architectures.

<http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X>

------
andrewcamel
Thank you so much for submitting this. I'm creating a Django application that
has the potential to store and work with even more data than Disqus, so I've
always worried about how I'd scale it. Thanks to your submission, I'm no
longer as crazed about it.

------
nas
As an aside, AFAIK douban.com is still using Python and Quixote[1]. Back in
2007 they were doing 2 million pageviews per day[2,3]. According to Alexa
they are even busier now. They use the SCGI protocol as well.

1. <http://quixote.ca/>

2. <http://mail.mems-exchange.org/durusmail/quixote-users/5441/>

3. <http://mail.mems-exchange.org/durusmail/quixote-users/5657/>

~~~
zeemonkee
Good to hear Quixote is still going. It was my favourite framework in the pre-
WSGI Django/Pylons days.

------
adrianwaj
How does Disqus make money? Has IntenseDebate made money since being acquired?
Does revealing this information make Disqus a more likely acquisition?

~~~
ro_gupta
Disqus has premium add-ons: <http://disqus.com/addons>

------
dansingerman
I know Disqus does loads of traffic, but sheesh, ~100 servers - that's
exceptionally non-trivial.

~~~
moe
On the upside, there are long phases of "just add more of the same" in
scaling these things. The fun comes in waves, every time you hit one of the
various physical barriers (namely latency and bandwidth).

That is to say, the host count in isolation is not the most interesting
figure. I've seen large sites run on 20 machines or on 500, depending on the
skills of the management and developer teams, and on how much they care about
infrastructure cost in the big picture.

The host count becomes more interesting when you relate it to the request
rate. 17k/sec is absolutely a worthwhile workload, even when (as is likely in
the Disqus case) reads dominate writes by far.

That said, the ratio of 100 hosts to 17k req/sec (roughly 170 req/sec per
host) seems about reasonable.

However (not meaning to diminish their achievement), the engineer in me can't
help but wonder if perhaps even a little more could be squeezed out on the
caching front. I was a bit surprised not to see Varnish on the slides;
fragment caching at the perimeter can achieve mind-boggling results.

~~~
thedz
Varnish wasn't in production at that point. We're testing/using Varnish now
for some things. It definitely is helping.

On the general caching front, though, I want to note that Disqus is
particularly hard to cache -- there's a very long tail, leading to a
relatively high miss:hit ratio per pound of caching.

