
Scaling Django to 8 Billion Page Views - mattrobenolt
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
======
gojomo
Front-side HTTP caches can be essential at giant scale. First they help with
naturally-static content, then you can design to maximize the amount that can
be HTTP-cached.

I would like to see this better recognized as a usual and desirable production
setup in the Django community, so as to eventually change the traditional
doctrine around static resources.

Currently, there's the assumption that every serious production deployment
moves static file-serving out of python/Django, to a dedicated side server.
So, the interaction of the 'staticfiles' component changes awkwardly around
the DEBUG setting, and the docs contain hand-wavey warnings about how using
Django to serve static resources is "grossly inefficient and probably
insecure".

Well, once you've committed to having a front-side HTTP cache, it's pretty
damn efficient - one request per resource for an arbitrarily long cache-
lifetime period. It requires fewer deployment steps and standalone processes
than the assumed ('collectstatic'-and-upload-elsewhere) model. And if the
staticfiles app is truly "insecure", that needs fixing: many people run their
dev/prototype code in an internet-accessible way, so any known security risks
here should get the same attention they get elsewhere. (Disabling the code
entirely when DEBUG is true is a dodge.)

I'd love a future version of Django to embrace the idea: "staticfiles is a
wonderful way to serve static resources, _if_ you run a front-side HTTP cache,
which most large projects will".
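
A minimal sketch of what that setup could look like (a hypothetical `urls.py` for a modern Django; names and numbers are illustrative). The far-future `max_age` means a front-side cache like Varnish asks Django for each resource only once per cache lifetime:

```python
# Hypothetical urls.py sketch: serve static files through Django itself,
# but mark them long-lived so a front-side HTTP cache absorbs repeats.
from django.conf import settings
from django.urls import re_path
from django.views.decorators.cache import cache_control
from django.views.static import serve

urlpatterns = [
    re_path(
        r"^static/(?P<path>.*)$",
        # one year max-age: the front-side cache serves all repeat hits
        cache_control(max_age=31536000, public=True)(serve),
        {"document_root": settings.STATIC_ROOT},
    ),
]
```

Behind the cache, the first request warms it and every subsequent request is served without touching Python at all.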

~~~
eli
I'm a _huge_ fan of Varnish as a caching proxy.

But for just serving static assets, I'd suggest Amazon CloudFront. It's easy
to set up, you only pay for what you use and you get all the benefits of a
global CDN.

~~~
mattrobenolt
CloudFront is slow. :) Check out Fastly.

~~~
eli
I demoed it a while ago, maybe I'll take another look. My real point was that
you don't need to install any unfamiliar software. I love Varnish, but if the
default VCL file doesn't work for you, there's a learning curve.

~~~
gojomo
CloudFront... Fastly... CloudFlare... I'd agree that many projects should just
adopt a cache-as-a-service, rather than add another layer/process that must be
learned and maintained to their own systems.

------
hardwaresofton
"Slowness is likely a result of the fact that your request is communicating
with other services across your network. In our case, these other services are
PostgreSQL, Redis, Cassandra, and Memcached, just to name a few. Slow database
queries and network latency generally outweigh the performance overhead of a
robust framework such as Django."

This seems to fly in the face of everything I've experienced with
frameworks... Is this true? The bottleneck for me is almost never the
database backend, unless you've written horrible queries... Maybe it's just
that I'm not doing queries complex enough at the scale Disqus is?

~~~
crucialfelix
Yes, the database and specifically disk I/O is usually our bottleneck.

I use New Relic and I can see the time taken by the database vs. Python. Slow
page events are almost always caused by database bottlenecks.

edit: actually, the fact that a Django response needs to block for the
database means that overall concurrency is a serious bottleneck. Again, due
to the database. But it wouldn't need to be a bottleneck if the requests
didn't have to block.

A more modern architecture like Scala's Play or Node.js can serve other
trivial requests while the database is still working. With Django I have had
to run two servers (8 CPUs, 12 processes) so that nobody is blocked (e.g. for
just a trivial serve from cache) just because a few too many requests need
more database objects for the page they are serving. So this is a Django
architecture issue.

~~~
mattrobenolt
We run many more than 12 processes per server. That's how you can get around
that. :) But that's frankly more of a Python architecture problem than a
Django one.
Django is compatible with coroutine libraries like gevent, just most of the
ecosystem is not compatible, so that makes things difficult at the size we're
at.

Realistically though, this isn't a problem for us and if things are done
right, shouldn't be a problem for you either. Just use more processes and
you'll be fine unless you're running out of RAM or something.

~~~
crucialfelix
How many CPUs and how many processes per machine?

I guess I used to run 8 on a 4 CPU machine. I tried 9-12 processes but it
always tripped over itself.

> unless you're running out of RAM or something.

Yep, on the new machines I run 6 just to keep the total memory down. And that
also sucks: having to use lots of memory just to get concurrency.

~~~
mattrobenolt
We run 40 processes on each with 16 cores and 24G of RAM.
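
For reference, a uwsgi configuration in that spirit might look like the
following (an illustrative sketch using the numbers above, not Disqus's
actual config; the module path is hypothetical):

```ini
[uwsgi]
; Hypothetical WSGI entry point; adjust for your project.
module = myapp.wsgi:application
master = true
; Many single-threaded workers: concurrency comes from processes,
; since each worker blocks while waiting on the database.
processes = 40
threads = 1
; Recycle workers that hang or grow, to keep memory on a 24G box in check.
harakiri = 30
max-requests = 5000
```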

------
jjoe
Deploying Varnish is like cheating but in a good way. Varnish isn't a set-and-
forget cache proxy. It requires a well thought out VCL and a good deal of
attention and maintenance. So the saying _not all VCLs are created equal_
applies in this case.

I challenged myself to build the most generic VCL, in the sense that I wanted
it to work with the majority of "scripts" in a semi-"shared" server
deployment. I also wanted to make it as easy to deploy as possible while
exposing some of its advanced features.

The end result is an array of software "plugin" products that one can download
and deploy on cPanel and DirectAdmin (two leading control panels).

Shameful plug (wait for it...):

[http://www.unixy.net/advanced-hosting/varnish-nginx-
cpanel/](http://www.unixy.net/advanced-hosting/varnish-nginx-cpanel/)

[http://www.unixy.net/varnish/](http://www.unixy.net/varnish/)

There's a free 14-day trial (no payment or CC required) for those who want to
give it a spin.

------
gojomo
If you want highly-tuned geo-distributed Varnish-as-a-service, check out
Fastly.com:

[http://fastly.com](http://fastly.com)

(Disqus is actually listed as a Fastly client.)

~~~
mattrobenolt
Fastly is used for some of our traffic, yes, but not all. Our main app is not
behind Fastly, FWIW.

------
bhauer
I applaud Disqus for scaling Django to this tier of sustained load. I applaud
them for sharing a clearly-written and approachable explanation of how that
was achieved. I also applaud them for their product in general. I think Disqus
is a quite excellent embeddable comment tool.

I do have some reservations with a few points made by this article. (Below I
am speaking generally, and not about Disqus in particular. I don't mean
anything below to imply they are doing it wrong. On the contrary, I think
they're doing it very right given their circumstances.)

Repeated is the conventional wisdom that the performance of your application
logic is negligible versus external systems such as your database server or
your back-end cache. For low-performance frameworks and platforms that is
indeed commonly the case, hence the conventional wisdom. However, there are
important caveats: first, do not confuse time spent in your database driver
and ORM as waiting for your database server. Your database server vendor will
find that hurtful and offensive. Most database servers will be able to
retrieve rows from well-indexed tables at far greater rates than low-
performance application platforms' ORMs can translate those rows into usable
objects. Modern database servers fetching rows from well-indexed tables can
keep up with the query demands of the very highest-performance frameworks
without saturating a database server's CPUs (with throughput measured in the
tens to hundreds of thousands of queries per second per server). Yes, at scale
your database server may need attention. But it's not necessarily the pain
point you might think it is. Bottom line: profile your application and watch
your database server's performance metrics. You may not be waiting on your
database despite conventional wisdom. The same is true for other third-party
systems such as a back-end cache.

Coupling the above with application logic and in-application composition of
content into client-digestible markup ("server side templates") will compound
the impact of a low-performance platform. While high-performance platforms can
execute application logic and compose a server-side template tens of thousands
of times per second on modest hardware, low-performance platforms may suffer a
ten-times or greater performance penalty by comparison.

It is not necessarily true that high-performance frameworks and platforms are
lower-productivity if you are starting with a green-field scenario where your
development team is free of incumbent preferences. That last bit is crucial,
of course. Most teams do have preferences, past experience that can be
leveraged, and "know-how" with legacy frameworks. Do not confuse this
institutional knowledge with an objective measure of developer efficiency.
Developers who are unfamiliar with both Django and a modern high-performance
framework may see roughly equal productivity. Measuring your Django-
experienced teams' productivity versus their productivity with (for the sake
of argument) a Go framework or a modern JVM framework is a biased assessment
because of the alternative's learning curve. If we continue to judge net
productivity as a combination of learning curve _and_ the resulting and
ongoing effort level past the learning curve, little with a learning curve
will be honestly evaluated.

Yes, reverse proxy caching such as that provided by Varnish is an excellent
idea when your application is a public-facing system without a great deal of
personalization. But not all systems are public-facing embeddable comments or
blogs or news sites (I don't mean this to be critical!). In many systems, a
majority of responses are tailored to the specific user and other entities,
making them unavailable for caching (as the article mentions, these requests
will typically use a cookie to identify the session and are therefore not
cached by Varnish). In these cases, if it weren't already clear from the
above, I recommend seriously considering a higher-performance platform and
framework that gives you the headroom to deliver responses under high load
without necessarily resorting to crutches like a reverse proxy. Yes, leverage
caching wherever and whenever possible. But when you cannot cache, respond
as quickly as possible.

Performance is actually an important concern. It's not _the_ concern, but
don't keep throwing it under the bus.

Further, performance is not only a scale and concurrency concern. It's also a
user-experience matter. In addition to reducing the system complexity for
high-load and high-concurrency, a high-performance platform means that even
without load and concurrency, you are able to respond to user requests more
_quickly_ (reduced latency). This leads to user happiness, and in some
circumstances better search engine positioning and similar fringe benefits.

Again, I want to be clear that I think Disqus is great and this article is a
valuable contribution, especially for those who are invested in a similar
technology stack with similar usage characteristics.

~~~
m0th87
I don't get this recent narrative that the productivity you'd get out of
languages like go is comparable to higher-level ones like python. As someone
who uses both go and python in production, this sounds like the blub paradox
in play here. Yes, you will get significantly better performance for CPU-bound
applications in go vs python. But you'll also get significantly better
productivity out of python, assuming equivalent knowledge.

~~~
mattrobenolt
I also write Go probably just as much as Python these days, and the biggest
setback I have in Go is the immaturity of the ecosystem. On almost any
project I work on, I have to shave 10 yaks to get things working right.

My most recent yak was fixing lib/pq so I could query against our production
dbs. :)

~~~
m0th87
This library?
[https://github.com/bmizerany/pq](https://github.com/bmizerany/pq)

If so, we're about to use it. Anything we should be forewarned about with
respect to its use in production settings?

~~~
mattrobenolt
Yeah, but that's an older unmaintained version. Official is at
[https://github.com/lib/pq](https://github.com/lib/pq), and I already shaved
my yak. :)
[https://github.com/lib/pq/pull/135](https://github.com/lib/pq/pull/135)

------
ddorian43
Scaling Varnish to 30K/sec. Scaling Django to 10K-15K/sec.

I remember, a long time ago: "How I scaled Drupal (large number of
queries/page) to 3K pages/sec". It was really Varnish that scaled.

~~~
mattrobenolt
Sure, but the idea is knowing how and when to use your tools properly and what
they're good for. It allows us to continue using Django.

------
bliti
15K requests per second with Django is a nice number to hit. What are you
using as a server? Gunicorn?

Not a bad setup.

    
    
        load balancers --> Varnish:
            if !cache:
                --> Django
    

Still makes me wonder how faster this would be in Go or Java. I've never been
able to get Python to be very efficient over 5K requests.

~~~
mattrobenolt
uwsgi

~~~
bliti
Gevent is a bit funny sometimes. How are you dealing with it?

~~~
mattrobenolt
We don't run gevent for disqus.com. We do run some background workers and
whatnot, but our main app is single threaded with a large number of processes
on each machine.

~~~
bliti
Interesting. I'd love to know about the specs of your hardware. Maybe a future
blog post about that?

------
byroot
About:

> The common pattern for application level caching
    
    
      data = cache.get('stuff')
      if data is None:
          data = list(Stuff.objects.all())
          cache.set('stuff', data)
      return data
    

I'm wondering if you simplified the example or if you are just not preventing
cache regen race conditions?

Some other frameworks just put back the expired data for a few seconds while
the new one is being regenerated to avoid having multiple workers building the
same thing.

e.g. rails:
[https://github.com/rails/rails/blob/3182295ce2fa01b02cb9af0b...](https://github.com/rails/rails/blob/3182295ce2fa01b02cb9af0b977a9cf83cc5d9aa/activesupport/lib/active_support/cache.rb#L539)

~~~
mattrobenolt
This is a very simplified example. We do something similar to avoid a
stampede.

------
iknight
So going to this link with Ghostery enabled is a bad idea. The constant
attempts to load resources that are blocked will crash Chrome.

------
tomlu
Front-side HTTP caching is all well and good, but what do you guys do in the
case when the returned content is (at least partially) user-contextual?
Caching isn't really going to help you in these cases.

~~~
adamauckland
I'm guessing they cache as much as they can here too

[https://docs.djangoproject.com/en/1.6/topics/cache/#template...](https://docs.djangoproject.com/en/1.6/topics/cache/#template-
fragment-caching)
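
For context, the fragment caching linked there looks roughly like this in a
Django template (the fragment name and timeout here are illustrative):

```
{% load cache %}
{% cache 500 comment_sidebar %}
    ... expensive fragment, e.g. a rendered comment tree ...
{% endcache %}
```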

~~~
mattrobenolt
Template fragment caching is not a good thing. We don't do this anymore. If
you have to cache your template fragments, I generally think you're doing
something wrong. Gonna have a bad time.

------
Trezoid
One thing I'd also recommend for speeding up Django is swapping in djinja.
It's a straight drop-in replacement for almost the entire template engine,
but much faster.

~~~
rufugee
What do you lose by using djinja instead of the builtin template engine?

~~~
Trezoid
You primarily lose a few of the Django filters, though they can all be
reimplemented as Jinja filters if you need them.

~~~
antihero
Wouldn't this also mean you have to re-implement any filters that come from
third-party apps, too?

------
ksec
OK, Varnish 4.0 was mentioned, but I don't find anything concrete or specific
with a Google search. When will that be coming?

And I would love it if High Scalability did an interview with Disqus. 8
billion PV; I would love to see the stack, backend, and machines that handle
it.

~~~
amarsahinovic
Scaling Realtime at DISQUS: [https://speakerdeck.com/northisup/scaling-
realtime-at-disqus](https://speakerdeck.com/northisup/scaling-realtime-at-
disqus)

~~~
mattrobenolt
[http://blog.disqus.com/post/51155103801/trying-out-this-
go-t...](http://blog.disqus.com/post/51155103801/trying-out-this-go-thing)
This has replaced that. ;)

------
callesgg
I don't get why Varnish should be any faster than any other raw static HTTP
server. OK, a bit faster I can understand. But like 300 times faster than
without it, that I don't get.

~~~
mattrobenolt
Varnish caches dynamic requests.
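
That's the crux: nginx serves static files quickly, but Varnish can also hold
fully rendered dynamic responses. A minimal VCL sketch of the behavior the
article describes (illustrative, not Disqus's actual config; Varnish 3-era
syntax):

```
sub vcl_recv {
    # Personalized (cookied) requests go straight through to Django...
    if (req.http.Cookie) {
        return (pass);
    }
    # ...everything anonymous is eligible for the cache.
    return (lookup);
}

sub vcl_fetch {
    # Cache anonymous dynamic responses briefly; even a short TTL
    # absorbs the bulk of repeated identical requests.
    if (!beresp.http.Set-Cookie) {
        set beresp.ttl = 60s;
    }
}
```

A page Django renders once can then be handed out thousands of times from
memory, which is where multipliers like "300x" come from.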

------
mburst
Awesome post, Matt. I've been meaning to check out Varnish. I'm sad I missed
your talk at DjangoCon but hopefully it'll make its way to the web soon.

------
csense
What does Varnish do that nginx doesn't?

------
stefantalpalaru
Scaling Django by using it less... It's Varnish that deserves the spotlight,
not Django.

~~~
acjohnson55
Well, sort of. To me, it's more like, use Django for what it's best at, which
is making your app logic really organized, readable, and maintainable. I think
it's a great story because it makes the point that just because Django isn't
super high performance doesn't mean you need to go with something super
stripped down or in an unreadable language to get good results.

~~~
iends
But couldn't they save a ton of money in server costs by going with something
with higher performance?

~~~
mattrobenolt
Servers are much cheaper than humans.

First of all, the amount of time that'd be required to rewrite things in said
"higher performance" language in the hopes of getting some improvement would
be very costly. It's not something that'd be done in even a month by one
person. It'd be a whole-company effort. Learning curve of a new language,
etc.

With that aside, I guarantee, for our use case at least, I'd rather invest my
time into tuning a Varnish config and save the same number of servers without
all the manpower wasted.

Our server costs are much much cheaper than our human costs.

~~~
iends
This is certainly the case _now_. You have too much technical debt.

For future projects should you continue to use Django/Python, or something
with better performance?

Should this be a lesson to me as a Django developer, that I'm better off
starting a company on a different stack if I want to keep server costs low?

~~~
mattrobenolt
It obviously depends on the project, but for something like Disqus or anything
else large in terms of size, I'd still go with Django.

For some personal projects, I like to use werkzeug, but they usually are very
small and don't do much. :)

I also write things personally in Go if it makes sense. But the Go work is
typically not for web services.

