Scaling Django to 8 Billion Page Views (disqus.com)
316 points by mattrobenolt on Sept 24, 2013 | 132 comments

Front-side HTTP caches can be essential at giant scale. First they help with naturally-static content, then you can design to maximize the amount that can be HTTP-cached.

I would like to see this better recognized as a usual and desirable production setup in the Django community, so as to eventually change the traditional doctrine around static resources.

Currently, there's the assumption that every serious production deployment moves static file-serving out of Python/Django, to a dedicated side server. So, the behavior of the 'staticfiles' component changes awkwardly around the DEBUG setting, and the docs contain hand-wavey warnings about how using Django to serve static resources is "grossly inefficient and probably insecure".

Well, once you've committed to having a front-side HTTP cache, it's pretty damn efficient - one request per resource for an arbitrarily long cache-lifetime period. It requires fewer deployment steps and standalone processes than the assumed ('collectstatic'-and-upload-elsewhere) model. And if the staticfiles app is truly "insecure", that needs fixing: many people run their dev/prototype code in an internet-accessible way, so any known security risks here should get the same attention they get elsewhere. (Disabling the code entirely when DEBUG is true is a dodge.)
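The contract with a front-side HTTP cache is just response headers. A minimal sketch of the idea (the function and names here are illustrative, not Django's staticfiles API):

```python
def static_response(body: bytes) -> dict:
    # After one origin hit, a front-side cache (e.g. Varnish) can serve
    # this response for a year without touching Django again.
    return {
        "status": 200,
        "headers": {
            "Cache-Control": "public, max-age=31536000",  # one year
            "Content-Length": str(len(body)),
        },
        "body": body,
    }
```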

I'd love a future version of Django to embrace the idea: "staticfiles is a wonderful way to serve static resources, if you run a front-side HTTP cache, which most large projects will".

I think what I'd like to get across more is that a lot of our dynamic content can be "static" as well, without going the static files route. Being static for shorter periods of time is something that we can achieve with Varnish, and it's definitely an overlooked concept in the community.

I'm a huge fan of Varnish as a caching proxy.

But for just serving static assets, I'd suggest Amazon CloudFront. It's easy to set up, you only pay for what you use and you get all the benefits of a global CDN.

CloudFront is slow. :) Check out Fastly.

I demoed it a while ago, maybe I'll take another look. My real point was that you don't need to install any unfamiliar software. I love Varnish, but if the default VCL file doesn't work for you, there's a learning curve.

CloudFront... Fastly... CloudFlare... I'd agree that many projects should just adopt a cache-as-a-service, rather than add another layer/process that must be learned and maintained to their own systems.

$50 monthly minimum at fastly means it's not a great plug-and-play replacement for smallish sites.

Sure. Any CDN is better than no CDN. :)

"Slowness is likely a result of the fact that your request is communicating with other services across your network. In our case, these other services are PostgreSQL, Redis, Cassandra, and Memcached, just to name a few. Slow database queries and network latency generally outweigh the performance overhead of a robust framework such as Django."

This seems to fly in the face of everything I've experienced with frameworks... Is this true? The bottleneck for me is almost never the database backend, unless you've written horrible queries... Maybe it's just because I'm not doing queries complex enough on the scale that disqus is?

> Slow database queries and network latency

If someone could add prepared statements to the Django ORM, we could see a significant performance increase. And the MySQL protocol supports compression; if your very busy page requires moving big text/blob data out of MySQL, it could be very useful.

Wow, is it still true that django doesn't do any connection pooling?

I have long since switched to Flask, personally (no hard feelings, just like Flask's minimal nature) -- and I haven't been keeping up with developments.


> Django now supports reusing the same database connection for several requests.

Pooling is one thing; prepared statements are another.

Very true, that's what I was thinking, and saw this post


and read "django has no concept of connection pooling, and every page view uses a brand new database connection. MySQL prepared statements only exist for the life of the connection/session in which they are defined" on that page... So they're somewhat connected right? My understanding of a prepared statement was a statement that a *SQL server caches for quick use -- which is only kind of right (looked it up again @ https://en.wikipedia.org/wiki/Prepared_statement and it's not too far off).

Then question marks went off as to why this still wasn't a thing yet.

You are correct. Prepared statements without connection pooling are moot.
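The lifetime coupling is easy to see even with the stdlib's sqlite3 module, which keeps a per-connection cache of compiled statements (the cached_statements parameter): the cached plans only live as long as the connection does, so a new connection per request throws them all away.

```python
import sqlite3

# sqlite3 caches compiled statements per connection; re-running the same
# SQL text skips re-parsing, but only for the life of that connection.
conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE posts (id INTEGER, body TEXT)")
conn.execute("INSERT INTO posts VALUES (?, ?)", (1, "hi"))
for _ in range(3):
    # Same SQL text each time: parsed once, reused from the statement cache.
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (1,)).fetchone()
conn.close()  # the cached plans are discarded with the connection
```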

Keep in mind that 1.6 isn't released yet — it's the upcoming version. This might prevent you from using it where you need it most (production): are you willing to use unreleased software in prod? If not, how far behind Django's development cycle are you? Also keep in mind that Django breaks compatibility between versions: upgrading might have a small cost in code changes. (Not that I consider this a bad thing: I think nearly all (if not all) of the changes I've had to make between Django versions have been beneficial. I love projects that kill off old APIs.)

That said: finally.

Connection pooling is finally coming in 1.6.
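To be precise, what lands in 1.6 is persistent connections rather than a true pool: the CONN_MAX_AGE setting keeps a connection open across requests instead of reconnecting on every page view. A sketch (database names here are illustrative):

```python
# settings.py -- Django 1.6 persistent connections (not a true pool).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "app",
        # Seconds to keep a connection alive between requests.
        # 0 restores the old close-per-request behavior; None means unlimited.
        "CONN_MAX_AGE": 60,
    }
}
```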

You might tell Django to find one thing in the database, but it might do more than one query to get it, which is a bottleneck. Most of the projects that I have worked with that use Django at scale tend to move away from the ORM once it starts to hit 3K requests per second. Adding a Redis DB for cached content usually allows Django to work better.

We still use the ORM for most things. If you know how to use it right, you can make your queries pretty efficient. I'm not claiming this is the best solution, but it works. It takes patience and understanding to use efficiently and to make sure you're not issuing excessive amounts of queries -- specifically, avoiding O(n) queries, which are easy to produce accidentally in Django.
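A toy illustration of the O(n) ("N+1") pattern -- the query counter here is hypothetical; in the Django ORM the fix is typically select_related/prefetch_related, which turn the per-row lookups into one joined or batched query:

```python
queries = []  # stand-in for a database query log

def fetch_author(comment_id):
    # N+1 style: one round trip per row (what naive FK traversal does)
    queries.append(f"SELECT ... WHERE comment_id = {comment_id}")
    return f"author-{comment_id}"

def fetch_authors(comment_ids):
    # select_related style: one joined/batched query for all rows
    queries.append("SELECT ... WHERE comment_id IN (...)")
    return [f"author-{i}" for i in comment_ids]

ids = list(range(100))
for i in ids:
    fetch_author(i)
n_plus_one = len(queries)   # 100 round trips

queries.clear()
fetch_authors(ids)
batched = len(queries)      # 1 round trip
```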

So this is kind of exactly what I witnessed in real life -- a friend of mine who uses Django for his stack had essentially a nested O(n^2) query, because of the ease with which the ORM allows you to query things as if they were objects in memory... Needless to say, this was very bad, and we both knew it, but it took some digging to find the right django code to implement it (we knew what we WANTED to do, and could write it out in SQL)

This particular friend was using the django testing middleware, and even with the massive amount of queries he was doing to the database, the database was not the biggest slowdown (each query went in quick, came out quick), the marshalling he was doing on the django side was...

Yeah, it really depends a lot on the models, too. If you go FK crazy then you push the ORM into making weird joins all the time. Which kills performance. Do you run the actual ORM commands or pass raw SQL to it?

We do a mix of both, but mostly the actual ORM methods.

yes, database and specifically disk i/o is usually our bottleneck.

I use New Relic, and I can see the time taken by the database vs. Python. Slow page events are almost always caused by database bottlenecks.

edit: actually the fact that a django response needs to block for the database means that overall concurrency is a serious bottleneck. again, due to database. but it doesn't need to be a bottleneck if the requests didn't have to block.

a more modern architecture like scala play! or node.js can serve other trivial requests while the database is still working. with django I have had to have two servers (8 CPU, 12 processes) so that nobody is blocked (eg. for just a trivial serve from cache) just because a few too many requests are needing more database objects for the page they are serving. so this is a django architecture issue.

We run many more than 12 processes per server. That's how you can get around that. :) But that's frankly more of a Python architecture problem than Django. Django is compatible with coroutine libraries like gevent, just most of the ecosystem is not compatible, so that makes things difficult at the size we're at.

Realistically though, this isn't a problem for us and if things are done right, shouldn't be a problem for you either. Just use more processes and you'll be fine unless you're running out of RAM or something.

how many CPUs and how many processes per machine?

I guess I used to run 8 on a 4 CPU machine. I tried 9-12 processes but it always tripped over itself.

> unless you're running out of RAM or something.

yep, on the new machines I run 6 just to keep the total memory down. and that also sucks: having to use lots of memory just to get concurrency.

We run 40 processes on each with 16 cores and 24G of RAM.
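With gunicorn, for example, that shape of deployment is just a worker count in the config file (the numbers below mirror the comment; whether sync or gevent workers fit depends on how coroutine-safe the rest of the stack is):

```python
# gunicorn.conf.py -- hypothetical sketch of a many-process deployment
workers = 40           # well above core count: workers spend most time waiting on I/O
worker_class = "sync"  # or "gevent", if the whole stack is coroutine-compatible
max_requests = 1000    # recycle workers periodically to cap memory growth
```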

Disk I/O I can definitely believe, but not every request from the DB goes to disk right? especially for in-memory databases

Also, I can NEVER explain evented servers adequately when people ask me about them. Especially how an evented server like Twisted or Tornado or gunicorn works when run from inside Python (struggling w/ the GIL). But yes, I didn't even consider that -- all the stuff is synchronous! Amazing that they can handle so many requests with little to no hiccups

The issue is that while many requests are very fast to serve from cache, they can get blocked if your limited number of workers get caught handling long ones.

If an admin is using a rarely used and inefficient page, or a few too many people are uploading lots of images at once. I have 400 real estate agents, and if they happen to all want to edit their listings at the same time then the customers get frozen out. Varnish would help here.

It's definitely not a blanket statement. For us, it is. Considering most requests aren't just 1 simple query either: if you even do 5 simple queries at 20ms each, that's 100ms spent just waiting for the db. Plus other systems in place.

And honestly for us, we'd be happy if all queries were ~20ms. :) Lots are unfortunately slower than that. Especially if all 45k/s were doing the queries.
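To make the arithmetic above explicit (assuming, for the sake of the sketch, that a fully synchronous worker blocks for the whole wait):

```python
queries_per_request = 5
query_ms = 20
db_wait_ms = queries_per_request * query_ms   # 100 ms per request, just waiting

# One synchronous worker handles one request at a time, so the DB wait
# alone caps its throughput:
requests_per_worker_per_s = 1000 / db_wait_ms       # 10 req/s per worker
workers_for_45k = 45_000 / requests_per_worker_per_s  # 4500 workers to sustain 45k/s
```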

With asynchronous (go-like) channels becoming so popular, I'd love to see some non-blocking Django ORM database calling features. That way, you could fire off a few non-blocking db requests and keep executing on the app server (eg. rendering?) until you hit a point where you needed the database data.

I know you could roll your own implementation... but building this into the ORM would still be really slick.
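A rolled-your-own version of this shape is expressible today with a thread pool -- the stub functions below stand in for real blocking DB calls:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def query_posts():        # stand-in for a blocking DB call
    time.sleep(0.05)
    return ["post-1", "post-2"]

def query_profile():      # a second, independent blocking call
    time.sleep(0.05)
    return {"name": "alice"}

with ThreadPoolExecutor(max_workers=2) as pool:
    posts_f = pool.submit(query_posts)      # fire both queries off...
    profile_f = pool.submit(query_profile)  # ...then keep executing
    # (e.g. start rendering) until the data is actually needed:
    posts, profile = posts_f.result(), profile_f.result()
# both waits overlapped: ~50ms total instead of ~100ms sequential
```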

There are frameworks that work like this, iirc. I think some in Scala, but I'm not sure.

IMO, the added complexity wouldn't be worth it with things like Varnish. I think there are better solutions and this is attacking the wrong problem.

But, YRMV. It depends on the situation at hand always.

Basically anything living in the akka+spray world is written this way. But akka+spray is not a framework in the same sense that django is.

Are there async drivers for e.g. mysql yet? Last time I did that I had a very nice async spray frontend, but it was all backed by a pool of regular blocking threads because that was the only way to use the database driver.

There is this, but it's not particularly mature: https://github.com/mauricio/postgresql-async

I also use https://github.com/etaty/rediscala for accessing redis.

But even without async drivers for MySQL, you can still do:

    val sqlResult: Future[StuffFromSql] = future {
      val stmt = connection.prepareStatement("SELECT ...")
      toStuffFromSql(stmt.executeQuery()) // blocking JDBC call; mapping helper not shown
    }
This will tie up a thread, but you can still run other actions in parallel with it.

If you do this, execute the future on a different dispatcher so you aren't blocking the default dispatcher.

This can cause problems if you tie up the default dispatcher with blocking SQL queries while using clustering: the cluster gossip won't be able to fire and if your failure detection settings are low, that node can be marked unreachable by the rest of the cluster.

AppEngine NDB does this for their datastore. Even works with Django, though not with the ORM (you have to use AppEngine's datastore API).

How many database servers do you have? How do you have them handle the queue?

1 primary master, with a few read slaves and a few shards.

Thanks for answering my questions. (:

Another thing to consider is contention and competition for the database's attention. If a query is fast in isolation, it's probably not going to be fast at tens of thousands per second.

Deploying Varnish is like cheating, but in a good way. Varnish isn't a set-and-forget cache proxy: it requires a well-thought-out VCL and a good deal of attention and maintenance. So the saying "not all VCLs are created equal" applies in this case.

I challenged myself to build the most generic VCL in the sense that I want it to work with the majority of "scripts" in a semi "shared" server deployment. I also wanted to make it as easy to deploy as possible while exposing some of its advanced features.

The end result is an array of software "plugin" products that one can download and deploy on cPanel and DirectAdmin (two leading control panels).

Shameful plug (wait for it...):



There's a free 14-day trial (no payment or CC required) for those who want to give it a spin.

If you want highly-tuned geo-distributed Varnish-as-a-service, check out Fastly.com:


(Disqus is actually listed as a Fastly client.)

Fastly is used for some of our traffic, yes, but not all. Our main app is not behind Fastly, FWIW.

I applaud Disqus for scaling Django to this tier of sustained load. I applaud them for sharing a clearly-written and approachable explanation of how that was achieved. I also applaud them for their product in general. I think Disqus is a quite excellent embeddable comment tool.

I do have some reservations with a few points made by this article. (Below I am speaking generally, and not about Disqus in particular. I don't mean anything below to imply they are doing it wrong. On the contrary, I think they're doing it very right given their circumstances.)

Repeated is the conventional wisdom that the performance of your application logic is negligible versus external systems such as your database server or your back-end cache. For low-performance frameworks and platforms that is indeed commonly the case, hence the conventional wisdom. However, there are important caveats: first, do not confuse time spent in your database driver and ORM as waiting for your database server. Your database server vendor will find that hurtful and offensive. Most database servers will be able to retrieve rows from well-indexed tables at far greater rates than low-performance application platforms' ORMs can translate those rows into usable objects. Modern database servers fetching rows from well-indexed tables can keep up with the query demands of the very highest-performance frameworks without saturating a database server's CPUs (with throughput measured in the tens to hundreds of thousands of queries per second per server). Yes, at scale your database server may need attention. But it's not necessarily the pain point you might think it is. Bottom line: profile your application and watch your database server's performance metrics. You may not be waiting on your database despite conventional wisdom. The same is true for other third-party systems such as a back-end cache.

Coupling the above with application logic and in-application composition of content into client-digestible markup ("server side templates") will compound the impact of a low-performance platform. While high-performance platforms can execute application logic and compose a server-side template tens of thousands of times per second on modest hardware, low-performance platforms may suffer a ten-times or greater performance penalty by comparison.

It is not necessarily true that high-performance frameworks and platforms are lower-productivity if you are starting with a green-field scenario where your development team is free of incumbent preferences. That last bit is crucial, of course. Most teams do have preferences, past experience that can be leveraged, and "know-how" with legacy frameworks. Do not confuse this institutional knowledge with an objective measure of developer efficiency. Developers who are unfamiliar with both Django and a modern high-performance framework may see roughly equal productivity. Measuring your Django-experienced teams' productivity versus their productivity with (for the sake of argument) a Go framework or a modern JVM framework is a biased assessment because of the alternative's learning curve. If we judge net productivity as a combination of the learning curve and the ongoing effort level past it, nothing with a learning curve will ever be evaluated honestly.

Yes, reverse proxy caching such as that provided by Varnish is an excellent idea when your application is a public-facing system without a great deal of personalization. But not all systems are public-facing embeddable comments or blogs or news sites (I don't mean this to be critical!). In many systems, a majority of responses are tailored to the specific user and other entities, making them unavailable for caching (as the article mentions, these requests will typically use a cookie to identify the session and are therefore not cached by Varnish). In these cases, if it weren't already clear from the above, I recommend seriously considering a higher-performance platform and framework that gives you the headroom to deliver responses under high load without necessarily resorting to crutches like a reverse proxy. Yes, leverage caching wherever and whenever possible. But when you cannot cache, respond as quickly as possible.

Performance is actually an important concern. It's not the concern, but don't keep throwing it under the bus.

Further, performance is not only a scale and concurrency concern. It's also a user-experience matter. In addition to reducing the system complexity for high-load and high-concurrency, a high-performance platform means that even without load and concurrency, you are able to respond to user requests more quickly (reduced latency). This leads to user happiness, and in some circumstances better search engine positioning and similar fringe benefits.

Again, I want to be clear that I think Disqus is great and this article is a valuable contribution, especially for those who are invested in a similar technology stack with similar usage characteristics.

I appreciate the kind words. :) Of course profiling is important.

Even starting a greenfield project of this scale, I don't think I'd choose Go (which I spend a lot of time writing). Its ecosystem isn't as mature in libraries and whatnot as Python's.

But aside from that, we do profile, and we know that we have some bottlenecks with actual query time. Not to say that this is the sole contributor, but it plays a role.

And you're right about us being in a position to be able to leverage Varnish to its fullest. We have a large number of reads vs writes, and a large percentage of those are anonymous users.

Even if we had a faster backend, I don't see why we'd avoid something as simple and useful as Varnish. Even caching things for 1 second is invaluable when doing 45k/s. Even if our backend were some super-optimized C code, forgoing the cache would be pointless.
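The math behind "even 1 second is invaluable": with a 1-second TTL, a hot object costs the backend roughly one request per TTL window, no matter how much traffic hits the cache.

```python
incoming_rps = 45_000   # requests/second hitting the cache for one hot object
ttl_s = 1               # cache lifetime
origin_rps_per_object = 1 / ttl_s          # the cache refills once per TTL
reduction = incoming_rps / origin_rps_per_object  # backend sees 45,000x fewer hits
```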

Varnish also provides us some other benefits, like being able to serve a "stale" version if we're down and smooth over hiccups.

> Even if we had a faster backend, I don't see why we'd avoid something as simple and useful as Varnish

Agreed! Since as you show in your post, it's quite effortless to set up, it's often a great idea to have a reverse proxy cache to deal with whatever cacheable responses your application generates (be it many or just a few request types).

I just want readers to be aware that for systems with a much more modest usage load (say ~1k RPS), a high-performance system can enjoy a slightly less complex system architecture. Ultimately, it's a decision for the application's designer to make. As you point out, there are other non-performance upsides to a reverse proxy. I love the ability to blip (technical term?) the application server and know the reverse proxy keeps a portion of users unaware. :)

I don't get this recent narrative that the productivity you'd get out of languages like go is comparable to higher-level ones like python. As someone who uses both go and python in production, this sounds like the blub paradox in play here. Yes, you will get significantly better performance for CPU-bound applications in go vs python. But you'll also get significantly better productivity out of python, assuming equivalent knowledge.

I also write Go probably just as much as Python anymore, and the biggest setbacks I have in Go come from the immaturity of the ecosystem. On almost any project I work on, I have to shave 10 yaks to get things working right.

My most recent yak was fixing lib/pq so I could query against our production dbs. :)

This library? https://github.com/bmizerany/pq

If so, we're about to use it. Anything we should be forewarned about with respect to its use in production settings?

Yeah, but that's an older unmaintained version. Official is at https://github.com/lib/pq, and I already shaved my yak. :) https://github.com/lib/pq/pull/135

That's largely why I ended up back on Python, after experimenting with Go a bit. I'll try Go out again in a year or two, when others have forged the path a little better.

This is a subjective matter, I suppose. I can see your point about Go. As a counterpoint, looking back on projects my team has worked on, I can't say we've reaped a great deal of net gains from leveraging the more mature Ruby or Python ecosystems, especially as compared to a modern JVM stack. Sure, sometimes we find a library that claims to address our need, but we often find the library doesn't do what we need precisely, is incompatible with something else we're using (e.g., JRuby), or whatever. The quality is all over the map. Net efficiency seems like a wash in my experience. But YMMV is always applicable with developer efficiency.

Yeah, that's fair. Our biggest cost has been the same - the libraries aren't there yet. This is something whose impact would likely vary a great deal from one project to the next. I do think there is quite a bit to be said of the value of higher-level language semantics as well though.

Anecdote: I definitely don't agree. Both Python and Go are in my wheelhouse, and I feel much more productive with Go than Python. Personally, I think it's the static types and simple language design. (i.e., Reading Python code, for me, requires a higher cognitive load than Go.)

I freely admit that this may vary from person to person, but I think it's close enough that you don't need to resort to rationalizations like the Blub Paradox.

Background: I've probably written ~100K LOC in each of Go and Python in the last few years.

I agree. I feel more productive in statically typed imperative languages vs. dynamically typed ones; it took me a while to get there, though. The change in mindset was difficult to start with, coming from a dynamic scripting background.

I'm learning functional programming at the moment, and hope to have a similar epiphany :)

> I'm learning functional programming at the moment, and hope to have a similar epiphany :)

I've done quite a bit of FP, but I've still held out from acquiring a traditionally FP language as a go-to language in my toolbox. Instead, I've found the intense focus on abstraction and representation in the FP world leak over into my interaction with other more imperative languages.

Yes! Immutability and a lack of side effects have changed my mindset for the better. It really helps keep things understandable.

absolutely agree, static typing/FP convert here, once you get over the hump, it's a joy.

>But you'll also get significantly better productivity out of python, assuming equivalent knowledge.

Why? What is python doing to get you better productivity? As someone who is certainly not an expert at python, but used it as my perl replacement for a few years I feel like I am as productive in python as I am going to get. I was just as productive in go after two days of playing with it. The only reason I can see to use python is if it has a library/framework go doesn't.

Biggest semantic baggage of go over python that I've experienced:

* Lack of exceptions causes a lot of noise. Wrapping a bunch of code in try/catch is just more terse.

* Lack of parametric polymorphism means there's no useful higher-level functions like map, fold, etc.

"* Lack of parametric polymorphism means there's no useful higher-level functions like map, fold, etc."

This is one of my problems with Go, too. This will add line after line of repeated code in the long run. Time will tell.

I find the opposite on the error handling. The lack of exceptions is one of the few things I like about go. I just wish they had done error handling correctly instead of the stupid kludge they put in because "durr what is an ADT?". I agree that the lack of parametric polymorphism makes go essentially useless, I just don't understand how python is any better. It also lacks parametric polymorphism, and in fact polymorphism period. It is a unityped language.

Something I'd like to see: an API-compatible reimplementation of Django in C++, which would function as both a host for existing Django apps and a C++ web framework that you could gradually rewrite the hot spots of your webapp into.

This would basically be a high-performance HTTP server coupled with an embedded Python interpreter. Instead of running Django in Python, it would take your app's settings.py and url.py and compile the URLs you have into a DFA, using a high-performance regexp implementation like re2. When a pattern matches, you could either register your handlers in Python or in C++. Python handlers would work like Django normally does: they invoke a Python function, pass in request/response objects (constructed quickly via a C++ native HTTP parser), and write the response back to the socket. C++ handlers would invoke a C++ class to do the processing, which would use native C++ DB bindings or RPC stubs. C++ code should be able to call into the Python code via the normal interpreter mechanisms, so you can share utility functions between them. (Vice versa as well, though writing Python bindings for C++ utilities might be a bit more of a chore.)

The templating language could also be redone to parse the templates on server reload, generate bytecode, and then read directly from C++ data structures or Python wrappers around those. Then rendering the template becomes a very quick linear pass over the bytecode, either outputting to a buffer or directly doing zero-copy vectored IO to the socket.

The cool thing about this (other than the extra speed from using good algorithms and writing the framework in a fast language) is that you could incrementally transition from a productivity-focused startup webapp to a performance-focused high-growth webapp. You don't need to write everything in C++; only the most frequently trafficked dynamic pages. Oftentimes webapps have very sharp hotspots, eg. I'm guessing that the front page of HN gets many times more pageviews than /new, /leaders, or /reply. If they're static you can cache them, but oftentimes the heavily-trafficked pages are heavily trafficked because they change frequently.

The other cool thing is that it preserves a built-in framework for prototyping and experimentation that can re-use functionality you've built for the production site. Usually the experimentation process doesn't stop when a company grows up: it's really important to provide an escape hatch for folks to try new ideas without having to rewrite everything you've built over time.

As much as I like working in Python, and I tend to use it whenever I'm not forced to use something else these days, I would still rather move on to a nice, async, Golang-based framework designed for the era of HTML5/JavaScript apps on the client. I hope it won't take too long. Golang seems to a lot of us to be in the sweet spot: productivity close enough to Python, performance close enough to C++.

I've heard that claim a lot, but I don't really see it. I'm in the process of working on a Go AppEngine prototype at Google. Not huge, but large enough that I get a feel for the language. I like it, but it's not at all like Python for prototyping. Things that I really miss, in a prototyping context:

1. The ability to dump large amounts of untyped data on your app and pull out only what you need. I get back a JSON response from backends with hundreds of fields. In Python I can just parse it into a dict of arrays and pull out only what I need. In Go I have two options: parse into a map[string]interface{} and then use a whole lot of type assertions when I want to actually manipulate the values, or parse into a user-defined data type that pulls out the fields I need. I've opted for the latter, but since the JSON data structure is nested about 6 levels deep, that's a lot of types I have to define before I can do anything.

2. List comprehensions. So much of prototyping is basically "Munge this data type and compute something. No, actually, munge this whole list of data types and compute a list of somethings." That's a one-liner in Python, about 4-5 lines in Go. It adds up.

3. Awkward append syntax. For such a common operation, it's awfully verbose (and confusing, until you get used to it and understand why you need to reassign the result back to the slice you're appending to).

4. Error handling. In prototyping you usually just want to ignore errors on the first pass through and print a traceback. Go explicitly discourages this and makes you check the return value. And you can't even write a type-safe utility function that takes a function that returns (result, err) and panics if err is non-nil, because of the lack of generics. (The return type of such a function would depend upon the function that's passed into it, which you can't express statically.)

5. Pass-by-value defaults. In Go structs are value objects, which mean they're copied when passed to functions or assigned to variables. This is a reasonable design choice, but means that you need to think about whether you want a value or a pointer before writing the code. And if you screw up or forget to change it, your local modifications to the object are silently ignored. When I'm prototyping, my attention is usually on the problem domain and not on the language, and there've been several times I've messed up because a function suddenly needs to start modifying its argument but it was passed in by value and I forgot to change it.
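For contrast, points 1 and 2 above in Python (the payload here is made up):

```python
import json

raw = '{"data": {"items": [{"id": 1, "score": 9}, {"id": 2, "score": 7}]}}'

# 1. Dump the untyped payload into dicts/lists and pull out only what
#    you need -- no type definitions required, however deep the nesting.
resp = json.loads(raw)

# 2. "Munge this whole list of data types and compute a list of
#    somethings" is a one-line comprehension.
scores = [item["score"] for item in resp["data"]["items"] if item["score"] > 5]
```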

To be clear, I like Go a lot as a production language, and I wish that the large C++/Java systems I maintain were written in it. But it doesn't really come close to Python for prototyping in my experience. I'm a little surprised after hearing many glowing testimonials about how Go is just as productive as Python and just as fast as C++; I would still say there's at least a factor of 2 difference in productivity. I see Go as the new Java (and also the anti-Java, in terms of design decisions they made). It's simple enough to learn quickly, it's "good enough" in performance to be used in production systems, and its design tries to encourage unskilled programmers to not screw things up too badly. On all these counts I think it's strictly better than Java. But I feel like for people that are willing to use a multi-language solution, Python + C(++) is strictly better for most of the various types of programming you'll face in building a large system. Python beats Go for scripting, Python beats Go for prototyping, Python beats Go as a config language, C++ beats Go for high-performance computing, and C++ beats Go if you need to be close to the machine. And with Cap'n Proto one of the slowest and most annoying parts of maintaining a large multi-language system (re-marshaling your data structures across the memory boundary) goes away.

What I'd really love is a way to use Python as a prototyping/scripting language and Go (instead of C or C++) as the underlying production substrate, but their runtimes are pretty incompatible.

+1 for lots of interesting points. Regarding the benefits of Python, I definitely agree. For writing my own scripts, small utilities, one-off data processing jobs (turn this CSV file into a big JSON object, etc.) and all the little, daily automation tasks I run into, it's hard to beat Python, and I'm usually surprised at how fast it is. I don't plan to give it up as my "everyday carry" Swiss Army Knife language.

For writing apps for the server, where it will keep running 24/7 serving thousands of people, I think I'm okay with giving up what you estimate to be about a factor of two in productivity for the performance gain, but I wouldn't give up the order of magnitude productivity difference needed for C++ to gain another factor of two in performance (vs Go) unless my code had to serve millions. I don't see that happening anytime soon, so Go as the new, improved Java (I see it that way, too) still seems like the sweet spot for server coding for me.

"And if you screw up or forget to change it, your local modifications to the object are silently ignored. "

Since I started writing Scala I've gotten used to immutable data, and the reverse happens to me when coding in an imperative style in other languages: passing something in and having it get "locally modified" feels awkward. Preposterous, even.

Regarding Go and the "also the anti-Java" part: I really don't think so.

They have made some decisions that seem to me like a reaction to C++, and that at least is similar to Java. Some of them seem simplistic rather than simple: for example, "There is no inheritance!" But I really don't know how implicit structural subtyping will work out in the long run.

What I really envy about Go is the "compile and link to one executable" story, from an ops point of view.

Time will tell.

[Edited to add envy part, and rewrite a phrase]

I actually really like immutable data - I was (and still am, when I don't have to get work done) a big fan of Haskell. The problematic part is when you have a language that encourages an imperative programming style, and yet passing an object by value is only one character off from passing it by reference.

[Disregard, I read your post again after a coffee and on a normal screen. Sorry]

I don't understand "yet passing an object by value is only one character off from passing it by reference".

Is that Go?

Very neat idea. Very similar to developing your JVM-based application in Groovy and slowly moving it over to Java.

Good luck with that. Django's code is massive, and when 1 line of Python ~= 5 lines of C++, without all the sugar, the task is not worth it at all. Let alone all the QA, and porting all the legacy code, which will never be fully compatible with a pure C++ implementation unless you know Python's internals pretty well.

If you need high-performance servers, move to the JVM (Scala) or Go, or use a C++ framework; problem solved. Use the right tool for the job. Disqus did, and it works pretty well for them, without having to reinvent the wheel.

You don't have to port all of it, though. Since you can run Python code from inside C++, you could just use an embedded Python interpreter to run Django, i.e. the existing codebase. The only parts that really need to be ported are URL dispatching and the Request/Response objects, because that's the interface to the views.

The rest could all be done incrementally, replacing components as necessary to gain performance wins. Templates seem like a good candidate because they're used everywhere and would be on the fast path even for C++ code. You'd have to convert all the tags over to get any useful speedup. But you could leave filters in Python and call into them via the Python runtime, then convert them over incrementally.

I'd do middleware next as that runs on almost all requests. Again, you can do it incrementally: all you need to do on the first pass is a C++ dispatch layer that reads the classes from settings.py and invokes them through the Python interpreter. Change that to let you register & define C++ classes implementing the interface, and you can transition the most common middleware classes one by one to C++.

A bunch of stuff I'd expect would never move over to C++. There's no real reason to port django-admin/manage.py, or the admin site. I personally wouldn't care about moving the ORM over; if I want performance I don't use an ORM, and there's nothing wrong with using raw MySQL or libpq calls. A lot of the niche stuff like File Uploads or PDF generation occurs infrequently enough that there's really not much benefit to porting it over. It's debatable whether even Forms is worth including, since most high-traffic pages would not be ones where you're submitting a rich form.

I find your sentiment about ORM overhead spot on. I did a simple, unscientific benchmark once that showed a 2x overhead using JDBC versus a native driver, and a minimum 4x overhead using Hibernate vs. JDBC. That means Hibernate proved to be 8 times slower than a native driver! And this is before you get into issues most ORMs have struggled with, like efficient joins. And as bad as Hibernate is, dynamic-language ORMs are even slower.
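
The translation cost is easy to see even without an ORM library. Here's a toy sketch using stdlib sqlite3 (illustrative only, not a rigorous benchmark; the `users` table and `User` class are made up): fetching raw tuples versus materializing every row into a Python object, which is the cheapest possible version of what an ORM does per row.

```python
import sqlite3
import time

class User:
    """Minimal ORM-style row wrapper (hypothetical, for illustration)."""
    def __init__(self, id, name, email):
        self.id = id
        self.name = name
        self.email = email

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, "user%d" % i, "u%d@example.com" % i) for i in range(50000)])

# Raw driver: rows come back as plain tuples.
t0 = time.perf_counter()
raw = conn.execute("SELECT id, name, email FROM users").fetchall()
raw_time = time.perf_counter() - t0

# ORM-style: every row is materialized into an object, with per-row
# constructor and attribute-assignment work -- that's where overhead lives.
t0 = time.perf_counter()
objs = [User(*row) for row in conn.execute("SELECT id, name, email FROM users")]
obj_time = time.perf_counter() - t0

print("raw tuples: %.4fs, objects: %.4fs" % (raw_time, obj_time))
```

Real ORMs do far more per row (identity maps, lazy relations, change tracking), which is where multiples like the ones quoted above come from.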

> Most database servers will be able to retrieve rows from well-indexed tables at far greater rates than low-performance application platforms' ORMs can translate those rows into usable objects. Modern database servers fetching rows from well-indexed tables can keep up with the query demands of the very highest-performance frameworks without saturating a database server's CPUs (with throughput measured in the tens to hundreds of thousands of queries per second per server).

could you be a little more specific? is this from real world experience or from some benchmarketing material somewhere?

if i recall from early Zynga (circa 2008), we would get MySQL (InnoDB) on decent commodity hardware (Dell 2U with 8 cores, 32G mem, 15K RPM disks in raid 10, cost about $7K USD) to around 6K QPS before it would eat shit.

that was with correct indexes and only consisting of simple selects and updates (no joins or other complex queries. we probably only had ~10M users in the database, but the `user_item` table at least was likely well over 100M rows).

we even used to have a saying: "first, check the database. if it isn't the database, it's probably the database." when the app crashed, it was almost, almost always the database.

this wasn't at the 'monster' scale that the games later grew to, just a handful of php scripts with a few million monthly actives, a scale that i think a lot of people hope to achieve with modern consumer apps.

i'm just saying that i've lived the daily "it's the database", and i agree that it's a major concern for people building on open-source stacks and commodity hardware for a wide audience. the honest truth is that everything else in the stack is really, really easy to scale in comparison.

Admittedly, a queries per second measurement comes with a bunch of variables. Reads? Writes? Fetching by primary key, an indexed field, an unindexed field? Joining? Does the data set being accessed fit in the memory of the database server?

In our benchmarks [1], the hot data set does fit in memory, rows are small (by design, to avoid saturating the Gigabit connection) and they are fetched by primary key. The results top out at ~145,000 queries per second (cpoll-cppsp shows 7,252 requests per second and this test exercises 20 queries per request). The database server's resources are not fully utilized by this test. That is, in this test, the web framework--even this blisteringly fast C++ framework with its optimized MySQL driver--is the bottleneck.

In the updates test [2], the same framework scores 1,963 requests per second and this test exercises 20 reads and 20 writes per request for a total of 39,260 select queries and 39,260 update queries per second.
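
For the record, the quoted totals are just the request rates multiplied by the per-request query counts:

```python
# Multi-query test: 20 select queries per request.
assert 7252 * 20 == 145040   # ~145,000 queries/sec, as quoted

# Updates test: 20 reads and 20 writes per request.
assert 1963 * 20 == 39260    # 39,260 selects/sec and 39,260 updates/sec
```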

Aside from the database server, which uses a Samsung 840 Pro SSD, our benchmarks run on early-2011-vintage i7-2600K Sandy Bridge workstations.

This benchmark is not a database benchmark, however. We capture numbers with MySQL, Postgres, and Mongo but it is not our intent to measure the performance of databases.

Another benchmark is Databench [3], which is measuring the impact of ORMs on bank-transfer-like transactions. Again, not a benchmark focused specifically on measuring the database itself, but interesting nevertheless. On an hi1.4xlarge Amazon instance using the Prevayler library, that hardware achieves 84,860 transactions per second. Databench uses Postgres.

[1] http://www.techempower.com/benchmarks/#section=data-r6&hw=i7...

[2] http://www.techempower.com/benchmarks/#section=data-r6&hw=i7...

[3] http://databen.ch/

Correction: Now that I'm at home, I took a closer look at Prevayler since I wasn't familiar. It appears to be a persistence platform in its own right rather than an interface to the database. So disregard.

I'm not precisely sure how to read the DataBench results, but presumably the rows that cite Postgres in the name are more relevant.

The obvious question is: did you try other RDBMSes?

no, that's sort of why i was asking for details. I'd be interested to know if postgres or some other open source solution could net us the 10-100x performance he's referencing. I've heard great things about postgres, but I've never used it under heavy load.

I'm under the impression that some proprietary databases on expensive hardware can achieve those levels, and some friends that do oracle installations claim that the total cost-per-query is competitive with open source / commodity hardware setups, but I've never really considered using them at my companies, and it seems like a lot of start-ups are in the same boat.

Yeah, the proprietary databases can be very expensive. I was told at an IBM internship day that DB2 was doing billions of transactions / day for UPS in the mid 1990s ... on clustered[1] mainframes. Not something most startups can easily get their hands on.

Basically the technological bifurcation seems to exist near the limits of a high-value corporate credit card. Which naturally changes over time.

The biggest shift in data economics in the last 5 years has been the emergence of SSDs. For startups, the 2nd biggest shift has been the surge in performance of PostgreSQL; they've made huge strides in scaling it up over the last few years.

[1] http://en.wikipedia.org/wiki/IBM_Parallel_Sysplex

FWIW, PostgreSQL is our primary data store. It works well. ;)

like i said, i've heard good things.

what kind of QPS have you guys seen a box go up to? anywhere near the 'tens to hundreds of thousands' that [bhauer](https://news.ycombinator.com/user?id=bhauer) threw out there?

45K req/sec is big, btw. props, yo.

>if i recall from early Zynga (circa 2008), we would get MySQL (InnoDB) on decent commodity hardware (Dell 2U with 8 cores, 32G mem, 15K RPM disks in raid 10, cost about $7K USD) to around 6K QPS before it would eat shit.

MySQL is slow for concurrent access, but not that slow. I don't have MySQL installed, but PostgreSQL running on OpenBSD (a famously poorly performing OS) in a VirtualBox virtual machine (hooray, overhead), only being given one core and running off of a slow laptop hard drive, is beating that for me.

how big is your data set? i remember MySQL degrading when the tables got too big, i think around 100M rows or so. the data set probably didn't fit in memory either.

i don't know how much of a difference it makes, but we were also moving all the data across the wire (vs. what i'm guessing is local access on your machine), and probably about 50% writes.

i might be off on the 6K number... this was all some years ago, so it's a little fuzzy... but i don't think it was 'tens or hundreds of thousands' by any means.

What are the "modern high-performance frameworks" that one should consider when developing a greenfield application?

A similar question was asked in another thread [1], and I'll repeat that answer: Just about anything on the JVM (Dropwizard, Play, Finagle, Scalatra, Rest-Express, Rest-Easy, Compojure, Unfiltered, Jersey, Vert.x, Spark), Go (Gorilla, Beego, Revel), Lua (Lapis), Haskell, Erlang...

[1] https://news.ycombinator.com/item?id=6402205

Being on the JVM does not automatically turn something into a high performance solution. The JVM itself is plenty fast, but I've seen some pretty slow code on that platform. Of the many things you listed, do you have a specific recommendation that is actually known for being high performance?

I'm not really familiar with all of the choices you listed, but of the ones that I do know, I don't really consider Dropwizard or Finagle alternatives to a web framework like Django (neither is Jersey, but that's already a component of Dropwizard).

Fair enough point about the breadth of the framework's services. Dropwizard and Finagle are small compared to Django. If you need specific features of Django, you may not find those in any JVM framework. I'm not certain.

But for many applications, the necessary framework pieces are there in many of the frameworks I listed: routing, database connection pooling, ORM, sessions, cookie support, form validation, JSON serialization, server-side templates, XSS/CSRF countermeasures, email, REST clients, authentication, etc. or some mix thereof. For example, Dropwizard provides a majority of those pieces.

Also agreed that you can write low-performance code on anything, and you're right there are several JVM libraries that perform poorly. But the JVM gives a very high "high water mark," so to speak. As do Go, Haskell, Lua, Erlang, and other platforms.

Dropwizard is a solid framework and would be my first choice for building web services. Recently it has acquired a few components that could work for putting together a user facing website (instead of just the web services it was designed for). Coupled with a JavaScript MVC framework this might even work for putting together a full featured user facing web application.

However, would you attempt to build a traditional multi-page web application (what Django is really built for) in Dropwizard? Is anyone doing that? I'd love to see some examples and patterns for doing that.

Also, I know the JVM crowd hates to admit it, but Node.js as well. I was able to do 15k rps to our Node API (which hit Postgres) on the last application I worked on. And that wasn't even on SSD. Node can be very fast on your own hardware. That was on a two-server Xeon setup.

From what I've toyed around with, Twisted and Tornado on PyPy have speeds comparable to Node.js, too.

> And that wasn't even on SSD. Node can be very fast on your own hardware

Surely the absence of a SSD and hardware vs virtual hosting is more likely to affect Postgres than Node? Node presumably runs all in memory and is more likely to be CPU bound (or waiting) than disk IO bound.

Indeed; you're right! It's an oversight to have left Node.js off my list above.

TechEmpower runs some really interesting benchmarks (www.techempower.com/benchmarks/). JVM frameworks are indeed performing really well.

Also, to touch on your point about cookies: we do a few tricks to work around them and make conscious decisions about stripping them. My talks explain that a bit, and how we design our app to work around it.

Wow, you really do have an agenda.

Scaling Varnish to 30K/sec. Scaling Django to 10K-15K/sec.

I remember, a long time ago: "How I scaled Drupal (large number of queries/page) to 3K pages/sec." It was really Varnish that scaled.

Sure, but the idea is knowing how and when to use your tools properly and what they're good for. It allows us to continue using Django.

Out of the box, Drupal didn't use to play well with Varnish, so that was not a trivial task even if Varnish did the heavy lifting. For serving a cached asset from memory, it's going to be really hard to beat Varnish on speed.

Still, Django at 15K is good.

15K requests with Django is a nice number to hit. What are you using as a server? Gunicorn?

Not a bad setup.

    load balancers --> Varnish:
        if !cache:
            --> Django
Still makes me wonder how much faster this would be in Go or Java. I've never been able to get Python to be very efficient over 5K requests.


Gevent is a bit funny sometimes. How are you dealing with it?

We don't run gevent for disqus.com. We do have some background workers and whatnot, but our main app is single threaded, with a large number of processes on each machine.
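
For reference, "single threaded with a large number of processes" maps onto a prefork WSGI setup along these lines. This is a hypothetical config sketch: the thread never confirms which WSGI server Disqus uses or how many workers they run.

```python
# gunicorn.conf.py -- hypothetical sketch, assuming gunicorn as the server.
import multiprocessing

worker_class = "sync"                      # plain single-threaded workers
workers = multiprocessing.cpu_count() * 2  # many processes per machine
threads = 1                                # no in-process concurrency
```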

Interesting. I'd love to know about the specs of your hardware. Maybe a future blog post about that?

If you really need gevent that badly, you can spawn a greenthread to do some background task just before returning the response to the client.


> The common pattern for application-level caching

  data = cache.get('stuff')
  if data is None:
      data = list(Stuff.objects.all())
      cache.set('stuff', data)
  return data
I'm wondering if you simplified the example, or if you are just not preventing cache-regeneration race conditions?

Some other frameworks just put back the expired data for a few seconds while the new version is being regenerated, to avoid having multiple workers build the same thing.

e.g. rails: https://github.com/rails/rails/blob/3182295ce2fa01b02cb9af0b...

This is a very simplified example. We do something similar to avoid a stampede.
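
For anyone curious, one common shape for that (a sketch under assumed semantics, not Disqus's actual code) is to let a single worker regenerate an expired entry while every other worker keeps serving the stale value, so the database sees one query instead of a stampede:

```python
import threading
import time

cache = {}                      # stand-in for memcached/redis
_regen_lock = threading.Lock()  # a real setup would lock per key

def cached(key, ttl, compute):
    entry = cache.get(key)
    now = time.monotonic()
    if entry is not None:
        value, expires = entry
        if now < expires:
            return value
        # Expired: one worker regenerates; everyone else keeps serving
        # the stale value instead of stampeding the database.
        if not _regen_lock.acquire(blocking=False):
            return value
    else:
        _regen_lock.acquire()   # cold cache: nothing stale to serve, so wait
    try:
        entry = cache.get(key)  # re-check: someone may have refreshed it
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]
        value = compute()
        cache[key] = (value, time.monotonic() + ttl)
        return value
    finally:
        _regen_lock.release()

# Usage: repeated calls within the TTL run the expensive query only once.
calls = []
def expensive_query():
    calls.append(1)
    return [1, 2, 3]

print(cached("stuff", 60, expensive_query))  # prints [1, 2, 3]
print(cached("stuff", 60, expensive_query))  # prints [1, 2, 3] again, 1 query total
```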

So, going to this link with Ghostery enabled is a bad idea: the constant attempts to load resources that are blocked will crash Chrome.

Front-side HTTP caching is all well and good, but what do you guys do in the case when the returned content is (at least partially) user-contextual? Caching isn't really going to help you in these cases.

Obviously you can't cache everything. Hence why we still have 15k/s hitting our backends.

There are lots of more complicated things we can do with ESI if we really wanted to get into it.

I'm guessing they cache as much as they can here too


Template fragment caching is not a good thing. We don't do this anymore. If you have to cache your template fragments, I generally think you're doing something wrong. Gonna have a bad time.

One thing I'd also recommend for speeding up Django is swapping in djinja. It's a straight drop-in replacement for almost the entire template engine, but much faster.

This trick will speed up Django templates significantly.


What do you lose by using djinja instead of the builtin template engine?

You primarily lose a few of the Django filters, though they can all be reimplemented as Jinja filters if you need them.

Wouldn't this also mean you have to re-implement any filters that come from third-party apps, too?

Benchmarks, please. I've yet to find a case where that was true in a real-world project outside of … dubious … designs where someone was using a dozen nested if/for blocks to avoid writing a templatetag.

In one memorable case, someone switched an entire site over to Jinja before doing any profiling; I reverted it and added select_related() to avoid doing 8,000 queries while generating the page.
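
To make the select_related() point concrete, here's the N+1 query pattern versus the single joined query that select_related() produces, sketched with stdlib sqlite3 rather than the Django ORM (the tables and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.execute("INSERT INTO author VALUES (1, 'alice'), (2, 'bob')")
conn.executemany("INSERT INTO post (author_id, title) VALUES (?, ?)",
                 [(1, 'p1'), (2, 'p2'), (1, 'p3')])

# N+1: one query for the posts, then one extra query per post -- this is
# what iterating a queryset and touching post.author does by default.
posts = conn.execute("SELECT id, author_id, title FROM post").fetchall()
queries = 1
n_plus_one = []
for _, aid, title in posts:
    name = conn.execute("SELECT name FROM author WHERE id = ?",
                        (aid,)).fetchone()[0]
    queries += 1
    n_plus_one.append((title, name))
print(queries)  # prints 4: four queries for three posts

# select_related-style: the same data in one joined query.
joined = conn.execute("""
    SELECT post.title, author.name FROM post
    JOIN author ON author.id = post.author_id
""").fetchall()
assert sorted(joined) == sorted(n_plus_one)
```

On a page rendering thousands of related objects, the gap between one joined query and thousands of per-row queries dwarfs any template-engine difference.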

OK, Varnish 4.0 was mentioned, but I don't find anything concrete or specific via Google search. When will that be coming?

And I would love it if High Scalability did an interview with Disqus. 8 billion PVs; I would love to see the stack, backend, and machines that handle it.

I don't get why Varnish should be any faster than any other raw static HTTP server. OK, a bit faster I can understand, but 300 times faster than without it? That I don't get.

Varnish caches dynamic requests.

I guess they are using it for caching non-static content.
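
Right, and the trick that makes a "dynamic" response cacheable is declaring a short freshness lifetime. A minimal WSGI sketch (the endpoint, payload, and 15-second lifetime are all made up for illustration) of what an app hands a front-side cache like Varnish:

```python
def app(environ, start_response):
    # Dynamic content, but acceptable to serve slightly stale: a 15s
    # max-age lets the HTTP cache absorb every repeat request in that
    # window without touching the application.
    body = b'{"comment_count": 42}'
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Cache-Control", "public, max-age=15"),
    ])
    return [body]

# Minimal smoke test without a real server:
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

result = app({}, fake_start_response)
print(captured["headers"]["Cache-Control"])  # prints: public, max-age=15
```

At 30K req/sec, even a 15-second lifetime means one backend hit per cached URL per 15 seconds; everything else is served from memory.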

Awesome post, Matt. I've been meaning to check out Varnish. I'm sad I missed your talk at DjangoCon, but hopefully it'll make its way to the web soon.

What does Varnish do that nginx doesn't?

Scaling Django by using it less... It's Varnish that deserves the spotlight, not Django.

Well, sort of. To me, it's more like, use Django for what it's best at, which is making your app logic really organized, readable, and maintainable. I think it's a great story because it makes the point that just because Django isn't super high performance doesn't mean you need to go with something super stripped down or in an unreadable language to get good results.

But couldn't they save a ton of money in server costs by going with something with higher performance?

Servers are much cheaper than humans.

First of all, the amount of time that'd be required to rewrite things in said "higher performance" language with the hopes of getting some improvements would be very costly. It's not something that'd be done in even a month by one person. It'd be a whole company effort. Learning curve of new language, etc.

With that aside, I guarantee that, for our use case at least, I'd rather invest my time into tuning a Varnish config and save the same number of servers without all the manpower wasted.

Our server costs are much much cheaper than our human costs.

This is certainly the case now. You have too much technical debt.

For future projects, should you continue to use Django/Python, or something with better performance?

Should this be a lesson to me as a Django developer, that I'm better off starting a company on a different stack if I want to keep server costs low?

It obviously depends on the project, but for something like Disqus or anything else large in terms of size, I'd still go with Django.

For some personal projects, I like to use werkzeug, but they usually are very small and don't do much. :)

I also write things personally in Go if it makes sense. But the Go work is typically not for web services.

Think about what you just said: snidely dismissing using tools where they're most appropriate is simply bad engineering. Professionals know how to balance trade-offs.
