

Macro-Benchmark with Django, Flask and AsyncIO - GMLudo
http://blog.gmludo.eu/2015/02/macro-benchmark-with-django-flask-and-asyncio.html

======
falcolas
Disclaimer - I have not used AsyncIO. I have, however, tried to implement
things on top of Twisted, from which this framework derives many concepts.

Asynchronous programming in Python is a real pain in the butt. If I were
writing a web frontend where performance mattered enough that Django/Flask
would not cut it, I would use a language designed for performance.

I'm not the only one: the Python development market has collectively chosen
Django over Twisted. No matter how bad Django can be with all of its
corner cases, writing for it is still better than writing Twisted code. I
imagine this will carry over to AsyncIO as well.

As a side note, I'm not terribly impressed with the source. Hiding the word
"dick" in unicode doesn't mean it's not there.

~~~
IgorPartola
It has been long established that using Twisted's web capabilities is a bad
idea. It has been broken from the usability point of view for a very long time
and I don't think it will ever be fixed. Without reciting the entire history,
Tornado got a lot of crap for not just fixing what Twisted had going on, but
creating a new framework from scratch. Having worked with both, I see why they
did it.

Basically, if you have a bad experience with Twisted + web, you are very much
not alone but the fault is with Twisted, not Python. Here's how I break it
down:

If you need async networking at the TCP or UDP level, use Twisted.

If you want a web application, use Django.

If you want a one page landing page type application, use Flask.

If you are on a restricted budget and need really good performance for your
web application that you tried and couldn't extract from Django, go with
Tornado.

~~~
klibertp
> that using Twisted's web capabilities is a bad idea

Isn't async just a bad fit for a typical Web page? Two things which take most
time per request (in my experience) are DB queries and rendering templates.
While you can have an async DB driver (only for Postgres right now), all the
template engines do their job synchronously, which - with async event loop -
blocks _everything_ until finished. There's deferToThread in Twisted, but if
we're going to use a pool of threads anyway, what's the point?
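
A minimal sketch of the blocking problem described above, using the standard-library asyncio instead of Twisted. `render_template` is a hypothetical stand-in for a synchronous template engine (it only sleeps), and `heartbeat` stands in for every other request the loop should be serving; both names are assumptions made for illustration.

```python
import asyncio
import time


def render_template(duration=0.2):
    # Hypothetical stand-in for a synchronous template engine:
    # it holds the calling thread for `duration` seconds.
    time.sleep(duration)
    return "<html>...</html>"


async def heartbeat(ticks, stop):
    # Stand-in for other requests: records a timestamp every 10 ms
    # for as long as the event loop is free to run it.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


def longest_gap(ticks):
    # Longest pause between two consecutive heartbeats, in seconds.
    return max((b - a for a, b in zip(ticks, ticks[1:])), default=0.0)


async def measure(offload):
    ticks, stop = [], asyncio.Event()
    beat = asyncio.ensure_future(heartbeat(ticks, stop))
    await asyncio.sleep(0.05)  # let the heartbeat get going
    if offload:
        # Roughly the role deferToThread plays in Twisted: use a thread pool.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, render_template)
    else:
        render_template()  # called inline: nothing else runs meanwhile
    await asyncio.sleep(0.05)
    stop.set()
    await beat
    return longest_gap(ticks)


if __name__ == "__main__":
    print("inline gap:    %.3fs" % asyncio.run(measure(offload=False)))
    print("offloaded gap: %.3fs" % asyncio.run(measure(offload=True)))
```

The offloaded case only looks this good because `time.sleep` releases the GIL. A template engine doing pure-Python CPU work in that thread would still compete with the event loop for the interpreter, which is the heart of the "what's the point?" objection.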

I also thought this problem was mostly solved with Nginx and uWSGI. This setup
works extremely well in my experience, eliminating problems with handling too
many sockets and such while still letting you write Django code as usual.

Async is good if you mostly do things which can be asynchronous, like fetching
things over the net, reading files from disk, communicating with Redis and DB.
You really need a pre-emptive scheduler for tasks (not necessarily threads, see
Erlang) for anything that's going to be CPU-bound. And it's not true that
rendering web pages is 100% IO-bound - not when you're using Python, Django
and need consistently ~100ms response times.

~~~
IgorPartola
Well, you answered your own question ("what's the point?"): because not every
application is "app stack + RDBMS". There are many situations where your
backing store is not an RDBMS. There are many situations where you are not
rendering templates. There are many situations where queries take seconds,
while rendering the result takes microseconds.
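
To make the "queries take seconds, rendering takes microseconds" case concrete, here is a rough sketch with the standard-library asyncio. `slow_query` is a hypothetical backing-store call that only waits; the 0.2 s delay and the count of 50 are arbitrary assumptions.

```python
import asyncio
import time


async def slow_query(name, seconds=0.2):
    # Hypothetical backing-store call: all of its time is spent waiting.
    await asyncio.sleep(seconds)
    return name


async def handle_all(n):
    # Start n "requests" at once; the loop interleaves their waits.
    return await asyncio.gather(*(slow_query("q%d" % i) for i in range(n)))


if __name__ == "__main__":
    start = time.monotonic()
    results = asyncio.run(handle_all(50))
    elapsed = time.monotonic() - start
    print("%d queries in %.2fs" % (len(results), elapsed))
```

Served one after another by a single blocking worker, the same 50 waits would add up to about 10 seconds; on one event loop they overlap, so the total stays close to the length of a single wait.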

While nginx is a very useful tool, it's not the application layer. What if you
need your application layer to be fast and complex? What if your application
layer works on streams, not rendered HTML pages? What if you want to support
server-client notifications via WebSockets? There are so many different
situations where a "block this request processor until the request is served"
does not work.

Having said that, I'll repeat again that async is not what you should reach
for unless it makes perfect sense for your application. If you are building an
RSS fetcher, sure go for it. If you are building a product for which you see
peak usage of, say, 1m users, go for it. However, for most people, time to
market is much more important than peak performance after you can't scale the
hardware cheaply. That's where Django (Flask, Rails, etc.) make more sense.

~~~
klibertp
"What's the point" referred only to the use of deferToThread in Twisted. And I
was specifically talking about "typical Web _page_ ", you know, login, logout,
comments and such.

Other than that we actually agree 100%. I wrote that "Async is good if you
mostly do things which can be asynchronous", you in turn listed a couple of
examples of such things (WebSockets, not RDBMS). We're violently in agreement
here.

One additional point I made was about Erlang (or rather about pre-emptive
scheduling, but it boils down to Erlang anyway). Really, if you're building "a
product for which you see peak usage of, say, 1m users", go for Erlang. In
my experience it's the only environment which provides both concurrency and
parallelism for both IO and CPU-bound tasks and is easily (transparently!)
distributable to many nodes.

~~~
IgorPartola
That's a good and complete summary.

I have been curious about Erlang for some time. The syntax keeps sending me
running in the other direction, but perhaps I just haven't found a suitable
project to work on where I could have an excuse to really dig in. Any favorite
resources you can recommend on learning Erlang?

~~~
klibertp
Of course LYSE
([http://learnyousomeerlang.com/](http://learnyousomeerlang.com/)) and then
"OTP in Action" after you know the basics. But!

There are at least two languages that work on Erlang VM and offer alternative
syntax. There's Lisp Flavoured Erlang ([http://lfe.io/](http://lfe.io/)) and
Elixir ([http://elixir-lang.org/](http://elixir-lang.org/)). Elixir, in
addition to syntax, improves on some aspects of Erlang which make newcomers
uncomfortable, adds more powerful metaprogramming utilities and adds modern
features like browsing the docs from REPL. I found "Introducing Elixir"
([http://shop.oreilly.com/product/0636920030584.do](http://shop.oreilly.com/product/0636920030584.do))
rather good as a starting point and you can do quite a lot with it. But, in
the end, you have to at least know how to read Erlang docs, because Elixir
won't (and doesn't even try to) cover all of Erlang libraries with friendly
wrappers.

Personally I'm used to Erlang syntax, which is small and consistent and I like
the "explicit is the only way, no implicit things ever" language philosophy of
Erlang, but Elixir is a fine language to learn and use. There's a web
framework called Phoenix which despite being very young is already better
(subjectively) than pure Erlang frameworks.

I used Erlang twice professionally: for writing a kind of reverse HTTP proxy
with caching and for writing a backend for web app using WebSockets. It
performed very, very well and was quite pleasant to write. Both things are
running non-stop for over a year now and never crashed and were not restarted
even once, despite being changed significantly in the meantime (that's just an
anecdote, of course).

Erlang is a bit odd and has a much smaller community than Python. Lack of
libraries may be a problem. I'd never use Erlang for something I'd use Django
for. I would consider using it for things I'd otherwise use Flask for, but
probably wouldn't choose it in the end. But it's my "go to" tool for situations
where I'd use Twisted/Tornado or gevent now.

------
DasIch
Just a quick look over the code of the Django and Flask benchmarks reveals
that they are both run in debug mode, which introduces significant overhead.

In addition to that, database access isn't performed asynchronously, which
leaves the question of what is actually supposed to happen asynchronously.

~~~
mrj
The json serialization is also very different. Flask is using json.dumps while
the django code is using a custom JsonResponse class that includes an
isinstance check.

------
acdha
A quick look at the repo shows that persistent connections are enabled for
every application but Django, which should have CONN_MAX_AGE set to something
greater than the default 0 to avoid being a benchmark for how quickly Postgres
can open connections:

[https://github.com/Eyepea/API-Hour/blob/3e43b61cbaa3045ec3d0...](https://github.com/Eyepea/API-Hour/blob/3e43b61cbaa3045ec3d0fc579c4ca8d296bd172e/benchmarks/django/benchmarks/benchmarks/settings.py#L58-L76)
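
For reference, a sketch of what that setting looks like in a Django `settings.py`. This is not the benchmark's actual configuration: the database name, user, password, host, and the 60-second value are all placeholders chosen for illustration.

```python
# Fragment of a Django settings.py; every value below is a placeholder.
DATABASES = {
    "default": {
        # On Django older than 1.9 the backend is only available as
        # "django.db.backends.postgresql_psycopg2".
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "benchmark",      # hypothetical database name
        "USER": "benchmark",      # hypothetical user
        "PASSWORD": "change-me",  # hypothetical password
        "HOST": "127.0.0.1",
        "PORT": "5432",
        # Seconds to keep a connection open for reuse. The default, 0,
        # closes it at the end of each request; None means unlimited.
        "CONN_MAX_AGE": 60,
    }
}
```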

------
MayanAstronaut
Not a good comparison.

At the very least, gevent monkey-patch the Flask and Django apps for an async
comparison.

Can you make a fourth comparison with this line added "from gevent import
monkey; monkey.patch_all()" before the apps are initialized?

Thanks.

~~~
rspeer
To me, running gevent's monkey-patches in production is a sign of desperation
for speed over all else. And the desperation isn't necessary, because there
are better options now.

Code stability matters.

------
korzun
Flawed benchmarks. If the target produces errors during the test, the result
should be invalidated immediately.

You can't compare data from a failed result to another failed result or a
normal one.

In instances where Flask/Django started to output errors, the test case should
have been adjusted until they could complete the operations. Otherwise you
can't compare the results, since you have no baseline.

What this test tells me now is XY gives errors for this test while Z seems to
process it. That's not a benchmark of any sort.
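
The rule argued for above can be sketched as a filter applied before any ranking. The `Run` record, its field names, and the numbers are made up for illustration; they are not results from the benchmark under discussion.

```python
from collections import namedtuple

# Hypothetical summary of one load-test run against one target.
Run = namedtuple("Run", "name requests errors req_per_sec")


def comparable(runs):
    # Keep only runs that finished without errors; a run that failed
    # requests measured a different workload and gives no baseline.
    return [r for r in runs if r.errors == 0 and r.requests > 0]


def compare(runs):
    # Rank the valid runs by throughput and list the rejected ones.
    valid = comparable(runs)
    rejected = [r.name for r in runs if r not in valid]
    ranking = sorted(valid, key=lambda r: r.req_per_sec, reverse=True)
    return [r.name for r in ranking], rejected


# Made-up numbers, purely to show the behaviour.
runs = [
    Run("a", 1000, 0, 250.0),
    Run("b", 1000, 37, 900.0),  # highest throughput, but it returned errors
    Run("c", 1000, 0, 400.0),
]
print(compare(runs))  # (['c', 'a'], ['b'])
```

Under this rule a rejected run is rerun at a lower load until it completes cleanly, instead of being reported next to the others.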

------
bohinjc
AFAIK Flask does not aim for performance first, it aims for developer
productivity and code readability first. The whole thread-local thing is a
good example of those trade-offs.

I'm not saying it's the best choice (nor the worst), just that it's a choice
and it has consequences. At some point you need to make compromises, depending
on your goal.

------
raverbashing
And no uwsgi?

~~~
Beltiras
THIS! I took a hard look at gunicorn a couple of years ago because I found the
learning curve of uWSGI steep. I've found the time used fruitful.

