
Python's Web Framework Benchmarks - rbanffy
http://klen.github.io/py-frameworks-bench/
======
bradleyland
A couple of suggestions for the author. These aren't meant to discourage; the
opposite, actually.

1) Running benchmarks on a hosted VM introduces confounding factors [a],
namely contention. What happens if someone else is running benchmarks at the
same time? Benchmarks are best run on isolated hardware, which is why it's so
hard to find good benchmarks. Who has server-class hardware lying around to
just run benchmarks on?

2) Expressing results in percentile format is great! Using line graphs to
represent the data, not so much. Line graphs are used to "fill in the blanks"
between points of data. When you draw a line between two points, you're
saying, "for these x-axis values, the y-value is approximately _this_". This
implies a progression, which isn't really relevant for percentile data. A
clustered column or bar chart would be a better fit for this data, IMO.

3) A histogram of response time is often informative. Just be careful with
your bin width. There are some good suggestions on Wikipedia [b].

a:
[https://en.wikipedia.org/wiki/Confounding](https://en.wikipedia.org/wiki/Confounding)

b:
[https://en.wikipedia.org/?title=Histogram#Number_of_bins_and_width](https://en.wikipedia.org/?title=Histogram#Number_of_bins_and_width)
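For the bin-count rules mentioned in [b], two common ones can be computed with
the stdlib alone. The sample data here is made up for illustration; in
practice you'd feed in the measured response times:

```python
import math
import statistics

# Hypothetical response times (ms); real data would come from the benchmark.
samples = [12, 14, 15, 15, 16, 18, 21, 22, 25, 31, 40, 55, 90, 140]
n = len(samples)

# Sturges' rule: simple, reasonable for roughly normal data.
sturges_bins = math.ceil(math.log2(n)) + 1

# Freedman-Diaconis: bin width from the interquartile range, more robust
# for skewed distributions like response times.
q1, _, q3 = statistics.quantiles(samples, n=4)
fd_width = 2 * (q3 - q1) / n ** (1 / 3)
fd_bins = math.ceil((max(samples) - min(samples)) / fd_width)

print("Sturges:", sturges_bins, "bins; Freedman-Diaconis:", fd_bins, "bins")
```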

~~~
RA_Fisher
Addendum: one way to cope with the confounding factors would be to replicate
the test several times (ideally many times, like 100+, while varying instance
types).
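A sketch of that idea with the stdlib `statistics` module, using made-up
latency numbers: the between-run spread is what tells you how much the shared
hardware is interfering.

```python
import statistics

# Hypothetical per-request latencies (ms) from replicated benchmark runs,
# each run on a different instance.
runs = [
    [102, 98, 110, 95, 101],    # instance 1
    [140, 133, 151, 129, 138],  # instance 2 (noisy neighbour?)
    [99, 104, 97, 103, 100],    # instance 3
]

means = [statistics.mean(r) for r in runs]
print("per-run means :", means)
print("grand mean    :", round(statistics.mean(means), 1))
# A large between-run stdev relative to the within-run spread suggests the
# host, not the framework, is driving the differences.
print("between-run sd:", round(statistics.stdev(means), 1))
```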

------
Spidler
This is an apples-to-watermelons comparison.

The remote test uses a new `requests` instance (tear-down and set-up of a new
HTTP request, not a session) for each call, except in Tornado, where it uses
`tornado.httpclient.AsyncHTTPClient`, and in muffin, where it uses
`aiohttp.request`.

The klein example uses `treq` while the aiohttp example uses
`aiohttp.request`.

Requests might have a pretty API, but it is one of the worst-performing, if
not the worst-performing, HTTP libraries you can use.
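The per-call connection overhead being described can be demonstrated with the
stdlib alone, against a throwaway local server, so the numbers are
illustrative rather than a real benchmark. Opening a fresh connection per
request corresponds to calling `requests.get()` in a loop; the reused
connection corresponds to a `requests.Session`:

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
N = 20

# New connection per request -- like requests.get() in a loop.
start = time.perf_counter()
for _ in range(N):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()
fresh = time.perf_counter() - start

# One reused keep-alive connection -- like a requests.Session.
start = time.perf_counter()
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(N):
    conn.request("GET", "/")
    conn.getresponse().read()
reused = time.perf_counter() - start
conn.close()

print(f"fresh: {fresh:.4f}s  reused: {reused:.4f}s")
```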

~~~
sunkencity
No klein in the actual test data. Pity; it's pretty fast, I think. The test
framework code for klein is not finished, and I guess sqlalchemy won't perform
that brilliantly atop twisted.

------
gpjt
Maybe I'm missing something, but isn't a t2.medium a poor choice for
benchmarking? t2 instances are burstable, so CPU performance can vary over
time. Unless
you're _extremely_ careful with timing the tests, you'll get inconsistent
results.

~~~
sciurus
Yes. Using a VM on multitenant hardware is questionable enough, but using one
that is _specifically designed_ to have variable performance is atrocious.
Thankfully the author has provided the necessary info for someone else to
repeat the tests.

------
barosl
There's also wheezy.web[1], which was very fast compared to ordinary Python
web frameworks. When I tested it locally a year ago, the results were the
following:

    
    
      == Time spent to handle one request ==
    
      CPU: Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
    
      WSGI      : 1 microsecond
      Wheezy    : 10 microseconds
      Bottle    : 38 microseconds
      Werkzeug  : 47 microseconds
      Morepath  : 115 microseconds
      Flask     : 193 microseconds
      Flask with Jinja: 267 microseconds
      Cherrypy  : 561 microseconds
      Django    : 816 microseconds
    

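The harness behind those numbers isn't shown; a minimal way to time the
bare-WSGI case (a plain callable, no framework) might look like this. The
environ dict is a stripped-down stand-in for what a real server would pass:

```python
import io
import timeit

def app(environ, start_response):
    # Smallest possible WSGI application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

def make_environ():
    # Minimal fake request environment for illustration.
    return {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": "/",
        "wsgi.input": io.BytesIO(),
        "wsgi.errors": io.StringIO(),
    }

def one_request():
    env = make_environ()
    body = app(env, lambda status, headers: None)
    return b"".join(body)

n = 100_000
total = timeit.timeit(one_request, number=n)
print(f"~{total / n * 1e6:.2f} microseconds per request")
```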
However, at the same time, this also made me more skeptical about the Python
performance story. IIRC, wheezy.web was fast because it minimized the number
of function calls (only ~20 function calls per request were needed). This is
reasonable, considering that function calls in Python are very costly, and
there is no function inlining in CPython. (I'm not sure about PyPy; probably
it has this kind of optimization?)

But fewer functions mean less modularity. I have to give up some modularity
if I want to squeeze the last bit of performance from my Python code. I can't
have both. This doesn't sound great, because there has been a proven method
to deal with this kind of problem, function inlining, for decades. And I
cannot use it for Python.
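The call-overhead claim is easy to check with the stdlib `timeit` module; the
gap between an inline expression and the same work behind a function call is
what an inliner would eliminate:

```python
import timeit

def add(a, b):
    return a + b

setup = "a, b = 3, 4"
# Same arithmetic, inline vs. behind a function call.
inlined = timeit.timeit("a + b", setup=setup, number=200_000)
called = timeit.timeit("add(a, b)", setup=setup, globals=globals(),
                       number=200_000)

print(f"a + b     : {inlined:.4f}s")
print(f"add(a, b) : {called:.4f}s ({called / inlined:.1f}x slower)")
```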

[1]
[https://pypi.python.org/pypi/wheezy.web](https://pypi.python.org/pypi/wheezy.web)

~~~
ptx
> there is no function inlining in CPython.

Couldn't this be done with a decorator that on the first call to the decorated
function looks up the variable bindings, extracts the bytecode of called
functions and then merges it into its own (and renames variables etc.)?

I'll leave the implementation as an exercise for the reader. :)
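A toy illustration of the first step only (getting at the callee's bytecode,
which the hypothetical decorator would then have to merge and rename), using
the stdlib `dis` module:

```python
import dis

def add(a, b):
    return a + b

def caller(x):
    return add(x, 1)

# An inliner would read these raw instructions from the callee...
print("add's bytecode:")
dis.dis(add)

# ...and splice them into the caller's code object in place of the call.
print("caller's bytecode:")
dis.dis(caller)
```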

~~~
barosl
This is certainly an interesting idea. A quick Google search gave me this toy
example: [1][2]

Of course making it work for all cases would be very hard due to the excessive
dynamism of Python, but for simple arithmetic operations it could be made to
work fairly well.

[1]
[http://tomforb.es/automatically-inline-python-function-calls](http://tomforb.es/automatically-inline-python-function-calls)

[2] [https://github.com/orf/inliner](https://github.com/orf/inliner)

------
andyrj
Suspicious that aiohttp gets async peewee, but falcon gets synchronous
sqlalchemy for its ORM test? Seems like a fairly biased way to benchmark.
There are too many variables to actually compare the frameworks, as things
besides the framework are drastically different.

~~~
arthursilva
aiohttp is based on the stdlib event loop, so it's likely to get much more
support for async stuff. You can always plug gevent into Falcon, but you'll
need a bit more luck finding compatible libraries.

------
StavrosK
The way I consult benchmarks is "use whatever cuts development time to a
minimum, and then swap out the slow parts". Premature optimization, etc.

~~~
geoelectric
To be fair, when you're talking about a web framework that's not quite as
practicable. Huge chunks of your code will be built to the opinions of that
particular framework.

~~~
StavrosK
Definitely, but web apps usually consist of specific views, so there are
specific things you can optimize. Like any codebase, there will be hotspots
you can make faster.
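That workflow can be sketched with the stdlib profiler: profile the request
handlers, sort by cumulative time, and only then optimize the views that
dominate. The view names here are made up for illustration:

```python
import cProfile
import io
import pstats

def slow_view():
    # Hypothetical hotspot: does real work per request.
    return sum(i * i for i in range(100_000))

def fast_view():
    return "hello"

def handle_requests():
    for _ in range(5):
        slow_view()
        fast_view()

profiler = cProfile.Profile()
profiler.enable()
handle_requests()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)  # slow_view should dominate cumulative time
```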

------
jaibot
WHY WOULD YOU USE PIE CHARTS FOR THIS

~~~
collyw
Because line charts get difficult to read when the numbers are similar.

------
halayli
When it comes to web frameworks, the things you should look at before
performance are: community support, how active the project is, what the
framework provides for free out of the box, the number of bugs, etc.

You won't bottleneck on the framework most of the time.

------
infecto
[https://www.techempower.com/benchmarks/](https://www.techempower.com/benchmarks/)

Maybe this is more useful?

~~~
bhauer
The current test types in our benchmarks (I work at TechEmpower) aim to
establish a high-water mark for web frameworks and platforms. If you review
the implementation requirements, included is a brief rationale for each test
type:

[https://www.techempower.com/benchmarks/#section=code](https://www.techempower.com/benchmarks/#section=code)

For example, the "plaintext" test focuses on raw request routing and HTTP
parsing throughput and is intended to allow ultra-high performance servers to
shine. Meanwhile, the "fortunes" test is intended to exercise a more diverse
spectrum of framework functionality including database connectivity, the ORM,
dynamic-sized collections, in-memory sorting, server-side templates, and XSS
countermeasures.

Future tests in our project will likely exercise still more framework and
platform capabilities, and with more computationally-demanding requirements. I
believe this will go further to demonstrate the appeal of high-performance
platforms and frameworks. Presently, it is easy to point out that few
applications require the capacity to serve 10,000 or 100,000 requests per
second. I routinely suggest that a real-world application is likely to perform
at 1% to 10% of the high-water mark we show. Where it gets interesting, then,
is determining if 10% of 50,000 rps or 10% of 500 rps is workable for you.

Despite conventional wisdom that suggests that a web service is constrained by
external systems such as databases and peer services—and not dismissing that
as untrue, but rather exaggerated—we believe that real-world web applications
are often constrained by the platform or framework. In fact, it is my opinion
that selecting a high-performance framework and platform (that suits your
development efficiency requirements!) is precisely how you avoid premature
optimization. By making a reasonable decision early on, you defer scale pain
that is all too common with low-performance platforms until later in your
project's life-cycle. Do you want to bring out the big architecture guns and
increased system complexity at 10,000 users or at 100,000 or 1,000,000 users?
Indeed, nothing is a drop-in replacement for something you know and love.
There will be some learning curve if you're moving from one framework to
another, and even greater learning is needed if switching platforms or
language. But these inflection points in programming are needed sometimes, and
you may find it's good to position them at the start of a project rather than
midstream.

------
jhh
I mean, how much latency is even acceptable? If Django is able to deliver a
result in 200ms, can I even accept it being delivered with a latency of 280,
400, or 700ms? The long tail of these graphs seems almost useless to me for
real-life scaling, as the latencies involved would simply force you to scale
out more (if possible).

------
baldfat
> Nothing here, just some measures for you.

That is the right conclusion, and I applaud the author. I just wonder why
people would take anything at all from this.

------
peterbe
Great little article.

One odd thing is that the article used 2 Python workers when using gunicorn
but a single worker when using Tornado. Seems a bit unfair, to say the least :)
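For reference, gunicorn's worker count can be pinned in a `gunicorn.conf.py`
(which is plain Python); a minimal sketch, with an illustrative bind address,
of keeping every gunicorn-served framework at the same parallelism:

```python
# gunicorn.conf.py -- sketch: pin the worker count so all gunicorn-served
# frameworks run with the same parallelism (2 workers, as in the article).
workers = 2
bind = "127.0.0.1:8000"  # illustrative bind address
```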

------
hoodoof
I just don't care too much about benchmarks. I'm more interested in what it's
like to program.

------
crimsonalucard
If anyone has a very opinionated conclusion about this I'd love to hear it.

------
HackyGeeky
sad to see no web2py here :|

~~~
tomcam
Would also love to see web2py.

------
vassy
Highcharts makes this page impossible to scroll on mobile.

------
ddorian43
It's missing wheezy.web, which should be the fastest.

------
sandGorgon
No testing with gevent? I mean, flask+gevent does boost performance quite a
bit.

