
Web Framework Benchmarks - ksec
https://www.techempower.com/benchmarks/#section=data-r16
======
bhauer
I wrote the blog entry for this round [1], and I would agree with ksec
(elsewhere in these comments) in recommending that readers start there since
it includes the highlights of this round. I apologize ahead of time for my
humor.

It's still a bit early out here in California, but if anyone has any
questions, I will try to answer as I can!

[1] [https://www.techempower.com/blog/2018/06/06/framework-
benchm...](https://www.techempower.com/blog/2018/06/06/framework-benchmarks-
round-16/)

~~~
tomaha
Do you have any plans to add memory usage to the benchmarks? It adds an extra
dimension and sometimes even shows if there's a problem with a specific
implementation (e.g. in the benchmark game it's often a hint for a program
that can be optimized if the ratio compared to other languages is off).

~~~
bhauer
Yes! In fact, I just responded to a similar question over at the Rust
Subreddit thread about the same topic.

We do in fact capture dstat data while executing the tests but as of today do
not render these in any way. You can find raw CSV output from dstat at our
logs server. For example at [1] are stats for Grizzly while measuring the
"json" test type.

I will create an issue at the project's GitHub repo to begin a conversation
about which stats to render. I'd like to select a meaningful bite-size set of
stats that we can render into a table view. Basically another tab in the
results view similar to the latency tab.

[1] [http://tfb-
logs.techempower.com/round-16/final/grizzly/json/...](http://tfb-
logs.techempower.com/round-16/final/grizzly/json/stats.txt)
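For anyone who wants to poke at those raw dstat CSVs before a rendered view exists, here is a rough sketch of pulling a peak-memory figure out of one. The column name and the inline sample are hypothetical; the real files have more columns and a multi-line header.

```python
import csv
import io

# Hypothetical dstat-style CSV excerpt; the real logs at the TechEmpower
# logs server have more columns and a multi-line header.
SAMPLE = """\
"used","free","cach"
"512000000","1024000000","256000000"
"768000000","768000000","256000000"
"640000000","896000000","256000000"
"""

def peak_memory_used(csv_text: str) -> int:
    """Return the peak value of the 'used' memory column, in bytes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(int(row["used"]) for row in reader)

print(peak_memory_used(SAMPLE))  # -> 768000000
```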

~~~
drunner
If I could make a recommendation that would humor me personally, you could
publicize the stats required to run the instance on a B1S or Fv2 Azure
instance.

I think it would be really interesting to see which frameworks are so costly
upfront that they would impede a developer from prototyping with them on the
cheap.

~~~
bhauer
I like that idea. I'll see what we can do!

------
bilbo0s
I had not checked these for a good long while. (Probably around round 10 or
11.) Looking back in on it now, holy MAN!

C (C++) and Java. Wow.

Even though I'm old, I'd sort of forgotten how speedy they can be. It's
strange, because I actually work with Java and C in making streaming game
engines. And you kind of get used to thinking in terms of millions on a set of
hardware. So you think Java and C are slow. Then you realize if you were using
Python or Node or anything other than C or Java really, you'd likely still be
working out how to support tens of thousands on that same set of hardware. (If
that.)

We can get so used to what we have. Maybe that's just a human thing.

~~~
christophilus
> Anything other than C or Java really

Rust, C#, and Go also seem to fit the bill, according to these benchmarks. C#
particularly has rocketed up since the last time I looked in depth at these.
Pretty impressive work from the .NET Core folks, I'd say.

~~~
Spartan-S63
For Rust, I'm interested to see what the future holds once the community
coalesces around an async IO interface. Additionally, when procedural macros
stabilize (for real), the possibilities for ergonomic frameworks in Rust
become endless.

Rocket has been really interesting to work with, but it only works on nightly
since it relies on experimental codegen/procedural macro features. Once
procedural macros stabilize, all the codegen can be ported to that. It
doesn't support async IO yet, but that's because it's waiting for the
community to coalesce before diving into it.

I think there's a future where Rust web frameworks are as expressive as Rails
or Laravel but bring type safety and sound, zero-cost abstractions to the
table.

~~~
steveklabnik
Speaking of which, a few hours ago Futures were added to the standard library,
so the next nightly will have them in. We're quite close to that
standardization!

~~~
bluejekyll
Just adding for clarity: this isn't async io, but the foundation trait for
building promise/future combinators in Rust. Tokio, an async io library, is
built around this trait.

~~~
steveklabnik
Yes. It's the Future _and_ executor stuff, but doesn't provide an executor.
You'd do that via Tokio or whatever else.

------
onion2k
These are fun benchmarks, but every single framework in the top 250 or so is
_more_ than fast enough for most apps. Even the slowest in the list manages
366 requests per second, which is probably enough to prove an idea before
optimizing for speed.

We're approaching the point where speed is essentially a solved problem unless
you're at Google-scale.

~~~
lllr_finger
Is it? Concurrent connection handling at scale is a major problem, and plenty
of Fortune 1000 companies would jump at the chance to have even 5% fewer
instances deployed.

366 rps in a test scenario doesn't scratch the surface of what I need, and I'm
not doing anything I'd consider crazy scale. When you need to support hundreds
of thousands of rps, small improvements can be very noticeable.

~~~
mkirklions
I'll tell you what.

When I'm a fortune 10,000 company, we can afford to rewrite the web app.

~~~
ddorian43
But do they rewrite? Twitter did. Facebook didn't.

~~~
tomnipotent
> Facebook didn't.

You have no idea what you're talking about. Not only did they produce TWO
different implementations of PHP (HHVM and Hack), but the vast majority of
their infrastructure is NOT in PHP (which is mostly reserved for the
front-end/web tier).

~~~
ddorian43
I know. Still cheaper than a rewrite.

------
blunte
I'm not familiar with this site and benchmarks, but from reading comments it
seems to be respected.

Given that, I'm confused why this is called "web framework" benchmarks. It
looks to me like it is comparing some actual frameworks (which rank very
poorly) against minimalist, task-focused http servers (which rank highly).

~~~
bhauer
We use the word "framework" as a term of convenience covering the full
spectrum from platforms to micro-frameworks to full-stack frameworks. We are
also liberal with accepting contributions from the community, which means we
do indeed include several frameworks that are well outside the mainstream.

That said, in tests such as Fortunes, you will see many full-stack frameworks
demonstrating their capability to deliver myriad web application fundamentals
(such as request routing, database connection pooling, ORM, XSS
countermeasures, character encoding, data structures, and server-side
templates) at very high performance levels.

This project establishes a _high-water mark_ of performance. Very few web
applications process anything remotely close to the requests per second we're
measuring. But using Fortunes as a proxy and applying a coefficient (e.g.,
0.005) to adjust for real-world application sizing can give you a very rough
but nevertheless potentially useful approximation of real-world expectations.
For example, one can guess their application is 200 times more complex than
our Fortunes test. Rough back-of-the-envelope math will then estimate a
framework from the ultra-high performance tier (~200,000 fortunes/sec) might
yield a generous ~1,000 real-world app requests per second on a modern Xeon
server. Meanwhile, a framework in the middle tier (~10,000 fortunes/sec) would
yield a more constraining ~50 real-world app requests per second. Again, this
math is hand-waving and you can interpret the results in whatever way you
prefer; you can dismiss the validity of approximating things so coarsely. But
my experience is that real-world applications based on frameworks I've
used—which are sprinkled among all tiers of our results—do align with this
hand-waving approximation.
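The back-of-the-envelope math above can be written out directly. The tier numbers and the 0.005 coefficient come from the comment itself; the rest is plain arithmetic:

```python
def real_world_estimate(fortunes_rps: float, coefficient: float = 0.005) -> float:
    """Scale a Fortunes result down by a rough complexity coefficient
    (i.e., assume a real app does ~1/coefficient times the work)."""
    return fortunes_rps * coefficient

# The two tiers from the comment above:
print(real_world_estimate(200_000))  # ultra-high tier -> ~1,000 req/s
print(real_world_estimate(10_000))   # middle tier     -> ~50 req/s
```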

 _Edit: I invite you to read the last section of the blog entry about this
round [2] where I argue the same point in another way and share one of the
tweets from an application developer whose real world application benefited
from the performance improvements made to his favorite framework._

[1]
[https://www.techempower.com/benchmarks/#section=motivation](https://www.techempower.com/benchmarks/#section=motivation)

[2] [https://www.techempower.com/blog/2018/06/06/framework-
benchm...](https://www.techempower.com/blog/2018/06/06/framework-benchmarks-
round-16/)

~~~
blunte
Thanks for the explanation. I see your reasoning.

If the title had an asterisk, "Web Framework Benchmarks*", with the asterisk
taking you to a page with something like your explanation above, that would
be lovely.

------
narrator
One thing I've noticed is that all the fastest implementations now run
PostgreSQL. For the longest time MySQL was thought to be faster, so I guess
PostgreSQL really caught up recently. I'm seeing it be the default database in
a lot of new open source projects.

~~~
aninteger
This is probably due to the fact that PostgreSQL (libpq) has an "async"
interface built in and MySQL/MariaDB don't. These benchmarks are running very
basic queries that perform well on either MySQL or PostgreSQL.

------
ksec
The blog post [1] explains things in detail.

Basically they discovered Docker! Given the amount of hype it has had for the
past 2-3 years, I am surprised; now it is all Kubernetes. (Or was that a joke
I am not getting?)

They had new hardware, and I am surprised it was sponsored by Microsoft. They
are also using Azure for the cloud tests. And the hardware is recent and much
better represents common usage. Before, it was a quad-socket CPU platform that
I doubt many are using.

I know a lot of these do not represent real-world usage. But even if we pick
Full Stack Framework, ORM usage, and Realistic implementation, ignoring the
erroneous results at the bottom, we still have a 50x gap in Fortunes, which is
the only result I look at.

I didn't check why Ruby + Rack was not working. Hanami wasn't working either.
So the best results for Ruby were Roda & Sequel, both projects by Jeremy
Evans.

[1][https://www.techempower.com/blog/2018/06/06/framework-
benchm...](https://www.techempower.com/blog/2018/06/06/framework-benchmarks-
round-16/)

~~~
bhauer
> _( Or was that a joke I am not getting )_

Yes, that was a bit of a joke in the blog entry. We suspected for years Docker
would be a good fit for this project. Only within the past months did we find
the time necessary to convert the hundreds of test permutations to Docker. It
wasn't a quick thing, but we did get into a rhythm. The "joke," for whatever
it's worth, is simply that we didn't sufficiently appreciate how useful it
would be for this project and how insignificant the overhead would be. Perhaps
we would have prioritized it higher in the past had we known. But it's done
now!

In brief, the Docker conversion effort has increased the stability and
reliability of the results.

There are a huge number of ways to consume the results and as many opinions.
We welcome the diverse points of view and hope you find the data useful!

------
wheaties
I love these but I've also seen in some of the implementations that I care
about a bit of benchmark gaming. That saddens me and I wonder just how much of
it is happening in languages and frameworks that I don't follow.

~~~
brightball
It's interesting to look at the source code. There are definitely some big
differences in how each implementation handles the underlying aspects of the
tests.

Just looking at the database-related ones to see some of the differences is
interesting.

It also doesn't look like the Elixir code has really been updated in about 2
years (aside from version bumps). It's still using a JSON encoder (Poison)
that's 4x slower than the primary one (Jason). For multiple queries, Ecto
hands back the connection after every query for concurrency's sake. It looks
like the Plug logger is still set up on the endpoint.

I'd be really interested to see the Discord folks look at that as a pet
project. :-)

~~~
nobleach
That's the beauty of these "pinewood derbies" though. Sometimes they sit
dormant until some kind soul says, "HEY!, that's not representative of my
favorite framework!" and submits a PR to showcase its true capabilities.

------
argonium
I've been following these benchmarks for some time, and am always shocked that
Spring does so poorly (it's 7% here). I haven't had any performance issues
with Spring in production, so these benchmarks are puzzling. Are the other
frameworks really that much faster in practice?

~~~
christophilus
What kind of traffic do you see in production? Rails gets hammered in these
benchmarks, but does just fine for most companies' requirements. It's pretty
rare that you'd actually need to eke out the raw numbers that the top
contenders get. I'm of the opinion that-- within reason-- developer
ergonomics, and ability to quickly solve business problems are more important
than raw performance, so long as performance is good enough by some agreed
upon metric.

~~~
argonium
Good point. Thanks for the feedback. Our sites don't get a huge amount of
traffic, so Spring has been sufficient for our needs; it's possible Spring
doesn't have as good a concurrency story (or memory usage) as the
higher-ranked frameworks.

------
marsrover
I have been working with .NET Core since RC1 and I am so excited about the
performance it's getting.

Ranked 7? That's seriously impressive. Just go look at some of the past
benchmarks.

~~~
hungerstrike
I think it would have been even faster if they'd tested with SQL Server,
because you can do things with SQL Server that you simply cannot do in PG,
such as stored procedures that return multiple resultsets, which saves a ton
of round-trips.

~~~
zapov
You can do even more advanced stuff in Postgres, but they've changed the
rules to disallow it since it was much faster than the current best ones.
Something about being fair to other frameworks...

~~~
hungerstrike
Can you request multiple heterogeneous result-sets from PG now, with something
that looks like a stored procedure?

I'd love to see what that looks like. In SQL Server it's as easy as putting 2
SELECT statements in the same procedure.

~~~
anarazel
> Can you request multiple heterogeneous result-sets from PG now, with
> something that looks like a stored procedure?

Not prettily. You can return cursors, you can use json etc, ...

But you can pipeline SQL statements. I.e. just send N SQL statements
(including bind parameters etc, the protocol is the same) without waiting for
results, and then process the results as they come in. If you want to avoid
latency penalties that makes much more sense in my opinion than having to wrap
multiple statements in a function.
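The latency argument for pipelining can be shown with a toy model that counts round trips instead of talking to a real database. These functions are illustrative only, not libpq's API: in sequential mode each query waits for its result, while in pipelined mode all statements are sent before any result is read.

```python
# Toy model of query pipelining: count network round trips rather than
# talk to a real database.

def sequential_round_trips(queries: list[str]) -> int:
    """One full round trip per query: send, then wait for the result."""
    trips = 0
    for _ in queries:
        trips += 1
    return trips

def pipelined_round_trips(queries: list[str]) -> int:
    """All statements go out back-to-back; results are read as they
    arrive, so the client pays roughly one round-trip latency for the
    whole batch."""
    return 1 if queries else 0

queries = ["SELECT * FROM world WHERE id = $1"] * 20
print(sequential_round_trips(queries))  # -> 20
print(pipelined_round_trips(queries))   # -> 1
```

On a link with, say, 0.5 ms round-trip latency, that is the difference between 10 ms and 0.5 ms of pure waiting for a 20-query request.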

------
reacharavindh
It is interesting to watch the latency numbers from the table (in a separate
tab). I suppose latency is more important to most users than the peak
throughput, considering that production applications almost never hit peak
throughput and are often supplied with extra hardware before utilisation gets
there.

------
db-dzine
Can someone explain to me why php5 is faster than php7?

~~~
jaequery
There are a lot of weird things that don't make sense, like a minimalistic
framework such as Sinatra or Padrino being slower than Rails.

I feel these benchmarks are misleading, to say the least.

------
beal
There is a lot of effort being spent measuring what is very rarely a
performance bottleneck in web applications.

------
spullara
Check out vertx on the multiple query benchmark - it is almost twice as fast
as the next entry. Just got confirmation from TechEmpower that they are using
a new pipelining feature in postgres to wipe the floor with the rest of the
field.

[https://twitter.com/sampullara/status/1004753777209962497](https://twitter.com/sampullara/status/1004753777209962497)

------
kizer
Is 'Fortunes' a test involving the quote generating program? Also, is the
difference between 'Single Query' and 'Multiple Queries' concurrency? I always
look at these but I have trouble understanding the test cases.

~~~
bhauer
The best resource to answer your questions is the Requirements for the various
test types [1]. But in brief:

* Yes, Fortunes is named after the Unix tool of the same name. In our case, it's a test that executes a query of all rows in a table, adds an additional item, sorts the values in the application code, escapes the values as an XSS countermeasure, and then renders them using a server-side templating library.

* Yes, the single-query test always executes a single query per HTTP request and is measured at various concurrency levels. The multi-query test is measured with consistent concurrency and varies the number of queries executed per HTTP request.

[1]
[https://www.techempower.com/benchmarks/#section=code](https://www.techempower.com/benchmarks/#section=code)
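The Fortunes steps listed above can be sketched roughly like this. The table contents and the one-line template are simplified stand-ins, not the actual test implementation:

```python
import html

# Simplified stand-in for the database table; one hostile row exercises
# the XSS countermeasure.
FORTUNES = [
    (1, "A computer program does what you tell it to do."),
    (2, "<script>alert('xss')</script>"),
    (3, "Any program that runs right is obsolete."),
]

def fortunes_page(rows):
    # 1. "Query" all rows, 2. add an extra item at request time,
    rows = list(rows) + [(0, "Additional fortune added at request time.")]
    # 3. sort in application code (by message text),
    rows.sort(key=lambda r: r[1])
    # 4. escape as an XSS countermeasure, 5. render a trivial template.
    body = "\n".join(
        f"<tr><td>{rid}</td><td>{html.escape(msg)}</td></tr>"
        for rid, msg in rows
    )
    return f"<table>\n{body}\n</table>"

print(fortunes_page(FORTUNES))
```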

~~~
kizer
Thank you

------
dorfsmay
It looks like the bottleneck here is the database driver (or possibly ORM when
used). The _plaintext_ test shows a very different picture:

[https://www.techempower.com/benchmarks/#section=#section=dat...](https://www.techempower.com/benchmarks/#section=#section=data-r16&hw=ph&test=plaintext)

PS: As a side note, I do not like Golang (my focus has been on Python and now
Rust), but I am very impressed by both the plaintext and the DB tests!

~~~
bpicolo
The bottleneck on plaintext appears to potentially be physical network
bandwidth.

[https://github.com/TechEmpower/FrameworkBenchmarks/issues/35...](https://github.com/TechEmpower/FrameworkBenchmarks/issues/3538)

~~~
dorfsmay
Agreed, but it shows which frameworks are able to reduce their overhead,
since they aren't all at 100% of the fastest.

------
dorfsmay
It is really surprising that the same frameworks are much slower running on
PyPy than CPython. I'm assuming all the results are run from cold...
Considering that webservers run for long periods of time, it'd be interesting
to measure after several runs and see how the JITted platforms (PyPy, Java,
etc.) place in comparison.

~~~
tomchristie
Nope, they run a warm up stage.

~~~
dorfsmay
Thanks, I browsed quickly through the explanations and didn't see that.

That explains why Java frameworks are doing so well but raises questions
about PyPy!

EDIT: Looking closer, Tornado is faster running on PyPy.

------
meritt
This is the first time the PHP extension Swoole (an async library) was in the
test. It took #10 place on the JSON serialization test. The remaining tests
rank terribly because the implementation incorrectly uses synchronous DB
calls, but once that's fixed it should be very promising.

------
mozumder
How different are the various frameworks here?

Are they all running with SSL? Do they keep track of session cookies? Do they
have Content Security Policy headers? etc.. There's probably a million
different feature configuration differences between each framework.

~~~
bhauer
> _How different are the various frameworks here?_

Quite different. We have included test implementations for frameworks that
span 26 computer languages. There are an infinite variety of opinions about
how to do things in computer programming in general, and our project is no
exception.

And on the other hand, fairly similar. Tests should stick to the requirements
[1], which are designed to be permissive but sufficiently clear on the
expected work load of each test type. The principal goal is that test
implementations should be realistic, and we will mark those the community
believes are not realistic as "stripped" implementations.

> _Are they all running with SSL?_

No, not yet. But planned future test types would include SSL/TLS.

> _Do they keep track of session cookies?_

Generally no. These are intended to exercise anonymous requests. But a future
test type could include session management.

> _Do they have Content Security Policy headers?_

No, that is not a requirement of our tests. We have specified which headers we
expect. Others are optional.

> _There's probably a million different feature configuration differences
> between each framework._

Yes. And it can be challenging at times to get all of these opinions to fit
into the same box. For example, we've to-date kept SQLite implementations out
of the project since those would not incur network costs. And other times the
box has to be reshaped a bit to make room for new consensus opinions about
what is suitable for "production." We recently made a decision to allow for
innovative features in the Postgres protocol that are—in a manner of
speaking—analogous to automatic pipelining. That conversation is still ongoing
on our discussion forum.

[1]
[https://www.techempower.com/benchmarks/#section=code](https://www.techempower.com/benchmarks/#section=code)

~~~
mozumder
So, anyone can submit a framework here as a docker container? I'd love to send
in my custom framework for benchmarking.

~~~
bhauer
Yes! We accept PRs from the community at the GitHub repository [1]. We do ask
that you submit what you believe is _production quality_ code that would be
suitable to run a real web application. We are liberal with how that is
interpreted, but we reserve the right to reject code that is too experimental.

[1]
[https://github.com/TechEmpower/FrameworkBenchmarks](https://github.com/TechEmpower/FrameworkBenchmarks)

------
stunt
Trust me. You won't hit 424,712 requests per second per node (that is
1,528,963,200 per hour).

You need many nodes and a well-distributed, scalable infrastructure to handle
1B req/hr.

There is no point in benchmarking your programming language/framework when
doing web development.

When it comes to the web, you will hit N different capacity limits before
that even with the slowest framework, starting from your database, network
connections, bandwidth, third-party API rate limits, IO performance, etc.

Consider only a few options based on project requirements, ecosystem,
productivity, and available human resources. Then compare the top two on
performance.

You can't compare H2O and Rails on the same list. They are two different
animals.

Moreover, I'm sure not all of these frameworks are well optimized for a raw
performance benchmark, and these results are also heavily affected by the
filters you choose. Don't spend time on pointless benchmarks.

~~~
3pt14159
Yep. It took me about six months to write a replacement for ActiveRecord with
an identical API (for what was needed) that could scale to an arbitrarily
large number of writes.

Shit, pre-A16Z raise 500px ran on Rails and they were a social network around
_photography_. You think Twitter is bad? Try timelines where you have half a
million photographers liking hundreds of photos an hour and every like gets
pushed into every feed of the people following them. Guys syncing their entire
photo library. It was 5 application servers, 4 MongoDB servers for the
timeline with some crazy data structures, and one or two MySQL DBs.

I think most web projects these days should be in either Phoenix, Rails, or
similar. If you need something really fast here or there just fork off the
request in Nginx or compile something in Rust or C and extend it into Ruby (or
whatever). Or have a compiled worker that communicates through the DB or
Redis. There is this long tail of UI you need to make for every web project
and nobody uses it 99.999% of the time. It should be in whatever secure
language / framework that brings it to market fastest.

~~~
ksec
>Yep. It took me about six months to write a replacement for ActiveRecord
with an identical API (for what was needed) that could scale to an
arbitrarily large number of writes

Sorry to ask, would that be open sourced?

>It was 5 application servers, 4 MongoDB servers for the timeline with some
crazy data structures, one or two MySQL DBs.

That is a little hard to wrap my head around, because pre-A16Z 500px was a
long time ago. Hardware was much slower and Ruby was way slower as well. We
have 2-3x faster hardware and a faster Ruby today, and the same hardware
surely wouldn't do much at Discourse or GitLab scale.

~~~
3pt14159
> Sorry to ask, would that be open sourced?

No. I would have if I owned the code personally, but it wasn't in the
company's interest to open source it.

> That is a little hard to wrap my head around...

I was there from 2012 to 2013. The hardware wasn't bad; I don't remember the
exact specs, but it was colocated at a place in downtown TO on some decent
but not crazy gear. It wasn't perfect, but it worked pretty well.

For the actual image processing we just shelled out to imagemagick and I
vaguely recall some fancy batch write code for keeping things like photo view
counts from completely destroying the DB. We had a hardware load balancer
that worked pretty well, and we compiled Ruby ourselves for a performance
boost,
though that ended up killing us this one time when there was a bug that only
showed up in the compiled version of Ruby when all the dev boxes ran the
universal binary.

People care so much about performance, and in some cases, like computer
games, it's totally warranted. But if at the end of the day you have a DB,
the DB is going to be the problem long, long, long before the application
code.

~~~
ksec
> No. I would have if I owned the code personally, but it wasn't in the
> companies interests to open source it.

Thanks, I just thought that might solve the ActiveRecord-is-slow problem.

~~~
3pt14159
It didn't solve the ActiveRecord-is-slow problem; it was marginally faster,
but not a whole lot faster. What it solved was handling models over an
arbitrary number of databases. Imagine billions of writes of a single model
type across multiple databases. Imagine needing to join across databases and
not knowing which DB a joined model would be in.
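A minimal sketch of the sharding idea being described: route each record to one of N databases by its primary key. The shard count and the in-memory "databases" here are hypothetical; the real system also handled cross-database joins, which is the hard part.

```python
# Minimal sketch of routing model writes across N database shards by
# primary key. The dicts stand in for real database connections.

class ShardedStore:
    def __init__(self, num_shards: int = 4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, model_id: int) -> int:
        """Pick a shard deterministically from the primary key."""
        return model_id % len(self.shards)

    def write(self, model_id: int, record: dict) -> None:
        self.shards[self._shard_for(model_id)][model_id] = record

    def read(self, model_id: int) -> dict:
        # The router knows which shard holds the key, so reads stay O(1).
        return self.shards[self._shard_for(model_id)][model_id]

store = ShardedStore()
store.write(42, {"liked_photo": 7})
print(store.read(42))  # -> {'liked_photo': 7}
```

Joins across shards can't use this trick when the joined model's key is unknown up front, which is exactly the difficulty described above.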

------
IshKebab
These results should really be shown as _time per request_ rather than
requests per second, since these are trivial requests and aren't going to be
representative of actual performance differences in the real world.
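For what it's worth, the conversion is just the reciprocal. This ignores concurrency, so it is mean service time per request at the measured throughput, not user-perceived latency:

```python
def time_per_request_ms(requests_per_second: float) -> float:
    """Convert a throughput figure into mean time per request (ms)."""
    return 1000.0 / requests_per_second

# Figures quoted elsewhere in this thread:
print(time_per_request_ms(366))      # slowest entry: ~2.73 ms/request
print(time_per_request_ms(424_712))  # a top entry: ~0.0024 ms/request
```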

------
sqidyyy
I could never explain why Mojolicious (Perl) placed 5th on "Data updates" in
Round 14.

Guess that's a thing of the past now. (53rd)

------
handbanana
Rails :(. Would be interested to see how it fares using the most up-to-date
Rails/Ruby versions.

------
akmittal
Just a reminder: if speed were the only thing that mattered, we would all be
using assembly now.

~~~
mkirklions
I'm with you, although we will see if animations can hide loading times.

Btw, you will always have people that disagree with you. I get crap about the
loading time of my website, but since I switched to a massive, beautiful
redesign, my conversion doubled overnight.

I think a human can decide for themselves if something takes too long. I ran
into this issue and optimized, but I wouldn't be chasing benchmarks.

~~~
akmittal
I'm not saying performance doesn't matter; I'm saying performance is NOT the
ONLY thing that matters.

