
Seeking new test ideas for Framework Benchmarks Round 4+ - bhauer
https://github.com/TechEmpower/FrameworkBenchmarks/issues/133
======
CoffeeDregs
As a passive consumer of the benchmarks/info, I've got to say a huge "Thanks"
for this! I've been a watcher of the Debian benchmarks game for a long time
(since its inception), but after using Python, Ruby, Haskell, etc., I had
written off JVM languages as either slow (e.g. Groovy) or non-expressive (e.g.
Java)... This benchmark has got me _seriously_ interested in JVM languages
again. (Note: I don't want to use Scala^H^H^H^H^HFrankenstein)

Given the computational ability of browsers, the "front end" is rich and the
"back end" is shrinking down to just the API. I'm not sure that the extra
verbosity of Java is too large a cost for the performance it offers.

I currently have an application doing 1500 requests a second on Django, and
it's using 3 servers. Could I have used only one server with Java or Clojure?
Hmmm...

~~~
meddlepal
The "Groovy is slow" argument has sort of run out of steam, in my opinion.
Support for invokedynamic and the static-compilation annotation provide a lot
of speed. I use Groovy daily, and while my experience is anecdotal, maybe it
is time to do some benchmarks of my own.

~~~
vorg
Dynamic-mode Groovy is as slow as other dynamic languages. The recently added
static-compilation mode for Groovy still regularly spews up bugs, perhaps
because it was written by a single programmer with little beta testing whereas
Scala and Java have many high-pedigree developers behind them, and heaps of
documentation and testing.

------
coolsunglasses
Can't wait to see how Go 1.1 does in round 3 :)

For anybody else interested in nonsense like this, I have a Clojure template
benchmark repository on GitHub here:

<https://github.com/bitemyapp/clojure-template-benchmarks>

I should probably update the clabango benchmark; some changes were made not
too long ago.

~~~
bhauer
My oh my those are some awesome sunglasses on that bear.

We've got Go 1.1 in Round 3 and it's amazing.

------
bhauer
The community has contributed even more pull requests since Round 2 of our web
application framework benchmarks. We're planning to start Round 3 tests on
Monday 4/15 using a new build of wrk that allows time-limited tests (rather
than request-count-limited ones) so that all frameworks are tested for a
uniform amount of time.

In the meantime, this GitHub issue is seeking any thoughts you may have
concerning additional simple tests that we can introduce in Round 4 and
beyond. We want to define tests that continue to exercise typical web
application functionality but remain fairly simple to implement on an ever-
widening field of frameworks.

If you have thoughts, please add them here or on GitHub. Thanks!

~~~
Terretta
Thoughts of incremental web app functionality to test:

1. Exercising a randomized mix of reading and writing. I think you already
said you were planning a CRUD test. Consider a tunable ratio here, something
like 10000 R to 100 U to 10 C to 1 D (see the sketch at the end of this
comment).

2. Exercising synchronous web service (JSONP) calls in two modes: (a) to some
web service that is consistently fast and low-latency, say, the initial JSON
example from this test suite running in servlet mode, and (b) to a web service
written in the same framework as the one being tested, again using the initial
JSON example.

_(The idea here is that many frameworks fall on their faces when confronted
with latency. This is why synthetic tests are usually so poorly predictive of
real-world behavior -- people forget that latency causes backlogs, and
backlogs cause all parts of the stack to misbehave in interesting ways.)_

3. Test async ability, if the framework has it, with a system call (sleep?)
that takes a randomized 0-60 seconds to return. This would help in
understanding when a framework is likely to blow up calling out to a credit
card processor, doing server-side image processing, etc.

4. Exercising authentication (standardize on bcrypt, but only create
passwords on 1 in 10K requests), authorization, and session state, if offered.

5. Exercising any built-in support for caching, where 1 in rand(X) requests
invalidates the DB query cache, 1 in rand(X) requests invalidates the WS call
cache, 1 in rand(X) requests invalidates the long-term async system call
cache, and 1 in rand(Y) requests blows away the whole cache.

For the enterprise legacy integrators, it would also be interesting to test
XML (in particular, SOAP) anywhere we're testing JSON.
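
To make the ratio in item 1 concrete, here is a minimal sketch of a tunable
R/U/C/D request-mix generator in Python. The operation names and weights are
illustrative only; a real load tool would presumably implement the ratio
natively.

```python
import random
from collections import Counter

# Hypothetical operations and weights approximating the
# 10000 R : 100 U : 10 C : 1 D mix suggested above.
OPERATIONS = ["read", "update", "create", "delete"]
WEIGHTS = [10000, 100, 10, 1]

def next_operation(rng=random):
    """Pick the next operation according to the tunable ratio."""
    return rng.choices(OPERATIONS, weights=WEIGHTS, k=1)[0]

if __name__ == "__main__":
    # Sample a million operations to confirm the mix roughly matches.
    print(Counter(next_operation() for _ in range(1000000)))
```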

~~~
bhauer
This is great input, Terretta. Exactly the kind of thinking I wanted to tap
into.

Some quick thoughts in response:

For #1, the conceptual test included reading and writing in a 1:1 R:W ratio
(well, to be more accurate, 1:1 R:U). I like the idea of extending this a
little bit to include C and D. For the sake of benchmark run time, I'm looking
to restrain the growth of permutations. But on the other hand, I like your
idea of a tunable ratio. Something to think about!

I like #2 and #3 as well. I'll think about those some more too.

I really like the idea of incorporating some bcrypt and session state (#4).

We have a few caching tests in mind, but like elsewhere, we'll start out
simple and then add complexity.

Thanks for the great input! The goal of this planning is to have some good
long-term ideas in mind.

------
voidlogic
What about tracking memory usage? Peak, average, etc.

Previously you tested EC2 vs. local hardware. What about adding a local KVM
virtual machine as well?

I also think a graph showing latency as a function of concurrency would be
very interesting.

~~~
bhauer
Thanks for the ideas, voidlogic. We do want to capture server statistics and
have an issue for that on GitHub [1]. I am particularly interested in
capturing CPU and I/O utilization because in spot checks, we've observed that
some frameworks do not fully saturate the 8 HT cores on our i7 hardware,
suggesting lock or resource contention.
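
As a sketch of what that collection might look like, here is a minimal
per-core sampler using the psutil library (psutil is my assumption; the
linked issue will settle the actual mechanism). Uneven utilization across
cores during a run would hint at exactly that kind of contention.

```python
import psutil

def sample_cpu(duration_s=60, interval_s=1.0):
    """Sample per-core CPU utilization over the benchmark window."""
    samples = []
    for _ in range(int(duration_s / interval_s)):
        # percpu=True returns one utilization figure per logical core;
        # the call blocks for interval_s while measuring.
        samples.append(psutil.cpu_percent(interval=interval_s, percpu=True))
    # Average each core's readings across all samples.
    return [sum(core) / len(core) for core in zip(*samples)]

if __name__ == "__main__":
    print(sample_cpu(duration_s=10))
```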

As for a variety of other hardware and VM environments, the data would be
interesting. Related: we plan to migrate the charts and tables to a stand-
alone page. Right now, the blog entries are hard-coded to fetch two specific
results.json files for rendering the charts/tables. But when we build a stand-
alone page, I would like to enhance the script to allow selection of one or
two results.json files from a menu for comparing side-by-side. And to your
point, the community could then contribute their own results files. Imagine
being able to compare EC2 large vs xlarge or vs Xeon E5s or ...
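
As a sketch of that side-by-side idea, assuming a hypothetical flat
results.json layout (framework name mapped to a dict of metrics); the real
file's structure would of course drive the actual script:

```python
import json

def load_results(path):
    """Load one results.json file (the structure here is hypothetical)."""
    with open(path) as f:
        return json.load(f)

def compare(path_a, path_b, metric="totalRequests"):
    """Print a metric side-by-side for frameworks present in both files."""
    a, b = load_results(path_a), load_results(path_b)
    for name in sorted(set(a) & set(b)):
        print(name, a[name].get(metric), "vs", b[name].get(metric))

if __name__ == "__main__":
    compare("ec2-large.json", "i7.json")  # hypothetical file names
```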

Right now, as you noticed, the latency is only displayed at 256 concurrency.
I'll make a note to myself to include a chart for latency versus concurrency
when we move to a stand-alone page [2].

[1] https://github.com/TechEmpower/FrameworkBenchmarks/issues/108

[2] https://github.com/TechEmpower/FrameworkBenchmarks/issues/149

------
ckluis
@bhauer - this may be the best marketing I have ever seen a company do in the
tech space (and by marketing I mean marketing to developers). This is a
recruiting goldmine.

Sheer genius.

------
happyhappy007
I would like to see a simple CRUD blog built with different frameworks.
Building a blog is like the "Hello world!" for dynamic web development.

~~~
coolsunglasses
Given the scope and breadth involved in these benchmarks, that's a helluva
tall order. I'm sure nothing's stopping anybody from doing it themselves
though.

~~~
kbenson
I think that this could be specced out in stages and implemented over a
number of rounds. First would be a schema for a blog, with authors, posts,
and comments (a sketch follows below). Next would be a REST API for posts and
comments. Finally, mock pages for posting, reading, and commenting in HTML,
to test the included templating system, if there is one.
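
As a sketch of that first stage, here is a minimal blog schema using sqlite3
from the Python standard library; the table and column names are illustrative,
not a proposed spec.

```python
import sqlite3

SCHEMA = """
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    body      TEXT NOT NULL
);
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)  # stage 1: just the schema
    print("blog schema created")
```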

You really need to go at least this far. This will also give you an
approximate code size for this sample project, which is at least as important
as performance to some people.

~~~
Terretta
> _You really need to go at least this far._

"The framework don't care."

What I mean is, the framework doesn't know whether you're building the result
of 20 queries into a blog post page that pulled in related data from the post
itself, the author profile, and the comments and commenter profiles, or
whether you're pulling in arbitrary data. So there's no reason to test a
"blog". Most of us aren't building blogs. But we are interested in querying
databases, calling web services, caching performance, and async process queue
handling.

~~~
kbenson
Except that I, and I'm sure many other people, are interested in more than
just performance. I want to know how much code it takes to achieve some small
subset of usefulness, and what it looks like. Is it overly complex? Is it
split apart in a paradigm that doesn't match my mental model very well?

I agree most of us aren't building blogs (I'm not), but I believe a blog is a
reasonable stand-in for a more complex application. It obviously won't test
everything, but the requirements are well understood (or _can_ be well
understood, if defined well enough).

Also, who's to say that some of these frameworks aren't going to perform
significantly worse when they start having to do more than simply serialize
data as JSON across a socket? With that in mind, how accurate are some of
these benchmarks if they aren't set up and used as they would be in real
life?

------
crypto5
Any chance we could look at the Round 3 results?

~~~
bhauer
Of course! We are starting the tests next Monday and you can expect them to be
complete some time around the middle of next week.

------
darkchasma
ASP.NET MVC?

~~~
bhauer
Sadly, we still haven't fit this in, and there have been no pull requests to
date. A Mono pull request would cause heartfelt cheers.

