This is the latest update to our benchmarking of web application frameworks and platforms. Since Round 2, we've received several pull requests. There's more Scala, Erlang, Lua, PHP, Java, and Haskell, and more of everything else! (Sorry, we've not yet received any .NET/Mono pull requests.)
Additionally, with the help of the author of Wrk, we've been able to change the methodology to use time-limited tests (1 minute) rather than request-limited tests (100,000 requests). This means all frameworks are exercised for the same amount of time.
We've migrated the results to a stand-alone site so we have a little more screen real estate for the charts and tables.
I look forward to any feedback, comments, questions, or criticism. Thanks!
I know all languages are using MySQL, but MySQL drivers are pretty poor for languages such as Go and Python, so it does make things quite unbalanced. I'd love to see this with Postgres.
For most of the tests, we expect that an ORM or something ORM-like is used to work with the database. For example, the Ruby tests use ActiveRecord and many of the Java tests use Hibernate. We believe the use of an ORM to be conventional for most production sites.
The "raw" suffix indicates that an ORM is not used. This can give you an idea of the cost of using an ORM (or, in some cases, the cost of having more framework code in general).
The "servlet-raw" test uses raw JDBC to connect to and query the database. The "php-raw" test uses PHP's raw MySQL connectivity and no ORM. The "php" test with no "raw" suffix is using PHP ActiveRecord.
Thanks to TechEmpower for putting this together. It's fantastic for me, as the author of Phreeze, to see how my framework stacks up; I've always been curious.
One strange thing is that Phreeze rocks on the multi-query test on the EC2 instance but does poorly on dedicated hardware. Would anyone have a clue why that would be? At the PHP level I have to admit I don't really factor in performance tuning for specific hardware, so I was very surprised to see such a difference. I almost suspect the Nginx setup, or something in the include path searching through too many files on one platform but not the other. Does anybody have any clues on where to look for something like that without having access to the testing hardware?
Hi jakejake, thanks for the kind words and for contributing your Phreeze test. It's looking really good, and I hope to have at least language color-coding in the next round so you can more easily see how it compares to your PHP peers.
As for your question, I'm not certain why the physical hardware turned in a lower score than EC2. That suggests a configuration problem, since we know the physical hardware is in fact considerably more powerful. Pat (pfalls) may be able to find some time to help you diagnose it further.
Have you had a chance to benchmark Phreeze on some of your own hardware to fine-tune its configuration?
Thanks so much for the reply. I haven't been able to get the full suite running where it creates the environment and everything via the setup scripts - I just have tried to reproduce the Nginx environment and run tests manually. But I really would like to duplicate the whole scaffolding so I can see what may be going on. I have a crushing deadline in two weeks, then after that I'm going to devote some time to getting the full thing running.
I want to give a quick thank you to everyone who has contributed thus far. Being able to show frameworks that span this many different languages and platforms is really an amazing achievement that we were only able to accomplish with the help of the community. For the frameworks that are still missing, we're not done yet; continue submitting pull requests and we'll get them in.
It's great to see such positive attitudes towards the project, and of course even greater to see the healthy competition amongst frameworks and languages. I hope it continues!
Wow, thank you for including Lift. It's mind-blowing to see Lift perform only marginally better than Ruby on Rails... I always thought of it as a very performant framework...
Lift is optimized for maintaining persistent connections with users and keeping their state, which makes for an easier programming model, and it probably performs well at that.
If your use case actually matches these benchmarks then yes, Lift would be a poor choice.
Many variables are involved, but my conjecture is that it has more to do with the particulars of Compojure versus Play than it does Clojure versus Scala. For instance, take a look at the numbers put up by other Scala frameworks (Unfiltered, Lift, Scalatra). There's nothing intrinsically slow about Scala.
Where is all the attention that happened for the first two rounds? (I'm really hoping someone will explain why Go's database results are so slow, and what can be done to improve that.)
The most likely cause is in the MySQL driver. Go provides an interface for drivers to implement, similar to how JDBC does in Java. One of the Go developers, Brad Fitzpatrick, commented [1] that the actual code used by the benchmark looks OK. The discussion of these test results on golang-nuts [1] triggered a number of performance-related pull requests [2] to the MySQL driver used.
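For readers less familiar with the JDBC analogy, here is a minimal Java sketch of the split the comment describes: applications code against the java.sql interfaces, each vendor ships a class implementing java.sql.Driver, and DriverManager hands back whichever registered driver matches the URL. Go's database/sql package follows the same pattern. The class name below is hypothetical, and the output simply lists whatever drivers happen to be on the classpath.

```java
// Sketch of the interface/driver split: application code sees only the
// java.sql.Driver interface; concrete drivers (e.g. MySQL Connector/J)
// register themselves and are chosen by JDBC URL at runtime.
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Collections;

public class ListDriversExample {
    public static void main(String[] args) {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            System.out.println(d.getClass().getName()
                    + " (JDBC " + d.getMajorVersion() + "." + d.getMinorVersion() + ")");
        }
    }
}
```

Because the benchmark code only ever touches those interfaces, driver-level performance fixes like the ones mentioned above land without any change to the test itself.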
Hi rjoshi. I've added issues at our Github repository for those two. I don't feel that we're qualified to create the benchmark tests for those two, so it would be ideal to receive them as pull requests.
We have not discriminated on pull requests, except where a proposed test is redundant (say, testing the same framework as an existing test but with a minor tweak) or doesn't work for us (we can't get the test to run).
Others from the Perl community submitted pull requests for some Perl frameworks subsequent to Round 3, and we will be including them going forward.
It's very strange that in the multiple-query test Vert.x performed very well on EC2 but less well on dedicated hardware (its position changed dramatically), yet not so in the other tests...
Agreed. We only have conjecture on that one. I suspect that something in either the test or the Vert.x code is causing a blocking behavior. If I recall correctly, that test on i7 is not able to fully saturate the CPU cores.