Web Framework Benchmarks (techempower.com)
438 points by pfalls on Mar 28, 2013 | 396 comments



One of the most interesting things this comparison brings out, to me, is not so much the differences between the various frameworks (although the differences between options on the same platform are definitely very useful information) as an issue few of us seem to think about these days: the cost of any of these frameworks above the bare standard library of the platform it's hosted on.

There's a consistent, considerable gap between their "raw" benchmarks (things like Netty, Node, plain PHP, etc.) and the frameworks hosted on those same platforms. I think this is something we should keep in mind when we're tuning performance-sensitive portions of APIs and the like. We may actually need to revisit our framework choice and implement selected portions outside of it (just as Ruby developers sometimes write performance-critical pieces of gems in C) or optimize the framework further.

I'd like to crunch these numbers further to get a "framework optimization index" which would be the percentage slowdown or ratio of performance between the host platform and the performance of the framework on top of it. I might do this later if I get a chance.


I think this is a much needed and excellent point to make. Just take a look at how Go dips down when using Webgo.


I used Go's benchmarking tool to compare raw routing performance of various frameworks. The handlers all return a simple "Hello World" string. Here are the results:

  PASS
  Benchmark_Routes           100000	     13945 ns/op
  Benchmark_Pat              500000	      6068 ns/op
  Benchmark_GorillaHandler   200000	     11042 ns/op
  Benchmark_Webgo            100000	     26350 ns/op
  ok  	github.com/bradrydzewski/routes/bench	12.605s
I then ran the same benchmark, but this time I modified the handler to serialize a struct to JSON and write to the response. Here are the results:

  Benchmark_Routes              100000	     21446 ns/op
  Benchmark_Pat                 100000	     14130 ns/op
  Benchmark_GorillaHandler      100000	     17735 ns/op
  Benchmark_Webgo                50000	     33726 ns/op
  ok  	github.com/bradrydzewski/routes/bench	9.805s
In the first test, Pat is almost twice as fast as the Gorilla framework. In the second test, when we added a bit more logic to the handler (marshaling a struct to JSON), Pat was only about 18% faster than Gorilla. In fact, it turns out it takes longer to serialize to JSON (8000ns) than it does for Pat to route and serve the request (6000ns).
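For anyone who wants to poke at this themselves, a benchmark of roughly that shape is easy to write with Go's testing package and httptest. This is a minimal sketch of the JSON variant, assuming Pat's bmizerany/pat import path and a made-up route; it is not the actual bench code from the repo:

  package bench

  import (
      "encoding/json"
      "net/http"
      "net/http/httptest"
      "testing"

      "github.com/bmizerany/pat" // assumed import path for Pat
  )

  type message struct {
      Message string `json:"message"`
  }

  // helloJSON mirrors the JSON variant above: route the request, marshal a
  // struct, write it to the response.
  func helloJSON(w http.ResponseWriter, r *http.Request) {
      w.Header().Set("Content-Type", "application/json")
      json.NewEncoder(w).Encode(message{Message: "Hello World"})
  }

  func Benchmark_Pat(b *testing.B) {
      mux := pat.New()
      mux.Get("/hello/:name", http.HandlerFunc(helloJSON))

      req, _ := http.NewRequest("GET", "/hello/world", nil)
      b.ResetTimer()
      for i := 0; i < b.N; i++ {
          // ServeHTTP exercises routing plus the handler; the recorder keeps
          // the response in memory so no network I/O is measured.
          mux.ServeHTTP(httptest.NewRecorder(), req)
      }
  }

Running "go test -bench ." in that package prints listings like the ones above; swapping the mux for another router gives the per-framework comparison.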

Now, imagine I created a third benchmark that did something more complex, like executing a database query and serving the results using the html/template package. There would be a negligible difference in performance across frameworks because routing is not going to be your bottleneck.

I would personally choose my framework not just based on performance, but also on productivity: one that can help me write code that is easier to test and easier to maintain in the long run.


rorr, you appear to be hellbanned. Here's your comment, since it seemed like a reasonable one:

> Now, imagine I created a third benchmark that did something more complex, like executing a database query and serving the results using the html/template package. There would be a negligible difference in performance across frameworks because routing is not going to be your bottleneck. If you're performing a DB query on every request, you're doing something wrong. In the real world your app will check Memcached, and if there's a cached response, it will return it. Thus making the framework performance quite important.


Ok, so I added a third benchmark where the handler gets an item from memcache (NOT from a database). Here are the results:

  PASS
  Benchmark_Routes             10000	    234063 ns/op
  Benchmark_Pat                10000	    233162 ns/op
  Benchmark_GorillaHandler      5000	    265943 ns/op
  Benchmark_Webgo               5000	    348349 ns/op
  ok  	github.com/bradrydzewski/routes/bench	10.062s
Notice the top 3 frameworks (Pat, routes and Gorilla) have almost identical performance results. The point is that routing and string manipulation are relatively inexpensive compared to even the most lightweight TCP request, in this case to the memcache server.
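For completeness, the memcache variant can be sketched the same way. This assumes the bradfitz/gomemcache client and a memcached instance on localhost; the handler and key names are illustrative rather than the actual bench code:

  package bench

  import (
      "net/http"
      "net/http/httptest"
      "testing"

      "github.com/bmizerany/pat"                // assumed import path for Pat
      "github.com/bradfitz/gomemcache/memcache" // assumed memcache client
  )

  // cacheHandler serves a value fetched from memcached, so every request pays
  // for a real TCP round trip to the memcached server.
  func cacheHandler(mc *memcache.Client) http.HandlerFunc {
      return func(w http.ResponseWriter, r *http.Request) {
          item, err := mc.Get("hello")
          if err != nil {
              http.Error(w, err.Error(), http.StatusInternalServerError)
              return
          }
          w.Write(item.Value)
      }
  }

  func Benchmark_PatMemcache(b *testing.B) {
      mc := memcache.New("127.0.0.1:11211")
      mc.Set(&memcache.Item{Key: "hello", Value: []byte("Hello World")})

      mux := pat.New()
      mux.Get("/hello", cacheHandler(mc))

      req, _ := http.NewRequest("GET", "/hello", nil)
      b.ResetTimer()
      for i := 0; i < b.N; i++ {
          mux.ServeHTTP(httptest.NewRecorder(), req)
      }
  }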


By the way:

https://plus.google.com/u/0/115863474911002159675/posts/L3o9...

More SPEEEEEEEEEEEEED coming down the pipe in 1.1 for Go's net/http. :)


I think Go is almost the ideal example here, you're right. Go provides a pretty rich "standard library" for writing web serving stuff so it's a good place where you could really imagine writing your performance critical stuff just on the base platform even if you use something like Webgo for the rest of your app.

Some of the other platforms are much less amenable to that since the standard primitives the platform exposes are very primitive indeed (servlets api, rack api, etc.). Perhaps there's some value in looking at how your favorite framework stacks up against its raw platform and trying to contribute some optimizations to close the gap a bit.


I'm curious about that - because there's so little to webgo I suspect the answer is something really trivial. I haven't really looked at it before, but the framework itself is just 500 lines or so unless I'm looking at the wrong one...

Given that the JSON marshalling and server components would be exactly the same between Go and webgo, I'm curious whether changing the URL recognised to be just /json in the webgo tests would make any difference. Any reason it was different?


Just had a look at the tests, and the URLs they respond to differ:

http://localhost:8080/json

http://localhost:8080/(.*)

Shouldn't all these examples at least be trying to do the same sort of work? For such a trivial test differences like that could make a huge difference to the outcome.
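To make the difference concrete, here's roughly what the two registrations look like side by side. The net/http half is standard library; the web.go half follows its README style, and the hoisie/web import path and handler signature are assumptions on my part:

  package main

  import (
      "encoding/json"
      "net/http"

      "github.com/hoisie/web" // assumed import path for web.go
  )

  type message struct {
      Message string `json:"message"`
  }

  // net/http: an exact-path match on /json, essentially a map lookup.
  func jsonHandler(w http.ResponseWriter, r *http.Request) {
      w.Header().Set("Content-Type", "application/json")
      json.NewEncoder(w).Encode(message{Message: "Hello, World!"})
  }

  // web.go: handler for the catch-all route; the regexp has to be matched
  // against every request URL before this runs.
  func catchAll(val string) string {
      body, _ := json.Marshal(message{Message: "Hello, World!"})
      return string(body)
  }

  func main() {
      go func() {
          http.HandleFunc("/json", jsonHandler)
          http.ListenAndServe(":8080", nil)
      }()
      web.Get("/(.*)", catchAll)
      web.Run("0.0.0.0:8081")
  }

Even with a trivial handler, the exact-path version and the regexp catch-all do different amounts of work per request, so ideally both tests would register the same route.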

It's great to see replicable tests like this which show their working, but they do need to be testing the same thing. I also think they should have something more substantial to test as well as json marshalling on all the platforms, like serving an HTML page made with a template and with the message, as that'd give a far better indication of speed for tasks web servers typically perform.

Still, it's a great idea and really interesting to see this sort of comparison, even if it could be improved.


One of the next steps we'd like to take is to have a test that does cover a more typical web request, and is less database heavy than our 20 query test, just like you describe. Ultimately, we felt that these tests were a sufficient starting point.


I was a little confused by the different URLs used in the tests, as for this sort of light test, particularly in Go, where all the serving stuff is common between frameworks, you're mostly going to be testing routing. Any reason you chose a different route here (/json versus /(.*))?

I can't think of much else that this little web.go framework does (assuming the fcgi bits etc. are unused now and it has moved over to net/http). I don't think many people use web.go; Gorilla and its mux router seem to be more popular as a bare-bones option in Go, so it'd possibly be interesting to use that instead. It'd be great to see a follow-up post with a few changes to the tests to take in the criticisms or answer questions.

While you may come in for a lot of criticism and nitpicking here for flaws (real or imagined) in the methodology, I do think this is a valuable exercise if you try to make it as consistent and repeatable as possible - if nothing else it'd be a good reference for other framework authors to test against.


Webgo... I'll stick with "net/http" and Gorilla thanks.

Also, they used Go 1.0.3... I hope they update to 1.1 next month. Most everyone using Go for intensive production use is on Go tip (which is the branch due to become the 1.1 RC next month).


This is great to know. We were hesitant to use non-stable versions (although we were forced to in certain cases), but knowing that it's common practice for production environments would change our minds.


We switched to using tip after several Go core devs recommended that move to us; the folks on go-nuts IRC agreed, and we tested it and found it to be more stable than 1.0.3.


Wow, directly on tip? That seems to speak very highly of the day-to-day development stability of Go.


A good tip build tends to be more stable than 1.0.3 and has hugely improved performance (most importantly for large applications, in garbage collection and generation).

To select a suitable tip build we use http://build.golang.org/ and https://groups.google.com/forum/?fromgroups#!forum/golang-de... . My recommendation would be to find a one- or two-week-old build that passed all checks, do a quick skim of the mailing list to make sure there weren't any other issues, and use that revision. Also, you will see that some of the builders are broken.

Of course if your application has automated unit tests and load tests, run those too before a full deployment.


Thanks, this comment really helped me in my evaluation of Go today. I had been playing around with 1.0.3 for a couple days, but tip is definitely where it's at.


I'm glad I could help. Go 1.1 RC should be out early next month. So if you want you could wait for that (for production use).


Or it could speak poorly of their release process, which is more accurate. The stable release is simply so bad compared to tip that everyone uses tip. There should have been multiple releases since the last stable release so that people could get those improvements without having to run tip.


Why not both? Insisting on a very stable API can result in long times between releases, which can mean more people using tip. That's distinct from how stable tip is.


Given the frequent complaints that the previous stable release isn't very stable, I think trying to interpret it as "tip is super stable" is wishful thinking. Tip is currently less bad than stable. The fact that stable releases are not stable is a bad thing, not a good thing.


What does stable mean? If stable means there are no unexpected crashes, then Go 1.0.3 is extremely stable.

If stable means suitable for production, Go tip's vastly improved performance, especially in regards to garbage collection, makes it more suitable than 1.0.3 for large/high-scale applications in production.


I'm not sure many people would use webgo in real life. I don't know... maybe some people... certainly not pros.

Also, the 1.0.3 thing is probably dragging on the numbers a bit. 1.1 would boost it a little. Not enough to get it into the top tier... but a little.

Also, for Vert.x, they seem to be only running one verticle. Which would never happen in real life.

Play could be optimized a bit... but not much. What they have is, to my mind, a fair ranking for it.

Small issues with a few of the others but nothing major. I think Go and Vert.x are the ones that would get the biggest jumps if experts looked at them. And let's be frank... does Vert.x really need a jump?

So what they have here is pretty accurate... I mean... just based on looking through the code. But Go might fare better if it used tip. And Vert.x would DEFINITELY fare better with proper worker/non-worker verticles running.


The Play example was totally unfair since it blocks on the database query which will block the underlying event loop and really lower the overall throughput.


Well... to be fair...

the Vert.x example, as configured, blocks massively as well waiting for mongodb queries.


Could you point me at an example of an idiomatic, non-trivial Go REST/JSON API server? I've been trying for a while to find something to read to get a better handle on good patterns and idiomatic Go, but I haven't really come up with anything. I've found some very good examples of much lower-level type stuff, but I think I have a decent handle on that type of Go already. What I really would like is a good example of how people are using the higher level parts of the standard library, particularly net/http etc.


Sorry... I'm not really a book kind of guy when it comes to this stuff. The golang resources are mostly what I use.


For Vert.x, we specified the number of verticles in App.groovy, rather than on the command line, which we think is a valid way to specify it.


OK... I ran the Vert.x test... it runs a bit faster here with 4 workers instead of 8. I suspect what is happening there is that at times all 8 cores can be pinned by workers, while responses wait to be sent back on the 8 non-worker verticles. But it's not that big a change in speed, actually. One thing more: when you swap in a couchbase persister for the mongo persister it's faster yet. The difference is actually much larger than the difference you get balancing the number of worker vs non-worker verticles. Also thinking that swapping gson in for jackson would improve things... but I don't think that those are fair changes. (Well... the couchbase one may be a fair change.)

Also tested Cake just because it had been a while since I have used it... and I couldn't believe it was that much worse than PHP. Your numbers there seem valid though given my test results. That's pretty sad.

Finally, tried to get in a test of your Go stuff. I'm making what I think are some fair changes ... but it did not speed up as much as I thought. In fact, initially it was slower than your test with 1.1.

So after further review... well done sir.


That first line is truly one of the best comments I've seen when discussing languages. I've clipped it and will use it from now on:

>> I'm not sure many people would use [xxx] in real life. I don't know... maybe some people... certainly not pros.


The Play app uses MySQL in a blocking way, while the Node.js app uses Mongo. They're not comparable.


We have a pull request that changes the Play benchmark (thank you!) so we will be including that in a follow-up soon.

We tested node.js with both Mongo and MySQL. Mongo strikes us as the more canonical data store for Node, but we wanted to include the MySQL test out of curiosity.


That is bad benchmarking!


I'd love to see your framework optimization index. Honestly, all of this would be a wonderful thing to automate and put in a web app - a readily-accessible, up-to-date measure of the current performance of the state of the art in languages and frameworks. I bet it would really change some of the technology choices made.


Here's a quick version of the framework optimization index. Higher is better (the ratio of framework performance to raw platform performance, multiplied by 100 for scale); a worked example of the calculation follows the table:

  Framework       Framework Index
  Gemini          87.88
  Vert.x          76.29
  Express         68.85
  Sinatra-Ruby    67.88
  WebGo           51.08
  Compojure       45.69
  Rails-Ruby      31.75
  Wicket          29.33
  Rails-Jruby     20.09
  Play            18.02
  Sinatra-Jruby   15.96
  Tapestry        13.57
  Spring          13.48
  Grails           7.11
  Cake             1.17
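For clarity, each entry is just the framework's throughput divided by its raw platform's throughput, times 100. A worked example in Go (the numbers are the EC2 peaks quoted in a sibling comment; whether the table above used the EC2 or the dedicated runs is my assumption):

  package main

  import "fmt"

  func main() {
      gemini := 25264.0  // Gemini peak res/s on EC2
      servlet := 28745.0 // raw servlet peak res/s on EC2
      // framework optimization index = framework / raw platform * 100
      fmt.Printf("Gemini index: %.2f\n", gemini/servlet*100) // ~87.89
  }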


In the same vein, I was curious to compare the max responses/second on dedicated hardware vs EC2 on a per-framework basis. The following is each framework's peak throughput on EC2 as a percentage of its peak on dedicated hardware (dedicated vs EC2, in res/s):

  cake           18.9%  (312 vs 59)
  compojure      12.1%  (108588 vs 13135)
  django         16.8%  (6879 vs 1156)
  express        16.9%  (42867 vs 7258)
  gemini         12.5%  (202727 vs 25264)
  go             13.3%  (100948 vs 13472)
  grails          7.1%  (28995 vs 2045)
  netty          18%    (203970 vs 36717)
  nodejs         15.6%  (67491 vs 10541)
  php            11.6%  (43397 vs 5054)
  play           20.6%  (25164 vs 5181)
  rack-jruby     15.6%  (27874 vs 4336)
  rack-ruby      22.7%  (9513 vs 2164)
  rails-jruby    22.7%  (3841 vs 871)
  rails-ruby     20.7%  (3324 vs 687)
  servlet        13.4%  (213322 vs 28745)
  sinatra-jruby  21.2%  (3261 vs 692)
  sinatra-ruby   22.2%  (6619 vs 1469)
  spring          7.1%  (54679 vs 3874)
  tapestry        5.2%  (75002 vs 3901)
  vertx          22.3%  (125711 vs 28012)
  webgo          13.5%  (51091 vs 6881)
  wicket         12.7%  (66441 vs 8431)
  wsgi           14.8%  (21139 vs 3138)

I found it interesting that something like tapestry took a 20x slowdown when going from dedicated to ec2, while others only took ~5x slowdown.

Edit: To hopefully make it clearer what the percentages mean - if a framework is listed at 20%, this means that the framework served 1 request on ec2 for every 5 requests on dedicated hardware. 10% = 1 for every 10, and so on. So, higher percentage means a lower hit when going to ec2.

Disclosure: I am a colleague of the author of the article.


You're saying that running a query across the internet to ec2 is 5 times faster than running it on dedicated hardware in the lab? I find that hard to believe.


Sorry, maybe my original post was not entirely clear. Let's take tapestry, for example. On dedicated hardware, the peak throughput in responses per second was 75,002. On ec2, it was 3,901 responses per second.

So, in responses per second, the throughput on ec2 was 5.2% that of dedicated hardware, or approximately 20 times less throughput. The use of the word slowdown was possibly a bad choice, as none of my response had to do with the actual latency or roundtrip time of any request.


These could probably be further broken down into micro-frameworks (like Express, Sinatra, Vert.x etc.) and large MVC frameworks (like Play and Rails).

Gemini is sort of an outlier that doesn't really fit either category well, but the micro-frameworks have a fairly consistently higher framework optimization index than the large MVC frameworks which is as expected.

Express and Sinatra really stand out as widely-used, very high percentage of platform performance retained frameworks here. I've never used Vert.x, but I will certainly look into it after seeing this. I'm very impressed that Express is so high on this list when it is relatively young compared to some of the others and the Node platform is also relatively young.

Play seems particularly disappointing here since it seems any good performance there is almost entirely attributable to the fast JVM it's running on. Compojure is also a bit disappointing here (I use it quite a bit).


The play test was written totally incorrectly since it used blocking database calls. Since play is really just a layer on top of Netty it should perform nearly as well if correctly written.


I believe they're encouraging pull requests to fix that sort of thing. It will be interesting to see if it helps to that degree; I hope so!


But Play's trivial JSON test was much slower than Netty's.


That's because they inexplicably use the Jackson library for the simple test, rather than Play's built in JSON support (they use the built-in JSON for the other benchmarks).


Both Netty and Play use Jackson, though the Netty version uses a single ObjectMapper and the Play version uses a new ObjectNode per request (created through Play's Json library).


I hope this gets fixed!


No, they use Play's JSON lib. It's kind of a moot point because Play's lib is in fact a wrapper for Jackson.

Here's the source: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

So the question stands: If Play & Netty are using the same JSON serialization code, why is Play seven times slower?


I'd imagine that being relatively young is an advantage in a test like this. You're not utilizing any features, and features are what slow down requests. The fewer features something has, the faster it should perform in these trivial tests.


That's a very good point; I hadn't thought of it that way. Maybe this is some small part of why we seem to keep flocking to the new kids on the block.


It's funny you should put this together because in an earlier draft of this blog entry I had created a tongue-in-cheek unit to express the average cost per additional line of code. Based on our Cake PHP numbers, I wanted to describe PHP as having the highest average cost per line of code. But we dropped this because I felt it ultimately wasn't fair to say that based on the limited data we had and it could be easily interpreted as too much editorializing. Nevertheless, as you point out, it's interesting to know how using a framework impacts your performance versus the underlying platform.

I too wanted Play to show higher numbers. There's certainly a possibility we have something configured incorrectly, so we'd love to get feedback from a Play expert.


Good thing you opted against sensational journalism.

About the cost per additional line of code for PHP, it mainly comes from not having an opcode cache and having to load and interpret files on every visit. mod_php was and will always be trash. I commented earlier about it too.

In the case of Ruby, and talking about Rails, even when using Passenger, the Rails app is cached via a spawn server. That's not the case with PHP.

Similarly, Python has an opcode cache built in (.pyc files). Also, I am not sure about gunicorn, but others do keep app files in memory.

Servlets do the same, keeping apps in memory. You get the idea.

Frameworks definitely have an impact, but it's very hard for one person to know the right configuration for every language. You have done some good work there, but it will take a lot of contributions before the benchmark suite becomes fair for every language/framework.


We've received a lot of great feedback already and even several pull requests. Speaking of which, we want to express an emphatic "Thank you" to everyone who has submitted pull requests!

We're hoping to post a follow up in about a week with revised numbers based on the pull requests and any other suggested tweaks we have time to factor in. We're eager to see the outcome of the various changes.


I completely agree with you; this is not proper benchmarking, as opcode caching is missing in PHP. The benchmarks should be recalculated after configuring APC.


For Play, you'll want to either 1) handle the (blocking) database queries using Futures/Actors that use a separate thread pool (this might be easier to do in Scala) or 2) bump the default thread pool sizes considerably. The default configuration is optimized purely for non-blocking I/O.

See e.g. https://github.com/playframework/Play20/blob/master/document... and https://gist.github.com/guillaumebort/2973705 for more info.


Why would the JSON serialization test perform so poorly though?


I'm pretty shocked that Play scored so low. One would think that being built on Netty would put Play in a higher rank. Database access for sure needs to be in an async block.


It's not just the DB test. The JSON test was way slower than Netty, too.


Agreed. We need to get the Play benchmark code updated to do the database queries in an asynchronous block. Accepting pull requests! :)


Not quite sure I understand your point re: the cost of the frameworks, above the bare standard library.

Do you mind breaking it down for me a bit please?

Cost in dollars or cost in hardware utilization or some other cost?


If you have developers who for one reason or another prefer a given platform, then the most important performance comparisons are about how close various frameworks on that platform get to the performance of the platform itself.

Knowing how much I'm giving up in performance in order to get the features a given framework gives me is an important consideration. Also understanding when it's worthwhile to work outside the framework on the bare platform given the speedup I'll get versus the cost I'll incur by doing so is a very important optimization decision-making tool.


How do you derive how much performance you are giving up from these benchmarks? There is not a neat relationship between the two.


There are performance numbers for a framework (Cake PHP, for example) and for the raw primitives of the platform it runs on (PHP, in that case). By finding the ratio between the two one can arrive at the performance loss attributable primarily to the framework you've chosen.

See my "framework optimization index" in comments below for a rundown on all these ratios which I was able to back out from this set of benchmarks.


"Among the many factors to consider when choosing a web development framework, raw performance is easy to objectively measure."

Oh really? Then why did Zed write such an angry rant about how you are doing it wrong?

http://zedshaw.com/essays/programmer_stats.html

Can we please see some standard deviations, at least?


What's wrong with that guy?


It's a pretty serious problem with how we benchmark though.

EDIT: when looking up what I vaguely remembered I somehow managed to come across a similar article that was published just today[1], even though I was referring to an older one[2] which was about microstuttering (basically: a high standard deviation in frame rate). The point still stands - in fact it applies to both cases in somewhat different ways.

To give an example: Crossfire and SLI graphics card setups a few years ago[1]. It turned out that while both gave a similar performance increase in average framerates, one of them had a significantly lower minimum framerate than the other. A high minimum framerate is probably more important in shooters than peak performance, but that's not what we've been testing all of these years, is it? That's exactly the problem highlighted in the article by Zed.

I know this is a gaming example, but I'm sure that in user perception of the performance this matters just as much for the responsiveness of webpages.

[1] http://www.tomshardware.com/reviews/graphics-card-benchmarki... [2] http://www.tomshardware.com/reviews/radeon-geforce-stutter-c...


Indeed, it's important to look not just at the average performance and performance extremes, but also the distribution of performance.

Standard deviation helps with this. Also, oftentimes looking at the latency at the 50th, 90th, and 99th percentiles is valuable, as you can see events that would make your users unhappy. They're a very tangible set of metrics.
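If your load tool dumps the raw per-request latencies, pulling those percentiles out takes only a few lines. A minimal sketch in Go, using nearest-rank percentiles and made-up sample data:

  package main

  import (
      "fmt"
      "sort"
      "time"
  )

  // percentile returns the value at the p-th percentile (0-100) of an
  // ascending-sorted slice of latencies, using the nearest-rank method.
  func percentile(sorted []time.Duration, p float64) time.Duration {
      if len(sorted) == 0 {
          return 0
      }
      idx := int(float64(len(sorted)-1) * p / 100.0)
      return sorted[idx]
  }

  func main() {
      // In practice these would come from your benchmark tool's raw output.
      latencies := []time.Duration{
          9 * time.Millisecond, 11 * time.Millisecond, 10 * time.Millisecond,
          12 * time.Millisecond, 95 * time.Millisecond, 13 * time.Millisecond,
      }
      sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })

      for _, p := range []float64{50, 90, 99} {
          fmt.Printf("p%.0f: %v\n", p, percentile(latencies, p))
      }
  }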


He's a fantastic software engineer but he is _very_ abrasive. I've read some of his posts claiming he would fight some other developer in person at a conference if he steps up. He would rent out a ring and he would put his yellow belt to practice.

Really, I'm not making this up. He sounds like a jerk to work with.


He's actually a really nice person. He helped me out (to him I was just some stranger on the phone) when I was trying to decide what to do with my career when I was in NYC.

Zed Shaw is probably one of the best people you can know in the developer community, a very good guy.

Your sensationalism based on some of the stuff he says on Twitter and Blogs is amusing though.


> Your sensationalism

well he doesn't try to be a nice guy in public...


He is actually a very nice person to be around and work with. Besides being professional and experienced he is also outspoken with a firm opinion.

You might think that's being a jerk, I think that's honest and reliable.


I am among those appreciative of his contribution to my education. At the same time I wish he'd let the word 'fuck' regain its undoubted impact by not using it on what seems to be every conceivable occasion (excuse the pun).


He's just a bit too honest and direct for the average American.

From that perspective at least, I expect people like Zed to do great in Northern Europe. There's a reason Americans think DHH is rude. He isn't; he's just Danish.


It seems like some context might be missing here.


Context: http://web.archive.org/web/20080103072111/http://www.zedshaw...

" I mean business when I say I’ll take anyone on who wants to fight me. You think you can take me, I’ll pay to rent a boxing ring and beat your fucking ass legally. Remember that I’ve studied enough martial arts to be deadly even though I’m old, and I don’t give a fuck if I kick your mother fucking ass or you kick mine. You don’t like what I’ve said, then write something in reply but fuck you if you think you’re gonna talk to me like you can hurt me."


He's a lifestyle ranter. He used to have a section of his website called /rants/ which you can still read here: http://web.archive.org/web/20080105054424/http://www.zedshaw...


Seems like a bad case of ADD :\


Ha! Gotta love Zed


I'm torn about this. On the one hand, while I've known my framework of choice (Rails) is slow, I didn't know how much slower it could be in the grand scheme of things. But on the other hand, I'm more shocked by the difference between EC2 and dedicated hardware (10x improvement with rails), and even 89 requests per second (20 query benchmark on EC2) is still a decent amount of traffic. (Plus this doesn't count any optimizations I would make anyway, like caching).

Either way, good architectures usually move the high-traffic or high-CPU areas away from a scripted language anyway.

Thanks for the really informative post! Go seems to be a good balance as a high performance language without having to go back to my traumatic Java days.


Don't be too disappointed about Rails.

As a rule I like to divide this world into "Featureless" and "Featurefull" products.

When you use Rails, you're aiming to pile up features. You want to react to product managers and to users, you want to work fast and satisfy the needs of customers - or else you won't have anyone to build for.

In this reality, the fact that you're doing 20req/s is OK. In fact, I'm betting that even when you take Go or Node.js - pile up all of the infrastructure and features that exist in Rails, and pile up a ton of your code - buggy and not buggy - you'll get around the same kind of satisfaction index from users.

This is because your product can be perceived as slow even though your servers are blazingly fast.

On the other side of the spectrum there are "Featureless" products. These are infrastructural products. A logging service. An analytics service. A full-text search. A classification and recommendation engine.

These you don't want to build in Rails. I'm sure you haven't even considered it. These you want to build with one of the top-notch libraries that this survey indicates.


Also, there are certain deployment options for Rails, like Thin or Unicorn, that can drastically increase your overall performance. So in that sense, I think it's a lot more complicated to determine.


Thanks for the feedback, atonse! We had a great amount of fun putting this together, as you can imagine.

I agree, a remarkable take-away for us was how dramatically our i7s excelled over the EC2 instances. Admittedly, those were EC2 Large and not Extra Large instances.

A previous draft of this blog entry had a long-winded analysis of hosting costs--discussing the balance between ease and peace-of-mind provided by something like AWS versus the raw performance of owned hardware--but we elected to remove that since it wasn't really the point of the exercise.


Were these the new 2nd-generation large instances or the original ones?


They were m1.large.


EC2 was a hard platform to test on, only because our i7 hardware would give us results fairly instantaneously, but we became impatient when we had to wait upwards of 10x as long for the data on the EC2 large instances.

We're actually very interested in how the large/newer instances perform.


I feel obliged to ask what constitutes 10x of instantaneous.


I worked with Pat (pfalls) on this effort. He pulled the benchmarks together and built the script to automate the tests. We aimed to deploy each framework/platform according to best practices for a production environment and then stress-test common operations: JSON serialization of objects and database connectivity. We were surprised by the wide spectrum of performance we observed and hope that this is interesting to you as well. Four orders of magnitude in one of our tests!

If you have any questions or see something stupid we did, please let us know. We'd like to correct any mistakes straight away, especially since we're certainly not experts on all of these frameworks and platforms.


I'm no expert, but I think certain languages/frameworks are better suited to be behind certain servers when high concurrency is tested. E.g. from http://nichol.as/benchmark-of-python-web-servers it seems django would be better served behind gevent.


There are a lot of variables and tweaking that can be done, and it would be nearly impossible to optimize each.

Similarly, I was wondering what sort of an effect connection pooling would have, as the out of the box django distribution doesn't do that. It really didn't perform too well in their tests.


At LEAST gevent with session write-through caching, psycopg2pool, PostgreSQL (hello, excellent South support?), no unnecessary middlewares or applications that rely on them (if it's a speed-oriented use of Django, we're hosting on a specific API sub-domain, right?). At most, THEN you tune the settings to have an optimum number of Postgres threads staying alive and tweak some gunicorn/nginx max-connections parameters for your site. If running all locally, use UNIX sockets. This article is trash when it comes to providing any useful data, other than Django that's barely been configured beyond not using SQLite (and who the hell uses that in production?), so I don't buy their "oh well, we just wanted to see what it'd do out of the box" rhetoric. Might as well benchmark ./manage.py runserver. I wish they'd done it right or not published, let alone published to shill their company that doesn't provide what they advertise.


Thanks for your pull requests, knappador. We will try to get a revised post out in roughly a week or so with as many of the tweaks we've received (as pulls and tips) as we can muster.


I'd be interested to see performance for Vert.x on its other hosts (this is the JVM version, I believe).


I think you may misunderstand Vert.x's polyglot features:

While vert.x supports many programming languages, all of these are run on the JVM runtime. This means when you use the ruby vert.x API, you're using JRuby; likewise with Javascript run through Rhino, Python through Jython, and Groovy/Scala run through their own interpreter/compilers.

That said, it would definitely be interesting to see the performance implications of using one of those languages and vert.x on the JVM.


Interesting point. You're correct, we've only tested Vert.x as a Java/JVM platform.


I agree, seeing Vert.x with its other language options would be interesting.


Rhino with RingoJS would be another good JVM-based test.


Erlang would be nice to see, with, say, Chicago Boss.


Agreed. As you can imagine, we had to stop adding additional frameworks somewhere or we'd never get this posted. :)


Since these tests seem to be all about JSON serialization, it would've been interesting to see the tests with rails-api instead of the standard Rails stack:

https://github.com/rails-api/rails-api

What webserver were you using on JRuby? Was it Trinidad? Did you try Jetpack?


I concur, and Rails 4 may not be officially released yet but it's stable enough to run these tests against.


rails 4 with some concurrency would be interesting to see


Where is ASP.Net MVC? Odd that you list obscure frameworks like Wicket and leave out one of The Big Four frameworks.

(The big four in my book are: ASP.Net MVC, Rails, Django and CakePHP)


We'd love to have ASP.Net MVC included. One minor gotcha is that to do it justice, we'd need to spin up a Windows EC2 instance and figure out how to script that. It's on our to-do list!

We did briefly test ASP.Net on Mono (see another comment in this thread) but didn't include it since we didn't believe that qualifies as a "production" grade ASP.Net MVC deployment.


You should include it; Lots of shops are deploying their products on .NET MVC with Mono.


I agree... Let's see what the numbers show for Mono, and on Windows.


Just use AppHarbor.


I came here to say this. I don't understand why some of the development community likes to act like .Net doesn't exist....

It pains me to see charts and reporting done like this while leaving out my favorite framework.


Since these benchmarks are so wide-ranging, I agree. But that means setting up an entire new testbed on Windows, and then trying to make it comparable to the other platform testbed; possibly tuning. You need a Windows expert to do this. My question is, why aren't Windows experts setting these up?


If you can cover ASP.Net MVC, then I'd recommend including ServiceStack. My own tests of their JSON implementation have shown it to be 5-10x faster than ASP.net MVC.

Drop me a line if you need a hand with either :)


Just curious, but is CakePHP seen as a major framework by people outside the PHP Community?


Big four without any Java framework, but with .Net one? And CakePHP as a major one, I wasn't aware it's still used.


I'd like to see how .NET MVC would compare. I realize you'd have to spin it up on a Windows EC2 instance and there would definitely be some variance in the performance of that box vs. the *nix EC2 instances, but I'd still be interested in seeing how it fares in comparison.


We agree. We aim to provide a .NET MVC test soon. We did briefly test that on our i7 hardware and, if memory serves me correctly, it clocked in at around the same position as Spring.

But don't quote me on that! :)


I would very much love to see:

(win | mono) + (httphandler | asp.net mvc | webapi | servicestack | nancyfx)

it would hopefully compare with java stacks!


Yep, would also be interesting to see Web Api, hosted on Windows and on Mono.


I would also suggest ServiceStack


And synchronous and asynchronous (using C# 5 async) versions.


I think you're about right on that one. Of course we were running on mono.


> This exercise aims to provide a "baseline" for performance across the variety of frameworks. By baseline we mean the starting point, from which any real-world application's performance can only get worse.

I disagree with the implication here (that this is a good point for comparison because "real-world application's performance can only get worse."). Yes it can only get worse but how much worse (per unit of "features") is both significant and unaddressed.

This isn't the best example but look at the gap between the top and bottom of the scale in the Database access test (single query) and Database access test (multiple queries) charts: In the first, Gemini is ~340x faster than Cake, in the second, only ~23x faster. There is still a big gap but it closed by an order of magnitude once you stepped past the most trivial possible DB access test.

So nodejs or php-raw is faster than cake at a single DB access, but what about when you create a real world scenario with authentication, requirement to be able to update features faster (i.e. use an ORM), env. portability requirement, etc.? It seems to me this would look like a little slower, a little slower, a little slower in the {raw} versions, and already included, already included, already included in Rails or Cake. The full featured frameworks take a lot of their performance penalty up-front, with less of a hit as features are added (maybe? :P).

My point is that it's not reasonable to assume that hackernews-benchmarks will actually reflect production use. That said I think the article is cool, and agree that it's good to keep framework authors' feet to the fire regarding performance!


This is completely loaded. Your implication is that the only viable test is a test which exercises all of the functionality of the most feature-rich framework. How would that be a) viable and b) meaningful?

We know that there is a set of common features, and the benchmark's goal is to test least common denominator stuff on the networks. Authentication and portability are not LCD. The argument that they are is capricious. What if we made the requirement be that the framework is a Lisp? Now we've completely changed the intent.


I meant to suggest that comparing php-raw to Rails is apples & oranges- not "you must benchmark in a way that benefits larger frameworks", just "please acknowledge that LCD tests like this inherently cast Railsy frameworks in a bad light."

It's like condemning a swiss army knife because it's not as efficient as a fixed blade at cutting apples. Well yeah that's true, but what about when you need to screw a screw or pull a cork? One is a multitool, it doesn't make sense to compare it to a specialized tool unless all you plan to do is cut apples.


Would've loved to see http://servicestack.net on this list which has great performance on .NET and Mono: https://github.com/ServiceStack/ServiceStack/wiki/Real-world...

And also maintains .NET's fastest JSON and Text Serializers: http://theburningmonk.com/2011/11/performance-test-json-seri...


Thanks for the tips on these. We'll add ServiceStack to our to-do list. As you might imagine, that list is getting long as a result of some great community feedback. Pat (pfalls) is diving into the pull requests this morning.


pfalls - amazingly, I spent the last 2 days of my holiday doing the same thing for a future open source project. I was just stumped when I saw you guys did the same (could have saved me a couple of days!)

I wanted to find the leanest Web framework on any kind of platform; but the difference from your approach - I already knew the kind of code that would run on it.

I tested: Go, Java (servlet, dropwizard), Scala (scalatra), Ruby, Node.js (connect).

For me it was:

* Scala

* Java

* Clojure (equal to Java - big surprise here)

* Node.js

* Go (almost equal to Node.js)

* Ruby (far far down)

Scala took the lead with amazing results. Moreover, a good metric was latency, where Scala was the only one to reach microsecond resolution.

I'm not a fan of Scala because of its surrounding tools, which is why I'm still considering going for either Clojure or Node.js.

I think the most positively surprising was Clojure, given that it is a dynamic language. The most negatively surprising was Go - by itself it is impressive, but when given real work (web handling, Redis/MongoDB) it goes bad quickly. Happy to see this correlates with your findings too; I'm assuming this is a symptom of library maturity..?

I'd be happy to see how Scala fares on your tests.

You've done an awesome job!


Thanks for the comment!

This started out as a small exercise, that quickly ballooned because we were curious about every framework and platform. Obviously we had to stop somewhere, but we're very interested in adding more tests in the future. In fact, we're hoping the community will help us out as well!


Yes, I know the feeling :)

What started as a couple of hours of exercise for myself ended up as 2 days of hacking and barely sleeping, as surprises in my assumptions kept unfolding, and as I wrote and rewrote POCs just to validate that Clojure is as fast as the numbers say, and that Scala is faster than Java, etc.


Dropwizard looks awesome, just the kind of project I've been looking for. It also links to JDBI, which is very similar to a SQL lib I've maintained for myself all these years. Thanks for posting!


Would you consider releasing some of the benchmark code/setup? (better yet integrated into OP's project) I am very interested in seeing clojure's performance. What kind of framework did you use for clojure?


Scala really would have been nice. Especially a Scala/Akka/Spray combo :)

I'm working with a setup like this and just love it!


1) The Python version has some basic newbie coding errors. This sort of code is what Python programmers call "Java written in Python". It may be a valid algorithm in Java, but it's the wrong way to do it in Python. Code like this will work, but it will be slow. Depending on the size of "queries", you are potentially allocating gobs of memory in two different places for no reason, and then throwing it away without using it. I wouldn't be surprised if the examples in other languages had similar problems.

2) The JSON serializer in Django 1.4 uses a method which is known to be very slow, but which is easily portable across different platforms and works with older versions of Python. They no doubt included for easy bundling. In a real application you would probably want to simply use the normal JSON serializer from the standard library (which is many times faster).

3) The examples are little more than "hello world". I did some benchmark tests with several Python async frameworks, Pypy, and Node.js for an application I was working on. With small JSON objects there wasn't much difference in performance. Once you started using large JSON objects the performance lines for all versions were indistinguishable from each other. The performance bottlenecks were in libraries, and those standard libraries were all written in 'C', so interpreter versus compiler versus JIT made little difference.

4) The problem with "toy" examples is that in real life there are two performance factors which must be taken into account. Think of as y = mx + b. With a toy example you are probably only measuring "b". With most real life applications it's "m" that matters. There are often different optimization approaches that are best for varying ratios of "b" and "m". You have to know your application intimately and benchmark using data which is realistic for that application.

Python has a reputation for being "easy to learn". However, it is "easy" in the sense of being able to hack something together that works without knowing very much. There can be several different ways of doing things and doing it one way versus another way can mean a difference in performance of several orders of magnitude. The same may be true for some of the other languages, but I haven't examined them in enough detail to say.


Numerous irregularities plus a strong vested interest in the JVM make me doubt they have given adequate shrift to Go, here.

Given the amount of interest in Haskell and Yesod around here, it is strange that it is missing.


Could you elaborate? Their website seems to indicate that they are a very polyglot shop, not someone pushing a JVM agenda:

"On the back-end, we use Java, Ruby, Python, .NET, PHP and others based on what makes sense balancing server performance, scalability, hosting costs, development efficiency, and your internal development team's capabilities."


"We have included our in-house Java web framework, Gemini, in our tests. ... "

Then you see results for some languages that are completely out of whack with most other such benchmarks (they themselves mention the weirdness of Sinatra vs. Rails, for example).

Then you see on a couple platforms that more performant mainstream options have been excluded, for no good reason.

Then if you look at the repo, there are deployment choices and code mistakes in some of the other languages which go well beyond elementary incompetence...


Feel free to issue a pull request to help our testing. We are not trying to push Java-based frameworks over any others and we believe we are being fair across the board. That being said, if there are "code mistakes... which go well beyond elementary incompetence", then we would love to correct and retest these.


Maybe you should've had the controllers execute raw SQL for a better comparison. I see you executing a regular query in Rails whereas your Java servlet is using prepared statements.


We love Go here! But admittedly, we have not yet deployed an actual Go web application to a production environment, so the tests demonstrate our first attempt at creating a Go production environment. We based the approach on whatever material we could find on Go reference sites.

That said, we'd love to hear what we did wrong in the Go tests so that we can fix those up.

We'll be posting follow ups as we've had a chance to go through all the recommended tweaks.


I hope no one contributes a Haskell solution to this farce.


>Given the amount of interest in Haskell

I see more people making uninformed "haskell sucks" posts than expressing interest in it.

>and Yesod

Really? Yesod is the anti-haskell haskell framework.


> Really? Yesod is the anti-haskell haskell framework.

Can you please elaborate? Being interested in Haskell web development and trying to choose a web framework makes me wish for more information.


I wouldn't call Yesod "anti-haskell". By default, it relies on QuasiQuotes and TemplateHaskell a lot [1], which are extensions to GHC. So by default, you'd have a hard time running Yesod applications on anything else but GHC (the Glasgow Haskell Compiler). These extensions allow you to write in an EDSL that generates Haskell for you. IMO, Yesod's use of these extensions is a benefit, as it allows the user to get stuff like type-safe URLs in HTML for free (e.g. you put href=@{Home} on your HTML element and Yesod will ensure that the value interpolates to a route that exists at compile time).

Haskell libraries often depend on language extensions, whether it is overloaded strings or type families or whatnot... so I think it's strange that Yesod gets picked on for doing the same: taking advantage of the tools provided by GHC to create a better environment for the developer.

[1] http://www.yesodweb.com/book/haskell#template-haskell-14


Hugs is now defunct (last release in September 2006, doesn't even support the 2010 language standard), so there is no reason that being GHC-only should be a consideration in selecting a Haskell framework. It's the only real option.


I'm guessing, but I think it's because yesod uses a lot of magic such as templates and the like. The other frameworks like Snap use more idiomatic Haskell.


Yesod is designed to try to replicate rails style frameworks, which as an approach, doesn't work well in a static language. It is also designed to try to hide any traces of haskell. Rather than provide a framework to write haskell code in, they use quasi-quotation to provide a bunch of totally different syntaxes for different parts of the app, which get compiled to haskell behind the scenes. Most haskell users prefer to write haskell rather than specialized, single use languages with limited functionality and poor error reporting.

Then on top of that, the marketing behind yesod is essentially deliberate mis-truths that suggest weaknesses in haskell which do not exist. See how pilgrim689 thinks that the EDSL yesod uses for routing "gets you type safe urls"? Type safe urls are also available in happstack and snap, but written in haskell rather than a weird custom pre-processor language. The EDSL is just giving you a different syntax (and making error messages complex and hard to understand), not giving you the type safety. Pointing out that creating custom languages that are inferior to haskell and have no benefits is bad results in whining of "stop picking on yesod just because we use extensions, everyone uses extensions!", despite the use of extensions never being brought up.

As for more information about web frameworks in haskell: I've tried all three and can give you my thoughts. I ended up using snap, so consider me biased when reading this. Yesod is rails-like in that it pushes a misinterpretation of MVC on you that encourages writing redundant code. Happstack and snap aren't really frameworks in that sense, they don't say "give me some code following my conventions and I'll run it", they say "here's how you get access to the request, have fun". More like libraries than frameworks.

Yesod's DB access layer they provide is of the "dumb everything down to the lowest common denominator" variety, except that it adds even more limitations beyond that. So you end up having to use something else that is not integrated at all. Happstack and snap don't provide a DB access layer, but do provide integration with several DB libraries off hackage (hdbc, haskelldb, postgresql-simple, acid-state).

Happstack has the best documentation of them, and it and snap are very similar design wise. Porting an application from one to the other is pretty straight forward. The only reason we settled on snap instead of happstack is that snap includes a development mode that works well, and happstack does not. Meaning with snap you just change your code and it picks up the changes, recompiles and reloads it automatically, and shows you any errors in the browser when you refresh. With happstack you either need to work out your own way to deal with that, or keep recompiling manually all the time.


You and tikhonj are all over the place in here. If you were being downvoted into the gray, you would know. There is no shortage of people praising Haskell every day, this is what is in fashion today.

There are also regular posts about Yesod.

I conclude that you know perfectly well that Haskell and Yesod are regularly mentioned on HN, but find it inconvenient to have mentioned for some reason I do not fathom.


> There is no shortage of people praising Haskell every day, this is what is in fashion today.

I really don't like this characterization of interest in Haskell. It implies that it's no different from any other language and is just arbitrarily picked up because it's trendy. Learning Haskell is a very substantial investment of time and effort, it is very different from languages that most programmers have used before. It practically tells people “don't even try to like me, I'm high maintenance.”


>You and tikhonj are all over the place in here

I am all over the place in here for the exact reason I mentioned. Go look at my posts, for every post about haskell by me, it is in response to someone posting some absurd nonsense like "haskell can't do real world" and "functional programming is great except you can't really do it because state". If people were interested in haskell, they would express interest, not strawman dismissals.


But that is like claiming nobody wants gay marriage because look how loud those Westboro people are screaming.

I'm interested in Haskell. I find it to be frustrating sometimes, and sometimes I vent my frustrations. It is hard to learn. But out of all the opinionated languages out there, Haskell is the one that I agree with the most.

There plenty of people here that are obviously interested. Why does it matter that the naysayers say nay?


>But that is like claiming nobody wants gay marriage because look how loud those Westboro people are screaming.

That analogy would only be accurate if those Westboro people were in the majority.

>Why does it matter that the naysayers say nay?

I'm not sure how to answer this, given the context. I simply pointed out that I don't think the idea that there's a lot of interest in haskell here is accurate, and cited all the uninformed crap spewed about haskell all the time as evidence.


Would love to see how these results compare to some of the web frameworks for concurrent functional languages like Erlang/Haskell: Nitrogen, Chicago Boss, Snap, Yesod, etc.


Please don't encourage them. Even if Haskell comes out on top, I would still be unsatisfied because the rest of the benchmarks are unfair. Lies by confusion are still lies.


ditto. I hear Warp (the server behind Yesod) is a beast.


I'm not familiar with Warp. Would one of you guys be willing to help us put together a test for Yesod?


#haskell, #yesod, #snapframework on freenode are very helpful.


Right, where is Zotonic? It's actually focused on speed/performance.


This is exactly why I decided to use PHP for my startup. I have something along these lines that I hope to blog about in the coming weeks (I tested PHP-FPM on nginx, Go, Node.js, and Silk.js, and PHP won by a landslide when it came to speed).

I would love to see php-fpm on nginx included in this test.


When it comes to speed, considering you are using nginx, the way to go is using nginx as an app server, not just a FastCGI frontend. Lua-nginx-module combined with a proper database module (async, with connection pool support, like ngx_drizzle or ngx_postgres) can give you speed. OpenResty provides a simplified, preconfigured way to try it and adds some features too. http://agentzh.org/misc/slides/libdrizzle-lua-nginx


The problem with php is that it looks great on (some) micro-benchmarks, but on real apps under real sustained load it certainly turns to cold dog shit from time to time for no apparent reason.


What are you basing this on? I've been using PHP for well over a decade in high load environments and never experienced it turn "to cold dog shit" .. any issues I have experienced had a good reason, not "no apparent reason".

But then, I've never used a PHP framework in all the time I've used it .. maybe that has something to do with me never having had negative issue with PHP.


As a Rails developer and admirer, this is eye-opening. I love the framework (and Ruby especially), but these numbers bear some serious consideration.

30-50x performance difference gets really... real, no? The standard refrain of "throw more hardware at it" must reconcile with the fact that a factor of 30-50x means real dollars for the same amount of load. Is the developer productivity really that much greater?


Preface: This post is going to come across as a Rails apologist piece, but please read the entire thing before you reach a conclusion. Please also consider that you could apply these same arguments to just about any of the high-level language based frameworks on the list. I use Ruby on Rails in my comparisons, but I'm a huge fan of Node.js, Python/Django, and Go.

I fully respect the JVM family of languages as well. I just think that Mark Twain said it best when he said: "There are three kinds of lies: lies, damned lies, and statistics." It's not that the numbers aren't true, it's that they may not matter as much, and in the way, that we initially perceive them.

Performance is certainly something you should consider when selecting a language/framework, but it is not the only thing.

========================

You should undertake a detailed examination of these statistics before making any decisions.

Issue #1) The 30-50x performance difference only exists in a very limited scenario that you're unlikely to encounter in the real world.

Look carefully at the tests performed. The first test is an extraordinarily simple operation: take this string, serialize it, and send it to the client. This is the test in which we see massive differences:

Gemini vs Rails

25,264/687 (gemini/rails-ruby) = 36.774

25,264/871 (gemini/rails-jruby) = 29.000

Node.js vs Rails

10,541/687 (nodejs/rails-ruby) = 15.343

10,541/871 (nodejs/rails-jruby) = 12.102

That's a 37x performance win for Gemini, and 15x for Node.js.

Side note: You might be wondering why I didn't compare to the top performer, Netty. Netty is more like Rack. You build frameworks on top of Netty, not with Netty. As a Ruby dev, you could think of this in the same context as comparing Ruby on Rails with Rack; not a good comparison. Hence we won't compare Rails to Netty.

The error would be in extrapolating that a move to Gemini or Node.js would give you a 37x or 15x performance increase in your application. To understand why this is an error, we jump down to the "Database access test (multiple queries)" benchmark.

Issue #2) Performance differences for one task don't always correlate proportionally with performance differences for all tasks.

In the multi-query database access test, we start to see the top JSON performers slow down significantly when compared to the slow down for Rails:

Gemini vs Rails

663/89 (gemini/rails-ruby) = 7.449

663/108 (gemini/rails-jruby) = 6.138

Node.js vs Rails

116/108 (nodejs-mysql-raw/rails-jruby) = 1.074

60/108 (nodejs-mysql/rails-jruby) = 0.555

In this scenario -- which is arguably much closer to the real world -- Ruby on Rails closes the gap and even beats some of the hip new kids.

But why? The in-depth answer to this question would require a lot of space, but the really, really short version is kind of a "what's the sound of one hand clapping" response: Ruby isn't actually all that slow.

To understand what the hell that means, check out this presentation from Alex Gaynor (of rdio/Topaz fame):

https://speakerdeck.com/alex/why-python-ruby-and-javascript-...

Ruby is just about as fast as C, provided you're comparing it to C that does exactly the same operations on the hardware as the Ruby code. Don't get me wrong, that's a HUGE provision. But it warrants close examination.

The real benefit of lower level languages like C is that they give you the flexibility to drill down in to your actual bare-metal operations and optimize the way the program executes on the hardware. As Alex points out, we don't currently have that level of flexibility in languages like Ruby (without dropping down to inline C), so we suffer a performance penalty.

This penalty is huge for simple tasks because they involve only a handful of operations that execute extremely quickly. As you add complexity, however, the benefits of micro-optimizations get lost in the vastness of the overall execution time.

Look at it like this. When Gemini hits 36,717 req/s in the JSON test, each request only lasts about 1.6 ms. This is only possible because of the simplicity of the operations being done on the hardware. Ruby loses big here because there is a lower boundary to the way you can optimize without dropping down to C.

gemini: 1.6 ms per request

rails-ruby: 87.3 ms per request

When we look at the multi-query database access test, we can see how the optimization at the low level gets lost in the sea of time taken to process the request.

gemini: 90.5 ms per request

rails-ruby: 674.2 ms per request

Granted, that is still over a 7x performance win for Gemini, but this is where the Ruby arguments about programmer efficiency come in to play. I don't know Gemini, so it may very well beat Rails in that comparison too. Ruby is getting more performant with every release though, so it's easier to justify on the basis of preference alone when we're this close.


Don't conflate Ruby with Rails. Ruby _is_ slow:

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

and so is Python:

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

The only time Ruby or Python are fast is when the program is not running Ruby or Python but running some C code underneath. If your programs only consist of that, you can very well say "this ‘Ruby/Python’ code is fast". But as soon as you have something that isn't in your standard library, welcome to the actual language, and welcome to performance problems.

Elaborating on the implications of this: whenever you actually _use_ the language to do some abstraction, you pay heavily for it: http://bos.github.com/reaktor-dev-day-2012/reaktor-talk-slid...


You should really check out Alex Gaynor's slide deck. Nothing I said disagrees with what you've said here, provided you take the entire thing in context.


That's a very detailed and thoughtful response. You make some valid points. Maybe I'll try to craft some more complicated benchmarks that replicate normal CRUD operations found in most webapps.


That's kind of missing the point. In statistics, there exists something called confounding variables. Confounding variables are factors that affect your outcome, but are hidden, or are outside your control. As your example becomes more complex the opportunity for confounding to impact your result goes up significantly.

I believe the multi-query database access test is actually a good example of a "complex enough" test, but not too susceptible to confounding. In this test, we see that Rails isn't so far behind.

Basing your choice of framework on speed alone is a pretty bad idea. When you select a framework, you need to optimize for success, and the factors that determine the success of a project are often less related to the speed of a framework and more related to the availability of talent and good tools.

That's not to say you should ignore speed entirely, but that you have to weight your factors. There is a tendency to believe that you will need rofflescale when you really won't. Keep that in mind when you're weighting your factors.


Really depends on which kind of app you're working on. My main work app is 99% cached content so it would probably work just fine with almost anything. Developer time is certainly the biggest expense in my case so high-level it is.


It comes as a surprise to me that this comes as a surprise to you. Really, you didn't know Ruby is pretty much as slow as it gets?


How about Lift? Btw, is the Play framework you tested Java or Scala based?

Either way, I'm shocked to see Play perform so slow comparatively. Although it's easily 10x faster than rails on most tests, I'm shocked to see Node.js faster than Play! (by 2x in most cases) Wow!!

Maybe Node.js critics should start appreciating it after all..


It is probably worth noting that, while we strive to make the tests as fair as possible, we followed the official tutorials for each framework when building out the tests, so we fully expect there are instances where minor tweaks would improve a given test. Given that I am no Play expert, it would be of great value to have one who is (and it sounds like you could lend a hand there) check out the code on the Github page. If we did anything wrong with the setup or in general, we will rerun the tests. Again, we followed all the official 'getting started' posts for each framework, so we believe best practices were used.

Disclaimer: I am a colleague of the author of the linked article.


There are a few problems with your Play code that are causing it to be unnecessarily slow.

First-- what you're really testing here is the Jackson library. A majority of the cycles used in your application are being burned in that toJson call of an array of objects. This isn't a fair test compared to the servlet implementation because you're calling Jackson against a map in the Play example, versus against a simple String in the servlet example.

Second-- you are running database calls serially, and those are blocking. Considering that you're using the more-or-less default Play/Akka configuration, there are only enough threads as you have available processor cores. I would start by increasing the parallelism-factor and parallelism-max to be higher, so you'll have more available threads. More importantly though, the database access should be wrapped in a Future, and you should be returning asynchronously. This should speed up the application by a huge amount.
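
Roughly what I mean, sketched against the Play 2.1-era Scala API from memory (World.findRandom and the JSON writer are stand-ins for whatever the benchmark code actually uses):

    // conf/application.conf: give the default dispatcher more threads, e.g.
    //   play.akka.actor.default-dispatcher.fork-join-executor.parallelism-factor = 4.0
    //   play.akka.actor.default-dispatcher.fork-join-executor.parallelism-max    = 64

    import play.api.mvc._
    import play.api.libs.json.Json
    import play.api.libs.concurrent.Execution.Implicits._
    import scala.concurrent.Future

    object Application extends Controller {
      def db(queries: Int) = Action {
        Async {
          // run the blocking JDBC work off the request thread and
          // complete the response when the Future finishes
          Future {
            val worlds = (1 to queries).map(_ => World.findRandom()) // blocking call, stand-in name
            Ok(Json.toJson(worlds))                                  // assumes an implicit Writes[World]
          }
        }
      }
    }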


Do you think such a configuration could outrun the Vert.x configuration they've posted? I'm not challenging you, I'm just genuinely curious! Because if Play+Akka can outrun Vert.x, then it would be an interesting game altogether...


I think Play, with well-written asynchronous code, could approach the Netty/Vert.x speed. In other words, I'd be willing to trade the ease-of-use of Play for the slight speed impairment vs. writing directly to Netty/Vert.x/etc.


We'd love to test that theory. Can you or any Play expert rewrite our Play code and submit a pull request?


Not sure if your theory can really account for the 7X difference between Play and Netty in the JSON test. They both create a trivial name/value pair and pass it to the JSON serializer. In Play's case, it gets passed to their Jerkson wrapper for Jackson. What would you even change about that code?

(Note: I'm not necessarily saying the Jerkson wrapper is the culprit. Could be Play's routing framework, or something else.)


The problem is that the database queries were done in a blocking fashion. The test essentially blocks the main event loop which is of course going to kill performance.


This seems to be a nice benchmark. For the Python group, I would suggest two things: (1) include a lightweight framework like Bottle, and (2) Try pypy.


Thanks. Agreed, we'd love to get a Python micro-framework in the test. If you have some free time and feel like putting together a test for Bottle as a Github pull request, we'd really appreciate that.


(3) use bjoern

seriously, in my rough hello-world and SQLite increment-by-1 benchmark, a bjoern + WSGI app runs 2x as fast as node.js.
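
For reference, the bare WSGI hello-world side of such a test looks roughly like this (a sketch; check bjoern's README for the exact run() signature):

    # hello_bjoern.py -- minimal WSGI app served directly by bjoern
    import json
    import bjoern

    def app(environ, start_response):
        body = json.dumps({"message": "Hello, World!"}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    bjoern.run(app, "0.0.0.0", 8080)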


I'd be curious about a gevent-worker-driven server, too. Presumably they're running Django on either a thread- or process-driven concurrency server, and gevent can show some pretty major gains. You could also do permutations of these: gevent on pypy (using pypycore), etc.


I'm most surprised that PHP seems to be around an order of magnitude faster than Ruby on Rails, I knew it was faster, but didn't think it would be that much.


The comparison would be php-raw vs ruby. The actual "php" benchmark did just as poorly as rails.


What kind of server did you guys use for your rails test? Thin, Puma, Unicorn? Are you sure you ran it in production environment?

Update:

Looks like passenger in development mode. Good job: you benchmarked a web server that no one uses while reloading all code between requests.

Update2:

Ok it seems to run in production mode but still, passenger is not an idiomatic choice.


For the sake of curiosity, I happen to have recently done a benchmark of a "hello world" rack app (literally just responds "hello world" to every request) on a number of Ruby servers (mostly JRuby, but also Puma on MRI).

They were all run in production mode with logging disabled, etc.

http://polycrystal.org/~pat/scratch/microbenchmark.png

Note that a difference of 10k requests per second seems huge compared to 3k, but if you invert it, you get 100 and 333 microseconds per request, respectively. In a real, non-"hello world" app, these differences are going to be negligible.

Though perhaps it would be more interesting if instead of just responding "hello, world", the app parsed some query parameters or something. But I was mostly interested in the overhead of different JRuby servers, not comparing different app servers (i.e. overhead from Sinatra should be more or less identical whether you're on Puma, Trinidad, or whatever).


Sheesh, you can clearly see the GC pauses in the Java versions.


We used Phusion Passenger, although we have plans to add additional servers (such as Unicorn). We tried to spend time with various server choices for all platforms, and for ruby, in our short test, Passenger won out against the others.

Our understanding is that when running Passenger, simply passing '-e production' to the command line is sufficient to run in production, but if that's incorrect, we'll gladly update the test.


Please make sure you're setting higher GC limits for the Ruby tests. Ruby's defaults are awful for a framework, and result in a LOT of GC thrash. It's not uncommon to see an order of magnitude improvement in performance when they're tuned properly. (edit: I'll just send a pull request, I found the setup file!)

Something else you might consider is the OJ gem rather than just the stock Ruby json gem. The latter is notoriously slow and memory-hungry (which will compound the GC issues!)


> make sure you're setting higher GC limits for the Ruby tests

Could you elaborate on this, or point me in the right direction? I'm learning Rails and curious.


For anyone still lurking, this user replied to me via email:

Ruby allocates heaps for its objects, and sets GC thresholds based on those heap sizes. Ruby allows you to change those settings via environment variables, which means that you can end up doing fewer allocations and less aggressive GC, which makes sense when using a full framework like Rails, which is going to allocate a lot of objects.

There's a more complete answer here to get you started: http://stackoverflow.com/questions/13387664/ruby-gc-executio...
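
For a concrete illustration only -- variable names and sensible values differ across Ruby versions, so treat these purely as placeholders:

    # set before starting the app server; numbers are examples, not recommendations
    export RUBY_HEAP_MIN_SLOTS=800000     # start with a much larger object heap
    export RUBY_GC_MALLOC_LIMIT=79000000  # raise the malloc threshold that triggers a GC run
    export RUBY_FREE_MIN=100000           # keep more free slots around after a GC run

    bundle exec passenger start -e production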


Pretty sure passenger+nginx is the most common Rails deployment configuration. Not the fastest, but the most common, much like Apache is the most common web server for PHP.


If that's true, it's not really a fair comparison. Running in development mode slows things considerably. They do have a question in the FAQ about that point:

"You configured framework x incorrectly, and that explains the numbers you're seeing." Whoops! Please let us know how we can fix it, or submit a Github pull request, so we can get it right."

Perhaps you/we should submit a pull request?


Passenger wouldn't be my choice personally, but I don't think there's anything non "idiomatic" about it. Engine Yard, Cloud 66, etc. use passenger in their PaaS configs, it's been very popular (on the wane now, but still), etc. It seems fair enough and the differences aren't going to be the sort of order of magnitude change which would really matter on this sort of thing regardless.


Are you sure? This looks like the right file to me, and it says '-e production', last changed 6 days ago: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast... (unless that's an incorrect switch / something else overrides it - I haven't tried it, not too familiar with Passenger)


This (from [1]) looks like passenger in production mode to me.

> rvm ruby-2.0.0-p0 do bundle exec passenger start -p 8080 -d -e production --pid-file=$HOME/FrameworkBenchmarks/rails/rails.pid --nginx-version=1.2.7 --max-pool-size=24

https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


I'm not familiar with Rails. Could you explain what you mean in layman terms? Why is code being reloaded on every request by default?


In development mode Rails will reload code on each request to pick up on changes you have made to your application. That way you can interact with the application to help verify that your code is working properly.

As of Rails 3.2 development mode watches for file changes and attempts to only reload those files, but it's still a significant performance issue.

By default all Rails applications start in development mode, so one gotcha of benchmarking Rails is that some people will forget to set the mode correctly. That said, from the setup code[1] (line 14) it looks like they were running passenger in production mode. The max pool size seems excessive, especially when running on large ec2 instances, but I'm not fully convinced that it's out of line.

[1] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


In development mode, it's nice to be able edit ruby code, then hit reload in the browser to see it in action without restarting the whole app. Sure there are other ways to accomplish this, but reloading code is typically how it's done in Ruby.


The Rails framework is written to reload (reevaluate) most application code between requests in development mode, so that changes during active dev are reflected. This is not the (default) behavior in production.
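
Concretely, the switch is a per-environment config flag (Rails 3.x-era files, generated by the app template):

    # config/environments/development.rb
    config.cache_classes = false  # re-evaluate app code on each request

    # config/environments/production.rb
    config.cache_classes = true   # load code once at boot; no per-request reloading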


Interesting - C# should have also been added. Java still rules the roost. If you absolutely must have a scripting language, old PHP with its warts is better. More magic in the framework means more abstraction, indirection and code inefficiency - and that can be the spoilsport when it comes to performance.


We agree and want to get a C# test in there. It's among the top priorities for us.


What I take from this is that using an ORM slows things down considerably. Also, looking at the Rails example, you didn't use ActiveRecord, which is really wrong.

I think you should tweak your tests to use more real world like examples. I realize it would be hard to do this across frameworks.

For example, have a database query pull a user record out of 100,000 users by username, and maybe do an md5 on the password.


How can you write a web framework benchmark and not include some of the non-mainstream languages with probably the most performant frameworks, like Erlang (Cowboy, Mochiweb) or Haskell (Yesod, Snap Framework)? That's just wrong, anyway.


Oh... and you think Erlang or Haskell frameworks are mainstream while you fail to mention ASP.net?


There is a huge difference between raw PHP and CakePHP. I'd be curious to see other PHP frameworks (such as Zend, or Slim) in there-- is Cake just particularly slow, or is that simply what happens when you have a PHP framework?


In my experience, CakePHP is probably one of the slowest PHP frameworks. It has a large overhead and lots of legacy code that slows down the whole system. Generally, frameworks targeted at PHP 5.3+ are faster. PHP 5.4 gives a further performance boost in production, and PHP 5.5 has Zend Optimizer+ (OPcache) built in, which is supposed to speed things up further, but I have never tried it in production.


We have Zend in mind as the next PHP framework we'd like to include in the tests.


Use either Symfony 2 or Silex. There is really zero reason to use Zend Framework 2.


CakePHP is always at the bottom end of these "framework performance" tests, even when comparing PHP only. It's the everything-but-the-kitchen-sink of PHP frameworks, with bells and whistles never used by these "hello world" performance tests. Just the wrong tool for the job if you want to spit out a bit of JSON.


I'm curious if they configured php with apc or zend optimizer. The huge difference is easy to explain as parse overhead for the framework's code, which happens on each request if you're not using a bytecode cache.


Yep, this is exactly what I was wondering. Opcode caching is a key part of production PHP environments - in this case, since the code isn't changing, they should even disable the "change check" (apc.stat=0) as one would do in a production environment.

And if this is the case with their configuration of PHP, it makes me wonder what other platforms are not configured for production in this benchmark :)
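
For reference, the production-style opcode cache settings being described are just a couple of php.ini lines (APC-era, values illustrative):

    ; php.ini -- opcode cache in production
    apc.enabled = 1
    apc.stat    = 0   ; skip the per-request mtime check since deployed code never changes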


PHP's relation to everything else in this benchmark is so unusual among benchmarks that I strongly suspect a failure of parity with the other configurations. Hopefully that is unintentional.


Also, it would be beneficial to see Phalcon PHP - it's implemented as a C extension, so it should theoretically be faster. http://phalconphp.com


If we're going that route, the fastest PHP performance would probably come from Facebook's JIT-compiling HHVM engine: https://github.com/facebook/hiphop-php

It's no accident that zend optimizer (bytecode cache) is being bundled into php 5.5 as open source, as mere bytecode caching is now not fast enough to charge money for when hhvm is open source. I expect the next version of zend's commercial php server product to contain a JIT engine to match what hhvm can do.


That would certainly be very interesting, especially as PHP is being nudged into that area (Twig also now offers a c-module).

I really hope the author picks up on this and compares it. It could provide some tremendous insights into pushing PHP further into this space.


Utter crap.

"Sadly Django provides no connection pooling and in fact closes and re-opens a connection for every request. All the other tests use pooling."

But it's free, open-source software, and we provide asynchronous database connection pooling for PostgreSQL:

https://github.com/iiilx/django-psycopg2-pool


Your comment is valid and would have been vastly improved without the first two words.


I'll elaborate on the first two words:

They're shilling for their company with a config and tools I wouldn't be caught dead using. No idea what other craziness lurks in the other daemon configs. It's irresponsible and misleading. They're misrepresenting the framework I use to do my work and probably others while hoping I or someone else is going to do their work for them. "Outsourced CTO services?" Their trash. My lawn.

Someone's going to eventually ask me to develop the rest of xyz in node and I'll have to repeat myself about articles like this. Bad enough when it's bloggers. Worse when it's self-shilling company that's obviously not willing to put the time in to be what they claim to offer.


Obviously if they optimized each and every one of these benchmarks we would see different results, but it would take a massive amount of time to learn the ins and outs of each framework to the point where you can do so effectively.

For one person to do a benchmark across this many samples, they have to just go with the out-of-the-box setup for each.


This is really the point.

As our blog post suggests, where we are not experts we had to rely on the tutorials provided by each framework's authors to build a test setup. If a specific framework seems low on the list, it could be due to the fact that the best practices guides we found for getting set up were not correctly configured for production use.

Draw what conclusions you would like from this statement, but we did aim to be as fair and unbiased as possible.


We aim to do Postgres testing soon. As you can imagine, the feedback from this has been awesome. Looking forward to seeing Django on Postgres.


Great to hear you're planning to do some Postgres benchmarks!

To improve fairness, you might want to consider using pgbouncer (set up to offer only simple session pooling) in between the Postgres db and any framework that doesn't have internal support for connection pooling.

E.g. I'd love to see how Flask performs using just the psycopg2 driver (i.e. raw db access, no ORM) and pgbouncer to handle the pooling.
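
Something like this sketch is what I have in mind (the table name and DSN are assumptions borrowed from the benchmark's conventions, with pgbouncer listening on its default 6432 port):

    # Flask + raw psycopg2; pooling handled outside the app by pgbouncer
    import json
    import random
    import psycopg2
    from flask import Flask, Response

    app = Flask(__name__)
    DSN = "host=127.0.0.1 port=6432 dbname=hello_world user=benchmarkdbuser"

    @app.route("/db")
    def db():
        conn = psycopg2.connect(DSN)   # cheap: pgbouncer hands back a pooled backend
        try:
            cur = conn.cursor()
            cur.execute("SELECT id, randomNumber FROM World WHERE id = %s",
                        (random.randint(1, 10000),))
            row = cur.fetchone()
            return Response(json.dumps({"id": row[0], "randomNumber": row[1]}),
                            mimetype="application/json")
        finally:
            conn.close()               # returns the server connection to the pool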


I'm curious to see Django results when using the gevent worker for gunicorn. For these type of quick JSON calls, you can see huge performance increases.


Along with gevent, it would be good to throw in psycogreen to improve DB (well, postgresql) evented connections.
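
Roughly, the combination would look like this hypothetical gunicorn config (module path and worker count are assumptions):

    # gunicorn_conf.py -- run with: gunicorn -c gunicorn_conf.py hello.wsgi:application
    workers = 8                     # roughly one per core
    worker_class = "gevent"

    def post_fork(server, worker):
        # make psycopg2 yield to the gevent event loop instead of blocking
        from psycogreen.gevent import patch_psycopg
        patch_psycopg()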


It looks as if we've got a Github pull request including these changes, so we'll be able to revise the Python-Django numbers soon.


I would love to see .NET (C#) included in this test: ASP.NET Web API (synchronous and asynchronous), ASP.NET MVC (synchronous and asynchronous), and ASP.NET HTTP handlers (synchronous and asynchronous).


Here's a fairly recent end-to-end ServiceStack vs WebApi benchmark: https://twitter.com/anilmujagic/status/272544925478973440

    ServiceStack      9615ms
    WebApi           30607ms
GitHub project for benchmarks used: https://github.com/anilmujagic/ServiceBenchmark


Also ServiceStack vs ASP.NET MVC vs NancyFX vs Fubu, Mono vs IIS/windows, default serializer vs JSON.NET vs ServiceStack's, etc. There are tons of variations that could be done.


ServiceStack vs WCF would probably be a more apt comparison, but I suppose there are people using ServiceStack for web apps as well.


haha, all those hipster developers using rails can now eat the php guys' shorts :)

Seriously though, this isn't news to anyone that does this professionally. The further up the abstraction curve you climb, the less performant the code will be. Ease of development vs run-time performance.


Exactly. And since human time is much more expensive than CPU time, very few people are writing their web apps in machine code.


What I'd love to see paired with this data is a cost comparison. At what scale does performance of ruby/python/php become cost prohibitive? Twitter made the move from Ruby to Java some years back, did they ever post a comparison of their numbers before and after?

Also, the difference between EC2 and local i7 hardware is glaringly obvious. At what scale does owning the server hardware become imperative?

I know these questions are beyond the scope of a performance review, but inquiring minds would like to know.


As you might imagine, during this exercise, we've had a lot of conversations about the points you raise. We have our own opinions, but we ultimately removed most of that content from the blog post because we didn't want it to be too editorial. We will be posting follow ups with some of our opinions.

Some things are really difficult to answer in a vacuum. If you already have a competent devops staff, hosting your own hardware is probably beneficial. The increased performance per "server" is substantial. But no devops staff? Then it's either very risky or cost-prohibitive to own hardware.


I too was shocked at the deltas between Amazon and dedicated hardware. I think AWS runs on Xen, so I wonder if you could strike a more optimal performance/flexibility balance using lighter virtualization (LXC or OpenVZ) with either in-house hardware or another VPS provider.


While it is true that EC2 runs on a customized version of Xen, it's very unlikely that latency is being introduced by the hypervisor itself. With paravirtualized kernel and drivers, Xen introduces negligible overhead, and thus LXC would likely be no better.

The reason that the virtualized setup performs more poorly than the dedicated setup is that you are fighting for CPU time with other AWS customers, so those other customers are introducing latency into your application. Any shared/virtualized host will have this problem.

Interestingly, where Netflix really wants to squeeze CPU performance out of EC2 instances, they allocate the largest instance type so that they know that there's nobody else on the underlying machine.


Regarding performance of Amazon, this post from 2009 is (still) very interesting:

http://uggedal.com/journal/vps-performance-comparison/

Amazon performance is surprisingly low, but also surprisingly consistent. For some use cases, consistency (knowing what you get you for what you pay) might be worth a considerable hit in performance.


Actually, we had an early revision of the benchmark tests that had exactly that analysis, but we really felt it is a very interesting subject that deserves more focus.


Since Amazon always takes a cut, it stands to reason that if you can fully load a server on a continuous basis, owning is cheaper than renting, regardless of how efficient EC2 is. You should only rent for the peak load, not the base load.


What would be much more interesting than "peak responses per second," which is a weird metric to begin with, is the actual histogram of sampled throughputs. Or at the very least a box-and-whisker plot (http://en.wikipedia.org/wiki/Box_plot).

Most folks who have run Rails at scale, for example, find that the untuned garbage collector in MRI (Ruby's default interpreter) introduces a large amount of variance, for example.


We don't have a box plot, but we have line charts for all of the data (performance at multiple client concurrency levels). Just click the "All samples (line chart)" tab on the data panels.

(Note that the very first data panel in the intro is an image and doesn't have tabs.)


No love for Flask? I would have loved to see how it compares to Django and RoR.


I'd love to see how PHP 5.4 compares. In my own app, I saw a noticeable speedup and RAM usage per request dropped by half.


One glaring omission: DOS on Dope [1]

[1] http://dod.codeplex.com/


It would be nice to see raw Python like you have raw PHP. I would expect Django to perform very poorly unless you optimize its caching.


What was the parallelization like across these tests? Were they all running single thread/process mode or did you try to take advantage of threading/multiple-processes/etc. to optimize a production-like performance across all the different frameworks? If the latter, that's a really impressive amount of work!


The short answer is that we attempted to use the CPU as fully as possible across all frameworks. So for those that had tunable parallelism, we used the settings that seemed best given the number of cores available on each platform (2 for EC2 Large, 8 for i7).

We posted the deployment approach for each framework to the Github page.


After having a look at the code on github it looks like they did set up multi processes/threads to really test the scalability on these platforms. That's really impressive, way beyond the single thread experiment I would have anticipated something like this doing! Wow!


We attempted to take advantage of threading/multiple-processes as best we could (see the nodejs code for an example of using the cluster module). But we suspect there are additional areas of improvement here.
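
For anyone curious, the basic shape of that pattern is something like this generic sketch (not our actual test code):

    // fork one worker per core; each worker runs the same HTTP handler
    var cluster = require('cluster');
    var http = require('http');
    var numCPUs = require('os').cpus().length;

    if (cluster.isMaster) {
      for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
      }
    } else {
      http.createServer(function (req, res) {
        res.writeHead(200, {'Content-Type': 'application/json'});
        res.end(JSON.stringify({message: 'Hello, World!'}));
      }).listen(8080);
    }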


This is super awesome. I wish Yesod/Warp was available though :)


I've just sent a pull request[1] with a Yesod implementation.

It runs pretty well, scoring similar to webgo on the JSON pong benchmark; it is almost at the same level as Grails on the 1-query benchmark and is slightly faster than Play on the last benchmark.

So, Yesod is in the same performance range as Play or Grails, and is 3~4x faster than Django or Rails.

But I've tested those on a mono-core VirtualBox, and I know Yesod scales pretty well in a multi-threaded environment.

Also, keep in mind that most of the top-performing entries are not fully featured web frameworks but asynchronous I/O libraries (netty, go, nodejs, vertx, ...) whose implementations just mindlessly write the raw response directly to the socket, whatever the HTTP request was.

[1] https://github.com/TechEmpower/FrameworkBenchmarks/pull/39


Thanks, Raphaelj! Pat (pfalls) will be in touch if he has any questions. Really appreciate the contribution!


What's the difference between "php" and "php-raw" in some of the data? Maybe I'm still in my morning fog, but having trouble thinking of what "php-raw" might mean. Sigh.


The code says php-raw refers to using PDO (which all projects should be using) vs using an ORM or ActiveRecord.

I am completely stunned by the performance cost of using ORM/AR, and will be using this to shame our team lead into giving it up and going for raw queries.


Don't be so hasty. As others pointed out, caching often plays a factor on larger projects. The degree of portability provided by ORMs is nice (test mode hits SQLite, prod is pg or mysql, etc). Also, the readability of ORM-oriented code shouldn't be overlooked, especially as new people come in to the team. Two or three expressive lines of ORM code are far easier to decipher than 15-20 lines of nested SQL hitting weird table names, aliased column names and such (especially when there's a problem).

I typically use ORM for about 95% of a project, falling back to a few explicitly native SQL calls when performance can be shown to be a bottleneck in those locations.


Some of the ORM cost can be mitigated by using caching. In most cases this is essential in a production deployment.


Agreed. However, for our database tests we expressly wanted to stress-test the ORM and, where configurable, disabled caching.

We plan a subsequent test, time permitting, that enables caching.


Important to note that OP doesn't mention using APC with this, something that any production code would be using in a real world case. This would have an impact on the numbers he uses and on the diff we are seeing.


Gotcha. Thanks!


The best way to understand it is to look at the source:

php - https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

php-raw - https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

For "php", they used an ORM. For php-raw they used the stdlib (pdo).


Hi Chasing. We put a note about that suffix in the "Environment Details" section. The "raw" suffix means there is no ORM used. If there is no "raw" suffix, you can assume some type of ORM is used.


FYI on the PHP-DB-Raw approach: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

By default, PDO does not do true prepare()s; it just does string interpolation. You need to pass this parameter with the options:

    PDO::ATTR_EMULATE_PREPARES => false
And then you'll actually be using MySQL prepared statements. You'll see a noticeable performance improvement for large numbers of queries.
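
In context, that would look something like this (connection details are placeholders):

    <?php
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=hello_world', 'user', 'pass', array(
        PDO::ATTR_EMULATE_PREPARES => false,            // use real server-side prepares
        PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    ));

    $stmt = $pdo->prepare('SELECT id, randomNumber FROM World WHERE id = ?');
    $stmt->execute(array(mt_rand(1, 10000)));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);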


Thank you! We'll get this modified and re-tested.
