Hacker News new | comments | ask | show | jobs | submit login

One of the most interesting things this comparison brings out to me is not so much the differences between the various frameworks (although the differences between options on the same platform is definitely very useful information), but also the issue that few of us seem to think about these days: the cost of any of the frameworks above the bare standard library of the platform its hosted on.

Theres a consistant, considerable gap between their "raw" benchmarks (things like netty, node, plain php, etc.) and frameworks hosted on those same platforms. I think this is something we should keep in mind when we're tuning performance-sensitive portions of APIs and the like. We may actually need to revisit our framework choice and implement selected portions outside of it (just like ruby developers sometimes write performance-critical pieces of gems in C etc.) or optimize the framework further.

I'd like to crunch these numbers further to get a "framework optimization index" which would be the percentage slowdown or ratio of performance between the host platform and the performance of the framework on top of it. I might do this later if I get a chance.

I think this is a much needed and excellent point to make. Just take a look at how Go dips down when using Webgo.

I used Go's benchmarking tool to compare raw routing performance of various frameworks. The handlers all return a simple "Hello World" string. Here are the results:

  Benchmark_Routes           100000	     13945 ns/op
  Benchmark_Pat              500000	      6068 ns/op
  Benchmark_GorillaHandler   200000	     11042 ns/op
  Benchmark_Webgo            100000	     26350 ns/op
  ok  	github.com/bradrydzewski/routes/bench	12.605s
I then ran the same benchmark, but this time I modified the handler to serialize a struct to JSON and write to the response. Here are the results:

  Benchmark_Routes              100000	     21446 ns/op
  Benchmark_Pat                 100000	     14130 ns/op
  Benchmark_GorillaHandler      100000	     17735 ns/op
  Benchmark_Webgo                50000	     33726 ns/op
  ok  	github.com/bradrydzewski/routes/bench	9.805s
In the first test, Pat is almost twice as a fast as the Gorilla framework. In the second test, when we added a bit more logic to the handler (marshaling a struct to JSON), Pat was only about 18% faster than Gorilla. In fact, it turns out it takes longer to serialize to JSON (8000ns) than it does for Pat to route and serve the request (6000ns).

Now, imagine I created a third benchmark that did something more complex, like executing a database query and serving the results using the html/template package. There would be a negligible difference in performance across frameworks because routing is not going to be your bottleneck.

I would personally choose my framework not just based on performance, but also based on productivity. One that can help me write code that is easier to test and easier to maintain in the long run.

rorr, you appear to be hellbanned. Here's your comment, since it seemed like a reasonable one:

> Now, imagine I created a third benchmark that did something more complex, like executing a database query and serving the results using the html/template package. There would be a negligible difference in performance across frameworks because routing is not going to be your bottleneck. If you're performing a DB query on every request, you're doing something wrong. In the real world your app will check Memcached, and if there's a cached response, it will return it. Thus making the framework performance quite important.

Ok, so I added a third benchmark where the handler gets and item from memcache (NOT from a database). Here are the results:

  Benchmark_Routes             10000	    234063 ns/op
  Benchmark_Pat                10000	    233162 ns/op
  Benchmark_GorillaHandler      5000	    265943 ns/op
  Benchmark_Webgo               5000	    348349 ns/op
  ok  	github.com/bradrydzewski/routes/bench	10.062s
Notice the top 3 frameworks (pat, routes and Gorialla) have almost identical performance results. The point being is that routing and string manipulation are relatively inexpensive when compared to even the most lightweight TCP request, in this case to the memcache server.

By the way:


More SPEEEEEEEEEEEEED coming down the pipe in 1.1 for Go's net/http. :)

I think Go is almost the ideal example here, you're right. Go provides a pretty rich "standard library" for writing web serving stuff so it's a good place where you could really imagine writing your performance critical stuff just on the base platform even if you use something like Webgo for the rest of your app.

Some of the other platforms are much less amenable to that since the standard primitives the platform exposes are very primitive indeed (servlets api, rack api, etc.). Perhaps there's some value in looking at how your favorite framework stacks up against its raw platform and trying to contribute some optimizations to close the gap a bit.

I'm curious about that - because there's so little to webgo I suspect the answer is something really trivial. I haven't really looked at it before, but the framework itself is just 500 lines or so unless I'm looking at the wrong one...

Given that the json marshalling and server components would be exactly the same between go and webgo, I'm curious as to whether changing the url recognised to be just /json in the goweb tests would make any difference, any reason it was different?

Just had a look at the tests and the urls responded to differ:



Shouldn't all these examples at least be trying to do the same sort of work? For such a trivial test differences like that could make a huge difference to the outcome.

It's great to see replicable tests like this which show their working, but they do need to be testing the same thing. I also think they should have something more substantial to test as well as json marshalling on all the platforms, like serving an HTML page made with a template and with the message, as that'd give a far better indication of speed for tasks web servers typically perform.

Still, it's a great idea and really interesting to see this sort of comparison, even if it could be improved.

One of the next steps we'd like to take is to have a test that does cover a more typical web request, and is less database heavy than our 20 query test, just like you describe. Ultimately, we felt that these tests were a sufficient starting point.

I was a little confused by the different urls used in the tests, as for this sort of light test, particularly in Go, where all the serving stuff is common between frameworks, you're mostly going to be testing routing. Any reason you chose a different route here? (/json versus /(.★) )?

I can't think of much else that this little web.go framework does (assuming the fcgi bits etc are unused now and it has moved over to net/http). I don't think many people use web.go, gorilla and its mux router seems to be more popular as a bare bones option on Go, so it'd possibly be interesting to use that instead. It'd be great to see a follow up post with a few changes to the tests to take in the criticisms or answer questions.

While you may come in for a lot of criticism and nitpicking here for flaws (real or imagined) in the methodology, I do think this is a valuable exercise if you try to make it as consistent and repeatable as possible - if nothing else it'd be a good reference for other framework authors to test against.

Webgo... I'll stick with "net/http" and Gorilla thanks.

Also, They used Go 1.0.3... I hope they update to 1.1 next month. Most everyone using Go for intensive production uses is using Go tip (which is the branch due to become 1.1 RC next month)

This is great to know, we were hesitant to use non-stable versions (although we were forced to in certain cases), but knowing that it's what is common practice for production environments would change our minds.

We switched to using tip after several Go core devs recommended that move to us, the folks on go-nuts IRC agreed and we tested it and found it to be more stable than 1.0.3

Wow, directly on tip? That seems to speak very highly of the day-to-day development stability of Go.

A good tip build tends to be more stable then 1.0.3 and has hugely improved performance (most importantly for large application in garbage collection and generation).

To select a suitable tip build we use http://build.golang.org/ and https://groups.google.com/forum/?fromgroups#!forum/golang-de... . My recommendation would be to find a one or two week old build that passed all checks, do a quick skim of the mailing list to make sure there weren't any other issues and use that revision. Also, you will see some the the builders are broken-

Of course if your application has automated unit tests and load tests, run those too before a full deployment.

Thanks, this comment really helped me in my evaluation of Go today. I had been playing around with 1.0.3 for a couple days, but tip is definitely where it's at.

I'm glad I could help. Go 1.1 RC should be out early next month. So if you want you could wait for that (for production use).

Or it could speak poorly of their release process, which is more accurate. The stable release is simply so bad compared to tip that everyone uses tip. There should have been multiple releases since the last stable release so that people could get those improvements without having to run tip.

Why not both? Insisting on a very stable API can result in long times between releases, which can mean more people using tip. That's distinct from how stable tip is.

Given the frequent complaints that the previous stable release isn't very stable, I think trying to interpret it as "tip is super stable" is wishful thinking. Tip is currently less bad than stable. The fact that stable releases are not stable is a bad thing, not a good thing.

What does stable mean? If stable means there are not unexpected crashes, then Go 1.0.3 is extremely stable.

If stable means suitable for production, Go tips vastly improved performance, especially in regards to garbage collection, make it more suitable than 1.0.3 for large/high-scale applications in production.

I'm not sure many people would use webgo in real life. I don't know... maybe some people... certainly not pros.

Also, the 1.0.3 thing is probably dragging on the numbers a bit. 1.1 would boost it a little. Not enough to get it into the top tier... but a little.

Also, for Vert.x, they seem to be only running one verticle. Which would never happen in real life.

Play could be optimized a bit... but not much. What they have is, to my mind, a fair ranking for it.

Small issues with a few of the others but nothing major. I think Go and Vert.x are the ones that would get the biggest jumps if experts looked at them. And let's be frank... does Vert.x really need a jump?

So what they have here is pretty accurate... I mean... just based on looking through the code. But Go might fare better if it used tip. And Vert.x would DEFINITELY fair better with proper worker-nonworker verticles running.

The Play example was totally unfair since it blocks on the database query which will block the underlying event loop and really lower the overall throughput.

Well... to be fair...

the Vert.x example, as configured, blocks massively as well waiting for mongodb queries.

Could you point me at an example of an idiomatic, non-trivial Go REST/JSON API server? I've been trying for a while to find something to read to get a better handle on good patterns and idiomatic Go, but I haven't really come up with anything. I've found some very good examples of much lower-level type stuff, but I think I have a decent handle on that type of Go already. What I really would like is a good example of how people are using the higher level parts of the standard library, particularly net/http etc.

Sorry... I'm not really a book kind of guy when it comes to this stuff. The golang resources are mostly what I use.

for Vert.x, we specified the number of verticals in App.groovy, rather than on the command line, which we think is a valid way to specify it.

OK... I ran the Vert.x test... runs a bit faster here with 4 workers instead of 8. I suspect what is happening there is that at times all 8 cores can be pinned by workers, while responses wait to be sent back on the 8 non workers. But not that big a change in speed actually. One thing more, when you swap in a couchbase persister for the mongo persister it's faster yet. The difference is actually much larger than the difference you get balancing the number of worker vs non worker verticles. Also thinking that swapping gson in for jackson would improve things... but I don't think that those are fair changes. (well... the couchbase may be a fair change)

Also tested Cake just because it had been a while since I have used it... and I couldn't believe it was that much worse than PHP. Your numbers there seem valid though given my test results. That's pretty sad.

Finally, tried to get in a test of your Go stuff. I'm making what I think are some fair changes ... but it did not speed up as much as I thought. In fact, initially it was slower than your test with 1.1.

So after further review... well done sir.

That first line is truly one of the best comments I've seen when discussing languages. I've clipped it and will use it from now on:

>> I'm not sure many people would use [xxx] in real life. I don't know... maybe some people... certainly not pros.

The Play app uses MySql in a blocking way, while Nodejs uses mongo. It's not comparable.

We have a pull request that changes the Play benchmark (thank you!) so we will be including that in a follow-up soon.

We tested node.js with both Mongo and MySQL. Mongo strikes us the more canonical data-store for node, but wanted to include the MySQL test out of curiosity.

That is bad benchmarking!

I'd love to see your framework optimization index. Honestly, all of this would be a wonderful thing to automate and put in a web app - a readily-accessible, up-to-date measure of the current performance of the state of the art in languages and frameworks. I bet it would really change some of the technology choices made.

Here's a quick version of the framework optimization index. Higher is better (ratio of framework performance to raw platform performance, multiplied by 100 for scale):

Framework Framework Index

Gemini 87.88

Vert.x 76.29

Express 68.85

Sinatra-Ruby 67.88

WebGo 51.08

Compojure 45.69

Rails-Ruby 31.75

Wicket 29.33

Rails-Jruby 20.09

Play 18.02

Sinatra-Jruby 15.96

Tapestry 13.57

Spring 13.48

Grails 7.11

Cake 1.17

In the same vein, I was curious to compare the max responses/second on dedicated hardware vs ec2 on a per framework basis. The following is percentage throughput of ec2 vs dedicated (in res/s):

cake 18.9% (312 vs 59)

compojure 12.1% (108588 vs 13135)

django 16.8% (6879 vs 1156)

express 16.9% (42867 vs 7258)

gemini 12.5% (202727 vs 25264)

go 13.3% (100948 vs 13472)

grails 7.1% (28995 vs 2045)

netty 18% (203970 vs 36717)

nodejs 15.6% (67491 vs 10541)

php 11.6% (43397 vs 5054)

play 20.6% (25164 vs 5181)

rack-jruby 15.6% (27874 vs 4336)

rack-ruby 22.7% (9513 vs 2164)

rails-jruby 22.7% (3841 vs 871)

rails-ruby 20.7% (3324 vs 687)

servlet 13.4% (213322 vs 28745)

sinatra-jruby 21.2% (3261 vs 692)

sinatra-ruby 22.2% (6619 vs 1469)

spring 7.1% (54679 vs 3874)

tapestry 5.2% (75002 vs 3901)

vertx 22.3% (125711 vs 28012)

webgo 13.5% (51091 vs 6881)

wicket 12.7% (66441 vs 8431)

wsgi 14.8% (21139 vs 3138)

I found it interesting that something like tapestry took a 20x slowdown when going from dedicated to ec2, while others only took ~5x slowdown.

Edit: To hopefully make it clearer what the percentages mean - if a framework is listed at 20%, this means that the framework served 1 request on ec2 for every 5 requests on dedicated hardware. 10% = 1 for every 10, and so on. So, higher percentage means a lower hit when going to ec2.

Disclosure: I am a colleague of the author of the article.

You're saying that running a query across the internet to ec2 is 5 times faster than running it on dedicated hardware in the lab? I find that hard to believe.

Sorry, maybe my original post was not entirely clear. Let's take tapestry, for example. On dedicated hardware, the peak throughput in responses per second was 75,002. On ec2, it was 3,901 responses per second.

So, in responses per second, the throughput on ec2 was 5.2% that of dedicated hardware, or approximately 20 times less throughput. The use of the word slowdown was possibly a bad choice, as none of my response had to do with the actual latency or roundtrip time of any request.

These could probably be further broken down into micro-frameworks (like Express, Sinatra, Vert.x etc.) and large MVC frameworks (like Play and Rails).

Gemini is sort of an outlier that doesn't really fit either category well, but the micro-frameworks have a fairly consistently higher framework optimization index than the large MVC frameworks which is as expected.

Express and Sinatra really stand out as widely-used, very high percentage of platform performance retained frameworks here. I've never used Vert.x, but I will certainly look into it after seeing this. I'm very impressed that Express is so high on this list when it is relatively young compared to some of the others and the Node platform is also relatively young.

Play seems particularly disappointing here since it seems any good performance there is almost entirely attributable to the fast JVM it's running on. Compojure is also a bit disappointing here (I use it quite a bit).

The play test was written totally incorrectly since it used blocking database calls. Since play is really just a layer on top of Netty it should perform nearly as well if correctly written.

I believe they're encouraging pull requests to fix that sort of thing. It will be interesting to see if it helps to that degree; I hope so!

But Play's trivial JSON test was much slower than Netty's.

That's because they inexplicably use the Jackson library for the simple test, rather than Play's built in JSON support (they use the built-in JSON for the other benchmarks).

Both Netty and Play use Jackson though one the Netty version uses a single ObjectMapper and the Play version uses a new ObjectNode per request (created through Play's Json library).

I hope this gets fixed!

No, they use Play's JSON lib. It's kind of a moot point because Play's lib is in fact a wrapper for Jackson.

Here's the source: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

So the question stands: If Play & Netty are using the same JSON serialization code, why is Play seven times slower?

I'd imagine that being relatively young is an advantage in a test like this. You're not utilizing any features, and features are what slow down requests. The less features something has, the faster it should perform in these trivial tests.

That's a very good point; I hadn't thought of it that way. Maybe this is some small part of why we seem to keep flocking to the new kids on the block.

It's funny you should put this together because in an earlier draft of this blog entry I had created a tongue-in-cheek unit to express the average cost per additional line of code. Based on our Cake PHP numbers, I wanted to describe PHP as having the highest average cost per line of code. But we dropped this because I felt it ultimately wasn't fair to say that based on the limited data we had and it could be easily interpreted as too much editorializing. Nevertheless, as you point out, it's interesting to know how using a framework impacts your performance versus the underlying platform.

I too wanted Play to show higher numbers. There's certainly a possibility we have something configured incorrectly, so we'd love to get feedback from a Play expert.

Good thing you opted against sensational journalism.

About the cost per additional line of code for PHP, it mainly comes from not having an opcode cache and having to load and interpret files on every visit. mod_php was and will always be trash. I commented earlier about it too.

In case of Ruby, and talking about Rails, even when using Passenger, the rails app is cached via a spawn server. That's not the case with PHP.

Similarly, Python has opcode cache built-in (.pyc files). Also, I am not sure about gunicorn but others do keep app files in-memory.

Servlet does the same about keeps apps in memory. You get the idea.

Frameworks definitely have an impact but it's very hard for one person to know the right configuration for every language. You had done some good work there, but it will take a lot of contributions before the benchmark suite becomes fair for every language/framework.

We've received a lot of great feedback already and even several pull requests. Speaking of, we want to express a emphatic "Thank you" to everyone who has submitted pull requests!

We're hoping to post a follow up in about a week with revised numbers based on the pull requests and any other suggested tweaks we have time to factor in. We're eager to see the outcome of the various changes.

I am completely agree with you, this is not proper bench marking as opcode caching is missing in php, benchmarking should be re calculated by configuring APC.

For Play, you'll want to either 1) handle the (blocking) database queries using Futures/Actors that use a separate thread pool (this might be easier to do in Scala) or 2) bump the default thread pool sizes considerably. The default configuration is optimized purely for non-blocking I/O.

See e.g. https://github.com/playframework/Play20/blob/master/document... and https://gist.github.com/guillaumebort/2973705 for more info.

Why would the JSON serialization test perform so poorly though?

I'm pretty shocked that Play scored so low. One would think that being built on netty would put Play in a higher rank. Database access for sure needs to be in async block

It's not just the DB test. The JSON test was way slower than Netty, too.

Agreed. We need to get the Play benchmark code updated to do the database queries in an asynchronous block. Accepting pull requests! :)

Not quite sure I understand your point re: the cost of the frameworks, above the bare standard library.

Do you mind breaking it down for me a bit please?

Cost in dollars or cost in hardware utilization or some other cost?

If you have developers who for one reason or another prefer a given platform, then the most important performance comparisons are about how close various frameworks on that platform get to the performance of the platform itself.

Knowing how much I'm giving up in performance in order to get the features a given framework gives me is an important consideration. Also understanding when it's worthwhile to work outside the framework on the bare platform given the speedup I'll get versus the cost I'll incur by doing so is a very important optimization decision-making tool.

How do you derive how much performance you are giving up from these benchmarks? There is not a neat relationship between the two.

There are performance numbers for a framework (Cake PHP, for example) and for the raw primitives of the platform it runs on (PHP, in that case). By finding the ratio between the two one can arrive at the performance loss attributable primarily to the framework you've chosen.

See my "framework optimization index" in comments below for a rundown on all these ratios which I was able to back out from this set of benchmarks.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact