I love what you guys are doing. This is by far the most comprehensive (in terms of number of frameworks) comparison of web frameworks. I also am a big fan of the new filtering metadata.
However, I'm starting to think that all of the advocates of various frameworks are now conspiring independently to make this comparison meaningless...any framework (except Cake for some reason) can be superoptimized towards a small set of tasks. If you do another round, could you increase the number of different tasks? Some examples could be:
1) Mixed bag of queries of various complexity
2) Static file serving
3) A few computation/memory-intense benchmarks (such as those in the Language Benchmarks Game)
As Pat points out, we definitely look forward to implementing some more computationally-intense request types in the future. This round does include the first server-side template test. We'd like to hear the community's opinions about more tests.
That said, I feel most of the frameworks' implementations of the existing tests are not cheating. Our objective in this project is to measure every framework with realistic production-style implementation of the tests. No doubt there is temptation to trim out unnecessary functionality and focus on the benchmark's particular behavior. We have attempted to identify any such tests that remove framework features to target the benchmark as "Stripped" and those can now be filtered out from the list.
In other words, our aim is that the implementation of each framework's test is idiomatic to that framework and platform. And if that's not the case for a test, we want to correct it.
Your concern could be clarified by pointing out that framework authors may be tuning up their JSON serialization, database connection pools, and template processing in order to improve their position on these charts. And, to be clear, I have already seen evidence of that in my interaction with framework authors. To that concern, however, I would say: That is awesome. I want those features to be fast.
I would like to pile my thanks onto this list as well. I'm the author of Phreeze and I can say that I'm grateful that fairness is being encouraged. There is certainly glory in ranking well on any benchmark and I have to admit, as I was implementing the tests in Phreeze, I saw many opportunities to "cheat." For example, skipping the framework routing, not using the "proper" way to communicate between the layers, etc and substituting things with "raw" code would have potential to skew the result. I feel that would be missing the entire point of a benchmark, so I'm glad that is being considered.
I can also say that this benchmark inspired me to take a hard look at class loading and I was able to make some improvements to the framework's efficiency in general. So, in a way, I did some tuning - not for the benchmark, but rather as a result of the benchmark. Thanks to this benchmark all Phreeze users will gain a little performance.
I would also like to suggest a test idea. I think the biggest challenge for frameworks comes into play when you have to do table joins. Something like looping through all purchase orders and displaying the customer name from a 2nd table - that would be a very real-world type of test. I think foreign key type of queries are more telling about an ORM than a single table query.
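To make that test idea a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names, columns, and data are all hypothetical and not taken from the benchmark suite. It contrasts looking up the customer once per purchase order with doing a single join.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase_order (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total REAL NOT NULL
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO purchase_order (id, customer_id, total)
        VALUES (10, 1, 19.99), (11, 2, 5.00), (12, 1, 42.50);
""")


def orders_with_lookups(conn):
    """One query for the orders, then one extra query per order."""
    orders = conn.execute(
        "SELECT id, customer_id, total FROM purchase_order ORDER BY id"
    ).fetchall()
    rows = []
    for order_id, customer_id, total in orders:
        (name,) = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        rows.append((order_id, name, total))
    return rows


def orders_with_join(conn):
    """A single query that lets the database resolve the foreign key."""
    return conn.execute(
        "SELECT o.id, c.name, o.total "
        "FROM purchase_order AS o "
        "JOIN customer AS c ON c.id = o.customer_id "
        "ORDER BY o.id"
    ).fetchall()


for row in orders_with_join(conn):
    print(row)
```

Both functions return the same rows; which shape an ORM produces for this kind of page is the sort of thing such a test would expose.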
Jakejake, perhaps I've said it before, but it bears repeating: your reaction to and participation in this project has been precisely the kind we hoped it would see (but weren't sure we'd actually see in practice). Thank you very much for joining in and having fun with it. It sounds like you've been able to get some increased performance from your tuning, and I hope you don't mind us feeling a little bit of pride in having inspired that.
Some readers may feel we are attempting to paint some frameworks in a poor light. Yes, we do have favorites, but we are absolutely intent on keeping this open and fair. If we're doing something wrong, help us fix it! A pull request is very happily received.
When I read reactions of that sort, I selfishly want to point the author to Jakejake's comments to demonstrate how awesome it is to see a framework improving. Speaking of that, I want to eventually have the ability to show performance over time (e.g., compare Round 1 to Round X) as a potentially interesting illustration of a framework's intent to improve performance.
Also, thanks for the idea for a future test. That sounds like a good one.
Oh sure, nothing too complicated. Basically I just happened to notice that I loaded several classes that were not always needed. I was able to tune up the framework to load some of them on-demand instead.
One example is that the framework loaded a lot of MySQL classes whether or not you do a DB query. So, now I wait to initialize the DB stuff until after you make a call that requires it. Phreeze has always been lazy about opening the DB connection, but now it's even lazier and doesn't even load the classes until you need them!
There were some other utility-type classes like XML parsing and such that probably don't even get used much. So those are lazy loaded now too.
For a non-DB request I was able to get it down from about 37 files that loaded to around 20. For a DB request I think it's still around 30 files, but I definitely consider that a performance improvement. The benchmark led me to scrutinize what is being loaded so I think it has already improved the framework.
The logic you have previously posted on HN for these benchmarks is that they measure the minimum overhead available on the platform, so that you cannot get faster than the benchmarked numbers. If a framework is too slow, the framework-chooser can exclude it from consideration because the resulting project just can't be any faster than the framework benchmark. Sounds reasonable.
Except now it is clear that you are refusing optimizations for some frameworks due to a vague, aesthetic judgement of 'stripped'. Which now means that you actually aren't measuring the minimum framework overhead. You are measuring the overhead of the defaults, or the overhead of not taking optimization seriously, with large amounts of performance left on the table. Worse, selectively applying optimizations means you are comparing one framework's defaults to another framework's minimum overhead. And since you have abandoned minimum overhead, it now makes very little sense to measure performance independent of normal first-resort tactics like caching (who is running Cake without caching?).
If you were going to do that, you should have benchmarked defaults right down the line and allowed a full, normal range of simple deployment optimizations. Instead we have selective optimization and totally unrealistic deploys, so it really indicates very little.
I'm not sure where you get the impression that we are refusing tuned tests (what we call "Stripped" tests). We have accepted two of those and would accept further tests of that nature. An implementation of course still needs to work and meet the obligations of the test scenario. For example, each row must be fetched from the database individually and the response must be serialized JSON. We did "reject" one test that fetched all 20 rows using a WHERE IN clause, but that implementation was quickly reconfigured by the submitter to match our specification.
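To illustrate the difference between the two query shapes mentioned above, here is a minimal sketch in Python with the standard-library sqlite3 module. The `world` table, its columns, and the row counts are assumptions made for the example, not the actual benchmark schema or code.

```python
import random
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world (id INTEGER PRIMARY KEY, random_number INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO world (id, random_number) VALUES (?, ?)",
    [(i, random.randint(1, 10000)) for i in range(1, 1001)],
)


def fetch_individually(conn, ids):
    """One SELECT per row: every row costs its own query."""
    rows = []
    for world_id in ids:
        rows.append(
            conn.execute(
                "SELECT id, random_number FROM world WHERE id = ?", (world_id,)
            ).fetchone()
        )
    return rows


def fetch_with_where_in(conn, ids):
    """One SELECT for all rows: same data, but a different workload."""
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT id, random_number FROM world WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()


ids = random.sample(range(1, 1001), 20)  # 20 distinct ids
print(len(fetch_individually(conn, ids)), "rows via 20 queries")
print(len(fetch_with_where_in(conn, ids)), "rows via 1 query")
```

Both return the same 20 rows, but the second issues one query instead of twenty, so it no longer exercises the per-query path (driver, connection pool, ORM mapping) the same number of times.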
We are expressly not including reverse proxy caches in these tests. We're not benchmarking the performance of the nginx proxy cache, Apache HTTPD's proxy cache, Varnish, or anything similar. You can find such benchmarks elsewhere. We are benchmarking the performance of the application framework for requests that do reach the application server. The tests are intended to be a viable minimum stand-in for application functionality in order to fulfill requests that, for whatever reason, reach your application server.
If the scenario is difficult to conceive, imagine your site cannot leverage a proxy cache because every request is providing private user information.
To be clear: none of the frameworks are being tested with a front-end cache.
Also presently, none of the tests use a back-end cache either, but future tests will include tests of back-end in-memory and near-memory caches.
These are very good points you bring up and I will need to address them in the site's FAQ in addition to this response. I would appreciate any follow-ups as I am open to revising the opinions I include below.
First, if there are any specific examples of frameworks that have been mis-characterized, I would appreciate that we address each individually as a GitHub issue. For example, I will create an issue to discuss the Yesod test and its session configuration.
Here is our basic thinking on sessions. None of the current test types exercise sessions, but if the test types were changed to make use of sessions, session functionality should remain available within the framework.
If a particular test implementation/configuration has gone out of its way to remove support for sessions from the framework, we consider that Stripped. If session functionality remains available but simply isn't being exercised because the test types we've created to-date don't use sessions, then at least with respect to sessions, that is Realistic.
Logging is an important point that we need to address. We intentionally disabled logging in all of the tests we created and will need to be careful to review the configuration of community-contributed tests to do the same.
You're correct, disabling logging is not consistent with the production-class goal. So, why did we opt to disable logging? A few reasons:
* We didn't want to deal with cleaning up old log files in the test scripts.
* We didn't want to deal with normalizing the logging granularity across frameworks. (Or deal with not doing so.)
* In spot checks, we didn't observe much performance differential when logging is enabled.
We're not unmovable on logging, however, and if there is sufficient community demand, we would switch to leaving logging enabled.
I fully understand why logging is disabled. What I am just pointing out is that the numbers that you see are probably not indicative of a framework's production performance. I realize that logging does add another variable to the mix but in my opinion, it is something worth knowing as it gives an idea of the actual performance of a framework. And on the contrary, I find that logging impacts performance noticeably depending on the implementation and granularity. I also think that cleaning up the logs on server shutdown should be fairly trivial. However, the cons you listed also make quite a compelling argument for disabling logging.
As for sessions, I just used Yesod as an example but it applies to all frameworks and other "middleware" as well and this is something I am mixed on. Some platforms do not support any middleware at all, so should these also be classified as "stripped" or "barebones"? What I'm getting at is, is this really a fair comparison? From a glance at the benchmark page, it is not apparent which frameworks have which configuration or feature if you're not familiar with the framework itself and it can get really complicated. I think labeling the frameworks in terms of size is a huge step in the right direction but my belief is that more information is needed.
Thanks for the kind words. We're very interested in adding additional tests; this round even includes a new test dubbed "Fortunes" which does in fact do server-side templating. We have an open GitHub issue asking for the community's input for just this sort of thing, and we'd love to have your feedback included.
Without advocates "conspiring", the results will favor those frameworks which are most suited to the chosen tasks, as configured by the testers. With conspiring, the results will favor those frameworks with a community who cares about contributing a superoptimized microbenchmark config. In either case, the results might be good discussion fodder, but should be taken with a grain of salt.
A new "Fortunes" test was also added (implemented in 17 of the frameworks) that exercises server-side templates and collections.
With 57 total frameworks being tested, we have implemented some filtering to allow you to narrow your view to only those you care about.
As always, we'd really like to hear your questions, suggestions, and criticisms. And we hope you enjoy this latest round of data.
Thanks for the feedback! We started the project with WeigHTTP, then starting with Round 2 we switched to Wrk at the advice of other readers. Wrk provides latency measurements consisting of average, standard deviation, and maximum.
If we had distribution data available, we would aim to provide that in some form. And perhaps the author of Wrk could add that in time.
However, for the time being, I consider the matter somewhat academic. Not to be dismissive--I value your opinion--but I don't believe that would measurably impact my assessment of each framework's performance. Though, it would be fascinating to be able to validate my suspicion that Onion, being written in C, does not suffer even the tiny garbage collection pauses of the Java frameworks.
"average and std dev are only relevant if the distribution is Gaussian in distribution"
Technically not true. Knowledge of the second order moment (variance) lets you uniquely identify other distributions like Poisson, or uniform. Knowledge of even higher order moments lets you fit more complicated statistical models.
Low variance is good, regardless of underlying distribution.
Probably the statement should be that comparing mean and variance are only relevant if both metrics follow the same distribution. In the absence of distribution information (and it is usually absent in empirical tests like that) quantiles would help to do a better job at comparing performance.
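A small sketch of why quantiles help, in Python with the standard-library statistics module. The two latency samples below are made up and deliberately constructed to share the same mean and standard deviation; they are not measurements from the benchmark.

```python
import math
import statistics

# Two made-up latency samples in milliseconds, built so that both have
# mean 10 and (population) standard deviation 2.
d = 2 / math.sqrt(99)
steady = [8.0, 12.0] * 50                   # every request takes 8 ms or 12 ms
spiky = [10.0 - d] * 99 + [10.0 + 99 * d]   # 99 quick requests, one slow outlier


def summarize(samples):
    """Mean and std dev alongside a few percentiles."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.pstdev(samples),
        "p50": cuts[49],
        "p90": cuts[89],
        "max": max(samples),
    }


for name, samples in (("steady", steady), ("spiky", spiky)):
    print(name, {k: round(v, 2) for k, v in summarize(samples).items()})
```

The mean and standard deviation cannot tell these two samples apart, while the percentiles can: in the spiky sample nine out of ten requests finish in under 10 ms, whereas in the steady sample half of them take 12 ms.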
I completely understand your point, but I think it's fair to say that most .Net code will run on Windows Server, and that pretty much everything else will run on some kind of Linux flavor.
Just like you have a "keep the default framework setting" approach to help compare very different frameworks because that's how the majority of people will use them, you may very well assume that comparing frameworks on their preferred OS is fair enough.
I know that I wouldn't mind switching to a Windows+.Net environment if it proved to be much, much faster than what I'm using right now.
I have a friend who went to a Mono talk at MS MIX where a Mono developer was speaking. The Mono developer said that while Mono is a little slower than .NET (and he was talking a couple percent) Mono often ends up being faster on the same hardware because the Linux system calls the runtime uses are faster. There were a few ROFLs in the audience.
I also agree with you that investing in a .NET (Windows) test isn't good bang for the buck here.
I don't know whether Mono is guaranteed to be slower or not, but definitely a lot fewer man hours have gone into polishing Mono as compared to the amount of effort spent to polish the .NET Framework on Windows.
Further, even if Mono itself is as fast as the .NET Framework, IIS the web server is going to have totally different performance characteristics from whatever web server you are using on Linux.
I am glad you showed me the error of my ways because I would have guessed .NET was somewhat faster than Mono, but it goes to show even more that comparing across operating systems is meaningless...
Now, I would love to get my hands on the numbers, don't get me wrong; I'm just saying that if I were running the project I wouldn't go through the amount of effort required to get the .NET results.
"As with the previous question, we'd love to. We have heard tentative word from a reader/contributor that a pull request may be incoming soon that will include several .NET frameworks on Mono, which we assume will be as easy to include as any other pull request. One challenge we face is that the test infrastructure we've built assumes a Linux deployment that we can automate using ssh and Python. To do a proper .NET test on Windows Servers, we will need to work on adapting that platform to automate Windows Servers as well. Community assistance on this would be greatly appreciated."
I don't think it seems dumb. It would be a very useful comparison between platforms. Stack Overflow runs on .Net and Windows servers and they say it's very performant. So why not compare with other frameworks on the same hardware?
Voidlogic is right that we are waiting to get a pull request that will include some .Net frameworks. If you can help, it would be greatly appreciated. We do want to include .Net. We will test on Mono to start.
We also want to test on .Net's native Windows platform. But we need to work on the testing platform we've built in order to automate a Windows server in the same way we presently automate a Linux server.
I had the same thought. ASP.NET MVC and Web Forms are both very popular frameworks, and they're free to develop for. Might be harder to set up on a non-Windows machine, which could be why they were not tested.
There are two main MySQL drivers for Go: mymysql and go-mysql-driver. I've found the concurrency performance of the former to be abysmal when doing my benchmarking. Then the moment I switched to the latter, Go's performance went through the roof.
First off, I love the work you're doing, keep it up.
Benchmarks like this are designed to be the starting point of a discussion and an investigation, and not as anything meaningful in their own right. Boiling a framework down to one performance number ignores the many, many nuances of a framework.
What surprises me most is the difference between different frameworks. A few years ago the mantra seemed to be "Use Rails, Django or a similar full-stack framework. Speed of deployment trumps everything!" Over the last few years I've seen a shift as people are trying to get more performance from limited hardware. Personally I'm intrigued by how a fairly innocent decision early in the project (of what language/framework) may have profound performance implications in the long run.
For myself, I've been looking for a good functional-programming framework. Just looking at this gives me a good list of frameworks to start looking at. It feels to me that a framework that performs well is likely well engineered, so the ones that perform better will go at the front of my queue for investigations.
>Over the last few years I've seen a shift as people are trying to get more performance from limited hardware.
Part of that shift is also that other frameworks have learned and integrated a lot from Rails/Django. The productivity/time-to-launch gap isn't as significant as it used to be, so other factors like performance, compatibility with pre-existing infrastructure (eg for JVM-based frameworks), security, etc. are gaining more influence in the decision about what to use.
Thanks, Periodic. It's especially rewarding to hear that people have gleaned value from the project.
You're precisely right about how to put this data to use: as one point in a holistic decision making process. We address that in the Questions section of the site, in fact. That said, we are not reducing each framework to a single performance number. Our goal is to measure the performance of several key components of modern frameworks: database abstraction and connection pool performance, JSON serialization, list and collection functions, and server-side templates. We'd like to add even more computationally-intensive request types in future rounds.
So, no, we're not testing your (or anyone else's) specific application on each framework. But we are testing functions that your application is likely to use. You're still better off measuring the performance of your use-case on candidate frameworks before you start work, but perhaps you can first trim the field to a manageable number.
In the first round, we echoed your surprise at the spread--four orders of magnitude! I think the shifting winds of opinion come from the fact that today's high-performance languages, platforms, frameworks are not necessarily more cumbersome to use for development than the old guard. As others have pointed out elsewhere in this thread, Go is not a terribly verbose language, and yet its performance is fantastic.
Has the era of sacrificing performance at the altar of developer efficiency ended? I'm not sure. But we have some data to add to the conversation.
Before looking at the benchmark results, I took a glance at the Node source and I expected it to perform worse than it did previously. It does, almost universally. Not only have the glaring perf issues remained since round 1, it's added more. In the real world, when you look at a metric that says your req/s is a bottleneck, which is what this benchmark is loosely simulating, you'd fix it. You wouldn't just say "nope, that's what this framework does, sorry boss."
I still don't find these benchmarks very useful. From the looks of the comments, a lot of you don't really either (even if you don't realize it).
For example, a lot of people in these comments want to correlate language speed with performance in these benchmarks, by arguing specific examples, but comparing almost any two frameworks/platforms in this "benchmark" is an apples to non-apples comparison, and the result is actually full of counter examples (faster languages performing more poorly). That should instantly tell you that this benchmark isn't telling you what you think it's telling you, and that you haven't really derived any value from it.
Perhaps the biggest reason I don't find value here is that every product here does wildly different things. It's like comparing wrenches to hammers to screwdrivers to 3D printers.
I also want to point out to people who say that this is a "comparison" of frameworks that it is emphatically not a comparison. What is the value of a framework? Is it speed? Atypically. And this "benchmark" tends to point at such cases as "being better" because they do better in this specific task. A framework/platform's value lies in features and abstractions. This does not compare those.
I will gladly build a "framework" in NodeJS that is only capable of doing the tasks in this benchmark as fast and with as little overhead as possible. You would NEVER use it in the real world, but it would be a beast at serializing JSON and making repeated database queries in an insecure fashion. But score here is the important factor, right?
In my opinion you've missed the point almost entirely:
1) If you see problems with a language you're an expert in, submit a pull request. I've never seen a benchmark done like this before, it gives everyone a chance to fix problems in their favorite framework/language.
2) It is a little bit of an unfair comparison between very low feature frameworks and higher ones, but it gives you a good idea of what you're trading off on basic performance. For example, I thought our use of play1-java wasn't far off of servlet on basic tasks, but boy was I wrong, perhaps by 10x.
Should you read this list and pick the top thing on the chart? No. However, it's hard to argue this isn't interesting and useful information.
I am not sure if you had this in mind or not (and I already wrote this in a comment above, so sorry for repeating) but I was wondering about concurrency. That is primarily my concern, as web frameworks show their mettle, so to speak, when a large swarm of parallel requests hammers them. What do they do then? Maybe sequentially, one request at a time, they are very fast but start barfing out socket errors when concurrency increases only slightly. That is a worse case in general than something that perhaps is slower in a sequential benchmark but stays up in the face of a concurrent onslaught of client requests.
Otherwise I can see how someone would assume a simplified and misleading heuristic: "If I can process 1000 requests in 1 second, that means the server can handle 1000 requests/second. So if 1000 requests come in at once, they will all be processed in 1 second." Three things can happen: it could take longer than one second to process them, it could error out and die, or it could actually process them fast if it can scale across CPUs. That is where the gold is if you ask me... Anyway, just my 2 cents.
It's impressive how well PHP holds up with many queries per request (which is the most common CRUD/webapp scenario).
While for no or just one query it's slower than a lot of the other frameworks (due to PHP being slow to parse, startup etc), as soon as we have a lot of DB queries, the C interface to MySQL leaves the other frameworks in the dust.
The well known PHP shortcomings aside, that's a nice example of optimizing for the things that matter most, especially for its common use cases (Wordpress, Drupal, etc).
In really scalable sites, you need sharding. Unless your database itself is doing the scaling (such as with Riak), you're going to sometimes hit multiple shards. With PHP and other languages that can't do async, you're going to have to query the DB sequentially, increasing latency proportionally to the number of shards you have to hit. With Node.js and other asynchronous apps, you don't.
Disclaimer: mysqli does have async capabilities, but most people such as myself use PDO for its other benefits. And mysqli only works with MySQL.
Some of the fastest implementations you see in these tests are not asynchronous.
With Servlet for example, a worker thread is chosen from Resin's thread pool and used to handle a request. The Servlet then executes 20 queries sequentially and returns the resulting list data structure. This is Servlet 3.0 but not using Servlet 3.0 async.
Async isn't making the top performers fast. Being fast is making them fast.
What you need for sharded queries is concurrency, not necessarily asynchronous requests. Async callbacks (ala Node.js, Twisted Python, Event Machine, etc.) give you a kind of cooperative multitasking, which is one way to have concurrent I/O-bound tasks going; multithreaded programs are another. (Ruby and Python threads are kind of in-between, due to their respective GIL limitations.)
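As a rough illustration of that point, here is a sketch in Python where the shard queries are ordinary blocking calls, yet the waits overlap because they are issued from a thread pool. `query_shard` and the 50 ms latency are stand-ins invented for the example; a real driver call would go where the sleep is.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SHARD_LATENCY = 0.05  # seconds; made-up per-shard round-trip time


def query_shard(shard_id):
    """Stand-in for a blocking query against one shard."""
    time.sleep(SHARD_LATENCY)
    return f"rows from shard {shard_id}"


def query_sequentially(shard_ids):
    # Total latency grows with the number of shards.
    return [query_shard(s) for s in shard_ids]


def query_concurrently(shard_ids):
    # Still blocking calls, but the waits overlap across worker threads.
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        return list(pool.map(query_shard, shard_ids))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


shards = [0, 1, 2, 3]
seq_rows, seq_time = timed(query_sequentially, shards)
con_rows, con_time = timed(query_concurrently, shards)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

No callbacks or event loop are involved; the results are identical and only the elapsed time differs, which is the sense in which concurrency, rather than async specifically, is what the sharded case needs.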
That being said, above a certain scale and complexity level, you probably want the topology of your persistent data store hidden from your web request handlers anyway. For one thing, making requests to N backend shards from M frontend web workers starts to get bad when N and M are both large; for another, introducing really complex scatter-gather query logic into your request-handling pipeline can be a maintenance and debugging nightmare.
Introducing a proxy or data-abstraction service in between cuts down on the number of open connections and lets you change the data storage topology without updating frontend code.
There should be no surprise that interpreted, dynamic languages are utterly out-gunned as compared to compiled (JIT'd or otherwise) languages. It's inherent to the system - every little thing you do costs more.
Many people choose Ruby and figure, given that premature optimization is the root of all evil, they'll optimize later if needed.
That's like choosing between a farm tractor or a ferrari - and figuring if the tractor doesn't perform up to snuff, we'll add a spoiler (and given the 10x disparity between Java and Ruby in some of those graphs, if we throw out a 20mph top speed for a farm tractor, the ferrari analogy is actually rather spot on).
There are many good reasons to choose dynamic/interpreted languages - but always know you're giving up performance in exchange.
> Many people choose Ruby and figure, given that premature optimization is the root of all evil, they'll optimize later if needed.
> That's like choosing between a farm tractor or a ferrari - and figuring if the tractor doesn't perform up to snuff, we'll add a spoiler
It's really not like that at all, because programming languages aren't like vehicles. Particularly, with Ruby, one typical method of optimization is finding which bits of code are bottlenecks, and then optimizing those bottlenecks, often by replacing them with C (or, if the Ruby runtime being used is JRuby, Java).
Which I guess is like having your tractor turn into a Ferrari for the parts of work that involve going long distances on a road without towing something, but I think that kind of points out how bad even using the tractor/Ferrari analogy is.
I keep hearing about this, but do people really rewrite performance-critical parts of their web apps in C? Even if it happens to be part of some third-party library? And maintain a fork? What if that performance-critical part is dependent on other parts in a non-trivial way? It seems that an unanticipated replacement of some core functionality with a C library may involve a major rewrite and most Ruby teams may not have the expertise to do a good job maintaining a C code base anyway.
Yup! Many do. Github just wrote about it here (replacing the Rails default HTML escaper with a C one for a 30% increase): https://github.com/blog/1475-escape-velocity and I think the Judy Arrays they're using for code classification are in C.
At my job after benchmarking we've done things like break out computation heavy things into C/C++, and have been even eyeballing things like Go and the Lua/Nginx based OpenResty for small computationally heavy services.
In many cases this means rewriting what used to be a 3rd party library. The big question is usually around cost in time and if we want to have to maintain that knowledge for the long term. Most of the time it's cheaper to toss more servers at it - but for certain things - namely cases where latency is very important no amount of scaling out is going to make it faster.
> I keep hearing about this, but do people really rewrite performance-critical parts of their web apps in C?
Certainly they do it for Ruby apps in general. I don't think it's all that common for it to be a high-value proposition for web apps.
> Even if it happens to be part of some third-party library? And maintain a fork?
If it's an open-source third-party library that tends to get used in a way that is performance-critical, upstream will probably accept moving bottlenecks to (portable) C and maintaining the API, so it's unlikely that you'll need to take up responsibility for a fork.
> What if that performance-critical part is dependent on other parts in a non-trivial way?
If the call pattern is such that they are not part of the performance critical part themselves, then the performance critical part calls them through the regular conventions for calling Ruby from C.
If the call pattern is such that they are part of the performance critical piece, well, I think the answer is obvious.
> It seems that an unanticipated replacement of some core functionality with a C library may involve a major rewrite
It might, but in the meantime you've got working code.
> and most Ruby teams may not have the expertise to do a good job maintaining a C code base any way.
If the team determines it needs expertise in a particular area that it doesn't currently have, then it should either develop that expertise or bring in people that have it. That's true whether its particular domain expertise (e.g., building messaging systems) or particular technology expertise (e.g., C). That's part of the normal development of a team.
I guess my point is that it seems disingenuous to point out "you can write that bit in C" as a way to mitigate performance problems with Ruby, when in practice it's so costly compared to available alternatives (throw more hardware, write manually optimized Ruby, switch to a faster language/runtime) that almost no one does it. How much of Rails is written in C? It's like proposing compiler extensions/patches as a way of dealing with performance problems. And if you have a complex application that utilizes many of Ruby's idioms to deal with the complexity, it's extremely unlikely that you can simply replace parts of it with C libraries without reorganizing in such a way to increase complexity.
> I guess my point is that it seems disingenuous to point out "you can write that bit in C" as a way to mitigate performance problems with Ruby, when in practice it's so costly compared to available alternatives
I don't think it's costly compared to available alternatives; I think it's generally an efficient alternative for the type of bottleneck that is actually related to implementation language efficiency. I think, for most typical web apps, the bottlenecks are only rarely of that type, so that's generally not where the effort is going to be spent, but for the ones that do have bottlenecks of that type, it's quite an appropriate way of solving them.
> throw more hardware, write manually optimized Ruby, switch to a faster language/runtime
If writing manually optimized Ruby is an effective and cheaper solution, you aren't experiencing the class of bottlenecks that are related to implementation language efficiency. Switching languages or runtimes for a component is a proper subset of the work of switching languages or runtimes for a project, so the latter isn't going to be less costly than the former. (It may, if language-related bottlenecks are pervasive, or if you have non-performance interests in the alternative language, have a bigger net payoff and be more cost effective, but it won't be less costly, and it's inherently riskier to do all at once, since component-wise transition gives you a faster cycle time in terms of realizing value even if you end up doing a full replacement in the end.)
> And if you have a complex application that utilizes many of Ruby's idioms to deal with the complexity, it's extremely unlikely that you can simply replace parts of it with C libraries without reorganizing in such a way to increase complexity.
I disagree. Anything you can do in Ruby you can do in API-equivalent C that can still call out to the exact same Ruby code for the functions that aren't being moved into C, so there is no reason at all for the kind of reorganization you suggest, particularly if you are building with loosely-coupled components in the first place.
If you are building a complex app and it's all tightly coupled, you've got a big maintainability nightmare no matter what language you're using, and that has nothing to do with Ruby.
I completely disagree that Ruby's slowness is rarely the bottleneck. In this benchmark, we have reasonably simple requests on decent hardware in the realm of 1 second latency, where faster frameworks are 10+ times faster. We have Sinatra being far slower than similar frameworks like Scalatra, Unfiltered.
Yes, you can write Ruby in C, but it would be almost as slow as writing Ruby in Ruby. I don't really see the point of saying, you can do anything you can do in Ruby in C, it would be much more verbose and about as slow. The point is that true optimization may force you to do things that you can only do in C and there's no guarantee that this optimized version can be easily utilized from the rest of your Ruby code. This has nothing to do with tight-coupling - it's simply taking advantage of the language's abstraction facilities.
And no, having to write hand-tuned Ruby, as opposed to idiomatic Ruby, to get performance that can be had by writing, say, idiomatic Scala or Haskell is an indictment of slow implementations and prevents you from taking full advantage of the expressiveness provided by the language.
And that's before you get into things like your team may have to get bigger because you need a C/Ruby-extension expert, half the team not being able to understand a critical part of the code base (very few Ruby developers are reasonably competent in C), etc.
Again, the whole point is that Ruby's performance problems pose a real pain point. Yes, you can rewrite parts of it in C, yes you can mitigate by using gems written in C, yes, you can spend more time optimizing, yes you can throw more hardware. But all of those are costly and it's disingenuous to pretend that a problem doesn't exist simply because a workaround does.
None, on purpose. We want maximum portability, and so the Rails defaults are Ruby-only on purpose. Of course, it's easy to add gems that replace things that are written in C or Java, depending on what makes the most sense for your platform.
Would it make sense to submit a rails-optimized pull request for this benchmark that replaces some key performance bottlenecks with appropriate C gems? I'd be curious to see how fast rails can go out of the box without doing your own hand optimization.
If you cannot write one small part of the app in C due to the difficulty or time consumed, then how much better is it for you to write everything in Java from the beginning? Java does not really substitute for Ruby in the same niche.
Seems a big part of this is that a lot of the proponents thought the comparison was really a Ford vs a Corvette; I don't think a lot of people were really internalizing exactly the ramifications of the order(s) of magnitude difference in performance. Which is why this kinda benchmark is pretty helpful.
The traditional Lua API is interpreted only so every call interrupts a trace. You have to use the ffi API to call C code instead. Plus some string operations are interpreted only but that is being fixed. It is all being worked on but there is a fair amount to do...
I think the tractor/ferrari analogy doesn't really hold up because a tractor to me implies that it would be slower, but more powerful. The ferrari sacrifices power for speed. I'd say these platforms are more like comparing a ferrari with a go-kart. Or comparing a ferrari with another ferrari that has 10,000 lbs of bricks in the trunk. (Actually, that probably doesn't hold up either because ferraris probably don't have trunks!)
But to further add to the analogy, the tractor, the ferrari and the go-kart may all perform about the same if you're only traveling 1 inch.
People choose language X over language Y for reasons other than performance: cost, ease of use and deployment, libraries, available programmers, etc. Things are more complicated than just a benchmark. Furthermore, NodeJS and raw PHP are doing quite well in the benchmarks.
I really don't like these benchmarks. It's like benchmarking Fizzbuzz or something. Frameworks don't do anything. No one chooses a framework (at least I don't) based on performance. You choose one framework over the other because you like the API and/or language. I myself am a framework author (giotto, a python framework that was not included in these benchmarks). If my framework had been included, I'm sure it would end up dead last. When I built it, I wasn't thinking about performance, I was focusing on building a framework that would result in applications that are easy to understand/debug and fast to write.
The general point of these benchmarks is not to resemble a full production app, but to provide a baseline measurement. From the original blog post:
This exercise aims to provide a "baseline" for performance across the variety of frameworks. By baseline we mean the starting point, from which any real-world application's performance can only get worse. We aim to know the upper bound being set on an application's performance per unit of hardware by each platform and framework.
But we also want to exercise some of the frameworks' components such as its JSON serializer and data-store/database mapping. While each test boils down to a measurement of the number of requests per second that can be processed by a single server, we are exercising a sample of the components provided by modern frameworks, so we believe it's a reasonable starting point.
So, yes, these benchmarks should not be the only factor in choosing a framework, but they do provide a possibly important data point (depending on the specific scenario).
This is a good point. Especially if you only cared about how fast you can make your app. But if you want to also consider how cheap you can run your app, you need to consider how many app servers will it take to saturate the DB? 1 or 10? At certain scales for certain tasks, the hosting costs matter more than the development costs.
>But if you want to also consider how cheap you can run your app, you need to consider how many app servers will it take to saturate the DB?
Moore's law has made this sorta moot. Unless you're on Heroku, for a successful small-to-medium app, the denominator in your hosting costs is going to be the salary of the engineer or sysadmin who tends to it.
(If you're on Heroku, then you start worrying about dynos because, with monitoring, you're paying $60 per "worker".)
This is to say, the cost in salary to properly shard a database probably outweighs a year or two of hosting for the extra two or three boxes you're spinning up; almost no one experiences explosive growth where you need to spin up dozens of new boxes overnight.
Moore's law hasn't made it moot. Running in the cloud is pretty slow and extremely expensive. Look at StackExchange for example - they used to handle a LOT of traffic on a handful of servers. Even these benchmarks say (or said) that the EC2 instance used is waaay slower than an i7 2600K.
Except your assumption is complete nonsense. The more non-trivial your app is, the less the database is a bottleneck and the more the app is. The vast majority of web apps are extremely read heavy. Those apps benefit massively from caching, which completely removes the database as a bottleneck. This means you are often choosing between a language and framework combo that means paying for 50 instances vs one that means paying for 4 instances. That is a lot of money, and the fallacious notion that the slower language is more productive by virtue of being slow is silly.
You can build an application in Ruby, deploy it onto 15 machines, and it will outperform the same application written in C and deployed to only one machine. Performance is more of a function of the underlying hardware than the language used to build it.
If a language is 30x less efficient than another language then you would likely need 30x more servers. Many folk are simply not prepared to spend 30x more than they need to on hardware. It's the difference between 20 servers and 600 servers.
Case in point: "How We Went from 30 Servers to 2":
That's only true when you are using the full resources of one or more servers. If you are only using 1/100th of the server's resources, then being 30x less efficient still doesn't require any more servers.
That doesn't negate the point though, language performance matters at certain scales.
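The arithmetic behind the last few comments can be sketched in a few lines of Python. The per-server capacities and load figures below are made-up numbers chosen only to reproduce the 30x ratio and the 20-versus-600 example from the thread; they are not measurements from the benchmark.

```python
import math

def servers_needed(load_rps: float, per_server_rps: float) -> int:
    """Smallest number of servers whose combined capacity covers the load."""
    return max(1, math.ceil(load_rps / per_server_rps))

# Hypothetical capacities: the "fast" stack handles 30x more requests per server.
FAST_RPS = 3000
SLOW_RPS = FAST_RPS / 30  # 100 requests/second

# At high load the 30x gap shows up directly in the server count.
high_load = 60_000
print(servers_needed(high_load, FAST_RPS), servers_needed(high_load, SLOW_RPS))

# At 1/100th of one fast server's capacity, both stacks fit on one machine.
low_load = FAST_RPS / 100
print(servers_needed(low_load, FAST_RPS), servers_needed(low_load, SLOW_RPS))
```

Under these assumptions the gap only costs money once the load exceeds what a single server of the slower stack can absorb.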
If you split a single app instance onto 15 machines, you will lose some efficiency due to network communication unless those 15 instances can work isolated without any shared data (sessions). That may not be much but worst case: you have to write that inter-machine sync code.
Mojolicious, Dancer and Kelp have set the bar for small code size for me. Not sure yet if there are smaller ones (note that there are no other files required for those apps, period)
In the same vein, Lua's OpenResty looks good, as do Tornado, Flask and Bottle (although you need to tease the raw/ORM methods apart to get an idea for the last two). And of course, Sinatra.
There are probably a lot more, especially for PHP, but I didn't feel like going through that list.
> I don't consider the Go size pretty small. Mojolicious, Dancer and Kelp have set the bar for small code size for me.
You've basically just listed Perl 3 times though. Particularly when the guts of the code in all 3 of those examples was Perl's standard database interface (the same DBI you'd use for CGI Perl or even standalone .pl scripts).
I do love Perl for the flexibility of its syntax and how concise the code can be. But for me the performance of Go won out. And while mod_perl* does make great gains in performance, it also makes the code a lot less portable (unlike Go). So I found myself porting my performance-critical webapps over to Go.
* I've not tried Mojolicious, Dancer nor Kelp so I couldn't comment on how they compare for performance.
> Yeah, the post started slightly different than it ended, and that's an artifact of that change. My next paragraph listed many more, and I tried to do multiple from each language that I looked at which had simple implementations.
> Particularly when the guts of the code in all 3 of those examples was Perl's standard database interface (the same DBI you'd use for CGI Perl or even standalone .pl scripts).
The benchmark page clearly tags which implementations use raw SQL access and which use an ORM. These all happen to be using raw SQL. To my knowledge, none of them have a pre-bundled ORM, and I'm not sure whether the ORM tested implementations are only supposed to indicate the pre-shipped ORM.
> But for me the performance of Go won out
I wasn't trying to imply they competed on that metric, I just wanted to give some examples of much simpler implementations. What one considers small is obviously relative.
> I've not tried Mojolicious, Dancer nor Kelp so I couldn't comment on how they compare for performance.
They all look to be bottom-half of the full set of results, performance wise. Mojolicious is quite a bit slower than the others (relatively; they are all slow compared to Go), most likely because it uses its own internal, pure-Perl JSON module. There's a way to fall back to the optimized C-based JSON::XS module, but I'm not sure whether that would keep with the spirit of the benchmarks.
> The benchmark page clearly tags which implementations use raw SQL access and which use an ORM. These all happen to be using raw SQL. To my knowledge, none of them have a pre-bundled ORM, and I'm not sure whether the ORM tested implementations are only supposed to indicate the pre-shipped ORM.
You miss my point. All of those examples you gave used the same core database framework and as the test was primarily a database performance test, all those 3 examples were essentially the same core Perl code.
Whether it's ORM or raw SQL is completely beside the point (though since we're on the topic, Perl's DBI basically works the same as Go's - or rather that should be the other way around given their age).
>I wasn't trying to imply they competed on that metric
Again, you missed my point. I wasn't suggesting that you were comparing the performance of the two. I was commenting on why I switched away from Perl to Go.
> I just wanted to give some examples of much simpler implementations.
Except you didn't. You gave AN example (singular). It was one language: Perl.
> They all look to be bottom-half of the full set of results, performance wise.
I wouldn't trust that kind of benchmark for comparisons of Perl frameworks as setting up a Perl environment isn't as simple as compiling a Go program. With Perl, you have a number of different ways you can hook the runtimes into the web server (CGI, Apache libs, etc), pure Perl and C libraries (which you also mentioned) that significantly affect both memory usage and runtime performance and a whole boatload of config ($ENVS in mod_perl, bespoke handlers, etc) that also affect performance.
The ironic thing with Perl is that, despite scripts in the language being some of the most portable code in the POSIX community, running performance-critical Perl webapps leads to very unportable setups (which was the other reason I migrated my sites to Go).
This might sound critical, but I genuinely do love Perl. I'd say it was up there as one of my favourite languages (and over the years I've learned to develop in a great number of different languages). But sadly nothing in life is perfect.
> You miss my point. All of those examples you gave used the same core database framework and as the test was primarily a database performance test, all those 3 examples were essentially the same core Perl code.
I think we are talking past each other. I listed a lot of frameworks, including three in python. I started with Perl, and added a whole bunch more. I could, and should, have presented them better.
Personally I think the fact they are using DBI is the inconsequential part. It takes up few lines of the example, and most of the other code is the specifics of the framework (although they are very similar, because they are all Sinatra clones, to varying degrees). What do you expect to be different in a non-DB based test (I'm still unclear what point you are trying to make)? Their template systems are pretty simple to use as well.
> Again, you missed my point. I wasn't suggesting that you were comparing the performance of the two. I was commenting on why I switched away from Perl to Go.
That's fine, and a worthy conversation to have, I'm just trying to keep this on the topic of implementation size, since I think the performance side of the discussion is being handled well enough elsewhere.
> Except you didn't. You gave AN example (singular). It was one language: Perl.
Actually I gave eight examples: three Perl, three Python, one Lua and one Ruby. The fact there were three Perl implementations first, and listed by themselves, is sort of an accident. I was really interested in how Mojolicious did, since that's my favorite at the moment, and then I checked the other Perl implementations, and then I looked for others that might be good examples. I intended for them to be taken all together, even if that's not how it seemed.
> With Perl, you have a number of different ways you can hook the runtimes into the web server (CGI, Apache libs, etc), pure Perl and C libraries ...
> The ironic thing with Perl is that, despite scripts in the language being some of the most portable code in the POSIX community, running performance-critical Perl webapps leads to very unportable setups.
How recent is the data this opinion is based on? My understanding is that now most (new) Perl web projects are using PSGI as a common back-end making it extremely portable, and often using pure-perl servers for performance. There's some evidence they can significantly beat mod_perl2.
> This might sound critical, but I genuinely do love Perl. I'd say it was up there as one of my favourite languages (and over the years I've learned to develop in a great number of different languages). But sadly nothing in life is perfect.
I was really, _really_ trying to not make it a Perl vs Go thing. It's obvious I do have a preference though. I'm glad you like Perl, it does seem to fit the mindset of certain people well, and even if they don't stick with it, they remember it fondly. :)
I wasn't aware of PSGI nor the performance it has compared to mod_perl. That's probably one of the most interesting things I've read on here for a while (interesting in terms of it could have a direct impact on my business).
No problem! To tell the truth I didn't really have a clue about real performance until I looked it up for that post. I use the hypnotoad server (pure-Perl, preforking, non-blocking) for Mojolicious for my projects, but those are mostly internal, so I didn't have to worry much about performance. I always figured I would look more into it when it mattered. I thought worst case I would deploy using PSGI on mod_perl, but I also knew from prior experience you can get pretty good performance from a pure-Perl solution.
Actually I find Perl's type system to make the most sense for web work:
1) Any zero length string or 0 valued int is classed as false, which is handy when checking the returns from query strings et al.
2) You can use eq for string comparison or == for numeric checking, which is handy as you can read values from a query string and then compare them against an int without having to do type conversion.
Don't get me wrong, I don't have anything against statically typed languages - in fact I normally prefer them. But the way Perl does type checking I find reduces the number of type problems when dealing with web development.
That all said, I much prefer working with structured types in Go than in Perl.
> You realize that Go implements the new template test right? Your linked ones do not (at least the ones I spot checked).
hello.go uses more lines defining variables and types than the entirety of many of the alternatives I posted. Obviously they will be a little longer if they implement the fortune handler, but I doubt that will really make much of a difference.
> Also, Go is statically typed = win
I'm not sure what that has to do with implementation size (which is the only thing I was addressing), but feel free to make a case.
One thing I am wondering is "what about concurrency level"?
Just because a server can handle 10x the number of requests when doing a single request at a time for 1000 requests, doesn't necessarily mean it can also handle those 1000 requests at 10x performance when they all come in at once or in a short time period.
I saw some tests have "256 concurrency". Does that mean they are sending 256 requests concurrently? I want to see them play more with those numbers. Why not have 1024 or more? Then also play with the number of available CPUs and see which frameworks can auto-scale based on that. Some that can process sequential requests fast might fall face down when faced with slightly increased concurrency; in that respect these benchmarks are a bit misleading.
On the other hand it is good to see latency. That is important. Now latency vs level of concurrency would also be interesting.
Thanks for doing these extensive benchmarking tests. It would be really helpful to see a more complex example that includes user authentication. Aside from the benchmarks it's also a really good starting point to compare the code in different languages and get a first impression of a framework.
On a side-note, I'd really like to know why so few start-ups seem to be using Spring. It could be just a wrong impression. But from what I have seen most start-ups use RoR or Django. My guess is that Spring is less flexible and less known outside big companies, where it is usually the default. It could also be that Spring works better with the waterfall model whereas Django or RoR are better suited for explorative programming and that fits the respective spheres better.
> It could also be that Spring works better with the waterfall model
I've used spring mvc in an agile setting a couple times now, and it has worked fine. It doesn't tend to make developers all that happy, in my experience. If you're in an enterprise full of spring, starting up the next app with it can be attractive -- there likely already exists a bunch of tooling and knowledge around spring.
I wouldn't use spring directly if I were trying to build something quickly for a startup. I'd be more apt to reach for grails (which wraps spring), dropwizard, or any of the other rapid-development frameworks.
Maybe that's me, but I think it's easier to learn typical web frameworks like Rails, Django etc. than spring. On top of that the xml config sucks (at least in my opinion - though I used spring the last time around 2007, maybe it's not as bad as back then).
I think the overlooked part of this, once we step back from the natural desire to pick 'the best', is that people who care about the platforms are providing a vast set of starting examples for people looking to get started on each network. It's easy to do a side by side comparison of similar tasks across languages which is something that is very valuable and, in my experience, relatively novel. Thanks for all your amazing work!
Of all the top performers, Go seems to be the only sane choice to write a web app. Moreover it is at the sweet spot: expressive, flexible, simple, super performant, good community etc. I think it is convincing enough for me to give Go a serious look for our new app.
As someone who is also really enjoying Go, I think you need to add a huge, gigantic disclaimer before making a statement like this: Go's ecosystem of web development packages is in its infancy. You're not going to find any super-well-documented, super mature/stable web frameworks (though a few are showing great promise), and some of the individual components (for example, Gorilla) are looking very good, but still have some more cooking to do.
I love the language, but let's not get too carried away until the ecosystem grows. The reality is, if you're going to use Go for web dev, you're going to need to be prepared to do a whole lot of things on your own.
I've read about the limitation of Go in terms of third party libs, but I believe it is only temporary. I had a glance at the std libs, looks really good for a new language. Our new app doesn't require all the bells and whistles of a full fledged framework.
(Disclaimer: I'm pretty new to Go). I tried Revel out a little while ago, and found it awesome at first. It felt like it could have the potential to dramatically speed up the development of Go web apps. I like the server and hotswapping features a lot, for instance.
But at the moment unfortunately I don't think it's very mature. Support for interacting with the DB, arguably the most important part of a web application, is pretty lacking IMO for something that otherwise wants to be an end-to-end solution.
In one of the examples I noticed that all kinds of interaction with the DB was being done in the Controller, not the Model. Which just seems wrong to me (at first I thought it was a "Play" framework thing, since Revel is modeled on that, but Play uses Hibernate for an ORM in its models). Also, you'll have to roll your own support for interacting with the DB using, say, gorp: https://github.com/coopernurse/gorp
That being said, robfig seems like a really cool dude, and he was responsive on github when I needed some help. The documentation is pretty great too.
I've been meaning to get one of the examples setup and just play around with it. To be perfectly honest I'm relatively new to the world of databases and such (I've been using RoR for some projects if that explains why), so tackling that gorp will be another awesome learning experience. Thanks!
How easy is it to use Go with other frameworks, let's say PHP?
Would it be possible to write an application that uses PHP for some tasks, so you can benefit from the speed of Go and the maturity of PHP?
You'll be better off writing raw PHP without classes than using Go. Go's performance numbers are for raw Go. As soon as you add any framework to the party, performance falls; see the Gorilla test, which is a Go framework.
Honestly, Scala is far more expressive than Go and much more mature. Scala is more complex, but it's never really been an issue for me or my fellow devs. I'm not sure that "Native" vs "virtual" is a valid point of comparison. Scala uses the JVM which has a highly advanced JIT compiler.
After seeing this, and having been a long time Sinatra devotee (I thought Sinatra was pretty fast until I saw these benchmarks), I'm considering picking up Go or Lua (OpenResty) at some point in the future. I'm blown away by the speed differences.
This is probably caused by running all the queries in their own goroutine (if you don't know what this is just think of it as a thread - but much cheaper).
This causes the queries to be handled in more or less random order.
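A rough Python analogue of the effect described above, with threads standing in for goroutines. This is not the benchmark's Go code; `run_query` and its sleep times are placeholders invented for the illustration. Collecting results as tasks complete yields them in whatever order they finish, while collecting them by submission order keeps the original sequence.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_query(query_id: int) -> int:
    """Stand-in for a database query; later queries finish sooner here."""
    time.sleep(0.05 / query_id)
    return query_id

query_ids = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=len(query_ids)) as pool:
    # Completion order: depends on scheduling and on how long each query takes.
    futures = [pool.submit(run_query, q) for q in query_ids]
    completion_order = [f.result() for f in as_completed(futures)]

    # Submission order: map() returns results in the order the inputs were given.
    submission_order = list(pool.map(run_query, query_ids))

print("as completed:", completion_order)
print("in order:    ", submission_order)
```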
We'll fix this for Round 5 ;-)
I still think you're not quite grasping his point. "Out of A and B I don't like B so I would choose A" and "The only sane choice is A" are not logically equal in any interpretation of the English language.
Java gets that rap from originally being slow to execute, and also having a huge up-front cost to spin up a VM.
The first isn't true any more: the Java VM competes with native code on most benchmarks, and due to its ability to perform runtime optimizations, can occasionally outperform native code.
The second doesn't matter at all for web servers. The cost of starting up the web server is tertiary to uptime and performance. If the thing is going to run for 4 months without going down, who cares all that much if it takes 5 seconds or 5 ms to start up?
If you're continuously restarting, you're doing it wrong. That argument became invalid about 4 years ago I think around the time Eclipse Helios was released.
Nowadays, starting a Tomcat or TomEE JVM in debug mode with Eclipse gives you the ability to hot swap probably 95% of your changes. It doesn't support adding completely new functions or changing declared fields. JRebel does support this though.
As a matter of fact, if you're in a stack frame and you pause the execution pointer with a breakpoint, you can completely change the code of the function and the JVM will discard the current stack frame and then restart the function call. Essentially, you can rewrite your code, while it's executing, without losing your stack.
I think that this lingering perception is an artifact of two things. Firstly, in the mid-to-late 90s, Java really was slow and back then C/C++ application layers were pretty common (perl if you could get away with it). So initially Java did not compare well, which is why we wasted a couple years on applets. Even as late as 2005 at Amazon, there were many people predicting doom when we introduced the first Java service as a dependency of the home page. Secondly, the early Java web frameworks were highly synchronous, with lots of locking, and there was no evented IO. So sites written that way really were dogs.
Although I hope to never write "public static void main" again (except ironically, of course), and I spend some time dabbling in Python/Ruby/obscure-language land, I'm really happy to see Clojure and Scala doing well here.
Because it is slow. Compared to C/C++/langs without runtimes [albeit with JIT compilation, for long running applications this is becoming a non issue].
Compared to the current crop of dynamic language interpreters, waaaay more engineering time and talent has been poured into optimizing the jvm.
People I think forget that Java was a more user friendly C++; the price you paid was somewhat slower apps, but that's OK because you write more robust apps more easily. Rinse and repeat for Ruby/Python/Your Lang Here.
Feels like Java is the new C, C is the new ASM (performance wise at least). I've been doing this long enough that when I started using Java it was too slow for "anything serious", now it's the choice for performance. Definitely feels weird.
Feature request: it would be nice to have permalinks (even if they were really messy URLs) to filter-sets so that I can share the slice of the benchmarks I'm looking at with people, without having to list the filters manually.
With slim in particular I notice that the benchmarks list it as "Raw database connectivity" but in the code it looks like it's using RedBean ORM. I'll look more at lunch, I'm probably just misreading something.
Although obviously ORM is more realistic, since if you're sophisticated enough to be using composer and a framework, you're probably using an ORM. I know the point of this benchmark is frameworks not ORMs, but it would be interesting to swap them out and see if there's a huge difference.
Raw php also has to initialize connections, and it does really well, so I don't think that's it. To me, that points to overhead from initializing the objects as the bottleneck. I would have thought that with APC, this wouldn't be a major issue, though, so I wonder if that's still not it.
APC only caches the opcodes, so the interpreter doesn't have to parse your code. But the framework still has to set up itself for each request individually. Parse configuration, create objects, etc. It adds up quickly.
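The per-request setup cost described above can be illustrated with a toy Python model. The `Framework` class, its route table, and the request count are invented for the illustration and do not correspond to any PHP framework in the benchmark. The class definition is loaded once, much like cached opcodes, yet the shared-nothing model still pays for construction on every request.

```python
class Framework:
    """Hypothetical framework whose setup cost we want to count."""
    bootstraps = 0

    def __init__(self):
        # Stand-in for parsing configuration, building routes, creating objects.
        Framework.bootstraps += 1
        self.routes = {"/json": lambda: '{"message": "Hello, World!"}'}

    def handle(self, path: str) -> str:
        return self.routes[path]()

def serve_per_request(paths):
    """Shared-nothing model: the framework is rebuilt for every request."""
    return [Framework().handle(p) for p in paths]

def serve_long_lived(paths):
    """Long-running process: the framework is built once and reused."""
    app = Framework()
    return [app.handle(p) for p in paths]

requests = ["/json"] * 1000

Framework.bootstraps = 0
serve_per_request(requests)
per_request_count = Framework.bootstraps

Framework.bootstraps = 0
serve_long_lived(requests)
long_lived_count = Framework.bootstraps

print(per_request_count, long_lived_count)
```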
Actually, as soon as a lot of db connections are involved, PHP jumps to the head of the pack.
Which means that in most common web use cases (which are db heavy), PHP is as fast as any of them, since all the slowdowns (initialization, slow Zend engine, etc.) are dwarfed by the fast db handling.
Yes, it does look like none of the frameworks are using persistent connections, which would explain the horrible performance. To be fair though, it is still a valid measurement if the frameworks don't enable persistent connections by default.
> it is still a valid measurement if the frameworks don't enable persistent connections by default
I don't agree. Frameworks often prefer the "safe" option over "performance" by default. If you activate persistent connections by coding it in the raw test, then you should also set the equally obvious database configuration flag in a framework.
If you're building an actual webapp I assume you wouldn't put anything more than a require and a call to the app's entry point in the config file. Putting nontrivial amounts of Lua in the config file is more of a way to stave off the evolution into terrible turing-complete languages that config files often make.
I implemented the Ringo app for this benchmark and of course ran it against Node and a couple of others to see how we would perform in this neighborhood before I opened the pull request.
And since that day I've been wondering: why does NodeJs (=V8 JS engine in C) talking to MongoDB have higher response times and latency than Ringo (=Rhino JS engine on JVM) talking to MySQL? The only place where Node beats us JVM guys seems to be the JSON response test.
Node favours concurrency over raw speed; calls deferred with process.nextTick and callbacks end up costing time, in exchange for better concurrency. I think a blocking driver could leave Ringo in the dust, but it would be useless.
Until the project includes a WebSocket-enabled test or a test with forced idle time (e.g., waiting for an external service to provide a response), concurrency higher than 256 yields very little of interest. The reason being that we are fully saturating the server's CPU cores at 256 concurrency.
Increasing the client-side concurrency level simply means that the front-end web server (or built-in web server's socket listener thread) needs to maintain a small queue of requests to hand off to the application server's worker threads. It doesn't make the server any faster at completing those requests. I've written some more about this at my personal blog.
Caveat: Some frameworks appear to have locking or resource contention issues and do not saturate the CPU cores. We will attempt to capture CPU utilization stats in future rounds since this might be of interest to readers and framework maintainers. But increasing concurrency would not increase CPU utilization in these scenarios either.
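To make the saturation argument concrete, here is a minimal back-of-the-envelope model in Java. It is only a sketch: the class name, the 8-worker figure and the 0.05 ms service time are illustrative assumptions, not measurements from the benchmark, and the model ignores network round-trip time (which is why real tests need more than one connection per core to keep the workers busy).

```java
// Idealized model: throughput stops growing once every worker is busy.
// All numbers are illustrative assumptions, not benchmark measurements.
final class SaturationModel {

    /** Requests completed per second at a given client-side concurrency. */
    static double throughput(int concurrency, int workers, double serviceMillis) {
        // Only as many requests as there are workers can be in service at once;
        // the remainder simply waits in the inbound queue.
        int inService = Math.min(concurrency, workers);
        return inService * (1000.0 / serviceMillis);
    }

    /** Requests left waiting in the inbound queue at that concurrency. */
    static int queued(int concurrency, int workers) {
        return Math.max(0, concurrency - workers);
    }

    public static void main(String[] args) {
        int workers = 8;              // assumed: one worker per hardware thread
        double serviceMillis = 0.05;  // assumed: time to fulfill one request
        for (int c : new int[] {1, 8, 256, 1024}) {
            System.out.printf("concurrency=%d rps=%.0f queued=%d%n",
                    c, throughput(c, workers, serviceMillis), queued(c, workers));
        }
    }
}
```

In this model, going from 256 to 1,024 connections changes only the queue depth; the completed requests per second stay the same.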
> concurrency higher than 256 yields very little of interest. The reason being that we are fully saturating the server's CPU cores at 256 concurrency .
Well websocket connections are becoming more and more popular. Maybe that's a different benchmark.
But the level of concurrency is pretty important. It basically tells the story of what happens to a "slashdotted" server. If nothing crazy like that happens then most servers might be ok, just maybe have a little higher latency. It is when shit hits the fan that different servers start separating from the herd. Some gracefully slow down, some scale smoothly across CPUs, some start throwing socket errors.
Who cares about these issues? Well anyone who becomes successful. If there are no visitors and no customers and only a GET request here and there every 10 minutes, then those places could really just use any server. A simple Perl or Ruby one will do. Now those that grow and see customers they will be interested in what happens in cases like that. There is a traffic spike at launch of new product so now there is a 200% increase in traffic for that one day and it tapers off.
Maybe we just come from a different background and that's why the focus is on different metrics.
> It doesn't make the server any faster at completing those requests.
But I am not sure what story benchmarking the servers at an artificial level of concurrency tells us. Maybe it helps those that have a throttling/balancing proxy that always caps the number of connections at 256 and balances the rest out to other servers... And I am not sure that the heuristic "it can handle 2456 requests/second with a single connection at a time" can be extrapolated to imply that it can "handle 2456 concurrent connections in a single second".
Yes, WebSocket is a different test and we aim to include a WebSocket test in the future.
I understand what you're saying about the "Slashdot Effect," but I think you may be misunderstanding me.
Taken from the context of preparing for a Slashdot effect, the 256-concurrency test we are running against high-performance frameworks on our i7 hardware plays out like the world's worst case of Slashdotting. Think about it for a moment: Finagle is processing 232,000 JSON requests per second. It would be even higher if our gigabit Ethernet weren't limiting the test.
With requests being pulled off the web server's inbound queue and processed so quickly, do you think it would be easy to simulate and maintain 1,000, 5,000, or 10,000+ concurrency?
Conceptually, the load tool has an opposite goal of the web server. From the load tool's point of view, an ideal request is one that takes infinitely long. If the request takes a long time for the server to fulfill, the load tool can just keep the request's connection open and satisfy the user's concurrency requirement. Easy peasy. But as soon as the server fulfills the request, the load tool must snap to it and get another request created ASAP to keep up its agreed-to concurrency level. Asking a load tool to maintain 1,000 (or worse) concurrency versus a web server completing requests at the rate of 232,000 per second is asking a lot. Wrk is up to the challenge, but gigabit Ethernet holds everything back. The Ethernet saturation means that even if you crank up concurrency against a high-performance web server, the results look basically the same. The web server simply doesn't perceive the concurrency target because gigabit Ethernet can't meet the demand.
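The gigabit ceiling can be estimated with simple arithmetic. The sketch below is a rough bound only: the per-response wire sizes are assumptions chosen for illustration (the thread does not state the actual size), and it ignores per-packet overhead and acknowledgement traffic.

```java
// Rough ceiling on responses per second imposed by link bandwidth alone.
final class LinkCeiling {

    /** Upper bound on responses per second over one direction of a link. */
    static double maxResponsesPerSecond(double linkBitsPerSecond, int bytesPerResponse) {
        double bytesPerSecond = linkBitsPerSecond / 8.0;
        return bytesPerSecond / bytesPerResponse;
    }

    public static void main(String[] args) {
        double gigabit = 1_000_000_000.0;
        // Assumed wire sizes per response, including HTTP headers and framing.
        for (int bytes : new int[] {250, 500, 1000}) {
            System.out.printf("%d bytes/response -> at most %.0f responses/s%n",
                    bytes, maxResponsesPerSecond(gigabit, bytes));
        }
    }
}
```

At an assumed 500 bytes on the wire per response, the bound works out to 250,000 responses per second, which is the same order of magnitude as the 232,000 figure quoted above.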
As I wrote in the blog entry I cited earlier, if you start thinking about the idealized goal of a web server--to reduce all HTTP requests to zero milliseconds--it should become more clear why increasing concurrency beyond the CPU's saturation level doesn't actually do much except show the depth of the web-server's inbound request queue. In other words, once we've saturated the web server's CPUs with busy worker threads, we can increase concurrency for only one goal: to determine at what rate can we get the server to reject requests with 500-series HTTP responses. For the JSON test on gigabit Ethernet, we find it's impossible to cause high-performance frameworks to return 500-series HTTP responses because the load tool simply cannot transmit requests fast enough to keep the server's request queue full.
A slightly less-performant framework--let's use Unfiltered as an example--is not running into the gigabit Ethernet wall but is still processing 165,000 JSON requests per second. Since the network is not limiting the test, the CPU cores are completely saturated. 100% utilization.
165,000 requests per second is way worse than being "slashdotted." Slashdot has many readers, but they can't generate that kind of request rate in their wildest dreams. Hacker News also has a great deal of readers, but nothing with a narrow audience such as this could generate 165,000 requests per second from readers clicking on a news link. Not even an article about Tesla, Google, hackathons, lean startups, girl coders, 3D printers, web frameworks, and classic computing all wrapped into one could generate that kind of request rate from Hacker News readers. Being at the #1 spot on Hacker News will see a few dozen requests per second or so.
* Web servers maintain an inbound queue to hold requests that are to be handed off to worker threads or processes.
* If there are worker threads available, the web server will assign requests to worker threads immediately without queuing the request.
* If there are no worker threads available, the web server will put the request into its queue.
* If a worker thread becomes available and a request is in the queue, it will be assigned as above.
* If no worker thread is available, and the queue is full, the server will reject the request with a 500-series HTTP response.
* Worker threads are made available very quickly if requests are fulfilled very quickly.
* The server becomes starved for worker threads if requests are not fulfilled quickly enough to keep the inbound queue routinely flushed.
* Actual usage does not come in as 1,000 requests in a nanosecond followed by nothing, and then another burst of 1,000 requests in a nanosecond. Even if it did, because gigabit Ethernet is slow, the server's perception of that traffic would be 1,000 requests spread over several milliseconds.
* For (nearly) all platforms and frameworks below roughly 200,000 JSON responses per second, 256 concurrency causes the worker threads to completely saturate the server's CPU cores, busy fulfilling requests. In fact, in many frameworks' cases, even 128 and 256 concurrency are nearly identical--check the data tables view in the result page.
* Since the CPU cores are saturated, increasing concurrency only can demonstrate the limits of the server's inbound request queue. Doing so does not show anything of interest from a performance (completed requests per second; speed of computation) perspective. Once your CPU cores are saturated, your server is in dire risk of filling up its request queue. A hot-fix is quickly adjusting the queue size in the configuration and hoping that's enough to survive the traffic; the real fix is simply fulfilling requests faster.
* In practice, if your server can fulfill 200,000+ requests per second, your server will essentially never actually perceive concurrency over 256 anyway. Gigabit Ethernet simply can't transmit the requests rapidly enough.
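The queueing behavior in the list above can be sketched with a standard Java thread pool standing in for the web server. This is a toy model under stated assumptions: the pool sizes are made up, every "request" is deliberately held open so the workers stay busy, and a RejectedExecutionException plays the role of a 500-series response.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Toy model of a web server's worker pool and bounded inbound queue.
final class InboundQueueDemo {

    /**
     * Submits `requests` slow tasks to a pool of `workers` threads backed by a
     * queue with `queueCapacity` slots and returns how many were rejected.
     */
    static int countRejected(int workers, int queueCapacity, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1); // keeps every worker busy
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> awaitQuietly(release)); // a request that is slow to fulfill
            } catch (RejectedExecutionException e) {
                rejected++; // no free worker and the queue is full
            }
        }
        release.countDown(); // let the workers drain the queue
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // 4 workers + 16 queue slots can hold 20 requests; the rest are rejected.
        System.out.println("rejected: " + countRejected(4, 16, 30));
    }
}
```

Rejections only start once both the workers and the queue are full, and in this model the only ways to avoid them are a larger queue or workers that finish sooner.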
I find that questions about very high concurrency (where the questioner is not asking about a WebSocket scenario wherein connections are held live but are mostly idle) are confusing high concurrency with a simpler matter: inability to fulfill requests rapidly enough to keep the web server's inbound queue flushed. That is a performance problem, plain and simple, and not a high-concurrency problem.
In other words, one may perceive a large number of live connections, and think, "this is a high concurrency situation," but what they are actually contending with is a side-effect of being slow.
Ah, okay. That's interesting, I suppose. You basically are interested in seeing at what concurrency level the server starts spitting back 500-series responses (or simply doesn't provide a response). Basically, how many concurrent requests are needed before the server's inbound request queue overflows.
Gzipped source size is generally considered a better measure of effort than actual LOC. The reason for this is that the languages with a lot of boilerplate usually have tooling, IDEs, etc. to make that pain go away, and gzipped source, while still counting repeated boilerplate, ends up weighting it less than unique lines of code.
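As a rough illustration of why gzipped size discounts boilerplate, here is a small Java sketch using the standard java.util.zip classes. The repeated getter is a made-up stand-in for boilerplate; it is not taken from any of the benchmark sources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipSize {

    /** Number of bytes the text occupies after gzip compression. */
    static int gzippedSize(String source) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(source.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buffer.size(); // read after the gzip stream has been closed and flushed
    }

    public static void main(String[] args) {
        // Hypothetical boilerplate: the same getter repeated 100 times.
        StringBuilder boilerplate = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            boilerplate.append("public String getName() { return name; }\n");
        }
        String text = boilerplate.toString();
        System.out.println("raw bytes:     " + text.length());
        System.out.println("gzipped bytes: " + gzippedSize(text));
    }
}
```

The hundred repeated lines compress to a small fraction of their raw size, so they count for far less than a hundred distinct lines would.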
If go is less verbose, then why write it in a more verbose language first? The reality is, go and python are pretty much even for verbosity, and there is absolutely no benefit to writing your app in python first and then go. You'd be better off writing it in go and then rewriting it in go if you want the benefits of learning from your first attempt.
While I really enjoy the results, I would still prefer to see a smaller number of frameworks with specific tasks that are common in web development: logging in (assume user and pass are provided), listing of something. These two would be way more meaningful than anything else to me.
Again, thank you very much for the hard work. I think for some there are some revelations there, like that C framework.
Take a look at the new Fortunes test. That test lists rows from a database, sorts them, and then renders the list using a server-side template. It's only implemented in 17 of the 57 frameworks right now, but we hope to have better coverage on that in time as well.
Wow... some of these tests are still pretty severely hobbled.
Is there some reason that you use built in json serialization for some frameworks and not others?
There is also a lot of heterogeneity in the implementation of the multiple queries test. For instance, even if I only look at... say... java frameworks, you seem to implement the exact same feature in very different ways between platforms. For instance, for servlets, you will store all of the results in a simple array... and then write them out when you are done. Like so:
    final World[] worlds = new World[count];
    final Random random = ThreadLocalRandom.current();
    try (Connection conn = source.getConnection()) {
      try (PreparedStatement statement = conn.prepareStatement(DB_QUERY, ...)) {
        // Run the query the number of times requested.
        for (int i = 0; i < count; i++) {
          final int id = random.nextInt(DB_ROWS) + 1;
          try (ResultSet results = statement.executeQuery()) {
            worlds[i] = new World(id, results.getInt("randomNumber"));
          }
        }
      }
    } catch (SQLException sqlex) {
      System.err.println("SQL Exception: " + sqlex);
    }
    // Write JSON encoded message to the response.
    try {
      ...
    } catch (IOException ioe) {
      // do nothing
    }
But for other frameworks, like Vert.x, you use CopyOnWriteArrayList to store all of the results... and then write them out when you are done. Like so:
    private final HttpServerRequest req;
    private final int queries;
    private final List<Object> worlds = new CopyOnWriteArrayList<>();

    public void handle(Message<JsonObject> reply) {
      final JsonObject body = reply.body;
      if (this.worlds.size() == this.queries) {
        // All queries have completed; send the response.
        // final JsonArray arr = new JsonArray(worlds);
        try {
          final String result = mapper.writeValueAsString(worlds);
          final int contentLength = result.getBytes(StandardCharsets.UTF_8).length;
          this.req.response.putHeader("Content-Type", "application/json; charset=UTF-8");
        } catch (IOException e) {
          req.response.statusCode = 500;
        }
      }
    }
In other words, you literally create a new array each time you add a result to that CopyOnWriteArrayList. In fact, not only are you creating a new array, but you are creating new copies of the data in the array as well. Seems a little strange??? DEFINITELY inefficient. Is there a reason that is implemented differently? It seems to me that, at the least, they should both use arrays... but maybe there is something more you guys are testing???
The Onion C based code is written in an even MORE efficient manner for the multiple queries test. It actually stores its results in json format from the outset! Like so:
    snprintf(query, sizeof(query), "SELECT * FROM World WHERE id = %d", 1 + (rand() % 10000));
    MYSQL_RES *sqlres = mysql_store_result(db);
    MYSQL_ROW row = mysql_fetch_row(sqlres);
    json_object_object_add(obj, "randomNumber", json_object_new_int(atoi(row[1])));
    const char *str = json_object_to_json_string(json);
The equivalent java code would be something like:
    private final HttpServerRequest req;
    private final int queries;
    // INSTEAD OF:
    //private final List<Object> worlds = new CopyOnWriteArrayList<>();
    private final JsonArray worlds = new JsonArray();

    public void handle(Message<JsonObject> reply) {
      final JsonObject body = reply.body;
      // INSTEAD OF:
      if (this.worlds.size() == this.queries) {
        // All queries have completed; send the response.
        // final JsonArray arr = new JsonArray(worlds);
        try {
          // INSTEAD OF:
          //final String result = mapper.writeValueAsString(worlds);
          final String result = worlds.encode();
          final int contentLength = result.getBytes(StandardCharsets.UTF_8).length;
          this.req.response.putHeader("Content-Type", "application/json; charset=UTF-8");
        } catch (IOException e) {
          req.response.statusCode = 500;
        }
      }
    }
With a similar change for Servlets. According to the benchmark results, Onion comes out on top. It's the fastest. But how much of that is because it seems to be written correctly while other tests seem to be written without taking advantage of the same efficiencies?
Is it the case here that some people have sent you test code optimized for their own frameworks?
If that is so, you should add some tests that would not be so amenable to optimization. I'm not picking on Onion here by the way. In fact, the argument could be made that Onion is not actually 'optimized', so much as just written correctly, and the other frameworks have tests written incorrectly. But I just wanted to know if you guys actually intended to use these different implementations for some reason that I am unaware of? Do they make the tests more fair somehow???
Thanks for taking the time to dig in and provide some feedback. As much as possible, we want each test to be representative of idiomatic production-grade usage of the framework or platform. Furthermore, we have solicited contributions from fans of frameworks and the frameworks' authors. A side objective is that the code double as an example of how best to use the framework or platform.
All of this means we fully expect that the implementation approaches will vary significantly.
The multiple query test has a client-provided count of queries, so in most Java cases, we create a fixed-size array to hold the results fetched from the database. I wrote the Servlet and Gemini tests, so I can confirm that behavior in those tests.
We are not Vert.x experts and we have not yet received a community contribution for the Vert.x test. However, it is our understanding that idiomatic Vert.x usage encourages the use of asynchronous queries. The question then is: how do we collect the results into a single List in a threadsafe manner? Is your JsonArray alternative threadsafe? Admittedly, using a CopyOnWriteArrayList gave us pause, but we are not (yet) aware of a better alternative.
The Onion test was contributed by a reader and admittedly its compliance with the specification we've created is perhaps a bit dubious. We want a JSON serializer to process an in-memory object into JSON. I'm not certain if the Onion implementation matches that expectation, but the test implementation nevertheless seemed sufficiently idiomatic for his platform.
We're certainly open to more opinions on that matter.
"..The multiple query test has a client-provided count of queries, so in most Java cases, we create a fixed-size array to hold the results fetched from the database. I wrote the Servlet and Gemini tests, so I can confirm that behavior in those tests..."
I agree, that approach would be best. I just was unsure why you didn't do it in Vert.x.
"...it is our understanding that idiomatic Vert.x usage encourages the use of asynchronous queries..."
Someone can correct me if I am wrong, but my understanding of Vert.x is that any query you send to the event bus is already asynchronous. There is no need for a developer to worry about threads at all when writing a vert.x handler. That handler will only ever be called from a single thread. So using a simple array is fine. Using the JsonArray is even better, because then it matches the Onion test idiomatically speaking. Which, I agree, is what you should be going for.
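A minimal sketch of that idea in plain Java follows. It does not use the Vert.x API: ReplyCollector and its callback are hypothetical stand-ins, and the code assumes (as described above) that every reply is delivered on the same thread, which is what makes an unsynchronized list safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an event-loop handler; not the Vert.x API.
final class ReplyCollector {
    private final int expected;
    private final List<Integer> results;
    private final Consumer<List<Integer>> onComplete;

    ReplyCollector(int expected, Consumer<List<Integer>> onComplete) {
        this.expected = expected;
        this.results = new ArrayList<>(expected); // plain list: no locking, no copying
        this.onComplete = onComplete;
    }

    /** Called once per database reply, always from the same thread (assumed). */
    void handle(int randomNumber) {
        results.add(randomNumber);
        if (results.size() == expected) {
            onComplete.accept(results); // all queries answered; send the response
        }
    }

    public static void main(String[] args) {
        ReplyCollector collector =
                new ReplyCollector(3, all -> System.out.println("respond with " + all));
        collector.handle(17);
        collector.handle(4);
        collector.handle(99);
    }
}
```

If replies could arrive on several threads at once, this sketch would no longer be safe and a synchronized or concurrent collection would be needed.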
"...The Onion test was contributed by a reader and admittedly its compliance with the specification we've created is perhaps a bit dubious. We want a JSON serializer to process an in-memory object into JSON..."
Please don't misunderstand, the Onion test does what you want it to do. As well, it does it in the correct idiomatic fashion. That's exactly how I would write the Onion test. I was just wondering why the other tests went out of their way to decode Json and then re-encode Json for each result. Onion only ever encodes to Json once, other tests are encoding and decoding multiple times. I only pointed out Vert.x because it was the most egregious. I mean in that case the answer from the persistor is already in Json. It is put in a non Json data structure... and then that data structure is encoded to Json??? Just seemed weird.
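The "encode only once" idea can be sketched in a few lines of Java. This is an illustration under assumptions, not any framework's code: the rows are plain (id, randomNumber) integer pairs, and the JSON is assembled by hand only because the values are integers; real code would normally use the framework's own JSON type.

```java
import java.util.StringJoiner;

final class EncodeOnce {

    /** Encodes (id, randomNumber) pairs to a JSON array in a single pass. */
    static String toJson(int[][] rows) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (int[] row : rows) {
            // Integers need no escaping, so each object can be written directly.
            array.add("{\"id\":" + row[0] + ",\"randomNumber\":" + row[1] + "}");
        }
        return array.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(new int[][] {{1, 4242}, {2, 17}}));
    }
}
```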
EDIT: Just verified that there is no need for thread safe code in a Vert.x handler. (Gotta say... that is pretty slick)
On a connected note... man ... these tests are a VERY good way to learn more about these different frameworks!
You're right, this is fascinating stuff. Like I said, we had not yet received a pull request for the Vert.x test, but presumably we will get one before Round 5? :)
We had not previously understood that there was no need for thread-safe behavior within a Vert.x handler. Removing that (apparently fictional) requirement allows us to use just a simple array. Out of curiosity, can you point me to where you found confirmation that handlers do not require thread safety?
Thanks again for your feedback!
Edit: spot checking Vert.x with a simple array does not appear to affect performance to a measurable degree.
Long story short, you dedicated the entire machine (8 cores) to the equivalent of database connection management, (the persistor). Very little of the machine, (whatever is context switched in effectively), is dedicated to request handling. Try something a bit more fair... like 4 workers and 4 web request handlers.
Also, I was going through the Node.js tests and I had a question... do you guys do any clustering for the Node tests at all? Or are these results from the tests run on a single Node?
Sorry for all the questions, just want to make all of the tests do the same thing across all of the frameworks so that when you guys run it again, we can use that data here in a more meaningful fashion. For us, it is useful.
We used 8 Vert.x verticles on i7 because there are 8 HT cores and our understanding is that the "best practice" for Vert.x is to create a number of verticles equal to the server's core count. Obviously we would be happy to hear from Vert.x experts about a more idealized configuration. Admittedly, we have not spent a great deal of time attempting to tune Vert.x and, like I said earlier, we have not received a pull request.
In all tests, the database test is allocating a greater amount of effort to database connection management (in totality: the handling of connections, statements, queries, and result sets) versus request handling. This is not unique to Vert.x. The reason some frameworks' database tests achieve nearly 50% as many requests per second on i7 versus the pure JSON test is simply that at the ~210k rps range for the JSON tests, we are running into a Gigabit Ethernet wall (which I have commented about elsewhere). If we had 10 GbE, the JSON test results on i7 would be even higher. (Also see comments elsewhere about our intent to normalize, to a degree, the response header requirements since the variation observed is attributable to response headers.)
Yes, the node.js tests are running with the cluster module.
Thanks for the comments. We have received great feedback in the previous rounds and this round received even more attention so there have been some more good questions. Unfortunately, there has also been some rehashing, which indicates we're not doing a great job of explaining to people how each environment is configured (linking to the repository only goes so far). That said, we also continue to receive some fantastic pull requests. Thanks to everyone who has helped out!
Unless the number of reads heavily outnumbers the number of writes, it is better to use something like Collections.synchronizedList(new ArrayList<String>()) instead of CopyOnWriteArrayList. The synchronized list does lock while reading or writing, but writing still becomes much faster.
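For reference, a minimal sketch of the two alternatives side by side. Both lists are safe for concurrent appends; the difference is cost per write: the synchronized wrapper takes a lock and does an ordinary ArrayList append, while CopyOnWriteArrayList copies its whole backing array on every add. The thread and element counts are arbitrary, and the sketch checks correctness only; it makes no timing claims.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ConcurrentAppendDemo {

    /** Appends from several threads at once and returns the final list size. */
    static int appendConcurrently(List<Integer> target, int threads, int addsPerThread) {
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    target.add(i);
                }
            });
            pool[t].start();
        }
        for (Thread thread : pool) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return target.size();
    }

    public static void main(String[] args) {
        // Locks on every call, but each add is an ordinary ArrayList append.
        List<Integer> locked = Collections.synchronizedList(new ArrayList<Integer>());
        // Copies the whole backing array on every add.
        List<Integer> copying = new CopyOnWriteArrayList<>();
        System.out.println("synchronizedList size:     " + appendConcurrently(locked, 4, 1000));
        System.out.println("CopyOnWriteArrayList size: " + appendConcurrently(copying, 4, 1000));
    }
}
```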
Thanks! That's true. Incidentally, we spot tested the Vert.x test with a plain array in lieu of the CopyOnWriteArrayList and there was no perceptible performance change.
We're very happy to make it more correct and eke out whatever small gains can be had--plus make the code cleaner. But if there are readers looking to improve the Vert.x numbers, I think we're going to need contributions from someone with a deeper understanding of Vert.x tuning (or some more time to invest in becoming that!)
I have one more question. I see a number of really fat frameworks on top. If Rails is faster, then it must be because it is heavily optimized and the people who made it are super smart; but, for example, CodeIgniter is above Slim and Kohana is above Fuel, at about twice the speed in the Fortunes test. This isn't what I would expect at all. I would also expect Rails to be below the PHP frameworks on speed alone.
Did you use an ORM when one was available, or just raw queries?
This is in line with my previous query about more usage scenarios (and thank you for the Fortunes test) that resemble how we usually use these frameworks.
My point is that we use an ORM instead of direct SQL for the same reason we use a framework instead of the raw language.
Ok, to answer my own question: it is a raw query passed to the db. I know this makes the test easier to write, but it is not as realistic. Having said that, I really appreciate the work you are doing, let me repeat that 100 times. I don't want to sound ungrateful, but this is still not a benchmark that can be used to compare frameworks.
It is stated in the benchmark whether a framework test uses an ORM or "raw" PDO/whatever for db requests.
Things like Doctrine are elegant and smart, but let's face it, they are so slow. PHP is not Java. Hibernate may be fast on Java, but Doctrine is hardly fast...
Symfony, Laravel and Silex share the same http-kernel and event-dispatcher. Laravel and Silex, however, can use closures for controllers and filters/middleware. Maybe that's why they are faster.
Classes are expensive in PHP, since PHP is not OO-centric and classes are merely an add-on. Bootstrapping Symfony means creating an insane number of objects. There are things that could be done about it. I'm sure PHP frameworks are so slow because of the abuse of class hierarchy.
Any chance you could add JEE6?
The two major JSF 2.1 implementations MyFaces and Mojarra are both missing. I don't expect awesome performance numbers, but since JSF2.1 is the 'official' (sigh) web framework of JEE6, it would be interesting how awful they are compared to some of these other languages.
I guess though if this is only testing JSON serialization, it may not make sense. Perhaps adding JAX-RS implementations like CXF, Jersey, RESTEasy, and RESTLet would be more appropriate.
I don't particularly love Java itself, but it is quite impressive how Java and other JVM languages are pretty much ass-kicking at the top of the list. That seems to go for heavyweight stuff as well as simpler "micro" frameworks.
I guess if anything it speaks to what a solid piece of software the JVM is.
Well no matter what you use, you should always benchmark for that exact reason. Making things go fast takes a lot of performance engineering, and the JVM has millions of dollars worth of tuning put into it.
BTW, if you like Node.js, you should probably look at Vert.x. I haven't used it, but it's a similar concept, it runs on the JVM, and it seems to spank Node.js.
Indeed. Node's IPC costs are orders of magnitude slower than Java's volatile or atomic datatypes. I would only choose Node for stateless or trivially parallelizable problems--e.g., those where I could push the state problem into a runtime with real threads.
I got on the Node thing for awhile too, but went back to Java/JBoss/Tomcat/Spring. If you are skilled with the Java stack it is hard to beat for performance and breadth. If you are not, well, I admit the learning curve is steep.
But for quickly spinning up a few light service endpoints, Node can't be beat. Especially if you are using JSON-based persistence like MongoDB or CouchDB, using JSON all the way from database to the client is a huge win. I get tired of writing lots of JAXB POJOs to map my JSON objects to and from, especially early on in development when those definitions change rapidly. That's why I enjoy using Node, especially for "toy" projects. Less boilerplate and more productive quickly.
Side note: I find myself wishing Node had Annotations and AOP... one of Java's coolest (though oft-misused) features IMHO.
The ability to script Java is one of Ringo's killer features - for this benchmark, for example, we dropped in two jars (the MySQL JDBC connector and connection pooling from Apache Commons) and glued them together with 10 LOC of JS.
The ecosystem around Clojure web development is in good shape. I have used Compojure and Hiccup (and Noir) for a large part of my web development in the last few years and it has been a happy experience.
I have had a little less joy experimenting with both Clojurescript and Ember.js (with Clojure back end services): I eventually get things working, but at a huge time cost over writing non-rich clients just using Hiccup.
I know the following is small, but surprised Pyramid didn't make the cut. I've used it for a couple of projects outside of work, and among the several other web frameworks I've worked with (Struts, Play, Django), it seems relatively mature and well documented.
Raw comparisons are quite misleading - ridiculous setups or slow interpreted languages aside, full stacks with ORM are going to be slower than micro stacks with raw DB access. Play! does fairly well compared to other frameworks of its kind.
Could anyone explain why Gemini is sooo much faster than the others in these tests?
I believe that the way these tests are set up slightly advantages Gemini and, more broadly, Java, since they do not measure memory usage or tasks that make memory usage critical, which is something the JVM sucks at.
Gemini is our in-house framework and there are two points to consider:
(a) We are obviously very familiar with Gemini and therefore know how to use it effectively. For example, we know that we prefer to deploy Gemini applications using the Caucho Resin application server because it has proven the quickest Java application server in our previous experience. Of course, the other Java Servlet-based frameworks also benefit from deployment on Resin in these tests.
(b) In our design of Gemini, we do keep an eye on performance. But as the data shows, there are faster options.
It's very depressing to see Symfony2 at the bottom of these lists. Although there are quite a few performance optimizations that can be done to improve this, there are few excuses for such poor performance by default.
If you have improvements to the test (that would be realistic for a production deployment) then please submit a pull request. We definitely want to show each framework doing the best it can do.
A few frameworks have a "stripped" version (just Django and Rails so far) to try to show the best that can be achieved when typical functionality is stripped out. That is essentially optimizing for this test, which is interesting even if it isn't the point of these benchmarks. If you think Symfony2 would benefit from a separate "stripped" test then please consider submitting a pull request with that.
At this point, most of the tests were contributed/improved by the community. Not all have been reviewed by experts in that framework, and I agree that an "expert reviewed" marker would be nice.
That said, we have tried to not run anything in the "default" configs, but rather the "production deployment" configs if we could find documentation on that. Unfortunately there is a huge variability across frameworks in how good the "production deployment" documentation is.
though they're also using Rails 3.2.11, which not only isn't the latest 3.2.* release but doesn't take advantage of Ruby 2.0... it would be more interesting to see the latest Rails 3.2.* on Ruby 1.9.* along with the latest Rails 4 release candidate on Ruby 2.0.0-p0
I encourage you to benchmark Rails 4 yourself and see if there is a measurable difference from the latest 3.2.x, to get a preview of the impact it will have. When Rails 4 is released we'll definitely want to upgrade to that.
Edit: pfalls is too fast for me, and just completed the upgrade to 3.2.13 and closed the issue. The next round will use Rails 3.2.13.