This sounds really cool, but one thing this blog post doesn't mention is that there is a memory constraint on each dyno. So, this model might work if you have a Rails[1] app that doesn't use up a lot of memory. And since Unicorn forks worker processes, 2 unicorn workers will use up double the memory of a typical Rails app. If you have a really huge Rails app, you could still end up only being able to run one unicorn worker, which is the same as just using any webserver on one dyno.
One interesting thing about the post is the mention of Puma[2] as the webserver for a Rails app. Puma is thread-based, so its memory footprint is significantly smaller than unicorn's. So, in theory, you should be able to handle more concurrent requests (more threads) per dyno with Puma.
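For reference, a minimal Puma setup for a dyno might look something like the sketch below; the thread counts and file name are assumptions to illustrate the idea, not tested recommendations:

    # config/puma.rb -- a rough sketch, not a tuned recommendation
    threads 1, 16                                 # min/max threads per Puma process; tune against your memory profile
    port ENV['PORT'] || 3000                      # Heroku hands the dyno its port via $PORT
    environment ENV['RACK_ENV'] || 'production'

You'd then point the Procfile's web process at it with something like `bundle exec puma -C config/puma.rb`.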
Btw, don't just blindly follow the post. Make sure you test your app and see whether it uses more memory than each dyno allows. If your dyno runs out of memory, Heroku will just silently drop your requests, so you might end up with a worse problem.
[1]: In general, Rails uses up quite a lot of memory on cold start.
[2]: http://puma.io/
I'm totally not into memory optimization, but I suppose an app's memory size is directly proportional to the size of its required gems? Or is that, on the contrary, fairly insignificant at runtime?
Ya, it's definitely related to the number of files required. There are other factors though, like whether methods are defined via eval vs define_method, string allocations from stuff like database tables / column names.
> And since Unicorn forks worker processes, 2 unicorn workers will use up double the memory of a typical Rails app.
Thanks to copy-on-write, the amount of memory needed for 2 processes is strictly less than twice that of one. Just make sure to preload as much data as you can before you fork. And in many use cases memory slowly comes unshared (change anything, and a whole page of data becomes unshared), so it can make sense to kill workers every so often and let them refork with everything properly shared.
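In unicorn terms that means turning on preload_app and using the fork hooks; a rough sketch of a config/unicorn.rb (the worker count and the ActiveRecord disconnect/reconnect are the usual boilerplate, adjust for whatever else your app keeps open):

    # config/unicorn.rb -- sketch only; tune worker_processes to your dyno's memory budget
    worker_processes Integer(ENV['WEB_CONCURRENCY'] || 2)
    timeout 30
    preload_app true   # load the app once in the master so forked workers share pages via COW

    before_fork do |server, worker|
      # Connections can't be shared across the fork boundary, so close them here...
      defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      # ...and reopen them in each worker after it forks.
      defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
    end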
Unfortunately, Ruby 1.9.x is not copy-on-write friendly, mainly due to the garbage collection strategy. I would venture to guess that most Rails apps these days are running on some version of Ruby in the 1.9 series. So siong1987 is generally correct in the assumption that forking a Rails process n times will consume about n times the amount of memory as a single process. However, Ruby 2.0 has a new garbage collector (called bitmap-marking) that promises to be copy-on-write friendly, so as adoption of that increases your suggestions will become more important.
AFAIK, to use the brightbox packages on heroku you'd have to write a buildpack that either downloads all the dependencies and runs through the same build process that brightbox uses, or uses something like heroku-buildpack-fakesu to create a fakeroot-type environment where you can install debs.
Another issue with using the brightbox packages is that if you happen to run into a bug, you'll have to figure out if the bug was caused by something non-standard in your ruby installation or if it's actually a bug in ruby.
One last thing to note is that it looks like the latest brightbox release targets patchlevel 327, while ruby core is at 392 (not counting 2.0.0-p0), so you're missing a lot of bug fixes until the brightbox team gets around to building against the latest release.
We haven't had any issues with memory leaks since we've been using Unicorn and Rails 3. Back in the old days of Mongrel and Rails 2 we had to restart the processes periodically because of this, but we haven't seen any issues with this setup. Here are the stats:
> So, this model might work if you have a Rails[1] app that doesn't use up a lot of memory.
I can confirm that. At my previous company, we had a lot of H12s (request timeouts) on Heroku. We thought it was our codebase's fault, but we weren't able to isolate why some usually fast requests sometimes took so long to complete. As a workaround, we decided to temporarily switch to unicorn to add more concurrency to our dynos.
H12 went down, but we were flooded by R14 (memory exceeded).
Even with two workers, it was too much. Granted, our app was eating too much memory and we were refactoring it to split it into several distinct apps.
We also had the problem that the Heroku router seemed to send requests to a dyno while the master unicorn process was still starting (and thus couldn't process them yet), which was a major issue since we used HireFire, which kept starting new dynos (and thus new unicorn masters).
I am glad to see them mentioning this. We use Unicorn, and have for quite a while, as a means of reducing bottlenecks and slow response times. I strongly suggest looking at Unicorn Worker Killer too (https://github.com/kzk/unicorn-worker-killer), given the cap on memory for the dynos.
It will gracefully kill the 2 most memory-consuming unicorn workers once they go above a random threshold in the 144-192MB range. It proved to be an extremely efficient and simple solution for our needs.
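For anyone curious, wiring it up is only a couple of lines in config.ru; a sketch based on the gem's README, using the same 144-192MB bounds (the app constant at the bottom is hypothetical, use whatever your Rails app is actually called):

    # config.ru -- sketch; thresholds match the 144-192MB range mentioned above
    require 'unicorn/worker_killer'

    # Gracefully restart a worker once its memory crosses a random threshold
    # between the two bounds; the randomness keeps workers from all dying at once.
    use Unicorn::WorkerKiller::Oom, (144 * 1024**2), (192 * 1024**2)

    require ::File.expand_path('../config/environment', __FILE__)
    run MyApp::Application   # hypothetical app constant; substitute your own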
Is it just me, or is this not at all the same kind of node.js-style concurrency that makes sense with Heroku's new router? Using Unicorn simply means that you're firing up a few more workers; it's only slightly better than buying more dynos.
It's actually a lot better than buying more dynos, because you're adding a second level of intelligent routing to the system. The original rap genius blog post has some graphs that illustrate the benefits: http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics...
It is probably about the same as buying more dynos -- I would think that unicorn set to fork three times is actually about the same as three dynos with thin in standard configuration -- as you say, either way it's three processes, each of which can only handle one request at a time.
It is _probably_ not a solution to the bad worst-end latencies that can occur when you have random (or round-robin) routing, and relatively high variation in request durations.
Using multi-threaded request dispatch (NOT what unicorn does, but what they start hinting at towards the end) _may_ be a solution to that, though. Multi-threaded concurrency is STILL not "at all the same kind of node.js-style concurrency", since node.js is evented, not multi-threaded. But multi-threaded concurrency seems likely to make sense with random or round-robin routing too. Although more research and data is called for (at least more than I have, which is zero).
But there's no reason to assume that node.js-style evented concurrency is the only thing that can possibly make sense with random routing. And if it is, that's very inconvenient. Trying to do node.js-style evented concurrency with a Rails app... or any app that wasn't built from the ground up (including all its dependencies) for evented-style concurrency... or even if it WAS... is non-trivial.
In contrast, multi-threaded request dispatch requires very little change to your code -- basically just avoiding shared in-memory state between requests (global/class variable access, mostly).
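A contrived sketch of the kind of thing that has to go (the controller method and the current_user helper are made up for illustration):

    # Not thread-safe: a class variable is shared by every thread in the process,
    # so concurrent requests can stomp on each other.
    class ApplicationController < ActionController::Base
      @@last_seen_user = nil

      def track_user
        @@last_seen_user = current_user   # racy under multi-threaded dispatch
      end
    end

    # Safer: keep per-request state on the instance, or in thread-local storage:
    #   Thread.current[:last_seen_user] = current_user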
In addition to puma, thin in fact does have a multi-threaded mode, although it's poorly documented.
I think they are actually passing the buck by suggesting "We can't tell you which way to do multi-threaded request handling, it depends on your app!" Well, sure, it always depends on your app, but that applies to their suggestion to use unicorn as well, and that didn't stop them. There are still often general best practices for typical web apps.
I think it's true that they can't tell us the right way to do multi-threaded request handling -- because it requires more research, analysis, and experimentation (and possibly a patch or two) to figure out the best practices here; the community hasn't put enough into it yet. But if anyone's got the resources to put into it, it's Heroku, and it would be rather good for their business to figure this out and educate the community. Heroku became so respected because they seemed to really know what they were doing, to be at the top of the game -- if they can't manage to figure this out either, it lowers our trust in them.
Unicorn is only meant for "fast clients", and they specifically say that "slow clients" should be served by a reverse proxy like nginx. I put those terms in quotes since they're subjective, but I've read that fast clients are only those on a LAN. So basically, Unicorn should NEVER be directly exposed to a user.
Problem is, Heroku doesn't run a reverse proxy anymore on its Cedar stack. Rainbows[1] is a server based on Unicorn but designed to also handle longer requests/slow clients.
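If I remember right, a Rainbows! config is just a Unicorn config plus a Rainbows! block that picks the concurrency model, roughly like this (the model and numbers are placeholders, not recommendations):

    # rainbows.conf.rb -- sketch; the concurrency model and numbers are placeholders
    Rainbows! do
      use :ThreadPool            # one of the pluggable concurrency models
      worker_connections 100     # how many clients each worker will juggle
    end

    # everything else is plain Unicorn config
    worker_processes 2
    preload_app true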
Is there a good explanation of why Rainbows is better than Unicorn for slow clients? All I seem to have read is that it is better, without going into detail about what is different.
We use Nginx in front of Unicorn as suggested; however, we also do synchronous file uploads and thumbnailing. The process takes a while (maybe 20 seconds), and I would like to understand whether we should be using Rainbows for this.
The downside is the use of more memory, which can cause an out-of-memory condition, and the subsequent guesswork as to what the concurrency factor ought to be. However, it is often a really good idea for applications that don't need a whole lot of memory and spend a lot of time waiting (for databases, the client, or whatever).
Quite some time ago I wrote (mostly for my own entertainment) a little example project to try to cut Dyno use for Python programs, although many of the precepts are exactly the same: https://github.com/fdr/double-boiler
I've been running unicorn for a while, and the only real issue I've run into is apps taking too long to start up and their processes getting killed. Turning preload_app off seems to fix it, though.
If you are using the New Relic add-on, you can go to the "dynos" tab and see the memory footprint of your application over time. The Heroku memory cap is 512MB, so if you stay a decent amount clear of that then you are likely to be OK running multiple workers per dyno.
For instance, our app, which I'd say is a medium-sized Rails app, generally consumes between 175MB and 210MB of memory. So we /should/ be safe running 2 workers per dyno, which would effectively double our processing power.
HOWEVER, if you hover over the memory graph it'll tell you the min/max/avg consumption over that particular time period. Looking at that, I see we sometimes spike over 350MB. The question is, what is that spike going to look like running unicorn? Unfortunately, I can't think of many ways to accurately test/predict that without simply giving it a try.
One other thing mentioned in the comments of the article is that you can set the unicorn backlog to a very low number (the default appears to be 1024), which will apparently tell the Heroku router that a particular dyno is "full" and to look somewhere else to send the request. One comment recommends setting the backlog to 25. I wonder what setting it to something like #workers*2 would do.
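If you want to try that, the knob is the :backlog option on unicorn's listen directive; a sketch (whether 25 is the right value, or whether the router actually backs off, is exactly the open question):

    # config/unicorn.rb -- sketch; 25 is just the value suggested in the comments
    listen ENV['PORT'], :backlog => Integer(ENV['UNICORN_BACKLOG'] || 25)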
Since the Unicorn documentation explicitly states that you should not let it handle slow clients (everything outside your datacenter) directly, using Rainbows![1] might be a better solution.
Heroku apps at the very least sit behind the heroku router, which I would assume is in the same datacenter. If the information in this quora answer http://www.quora.com/Scalability/How-does-Heroku-work is still valid, your app is also behind a front-facing nginx reverse-proxy.
I'm not sure that actually means it's handling slow-clients, though.
I think the important part is that the reverse proxy should buffer the request, but I can't find anything about whether or not the Heroku router does this.