This sounds really cool, but one thing this blog post doesn't mention is that there is a memory constraint on each dyno. So, this model might work if you have a Rails[1] app that doesn't use up a lot of memory. And since Unicorn forks worker processes, 2 unicorn workers will use up double the memory of a typical Rails app. If you have a really huge Rails app, you could still end up only being able to run one unicorn worker, which is the same as just using any webserver on one dyno.
One interesting thing about the post is the mention of Puma[2] as the webserver for a Rails app. Puma is thread-based, so its memory footprint is significantly smaller than unicorn's. So, in theory, you should be able to handle more concurrent requests (more threads) per dyno with Puma.
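For reference, a minimal Puma setup for a dyno might look something like the sketch below; the thread counts and file name are assumptions to illustrate the idea, not tested recommendations:

    # config/puma.rb -- a rough sketch, not a tuned recommendation
    threads 1, 16                                 # min/max threads per Puma process; tune against your memory profile
    port ENV['PORT'] || 3000                      # Heroku hands the dyno its port via $PORT
    environment ENV['RACK_ENV'] || 'production'

You'd then point the Procfile's web process at it with something like `bundle exec puma -C config/puma.rb`.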
Btw, don't just blindly follow the post. Make sure you test your app and see whether it uses more memory than each dyno allows. If your dyno runs out of memory, Heroku will just silently drop your requests, so you might end up with a worse problem.
[1]: In general, Rails uses up quite a lot of memory on cold start.
[2]: http://puma.io/
I'm totally not into memory optimization, but I suppose an app's memory size is directly proportional to the size of its required gems? Or is that, on the contrary, fairly insignificant at runtime?
Ya, it's definitely related to the number of files required. There are other factors though, like whether methods are defined via eval vs define_method, string allocations from stuff like database tables / column names.
> And since Unicorn forks worker processes, 2 unicorn workers will use up double the memory of a typical Rails app.
Thanks to copy-on-write, the amount of memory needed for 2 processes is strictly less than twice that of one. Just make sure to preload as much data as you can before you fork. And in many use cases memory slowly comes unshared (change anything, and a whole page of data becomes unshared), so it can make sense to kill workers every so often and let them refork with everything properly shared.
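In unicorn terms that means turning on preload_app and using the fork hooks; a rough sketch of a config/unicorn.rb (the worker count and the ActiveRecord disconnect/reconnect are the usual boilerplate, adjust for whatever else your app keeps open):

    # config/unicorn.rb -- sketch only; tune worker_processes to your dyno's memory budget
    worker_processes Integer(ENV['WEB_CONCURRENCY'] || 2)
    timeout 30
    preload_app true   # load the app once in the master so forked workers share pages via COW

    before_fork do |server, worker|
      # Connections can't be shared across the fork boundary, so close them here...
      defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      # ...and reopen them in each worker after it forks.
      defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
    end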
Unfortunately, Ruby 1.9.x is not copy-on-write friendly, mainly due to the garbage collection strategy. I would venture to guess that most Rails apps these days are running on some version of Ruby in the 1.9 series. So siong1987 is generally correct in the assumption that forking a Rails process n times will consume about n times the amount of memory as a single process. However, Ruby 2.0 has a new garbage collector (called bitmap-marking) that promises to be copy-on-write friendly, so as adoption of that increases your suggestions will become more important.
AFAIK, to use the brightbox packages on heroku you'd have to write a buildpack that either downloads all the dependencies and runs through the same build process that brightbox uses, or uses something like heroku-buildpack-fakesu to create a fakeroot-type environment where you can install debs.
Another issue with using the brightbox packages is that if you happen to run into a bug, you'll have to figure out if the bug was caused by something non-standard in your ruby installation or if it's actually a bug in ruby.
One last thing to note is that it looks like the latest brightbox release targets patchlevel 327, while ruby core is at 392 (not counting 2.0.0-p0), so you're missing a lot of bug fixes until the brightbox team gets around to building against the latest release.
We haven't had any issues with memory leaks since we've been using Unicorn and Rails 3. Back in the old days of Mongrel and Rails 2 we had to restart the processes periodically because of this, but we haven't seen any issues with this setup. Here are the stats:
> So, this model might work if you have a Rails[1] app that doesn't use up a lot of memory.
I can confirm that. At my previous company, we had a lot of H12s (request timeouts) on Heroku. We thought it was our codebase's fault, but we weren't able to isolate why some usually fast requests sometimes took so long to complete. As a workaround, we decided to temporarily switch to unicorn to add more concurrency to our dynos.
H12 went down, but we were flooded by R14 (memory exceeded).
Even with two workers, it was too much. Granted, our app was eating too much memory and we were refactoring it to split it into several distinct apps.
We also had the problem that the Heroku router seemed to send requests to a dyno while the master unicorn process was still starting (and thus couldn't process them yet), which was a major issue since we used HireFire, which kept starting new dynos (and thus new unicorn masters).
I am glad to see them mentioning this. We use Unicorn, and have for quite a while, as a means of reducing bottlenecks and slow response times. I strongly suggest looking at Unicorn Worker Killer too (https://github.com/kzk/unicorn-worker-killer), given the cap on memory for the dynos.
It will gracefully kill the 2 most memory-consuming unicorn workers once they go above a random threshold in the 144-192MB range. It proved to be an extremely efficient and simple solution for our needs.
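For anyone curious, wiring it up is only a couple of lines in config.ru; a sketch based on the gem's README, using the same 144-192MB bounds (the app constant at the bottom is hypothetical, use whatever your Rails app is actually called):

    # config.ru -- sketch; thresholds match the 144-192MB range mentioned above
    require 'unicorn/worker_killer'

    # Gracefully restart a worker once its memory crosses a random threshold
    # between the two bounds; the randomness keeps workers from all dying at once.
    use Unicorn::WorkerKiller::Oom, (144 * 1024**2), (192 * 1024**2)

    require ::File.expand_path('../config/environment', __FILE__)
    run MyApp::Application   # hypothetical app constant; substitute your own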
Is it just me, or is this not at all the same kind of node.js-style concurrency that makes sense with Heroku's new router? Using Unicorn simply means that you're firing up a few more workers; it's only slightly better than buying more dynos.
It's actually a lot better than buying more dynos, because you're adding a second level of intelligent routing to the system. The original rap genius blog post has some graphs that illustrate the benefits: http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics...
It is probably about the same as buying more dynos -- I would think that unicorn set to fork three times is actually about the same as three dynos with thin in standard configuration -- as you say, either way it's three processes, each of which can only handle one request at a time.
It is _probably_ not a solution to the bad worst-end latencies that can occur when you have random (or round-robin) routing, and relatively high variation in request durations.
Using multi-threaded request dispatch (NOT what unicorn does, but what they start hinting at towards the end) _may_ be a solution to that, though. Multi-threaded concurrency is STILL not "at all the same kind of node.js-style concurrency", since node.js is evented, not multi-threaded. But multi-threaded concurrency seems likely to make sense with random or round-robin routing too. Although more research and data is called for (at least more than I have, which is zero).
But there's no reason to assume that node.js-style evented concurrency is the only thing that can possibly make sense with random routing. And if it is, that's very inconvenient. Trying to do node.js-style evented concurrency with a Rails app... or any app that wasn't built from the ground up (including all its dependencies) for evented-style concurrency... or even if it WAS... is non-trivial.
In contrast, multi-threaded request dispatch requires very little change to your code -- basically just avoiding shared in-memory state between requests (global/class variable access, mostly).
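A contrived sketch of the kind of thing that has to go (the controller method and the current_user helper are made up for illustration):

    # Not thread-safe: a class variable is shared by every thread in the process,
    # so concurrent requests can stomp on each other.
    class ApplicationController < ActionController::Base
      @@last_seen_user = nil

      def track_user
        @@last_seen_user = current_user   # racy under multi-threaded dispatch
      end
    end

    # Safer: keep per-request state on the instance, or in thread-local storage:
    #   Thread.current[:last_seen_user] = current_user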
In addition to puma, thin in fact does have a multi-threaded mode, although it's poorly documented.
I think they are actually passing the buck by suggesting "We can't tell you which way to do multi-threaded request handling, it depends on your app!" Well, sure, it always depends on your app, but that applies to their suggestion to use unicorn as well, and that didn't stop them. There are still often general best practices for typical web apps.
I think it's true that they can't tell us the right way to do multi-threaded request handling -- because it requires more research, analysis, and experimentation (and possibly a patch or two) to figure out the best practices here; the community hasn't put enough into it yet. But if anyone's got the resources to put into it, it's Heroku, and it would be rather good for their business to figure this out and educate the community. Heroku became so respected because they seemed to really know what they were doing, to be at the top of the game -- if they can't manage to figure this out either, it lowers our trust in them.
Unicorn is only meant for "fast clients", and they specifically say that "slow clients" should be served by a reverse proxy like nginx. I put those terms in quotes since they're subjective, but I've read that fast clients are only those on a LAN. So basically, Unicorn should NEVER be directly exposed to a user.
Problem is, Heroku doesn't run a reverse proxy anymore on its Cedar stack. Rainbows[1] is a server based on Unicorn but designed to also handle longer requests/slow clients.
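If I remember right, a Rainbows! config is just a Unicorn config plus a Rainbows! block that picks the concurrency model, roughly like this (the model and numbers are placeholders, not recommendations):

    # rainbows.conf.rb -- sketch; the concurrency model and numbers are placeholders
    Rainbows! do
      use :ThreadPool            # one of the pluggable concurrency models
      worker_connections 100     # how many clients each worker will juggle
    end

    # everything else is plain Unicorn config
    worker_processes 2
    preload_app true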
Is there a good explanation of why Rainbows is better than Unicorn for slow clients? All I seem to have read is that it is better, without going into detail about what is different.
We use Nginx in front of Unicorn as suggested; however, we also do synchronous file uploads and thumbnailing. The process takes a while (maybe 20 seconds), and I would like to understand whether we should be using Rainbows for this.
The downside is the use of more memory, which can cause an out-of-memory condition, and the subsequent guesswork as to what the concurrency factor ought to be. However, it is often a really good idea for applications that don't need a whole lot of memory and spend a lot of time waiting (for databases, the client, or whatever).
Quite some time ago I wrote (mostly for my own entertainment) a little example project to try to cut Dyno use for Python programs, although many of the precepts are exactly the same: https://github.com/fdr/double-boiler
I've been running unicorn for a while, and the only real issue I've run into is apps taking too long to start up and their processes getting killed. Turning preload_app off seems to fix it, though.
If you are using the New Relic add-on, you can go to the "dynos" tab and see the memory footprint of your application over time. The Heroku memory cap is 512MB, so if you stay a decent amount clear of that then you are likely to be OK running multiple workers per dyno.
For instance, our app, which I'd say is a medium-sized Rails app, generally consumes between 175MB and 210MB of memory. So we /should/ be safe running 2 workers per dyno, which would effectively double our processing power.
HOWEVER, if you hover over the memory graph it'll tell you the min/max/avg consumption over that particular time period. Looking at that, I see we sometimes spike over 350MB. The question is, what is that spike going to look like running unicorn? Unfortunately, I can't think of many ways to accurately test/predict that without simply giving it a try.
One other thing mentioned in the comments of the article is that you can set the unicorn backlog to a very low number (the default appears to be 1024), which will apparently tell the Heroku router that a particular dyno is "full" and to look somewhere else to send the request. One comment recommends setting the backlog to 25. I wonder what setting it to something like #workers*2 would do.
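If you want to try that, the knob is the :backlog option on unicorn's listen directive; a sketch (whether 25 is the right value, or whether the router actually backs off, is exactly the open question):

    # config/unicorn.rb -- sketch; 25 is just the value suggested in the comments
    listen ENV['PORT'], :backlog => Integer(ENV['UNICORN_BACKLOG'] || 25)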
Since the Unicorn documentation explicitly states that you should not let it handle slow clients (everything outside your datacenter) directly, using Rainbows![1] might be a better solution.
Heroku apps at the very least sit behind the heroku router, which I would assume is in the same datacenter. If the information in this quora answer http://www.quora.com/Scalability/How-does-Heroku-work is still valid, your app is also behind a front-facing nginx reverse-proxy.
I'm not sure that actually means it's handling slow-clients, though.
I think the important part is that the reverse proxy should buffer the request, but I can't find anything about whether or not the Heroku router does this.