
Twitter switches from Mongrel to Unicorn - mqt
http://engineering.twitter.com/2010/03/unicorn-power.html
======
wglb
This is a good engineering discussion. At the core of it is what, back in the
70s doing real-time medical work, we used to call "multiple server queues"
versus "multiple queues, each with one server". The different performance
implications of each were pretty well known well before we studied it. The
reference book we used, as I recall, was "Real-Time Data Processing" by
Stimmler (maybe also by Robert Martin, of Robert Martin fame).
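The queueing distinction the comment draws can be sketched with a toy, deterministic simulation (Ruby, to match the thread's stack; the job sizes are made up): with one slow request in the mix, a single shared queue feeding every server yields a lower total latency than round-robin per-server queues, because no fast request gets stuck behind the slow one.

```ruby
# One shared queue: whichever worker frees up first takes the next job.
def shared_queue(times, workers)
  free_at = Array.new(workers, 0)
  times.map do |t|
    i = free_at.index(free_at.min)  # next available worker takes the job
    free_at[i] += t
    free_at[i]                      # completion time of this job
  end
end

# One queue per worker: jobs are dealt out round-robin up front.
def per_worker_queues(times, workers)
  free_at = Array.new(workers, 0)
  times.each_with_index.map do |t, j|
    i = j % workers                 # fixed round-robin assignment
    free_at[i] += t
    free_at[i]
  end
end

jobs = [5, 1, 1, 1]                 # one slow request among fast ones, all at t=0
shared = shared_queue(jobs, 2).sum      # total latency: 5+1+2+3 = 11
split  = per_worker_queues(jobs, 2).sum # total latency: 5+1+6+2 = 14
```

With per-worker queues, job 2 is stuck behind the 5-tick job on worker 0 even though worker 1 goes idle at t=2; the shared queue never strands a fast request that way.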

~~~
anotherjesse
I thought the coverage was pretty weak, not due to unicorn but because there
are many options (including passenger (aka mod_rails) for apache) that allow
queueing to occur at the proxy instead of at the individual workers.

I wish they had discussed why unicorn instead of mongrel, and why solutions
like haproxy or passenger or ... didn't work.

Instead, the discussion was: our apache configuration didn't work, so we
switched the load-proxy mechanics and our app server at the same time.

I've had great success with mongrels behind HAProxy (with maxconn=1, so only
one request per mongrel at a time) for years. I've also had great success with
Passenger on apache.

I think it's a great step forward for Twitter's servers; I just wish the
article had some meat. Isn't this the Twitter engineering blog, not a blog for
a general audience about how Twitter works?

~~~
teej
(HAProxy + Mongrel) and (Apache/Nginx + Passenger) work great. Don't get me
wrong, I've seen lots of different Rails server architectures and I've spoken
at RailsConf on this very topic. I would recommend either setup to a Rails
startup in an instant.

But in the edge case of immense load, they simply don't keep up with Unicorn.

The thing that sets Unicorn apart is that it does its load balancing at the
kernel level. All Unicorn worker processes listen on the same socket, and the
OS takes care of handing each request to a single available worker. So unlike
Mongrel, you don't end up with per-worker queues. HAProxy is smart about
distributing load well, but Unicorn makes it seamless: the workers simply ask
for a new request and the kernel gives them one.
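A minimal sketch (not Unicorn's actual code) of this preforking, shared-socket model: a master process opens one listening socket, then forks workers that all call accept() on it, so the kernel hands each incoming connection to exactly one free worker.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)   # master opens the shared socket
port = server.addr[1]

workers = 3.times.map do |i|
  fork do                                # each worker inherits the socket...
    loop do
      client = server.accept             # ...and blocks here until the kernel
      client.puts "worker #{i}"          # hands it a connection
      client.close
    end
  end
end

# Drive a few requests through the shared socket; the kernel picks
# whichever worker is available -- there is no per-worker queue.
replies = 5.times.map do
  sock = TCPSocket.new('127.0.0.1', port)
  reply = sock.gets.chomp
  sock.close
  reply
end

workers.each { |pid| Process.kill('TERM', pid) }
Process.waitall
```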

There are some other niceties too. Unicorn processes are forked from a master
process, so if you are using REE, they can keep Rails in shared memory. When
we deployed it, we dropped memory usage by 30%.
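That copy-on-write setup maps onto Unicorn's preload_app option. A sketch of what such a config might look like (illustrative values, not any particular site's actual config):

```ruby
# config/unicorn.rb -- illustrative sketch
worker_processes 8
preload_app true   # load the app once in the master; with REE's
                   # copy-on-write-friendly GC, forked workers share it

before_fork do |server, worker|
  # database connections must not be shared across fork()
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```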

On top of all that, Unicorn's flawless rolling restarts are a pretty big plus.

In conclusion, if you're in the top 10% of Rails apps by traffic, give Unicorn
a look. It is likely that switching over is worth the dev risk and cost.
Otherwise, keep it on your radar, but don't consider it a must-have.

[reference]

<http://unicorn.bogomips.org/DESIGN.html>

<http://news.ycombinator.com/item?id=872283>

~~~
FooBarWidget
(FYI, I'm a Phusion Passenger developer.) I find it interesting that you think
the shared socket performs better than letting a proxy distribute the load.
I've also done some tests, and I find that the shared socket harms performance
in high-concurrency situations because of the so-called thundering herd
problem: all Unicorn workers select() on the socket, but when a client comes
in, all workers are woken up and all of them try to accept() the client, yet
only one succeeds and the rest go back to sleep. We're working on some pretty
heavy performance and scaling optimizations for the upcoming Phusion Passenger
3, and we've found that avoiding the shared socket gives us much better
overall performance. It would be nice if the kernel provided an interface for
performing select() and accept() in a single atomic operation, but until then
I don't think the shared socket is that good.
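The accept() race described here can be seen in a few lines of Ruby (a simplified, single-process simulation: the two "workers" run sequentially so the outcome is deterministic):

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])  # one pending connection

# Both "workers" see the shared socket as readable...
worker1_sees, = IO.select([server], nil, nil, 1)
worker2_sees, = IO.select([server], nil, nil, 1)

winner = server.accept_nonblock  # ...but only the first accept() wins
loser = begin
  server.accept_nonblock         # the loser gets EAGAIN and goes back to sleep
  :accepted
rescue IO::WaitReadable
  :would_block
end

winner.close
client.close
server.close
```

With many workers, every spurious wakeup like the loser's is scheduling work wasted on a connection that was never going to be theirs.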

From what I've seen so far, people's experiences with both Phusion Passenger
and Mongrel/Unicorn vary drastically. Some people notice a huge response time
drop and performance increase when they switch from Mongrel/Unicorn to Phusion
Passenger; others experience the opposite. I guess it depends a lot on the
server. Phusion Passenger has some pretty heavy users, though: the New York
Times' real-time Obama election results page was running on Phusion Passenger,
and the Dutch national TV broadcasting organization runs all their Rails apps
on Phusion Passenger and gets huge spikes of traffic whenever something is
mentioned on TV.

------
albahk
Amusing quote taken out of context: "..called Stormcloud, to kill Unicorns
when they ran out of control."

Tomorrow's headlines: "Twitter is killing Unicorns"

~~~
armandososa
Haha! I was about to quote the same line, thinking that somewhere there's a
mythical animals and chimeras protection organization.

------
xal
Seriously? Apache? There's more low-hanging fruit, guys :-). Tremendous
speedups can be had by moving from apache to nginx (something we did at
Shopify about 2 years ago).

~~~
netik
Define 'tremendous' and present hardware specs and numbers, please. Shopify
runs varnish (we run varnish on Twitter search) - are the speedups coming from
varnish or nginx for you?

We've done plenty of simulations, load tests, and lots of graphing showing
that nginx's performance gains are negligible given our hardware
configuration. We also depend heavily on mod_rewrite right now and didn't want
to convert to nginx for that very reason.

There seems to be this awful myth, completely unsupported by science, that
unless you're running Rails with nginx you're doing the wrong thing. The
prevalence of nginx in the Rails community is astounding.

It's a good server and it certainly has its place in the world, but it's just
not for us and not supported by our benchmarks.

~~~
xal
Our web nodes each use 6mb of memory for all the web serving. Compared to
apache, this frees up gigabytes, which we can use for more processes (web and
app nodes are combined in Shopify's case). The speedup lies there. An added
benefit is that nginx somehow manages to terminate SSL with a lot lower CPU
load, which makes the combined web/app configuration very appealing. We
experimented with terminating SSL in the Ciscos, but Cisco doesn't seem to
ship a firmware that has both SSL termination and weighted load balancing
working at the same time.

Tremendous, in this case, because the extremely low resource usage of nginx
allowed us to remove an entire layer from our server farm flowchart, and now
we can use our machines much more efficiently.

------
oomkiller
I'm not sure why they needed to develop their own process monitoring script,
as bluepill does just as good a job as monit or god, even managing the child
processes. With my bluepill script, I have it set up so that if I send
restart, it uses hot code reload. My deployments consist of git pull origin
master && bluepill restart unicorn!

~~~
netik
When we deployed we didn't know about bluepill and wrote a quick script.
Stormcloud does a fair number of other things such as tracking min/max
lifetime of children and it attempts to sort out why the process died in the
first place.

Thanks for the bluepill link though, we'll take a look at it and might
consider a rewrite of our internal tool at some point.

I'd like to open source Stormcloud when it becomes more accessible.
Unfortunately, it's laden with internal dependencies to our monitoring
systems, and not publicly consumable at the moment.

------
generalk
Is there some technical reason I'm missing for why they wouldn't go the
Passenger route? It seems to be by far the easiest way to deploy a
Rack-capable web app, and (correct me if I'm wrong) it allows for "hot
deploy": update the code on the server and it'll sit there unused until you
touch "tmp/restart.txt" under your app root.

~~~
adacosta
I had issues with passenger at an event where a couple thousand people were
relying on my app (about a thousand concurrent). I had a single dual quad-core
machine running a rails 3-pre app on ruby 1.9, which was leaking memory every
30-45 minutes and melting down the goods. On site, I switched from nginx +
passenger to apache + passenger so I could use a max-requests-per-worker type
directive (only available with apache + passenger), which didn't solve my
problems. I made one more leap (on the spot) to nginx + unicorn, and problem
solved. It also saved me at least 1gb of normal operational ram.

I still think passenger is great for low-demand sites. It took a while to
figure out everything that had to be done for unicorn, which included a proper
unicorn config file, a rake task to start/stop/restart, and a matching init.d
script for my Ubuntu box; I also wrote a rake task to install the init.d
script. Someone should post that stuff online.

------
ssp
Shopping is even more efficient if you distribute all your items across all
the cashiers. That way, the time from when you are first in line until you are
done is _n_ / _num_cashiers_ instead of _n_.

Someone, please try this the next time you go shopping, and then blog about
what happened.

------
ck2
For a moment the name made me think it was an early April Fools', but it
turned out to be very interesting.

These days, however, almost anything is faster and more stable than apache
under load. It's really starting to show its age, unfortunately.

(and darn I wish we had a Fry's here!)

------
grinich
_Unicorn surprised us by dropping request latency 30% and significantly
lowering CPU usage._

Wow.

