The single best advice I can give for scaling Rails, and it is widely applicable to other frameworks: never block an HTTP request/response cycle on I/O. In Rails, for example, if you have an external API call which routinely takes 3 seconds to complete, then that call costs about three hundred megabyte-seconds of RAM time to service, because the 100 MB or so used by your mongrel is effectively worthless while you wait.
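The arithmetic above can be made concrete in a few lines. This is a minimal sketch that assumes one single-threaded app process (like a mongrel) handles one request at a time. The function names are illustrative, not from any library.

```python
def blocked_memory_cost(process_rss_mb: float, blocked_seconds: float) -> float:
    """Megabyte-seconds of RAM held idle while one request waits on I/O."""
    return process_rss_mb * blocked_seconds


def max_requests_per_second(blocked_seconds: float) -> float:
    """Throughput ceiling of one single-threaded process when every
    request blocks for blocked_seconds (assumes no other work per request)."""
    return 1.0 / blocked_seconds


if __name__ == "__main__":
    # Figures from the comment: ~100 MB per mongrel, 3 s external API call.
    print(blocked_memory_cost(100, 3))     # 300
    print(max_requests_per_second(3))      # about 0.33 requests/second
```

At roughly a third of a request per second per process, serving more traffic means running more 100 MB processes, which is why RAM runs out first.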
When you run into resource constraints in Rails, odds are you are running out of RAM first. It's a memory munching beast in most common deployment scenarios.
It is much more efficient to offload those blocking requests to a worker process and poll for results, since a single mongrel can blow through lots of poll requests and useful work in those three seconds. Bonus points: your web app will feel snappier, because progressive rendering tricks users into thinking 3 seconds is not actually 3 seconds.
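The enqueue-then-poll pattern can be sketched as follows. This is a single-process illustration under stated assumptions: an in-memory dict stands in for the shared job store (a real deployment would use something every app process can see, such as a database table), a thread stands in for the separate worker process, and `slow_external_call` is a hypothetical placeholder for the slow API call.

```python
import queue
import threading
import time
import uuid

# Hypothetical in-memory job store; real app processes would need a shared store.
jobs: dict[str, dict] = {}
jobs_lock = threading.Lock()
work_queue: "queue.Queue[str]" = queue.Queue()


def slow_external_call() -> str:
    """Placeholder for the slow external API call (hypothetical)."""
    time.sleep(0.1)  # shortened so the sketch runs quickly
    return "api result"


def worker() -> None:
    """Background worker: takes job ids off the queue and does the blocking I/O."""
    while True:
        job_id = work_queue.get()
        result = slow_external_call()
        with jobs_lock:
            jobs[job_id] = {"status": "done", "result": result}
        work_queue.task_done()


def start_job() -> str:
    """What the request handler does: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"status": "pending", "result": None}
    work_queue.put(job_id)
    return job_id


def poll(job_id: str) -> dict:
    """What the poll endpoint does: a cheap lookup that never waits on I/O."""
    with jobs_lock:
        return dict(jobs[job_id])


if __name__ == "__main__":
    threading.Thread(target=worker, daemon=True).start()
    job = start_job()
    while poll(job)["status"] != "done":
        time.sleep(0.02)  # the browser would re-poll on a timer instead
    print(poll(job)["result"])
```

Each `start_job` and `poll` call returns almost immediately, so the web process is free to serve other requests while the worker waits on the network.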
My single-VPS-hosted app spends a lot of time waiting on DB queries, which themselves spend a lot of time waiting on disk I/O.
I've therefore found that adding RAM and doing nothing with it so it can be used as disk cache (or, for Postgres, allocating some of it as shared_buffers) is the best non-non-blocking solution.
(I'd originally allocated almost all RAM to running up to 16 or so Passenger instances. Once more than 3 or 4 of these were serving requests, everything ground to a halt waiting on the DB. Now I'm down to about 4 instances, with the rest of the RAM for cache, and they cope fine with the same load.)
Yes, that's great advice. We use beanstalkd for asynchronous tasks (like sending emails or updating Twitter) and backgroundrb for jobs that need polling (re-encoding an MP3 file and showing a progress bar to the user).
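The difference between the two job types is mainly whether the job writes intermediate state for the web tier to read. A fire-and-forget task such as sending an email needs no record at all; a pollable task like re-encoding must report progress somewhere. The sketch below does not use beanstalkd or backgroundrb. It assumes a hypothetical in-memory progress store and a `reencode_mp3` stand-in job, purely to show the shape of a progress-reporting job.

```python
import threading
import time

# Hypothetical progress store keyed by job id (percent complete, 0-100).
progress: dict[str, int] = {}
lock = threading.Lock()


def reencode_mp3(job_id: str, chunks: int = 5) -> None:
    """Pollable job: records percent complete after each chunk (hypothetical)."""
    for i in range(1, chunks + 1):
        time.sleep(0.01)  # placeholder for encoding one chunk
        with lock:
            progress[job_id] = i * 100 // chunks


def progress_for(job_id: str) -> int:
    """What a progress-bar poll request would return; 0 if not started."""
    with lock:
        return progress.get(job_id, 0)


if __name__ == "__main__":
    t = threading.Thread(target=reencode_mp3, args=("job-1",))
    t.start()
    while t.is_alive():
        print(progress_for("job-1"), "%")
        time.sleep(0.02)
    t.join()
    print(progress_for("job-1"), "%")
```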
The single best advice I can give for scaling Rails, and it is applicable to nearly every other framework: give it more memory. Generally, when you run into a bottleneck in Rails, you're running out of RAM first. You can get a couple of 8GB sticks for around $500; just over-provision your server and worry about the problem later.
Sorry, but that's terrible advice, especially for bootstrapped businesses. Throwing resources at a problem will only scale as large as your wallet is. It's both wasteful and lazy.
Throwing money at problems is sometimes the right answer, and I say that as someone who is so bootstrapped I can occasionally taste the shoe leather. Heck, I spend ~$500 a month on hosting when I could probably do it for half that by renting a dedicated server (or just buy a big beefy box and pay a pittance for a spot on a rack somewhere). I don't, because it isn't worth my time to even think about migrating.
My favorite anecdote: asking tptacek to help me squeeze a few megs out of Redis to avoid having to bump my VPS up a tier. That would have been, heck if I know, $BOATLOADS / hr in engineer costs to avoid $30 per month on my Slicehost bill. He restored me to sanity fairly quickly.
But that's what you're doing, whether you choose to add hardware or improve the code -- it's not like your time as a business owner is free, and every moment you're spending on the codebase is a moment you're not marketing or selling.
The real question is which is more valuable to your business.
Consider these two examples:
Let's say I'm a bootstrapper, and have a startup that earns $3k monthly. My customers are happy, and my business is growing, but I'm having some serious scaling problems.
Scenario One: I've outgrown the small Linode that I started on. If doubling the cost of my Linode allows me to service 2X the number of customers, I do it, because the extra $3k in revenue is way higher than the extra cost.
Scenario Two: I've maxed out the biggest Linode, and the next step up is a cluster of dedicated nodes at Rackspace for $4k/month, which will allow me to scale to 2X my current volume. At this point, I fix the code, because my earnings will drop even if I get twice the number of customers.
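The two scenarios reduce to comparing monthly profit before and after scaling. This sketch assumes revenue grows in proportion to customers served, and the current hosting costs used below ($20 and $800 per month) are hypothetical figures chosen for illustration; only the $3k revenue and $4k cluster price come from the scenarios.

```python
def monthly_profit(revenue: float, hosting: float) -> float:
    """Profit counting only hosting as a cost (a simplification)."""
    return revenue - hosting


def scaling_pays_off(revenue_now: float, hosting_now: float,
                     hosting_scaled: float, growth: float = 2.0) -> bool:
    """True if serving growth-times the customers on the bigger setup
    earns more per month than staying on the current one."""
    return (monthly_profit(revenue_now * growth, hosting_scaled)
            > monthly_profit(revenue_now, hosting_now))


if __name__ == "__main__":
    # Scenario One: hypothetical $20/month Linode doubled to $40/month.
    print(scaling_pays_off(3000, 20, 40))      # True
    # Scenario Two: hypothetical $800/month biggest Linode vs $4k cluster.
    print(scaling_pays_off(3000, 800, 4000))   # False
```

In Scenario Two, doubling to $6k revenue on a $4k cluster leaves $2k, so the move only pays off if current hosting already costs more than $1k per month.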