
Building and Scaling a Startup on Rails: Things We Learned the Hard Way (by Posterous S08) - arjunlall
http://axonflux.com/building-and-scaling-a-startup
======
patio11
That is a very good article.

I'd suggest absolutely everybody with a web page read up as much as they can
about optimizing web page loading. It doesn't matter if you're Posterous or
you run your 2,000 daily visitors off a 256 MB VPS which is _never_ under
load, optimizing the page can result in 90% decreases in the user-perceptible
time it takes to load the page. It doesn't matter if you slave over a hot
memcache to save 200 ms off the database queries if your page then takes 7
seconds to render.

Shaving off two seconds, one second, half a second just prints money. Every
time I do it I'm flabbergasted by how much it matters.

The HTTP cache control mentioned in the article is one excellent place to
start. For the rest, Yahoo pretty much owns this field of research -- any of
the presentations from the YSlow folks are worth your time.

<http://www.slideshare.net/natekoechley/high-performance-web-sites-2008>
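
As a concrete starting point for the cache-control advice above, here is an
Apache sketch (the mod_expires directives are real; the one-year horizon and
the particular content types are illustrative):

```
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType image/png "access plus 1 year"
  ExpiresByType text/css  "access plus 1 year"
  ExpiresByType application/x-javascript "access plus 1 year"
</IfModule>
```

With far-future expiry you then version the asset filenames so a deploy busts
the cache.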

I'm kind of a lightbox junkie and I've had some good results recently by the
simple expedient of having the browser preload the image (e.g. <img
src="foo.jpg" style="display: none;" />) prior to the actual user interaction
which calls it to display. There are a billion similar site-specific tricks
you can do in Javascript these days.

~~~
patio11
Oh, one more link while I'm at it:

<http://www.slideshare.net/stoyan/high-performance-web-pages-20-new-best-practices>

The presentation I cited in the parent is more motivational and less stuffed
with "Here is a checklist of actionable steps that if you implement will make
you more money for each one you do."

------
charlesju
I hate when people tell me that Rails doesn't scale because I have scaled
Rails and it has nothing to do with the framework.

The reason why websites can't scale is ALWAYS the DB. To fix the DB it is
always master/slave, memcached, then sharding. For EVERY language.

Good article, sums up a lot of the tools I used to scale my stuff.

I'd like to add a little more.

1\. Turn on slow-query logging in your DB and tail -f the slow query log. Find
slow queries and kill them in your code through indices or by using several
fast queries in place of one long one.

2\. Cache most reads. Both from application caching and memcached.

3\. Turn off associations, they don't play well with caching.

4\. If you're using memcached, don't use the plugins.

5\. If you're a serious startup, use Engine Yard, they're life savers.
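
For point 1, the MySQL side looks roughly like this (the path and the
one-second threshold are illustrative; the option names are the MySQL 5.0-era
ones):

```
# my.cnf
log-slow-queries = /var/log/mysql/slow.log
long_query_time  = 1

# then watch it live:
#   tail -f /var/log/mysql/slow.log
```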

~~~
pcowans
Rails running on vanilla Ruby does have particular scaling issues due to the
limitations of the Ruby garbage collector. You hit a GC cycle after every 8MB
of allocation, which typically takes around 150ms to run, so it can dominate
the runtime of your requests. We've had 80% of the runtime in GC, even when you
include database time.

This isn't necessarily a killer, but it means there's sometimes more work than
you might like in scaling. There are patches to tune the garbage collector,
and JRuby etc. may help, but ultimately you need to be much more aware of
memory allocation than you might think.
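
If you want a rough feel for this on your own code, GC::Profiler (built into
Ruby 1.9+; the vanilla 1.8 interpreter under discussion needed patches, such
as Ruby Enterprise Edition's, to expose similar numbers) reports time spent in
GC. A minimal sketch, with an arbitrary allocation-heavy loop standing in for
request work:

```ruby
require 'benchmark'

# Measure how much of the wall time for an allocation-heavy
# workload is spent inside the garbage collector.
GC::Profiler.enable
elapsed = Benchmark.realtime do
  100_000.times { Array.new(100) { "x" * 10 } }  # churn lots of short-lived objects
end
gc_time = GC::Profiler.total_time  # seconds spent in GC, as a Float
GC::Profiler.disable

puts "total: #{(elapsed * 1000).round}ms, in GC: #{(gc_time * 1000).round}ms"
```

The GC patches mentioned above exposed tuning knobs (e.g. REE's
RUBY_GC_MALLOC_LIMIT environment variable) to raise that 8MB threshold.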

~~~
charlesju
That's an application bottleneck which is fixed by more application servers.
This has no effect on scaling whatsoever.

And by scaling here, we're talking about 1 M to 10 M, not 10,000 to 100,000.

------
bjclark
Pretty good run down. Actually, a very good run down.

Sphinx is cool, but there's no reason to be afraid of solr/lucene. It takes
_very_ little java knowledge to get it up and running and it's very very,
really, crazy fast. Like, hundreds of thousands of searches a day on millions
of documents and it's totally stable. _knocks on wood_

Also, Passenger is indeed better than all the mongrel + god + tweaks they're
talking about. AboutUs.org (my employer) is the largest site on Passenger
(that we can find) and we've had 2 actual crashes in the last 6 months, both
fixed by restarting Apache.

~~~
amix
In my experience (and I have used Sphinx, Lucene, Solr and Xapian in
production), Lucene/Solr have pretty bad performance compared to Sphinx or
Xapian.

My Lucene setup began to throw deadlocks and memory exceptions pretty early
on. Searching for "deadlock lucene" on Google yields 25,000 results. I later
rewrote the system on Xapian, where it has run without any problems.

For live updates I would recommend using Xapian. For fairly static indexes I
would recommend Sphinx (as it's _really_ fast for both indexing and searching,
but it does not support live index updates yet).

~~~
bjclark
Really? What size of index/documents were you doing searches on?

We're doing live updating, probably close to 1 update per second.

------
teej
> You're only as fast as your slowest query....

Use HAProxy. Take the time to configure it correctly. It's absolutely the most
stable and useful load balancer I've used. It also solves this issue by talking
to your backends.
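
A minimal sketch of the relevant haproxy.cfg section (server names and ports
are made up; `maxconn 1` is the part that keeps requests queued in the proxy
instead of piling up behind one slow mongrel):

```
backend mongrels
    server app1 127.0.0.1:8000 maxconn 1 check
    server app2 127.0.0.1:8001 maxconn 1 check
```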

~~~
pwk
We use pen, which can be configured to avoid this issue as well. Just specify
the maximum number of connections to be the same as the number of backend
servers (via the -x option), and it will queue connections beyond that number
and hand them to the first backend that frees up.

We've had zero issues with pen, it's been rock solid, but we're still looking
at moving to Passenger at some point: seems like it will be more flexible and
efficient than our current pack of mongrels.

~~~
teej
I attempted to switch to pen when I was having issues with Perlbal. It was
segfaulting under light load (light for me is heavy for most people) at a rate
comparable to Perlbal, so I switched back. I can't say I recommend it at all.

------
cperciva
_Sad town happens. 1 in 4 requests after Request A will go to port 8000, and
all of those requests will wait in line as that mongrel chugs away at the slow
request..._

I don't use Rails, or even Ruby at all if I can avoid it, so I'm sure I'm
missing something obvious here, but... why in the world would anyone want to
use a web server which can only handle one request at once?

~~~
tptacek
Sorry, what's the platonic ideal you're suggesting as an alternative? There
are threaded Ruby web servers, but, just like with Python, the interpreter is
mostly giantlocked. The overwhelming majority of web apps out there are
running under Apache, which just like Mongrel is preforking and queueing, not
running everything simultaneously.

~~~
cperciva
If you have N requests hitting Apache and one of them is slow, that one slow
request will run in its own process while the fast requests are sent off to
other processes. The fact that each _process_ only handles one request at once
is irrelevant.

~~~
tptacek
Uh, this is how Rails setups work too. You aren't talking directly to Mongrel.

~~~
cperciva
Maybe I misunderstood the article -- it sounded to me like requests were being
distributed between Mongrel processes _and queued on the individual processes_
rather than being queued centrally and only allocated to individual processes
when a process is free (like Apache does).

~~~
davidw
The problem with Mongrel is that you allocate N mongrel instances at setup
time. Apache, on the other hand, can dynamically allocate new processes (up to
a limit) in order to meet increased demand. This is especially important for
people like me, who host more than one site on a machine, and want to be able
to handle load up to a certain point without fiddling with config files every
time there is a spike in traffic.
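
For comparison, the Apache behavior described above is governed by the prefork
MPM settings (the values here are illustrative defaults, not recommendations):

```
<IfModule mpm_prefork_module>
    StartServers      5
    MinSpareServers   5
    MaxSpareServers  10
    MaxClients      150
</IfModule>
```

Apache forks up to MaxClients workers as demand rises and reaps idle ones,
which is exactly the elasticity a fixed pool of mongrels lacks.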

------
100k
Great post with obvious real world experience behind it.

I think it's a fantastic point that you should focus on optimizing your
database before you start adding caching. If you can tune your DB with the
right indexes and give it enough RAM to fit the whole dataset, you've got a
great cache right there!

(BTW, I have been using PostgreSQL on my latest project. I'm impressed so far.
It has a much better query optimizer and better indexes than MySQL.)

I also like using Solr/acts_as_solr. I haven't used Sphinx but from what I've
read about setting it up it sounds incredibly fiddly. Solr, by contrast, is
quite simple.

~~~
wmoxam
Having recently switched from Solr+acts_as_solr to Sphinx+Thinking Sphinx, I
have found quite the opposite. Thinking Sphinx is simpler than Solr and
required no 'fiddling'.

I had to work around some problems in acts_as_solr, such as its tendency to
automatically update Solr as soon as a DB record changed (even if none of the
fields Solr indexes had changed). Thinking Sphinx simply updates its index in
a batch process called by cron (and it's super fast!). I highly recommend it.
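
That batch reindex amounts to a one-line crontab entry along these lines (the
app path is made up; check your Thinking Sphinx version for the exact rake
task name):

```
*/5 * * * * cd /var/www/app && rake thinking_sphinx:index RAILS_ENV=production
```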

------
blader
Rails scales just fine - the problem is that it's just really expensive to do
it.

Here are some of our learnings:

1\. Use query_trace to trace your DB calls and query_analyzer to automatically
run EXPLAIN on each call:

<http://github.com/ntalbott/query_trace/tree/master>
<http://github.com/jeberly/query-analyzer/tree/master>

2\. Use our patches to mongrel proc_title to troubleshoot slow queries:
<http://asemanfar.com/Request-Queue-via-Mongrel-Proctitle>

3\. Don't use ActiveRecord.

4\. Don't use any link or url helpers in Rails.

5\. For that matter, don't use Rails. Rewrite your most hit components in
something faster.

I love Ruby, but the simple truth of it is that we'd be saving a couple of
engineers' worth of money if we weren't on Rails.

~~~
rantfoil
I dunno about saving a couple of engineers' worth of money -- that's why Rails
is massively viable in the first place.

You absolutely can iterate faster with fewer people and less wasted time.

Servers are cheap. People (salaries) are expensive.

(Well.. until servers become expensive due to straight up load/scale, that
is.)

~~~
blader
Right now we're estimating that our Rails premium = 4X the salary of a San
Francisco engineer. But we do have a lot of servers.

------
tdavis
I don't know jack about Rails, but there is some good general advice here too.
I would have liked to see some DB commentary that didn't choose MySQL as a
foregone conclusion. I can't think of many instances where I would recommend
it in general.

~~~
reconbot
May I ask what you do use exactly? MySQL has always been my goto for simple db
needs. What do you normally use and what's your "general" cases where you
wouldn't use it?

~~~
tdavis
For "simple [rdbms] needs" I would recommend SQLite. For anything more,
PostgreSQL. I find MySQL too buggy, and it diverges from the SQL standard too
often (or doesn't implement enough of it) for my tastes. A properly configured
Postgres install (granted, not exactly trivial) will perform at least as well
as, if not better than, MySQL, and its advanced functionality is extremely
mature and robust, unlike MySQL's, which has largely been tacked on in the
current major version (views, templates, triggers, etc.)

------
tptacek
Great article.

Background/deferred job processing has been immensely painful for us, and I
have no idea what the accepted best job queue is. We're still stuck on
backgroundrb, which is a nightmare. Nanite looks like overkill, and has too
many deps.

~~~
rantfoil
Workling has worked well for us -- Rany Keddo is a phenomenal open source guy.

Tobi's deferred jobs also looked totally solid to me. That one's proven
because it drives Shopify. We use his liquid plugin, which is also stellar.

That underlines a big issue with Rails dev even today -- it's really hard to
know what's good / what works, and what is just some weekend project for
someone.

~~~
tptacek
Found Tobi's delayed_job (not deferred) --- I think I like the design better
than Workling; fewer moving parts, just a database backend and a rake script.
The biggest problem we have with backgroundrb (which again: nightmare) is not
really ever knowing the state of currently running jobs.
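
The appeal of that design can be sketched in a few lines of plain Ruby (a toy,
not delayed_job's actual API): each job row carries a state the worker updates
as it goes, so "what is running right now" is always a simple query.

```ruby
# Toy database-backed job queue: an in-memory array stands in for the
# jobs table, and the worker loop updates each job's state as it runs.
Job = Struct.new(:id, :payload, :state)

jobs = [
  Job.new(1, lambda { 2 + 2 },       :pending),
  Job.new(2, lambda { "mail sent" }, :pending),
]

results = {}
jobs.each do |job|
  begin
    job.state = :running             # state is visible while the worker runs it
    results[job.id] = job.payload.call
    job.state = :done
  rescue StandardError
    job.state = :failed              # failures are recorded, not lost
  end
end

puts jobs.map { |j| "job #{j.id}: #{j.state}" }
```

With the real thing, the states live in a database table, so any process can
inspect them -- which is exactly what backgroundrb makes hard.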

~~~
ezmobius
yeah backgroundrb sucks, who wrote that shit? (me and i wish i never did)

------
wastedbrains
Great post, you discovered a bunch of the gotchas faster than we did on our
first rails site. I wish I would have had a post like that a year ago. Thanks

------
shafqat
Can someone shed some extra light on the point re: reducing the number of
requests to the DB for a dynamic page? When left unoptimized, Rails (and a lot
of other frameworks) often produces 100 DB queries for a page.

One obvious way around this is to ensure the DB joins are done correctly. But
the article mentions batching/grouping up the requests. How does that work?

~~~
rantfoil
Check out the :include parameter to find method calls in the ActiveRecord
documentation.

Say you have 30 blog posts, and your views reference associations to the blog
post's owners. Well, views are dumb and if you call post.user on each one
within a loop, you end up calling User.find 30 times.

But if you do Post.find(..., :include => [:user]), Rails will know to eager
load all the users in one query -- and User.find never gets called 30 times.
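
The difference can be sketched in plain Ruby (a toy stand-in, no Rails:
`find_user` plays the part of User.find, and `find_users` plays the single
batched query that :include issues):

```ruby
# QUERIES stands in for the SQL log; count entries to see the N+1 pattern.
QUERIES = []
USERS = { 1 => "alice", 2 => "bob" }
POSTS = [{ id: 1, user_id: 1 }, { id: 2, user_id: 2 }, { id: 3, user_id: 1 }]

def find_user(id)
  QUERIES << "SELECT * FROM users WHERE id = #{id}"
  USERS[id]
end

def find_users(ids)
  QUERIES << "SELECT * FROM users WHERE id IN (#{ids.join(',')})"
  USERS.values_at(*ids)
end

# Naive loop: one query per post -- N queries for N posts
POSTS.each { |p| find_user(p[:user_id]) }
n_plus_one_queries = QUERIES.size   # 3 queries for 3 posts

# Eager loading, as :include does: one batched query for all the owners
QUERIES.clear
find_users(POSTS.map { |p| p[:user_id] }.uniq)
eager_queries = QUERIES.size        # 1 query
```

Same data comes back either way; only the number of round trips changes.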

------
laktek
Great article! BTW, I was wondering what testing strategies you used and how
you ensured smooth integration of features?

~~~
rantfoil
Rspec, Cucumber, Integrity, and a custom end-to-end testing process, run via
the daemons gem, that assures all critical aspects of the site run at all
times.

