

The Minimum Viable Rails Stack, Now Reddit Frontpage Approved - johnb
http://goodfil.ms/blog/posts/2012/09/15/minimum-viable-rails-stack/

======
zhoutong
This is almost exactly how I deployed NameTerrific
(<https://www.nameterrific.com>). The setup is ready for horizontal scaling
out of the box.

In addition, I give the workers their own instances so that they can grow
independently, and I use Capistrano to automate deployment. Whenever
additional resources are needed, I simply clone an instance and add it to the
load balancer.
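
A minimal Capistrano (v2-era) sketch of that layout; the app name, repo, and
addresses below are placeholders, not NameTerrific's actual config:

    # config/deploy.rb -- illustrative only
    set :application, "myapp"
    set :scm,         :git
    set :repository,  "git@example.com:myapp.git"
    set :deploy_to,   "/var/www/myapp"

    role :web, "10.0.0.11", "10.0.0.12"       # app instances behind the LB
    role :app, "10.0.0.11", "10.0.0.12"
    role :db,  "10.0.0.20", :primary => true  # private DB box

    # `cap deploy` pushes the current release to every instance at once;
    # a freshly cloned instance only needs its IP added to the roles.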

I use 2 Nginx workers and 8 Unicorn workers on a single app instance, and they
serve plain HTTP on ports 80 and 443 (yes, they respond with plain HTTP on
port 443, and the app just assumes that traffic is HTTPS). The app instances
are firewalled in a private network and use private IP addresses.
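
As a rough sketch (socket paths and names are illustrative, not the real
config), the nginx side of an app instance looks something like:

    worker_processes 2;

    http {
      upstream app {
        # one Unicorn master (worker_processes 8 in config/unicorn.rb)
        # listening on a unix socket
        server unix:/tmp/unicorn.app.sock fail_timeout=0;
      }

      server {
        listen 80;
        location / { proxy_pass http://app; }
      }

      server {
        listen 443;   # plain HTTP; the load balancer already stripped the TLS
        location / {
          proxy_set_header X-Forwarded-Proto https;  # so Rails assumes HTTPS
          proxy_set_header Host $http_host;
          proxy_pass http://app;
        }
      }
    }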

The load balancer has two network interfaces, one public-facing and one
private-facing. It's basically a reverse proxy serving traffic over both HTTP
and HTTPS (the latter forwarded to the instances' HTTP port 443), so the load
balancer handles the encryption entirely.
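
On the balancer that's roughly the following (addresses and certificate paths
are hypothetical):

    # Public interface terminates SSL; the private interface relays
    # plain HTTP to the app instances' port 443.
    upstream app_servers {
      server 10.0.0.11:443;
      server 10.0.0.12:443;
    }

    server {
      listen 203.0.113.10:443 ssl;        # public-facing interface
      ssl_certificate     /etc/nginx/ssl/site.crt;
      ssl_certificate_key /etc/nginx/ssl/site.key;

      location / {
        proxy_set_header Host $http_host;
        proxy_pass http://app_servers;    # decrypted, over the private net
      }
    }
    # (Plain HTTP on public port 80 is proxied the same way, to the
    # instances' port 80.)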

The assets get compiled during deployment, and everything gets cached. I
bundle all the CSS and JS into a single package, so on subsequent visits
there's nothing to load other than the HTML and images (which are natively
async). The site is simply far more responsive when deployed this way. I also
host the assets on CloudFront by simply pointing the distribution at the
website itself as the origin; Rails handles the asset host automatically as
well.
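
In Rails 3.2 terms that's roughly one setting (the distribution domain below
is a placeholder):

    # config/environments/production.rb
    # Asset helpers now emit CloudFront URLs; on a cache miss CloudFront
    # fetches the asset from the site itself (the origin) and caches it
    # at the edge. Digested filenames keep the cached copies safe forever.
    config.action_controller.asset_host = "d1234abcdef8.cloudfront.net"
    config.assets.digest = true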

Whenever I need to deploy or manage the database directly for any reason, I
have to connect to the private network via VPN. It's secure and just neat.

If I need to build a local mirror, I'll install another nginx box, connect it
to the VPN, and reverse proxy HTTPS requests straight to the backend's HTTP
port 443. This cuts the HTTPS handshake latency (the TLS round trips terminate
near the user) while still maintaining security. And it just works.
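
The mirror is just the load balancer pattern again, sketched here with a
made-up tunnel address:

    # Regional mirror: terminate TLS close to the user, then relay plain
    # HTTP over the VPN to the origin's port 443. The TLS round trips now
    # happen against a nearby box instead of a distant one.
    server {
      listen 443 ssl;
      ssl_certificate     /etc/nginx/ssl/site.crt;
      ssl_certificate_key /etc/nginx/ssl/site.key;
      location / {
        proxy_pass http://10.8.0.1:443;   # origin LB, via the VPN tunnel
      }
    }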

EDIT: Added a link to make it easier for the curious.

~~~
purephase
If you don't mind me asking, what load balancer are you using? I'm trying to
set up multiple app servers on EC2 behind ELB using Redis as a session_store,
and I can't get sessions to persist.

~~~
zhoutong
I used to use ELB, but later I migrated the whole app to a self-hosted private
cloud on a dedicated server.

I'm using Nginx as the load balancer, and there are no session problems for me
because I use the built-in secure cookie store.
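
That's the Rails default; the generated initializer looks like this (app name
is a placeholder):

    # config/initializers/session_store.rb
    # The session lives in a signed cookie on the client, so every app
    # server can read it -- no sticky sessions or shared Redis required.
    Myapp::Application.config.session_store :cookie_store,
                                            :key => '_myapp_session'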

~~~
purephase
Thanks for the response. I appreciate it.

------
autotravis
"We didn’t have any page caching in place at all."

Cache me if you can! If you're a high-traffic site and you are not caching
static content (I highly recommend Varnish), then you are throwing money or
good user experience out the window.

~~~
johnb
The key thing I got a bit wrong was the difference between high-traffic,
mostly-static "content" sites and high-traffic "very custom" sites; the latter
is a) what I was used to at envato and b) what I _thought_ we had on our hands
at goodfilms.

I think in a growing site with a lot of custom per-user content (like a social
network) the extra complexity of a cache layer and managing expiry is more
pain than it's worth while iterating the product quickly. If you're mostly a
content site, it's definitely the #1 thing you should be doing.

Realising that we're both at the same time, depending on the user or the page,
means the cache stuff is sometimes the right thing to do and sometimes not. I
was leaning too far towards not, and I'm happy now with the balance we've
picked.

~~~
ben_h
Absolutely. At theconversation.edu.au we run a content site—we publish the
news, which is the same for everyone. This means we can cache the front page
and all the articles as static HTML, and then annotate the page with user info
for signed-in commenters, editorial controls for signed-in editors, and so on.

(We have a separate cookie that is present for signed-in users, so the
frontend knows whether it should fire the annotation request.)

The result is that we can serve a sudden influx of unauthenticated users (e.g.
from Google News or StumbleUpon) from nginx alone, which gives us massive
scale from very little hardware. It's likely that the network is actually the
bottleneck in this case, and not nginx.

~~~
ianpri
Interested in what you mean by annotating the page after caching it, do you
have any more info on this?

~~~
jhealy
The cached page contains content suitable for everyone, so it looks like the
user is logged out.

An extra AJAX request grabs the user's logged-in status, CSRF token, and
similar data as JSON, and then modifies the page so the user sees what they
expect (a logout button, a comment form, etc.).
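
A minimal sketch of that pattern; the endpoint, cookie name, and element IDs
here are hypothetical:

    // Fired only when the signed-in cookie is present (see above).
    if (document.cookie.indexOf('signed_in=') !== -1) {
      $.getJSON('/session.json', function (data) {
        // Inject the CSRF token so forms on the cached page can POST.
        $('meta[name="csrf-token"]').attr('content', data.csrf_token);
        // Swap the anonymous chrome for the personalised bits.
        $('#sign-in-link').hide();
        $('#account-menu').text(data.name).show();
        $('#comment-form').show();
      });
    }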

~~~
rhizome
Doesn't that cause content movement?

------
wolfeidau
Great post, it is great to see startups being open about how they architect
their systems and overcome issues with scaling.

This sort of information helps people building the next group of startups save
both time and money.

Keep up the great work!

------
jval
Well done guys, keep scaling and keep representing the Melbourne tech scene!

------
lucisferre
We used New Relic for a while with Heroku but turned it off because of a
number of performance and stability issues we seemed to be seeing with it. In
particular, application boot times (for the first requests, anyway) seemed to
be negatively affected.

------
ericcholis
While the "webscale" method with nginx you detailed is nice [1], why not use
Varnish or nginx's built-in proxy_cache features [2]?

[1] <https://gist.github.com/3711251>

[2] <http://wiki.nginx.org/HttpProxyModule#proxy_cache>
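
For reference, a bare-bones proxy_cache setup per the linked module docs (zone
name, sizes, and the upstream are arbitrary):

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:10m
                     max_size=1g inactive=10m;

    upstream app { server 127.0.0.1:8080; }

    server {
      listen 80;
      location / {
        proxy_cache pages;
        proxy_cache_valid 200 301 5m;                  # cache good responses
        proxy_cache_use_stale error timeout updating;  # absorb backend hiccups
        proxy_pass http://app;
      }
    }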

------
dumbluck
Nice post, but:

Don't self-back-pat.

It read more like marketing for the company than anything of much worth.

Don't use the words "web scale".

It is a meaningless term. How many requests per second, and how many guest and
user sessions for how long, is "web scale"? Does "web scale" just mean
Christmas shopping traffic on your crappy e-commerce site that no one visits?
Does it mean you can survive being the top link on HN on a Friday morning, or
being slashdotted, or DoS attacked by Anonymous? Or surviving a fucking flood
where the data center lifts off the ground, with enough flexibility and
strength in the trunk lines to handle a tsunami?

Don't just throw users at it.

Unless testing is very costly and you need as many users' eyes on your
untested code as possible, that is just stupid. Look at Tsung, Grinder, or
JMeter, or the many other ways you could generate load as a first step before
you do that.
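
Even a headless JMeter run gives you a baseline before real users show up:

    # run a test plan without the GUI and log the results
    # (plan.jmx is a plan you've recorded or written)
    jmeter -n -t plan.jmx -l results.jtl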

Don't gloss over the details.

Sure, you said you were using Rails 3.2 and Postgres and gave a tad bit about
the architecture, but who in the hell _doesn't_ know that you need to load
balance and put the DB servers on different VMs/servers than the apps?
Although having everything installed on both, with some of it just not turned
on and live, is not a bad idea for emergency fault tolerance, and you didn't
mention that.

~~~
marknutter
Your caps-lock key appears to be broken.

~~~
dumbluck
Oh, you're right, thanks! Just fixed the comment. I'll get the caps-lock key
fixed on Monday.

------
instakill
A few questions:

1. Is it possible to spin up two apps on Heroku?
2. What load balancers are available with the above?
3. Anyone have a link to a rundown of how to back up your Postgres DB
periodically?

Thanks

~~~
zhoutong
For 1 and 2: AFAIK Heroku already has three layers: the Nginx routing mesh,
the Varnish cache, and your dynos. Everything is handled for you, so if you
have two dynos, traffic rotates between them. You don't need a load balancer
on Heroku; it's built in.
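
For 3 (in general, not Heroku-specific), a minimal sketch is a nightly pg_dump
from cron; the database name and paths below are placeholders:

    # crontab entry: custom-format dump at 03:00 daily; ship the file
    # somewhere off the database box afterwards
    0 3 * * * pg_dump -Fc myapp_production > /backups/myapp-$(date +\%F).dump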

~~~
subpixel
Note: Varnish isn't available on the (standard) Cedar stack. Here's how Heroku
suggests you approach caching:

<https://devcenter.heroku.com/articles/http-caching>

------
marcosero
Great job! Can you do such a page for EC2 too?

~~~
dasil003
Engine Yard Cloud does a lot of this out of the box on EC2 while still
affording root access. There is a price premium obviously.

------
praveenhm
We have used pretty much the same stack at websketch.com.

