
Strategy: Run A Scalable, Available, And Cheap Static Site On S3 Or GitHub  - yarapavan
http://highscalability.com/blog/2011/8/22/strategy-run-a-scalable-available-and-cheap-static-site-on-s.html
======
patio11
I have no objection to static sites, but it is 2011. Computers are beefy beefy
beasts.

Even CMSes with a reputation for being slow will eat almost all conceivable
loads for breakfast _unless_ you bork something architecturally, like, say,
leaving Apache KeepAlive on. (That would similarly kill you if you got on the
front page of Reddit with a 1 KB static text file, but people remember CMSes
as dying to KeepAlive because CMSes are often written in PHP, and the way
everyone tells you to configure Apache/PHP is _broken by design_.)
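
(For reference, the fix is a couple of lines in httpd.conf. These numbers are
just plausible defaults, not gospel; tune them for your own box:)

    # Option 1: turn keep-alive off outright.
    KeepAlive Off

    # Option 2: keep it, but stop idle browsers from pinning scarce
    # prefork workers for 15 seconds apiece (the stock default).
    KeepAlive On
    KeepAliveTimeout 2
    MaxKeepAliveRequests 100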

You've got billions of operations per second to play with. Read from database,
render template, spit to browser is not really that hard. This is even more
true if you can cache things, in which case you're the moral equivalent of
running a static site from a performance perspective, with the only difference
being whose CPU gets used for the single compile step.
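
Concretely, the cached version is about this much code. A minimal sketch,
assuming hypothetical loadFromDb/renderTemplate stand-ins for whatever your
stack actually uses:

    // Hypothetical stand-ins for the real database and template engine.
    async function loadFromDb(slug: string): Promise<{ title: string; body: string }> {
      return { title: slug, body: "..." };
    }

    function renderTemplate(post: { title: string; body: string }): string {
      return `<h1>${post.title}</h1><p>${post.body}</p>`;
    }

    const cache = new Map<string, string>();

    // Read from database, render template, spit to browser. After the
    // first request per page, this serves from memory, i.e. the moral
    // equivalent of a static file.
    async function servePage(slug: string): Promise<string> {
      const cached = cache.get(slug);
      if (cached !== undefined) return cached;
      const html = renderTemplate(await loadFromDb(slug));
      cache.set(slug, html); // the single "compile step", paid once
      return html;
    }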

~~~
mechanical_fish
I don't think the primary appeal of static sites is their performance. It's
about complexity and maintenance.

At a minimum, a LAMP stack requires prompt security patches at all levels of
the stack and a working system for automated SQL backups (and, of course,
testing your SQL backups to make sure they restore). If you haven't configured
your system perfectly (e.g. you forgot to rotate one of your logfiles
properly) you'll need to perform more maintenance than that; Murphy's Law
implies that you'll be doing that at three in the morning local time.
Hopefully you installed an uptime monitor.

If your CMS contains bugs -- and it does -- that's a slew of additional
security patches which you'll have to apply, and occasional broken
functionality that you'll have to track down and fix.

Every few years (at most) a new version of Linux will come out. Every few
years (at most) a new version of your CMS will come out. The old versions will
stop getting fixes, so you'll have to upgrade.

Despite these efforts, there is still a good chance that your CMS will get
hacked: LAMP stacks are complex beasts, even out of the box with no
customizations. When you do get hacked, what will you do? Turn off the code
with the vuln in it? How do you know which bit of code that was?

In theory your $10 per month is buying a hosting provider that will take care
of all of the above for you, by leveraging the awesome economy of scale. And,
indeed, maybe the correct answer is to use a blog-hosting service. Many people
seem to be happy with Tumblr.

But for those of a more control-freaky nature, the dream of the static site is
that you'll reduce or eliminate these pains by making all the moving parts as
stupid as possible. (You can't perform SQL injection on a site which has no
forms or, for that matter, SQL.) In addition to being stupid, static sites are
also as generic as possible: You can switch from S3 to another host in
minutes, sign up for a CDN in minutes, switch from Apache to nginx to IIS in
minutes. Meanwhile you have at least one up-to-date offsite backup at all
times by default -- the data for your entire site lives on the box you publish
from. Plus you have a perfectly functioning, extremely intuitive dev/staging
setup -- you can tinker with a static site offline on your local machine, get
your new hacks working, then push live with an rsync and be reasonably assured
that the static pages will work the same in production as they do locally.
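
(The deploy step really is about one command; the path and host here are made
up:)

    # -a preserves structure, -z compresses, --delete removes stale pages
    rsync -avz --delete ./site/ deploy@example.com:/var/www/site/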

~~~
tptacek
There is a very compelling reason to run a static site: it is much less likely
to harbor a security vulnerability. Most CMS sites do.

There is a very compelling reason to run a dynamic site: for most software
businesses, that site is the primary venue for all your "marketing
engineering", including SEO, A/B testing, interactivity, &c.

The companies that have this dilemma generally have already accepted the
architecture of "public site that is separate from their actual application
site", which is a step toward mitigating the impact of the security risk.

Those companies would mitigate the impact even further by hosting the public
site in a different data center from their application.

Generally speaking, weighing an "adequately secure CMS site" against
"adequately engineered marketing processes", and speaking as a security person
with a vested interest in selling everyone on buttoning down security
everywhere as much as possible: most startups are doing more damage to
themselves by not maximizing marketing engineering than they are by exposing
themselves to attack.

I'd suggest a middle ground: designate some large, important swath of your
public site as sacred and make it static. Host it with stripped-down vanilla
Apache or nginx. Then, where you benefit from dynamism, expose another
resource (like a small EC2 server) in your URL space to implement it. This
sounds complicated but is really very easy, and it gives you some of the
benefits of a static site (attacks won't tear down your front page) and all
the benefits of a dynamic site.
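
A rough nginx sketch of that split, with placeholder names and a made-up
upstream address; everything is static except the one proxied corner:

    server {
        listen 80;
        server_name example.com;

        # The sacred swath: plain files served straight off disk.
        root /var/www/static;

        # The one dynamic resource, proxied to a small app server.
        location /app/ {
            proxy_pass http://10.0.0.5:8080;
        }
    }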

A rule of thumb, by the way: if you sell a web app on the public Internet, and
you haven't done something to have that web app formally assessed for security
flaws, running a static brochure site for its security benefits is a premature
optimization.

~~~
mechanical_fish
_I'd suggest a middle ground: designate some large, important swath of your
public site as sacred and make it static. Host it with stripped-down vanilla
Apache or nginx. Then, where you benefit from dynamism, expose another
resource (like a small EC2 server) in your URL space to implement it._

Yes, exactly. This is what I'm actually thinking about.

(I agree that a one-hundred-percent static site is a thought experiment, not a
plan; it's not really going to fly in the twenty-first century. I chose the
word "dream" in my original post with some care. ;) And "marketing
engineering" is a nice concise term to explain why you're going to need
dynamic features somewhere on your site, sooner or later, even on something as
apparently simple as a personal blog.)

I've seen quite a few websites built on a dynamic CMS with a bunch of optional
modules containing tens of thousands of lines of hairy PHP, running on a big
pile of heavy-duty high-availability hardware... most of which is apparently
there just to be ready to rebuild the Varnish cache as quickly and reliably as
possible on the rare occasions when that cache gets cold. (On some sites, the
cache never gets cleared except by some unavoidable accident: The performance
consequences of trying to rebuild the cold cache under load are too dire to
contemplate.) A very high percentage of the actual page loads on such sites
are served directly from a Varnish instance, which consumes very few CPU
resources and a moderate amount of RAM on a single box.
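
The caching policy on such sites often amounts to a few lines of VCL,
something like this (Varnish 2/3-era syntax; the TTL is invented for
illustration):

    sub vcl_fetch {
        # Hold rendered pages for a day. The big PHP backend only does
        # real work when a page finally falls out of the cache.
        set beresp.ttl = 24h;
    }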

At some point I began to wonder why we conceptualize such sites as "dynamic
sites with a static cache" instead of as "static sites, with a few dynamic
elements on some pages, and a perhaps-more-synchronous-than-necessary PHP-
based page-generation engine that runs on public-facing servers". There are a
bunch of reasons, but I wonder how many of them might ultimately prove to be
historical reasons.

------
DanielBMarkham
The informal HN hacker book project, <http://hn-books.com>, is run completely
as a static site. It hosts reviews of hundreds of different startup books and
tools. It has book lists and allows sorting and filtering by multiple
criteria. It lets you create and share lists of books with your friends
(<http://www.hn-books.com/AnswerQuestion.htm>). You can comment on reviews
(through LiveFyre), and it even has a "Drop Dead" calculator where you can
configure a startup business model and it will display a countdown until you
run out of money (<http://hn-books.com/EZ-Business-Model.htm>). Once you
create your biz model, it "remembers" it, and you can share your models with
other people simply by passing them a URL.
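
(The "remembers it" bit is presumably the model serialized into the URL
itself, so no server-side storage is involved. A sketch of the general trick,
with invented field names:)

    // Pack the business-model state into the URL hash; anyone who
    // follows the link reconstructs the exact same model client-side.
    interface BizModel { monthlyBurn: number; cashOnHand: number; }

    function shareUrl(model: BizModel): string {
      return location.pathname + "#" + encodeURIComponent(JSON.stringify(model));
    }

    function loadModel(): BizModel | null {
      const hash = location.hash.slice(1);
      return hash ? (JSON.parse(decodeURIComponent(hash)) as BizModel) : null;
    }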

Not trying to plug the site, just wanted to show how much you can really do if
you put your mind to it. The entire hn-books site fits in a single directory
and requires zero setup. It just sits there, making money. No upgrades, no
patches, no dickering around with hosting providers, no worries about load
times. No user database or sensitive data to worry about securing. It. Just.
Works.

Quite honestly, if I had my way this is how I'd do all my projects. It
required a completely new way of thinking through the app, but to me the
trade-offs are easily worth it.

EDIT: It wasn't all roses, though. Biggest problem I had (have) is sorting
speed in JS with the big book lists. I'm sure there's a way to optimize that,
but I never got around to nailing it down.
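
(For what it's worth, the standard fix is to precompute sort keys once instead
of deriving them inside the comparator, which fires O(n log n) times. A sketch
with an invented Book shape:)

    interface Book { title: string; score: number; }

    // Decorate-sort-undecorate: the expensive key extraction runs once
    // per book, and the comparator itself stays a cheap comparison.
    function sortByTitle(books: Book[]): Book[] {
      return books
        .map((b) => ({ key: b.title.toLowerCase(), b })) // decorate, O(n)
        .sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0))
        .map((d) => d.b);                                // undecorate
    }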

~~~
akkartik
Interesting. So all the dynamic stuff is in client-side javascript?

~~~
DanielBMarkham
Yes, once the page is loaded, everything it needs to run is there. There are
no further AJAX calls or loads.

You know, I've seen some services lately that allow you to tag random data and
put it in the cloud. As long as you were clear that all user data was public,
you could easily use one of those for user-related storage.

And now, of course, there's WebStorage, available on all modern browsers.
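
(Which makes a no-server version of per-user data almost trivial, with the
caveat that it only follows the user as far as that one browser. A sketch:)

    // Persist a user's book list in the browser itself; no backend,
    // no database to secure.
    function saveList(name: string, bookIds: string[]): void {
      localStorage.setItem("list:" + name, JSON.stringify(bookIds));
    }

    function loadList(name: string): string[] {
      const raw = localStorage.getItem("list:" + name);
      return raw ? (JSON.parse(raw) as string[]) : [];
    }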

I'm looking forward to a future micro-project that incorporates all of that
stuff. Should be a hoot. If I had my druthers, I'd try to pick something that
everybody would think would be impossible, then do it.

------
timtadh
> Using Dropbox is the clever bit here for me. Dropbox makes it so the files
> follow you, so you can edit them on any of your machines.

I mean, I _guess_ you could do that, if you really wanted to... or you could
use Git. Dropbox is a great service, but for code (even static HTML) it isn't
a good fit.

~~~
mechanical_fish
To first order, people don't understand version control systems. Even highly
educated, computer-literate people.

This is a major reason why the CMS was invented in the first place. It's also
a major reason why Dropbox was invented.

------
yarapavan
The original post from Amazon CTO Werner Vogels:
<http://www.allthingsdistributed.com/2011/08/Jekyll-amazon-s3.html>

------
a_dy
Google still lets you do free site search.

Here's an example: <http://octopress.org/>

------
EGreg
Actually, this is just a special case of a more general strategy: move as much
as possible to the client. Typically, you can move all the code that doesn't
sit behind an authorization wall to the client.

Just have two things: 1) serve static files; 2) build web services and call
them with JS.

This scales.
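
From the browser side the split looks something like this; the endpoint is
made up, and in 2011 you'd have written it with XMLHttpRequest rather than
fetch:

    // The page itself is a static file on S3 or a CDN. Anything dynamic
    // is a small JSON web service called from the client.
    async function loadComments(postId: string): Promise<string[]> {
      const res = await fetch("https://api.example.com/comments?post=" + postId);
      if (!res.ok) throw new Error("comment service unavailable");
      return (await res.json()) as string[];
    }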

------
geuis
Quick note: If you are hosting your sites on EC2 using the Elastic Load
Balancer service and Route 53 (Amazon's DNS service), you can now use naked
domains (without the www. or other subdomain requirement).

<https://forums.aws.amazon.com/thread.jspa?threadID=63893&tstart=15>
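
(Under the hood this is a Route 53 "alias" A record pointing at the ELB. In
later aws-cli terms the change batch looks roughly like this; every ID and
name below is a placeholder:)

    {
      "Changes": [{
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "example.com.",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "ZELBZONEEXAMPLE",
            "DNSName": "my-elb-1234567890.us-east-1.elb.amazonaws.com.",
            "EvaluateTargetHealth": false
          }
        }
      }]
    }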

------
MatthewPhillips
I have to recommend Staticloud for hosting static sites.

<http://staticloud.com/>

------
cppsnob
S3 is sloooooow. You'll really need CloudFront in front of it if you want to
do this and still have a good user experience.

------
pointyhat
It's almost the wrong tool for the job. S3 sucks for latency.

