
Is Nginx obsolete now that we have Amazon CloudFront? - peterbe
http://www.peterbe.com/plog/is-nginx-obsolete-amazon-cloudfront
======
peterwwillis
How does this shit make it to the front page?

First and foremost, everyone needs caching. It's what makes computers fast.
That RAM you have? Cache. The memory in your CPU? Cache. The memory in your
hard drive? Cache.

Your filesystem has a cache. Your browser has a cache. Your DNS resolver has a
cache. Your web server's reverse proxy [should] have a cache. Your database
[should] have a cache. Every place that you can conceivably shove in another
cache, it can't hurt. Say it with me now: Cache Rules Everything Around Me.

First you should learn how web servers work, why we use them, and how to
configure them. The reason your Apache instance was running slow is probably
that you never tuned it. Granted, five years ago Apache's asynchronous
capabilities were probably haggard and rustic. It's gotten a lot more robust
in recent years, but that's beside the article's point. Nginx is an apple,
CloudFront is an orange.

Next you should learn what CDNs are for. Mainly it's to handle lots of traffic
reliably and provide a global presence for your resources, as well as
shielding your infrastructure from potential problems. Lower network latency
is just a happy side effect.

~~~
alexgartrell
> How does this shit make it to the front page?

Obviously the title was a little ridiculous, but I up-voted it, because it's a
novel idea. If you're planning to have almost all of your static assets hosted
from the CDN (which is pretty reasonable for almost everyone) then why bother
with a super high-throughput low-latency web server if the only purpose is to
occasionally refill the CDN? If you end up thrashing the CDN and constantly
going back to refill it, you're going to have bigger problems.

From what I can tell, the rest of your comment is just super aggressive and
doesn't really go anywhere. I will tell you that I have extensive experience
with every piece you've mentioned here, and none of that really has any effect
on the author's thesis (again, no need to optimize serving static content from
your host if a CDN is going to do the legwork).

In general though, when someone works at Mozilla, I tend to give them the
benefit of the doubt regarding their knowledge of elementary computing
principles.

~~~
peterwwillis
The problem is this isn't a novel idea. Replacing a fast webserver with
somebody else's fast webserver is half of the reason most people choose CDNs
(the other reason usually being bandwidth). Just using someone else's fast
webserver does not obsolete a different fast webserver.

It's totally acceptable that you might not have the infrastructure to serve
all of your static content from your measly web servers and 100mbit
connection. CDNs are a great choice here. But this has nothing to do with what
web server you use, nor does it mean you should process every request
dynamically just because a CDN currently leaves you the resources to do so.

Even with a CDN and an extremely efficient static content layer, you still
have to hand out dynamic content to your users individually which a CDN
generally will not help with. At a high enough number of requests you will run
out of resources (RAM, CPU, Disk, Network, etc). At this point it's handy to
have the fastest things you can so scaling doesn't become one huge
clusterfuck. Then whoever re-implements Nginx to help handle requests will
write a blog post about how Nginx makes CDNs obsolete.

My point before (and now) is: Caching matters, and having a fast frontend web
server matters, and CDNs matter, and none of this is directly related: we're
talking apples and oranges.

As an aside, CloudFlare seems to use a novel little fast web server:

      psypete@pinhead ~/ :) wget -S -O /dev/null http://4chan.org/ 2>&1 | grep -e "^[[:space:]]\+Server:"
        Server: cloudflare-nginx
        Server: cloudflare-nginx

------
gojomo
More generally: once you adopt any of the various schemes for having an
inbound proxy/front-end cache (Fastly, CloudFlare, CloudFront, or an in-house
Varnish/Squid/etc.), are all the optimizing habits of moving static assets to
a dedicated server now superfluous?

I think those optimizing habits _are_ now obsolete: best practice is to have a
front-end cache.

A corollary is that we usually needn't worry about a dynamic framework serving
large static assets: the front-end cache ensures it happens rarely.

Unfortunately it's still the doctrine of some projects that a production
deployment will always offload static-serving. So for example, the Django docs
are filled with much traditional discouragement around using the staticfiles
serving app in production, including vague 'this is insecure' intimations. In
fact, once you're using a front-end cache, there's little speed/efficiency
reason to avoid that practice. And if it is specifically suspected to be
insecure, any such insecurity should be fixed rather than overlooked simply
because "it's not used in production".

~~~
slurgfest
Tools like Apache and nginx are not ONLY faster at serving files with less
load on the system than a script. They are also more thoroughly audited and
battle-tested. And their declarative configs won't go wrong just because the
person writing them missed an unbelievably subtle corner case introduced by
using a Turing-complete language.

It's important because there are so many opportunities for error in serving
arbitrary files out of a filesystem with some rough and ready script.

For example, if you are serving files out of the same filesystem that holds
your configs and secret keys then you should be a bit nervous. You have to get
the permissions right and make sure you don't have anything improper under a
directory which you are publishing as a whole. If your users are uploading
files to the same place you should feel really nervous.

There are too many easy ways for people to be negligent and screw this up. In
the context of designing an opinionated framework, you accept a lot of social
liability and you are really dropping the ball if you are setting up tired and
ignorant users to screw up this badly, without even a warning in the docs to
think about what you are doing.

With n script languages and m static file serving implementations per
language, there are now (n*m) obscure packages to audit. Not counting their
combinations...

Your idea to just "fix the insecurity" and remove any warning from the docs
means doing things you merely believe fix the insecurity, and then overlooking
the underlying risks of the approach.

I am also not sure you are right when you suggest that there cannot be any
performance (or reliability) impact of pushing static serving into some script
library. Just as these are not audited they are also not nearly as likely to
be benchmarked and tuned.

If there is a reason to serve static files out of script, that reason will be
because of some positive reason (like convenience or the need for some
particular flexibility) rather than some vague sense that using Apache is
"obsolete".

~~~
gojomo
Here's the thing: Django makes the standard staticfiles app available. It's a
great convenience, eliminating extra steps (collecting/moving static assets)
and processes (another nginx/etc). And many projects' 'dev'/prototype
incarnations are already open to the world, in one way or another.

So if this is a security sin, they've already encouraged its widespread
commission. A bit of "don't do this" or "don't do this in production" hand-
waving in the docs doesn't resolve a security problem, if there's a real
vulnerability in the current implementation.

On the other hand, committing to the idea that the bundled staticfiles app may
be used this way -- that in fact it's a _good_ and _modern_ way to operate, in
production, once you have an inbound proxy cache -- would mean accepting
deeper responsibility. It would give up the hedge, "if there's a security bug,
we warned you!". It's not taking on n*m obligations: it's taking on 1
language, 1 module. And it's not even a new module or an obscure need... it's
exactly the sort of thing an opinionated framework can solve for people.

The old opinion -- "take this risk in development, but by the time you get to
production use the 'best practice' of a separate static server" -- should be
updated to a new opinion -- "the 'best practice' is now a front-end proxy
cache, which makes the performance benefits of an extra static server
negligible, so we're no longer going to assume everyone will do that in
production".

An admonition against using other less-tested code to achieve the same effect
would still be appropriate. But not nonspecific FUD about the framework's own
code -- that it is "probably insecure". Anything that's truly "probably
insecure" ought to be fixed.

~~~
St-Clock
"eliminating extra steps (collecting/moving static assets)"

Interestingly, we extended the collectstatic command in many ways to perform
minification, combination of assets, generation of Sass and JavaScript
variables (based on settings in Python), etc. It's part of our deployment, and
if we were serving static files through Django, we would still have to run a
similar command.
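The kind of step described above can be sketched like this (a deliberately crude stand-in for the real minifiers a collectstatic pipeline would invoke; the regexes here are illustrative, not production-grade CSS parsing):

```python
import re

def combine_and_minify_css(sources):
    # Crude minification: strip /* ... */ comments, collapse runs of
    # whitespace, then concatenate every stylesheet into one bundle.
    out = []
    for css in sources:
        css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
        css = re.sub(r"\s+", " ", css).strip()
        out.append(css)
    return "\n".join(out)
```

Whether nginx or the application then serves the result, this prep work happens at deploy time either way.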

I'm also happy that nginx is handling file uploads, aliasing, redirection,
virtual hosting on different IPs and Ports with different access control, real
ip extraction (when behind a load balancer), etc.

I'll be following this trend of moving static asset hosting from a regular
web server to the application container more closely, but I believe that web
servers like nginx and Apache can do a lot more than just serve static files
(at least in complex deployment scenarios).

~~~
gojomo
That's certainly a value of a 'project prep step' (whether it involves static
export or not).

Note also, though, that a service like CloudFlare now puts some of these
optimizations (minification, asset-combination, obfuscation, etc.) into the
cache layer, as optional cloud 'app' services to be enabled/disabled/paid-for
as desired.

Not saying that way is better for all, but it has potential as a convenience
for some, getting those same expert-level optimization benefits while
retaining a simple project/deployment structure.

------
meritt
When you're only using nginx as a CDN, then yes another CDN can replace it.

nginx can do a lot more than serve static files.

~~~
peterbe
Nginx is not a CDN.

~~~
ehutch79
a hammer is not a paperweight

------
cbsmith
Wait, so if I pay more for a CDN to deliver my static data, that will work
better than when I try to save money and do it myself?

[Insert Oscar winning Face of Shock here]

------
rabidsnail
nginx still buys you SSI (which allows you to, for example, cache the same
page for all users and have nginx swap out the username with a value stored in
memcache), complex rewrite rules, fancy memcache stuff with the memc module
(ex: view counters), proxying to more than ten upstream servers, fastcgi, and
lots of other fancy stuff.

Cloudfront is a replacement for varnish, not nginx.

~~~
peterbe
Isn't that better to do in a programming environment you're more familiar
with, like Python/Rails/ASP? Then you have much better tools for building unit
tests and such, too.

~~~
rabidsnail
The performance of doing it in nginx is _much_ better, and you can't do
anything complex enough for unit tests to pay off. For the SSI stuff, you have
your web framework of choice produce html with SSI tags in it, cache that, and
nginx just swaps out the volatile bits at the last second (even for pages in
cache).
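That SSI trick looks roughly like this in nginx config (directive names come from the stock ssi and memcached modules; the paths, cookie name, key scheme, and port here are all invented for illustration). The cached HTML contains a tag like `<!--#include virtual="/username" -->`, and nginx fills it in on every request:

```nginx
location / {
    root /var/www/cache;   # pre-rendered HTML, identical for every user
    ssi on;                # expand <!--#include --> tags on the way out
}

location /username {
    internal;                                   # only reachable via SSI
    set $memcached_key "user:$cookie_session";  # hypothetical key scheme
    memcached_pass 127.0.0.1:11211;             # per-user value from memcache
}
```

So the expensive page render is cached once, and only the tiny per-user fragment is looked up per request.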

------
georgebarnett
I was once told by somebody wise that if a post asks a question, then the
answer is usually no.

e.g.: Is Mountain Lion going to kill Windows 8? .. etc.

~~~
lubutu
A meta-corollary: whenever a headline with a question mark appears on Hacker
News, there _will_ be at least one comment referring to Betteridge.

(Please, can this stop?)

~~~
jamesaguilar
Ultra-meta: Whenever a repeating meme occurs on the internet, there will be
someone asking for it to stop. And the answer to whether it will stop will be
no.

~~~
StavrosK
Recursive-super-ultra-meta-induction: Whenever someone refers to a level of
meta-ness, someone else will refer to level n+1.

~~~
jamesaguilar
Transparently false: this thread's length is measurably finite.

~~~
StavrosK
For now.

------
StavrosK
Does anyone have experience with using nginx as a caching proxy? I've used
Varnish and swear by it, it's just an amazing piece of software. How well can
nginx replace Varnish?

~~~
mef
It's great in tandem with Redis. [http://mikeferrier.com/2011/05/14/my-
beautiful-dark-twisted-...](http://mikeferrier.com/2011/05/14/my-beautiful-
dark-twisted-reverse-proxy-LRU-cache/)

~~~
StavrosK
Hmm, that looks very interesting, but I've had Varnish serve 100k requests on
a small (512 MB RAM) VPS without me _actually noticing_ (I only found out when
Google Analytics had a spike the size of a mountain).

Can nginx do that? It sounds like this solution wouldn't really be able to,
having to go through Lua and all, but nginx is an all-around very solid piece
of software too, so I wonder...

~~~
todsul
Nginx can handle much higher loads than Varnish in many situations.

Check this out for a look at Nginx's architecture:
<http://www.aosabook.org/en/nginx.html>

~~~
StavrosK
I have read that, but what are those situations? I really doubt claims of
"much higher" performance, Varnish is very, _very_ performant.

------
bithive123
I think not. Requirements change, and locking myself in to a front-end cache
is not appealing. I may also have things which I can't or won't let others
cache for me, so I want my local stack to be optimized anyway. You won't see
me serving everything out of WEBrick anytime soon just because I have a cloud
cache.

It's nice to be able to defer decisions, especially optimizations, but making
performance someone else's problem entirely seems like it could promote sloppy
thinking and poor work. It's the difference between augmenting a solid
platform when the need arises versus front-loading dependencies because it's
okay to be lazy.

~~~
peterbe
On several of my modern projects, there's not a single piece of static data
that can't be cached forever in a CDN. That's because server-side code is now
getting really good at managing the initial build of static assets and the
delivery of their URLs.

~~~
ehutch79
Errr, just because it's static and a PDF doesn't mean you want it cached on
Amazon's servers.

Sensitive business documents and such.

~~~
pavel_lishin
If it's serious enough, why are you serving it as a static, unprotected
resource?

------
kiwidrew
There's a good post from late 2011, in the context of 12-factor deployment on
Heroku, where the author muses about just using a pure Python server behind a
CDN to serve static content:

 _...and yeah, I think I should bloody use this server as a backend to serve
my [static files] in production._

[http://tech.blog.aknin.name/2011/12/28/i-wish-someone-
wrote-...](http://tech.blog.aknin.name/2011/12/28/i-wish-someone-wrote-django-
static-upstream-maybe-even-me/)

------
est
Oh god, the file.read() then write to response method..

At least you should try `sendfile`.

------
devmach
Sure it's obsolete, who needs databases and live, changing data? All we need
is static pages. Besides, who needs to build his own infrastructure, it's
2012, right? Let's just buy it.

~~~
peterbe
Is this just an odd joke? Of course you need a database.

If you need to build a toaster, you don't need to build an iron smelting
plant. Certain things other folks are better at taking care of.

~~~
devmach
> _Is this just an odd joke? Of course you need a database._

Looks like we need a sarcasm alert on HN, like spoiler alerts :)

------
SeppoErviala
If you want to serve static files cheaply and are moving less than 10TB/mo,
you will find that CloudFront is an order of magnitude more expensive than a
bunch of VPSes with lots of monthly bandwidth.

Viability of this depends heavily on the use case, but if you're moving funny
pictures of cats then you won't be generating much income and will want to
optimize bandwidth costs.

------
zimbatm
Before implementing that, be aware that CloudFront doesn't support custom SSL
certificates. If you have any user-session in your app, you don't want them to
login on <https://efac1bef32rf3c.cloudfront.net/login>

------
banana_bread
CloudFront is pretty good, just make sure you are able to configure your
asset source in one line. Otherwise you have to use a tool to invalidate the
CloudFront cache frequently during dev, and it's not instant.

~~~
peterbe
Invalidation is for suckers. A fresh new URL is much safer.
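The fresh-URL approach can be sketched like this (a hypothetical helper; any content hash would do): because the URL changes whenever the content does, old CDN copies simply stop being referenced, and no invalidation call is ever needed.

```python
import hashlib

def versioned_url(path, content):
    # /static/app.css -> /static/app.<hash>.css; a deploy with changed
    # content yields a new URL, so stale CDN copies are never requested.
    digest = hashlib.md5(content).hexdigest()[:12]
    prefix, _, name = path.rpartition("/")
    if "." in name:
        base, _, ext = name.rpartition(".")
        name = f"{base}.{digest}.{ext}"
    else:
        name = f"{name}.{digest}"
    return f"{prefix}/{name}" if prefix else name
```

Point the cached asset at its versioned URL in your templates, and set the CDN's TTL as long as you like.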

------
stevewilhelm
No.

