Is Nginx obsolete now that we have Amazon CloudFront? (peterbe.com)
75 points by peterbe on July 30, 2012 | 61 comments

How does this shit make it to the front page?

First and foremost, everyone needs caching. It's what makes computers fast. That RAM you have? Cache. The memory in your CPU? Cache. The memory in your hard drive? Cache.

Your filesystem has a cache. Your browser has a cache. Your DNS resolver has a cache. Your web server's reverse proxy [should] have a cache. Your database [should] have a cache. Every place that you can conceivably shove in another cache, it can't hurt. Say it with me now: Cache Rules Everything Around Me.

First you should learn how web servers work, why we use them, and how to configure them. The reason your Apache instance was running slow is probably because you never tuned it. Granted, five years ago its asynchronous capabilities were probably haggard and rustic. It's gotten a lot more robust in recent years, but that's beside the article's point. Nginx is an apple, CloudFront is an orange.

Next you should learn what CDNs are for. Mainly it's to handle lots of traffic reliably and provide a global presence for your resources, as well as shielding your infrastructure from potential problems. Lower network latency is just a happy side effect.

> How does this shit make it to the front page?

Obviously the title was a little ridiculous, but I up-voted it, because it's a novel idea. If you're planning to have almost all of your static assets hosted from the CDN (which is pretty reasonable for almost everyone) then why bother with a super high-throughput low-latency web server if the only purpose is to occasionally refill the CDN? If you end up thrashing the CDN and constantly going back to refill it, you're going to have bigger problems.

From what I can tell, the rest of your comment is just super aggressive and doesn't really go anywhere. I will tell you that I have extensive experience with every piece you've mentioned here, and none of that really has any effect on the author's thesis (again, no need to optimize serving static content from your host if a CDN is going to do the legwork).

In general though, when someone works at Mozilla, I tend to give them the benefit of the doubt regarding their knowledge of elementary computing principles.

The problem is this isn't a novel idea. Replacing a fast webserver with somebody else's fast webserver is half of the reason most people choose CDNs (the other reason usually being bandwidth). Just using someone else's fast webserver does not obsolete a different fast webserver.

It's totally acceptable that you might not have the infrastructure to serve all of your static content from your measly web servers and 100 Mbit connection. CDNs are a great choice here. But this has nothing to do with what web server you use, nor does it mean you should process every request dynamically just because right now you have the resources for it using a CDN.

Even with a CDN and an extremely efficient static content layer, you still have to hand out dynamic content to your users individually which a CDN generally will not help with. At a high enough number of requests you will run out of resources (RAM, CPU, Disk, Network, etc). At this point it's handy to have the fastest things you can so scaling doesn't become one huge clusterfuck. Then whoever re-implements Nginx to help handle requests will write a blog post about how Nginx makes CDNs obsolete.

My point before (and now) is: Caching matters, and having a fast frontend web server matters, and CDNs matter, and none of this is directly related: we're talking apples and oranges.

As an aside, CloudFlare seems to use a novel little fast web server:

  psypete@pinhead ~/ :) wget -S -O /dev/null http://4chan.org/ 2>&1 | grep -e "^[[:space:]]\+Server:"
    Server: cloudflare-nginx
    Server: cloudflare-nginx

I think it's still worth it. You can't have 100% availability. Even AWS sometimes fails. It's then just a matter of changing a DNS record and having your server(s) handle the load in some kind of degraded mode.

Peripheral but somewhat relevant: at work, we don't upload photos immediately to S3. We have to pre-process them, but as soon as they are ready, we show them to the customer, serving them locally. Only then do we upload them to S3.

I treat a CDN as a cache, great to have but I wouldn't exclusively rely on it. For whatever reason, you might need to degrade or show results asap, you can't do that if the CDN is the primary source.

But if the CDN is serving your static assets, your origin webserver only has to generate them once[1] to populate the CDN. It almost doesn't matter how long this takes. And this works well enough that you don't need to bother setting up an Nginx or Apache instance at all. And furthermore, you don't have to copy your static files anywhere -- just use your framework's built-in webserver for everything!

This greatly simplifies your production deployments and in my books that's a huge win.

[1] well... that fudges it a bit, since each POP needs to make its own fetch, and the assets can theoretically drop out of the CDN's cache; so the truth is actually "a handful of times" instead of "once".
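To make the "handful of times" part concrete, here is a minimal sketch, assuming a WSGI app and a CDN that honors `Cache-Control`. The function name `add_cache_headers`, the `/static/` prefix, and the one-year lifetime are all made up for illustration; this is not taken from the article.

```python
ONE_YEAR = 365 * 24 * 60 * 60  # cache lifetime in seconds (an assumption)


def add_cache_headers(app, prefix="/static/", max_age=ONE_YEAR):
    """Wrap a WSGI app so successful responses under `prefix` are cacheable."""
    def wrapped(environ, start_response):
        def caching_start_response(status, headers, exc_info=None):
            path = environ.get("PATH_INFO", "")
            if path.startswith(prefix) and status.startswith("200"):
                # Drop any existing Cache-Control, then set a long-lived one.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "cache-control"]
                headers.append(
                    ("Cache-Control", "public, max-age=%d" % max_age))
            return start_response(status, headers, exc_info)
        return app(environ, caching_start_response)
    return wrapped
```

With headers like these, each POP should only come back to the origin when its copy expires or gets evicted, which is why the speed of the origin matters so little in this setup.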

How you use a CDN is site-specific. Some people still need to serve hundreds of thousands of requests via a caching frontend layer separate from their CDN origin. It's silly to assume you will never need a fast web server, because unless you serve no dynamic content at all, you will be serving content yourself, and the rate will be consistent with the number of users, among other things.

Using a CDN does not actually simplify production deployment, it complicates it. It's an extra layer of complexity, and one you don't know very much about since it isn't your gear. You need an API hook (or a web interface) to invalidate old content when you publish new content. You need a contact with whom you can figure out why a tenth of your users can't route to the CDN all of a sudden, but can route to you. You need to get all your headers right so you don't accidentally push an invalidate to all content and kill your slow-ass origin with new traffic.

Finally, as someone else mentioned, using a framework webserver for production is A Bad Idea(TM). Only one of the reasons why is poor performance. Several others are security, compatibility, cache control, access control, privilege separation, stability, high-availability, virtual hosting, and about a billion other features that webservers have been designed to handle for decades that you will need to reinvent the wheel for with your application framework, which was never intended to be a webserver.

The reason your framework has a webserver is it's the simplest way to get the dynamic content from the app server to the frontend proxy. For example, AJP is such a huge pain in the ass that most Tomcat admins I know use http to communicate between app server and frontend (and it's more compatible). But would they use Tomcat as their production server? Not if they wanted to stay sane.

I wonder if anyone actually uses the framework's built-in webserver for everything! Sounds great, and I wonder what happens in practice. Does anyone know of any real-world usage of this?

Sounds great, except all the frameworks with built-in webservers advise _against_ using them on production servers, at the very least for security reasons.

More generally: once you adopt any of the various schemes for having an inbound proxy/front-end cache (Fastly, CloudFlare, CloudFront, or an in-house varnish/squid/etc), are all the optimizing habits of moving static assets to a dedicated server now superfluous?

I think those optimizing habits are now obsolete: best practice is to have a front-end cache.

A corollary is that we usually needn't worry about a dynamic framework serving large static assets: the front-end cache ensures it happens rarely.

Unfortunately it's still the doctrine of some projects that a production project will always offload static-serving. So for example, the Django docs are filled with much traditional discouragement around using the staticfiles serving app in production, including vague 'this is insecure' intimations. In fact, once you're using a front-end cache, there's little speed/efficiency reason to avoid that practice. And, if it is specifically suspected to be insecure, any such insecurity should be fixed rather than overlooked simply because "it's not used in production".

Tools like Apache and nginx are not ONLY faster at serving files with less load on the system than a script. They are also more thoroughly audited and battle-tested. And their declarative configs won't go wrong just because the person writing them missed an unbelievably subtle corner case introduced by using a Turing-complete language.

It's important because there are so many opportunities for error in serving arbitrary files out of a filesystem with some rough and ready script.

For example, if you are serving files out of the same filesystem that holds your configs and secret keys then you should be a bit nervous. You have to get the permissions right and make sure you don't have anything improper under a directory which you are publishing as a whole. If your users are uploading files to the same place you should feel really nervous.
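As an illustration of the kind of corner case a hand-rolled file server has to get right, here is a minimal sketch of a path check, assuming a POSIX filesystem. The helper name `resolve_static_path` is hypothetical, and this is not how Django or any particular framework does it.

```python
import os


def resolve_static_path(root, requested):
    """Map a URL path onto a file under `root`, or return None if it escapes."""
    root = os.path.realpath(root)
    # Strip leading slashes so os.path.join is never handed an absolute path.
    candidate = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    # realpath collapses ".." and follows symlinks; the result must stay
    # inside root, otherwise the request is refused.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

A request like `/../settings.py` comes back as `None` here. This check alone says nothing about file permissions or about user uploads living in the same tree, which are separate problems.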

There are too many easy ways for people to be negligent and screw this up. In the context of designing an opinionated framework, you accept a lot of social liability and you are really dropping the ball if you are setting up tired and ignorant users to screw up this badly, without even a warning in the docs to think about what you are doing.

With n script languages and m static file serving implementations per language, there are now (n*m) obscure packages to audit. Not counting their combinations...

Your idea to just "fix the insecurity" and remove any warning from the docs means to do things which you merely believe to fix the insecurity, and then overlook the underlying risks of the approach.

I am also not sure you are right when you suggest that there cannot be any performance (or reliability) impact of pushing static serving into some script library. Just as these are not audited they are also not nearly as likely to be benchmarked and tuned.

If there is a reason to serve static files out of script, it will be a positive one (like convenience or the need for some particular flexibility) rather than some vague sense that using Apache is "obsolete".

Here's the thing: Django makes the standard staticfiles app available. It's a great convenience, eliminating extra steps (collecting/moving static assets) and processes (another nginx/etc). And many projects' 'dev'/prototype incarnations are already open to the world, in one way or another.

So if this is a security sin, they've already encouraged its widespread commission. A bit of "don't do this" or "don't do this in production" hand-waving in the docs doesn't resolve a security problem, if there's a real vulnerability in the current implementation.

On the other hand, committing to the idea that the bundled staticfiles app may be used this way -- that in fact it's a good and modern way to operate, in production, once you have an inbound proxy cache -- would mean accepting deeper responsibility. It would give up the hedge, "if there's a security bug, we warned you!". It's not taking on n*m obligations: it's taking on 1 language, 1 module. And it's not even a new module or an obscure need... it's exactly the sort of thing an opinionated framework can solve for people.

The old opinion -- "take this risk in development, but by the time you get to production use the 'best practice' of a separate static server" -- should be updated to a new opinion -- "the 'best practice' is now a front-end proxy cache, which makes the performance benefits of an extra static server negligible, so we're no longer going to assume everyone will do that in production".

An admonition against using other less-tested code to achieve the same effect would still be appropriate. But not nonspecific FUD about the framework's own code -- that it is "probably insecure". Anything that's truly "probably insecure" ought to be fixed.

You are correct that the Django folks should be a little more careful with the staticfiles app.

But I disagree with you that the best practice should change from a static server to a front-end proxy cache. Rather, the best practice should consist of using both.

An important concept in security is deperimeterisation: the idea that you shouldn't assume you have a fixed border, inside of which everything is secure. So by all means, use a front-end proxy cache. But also put some effort into hardening your individual servers, treating them as if they will be operating under a full load.

"eliminating extra steps (collecting/moving static assets)"

Interestingly, we extended the collectstatic command in many ways to perform minification, combination of assets, generation of sass and javascript variables (based on settings in python), etc. It's part of our deployment and if we were serving static files through django, we would still have to run a similar command.

I'm also happy that nginx is handling file uploads, aliasing, redirection, virtual hosting on different IPs and Ports with different access control, real ip extraction (when behind a load balancer), etc.

I'll be following more closely this trend of moving static asset hosting from a regular web server to the application container, but I believe that web servers like nginx and apache can do a lot more than just serving static files (at least, in complex deployment scenarios).

That's certainly a value of a 'project prep step' (whether it involves static export or not).

Note also, though, that a service like CloudFlare now puts some of these optimizations (minification, asset-combination, obfuscation, etc) into the cache layer, as optional cloud 'app' services to be enabled/disabled/paid-for as desired.

Not saying that way is better for all, but it has potential as a convenience for some, getting those same expert-level optimization benefits while retaining a simple project/deployment structure.

The point was: what difference does it make if Apache/Nginx is better when it's only doing it once? Or, more specifically, once per revision of the code.

Regarding Django and staticfiles: rightly so, because they're not ready for production and for being taken over by the CDN. You need to sprinkle some django_compressor on it first, but even that doesn't get the cache headers perfectly right. Or the gzipping.

Wouldn't it be great if a framework got the headers/compression right without sprinkling extra options in?

I think in Django's case, it's not so much "this is insecure", but more that runserver is simple/stupid and hasn't been built or tested for serving multiple concurrent users.

Nor should it be. There are plenty of WSGI capable web servers that can be installed into a Django project as apps.

If you want to run just Django behind a frontside cache and serve static media from the same connection, simply install an appropriate app. Gunicorn and CherryPy are both 'drop-in replacements' for the built-in runserver, and both are well up to the task.

Yes, but I wasn't talking about the admonitions against runserver -- I was talking about the admonitions against the staticfiles app.

I propose that staticfiles under gunicorn is a reasonable choice. Further, if fronted by an inbound proxy cache, it could perhaps even be the reference/recommended setup for high-volume production sites, rather than the current doctrine that such sites should have some extra static-collect/export to a helper server.

When you're only using nginx as a CDN, then yes another CDN can replace it.

nginx can do a lot more than serve static files.

Nginx is not a CDN.

a hammer is not a paperweight

Wait, so if I pay more for a CDN to deliver my static data, that will work better than when I try to save money and do it myself?

[Insert Oscar winning Face of Shock here]

nginx still buys you SSI (which allows you to, for example, cache the same page for all users and have nginx swap out the username with a value stored in memcache), complex rewrite rules, fancy memcache stuff with the memc module (ex: view counters), proxying to more than ten upstream servers, fastcgi, and lots of other fancy stuff.

Cloudfront is a replacement for varnish, not nginx.

Isn't that better to do in a programming environment you're more familiar with, like Python/Rails/ASP? Then you have much better tools for building unit tests and stuff too.

The performance of doing it in nginx is _much_ better, and you can't do anything complex enough for unit tests to pay off. For the SSI stuff, you have your web framework of choice produce html with SSI tags in it, cache that, and nginx just swaps out the volatile bits at the last second (even for pages in cache).
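A rough sketch of the framework side of that, assuming nginx has SSI enabled for the location. The function name and the fragment URL `/fragments/username` are invented for the example; the nginx side (where that URL gets answered, e.g. from memcache) is not shown.

```python
import html


def render_cacheable_page(title, body_html, fragment_url="/fragments/username"):
    """Build a page that is identical for every user, so it can be cached."""
    # nginx-style SSI directive: at serve time nginx replaces it with the
    # response of fragment_url, so the cached page never holds a real name.
    greeting = '<!--# include virtual="%s" -->' % html.escape(
        fragment_url, quote=True)
    return (
        "<html><head><title>%s</title></head>"
        "<body><p>Hello, %s</p>%s</body></html>"
    ) % (html.escape(title), greeting, body_html)
```

The output is the same string no matter who asks for it, which is the whole point: the per-user bit is only a placeholder until the last hop.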

I don't have much experience with Varnish, but it does support ESI, which is similar to SSI.

I was once told by somebody wise that if a post asks a question, then the answer is usually no.

e.g.: Is Mountain Lion going to kill Windows 8? .. etc.

A meta-corollary: whenever a headline with a question mark appears on Hacker News, there will be at least one comment referring to Betteridge.

(Please, can this stop?)

Ultra-meta: Whenever a repeating meme occurs on the internet, there will be someone asking for it to stop. And the answer to whether it will stop will be no.

Recursive-super-ultra-meta-induction: Whenever someone refers to a level of meta-ness, someone else will refer to level n+1.

Transparently false: this thread's length is measurably finite.

For now.

It will always stop, at least for the most part. It will take a long time, though, and by then it will only be a dim memory, so you probably won't notice.

(Netcraft confirms: BSD is dying!)

Why? I never heard about this "law", and now I just learnt about it.

Betteridge's Law of Headlines[1]

[1]: http://en.wikipedia.org/wiki/Betteridges_Law_of_Headlines


Does anyone have experience with using nginx as a caching proxy? I've used Varnish and swear by it, it's just an amazing piece of software. How well can nginx replace Varnish?

Hmm, that looks very interesting, but I've had Varnish serve 100k requests on a small (512 MB RAM) VPS without me actually noticing (I only found out when Google Analytics had a spike the size of a mountain).

Can nginx do that? It sounds like this solution wouldn't really be able to, having to go through Lua and all, but nginx is an all-around very solid piece of software too, so I wonder...

Nginx can handle much higher loads than Varnish in many situations.

Check this out for a look at Nginx's architecture: http://www.aosabook.org/en/nginx.html

I have read that, but what are those situations? I really doubt claims of "much higher" performance, Varnish is very, very performant.

I think not. Requirements change, and locking myself in to a front-end cache is not appealing. I may also have things which I can't or won't let others cache for me, so I want my local stack to be optimized anyway. You won't see me serving everything out of WEBrick anytime soon just because I have a cloud cache.

It's nice to be able to defer decisions, especially optimizations, but making performance someone else's problem entirely seems like it could promote sloppy thinking and poor work. It's the difference between augmenting a solid platform when the need arises versus front-loading dependencies because it's okay to be lazy.

On several of my modern projects, there's not a single piece of static data that can't be cached forever in a CDN. That's because server-side code is now getting really good at managing the initial build of static assets and the delivery of their URL.

errr. just because it's static, and a pdf, doesn't mean you want it cached on Amazon's servers.

sensitive business documents and such.

If it's serious enough, why are you serving it as a static, unprotected resource?

There's a good post from late 2011, in the context of 12-factor deployment on Heroku, where the author muses about just using a pure Python server behind a CDN to serve static content:

...and yeah, I think I should bloody use this server as a backend to serve my in production.


Oh god, the file.read() then write to response method..

At least you should try `sendfile`.
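For what it's worth, in WSGI the usual way to get there is to hand the open file to the server instead of reading it yourself. A minimal sketch, assuming a plain WSGI app; `serve_file` is a hypothetical helper, and whether `sendfile` actually gets used depends on the server sitting underneath.

```python
import os
from wsgiref.util import FileWrapper


def serve_file(environ, start_response, path,
               content_type="application/octet-stream"):
    """Return a file as a WSGI response without reading it all into memory."""
    f = open(path, "rb")
    size = os.fstat(f.fileno()).st_size
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(size)),
    ])
    # Servers may expose wsgi.file_wrapper, which can hand the file to the OS
    # (e.g. via sendfile); otherwise fall back to wsgiref's chunked iterator.
    wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
    return wrapper(f, 64 * 1024)
```

The fallback still streams in 64 KB chunks, so even without `sendfile` it avoids the read-everything-then-write pattern.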

Sure it's obsolete, who needs databases and live, changing data. All we need is static pages. Besides, who needs to build his own infrastructure? It's 2012, right? Let's buy it.

This misses his point that originally he had app <-> nginx <-> user, then he added cloudfront so he had app <-> nginx <-> cloudfront <-> user. At that point is nginx really serving much purpose?

I think for the average use case the answer is no, nginx doesn't buy you much. However nginx is a lot more flexible than cloudfront, so if you have more complicated caching rules and such nginx is a perfect fit.

It's only an average use case if you're a blogger and use Disqus for comments. For most dynamic web applications we need more flexible caching methods, and we need to be able to change/update the cached parts on our side.

Don't get me wrong: CloudFront is cool tech and obviously useful too, but a bold claim like "we don't need nginx (or any performance-oriented web server) because we have CloudFront" is just the wrong way of thinking.

Is this just an odd joke? Of course you need a database.

If you need to build a toaster, you don't need to build an iron smelting plant. Certain things other folks are better at taking care of.

> Is this just an odd joke? Of course you need a database.

Looks like we need a sarcasm alert on HN, like spoiler alerts :)

I am going to make a codecademy course on detecting sarcasm just for you :)

CloudFront can do a lot more than simply serve up static data.

If you want to serve static files cheaply and are moving less than 10TB/mo you will find that CloudFront is an order of magnitude more expensive than a bunch of VPSes with lots of monthly bandwidth.

Viability of this depends heavily on the use but if you're moving funny pictures of cats then you won't be generating lots of income and want to optimize the bandwidth costs.

Before implementing that, be aware that CloudFront doesn't support custom SSL certificates. If you have any user sessions in your app, you don't want users to log in on https://efac1bef32rf3c.cloudfront.net/login

CloudFront is pretty good, just make sure you are able to config your asset source in one line. Otherwise you have to use a tool to invalidate the cloudfront cache frequently during dev and it's not instant.

Invalidation is for suckers. A fresh new URL is much safer.

Note that you can now configure CloudFront to take query strings into account when caching files. Tweaking the query string is basically instant, unlike waiting for the invalidation tool...
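A small sketch of the fresh-URL approach using a query string, assuming the CDN is configured to include query strings in its cache key. The helper name `versioned_url` and the `v` parameter are arbitrary choices for the example.

```python
import hashlib


def versioned_url(base_url, content, length=12):
    """Append a short content hash so changed content gets a brand-new URL."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return "%s?v=%s" % (base_url, digest)
```

Unchanged files keep their URL (and stay cached), while any edit produces a URL the CDN has never seen, so nothing needs invalidating. Putting the hash in the filename instead of the query string would work the same way if the CDN ignores query strings.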

