Nginx Load Balancing Basics (jsdelivr.com)
172 points by jimaek on Jan 14, 2013 | 48 comments

Something that is pretty important, and is missing from this guide, is to make sure you add headers indicating what the original IP address of each request was (either in X-Forwarded-For or X-Real-IP or something else common).

Can do this in the root location "/" with:

  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

Also good to remember to put another header in for the forwarded protocol (if you're terminating an ssl tunnel at the balancer.)
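For the forwarded protocol, a minimal sketch might look like the following. The upstream name "backend" is a placeholder of mine, and X-Forwarded-Proto is only the conventional header name; use whatever your application actually reads.

```nginx
# Goes inside the server block that terminates SSL.
location / {
    # "backend" is a hypothetical upstream name, not from the article.
    proxy_pass http://backend;

    # $scheme is "http" or "https" depending on how the client connected
    # to this nginx, so the application can tell the original request was SSL.
    proxy_set_header X-Forwarded-Proto $scheme;
}
```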

On a security note - your application code should only trust those headers (X-Forwarded-For, X-Real-IP, etc.) for IP lookup if you control the load balancer and strip them from incoming requests.

There is nothing to stop a malicious client adding the header themselves, and if you rely on IP lookup for access control (e.g. a dev mode that is active only for certain addresses) you can leave yourself wide open. While I can't find the article at the moment, Stack Overflow accidentally gave admin-level access to the site because of this oversight.
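One way to do the stripping at the balancer, as a rough sketch: overwrite the headers instead of appending to them. This assumes this nginx is the outermost proxy (nothing trusted sits in front of it), and "backend" is a placeholder upstream name.

```nginx
location / {
    # "backend" is a hypothetical upstream name.
    proxy_pass http://backend;

    # Overwrite rather than append: whatever X-Forwarded-For / X-Real-IP
    # the client sent is discarded and replaced with the address nginx
    # actually saw on the connection.
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Real-IP       $remote_addr;
}
```

If there is another trusted proxy in front of this one, overwriting would throw away the real client address, so this only fits the edge-of-network case.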

In my experience, if a client adds their own X-Forwarded-For header trying to spoof their IP, nginx simply prepends it to the X-Forwarded-For header like "<spoofed>, <actual>", where <spoofed> is the address the client supplied in their spoofing attempt, and <actual> is the actual IP address forwarded by nginx.

So you can choose to trust only the rightmost one, if there are several entries in the list.

For backend nginx instances (we use nginx to balance application servers, and nginx right in front of Unicorn on those application servers) use the Real IP module to have logs transparently show the original request IP not the load balancer's IP.
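A sketch of what that looks like on the backend nginx, assuming it was built with the realip module (--with-http_realip_module) and that the balancer lives at 10.0.0.10, which is an address I made up for the example:

```nginx
# http or server context on the backend nginx.

# Only rewrite the client address when the request came from the balancer.
# 10.0.0.10 is a placeholder; list your own load balancer address(es).
set_real_ip_from 10.0.0.10;

# Take the client address from this header.
real_ip_header X-Forwarded-For;

# Skip over trusted addresses in the list and use the last untrusted one.
# Needs nginx 1.3.0 / 1.2.1 or later.
real_ip_recursive on;
```

After this, $remote_addr in the access log is the original client address rather than the balancer's.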


Agreed, and also the host header

  proxy_set_header Host $host;

I'm really on the fence between HAProxy and Nginx. I have used HAProxy successfully in the past, but I'm tempted by the simplicity of Nginx, especially now that it supports SPDY.

Would like to hear people's thoughts on using Nginx in "real life" for load balancing rather than Haproxy.

I can't compare it to HAProxy, but nginx load balancing was probably the simplest and most reliable part of our web infrastructure, and did exactly what we wanted and needed. Never played around with SPDY, but I liked the various options regarding server weighting, SSL termination, and like you mention the ease of configuration. It wasn't too fancy, but solved a problem and solved it well.
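For anyone who hasn't seen the weighting options, a small sketch (the addresses and the upstream name are invented for illustration):

```nginx
upstream backend {
    # Receives roughly three requests for every one sent to the other box.
    server 10.0.0.11 weight=3;

    # weight defaults to 1 when omitted.
    server 10.0.0.12;
}
```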

Unfortunately we had to switch off of it due to PCI compliance concerns[1], but I'd use it again in a heartbeat.

[1] not because there were actual issues, but because other solutions were fully audited out of the box. I'm hardly surprised that we've had more issues with those solutions than we ever had with nginx, including the time when we barely knew how to configure the thing. One of the unavoidable hazards of PCI Level 1 :( We still use it for the actual web requests quite happily.

I love nginx. You stole the words out of my mouth re: simplicity & stability of nginx for load balancing. If there's one thing you can really nail like a pro while still a rookie (like me) it's configuring nginx to load balance.

Gives me confidence in going with nginx for LB.

Was the lack of a web ui (like haproxy has) ever a concern? How did you keep track of dead servers behind the LB?

We didn't have enough servers behind it to really deal with dead servers, to be honest. Seemed to detect a failed server and route around it quickly enough, and we have monitoring per server in place to go in and reboot the thing or whatever.
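The routing around failed servers is passive, and the knobs for it live on the server lines in the upstream block. A sketch with made-up addresses:

```nginx
upstream backend {
    # After 3 failed attempts within 30s, stop sending traffic to this
    # server for 30s, then try it again.
    # (The defaults are max_fails=1 and fail_timeout=10s.)
    server 10.0.0.11 max_fails=3 fail_timeout=30s;
    server 10.0.0.12 max_fails=3 fail_timeout=30s;

    # Only used when the servers above are unavailable.
    server 10.0.0.13 backup;
}
```

This only reacts to real requests failing, so separate per-server monitoring is still worth having.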

The configs are pretty straightforward, but might get a little nuts if you're dealing with hundreds of servers behind the thing. I don't have to wear a sysadmin hat too frequently (thank god) but when I did it was pretty easy to deal with.

Huge fan of the fact that reloading the config would perform a configtest automatically before trying to apply the new settings. I don't know why all software doesn't do this.

Care to elaborate on what made it a PCI compliance concern?

I don't know too many details as I wasn't on that side of the PCI audit (more handling the software we write), but my impression was that off-the-shelf hardware was already certified where nginx was not. It was also one less component for us to manage, as we opted for hosting where we manage our web stack and the hosting company deals with the hardware and network.

I've been using it heavily under production loads, and the balancing portion hasn't blinked.

I'm also doing SSL termination at it, so I don't really have any metrics on the balancing in isolation, but for moving 50-100 concurrent connections around it hasn't blinked.

I do really like HAProxy's more flexible up/down monitoring, though. In the past, we've done the trick with separate control connections that we can bring up & down with iptables to shuffle traffic around without any broken connections.

Having SSL termination on a single box is definitely a big plus; that's how I wanted to do it as well.

Did you miss Haproxy's web ui? Does nginx have any way of reporting if a server is down?

HAProxy 1.5 supports SSL. I've had good luck with it so far.

How much traffic are you pushing through it? I've been planning to set up stunnel + HAProxy on separate instances once we're comfortable going with 1.5, but am curious if we could get away terminating SSL on the same instance as HAProxy runs.

So far it has just been used in development and QA environments. Unfortunately I'm not sure how much traffic our load testing pushes through it.

Nginx has also formed the basis of CloudFoundry's routing tier, and thus of cloudfoundry.com and appfog.com. Nginx load balancing can be very simple, but you can customize it to your heart's content with Lua [1].

You can also use it for your application tier with Passenger, [U]WSGI, FPM, FCGI...

[1] https://github.com/cloudfoundry/cf-release/blob/master/jobs/...

What does Nginx offer that HAProxy doesn't? We've been using HAProxy 1.4 for over a year now, up to 1500 req/s on a virtual machine. It's the most reliable piece of all of our infrastructure. Hardest part has been tuning the Linux instance for a lot of connections when encountering DDOS attacks and the like.

We also run another HAProxy instance for rate limiting for attacked sites that feeds back into the main load balancer. And this is Layer 7 load balancing including inspecting headers. Never breaks a sweat. 1.5 supports SPDY, which is the last big thing for us (though I need it in the opposite direction from other mentions, used alongside stunnel).

To add more, as well: we use HAProxy for deploying (tell it to take down nodes using the unix socket), we have it set up to meter requests to a machine just brought back online, and with one command we can route all traffic to a standby Apache instance that serves a maintenance page. On top of that it has the monitoring page, and using the socket we pull stats every minute and ship them off to Librato Metrics as well as watch for high sessions and the like. I (obviously) cannot sing its praises enough.

Maybe I'm wrong, but I think Haproxy only supports NPN and not SPDY itself? I'd be delighted if it did support SPDY out of the box!

Whoops! I definitely said SPDY and definitely meant the PROXY protocol (http://haproxy.1wt.eu/download/1.5/doc/proxy-protocol.txt)

You can set up stunnel to terminate SSL and then prepend the PROXY protocol line to the request that's sent to HAProxy, which will then add an X-Forwarded-For header from that info. This may be relevant to your interests, though: http://www.igvita.com/2012/10/31/simple-spdy-and-npn-negotia...

You cannot use nginx as a proxy in front of websocket backends currently if you have need for that. nginx 1.3 has it on the roadmap though.

HAproxy works correctly for websocket backends today.

Can either of those solutions do dynamic secure web sockets? I want to terminate SSL to various dynamic backend web socket servers. I'm spinning up additional web socket servers per user for user privilege separation.

What exactly are you trying to do? Where in the chain are you hoping to terminate SSL? Do you need to inspect the traffic before load balancing it?

I'm terminating the SSL outside VMs, so the VMs can be compromised without giving up the certificate's private key.

The VMs are each running a websocket server running as the user that will be connecting. This makes the security aspects very easy to handle. Each user can only modify their own environment and write to their own files (backed by unix permissions). Even if they root the VM (excluding hypervisor vulnerabilities) they won't be able to access any private data.

If I want to be able to hot migrate VMs between physical machines, I need some way of dynamically proxying the connections. If I had lots of IPs, I could simply let each VM have an IP address and the SSL terminator would route properly no matter where I move the VM.

Does that make sense?

No, not really. Sounds like you want to update the backend servers that your load balancer is proxying to while the load balancer is up? Can't you just create an internal network if you need IPs? I think this hinges on what you mean by the phrase "dynamic proxy"?

Hmmm! Good to know. Does 1.3.x unstable have websocket support already? We're using the 1.3 branch in production and it's been super reliable.

Another awesome thing - as of 1.3.1/1.2.2, nginx can do least connections load balancing, which is better if your upstream response time isn't very consistent.
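Turning it on is a single directive in the upstream block. A sketch with placeholder addresses:

```nginx
upstream backend {
    # Send each new request to the server with the fewest active
    # connections (server weights are still taken into account).
    least_conn;

    server 10.0.0.11;
    server 10.0.0.12;
}
```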

Another alternative is Hipache.

It's based on NodeJS, and it's really good. I've been using it in front of three web servers serving around 800 small-to-medium business websites for the last six months and it's been fantastic.

It pulls configuration data from Redis, so you can easily do things like automate deployments.

It would be nice to have a library for adding server/s to an upstream on the fly. Something like:

nginx addserver upstream-name

EDIT: Added upstream name

If you really want to do it dynamically, you could read the upstreams from Redis with a combination of Nginx modules. Or use Puppet and restart, as the other poster suggested, as it won't break connections in progress.

Storing and reading the configuration from Redis will be slower at runtime, but should scale more easily for a large number of hosts and be much more responsive than using Puppet or Chef. I recommend choosing either solution based on the number of hosts and rate of change.

You can also use a custom DNS resolver with Nginx, and point it to a tool like DNSMasq or PowerDNS [1].

[1] http://wiki.nginx.org/HttpCoreModule#resolver
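A rough sketch of the resolver approach, assuming a local DNS server on 127.0.0.1 and a made-up hostname and port. The key detail is that nginx only looks the name up at request time when proxy_pass contains a variable; a literal hostname is resolved once when the config is loaded.

```nginx
server {
    listen 80;

    # Assumed: dnsmasq / PowerDNS answering on localhost.
    # valid=10s makes nginx re-query after 10 seconds regardless of the TTL.
    resolver 127.0.0.1 valid=10s;

    location / {
        # "app.internal.example" and port 8080 are placeholders.
        set $backend_host "app.internal.example";

        # Because of the variable, the name is resolved at runtime
        # through the resolver above.
        proxy_pass http://$backend_host:8080;
    }
}
```

Changing the DNS record then moves traffic without touching the nginx config.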

That is what puppet or a similar configuration management solution is for (or doing it by hand I suppose). There's not really a realistic scenario I can think of where you'd be wanting a built-in nginx function to service this requirement.

For those interested, the "application_nginx" cookbook for Chef does just this. It can configure upstream servers based on role very easily.

In puppet it's pretty easy as well. The puppet module modification I made to puppet-nginx allows you to do resource collection for a group of upstream servers (and thus add to a group of upstream locations transparently).

See my comment about Hipache -- it pulls config from Redis on the fly.

That's really helpful & nicely written - shouldn't that be added to the NGINX Wiki in some way (either as a complete how-to-do-this or just as a link)?

Thank you! I do not feel right about adding it myself. But maybe somebody eventually will do it

What does the syntax server unix:/tmp/backend; do? Does that serve from the local filesystem without going through another http daemon?

It uses a unix socket. It's also a bit faster than using a TCP socket to proxy (at least in my limited testing), so I prefer it.

It was used just as an example to show that you can use anything as an upstream. It's just a unix socket that php-fpm, for example, can listen on.
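To make that concrete, a sketch of a unix socket upstream. The socket path is the one from the question; everything else is a placeholder, and it assumes whatever listens on the socket speaks HTTP.

```nginx
upstream backend {
    # Connect over a unix domain socket instead of TCP.
    server unix:/tmp/backend;
}

server {
    listen 80;

    location / {
        # Requests are proxied as HTTP to whatever process is
        # listening on /tmp/backend.
        proxy_pass http://backend;
    }
}
```

For php-fpm specifically you would use fastcgi_pass instead of proxy_pass, since it speaks FastCGI rather than HTTP.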

Does anyone know of a quality basics guide for HAProxy?

I actually think the basic docs are very good for going over theory. There's a more readable version of them on Google Code: http://code.google.com/p/haproxy-docs/wiki/HAProxy

Beyond that, it depends on how you're using it (HTTP load balancing, TCP only, etc.). Got any specific questions? We've been running it in production for over a year.

Can anyone compare Nginx and HAProxy, and also provide an example of using both in an infrastructure?

Here or in an article? We use both in our infrastructure.

Generally I would say that if you're proxying web connections and need caching or the ability to do lots of complicated rewriting on the proxy side, use nginx. If you're proxying database, mail or similar... haproxy. If you don't need any caching or similar, either nginx or haproxy depending on your application.

I'm not terribly experienced with Nginx, but HAProxy was (and is) a load balancer first, whereas Nginx is a server with load balancing abilities (as is Apache, though it gets less love these days). HAProxy has pretty powerful HTTP support and capabilities, however, so I'm not sure I buy the other argument in this thread.

HAProxy also allows you to modify balanced nodes while the server is running, and has fantastic logging once you get used to looking at it.
