
Ask HN: Is load balancing essential, and where? - Skywing
Hi all,

I'm slowly but surely finishing up my weekend(s) project, and I'm beginning to think about deployment. I know that most sites utilize some sort of load balancing at various parts of their tech stack, and I'm wondering how this is done correctly. I've never taken a project this far, and therefore have never had to think about this until now.

At what parts of the data flow do most people use load balancing? How would you load balance your database and web server? I figure these are the two that take most of the load.

When I say load balancing, I'm not really talking about all of the techniques that can be used, such as serving static content from one web server and dynamic content from another, or caching database queries - I'm already doing that. I'm talking about actually balancing incoming user requests and processing load across a cluster of machines.

I know, for example, that MongoDB handles its own load balancing with its sharding and "mongos" client.

I guess my main question, or point of confusion, is this: when I think of load balancing, I think of a single host name taking incoming requests and balancing them across a selection of "worker" servers. Is this still considered the current way of balancing things? And if so, what would it need to be applied to in the web world?

Thank you
======
charlesju
Here is my tip: sign up for a hosted solution for a well-known framework
(e.g. Rails). They will take care of all the server configuration issues so
you can just focus on your application.

Heroku does this for free and scales up with you.

Engine Yard starts at $85.14/mo.

Google App Engine also does this for free and scales up with you.

------
frio
For the webapp load balancing we do (which admittedly isn't very much; we're
an ISP and aren't _overly_ concerned with how quickly our homepage loads :p),
we just use a DNS round-robin across a bunch of servers. We keep a short TTL
so if a server dies, there's at most 5 minutes of downtime.
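
A round-robin is just multiple A records on the same name with a short TTL;
a minimal zone-file sketch (name and addresses made up) looks like:

```
; Resolvers rotate among the A records for "www".
; The 300-second TTL bounds how long a dead server lingers in answers.
www    300    IN    A    192.0.2.10
www    300    IN    A    192.0.2.11
www    300    IN    A    192.0.2.12
```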

A better solution would be proxying, but we haven't had the need to step up to
that: stick an nginx in front, put _n_ backends behind it, and it'll
distribute load across them as it sees fit. Of course, for reliability, you
then need to drop a DNS round-robin on a couple of nginxes, so the whole thing
repeats itself :).
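
A minimal sketch of that proxying setup (backend names are made up); nginx
distributes requests round-robin across the upstream pool by default:

```nginx
upstream backends {
    server app1.internal:8080;
    server app2.internal:8080;
    server app3.internal:8080;
}

server {
    listen 80;
    location / {
        # Each incoming request is handed to one of the backends above.
        proxy_pass http://backends;
    }
}
```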

But yeah, that's just for the frontend. For the backend databases, you'll want
to look at replication and doing writes to one and reads from another, etc. We
haven't had the need to do that, luckily :).

~~~
whimsy
>we're an ISP and aren't overly concerned with how quickly our homepage loads

Is this sarcasm?

~~~
frio
Not at all. The priority has been, and always will be, making sure customers
can get to _other_ webpages quickly; we're not at a scale where the load on
our own sales/account-management pages exceeds the needs of 2-3 boxes (with a
couple of redundant virtuals ready to fire up). That they feel fast to the
CEO, without a defined metric for "fast", is about the only criterion.

~~~
whimsy
That's a great priority, and I applaud it. On the other hand, I'd have extreme
apprehension about subscribing to any ISP whose webpage loaded slowly - it
would make me very concerned about their competence.

------
cd34
Depends on your situation. Dual load balancers serving a virtual IP allows one
to fail while still maintaining connectivity. Behind that, you might run a
pool of machines doing static files (or use a CDN where they handle much of
the redundancy for you) and machines that run your app along with machines
that handle your datastore. Design your application's architecture so that you
can pull off pieces to enable easier scaling.

------
dieselz
I typically deploy on Slicehost, so I set up a 256 MB slice [$20/mo] and put
nginx [<http://wiki.nginx.org/HttpUpstreamModule>] on it to split the traffic.
This configuration can handle enough traffic that, by the time traffic became
an issue, I would be making enough money to move the application to a
dedicated environment.

------
jmtulloss
If you're using Rails, just start out with Heroku. If you're using Django, try
Djangy. It's way easier to let the pros handle that stuff and just worry about
how your app works.

------
aonic
nginx and Pound are two widely used web load balancers. You should be able to
find plenty of blog posts and write-ups with sample configs for setting those
two up.

For databases, at least with MySQL, you should set up master -> slave(s)
replication and send reads to the slaves and writes to the master. If you have
many slaves, you can configure your DB classes to randomly select a slave, or
you can set up a TCP load balancer with something like HAProxy to balance
across the slaves.
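
The "randomly select DB servers" approach can be sketched in a few lines of
Python (hostnames are made up, and real code would hold pooled connections
rather than bare strings):

```python
import random

# Hypothetical hosts for illustration only.
WRITE_HOST = "db-master.internal"
READ_HOSTS = ["db-slave1.internal", "db-slave2.internal", "db-slave3.internal"]

def pick_host(is_write: bool) -> str:
    """Route writes to the master and reads to a randomly chosen slave."""
    return WRITE_HOST if is_write else random.choice(READ_HOSTS)
```

The same split is what an HAProxy TCP listener in front of the slaves would do
for you at the network level instead of in application code.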

------
tomjen3
Honestly, if you are just starting out I would not worry too much about it.
Concentrate on getting a lot of users first, then worry about scaling.

------
thwarted
I've used both LVS (Linux Virtual Server) and haproxy, both on linux.

LVS (managed with the ipvsadm command) runs in kernel mode and mainly just
routes packets, and perhaps does DNATing (depending on how you configure it).
I've had the most luck with the NATing option; I think direct routing is a
little harder to set up from a network-topology standpoint, and the NATing
option is more, uh, "obvious". The drawback is that your topology has to be
set up so that your LVS load balancer is also the default gateway. LVS is
protocol-agnostic - it's a layer 4 load balancer - so in this setup you'll
need to run SSL on your webservers. You'll configure LVS with something like
keepalived, which is a userspace program that does healthchecks and manages
the kernel interface to LVS. There are others, but I'm only familiar with
keepalived.
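
A minimal keepalived sketch of that setup (addresses and names made up): a
VRRP virtual IP plus an LVS virtual_server in NAT mode with TCP healthchecks:

```
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.0.2.100
    }
}

virtual_server 192.0.2.100 80 {
    delay_loop 6
    lb_algo rr        # round-robin
    lb_kind NAT
    protocol TCP
    real_server 10.0.0.11 80 {
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 10.0.0.12 80 {
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
```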

haproxy is a user-space program. It can be a plain TCP load balancer, but it
can also do layer 7 load balancing of HTTP based on header matching and
whatever ACLs you want to write. haproxy doesn't support SSL out of the box;
you need to put something like stunnel in front of it. Of course, you can pass
all SSL traffic through as plain TCP load balancing, but then you can't do
layer 7 load balancing (because haproxy can't see into the request), which is
one of haproxy's strengths. Also, since it is userspace, your web server will
see your load balancer's IP as the source of the request. There is an Apache
module, rpaf (I think), that honors the X-Forwarded-For header haproxy can
insert; I think something similar exists for nginx and lighttpd. You need a
patched version of stunnel to have it insert X-Forwarded-For headers if you
want to wrap haproxy in stunnel.
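
A minimal haproxy.cfg sketch of layer 7 balancing with X-Forwarded-For
insertion (server names, addresses, and the healthcheck path are made up):

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http-in
    bind *:80
    default_backend webservers

backend webservers
    balance roundrobin
    option forwardfor            # insert the client IP as X-Forwarded-For
    option httpchk GET /health   # hypothetical healthcheck endpoint
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```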

Right now, I prefer haproxy because the setup can be customized much more.
It's also easier to firewall haproxy, since it operates at the application
level. Getting iptables and LVS to work well together is possible, but it can
be confusing in NAT mode because of where LVS integrates into the IP stack
relative to where iptables does. haproxy also has many more options for
timeouts and healthchecks than keepalived.

But if you're just finishing up a weekend project, load balancing may be a
premature optimization right now. The thing to keep in mind is that you want
to make sure your program will work with load balancing so it's easier to
transition to it when the time comes. Things like not assuming a single node
services all requests - a common problem with PHP is that the default session
handler writes its sessions to the local file system, so if future requests go
to a different web server, that server won't see the session. This often
manifests as having to continuously log in. You'll need some way to share
session data between web servers (or store everything in (encrypted) cookies).
This problem is in no way limited to PHP, however.

~~~
dieselz
[http://allurcode.com/2010/03/16/php-session-handling-with-memcache/](http://allurcode.com/2010/03/16/php-session-handling-with-memcache/)
I have memcache PHP sessions deployed on a site that at its peak was seeing
4.5MM page views daily. It works flawlessly.
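
The php.ini side of that is only a couple of lines (addresses made up;
assumes the memcache extension is installed):

```ini
; Store sessions in memcached so any web node behind the balancer
; can read them, instead of the default local-file handler.
session.save_handler = memcache
session.save_path    = "tcp://10.0.0.21:11211, tcp://10.0.0.22:11211"
```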

