
HAproxy in the era of Microservices - thorinus
http://47ron.in/blog/2015/10/23/haproxy-in-the-era-of-microservices.html
======
patio11
My current project consists of ~7 microservices, which has been wonderful for
many reasons but exposed us to substantial complexity on this score. At
present ~2 of them have to receive communications internally or externally
over HTTP, so we have those fronted by a proxy to make service discovery
easier. (It is, regrettably, not an off-the-shelf proxy, since one of them
required application-specific routing to sticky host(s) depending on the
contents of the HTTP request.)

The thing which has made our lives MUCH easier from an orchestration,
discovery, and routing perspective is NSQ, an OSS message bus. It's an event
model rather than a request model -- producers of events say "Hey world, by
the way, EventType just happened." and consumers can register to say "Apropos
of nothing, please tell me about any EventType" without producers or consumers
having to be aware of each other or aware of what a consumer intends to
actually _do_ with regards to the event.

Events go into topics ("a type of event") and each topic can have an arbitrary
number of channels ("one thing which should happen for each event in this
topic"). If you're clever about where you raise events, you can stitch
together arbitrarily complex systems by just adding new consumers on new
channels, without having to make the producers aware of the new functionality
at all.
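
For a flavor of the API, here's a minimal sketch using the official go-nsq
client (the topic, channel, and addresses are invented for the example, not
our actual services):

    package main

    import (
        "log"

        nsq "github.com/nsqio/go-nsq"
    )

    func main() {
        cfg := nsq.NewConfig()

        // Producer side: announce that an event happened, with no knowledge
        // of who (if anyone) will consume it.
        producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
        if err != nil {
            log.Fatal(err)
        }
        if err := producer.Publish("order_placed", []byte(`{"order_id": 42}`)); err != nil {
            log.Fatal(err)
        }

        // Consumer side: one channel ("archive-to-s3") on the same topic.
        // Adding another channel later (say, "fraud-check") needs no change
        // to the producer.
        consumer, err := nsq.NewConsumer("order_placed", "archive-to-s3", cfg)
        if err != nil {
            log.Fatal(err)
        }
        consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
            log.Printf("archiving event: %s", m.Body)
            return nil // a nil return marks the message as finished
        }))
        if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
            log.Fatal(err)
        }
        <-consumer.StopChan
    }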

Concrete example: We wanted a log of all orders made to the stock exchange. We
have a little utility, nsq-to-s3, which takes all the events on a
topic/channel, persists them in memory/disk for a while, and periodically
flushes that log to S3 for long-term storage. This trivialized the logging
feature. (I was going to write our own nsq-to-s3 but an OSS project of that
same name existed and was adequate to our needs, so we didn't have to go to 8
microservices... yet.)

This was much easier than making a generic S3 logging service and then making
edits to seven microservice codebases to make an HTTP request out to the
logging service, with e.g. retry/failure recovery/monitoring everywhere in
those codebases where we wanted something logged to S3.

~~~
ma2rten
This sounds similar to kafka, how does it compare?

~~~
patio11
I have heard from several folks that they use Kafka where we'd use NSQ, but
have insufficient experience with Kafka to give an informed opinion on the
tradeoffs. I will say that few technologies I've used in my career were as
easy to adopt as NSQ was.

------
jolynch
The most significant issue with HAProxy and large SOAs is the lack of really
good dynamic re-configuration. You don't want to have to manually change your
configuration every time someone wants to add or change load balancing to a
service, and if you do it automatically you run into HAProxy reloads being a
reliability problem as they drop traffic on the floor due to Linux being goofy
in how it implements SYN handling.

It looks like there is a lot of good advice on this thread for how to mitigate
the configuration problem (bamboo, consul template, etc ...) but then you run
into reloads, and even if you use zero downtime restarts as I described in my
HAProxy deep dive
([http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html](http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)),
you still end up with periodic ~50ms blips in your
timings.

We've found that Synapse
([https://github.com/airbnb/synapse](https://github.com/airbnb/synapse)) is
best in class at managing HAProxy for us in a fairly large SOA. It is really
intelligent about doing as much as possible dynamically over the HAProxy stats
socket as opposed to reloading for every configuration change, and we can
finely tune how often we are reloading HAProxy. The best part is that it is
fully pluggable. Use DNS, marathon, zookeeper, etcd, etc...: Synapse can
manifest that service registration information into HAProxy configurations.
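
For a flavor of what "dynamically over the stats socket" means, here's a
minimal Go sketch (not Synapse itself) that drains a server over the admin
socket; the socket path and backend/server names are hypothetical, and the
socket has to be exposed with `stats socket ... level admin`:

    package main

    import (
        "fmt"
        "io"
        "net"
        "os"
    )

    // haproxyCommand sends a single command to the HAProxy stats socket and
    // returns the reply. HAProxy answers one command per connection by default.
    func haproxyCommand(socketPath, cmd string) (string, error) {
        conn, err := net.Dial("unix", socketPath)
        if err != nil {
            return "", err
        }
        defer conn.Close()
        if _, err := fmt.Fprintf(conn, "%s\n", cmd); err != nil {
            return "", err
        }
        reply, err := io.ReadAll(conn)
        if err != nil {
            return "", err
        }
        return string(reply), nil
    }

    func main() {
        const sock = "/var/run/haproxy.sock" // hypothetical path

        // Take a server out of rotation without touching the config file
        // or reloading the process.
        if _, err := haproxyCommand(sock, "disable server b_myapp/s1"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }

        // Dump the stats CSV to confirm the state change.
        stats, err := haproxyCommand(sock, "show stat")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Print(stats)
    }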

~~~
jrv
Another problem with the configuration reloading is that all the
backend/service metrics counters that HAProxy exposes on its metrics endpoint
get reset on every reload. That makes for really noisy metrics gathering in
situations where you have rapid successive changes to the config, such as
gradually scaling up hundreds of instances of a new version of a microservice
and then scaling down the old version. When you then scrape the metrics via
Prometheus using a bridge like HAProxy Exporter
([https://github.com/prometheus/haproxy_exporter](https://github.com/prometheus/haproxy_exporter)),
you lose a lot of counter increments due to the frequent resets, so you get
lower and noisy rates for that time period.

~~~
jsmeaton
I hadn't heard of Prometheus before. We use nagios for alerting/alarms, and
graphite/grafana separately for metrics. Would you recommend looking into
prometheus as an alternative?

~~~
jrv
As a Prometheus author, I'm biased, but yes :)

See also [http://prometheus.io/](http://prometheus.io/) and
[https://www.youtube.com/watch?v=HjN23GgCzQY](https://www.youtube.com/watch?v=HjN23GgCzQY)
for an intro.

------
exelius
HAProxy is a great fit for most microservices, but I'm not sure it has the
feature set for most organizations that are deep into microservices - at least
in the way the author describes it. The problem is more business-driven than
technical, so let me explain why (and feel free to correct me if you think I'm
wrong!)

Most companies don't need microservices, so I'm making an assumption that any
company using microservices has both the volume and complexity of business
partners that make a microservices approach a good business idea (this
assumption is assuredly false, but still useful). And if you've gone this
route, the next step is generally organizing your business around an SOA where
business units can be represented by an API (microservices are useful from a
business perspective because you have many stakeholders within a company that
want to make small changes frequently but independently). Once you've done
that, you start wanting to encapsulate your relationship with your customers
in an API as well, and the eventual destination (if your company survives long
enough and is successful) is that you just operate a set of APIs that power
your customer-facing UX product in addition to the API-based services you
provide to your high-volume clients.

My point is this: the end state of a microservices architecture is an API
gateway. API gateways provide essential services such as authentication,
access control, and connection throttling. You still need something like
HAProxy to provide you with load balancing, health checks, etc. across many
nodes - but you're going to have a very simple instance of HAProxy for each
microservice and rely on your API gateway to organize everything under a
single domain.

With microservices, your configurations can get incredibly complex very
quickly, so it's often better to maintain independent components with very
simple configurations (that can be easily updated dynamically) than to try to
have a single component in the middle of your stack upon which many other
components are dependent.

~~~
imdsm
I think that HAProxy can work, but it needs assistance. As it doesn't allow
you to use resolvable hostnames, you can't rely on DNS load balancing for
microservices. As I said in my comment on the article, Consul + Consul-
template works well for this.
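
For illustration, a consul-template fragment for an HAProxy backend can look
roughly like this (the service name, file names, and reload command below are
invented):

    # haproxy.ctmpl (fragment): render every healthy instance of a
    # hypothetical "myapp" service registered in Consul into a backend.
    backend b_myapp
        balance roundrobin{{ range service "myapp" }}
        server {{ .Node }} {{ .Address }}:{{ .Port }} check{{ end }}

Run with something like `consul-template -template
"haproxy.ctmpl:/etc/haproxy/haproxy.cfg:service haproxy reload"` and the
config gets re-rendered (and HAProxy reloaded) whenever instances register or
deregister.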

~~~
vidarh
As of 1.6 it _does_ allow resolvable hostnames [1]. This is the example given
in the release announcement:

    resolvers docker
        nameserver dnsmasq 127.0.0.1:53

    defaults
        mode http
        log global
        option httplog

    frontend f_myapp
        bind :80
        default_backend b_myapp

    backend b_myapp
        server s1 nginx1:80 check resolvers docker resolve-prefer ipv4

[1] [http://blog.haproxy.com/2015/10/14/whats-new-in-haproxy-1-6/](http://blog.haproxy.com/2015/10/14/whats-new-in-haproxy-1-6/)

~~~
exelius
That's pretty new, but it still doesn't remove the need for an orchestration
layer (though it does make it a lot easier, since I often rely on DNS with
short TTLs for global routing).

------
adamtulinius
I'm sorry to be that guy, but the lack of contrast makes the text difficult to
read.

~~~
tupacmaister
It's good enough

~~~
buffportion
I'm sorry to be that guy, but the lack of contrast makes this comment
difficult to read.

------
meddlepal
Folks should take a look at Baker Street (bakerstreet.io), which is an
HAProxy-based system for routing traffic between microservices. Baker Street
is designed to be really easy to get up and running (the install is super
quick) and is based on the Yelp-popularized SmartStack. However, one of the
beautiful aspects of Baker Street is that, unlike SmartStack, it does not
require running and maintaining a ZooKeeper cluster, which can be a PITA for
teams just starting to work with microservices.

Full disclosure: I am the lead developer of Baker Street :)

Website: [http://bakerstreet.io](http://bakerstreet.io)

Source:
[https://github.com/datawire/bakerstreet](https://github.com/datawire/bakerstreet)

~~~
toomuchtodo
Hey there!

Just a small nitpick: Running Zookeeper is insanely simple. We deployed it in
2-3 days, and it has had literally 0 issues over the last year.

I highly recommend
[https://github.com/mbabineau/docker-zk-exhibitor](https://github.com/mbabineau/docker-zk-exhibitor)
to get started (a Docker container that wraps together ZooKeeper and Netflix's
Exhibitor manager for it).

Remember, pick boring technologies that just work.

~~~
felixgallo
zookeeper is not a boring technology that just works:

[http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/](http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)

[http://arstechnica.com/information-technology/2015/05/the-discovery-of-apache-zookeepers-poison-packet/](http://arstechnica.com/information-technology/2015/05/the-discovery-of-apache-zookeepers-poison-packet/)

[https://tech.knewton.com/blog/2014/12/eureka-shouldnt-use-zookeeper-service-discovery/](https://tech.knewton.com/blog/2014/12/eureka-shouldnt-use-zookeeper-service-discovery/)

Here's ibm and netflix with some advice about how non-boring running a ZK
cluster can be:

[https://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/...](https://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.pd.doc/doc/containerstreamszookeeper.html)

[http://techblog.netflix.com/2011/11/introducing-curator-netflix-zookeeper.html](http://techblog.netflix.com/2011/11/introducing-curator-netflix-zookeeper.html)

------
jsmeaton
We use a single monolithic HAProxy to route to all of our applications. We
have about 8 different kinds of applications we host. Of those, we have a few
separate instances of those applications available at similar (but different)
URLs. Think: `https://sub.domain.com/<prefix>/appname/path`.

We route based on `sub`, `prefix`, `appname`, and even sometimes `path`.
Managing the config is literally going into this relatively big file, adding
in ACLs, adding backends, and setting up the routing rules. We manage the file
with puppet, but it's static - not generated.
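
Roughly, the routing rules in that file look like this (hostnames, prefixes,
and backends invented for illustration):

    frontend fe_main
        bind :80
        # route on the subdomain...
        acl host_sub  hdr(host) -i sub.domain.com
        # ...and on the /<prefix>/appname/ portion of the path
        acl path_app1 path_beg -i /tenant-a/app1/
        acl path_app2 path_beg -i /tenant-a/app2/
        use_backend b_app1 if host_sub path_app1
        use_backend b_app2 if host_sub path_app2
        default_backend b_default

    backend b_app1
        server app1-01 10.0.0.11:8080 check

    backend b_app2
        server app2-01 10.0.0.21:8080 check

    backend b_default
        server web-01 10.0.0.31:8080 check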

I feel that it's probably time to start deploying many HAProxy instances,
generating the file per service, and then maintaining a frontend haproxy that
routes according to subdomain.

Can anyone comment on similar deployments? Have you found it easier to manage
many haproxy instances rather than a single monolithic one?

------
ende42
We're using a home-baked solution built around our own config server
([https://github.com/niko/goconfd](https://github.com/niko/goconfd)), which
evaluates templates POSTed to it. Combined with some trivial shell loops over
a blocking POST and an HAProxy restart, this does the job sufficiently well.
If anybody is interested in the details, I'm happy to elaborate.

------
steven2012
I heard Airbnb installs HAProxy on each of their nodes, and then manages the
configuration for each service on each node. Then the service just needs to
access 127.0.0.1 instead. This sounds compelling to me because you eliminate
HAProxy as a single point of failure.
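
As I understand it, the per-node config stays very small; a sketch (ports and
addresses invented) of what a service would see on localhost:

    # Each service talks to its dependencies via a local port; the node-local
    # HAProxy forwards to wherever the healthy instances actually live.
    listen users_service
        bind 127.0.0.1:3211
        server users-01 10.0.0.5:8080 check
        server users-02 10.0.0.6:8080 check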

~~~
thenayr
It's called Smartstack and is also entirely open-source -
[http://nerds.airbnb.com/smartstack-service-discovery-cloud/](http://nerds.airbnb.com/smartstack-service-discovery-cloud/)

Having been a long-time user, I can tell you the added complexity of every
service having to allocate its own port, then tracking that port across every
other service is a lot of overhead on its own.

Not to mention you also have to run a zookeeper cluster and two additional
daemons on every instance.

All in all it's a good concept, but slightly too complex for my needs.
Recently made the switch back to using internal ELBs for every service and
not looking back.

------
fidget
I'm moving away from HAProxy. The lack of zero downtime config reloads is
consistently causing problems, especially in a high reload environment (i.e.
using bamboo [0]). One team deploys an application that goes into a restart
loop, changing our routing table constantly? Well there goes our 99th
percentile out the window.

That said, would prefer not to be using nginx, so if anyone has any
recommendations for high quality zero-downtime-reloadable HTTP proxies, I'd be
very interested.

[0]:
[https://github.com/QubitProducts/bamboo](https://github.com/QubitProducts/bamboo)

~~~
jolynch
Check out the deep dive I wrote into how you can help prevent HAProxy reloads
from dropping traffic without a large 99% hit over at
[http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html](http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)

Looks like bamboo encourages folks to use this strategy via a custom reload
command:
[https://github.com/QubitProducts/bamboo/issues/152](https://github.com/QubitProducts/bamboo/issues/152)

~~~
fidget
That's egress traffic :) And yeah,
[https://github.com/QubitProducts/bamboo/issues/143#issuecomm...](https://github.com/QubitProducts/bamboo/issues/143#issuecomment-139306818)
is currently my best idea (with some additional nfq stuff). Doesn't help when
requests span the entirety of the reload though.

~~~
jolynch
I mean, I wouldn't encourage anyone to use HAProxy for microservice load
balancing unless they're running it on every node, at which point clients are
always talking to localhost and everything is egress.

That being said, you can turn ingress into egress with an ifb. A coworker of
mine created a proof of concept for using our strategy with an external facing
load balancer but I don't think he ever tried it in production so I'm not sure
how well it works.

To be fair I'm scared of both the iptables and tc solution on external LBs
because you might never know that you're refusing connections accidentally.
Linux 4.4 is coming with some patches that should help with this and afaik
BSDs have done it right since the start.

------
johnnycarcin
This is exactly why we built out conf-builder
([https://github.com/radiantiq/conf-builder](https://github.com/radiantiq/conf-builder)).
None of the existing solutions matched what we needed, and so far it's worked
out pretty well for us. If you use Docker, there is also a neat tool called
registrator that we've played around with:
[https://github.com/gliderlabs/registrator](https://github.com/gliderlabs/registrator)

------
kkamperschroer
Using Consul ([https://www.consul.io/](https://www.consul.io/)) in conjunction
with HAProxy is another great way to handle service discovery and routing.

~~~
imdsm
Yep, Consul-template + HAProxy is great for a dynamic HAProxy solution.

~~~
magiconair
[https://github.com/eBay/fabio](https://github.com/eBay/fabio) might also be
an option for you.

Disclosure: I'm the author.

~~~
imdsm
That looks interesting. I'll check it out. Thanks!

------
jdubs
If you're using Marathon, check out Bamboo: it can generate HAProxy
configuration files from templates, triggered by changes in Marathon. Cool
stuff!

------
falsedan
What a non-article! tl;dr: you should run HAProxy on a known host that
forwards to your services.

Why not nginx?

