

Introducing SmartStack: Service Discovery in the Cloud - _harry
http://nerds.airbnb.com/smartstack-service-discovery-cloud/

======
chris_va
I'm glad to see more cluster management software getting open sourced, and
this is sort of on the right track.

However, looking at the design, this still has a long way to go. There are a
lot of failure modes you guys haven't encountered yet, which will result in a
few design tweaks. For example, what happens if your health checkers decide to
start reporting garbage data (e.g. maybe they are too overloaded to properly
perform health checks)? Or when you have a query of death being issued? Also,
things like traffic sloshing can very quickly build resonant failures in a
system like this.

(Source: many years working on Google infrastructure, including causing
outages related to load balancing code)

~~~
igor47
What is traffic sloshing?

Good point on garbage data reporting; we do basic validation in synapse, here:
[https://github.com/airbnb/synapse/blob/master/lib/synapse/se...](https://github.com/airbnb/synapse/blob/master/lib/synapse/service_watcher/zookeeper.rb#L97)

We could probably do more there to ensure valid names, IPs and ports (matching
against a regex should do it). Also, because of the built-in health checking
in haproxy, just the presence of some invalid name in the list of machines
doesn't mean that we're going to try to start sending traffic there.
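
Roughly, the extra validation could look something like this (an illustrative
sketch only, not synapse's actual code; the 'name'/'host'/'port' keys are my
assumption about the registration payload):

    def plausible_backend?(entry)
      name_ok = entry['name'].to_s =~ /\A[a-zA-Z0-9._-]+\z/
      ip_ok   = entry['host'].to_s =~ /\A\d{1,3}(\.\d{1,3}){3}\z/          # IPv4 only, for brevity
      port_ok = entry['port'].to_s =~ /\A\d+\z/ && entry['port'].to_i.between?(1, 65_535)
      !!(name_ok && ip_ok && port_ok)
    end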

~~~
chris_va
Traffic sloshing (basic overview): Say you have a pool of machines for a
service (traditionally this problem is multi-regional, though it technically
can happen at any scale). For some reason (machine restart, query of death,
reloading, etc) a subset of your backends become unhealthy. This gets
automatically detected by your framework, and the traffic gets routed to
different machines. Now, you may have under-provisioned your backends (or you
have a query of death), so this concentration of traffic on a smaller number
of machines causes them to choke. You get a seesaw effect of traffic going
around to the different backends, taking them out like a concentrated
firehose. These failures all get detected by your framework, which routes
traffic away from the backends. What you really wanted was a steady stream to
all backends. A lot of load balancing systems have this failure mode. The good
ones can detect it and converge back to a good steady state. The naive ones
just keep the firehose spinning. It is harder to fall into this trap with
simple binary health checking. It becomes a lot easier when you do traffic
allocation by latency, or have more complicated health criteria that are easier
to fail.
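
To make the seesaw concrete, here's a toy simulation (purely illustrative,
nothing to do with SmartStack's code): five backends that can each absorb 100
req/s, 350 req/s of aggregate traffic, and a rolling restart that takes two
backends out. A steady 4- or 5-way split would be fine, but the pool never
finds it.

    CAPACITY   = 100   # req/s a backend can serve before it chokes and fails its check
    TOTAL_LOAD = 350   # aggregate req/s arriving at the pool

    backends = Array.new(5) { { healthy: true } }
    backends[0][:healthy] = backends[1][:healthy] = false   # rolling restart takes two out

    10.times do |tick|
      healthy = backends.select { |b| b[:healthy] }
      per_backend = healthy.empty? ? 0 : TOTAL_LOAD / healthy.size
      puts "tick #{tick}: #{healthy.size} healthy, ~#{per_backend} req/s each"

      backends.each do |b|
        if b[:healthy]
          b[:healthy] = per_backend <= CAPACITY   # overloaded backends fail their next check
        else
          b[:healthy] = true                      # drained backends recover, ready to be firehosed
        end
      end
    end

It just oscillates between 3 backends at ~116 req/s and 2 backends at ~175
req/s: the concentrated firehose, moving around the pool forever.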

On the health checking/garbage data front: it's usually more of a problem when
something falsely reports a good backend as bad than when it reports a bad
backend as good. The latter is easy to catch (as you mention, haproxy does it).
The former is hard because one misbehaved health checker can suddenly pull all
of your services out of rotation.

~~~
spc476
That happened at work a few weeks ago---basically, one of our components did a
health check against DNS after the health-check DNS record was mistakenly
deleted. Because of the way the health check code was written, that component
thought the DNS servers were down and shut itself down. That, in turn, shut
down other components that were health checking that component. Boom! Instant
virtual dominoes.

------
igor47
Hi guys! I'm one of the primary authors of SmartStack. Happy to answer any
questions that aren't covered in the blog post.

We're also doing a Tech Talk on SmartStack today at Airbnb HQ; stop by if
you're in SF: [https://www.airbnb.com/meetups/33925h2sx-tech-talk-smartstac...](https://www.airbnb.com/meetups/33925h2sx-tech-talk-smartstack-scaling-airbnb)

~~~
gregwebs
As a thought experiment, what do you think of trying to make this completely
de-centralized? Each service has its own synapse, and the services form a
nerve for that service. They communicate status directly to the app servers,
which form their own nerve.

Bootstrapping would of course be a problem, so you would still need a
centralized nervous system for that, but you could survive its failure.

~~~
igor47
I think another project that aims to address this was announced the same day
as SmartStack: [http://www.serfdom.io/](http://www.serfdom.io/)

I haven't read much beyond the splash page, but HashiCorp generally does great
work.

------
hkarthik
Great post. This is interesting due to similar discussions we're having at
work about moving from a monolithic Rails app architecture to an SOA.

I'm curious though, what does the local developer environment look like when
you run an SOA of this complexity? Does everyone need to run a series of
Vagrant VMs/Docker containers to have a fully functional local version of the
application running?

~~~
grosskur
One approach is stubbing out the services you don't need. This article by a
Heroku engineer describes how to do this at the Rack level:

[https://brandur.org/service-stubs](https://brandur.org/service-stubs)
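
The gist of that pattern, very roughly (an illustrative sketch; the "users"
service and the internal hostname are made up):

    require 'json'

    # A tiny Rack app that stands in for a hypothetical internal "users" service.
    UsersStub = lambda do |env|
      case env['PATH_INFO']
      when %r{\A/users/\d+\z}
        [200, { 'Content-Type' => 'application/json' }, [{ id: 1, name: 'stub' }.to_json]]
      else
        [404, { 'Content-Type' => 'application/json' }, [{ error: 'not_found' }.to_json]]
      end
    end

    # In development/tests you can route HTTP calls to the stub instead of the real
    # service, e.g. with WebMock: stub_request(:any, %r{users\.internal}).to_rack(UsersStub)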

Another approach is creating a set of shared services that developers can use
rather than deploying their own instances. This article by a LinkedIn engineer
describes their internal Quick Deploy system:

[http://engineering.linkedin.com/developer-productivity/quick...](http://engineering.linkedin.com/developer-productivity/quick-deploy-distributed-systems-approach-developer-productivity)

Personally, though, I'd go the route of creating a self-contained Vagrant
setup if possible. This helps folks become familiar with the entire system and
fix bugs anywhere in it, rather than drawing strict lines of ownership around
specific services.

~~~
hkarthik
Yeah, we've used stubs in the past; they're often awkward and create silos
within the team. I'm not a huge fan of this approach.

------
ceejayoz
> On AWS, you might be tempted to use ELB, but ELBs are terrible at internal
> load balancing because they only have public IPs.

Yet another reason to run in a VPC, which includes internal-only ELBs as a
very useful feature.

------
druiid
Cool stuff. I think, though, that much of this can be handled in other ways
(although obviously there is never one right way of doing these kinds of
things). This application kit is one way of orchestrating service/server
discovery. Another way, which I have implemented personally, is to use a
combination of mcollective and puppet (with puppet facts enabled). This allows
you to define roles for specific systems, run tasks against servers of a
specific role type, keep track of which servers have that role, connect them
to a 'central' load-balancer, and many other things.

This solves most of the issues that this toolkit addresses, but likely would
not be the best option for everyone. Just some info on at least one other way
to deal with this stuff!
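
For example, the role tagging can be as simple as a custom Facter fact that
mcollective then filters on (a sketch; the /etc/role file and the fact name
are just my conventions, not a standard):

    require 'facter'

    # Expose a "role" fact (e.g. "appserver" or "loadbalancer") read from a file puppet drops.
    Facter.add(:role) do
      setcode do
        File.read('/etc/role').strip if File.exist?('/etc/role')
      end
    end

Then something like `mco puppet runonce -F role=appserver` hits only the app
servers.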

~~~
mLewisLogic
Having a central load balancer is going to turn into a nightmare once you
start managing a reasonable number of servers. Hardware goes bad (especially
in the cloud), and having a single point of failure leaves you at its mercy.

~~~
druiid
A load balancer should never be a single point of failure. You should always
have multiples.

Also, if the response to this is 'but it's still a central point of failure',
this solution hasn't really removed that either: if the ZooKeeper cluster dies,
you lose everything.

Generally, if a clustered load-balancer dies and another takes over, there's
half a second to a couple of seconds of transition, but you're back up and
running with a very simple architecture. If all of your load-balancers die,
you have something much bigger to worry about.

Really, all this does is move the failure mode into a higher-level service
with a much greater potential for failure. ZooKeeper even makes you specify
the size of your cluster, and in my experience it's difficult to update that
live. I've read they're working on it, but still.

Clustered load-balancers (using Pacemaker/Corosync, keepalived or similar) are
very well understood these days. Pacemaker/Corosync can even run within EC2
now: a couple of years ago they added unicast support, obviating the multicast
issues present within EC2.

Additionally, if we want to talk about load, a well-configured haproxy/nginx
load-balancer can handle hundreds of thousands of connections a second. If
your installation needs more than that, I'm certain you could add a layer to
distribute the load-balancing across a set. Obviously another problem to
introduce, but probably not one you'll reach until you have even more traffic
than Airbnb gets.

~~~
vzctl
By the way, keepalived >= 1.2.8 also supports VRRP over unicast.

~~~
druiid
Indeed it does! I use it for less complicated fail-over situations myself, but
if you start to need more complicated topologies (or something which has good
interaction with IPVS/LVS) then Corosync!

------
devonbleak
ELB will actually do internal load balancing in a VPC using your own custom
security groups. Doesn't help if you're not in a VPC, but nowadays the default
is for everything to go in a VPC.
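
For example, with the aws-sdk gem it's just a matter of passing the internal
scheme when creating the ELB (a rough sketch; all names and IDs below are
placeholders):

    require 'aws-sdk'

    elb = Aws::ElasticLoadBalancing::Client.new(region: 'us-east-1')
    elb.create_load_balancer(
      load_balancer_name: 'users-internal',        # placeholder name
      scheme:             'internal',              # internal-only: no public IPs
      subnets:            ['subnet-aaaaaaaa'],     # your VPC subnets
      security_groups:    ['sg-bbbbbbbb'],         # the custom security groups mentioned above
      listeners: [{
        protocol:           'HTTP',
        load_balancer_port: 80,
        instance_protocol:  'HTTP',
        instance_port:      8080
      }]
    )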

~~~
igor47
Unfortunately, Airbnb is a pretty legacy AWS client; when we started using it,
VPC was a much less prominent feature. Migrating to a VPC is a very non-
trivial process, especially if you have RDS instances outside the VPC.

~~~
devonbleak
I'm in the same boat: most of my AWS workload was created before IGW was
available, and I keep having the same discussion about migrating :-)

------
eliwjones
I don't know... sounds like they just shifted the 'single point of failure' to
the ZooKeeper cluster. Is it somehow sexier to have your SPOF be things
running ZooKeeper instead of things running load balancer or DNS software?

"The achilles heel of SmartStack is Zookeeper. Currently, the failure of our
Zookeeper cluster will take out our entire infrastructure. Also, because of
edge cases in the zk library we use, we may not even be handling the failure
of a single ZK node properly at the moment."

~~~
threeseed
If your load balancer/DNS goes down, then no service will be able to find any
other service.

With this approach everything will still work.

~~~
eliwjones
According to the quote I provided from the article, failure of the Zookeeper
cluster will bring down their entire architecture.

------
mattj
I've been hearing good stuff about this for a while - it's awesome to see it
open sourced! Setting up a local haproxy to handle service failover /
discovery is a really clever solution, and is an awesome approach to
encapsulating a bunch of really messy logic.

------
nl
So.. SOA.

Everyone is doing it, but I never see details about how the services are wired
together.

Do people use ESBs, or directly wire up services to each other?

~~~
threeseed
Services are generally using HTTP/REST these days.

~~~
nl
Yes. Most (all?) ESBs support (and indeed encourage) REST.

I'm pretty sure there has never been an ESB that doesn't support HTTP.

My question is whether people are actually using them to wire systems together
(outside corporate environments, where considerations are different).

------
BaconJuice
Can someone explain to me what this really is? I don't follow, and yes, I read
the intro.

~~~
devonbleak
With a service-oriented architecture (SOA) you basically break your
application into distinct services that run across different machines. But you
also need a way to find and connect to those services (service discovery).
This provides a way of doing that using ZooKeeper and local HAProxy processes.
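
In practice, the consuming side just talks to a fixed port on localhost, and
the synapse-managed haproxy behind that port routes to a healthy backend.
Something like this (the port and path are made up):

    require 'net/http'
    require 'uri'

    # Instead of knowing where the "users" service lives, the app hits its local
    # haproxy port; discovery and failover happen underneath.
    res = Net::HTTP.get_response(URI('http://localhost:3001/users/1'))
    puts res.code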

~~~
superbaconman
So I register a new service (or process) to run on metal using Chef. Then Chef
installs the service and updates the nerve config running on that metal
device. Nerve then keeps a central Zookeeper updated with the status of its
local services? Synapse checks for available services in zookeeper that your
app may need to use, and then configures a local load balancer for any
requests you make?

~~~
igor47
That's the idea! If you're using Chef, check out the cookbook for SmartStack;
you keep a small hash of configuration information per service, and we take
care of the rest. [https://github.com/airbnb/smartstack-cookbook](https://github.com/airbnb/smartstack-cookbook)
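
Conceptually, the per-service hash looks something like this (the attribute
names here are only illustrative, not necessarily the cookbook's exact schema;
see the repo for the real format):

    default['smartstack']['services']['users'] = {
      'port'       => 8080,                    # where the service itself listens
      'local_port' => 3001,                    # where synapse's haproxy listens for consumers
      'check'      => { 'uri' => '/health' }   # health-check hint for nerve
    }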

