
Real World Microservices: When Services Stop Playing Well and Start Getting Real - adamnemecek
https://blog.buoyant.io/2016/05/04/real-world-microservices-when-services-stop-playing-well-and-start-getting-real/
======
sakopov
From my own experience building a platform of products on top of
microservices, I find that the most difficult part of this architecture
(and the one that nobody ever talks about) is how to share data between
microservices via a message bus instead of direct requests. If you can get
this down, the rest, in my opinion, is pure bliss and a piece of cake.

~~~
devonkim
Message buses and queues are far, far easier to manage than API endpoints
across a bunch of loci of control, so this may be why it's not discussed as
much as, say, how the lifecycle of REST services works. Topics and queues with
different segmentation features are much more powerful and fine-grained in
control compared to a service that's tied to HTTP transport than, say, a REST
or SOAP API. For object serialization / marshaling, you suddenly have a lot
more options, like Protocol Buffers, Cap'n Proto, Thrift, and Avro. The right
approach comes down to what your bus / queue is (each one has vastly different
best practices and features) as well as how you want to scale out and up. But
perhaps a general set of terminology / jargon is worth pursuing as a community
for the problem.
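
To make the serialization point concrete, here's a minimal sketch of publishing an event onto a bus, using RabbitMQ through the community Go client (github.com/streadway/amqp); the exchange name, routing key, and payload are all made up for illustration:

```go
package main

import (
	"log"

	"github.com/streadway/amqp" // community RabbitMQ client for Go
)

func main() {
	// Connect to a local broker; the URL is illustrative.
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// A topic exchange lets consumers bind with patterns like "orders.*",
	// which is the kind of fine-grained segmentation mentioned above.
	err = ch.ExchangeDeclare("events", "topic", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	// The payload could be Protocol Buffers, Thrift, Avro, etc.;
	// plain JSON here for brevity.
	err = ch.Publish("events", "orders.created", false, false, amqp.Publishing{
		ContentType: "application/json",
		Body:        []byte(`{"order_id": 42}`),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```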

The question I really have is how people have managed to upgrade their message
bus, say RabbitMQ or Kafka, across all their connected services while keeping
availability high and having rollback contingencies. I really don't hear about
that as much as about doing rolling restarts / upgrades of nodes for a service
endpoint using feedback from monitoring and metrics.

~~~
rando289
> Topics and queues with different segmentation features are much more
> powerful and fine-grained in control compared to a service that's tied to
> HTTP transport than, say, a REST or SOAP API.

"than" and beyond is an illogical statement. Like Y is Z than X.

~~~
devonkim
Funky parse as I wrote it out, but you're correct. Despite the problem, I
think the point being made is understandable. You can replace "than" with
"like."

------
ninkendo
Interesting article. It was initially confusing to me because I wasn't able to
find what's unique about this approach, and how it's different from the
typical Ambassador pattern that I'm more familiar with. That is, what does the
word "Routing" even mean in this context?

I think the answer is that he's doing essentially layer-7 routing, using the
HTTP path (and verb?) to decide what backends to route to, rather than doing
it on a per-port basis (which is necessary to support non-HTTP services.)

Implementations I've run into before seem to fall into a few categories:

- A service declares that it needs to talk to, say, 3 other services. Each of
these upstream services is assigned a unique client-side port, and an
ambassador proxy is launched alongside each instance, which exposes those 3
ports, routing each to the corresponding 3 backend services. To keep the ports
out of the source code, environment variables are typically assigned
automatically, so you just talk to, e.g., "my-ambassador:${UPSTREAM_PORT_NAME}".

- A service is responsible for using a service discovery layer like ZooKeeper
or etcd to find backends on its own. This is important for things that use
raft/paxos/gossip, where blind routing isn't enough; you actually need to keep
track of peer instances (although in that case, the service discovery layer is
only used for initial discovery).

- Service discovery is done with just plain DNS on well-known ports (or even
SRV records, if you're lucky enough to have client software that can tolerate
them), and you just hope the right thing happens. This can be done with things
like SkyDNS on top of etcd; a surprising amount of flexibility can be had by
putting logical names right in a hostname. (A minimal SRV lookup sketch
follows this list.)
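
As a small illustration of that last approach, this is what an SRV lookup looks like with Go's standard resolver; the service and domain names here are invented:

```go
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// Queries _users._tcp.services.local (the standard RFC 2782 query
	// shape); the names are made up for the example.
	_, addrs, err := net.LookupSRV("users", "tcp", "services.local")
	if err != nil {
		log.Fatal(err)
	}

	// Each SRV record carries a target host *and* port, so you don't need
	// well-known ports; priority/weight allow crude load balancing.
	for _, srv := range addrs {
		fmt.Printf("%s:%d (priority %d, weight %d)\n",
			srv.Target, srv.Port, srv.Priority, srv.Weight)
	}
}
```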

This approach seems like a more opinionated version of the first approach, but
instead of ports it uses HTTP routes, which is definitely more flexible but
only works with HTTP. The routes can contain enough information to route more
intelligently than just enumerating your dependency services ahead of time and
getting static port assignments.
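
As a toy illustration of the layer-7 idea (not how linkerd actually implements it), a proxy that picks a backend from the first HTTP path segment could look like this in Go; the service names and ports are invented:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// backends maps the first path segment to an upstream; entries are invented.
var backends = map[string]*url.URL{
	"users":  {Scheme: "http", Host: "users-svc:8080"},
	"orders": {Scheme: "http", Host: "orders-svc:8080"},
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Route on the first path segment, e.g. /users/123 -> users-svc.
		seg := strings.SplitN(strings.TrimPrefix(r.URL.Path, "/"), "/", 2)[0]
		target, ok := backends[seg]
		if !ok {
			http.NotFound(w, r)
			return
		}
		httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":4140", nil))
}
```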

It's almost like an API gateway distributed as an ambassador to each service.
At least, it looks a lot _like_ an API gateway, in that it presents the whole
mesh of services underneath in a single URI namespace, using HTTP to route
requests accordingly and handling things like TLS and potentially
authentication. It's definitely an interesting approach, and something I'm
curious to try.

~~~
lobster_johnson
You'll want to read the whole pitch about Linkerd [1], which the article seems
to assume the reader has read. In short, it's a sidecar proxy that performs
all the functions needed to glue apps together. This allows apps to be
simpler; they can implement a minimal interface and rely on simple protocols
such as HTTP, with no knowledge of how to reach their remote partners. Linkerd
implements a bunch of techniques such as health checks, load balancing and
circuit breakers.

[1] [https://linkerd.io](https://linkerd.io)
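
For anyone unfamiliar with the circuit-breaker part, here's a deliberately minimal sketch of the idea in Go; real implementations (Linkerd's included) are far more nuanced, and everything below is illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrOpen = errors.New("circuit open")

// Breaker is a toy circuit breaker: after maxFails consecutive failures
// it "opens" and rejects calls until cooldown has passed.
type Breaker struct {
	mu       sync.Mutex
	fails    int
	maxFails int
	openedAt time.Time
	cooldown time.Duration
}

func (b *Breaker) Call(f func() error) error {
	b.mu.Lock()
	if b.fails >= b.maxFails && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast instead of hammering a sick backend
	}
	b.mu.Unlock()

	err := f()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.fails++
		if b.fails >= b.maxFails {
			b.openedAt = time.Now() // (re)open the circuit
		}
		return err
	}
	b.fails = 0 // a success closes the circuit again
	return nil
}

func main() {
	b := &Breaker{maxFails: 1, cooldown: time.Minute}
	fmt.Println(b.Call(func() error { return errors.New("backend down") })) // trips it
	fmt.Println(b.Call(func() error { return nil }))                        // rejected: circuit open
}
```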

~~~
ninkendo
Well, linkerd was what the article was about, so it's what I was referring to.
A "sidecar" sounds like the more established term for what I was calling an
ambassador proxy, so it's nice to learn that that's what people call this
pattern.

Your explanation sounds a bit "no duh" though, since I can't imagine any way
of implementing this pattern that _doesn't_ abstract away knowledge of how to
reach remote partners, perform health checks, load balancing, retry logic,
etc. It's generally what the term "proxy" implies. Or am I missing something?

Edit: I think my confusion is caused by my experience with running vanilla
Mesos (not Marathon or DCOS, but our own custom frameworks), where it's
expected that you just have to implement this stuff yourself. I never even
thought of open sourcing or productizing what I've written, because I kind of
thought that everybody else must be rolling their own quick solutions for
these problems too? I admit I'm probably the one with a messed up worldview
though.

I think the mesos community is still in the phase of picking the winners for
best practices, and hopefully one day it'll be obvious that you use services
like linkerd and don't roll your own solution.

~~~
lobster_johnson
Apologies, I didn't mean to sound condescending. I think you got it.

I just wanted to point you to the official explanation, since you seemed to be
asking for clarification as to Linkerd's purpose.

A proxy can be a lot of things, of course. Linkerd is explicitly designed to
route RPC, not just any web traffic, and to run close to your app, offloading
all the routing intelligence.

For example, it seems quite common these days to build many types of service
discovery and so on into the app itself: App wants to find another
microservice, so it looks up the target host in Etcd or ZooKeeper or whatever,
then talks to it, handling retrying and load-balancing and so on. If you're
using a DNS solution like SkyDNS or Consul, then the app is isolated from the
lookup mechanism, but you're still talking directly to your peer. The
opposite, older trend is to use something like HAProxy to handle the routing,
but HAProxy was designed for a fairly static set of routing targets.

Linkerd is a bit of a middle way: Make the app stupid and put all the
operational intelligence in an external process that isn't actually your app,
but still sort of behaves like it is. Linkerd is designed to be dynamically
configured to support all the kinds of glue you can think of (ZooKeeper,
Kubernetes and so on) so that no app changes are needed to support different
routing schemes.

~~~
ninkendo
Yup, makes total sense.

I think I'm just jealous that they're getting attention for solving something
that I solved too, but didn't even think to release it because I thought
everybody else was just like me.

My approach involves running a sidecar that registers a set of etcd watchers;
when upstream services move around, it passes the list of backends through a
config file template (using Go's templating syntax) and runs a configurable
command.

Meanwhile, other services' sidecars are health checking them, keeping them in
etcd only as long as they are healthy.

We wire that up so that the watcher process in the downstream sidecar rewrites
an haproxy config with the list of upstream backends and triggers an haproxy
restart when the config file changes.
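
Roughly, that loop has the shape of the sketch below; `watchBackends` is a stand-in for the etcd watcher, and the template, file path, and reload command are all illustrative:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"text/template"
)

// A minimal stand-in for the kind of haproxy config template described.
var haproxyTmpl = template.Must(template.New("haproxy").Parse(`
backend upstream
{{range .}}    server {{.Name}} {{.Addr}} check
{{end}}`))

type Backend struct {
	Name string
	Addr string // host:port
}

// watchBackends is hypothetical: in the real setup it's an etcd watcher
// that emits the current healthy backend set on every change.
func watchBackends() <-chan []Backend {
	ch := make(chan []Backend, 1)
	ch <- []Backend{{"web-1", "10.0.0.5:8080"}, {"web-2", "10.0.0.6:8080"}}
	close(ch)
	return ch
}

func main() {
	for backends := range watchBackends() {
		// Rewrite the config from the template...
		f, err := os.Create("/etc/haproxy/haproxy.cfg")
		if err != nil {
			log.Fatal(err)
		}
		if err := haproxyTmpl.Execute(f, backends); err != nil {
			log.Fatal(err)
		}
		f.Close()

		// ...then run a configurable reload command (illustrative).
		if out, err := exec.Command("systemctl", "reload", "haproxy").CombinedOutput(); err != nil {
			log.Printf("reload failed: %v: %s", err, out)
		}
	}
}
```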

And then, that whole thing is wrapped in a declarative syntax that lets you
say "here's the things I want to talk to", and it knows how to find them in
etcd, how to construct the haproxy template, and how to restart haproxy, and
puts the whole thing in a docker container that links to your container.

And then we wrap all _that_ in a web UI (and CLI) that lets you say "I want my
service to talk to that service" and all the things happen for you.

Looking back, it shouldn't surprise me that this level of effort is something
people would want an off-the-shelf solution for. But to be fair, I started
doing this a few years back, when things like DCOS and Marathon and confd and
linkerd and namerd didn't exist. :-D

------
qaq
At times it seems the only thing we are doing with microservices is shifting
complexity into different areas.

~~~
chuhnk
Microservices is about tradeoffs. Many people will reiterate this over the
coming years and attempt to make it clear it's not the be-all and end-all of
software engineering. Before it had this name it was known by many others. It
is, in fact, simply distributed systems.

The logical evolution of one's architecture when scaling from zero to orders
of magnitude beyond is to eventually split out functionality so it can be
worked on and scaled independently. That's it; that's all it's about. And over
time you find this becomes the pattern that helps stabilise and speed up an
organisation as the number of people increases.

Most of the time it's an organic process. I would never tell anyone to start
with microservices but to merely keep these ideas in the back of their mind
and pick tools that simplify the process of moving to distributed systems
later.

~~~
kpil
True. No silver bullet this time either...

But right now it's a cool thing, since Google is doing it. And Netflix, or
whatever. Organisations are about to waste crazy amounts of money on the
microservification of "monolithic" applications that are perfectly fine,
because they think it will solve all of their problems.

Microservices don't solve complexity; they add complexity in order to solve
problems related to scalability and size, and you need to be _good_ to pull it
off. If you are working in a half-arsed in-house development department in a
medium-sized company in an uninteresting business, chances are that the
organisation (as a whole) is not good enough even if there are some smart
people around, and you should be grateful that the RDBMS is there and rolls
back problems caused by the shitty code that keeps piling up in your git
repository.

I started working with distributed systems about 20 years ago in the telecom
industry, where we had to scale telephony services over a lot of machines,
both vertically and horizontally. We had key-value datastores, services,
application logic, and interfaces as small programs, each running on its own
machine. And it was bloody hard to handle transactions, routing, failovers,
and generally to get the right balance between calling a service and doing it
locally. We did web applications like that too, actually, using CGI, a custom
HTML template format, and the crappiest scripting language ever imaginable,
and it kind of worked too.

It was rather ahead of its time, thanks largely to a few smart guys who were
trying to solve the rather hard problem of using cheap hardware to provide
very reliable telecom services. It never was "very" reliable, but it was OK.
At least on sunny days.

It scaled up to a point. The networks weren't that fast, and the sheer
complexity of it all was a limiting factor. We had to create tools for
generating the configuration, and at one point I found out that we had more
than 100,000 lines of configuration in a moderately large system, which
explained why it took a while to assemble these systems by hand.

But as time went on, the hardware was catching up and you could more or less
put everything in one box. And we did. And we started to use RDBMSs because we
really needed the power of a flexible, transactional database manager, and it
was bliss to write a large monolithic application for user provisioning and
administration without having to handle every shitty little detail yourself.

I am very reluctant to go back to key-value stores and really small services
unless we really need the speed. It comes with a heavy cost. (Even though you
get more for free now, as more people are doing it.)

The best trade-off, in my opinion, is to build systems as large as possible
while they can still handle the load and can still be worked on efficiently by
a few teams. When it's hard to keep up with what is happening and how
everything works, it might be better to split the system into smaller systems
and (semi-)independent teams. The key is the independent team, real devops,
and a hard focus on making deployments fast, with no or almost no downtime.

Unfortunately, where I work now, the major limiting factor on our velocity is
our dependencies on other teams and systems. If we can build something on our
own, we can typically do a reasonable feature in 3 days to 3 weeks. If we
depend on changes by another team in other systems, it takes a minimum of 3
weeks.

And the problem grows exponentially. If 3 systems are involved, it will take
us 3 months or more, and failure is always an option.

I don't see how more services will solve that rather hard organisational
problem, but that is apparently what some people think will happen,
effortlessly, just because technology.

------
markbnj
Love the use of the header for per-request overrides to the routing table.
I've rolled my own solutions using haproxy as a k8s service a couple of times
now. Really looking to move to something more packaged that handles some of
these use cases. Will be giving linkerd and namerd a try.
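
If memory serves, linkerd picks up per-request overrides from the l5d-dtab header, so exercising it from a client is just a matter of setting a header; the service names and the staging rule below are made up:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Send a request through a local linkerd instance (4140 is its
	// conventional default HTTP port).
	req, err := http.NewRequest("GET", "http://localhost:4140/users/42", nil)
	if err != nil {
		log.Fatal(err)
	}

	// The l5d-dtab header carries a delegation-table override for this
	// request only; the rule (route "users" traffic to a staging
	// instance) is illustrative.
	req.Header.Set("l5d-dtab", "/svc/users => /svc/users-staging")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```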

------
jeevand
I am new to microservices. Any good place to learn about major patterns /
anti-patterns?

~~~
mac01021
This is just a 40 minute lecture, but you might find the ideas inside helpful.

It talks about anti-patterns that lead to the construction of a "distributed
monolith" and how to avoid them.

[http://www.microservices.com/ben-christensen-do-not-build-a-distributed-monolith](http://www.microservices.com/ben-christensen-do-not-build-a-distributed-monolith)

------
zaroth
> When Twitter moved to microservices, it had to expend hundreds (thousands?)
> of staff-years just to reclaim operability.

That's both impressive and not surprising. Orchestration is just really hard.
Big systems that are supposed to behave rationally and be easy to upgrade and
debug... it seems like such a meager request at first.

------
sheeshkebab
Interesting project. It would be great if something more out of the box were
available in AWS, or a project that could package AWS services into a similar
pattern (Route53 / ELB etc.).

It's cool to run your own proxy; however, doing that at a production level is
far from simple, at least with smaller teams.

------
raarts
I don't understand. You either deploy this as a sidecar proxy, in which case
you need to reconfigure a lot of them, or you deploy it centrally, introducing
a SPOF?

