Ask HN: How do you keep track of releases/deployments of dozens micro-services? - vladholubiev
======
dhinus
Our apps are made up of 5-15 (micro)services. I'm not sure if this approach
would scale to hundreds of services managed by different teams.

We store the source code for all services in subfolders of the same monorepo
(one repo <-> one app). Whenever a change in any service is merged to master,
the CI rebuilds _all_ the services and pushes new Docker images to our Docker
registry. Thanks to Docker layers, if the source code for a service hasn't
changed, the build for that service is super-quick, it just adds a new Docker
tag to the _existing_ Docker image.

Then we use the Git commit hash to deploy _all_ services to the desired
environment. Again, thanks to Docker layers, containers that haven't changed
from the previous tag are recreated instantly because they are cached.

From the CI you can check the latest commit hash that was deployed to any
environment, and you can use that commit hash to reproduce that environment
locally.
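
Roughly, the build step looks something like this sketch (simplified; the
registry name, service list and paths are placeholders, not our real config):

    # build_all.py -- CI step sketch: build and tag every service with the commit hash
    import subprocess

    REGISTRY = "registry.example.com/myapp"   # placeholder registry/namespace
    SERVICES = ["service_a", "service_b"]     # one subfolder per service in the monorepo

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    def main():
        commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                                capture_output=True, text=True, check=True).stdout.strip()
        for svc in SERVICES:
            image = f"{REGISTRY}/{svc}:{commit}"
            # If nothing under ./{svc} changed, this is served from the layer cache
            # and effectively just produces a new tag for the existing image.
            sh("docker", "build", "-t", image, svc)
            sh("docker", "push", image)

    if __name__ == "__main__":
        main()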

Things that I like:

\- the Git commit hash is the single thing you need to know to describe a
deployment, and it maps nicely to the state of the codebase at that Git
commit.

Things that do not always work:

\- if you don't write the Dockerfile in the right way, you end up rebuilding
services that haven't changed --> build time increases

\- containers for services that haven't changed get stopped and recreated -->
short unnecessary downtime, unless you do blue-green

~~~
iamtew
> Whenever a change in any service is merged to master, the CI rebuilds _all_
> the services and pushes new Docker images to our Docker registry.

Why are you rebuilding _all_ the services, wouldn't it make sense to just
rebuild the ones that have changes? You're now rebuilding perfectly working
services without any new changes just because some other service changed, or
am I misunderstanding something here?

~~~
dhinus
Because we want to make sure that in the Docker registry we have _all_
services tagged with the latest commit.

For example you might have a Git history like this:

* 89abcde Fix bug in service_b

* 1234567 Initial commit including service_a and service_b

When 89abcde is pushed, the CI rebuilds both service_a and service_b, so we can
simply "deploy 89abcde". You always have a single hash for all services, which
is also, conveniently, the hash of the corresponding Git commit.

The trick to avoid rebuilding perfectly working services is to use Docker
layer caching so that when you build service_a (that hasn't changed) Docker
skips all steps and simply adds the new tag to the _existing_ Docker image.
The second build for service_a should take about 1 second.

In our Docker registry we end up with:

service_a:1234567

service_a:89abcde

service_b:1234567

service_b:89abcde

But the two service_a Docker images are _the same image_, with two different
tags.
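
If you want to double-check that every service has an image for a given commit
before deploying, a rough sketch against the Docker Registry v2 API
(hypothetical registry URL, auth omitted for brevity):

    # check_tags.py -- sketch: verify every service is tagged with the commit hash
    import sys
    import requests

    REGISTRY = "https://registry.example.com"      # placeholder
    SERVICES = ["service_a", "service_b"]

    def has_tag(service, tag):
        # Docker Registry HTTP API v2: list tags for a repository
        r = requests.get(f"{REGISTRY}/v2/{service}/tags/list")
        r.raise_for_status()
        return tag in (r.json().get("tags") or [])

    if __name__ == "__main__":
        commit = sys.argv[1]                       # e.g. "89abcde"
        missing = [s for s in SERVICES if not has_tag(s, commit)]
        if missing:
            sys.exit(f"missing images for {commit}: {', '.join(missing)}")
        print(f"all services tagged with {commit}, safe to deploy")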

~~~
segmondy
Why? Microservices are supposed to be truly independent.

~~~
dhinus
For ease of deployment and to solve the problem of "what version of service_b
is compatible with version x of service_a"?

IMHO this makes sense if the microservices are developed by the same team. If
we're talking about services developed and managed by different teams... maybe
it's not a good idea.

------
alex_duf
At the Guardian we use [https://github.com/guardian/riff-raff](https://github.com/guardian/riff-raff)

It takes a build from your build system (typically TeamCity, but not
exclusively), deploys it, and records the deployment.

You can then check later what's currently deployed, or what was deployed at
some point in time in order to match it with logs etc.

Not sure how useable it would be outside of our company though.

------
joshribakoff
My experience with micro-services is with code-bases that have prematurely
adopted the pattern. Based on this, my advice is as follows...

You can deploy the whole platform and/or refactor to a monolith, and maintain
one change log which is simple.

That however has its own downsides, so you should find a balance. If you're
having trouble keeping track, perhaps re-organize. I read in an HN article that
Amazon had 7k employees before they adopted microservices. The benefits have to
outweigh the costs. Sometimes the solution to the problem is taking a step
back. Without more details it's hard to say.

So basically one option is to refactor [to a monolith] and re-evaluate the
split such that you no longer have this problem. Just throw each repo in a
sub-folder, make that your new mono-repo, and go from there. It's worth an
exploratory refactoring, but it's not a silver bullet.

~~~
zip1234
"Amazon had 7k employees before they adopted microservices"

Sounds like the services were no longer 'micro' :)

~~~
randallsquared
"micro" doesn't refer to level of usage, but level of domain responsibility.

------
vcool07
There's something called 'integration testing' that has to be done before the
final build, which clearly flags any compatibility issues between components.

Every component comes with a major/minor release number, which indicates the
nature of the change that has gone in. For example, the major release is
incremented for a change that introduces a new feature/interface. Minor release
numbers are reserved for bug fixes/optimizations that are more internal to the
component.
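
As a rough illustration of that convention (just a sketch of my reading of it,
not any standard library):

    # version.py -- sketch of the major/minor convention described above
    def parse(release_no):
        """'2.3' -> (2, 3)"""
        major, minor = release_no.split(".")
        return int(major), int(minor)

    def compatible(a, b):
        # Same major release => same features/interfaces; only internal
        # bug fixes/optimizations differ, so the components should be compatible.
        return parse(a)[0] == parse(b)[0]

    assert compatible("2.0", "2.5")      # minor bump: bug fixes only
    assert not compatible("2.5", "3.0")  # major bump: new feature/interface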

The build manager can go through the list of all the delivered fixes and
cherry-pick the few that can go into the final build.

------
bootcat
In the company I worked for, they had their own CI/CD system which tracked
information about each service and the systems it had to deploy to. Once it was
all configured, it was basically button pushes. The system also tracked
feedback after deployment to confirm whether the build went well or needed to
be fixed; if certain parameters were unhealthy, it basically did a rollback!
There were also canary deployments, so code was deployed to only a portion of
systems first, to make sure it was pushed correctly and worked. If not, it was
rolled back!
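
In spirit, the feedback/rollback loop was something like this sketch (not the
actual code; the metrics endpoint and thresholds here are hypothetical):

    # post_deploy_check.py -- sketch: watch key metrics after a deploy, roll back if unhealthy
    import time
    import requests

    METRICS_URL = "http://metrics.internal/api/service_a"   # hypothetical
    ERROR_RATE_THRESHOLD = 0.05                              # hypothetical

    def healthy():
        m = requests.get(METRICS_URL).json()
        return m["error_rate"] < ERROR_RATE_THRESHOLD and m["p99_latency_ms"] < 500

    def watch_and_rollback(rollback, checks=10, interval=30):
        for _ in range(checks):
            time.sleep(interval)
            if not healthy():
                rollback()          # e.g. redeploy the previous known-good version
                return False
        return True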

------
wballard
We’ve been using our own setup for 4 years now.
[https://github.com/wballard/starphleet](https://github.com/wballard/starphleet)

We have 200 services, counting beta and live test variants. Most of the
difficulties vanished once we had declarative versioned control of our service
config in the ‘headquarters’ repository.

Not aware of anyone else using this approach.

------
whistlerbrk
In the past I've used a single repo with all the code which gets pushed
everywhere, and each service only runs its portion of the code. No guesswork
involved, but this may not work for a lot of setups, of course. That, and your
graceful restart logic has to be slightly more involved.
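
A minimal sketch of the idea (SERVICE_NAME is a hypothetical env var; the real
dispatch is obviously more involved):

    # main.py -- sketch: one deployable codebase, each instance runs only its service
    import os

    def run_api():
        ...  # start the API server

    def run_worker():
        ...  # start the background worker

    SERVICES = {"api": run_api, "worker": run_worker}

    if __name__ == "__main__":
        # The same code is pushed everywhere; the env var picks the role.
        SERVICES[os.environ["SERVICE_NAME"]]()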

------
twic
At an old company, we wrote this, er "model driven orchestration framework for
continuous deployment":

[https://github.com/tim-group/orc](https://github.com/tim-group/orc)

Basically, there's a Git repo with files in that specify the desired versions
and states of your apps in each environment (the "configuration management
database").

The tool has a loop which converges an environment on what is written in the
file. It thinks of an app instance as being on a particular version (old or
new), started or stopped (up or down), and in or out of the load balancer
pool, and knows which transitions are allowed, eg:

    
    
      (old, up, in) -> (old, up, out) - ok
      (old, up, out) -> (old, up, in) - no! don't put the old version in the pool!
    
      (old, up, out) -> (old, down, out) - ok
      (old, up, in) -> (old, down, in) - no! don't kill an app that's in the pool!
    
      (old, down, out) -> (new, down, out) - ok
      (old, up, out) -> (new, up, out) - no! don't upgrade an app while it's running!
    

Based on those rules, it plans a series of transitions from the current state
to the desired state. You can model state space as a cube, where the three
axes of space correspond to the three aspects of the state, vertices are
states, and edges are transitions, some allowed, some not. Planning the
transitions is then route-finding across the cube. When I realised this, I made
a little origami cube to illustrate it, and started waving it at everyone. My
colleagues thought I'd gone mad.

You need one non-cubic rule: there must be at least one instance in the load
balancer at any time. In practice, you can just run the loop against each
instance serially, so that you only ever bring down one at a time.
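
A minimal sketch of that planning idea (not orc's actual code): states are
(version, run, pool) triples, the rules above define the allowed edges, and a
plan is a shortest path across the cube:

    # plan.py -- sketch of route-finding across the (version, run, pool) cube
    from collections import deque

    def allowed(a, b):
        (v1, r1, p1), (v2, r2, p2) = a, b
        if sum(x != y for x, y in zip(a, b)) != 1:
            return False            # move along one axis at a time
        if p2 == "in" and (v2 == "old" or r2 == "down"):
            return False            # don't put an old or stopped instance in the pool
        if r2 == "down" and p1 == "in":
            return False            # don't kill an app that's in the pool
        if v1 != v2 and r1 == "up":
            return False            # don't upgrade an app while it's running
        return True

    def plan(start, goal):
        states = [(v, r, p) for v in ("old", "new")
                            for r in ("up", "down")
                            for p in ("in", "out")]
        prev, queue = {start: None}, deque([start])
        while queue:
            s = queue.popleft()
            if s == goal:
                path = []
                while s is not None:
                    path.append(s)
                    s = prev[s]
                return path[::-1]
            for n in states:
                if n not in prev and allowed(s, n):
                    prev[n] = s
                    queue.append(n)
        return None

    # Upgrading one instance: out of the pool, stop, swap the version, start, back in.
    print(plan(("old", "up", "in"), ("new", "up", "in")))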

This process is safe, because if the tool dies, it can just start the loop
again, look at the current state, and plan again. It's also safe to run at any
time - if the environment is in the desired state, it's a no-op, and if it
isn't, it gets repaired.

To upgrade an environment, you just change what's in the file, and run the
loop.

------
perlgeek
We have separate repos for each service, and use
[https://gocd.org/](https://gocd.org/) to build, test and deploy each
separately. But, you could also configure it to only trigger builds from
changes in certain directories. There is a single pipeline template from which
all pipelines are instantiated.

Independent deployments are one of the key advantages of microservices. If you
don't use that feature, why use microservices at all? Just for scalability? Or
because it was the default choice?

------
underyx
We wrote [https://github.com/kiwicom/crane](https://github.com/kiwicom/crane)
which posts and updates a nicely formatted Slack message with the status of
releases. It also posts release events to Datadog (in a version we're
publishing soon) and to an API that records them in a Postgres DB we keep for
analytics queries.
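
The Slack part boils down to something like this sketch (not crane's actual
code; the webhook URL here is a hypothetical placeholder):

    # notify.py -- sketch: post a release notification to a Slack incoming webhook
    import requests

    WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"   # hypothetical

    def notify_release(service, version, environment, status):
        text = f"*{service}* {version} -> {environment}: {status}"
        requests.post(WEBHOOK_URL, json={"text": text}).raise_for_status()

    notify_release("service_a", "89abcde", "production", "deployed")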

------
drdrey
[https://www.spinnaker.io/](https://www.spinnaker.io/)

Full disclosure: I'm on the Spinnaker team

------
mickeyben
What do you mean by keep track? Do you want to be aware of deployments?

A Slack notification could do it. Or do you want to correlate deployments with
other metrics?

In this case we instrument our deployments into our monitoring stack
(influxdb/grafana) and use this as annotations for the rest of our monitoring.

We can also graph the number of releases per project on different aggregates.
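
The instrumentation is essentially a write like this sketch (InfluxDB 1.x line
protocol over HTTP; the hostname and tags are placeholders, not our actual
setup):

    # annotate.py -- sketch: record a deployment as a point in InfluxDB (1.x line protocol)
    import time
    import requests

    INFLUX_URL = "http://influxdb.internal:8086/write?db=deployments"   # hypothetical

    def record_deployment(project, environment, version):
        line = (f"deployment,project={project},environment={environment} "
                f"version=\"{version}\" {time.time_ns()}")
        requests.post(INFLUX_URL, data=line).raise_for_status()

    record_deployment("service_a", "production", "89abcde")
    # Grafana can then use the "deployment" measurement as annotations on dashboards.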

~~~
tfjaeckel
I think Slack notifications are really nice to see what's going on right now,
but not so great for seeing the state of dozens of services, i.e. what version
is deployed to what environment.

Then there is the issue of linking the Git release/tag with the corresponding
changes, say from a ticketing system such as Jira. That can be helpful to
communicate changes to other people within the organization and to users.

How do you define dependencies for releasing new versions of services? That's
likely to come up at some point when you have non-trivial changes to services.

~~~
mickeyben
> I think Slack notifications are really nice to see what's going on right now,
> but not so great for seeing the state of dozens of services

Completely agree, that's why we instrument our releases so we can easily see
what's deployed by service and environment.

> Then there is the issue of linking the Git release/tag with the
> corresponding changes, say from a ticketing system such as Jira. That can be
> helpful to communicate changes to other people within the organization and
> to users.

Each commit is related to a ticket, which helps generate a changelog. We
enforce a lot of things in each of our releases. We have an internal release
tool heavily inspired by Shipit from Shopify. We have the concept of soft/hard
checkers to make sure a release won't break things, or that you're aware of
what could break with the current diff.

> How do you define dependencies for releasing new versions of services? That's
> likely to come up at some point when you have non-trivial changes to services.

As I said, we instrument our releases and can easily track how changes affect
our performance/bugs.

We also try hard not to release non-trivial changes in one big release, by
doing things like releasing part of the changes behind a feature flipper first,
or routing only a part of the traffic to the new code path, ...

Then again, we don't have dozens of different services deployed, and we're
still a relatively small team (~20), so I'm pretty sure I don't have the full
picture just yet :)

~~~
tfjaeckel
Thanks for adding more color to your original answer.

I like that you enforce the commit/ticket relationship. Is this purely an
agreed process, or do you use other measures to keep things consistent? E.g. we
typically add the ticket ref to each commit, but at times that gets omitted.

Also, I think that kind of (internal) release tool is something crucial as the
team grows. Will check out Shipit a bit further.

Would you mind expanding a bit on the things you enforce for each of your
releases?

~~~
mickeyben
Sure my pleasure.

> I like that you enforce the commit/ticket relationship. Is this purely an
> agreed process, or do you use other measures to keep things consistent? E.g.
> we typically add the ticket ref to each commit, but at times that gets omitted.

We're not enforcing it, but we might in the future if the team grows and this
gets out of hand. At the moment we're just reminding people that they should,
and it works great so far.

> Would you mind expanding a bit on the things you enforce for each of your
> releases?

It's still early, but so far we check the following (a rough code sketch of a
couple of these checks follows the list):

\- it's not Friday afternoon; we want to avoid, as much as possible, having
issues over the weekend

\- it's not outside office hours - we're still all in the same time zone

\- there's no lock (we can lock releases in case something goes wrong)

\- there's no schema migration. If there is, we remind you how to safely
migrate the schema and who to ping if you have a doubt (usually it should have
been caught in PR review)

\- there's someone from the ops/core team around (connected on Slack)

\- there are no missing translations for our main languages (French/English)

\- plus we do a few sanity checks, like that our master staging is healthy
(a release means promoting our master staging)
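
A rough sketch of what a couple of those checks might look like in code
(hypothetical helpers; not our actual tool):

    # checks.py -- sketch: a few soft/hard pre-release checks
    import datetime
    import os

    def not_friday_afternoon(now=None):
        now = now or datetime.datetime.now()
        return not (now.weekday() == 4 and now.hour >= 12)    # Friday, after noon

    def office_hours(now=None):
        now = now or datetime.datetime.now()
        return 9 <= now.hour < 18                             # single time zone assumed

    def release_not_locked(lock_file="/tmp/release.lock"):    # hypothetical lock mechanism
        return not os.path.exists(lock_file)

    HARD_CHECKS = [release_not_locked]                  # these block the release
    SOFT_CHECKS = [not_friday_afternoon, office_hours]  # these warn, but can be overridden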

edit: also, I forgot to mention, this is the Shipit I'm talking about:
[https://github.com/Shopify/shipit-engine](https://github.com/Shopify/shipit-engine)

~~~
tfjaeckel
Thank you!

------
geocar
Service discovery contains all the versions and who should be directed at
what.

We also store stats in the service discovery app so versions can be promoted
to "production" for a customer once the account management team has reviewed
and updated their internal training.

------
_drFaust
We've got 80+ services. One repo per service; each service has its own
Kubernetes YAML that details the service's deploys to the cluster. K8s has a
huge ecosystem for monitoring, versioning, health, autoscaling and discovery.
On top of that, each repo has a separate Slack channel that receives
notifications for repo changes, comments, deployments, container builds,
Datadog monitoring events, etc. There are also core maintainers per repo to
maintain consistency.

For anyone that has begun the microservice journey, Kubernetes can be
intimidating but well worth it. Our original microservice infrastructure was
rolled out way before k8s, and it's just night and day to work with now; the
Kubernetes team has thought of just about every edge case.

------
discordianfish
Keep track as in having a version-controlled state of all deployed
revisions/versions? That's something I would be interested in solutions to as
well, especially in a Kubernetes environment with CI.

I could probably snapshot the Kubernetes state to have a trail I can use to
roll back to a point in time. Alternatively, I've thought about having CI
update manifests in an integration repo and deploying from there, so that every
change to the cluster is reflected by a commit in this repository.
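
For the snapshot idea, something like this rough sketch could work (assumes
kubectl access and a "cluster-state" repo checked out locally; untested):

    # snapshot.py -- sketch: commit the current Kubernetes deployment manifests to a Git repo
    import subprocess

    def sh(*cmd, **kw):
        return subprocess.run(cmd, check=True, **kw)

    state = sh("kubectl", "get", "deployments", "--all-namespaces", "-o", "yaml",
               capture_output=True, text=True).stdout

    with open("cluster-state/deployments.yaml", "w") as f:   # "cluster-state" repo checked out locally
        f.write(state)

    sh("git", "add", "deployments.yaml", cwd="cluster-state")
    sh("git", "commit", "-m", "snapshot cluster state", cwd="cluster-state")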

------
chillydawg
We built a small internal service that receives updates from the build &
deployment scripts we run, and presents us with an HTML page that shows what
branch & commit of everything is deployed (along with the branch and commit of
every dependency), where, when and by whom. It's totally insecure so it can be
trivially spoofed, but it's our V1 for our fleet of Golang services and it
works well.

------
ukoki
Have a CI/CD pipeline that does the following:

\- unit tests each service

\- all services fan in to a job that builds a giant tar file of source/code
artefacts. This includes a metadata file that lists service versions or commit
hashes (see the sketch after this list)

\- this "candidate release" is deployed to a staging environment for automated
system/acceptance testing

\- it is then optionally deployed to prod once the acceptance tests have
passed
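
A minimal sketch of the metadata/bundling step (the paths, service list and
manifest format are assumptions, not the exact setup):

    # bundle.py -- sketch: write a manifest of service versions and tar up the candidate release
    import json
    import subprocess
    import tarfile

    SERVICES = ["service_a", "service_b"]                     # placeholder

    def commit_of(path):
        return subprocess.run(["git", "rev-parse", "HEAD"], cwd=path,
                              capture_output=True, text=True, check=True).stdout.strip()

    manifest = {svc: commit_of(svc) for svc in SERVICES}
    with open("manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)

    with tarfile.open("candidate-release.tar.gz", "w:gz") as tar:
        tar.add("manifest.json")
        for svc in SERVICES:
            tar.add(svc)          # the built artefacts for each service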

------
char_pointer
[https://github.com/ankyra/escape](https://github.com/ankyra/escape)
(disclaimer: I'm one of the authors)

We use Escape to version and deploy our microservices across environments and
even relate it to the underlying infrastructure code so we can deploy our
whole platform as a single unit if needs be.

~~~
char_pointer
Just to add: I've worked on pipelines like this for dozens of clients and I'd
be happy to talk more in-depth about your options, as business requirements do
tend to influence your delivery pipeline a lot. Email is in my profile if
you're interested.

------
nhumrich
We use GitLab CI for pipelines, which is great. You can figure out when
everything was last deployed, etc. We even built our own dashboard using the
GitLab API that shows all the latest deploys, just so it's easier to track down
what was recently deployed if we're investigating issues.
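
The GitLab API call behind a dashboard like that is roughly this sketch (the
host, token and project IDs are hypothetical placeholders):

    # latest_deploys.py -- sketch: list the latest deployment per project via the GitLab API
    import requests

    GITLAB = "https://gitlab.example.com/api/v4"      # hypothetical
    HEADERS = {"PRIVATE-TOKEN": "..."}                # hypothetical token
    PROJECT_IDS = [42, 43]                            # hypothetical project IDs

    for pid in PROJECT_IDS:
        r = requests.get(f"{GITLAB}/projects/{pid}/deployments",
                         params={"order_by": "created_at", "sort": "desc", "per_page": 1},
                         headers=HEADERS)
        r.raise_for_status()
        for d in r.json():
            print(pid, d["environment"]["name"], d["sha"], d["created_at"])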

~~~
askz
Did you open-source it? I'd love to use that too :)

~~~
nhumrich
No. It's currently very specific to our org. But I'll consider making it more
generic and open-sourcing it now.

------
ecesena
Maybe I'm misunderstanding the question, but you may want to have a look at
Envoy: [https://www.envoyproxy.io](https://www.envoyproxy.io)

------
invisible
We use Jenkins for releases and Kubernetes for deployments, if I understand the
question correctly. We'd like to use something like linkerd to simplify finding
dependencies.

------
lfalcao
[https://github.com/zendesk/samson](https://github.com/zendesk/samson)

------
brango
Master=stable and in prod, non-master branches=dev & staging. Jenkins deploys
automatically on git commits.

------
hguhghuff
In what technical environment? More info needed.

