
Canary analysis: Lessons learned and best practices from Google and Waze - Daviey
https://cloud.google.com/blog/products/devops-sre/canary-analysis-lessons-learned-and-best-practices-from-google-and-waze
======
not_kurt_godel
There are 2 gaps here, at least as they relate to my understanding of what
"canaries" are (based on experience):

1. Requiring manual validation of canary results is antithetical to CI/CD
principles. If you can't trust your canaries enough to automatically promote
or block based on results, manual validation is just a band-aid that will
have to be ripped off painfully eventually (a sketch of what automated
judgment might look like follows below).

2. Canaries should run continuously against all deployment stages, not just
as a one-time approval process in your pipeline. Good canaries give visibility
into baseline metrics at all times, not just when validating a new application
version.

Overall I would say this guide describes what I would term a non-CI/CD
load-test approval workflow rather than a "canary".
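
A minimal sketch of the automated promote/block judgment described in point
1, with hypothetical metric names and thresholds (production tools such as
Spinnaker's Kayenta compare canary and baseline statistically rather than by
fixed ratios):

    # Sketch: automatically promote or roll back a canary by comparing its
    # metrics against the baseline. Metric names and thresholds are
    # illustrative, not taken from the article.
    def judge_canary(baseline: dict, canary: dict,
                     max_error_ratio: float = 1.2,
                     max_latency_ratio: float = 1.3) -> str:
        """Return 'promote' or 'rollback' based on relative degradation."""
        if canary["error_rate"] > baseline["error_rate"] * max_error_ratio:
            return "rollback"
        if canary["p99_latency_ms"] > baseline["p99_latency_ms"] * max_latency_ratio:
            return "rollback"
        return "promote"

    decision = judge_canary(
        baseline={"error_rate": 0.002, "p99_latency_ms": 180.0},
        canary={"error_rate": 0.0021, "p99_latency_ms": 190.0},
    )
    # decision == "promote": the canary stays within tolerated degradation.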

~~~
mrtrustor
Author here.

1. Yes. This is what the article explains in the "Monitor your new pipeline
before you trust it" section. This is more of a "how to get started" guide,
and when you've just created your canary config, you probably don't want to
trust it just yet to push to production.

2. I'm not sure I understand here. What version of your application is your
canary running if you're not using it to validate a new version? The same as
the baseline? But then, what are you using it for?

~~~
donavanm
For #2, canaries are commonly run against the stable/“production” deployment
on some sort of periodic basis. This is used to approximate the customer
experience and detect faults in underlying components, intermediate
infrastructure, or changes outside of “the software”, like configuration
data. It's an adjunct or backstop to metric-based anomaly detection, from
what I've seen.

Edit: as an aside, there's a very interesting area of discussion around the
spectrum of integration tests, canaries, & user experience monitoring. If you
change the periods and the sources, they seem to blend into the same outcome.
I.e. write your integration tests to cover the UX, run them continuously, and
associate results with underlying inputs; suddenly they're very much the same
thing.
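
A rough sketch of that kind of periodic probe, with a made-up endpoint,
period, and metric names; it exercises the stable deployment on a schedule
and emits success + latency as time-series data:

    # Sketch of a periodic probe against the stable/production deployment.
    # Endpoint, period, and metric names are hypothetical.
    import time
    import urllib.request

    PROBE_URL = "https://example.com/checkout/health"  # made-up endpoint
    PERIOD_SECONDS = 60

    def run_probe() -> None:
        start = time.monotonic()
        try:
            with urllib.request.urlopen(PROBE_URL, timeout=10) as resp:
                ok = resp.status == 200
        except OSError:  # URLError/HTTPError are OSError subclasses
            ok = False
        latency_ms = (time.monotonic() - start) * 1000.0
        # Stand-in for emitting time-series data to a real metrics system.
        print(f"canary_probe_success={int(ok)} "
              f"canary_probe_latency_ms={latency_ms:.1f}")

    while True:  # runs continuously, independent of any deployment
        run_probe()
        time.sleep(PERIOD_SECONDS)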

~~~
cavisne
I suspect you are confusing the terms.

What Google/Netflix call a "canary", other companies would call "deploying a
new version to a single host or a small percentage of production traffic".
When other companies talk about canaries, they mean regular tests run against
production to detect issues.

~~~
donavanm
You're correct. I didn't catch the distinction when I first skimmed the
article and parent comment. Part of it is that my active “canary” tests
themselves emit relevant time-series data (TSD) indicative of system
performance.

The general concept outlined here I'd lump into “approval workflows.” The
gradual, intentional deployment of mixed versions to the same workload I'd
call something like A/B or red/blue version deployments.

~~~
joshuamorton
At least from what I've seen, red/green (I've also heard blue/green) or A/B
deployments represent a different thing. A blue/green deployment says you have
2 environments, each able to handle all of your traffic. So you have 2x the
servers you need running, and you move traffic between environments to
upgrade. It's double buffering, but with binary versions.

The (traffic) canarying process that Google and Netflix use, and that is
described in this article, is distinct from that, since it doesn't require a
significant amount of resource overhead.
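
A toy illustration of that cutover, with made-up hostnames: both
environments are full-size, all traffic points at one of them, and the
upgrade is an atomic pointer flip rather than a gradual ramp:

    # Sketch of blue/green routing: two full-capacity environments,
    # traffic moves between them as a unit. Names are illustrative.
    ENVIRONMENTS = {
        "blue": ["blue-1.internal", "blue-2.internal"],    # currently live
        "green": ["green-1.internal", "green-2.internal"], # new version, idle
    }
    live = "blue"

    def route(request_id: int) -> str:
        # All traffic goes to the live environment; versions never mix.
        backends = ENVIRONMENTS[live]
        return backends[request_id % len(backends)]

    # Cutover: flip the pointer; keep the old environment warm for
    # instant rollback.
    live = "green"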

~~~
donavanm
Huh. Nomenclature. FWIW, I've also never heard of A/B being limited to a
binary choice or requiring full N sets of resources. I've only seen it as
small subsets of traffic that are ramped up to some confidence interval.
Similarly, two concurrent variants is the simplest and minimal case, but I've
also seen literally thousands of concurrent variants with enough workload &
consumers. Agree on overhead: since it's essentially a version management +
stable routing problem, you don't/shouldn't increase resource requirements.
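
A sketch of that version management + stable routing idea, with made-up
variant names and weights: hash a stable key into [0, 1) and bucket it
against the current weights, so assignments stay sticky until the weights
are ramped, and no extra capacity is implied:

    # Sketch: deterministic assignment of users to N concurrent variants.
    # Variant names and weights are illustrative.
    import hashlib

    VARIANTS = {"v1": 0.90, "v2": 0.07, "v3": 0.03}  # weights sum to 1.0

    def assign_variant(user_id: str) -> str:
        digest = hashlib.sha256(user_id.encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        cumulative = 0.0
        for variant, weight in VARIANTS.items():
            cumulative += weight
            if bucket < cumulative:
                return variant
        return next(iter(VARIANTS))  # guard against float rounding

    # The same user always sees the same variant until weights change.
    print(assign_variant("user-123"))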

------
joatmon-snoo
For clarification, "canary" in this article refers to rolling out a new
release to some small subset of production traffic, a la canary release (eg
https://martinfowler.com/bliki/CanaryRelease.html).

At least a few other people in the comments are saying that in their
experience, "canaries" are black-box monitoring programs that simulate
critical user journeys. This is not what this article is discussing.
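
For concreteness, a minimal sketch of that traffic-splitting sense of
"canary", with an illustrative fraction and backend names:

    # Sketch of a canary release: a small, adjustable fraction of
    # production traffic goes to the new release. Names are made up.
    import random

    CANARY_FRACTION = 0.05  # start small; ramp up as confidence grows

    def pick_backend() -> str:
        if random.random() < CANARY_FRACTION:
            return "service-v2-canary"
        return "service-v1-stable"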

------
btmiller
> Spinnaker is an open-source, continuous delivery system built by Netflix and
> Google

Politely correct me if I'm wrong...isn't Spinnaker originally a Netflix
system, having nothing to do with Google? Unless perhaps the author is
alluding to open source contributions by Google after the tool went open
source?

~~~
svachalek
It is originally a Netflix creation but Google has been a contributor since it
went open source. In particular, the canary analysis features were co-
developed with Google. (I was a contributor on the Netflix side.)

~~~
techcofounder
That is correct. Google played a big role in open-sourcing Spinnaker alongside
Netflix back in Nov 2015.

