My experience might be a little different due to the nature of the environment I work in, but I strongly disagree with the feature flag segment.
I run the DevOps team for a large CDN, and am the lead developer of our canary system. We deploy a wide variety of software to a number of different systems, and can have anywhere from 20-50 simultaneous canaries deploying code to 30,000+ servers. With that many servers and canaries going, it can be easy to run into conflicts when you want to deploy a new version of a piece of software while another canary is still deploying the previous version. Canaries often need to go slowly enough to get full day-night cycles in a number of global regions (traffic patterns are not the same everywhere).
The way we get around this is through the heavy use of feature flags, and the idea of having all feature flags enabled by default is the opposite of our strategy.
My old boss used a 'runway' analogy for canaries: you have to wait for the previous release to finish 'taking off' (be deployed) before you can release the next version. So you need to deploy quickly so that you don't 'hog the runway'.
So it seems we have two conflicting goals: we want canaries to go slowly enough to see full traffic cycles, but we need to release quickly to get off the runway.
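To make the 'runway' concrete, here is a minimal Python sketch. It assumes one runway per software package: a canary for a new version can't take off while a different version of the same package is still deploying. The `Runway` class and `RunwayBusy` exception are hypothetical names for illustration, not our actual tooling.

```python
from dataclasses import dataclass, field


class RunwayBusy(Exception):
    """Raised when a previous version of the package is still deploying."""


@dataclass
class Runway:
    # Hypothetical model: package name -> version whose canary is in flight.
    active: dict[str, str] = field(default_factory=dict)

    def take_off(self, package: str, version: str) -> None:
        """Claim the runway for a package, or fail if another version holds it."""
        current = self.active.get(package)
        if current is not None and current != version:
            raise RunwayBusy(
                f"{package} {current} is still deploying; wait before releasing {version}"
            )
        self.active[package] = version

    def clear(self, package: str) -> None:
        """Free the runway once the current canary has finished deploying."""
        self.active.pop(package, None)
```

Different packages get independent runways, so only releases of the same software queue up behind each other.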
We solve this problem by having every new feature disabled by default when the code is released. Once you verify in your canary that your new code is not being executed (i.e. your feature flag is working), you can roll the release out quickly. With the feature flag disabled, your code should be a no-op.
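A minimal sketch of what "default off, no-op when disabled" looks like in code. It assumes flags come from a plain dictionary (a hypothetical stand-in for whatever config system actually serves them) and that any flag not present is treated as disabled. The flag name `new_cache_key_format` and the cache-key logic are made up for illustration.

```python
# Hypothetical flag store; any flag that is not listed counts as disabled.
DEFAULT_FLAGS: dict[str, bool] = {}


def flag_enabled(name: str, flags: dict[str, bool]) -> bool:
    """New features are off unless a flag canary has explicitly turned them on."""
    return flags.get(name, False)


def cache_key(path: str, flags: dict[str, bool] = DEFAULT_FLAGS) -> str:
    if flag_enabled("new_cache_key_format", flags):
        # New behavior: only reachable once the flag is enabled.
        return "v2:" + path.lower()
    # Existing behavior: with the flag off, shipping this code changes nothing.
    return "v1:" + path
```

The code canary only has to confirm that the `v1` path is still the one being taken everywhere; the `v2` path gets exercised later by the flag's own canary.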
Once your code is released, THEN you can start the canary for enabling your feature flag. Feature flags (and in fact, all configuration choices) are first-class citizens in our canary system: you can canary anything you want and get all the support of a slow, controlled release, with metrics and A-B comparisons.
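One common way to canary a flag (not necessarily how our system does it) is a percentage rollout with stable per-server bucketing. In this Python sketch the stage schedule, the `flag_on` helper and the use of the server hostname as the bucketing key are all assumptions. Each server's bucket never changes, so advancing a stage only adds servers, and the result does not depend on which code version a server happens to be running.

```python
import hashlib

# Hypothetical rollout schedule, in basis points (1/100 of a percent) of servers.
STAGES_BP = [0, 100, 500, 2_500, 5_000, 10_000]


def bucket(server: str, flag: str) -> int:
    """Map a (flag, server) pair to a stable bucket in [0, 10000)."""
    digest = hashlib.sha256(f"{flag}:{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def flag_on(server: str, flag: str, stage: int) -> bool:
    """The flag is on for a server once the stage's share covers its bucket."""
    return bucket(server, flag) < STAGES_BP[stage]
```

Because the flag name is part of the hash, different flags land on different subsets of servers rather than always hitting the same machines first.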
Since the slower canary only enables a feature, other people can keep releasing new versions of the code without interfering with your feature flag canary. You can still get confounding issues when multiple people run canaries on the same systems, but our tooling lets you work out which canary is causing any issues we find.
Thanks so much for this comment. Do you have a single person tracking the state of all the canaries, or are developers responsible for their own release cycles, including tracking any canaries that might interact with their own?
Developers are responsible for their own releases. We have a lot of tooling around our canaries that helps keep track of everything. A lot of it happens via chat: we have a chatbot that creates a channel for every canary and invites everyone who has any code in the release (plus interested parties who want to know whenever code is released to a certain platform). It tells the channel whenever there are conflicts and posts updates about the canary as it progresses. You can also control the canary, advancing or reverting it, via commands in the channel.
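The chat control loop could be sketched roughly like this in Python. The command names (`advance`, `revert`, `status`), the stage list and the choice that `revert` rolls all the way back to the first stage are assumptions for illustration; a real bot would also handle conflicts and notifications, which this leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class Canary:
    name: str
    stages: list[str]  # hypothetical rollout stages, first entry = rolled back
    stage: int = 0
    log: list[str] = field(default_factory=list)  # messages the bot would post

    def handle(self, command: str) -> str:
        """Handle one chat command from the canary's channel."""
        cmd = command.strip().lower()
        if cmd == "advance":
            if self.stage + 1 >= len(self.stages):
                msg = f"{self.name} is already at the final stage ({self.stages[self.stage]})"
            else:
                self.stage += 1
                msg = f"{self.name} advanced to {self.stages[self.stage]}"
        elif cmd == "revert":
            # Assumption: revert rolls the canary all the way back.
            self.stage = 0
            msg = f"{self.name} reverted to {self.stages[0]}"
        elif cmd == "status":
            msg = f"{self.name} is at {self.stages[self.stage]}"
        else:
            msg = f"unknown command: {command!r}"
        self.log.append(msg)
        return msg
```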
Whoever is advancing the canary is responsible for checking the A-B graphs for any problems, and verifying that everything is working before advancing. The system will tell you if it sees issues, but it is up to the person who wrote the code to make sure things are good.
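As a rough illustration of the kind of A-B check the system might run before flagging a problem, here is a Python sketch that compares error rates between the canary group and a control group. The metric, the 10% tolerance and the minimum-traffic threshold are hypothetical values. As described above, the result is only a hint; the person advancing the canary still makes the call.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def ab_check(
    canary: GroupStats,
    control: GroupStats,
    max_ratio: float = 1.10,     # hypothetical: tolerate up to 10% more errors
    min_requests: int = 10_000,  # hypothetical: minimum traffic per group
) -> str:
    """Suggest 'wait', 'investigate' or 'advance'; a human makes the final call."""
    if canary.requests < min_requests or control.requests < min_requests:
        return "wait"  # not enough traffic yet for a meaningful comparison
    if canary.error_rate > control.error_rate * max_ratio:
        return "investigate"
    return "advance"
```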