
How to use feature flags without technical debt - sboak
http://blog.launchdarkly.com/how-to-use-feature-flags-without-technical-debt/
======
barrkel
Hmm. This is a different kind of feature flag than I've used, to solve a
different kind of problem.

If the feature you're writing takes several man-years of effort, you can't
have a feature branch living for several months; continuously keeping it up to
date with trunk is expensive and easy to put off.

Migrations are expensive and you want to front load them to make turning the
feature on less stressful. And you may want to let customers use the feature
on a beta basis for a few months before committing to it, and then it may take
a year or more before all customers have moved.

For a big feature that cuts across large segments of a big app, I don't think
there's an alternative to if statements.

Different apps, different business models, etc.

------
ori_b
This misses the problem -- often, the new feature is buggy in ways that are
not seen in the initial testing. In critical systems, it's often desirable to
keep the ability to flip back to the old code around for a couple of release
cycles. In a previous life, that has saved my team's bacon.

This comes at a cost, and saying "just delete the flags promptly" is a
facile solution; if it were easy to delete them quickly, it would be even
easier to just land the change without the flag, and use rollbacks as your
'undo a bad feature' hammer.

~~~
gioele
> This misses the problem -- often, the new feature is buggy in ways that are
> not seen in the initial testing.

Indeed. The only good way to enable features gradually is to use a tool like
GitHub Scientist [0] that exercises the new code path and records its effects
but then uses the effects of the old code path in production. This allows
weird edge cases to be found and dealt with before enabling the new feature.

[0]
[http://githubengineering.com/scientist/](http://githubengineering.com/scientist/)
Previous discussion:
[https://news.ycombinator.com/item?id=11027581](https://news.ycombinator.com/item?id=11027581)

~~~
pkaeding
Yeah, I actually wrote about something similar recently (in the context of a
database migration):
[http://blog.launchdarkly.com/feature-flagging-to-mitigate-risk-in-database-migration/](http://blog.launchdarkly.com/feature-flagging-to-mitigate-risk-in-database-migration/)

------
aaron695
To clarify...

This is to roll out a feature to a small number of customers for initial live
beta testing, before rolling out to all customers?

If so, I think this is good. It documents the real issue: the complexity of
coming back to old code and old problems (old being weeks), even if you don't
merge it because of merge hell. At least you have a document of what to do.

And you are culling dead/dangerous code.

~~~
pkaeding
Yes, this clarification is accurate. I was thinking of 'canary launch'-style
releases, where you release the new thing to a small group, then a larger
group, etc, until everyone is getting it, and you don't need the flag any
more.

------
ozten
In my experience with large codebases and multiple teams, another developer
might copy that flag into another part of the code to get some desired side-
effect.

Yes, this is horrible, but in the real world...

I find you have to grep through the code and think about all the changes that
impact your feature flag before systematically removing it. Your cleanup
branch isn't being maintained and could provide a false sense of safety.

~~~
pkaeding
You are right that the real world is always more complicated. However, I think
the idea holds. If another dev needs to use the flag in another area of code,
the flag cleanup branch should be maintained with this change.

The point is not to make flag cleanup automatic. It is to front-load the work
of cleaning it up when the complexities involved are fresh in your mind. That
way, when it comes time to clean it up, it is much easier to be more confident
that you found all of the edge cases.

------
throwaway6497
"You will need to merge master back into your cleanup branch periodically, but
that is usually easier than it would be to recall all of the context relating
to the original change."

Won't there be merge conflicts when you do this the first time, since the
clean feature-branch code will differ from the flag-based feature on master?
Of course, all the subsequent merges should be conflict-free.

~~~
pkaeding
There shouldn't necessarily be any merge conflicts if you branch the cleanup
branch off of your feature branch. So, it might look like this:

    
    
      -master---------------*--------------------*--
        \-feature-branch---/                    /
                          \-cleanup-branch-----/

------
perlgeek
On a tangent, how long-lived are feature toggles usually?

I have very limited experience, and it points to a wide range, from a few days
to a few months. When I stumble upon a 1y+ old flag, I tend to delete it (and
the dead code path that comes with it).

What's your experience?

~~~
kazinator
Feature toggles can last decades.

For instance GCC has a feature flag called -ansi which gives you C90
compatibility.

C90 was superseded in 1999 by C99, and so that's 17 years of compatibility,
and counting.

~~~
startling
I don't think this is a feature flag in the same way the rest of the
discussion is using the phrase.

~~~
kazinator
How so? C99+ support/conformance is a compiler feature. That feature
breaks/conflicts with some aspects of C90 support, an existing, older feature.
So you need a feature flag. Inside the compiler there are various places where
you have the equivalent of "if C90, do this; else do that".

~~~
startling
"Feature flags" tend to be for behavior that is being developed and tested.
The -ansi flag is more like configuration. It's pretty valuable to continue to
support C90.

------
marc_omorain
I've had good success in the past adding @deprecated annotations to the old
code when adding new code behind a feature flag.

It makes it much easier to come along later and know which functions can be
deleted when the decision is made to kill the old code.
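
The same idea can be sketched in Python, which lacks Java's built-in
@Deprecated annotation; the decorator name and warning text below are
illustrative, not from any particular library:

```python
import functools
import warnings

def deprecated(replacement):
    """Mark an old code path; callers get a DeprecationWarning naming the
    flagged replacement, and grepping for @deprecated later finds every
    function that can die with the flag."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is superseded by {replacement}; "
                "delete once the feature flag is retired",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return inner
    return wrap

@deprecated("new_pricing")
def old_pricing(total):
    # Old code path, kept alive only while the flag can still be off.
    return total * 1.1
```

Every call to the old path now emits a warning in test runs, so the dead code
announces itself long before the cleanup day.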

------
0x0
Sounds dangerous. Later commits on master might actually add more if(feature-
flag) statements, so if you then just mindlessly merge the cleanup branch,
you'll miss the added ifs.

I'd prefer to create a cleanup branch like any other feature branch only when
actually going to clean up, and spend the extra cost getting back in context,
studying all the if(feature-flags) from master. Otherwise you might miss some,
or you might forget some interaction that you learned after feature deploy.

~~~
pkaeding
Yeah, I think if you mindlessly do anything, you're gonna have a bad time.

Think of the cleanup branch as a running list of changes that you know you
will need to make to remove the flag. Any future references to the flag should
keep this cleanup list in mind. Code reviewers should keep these cleanup lists
in mind.

This list of cleanup tasks happens to be expressed as a branch in your VCS
(this is a pretty good way to express changes that need to be applied to a
codebase). You will still need to be careful when you execute that list, but
it will be helpful to have the running tally of things that need to be done.

------
vemv
Haven't tried it myself, but why not use authorization libraries instead of
specialized 'toggle' libraries?

After all, both are concerned with whether user X is allowed to do Y.

Using just one mechanism might keep things clean and maintainable.

The original code `if can?(:use_feature_x, user)` is written just once, and
never needs to be removed. The only thing that changes, gradually and
cleanly, is the business rule behind :use_feature_x (e.g. update the method in
your ability.rb, in Ruby CanCan terminology).
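
To illustrate the idea outside Ruby/CanCan, here is a minimal Python sketch
(the `can` predicate and the `beta_tester` attribute are hypothetical): the
call site is written once, and only the rule inside the predicate changes as
the rollout widens:

```python
def can(user, action):
    """Single authorization predicate; the feature rollout is just one
    more business rule inside it, so call sites never change."""
    if action == "use_feature_x":
        # Today: beta testers only.  Widening the rollout means editing
        # this rule, not touching any caller.
        return user.get("beta_tester", False)
    return False

can({"beta_tester": True}, "use_feature_x")   # True
can({}, "use_feature_x")                      # False
```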

~~~
pkaeding
In canary launches, you might want to roll a new feature out to 10% of your
users, then 20%, etc. Once it is released to 100% of your users, you might
want to remove the check, since it is a no-op.

I'm not aware of any authorization libraries that let you grant access to a
percentage of your users, but maybe they are out there? It is a strange use
case from an 'authorization' standpoint.

~~~
vemv
The no-op point is true (except for logged-out users; there the check is still
useful).

Anyway, how do you consistently decide which 10% of users see the new feature?

That piece of data is better stored in your Users table, as I see it. It plays
well with authorization libs.

~~~
pkaeding
The way LaunchDarkly does it is to hash the user key, along with the feature
key. This way the same users aren't always included in the '10% set' for all
features, but they are consistently in the 10% set for a single feature.

This also allows the decision to be made in memory, without an additional
round-trip to the DB.
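
A minimal sketch of this kind of deterministic bucketing (the hash choice and
function names here are illustrative, not LaunchDarkly's actual
implementation):

```python
import hashlib

def in_rollout(user_key: str, feature_key: str, percentage: float) -> bool:
    """Deterministic percentage rollout: hash the feature key together
    with the user key, and map the digest onto [0, 100)."""
    digest = hashlib.sha1(f"{feature_key}:{user_key}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000 * 100.0  # value in [0, 100)
    return bucket < percentage

in_rollout("user-42", "new-checkout", 100.0)   # always True at 100%
in_rollout("user-42", "new-checkout", 0.0)     # always False at 0%
```

Because the bucket is a pure function of the two keys, raising the rollout
from 10% to 20% keeps every user who was already in the 10% set, and no
database round-trip is needed.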

------
kazinator
In the TXR language interpreter, I have a -C option which takes a numeric
argument: it means, simulate the old behaviors of version N. If you don't
specify -C, you get the latest behavior.

Throughout the code, old behaviors are emulated, subject to tests which look
similar to this:

    
    
       if (opt_compat && opt_compat < 130) {
         /* simulate 130 and older behavior */
       } else {
         /* just the new behavior please: -C was not specified,
            or is at 130 or more. */
       }
      

I think that tying specific old behavior to a proliferation of specific
options is a bad idea. It does provide more flexibility (give me some old
behavior in one specific regard, but everything new otherwise), but that
flexibility is not all that useful, given its level of "debt".

The purpose of compatibility is to help out the users who are impacted by an
incompatible change; it gives them a quick and dirty workaround to be up and
running in spite of the upgrade to the newest. They can enjoy some security
fix or whatever, without having to rewrite their code _now_.

However, they should put in a plan to fix their code and then stop relying on
-C.

If users are given individual options, that then encourages a behavior whereby
they use new features with emerging releases, yet are perpetually relying on
some compatibility behaviors. This leads to ironies: like being on version
150, and starting to use a feature that was introduced in 145 and changed
incompatibly in 147 and 148---yet at the same time relying on a version 70
behavior emulation of some other feature. Hey we don't care that this new
thing was broken recently twice before being settled down; we never used it
before! But we forever want this other thing to work like it did in version
70, because we did use it in version 70. It's like using C++14 move semantics
and lambdas, but crying that GCC took away your writable string literals and
-fpcc-struct-return (static buffer for structure passing).

It's very easy to hunt down the opt_compat uses in the source code just by
looking for that identifier, and the version numbers are right there. If I
decide that no emulation older than 120 will be supported in new releases
going forward, I just grep out all the compat switch sites, and remove
anything that provided 119 or older compatibility. The debt is quite minimal,
and provides quite a bit of value.

~~~
retbull
Is there any way you could explain that again? I don't quite get what you are
doing.

~~~
shoo
The user can optionally specify which version of the behaviour they want. This
is stored in the `opt_compat` value in the code. Throughout the code there are
checks against the `opt_compat` value to decide whether the old or the current
behaviour should be used.

~~~
kazinator
And the opt_compat has C integer/boolean semantics in this case, so the test

    
    
      if (opt_compat) ...
    

tests whether the option has a nonzero value (has been specified).

And so

    
    
      if (opt_compat && opt_compat <= 130)
    

means "user has requested compatibility, with a value of 130 or less".

By the way, -C 0, which would read as a Boolean false, as if -C were not
specified, is not allowed. If the user specifies -C N such that N is lower
than the oldest version that we emulate, the implementation terminates with an
error message like "sorry, compatibility with versions less than 70 is not
supported by version 140".

------
jupp0r
If you do feature flags by inserting if blocks throughout your code, you will
create tech debt anyway. The goal is to have one if block and hide the
changed behavior behind interfaces (or polymorphic functions if you are using
functional languages). Dependency injection is your friend.

If you don't do this, you won't scale beyond a handful of feature flags.
Chrome has hundreds, for example.

~~~
backslash_16
Can you expand on this? I'm interested to see how this works in a real code
base.

I'm thinking something like an initial (maybe massive) if block in the setup
of the application that sets all of the behavior/features by declaring which
implementations get set to which interfaces? After this if block, all of the
DI stuff is set?

This of course means you need to use a DI framework of some sort.

Using feature flags is something I'm investigating because our current model
is a git branch for every feature, and I wonder/fear it only works because
we're a small team that has worked together for a while and in the future when
we grow this will break down.

~~~
adamconroy
I was about to say the same thing as your parent. I recently had a nice
feature toggle experience using a strategy pattern with DI.

Basically there was one 'if' statement in the DI container configuration code
that looked up a config setting:

      if (newPricingStrategy)
        bind IPricingStrategy to NewPricingStrategyImplementation
      else
        bind IPricingStrategy to OldPricingStrategyImplementation
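
A runnable sketch of that pattern in Python (all names are hypothetical; in
Java or C# a DI container would do the wiring instead of a plain dict):

```python
from abc import ABC, abstractmethod

class PricingStrategy(ABC):
    @abstractmethod
    def price(self, base: float) -> float: ...

class OldPricingStrategy(PricingStrategy):
    def price(self, base: float) -> float:
        return base  # existing behavior

class NewPricingStrategy(PricingStrategy):
    def price(self, base: float) -> float:
        return base * 0.5  # hypothetical new pricing rule

def build_container(flags):
    # The one and only flag check lives in the wiring; every caller
    # depends on the PricingStrategy interface, never on the flag.
    if flags.get("newPricingStrategy"):
        strategy = NewPricingStrategy()
    else:
        strategy = OldPricingStrategy()
    return {"pricing": strategy}
```

Removing the flag later means deleting one branch of the wiring and one
class; no call sites change.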

------
johansch
This obsession about avoiding technical debt is quite strange. It's a tool to
use when it makes sense, like loans in the bank...

~~~
dahart
Debt, both technical and financial, is _always_ best avoided. And in both
cases, once you have it, you have to pay it down sooner or later, and the
later you do the more expensive it gets.

It is a tool, but having it is always a negative that is offsetting a bigger
negative. By all means, take the loan when you need a boost that you can't
otherwise afford. But take the smallest loan you need, and pay it back as fast
as you can.

~~~
adamconroy
No, often you don't have to pay for technical debt. The code may never be
refactored, and/or the whole system/module may be replaced before much of the
code is touched.

