
Piranha: An Open Source Tool to Automatically Delete Stale Code - vinnyglennon
https://eng.uber.com/piranha/
======
ublaze
One of our internal experimentation systems handles this by capping rollouts
to 95%. So the author is forced to clean up their code if they want a full
rollout.

~~~
ThouYS
I'm sorry, but I don't entirely get this. Does it mean that after I've
finished something I want to merge, I have to remove 5% of that? Or of the
entire code base? Or something completely different?

~~~
rkangel
I think they're saying that any new feature controlled by a flag is only
rolled out to a maximum of 95% of the user base. To make sure it gets rolled
out to 100% you have to remove the flag controlling it (and therefore the old
version of the code).
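A deterministic bucketing check makes the cap concrete. This is a hypothetical sketch of how such a cap could work, not Uber's actual system (all names here are made up):

```java
// Hypothetical sketch of a rollout check with a hard 95% cap: no matter
// what percentage the author configures, some users stay in the control
// group until the flag (and this check) is deleted from the code.
public class CappedRollout {
    static final int MAX_PERCENT = 95;

    // Deterministically buckets a user id into [0, 100) and treats the
    // user only if their bucket falls below the (capped) rollout percent.
    public static boolean isTreated(String userId, int rolloutPercent) {
        int capped = Math.min(rolloutPercent, MAX_PERCENT);
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < capped;
    }
}
```

Deleting the flag replaces every call site with the treated branch unconditionally, which is exactly the kind of removal Piranha automates.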

------
iandinwoodie
Here is the link to the Piranha repo since I didn't see it mentioned in the
article: [https://github.com/uber/piranha](https://github.com/uber/piranha)

------
ginko
Wouldn't "nibblefish" be a more apt name for this?

[https://en.wikipedia.org/wiki/Red_garra#Relationship_with_hu...](https://en.wikipedia.org/wiki/Red_garra#Relationship_with_humans)

------
ResidentSleeper
A while ago I deleted my Uber/UberEats account (they didn't make it easy, btw)
because of how utterly unusable their apps were compared to the competition.
After my girlfriend and I spent 30 minutes trying to order food on 3 different
platforms - a repeated experience with them for years - I decided it was time
to give up. Same obvious UI/UX bugs as over a year ago - I'm actually
dumbfounded at how this is possible, _do they even use their own app_? To sum
up, I tend to look at tech coming from Uber through a somewhat sarcastic lens
these days.

~~~
nyanpasu64
Try rooting your phone and installing a CPU monitor like
[https://f-droid.org/en/packages/com.eolwral.osmonitor/](https://f-droid.org/en/packages/com.eolwral.osmonitor/)
. Then watch for apps burning CPU in the background. Uber might show up
_wink_.

------
ramraj07
Can someone point to a good write-up on how to introduce and start using
feature flags in a repo? Seems like I just can't find any good resources on
this topic!

~~~
sulZ
If you're looking to try it out in a simple way, you could probably
dynamically load different functionality or modules based on environment
variables. But I'm also looking for a good resource.
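As a minimal sketch of that idea (hypothetical names, not any particular product's API), reading flags from environment variables might look like:

```java
// Minimal env-var feature flag: FEATURE_<NAME>=true (or 1) turns a flag on.
// A hypothetical starting point, not a full flagging platform (no
// percentage rollouts, no remote config, no per-user targeting).
public class EnvFlags {
    public static boolean isEnabled(String name) {
        String value = System.getenv("FEATURE_" + name);
        return value != null
            && (value.equalsIgnoreCase("true") || value.equals("1"));
    }

    public static void main(String[] args) {
        // Run with FEATURE_NEW_CHECKOUT=true to take the new branch.
        if (isEnabled("NEW_CHECKOUT")) {
            System.out.println("new checkout flow");
        } else {
            System.out.println("old checkout flow");
        }
    }
}
```

The obvious limitation is that flipping a flag requires restarting the process, which is where dedicated flagging platforms come in.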

~~~
dabeeeenster
Hi - we're developing a 100% open source feature flagging platform that can
achieve this - [https://bullet-train.io/](https://bullet-train.io/) - would
love feedback!

------
RickJWagner
It sounds pretty fancy. I'd want a very thorough set of unit tests before
trusting code to delete code, though.

~~~
lazaroclapp
Piranha co-author here. I'd say you trust it a bit more than we do, then ;) As
the paper and blog post mention, we use Piranha to create diffs/PRs against
the original author of the code. Developers are expected to code-review the
changes before they are ever landed. As used within Uber, Piranha won't ever
auto-land code deletions.

That still saves the time of: a) remembering that the flag is stale and must
be removed, b) actually removing the code and putting the diff up. Piranha
diffs for a single flag are also generally small enough to fully read them
during code review.

We do have comprehensive unit tests for both the Piranha tool and the target
codebase(s), and the majority (65%) of diffs generated by Piranha are landed
without changes (and the most common change in those that are changed is to
delete extra lines that Piranha couldn't prove as stale). Thus far we haven't
had an outage caused by a Piranha deletion, but certainly there have been
incorrect diffs generated and caught either by CI or manual reviewers,
requiring us to update the tool. We would not recommend landing diffs
generated by Piranha - or other stale code removal tools - to master without
any reviews right now :)

~~~
victords
Follow-up question: in general, when a diff is generated, is the standard
process just manual code review and CI, or do you also test it manually to
check for unexpected side effects?

~~~
mkr-plse
Manual code review and CI only. In our experience of deleting more than 2.5K
flags, testing would have helped in only one case, and there the code had not
been well tested when the flag was introduced in the first place.

~~~
lazaroclapp
Just as an extra note: there are manual testing steps (and automated end-to-
end tests, internal alpha testing, etc.), independent of whether the diff was
Piranha- or developer-authored, but all of those happen on the continuous-
delivery internal version of the app - that is, after the diff has landed to
master but before a release is sent to the app stores. We would count an issue
discovered there as an outage that made it past "our tests", even though it
could well be caught before reaching any external users.

------
X6S1x6Okd1st
Neat! I experimented with something similar and found the state of Java AST
transformers to be kind of painful. I'll have to look through the code base to
see what y'all used.

~~~
lazaroclapp
We used Google's Error Prone framework. Pros: you get full compiler
symbol/type information in addition to the AST. Cons: you need to build the
code to process it (Error Prone runs as a javac plugin), and if you want to do
things efficiently you are restricted to basically a single AST traversal.
Whether you are better off just using e.g. javaparser is very use-case
dependent.

------
bluesign
If I am not mistaken, this tool requires technical debt to exist in the first
place. I think the more important part is what caused this debt.

In my opinion, it lies in the implementation of feature flags.

~~~
another-dave
Can you expand a bit on what you mean here?

How would you introduce feature flags in a way that doesn't leave any debt
behind in contrast to their approach?

Or are you saying that feature flags themselves are inherently debt-generating
(which I think they acknowledge, but are taking on prudently & deliberately)
\- in which case, what would you do instead?

~~~
bluesign
Actually, what I meant was that this tool is kind of an afterthought of a bad
feature-flag life cycle. When you set a feature flag, you should normally put
a deadline on it; after that deadline, you should choose to keep the flag,
remove it, or extend the deadline.

But it seems like they accumulated a lot of flags that stayed stale over time
and went unchecked, and that lack of feedback created the debt.

Also, unrelated, but I prefer two stages for retiring flags: from production
and from code. Taking the example from the article, instead of direct access
to:

    experiments.isTreated(RIDES_NEW_FEATURE)

I think it is better to refer to this indirectly through a function (assuming
we are in the context of the RIDES module/class):

    boolean isNewFeatureEnabled() {
        return experiments.isTreated(RIDES_NEW_FEATURE);
    }

So when you want to retire the flag to disable the new feature, but want to
keep the code around for a few versions without bloating your binary, you can
simply do:

    boolean isNewFeatureEnabled() {
        return false;
    }

~~~
lazaroclapp
> When you set a feature flag, normally you should put a deadline on it, after
> this deadline, you should either choose to keep it, or remove it or you can
> extend the deadline.

This is actually a feature we identified as important for increasing
Piranha's effectiveness, and it is being added to our internal flag tracking,
but it doesn't negate the need for the tool. An expiration date makes it
easier for the tool to know when to run on a given feature (rather than using
heuristics based on % rollout and days without changes to the experiment),
but Piranha still: a) reduces manual effort by auto-generating the (candidate)
removal patch, and b) acts as the reminder portion of the expiration, so it's
more actionable than just adding a task.

The thing to note is that, even on a steady state where flags are removed as
soon as they go stale, enough new experiments are being created every day that
reducing the time spent cleaning them up is valuable.

Also, you definitely don't want to block someone from fixing a crash because
they have pending expired flags, so all you can really do with any expiration
policy is to remind them. With Piranha, you are reminding them _and_ reducing
the friction to solve the issue. After all, the diff is right there for them
to review and click 'land' on.

As for the hardcoded flag value in your example above: what does that
accomplish? It looks to me like you'd only be shipping dead code, since there
is no runtime way of re-enabling the `RIDES_NEW_FEATURE` behavior (which is
the main difference between "rolled out 100%" and "Piranha-removed"). It also
makes it harder to remove the related code later, since the semantic
information about it being part of the feature is lost. If it's just about
keeping the old code available, version control does that already, no? What am
I missing?

~~~
bluesign
I am in favor of all automation and I totally agree that the tool is valuable.
I am sorry if my comments seemed to suggest otherwise.

What I was trying to say was basically that you can also manage the debt from
feature flags without tooling.

The answer to the RIDES_NEW_FEATURE question is mainly about "you definitely
don't want to block someone from fixing a crash because they have pending
expired flags".

When I disable a flag by making isNewFeatureEnabled return false, I remove the
bloat instantly. Then, when I have time to review the code, I can also remove
the dead code. This actually addresses the concerns in your blog post about
"accidental activation" and "bloat" without waiting for developers to fix the
code. Piranha could set the flag to its stale value, and then later send the
developer a task to remove the dead code.

My actual flow is a little more complicated than what I described: I also have
assets etc. tied to feature flags, and the CI pipeline removes the unused
assets for a flag once it is stale. So I have something more like:

    boolean isNewFeatureEnabled() {
        return isFlagEnabled("RIDES_NEW_FEATURE")
            && getFlagValue("RIDES_NEW_FEATURE");
    }
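A compilable version of that wrapper, with a stub standing in for whatever experimentation client is actually in use (the interface and its method names are assumptions for illustration):

```java
// Two-stage flag retirement, per the flow described above:
// stage 1 flips the wrapper to return false (feature off, code kept);
// stage 2 deletes the wrapper and the dead code behind it.
public class RidesFeatures {
    // Hypothetical stand-in for a real experimentation client.
    public interface Experiments {
        boolean isFlagEnabled(String flag);
        boolean getFlagValue(String flag);
    }

    private final Experiments experiments;

    public RidesFeatures(Experiments experiments) {
        this.experiments = experiments;
    }

    public boolean isNewFeatureEnabled() {
        // Stage 1 retirement: replace this body with `return false;` so
        // the feature is off everywhere while the old code still compiles.
        return experiments.isFlagEnabled("RIDES_NEW_FEATURE")
            && experiments.getFlagValue("RIDES_NEW_FEATURE");
    }
}
```

The indirection means the feature can be killed with a one-line change at a single site, rather than edits at every call site.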

What I am curious about on this topic: do you have any kind of conflict
detection for your feature flags?

~~~
mkr-plse
BTW, we also have a tech publication at
[https://github.com/uber/piranha/blob/master/report.pdf](https://github.com/uber/piranha/blob/master/report.pdf)
where we discuss some of the design tradeoffs pertaining to the stale flag
cleanup problem.

Can you elaborate on what you mean by conflict detection? Are you trying to
understand how flags are dependent on each other?

~~~
bluesign
Thanks a lot, I will read it asap.

Basically, sometimes I have conflicting flags that can introduce bugs -
especially a rarely used flag combined with a new feature. A basic example:
two different flags setting the same property of a UI object to different
values.

More test coverage can probably help, but I am curious whether you have any
automation to detect those cases.

~~~
mkr-plse
Interesting. We don't have tooling for this yet but extending the Piranha
static analysis may help detect the issue.

------
unnameduser1
It relies heavily on feature flags, so it makes sense for big systems with
lots of them.

However, for now it only works with Java, Swift, or Objective-C code.

Check it out if your project fits those two conditions.

------
clumsysmurf
> Currently implemented for Objective-C, Swift, and Java programs

Are Uber's Android apps written in Java? I would have thought they would be
Kotlin.

~~~
lazaroclapp
Most of the existing Android codebase is Java. There is some Kotlin code, and
PiranhaKotlin is certainly on the roadmap somewhere, but, especially when the
goal is removing older features, Java makes up the lion's share of the use
case for such a tool right now.

------
awinter-py
this seems to work by comparing feature flag configs

I've always been a buyer of tools to make runtime stats available in the IDE

would be awesome to overlay trace / route timing _into the codebase_ on
whatever part of the call graph is on the screen

------
abjecton
So it's like Gilfoyle's Anton, but it only deletes stale code.

