
Not all bugs are worth fixing and that's okay - kinbiko
https://blog.bugsnag.com/application-stability-monitoring/
======
nowayjose2
I have a funny feeling that the nontechnical people on my current project
would be nodding their heads along to the article, but the truth is that our
applications have bugs that 100% of our customers are running into; they
simply aren't immediately noticeable to a layperson. That doesn't mean they're
not important. The business relies on complying with the rules of third-party
organizations and the software is blatantly violating those rules right now.
The clients are aware but still choose to prioritize new features over fixing
the bugs that are creating these compliance issues. If we're caught out by
those third parties before the bugs are fixed, there's a good chance it could
sink the whole company. But our users don't "see" these bugs so they're not
considered a priority over new products, new features, or anything else
marketing might want.

I looked up Pinedo's background and she's not a developer; she's a social
media manager. This is kind of what I figured because her perspective on
development seemed really out of whack to me. There are many kinds of bugs
that can't be measured with a simple stability calculation, and IME there are
definitely error states that are worse than death (crashes). Plus 97% of teams
are definitely not following agile principles. Every dev team I've ever been
on said it was agile, and most of them just meant that there was a kanban
board and something that vaguely approximated a sprint.

~~~
ryandrake
One of the toughest things I struggled with while transitioning from a larval
junior developer to a senior tech lead to a project manager was the fact that
(at least in the context of a for-profit business) not all bugs need to be
fixed, even the ones you personally think are really really bad. The goal is
to make money, not necessarily by producing the most perfect software. The
quality bar needs to be high, but there is always a point beyond which the
returns for fixing a bug are outweighed by the costs of fixing it: the direct
engineering cost, the opportunity cost of not working on a feature, the cost
of missing a deadline and not releasing in time for Christmas, the cost
associated with the extra risk you’re taking by making a late change, etc.
Good places judge all these costs, and the best ones have formal processes for
judging lots of bugs at scale and constantly re-evaluate whether a bug should
be fixed at this point in the project or not. Sometimes the clear right answer
is “no.”

~~~
wilun
That's fair, but one has to remember that some of the key points are to be
balanced, and that, like you said, "the quality bar needs to be high".

And I'm more for prioritizing trying to not introduce bugs than to fix all the
old ones. Which is challenging on, how could we call that?, "legacy" software.
So that priority can and must be reversed temporarily when that "legacy" is
too much.

So it's all very context dependent, and not having anybody (or too few)
working on making things better when it's needed is not going to deliver any
kind of velocity in the long term (and probably the short term velocity is
_already_ way too low in those cases). Too bad for the mythical time to
market...

So you have to be able to say no to bugfixes, but you certainly also have to
be able to say no to the eternal rush of new half-baked features, when
needed. A perpetual short-term obsession with "opportunity cost of not working on a
feature" could yield quite paradoxical results if trying to build them on some
kind of zombie legacy code (that is only ever edited with disgust and great
difficulty, but never seriously refactored).

Not only is this balance hard to achieve, but your role as a senior tech lead
and project manager is certainly to consider carefully the cleanup needs, and
be an advocate for them when needed, including by pushing back against feature
creep pressure. Because if you are not, most of the time nobody else will. As
a tech lead, this means, among other things, that a black box approach of parts
of the maintained software is out of the question (of course you can delegate,
but even then it's imperative to stay in the equation for that purpose, only
with less details). Paradoxically, even if the quality is crap and the
organization notices and tracks loads of bugs, most people will be happy at
the moment the bugs are triaged and assigned and eventually "fixed" by more
horrible garbage (that is, the impression that something is done), rather than
doing the right thing that is to organize a cleanup of the software more in
depth.

I've got the impression that it is rare to find projects where this balance is
achieved correctly, but maybe it's only because of bad luck. Well in lots of
cases, the famous ones (I'm thinking on the level of Linux, Firefox, Python,
etc. not just your random niche software) are actually not that bad, and their
competitors have a way shorter lifespan when not as balanced...

~~~
dgreensp
This is my experience as well. Product progress is prioritized over code
quality, and the ultimate cost to velocity is real but often unacknowledged.
Bugs go unfixed because they are very hard to fix, and this high cost is taken
for granted.

It’s definitely on the technical leadership to stand up for code cleanliness
and push back against product, and on the higher-ups to recognize the
importance of this dynamic.

------
wristmittens
Something about this rubbed me the wrong way and I realized it was because
this pretends that ignoring much of the long tail of userbase is not only okay
but beneficial to the majority. If, say, Quip ignored bugs in IE6 that's
likely fine because my parents using their CRT iMac aren't going to be using
Quip, but imagine if a crucial app like Gmail ignored older browsers; suddenly
all the disadvantaged people that can't afford new laptops lose access to
their email.

If it's a bug that 10 users are hitting because they were migrated from an
earlier version incorrectly, sure it might be okay not to fix, but if 10 users
are hitting it because they're legally blind and using an extraordinarily
large font to use your product, it's crappy to say they don't deserve a fix.
You have to understand what part of your userbase is hitting a bug and then
decide from there.

~~~
trav4225
Something to perhaps consider: if both groups of 10 people are experiencing a
bug, especially one not caused by their own doing, why is one group more
"deserving" of a fix than the other?

~~~
taejo
A bad migration can be worked around by a clean install, but blindness can't,
so there's one. Legal reasons are another (e.g. the Americans with
Disabilities Act).

OT: does anybody know of a site with similar interesting content and
discussion to HN, but with a fraction of sociopaths closer to that of the
general world population?

~~~
trav4225
Thanks for the input!

I hope the sociopath comment wasn't triggered by my question. I tend to
question widespread assumptions, perhaps more often than I ought to. But I
find that, more often than not, people don't have a good reason for the
positions they hold. I abhor groupthink which, sadly, dominates our culture
today.

~~~
taejo
The sociopath comment was indeed triggered by your question, though it wasn't
fair that it got directed at you rather than any of the hundreds of other
comments that make me feel the same way, and I'm sorry for that, but I suspect
that any of those commenters would have given a similar defense.

It matters very much _which_ widespread assumptions one tends to question, and
which subconsciously get a free pass, and I find that HNers on the whole are
likely to question assumptions like "people should be kind to each other" more
than most people, while questioning "companies are legally required to
maximize shareholder value" less, where in fact it's the latter that's false,
and while the former isn't a statement of fact, it's a very healthy axiom for
humanity.

Shared values are not "groupthink": they're what allow us to have society at
all, and while we should be allowed to discuss them, dismissing them
carelessly _is_ anti-social.

------
kyleperik
> There’s no such thing as a bug free application

This is a stretch. Seems like many people think of code as a living thing that
just does what it wants, and us programmers have to beat it into submission.

The truth is, there can be bug free applications.

The problem I think is the complete opposite of the point of the article.
Programmers need time to write good software. Without stopping to fix the
things we run into, technical debt does what it's known for, and exponentially
increases, and kills time that could be spent writing features.

So maybe software is like a living being in a way, that it needs to be cared
for gently.

~~~
GoToRO
I remember the time when people around me started using expressions like "My
PC is not feeling good today." _feeling_

It's about time, it's about having managers that were programmers and not just
managers and so on. Bug free is possible for sure.

------
nathan_long
Good article. It reminds me of Sandi Metz's treatment of well-designed code as
a business proposition: the goal is to save money, because a good design makes
changes cheaper.

It makes sense to consider the cost of having a given bug vs the cost of
fixing it. Of course, such estimates will almost always be hand-wavey.

I would also say that when in doubt, fix it. A bug is, by definition, the
software not doing what it's expected to do; I think it's better to make fewer
promises and keep them. A user who encounters a bug loses trust in the
software, and there's a tipping point where they abandon it. You might not
know where that is.

You also might not realize what a bad day it could give someone, even if
they're only one person. Eg, if you're an email platform and you have a bug
that drops one email in a million, that might seem OK. But if missing that
email gets someone evicted...

~~~
quanticle

> It makes sense to consider the cost of having a given bug vs the cost of
> fixing it. Of course, such estimates will almost always be hand-wavey.

I agree with that approach in theory, but in practice it turns out that
it's a lot easier to estimate the costs of fixing the bug than it is to
estimate the cost of having the bug. As a result, in
any ambiguous case, our bias will be for keeping the bug, since the cost of
having the bug is the impact of the bug multiplied by the probability of
someone hitting it, and it's always easy to lowball those probabilities. "Oh,
no one will notice that," or "Yeah, but that's a really obscure case." And
then you find out that all it takes is one obscure case for your trading
application to lose hundreds of millions of dollars a day. Or for hackers to
breach your systems and make off with millions of credit card numbers. Or for
malware to turn your IoT devices into a botnet.
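
A minimal sketch of the comparison described above, assuming the cost of having a bug is modeled as impact multiplied by the probability of someone hitting it. The function names and every figure below are hypothetical and chosen only to show how a lowballed probability flips the decision:

```python
def expected_bug_cost(impact: float, probability: float) -> float:
    """Expected cost of leaving a bug in place: impact times hit probability."""
    return impact * probability


def should_fix(fix_cost: float, impact: float, probability: float) -> bool:
    """Fix when the expected cost of the bug exceeds the cost of fixing it."""
    return expected_bug_cost(impact, probability) > fix_cost


# Hypothetical figures, not taken from the thread or the article.
FIX_COST = 20_000        # engineering cost of the fix
IMPACT = 100_000_000     # loss if the obscure case is ever hit

# "Oh, no one will notice that": probability guessed at 1 in 100,000.
lowballed = should_fix(FIX_COST, IMPACT, probability=1e-5)

# A more sober guess of 1 in 1,000 for the same bug.
realistic = should_fix(FIX_COST, IMPACT, probability=1e-3)

print(f"lowballed estimate says fix: {lowballed}")
print(f"realistic estimate says fix: {realistic}")
```

The fix cost is the same in both calls; only the guessed probability changes, and that guess is exactly the input that is easiest to lowball.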

------
larrik
"97% of respondents said they practice agile in their organization"

Yeah, they all _SAY_ they practice agile, but a lot of them practice waterfall
with agile naming conventions.

~~~
betadreamer
100%. They think they are doing agile as long as they have Sprints.

Startup I'm working at right now told me "Agile" is overkill for us. But
ironically it's more iterative than any company I worked before that did
sprints.

~~~
joombaga
> "Agile" is overkill for us.

What does this mean/did they mean by this?

~~~
betadreamer
Yeah, being agile is good. I meant that incorporating Agile frameworks like
Scrum is overkill.

------
mnm1
The problem is that many organizations don't know how to tell important bugs
from non-important ones, don't have proper processes for bugs reported by
customers, and generally ignore customers. Here's a couple of popular apps
that I have used and the reason I have quit or will quit them: Hulu with the
TV package constantly tells me I'm streaming to more than 2 TVs and won't play
anything even though I'm not streaming to any. Youtube TV constantly plays the
wrong thing when I click something to play. These are egregious bugs that I'm
sure I'm not the only one experiencing because they happen on multiple,
different mobile devices, tablets, the web apps, etc. The strategy outlined in
the article may be viable when customers are not paying anything, but when
companies charge an arm and a leg for software (> $40 / month) users expect
the software to not have any major bugs like this that prevent its basic
functioning. With the type of support offered by such companies, users will be
moving on quickly to competitors that hopefully have their basics worked out.
I wouldn't use an email program that can't send or receive email and these
bugs are on par with that.

------
ataggart
>at a certain point, it’s too expensive to keep fixing bugs because of the
high-opportunity cost of building new features.

While I may agree with this in the abstract, in practice most folks don't
really know whether they're at that point. It also doesn't consider cumulative
effects over time.

Bugs don't just affect application stability or user experience. A system that
does not behave as designed/documented/expected is a system that will be more
difficult to reason about and more difficult to safely change. This incidental
complexity directly increases the cost of building new features in ways
difficult to measure. Further, new features implemented by hacking around
unfixed flaws will themselves be more difficult to reason about and more
difficult to change, exacerbating the problem.

The larger the system grows over time, the more people working on it over
time, the faster this incidental complexity problem grows over time. At a
certain point, it's too expensive to not fix the bugs because of the
increasingly high cost of building new features. At that point, folks start
clamouring for a rewrite, and the cycle begins anew.

~~~
wilun
If the only alternative is between a rewrite, and not-fixing the mess
gradually, then I'll take the rewrite anytime and let the cycle continue.

The problem is: is your rewrite really going to be a full-rewrite, or some
kind of hybrid monster (at the architectural level, of course, there is no
problem in reusing little independent pieces, if any exist)? Because you can
easily fall in all the traps of both sides, if the technical side is not
mastered well enough by the project management...

------
yurishimo
The most important thing I take away from this, is that even if you don't fix
the bugs, you should be aware of them and tracking them.

Maybe it is a bug that only affects 1 out of every 10,000 customers. But if
you get enough of those it can start to add up. Keeping track of them allows
you to go to management with the data to support spending a sprint on bugfixes
and code maintenance.
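
As a rough illustration of how such bugs "add up", here is a sketch that assumes each tracked bug independently affects 1 in 10,000 customers. The independence assumption and the bug counts are hypothetical; real bugs often cluster on the same users:

```python
def chance_of_hitting_any(per_bug_rate: float, bug_count: int) -> float:
    """Probability that a customer hits at least one of `bug_count` bugs,
    assuming each bug affects customers independently at `per_bug_rate`."""
    return 1 - (1 - per_bug_rate) ** bug_count


RATE = 1 / 10_000  # one customer in 10,000, as in the comment above

# Hypothetical backlog sizes.
for count in (1, 100, 500, 2_000):
    share = chance_of_hitting_any(RATE, count)
    print(f"{count:>5} open bugs -> {share:.2%} of customers affected")
```

A table like this, built from tracked bug counts rather than guesses, is the kind of data that can support a request for a maintenance sprint.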

------
default-kramer
This doesn't mention that crashes aren't the only kind of bugs. In fact,
crashes are the bugs I fear least because I know about them immediately. It's
the bugs that happily do the wrong thing that worry me. Like sending email to
the wrong person, for example. I review email-sending code 3 times more
closely than other code.

------
phyzome
The author of this article seems to think that occurrence frequency is the
sole metric of whether a bug should be fixed.

Which of these two is more important?

    
    
      - 0.1% of my users lose their data irrecoverably
      - 30% of my users get an error page and have to refresh
    

The article never waded into this at all, which is disappointing. I don't feel
like I learned anything.
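
One way to make the comparison concrete is to weight frequency by severity. This is only a sketch: the `Bug` class and the severity weights are hypothetical, and choosing those weights is the actual judgment call:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    name: str
    affected_fraction: float  # share of users who hit the bug
    severity_weight: float    # hypothetical cost per affected user

    @property
    def score(self) -> float:
        """Severity-weighted impact: frequency times per-user severity."""
        return self.affected_fraction * self.severity_weight


bugs = [
    # Weights are assumptions: irrecoverable data loss is taken to be
    # 1000x worse for a user than having to refresh an error page.
    Bug("irrecoverable data loss", 0.001, 1000.0),
    Bug("error page, refresh needed", 0.30, 1.0),
]

by_frequency = sorted(bugs, key=lambda b: b.affected_fraction, reverse=True)
by_weighted = sorted(bugs, key=lambda b: b.score, reverse=True)

print("frequency only:   ", [b.name for b in by_frequency])
print("severity-weighted:", [b.name for b in by_weighted])
```

Ranking by frequency alone puts the refresh bug first; with these assumed weights the data-loss bug moves to the top.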

------
charleslmunger
This article is strictly true - there are bugs that are not worth fixing. But
the process of figuring out which ones are and aren't isn't as simple as
crashes/sessions.

There are some categories of bugs that must always be fixed, regardless of how
infrequently users run into them - security, privacy, accessibility, data
loss.

There are also cases where many low impact bugs all share a common root cause
- the value of fixing any one bug is low, but the sum of fixing all current
and preventing all future occurrences is high value. Enforced static analysis
tools (like error prone for Java) and libraries/frameworks with safety checks
(autoescaping template languages, polyfills, etc) are a great way to address
these long tail bugs. I generally write a new compiler error after
encountering the same bug class three times.
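
To illustrate the "safety checks by default" idea, here is a toy autoescaping helper. It is a sketch only: `render` is a hypothetical function, not any real template engine's API, and it relies on the standard library's `html.escape`:

```python
import html


def render(template: str, **values: object) -> str:
    """Fill a str.format-style template, HTML-escaping every value first.

    Escaping happens in one place, so individual call sites cannot forget it.
    """
    escaped = {key: html.escape(str(value)) for key, value in values.items()}
    return template.format(**escaped)


# Untrusted input is neutralized without the caller doing anything special.
page = render("<p>Hello, {name}</p>", name="<script>alert(1)</script>")
print(page)
```

The point is the one made above: closing off the whole bug class once is worth more than fixing each unescaped call site as it is reported.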

------
nine_k
Both parts of the title are correct:

* Not all are worth fixing, that is, the (financial) upside of their being fixed is too low compared to the effort required.

* And it's okay, that is, it's something we have to accept and live on, though it's not really nice and satisfying.

------
hullsean
Thanks Kristine for the link to my article...

Myth of Five Nines - why high availability is overrated
[https://www.iheavy.com/2012/04/01/the-myth-of-five-nines-why...](https://www.iheavy.com/2012/04/01/the-myth-of-five-nines-why-high-availability-is-overrated/)

------
User23
Dijkstra raised the distinction between pleasantness and correctness. Since
most programs are unspecified there is no logical grounds for declaring their
behavior incorrect. Unpleasant behavior abounds however, but if your users are
willing or forced to accept that unpleasantness then it can be rational to let
it stand.

------
lurker456
I don't see any mention of security or legal concerns. Not all bugs can be
identified by exceptions, exception frequency is not an indication of their
impact, and in some cases stability is less important than correctness. Using
bugs per session as the (only?) metric is a horrible way of doing product
management.

------
debt
Bugs shouldn't happen. Actually, in some critical systems, bugs _can't_
happen.

Bugs aren't magic; they happen for a reason. It could be a broken
dependency (unsupported versions, fatal bug in a dependency, deprecation,
configuration etc.), resource limitation (out of memory, security breach etc.),
poor design which leads to poor implementation (logical errors, bad data
abstractions).

Abstractly waving your hands and saying "we can't fix all bugs" doesn't feel
right. Identify the underlying cause of the bugs and address that.

One solution is to reduce dependencies, increase resource allocation, and rely
on a less rigid design. As a business grows, dependencies will increase,
resource allocation will increase and the design will become more complex.

~~~
AnimalMuppet
> Actually, in some critical systems, bugs can't happen.

[https://blog.acolyer.org/2017/05/29/an-empirical-study-on-th...](https://blog.acolyer.org/2017/05/29/an-empirical-study-on-the-correctness-of-formally-verified-distributed-systems/)
says that they found 16
bugs in three formally-verified systems (including two bugs that didn't get
caught _because of bugs in the verifier_). So, I'm pretty sure that bugs
_can_ happen. (Unless you mean that in certain critical systems, bugs can't be
allowed to happen, in which case I agree.)

More, I'm pretty sure that most bugs don't happen for the reasons you list. I
suspect that the majority of bugs are just poor implementation.

It's possible, however, for the opposite extreme to happen: The program wasn't
buggy, but circumstances changed, and now it is. I'm thinking specifically of
crypto code, which can be perfect... until a new attack is devised. Then the
software is buggy, because it can't stop an attack that didn't exist when it
was written.

When we do get software that has "no" bugs in critical systems, it's because
of extreme care at every step: specification, design, implementation, review,
and _testing._ Obsessive testing, and testing, and testing, and testing.

~~~
trav4225
From my personal experience, the vast majority of bugs happen due to an
astounding failure on the part of developers to consider even the most basic
edge conditions. Also, terrible contract documentation...

