Hacker News
The “bugs are 100x more expensive to fix in production” study might not exist (theregister.com)
159 points by sohkamyung 7 days ago | 130 comments

Just a paraphrasing of the original article, which was discussed 2 days ago (165 comments):


This article is about the poor state of research in the software field. The 'bugs are 100x more expensive to fix in production' bit is just a glaring example of that. The article isn't in any major way refuting 'bugs are 100x more expensive to fix in production', it's just stating that the study may not exist, so there's no need to defend your beliefs as many are doing in the comments.

I don't recall encountering this specific 100x claim, but I do recall Steve McConnell making a similar case early in Code Complete. In chapter three, he has a chart showing that the average cost of fixing defects post-release is 10 to 100 times more expensive than during requirements, which he supports with the following citations:

"Design and Code Inspections to Reduce Errors in Program Development" (Fagan 1976)

Software Defect Removal (Dunn 1984)

"Software Process Improvement at Hughes Aircraft" (Humphrey, Snyder, and Willis 1991)

"Calculating the Return on Investment from More Effective Requirements Management" (Leffingwell 1997)

"Hughes Aircraft's Widespread Deployment of a Continuously Improving Software Process" (Willis et al. 1998)

"An Economic Release Decision Model: Insights into Software Project Management" (Grady 1999)

"What We Have Learned About Fighting Defects" (Shull et al. 2002)

Balancing Agility and Discipline: A Guide for the Perplexed (Boehm and Turner 2004)

I only have one of these (Dunn), and it's boxed up in the attic, so I can't readily check their sources, but I somehow doubt that all of these simply launder the study under discussion.

I don't want to trivialize some of the good points that Hillel Wayne makes about soft-science research applied to software productivity, but he would have us dismiss all of these citations out of hand, simply because they predate capital-A Agile, which of course changes everything. That doesn't strike me as a particularly compelling approach either.

> … he would have us dismiss all of these citations out of hand…

Does he? I thought the point of his talk [1] was that we developers ascribe way too much weight to a few small studies. So rather than saying that we should dismiss the claims, we should instead take great care, because the claims may not be generalizable at all.

[1] https://www.hillelwayne.com/talks/what-we-know-we-dont-know/

This is what I'm referring to:

> A lot will be from before 2000, before we had things like "Agile" and "unit tests" and "widespread version control", so you can't extrapolate any of their conclusions to what we're doing. As a rule of thumb I try to keep to papers after 2010.

While I admit that even in the late 90s, it seemed strange to me to see citations from the 70s or 80s about projects from the 60s (like, say, OS/360), I'm not convinced that so much has changed in the last ten to twenty years as to render all previous research irrelevant.

Yeah I'll admit it was an off-the-cuff quip that really isn't all that accurate. I don't put as much time into editing the newsletter as I do into my proper essays, so stuff that I'd normally polish out gets through. I do prefer to keep to papers after 2000 in general, less because of dramatic quality differences and more because it leaves fewer ways for people to dismiss stuff without looking at it.

It's also bizarre to see claims that unit tests are so new. I can't say I really know about other communities, but Perl at least was doing a lot of unit testing using the Test Anything Protocol (TAP) back in 1987.
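For anyone unfamiliar, TAP itself is just a line-oriented text format: a plan line stating how many tests will run, followed by one "ok"/"not ok" line per test. A minimal stream looks something like this:

```
1..3
ok 1 - addition works
ok 2 - pattern matches
not ok 3 - edge case handled
```

That simplicity is a big part of why it could exist in Perl's test suite as far back as 1987.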


If anything, it raises a much better question that a survey of past research might help answer: is there a difference in productivity over the last 20 years, since Agile, source control, and unit testing became popularized?

I enjoyed the book _Leprechauns of Software Engineering_ which did track down all the chains of citations to find where the original work was misunderstood or even nonexistent. I would bet that it covers most or even all of these citations, but I’m not taking the time to pull it out and cross reference. https://leanpub.com/leprechauns

One of the most recent citations (Shull et al. 2002) is freely downloadable:


Its own citations may not be satisfying, but I find it nevertheless interesting. Here's the summary of the eWorkshop discussion of "Effort to find and fix":

> A 100:1 increase in effort from early phases to post-delivery was a usable heuristic for severe defects, but for non-severe defects the effort increase was not nearly as large. However, this heuristic is appropriate only for certain development models with a clearly defined release point; research has not yet targeted new paradigms such as extreme programming (XP), which has no meaningful distinction between "early" and "late" development phases.

Yes, and it’s worth noting that Hillel was referring to _Leprechauns_ when he talked about the cost to fix claim.

Heck, the subtitle even says it: "It's probably still true, though, says formal methods expert"

What's more interesting anyway is the circumstances around the nonexistence of the study. It's not a case of falsified research! It's a case of someone citing apocrypha as fact, and then other people citing that citation, etc. And nobody questioned it because it's so obviously true!

Is it obviously true though?

More expensive, sure? But 100x? Is it not 50x? Or 300x? Without some actual measurements I'd want to put some very very big error bars on my feelings there!

Exactly: if bugs cost 50x more in production, but only bugs that are relevant to the core product get reported, then I've won back 90% of the development time. That's what I did for my startup (no tests, no 3-tier), and it was extremely valuable: all reports focused on a few areas, half of them were requests for improvement, which was extremely useful for determining 90% of the critical path and proving the ballpark of revenue people would put into it. And since we can now hire, we're now fully 3-tier with modern tech and all.

Good management is judging where to put the team’s energy.

I would be very surprised if there is a meaningful number without adding much more context. Some early-stage startup might be able to fix a low-impact bug in production at only 10x cost, but I wouldn't be surprised if the two bugs exhibited in the first test flight of Boeing's Starliner crew capsule have cost them over a million times as much as fixing them earlier would have, with the retests, the massive timeline slip, and all the knock-on effects from the delay that majorly benefited their most promising competitor.

It is a case of stretching something to an extreme in order to demonstrate a point. It's a rhetorical device, and people shouldn't read the number 100 as literal.

We could go so far as to say that focusing on the number 100 is not seeing the forest for the trees. There are no literal trees or forest, and there's no need for a study to find them.

And it all depends on how you define "bug."

Misallocating memory? Use after free? Off-by-one? Probably cheap to fix even in production.

A misunderstood or missed requirement? Sure, that could be very expensive if it affects a data structure or some other interface with other systems that then also have to change. But that goes beyond what I normally think of as a "bug".

That depends on how expensive it is to get your code into production, how many other dependencies need to move in lockstep, how much manual regression testing there is, etc.

Fixing a ten-second bug on my PC might easily be 100x cheaper than organising a release.

That subtitle really just reads as a doubling down on information that is in dispute.

That is, we have research that shows you strengthen beliefs by weakening evidence.

Now, to your point: this weakened the evidence, it did not falsify it. It does feel very similar, though.

I came across it in a book that we were supposed to read at university thirty years ago, but the book was outrageously expensive. It cost more than all of my other textbooks put together, and the university book shop didn't have it and told me I'd have to wait months.

It's probably different now that I can click around a little on some web sites and three days later a book arrives from New Zealand. But that was thirty years ago.

Can't remember how I read that book, but I did, and I know that most others in the course didn't. I find it very easy to believe that people might assume things about what it said rather than actually read the text, and hand those assumptions down to others as fact. And so an illustrative anecdote transforms into supposedly well-researched fact.

Replying to myself... I wonder what book that was. It was a large slim hardback published sometime in the 1970s, around A4, so much earlier than the 1987 book. I don't have any course materials any more.

As I recall, they argued that making a change in the requirements specification was ~10 times cheaper than finding and fixing a bug while writing the implementation, which in turn was ~10 times cheaper than fixing it after delivery, i.e. on other people's production computers at other sites. There was some argument around it, but the text didn't imply exactness, as far as I can remember.

Truly seems to get to the root of some of software's troubles...

It's buried a bit, but it actually does kind of refute that.

> Here is a 2016 paper [PDF] whose authors "examined 171 software projects conducted between 2006 and 2014," all of which used a methodology called the Team Software Process. The researchers concluded that "the times to resolve issues at different times were usually not significantly different."

Link is to https://arxiv.org/pdf/1609.04886.pdf

Except that "time to fix" isn't the only factor that contributes to cost once a bug makes it into production.

There's damages caused by the bug, possible downtime due to deployment, testing and possibly certification, a potential need for data migration including dry-runs and more. These are just direct costs, too, there's also potential damage to reputation, cancelled orders, etc.

It shouldn't really surprise anyone that it doesn't really matter when you fix the bug in terms of the effort required to just fix the bug itself. It's the process involved in communicating, planning, and performing the rollout of the fix that might sting real hard.

It comes down to:

What's the cost of testing/specifying/formal methods/etc to a level that catches a bug before production vs what's the cost to fix a bug and its downstream effects.

If I work in an unregulated environment with on-demand deployment, the cost to fix a bug isn't that big, except for bugs that persist bad state/data, and especially anything involving schema changes; those changes that are costly to fix necessitate costly testing. If I produce a game ROM for unconnected systems in million-plus unit batches, which sit on shelves for an unknown time before people use them, it would be very costly to fix bugs, and a costly test procedure for the whole thing makes sense.

If it's life safety, then yeah, lots of testing (and don't hire me)

All of that is true. Also, deployment from the cited date range of 1967 to 1981 didn't mean the same for a lot of software as it does today. For an integrated mainframe and minicomputer hardware and software business with a bunch of isolated customer sites and huge support contracts, we're talking customer outreach and physical media at the barest minimum. There's no "push to production" automated pipeline that publishes to a web server or package update that gets picked up from the repo by the next system cron at the customer's site over a fiber connection.

A significant issue with this is simply designing a study or a way of measuring.

First of all, define "bug". Is that a ticket filed in a system? What about the "bug" that was found and fixed by the developer testing it a few minutes after finishing the new feature (and thus never had a ticket filed)? What about QA finding a problem with a new feature while testing it (and sending the feature ticket back to the developer)?

Consider: Filing a bug ticket itself takes a bunch of time. If you find and fix a bug early enough then even the overhead time of filing/verifying/closing the ticket isn't spent. How do you measure the time compared to a bug that effectively didn't exist?

You can't do any like-for-like tests: two developers could take significantly different amounts of time to fix the same bug. If essentially the same bug occurs again in another area of code, the fix still can't be compared: if the developer applies essentially the same fix, it'll be much faster than the first time; if they apply a more systematic fix like refactoring the code to avoid that type of bug, that's not comparable.

There's also some bias that will make production bugs take longer. A team that has good quality controls (unit and automated testing, QA team, staging environments) will likely catch a lot of bugs before they get to production. As a result, the bugs that do end up in production are more likely to be complicated ones -- such as requiring several obscure conditions to be met, or only happening at large scale -- and those naturally take more time to address, even if in some cases the time is just spent figuring out the reproduction steps or replicating the environment.

This all makes comparing overall trends pretty difficult -- maybe to the point of being entirely useless.

I guess that 100x is just an arbitrary number that means 'orders of magnitude'. If production means going to Pluto on a super-expensive space mission, and the bug ruins all the scientific experiments, and your future funding depends on the success of the mission, I'd say the cost tends to infinity. If the product requires the kind of polishing that only interaction with real users can provide, then the cost of bugs in production may even be negative (you spend less time on the requirements side, etc.).

You need to remember the context when this study supposedly took place. "Production" used to have a literal meaning -- producing physical media. 100x was a reasonable number to believe in that case even without extreme situations like a mission to Pluto.

I first learned this in QA training. It's been decades so I don't remember if 100x was mentioned. They listed all the people required to fix a bug once it had gone to production. Off the top of my head that included:

- Customer support handling calls from customers -- possibly thousands of times. This isn't necessarily part of the cost of fixing the bug, but it's part of discovering a bug in production.

- Some combination of managers and product managers prioritizing the bug and scheduling programmers to fix it.

- Programmers finding and fixing the bug.

- QA validating the fix.

- Writing an installer for the fix.

- QA validating the installer.

- Producing the fix.

- Shipping the fix to affected customers -- which may be all customers.

We were being trained to do QA, so we compared that to the cost involved when QA discovers a bug. That was essentially just the cost of us writing a bug report and programmers fixing the bug.

Adding onto this, if the fix requires some kind of notice or error message detail that didn't previously exist it can also require work from your internationalization team to translate that error into N languages your software supports.

I did forget that. I'm pretty sure that was actually listed in the training because the company I was at made software that was translated into a lot of languages.

Replying to myself because I got too excited and hit enter.

Compare the above to a modern SaaS company. Customer support handling the reports is a fraction of the cost because it doesn't require spending time on the phone with each customer. Most don't have QA, so those costs are gone. No writing and validating installers. No cost of making and shipping physical media.

One other thing is that programmers were a lot cheaper then. So the time a programmer spent actually fixing the bug was even less significant.

The costs are not gone, they became externalities. If you're a SaaS customer, and wrote some code that (unwittingly) depends on the bug, now you have to rewrite it, for instance.

There's a more subtle issue with fixing bugs once you have shipped: Some customers might consider the bug as part of the product and built functionalities around it.



Learned of this one [0] the other day, in the remarks of System.Net.IPAddress.TryParse:

"Fixing this bug would break existing apps, so the current behavior will not be changed"

In other words, the obvious method to use to validate whether or not a string contains a valid IP address can't be used because existing applications expect the broken behavior, and thus it will not be fixed. Correct parsing of an IP address is an exercise now left to the user (to get wrong).

[0] https://docs.microsoft.com/en-us/dotnet/api/system.net.ipadd...
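The same leniency exists in classic inet_aton-style parsers, so here's an analogous (not .NET, just illustrative) contrast in Python between a lenient parser and a strict one; note that exactly which shorthand forms inet_aton accepts depends on the platform's C library:

```python
import socket
import ipaddress

# Lenient, inet_aton-style parsing: a bare number is treated as a
# 32-bit address, so "1" parses as 0.0.0.1 instead of being rejected.
lenient = socket.inet_aton("1")
print(lenient)  # b'\x00\x00\x00\x01' on common platforms

# Strict dotted-quad parsing rejects the same string outright.
try:
    ipaddress.IPv4Address("1")
    verdict = "accepted"
except ipaddress.AddressValueError:
    verdict = "rejected"
print(verdict)
```

If validation is what you actually want, the strict parser is the one to reach for; the lenient one exists for backward compatibility, which is exactly the trap the .NET remarks describe.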

100x is probably conservative in an enterprise pure waterfall development environment, regardless of application.

This is why so many developers can’t be left to their own devices to decide when a product is launch ready. There will always be bugs. Always. You can ship with known bugs. It’s about being pragmatic and fixing it at the right time. Not all bugs have the same impact.

You could make the exact same argument about product managers...

This is why product managers / business can't be left to their own devices to decide when a product is launch ready. Not all bugs have the same impact, and they are not the experts who understand which ones are high impact.

This statement is just as true as yours. Doesn't mean you should ignore either one group to make a decision.

Ha! you're being too kind. :-)

How about life saving medical applications (insulin pumps, pacemakers etc), infrastructure (power, transportation, internet) let alone weapons of mass destruction. Then, consider subtle security holes as serious bugs to be exploited en masse. People try to air gap the hardware, and even that's not always enough (see StuxNet).

Yeah, but this is the point. It really depends on the context.

Trying to fix every bug in a purely informational website but missing your product launch? Not worth it, just fix it later. (Or not at all. Have you surfed the web with the console open? It is flooded with warnings and errors.)

But miss the critical bug in the webshop that lets customers legally buy for free? That is expensive.

As a software engineer who has been mostly focused on testing over the last few years and leading a team focused on this, it's clear to me now that balancing what's worth testing and what's not is a delicate art and very difficult to always get right.

I do believe that a common failure is not investing in adequate regression-focused tests. I've been on projects where these were not in place and an update lost a company millions of dollars very quickly before being noticed in production; I've also been on projects where delaying the release any further would have lost critical learning data during a period of time that could not be guaranteed to come again quickly (for example, market volatility or turmoil).

The best engineers in my opinion love testing as much as building new features because they understand that production fire fighting is the least fun activity of all.

Quick plug, I believe solid test reporting makes the ROI on test development clearer. I'm founder at Tesults - a test results reporting app (https://www.tesults.com) and I'd love your feedback on it, send me an email if you have a moment.

Even if it's just 'orders of magnitude', it still must be orders of magnitude cheaper than in some other industries. I bet recalling millions of cars due to some small mechanical bug is very expensive.

Or see the case of the installer on Myth 2: you could also say the cost was infinite, since the bug in the installer literally made the company go bankrupt and get purchased by Microsoft.

1. It was the uninstaller, not the installer.

2. That is not at all what happened. Bungie never went bankrupt.

It doesn't really matter to me if the study is factual, because it is not about whether this study actually took place. It is not even about having a general law that bugs are always more expensive to fix in production.

It's just an idea that you can use in an argument. When you invoke this idea, what you are actually saying is: "I want to prioritize this bug now, because my experience tells me it will save cost and/or effort." You use this rhetorical device to put weight behind your argument. It's like invoking "80/20 principle" or "technical debt".

Maybe I've become cynical but the longer I've been working, the more I like the postmodern theory that almost everything is a language game. I start to recognize bullshit, and to recognize when it's the right time to use BS myself. (Not saying that truth doesn't matter. Just that sometimes you have a correct but not convincing point, and you can support it with a not-100%-scientifically correct, but more convincing point.)

> Maybe I've become cynical but the longer I've been working, the more I like the postmodern theory that almost everything is a language game. I start to recognize bullshit, and to recognize when it's the right time to use BS myself. (Not saying that truth doesn't matter. Just that sometimes you have a correct but not convincing point, and you can support it with a not-100%-scientifically correct, but more convincing point.)

After 17 years of a career I can relate to this feeling, at least the start of it. I'm not at the point where I can safely evaluate when it's the right time to use bullshit myself, as a way to avoid putting effort into convincing someone about the intuitive feelings I have, from all my accumulated experience, about why something is good or bad to do.

I'm still at the stage where I pick my battles instead of trying to use language to empower an argument I feel strongly about; I'm still too "naive and honest" and would feel it's wrong to do it.

But fucking hell, how many discussions I have simply avoided or jumped out of because it would take too much energy and time to explain things to someone who's been too stubborn or sure about some solution or process. I have much less patience for these; I've moved more into "show, don't tell" mode, and if I feel I don't have the energy for a quick demo I know it's not a battle worth my time.

I think I'm still too much of a modernist and seeking truth...

And while 100x might not be true, you KNOW it's a lot more damaging, costly, and disruptive to have to fix a bug that makes its way to production - hence the arguments in favor of testing, canary releases, code reviews, robust programming languages and architectures, etc. They won't stop all issues, but the fewer that make it to production the better.

That one doesn't take language games or political games, I think. It shouldn't anyway.

> you KNOW it's a lot more damaging, costly, and disruptive

Maybe. It certainly is more damaging to the ego, but ego is cheap. Bugs are not all created equal; they have different costs to fix and different effects on the business. For example, an architectural mistake will cost a lot to fix and can be exploited by competitors to take customers. A tiny logic error might take 5 minutes to fix, affect a small number of users, and have zero impact on the overall business. There are also tiny bugs that cost a lot (oops, decimal in the wrong place on bills)...

I don’t disagree with the maybe, because of course some bugs really aren’t that bad. But especially in the case of those architectural bugs, it is incredibly difficult to be confident that the impact of the bug is as limited as you think it is. Additionally, if you couldn’t take the time to fix it in dev, it is also pretty likely that it will live in production for a long time, possibly forever. And these bugs have a habit of prompting code smelly workarounds, interacting with other bugs, and generally making your system harder to reason about.

It could be the right decision to not fix an architectural bug. But it is a gamble, with a very long tail of cost for getting it wrong.

Lots of bugs in production code never get triggered or never get fixed and lots of production code eventually gets deleted. There is a real problem with confirmation bias here, as the bugs that cost weeks to fix in production are extremely memorable, and the bugs you might have thought would be a big deal but never actually had to deal with are nearly invisible.

If the idea isn't based on fact, you can't use it in an argument if you want the other side to believe you. Making up facts doesn't make them true. That's the whole point. You can give anecdotes, sure, and enough anecdotes makes for a likely story, but unless you or someone else puts in the work to test that story, it's still just a story.

I'm not saying you should make up lies. If you want to tell somebody that persistance is important, you can say "constant dripping wears away the stone", and they will understand the metaphor. It doesn't matter that this is not true for every stone or that you haven't done the experiment yourself.

Stories are nearly as effective as facts at convincing someone to believe you. Just look at US politics right now to see that we live in a post-truth society that strongly values narrative.

so... you made my point for me? "Stories are convincing even if they're entirely baseless"? They convince those who don't care about whether something is true, and so... we should roll with that?

How is that an acceptable attitude?

No, stories are just stories; they're the thing snake oil sellers tell the masses in order to sell their snake oil. Put them to the test: if it's a good story, fine, write it down, and then ask "cool, but does it actually have any merit, or truth?"

This isn't Orwell's 1984; just because people like something doesn't mean it's even remotely trustworthy, let alone worth pandering to.

> It's like invoking "80/20 principle"

Power-law dynamics are empirically demonstrable. There are lots of conventions that experts coalesce around which evidence later proves to be false.

The times to resolve the bug in code may not be significantly different at all. In the 1967 to 1981 period cited, the expense isn't putting it in the main branch of your repo and letting your pipeline put it everywhere. It was putting it on tape or disk, sending it to a customer, sometimes flying your own technician to the customer's site, installing it, maybe performing a backup beforehand, letting your customers know this is all necessary, publishing errata in a physical manual, and convincing these customers you still take quality very seriously.

Rather, the last few bugs are 100x more expensive to find and fix.

They would be just as costly even if you tried to find them in pre-production.

The "Y2K bug" cost an estimated US$300 billion in remedial work, having been in production for ~40 years.

Corollary: The most expensive bugs to fix are the oldest bugs.

Hm, Y2K issues were the result of conscious design trade-offs, not bugs. (Nobody ever looked back and said: oops, we accidentally left the century off of our dates.)

I do wonder what 2038 is going to bring since it goes well beyond the IT department. I get the feeling the bill is going to be massive. I'm just glad we finally were able to replace the Windows 95 box running the boilers.
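For what it's worth, the 2038 rollover moment is easy to compute: a signed 32-bit time_t runs out 2^31 - 1 seconds after the Unix epoch:

```python
import datetime

# A signed 32-bit time_t overflows at 2**31 - 1 seconds past the Unix
# epoch (1970-01-01 00:00:00 UTC), the so-called Year 2038 problem.
rollover = datetime.datetime.fromtimestamp(2**31 - 1, tz=datetime.timezone.utc)
print(rollover)  # 2038-01-19 03:14:07+00:00
```

One second later, an unpatched 32-bit timestamp wraps to December 1901, which is exactly the kind of "conscious design trade-off" that stops looking cheap once it's embedded in hardware you can't easily update.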

I am not sure you can answer the question; it depends on the bug, of course. If the data sets of a production system are affected, it can indeed mean a lot of work.

For some other bugs there might not be a difference. There cannot be a general answer to this question, there are too many factors involved.

Modern development makes some things easier than it was in the past. Then again there will also be new problems.

It is also very field dependent. I worked on a web project where we could deploy with a line of command. Also worked on a project where "testing the new version for real" meant that we had to charter an airplane, hire a pilot, drive to the airport, strap things to the aircraft, calibrate the system and then get airborne. Understandably this changes the cost of errors, and thus people's attitude to quality.

Of course we could answer the question if we really wanted to. It’s not actually very hard to count the time spent on a bug and then multiply that by the salaries of the people involved.

This is an oversimplification of the problem. Nobody is interested in the cost of "a bug", but in their whole set of bugs. And the cost is not just salary, it's also missed sales. It may be the one bug, otherwise harmless, that shifts the opinion of a major decision maker. It may be the bug that causes some number of users to finally look elsewhere, already being frustrated by previous bugs. It may be the bug that triggers a disgruntled user to write a rant on your product that ends up as a top Google search result.

Direct costs are all anyone means when they say that “bugs are 100× more expensive to fix in production”. There’s nothing in that statement about indirect or reputational costs, just the cost to fix. The factors you mention all exist, but they are irrelevant to the cost of fixing a bug. They are very relevant to the cost of _failing_ to fix a bug.

What is relevant is your definition of the various stages of development. The numbers that are the source of the “100×” figure are from the development of an anti–ballistic missile system from the sixties or seventies. They would have been programming in assembler (thus the “27 lines of code patched per change” is not unusual). Finding and fixing a bug in the software after you’ve built all the missiles and shipped them to every ballistic missile base in the US could indeed be very expensive! But if your release process is just copying the new code to a server (or even a fleet of servers) and restarting some services, then it’s not expensive at all. Most of us have a very different idea of what “releasing” the software means than the folks on this project had.


I never expected a study.

The neat 100x value and the lack of context make it sound more like a saying from grandma: wise, but not to be taken at face value.

Sometimes (rarely) it is actually cheaper to fix in production, sometimes, it is way more than 100x. I had a coworker who worked with IC cards and fixing a bug in production meant scrapping a whole batch of cards (can be millions).

The key is what is the right question:

The right question is not whether it's more expensive to fix bugs in production or earlier. The key question is:

Who bears the brunt of the cost of fixing bugs at one stage and another?

This is a parallel to Conway's law: which cost center benefits from NOT working with agility (not rituals) and from using Dev(Sec)Ops (collaboration) to improve processes and techniques?

It's that simple. Anything else will not get oxygen to thrive, simply because of no allocated priority and funds to bridge the disconnect.

What to call this law? Maybe "The Law of Cost Centers", or something similar.

Anyone trying to fix this flaw will be tilting at windmills every step of the way. The big international corporations will have budgeted for this, due to scale and a must-have need to be much more available and reliable than smaller shops.

It seems like there's something interesting here but I'm having trouble getting from the questions to the actual answer. Could you please state directly, what is the (proposed) "Law of Cost Centers"?

Sorry if I'm being dense!

Law of Cost Centers: The output of a complex organisation will, given time, become delineated arbitrarily between cost centers. Attempts to meet in the middle will have to deal with asymptotic costs, inevitably overshooting the diminishing gains from combining efforts.

In other words, with cost centers present, the whole becomes less valuable than its parts. Or stated another way: some parts will, in the presence of cost centers, inevitably take precedence over the whole, as well as over other parts.

Indeed. Many custom software vendors justify their high maintenance fee (in part) with "ongoing bugfixes and improvements". What would be their incentive to write higher quality software, once they've signed the deal?

This is the classical squeeze of the employees, between management and vendors, neither of which has incentives to improve the software, due to costs. So in many cases it is up to each customer to discover all the bugs, and often even to specify(!) technical solutions to a vendor that may have forgotten the domain, mixes up and misplaces their branches (yay, new bugs!), and requires handholding and escalations to get through to. In management's eyes, this is of course all a problem of employee laziness: that employees aren't fighting enough of their windmills. Documentation of test results and bug statistics goes to /dev/null. If you're really unlucky, you are expected to lobby for others' targets, but not your own. Good times.

Sorry if this is naive, but my impression is that the statement "bugs are n times more expensive to fix in production" itself is ill-defined. How do you measure it? I can think of two ways:

1. You measure the average cost of fixing bugs before they go to production. You also measure the average cost of fixing bugs after they go to production.

I think this measurement is uninteresting: easy bugs get fixed first, and they are naturally low cost. That has nothing to do with being in production or not.

2. Identify 2N bugs before going into production. Randomly select N bugs and don't go to production before they are fixed. Then go to production and fix the other N bugs after a while.

I think this would be more interesting, but I suspect that nobody does this, as this measurement itself is possibly more costly than all of the 2N bugs together.
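
For concreteness, measurement 2 can be sketched as a small simulation. Everything below is invented for illustration (the lognormal cost distribution and the 5x production multiplier are assumptions, not measurements); the point is only that random assignment removes the selection bias that makes measurement 1 uninteresting:

```python
import random
import statistics

random.seed(42)

N = 50
# Hypothetical base fix cost per bug; real costs would be measured, not simulated.
bugs = [random.lognormvariate(2, 1) for _ in range(2 * N)]
random.shuffle(bugs)  # random assignment to the two groups

PROD_MULTIPLIER = 5  # assumed extra cost of fixing after release

pre_release = bugs[:N]                                  # fixed before shipping
post_release = [c * PROD_MULTIPLIER for c in bugs[N:]]  # fixed after shipping

ratio = statistics.mean(post_release) / statistics.mean(pre_release)
print(f"observed cost ratio: {ratio:.1f}x")
```

Because bugs were assigned to the groups at random, any difference in average cost can be attributed to the release stage rather than to bug difficulty.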

It really depends on how the system is deployed.

For a web service, the cost might be almost identical, except reputation damage, which is impossible to properly quantify.

For a game, where almost everyone plays at launch, too many bugs might sink the product; or it might be a rolling update, with only a minor extra cost of compatibility.

For embedded firmware... Well... You might never really get it fixed, as not everyone will upgrade.

For mission critical software, it can literally kill people or destroy hardware, or cause huge costs.

“this might be because easy bugs get fixed first, and they are naturally low cost. Nothing to do with being in production or not.”

I think you’re using the wrong definition of cost. A bug might take an engineer 3 seconds to fix, but still could cost millions if pushed to production.

Certainly historically, that could mean producing thousands of CDs or floppy disks, or burning things to PROM. With products such as interplanetary probes, a bug could harm quite a few people's careers (some people do their PhD on an experiment for a Mars rover; if that crashes on landing, those who designed it still have their design to show, though questions will linger over whether it would have worked. Those doing a PhD on the results of that experiment truly are hosed).

And a product being delayed due to unimportant bugs could also cost millions, if I use the right definition. The point still stands: how do you actually measure it?

Bugs are infinitely more expensive to fix in production. The best example is 737 Max - how do you quantify monetarily the lost lives? The reputation damage? The lawsuits? The future contracts that were lost?

But you don't have to go to that extreme. For instance, I just heard today of a lost contract from a coworker: the client was fed up with a call management system that was dropping calls like crazy and that nobody was able to fix, since you could not even replicate the system (the last time they tried, the test system pulled stuff from the production db and basically brought everything down).

How do you calculate the cost of this? Cost of debugging, lost revenue, lost referrals, brand damage, and the toll on developers and everyone doing crisis management.. I'd say 100x is a very optimistic estimate. It can very well cost much more than that.

Trivial bugs are trivial bugs, and not all bugs are the same. A misaligned table is nothing compared to a 50 MB monolith/blob (either a binary compiled from C or minified JavaScript) that crashes with "segfault" or "undefined" but runs fine in debug mode.

Bottom line: bugs in production, especially on complex systems, can be very hard to identify and fix - and sometimes can cost you your business.

As much as I like to produce correct code in the first place, I think it is very relative and depends on the project.

There are projects where nobody cares much about bugs in production and it would cost a lot to make sure no bugs enter production. These projects typically tend to rely on feedback from users.

Then there are projects where production errors are just not acceptable (think a Solid Rocket Booster controller or an ABS brake controller). These projects tend to fail catastrophically in case of a production error.

Most projects are on the spectrum but that spectrum is very, very wide.

Because I like to produce correct code and I don't like to be pressured by the "business" to hurry up I tend to choose projects where errors tend to be costly if they get into production, but other people may have different preferences.

Now, it is important to understand what is the cost of the bug.

We kinda understand cost of production bugs (the damage to company + all actions necessary to remove the bug and any lasting effects).

The cost of removing the bug before it gets into production is less well understood. Whether you invest in better developers, add automation, work to imbue your developers with a "drive for excellence", or delay the schedule to give them a little more time, it is difficult to quantify these costs because there is no baseline. You don't know how much the project would cost if you did not do all these things, and you don't know how much more crappy the end result would be.

I thought the numbers came from:

Boehm, B.W. 1976. “Software Engineering.” IEEE Transactions on Computers C-25(12):1226-1241.

As cited in "The Economic Impacts of Inadequate Infrastructure for Software Testing" by RTI.


Pretty sure that both of those papers exist, and the NIST table cites up to 990x and refers to the 76 Boehm paper, not any IBM org.

Yes. That first paper is Boehm, Barry W., "Software Engineering", IEEE Transactions on Computers, December 1976, pp. 1226-1241, vol. 25 DOI Bookmark: 10.1109/TC.1976.1674590, https://www.computer.org/csdl/journal/tc/1976/12/01674590/13... At the time Barry W. Boehm was at the TRW Systems and Energy Group.

The Boehm paper says "Fig. 3 shows a summary of current experience at IBM[4], GTE[5], and TRW on the relative cost of correcting software errors as a function of the phase in which they are corrected." and indeed figure 3 (page 1228) shows exponential growth. It only shows averages or ranges for each data source, and that's a legitimate critique. That said, it does show them for multiple companies, and then presents a trend line that plausibly follows from the data provided. Boehm has a good reputation, I expect that this really was a reasonable observation from real data.

It's legitimate to question whether or not that is still true. Computer "science" is notorious for having almost no science - experiments are almost non-existent. I would love to see this & many other experiments conducted to see what's true today.

I bet it was true, in the context of the time:

Card punches, shared compute time, physical packaging and shipments; imagine the cost of fixing a bug in a program in the rope ROM of the Apollo guidance computers (had they actually bothered fixing one).

I'd love to see the study redone in our postmodern OTA, client/server, no physical media world. The physical overhead cost of fixing a product bug would likely be orders of magnitude lower than in 76, but the units deployed in the field would be many many orders of magnitude higher; so cost per unit is down, but there are waaaaay more units in production. If the bug caused a material cost to the user, I could easily see it being very expensive.

As for there being no experiments; computer science is more of a mathematical discipline than a scientific one. You might say the same thing about theoretical physics, but I still think it's likely that we can't break the speed of light.

At any rate, the IBM study may not have existed, but others that took me less than ten minutes of googling to find definitely did, and do show results in the same order of magnitude as the questionable IBM study.

The question I have is - what are we really arguing for here by refuting this data, allowing more bugs to reach production? Having more broken code in the wild than we already do? Less rigorous development methods? There's barely any rigor in SWE as it exists today, anyway, yet things mostly work through a generous application of trial and error.

Does this even need a study? Isn’t it simply obvious? Perhaps less so with online applications and continuous updates, but in the classical model of software development that most of the industry still depends on, the costs and hassles of fixing a bug will obviously increase the further a piece of code travels along the development pipeline, and the more people, communication, and effort it requires to right it. A stitch in time saves nine, goes the saying. This reality is at odds with the cult of “move fast and break things” that seems to be somewhat popular these days. In any sector where quality is a factor, this will always lead to increased expense and stress.

Of course if you’re just running a website and you’ve got a nice network effect going on, or you’ve got your customers locked into contracts based on matters other than the users’ satisfaction all of a sudden matters of quality become a “cost” and you all of a sudden have to come up with this hokum to try and discourage your developers from doing a good job contra their intuition.

I have a close acquaintance who works in a medical lab, and frustratingly we’ve seen this mentality creeping into even the equipment they use, where the system can be down for days at a time due to some software update. And then they won’t even properly resource field engineers, because they refuse to bear the costs that the 100x model predicts. Certainly they’re probably saving pennies on the dollar, but who ends up bearing the costs but users and healthcare budgets?

There are those who will quibble that this is but an anecdote, but you can see this kind of sloppy cack-handedness creeping in everywhere. Oh, ship it today; we can fix it tomorrow, when we ship a whole slew of new bugs too.

Sorry. A bit of a sideways rant.

> Does this even need a study? Isn’t it simply obvious?

There are "obvious" and "trivially true" things in science that turned out to be false.

While it might be easy for certain engineers to accept, there is real value in having validated our assumptions on how we operate.

Yeah, and there's no single golden rule. Everything would depend on:

a) the bug

b) the product

There can also be cases where it's more expensive to find and fix the bug as opposed to some customer finding a bug and reporting it.

There's a reasonable amount of testing you should do. You can never be sure that there aren't any bugs, and at a certain point further testing would no longer be cost-beneficial.

A bug could be some button's focus border being the wrong color. No one notices this despite having spent 20 hours testing and running through the site in dev, and then one day a designer happens to notice it and files a report. The developer has to change one line of CSS; they don't even have to deploy it immediately, but can deploy it together with everything else, depending on the pipeline of course.

In retrospect, are you going to conclude that we should have done another 20 hours of testing to definitely spot that "bug", because it's 100x more expensive to fix in production?

As a practicing engineer, this should be readily observable in our day-to-day lives. Defects escape, and the further they go the more hassle they are. Only people who haven’t experienced this would need it “validated”, and I think this type of thought is the same as denying climate change. Oh, you have lived experience? Well, I have this study.

That is not to say that the economics necessarily follow the costs … and that is very likely the root of the problem.

> Defects escape, and the further they go the more hassle it is.

As a practicing engineer, what is readily observable is that this is highly dependent on context. You see how hard it is to reach any consensus without rigorous measurement? How many more hours are needed to correct a defect down the line? How much does the context affect this? Is it possible to change the context such that the cost is the same?

I can't understand people that are against having scientific knowledge, when time and time again we see that real advances are mostly made by acquiring that. Many times overthrowing what was "obvious".

> I think this type of thought is the same as denying climate change.

Climate change is one of the best examples of how having scientific knowledge is absolutely crucial, instead of guiding ourselves by what we experience.

Denying climate change is ignoring science and studies as opposed to requiring studies to be sure.

> Does this even need a study? Isn’t it simply obvious?

Actually checking what seems obvious is good science.

What goes up must come down.

And we're still testing that rule. Most recently for antimatter.

Then you learn about escape velocity (to choose an example where no "tricks" are involved).

And you end up right back where you started. Day to day, for most people in most cases what goes up must come down.

That there are “special cases” such as “flight”, “parachutes” and “space travel” doesn’t invalidate the fact that - notwithstanding additional feats of engineering and expense - if you jump off this building you will break your leg.

To continue the bridge metaphor... there's a pot of gold under the bridge. Is jumping off the bridge survivable (low bridge, water underneath, you have a bungee cord)?

Knowing the special cases has value.

Not arguing with that. Special cases don’t invalidate a rule though. Kirchhoff’s circuit laws continue to be a thing even if they break down at higher frequencies and give way to radio …

> Kirchhoff’s circuit laws continue to be a thing even if they break down at higher frequencies and give way to radio …

Do you think maybe we know this because someone decided to check the assumption?

That was the original point of this thread. You just gave an argument in support of it.

If that’s what you think then you haven’t been paying attention

> Day to day, for most people in most cases

Right but we're engineers - we're considering things outside the day to day and beyond most cases.

I think this is the correct take. Engineers are in the business of solving problems.

So if the question is "How do I make something go up and NOT come down?" or "How do I make something come down differently?" then the obvious take isn't as useful.

> Does this even need a study? Isn’t it simply obvious?

Well at the very least, commercial software used to come in shrinkwrap boxes, and now as web applications that can be updated at any moment.

Surely the cost of fixing a bug in the first situation must be higher than in the second.

The statement sounds equally obvious for both, but must be wrong for one.

> Of course if you’re just running a website and you’ve got a nice network effect going on, or you’ve got your customers locked into contracts

It’s perfectly fine as a rule of thumb and regardless of how the software comes to be in your possession it is relevant unless there is absolutely no delivery pipeline or other people involved in said delivery. Even for personal projects the recency effect comes into play and it will be harder for you to address something 6 months down the line as you struggle to recall why you did it that effin way.

Even assuming it's true (and I believe it is, if 100x is really "orders of magnitude"), there is value in having valid research on the real costs of fixing bugs across various industries and software types. It can drive innovation in the testing/QA space. It can drive investment in test pipelines. Currently, large companies likely have enough software, with enough defect escape, to generate their own metrics. Smaller companies are left to rely on the "100x" rule of thumb and guess at what level of defect escape maximizes their resource allocation.

Even if we take for the sake of argument that they are, that's not enough information to form the foundation of a policy. You've also got to look at how much it costs to find bugs at each stage of the lifecycle, and also the direct costs of any damage the defect might cause.

I also just can't believe that any of these numbers are universal constants. For example, at my current job, a bug being discovered is likely to cause a moderate amount of damage, but actually shipping a fix for it might take months. At my previous job, I think my record time to have a fix for a defect deployed to production was about 30 minutes, but I also once had a bug do more than my annual salary's worth of tangible damage in just a couple minutes. At the job before that, production defects were typically relatively low-impact and also fairly inexpensive to fix.

It should come as no surprise that each of these companies had developed a dramatically different approach to quality assurance.

Where I'm going with that is, it's not just that the research is bad, it's also that our ways of trying to use that research are nonsensical. Even if that study existed and were accurate, it would still be useless for most of us. Because it would only describe how things worked under one set of conditions that is almost certainly not the set of conditions under which you are operating. And yet, people often quote it without offering any real context whatsoever, as if it were some fixed constant that applies to everyone everywhere.

It's not exactly that the figure's veracity doesn't matter. Garbage in, garbage out. But if the thing you're putting stuff into and out of is something like a trash compactor, then the more immediately relevant principle is: anything in, garbage out.

I think it would be wrong to focus on the number. The important piece of wisdom here is that bugs get more expensive, even exponentially more expensive, to fix the later they are found.

If you find a bug during unit testing while developing a feature you just fix it.

If the bug is found in integration tests, it may pull more resources and people into the process; even more so if it is found later, during system or acceptance testing.

If found after release to customer it might get very, very expensive.

The takeaway is to find and fix bugs as early as possible.

If you want to put an actual number on how much more expensive, you indeed need to carry a study but that will probably vary with the field or type of products and may in the end not provide additional actionable insights. Especially, I think this is risk mitigation so one would want to consider the worst case and its likelihood and not only an average number. That's probably difficult to capture in a generic study and product teams will know best what applies to their specific product.
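
One way a product team could weigh this locally, in the spirit of the risk-mitigation framing above, is a simple expected-cost comparison that includes the worst case and its likelihood. All the probabilities and dollar figures below are invented for illustration; a real team would plug in its own estimates:

```python
# Hypothetical cost of fixing the defect before release.
fix_now_cost = 2_000

# If the defect escapes to production, outcomes are uncertain.
# Each entry: (assumed probability, assumed total cost).
escape_outcomes = [
    (0.70, 5_000),      # typical hotfix: found quickly, patched
    (0.25, 50_000),     # painful: customer impact, incident response
    (0.05, 1_000_000),  # worst case: recall, lost contracts, fines
]

expected_escape_cost = sum(p * c for p, c in escape_outcomes)
print(f"expected cost if it escapes: ${expected_escape_cost:,.0f}")
print(f"ratio vs. fixing now: {expected_escape_cost / fix_now_cost:.0f}x")
```

With these made-up numbers the rare worst case dominates the expected cost, which is exactly why an average multiplier alone is a poor basis for the decision.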

They don't though.

Bugs have three costs. Immediate corporate costs, corporate restitution costs, and cost-to-fix.

Immediate corporate costs are loss of customers, loss of sales, and loss of goodwill leading to future loss of customers/sales.

Restitution costs are costs to repair reputational damage and make good on customer losses. This element can be zero or it can be billions, depending on the exact nature of the loss.

Cost-to-fix can be trivial, or it can be months of work.

Cost-to-fix is unrelated to corporate costs. There is absolutely no relationship of any kind between the ease with which a bug can be fixed in code/production and the amount of corporate loss it caused.

There is a very rough relationship between age and cost-to-fix, but it's very approximate. If code is documented and maintained and there are no forgotten areas written by people who have left, age effects should be minor.

If the code is an impenetrable ball of mouldy spaghetti written by people who are long gone without leaving docs, you're screwed.

It's not later as 'old', it's later in the production pipeline.

If you fix a bug before production then you don't incur 'immediate corporate costs' and 'restitution costs' and the cost-to-fix is also lower the earlier in the pipeline the bug is found.

> I think it would be wrong to focus on the number. The important piece of wisdom here is that bugs get more expensive, even exponentially more expensive, to fix the later they are found.

This is always how I interpreted it.

I thought this was the obvious interpretation, since it's all but certain that no universally applicable number exists.

Let me present an axiom of software engineering: "The effort (cost) to fix a software defect is proportional to the time between defect's introduction and discovery".

Please note that the axiom does not say "100x more"; it says that the cost increases (usually; it may even decrease, why not? Consider a bug in ransomware that prevents it from encrypting files properly). The concrete cost may be 1.5x, 10x or 1Mx; it all depends on the circumstances.
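
As a toy rendering of the axiom's form (not its constants), a linear model where a hypothetical cost_per_day coefficient carries all the circumstance-dependence:

```python
# Sketch of the axiom: fix cost grows with the delay between a defect's
# introduction and its discovery. The linear shape and both coefficients
# are assumptions for illustration, not measured values.

def fix_cost(base_cost: float, delay_days: float, cost_per_day: float = 1.0) -> float:
    """Cost to fix a defect discovered `delay_days` after its introduction."""
    return base_cost + cost_per_day * delay_days

same_day = fix_cost(base_cost=100, delay_days=0)
next_year = fix_cost(base_cost=100, delay_days=365)
print(same_day, next_year)  # 100.0 465.0
```

In the ransomware counterexample from above, cost_per_day would simply be negative; the multiplier is circumstantial, not universal.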

And here's a pretty picture from the research paper on the subject: https://www.researchgate.net/figure/Cost-of-Fixing-a-Defect-...

"Research paper", ha. I came across this a few years ago, I call it the "Frankenpaper". See this gist for why I call it that: https://gist.github.com/Morendil/85336bf97211f9f31102ce2ee4e...

It might be a rule of thumb, but it's not an axiom (otherwise, there could be no counterexamples). Some software defects are only discovered years later, but can be fixed in minutes.

The variables are way, way too varied to make any broad claims.

I've fixed production bugs in literal minutes, because the service was a web API that reported the error as soon as the endpoint was hit. Sentry gave all the info I needed to know exactly what was wrong, and our procedures and tools for hotfixing made rolling out super fast.

The same bug in a smartphone app would have been days, and dependent on users upgrading.

The same bug in industrial or medical software, etc., would probably be months and much more dev effort, due to testing and verification.

The point being that bug cost has more to do with the platform (web, app, desktop, embedded), the operating environment (a game vs. lives at risk), and how much effort has been spent up front on a clean codebase, procedures, monitoring, and tools, than with the dev/prod distinction.

It seems the real conclusion is that while the phrase captures some vague truth-y reality that people can identify with, it's not a scientific enough hypothesis to even be testable.

If a study did exist and had the 100x figure based on real data, it would still have been paraphrased enough to get back to the same status, and no one would be aware of what the original conditions and assumptions were. So in many ways it doesn't matter whether it exists, and its non-existence is not the problem that needs fixing, but rather how we document our knowledge generally.

I'd say it is an engineering truism, gleaned from experience with designing and building physical things, that has been adopted by the software community.

The way I originally heard it was that problems are 1x harder to fix during design, 10x harder during prototyping, and 100x during production. It's about orders of magnitude rather than the exact numbers. Another way of putting it is: problems get harder to fix the later you detect them.

Seems pretty irrelevant to the underlying truths. A spate of even small bugs can cost you customers. Big bugs can cost you colossal fines from data protection regulatory agencies, especially if you operate under a "we'll fix it later" attitude.

Of course there's a point of diminishing returns to aim for. Sweeping statements either way on software testing aren't helpful.

And I would offer does not match the experience of those working on web apps with large amounts of at-rest data. Internet connected services are subjected to constantly changing data inputs which break software unexpectedly. The only place to see these data outliers is production. The companies that make this automated, auditable, safe and repeatable win.

Obviously this depends on context, as others have stated. One other factor should be the effort of test deployment. If it is very easy to test and deploy then it is not 100x. And may even be less than 1x in certain cases when you can’t test something but your users can. Whether or not you care about the bug surfacing is a different story. Context, people.

I know that bugs in silicon are 100x more expensive to fix. Sometimes if you find a show-stopper bug on your chip after you have "taped-out" (sent the design to the fab' on magnetic tape storage), then you just go bankrupt or the team is fired because you can't pay the ~$10 million for a new silicon manufacturing mask set.

Dynamically typed languages are so prevalent, where the approach is "blow up at runtime". I think in some contexts, this is fine. But in other contexts, it's more like a 1000x problem. Nothing bothers me more than runtime errors that would not have been possible had a good statically typed language been used.

Let’s say that we have an error in the requirement definitions. The error may make the whole product almost useless.

Is it cheaper to detect the error in the initial requirement specifications, or once the product is on sale and nobody buys it because of the requirement flaw?

If we make the wrong kind of product, one way or another, it is always very expensive.

I'm sure this study that may not have been actually performed has been loosely replicated by many managers using their own teams' actual data. I've certainly worked in organizations very familiar with figures they calculated for the cost to fix defects early vs late.

If you would try to unearth the "10x developer" study you will be in for a similar surprise.

Probably, but there's a grain of truth in it.

A "1x developer" will do what they're told, write code, do the 9-5 churn, go home, sleep, repeat.

A "10x developer" will / should apply a bit more critical thinking and see if there's a ready-made solution, a product, or a workaround to avoid writing code in the first place. They use their experience and, I presume, personality, to provide 10x more value to their employer or project than the head-down-and-write-code developer.

Remember, code is just a tool to achieve a goal; it is not THE goal itself. At least, it shouldn't be. I've seen enough projects where code and architecture were turned into the goal itself (think: microservices well before launch, Scala to stroke their own egos and weed out mediocre developers, writing your own frameworks or ready-to-run cloud platforms, etc.).

> Remember, code is just a tool to achieve a goal, it is not THE goal itself.

Which is why I find it strange that we're hired as "software developers", not "problem solvers." But since we are, it pigeonholes the solutions, and with that code does become the goal. In most shops "I'm going to need a lathe and a mill" would never fly with management, even if that is what would actually provide the best solution. Once hired as a software developer, the business expects that you will find a software development solution.

this is the study:

"If the cost of fixing a requirements error discovered during the requirements phase is defined to be 1 unit, the cost to fix that error if found during the design phase increases to 3 - 8 units; at the manufacturing/build phase, the cost to fix the error is 7 - 16 units; at the integration and test phase, the cost to fix the error becomes 21 - 78 units; and at the operations phase, the cost to fix the requirements error ranged from 29 units to more than 1500 units" [0]

[0] https://ntrs.nasa.gov/citations/20100036670

I have heard this expression for a long time; at the time it was logically true(ish, the number being inexact), no study needed.

Now, with modern update and delivery mechanisms, it should be studied.

They may be 100x more expensive, but they sure are 100x more stressful to debug :-) (especially on a Friday afternoon).

I mean, the fact that the study came up with exactly 100x is pretty suspicious

Do we even need a study for this? Production means you have limited time to make changes there. It’s like when they went to that planet in Interstellar where each second represents a day. Time is money; you age fast on production. It will obviously be more expensive.

Yes, we need to study this. "Obviously" is not good enough. People want to find the best use for their time, to get maximum profit. Optimization can be done to some extent by relying on the "obvious", but if you really want to maximize productivity, you need to measure things. You need to go from qualitative reasoning to quantitative reasoning.

We need not just to measure the comparative costs of bugs in production versus bugs in development; we need ways to measure the number of not-yet-found bugs, we need to predict the speed of bug-hunting in advance, we need a lot of tools to really quantify a development process.

To make a decision we need a way to predict future events; that allows us to compare the consequences of different decisions and to choose one wisely.

> Time is money, you age fast on production.

Bugs are also money. So we need a way to compare the two quantitatively.

At the risk of derailing the discussion even further, I'd say: yes, we do. There's quite a difference between 1.25x and 100x more expensive, especially if you're building the next AWS.

However, such a number would need to be conditioned. I can imagine that fixing an error in the i9 microcode will be a lot costlier to fix in "production" than your average NaN JS bug.

That’s an intuitive explanation for why the claim might be true.

An intuitive explanation for why the claim might not be true is that production has your entire user base testing it in real time. Recreating, diagnosing, and collecting feedback on a fix would be much cheaper.

Of course that doesn’t necessarily account for the expense to your customers. Which is why it’s important to have both a specific explanation for what a claim is claiming, and some empirical verification to back it up. This claim seems to have neither.

Bugs in production often become features. Fancy that, zero cost to fix!
