Technical debt as a lack of understanding (daverupert.com)
661 points by BerislavLopac on Nov 6, 2020 | 291 comments

I've had to explain this to non-technical stakeholders many, many times over the years, and I always use the restaurant metaphor:

If you run a commercial kitchen and you only ever cook food, because selling cooked food is your business -- if you never clean the dishes, never scrape the grill, never organize the freezer -- the health inspector will shut your shit down pretty quickly.

Software, on the other hand, doesn't have health inspectors. It has kitchen staff who become more alarmed over time at the state of the kitchen they're working in every day, and if nothing is done about it, there will come a point where the kitchen starts failing to produce edible meals.

Generally, you can either convince decision makers that cleaning the kitchen is more profitable in the long run or you can dust off your resume and get out before it burns down.

Been thinking of explaining technical debt using a book library as an example...

Say you want to start a lending library. You hire one person and stock 25 books. The stock is small and one person can easily remember all of them, so the employee just piles them up in a corner. If a customer wants fiction or literature or whatever, the employee can easily look through the pile and pull it out.

Over time the collection grows from 25 to 50 books. The stock's still small and there's just one employee, so they are all just added to the pile.

50 grows to 150, so you hire one more. The old guy feels that since there are two of them, one can look up the first 75 while the other searches the next 75, and organizing would be a waste of time and space.

When you hit 500 books the debt starts to kick in, but again you try to solve it by hiring more people. Some of the new hires want to categorize the books into proper racks, but that would mean shutting shop for a few days and not adding any new books. This is unacceptable to a non-technical manager, so things continue the old way.

By the time you hit 1000 books, some of the employees are fed up with the time-consuming work and quit; the replacements have no clue what is where. Most of the customers were served purely based on the muscle memory of the old employees, and now that they have gone the business starts to crumble.

I especially like this metaphor! A codebase doesn't exist in a vacuum; a team of developers work on it and also produce auxiliary artifacts like configuration and documentation. The software only "works" so long as the whole system "works". When too much knowledge is held only by people, and isn't recoverable from the codebase and other desiderata, the whole enterprise (lower-case e) falters once that knowledge is lost.

I just finished reading The Name of the Wind, a fantasy novel where the organizational scheme of the library at a magical University is a minor plot point. Every time the school gets a new Master Archivist they come in with bright and ambitious ideas about how the catalogue should be organized and abandon their predecessor's scheme, with the result that centuries later there are dozens of overlapping and even explicitly competing systems extant in the library and many books are basically impossible to find. This, too, reminds me of basically every large codebase I've ever seen.

Oh my god, I just finished this book, and I had the exact same thought. Serendipitous

Great series! I heard the third book is set to come out some time this century.

Don't ask the author though, or he might push it back further out of spite!

Reading this, I just clicked on the audiobook. Sounds like something I could love, working as a technically oriented data analyst after having studied literature.

Actual, if somewhat limited example:

At one particular location at one of our favorite customers it was a mess to get out from their site every day after work.

Why? Because they collected everyone's ID cards, driver's licenses, etc. at the gate on the way in and then dropped them into a box, unsorted.

This meant everyone who wanted to leave had to sift through the box.

Now my colleague asked them at one point why they didn't sort the id cards as they received them and got a scolding to the tune of "can't you see how busy we are already? Now imagine if we had to sort the cards as well!".

The next time he was there however it had sunk in and leaving was as fast (or faster) than arriving.
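For the programmers in the thread, the gate story can be sketched in a few lines. This is only an illustration under made-up assumptions: the visitor names are hypothetical, and a real gate would not literally run a binary search. It compares sifting through an unsorted box with keeping the box sorted as cards arrive.

```python
import bisect


def find_unsorted(box, name):
    """Linear scan of an unsorted box: return how many cards were examined."""
    for examined, card in enumerate(box, start=1):
        if card == name:
            return examined
    raise KeyError(name)


def find_sorted(box, name):
    """Binary search in a box that was kept sorted: return the card's index."""
    i = bisect.bisect_left(box, name)
    if i < len(box) and box[i] == name:
        return i
    raise KeyError(name)


# Hypothetical visitors, in order of arrival.
arrivals = [f"visitor-{n:04d}" for n in (412, 7, 980, 55, 301, 640, 123, 876)]

# Option 1: drop every card into the box as received.
unsorted_box = list(arrivals)

# Option 2: pay a small cost per arrival to keep the box ordered.
sorted_box = []
for card in arrivals:
    bisect.insort(sorted_box, card)

examined = find_unsorted(unsorted_box, "visitor-0876")
index = find_sorted(sorted_box, "visitor-0876")
print(f"unsorted: examined {examined} of {len(unsorted_box)} cards")
print(f"sorted: found directly at position {index}")
```

The sorting work does not disappear; it moves to arrival time, where it is small and spread out, instead of landing on everyone at once when they want to leave.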

This is a good parable, but a bad metaphor: no one would open a staffed lending library with a collection so small that the staff could keep everything in their head. The key to a good metaphor is that it should hit something familiar and normal, so that the listener can devote their mental cycles only to the relevant understanding.

There are really two properties which are essential for a metaphor to convey understanding (which is not to say no other aspects have an effect—they certainly do, but I'm just trying to narrow down the 'functional core' here):

1. It needs to be expressed in the terms of a subject already understood by the listener.

2. It should have a 'structure' which is similar to that of the new, not yet understood subject.

These two things allow an identification to be made between something already understood and something not yet understood, granting new understanding.

The idea that the library should be 'normal' is not essential; it just needs to be understandable. (granted if it became too unusual it could cause problems, but I'll just leave it up to the reader's judgement whether "a library with 25 books" is too large a stretch of the imagination)

What's really nice about the above metaphor is that its structure matches the situation with coding very closely: technical debt is about the up front time-cost of introducing new systems of organization to replace ad-hoc, unsystematic methods whose cost-effectiveness decreases as project complexity (to put it loosely) increases.

While this is a pretty abstract notion in itself, the library scenario captures the same key points and dynamics in a way that's comprehensible to anyone with an understanding that libraries use 'systems of organization' (even if they don't know an abstract term for it)—this still works even if the particular set up for the 'library' in the example would not be found in the real world.

We’ll have to agree to disagree. I think a metaphor that has nothing to do with reality causes people to think too much about it. You end up replacing one abstract thing with another abstract thing (and the reason to go metaphoric in the first place is to make the concept less abstract). The metaphor makes more sense to you because you’re already deeply familiar with the concept of technical debt, so you’re able to easily evaluate (2); you’re not actually using it for understanding. If I told this metaphor to my spouse (for whom I’m often using metaphors to explain things), the reaction would be: “What are you talking about? Library with 25 books? Huh?”

> (and the reason to go metaphoric in the first place is to make the concept less abstract)

I completely agree with you there. But I disagree that the unusual library example is more abstract.

As a point of contrast, I gave an abstract account of technical debt for the purpose of comparison to the more concrete library metaphor (to show that the abstract account would be more difficult for many):

> technical debt is about the up front time-cost of introducing new systems of organization to replace ad-hoc, unsystematic methods whose cost-effectiveness decreases as project complexity (to put it loosely) increases.

A library with 25 books is not more abstract, though it is a hypothetical concrete thing [1]. If someone has difficulty with hypothetical concrete things, that could certainly make using metaphors with them more difficult—and I can see why you would add additional constraints onto what makes a good metaphor. In my personal experience, saying, "imagine a library, but with only 25 books, which is run by a single person," would be not asking too much of an audience. But I see your point that some may struggle with it.

[1] There is a handy way of considering this distinction: becoming more abstract involves making some 'parameter,' which was fixed, free instead: so for instance a more abstract notion than any particular library is: a manager of a collection of objects which are loaned out to people for limited durations of time. In that example, we free the parameter 'book' so that it isn't set to anything specific. If we instead say, "a library but for CDs instead of books," it's equally concrete since we haven't freed any parameters, we just replaced one concrete value for another. That's what happened in the "library with 25 books" example, which is why I say it is not more abstract.

>2. It should have a 'structure' which is similar to that of the new, not yet understood subject.

If you'll forgive my pedantry, at this point you're not discussing metaphors and have veered into analogies. Maybe we can meet in the middle and call these metaphorical analogies? ;)

Metaphors are when you say something is something it is not ("my wife's smile is the bright sun I wake up to"). There need not be anything unfamiliar with a metaphor or any explanatory power behind a metaphor. Analogies are when you explain or frame a concept using an already understood concept ("the car is out of gas, like when you're out of energy and don't want to run around.")

As an aside, I'll mention that the idea of "analogy as the core of cognition"[0] has been kicking around in academia for years, and comes from Douglas Hofstadter (author of Gödel, Escher, Bach).

[0] http://worrydream.com/refs/Hofstadter%20-%20Analogy%20as%20t...

> There need not be anything unfamiliar with a metaphor or any explanatory power behind a metaphor

The purpose is still explanatory, it's just possible for the explanation to be in terms of perceptions rather than structure: in your example with wife/bright sun the purpose is still to make an identification between two domains, one familiar, one not ("bright sun" is familiar to the reader; author's wife is not), but rather than the two being "structurally similar" it's that they evoke similar experiences in the author, and the author is able to explain their experience by sharing this identification.

That said, I agree with you that what I was discussing is closer to definitions of analogy in certain areas of cognitive science (e.g. Hofstadter's work, like you've pointed out—or John Sowa's: http://www.jfsowa.com/logic/theories.htm)

But the original example with the library would've been better termed an analogy to begin with, rather than a metaphor—I just adopted the previous commenter's terminology :) (though admittedly I'm more inclined to interchange the two fluidly because I don't find the differences essential: I would just call the kind of metaphor you used a perceptual analogy. Maybe I am missing something important about metaphor though—I'd be curious to hear if so!)

If your metaphor becomes detached from reality people focus too much on the side that’s supposed to be familiar.

If you're being that pedantic about your metaphors, I don't think a good metaphor exists.

I tend to agree with him. It takes too long to tell that story. In a boardroom meeting type environment people don't have time or the attention span to settle in for story time. You need to be able to illustrate your point in 3 sentences or less.

Whenever I have to explain it to management I tell them something along the lines of: When was the last time you cleaned your office cupboard? If anyone else from management needed to find something in there without your help, how long would it take compared to you? It's the same with our scripts. They need cleaning up and documenting in case one of us decides to call it a day.

Edit: When the situation allows (tbh I rarely miss a chance) I follow up with: You have our sales guys and their unrealistic promises to thank for our lack of documentation.

Thus, "technical debt"...

That is pointless. It isn't about the message, it is all about the messenger. If they don't want to listen to you, they won't. My friend who has made his way pretty high here in the SV told me recently how the same CXX/"boardroom" people who wouldn't listen even to his 1 sentence, now have all the time and attention in the world to listen to him all day long.

This is true. You can always try to convince people, but if you're not the string puller, you're at the mercy of your listeners. If they are fixed on something (e.g. profit, their own promotion), your story should be aligned to that (their KPI). Only then can you have hope.

When shit breaks, no one wants to take blame and people start to forget 'selectively' your earlier warnings of shit.

It could be an additional service to something else, like a café. I think the metaphor is reasonable.

What an interesting metaphor. Thanks!

Interesting. Like a month ago I came up with a similar metaphor. Only, instead of a health inspector, I focused on the fact that dirty dishes get piled up.

You put them in the sink. The sink gets full. You put them on the table, on the stove, on the counter-top, on the floor. You can hardly walk. You're working on a little corner on the table because the rest is full of stuff. How are you supposed to continue to work efficiently with all that clutter?

The manager cares only about how fast you can get food out the door. To them, washing dishes is a waste of time and money. Customers aren't paying you to wash dishes. They're paying for the food.

The problem is seeing how the buildup of clutter affects the speed of preparation of food.

Of course, quality is also affected. For example, you need a sieve. It's dirty; you used it for flour. Now, you want to use it for powdered sugar. Well, that's similar enough, right? There's no time for washing; the customer is waiting.

Totally. Another part of the metaphor I use is that messes are easier to clean right after you make them. Leave the pans for a week and cleaning things up is a real pain. Cleaning after the meal is done is easier. And easier still is to clean as much as possible as you cook.

And Anthony Bourdain has a nice related bit on how working clean reduces error and increases efficiency. From Kitchen Confidential:

Mise-en-place is the religion of all good line cooks. Do not fuck with a line cook’s ‘meez’ — meaning his setup, his carefully arranged supplies of sea salt, rough-cracked pepper, softened butter, cooking oil, wine, backups, and so on.

As a cook, your station, and its condition, its state of readiness, is an extension of your nervous system...

The universe is in order when your station is set up the way you like it: you know where to find everything with your eyes closed, everything you need during the course of the shift is at the ready at arm’s reach, your defenses are deployed.

If you let your mise-en-place run down, get dirty and disorganized, you’ll quickly find yourself spinning in place and calling for backup. I worked with a chef who used to step behind the line to a dirty cook’s station in the middle of a rush to explain why the offending cook was falling behind. He’d press his palm down on the cutting board, which was littered with peppercorns, spattered sauce, bits of parsley, bread crumbs and the usual flotsam and jetsam that accumulates quickly on a station if not constantly wiped away with a moist side towel. “You see this?” he’d inquire, raising his palm so that the cook could see the bits of dirt and scraps sticking to his chef’s palm. “That’s what the inside of your head looks like now.”

That is a very close analogy to coding for me. If I keep the code clean, the inside of my head can be clear, well-ordered, precise. And if it's a mess, I become a mess.

Unrelated to technical debt (or maybe it is and someone can find a way to make the connection) but this quote from Bourdain resonates well with me and very much aligns with why when I used to entertain guests (before covid obviously) I would not let guests try to help clean. It’s an appreciated effort but I really am that particular about how I like my space and kitchen maintained, much prefer if they nearly place a dish in the sink if they would like to (otherwise I have no issue “bussing”) and resume enjoying themselves with the rest of the guests.

Made for an annoying moment once when one guest would not relent on being allowed to clean and made the situation of cleaning my kitchen about them and their life hangups in a weird and inordinate amount. They haven’t been invited back. Bourdain is right: do NOT mess with the cook’s meez.

(Also Kitchen Confidential is such a stupidly marvelous book in general)

Great quote. Thanks for sharing.

you can always just throw away those dishes and buy new ones. they're clean and shiny and new that way!

and... there's talk of 'self-cleaning' dishes coming next quarter. we should just get those...

I think the metaphor still holds, if we consider manual processes to be disposable dishes. Each time you use one, it incurs a cost, and the costs increase linearly with how many of them you use. However, with disposable dishes / manual processes, you don't have to think about logistics so much, and in certain situations it genuinely does make sense to go with the disposable solution. You just need to know which situation is which.
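The "know which situation is which" part can be made concrete with a back-of-the-envelope break-even calculation. This is a rough sketch with invented numbers (minutes of effort), not a real cost model: a manual process costs the same every time, while automating it costs something up front and then less per use.

```python
import math


def break_even_uses(setup_cost, cost_per_use_automated, cost_per_use_manual):
    """Smallest number of uses at which automating costs no more than doing it by hand.

    Returns None when automation never pays off (no saving per use).
    """
    saving_per_use = cost_per_use_manual - cost_per_use_automated
    if saving_per_use <= 0:
        return None
    return math.ceil(setup_cost / saving_per_use)


# Hypothetical numbers: 120 minutes to write the script,
# then 1 minute per run instead of 11 minutes by hand.
uses = break_even_uses(setup_cost=120, cost_per_use_automated=1, cost_per_use_manual=11)
print(uses)  # 12

manual_total = 11 * uses
automated_total = 120 + 1 * uses
print(manual_total, automated_total)  # 132 132
```

If you expect fewer uses than the break-even point, the disposable dish is the sensible choice; past it, every further use of the manual process is the linear cost described above.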

Ohhh boy! The big rewrite! We're throwing all the dirty dishes out. This is what we've been waiting for! <Gleefully rubs hands> My resume is ready.

Some restaurants use disposables for exactly this reason.

On a good night I will be cleaning while I cook. I don’t want to serve dinner with a pile of every dish that I touched during prep piled dirty in the sink. In an ideal scenario, after the last course is placed on a serving dish, I have exactly two things left to clean; the pan that cooked that last thing, and the utensil used to transfer it.

On a truly good night, the kitchen ends up cleaner by the time dinner is cooked than when I started.

Often the first thing I do before cooking a meal is some cleanup.

OK, I’ve abused this metaphor enough now. Thank you for that one!

And yet the people managing engineers are supposed to be those "smart business people" that "know things" that those nerdy engineers don't understand about business. Somehow they completely cannot understand tech debt no matter how many times it's explained to them.

Tech debt is a bad metaphor. Having lots of debt can be a smart business decision. The problem is usually that engineers do a terrible job telling the story around why having lots of tech debt will cause problems. Usually if you've got non-technical managers, the best case is they've lived through it before and know.

Debt is a good metaphor. Carrying technical debt can be a smart business decision. Constantly taking out more loans and never making payments leads to bankruptcy.

In my experience the problem is usually it's hard to quantify the costs. It's easy for managers to tell themselves engineers are just perfectionists. Someone's bonus might depend on not listening and they probably won't be held accountable when something goes wrong.

And bankruptcy is just the extreme of debt. It's also possible to survive, but be slowed by the drag of debt.

Financial debt reduces profit, which slows expansion and reinvestment.

Technical debt does the same, but slows novel development or expansion of the system.

Actual debt has a few major differences from tech debt though. First, its cost is easy to quantify. Second, and more importantly, where it's good and bad is almost the opposite of where tech debt is good and bad. You want to use debt for financing when you have a relatively stable, mature business, whereas when things are still uncertain you want to use equity.

Conversely both have the same basic effect: they magnify the consequences of decisions. Fiscal debt is essentially a multiplier on your sensitivity to revenue changes. Tech debt is a multiplier on delivery time for new features, or your sensitivity to changes in the platform's assumptions.

You are hiring them wrong.

If they can't code or aren't from a top N business school, they have no place managing engineers.

What, in your experience, do the top N business schools teach that makes managers from there able to manage engineers, that other business schools don't?

They are more selective.

Most management classes are really glorified accounting: staring at budgets and spreadsheets, trying to cut costs.

As someone who struggles with contrived metaphors for everything, I think this is a pretty excellent one :)

Although I'm not sure dishes are the best. You're not really going to make it a day without cleaning the dishes.

Deep cleans are a good metaphor, but maybe also just general maintenance of kitchen equipment? E.g. having a stove where half the burners don't work half the time, an oven with wildly variable temperature, duller and duller knives, etc. seem much more akin to technical debt as they make it continually harder to produce quality results and work efficiently. However, you can ignore them for quite a long time, and you don't need to pay attention to problems every single day. (It's fine to lose an oven for a while, but you'd better fix it quickly.)


The Capability Trap. The core idea is that you have pressure to deliver a product (get "real" work done), but also maintain or improve your ability to do the work. If you ignore that maintenance portion, you end up being able to produce less over time. And the cost to recover the capability increases over time due to neglect.

In the kitchen example, cleaning and maintaining the fryers every (period of time) means that you can get years, if not decades, out of them. But failing to do so may force you to turn them off (produce less food for customers at a time). Then you have to either replace or pay for expensive maintenance and repair work, which is usually more costly than just having someone come in and drain the system, clean it, and give it a once over every (period of time).


I thought that paper had been submitted and commented on more recently, but the last commented submission was from January 2015. So it's now submitted here:


> The Capability Trap. The core idea is that you have pressure to deliver a product (get "real" work done), but also maintain or improve your ability to do the work.

This nudged a decade-old memory of how non-profits (and to an extent, the public sector) find it much easier to get funding for specific outcomes, but incredibly hard to get funding for what they usually call "capacity building". The medium-term solution seems to be outsourcing to a vendor that can both deliver an outcome and amortize capacity building across many customers in your sector, but over the long term this leads to a form of organizational learned helplessness, and vendors that eventually switch to rent-extraction rather than delivering outcomes or building capacity (or rather, the capacity they are still building is largely the ability to secure contracts).

Ironically, the longer the vendor takes to switch to a rent-extraction strategy (and the more slowly they boil the frog), the more entrenched they become as a de facto standard, and the harder it becomes to eventually displace them (for a competitor) or replace them (for a customer) as a vendor.

> You're not really going to make it a day without cleaning the dishes.

I always do a review right before I'm ready to commit my code, to see if there are any obvious refactorings, need for comments, or style issues to be addressed.

I find this constant tidying up, like cleaning the dishes after every meal, prevents having to do a massive clean up at the worst possible time down the line.

Deep cleans seem to corner the experience too much.

To the article's institutional knowledge idea, specialties, allergies, or chemical compatibilities might be a good catch all.

The more features you add with more customers and developers from different teams, the harder it is to guarantee any dish won't affect nut allergies. Mixing orange juice and milk WILL happen, and you'll only find out about it when the wrong customer experiences it.

Metaphors are a powerful tool when trying to illustrate complex topics. They're very helpful in philosophy. I need to learn making better metaphors.

I just thought of another analogy.

Software development is like a city planner with unlimited space.

The developers are also the construction workers that use cranes, trucks, and tools to build the buildings. But the problem is that they don't have time and leave a lot of equipment behind. Now there are finished buildings with cranes still assembled, cement trucks parked in the middle of the road, piles of dirt on the ground, and scaffolding everywhere.

Luckily space is unlimited, so the developers build catwalks, helicopter pads, and bridges over obstructions. If city planners have a lot on their plate without random construction equipment lying around, imagine how hard it would be if they worked knowing a lot of construction equipment gets left behind.

I like this analogy because it also hints at some of the problems with overzealous tech debt work.

Sometimes especially new devs push for removing or replacing a crane even though it's working fine just because they don't understand how to use it or because a new model is available.

Cities would never get built if we replaced all the cranes and scaffolding every few months.

My favorite metaphor: bridge and road maintenance. Every now and then you have to fix potholes and repaint the bridge. Sometimes in order to do this you have to close some lanes to traffic. If you don't do this, you will accrue more and more potholes, and the bridge will start to rust. The longer this goes on, the harder it becomes to fix. If you ignore the problem long enough, the bridge will collapse one day with very little warning.

I think that metaphor is not very apt, because it doesn't capture the drag on future work. Bridges and roads are stationary.

This is why I don't like this kind of metaphor, and I don't like the Technical Debt metaphor. Because it doesn't convey the right concept. It conceals, obfuscates actual truths.

- Bad code

- Bad decisions

- Bad design

- Lack of maintenance

- Lack of proper business concept understanding

Those things mean something; you can make them more precise, and they can be actionable. Technical debt always needs a follow-up question: so what is actually going on?

I use the example of public works projects in San Francisco. There's a road called Van Ness that is going to have a bus rapid transit lane running down the middle of it. Should be really easy to do. But it's been about a decade and they're just hitting the last steps now. Why? Well, they needed to also upgrade the utilities below. But when they dug they found layer after layer of abandoned utilities, undocumented utilities, etc. So what should have been a simple operation dragged out for years because each action revealed something new.

The city now wants to add bike lanes to Market Street and is anticipating the same issues.

I think it doesn't convey any new information by calling those utilities technical debt.

The issue at hand is self-explanatory.

> I think that metaphor is not very apt, because it doesn't capture the drag on future work. Bridges and roads are stationary.

Poorly maintained roads and bridges ARE obviously a drag on future work, since nearly all other work depends on them for transportation, it drives up costs (eg. potholes start to drive up maintenance costs for vehicles and increase commute times), and reduces tax revenues (whether it is through depressing real estate prices or other mechanisms).

That resonates as I am currently trying to corral a group of neighbors in fixing up a bridge at the entry to our rural road.

The key piece I would add is to point out that it is much cheaper to maintain the bridge rather than replace it if it fails.

To say nothing of the fact that it is much more likely to fail when it is under load, so it's not just a question of replacing the bridge, but also the likely loss of vehicles and possibly lives.

Additionally, proper maintenance (or its analog to other disciplines) will usually identify when it's time to do a full replacement. Which is much better than finding out by way of catastrophe.

I wonder if this is an area where, if you can get the initial agreement, a special tax district or some kind of cooperative with dues makes sense. That way the bridge stays maintained.

Definitely! The key part is "initial agreement".

The I-35W bridge collapse in Minneapolis is the perfect example of this. A contributing factor in the collapse was the weight of the maintenance equipment they had on the bridge at the time.

> or you can dust off your resume and get out before it burns down

Last time I left a job, this was a big part of it. The company simply refused to learn from their mistakes.

Now I find myself in a similar boat again. The only thing that seems to matter is the day's emergency, and everything else is put off indefinitely. This, obviously, leads to more emergencies in the long run. And yet for some reason the company seems to be coping and is growing rapidly. Perhaps it is time to dust off the ole CV again, or perhaps this is a transient thing. Time will tell?

IME (22+ year career so far) there's a balance to be struck. If you wait to ship till you're happy w the code, you probably waited too long. Successful businesses are often built on surprisingly shoddy software, and time to market matters. OTOH, only ever running in naive startup mode -- "ship it yesterday, add features nownownow, any means necessary, shortcuts be praised" -- is a recipe for disaster. Given your strong feelings on the matter, maybe you could find a way to help surface the costs and help your org to start fixing them. Because some amount of it is inevitable, and things within a given company only get better when someone cares enough to do something about it. If it works, you're a hero and things improve. If not, you can still jump ship, but with a much better narrative: not "I got tired of the intolerable levels of tech debt", but "I did (X) in an effort to address [same], but ultimately couldn't overcome the institutionalized patterns that led to it, so I'm looking to work for a company that cares about software quality".

One way to separate the two is to look for a steady stream of removed features. A startup that pivots from trading Magic: The Gathering cards to selling bitcoins can obviously remove a lot of dead code. That’s essentially what it means to be an agile company.

Hoarders, however, only try to add features, which always results in a drop in quality as the UI becomes unmanageable. Long term it’s an unavoidable dumpster fire in the making and a clear sign to jump ship.

In an ideal world, this is how things would work. The problem is, those features were created for customers. They will not take kindly to those features being removed. And no-one wants to lose customers.

This is why the second law of thermodynamics applies also to software engineering. Entropy only increases.

Upvoted; great point! Esp, 1st paragraph.

In an ideal world, this is how

Mark hoarded nothing wrong !

That is an interesting image. "surfacing the costs". It evokes the idea that most technical debt is below the surface (hidden by the UI ?) and so is rarely noticed by anyone other than the programmers.

The trick is, how do you let the internal problems and inconsistencies become visible to the end-users so the decision makers will decide to put time and effort into resolving them without being fired because the program appears to be falling apart ?

Thanks; and yes - I do believe, strongly, that business owners / stakeholders generally radically under-appreciate the costs of unintentional tech debt, which is mostly invisible to them.

Regarding your proposed (tongue-in-cheek?) tactic of somehow making it apparent to end users as a way to get the attention of said stakeholders: user-perceived performance is actually, seriously, the bridge you're looking for.

Companies are increasingly aware that performance (in the sense of site speed / WPO / UX latency) matters ^1. Suboptimal performance follows directly from failure to address tech debt. Efforts to improve performance are always stymied by tech debt. Thus, execs who learn to care about performance have become your allies.

1. https://wpostats.com

Does the code just look like a mess, or is there a deeper problem?

Joel Spolsky compared code quality to the apparent cleanliness of the bakery where he got his first job:


It looked filthy to him when he first started working there, but then he learned what was really needed for a "clean" bakery and what was just cosmetic.

(Been a long time since I read the rest of the article, so not going to vouch for it.)

So which is the case at your company? Is the code just cosmetically ugly, or is there something more fundamentally wrong? If they are able to continue delivering features and growing the company, maybe it's more cosmetic, at least for now?

If not, yeah, you should update your CV sooner rather than later.

Kitchen staff use a kitchen and surface clean the kitchen, but rarely are responsible for building the kitchen, performing real maintenance on the equipment in the kitchen, or the periodic deep cleaning of the equipment. If a piece of equipment is on the fritz, they cope with it as best they can; if it slows down how quickly they can get an order out, or it results in a lot of wasted product, or it forces items to be taken off the menu, then so be it. Sure, the kitchen staff may be frustrated with how crappy their kitchen is, but they've still got to get orders out and make the best of it that they can.

Customers don't see that, they only see the plated dish. The manager in charge of the front of house knows nothing at all about kitchens, but knows they're pushing out the orders coming in and not getting any complaints from customers. So from his perspective, whatever state the kitchen is in is acceptable.

The back of house manager (or head Chef) is responsible for that. If business has outgrown the original kitchen's capacity, he needs to make sure the head boss understands the need for investing in new equipment or remodeling, and the risk of not being able to keep up with orders if that doesn't happen. If equipment is fritzy, he needs to make sure the head boss is aware of the potential consequences of using it in its fritzy state, as well as the risk involved if that equipment completely goes out.

But even if that occurs and the risks are successfully conveyed, the head boss may decide to leave things as-is. If the kitchen is currently keeping up with demand with things as-is but the waiters aren't, the "emergency of the day" for him is hiring more waiters. If the business doesn't have the capital to do both at once, he'll spend the money on more waiters first and punt the kitchen's needs. Or if the equipment only impacts a few low volume items, he may decide the risk of it going out is acceptable vs. the cost to address the issue. Or if the whole kitchen needs renovated to ensure it's robust enough to keep up with the growing demand, but the kitchen is already operating at capacity, there's just no good time to renovate it. You'll either be operating at reduced capacity while it's happening, or completely shutting down for a remodel. Both of which carry all kinds of risks - the business could lose market momentum, order quality could go down while the kitchen staff break in the new equipment. If the kitchen staff is already stretched to capacity, disrupting the kitchen is likely to cause customer-facing issues. So the head boss may leave things alone even if it means the kitchen staff will be dealing with a constant stream of "emergencies of the day", as the costs/impact of addressing the kitchen's issues may genuinely outweigh the risks of not doing so. At least in the short term.


Sometimes a company can end up in the situation you described because of incompetence or biases in upper management. But other times it can be competent and unbiased management fully comprehending the situation and deliberately deciding what to prioritize. It's also not limited to tech - every functional area of a company has to deal with their own equivalent of "technical debt", and are subject to having their needs prioritized or marginalized based on what upper management deems the most critical. As an employee in a situation where your team's needs are being marginalized, the best you can do is determine how transient or permanent the marginalization is, and whether what you find out is acceptable for your situation or if it's time to look elsewhere.

Similar with middle management - the best you can do is ensure you successfully articulate to upper management the current state of what's under your purview, the potential risks that exist for not investing more in what's under your purview, and the potential benefits that exist from doing so. Based on the results you get from doing that, you can get a read on how permanent or transient your situation is, and what the likely long term prospects are for what you're in charge of, and decide what to do from there.

You left out the part where the kitchen staff collectively quit and go work for the competitor across the street, with the brand new kitchen (VC funded).

The VC funded kitchen doesn't need you, they replaced you with a robot[1].

That aside, you're absolutely right that higher turnover is a distinct and real possibility. Which is one of the risks that should be articulated to the powers that be, and taken into account when they make decisions.

[1] https://misorobotics.com/flippy/

We probably work at the same place.

The best part is how everyone acts surprised about it at the same time as knowing exactly what is wrong.

Since we're all naming our favorite metaphors, I guess I'll name mine: It's the ol' Dull Axe analogy. There are a lot of variations and even little fables/stories around it, but the gist of it is, don't be the guy who spends all day chopping wood with a dull axe, and who doesn't have time to sharpen the axe because chopping wood takes all day.

> the guy who spends all day chopping wood with a dull axe, and who doesn't have time to sharpen the axe because chopping wood takes all day.

Also a good analogy for continuous integration. I can't implement CI because I'm spending too much time doing manual deployments...

I've seen this play out at my current work for the last 2 years, but it plays out in such slow motion that a lot of people just don't realize it. Or, they do, but they've got a deployment to do so they get on with that and they'll come back to thinking about it when the deployment is done.

Nothing made me understand how awful and damaging this dynamic truly is more than playing Factorio. I got to see it play out in real time in the microcosm of my factory: running on the treadmill of maintaining resources to produce ammo to kill biters and fix the things they broke, never making any forward progress until a dramatic shift in strategy freed up a lot of cycles.

What's interesting is that tech debt is already a metaphor. You can take out a loan to be able to move faster and earn more, but you will need to pay interest, and at some point that will drag you down if you do not repay it.

Financial literacy is sadly fairly low, so it's not surprising to me that the loan metaphor doesn't immediately reach intuition.

In the video linked in the article, Ward Cunningham says he came up with the debt metaphor to explain it to the finance guys he was working with.

Not a sufficient one though, because debt has to be paid off, whereas technical debt is much easier to ditch by doing a rewrite, suffering unmeasured business outcomes, etc.

No, debt does not need to be paid off, it can be continually rolled forward, as long as someone is willing to provide you credit.

When a government or a corporation has bonds due, it normally pays off the principal by selling new bonds.

The analogy with technical debt is honestly pretty good. As your creditworthiness declines, the cost of renewing debt increases as creditors demand higher and higher interest rates.
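To put rough numbers on the rollover point, here is a minimal sketch. Everything in it is a made-up assumption for illustration (the `rollover_cost` helper, the principal, the rates, the yearly step); it only shows how interest-only rollover gets more expensive when the rate creeps up each renewal.

```python
def rollover_cost(principal, start_rate, rate_step, years):
    """Total interest paid when the principal is never repaid, only rolled over.

    Hypothetical model: interest is paid once a year, and the lender raises
    the rate by rate_step at every renewal as creditworthiness declines.
    """
    total_interest = 0.0
    rate = start_rate
    for _ in range(years):
        total_interest += principal * rate  # interest only; principal rolls forward
        rate += rate_step                   # next renewal is more expensive
    return total_interest


# Same debt, same five years; only the renewal terms differ.
flat = rollover_cost(100_000, 0.05, 0.00, 5)    # rate stays at 5%
rising = rollover_cost(100_000, 0.05, 0.02, 5)  # rate climbs 2 points per year
print(round(flat), round(rising))
```

The principal is identical in both runs; the second one just costs more every year it is carried, which is roughly how an unaddressed codebase taxes each new feature.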

It’s not (only) decision makers that need an explanation, it’s developers. Too many confuse technical debt with technology.

Maybe your kitchen is clean but not so optimized with classic tools. Instead of cooking each meal manually, why not build advanced machines for each category of food, on different sites (microservices and components)?

At first it’s very efficient, but surprise it’s not McDonald’s and the menu needs a change. The first version of the kitchen can adapt relatively fast, but in the second it’s less trivial. It can only accommodate small changes (parameters), otherwise the machines have to be completely reworked. The difficulty is compounded by the dependence of the machines on each other across different sites.

By definition the second version has more technical debt despite having the newer “tech”.

Still, the good thing about a kitchen is its fixed menu (whether small or big). Sadly, in today's agile software world, nobody works hard to know customers and fix deliveries. It's like you're running an Italian restaurant, but if someone wants Indian curries, you run to the market, buy ingredients, learn the recipes and deliver before the customer leaves.

So the final result:

- you delivered food which looks like Indian cuisine but doesn't taste like it.

- now your customer is unhappy

- so your boss is unhappy

- you're tired after the long circus, but you feel good because you learned a little bit about Indian cuisine.

- now you let the kitchen die (your boss isn't happy anyway) and start applying to Indian restaurants, saying you can also cook Italian

- meanwhile, the next day, you do the same circus for Japanese cuisine

- now you start applying to Japanese restaurants too, saying you can cook Indian and Italian

A slight twist would be to argue that a customer will get food poisoning. This avoids the metaphor needing an external force (health inspector) and instead correlates well with production breaking from poorly tested/rushed code.

Disagree - I think the point is that for a restaurant, the health inspector will shut it down first. For the code, you won't have anybody to shut it down before your customers start getting food poisoning.

It seems like you're arguing for my point. Code doesn't typically have external orgs playing roles like health inspectors so health inspector could be removed from the metaphor.

I've used the document drawer metaphor in the past.

When you start getting paper documents, it's okay to put them in their own drawer and you're done with it. But they keep piling up, and after a while, if you do nothing, you will keep searching through a big pile until you find the one you need. You need to organize them into categories and folders. You need to rethink the way you store them.

Or like having a mess in your room.

It's a great metaphor.

The thing is that if you talk to any professional chef, keeping the kitchen clean, organized, and well prepared is absolutely paramount. It's the one thing they drill into you in chef school. It's core to their culture. Kitchens are hard places to work because everyone is expected to stay in line, and keep their workstations 100% perfect. Discipline is required. If asked how long it takes to deliver meals, no one would dream of not including preparation and cleanup time, because it's core to the culture.

Software has the opposite culture. People glorify "hacking together", always look for easy shortcuts, be it new languages, frameworks, or copying straight from Stack Overflow. No one wants to do hard work, learn anything properly, or architect anything for the long term. Everyone just wants to go to a weekend hackathon, put something together with tape and rubber bands and somehow make it big. It never works. Our culture is destroying so much of our work.

This is one reason the relationship between engineering and whoever needs the software should be "customer-supplier" rather than "boss-subordinate".

Because a restaurant customer ordering "a ribeye steak, and please don't wash any dishes" is absurd.

Depends on the billing model, right? Hourly contracts create the worst micromanagement situations. As an employee I can just decide that something needs doing & spend the afternoon on it, and my boss trusts my judgement. An arms-length customer isn’t going to be easily talked into paying for that time.

I prefer to say that code is like a Rube Goldberg machine.

While it technically does the job, it is over complicated making it hard to add new features and easy to introduce bugs.

If we rush to add a new feature, it will be like adding another crazy contraption making it even harder to make changes to in the future.

But if we take a little time to clean up the code, it will be like a well oiled machine giving us a solid foundation to build new functionality on top of.

Another interesting point is that although Rube Goldberg machines are complicated, they are extremely decoupled from start to finish... I've never thought of decoupling code in a negative way before...

In my experience the fields which are seen as critical and involve software, are quite regulated. For example finance. There are lots of various standards that are enforced, for example PCI-DSS for credit card processing and so on.

Yes, programming is not regulated, but many business areas which are basically the result of programming are.

Finance eh? Experian, Capital One, probably a few other data breaches I'm forgetting. All of them were PCI compliant. But hey, at least we got away from TLS 1.1...

Think of laws and regulations as another form of code.

They can have bugs. That's why they are continuously patched.

I don't like this metaphor at all. What are the features? The meals? Every day, the same meals? With clear recipes and easy estimating when they're done. Hundreds of features everyday and doing the dishes pretty much resets everything to a perfect clean slate for the next iteration of hundreds of features. Everything else needed for the features, stove, fridge, utensils, deteriorates on a totally different timescale, years and are always completely replaced.

I think the embedded video raises a very good argument that metaphors are dangerous and can/do shape thinking. So I don't want this to catch on and get people arguing with their stakeholders about "we need to do the dishes" when there are heavy refactorings that need to be done. :/

This requires incentives that promote long term thinking. It's been my experience that in most organizations the buck stops with a product owner/manager. I can't think of a worse person to be deciding if we have time in the sprint to refactor. I advise teams to go up as high as you have to in the organization until you find someone incentivized to deliver value over a longer time horizon, someone who will feel the consequences of too much tech debt. Often no such person exists. At this point it should be clear to everyone what the future will look like and they can make their own personal career choices.

THANK YOU for this apt analogy! It's about as close to perfect as I can imagine. (I've been doing software for a living since 1998 and somehow never encountered this one before.)

> Generally, you can either convince decision makers that cleaning the kitchen is more profitable in the long run or you can dust off your resume and get out before it burns down.

A better commercial kitchen analogy might be mise en place, since it's literally about productivity and is completely uncontroversial.

Another one. Say you want a jet that can fly much farther than the previous one from a few decades ago.

So you start to design a new jet. But the executive has no time for that. So he says to just slap some bigger engines on the old one and add some software to compensate for how much it wants to stall.

For many startups the business model is to sell the company before severe technical debt becomes a problem, or, before you really have to understand your code.

There is a whole section on this in 7 Habits.

Sharpen your saw.

Imagined reply: how do lines of code get dirty? I never heard that story before. You must be kidding me.

You might want to clarify that this is a devil's-advocate kind of point. I thought at first you were serious and apparently so did someone else.

by being an unmanageable mess.

That's a good analogy. I'm going to try that one next time I run into problems.

That's a great example, thank you for sharing it.

Software engineering is different from many other fields. You're evolving a product over a long period of time, each step has uncertainty and requires creativity and exploration, and you can change not just your process but the materials and tools you work with.

It's tempting to find a simple metaphor to try and explain why technical debt matters, but if you ignore the things that make software engineering unique then you're probably not helping your case. Also, equating yourself to fast food workers is not a good way to earn respect.

I more and more think we should challenge that difference. IMO software engineering is bad in many ways because we lack experience. And we do lack experience because we do not repeat what we do. Most developers work on legacy code that is older than their own stint at the company. Yet, they constantly ship new features. That means they are constantly working as maintainers and producers in parallel.

My suggestion would be to have dedicated producers and dedicated maintainers. And software should have a planned shelf life. You should buy a piece of software the way you buy a house. Huge up front investment followed by a long period of negligence interrupted by frantic maintenance efforts.

We also insist that we're different instead of examining the lessons learned from related disciplines. Perhaps the closest to software engineering is systems engineering. They have a huge body of work that closely parallels what we've done in software engineering, along with many ideas on how to model and understand both our systems and our approaches to development and organization.

But as an industry, we seem to want to make our own path, rather than learn from others.

Many other fields are constrained by the limitations of the physical world. To build a house you have to source the materials, transport them, assemble them, and so on. That's the bulk of the work.

In software things of that nature can be automated. The only things left are the things that are custom to a project.

When managing a team you also have to deal with human psychology and I've noticed that many get bored doing maintenance and want to leave their mark designing and building new things.

We thus rotate maintenance/new and difficult and fun work based on the energy levels of the team.

Unpopular opinion, probably here, but definitely among people I've shared this with before: The phrase technical debt is a red flag itself and probably should never be used.

Most people seem to get that the issue of technical debt is really a business issue about tradeoffs. So, if the person who is concerned about technical debt is senior enough they will have some ability to see the business tradeoff and thus can completely avoid the phrase. They'd say instead something like "I was thinking that if we changed X we would get value Y."

A person who uses the phrase Technical Debt is signaling that they aren't capable of saying something that direct.

That's one issue.

And then the other issue is that a huge amount of technical debt is just code that a more senior engineer would refactor as they went. It's a junior engineer who says, "this is a huge mess and I need a full stop of my contributions to the business so that I can focus on untangling it."

But the problem with junior engineers doing refactoring/rewriting is that they end up creating a different mess of code and bugs.

Am I right 100% of the time about this? No. But I'd love to raise for engineers here that although the phrase technical debt carries a lot of emotional weight, it also brings a lot of baggage from all the times management has heard the phrase from someone that didn't understand the bigger picture. With a little bit of work you could understand the business implications and then end up completely avoiding the use of the phrase.

> And then the other is that a huge amount of technical debt is just code that a more senior engineer would refactor as they went.

I'm quite senior, and I don't think this is true.

Refactoring "as you go" is a no-go for a few reasons: (1) Roadmap (read: business) goals need to be reached and unless technical debt is a part of roadmap discussions -- it's usually not, because business people don't "get" code -- then it will naturally be of lower priority. (2) On a technical level, technical debt changes/re-arch should not live in feature branches. On a deeper technical level, technical debt pull requests and reviews will also usually incur a higher level of cognitive load (and thus will take longer to merge). Conflating technical debt and new features is a Bad Idea™. (3) This final point is a bit "20-20 hindsight," but with proper roadmapping and architecting (and unless you're building an MVP or proof of concept at your two-person startup), technical debt should basically not exist. Technical leaders should always push for high-quality code (even if product timelines are stretched) and technical debt is, generally speaking, a byproduct of bad technical leadership.

I think you just fell into the trap op was referring to. Tech debt is used too broadly and has become a keyword without any real understanding.

No matter what you do, there will be tech debt of some sort. There's no perfect solution when you unpack a problem.

My metaphor for this is a stocked product warehouse. Organizing the shelves of product and keeping it that way is obviously the right solution so people don't have to hunt when pulling. That requires someone to maintain (typically the floor manager). That's not a random schmuck you can hire for min wage to do (oddly enough). You now have a form of debt requiring you to vet a proper individual to do that job if it's opened up (time spent finding someone in a smaller pool than typical temp unskilled labor). Next you have pulling with forklifts. You're supposed to hire certified folks for the job, thus a skilled worker. They cost a bit more than people who can only toss crates. But the trade off is moving pulled pallets faster. Thus you have a higher per worker cost, but higher efficiency.

Tech debt is what you're willing to trade off. Highly abstracted code in an esoteric language, while super efficient and well maintained, requires vetting the right devs from a very small labor pool. You lose too many of those, your business can suffer greatly if you can't fill those spots quickly. Does the project require that? Can you get away with common, off-the-shelf libraries in a mass market language and structure the code where you can take low skilled devs and ramp them up quickly or ideally need no ramp up to continue working? Over engineering, no matter how well done and beautiful, is not always the answer. Like, you don't want a Formula 1 designer making everyday sedans.

Beating that dead horse, tech debt is what you're willing to put up with depending on your circumstances. Can it roadblock you? Sure. But that's just piss poor planning or forgetting the original purpose. A lot of times when tech debt is brought up, it's like the person is leaving the possibility open to place a Falcon 9 rocket on their car. It may not be that the software architecture is wrong, sometimes the idea is just stupid.

I am really enjoying the way you are expressing this argument, and I think you are mostly right.

I’d like to add a caveat though. At least the places I’ve worked, tech debt is a polite and convenient way to provide political cover to all individuals.

“We made some dumb choices and I don’t want to point fingers but the more we let this go the more exponentially time consuming to maintain this will become,” and/or “some of our team can’t figure out how any of this works and we don’t want to say those people are _dumb_ exactly, but working this piece of the application could improve our ability to make modifications and generally improve the competency of our team” or maybe even “this star engineer gets obsessed with this code-shape and we don’t want to lose her, so we should let her write it ‘the right way’ to keep her happy.”

These are examples where “tech debt” is being used not necessarily for lack of understanding about the problem, but rather as a shorthand to discuss a *mostly already understood problem* while providing political cover to everyone involved. Maybe management insisted on some dumb things, maybe engineering did some dumb things, maybe if we don’t do this dumb over-engineering task we’re just going to be distracted by it for the next year.

When the “tech debt” is NOT a clear trade off being discussed using those words as a shorthand, I think it’s that there’s a lot of little issues that can come back to haunt us in a wide variety of ways that we already know how to fix (eg security, documentation, efficiency of a core process could sink us if we have unexpectedly good success, whatever, and general “risk reduction” activity that is a million little things that are too plentiful to identify in isolation with any amount of efficiency). Calling it “tech debt” in this scenario is an admission that we don’t understand the risk profile of our own code well enough to make good choices, so we’d like to spend some time organizing the code so that we 1) can understand it well enough to maintain and modify and 2) better understand the potential risks of our code (and potentially mitigate some risks along the way).

It’s sort of like, you’ve got a bunch of floor managers, but you had them all just tossing product into the warehouse as fast as possible rather than doing it in an organized way, and now they’d like to maybe do their job so that we can actually have an inventory and get what we need when we need it. Yeah, you hired floor managers, but it doesn’t magically get organized if you reprioritize them to shove everything in in whatever way is fastest. Ie, it was a dumb idea to do it that way, and maybe some of that dumb is on upper management for telling the floor managers to “just shove it in ASAP” and maybe some of that dumb is on the floor managers for not confronting upper management about it before starting, but now that we’re here, let’s just agree that there was some dumb that happened.

Disagree on point 3. Technical debt is not bad code. It's expedient decisions made to achieve short term business value at a long term cost.

On point 2, you're right they should be separate, but I think the solution is, refactor to make your solution easy, then build the solution.

Point 1, if you have a significant rearchitecture to mention on your roadmap, do so. But don't make the mistake of thinking business leadership cares. You'll get farther with strong technical leaders who can push for appropriate timelines, not haggle over technical tradeoffs with executives.

Parent post is terrific and makes a lot of great points.

I think technical debt can be either. I agree with you that it isn't by definition bad code, but it can also be bad code. Let's also not assume an ideal case where every bit of tech debt was the best possible code that could be written under the time constraints of the time... with absolutely no influence due to lack of skill, ownership, motivation, etc. of the author.

Tech debt is a luxury that only a measure of success affords.

If you work in enterprise and find bad code, that's just plain old vanilla bad engineering and we should carve that out of the definition.

If you work in early stage or barely mature then the luxury of defining and scoping tech debt is something that should be treated as a necessary step to maturity. After all, whatever you think of the code or design decisions you've found, it has clearly justified itself just on basis that the company is successful enough for you to be there finding it.

I think this is more of a survivorship bias thing. If a bunch of companies write bad code and a few of them survive, so you work at those companies and see their bad code, it isn't necessarily true that permitting bad code enabled them to succeed. It may be that writing better code would've made them even more likely to succeed, but the advocates of better code were not able to make that case to stressed out management.

I've almost never seen a situation where intentionally bad code has helped even in the short term. Maybe bad code was the best that a particular engineer could do, and nobody else was available to do it, so that made it acceptable. But any engineer writing code less well than they are able to has never led to anything good in my experience.

Even in 45 minute interviews, if I take shortcuts, I usually end up creating more problems for myself than if I had just written code the right way in the first place.

I’ve worked in a startup where they were unable to deliver integrations in time because previous work had been done “quick and dirty”, and significant amounts of money were lost as a result.

Don’t kid yourself. That is technical debt, it was our fault.

Technical debt isn’t “bad code” it’s what stops you from shipping features; and it’s more important for startups to manage, not less.

What you're referring to is very early stage stuff, where the total LoC is so trivial that a few people can just remember all the details of it in their heads, so it doesn’t matter how messy it is.

...true, but there are finite physical limits to humans; you can only keep doing that for so long before you flub it up, and lose a lot of money for your employer.

This is one of those moments that people just flat out fail to have intuition about.

Same process, same results right?


Same process with diminishing returns <— reality of software engineering over time.

So specifically:

> After all, whatever you think of the code or design decisions you've found, it has clearly justified itself just on basis that the company is successful enough for you to be there finding it.

This is wrong.

It was the correct decision/code at the time for the scale at the time, which, since you are there looking at it, it is probably no longer suitable for.

Not intuitive is it?

That's why I always thought analogies comparing it to physical engineering were never helpful.

Software, unlike physical stuff, doesn't rot/depreciate; the world instead changes around it while it stays constant. Business context and requirements change around it to the point where the software itself isn't as useful anymore and it has to be contorted to do something it was never designed to do over time. That's it.

In tech, where the world can move fast, this is particularly true.

Software suffers from bit-rot, and the value of the asset does depreciate in accounting practices.

My experience with startups has been that "tech debt" is more the result of people not knowing what they were doing. Not so much "we haven't quite found market fit and need to try different segments" but technical incompetence. This is because startups tend to be cash-strapped, so either you have non-technical people basically learning to code on the job or they hire very junior developers.

For example one bit of advice I've heard is that things like unit tests and a CI pipeline are a luxury for early-stage startups. An experienced developer would probably spend a day or two setting things up with, say, Heroku and a Gitlab pipeline, and at least write some unit tests for core business functionality and a few tripwire tests for the HTTP endpoints. They've done the same thing a few times and could probably just pull in a starter template for 80% of that work. Very little time spent with zero impact on velocity but lays a good foundation for later.
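A minimal sketch of what a "tripwire" test for HTTP endpoints might look like, assuming a Python WSGI app. The `app`, its routes, and the helper names are hypothetical stand-ins, not anything from a real project; the only point is that each endpoint gets hit once and must answer 200.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical stand-in for the startup's web app (plain WSGI)."""
    routes = {"/health": b"ok", "/api/orders": b"[]"}
    body = routes.get(environ["PATH_INFO"])
    status = "200 OK" if body is not None else "404 Not Found"
    start_response(status, [("Content-Type", "text/plain")])
    return [body if body is not None else b"not found"]


def status_of(wsgi_app, path):
    """Call the WSGI app in-process and return the numeric status code."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a default GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = int(status.split()[0])

    # Consume the response body so the app runs to completion.
    b"".join(wsgi_app(environ, start_response))
    return captured["status"]


def tripwire(wsgi_app, paths):
    """Return the paths that did not answer 200; an empty list means all good."""
    return [p for p in paths if status_of(wsgi_app, p) != 200]
```

Wired into a CI job, something like `assert tripwire(app, ["/health", "/api/orders"]) == []` is enough to catch an endpoint that stops responding, without asserting anything about business logic.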

Another junior dev tell is getting power hungry with the freedom of a greenfield project. For example, rather than going with the boring but stable PHP framework they used at the last company, they want to do all the cool things like serverless or JS framework du jour or Kubernetes or GraphQL, depending on whatever Medium or Hacker News post they last read. An experienced dev will use "boring" technology they are familiar with as much as possible, and adopt or write new tech when there is an explicit need. For example, if your startup's secret sauce is an ML algorithm connected to an HTTP API and web app, you want to focus on that ML algorithm and tweaking and optimizing that, not figuring out how to get authentication to work with your serverless/graphql/svelte cool stack.

I'm also a senior engineer and I think you're wrong. A lot of technical debt you can just refactor as you go, but a lot of developers don't do it. Whether that's skill, or just not seeing it, time pressure (usually completely false, and self imposed, often caused by under-estimating and over-promising), or even just laziness, I don't know.

I don't know what it is either but some people have refactor itch and others don't.

Making code changes means one takes on the risk of breaking things. I think the people who don't break stuff are the ones who continue to refactor. The ones who don't either are scared of breaking stuff or had a bad experience with breaking stuff.

I suspect the people who refactor have what can be described as perfectionism or obsessive-compulsive tendencies.

I'm in that bucket. I can see certain code which is too complex or didn't handle a requirement well and I want to get it to be stated more simply.

> A lot of technical debt you can just refactor as you go

You either didn't fully read my post, or didn't understand it. It's not that you can't, it's that it's a bad idea. I would instantly reject any pull request that wasn't compartmentalized to the issue at hand. It's an organizational nightmare to handle both features as well as refactoring in the same PR.

But obviously to each their own.

In my experience knowing when to refactor and when to not is a main distinguishing factor for a senior vs junior role but seniors generally account for that in their planning and knowing when to push some additional time to prune the code. Similarly identifying when over-engineering is occurring is the flip side of that coin. One is preventative and one is reactive but they are both important.

That's why you need good engineering managers. If you can convince people that something is important then all of a sudden the business finds a way to make it happen.

What would happen if you said you need another developer to hit that delivery goal. Good chance you'd get it, right? A good EM should be able to articulate the value from non-functional work and refactoring.

Depends how big the debt is. Did you borrow money to buy shoes, bike, car or house.

The former might be a big deal for a junior but can easily be paid off with pocket money by a senior (refactored on the go), while the latter more naturally should be part of the roadmap and business decisions, even for a senior.

To clarify the different levels: the first could be something as trivial as not following naming standards, missing unit tests, or using global variables instead of DI, and the bigger debts would be the whole stack depending on some vendor cloud lock-in.
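To make the small end of that scale concrete, here is a rough sketch of the global-variable-versus-DI case in Python. The function names and the `CLOCK` global are made up for illustration.

```python
import time

# Small debt: the function reaches for a module-level global,
# so a test can only control time by patching the module.
CLOCK = time.time


def is_expired_global(deadline):
    return CLOCK() > deadline


# Same logic with the dependency injected as a parameter (DI).
# Production callers use the default; tests pass a fake clock.
def is_expired(deadline, now=time.time):
    return now() > deadline
```

In a test, `is_expired(10, now=lambda: 11)` needs no patching at all, which is the kind of pocket-money fix that can ride along with normal work.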

I've found that tech debt (I also find the term to be a bit too "fuzzy," but there's really no better term that comes to mind) can be non-code stuff, like documentation. It can also be things like unit tests.

Both of those can be anchors that can assure quality and coherence, but they can also become anchors that prevent the ship from moving.

Documentation is a huge pain for me. I hate writing it (but I'm actually fairly good at it). Code documentation is something that I've inculcated into my personal process[0], but "product docs" can be a whole different story (not just things like PDFs, but also support sites, etc.).

The biggest problem with both documentation and unit test suites, is that they can easily become "concrete galoshes"[1], that add a great deal of work to a lot of product modification and improvement strategies.

Can we call something that prevents or makes extension and modification more difficult "tech debt"? Maybe, maybe not.

It's definitely possible to write unit tests in a fashion that helps to modify codebases, but that takes a fair bit of planning and care; not something that our industry is known for, when we are in a "ship crunch" (I have known many organizations that deliberately try to maintain a constant state of "ship crunch," because their managers believe it to be more productive). I tend to prefer test harnesses, over unit tests, because of this[2]. In fact, just this morning, I was lamenting that I need to make some changes to a heavily unit-tested codebase that may cause issues with the unit tests. The fixes need to happen, even if they pooch the tests, but I will need to go back and re-run the tests, and fix any that erroneously fail. It's a really big job, for this codebase. My testing code is generally much larger than the code under test.

[0] https://littlegreenviper.com/miscellany/leaving-a-legacy/

[1] https://littlegreenviper.com/miscellany/concrete-galoshes/

[2] https://littlegreenviper.com/miscellany/testing-harness-vs-u...

Agree. It's just tradeoffs.

Some projects don't warrant over engineering because the requirements are often not clear in the beginning. There's a cost in terms of resources and opportunity which often doesn't make sense to incur at an earlier stage of product development.

As the first few versions play out in the market, the requirements are better understood, the product starts fitting the market better. Obviously, a lot of the earlier choices don't seem optimal for what the product becomes at a later time. Yes, theoretically, "it could have been done better", but nobody could have known for sure. It should simply be refactored so that the design of the software better fits what it is doing at a given time... and this should be an expected way to develop products.

I think "technical debt" is often the wrong term for what is going on. I'm sure there is a better analogy in terms of business and finance to what is happening... but I'm not well-versed in finance to say what it is.

I work in a huge enterprise. We have incredibly customized software and stacks that have not changed much for 30 years, because they did not need to.

Now the people who wrote those stacks and who understand them are retiring/quitting. Kids coming out of school don't want to learn these systems, nor do people off the streets. You can only pay people to come out of retirement so many times to keep the plant running. This is above and beyond mainframes, and is intertwined deep in the code that powers every single application that runs the plant today.

We can't run off the shelf software on-prem, a huge level of customization is needed to bring it in.

We cannot pivot quickly to new things or support new languages.

We really struggle to add new features/releases and add new software to drive revenue. The IT overhead that just goes into keeping the plant running every day is astounding.

This is what I think of when I hear technical debt.

Going down this road did give us advantages for a long time, but now we're in an enormous crisis. It's not an insurmountable challenge, but I would be surprised if there aren't a lot of large companies who are brought down by their technical debt as faster moving competitors move around them. I certainly feel that unless we get our act together, we will be disrupted.

I found this response surprising.

Code that has worked for 30 years is more technical debt than the ability to support 'new languages' or 'pivot' the code? Maybe I am an old fart, but the opposite seems right to me.

Large code bases are difficult to add functionality to.

A code base that is easy to pivot and switch languages sounds more like a nightmare. Isn't it more likely that your new, zeitgeist-capable software would torpedo development in far less than 30 years. As I understand "new programmers", the 'reinventing the wheel' speed is measured in a few years not decades now.

Or did I misunderstand you?

I think I represented it poorly. It's more than just code in one system. It's systems built upon systems built upon systems. It encompasses our network, our software deployment stack, our proprietary extensions to standards and much much more. Unknown dependencies on unknown dependencies on unknown dependencies (and it's not like we're slacking on trying to map that/keep the asset inventory up to date).

It's basically paralyzing. It's so hard to get a release done, add capacity, or add new features for our lines of business (we have dozens!).

How do the execs plan to handle the problem?

How aware are the execs about the situation? Or is it primarily the engineers, maybe technical managers, who can see the problem currently?

(Is it ok if I ask what's your job role?)

Job Role/Further details are risky to discuss because this forum is read by my colleagues and likely the nerdier execs. I could leave it at I am someone very senior and actively involved in trying to tackle our problem, so I see the efforts and the challenge first hand.

I can tell you that execs are very aware of the problem. Higher ups have spoken about it at townhalls, though they use softer language than I do. Since 2017 a lot of modernization attempts have been made (go cloud, use standards, use off the shelf software as much as possible), with very little to show for it so far.

Obviously it isn't all just tech that causes this, culture has a big part to do with it.

It feels almost like the scenario in The Phoenix Project; it'd be funny if it wasn't so serious.

Resonates with some big corps I've worked with in the past, especially in industries where technology (rather than old business models) is becoming the primary channel of sales. Technology companies with technology management often disrupt the companies managed by the old way of thinking (e.g. finance, insurance, etc).

Some of the things you mention are red flags though, at least to me, having worked in such places before; they normally make me question the company's management. The biggest one I saw at a previous place I worked, IMO, is buying rather than building as much as possible off the shelf. How many successful tech-first companies that have disrupted an industry actually use that model for their core platform? The companies I've seen get away with it do so until they face competitive pressure, or because technology isn't their primary advantage. In fact, as you get to a certain size it can make sense to build your own, reduce your vendor count and ongoing costs, and take advantage of your economies of scale. How many big tech firms rewrite DBs, parts, and dev-ops tools, or are at least open to it when the advantage is there? The successful tech-first companies often do, even if they were once things like book stores where it "wasn't their core business". They often even open source their components to support their business and give their tech people more cred, allowing them to attract even more tech talent. The most successful/nice-to-work-at tech places usually err toward building when it concerns their platform, with some pragmatism thrown in to use modern tooling from elsewhere if required, normally open source, but bought if it offers nothing of differentiation (e.g. cloud products, databases, etc).

Cloud is just a potential enabler IMO; you still need the culture to execute. A big corporation has a lot of interacting requirements, and needs the long-term flexibility to change them without being on the hook in a vendor's backlog competing with other firms. It also potentially leaks your roadmap to other competitors. Common business software (e.g. document writing, email, chat, etc) is the exception. If you're a big corp you're usually in a monopoly/oligopoly position: there aren't too many people doing what you do at the scale you do, and most vendor solutions are really just "outsourced builds", where the long-term flexibility deteriorates as the vendor pivots/changes.

That is definitely not technical debt.

Technical debt is the idea that, just like you'd take a loan for your business to grow faster, you can take shortcuts while coding your first versions (e.g. not having adequate test coverage, not making it modular or extensible, etc.) to get the product out or meet a deadline. Just like your business loan, you get into technical debt knowing that you will eventually pay it back (i.e. write additional tests, refactor, etc.).

Not every software problem is technical debt.

While your problem seems much larger, the same problem happens in smaller scales in every software company when engineers quit and leave a complicated codebase behind for new hires to take over. You should be looking specifically for people who have experience working with legacy code in your next hiring round.

I don't know that it has any actionable information for you (or anyone, really...), but what you describe sounds a lot like the situation in The Phoenix Project.

That is definitely not lost on me. Life imitates art.

Agree - I came here to say the same thing.

Most engineers have an eye for quality and maintainability, which is great. However, what they sometimes fail to see is that in a commercial reality, rapidly iterating on features to gain first-mover advantage is "worth" the piles of dirty dishes incurred, even if maintaining it is a huge headache later on.

Often it's an existential threat - you either move fast (and often do the 95% solution to ship) or you are gone.

I find quite often engineers without the big picture/reality of the business can often forget that everything is an engineering tradeoff and maybe copy+paste and a bit of spaghetti is your friend today because your firm just expanded.

All of the previous analogies about libraries and dishes need an added layer: if they shut down for a week at the start to re-organise and clean, perhaps they would have gone bankrupt and the stupid solution may have actually been the best one all along.

Most engineers can tell the difference between things they wish were better and things constantly in the way of things the business wants.

> wrong term for what's going on.

Exactly. It's such a loaded term and it can almost always be replaced by saying something specific.

I didn't know it can be a loaded term (for some people). What are some things to say instead? "Past mistakes" maybe? In some cases.

> The phrase technical debt is a red flag itself and probably should never be used.

I think the counter argument is that not everything needs to get turned into a negotiation around justifying business value. Each developer should just by default get 25% of the year to improve the code they're responsible for. If a developer wants to spend 6 or 9 months one year doing some big refactor then that's one thing, but it's not a good work environment if there needs to be weeks of meetings every time someone wants to fix deprecations so that they can upgrade some library or whatever.

Oh for sure. I believe this 100%. A lot of the root cause of technical debt is inability to prioritize. Every stakeholder in the company wants to hear that their project is being worked on, so everyone is working in parallel and thus there are tons of meetings for coordination and status. If everyone would just slow down, they would actually move a lot faster and could afford to bake in slack time for craftsmanship.

Thank you for this comment. It matches quite closely with my thinking on the issue.

A generic term like “tech debt” is lazy and does not capture anything about the business value or decision making process that led to the code or would lead you to want to replace it.

For example in many organizations you make a completely justified business decision to put less time or more junior engineers on a project that isn’t that important. This isn’t tech debt, it’s a conscious decision made by completely rational people.

On the flip side, if that project that was of lesser importance becomes important again you can make another completely rational decision to go in and clean it up to make it faster/more reliable/simpler or whatever the goal may be.

Debt you can afford is still called debt.

> So, if the person who is concerned about technical debt is senior enough they will have some ability to see the business tradeoff and thus can completely avoid the phrase. They'd say instead something like "I was thinking that if we changed X we would get value Y."

In my (somewhat limited) experience the kind of value that you get from reducing technical debt is not easy to explain to the business side, because it doesn't immediately translate into new features.

> And then the other issue is that a huge amount of technical debt is just code that a more senior engineer would refactor as they went.

A limited amount of spaghetti can be untangled during the normal development process, but I don't think significant changes in design can be done while working on features. Once you get to that point, inevitably you will have to set time aside specifically for dealing with tech debt.

Not to mention that the kind of organizations that accumulate tech debt rapidly tend to also complain if you spend more than the minimal amount of time necessary on a feature. "It took you, a mid-level dev, 3 days to finish a feature that a starter could complete in a day by copy-pasting existing code with some SO stuff? You must not be too good at your job."

> In my (somewhat limited) experience the kind of value that you get from reducing technical debt is not easy to explain to the business side, because it doesn't immediately translate into new features.

This is because, regardless of whether people agree on its definition, most of the things that fall under "technical debt" incur indirect or delayed costs for the business.

The managers have, at some point, a choice: We have $X. We can either spend some percentage on churning out products that directly and immediately make us money. Or we can divide it between that, and maintenance activities.

Choosing to neglect (or underfund) the maintenance activities doesn't cost the business anything for some period of time (the delay). And performing them does not (within the period of that delay) bring in any revenue. It is only a cost to them, it has no visible upside within their time horizon. By the time the costs start catching up, it's death by a thousand cuts. Usually they'll go from 100/0 dev/maintenance to 99/1, then 98/2, and so on. Then one day it flips from 80/20 to 20/80 and they realized they fucked up. But by then they've been promoted, so who cares?

You have to convince them, and perhaps they'll only learn by being burned (but I doubt it), that they should extend their time horizon by a month or a year so they can see the consequence of this decision. I've never succeeded in this.

At my previous job we were experiencing a massive number of retirements. Literally centuries of experience leaving every month (aggregated across all teams). No hiring process, without comprehensive training, could possibly keep up with this loss. But the retirements were always (for each team) just far enough over the horizon that they didn't realize the issues until after the retirement and the new hire floundered (due to lack of training/mentorship).

> Choosing to neglect (or underfund) the maintenance activities doesn't cost the business anything for some period of time (the delay). And performing them does not (within the period of that delay) bring in any revenue. It is only a cost to them, it has no visible upside within their time horizon. By the time the costs start catching up, it's death by a thousand cuts. Usually they'll go from 100/0 dev/maintenance to 99/1, then 98/2, and so on. Then one day it flips from 80/20 to 20/80 and they realized they fucked up. But by then they've been promoted, so who cares?

Pretty much what I noticed as well.

> You have to convince them

This is exactly the part that I hate. Good practices, code quality, design etc. are not a hobby of mine. The business is not doing me a personal favor by allowing me to practice these ideas.

I actually quit my last job because I couldn't convince them. You see enough people charging towards and off a cliff, eventually you get tired of it.

I agree with your assessment. Using the language "tech debt" makes it appear that there is a correct way to design something and that if you're not doing it that way, you will need to pay a debt on it, now or later.

The reality is it's all trade-offs and sometimes those debts never need to be paid at all. "Bad code" that is reliable within the constraints it is exercised and that is isolated may appear to have tech debt, as in "if we changed the design, we could support XYZ feature", but if XYZ feature never materializes, it's a waste of time to touch it.

Maybe we can come up with more appropriate language for this? Something akin to "inflexible" code. Calling something "debt" assigns too much subjective value to it while making it seem objective.

It seems you think "debt" is a bad word. To me it's a neutral word

I think for the general public "debt" has negative connotations. It's something that they feel must be repaid later. It definitely can be a financial tool and can be neutral, but I don't think most people think of it that way.

I 100% agree, the concept of Technical Debt doesn't exist in any other profession. I think that's a clue.

In most cases, the concept doesn't really clarify anything, but people don't seem to know how to translate 'our' problems into something that is actually meaningful for the other people in the room.

The term 'technical debt' just obfuscates what is really going on. It doesn't provide any real insight.

You may like the blog post I linked in another post here.

It most definitely does exist. They just don't call it that. Every organization that's responsible for a system that produces their revenue and requires upkeep runs into this, they just either don't have a specific name or it's not a name we (programmers) see.

See: http://web.mit.edu/nelsonr/www/Repenning%3DSterman_CMR_su01_... [pdf] for an example of an analysis of a technical debt analog in other industries.

Thank you, this is maybe my point: don't use this 'technical debt' term that everybody and their horse abuses, but use those concepts that are also well-understood in other industries.

Does that sound acceptable?

Technical Debt is just another name for those concepts that are well understood in industries, but renamed to be directly applicable to the type of work.

And it should be noted that "technical debt" appears to have been coined before "capability trap". And many other industries lack a proper term to describe the same concept, even if they possess it.

Strongly agree. It's just trade-offs. Tying debt relief to demonstrable value is good. Refactoring without that is prone to vanity stuff, Chesterton's fence, second system effect, etc.

Of course it shouldn't be used. The company I worked with has relabeled 'Technical Debt' as 'Technical Wealth Investment'. Not sure how it helps, but somehow still nothing gets done.

But isn't the term "Technical Debt" meant to help business people understand an issue that, while real and inherently grasped by people who write software, is not necessarily grasped by others in positions to make resource allocation decisions? By calling it "debt" it is put into financial terms everyone understands. If it is a red flag, it is in the sense that people who don't understand how software is built and maintained are making decisions about it. That is the case in a lot of environments.

There is an underlying assumption that business people are unable to understand the real issue. That seems weird to me. Also, technical debt seems like an escape for engineers to do things they want to do but don't feel like explaining why.

Our experience is with freelancers who signal velocity by making 40% of the stuff happen quickly and then say "we have to stop and refactor, it's too messy". This on a website they've been working on for a couple of months.

I disagree, but I see technical debt differently than it's usually used. When you buy a house, or start a company, you commonly borrow resources to have the thing you want today instead of years from now. I think technical debt is exactly the same: you're borrowing future work so that you can launch your product now instead of far in the future after you have it absolutely perfect.

It's a little wild to imagine buying a house without incurring some debt. This is normal and OK. You don't go crazy with it, though, like getting a loan for 20% over the mortgage with a 12% interest rate. Same with technical debt: cut a few corners if it lets you go to market more quickly and start earning income before your competitors can launch, but not at the cost of utter spaghetti code written in MUMPS or something. Your class hierarchy isn't perfect? Fine - your customers aren't paying you for that anyway.

So go ahead and borrow. Just don't go so far in debt that paying it off becomes burdensome.

There are a lot of expert beginners that make a mess so big that it takes ages to unravel their ball of mud. Most of the time by the time people are talking about it, it's because it's way too big for anyone to ignore.

I had a notion that this can never get so big that you can't unravel it, but I think it's a matter of how much power the problem people have. I've seen one project that will be broken forever because, while they've gotten a couple of the biggest architectural astronauts away from it, one of the most prolific will never leave, and does not accept that just because you can do something doesn't mean you should do it.

There are three parts to a problem: why, what, how. "Technical debt" is the what. "Change X we would get value Y" is the why. (How to do it is left as an exercise to the reader.)

> although the phrase technical debt caries a lot of emotional weight, it also brings a lot of baggage from all the times management has heard the phrase from someone that didn't understand the bigger picture.

I think this is an excellent point, but we still need a shorthand way to talk about it.

> They'd say instead something like "I was thinking that if we changed X we would get value Y."

The problem I have with this is it's very hard to make a compelling argument, and without one, nothing gets done. A lot of the time technical debt is tantamount to a slow death by a thousand cuts, and "value Y" is more like "save some unknown but non-zero amount of time in the future".

One example is when you have (tens of) thousands of lines of code that don't have unit tests, have high coupling, and no or outdated documentation. Maybe there's some known bugs, but they're of low importance (in other words: fixing them is low value to the business). That code works today, and if not modified it'll continue to work. If it does eventually get modified, then there are no guarantees: maybe the modifications won't break anything, or maybe they'll cause dozens of new bugs.

If we refactor today and add lots of tests, it'll probably mean that future modification causes fewer bugs (saving time/money), but we actually can't guarantee anything -- aside from the fact that the refactor will cost a bunch of time, now.

> a huge amount of technical debt is just code that a more senior engineer would refactor as they went

In a case like I described above you can sometimes refactor as you go but it's been my experience this is often a deep rabbit hole. Sometimes your refactor requires touching a dozen layers and related bits and before you know it it can easily be several orders of magnitude more work and more risk vs the "quick fix".

Another thing I often run into that we describe as "technical debt" is about foundational designs that are (currently) wrong. This includes things like database schema and core structure of the application. I say "currently" because some were not clearly wrong at the time they were made, but it has become clear since then and yet more stuff was still built on top.

As an example, I'm working on an application that was built to run on a single instance but we'd like to be able to scale horizontally. One of the problems is the application uses what is effectively an in-process cache for a lot of the database objects. The objects are not pure DTO-style objects and so the cache can't just be moved to external (Redis or something). The schema is not that well designed, so entirely removing the cache would almost certainly kill performance, and might mean refactoring half the code anyway. A layer up, several parts of the app (including UI forms and some background processing) are built assuming they have a consistent view of things, and ignoring these issues causes all kinds of strange consistency problems as data is silently overwritten/changed.
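A toy sketch of why an in-process cache gets in the way of horizontal scaling, in Python. The `Instance` class and the dict standing in for the database are hypothetical; a real app would have far more going on, but the stale-read mechanism is the same.

```python
class Instance:
    """One app process with its own in-process cache (hypothetical)."""

    def __init__(self, db):
        self.db = db      # shared store, stands in for the database
        self.cache = {}   # private to this process

    def read(self, key):
        # Serve from the local cache if present, else load from the database.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value  # only this instance's cache is updated


db = {"price": 10}
a, b = Instance(db), Instance(db)

b.read("price")          # instance b caches the value 10
a.write("price", 12)     # instance a updates the database and its own cache
stale = b.read("price")  # instance b still answers from its old cache
```

With one instance this never shows up. With two, `b` keeps serving the old value until something invalidates its cache, and anything `b` writes back based on that old value is the silent overwrite described above.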

That type of problem is not at all fixable by "refactoring as we go". In some sense I'd love to do a totally clean rewrite, but that just isn't going to happen largely because the business has no appetite for that. In the past I've done the massive rewrite thing before, and going a couple years without delivering anything from it is crazy stressful and no one is happy.

Instead, we're working on rewriting major (but approachable) chunks of this one at a time, while retaining compatibility with other parts of the system and still trying to find things that provide customer/business value as part of it (either fixing long-standing bugs or introducing new features). It's hard, and overall is more work than just rewriting from scratch, but it means we can deliver value as we go. Maybe this is what you mean by "refactor as we go" (or maybe that's just how I should describe what we're doing to higher-ups), but to me we're basically starting from zero and, at best, pulling small chunks of the old code in, so it definitely doesn't feel like "refactor".

(I know this post is explicitly about not calling this "technical debt" but I can't help but think this approach is analogous to a payment plan. :) )

Fully agree!

> When we notified management they replied, “We’re in a build the plane while flying situation, how can we get this out now without doing a big rewrite?”

Even worse I tried to catch this before it happened at one company and was stopped. I was transferring to a new project and was the only person who had any experience with a system like they were building at all and was not consulted at all for the initial design. They had made a fatal design mistake that I knew would sink the project and I raised my concerns. Turns out a "technical" manager had made the decision and he wouldn't budge on it for anything. All of management then attempted repeatedly to gaslight the entire team claiming that they had "decades" of experience with the tech and that it was the correct decision despite being given solid references about the problem from many reliable sources outside the team, and continued to say things that directly contradicted the sources. The project hadn't even started yet so the entire situation could have been avoided.

Nearly every competent team member left within a few months. Before leaving I witnessed new hires do everything from pushing AWS creds to git to pushing over a GB of docker `.tar` files to a repo... it was bad.

Seems just a few middle managers can drown a project/company in technical debt pretty quickly.

Edit: changed "every competent team member" to "nearly every"; not sure why some are still there.

I recently went through a similar experience, but I outlasted the manager type somehow. Respectfully dissenting in every phone call, email thread, and chat eventually worked. Some people with better office politicking skills have told me I don't know when to shut up, but it's worked for me my whole career, and I don't know how to be any other way.

What was the fatal design mistake?

And what was the thing you were building?

Wow, that "manager" is really bad, since that "manager" does not accept any suggestions (which equals having a big ego), and it seems that there was no backwards compatibility to be worried about. Why do I feel that "manager" is trying to micromanage everything?

Otherwise I would definitely not say they tried to micromanage everything. Maybe they thought they had a good understanding of that particular problem, I am not sure, but the solution had already been pitched to the customer before I transferred to that project. It likely had more to do with politics than his ego. That it was never going to work would be obvious to anyone who had worked in that space, though.

Can you give a high-level description of the error without doxxing yourself? Sounds interesting.

Technical debt is best understood exactly as the term suggests, like a loan. When you keep incurring more and more loans, you have to pay interest. At some point, the interest payments will get so high that it makes sense to restructure the debt somehow. Loans and debt are good because they help you grow when you need it, as long as the interest payments make sense.

The same goes with code. When you let bad code enter a project, you do it with the understanding that you will need to pay some interest payments on that for the duration that the code is in the repo. Some technical debt is a good trade off between agility and code quality. Saying you should never have technical debt is the mark of an immature engineer.

But if you keep piling bad code into a repo, you will have to keep paying interest on more and more bad code. At some point, there is so much bad code that the interest payments are too high even for a senior manager to deny.

I disagree that you should refactor code "so that the entire team can understand the code". Teams turn over much more frequently than a code base should. If you need to keep rewriting every 3-5 years, your senior engineers are probably very inexperienced and architected the code poorly.

This is exactly how I look at technical debt, yet it seems so many people miss the appropriate financial metaphor, which seems obvious given debt is in the name. It seems popular to conflate technical debt with bad engineering, but appropriately pushing work off to the future is actually very thoughtful engineering. Technical debt is a tool that can be applied (just like financial debt) to work towards an end goal when you otherwise don't have the means to pay the price upfront.

Getting a startup off the ground requires an excellent understanding of how much technical debt you can incur and realistic estimates about your ability to pay it off as your team grows.

This blog post we're all commenting on is great precisely because it links Ward Cunningham's talk. His understanding is precisely this financial metaphor.

I have considered giving a talk, “When Technical Debt was a Good Thing”. Like you say, it gets mis-equated with bad engineering but originally the point was that technical debt was a good thing, like, let's have some technical debt! This is gonna be great!

If you’re looking at 1980s attitudes on software development, there was a strong focus on getting the engineering right, we're gonna be like generals and we're gonna have a vague direction of success and then our lieutenants will reify that into a concrete plan of success and then the privates will go off into the trenches to, here the metaphor finally breaks, build the application: and then we'll evaluate their performance on fidelity to the design that the lieutenants gave and we'll evaluate the lieutenants on how well that design matches the goals the generals set out to do. Very hierarchical, design-first.

And the point of this metaphor was, “No, stop. Stop with all the BS. Get those privates building something right now. Doesn’t matter if it’s not the thing the lieutenant would have designed. Let’s intentionally build the wrong thing.”

And it’s like, shock and horror, why would you ever want to do that? Then comes the metaphor. “It’s like spending money you don’t have, people act like it’s impossible or perhaps like it’s immoral—it’s neither. Credit systems exist. Debt exists. We’re just lifting that notion to the technical sphere.” Why would you want to do it? Same reason you take on any other debt: you think that you can outrun the interest. I take out a car loan because I think that the car will help me earn enough to cover the loan repayment amounts and save me time and/or money.

In this case, because we don’t have the final design we’ll only conform 90% to what we’re supposed to be producing, say. That 10% loss is the interest. We pay it because we think we can cover it later.

Very much, Agile is an approach to creating technical debt. The opponent of both is the same, a "waterfall" style where you know exactly what needs to be built before you build it.

Hmm, that's not what I was taught about Agile vs Waterfall.

It was about the number of design cycles –

(discussion with client / design / programming / bugfixing / client feedback)

– before the final version was out : only 1 for the most rigid waterfall, and usually weekly cycles for the most agile.

So it was not so much about any technical debt, but rather about the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted).

Incidentally, the most radical agile cycle would involve throwing out all the code on each cycle, which would of course throw away all the technical debt in the process (and hopefully prevent making the same technical debt mistakes twice).

Your understanding is correct. Waterfall (may it die a fiery death) is at an extreme that assumes it's possible to know everything upfront. What most people fail to understand (who like it) is that Waterfall fails to scale. Agile is (almost) at the other extreme.

Waterfall cannot scale beyond either well-understood domains (a web shop that uses RoR could probably apply Waterfall to a new project at this point, with over a decade of experience for each member of the team) or smaller systems (I'd say no bigger than around 100k SLOC of C code, beyond that your system usually becomes too complex) or short time lines (less than 3-6 months, which implies smaller systems). The assumption of complete and proper understanding will bite you on either a larger project or novel domain (novel to you or the world) or a longer timeline.

Agile (I'll take Scrum's 1-2 week cycle + extreme programming) aims to shorten the feedback loop so you know that not only have you built what you thought you built (verification) but also what your customer wanted you to build (validation) on a regular basis.

If you're wrong with Waterfall, you've wasted years. If you're wrong with Agile, you've wasted weeks.

And boy can you be wrong with Waterfall. I joined at the tail end of a 5-year Waterfall project, and they had fucked it up. Over 300 people had worked on it, and it was the wrong damn thing (it had half the features needed by the time it shipped, that half barely worked, and what did work was too slow to actually be useful). That's 300 people × 5 years = 1500 person-years; at, let's say, an average of 75 years per lifetime, that was 20 human lifetimes wasted. And let's not talk about the billions of dollars that went into it. (300 is probably a low estimate; once you get into the subs of the subs of the subcontractors it's hard to get a good tally.)

Ward talks about this occasionally, but generally there's a frustration among the original agile manifesto authors about how the principles they were after became distorted by the consultant industry that spawned behind the terms.

The financial metaphor above is definitely what Ward meant when coining the phrase technical debt, and it dovetails into agile in the sense one should be making conscious decisions about balancing upfront investment vs uncertain goals. Technical debt can be an opportunity in the sense of doing the learning on the cheap, but you can't run up that debt indefinitely.

> So it was not so much about any technical debt, but rather about the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted).

I cannot emphasize enough how much I want you to read, watch, listen to Ward Cunningham in his own words. :)

If you know what technical debt really is, then this sentence kind of sticks out like a sore thumb, as it is literally saying "So it was not so much about any technical debt, but rather about the technical debt."

Technical debt viewed this way is not just “hurdles in my code that I don’t like to jump over.” It is a subset of those hurdles and kludges and nicks and grumbles. What subset? Precisely the subset that corresponds to our lack of up-front understanding of what we were building when we were at first building it.

Put another way, every single problem we can solve well has at least one (maybe many) perspective from which that solution seems “relatively straightforward.” If you can specify that perspective in a way that a computer can understand it, you can think of that as “creating an algebra” for that problem and similar problems, or perhaps a “domain-specific language.” Technical debt is precisely the mismatch between the language you are programming in, and the domain you are programming for. Syntactically, this "language" might be some programming language like Java. But semantically, you get to define your own classes and methods and create your own little world inside that language, where things can be combined and chained together, and the richness and algebraic predictability of that little world informs the spaghettification and robustness of your code to implement the algorithm within that world.

Viewed this way, a lot of programming today spends absolutely no investment into that "little world" and just consists of writing FORTRAN. Just pages and pages and pages of FORTRAN. I mean, you're writing it in some other language, maybe Python, maybe Java: but you do not have any sort of little algebraic world that you phrase your problem in, you just say "fetch this input, fetch that input, begin subroutine with vaguely suggestive name to label the following steps, compare the two inputs, fetch other data as needed, perform these side-effects, modify these variables, return to the beginning of some loop." Very verbose individual commands being operated on individual data structures.

So I tell you "make sure that if I press this button a hundred times you don't start the job a hundred times, only one job per project name should be running at any given time, but if I press the button while a job is ongoing you should still probably trust that I made a mistake when I first pushed the button and now I want to abort if possible and retry with the latest version" and because you don't have a rich vocabulary for that you're stuck programming something in a "redis assembly language" or whatever,

    redis_job_flag = "job_id_for_" + project.name;
    current_job = redis.get(redis_job_flag);
    if (current_job) {
      redis_complete_flag = "job_is_complete_" + current_job;
      add_followup = redis.setnx(redis_complete_flag, "afterwards=run_again", 60*minutes);
      if (!add_followup) {
        followup = redis.get(redis_complete_flag);
        if (followup == "afterwards=run_again") {
          // someone else got there first
        } else {
          // race condition, it completed before we could
          // set the followup attempt, so we need to retry
          // this method to restart it
          return recurse();
        }
      } else {
        // ok we can trust that they will redo what they just did
      }
      // a job is already running, so there is nothing more to do here
      return;
    }
    // if we're here then we did not see any prior jobs.
    my_id = generate_random_id();
    try_to_setnx = redis.setnx(redis_job_flag, my_id, 60*minutes);
    if (!try_to_setnx) {
      // race condition, someone else got started before me.
      // better still set a follow-up just in case.
      return recurse();
    }
    // hooray! I am the unique one running the job.
    redis_complete_flag = "job_is_complete_" + my_id;
    looping = true;
    while (looping) {
      really_done = redis.setnx(redis_complete_flag, "complete", 60*minutes);
      if (really_done) {
        looping = false;
      } else {
        // let someone else tell us to rerun again
        // bug we ran into where someone made us rerun jobs
        // for an hour, we reset the timeout for this here
        redis.set(redis_job_flag, my_id, 60*minutes);
      }
    }

and then this requirement is restated for other buttons and that gets copy/pasted everywhere and other mutations get inserted in various places.

And I think what Ward would tell you is not "never write all those lines of code" but that this eventually needs a refactor so that it matches the business language,

    repeatableDebounce("project_jobs", project.name, doTheActualThing);
    // repeatableDebounce defined elsewhere, 
    // it is part of our "little world"
Basically, Ward wants you to start doing "aspect-oriented programming" because it turns out that is a felicitous perspective from which to view these requirements about needing to cache whether jobs are in-transit and if they are then to retry them and whatever else. Technical debt is the noise in the above implementation being pasted and tweaked across the entire codebase because we didn't have this perspective when we were starting.
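To make the "little world" idea concrete, here is a minimal in-process sketch of what a helper like `repeatableDebounce` might look like. Everything in it is an assumption for illustration: the class name `RepeatableDebouncer`, the `trigger` method, and the use of a `threading.Lock` with two sets in place of the Redis flags from the longer listing. It does not attempt the "abort if possible" part of the requirement; a press during a run only schedules one rerun after the current run finishes.

```python
import threading


class RepeatableDebouncer:
    """Hypothetical in-process stand-in for the Redis flags discussed above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()          # slots that currently have a job running
        self._rerun_requested = set()  # slots pressed again while running

    def trigger(self, namespace, key, job):
        """Run `job` unless one is already running for (namespace, key).

        Returns True if this call ran the job, False if it only queued a rerun.
        """
        slot = (namespace, key)
        with self._lock:
            if slot in self._running:
                # A press during a run means "retry with the latest version".
                self._rerun_requested.add(slot)
                return False
            self._running.add(slot)
        try:
            while True:
                job()
                with self._lock:
                    if slot not in self._rerun_requested:
                        # Release the slot under the same lock as the check,
                        # so a late press cannot be lost in between.
                        self._running.discard(slot)
                        return True
                    self._rerun_requested.discard(slot)
        except BaseException:
            # A failed job must not leave the slot marked as running.
            with self._lock:
                self._running.discard(slot)
                self._rerun_requested.discard(slot)
            raise


debouncer = RepeatableDebouncer()
runs = []


def job():
    runs.append(len(runs))
    if len(runs) == 1:
        # Simulate the button being pressed 100 times during the first run.
        for _ in range(100):
            debouncer.trigger("project_jobs", "demo", job)


first_press_ran = debouncer.trigger("project_jobs", "demo", job)
```

In this sketch the hundred extra presses collapse into a single follow-up run, so `job` executes twice in total. The call site stays one line long, which is the point of the refactor; the locking noise lives in one place instead of being pasted next to every button.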

Ok, I'll try to find some of Ward Cunningham's work.


> Technical debt is precisely the mismatch between the language you are programming in, and the domain you are programming for.

That's a concept on a very different level from

> the lack of understanding of what the client wanted (often the ignorance of the client himself of what he actually wanted)

And the term "technical debt" seems to be a very bad name for either of those. Just look at the examples the others provided : the word 'technical' makes one think that it's an issue with tool (= codebase) maintenance first !


> semantically, you get to define your own classes and methods and create your own little world inside that language, where things can be combined and chained together

That is if you're even using OOP in the first place. I'm now viewing it with suspicion, especially after the only course we had on it, in Java, where we weren't even warned about things like "Composition over Inheritance". (Ok, we also had a half-course on UML.)

OOP seems to me more like a tool that should only be applied to specific problems : for instance making a GUI (I have some Python-Qt experience).

In the same way, I do see programming languages themselves as more or less fitting to certain problems. Speaking of which, this semester I had to pick between C+ (C++ with a minimum of libraries, and god forbid, no OOP) and Fortran, and I picked C+ because Linux is mostly written in C. And hopefully one can find a language where you can keep the "impedance mismatch" between the language and the problem to a minimum. But for that you have to figure out what the problem actually is in the first place, which might take several design cycles !

But I'll try to read up on this "aspect-oriented programming".

> I disagree that you should refactoring code "so that the entire team can understand the code". Teams turnover much more frequently than a code base should

I didn't read it as saying the code should be refactored every time the team changes. I read it as saying that as the organization's understanding of what the software does evolves, the structure of the code should evolve with that understanding.

What if technical debt happens on purpose ? Clever programmers incur technical debt, then leave company. Debt incurred becomes another bullet point on their resume, and they eventually become Senior Developers. Their debt is paid by someone else. Maintaining an app affected by technical debt is a thankless job and you don't get much bragging rights afterwards. But the programmer who incurred debt can look good in the eyes of management, after all he worked SO FAST and you're the bitter, toxic person who complains.

That would split the profession into maintainers and initial writers pretty quickly. Maintainers will become scarce and hence quite valuable.

I think the split has already happened, but I'm not convinced maintainers are perceived as valuable.

> I'm not convinced maintainers are perceived as valuable.

I'm pretty sure an accountant classifies technical debt control as not being valuable.

"A cost center is a department or function within an organization that does not directly add to profit but still costs the organization money to operate."

Want to stay compliant with privacy laws / norms / standards / file formats in your useful custom software? Gotta pay some technical debt dealing with that new complexity; just so that you can continue to operate at the same provided value.

Necessary, but not valuable.


And then you go and try to get a raise.



My cynical side tells me that it is a sound personal strategy to declare that old languages like Java are outdated, and that the only way forward is with the newer shinier languages.

This way you would only get greenfield projects.

I should do that, even though I love Java.

In my experience, accountants want to capitalize software development costs (makes the bottom line look better) but must expense maintenance (i.e., bug-fixing). So that translates to a (slight?) preference for quality software that has fewer bugs.

Not when you can hire an overseas "maintainer" for 1/4th the cost of the average Dallas developer.

To me tech debt is the result of callous introduction of invariants.

An inexperienced engineer keys your users table by their email. Suddenly a user changes their email address, and now you either refactor to use an autogenerated user ID (as you should have in the first place), or you have a huge brittle process for migrating a user between emails, or your entire codebase starts getting littered with statements looking for previous email addresses and copying data on the fly.

The mistake was that email=user identity is not a true invariant.
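A minimal sketch of the alternative, assuming a hypothetical two-table schema (`users`, `orders`) in an in-memory SQLite database: the autogenerated `id` is the identity, and the email is just an attribute with a uniqueness constraint, so changing it touches one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,   -- surrogate key, never changes
        email TEXT NOT NULL UNIQUE   -- attribute, free to change
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
""")

user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("old@example.com",)
).lastrowid
conn.execute(
    "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, "digital piano")
)

# Changing the email updates exactly one row; no other table refers to it.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("new@example.com", user_id))

rows = conn.execute(
    "SELECT users.email, orders.item "
    "FROM orders JOIN users ON users.id = orders.user_id"
).fetchall()
```

The order still resolves to the same user after the change, with no migration process and no lookups of previous addresses.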

> The mistake was that email-user identity is not a true invariant.

That's true. I also wonder if there's an issue with lack of modeling experience or strategy. Natural key vs surrogate key is a really important distinction to make when modeling first party data. I've observed a common preference towards the latter (with uniqueness constraints) as a default. I think that's because it's generally more useful (and common) than not to want to insert new things into your collection with monotonically increasing unique primary key values, autogenerated without you having to do any of the accounting. I think natural keys are still useful, just in read-only/projection/ETL-based use cases.

> you have a huge brittle process for migrating a user between emails, or your entire codebase starts getting littered with statements looking for previous email addresses and copying data on the fly.

It's so unfortunate how easy of an error this is to make and how much it makes any future work on the codebase shittier. Eventually, it all needs to be refactored to regain any measure of sanity. I see so many companies that are bitten by this. Once it gets going, it becomes really awful for the engineers that have to maintain it. Morale takes a hit and attrition amps up. So much net human suffering that could have been avoided by hiring competent people to do the upfront work in the first place when the code was being written.

I used to have an Amazon.com account under my@email.com.

When Amazon arrived in my country, I was encouraged to migrate, but Amazon.mc didn't have the same selection.

After a while I wanted to buy a laptop from Amazon.mc and registered under a phone number.

Last week my music keyboard broke and I wanted to buy a digital piano from Amazon.mc but they insisted now that I register an email address. And then my email address is already in use.

E-mails should never be used as IDs (or logins), since they aren't case-sensitive…

(the domain part by the RFC, but the local part is often made case-insensitive (and more!) by the e-mail providers.)

Login ids should NEVER be case sensitive. What would happen if you allowed admin, Admin, ADMIN as different accounts, or JohnSmith, johnsmith, etc.? That would be a security nightmare.
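One way to read that rule, as a rough sketch: keep the login as the user typed it for display, but key the account on a case-folded form so look-alike variants collide at registration time. The `LoginRegistry` class and its method names are made up for this illustration, and `str.casefold()` is only one possible choice of canonical form.

```python
class LoginRegistry:
    """Hypothetical registry that treats logins as case-insensitive."""

    def __init__(self):
        self._display_by_key = {}

    def register(self, login):
        key = login.casefold()  # canonical, case-insensitive key
        if key in self._display_by_key:
            raise ValueError(f"login already taken: {login!r}")
        self._display_by_key[key] = login  # keep the original spelling for display

    def lookup(self, login):
        return self._display_by_key.get(login.casefold())


registry = LoginRegistry()
registry.register("JohnSmith")

try:
    registry.register("johnsmith")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

With this in place, `registry.lookup("JOHNSMITH")` finds the account registered as "JohnSmith", and a second account differing only in case cannot be created.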

And yet Unix* has case sensitive logins, and is renowned for its security.

*recent Unixes, since older Unixes only had support for upper case characters.

But I see your point, and it looks like it's not a clear-cut question. But I'm still leaning towards case sensitivity, because passwords are case sensitive, so it's easier to train users to consider both login and password as case-sensitive. And it's a shame that some e-mail providers caved in and started making e-mail case insensitive…

P.S.: Should we also forbid the O/0, l/I characters in logins, like base58 does, because they can be ambiguous ?

P.P.S: And Unicode has added a whole new level of issues to this. Of course ASCII-only logins are completely unacceptable these days –

(think of users that don't have a latin-like alphabet as their native one)

– but I recently ran into an issue where a service didn't properly normalize my password, so the µ from my previous keyboard layout ended up being considered different from the μ from my new keyboard layout !
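The µ/μ case is easy to reproduce with Python's standard `unicodedata` module. The sketch below assumes the service would want the two characters treated as equal and so uses NFKC compatibility normalization; that choice is an assumption here, not something the service is known to do. Note that plain NFC leaves the two characters distinct. The helper name `normalize_secret` is made up.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ MICRO SIGN, produced by some keyboard layouts
GREEK_MU = "\u03bc"    # μ GREEK SMALL LETTER MU


def normalize_secret(text):
    """Hypothetical pre-hash step: fold compatibility characters together."""
    return unicodedata.normalize("NFKC", text)


old_layout_password = "pa" + MICRO_SIGN + "ss"
new_layout_password = "pa" + GREEK_MU + "ss"

raw_equal = old_layout_password == new_layout_password
nfc_equal = (unicodedata.normalize("NFC", old_layout_password)
             == unicodedata.normalize("NFC", new_layout_password))
nfkc_equal = (normalize_secret(old_layout_password)
              == normalize_secret(new_layout_password))
```

Here `raw_equal` and `nfc_equal` are both false while `nfkc_equal` is true, which is the kind of mismatch that locks a user out after a keyboard layout change if the normalization step is missing or differs between signup and login.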

Who doesn't have an auto increment Id field on every table? Ok, I know it's not a 100% perfect solution for a user id.

The other gotcha is using ss on ni numbers as identifiers

> Who doesn't have an auto increment Id field on every table?

In large organizations with a lot of teams, natural primary keys make interoperability easier across databases, as opposed to a forest of artificial keys that are all different. SQL experts who advocate natural keys point to ON UPDATE CASCADE as the solution to natural keys that occasionally change.
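For readers who have not used it, this is roughly what the `ON UPDATE CASCADE` approach looks like, sketched against an in-memory SQLite database with a hypothetical `users`/`orders` schema keyed on email. One SQLite-specific detail: foreign key enforcement is off by default and has to be enabled per connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
db.executescript("""
    CREATE TABLE users (
        email TEXT PRIMARY KEY            -- natural key
    );
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        user_email TEXT NOT NULL
                   REFERENCES users(email) ON UPDATE CASCADE,
        item       TEXT NOT NULL
    );
""")

db.execute("INSERT INTO users (email) VALUES ('old@example.com')")
db.execute(
    "INSERT INTO orders (user_email, item) VALUES ('old@example.com', 'laptop')"
)

# Updating the natural key in the parent table rewrites the child rows too.
db.execute(
    "UPDATE users SET email = 'new@example.com' WHERE email = 'old@example.com'"
)

order_owner = db.execute("SELECT user_email FROM orders").fetchone()[0]
```

After the update, `order_owner` holds the new address. The cascade only reaches tables that declare the foreign key, which is why this approach depends on everyone on the project declaring their references properly.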

So the absence of an auto increment Id field might be a dumb error, or might be super smart, or might be both. Both means you're a SQL expert but haven't made sure everybody else working on the project always will be.

> Who doesn't have an auto increment Id field on every table?

Anyone who exposes their id fields, e.g via APIs, and doesn't want other people gaining insights into their volumes by watching the rate of increase in id values.
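A common way to get both, sketched here with made-up names (`User`, `to_api`): keep a sequential id for internal use and hand out a separate random identifier at the API boundary. Using `uuid.uuid4()` for the public id is just one option assumed for the example.

```python
import itertools
import uuid
from dataclasses import dataclass, field

# Stand-in for a database sequence; purely illustrative.
_internal_ids = itertools.count(1)


@dataclass
class User:
    email: str
    internal_id: int = field(default_factory=lambda: next(_internal_ids))
    public_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def to_api(user):
    """Serialize a user for external callers without the sequential id."""
    return {"id": user.public_id, "email": user.email}


alice = User("alice@example.com")
bob = User("bob@example.com")
payloads = [to_api(alice), to_api(bob)]
```

Internally the two users still have consecutive ids, but an outside observer who signs up twice a week apart only sees two unrelated 32-character hex strings and learns nothing about signup volume.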

I've seen a situation where old IDs were re-used for new users, like we were about to run out of numbers or something… XD

And with former users coming back, of course expecting to have their IDs back too !

This being on a paper-first (often paper-only) database, with no locking process, where we also had the issue of new users being added at the same time, for the same ID.

If only I already had been through a database course at the time…

I meant the internal auto increment id field - not the actual user id.

I think I just proved my point here.

Anyone who generates IDs outside of the database. Anyone who wants to avoid contention on sequential IDs.

What is "ss on ni"?

I read that as "social security number or national identity", the "on" being a typo.

Probably "session".

Related treatment:

ABSTRACT: The term technical debt was coined by Ward Cunningham in 1992. In recent years, people have broadened the definition to include ideas not present in the original formulation, including lack of skill, expedient hacking, and obliviousness to software architecture. By lumping these together, it’s harder to choose the right repair. This article proposes that we use the term ur-technical debt to refer to the original idea, which was: When I build systems iteratively, my understanding of the problem and solution grow gradually, and inevitably my current thoughts do not match code I wrote earlier, so I must expend effort to fix that code, much like paying interest on a debt. https://ieeexplore.ieee.org/document/9121630. Ur Technical Debt

This is well put. I want to expand on one point that seems to be lurking beneath the author's remarks. Technical debt does not necessarily come from technical people. If you're in a startup, there is almost no way that you can end up with a small amount of tech debt because the target that you're aiming at will change. And if you over-optimize for code cleanliness too early, you'll end up shipping code way too slowly. Slow, deliberate code writing is better suited to large, established companies than small, scrappy startups. You cannot escape tech debt; it can only be managed.

In my experience technical debt almost always comes from management's misconception that:

(a) A product that works now will always continue to work, even after you accumulate 10X the number of users.

(b) Adding new features only involves adding code, not rewriting stuff that already works.

(c) For a product A, delivering milestone A1 by deadline A1 is independent of delivering A2 by deadline A2. In reality the existence of A2 affects the timeline and architecture decisions of A1.

All of these are rampant massive fallacies in technical management.

Most startups can expect to either A) have a lower cost of capital in the future, or B) to fail.

This means opportunity cost is ridiculous, and expedience often can be worth a whole lot of future wasted resources and decreased velocity. Survive to tomorrow, where you have more resources and can pivot to doing things in a cleaner way.

So it's not completely stupid to pretend A & B are true. You can pick up velocity now, in exchange for future problems. And as long as the interest payments aren't too high it's well worth it.

Sure, it's a valid way to think for startups, but startups should gauge and plan for the amount of technical debt.

It's management being ignorant of technical debt where things become problematic.

Confession: As a developer I've personally fallen for all three of those misconceptions, and I will probably keep falling for them.

I'm also pretty bad at actually knowing when I'm running the "technical debt credit card" or even checking what's left in the balance to pay off. Sometimes when I'm trying to pay down the balance, I end up adding a lot more to it. What I understand well enough, I then have a hard time communicating effectively.

I'm a pretty bad influence, so when I end up spending a lot of time working with any other developers I eventually catch them doing the same thing. Some more than others. What's worse is that we are the only ones who could possibly know anything about the technical debt that we understand so fuzzily.

All that management can hope to get is a secondhand account of all this, so I tend to award mine only partial blame for being pretty consistently terrible at making technical debt decisions.

(c) is why I'm suspicious of Gantt charts – sub-project separation is never this clean !

"Our oven is from the 1950s, and only one of our 5 cooks knows how to operate it, but it's ok because she's usually there everyday."

"Our knives are downstairs locked in a storage cabinet that our manager upstairs has the keys to. So we usually chop everything with scissors because we need to get food ready fast."

"What do you mean you can't make this recipe that's 48 pages long? The chef who used to work here made it, he was a genius. Aren't you a chef too?"

The analysis here is so spot-on that I'd argue it sometimes applies even to Wikipedia as well. (OP offers up Wikipedia as a service that can "reorganize to reflect its understanding" of what's changed as new features keep getting added.)

Often, yes, especially on relatively simple pages that relate to geography, mildly famous people, etc. But take a look at Wikipedia's page on Facebook, which is now a 539-source accretion of diverse headlines as they happened, now held together only by subject headers that are the digital equivalent of very large paperclips. Or a ship's log.

Fixing this would be brutally tough! We're talking about what's now a "semi-protected article," alterable only by trusted players, because otherwise it's more likely to be defaced than improved.

In fact, looking at the Talk section for the Facebook article, it looks as if it was deemed a "good article" from 2006-2011 but then lost this status.

A telling comment at the time: "Compared with the latest reviewed version of the article, the prose is very choppy and unclear. In some sections, every sentence seems to function as its own paragraph, while in other sections there are random paragraph separations where there shouldn't be. The entire Website section is a mass conglomeration of Facebook features that needs to be sorted out in some way."

What's visible now is beyond the stage where incremental fixes (or additions) can fix things. Meaningful improvement will come only from a top-to-bottom rewrite and rethinking of the article, so that Facebook's role, history and features are presented with the clarity and insight of author(s) who can put all the pieces together in a framework that's strong enough to overcome the "ship's log" problem.

Programming is all about trade-offs. The trade-offs of an implementation can lead to technical assets as well as technical debt. It's just that most of the time the technical profit isn't noticed as much as the technical debt.

And as with regular debt, the "interest rate" can also vary wildly. Some debt just sits there, doesn't change much. The code is a mess, makes fixing bugs harder than it should be, but it doesn't impact new work much.

Other debt is like a credit card, where you continue to pay a price every day, and it can severely affect your ability to implement new things.

Then again the same is true with technical assets. Make good choices and you can reap the benefits for a long time. Implementing new features can become 5 minute jobs rather than taking days.

Sometimes you're aware that you're about to swipe that technical credit card. Other times it can be an inadvertent decision: you went right instead of left while thinking of something else. Those are the tricky ones.

Tech debt is always described as a metaphor. The word itself is a metaphor for something ill defined. The problem with the metaphor is that a vague definition of the problem means that there will be equally vague solutions. No one is really sure what tech debt is exactly. There is a formal property to the program that correlates with our level of understanding of the program and our ability to change the program with limited understanding.

You give a metaphor of a kitchen to help someone understand the problem. To solve the problem I give another metaphor: Wash the dishes and scrape the grill to prevent technical debt. Does that help you understand the solution? No. It gives me nothing. The solution is as ill defined as the problem.

There is another phenomenon that the kitchen metaphor fails to describe. The failure is this: All measures to prevent technical debt will fail given enough time. Technical debt can be eliminated but it cannot be prevented. Even if you try to write the cleanest code and you're given unlimited time to organize your code, all of your attempts at prevention will fail.

How do I know this? Because the most insidious form of technical debt is always unexpected. People may take shortcuts here and there but those shortcuts aren't taken with the knowledge that there will be huge problems down the line. The worst technical debt always appears as mistakes that weren't known. A shortcut taken now can be valid and never lead to problems in the future.

Technical debt that actually causes serious issues exists only in hindsight.

What is the formal definition of Technical debt? The set of all code that isn't a combinator.

Every time you add a new feature you're basically creating technical debt in one way or another, often in a way which is not obvious until you've built a few additional changes on top. It's best to refactor a bit with every new code change. Most people I know never refactor, always say they'll do it later, and then end up asking for permission to rewrite the whole thing, only to make a new mess.

this matches my experience as a developer. now as a manager, i treasure those who can build most of their refactoring in as a "tax" onto every task, and then explain anything bigger that won't fit clearly enough - we can usually find a future initiative that could expand to contain and benefit from the desired refactor if it's actually a good idea.

TechDebt is CareerDebt.

More often than not, techdebt is added by somebody else (who got rewarded for their product impact) but is your careerdebt since organizations seldom reward clearing techdebt, even at the best of tech companies in SV.

The only time organizations reward clearing techdebt is when the sh*tshow gets so bad that it has visible velocity/infra/product impact that just can't be resolved without a dedicated 1 year effort to clear the crap. I have seen smart engineers pile on to the techdebt as there is no publicity or heroic story around taking a bit of time to clean up the mess on a regular basis. Sadly it is a game of visibility, so engineers do what the management chain wants to see and knows is a problem.

> If you develop a program for a long period of time by only adding features but never reorganizing it to reflect your understanding of those features, then eventually that program simply does not contain any understanding and all efforts to work on it take longer and longer.

The author doesn't take the next logical step: what factors prevent the reorganization of a program?

One factor that comes to mind: lack of automated tests. Those working on the program are scared to change anything for fear of breaking something.

> One factor that comes to mind: lack of automated tests

These tests have to be at the right level. Unit tests are the easiest to write, but unit tests can be pretty brittle. Reorganizing the codebase can result in many unit tests breaking (or being made redundant), which might lead to stagnation from fear of breaking tests (and the extra work of fixing them). System tests are much harder to write, but are invaluable during a major refactoring. You want to be sure that you're not changing the behaviour of the system while changing its structure.

The V-model is (IMO) an underappreciated tool in the development lifecycle.

Writing tests without fixing the lack of understanding first is even worse.

Now the next cycle will have to understand not only the code, but also why some test is validating some unused endpoint with data you never thought possible.

That’s where comments come in. I tend to do a brain dump around the code I wrote in comments: why does this code exist, and in particular why is it doing it in this horrible, convoluted, suboptimal way. I also write small functions and pick my variable names carefully.

My colleagues think I’m an idiot writing too many comments and being too careful.

We can’t ship any feature in less than 3–4 weeks, most of the codebase is an inscrutable mess, and we introduce regressions all the time (unit tests are for losers)

The author is on to something, but still, most cases of technical debt aren't really technical debt as Ward Cunningham meant the concept.

His idea is to regularly pause and take your time to incorporate your new knowledge about the problem at hand into your code. There is no other way, because if code concepts don't reflect the problem space, development will slow down and become cumbersome.

But what I see and what a friend of mine observed:

- Most technical debt is just bad code
- Most technical debt is just mistakes
- Most technical debt is just a lack of basic maintenance

Why call that technical debt? What does the term teach you or anyone else?


The ugly code can be dealt with. But we can't deal with the ugly environment.

The most severe technical debt is the environment: the OS, toolchains, frameworks, and libraries that were fixed at the time development started.

Updating the environment should be part of the cost of development, but we tend to ignore it for short term gain, pushing the cost onto our future selves.

Within a few years, the environment is too old to work with. We have to deal with bugs that were fixed upstream years ago and reinvent features that are already present upstream.

5 years pass and we seriously consider updating the environment, but since there were no updates along the way, existing code relies on old behaviours. We would have to fix all of it, and that brings no short term gain, so the update is abandoned.

10 years pass and the software is dead.

An even easier solution is to simply work for a tech company that values you as an asset instead of cost-center.

In such companies refactoring discussion is normal and protected (within reason). Whereas in a cost-center company, refactoring often feels like moving Mount Fuji with a spoon, alone.

I'm disturbed by the ending justification for rewriting or refactoring code. I've seen countless failed attempts.

Either you make a clean break and create something small and simple enough to solve 80% of the old product's problems and grow from there, while eventually letting the old product die on the vine.

The other option is to take out the cancerous parts of the older system, carefully making sure you didn't break anything.

This is correct. Ossification of the application means greater technical debt, but throwing it all away is bad unless either it is an internal application and you can spend time identifying the related applications that depend on behaviour beyond your documented APIs, or you are willing to face the ire of your external customers (which equals a PR disaster).

The correct handling of this on a new project is to insist on fully documented public APIs so that you can rewrite what sits behind them without too much worry.

On existing ones, rewrite in parts with extensive internal testing (and if possible testing with your customers, but please do not use the modern Microsoft way of almost no internal testing) and a full rollback plan in case it goes disastrously wrong. Or, just as this comment said, create a new product and have a workable migration plan.

“The other option is to take out the cancerous parts of the older system, carefully making sure you didn't break anything.”

To me that’s the preferred approach. You can rewrite the really bad sections and you can also restructure the code so it’s possible to rewrite components in other languages/stacks later.

But it’s hard to communicate this to other devs and also up the management chain. Devs don’t like it because it’s not “cool” and also requires understanding of the current system and management doesn’t like it because it adds overhead to development so new features get rolled out a little slower.

I would say they are the worst two options. The rewrite is mythical: not only does it invariably take much longer than it seems it should, but how are you going to carry on working on the existing system while spending 2 years rewriting?

Depending on your codebase, removing a specific part of it is invariably extremely brittle and the people who understood the original app might not be around any more.

The third, and imho best, option is to deploy something new alongside the existing application. For example, the existing email system is massively overcomplicated, so we created a new email microservice. This not only permitted a rewrite, but was a small, achievable system that could bring with it many other benefits over the code in the monolith. It was then relatively easy to both test it and redirect the original calls to the new service. You can also do this with applications. We run two alongside each other behind a proxy to hide the movement between the apps. We only need to migrate one page/area at a time, test it and then automatically redirect that page to the new system.

This approach is much more practical and also gives the benefit that if you are doing a major redesign, you simply migrate that page at the same time to the new app and you test everything at the same time. Leave the old stuff alone and eventually delete it all.
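The page-by-page redirect behind a proxy could look roughly like this. It is only a sketch of the routing decision, not the commenter's actual setup: the backend URLs, the path prefixes and the `choose_backend` helper are all made up for illustration, and a real deployment would normally express this in the proxy's own configuration.

```python
# Hypothetical backends, purely for illustration.
LEGACY_BACKEND = "http://legacy.internal"
NEW_BACKEND = "http://new.internal"

# Hypothetical path prefixes already migrated to, and tested on, the new app.
MIGRATED_PREFIXES = ("/email", "/account/settings")

def choose_backend(path: str, migrated=MIGRATED_PREFIXES) -> str:
    """Return the backend that should serve this request path."""
    for prefix in migrated:
        # Match the prefix itself or anything beneath it, but not "/emailer".
        if path == prefix or path.startswith(prefix + "/"):
            return NEW_BACKEND
    # Everything not yet migrated keeps going to the old application.
    return LEGACY_BACKEND
```

Migrating another area is then a matter of adding one prefix to the list once that area has been tested, and rolling back is a matter of removing it again.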


You've described what I would call a "rewrite", just one that was thoughtfully planned out. Another benefit is that you have the existing, running system, to use as test validation. If the new black box works exactly like the old black box, you're good. Of course you need to do a few things like go through an end-of-year with both systems to make sure you found any lingering differences.
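Using the running system as the reference can be sketched as a simple side-by-side comparison. This is a toy version under obvious assumptions: both "systems" are plain functions here, and `old_shipping_fee` and `new_shipping_fee` are hypothetical stand-ins for the two black boxes.

```python
def compare_implementations(old_impl, new_impl, inputs):
    """Run both implementations on the same inputs and collect any differences."""
    mismatches = []
    for value in inputs:
        expected = old_impl(value)  # the old system is treated as the source of truth
        actual = new_impl(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches

# Hypothetical stand-in for the existing system.
def old_shipping_fee(weight_kg):
    return 5 if weight_kg <= 1 else 5 + 2 * (weight_kg - 1)

# Hypothetical stand-in for the rewritten system.
def new_shipping_fee(weight_kg):
    return 5 + 2 * max(0, weight_kg - 1)
```

An empty result from `compare_implementations(old_shipping_fee, new_shipping_fee, [0, 1, 2, 5])` only says the two agree on those inputs, which is why rarely exercised cases such as end-of-year processing still need their own run.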

If you have a reasonably sized/established code base, this may not work. My company is trying to deploy new things as microservices, but unless we actively make major efforts to pull things out of the existing monolith, it will grow faster than the sum total of the microservices.

The choice is an illusion. Both strategies require understanding the requirements implicit in the current system. The first just assumes many of them are low-priority; this is frequently a dodgy assumption.

So, regardless of which path you take, most of your journey will be extracting requirements, writing tests, discussing priority with stakeholders. At the point where you're finally writing new code vs modifying existing, most of the journey is already behind you.

There are many dimensions to technical debt, but one of the biggest is the sociopolitical one: developers hate fixing each other's bad code, but the people who wrote it are often in a position where they can transfer the ownership of this code to someone else, either by being their managers or by quitting their job.

I actually liked doing that. I wouldn't want it to be my only job, but management usually wouldn't even let me do it for a few hours a week. It's like a puzzle, it was very satisfying to take something functional but sloppy and create something functional and maintainable (often reducing code size significantly, reducing memory footprint by identifying redundant variables, speeding things up by identifying redundant code paths, etc.).

I agree it can be very meditative. I actually seek these types of jobs nowadays.

Development is only one step of the whole process of selling a good. One must consider the entire pipeline, from sales to QA, in order to understand where the bottlenecks are. At some point, _if_ technical debt becomes a problem, non-tech people will pinpoint this cause, and they will ask for either mitigation or resolution.

When found, the entire pipeline should be made aware of the bottleneck, because some small changes at some steps could lead to huge gains in other steps. Example: "if ops could do this thing, we could remove the entire library from the codebase" (ops), or "that feature was actually never used" (customer feedback), or "if we can get the customer to accept this alternative, we wouldn't need this complicated sub-feature" (sales).

Then, even the smallest change could impact all other steps such as tests, documentation, reviewing, sales, QA, etc.

Personal preference : I would rather hire developers who ship the requested ticket/story, rather than "fixing tech debt" under the hood. To drive the entire company, all its parts must correctly report and correctly execute.

Edit : from the article , “We’re in a build the plane while flying situation, how can we get this out now without doing a big rewrite?” is exactly what non-tech people would ask, because they can take action to mitigate at other steps of the process

I stopped using the term 'technical debt' with non-technical people a long time ago to describe needing to maintain code. One guy even tried to convince me he was correct that the term is BS because, 'he heard from some other <unnamed> guy that technical debt is programmer BS code-speak for wanting to work on pet projects, and not what the business wants or needs.' And that he should ignore anyone who tries to tell him otherwise. Ignoring the fact that if development slowed to a crawl, basically the company would fold and we'd all be out of a job. Anyway.

I liken maintaining code to maintaining a car/engine.

The engine/pit crew analogy has worked well on more than a few occasions:
- We're trying to keep this thing running, right?
- We want it to run fast, and be impressive, right?
- We want to be able to corner quickly, right?

Well, when's the last time you had an oil change? If there's a bunch of gunk in the engine, and we're carrying this 1000lb load of sugar that that one customer asked us to carry, how fast do you think we can go?

If we tend to the engine, we can keep it running smoothly, or running at all.

So, we have a choice: Do we (a) take it to jiffy-lube once every 5 years, or never because who wants to invest in maintenance? and let that one kid who doesn't know what he's doing change the oil? Or do we (b) let an experienced pit crew tune, monitor and clean the engine and tires regularly so we can fly around the track and win races and make lots of money?

You like cars, right? Hopefully you get it. You decide.

The problem with analogies is that software is fundamentally new.

It's not debt where you can just pay it off after the launch. It's not a mess where a cleaning crew can have it taken care of in a day or a week. It's not a structure that will collapse because you added one too many storeys.

Software takes all the guardrails off of complexity. A swiss watch is a mechanical masterwork, but the complexity is limited because you have to fit the gears into a limited space. Everything else we deal with has some kind of pushback on complexity, with the possible exception of biological systems that take millions of years to change.

Software can grow in complexity with no obvious bound. You can tackle any one particular bug with an extra branch to say "don't let this happen". But a gigabyte of branches is a hell of a lot of complexity.
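A rough way to see how quickly "one more branch" adds up: if each guard is an independent on/off condition (a worst-case assumption, since real branches are often correlated), the number of distinct combinations doubles with every guard. The `count_paths` helper below is made up for illustration.

```python
from itertools import product

def count_paths(num_flags: int) -> int:
    """Count distinct on/off combinations of independent boolean branches."""
    # Enumerate every combination explicitly; this equals 2 ** num_flags.
    return sum(1 for _ in product([False, True], repeat=num_flags))
```

Ten such guards already give `count_paths(10) == 1024` combinations to reason about, and twenty give over a million.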

Software engineering is an attempt to wrangle that complexity through all kinds of strategies from "architecture" (another poor analogy) to type systems and OOP and FP and the actor model and everything else.

Technical "debt" is really the mismanagement of complexity. It's hard to understand the costs because the costs are inherently unknown unknowns. If you mismanage complexity, then all estimates are meaningless because at any point you could hit a never-ending fractal of problems. It might be completely intractable to add any significant new feature.

Developers want to ship features, call it a job well done and take some time off for Christmas. When working with technical debt, no matter how smart the developer is, it's really just luck of the draw who hits a fractal of problems and never finishes and who doesn't and converges on a solution (and when it's bad enough, the latter just never happens).

One big danger of "technical debt" is it can appear impossibly shiny. People want to work on it, despite there being high-priority features that need to be implemented, for instance. This is because technical debt is a code problem, while adding features is a business problem. Programmers like to stay in code-land.

And soon enough, implementing any new feature is impossible without breaking anything else. And it takes a month.

I actually think that fragility and the things people actually call tech debt are two different things. No one cares if an architecture is perceived as clean and elegant, but takes a long time to implement anything (due to many levels of abstraction and indirection). Just as long as it's (perceived to be) clean and elegant! Tech debt is unfortunately rarely measured or quantified, it's usually a gut instinct.

But does it matter?

If a company's able to meet the revenue target with shitty code and hire enough engineers to bandage the code and meet the next target, that's good enough, right? Who cares about code quality or best engineering practices? As long as the company grows, which in turn grows my compensation, and I retire sooner than planned, then that's great, right?


Sadly, I think this is the truth in many of the tech companies in the Bay Area. Too many times I've seen managers who can't understand why such a simple thing takes so long.

A common problem, which the blog already explains, is the 'go-go-go' product cycle. A dirty hack is applied to the code base to solve the problem quickly, but it makes the code harder to evolve.

Another problem I've seen is projects led by opportunists. There are engineers who build projects just good enough to solve the problem, get all the credit before there's enough soak time for the new shiny FooBar application, and jump to a new company with a new title and higher comp. FooBar starts to have issues but the core developer is no longer with the company. The company needs checks to make sure these opportunists don't go wild.


My reason for trying to reduce tech debt regularly is simple: learning dirty code at company X gives me no value in the long run. Also, trying my best to write clean and manageable code is something I take pride in and enjoy doing.

But... interestingly enough, I started a job at a company recently and the code here is some of the worst I've seen in years. Copy-pasted code, implicit behaviors where a code change is pretty much a game of whack-a-mole, zero automated testing, no local testing, the list goes on. However, the company's doing fantastic business-wise. It exceeds the revenue target every time, which in turn means we can just add more headcount to write code on top of it, just enough to make it "functional".

so... does the tech debt matter?

It's a ponzi (pyramid) scheme.

I think there are ways for the company to escape that scheme, maybe by investing into other fields, outsourcing, buying growth and such.

I still find it risky, especially if you join late.

I've been thinking about this topic a lot, and I ended up putting some things into words [1]. I think what we are lacking is proper taxonomy for code issues. Nowadays we fall too much into classifying everything as "tech debt".

For example: I especially love the metaphor of technical debt. But we should consider the other aspects of debt - You only take on debt deliberately and so that you can invest the time / money on something that will yield a return on investment. Calling bad code "technical debt" breaks the metaphor for me. We probably should stop calling bad code "technical debt"

[1] - https://isidoro.io/writing/tech-debt-broken-windows/

Interestingly, one place I worked we had a 20% set-aside for technical debt. This resulted in endless arguments with product owners who wanted to categorize bug fixes as technical debt. Sadly, some of the less experienced engineers would take this side in the argument, as well.
