
A Taxonomy of Technical Debt - edroche
https://engineering.riotgames.com/news/taxonomy-tech-debt
======
methodover
This is a fantastic article.

Contagion is a really great term. I've seen my poor abstractions be replicated
by others on my team, to my horror -- "don't they see why I did that in this
particular case, and not in this other case?" Of course, that's entirely, 100%
my fault. I picked a poor abstraction, I put it in the code, I didn't document
it well enough, and of COURSE other programmers are going to look to it when
solving similar problems. They should!

That said... Sometimes I spend a bunch of time finding the right abstraction
for a feature that we end up not expanding. And then it feels bad that I spent
all this extra time coming up with the "right" solution, instead of just
hacking out something that works. Hmm...

~~~
wpietri
One team I was part of kept a separate backlog of technical debt and
experiments. It was nice to have a place to say, "in 30 days, look at this
hacky thing and see if it's worth making better". Or, "I noticed this is a
mess, here's how I might clean it up." We'd occasionally talk over the backlog
and prioritize it, which helped communicate both the general make-things-
better spirit and specific issues like you mention. I really liked it.

One thing that made it work is that we worked on it in small slices all the
time, without involving the product manager. It was still visible, so there'd
be the occasional question, but as long as we kept delivering user value,
nobody worried too much about our mysterious code concerns.

~~~
drinchev
Funnily enough, at most companies I worked for, I had to follow the rule "You
can refactor if the PM doesn't catch you spending those precious minutes on
it".

There was only one place where we had time every Friday to improve the
codebase. Two months later it became every second Friday, though.

I'm really pissed that technical debt is considered as "Hey the dev guys are
complaining again".

~~~
dtech
> I'm really pissed that technical debt is considered as "Hey the dev guys are
> complaining again".

That's because it's very opaque to anyone other than the engineers working on
a project.

I've had a limited amount of success making this more transparent. Signaling,
every time a feature would take longer because of a piece of technical debt
the team wanted to fix, caused the fix to get priority before the 4th and 5th
affected features were implemented.

~~~
specialist
How is "technical debt" handled in meatspace?

Don't the bean counters at Ford Motor Company (for example) nark on the
assembly-line workers, industrial engineers, and QA/QC folks when work piles
up, broken machines lie around, and trash goes uncleaned?

~~~
midgetjones
It's risk/reward to the people who want to decide how their money is spent,
isn't it?

In your example, the worst-case scenario is that someone could die, and that
tends to spur on investors to discover the probity within themselves to spend
some money avoiding an expensive lawsuit.

But when the devs complain about the old code being terrible and making their
lives hard, it never seems, to management, to hinder them that much. They
keep banging out new features and fixing bugs, and nothing bad seems to
happen. But the drip-drip-drip of bugs keeps increasing, and the new features
take a little longer each time, and nobody dies at least, but the thing
becomes a haunted moneypit that nobody wants to touch, and you're stuck with
it now unless you rewrite it all at huge expense, etc., etc.

Maybe everyone should just treat a piece of software as they would a life. I
bet we've all seen some codebases where if it were a friend, you probably
would have staged an intervention by now. Your software baby needs absolute
care from the get-go until the very end, or it will get sick and probably die,
and most likely in a very prolonged and painful way.

~~~
singingfish
The place I used to work at has been hiring (junior) people like crazy. Part
of the reason they need so many is the crushing foundational technical debt at
the core. When they hired someone capable of improving that, they were unable
to merge the changes due to fear, and management couldn't see the business
value of doing so. They've had a few nasty outages recently too. I
believe the insides of the Atlassian kit are similarly riddled with technical
debt.

------
kashyapc
This reminds me of the following, from the book Team Geek[1], chapter
"Offensive" Versus "Defensive" Work:

 _[...] After this bad experience, Ben began to categorize all work as either
“offensive” or “defensive.” Offensive work is typically effort toward new
user-visible features—shiny things that are easy to show outsiders and get
them excited about, or things that noticeably advance the sexiness of a
product (e.g., improved UI, speed, or interoperability). Defensive work is
effort aimed at the long-term health of a product (e.g., code refactoring,
feature rewrites, schema changes, data migration, or improved emergency
monitoring). Defensive activities make the product more maintainable, stable,
and reliable. And yet, despite the fact that they’re absolutely critical, you
get no political credit for doing them. If you spend all your time on them,
people perceive your product as holding still. And to make wordplay on an old
maxim: “Perception is nine-tenths of the law.”_

 _We now have a handy rule we live by: a team should never spend more than
one-third to one-half of its time and energy on defensive work, no matter how
much technical debt there is. Any more time spent is a recipe for political
suicide._

[1]
[http://shop.oreilly.com/product/0636920018025.do](http://shop.oreilly.com/product/0636920018025.do)

~~~
hinkley
The XP guys had it right. Amortize all defensive work across EVERY piece of
offensive work.

In tech-debt parlance, most people are making interest-only payments instead
of paying down the principal. Every check you write should do both (extra
payments are good but they aren't good enough).

------
eadmund
It's a great article, but I do have one quibble.

> A hilariously stupid piece of real world foundational debt is the
> measurement system referred to as United States Customary Units. Having
> grown up in the US, my brain is filled with useless conversions, like that
> 5,280 feet are in a mile, and 2 pints are in a quart, while 4 quarts are in
> a gallon. The US government has considered switching to metric multiple
> times, but we remain one of seven countries that haven’t adopted Système
> International as the official measurement system. This debt is baked into
> road signs, recipes, elementary schools, and human minds.

A not-so-hilariously stupid mistake is to think that the traditional
measurement system is stupid. His picture illustrates one of its virtues: the
entire liquid-measurement system is based on doubling & halving, which are
easy to perform with liquids. The French Revolutionary system, OTOH, requires
multiplying & dividing by 10, which is easy to do on paper or with graduated
containers, but extremely difficult to do with concrete quantities (proof:
with one full litre container and two empty containers, none of them graduated,
attempt to divide the litre into decilitres).

The _real_ foundational debt is that we use a base-10 system for counting, due
to the number of fingers & thumbs on our hands, rather than something better-
suited to the task. If we fixed _that_ problem, then suddenly all sorts of
numeric troubles would vanish. There's actually a lot to be said about the
Babylonian base-60 system, to be honest.

~~~
TeMPOraL
That's an... interesting point I haven't seen brought up before. Makes me
appreciate the "traditional" system more.

Still, I guess we aren't going to drop base-10 any time soon, so I believe the
US should just accept the "traditional" measurement system as something that
used to be very practical, but no longer is due to progress of technology, and
switch to SI.

~~~
LtRandolph
Agreed; this is a really interesting perspective. It points to how different
applications yield different optimizations. Base 60 is fucking cool. I really
like musing on how we arrived at the duration of a second.

I stand by the assertion that being one of 7 countries that only sometimes
uses SI has very real costs. [https://www.jpl.nasa.gov/missions/mars-climate-
orbiter/](https://www.jpl.nasa.gov/missions/mars-climate-orbiter/)

~~~
eadmund
> Base 60 is fucking cool.

It really is! The number of digits might be a bit much for normal use, so
perhaps base-12 is more realistic. If we're going to upend tradition, might as
well do it for good, well-founded reasons …

> I stand by the assertion that being one of 7 countries that only sometimes
> uses SI has very real costs. [https://www.jpl.nasa.gov/missions/mars-
> climate-orbiter/](https://www.jpl.nasa.gov/missions/mars-climate-orbiter/)

Of course, that would have been equally a problem had one team been using
kilogramme-metre-seconds and the other gramme-metre-seconds, and could have
been avoided by standardising on customary _or_ on French Revolutionary units!

------
mitko
Great article, loved how the examples were presented.

In my time as an engineer, I've found that thinking of tech debt as financial
debt also helps. There is the initial convenience (the borrowed money) of
taking the debt-laden approach. Then there is the fix cost, as Bill Clark
names it, i.e. how much it would take to pay back the debt if it were money.
The impact is akin to the amortization schedule, i.e. what the cost is each
time. For normal money, the amortization schedule runs over time, but for tech
debt it runs over usage. The amortization schedule of tech debt is discounted
over time; as with money, _now_ is more important than _later_.

Contagion is a great concept, and I think it is a better name than interest
rate, as the debt will spread through the system, and not just linearly with
time.

Tech debt is also multi-dimensional and not fungible like money, which makes
it a harder thing to reason about.

But the good news is, in my opinion, that sometimes it is perfectly fine to
default on some tech debt, and never pay it back, delete the code. Then taking
that tech debt was a win, if the convenience was more than the amortized
payments.

~~~
baddox
I think the main difference is that technical debt is not fungible, i.e. you
can’t necessarily easily choose to pay off the highest-interest technical
debts first like you would for your personal financial debt.

~~~
oculusthrift
put another way: you can have one item that is 5 days of work but really
critical and another that’s 2 days of work but way less critical. If you have
2 days to work on tech debt, you basically are forced to do the 2 day one.
especially since you are evaluated on what you finish, not how much you worked
towards some long goal.

------
mmsimanga
In data warehousing and BI, it's MacGyver and data technical debt all the way
down. MacGyver because of all the "urgent" reports whipped up for the CEO, the
duplicate copies of data, and the reports done by consultants who barely
understand the industry. Data debt because of all the bugs and changes passed
down as data from the source system.

~~~
worldsayshi
Do any programming paradigms protect better against data debt? The only way I
can imagine to significantly protect against it would be if there were some
way to generate data migrations based on type changes.

~~~
paulmd
I don't think there is any technical obstacle or pattern which can prevent
dumbasses from shitting things up, humans are just too creative. As soon as
you allow any extensibility, someone is going to start shoving integers in as
strings, "Y/N" strings as booleans, etc.

It would help a lot if there was a well-formed, unambiguous specification for
both sides to hold to. Something like the IETF terminology, in terms of
MAY/SHALL, specifying things like "true/false" vs "Y/N", etc. Providing sample
responses with decent coverage of the possible options is good as well.

Then you at least have the leverage to say "aha but the spec says it should be
like this, why are you doing it wrong".

~~~
worldsayshi
> It would help a lot if there was a well-formed, unambiguous specification
> for both sides to hold to.

It does sound like you're describing the schema language part of GraphQL. I
think that GraphQL is a great tool for making sure that the right stuff goes
in and out. Although it's far from solving all input validation problems. Hmm,
perhaps you're describing a different problem.

After having worked with GraphQL, user input validation at least seems like a
manageable problem. It still seems there should be even better methods for
handling the contagion of historical mistakes in the data, though.

------
jeffdavis
What about "fear"?

The most pernicious thing about technical debt, in my opinion, is that it
creates fear in the sense of "I don't want to touch that module".

Even if you try to be objective and use hard facts to overcome the fear, it
doesn't matter, because fear destroys creativity, so you've already lost.

~~~
kraftman
Your tests should reduce that.

------
humanrebar
I might have missed it, but missing from the taxonomy: "Pay In Full" Debt.

In this debt, you pay the entire cost until the last use of it is cleaned up.

This kind of debt is especially insidious because there is no incremental
benefit to cleaning it up.

~~~
piinbinary
I'd be curious to hear an example of that (I don't believe I have personally
seen one that fits that pattern in the wild yet).

~~~
oculusthrift
Removing a library? You can remove 100 usages of it, but not until every
single one is gone can you remove the library itself.

~~~
TremendousJudge
Porting to a backwards-incompatible language version? You can't use most of
Python 3's new features while some part of your codebase is in Python 2.

------
jedanbik
Reminds me of risk analysis: Impact times Probability equals Risk.

Contagion seems like a probability factor. Impact is the cost of leaving
things unchanged. Fix cost is the cost of fixing the problem.

Risk management in this context then means comparing Impact cost to Fix cost
in terms of impact for the business.

~~~
stouset
The one difference is that contagion is multiplicative over time (potentially
logarithmically, linearly, or exponentially—probably a reasonable definition
for 1/5, 3/5, and 5/5 respectively).

------
jimmaswell
Somewhat aside, but the brain having to "flip" visual information because it's
"upside down" seems suspect to me. Turn it sideways while maintaining all the
connections it has to the rest of the body, and what changes? Is it getting
visual information sideways that it has to rotate now? Probably not.

~~~
ninkendo
Moreover the idea that the collection of neurons that your retina connects to
has any concept of "orientation" is nonsense to begin with IMO. It's not that
"there's an upside-down image that your brain has to fix", it's just that your
brain interprets signals from your retina as a picture in your mind, full
stop.

Rods/cones in the top of your retina connect to your brain through neurons, so
do the ones at the bottom. But to say that "this 'top' retinal cone should
really connect to a 'top' neuron in your brain", doesn't even make sense to
me. Since when do the locations of the neurons interpreting the input even
matter?

It would be the same with hearing too... you have a left and right ear, but if
for some reason those were swapped and your left fed things to the right half
of your brain and vice-versa, your brain wouldn't be "flipping it back",
because how could the absolute location of the neurons interpreting the sounds
even matter?

~~~
evanwise
This is the right way to look at it. In fact, your brain is plastic enough
that if you wear glasses that flip your vision upside down for several days it
will eventually relearn the mapping of retinal cells to neurons so that you
see things normally while wearing them. This was studied in the 1890s by a guy
called George Stratton.

------
lifeisstillgood
There are writers who just ooze technical depth of understanding. I think it's
something to do with trying to explain something at a layperson's level, while
leaving many assumptions just there for the reader to follow. It's almost the
opposite of baffling with bullshit.

Good read and a really useful concept

------
monkeydust
I am a senior product manager for a large financial technology company.

Over the years I have learnt to become comfortable with allowing my
engineering teams to refactor code whilst delivering new functionality.

This has been a process and largely one of trust between me and the
engineering leads.

It has also helped that I have seen payback from the investment made in
reducing the debt, in terms of delivering new functionality quicker and
writing less error-prone code. Although, this payback can take a while to see
(6 months+, which is a long time for a product person operating in a
competitive space!)

Most of my managers don't get this, or if they do they are too blinded by
immediate KPIs from further above to justify it, so in most cases I just tell
the engineering guys to add a spread to their estimates to cover the paydown
of the debt.

Over the years this has definitely helped me build tighter relationships with
engineers, which, as any product manager knows, can have huge benefits.

------
billysielu
I find it's always worth asking "will this get better over time, or worse?"
for everything, ever. Folks just fail to see past the next few months; having
at least one person in the room asking this question makes them at least
ignore it intentionally rather than out of complacency.

------
hywel
"I’ve rarely encountered discussions of contagion."

This surprised me: contagion is a good metaphor because it is a compounding
measure of the growth of the problem. Just like an interest rate (a
compounding measure of the growth of debt).

Most senior developers I've met have considered the interest rate of the debt,
which seems like it has been renamed here as contagion. Maybe I've been lucky
to just know smart people!

From the point of view of explaining these concepts, I'd suggest keeping the
metaphors consistent. Tech debt should have an amount owed and an interest
rate, tech infection (?) should have a potency and a contagion level.

------
drawkbox
At pretty much every game studio there is an epic internal battle of standard
libs vs. custom. std::string vs. some custom string class (here it's AString)
is usually the spark. A constant of internal game development is thinking you
can always build better strings, lists, dictionaries, collections, etc. than
the standard lib, basically assuming the standard lib is as it was in the 90s
and all the work that has gone into it since is bunk. In some cases, if you
are really pushing memory and not writing custom allocators or using something
like Boost, then yes; but in most cases the custom classes written internally
by an ancient from generations ago are the greater technical debt.

> _One of the best examples of MacGyver debt in the LoL codebase is the use of
> C++’s std::string vs. our custom AString class. Both are ways to store,
> modify, and pass around strings of characters. In general, we’ve found that
> std::string leads to lots of “hidden” memory allocations and performance
> costs, and makes it easy to write code that does bad things. AString is
> specifically designed with thoughtful memory management in mind. Our
> strategy for replacing std::string with AString was to allow both to exist
> in the codebase and provide conversions between the two (via .c_str() and
> .Get() respectively). We gave AString a number of ease-of-use improvements
> that make it easier to work with and encouraged engineers to replace
> std::string at their leisure as they change code. Thus, we’re slowly phasing
> std::string out and the “duct tape” interface between the two systems slowly
> shrinks as we tidy up more of our code._

So now there are two string classes; that is itself technical debt... One
should be consolidated on. The arguments against std::string are sometimes
valid, but you can also write custom memory allocators or use better standard
lib iterations.

EA even rewrote the whole standard lib as EASTL [1] to address some of these
issues, i.e. fragmented memory. Some games require it; for others it is pure
ego. Game development teams have the highest ego-driven development (EDD) I
have ever seen, and lots of _tricks_ that take _five minutes_ (but add 2-3
months to testing due to those five-minute solutions) and are more spaghetti
than templates that write templates.

The one problem that comes with rolling your own standard lib, or thinking you
are better than Boost or similar, is that the learning curve on the internal
replacements adds technical debt and startup costs, and the original guy who
wrote them is usually long gone. Also, in the end portability suffers, as
there are invariably 3-4 versions of the internal libs.

Developers have to weigh the technical debt of custom classes outside the
standard libs against the memory issues they solve. Today most machines are
less affected by memory fragmentation and there is more CPU/memory to go
around; where they are affected, you can write custom allocators for std/STL
or use something like Boost.

I do love Riot Games and all game development teams; it's just that I have
never worked in or with one that doesn't have the standard-lib-vs.-custom
battle, and lots of time is wasted when one isn't standardized on, or when
custom isn't necessary. Some games and game engines require it; where they do,
you should fully commit one way or the other. Going custom, though, leads to
slowdowns in coding for new devs, and invariably there will be multiple
versions of those internal libs over time that add up in the debt department.

[1]
[https://github.com/electronicarts/EASTL](https://github.com/electronicarts/EASTL)

~~~
badloginagain
One of the biggest problems with this is the tribal knowledge that develops
around it. I worked at a studio that had something very similar to EASTL, but
I had joined after an exodus of senior people.

It meant I had no idea how to use the custom libs. No documentation, no one
left in the office to tell me how it's used, no Stack Overflow to answer even
trivial questions.

I left after less than a year. The studio closed down 2 months after I left.

~~~
mattnewport
EASTL mostly doesn't have this problem because, as far as possible, it is a
compliant implementation of the STL with a few specific extensions, mostly
around memory management. It's not a library that provides STL-like
functionality with a different API. Much of the custom allocator stuff has
finally been superseded by polymorphic allocators in C++17.

Source: former maintainer of EASTL (not the original author).

~~~
badloginagain
I would imagine that, being a larger organization, there are dedicated
resources for at least some documentation/POCs, for questions/changes/etc.

The smaller the company, the fewer resources you have for maintenance, and the
more issues you're going to run into.

------
rickbad68
In my experience, the term 'technical debt' is often hijacked by
product-oriented folks, resulting in feature debt being presented as tech debt.

------
arca_vorago
This seems far too focused on dev tech debt, which has a very narrow scope. I
like the article, so I'm not knocking it, just offering a little perspective.
As a senior sysadmin in the past my primary issues have been technical debt
across the entire board, number one being too few hires for too much workload
due to cheap or nearsighted execs, but I would definitely agree that contagion
is a great term for how techdebt grows faster the longer it's left alone.

It's worth remembering that the CTO, the senior sysadmin, and a few others are
dealing with all the tech debt of the entire company and IT department, of
which dev is only a subset. (Of course this depends on the company, but on HN
I sometimes see convos like this where it feels like devs are just talking at
each other and not receiving much outside feedback.)

~~~
LtRandolph
I'm not surprised at all to hear that I have blind spots outside of "dev".
I've been working on shipping games for a decade, so I'm very fixated on the
types of stuff I run into day-to-day in that dev process.

~~~
arca_vorago
Nothing wrong with that at all. Has that all been on LoL? You guys did a lot
right with it from the start. Unfortunately I burnt myself out on mobas due to
HoN.

------
scarface74
I've been binge listening to Software Engineering Radio for the past few
months. I am currently listening to an episode where they are talking about
technical debt.

He has the opinion that clean code is not as important as shipping code - ship
the code first and then refactor as needed after you get customers.

[http://www.se-radio.net/2015/04/episode-224-sven-johann-
and-...](http://www.se-radio.net/2015/04/episode-224-sven-johann-and-eberhard-
wolff-on-technical-debt/)

~~~
nkristoffersen
Definitely. I build a lot of MVPs. So I focus on "make it work, then make it
work well".

Shipped code is so much more valuable than unshipped code :-)

------
debt
“We gave AString a number of ease-of-use improvements that make it easier to
work with and encouraged engineers to replace std::string”

Are you absolutely sure this itself won’t become Foundational technical debt?
You seem overly confident, given the metrics, that replacing std::string is a
good decision.

~~~
LtRandolph
We certainly can't know for certain. But we've had a significant, measurable
reduction in CPU cost due to "hidden" memory allocations from things like
passing a char* into a function that takes a std::string and stuff like that.
(I may be being mildly inaccurate, as I wasn't the guy doing the perf captures
etc. I just talked to him about it).

I'm particularly impressed by AStackString, which is a subclass that has
initial memory allocated on the stack, but automatically converts to dynamic
allocation if you exceed that space. So we get quick stack allocation by
default, but it will safely handle when it needs to expand.

Most of the quality-of-life stuff is around built-in support for printf-style
formatting and string searching (including case-insensitive).

------
jtchang
I love this article. Quickly breaks down the types of debt.

------
carapace
(MacGyver's name is Angus!?)

------
matte_black
I would love the idea of a technical credit score. For example, if you’re the
kind of dev that racks up technical debt and never pays it down, you should
have a shitty technical credit score, and be considered a poor hire. Whereas
someone with great credit, would be a great asset to bring onto the team.

~~~
mic47
Tracking a "credit" score sounds like a good idea, but I would not go so far
as to assume that people with bad credit scores are poor hires.

Maybe a person who creates tech debt is really great at prototyping, fixing
urgent issues with unconventional methods (aka MacGyver), or doing other tasks
you find boring. While this person's credit score will be low, such people are
also great assets on a team.

In general, this metric could be as useful as tracking the number of pull
requests, lines of code, and so on: to spot anomalies and investigate. Maybe
that person is suddenly blocked by something, is overwhelmed and needs help,
or just works differently or on different tasks, and the anomalous metric is
fine.

~~~
brightball
A metric like that would also discourage people from actually documenting the
debt.

“When a metric becomes a target it ceases to be a good metric.”

------
ebbv
Cool write-up and classification system. A category that affects us is
VendorDebt: things that are inflicted on us by external vendors. Classifying
them in a similar manner might help us decide which vendors to dump.

