Contagion is a really great term. I've seen my poor abstractions be replicated by others on my team, to my horror -- "don't they see why I did that in this particular case, and not in this other case?" Of course, that's entirely, 100% my fault. I picked a poor abstraction, I put it in the code, I didn't document it well enough, and of COURSE other programmers are going to look to it when solving similar problems. They should!
That said... Sometimes I spend a bunch of time finding the right abstraction for a feature that we end up not expanding. And then it feels bad that I spent all this extra time coming up with the "right" solution, instead of just hacking out something that works. Hmm...
One thing that made it work is that we worked on it in small slices all the time, without involving the product manager. It was still visible, so there'd be the occasional question, but as long as we kept delivering user value, nobody worried too much about our mysterious code concerns.
There was only one time where we got dedicated time to improve the codebase: every Friday. Two months later it became every second Friday, though.
I'm really pissed that technical debt gets treated as "hey, the dev guys are complaining again."
That's because it's opaque to anyone other than the engineers working on the project.
I've had a limited amount of success making this more transparent. Flagging it every time a feature would take longer because of a piece of technical debt the team wanted to fix got the fix prioritized before the fourth and fifth affected features were implemented.
Don't the bean counters at Ford Motor Company (for example) narc on the assembly-line workers and industrial engineers and QA/QC folks when work piles up, broken machines lie around, and trash goes uncleaned?
In your example, the worst-case scenario is that someone could die, and that tends to spur on investors to discover the probity within themselves to spend some money avoiding an expensive lawsuit.
But when the devs complain about the old code being terrible and making their lives hard, to management it never seems to hinder them that much. They keep banging out new features and fixing bugs, and nothing bad seems to happen. But the drip-drip-drip of bugs keeps increasing, and the new features take a little longer each time, and nobody dies, at least, but the thing becomes a haunted money pit that nobody wants to touch, and now you're stuck with it unless you rewrite it all at huge expense, etc., etc.
Maybe everyone should just treat a piece of software as they would a life. I bet we've all seen some codebases where if it were a friend, you probably would have staged an intervention by now. Your software baby needs absolute care from the get-go until the very end, or it will get sick and probably die, and most likely in a very prolonged and painful way.
The really cool thing is that eventually you're able to deliver large, complex tasks in very brief times and then spring on the PM/management that you're able to do this _because_ you've been refactoring. That's made a believer out of at least one of my PMs.
Obviously this doesn't work in all circumstances - it's not always feasible to get the really systemic, contagious debt cleaned up as part of feature work, and if the PM catches on then it makes this tactic difficult to continue.
The bigger obstacle I've had, though, is other developers who haven't fully bought into a culture of continuous improvement. Fear of breakages causes refactor paralysis, which makes it easier to break things when working on them, which increases fear, and so forth. I'm not really sure of the best way to deal with that aside from adding a bunch of unit tests (which I still sometimes get pushback on).
The pushback I received was that keeping the framework code in source control would result in it, along with my spec files, getting caught in the JS build/minification script. The individual who pushed back was also concerned about JS exceptions since we were up against a release, which speaks to a need for training on how unit test files work. Ultimately I .gitignored the framework folder but wouldn't budge on leaving the test files in, since .gitignoring unit tests defeats the purpose. Then I learned that the build script wouldn't have grabbed those files anyway. :)
My boss at my last job had the mindset of "refactoring only makes it different, not better". I asked him if I could spend some time refactoring our build system. He said no. I eventually did it anyway a few months later, spotted a bug thanks to the changes, and all of a sudden build times were cut in half, or to a tenth in many instances.
Same story for a pretty nasty hunk of code we had for handling sparse arrays. Asked if I could refactor, got told no, did it anyway a while later, and all of a sudden a problem that had been considered borderline infeasible took like a day of work.
There is always some risk that refactoring makes code not only different, but worse. Corner-cases are often there for a reason, and refactoring sometimes misses them, especially when there isn't complete unit test coverage. Since it's often easier to get the core logic right, this likely leads to issues that are discovered in production.
A good boss knows to get out of the way, clear a path if necessary.
If the boss can't trust the minions to do the right thing, someone's got the wrong job.
There are many people who have the wrong jobs.
More importantly, there are many people who are good, but not perfect. They do some aspects of their work very well and other aspects less well. A good boss has some idea of that and is able to work with people who are not super great.
Last but not least, even very good people often disagree about many things, including whether refactoring is needed or what kind of refactoring to do. Even if the boss trusted everyone and listened to everyone, he would still hear plenty of contradictory opinions.
Of course, this only works well on teams where the PM and eng lead don't have a fundamentally adversarial relationship. I like to think this is most teams, but it does take some getting used to in terms of the eng lead and PM communicating priorities and needs, balancing product momentum and code quality.
When an item of debt is first created, the people making it are often well aware of what they have done and are therefore in a relatively good position to fix it, but that knowledge quickly dissipates, to the point where it is often forgotten that there is a specific issue there. Furthermore, there is a tendency for it to be made less obvious as further changes are layered on top and around (this is distinct from contagion, as it can occur if the later changes are themselves debt-free, or at least independent of the decisions that created the debt and their consequences.)
An entire day is excessive in a CD setup, but for a two week release cycle it worked well. Kept the rough edges out of customer view very well.
'tl;dr "contagion" is the most important attribute because its properties are similar to interest rates. Having a small loan (small impact/fix cost) but high interest rate (high contagion) can quickly dwarf large loan small interest rate.'
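To put toy numbers on the quoted point (these figures aren't from the article, just an illustration of how compounding works), here's a minimal sketch comparing a small debt with a high "contagion" rate to a large debt with a low rate:

```cpp
// Toy numbers only: a small loan that compounds fast overtakes a big loan
// that compounds slowly within a handful of periods.
#include <cmath>
#include <cstdio>

int main() {
    for (int n = 0; n <= 8; ++n) {
        double smallFast = 1000.0  * std::pow(1.50, n);  // small loan, 50% "contagion"
        double largeSlow = 10000.0 * std::pow(1.05, n);  // large loan, 5% "contagion"
        std::printf("period %d: %8.0f vs %8.0f\n", n, smallFast, largeSlow);
    }
    // By period 7 the "small" loan (~17,086) already exceeds the large one (~14,071).
}
```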
My gut feel is that it's not necessarily about what you write in the first place, but what you refactor -- sometimes you can get away with a gradual replacement strategy (like std::string => AString from the article), but if the original pattern is contagious and bad, then you might have to take a more aggressive one-shot refactoring approach.
I've definitely seen this where a localized refactor is made to try to find a better way of doing something, we decide that we like the new way, and then don't find the time to replace the rest of the usages, resulting in a confusing state of affairs where you need to know which is the "blessed"/"correct" way of doing things.
I think that "contagion" is a good lens to use when assessing what the refactoring strategy should be for a given change to the codebase.
The worst offender was a team that once used the prototype of a shared class as a mixin, duplicated/mocked just enough of my implementation logic to get three or four methods working, and then left it at that. Of course, the next time I changed any of my code, even in the constructor, their page broke.
My experience has been that when other teams see these patterns, they see a single page or feature that's working at the moment and assume "this must be fine." They don't see the three or four frantic show-stopping bugs that got logged last month.
When I would confront teams about this, often the response that I would get was "Well, if it's good enough as a quick fix for them, why can't we do the same thing? Why are we the only team that has to fix this?"
Of course, when teams don't want to be the first one to break from a bad pattern, the end result is that nobody changes anything.
“Comments are for human context, code is for computers”
I have implemented a bunch of things that, while helpful short term, had clunky hacks to make up for a lack of tooling or to meet time constraints. And then the solutions get replicated verbatim, because "they work". The more time passes, the worse they become.
[...] After this bad experience, Ben began to categorize all work as either “offensive” or “defensive.” Offensive work is typically effort toward new user-visible features—shiny things that are easy to show outsiders and get them excited about, or things that noticeably advance the sexiness of a product (e.g., improved UI, speed, or interoperability). Defensive work is effort aimed at the long-term health of a product (e.g., code refactoring, feature rewrites, schema changes, data migration, or improved emergency monitoring). Defensive activities make the product more maintainable, stable, and reliable. And yet, despite the fact that they’re absolutely critical, you get no political credit for doing them. If you spend all your time on them, people perceive your product as holding still. And to make wordplay on an old maxim: “Perception is nine-tenths of the law.”
We now have a handy rule we live by: a team should never spend more than one-third to one-half of its time and energy on defensive work, no matter how much technical debt there is. Any more time spent is a recipe for political suicide.
In the tech debt parlance, most people are making interest-only payments instead of paying against the principal. Every check you write should do both (extra payments are good but they aren’t good enough).
> A hilariously stupid piece of real world foundational debt is the measurement system referred to as United States Customary Units. Having grown up in the US, my brain is filled with useless conversions, like that 5,280 feet are in a mile, and 2 pints are in a quart, while 4 quarts are in a gallon. The US government has considered switching to metric multiple times, but we remain one of seven countries that haven’t adopted Système International as the official measurement system. This debt is baked into road signs, recipes, elementary schools, and human minds.
A not-so-hilariously stupid mistake is to think that the traditional measurement system is stupid. His picture illustrates one of its virtues: the entire liquid-measurement system is based on doubling & halving, which are easy to perform with liquids. The French Revolutionary system, OTOH, requires multiplying & dividing by 10, which is easy to do on paper or with graduated containers, but extremely difficult to do with concrete quantities (proof: with one full litre container and two empty containers, none graduated, attempt to divide the litre into decilitres).
The real foundational debt is that we use a base-10 system for counting, due to the number of fingers & thumbs on our hands, rather than something better-suited to the task. If we fixed that problem, then suddenly all sorts of numeric troubles would vanish. There's actually a lot to be said about the Babylonian base-60 system, to be honest.
Still, I guess we aren't going to drop base-10 any time soon, so I believe the US should just accept that the "traditional" measurement system used to be very practical but no longer is, due to the progress of technology, and switch to SI.
I stand by the assertion that being one of 7 countries that only sometimes uses SI has very real costs. https://www.jpl.nasa.gov/missions/mars-climate-orbiter/
It really is! The number of digits might be a bit much for normal use, so perhaps base-12 is more realistic. If we're going to upend tradition, might as well do it for good, well-founded reasons …
> I stand by the assertion that being one of 7 countries that only sometimes uses SI has very real costs. https://www.jpl.nasa.gov/missions/mars-climate-orbiter/
Of course, that would have been equally a problem had one team been using kilogramme-metre-seconds and the other gramme-metre-seconds, and could have been avoided by standardising on customary or on French Revolutionary units!
In my time as an engineer, I've found that thinking of tech debt as financial debt also helps. There is the initial convenience (the borrowed money) of using the debt-laden approach. Then there is the fix cost, as Bill Clark names it, i.e. how much it would take to pay back the debt if it were money. The impact is akin to the amortization schedule, i.e. what the cost is each time. For normal money the amortization schedule runs over time, but for tech debt it runs over usage. The amortization schedule of tech debt is discounted over time, as with money: _now_ is more important than _later_.
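A toy back-of-the-envelope version of that framing (hypothetical numbers, nothing measured):

```cpp
// Hypothetical numbers: the shortcut "borrowed" 16 hours at creation time,
// and every later touch of the module pays a 2-hour "amortized" penalty.
#include <iostream>

int main() {
    const double borrowedHours = 16.0;  // convenience gained up front
    const double costPerTouch  = 2.0;   // penalty paid on each later use
    for (int touches = 0; touches <= 12; touches += 4) {
        double paid = touches * costPerTouch;
        std::cout << "after " << touches << " touches: paid " << paid
                  << "h vs borrowed " << borrowedHours << "h -> "
                  << (paid < borrowedHours ? "still ahead" : "now behind") << '\n';
    }
    // If the code is deleted before roughly 8 touches, the debt was a net win.
}
```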
Contagion is a great concept, and I think it is a better name than interest rate, as the debt will spread through the system, and not just linearly with time.
Tech debt is also multi-dimensional and not fungible like money, which makes it a harder thing to reason about.
But the good news is, in my opinion, that sometimes it is perfectly fine to default on some tech debt and never pay it back: just delete the code. Then taking on that tech debt was a win, if the convenience was worth more than the amortized payments.
It would help a lot if there was a well-formed, unambiguous specification for both sides to hold to. Something like the IETF terminology, in terms of MAY/SHALL, specifying things like "true/false" vs "Y/N", etc. Providing sample responses with decent coverage of the possible options is good as well.
Then you at least have the leverage to say "aha but the spec says it should be like this, why are you doing it wrong".
It does sound like you're describing the schema language part of GraphQL. I think that GraphQL is a great tool for making sure that the right stuff goes in and out. Although it's far from solving all input validation problems. Hmm, perhaps you're describing a different problem.
After having worked with GraphQL, user input validation at least seems like a manageable problem. It still seems like there should be even better methods for handling the contagion problems that historical mistakes leave in the data, though.
The most pernicious thing about technical debt, in my opinion, is that it creates fear in the sense of "I don't want to touch that module".
Even if you try to be objective and use hard facts to overcome the fear, it doesn't matter, because fear destroys creativity, so you've already lost.
With this kind of debt, you pay the entire cost until the last use of it is cleaned up.
This kind of debt is especially insidious because there is no incremental benefit to cleaning it up.
As far as I'm aware, it's still a 32-bit application.
Contagion seems like a probability factor.
Impact is the cost of leaving things unchanged.
Fix cost is the cost of fixing the problem.
Risk management in this context then means comparing the impact cost to the fix cost in terms of what each means for the business.
Rods/cones at the top of your retina connect to your brain through neurons, and so do the ones at the bottom. But saying "this 'top' retinal cone should really connect to a 'top' neuron in your brain" doesn't even make sense to me. Since when do the locations of the neurons interpreting the input even matter?
It would be the same with hearing too... you have a left and right ear, but if for some reason those were swapped and your left fed things to the right half of your brain and vice-versa, your brain wouldn't be "flipping it back", because how could the absolute location of the neurons interpreting the sounds even matter?
Incidentally, these neurons theoretically could go anywhere (as long as they're connected correctly), but in practice they end up arranged retinotopically (https://en.wikipedia.org/wiki/Retinotopy).
> light, coming from your right, hits a cone on the left of your retina. Light coming from above, hits a cone on the bottom of the retina.
Maybe the person who creates tech debt is really great at prototyping, fixing urgent issues with unconventional methods (aka MacGyver), or doing other tasks you find boring. While this person's credit score will be low, such people are also great assets on a team.
In general, this metric could be useful in the same way as tracking the number of pull requests, lines of code, and so on: to spot anomalies and investigate. Maybe that person is suddenly blocked by something, overwhelmed and needs help, or just works differently, or on different tasks, and the anomalous metric is fine.
“When a metric becomes a target it ceases to be a good metric.”
Good read and a really useful concept
Over the years I have learnt to become comfortable with allowing my engineering teams to refactor code whilst delivering new functionality.
This has been a process and largely one of trust between me and the engineering leads.
It has also helped that I have seen payback from the investment made in reducing the debt, in terms of delivering new functionality quicker and less error-prone code. Although this payback can take a while to see (6 months+, which is a long time for a product person operating in a competitive space!).
Most of my managers don't get this, or if they do, they are too blinded by immediate KPIs from further above to justify it, so in most cases I just tell the engineering guys to add a spread to their estimates to cover the paydown of the debt.
Over the years this has definitely helped me build tighter relationships with engineers, which, as any product manager knows, can have huge benefits.
This surprised me: contagion is a good metaphor because it is a compounding measure of the growth of the problem. Just like an interest rate (a compounding measure of the growth of debt).
Most senior developers I've met have considered the interest rate of the debt, which seems like it has been renamed here as contagion. Maybe I've been lucky to just know smart people!
From the point of view of explaining these concepts, I'd suggest keeping the metaphors consistent. Tech debt should have an amount owed and an interest rate, tech infection (?) should have a potency and a contagion level.
> One of the best examples of MacGyver debt in the LoL codebase is the use of C++’s std::string vs. our custom AString class. Both are ways to store, modify, and pass around strings of characters. In general, we’ve found that std::string leads to lots of “hidden” memory allocations and performance costs, and makes it easy to write code that does bad things. AString is specifically designed with thoughtful memory management in mind. Our strategy for replacing std::string with AString was to allow both to exist in the codebase and provide conversions between the two (via .c_str() and .Get() respectively). We gave AString a number of ease-of-use improvements that make it easier to work with and encouraged engineers to replace std::string at their leisure as they change code. Thus, we’re slowly phasing std::string out and the “duct tape” interface between the two systems slowly shrinks as we tidy up more of our code.
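I don't know what Riot's real AString looks like internally; here's a toy stand-in just to make the "duct tape" pattern in the quote concrete (the only facts taken from it are the .c_str()/.Get() conversion points — everything else is a hypothetical sketch):

```cpp
// Hypothetical stand-in for AString -- purely illustrative, not Riot's code.
#include <cstring>
#include <string>
#include <vector>

class AString {
public:
    explicit AString(const char* s) : buf_(s, s + std::strlen(s) + 1) {}
    const char* Get() const { return buf_.data(); }
private:
    std::vector<char> buf_;   // imagine deliberate, custom memory management here
};

// The "duct tape": both types coexist, and code converts at the seams,
// so std::string call sites can be migrated one at a time.
AString     ToAString(const std::string& s) { return AString(s.c_str()); }
std::string ToStdString(const AString& a)   { return std::string(a.Get()); }

int main() {
    std::string legacy = "old call site";
    AString a = ToAString(legacy);         // new code path
    std::string back = ToStdString(a);     // back across the seam
    return back == legacy ? 0 : 1;
}
```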
So now there are two string classes; that is technical debt... One should be consolidated on. The arguments against std::string are sometimes valid, but you can also use custom memory allocators or better standard-library implementations.
EA even rewrote the whole standard lib (EASTL) to address some of these issues, i.e. fragmented memory. Some games require it; for some it is pure ego in the game development team. Game development teams have the highest ego-driven development (EDD) I have ever seen, and lots of tricks that take five minutes (but add 2-3 months to testing because of those five-minute solutions) and are more spaghetti than templates that write templates.
The one problem that comes with rolling your own standard lib, or thinking you are better than Boost or similar, is that the learning curve on the internal lib replacements adds technical debt and start-up costs, and the original guy who wrote them is usually long gone. Also, in the end portability suffers, as there are invariably 3-4 versions of the internal libs.
Developers have to weigh the technical debt of custom classes outside the standard libs against the memory issues they are meant to avoid. Today most machines are not as affected by memory fragmentation and there is more CPU/memory to go around, and where fragmentation is a problem you can write custom allocators for std/STL or use something like Boost.
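For example, a minimal C++17 sketch of the "custom allocators for std" route (assuming your toolchain ships <memory_resource>): std::pmr::string keeps the standard interface but draws its memory from an arena you control, which targets the fragmentation/hidden-allocation complaints without introducing a second string class.

```cpp
#include <iostream>
#include <memory_resource>
#include <string>

int main() {
    char buffer[4096];                                      // frame-local arena
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer)};

    // Same std::string interface, but allocations land in the arena above.
    std::pmr::string name{"a string that would otherwise hit the heap", &arena};
    name += " -- appended without touching the global allocator";
    std::cout << name << '\n';
}   // the arena is released wholesale; no per-string frees, no fragmentation
```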
I do love Riot Games and all game development teams; I've just never worked in or with one that doesn't have the standard-lib-vs-custom battle, and that doesn't waste lots of time when one isn't standardized on or when custom isn't necessary. Some games and game engines require it; where they do, you should fully commit one way or the other. Going custom, though, slows down coding for new devs, and invariably there will be multiple versions of those internal libs over time that add up in the debt department.
It meant I had no idea how to use the custom libs. No documentation, no one left in the office to tell me how its used, no Stack Overflow to answer even trivial questions.
I left after less than a year. The studio closed down 2 months after I left.
Source: former maintainer of EASTL (not the original author).
The smaller the company, the fewer resources you have for maintenance, and the more issues you're going to run into.
Certainly there are tradeoffs where you may have to know the standard libraries well enough to understand their performance characteristics, or how best to mitigate worst-case scenarios, but if the people paid to build standard libraries are doing their jobs (which you pay them for when you buy that compiler), it should be less debt to work around an existing solution than to build one from scratch.
For places that have their own legacy containers and actively try to move more code to them—I dunno! I think at some point back in the 90’s the standard library got the reputation of being junk (perhaps rightfully) among game programmers, and this belief has been cargo culted all the way into 201x. Who knows.
Sometimes you reimplement a certain standard class (vector, string...) to adapt it to the very needs and usage patterns you have. Standard libs tend to be too general, plagued with allocations and other useless (in this specific context) behaviors that may negatively impact your performance/cache friendliness/memory fragmentation...
I agree a simple tiny game doesn't need all of these but when you need to squeeze all the performance you can there's no other option.
So please, do not just dismiss all the gamedev wisdom like that.
I clearly stated there are good reasons to do so and some games do require it. Mostly though they don't.
> disabling C++ exceptions
Throwing in the STL is pretty lightweight unless you use the exception objects; you can simply not catch exceptions, and you can also pass -fno-exceptions.
RTTI merely helps with casting, and usually none of that is going on at runtime, as game loops need to be clean and perform zero allocations if possible; everything should already be loaded up in memory, and the architecture of the game and its loop can remove this concern. You can also disable RTTI with -fno-rtti and handle it per class (e.g. with a virtual void nortti(); declaration, or with __declspec(novtable) on the MS compiler).
Rarely do exceptions or RTTI affect the game loop and framerate as most of that should not be needed during runtime game loops.
Usually the valid complaints are about allocations/fragmentation, though you can write custom allocators and other solutions for that, and, like you mentioned, about code style/API style. Not using the STL can also be a simplification, but usually custom libs start to grow as they re-implement much of the same functionality.
>> EA even rewrote the whole standard lib (EASTL) to address some of these issues, i.e. fragmented memory. Some games require it; for some it is pure ego in the game development team.
In engineering there has to be a GOOD REASON to start maintaining buckets of new code and libs. There are also ways to do it that still allow for most of the standard and promote documentation and understanding of it.
EASTL is a great way to go about it and I linked to it to demonstrate that.
I was mainly calling out using both standard and custom; that seems like more technical debt. If you truly do need custom libs, then go all in. Having both leads to more problems; just understand there is weight/debt to it and it isn't always better.
> So please, do not just dismiss all the gamedev wisdom like that.
In no way did I dismiss it; I just said this battle (stl/boost/others vs custom) is a constant in all gamedev studios, and many times it is unnecessary bikeshedding and yak shaving that doesn't make a runtime difference to the game or make the game better.
Find me a game studio that doesn't have a stl/standards vs custom battle and I say... wait for it...
We did a ton of tracing and perf/memory captures to identify that string allocations were a significant drain in many locations in the code. We don't see those issues with our other uses of std:: (vector, unordered_map, set, etc.), just with std::string. So it was a logical place to do targeted optimization.
We did that optimization before the Lua one because there's a very clean way to turn the foundational debt into MacGyver debt, since there's a trivial conversion between std::string and AString. Sadly we haven't been able to come up with any bite-sized moves to phase out the wasteful use of Lua as kvp storage buckets. It's an all-or-nothing problem, which makes it a much bigger chunk of work to undertake.
We were in a similar situation and using our own string implementation improved performance and reduced memory fragmentation.
Some people refuse to accept that certain software cannot rely on general-purpose libraries and needs to roll its own solution adapted to its specific needs.
Usually standard-lib-vs-custom arguments end up in the weeds like tabs vs spaces at game companies, but ultimately it has almost nothing to do with framerate or runtime. Largely it is about that EGO. Why maintain a standard lib instead of improving gameplay and networking? Well, some want to be a lord in their fiefdom, where they control the code and the ring.
Riot Games has std::string and AString, but what happens when player two enters the game and you get BString? Then BString invites its friends and you get CString and DString. Now your "standard" has many standards and is even more "standardy", with warring lords and internal factions like a Game of Thrones.
It's worth remembering that the CTO, the senior sysadmin, and a few others are dealing with the tech debt of the entire company and IT department, of which dev is only a subset. (Of course this depends on the company, but on HN I sometimes see conversations like this where it feels like devs are just talking at each other and not getting much outside feedback.)
He has the opinion that clean code is not as important as shipping code - ship the code first and then refactor as needed after you get customers.
Shipped code is so much more valuable than unshipped code :-)
Are you absolutely sure this itself won’t become Foundational technical debt? You seem overly confident, given the metrics, that replacing std::string is a good decision.
I'm particularly impressed by AStackString, which is a subclass that has its initial memory allocated on the stack but automatically converts to dynamic allocation if you exceed that space. So we get quick stack allocation by default, but it safely handles the case where it needs to expand.
Most of the quality-of-life stuff is around built-in support for printf-style formatting and string searching (including case-insensitive).
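Conceptually the stack-then-heap trick is the classic small-buffer optimization. A heavily simplified sketch of the idea (this is not the actual AStackString implementation, just an illustration of stack-first, heap-on-overflow storage):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy stack-then-heap string: starts in an inline buffer, moves to the heap
// only if the contents outgrow it.
template <std::size_t N>
class StackThenHeapString {
public:
    StackThenHeapString() { inline_[0] = '\0'; }
    StackThenHeapString(const StackThenHeapString&) = delete;
    StackThenHeapString& operator=(const StackThenHeapString&) = delete;
    ~StackThenHeapString() { if (data_ != inline_) std::free(data_); }

    void Append(const char* s) {
        std::size_t add = std::strlen(s);
        Reserve(len_ + add + 1);
        std::memcpy(data_ + len_, s, add + 1);   // copy including the terminator
        len_ += add;
    }
    const char* Get() const { return data_; }

private:
    void Reserve(std::size_t need) {
        if (need <= cap_) return;                // still fits (maybe on the stack)
        std::size_t newCap = cap_ * 2 > need ? cap_ * 2 : need;
        char* heap = static_cast<char*>(std::malloc(newCap));
        std::memcpy(heap, data_, len_ + 1);
        if (data_ != inline_) std::free(data_);  // never free the inline buffer
        data_ = heap;
        cap_  = newCap;
    }

    char        inline_[N];                      // fast path: no allocation at all
    char*       data_ = inline_;
    std::size_t len_  = 0;
    std::size_t cap_  = N;
};

int main() {
    StackThenHeapString<32> s;
    s.Append("fits on the stack");
    s.Append(" ... until it doesn't, and then it quietly moves to the heap");
    return s.Get()[0] == 'f' ? 0 : 1;
}
```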