There's the case of mysterious and unsolvable breakage. The product simply stops working, and the team is unable to get it working again, period. This can happen with really ancient legacy products where the original team is gone, or young products that are written badly by inadequate teams.
There's the case of unpleasantness. A product is so difficult and slow to work on that the company simply loses interest in it, and shuts it down rather than suffering through more maintenance. This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
There's real antiquation. The product is dependent on a product of an outside vendor that is no longer available/maintained. I've dealt with this on a mainframe replacement, and it was horrible. I've also dealt with this in Java, and it was plenty painful there too.
And finally, there's replacement. A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language, and by the gods, this time it's not going to be stupid and suck like that piece of crap the morons on the old team built! Most of these projects fail before they ever replace the old, working code, so I'm not sure this counts as technical debt failure.
One thing often feeds on the other. Because the system is hard to change, it does not get necessary features. Because it does not have necessary features, it provides less business value. Because it provides less business value, there is less of a budget for improving it. And so on.
If there are changes in future numpy versions we want, it's up to us to backport them, which is nowhere near our core business.
There's a lot to be said for standardization and 'boring.'
Now add in a script that updates this artefact to the latest version, breaking changes and all, and when it goes wrong it's hard to find the correct previous version of the binary artefact to build your code with any more. Especially if the build has been broken for a while because the project was on the back-burner, and it quietly dies when no-one is looking.
Also, this case can impact not just products, but organizations. You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work. I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
Sounds like Rational ClearCase.
I actually thought Subversion would be the last version control system, when it came out. Of course, now it's git. Maybe someday we'll get something better, and git will look decrepit.
I'm not an expert on these inner workings, but in theory there's nothing stopping someone from creating a new UI that maintains most or all of the same strengths; it's just that everyone already uses, and is used to, the current way.
I suspect if you came out with "SuperVCS" that was ultimately just a new UI on Git you'd have more success than releasing the exact same project as some kind of Git enhancement.
In the cases of this I've seen, it's always been because management and team priorities were unaligned.
Management in those places cared about a minimum level of productivity and minimizing risk.
Teams cared about maximizing productivity and their work days not sucking.
As long as teams kept managing to soldier through... rarely saw things change in those shops.
mysterious and unsolvable breakage: Helping another startup work through one now. It's a case of reclaiming functionality from a mystery outsourced codebase (without source control), meets inexperienced developers who try their hand at sysadmin, plus a 100% rotated bevy of actors (the whole team, PM and all, have jumped ship), no documentation, and no technical oversight. Offshore outsourcing adds cultural fun.
unpleasantness: I would expand this to unpleasant or incomprehensible. I have seen projects de-resourced for lack of management comprehension even when they literally paved the best and most rapid path to profit (later taken, successfully, by the now-dominant competition).
antiquation: The best example of this I've seen was a hardware product an employer was developing as a joint venture in Taiwan early in my career. Engineers had made the decision to use a sucky chipset from a struggling company to save money, but the supplier went under and the API froze (bugs, missing functionality and all) before our product development could complete. The target feature set was literally impossible to implement on the hardware and nobody wanted ownership. Many millions of USD, wasted.
replacement: It can work out, just infrequently. Generally when it works it's a smaller system with well defined interfaces.
I'd imagine a number of these cases are caused by a heavy reliance on third party technologies that are no longer supported, or very few people still understand.
If the code base is not a pile of dung anyway, the cost of find/replacing and refactoring obsolete or replaced APIs once is much smaller than the running cost of maintaining an extra layer of leaky abstractions for many years.
It is guaranteed that the abstraction will not work without a lot of changes anyway, and what typically takes the most time is the regression testing.
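The trade-off being described can be sketched with a toy example (all names here are hypothetical, not from any real library): a compatibility shim that keeps old call sites working on top of a replacement API, versus rewriting the call sites once and deleting the old interface.

```python
# Hypothetical scenario: the old client exposed fetch_records(query) returning
# a flat list; the replacement API paginates. All names are illustrative.

class NewClient:
    """Stand-in for the replacement library's paginated search API."""
    def search(self, query):
        yield [f"{query}#1", f"{query}#2"]
        yield [f"{query}#3"]

# Option A: a compatibility shim. Old call sites keep working, but the wrapper
# leaks (pagination, timeouts, error types) and must be maintained, tested,
# and worked around for as long as it exists.
class LegacyShim:
    def __init__(self, client):
        self._client = client

    def fetch_records(self, query):
        # Flatten the new API's pages back into the old flat-list shape.
        results = []
        for page in self._client.search(query):
            results.extend(page)
        return results

# Option B: a one-time mechanical migration of every call site, then delete.
# old: records = shim.fetch_records("status:open")
# new: records = [r for page in client.search("status:open") for r in page]
```

Option A's cost recurs on every change that touches the boundary; option B's cost is paid once, mostly in the regression testing mentioned above.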
This is perhaps a subtle and underappreciated reason that code quality is so important. If you're looking at an obviously well-written piece of code and you see something you don't understand, you can figure it's probably there for a good reason. If the code has visible sloppiness, it's much more difficult to tease apart the good parts from the bad.
300 column lines: Support for management buying everyone nice, new, giant monitors.
Logic in 20 places: But what if we want subtle differences between each implementation?
On a more serious note, a product that I've worked on was started in about 1998 in C++. We support something like 15 different platforms, and we've got our own implementations of things like vectors because we needed a least common denominator codebase; the standard libraries of a lot of platforms didn't provide what we needed, or provided implementations that were incompatible with other platforms. By the time everything we needed to support was modern enough (in about 2010), the system had a few million lines of code, and replacing things with library functions/classes would've been a nightmare. New development is saner, but the legacy stuff is entrenched.
On the plus side, there are only a few platforms that they still have to support gcc 3.x on, and all the ones that ran on 2.x are out of support (until a customer holds a few million dollars in management's face, as happened a few weeks ago with AIX 5.1).
G.K. Chesterton, 1929:
>In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
Or "They did what they did because they were the first ones to do it, and it only looks whack in hindsight"
Or "They did what they did because they were living within totally different constraints - like having to support old crap browsers like ie9, or a 'lowest common denominator' of slow end-user PCs etc., or some old chipset on the firmware code they wrote, or some old language paradigm, old libraries etc."
They did something in a horrible way, knowing it was horrible, because they were asked to deliver a feature as soon as possible and at any cost.
This is a slippery slope. It lets a company move faster till it reaches the point that the software becomes an unmaintainable pile of hacks.
Technical debt may not have killed the company directly, but we have to wonder how we might have done if we could have spent more of our time on new development.
The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with such poor, undocumented code.
"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to VPN in to one system, then remote desktop over another VPN to another system. Building web apps, these dev systems were not allowed to talk to the internet at all, so pulling in external dependencies (security libraries, templating libraries, etc.) was impossible - pretty much everything was handrolled, largely due to this restriction.
The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs seconds to minutes you'd normally expect. Much of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.
Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.
I went through one of these projects. The tech debt was never as bad as you describe, but it was a small company operating on a short runway. It also taught me an unfortunate lesson about non-technical founders and the dangers of outsourced code.
The MVP for the company had been bought off the shelf. It worked fine, but the code was abstruse and utterly resistant to change. As the price (in time and dollars) of change requests grew, they sensibly in-housed development. Unfortunately, their clients had some idea what to expect in terms of features per day and dollar. Requests like "let us use our logo and custom color scheme" turned out to be serious challenges since every color and style decision was clumsily hardcoded, so we took far too long to achieve them.
Ultimately, we ended up a contract behind - bringing in business to fund delivering on the previous request. Most startups operate under the gun like that (with either fundraising or contracts), but they start there and labor to escape. We started solvent, and had no clear plan to break out of tech debt - a rebuild would have been too slow, 'working smarter' wasn't viable, and expanding the tech team would have come too late and cost too much.
So, we died. Not because we couldn't do work, but because we couldn't do it at a competitive speed.
Stop me when you recognize this one: "Hey your product is great, but we really want something that does [totally different thing]. If you just add that thing, we will pay for all the NRE and you can sell it to others as part of your product! Win win!" Advice to junior developers: If you hear such talk in the hallway, RUN!
P.S. "Are you" is not directed to the OP but to the business owners/leaders that don't know what they are doing.
On the other hand, when they ask for something off the roadmap, we get into more complex issues (is this market-demand data, or custom work?) Particularly for grunt-level custom work (say, adding a support for tracking data on a niche wearable device that we don't currently support) there's a lot more questions that follow.
One of the most insidious of the latter, IMO, is that if it's just for one contract, then we're either hiring contractors/outsources (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer. At our small size and need for high-quality people, I consider this to be a real cost too.
>when they ask for something off the roadmap
Then we also get sidetracked and lose focus. Leadership and management expend too much energy trying to figure out what to do. Then they want estimates from the developers so they can figure out an estimated ROI. But they rarely seem to worry about the true income potential, focusing mostly on just the initial development cost.
Pursue it? Don't pursue it? If we do, how will we? Will we be >hiring contractors/outsources (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer.
Then is it really surprising that this lack of focus and discipline trickles down to those doing the work and the work itself? Technical debt in the making. It starts at the top.
One of the more frustrating things I've experienced is when I got push-back for implementing more project management process (we have a very light process, but when I took over it was sticky-notes-on-the-desk level). The complaint was "we can't slow down development to do more process". Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.
But of course, accurate estimates were needed - just, you know, without making measurements. I implemented some process anyway. We actually increased development speed from less churn and lower communication overhead (consult the docs before breaking someone's flow), improved our estimates, and we've been able to better contain our tech debt.
I suspect you could go a long way with the heuristic "If engineering asks for more process, always give it to them."
It's not flawless, but it's like hearing Ron Paul call for a new regulation - when a request is that out of character, you should usually suspect that there's some good motivation.
He also added that if you're a product shop doing less than 70% off-the-shelf, you're probably screwed, while 90% off the shelf is really the ideal (again, enterprise software).
I think the more interesting question is "what counts as professional services?" This gets much trickier, for example when you start building out APIs to make second- or third-party integrations easier, is that "product" or "professional services"? It certainly seems like product building, but if you're doing for a customer's use, it gets real blurry real fast. If you're not using that API internally, you're almost certainly on the professional services side. If you do use it internally, is it rock solid enough that you can support and expose it without that support becoming professional services?
Drawing sharp lines aside, this all probably seems kind of trivial, but the first time I ran through our product design with him and we discussed this, I went back and radically re-thought a lot of our strategy, particularly at the customer interfaces.
It was enterprise sales, so customization was unavoidable, but no one was differentiating between big and small changes, or big and small buyers. The product was desperately struggling to do ~3 things at once, and still being sold to potential buyers on the promise of a fourth thing it would do "soon".
The last one even spun off a dedicated team that built (hacked) custom prototypes in order to secure sales, then threw away the prototype and, after collecting the commission, told the new customers that it would take several years to get what they had just seen into production - but in the meantime we could do our existing product with some mods.
I imagine the pressure to accept these deals is immense though. Why let an innocuous little feature request hold up such a great deal?
That was part of the problem: the sales people couldn't push back on most requests because they were often quite reasonable. When they were more demanding, it was usually from a large prospective buyer so we had to bend over backwards.
The result was that we had huge tasks to do with no (current) revenue, and small tasks to do that took 10x as long as they should have. Since servicing existing revenue streams (even on reasonable requests) became so time-consuming, handling big enterprise demands became totally untenable.
Our target market was very reluctant to move from a paper system to a software system, so there was a lot of foot-dragging and feature requests. That delay had just never been budgeted into schedules or runway.
And it was one line of code, after several hundred lines had been torn out and rearranged to ensure that different clients could insert their own pictures of different sizes without everything exploding. The whole team was desperately trying to force enough flexibility into the software that one-line changes could be made in <10 lines, instead of >100.
"But I have all this speed. I'm agile. I'm fast. You know, this easy stuff is making my life good because I have a lot of speed."
What kind of runner can run as fast as they possibly can from the very start of a race?
[Audience reply: Sprinter]
Right, only somebody who runs really short races, okay?
But of course, we are programmers, and we are smarter than runners, apparently, because we know how to fix that problem, right?
We just fire the starting pistol every hundred yards and call it a new sprint.
...It's my contention, based on experience, that if you ignore complexity, you will slow down.
You will invariably slow down over the long haul.
...if you focus on ease, you will be able to go as fast as possible from the beginning of the race.
But no matter what technology you use, or sprints or firing pistols, or whatever, the complexity will eventually kill you.
It will kill you in a way that will make every sprint accomplish less.
Most sprints will be about completely redoing things you've already done.
And the net effect is you're not moving forward in any significant way.
Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.
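As a rough sketch of the plain-file approach (the directory layout and function names here are illustrative, not from any particular system): serve static content straight off the filesystem and let the OS page cache keep hot files in memory, with no query layer in between.

```python
# Minimal sketch of "plain files as a static-content store". Repeated reads
# of hot files are served from the OS page cache, so there is no database
# round-trip and no query parsing. Names and paths are illustrative.
from pathlib import Path

CONTENT_DIR = Path("content")

def get_page(slug: str) -> bytes:
    # Resolve first, then reject anything that escapes the content directory
    # (path traversal like "../etc/passwd").
    path = (CONTENT_DIR / slug).resolve()
    if CONTENT_DIR.resolve() not in path.parents:
        raise ValueError("invalid slug")
    return path.read_bytes()
```

The traversal check is the one piece of real logic here; everything else is the kernel's job, which is largely the point.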
Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)
Your story here makes me laugh, if only because of a very painfully familiar memory. Luckily this wasn't a big production system but rather an internal tool (which I guess clients did also use, but it wasn't part of 'production' per se). It was written entirely in perl_cgi, filled with cryptic regular expressions, in complete spaghetti code, and it would concatenate together entire webpages that had bits of them rendered by including the contents of files strewn all over the file system - and of course the logic to concatenate all the html together was itself strewn across a fistful of files in disparate locations. In short, I was once asked to make a simple change to some html, and after 5 days of reading through perl_cgi and developing a pure hatred for Larry Wall, I decided to do a Java re-write that took 3 days. I mean... crikey. Haha.
This means lots of business-rule crap gets softcoded into the database or ini files (increasing complexity and bug-risk) just to support a hypothetical future where somebody needs it changed without a full sprint cycle.
No one could install anything locally - everything had to be done on their locked down remote systems (some were Amazon remote desktops).
For the accessibility testing, the auditing company used JAWS. The company I was contracting to had one license (or so I was told) so I couldn't have one. We actually tried to install JAWS on an Amazon desktop, but it just crashed the entire virtual desktop, requiring re-imaging. That happened twice, so we gave up.
So, the proposed workflow was: I'd make a change, push code, and email someone to move that code to a system where an internal tester could look at it. I'd get an email back, then email the internal tester that the code was ready. The internal tester would go to the screen(s) in question using JAWS, then "tell me what JAWS said". That would often take several hours or a day.
I was then supposed to make changes based on that feedback, then repeat the cycle until things were 'fixed', then we'd ask the auditing company for another test, which they'd schedule for 2 weeks in the future. Then we'd wait.
During the first iteration of this part, sr mgrs kept asking me "when will this be done?". I kept trying to explain that we didn't even know what "done" was - the auditing company just had blind folks that would use the system with JAWS enabled and if they felt it was usable, they'd say so, otherwise, they'd report back "hey, this isn't usable", and we'd have to start digging in again.
I don't see how a big project could be coded without containing anything specific to the project. And even then, the architecture by itself is unique and deserves documentation.
The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, jQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages, and we just couldn't scale any further. Swapping JSF for a framework rendered on the front end would have been the solution, but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to shelve their search product and change their business model to a more community-based website.
I wonder, would the result be different if you had access to competent Eiffel developers? How large was the Eiffel codebase?
Eiffel is an interesting language, with a somewhat unique feature-set (I think only Ada comes close). Design by contract and static typing as core language features - if used right - should greatly help with both stability and ease of refactoring.
How large the codebase was is an important question, also how bad it really was. I saw a similar story - external codebase getting worse and worse from some point on - with Clojure at the center. The code quality was quite ok for a couple of months, then it worsened. At that point and for a couple of following months the codebase was possible to save - a single competent Clojure programmer would make a difference, I think. The project was less than 10k LOC then. However, more than 1.5 years and 60k LOC later, doing anything became nearly impossible for anyone, including original authors.
OMG no - run for the hills.
95% of software systems are not inherently sophisticated - they are 'complex', yes - maybe there are many features and moving parts - but there should be no piece of the system that is hard for anyone to understand. Decent architecture + decent design and coding, and an entire bank's system should read like a long, but well articulated, user manual.
Unless you're doing super low-level stuff, complex algorithms, heavy math stuff, or issues with massive scale or performance etc. ... the end result should almost be mundane in most cases.
That is, technical debt is not necessarily tangled, over-engineered code. It is more about the compromises that were made to actually ship and operate in the world. You can see this in the world with devices.
Consider: technical debt is the reason the AC delivered to your house goes through as many converters as you have devices, often to the same target power characteristics. It is not the reason that your coffee machine that also grinds and whatever is likely to fail within the year.
Another example: technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards on modern cars are horribly non-responsive.
Bad example. AC power has many desirable characteristics for the local transmission grid. If you were to do the grid over from scratch you'd still use AC. You're also too focused on household electronic usage, which is a very tiny percentage of the overall electricity used.
Which, amusingly, is fitting for the tech debt debate. Eradicating some choices from the project is likely to be missing the point. Just as eradicating AC from all power would be short sighted/wrong.
It is much cheaper to have a power supply on every electronic device turning 100-240 volts into 5 volts than to have one big power supply turning line voltage into 5 volts. Of course a lot of computers need 3 volts or less, so those power supplies have to exist anyway. It is also more efficient: big power supplies running at low loads are inefficient, while the power supply on each device is sized to what the device needs and so is more likely to be operating in a high-efficiency range.
That's orthogonal. What you really mean is that you want high(ish) voltage to distribute power in a home, in order to minimize losses due to wire resistance over distances of dozens of meters.
You don't need AC to do that. In fact, with modern power electronics, the switching converters we now use for supplying LVDC to our devices can work just as well with DC as with AC input power.
The primary advantage of AC over DC is that it can be converted between voltage levels easily with transformers. But today, we can do the same thing with DC using DC-to-DC converters. These didn't really exist in an economical form until a couple of decades ago, maybe even more recently.
If for some odd reason, western society decided to re-engineer and replace the whole power grid, it's quite likely I think they would simply switch to DC for everything. With deployment at that scale, the cost issues with the equipment should go away, making it no more expensive to replace everything with DC converters than transformers. DC is more efficient than AC because it stays at its peak voltage, and because it has no skin effect. But the technology needed to make it inexpensive to use for power transmission has only been around for a somewhat short time (namely, modern power electronics). Up until recently, it was simply a no-brainer to use AC because of its simplicity in generation, transmission (with transformers for stepping up the voltage), and usage (with AC motors).
I'm not sure. AC has some important safety considerations that would make it better even if the efficiency were significantly worse.
Switches, fuses and circuit breakers that work with DC are more expensive than AC. When a circuit opens there is a spark, and this spark can in some cases create a conductive plasma. With AC the wave goes to zero and the plasma disappears, while with DC it continues. There are cases where a DC fuse blew but the fuse continued to conduct. Of course this can be engineered around, but generally with larger and more expensive parts.
When someone touches power accidentally, AC is slightly safer. With DC your muscles will grab and never let go. AC gives you a chance to let go. This is a low probability thing, but is a factor.
The guy who wanted us to debate is wrong for one other reason though: I'm approaching the limits of what I know on the subject, while you seem to have a lot more knowledge.
Basically, he was assuming practical real-world considerations, I'm going off on a tangent about ideal conditions. His argument is about whether it's better to stick with the current AC system that your house has, or if it's better to install a low-voltage DC system to supply 5V, 12V, etc. to all your devices from a single, central, whole-house power supply as many people who don't understand electricity will frequently suggest. He's completely correct: low-voltage DC is a terrible way to supply power over any distance more than a meter or two because of resistive losses, so it'd require massively large copper cables or busbars. And power supplies are generally very low-efficiency when operated at low load. So our current approach (separate little optimized power supplies for every device, plugged into a higher-voltage AC supply) is actually optimal.
Which seems compatible with what you are saying, but the parent was specifically claiming I was wrong.
That is, you seem to be echoing my point. But seem to be claiming it is different. What am I missing?
As I understand it, "tech debt" is something that has to be reckoned with at some point, or else you're going to have real problems in the future (just like refusing to pay off a money debt will generally cause you real problems at some point when the creditor sues you and gets a judgment). You can't just let it go on forever; eventually you need to "pay it down" (by cleaning up the codebase, migrating to newer technologies, etc.), or else catastrophe happens (the company is unable to compete and goes under). One common factor cited in these stories is that the code becomes too unmaintainable and unreliable: too many weird changes for customers pile up and introduce serious bugs which cause the product to not work properly.
This isn't like that at all. We can go on with our current household AC power systems indefinitely. Maybe we could get a 1% improvement by switching to DC systems (at an enormous cost because most of your appliances and devices won't work with it without adapters), I don't really know exactly how much better DC would be (not much really), but what we have now works fine. Furthermore, it's not like the whole electric grid system needs to be changed: it's entirely possible, for instance, to switch distribution systems to DC and leave household systems AC. Instead of distributing the power at 30-something kVAC in your neighborhood and using outdoor transformers to step it down to 240VAC for your house, it could be distributed in DC form, and those transformers replaced by modules which convert the 30-something kVDC to 240VAC. In the old days, this was hard and expensive to do, but with modern power electronics it's not. But even here, the question is: are the gains worth the expense? And the answer is very likely "no". (For reference, I'm not a power engineer, I just studied it in college as a small part of my EE curriculum.)
So this does not, to me, resemble "tech debt" at all. It's just a system that we use for legacy reasons and which is extremely reliable and works well, even though it might not be the absolute most efficient way to solve the problem. This is no different than many other engineered systems. Perhaps you have a decent and extremely reliable car. Could it be better? Sure: you could build the chassis out of carbon fiber, use forged aluminum wheels instead of cast, etc. all to save weight and improve fuel economy. Are you going to do that? Of course not, because the cost is astronomical. There's cars like that now, and they cost $1M+.
So for AC systems that we're talking about, the question is: what is wrong with them that we want to consider replacing them with something else, instead of just sticking with them even if they're not quite as efficient as they could be? Because the cost to upgrade them would be enormous, so you need to have a very good reason.
It is this second sense that I was latching on to. It --tech debt-- will drive decisions today. But it is not clearly bad. Just a constraint on current decisions that was made in the past, often for decent or really good reasons.
Bit rot is another term for things that start to decline in how well they work. That is generally different, though. It is usually a byproduct of replacing implementations without keeping their full functionality, such that people relying on the old behavior are left out in the cold. (I can see how tech debt can easily turn into bit rot, but it is not required.)
Consider LaTeX: because it is an old code base, people often describe it as full of tech debt. They want to modernize it, not because it doesn't work, but because they think there are better ways now. And they do not consider all of the documents built on it as infrastructure.
Now, I concede that all of this is my wanting the terms to have unique and actionable meanings. Elsewhere I was told "tech debt" is a catch-all term now. That seems to rob it of usefulness.
Edit: I forgot to address the monetary aspect of the analogy. I like that, to an extent. But most debt is taken on under very specific financial terms, unlike colloquially termed debts between friends. That is, there is no notion of interest that works in this metaphor, nor a party you are borrowing from.
I'm not so sure about this. To me, "debt" is something that has to be paid eventually. Otherwise, why use the term "debt" at all?
So if something works fine, why waste your time and energy replacing it with something newer?
Usually, the reason for this is the assumption that sticking with something deprecated will eventually bite you in the ass: something you're depending on won't be supported, will have security holes that won't get fixed, etc., and you're going to wish you had fixed it earlier. So this is a valid use of the term "tech debt" IMO.
But if something is just something someone doesn't like, that isn't "tech debt" at all. I don't like .NET, but it's invalid for me to call all software written in .NET "tech debt". I don't like Apple's ecosystem, but it would be pretty ridiculous for me to call all iOS software and apps "tech debt" when many millions of people use and enjoy that software every day.
So, for your LaTeX example, I don't consider that tech debt at all; instead, it's just like iOS and .NET software to me. If someone doesn't like it, that's their problem; the fact that it isn't brand new isn't a problem for me and all the people who still happily use it.
So personally, I think anyone using the term "tech debt" to just refer to things they don't like is using it incorrectly and in a totally invalid way.
So, in this case, AC/DC fits if we agree there is a chance the "best overall" solution is DC. (Which, I fully grant, is not a given.) There is also a bit of playing loose with "short run."
Then, skip back to the top of this thread, where you will find: "products that are written badly by inadequate teams" and "case of unpleasantness" and "A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language..."
All of this is the first, most highly voted, post. The next post is a highlight of poorly engineered solutions.
My point? Find a case study that has the usage you are referring to here.
Now, the term certainly has rhetorical appeal. But I have never seen it used in a way that fits the metaphor. It is just used to pull the emotional strings of "you must pay back your debt!", while usually claiming that the design, or the lack of some technology, is the debt.
"The best overall solution" is up for debate. It's the same with programming languages; one team will say that C is the best overall solution for a certain problem, another team will say it's Python, another team will say it's one of the .NET languages. I'm sure you can find plenty of engineers who will claim that mission-critical real-time avionics systems or automotive ABS controllers should be redesigned to use x86 CPUs and run Windows and have the code written in C# instead of using C/C++ and running on a small RTOS on an embedded microcontroller.
The implication I see with your Wikipedia definition is that implementing something easy in the short run instead of something that really is the best overall solution will eventually lead to more work to fix the shortcomings of the quick-n-easy solution. So, like I said before, a "debt", because it has to be paid back eventually (with work). The problem I see is that not everyone agrees on what is the best overall solution, and unlike a money debt that's easily seen by looking at a dollar figure, the only way to really know how much "tech debt" you have is through experience, i.e. accumulating it and then finding out over time how much work you have to expend to fix things when your quick-n-easy solutions start having real, demonstrable problems. If your solution has no actual, demonstrable problem (e.g., you use LaTeX and it continues working great year after year for your use-case), then I don't consider that to be "tech debt" at all, even if some people don't like it.
Alternatives may have advantages. However, often the advantages of where one is at are ignored in the debate.
My gripe in this debate is more from actual uses of the term. Not from any ideal use of it.
It'd be vastly more expensive to wire up an entire house for low voltage DC than it is to include the simple rectification components in every light bulb. In a house you're talking about many wire runs of many dozens of meters. This is not a good environment for low voltage DC at all.
Of course, the cynic (and, ironically, the optimist) in me still takes this as evidence that "technical debt" is often used in BS circumstances by people who just don't fully understand the reasons for the things they are talking about. :)
 2009 - https://martinfowler.com/bliki/TechnicalDebtQuadrant.html
It sounds like your company seriously screwed up the design if you can't scale your web tier code horizontally. I've also never had a view technology take up a significant chunk of cpu resources - it's always the Java code carrying out the functionality. E.g. I would expect the largest factor in CPU usage in the list of search results to be... generating the data for the search result. If the largest factor was rendering the result, then something was probably seriously wrong.
The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.
We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.
People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.
I still think the Robustness Principle is a crock, and strictly controlling inputs is one key to happiness. It also, frankly, helps your users in the long run by giving them exactly what they want, and it actually cuts down on the amount of thought they have to put into it. Chaos and disappointment do not make a good user experience.
I totally agree about Excel importing, but CSV is trivial, no? Here is an Erlang version I happened to write yesterday:
fun(Row) -> string:tokens(Row, [SepChar]) end,
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four line version above works perfectly for reading the type of files I designed it for. If you want to work with human created, or more complex variants of CSV, all bets are off.
Excel takes that, adds some fun things like people using color and formatting to store data, and things like Excel auto-corrupting values which look like dates and may not have been noticed before you do something with the data.
They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
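A quote-regularization pass really is tiny once you decide to do it. A minimal sketch in Python (the mapping table here is illustrative, not exhaustive):

```python
# Map common "smart" punctuation to plain ASCII equivalents.
# The set of characters handled is illustrative, not exhaustive.
SMART_PUNCT = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
_TABLE = str.maketrans(SMART_PUNCT)

def regularize(text: str) -> str:
    """Replace typographic quotes with plain apostrophes and quotes."""
    return text.translate(_TABLE)
```

Running free-form user input through a pass like this before it hits the database avoids the "same value, three spellings" class of bugs.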
I'm talking about the actual conversion from tabular data to relational. Most of the applications I've worked on had this in one form or another.
So you end up with users downloading an export of their data in CSV, editing it in Excel in various ways, and then reimporting it in the application.
Every company I worked for, this kind of feature was always in the top 3 in terms of support load.
A "relationship" in an ER diagram maps to a "reference" in relational theory. This is part of the type safety/domain system of RDBMSs.
If these concepts are muddled, SQL will never quite make sense :)
Relational database can be expressed in tabular form, but tabular data is not necessarily relational.
> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)
A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.
From the Wikipedia article on relational databases, subsection relational model.
"This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."
"No, honest guys, I knew CSV was more complicated. I just didn't need to make my code safe."
Here's a csv parser in Erlang that actually attempts all that trivial stuff:
That's a lot more code than yours. And the notes even say it's not tolerant of badly formed CSVs.
Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.
You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.
"She said, \"Hello, world!\""
We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.
I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?
I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.
I totally appreciate that to write a CSV parser that works for all cases would be extremely longwinded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains true that /in some cases/, depending on the requirements of the program, the parser can be very short.
> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, I wanted it done before the next lecture). I did not expect I would be able to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be, beat my expectations by many fold, then people comment about my hubris online the next day. It is funny how life goes!
Every failed product/project I've worked on in my professional career, which had full intent to ship from the start, was killed by technical debt. It's usually indirect, but it's always the root cause.
It takes many forms:
* Too buggy to ship, due to a creaky old code base being over-stretched to a product with too high reliability/experience expectations.
* Product form factor, efficiency, user experience not good enough to sell well, due to spaghetti code base which couldn't be whittled down to removable pieces. Result: large runtime, more expensive, less efficient hardware.
* Existing old codebase deemed too bad to ship a product, requiring a rewrite-from-scratch, but timescale too long to make any sense -> product killed.
It's difficult to elaborate more while maintaining some discretion about exact companies and projects. The general point is: technical debt isn't just some fuzzy intangible issue — it indirectly creates enormous costs in people and time, can affect the physical form products take on, and impact the user experience. Products always get started without taking this debt into account, but when it's finally realized, it can change basic features, and then it kills them.
Products are designed with faulty assumptions about what existing resources can be applied to them.
I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
I've been mostly in consumer electronics related companies, where a product which ships and then becomes too hard to maintain usually doesn't "fail". It just gets phased out. In a way, this is another way technical debt has an indirect, but large impact on products: obsolescence becomes a necessity. Not so much planned — which implies malice — as simply realizing it's not possible to maintain indefinitely.
> I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
Usually very quickly, or after far too long.
The better projects know ahead of time that there are Dragons lurking in the code base. But that's effectively saying there are projects which never even got past brainstorming because we knew the technical debt was too high.
On the other hand, there are projects where it only becomes apparent how much debt there is after a lot has already been invested. It's like you'd expect, e.g "There's a performance problem because of a basic primitive this library uses everywhere. And that was originally a workaround for a compiler performance bug. We could fix the compiler bug, but it turns out other libraries relied on it..." and so on. Extra time-to-market makes a product make less and less sense — fashions change, hardware improves, new tech arrives — and so it gets killed. Or worse, shipped.
The class that is no longer appropriate for new requirements gets canned for a better abstraction etc.
In aggregate, over time, you may kill the product to avoid technical debt!
That's the most concrete reason I can come up with for why the technical debt will kill them, but there are plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there, they compensated somewhat by hiring new people. Word has gotten around, though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.
My current workplace's main product uses the same tech, is the same size (LOC), and has the same functionality as the other company's, but serves a different market. They did the Oracle-to-Postgres migration in 2 months. Two man-months: one guy.
New workplace: 15ish developers, serving the same amount of customers, doing similar revenue, making stable releases every week
Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way. Mostly stemmed from the codebase and the architectural choices that had been made along the way.
Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.
I wasn't directly involved in but had a good view of our university's finance modernisation woes: http://news.bbc.co.uk/1/hi/education/1634558.stm https://www.admin.cam.ac.uk/reporter/2001-02/weekly/5861/1.h... - although in fairness the inflexibility and disorganisation were existing features of the institution, and Oracle merely exacerbated them.
Ideally you'd have some kind of plan though from the start, for which other cloud provider you would use and how the services would map, in case using AWS becomes untenable.
Doing it after the fact in a politics-heavy organization is confounded by not just the technical difficulty of the task, but the glad-handing and perception management that has to happen to keep your team from getting fired during the process.
I am currently working in a business where there is a nearly 8-year old Rails app (600+ models, 250+ controllers, 400+ libraries, LOC around 60k), that sits at the heart of everything we do.
The company is struggling to grow and believes the cause is that engineering is slow. We have asked to refactor this code base multiple times, and point to the technical debt as the reason that features which should take a day to implement typically take 3-4 weeks.
It is only recently that the penny has finally dropped and they've realised if they don't invest in replacing this thing (there is too much technical debt to fix, we're calling bankruptcy and moving to a brand new architecture piecemeal), the business is likely to fail within 1-2 years.
That means my current employer is likely to go bust because of technical debt within 2 years max unless we become really good at fixing this.
We are optimistic.
We have to be, right?
Basically, for a long time, the company never really re-evaluated what it had learned and spent time trimming things down, so as a result there is this ungodly mess. At the heart of what the business does, there is no real need for more than a dozen models. So why do we have so many more? Nobody ever refactored away stuff we didn't need any more, and so weird things happen.
There is also a coupling issue that is endemic to all monoliths. We're moving to a micro-service architecture with clean domain separation, and we'll probably go to 1/10th of the code base in LOC terms within 12 months, even if we move some of that functionality into Go, Java or Python services (all options).
I work on that type of codebase, but we have a fully covering test suite, so applying changes is not a problem. (Interestingly, I've just realized that the line count of the testing code is 50%+ more than the base application code itself.)
So ultimately I think company culture (that is, emphasis on automated testing, for dynamically typed languages) is the crucial factor.
And that wasn't a stressful place to work with insane deadlines - it was fairly relaxed for the most part.
A 60kloc C++ project is small and easily manageable, a 60kloc Ruby hairball can drive a person insane.
If, for example, they're in banking and finance, and those LOC deal with fine details of tax code... Oh boy.
If anything, we've gone too fast and not spent enough time going back and understanding what we really need to keep.
This could happen if you have a lot of dependencies, switched compiler versions but left the binaries "in place" and deployed changes incrementally.
In short, my predecessor had attempted a move to SOA without understanding dependencies, circuit breaking and failure modes. This would then cause scenarios where the entire front-end would fail to render on a single down-stream service taking a little longer than necessary.
When identifying how to stop that happening, I discovered a large number of comments tagged "TODO" with statements like "Refactor this when we have time" or "We need to find a way to do this better".
Further down on the downstream services there were rather esoteric SQL queries doing large joins that nobody had done a query plan on. It was hard to identify these because the ORM had been trusted to do magic, and it was happy to do so, but there was a point where it was not apparent _why_ these joins were happening, but when you found the code, there were more comments "This needs improving", "We should refactor this", etc.
We were able to get something back quite quickly with liberal application of indexes, and it took us a day or two to refactor the queries enough to mean response times came down, but the error rate was still > 20%, and it was random, so 1-in-5 page loads of the front end service would fail.
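That kind of fix is cheap to verify. A sketch using SQLite via Python's sqlite3 (the original stack isn't specified, so the table and query are invented): EXPLAIN QUERY PLAN shows the same query going from a full table scan to an index search once the index exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

def plan(sql: str) -> str:
    """Return SQLite's query-plan description for a statement."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan(query)  # without an index: a full scan of the table
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # with the index: a search using idx_orders_customer
```

The same before/after check works against whatever the ORM actually emits, which is how you catch the joins nobody ever ran a plan on.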
We refactored the code to circuit break and handle degraded services better, but that took a few days, and then we started working down to the back end service and figuring out the final steps.
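For reference, the core of a circuit breaker is small. A minimal sketch in Python (thresholds and names are made up; a production version would add half-open probing, per-service state, metrics, etc.):

```python
import time

class CircuitBreaker:
    """Fail fast when a downstream service keeps erroring, instead of
    letting one slow dependency take down the whole front end."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds before retrying
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()        # open: don't even try the service
            self.opened_at = None        # window elapsed: allow one retry
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()            # degraded response, not a 500
        self.failures = 0
        return result
```

The point is that a down or slow service costs one fast fallback per request, rather than a blocked render of the entire page.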
It was a small team looking after legacy code that everybody knew was a bit messy.
A few weeks before this code was shuttered, I heard from a friend that some of our content did not render at all on certain Android devices. I identified the cause as a half-finished refactor (again, my predecessor), that had never been finished because he had been pushed to work on something else. This caused a dramatic decline within a key market segment that resulted in declining ad revenue, subscriptions and overall viability of the business.
Basically, when you start something, finish it. If you find yourself putting in comments like "We should refactor this" anywhere in your code base, and you're doing so because the business is pushing you to work on new features, you have a massive problem culturally that is going to cause a rise in technical debt that raises risk to revenue.
All technical debt ultimately will lead to problems that the business will see on balance sheets, but they will rarely successfully identify the cause as being technical debt because they can't see, understand or rationalise it. They think it's engineers being grumpy idealists.
People play too fast and loose with the concept of "MVP" for my tastes, and it's a problem I see over and over again. The risk of that is, long-term, it will cause business failure.
The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.
But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned.
It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.
So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs.
It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.
I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.
Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.
Should I ever inherit an #ifdef mess again, I intend to replace #ifdefs with Strategy patterns.
#1 figure out all the known defs in actual use
#2 rerun the preprocessor with each variant (combo)
#3 capture the output(s)
#4 aggressively apply the Strategy pattern, refactor code
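A sketch of what step #4 might look like (in Python for brevity; the original code is C/C++, and the variant names here are invented): each #ifdef branch becomes a strategy class, and variant selection moves from compile time to one explicit lookup.

```python
from abc import ABC, abstractmethod

class HomingStrategy(ABC):
    """One strategy per behavior that used to live behind an #ifdef."""
    @abstractmethod
    def home_axes(self) -> str: ...

class DefaultHoming(HomingStrategy):
    def home_axes(self) -> str:
        return "home all axes simultaneously"

class CustomerAHoming(HomingStrategy):
    # was: #ifdef CUSTOMER_A ... #endif scattered through the control code
    def home_axes(self) -> str:
        return "home Z first, then X/Y"

STRATEGIES = {"default": DefaultHoming, "customer_a": CustomerAHoming}

def build_homing(variant: str) -> HomingStrategy:
    # The only place that knows about variants; dead variants become
    # easy to spot and delete, instead of rotting in the preprocessor.
    return STRATEGIES[variant]()
```

The win over #ifdefs is that every variant compiles all the time, so a long-forgotten customer branch can't silently bit-rot until someone flips the define again.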
Last time, I removed dead code piecemeal manually. It sucked.
Not to say I haven't seen this effect myself many times...
Management ordered the creation of new software. Shouldn't that be enough?
The project lead was responsible for this design, and above him there was nobody with any expertise in the matter.
From what I've heard he's an extremely good C++ programmer. He's just a terrible architect.
If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is because most of them lost a few millions at some point due to some bug.
Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.
In the enterprise, this is called "mature" and is a sign of great sophistication.
My first ops job back in 2008 was at a large exchange's NOC, where we shut down and cleaned the application environment every day. Every Friday, we would have to take a backup of the ~20 or so production databases by hand, in an ancient CDE-based UI: right click -> menu -> submenu -> backup database. Very little room for error, and you weren't allowed to do it without somebody else watching you. Throughout the weekend, customers would then run tests against the production databases. Once testing was done, we'd restore the prod databases back to their original state to wipe out the test data.
At one point, after showing him a POC, I asked my boss if it was alright if I automated it, and was rejected because, "We don't trust automation to do it accurately every single time." Mind-boggling. In mild fairness, in the 15 or so years they were doing that, I don't think anyone ever did it wrong, which is an enormous miracle in itself.
(That was a strange company. My boss was a JW who'd worked there for 30 years, regularly tried to convert me, and would spend four hours a day on spreadsheets for his church. We'd also manually kick off stock split processing from a ~10" CRT monitor from the early nineties.)
"The consequences of the failures were substantial. For the 212 incoming parent
orders that were processed by the defective Power Peg code, SMARS sent millions of child orders,
resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately
45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80
stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight
realized a $460 million loss on these positions. "
"The new RLP code also repurposed a flag".
I've never seen a flag repurposed without catastrophic effects.
Was the issue technical debt or a sloppy deployment?
I've seen this play out probably close to a dozen times now, at different employers and consulting clients.
For a company that makes software as a product, or to directly support or create their main product, not being able to add new features is a really bad place to be.
About 1986 I was tasked with moving a small block (a few KB) of data very quickly from cabinet A to B, with the racks full of custom electronics - no PCs, all original stuff on a flight sim with 386 Intel processors all over the place. The racks had Multibus backplanes.
I suggested a 'TAXI' fast optical link (oooh - optical..too radical) or a pair of Intel 589 (Ethernet) cards for an off-the-shelf solution. Nope, too expensive. Engineering Management suggested a twisted pair ribbon cable between the two adjacent racks - um, OK..
Long story short - me and the senior design engineer decided to use the Intel 8257 DMA controller chip to grab the bus and blast the data between the RAM on two cards.
After a short period of fails, we found that the engineers who designed our 386 cards had not bidirectionally buffered the DMA request line onto the backplane, as they never expected any card other than the master CPU ones to initiate a DMA, so the CPU cards could not see the line being toggled from elsewhere.
Engineers would not accept a change request, for 'reasons'.
Intel 589 cards is it then!
All because someone chose to omit one tristate buffer.
I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.
I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.
I'm okay with the occasional week-long rewrite of a subsystem, but usually only after I've spent some time coming to grips with exactly why the old one is terrible and have a firm grip of exactly how the new one will be better.
I do remember a competitor dying of not releasing their big refactored next version soon enough, and running out of cash.
Spolsky tells it better than me:
I worked on a 300k LOC business basic application at one point.
The big question everyone was asking is how do you move to something else? Everyone wanted something else, they started writing new services on top of the old system, they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.
And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in business basic.
You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should / can work. Mozilla hasn't devoted 100% of resources to Servo, but instead is letting servo build all on its own, and someday unclearly defined in the future, the two could merge. (but might not!) It's a separate product, and nobody is pinning all their hopes and dreams on it.
The successor of Netscape Communicator was Mozilla (IIRC it was just called that, later renamed Mozilla SeaMonkey), and the successor of Netscape Navigator was Mozilla Phoenix (later renamed Mozilla Firebird and eventually Mozilla Firefox). Firefox and Thunderbird were once again separate clients.
Mozilla was still considered bloated, but Phoenix was far less so, which was nice on lower-RAM machines, and it allowed the start of Web 2.0. It was also the return of doing one thing and doing it right: browsing the WWW, whereas Netscape Communicator (unlike its predecessor, Netscape Navigator) had come bundled with a Usenet client and an e-mail client.
Later in development, addons became a thing, and you could add features which were previously part of Netscape Communicator such as calendar, HTML editor, etc. You can also add such features with addons to Mozilla Thunderbird.
Then Google Chrome happened, and people switched to that, but I'm not entirely sure why.
So when it comes to it, the most likely outcome will be a kind of "my grandfather's axe" scenario where over time parts of Servo replace Gecko within Firefox until Servo has completely replaced Gecko.
It's a huge, audacious, hairy project, which might happen if a startup said "OK let's rewrite everything from scratch!"
What Firefox devs did was to take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).
* In the early 2000s, they added support for Windows NT to the product. Unfortunately, they did this with an MPE compatibility layer that means the entire thing still thinks it's running on an HP 3000, so controlling it programmatically means writing MPE job streams.
* It was originally written to store data in COBOL records. When they added support for SQL databases, they apparently just copy-pasted the schema verbatim from the COBOL copybook format. This means the database has no foreign keys, FLAGS columns all over the place (including tables where you have to JOIN ON SUBSTRING), and, most egregiously, a table with ITEMNO_001, ITEMNO_002, ITEMNO_003, PRICE_001, PRICE_002, PRICE_003 and so on, which has to be queried three times and UNIONed to get the data out.
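To illustrate the kind of query that layout forces, here is a hypothetical sketch with invented data, using an in-memory SQLite database to stand in for whatever engine the product actually used: each numbered column pair needs its own SELECT, glued together with UNION ALL.

```python
import sqlite3

# Hypothetical recreation of the denormalized layout described above:
# one row holds three item/price pairs in numbered columns, so pulling
# the data out as ordinary rows takes three SELECTs joined by UNION ALL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ORDERS (
        ORDERNO     INTEGER,
        ITEMNO_001  TEXT, PRICE_001 REAL,
        ITEMNO_002  TEXT, PRICE_002 REAL,
        ITEMNO_003  TEXT, PRICE_003 REAL
    )
""")
conn.execute(
    "INSERT INTO ORDERS VALUES (1, 'A', 9.99, 'B', 4.50, 'C', 2.25)"
)

rows = conn.execute("""
    SELECT ORDERNO, ITEMNO_001 AS ITEMNO, PRICE_001 AS PRICE FROM ORDERS
    UNION ALL
    SELECT ORDERNO, ITEMNO_002, PRICE_002 FROM ORDERS
    UNION ALL
    SELECT ORDERNO, ITEMNO_003, PRICE_003 FROM ORDERS
""").fetchall()
print(rows)  # one logical row per item/price pair
```

In a normalized schema this would be a plain one-table SELECT over an order-items table; here the query (and its cost) grows with every new numbered column pair.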
* Printing packing lists requires not only a specific model of printer, but also an extra several-hundred-dollar chip to be installed in that printer. I'm told that this chip's sole function is to enable barcode printing.
I have no insight into what goes on inside the company that makes this thing, but it certainly looks to me like they have a severe case of technical debt. Bug fixes generally take 4-6 weeks in the best-case scenario, and frequently either don't fix the bug or introduce new ones. Their only customers are the ones that have been using the system for so long that they're stuck with it and can't switch--in fact, many of them are still running HP 3000 systems, which HP has been trying to end-of-life since at least 2006.
The end result of this is that the product is dying a slow, agonizing death of attrition. I think the only reason it still exists at all is because the company that makes it is stuck with support contracts that haven't expired yet.
I built this extranet app for a Fortune-class / NYSE company in 2001. They were a Lotus Domino shop so for that and various other reasons the extranet was deployed in Domino. The initial rollout was considered quite successful, but it was definitely "v1" code, and I'm being really generous with the code quality. Plus, Domino.
The application was considered a stopgap until the shop had become fully Microsoft-centric, at which point it was expected to be migrated to .NET. That was expected to be in ~5 years.
The result was that no investment was made into the app for over fifteen years. Every now and then an enhancement would be needed, and a contractor would be called up to bolt on a feature in a shockingly slipshod manner (this app is much too complex for the average Domino dev). But no technical debt was ever cleaned up, because "meh, we're going to replace that app by 2007."
2007 was 10 years ago. In the meantime two projects to replace the app were spun up and killed. The app is finally being retired this year. I was called up at the 11th hour to jump back in (15 years later) to help support the thing through the conversion, as the one existing Domino dev they had on staff finally (wisely) jumped ship.
I cannot even begin to describe the state of this app.... it's a case study in "how to not manage IT."
Another recent client was a content-creation shop (think glossy magazines). Their outgoing sr dev had deployed a CMS that nobody had (or has) ever heard of. This CMS was originally developed during the glory days of XML. Believe it or not, the app worked by loading all of the CMS content into a single in-memory XML document. This was probably OK for a brochure site, but this was a site with hundreds of thousands of pages of content. As a result the application required a server with 64GB of RAM just to launch. Also - launching the app took about ten minutes after the server OS was loaded. And there was no server farm, just the one server. If the app was ever stopped, it would stay down for at minimum 10 minutes.
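For contrast, a hypothetical sketch of the streaming alternative that kind of CMS could have used: Python's `xml.etree.ElementTree.iterparse` can walk an arbitrarily large XML document while keeping memory bounded, by clearing each element once it has been processed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sketch: instead of loading every page into one giant
# in-memory DOM, stream the document and discard each <page> element
# as soon as it has been handled, so memory use stays roughly constant.
xml_data = "<site>" + "".join(
    f"<page id='{i}'><title>Page {i}</title></page>" for i in range(1000)
) + "</site>"

titles = []
for event, elem in ET.iterparse(io.StringIO(xml_data), events=("end",)):
    if elem.tag == "page":
        titles.append(elem.findtext("title"))
        elem.clear()  # free the subtree we just processed

print(len(titles))  # 1000
```

Streaming only works for pass-through processing, of course; random access to hundreds of thousands of pages is what databases and search indexes are for, which is roughly why WordPress would have been a fine fit here.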
I came in to fill in temporarily and to try to find someone to staff the position permanently. Even with a competitive salary, nobody qualified wanted the job.
Meanwhile, the same company also had a set of blogs that they managed in WordPress....
They were already using WordPress for blogging. A custom WordPress implementation would have easily solved their CMS problems and devs are trivial to find.
The point was that the thing had just been rolled out the prior year. There was no budget or appetite for throwing the thing away. It did work. So there it stands, alongside some dozen WordPress sites...
my product was http://www.teamkpi.com/
I hired 3 mid-level PHP and JSP developers in Thailand and had them make the website + reporting page.
Total nightmare. Don't hire developers and assume that they will rise to the occasion (learn new tricks). I gave them as much time as they needed to research and make sound engineering decisions, and I ended up with a spaghetti Frankenstein nightmare: server-side scripts mixed with client-side scripts mixed with server-side code that generates client-side scripts.
In Thailand at least, you always need a manager to force architecture and design decisions, and force devs to refactor poorly thought out solutions.
I was naive and thought that I could have a team of 3 figure out the web part while I write the desktop client and provide PM-level guidance.
The problem was two-fold:
1. The relevant tools (Unity3D) were extremely immature and the problem was quite diffuse. No profiler, poor quality of generated code, tiny caches, etc.
2. A problem in string-handling code that was quite diffuse throughout the game. As near as I can tell, it was blowing out the tiny CPU cache hundreds of times per frame.
On desktop, this code was a complete non-issue. On the puny little ARM in the iPhone? It was the difference between having dozens of towers and 50 enemies in play versus half a dozen towers and fewer than a dozen enemies. The impact on game dynamics, and the need to re-balance everything, would by itself add weeks to the shipping schedule.
There were plenty of other things that needed to be scaled WAY back of course: Switching from 3D to 2D to get vertex count and draw call count down. Completely rebuilding the entire UI. Revamping the pathfinding and suffix caching to not play havoc with the CPU cache. Moving from a 24x24 grid to a 12x12 grid. All of that combined helped a LOT, but not nearly enough.
The string manipulation was for a hierarchical property system that let me parameterize all sorts of attributes for enemies/spells/towers/projectiles in a set of text files. Ultimately, I had over-engineered on the assumption that I would be tweaking many more things -- with much greater frequency -- than I wound up actually tweaking.
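A hypothetical sketch of the trade-off (names and values invented; the real game was in Unity, not Python): a string-keyed hierarchical property store does string parsing and container walks on every lookup, whereas plain local fields are a single direct access.

```python
# Hypothetical sketch of the two designs: a hierarchical, string-keyed
# property store resolved at runtime (allocation-heavy and cache-hostile
# when queried hundreds of times per frame) versus plain local fields.

class PropertyStore:
    """Resolves dotted keys like 'enemies.grunt.speed' on every lookup."""

    def __init__(self, tree):
        self.tree = tree

    def get(self, path):
        node = self.tree
        for part in path.split("."):  # new string objects on every call
            node = node[part]
        return node

store = PropertyStore({"enemies": {"grunt": {"speed": 2.5, "hp": 10}}})
speed_via_store = store.get("enemies.grunt.speed")

class Grunt:
    """The simpler alternative: values assigned directly on the prefab."""
    speed = 2.5
    hp = 10

speed_direct = Grunt.speed
print(speed_via_store == speed_direct)  # True
```

Both return the same value; the difference is that the store pays for parsing, hashing, and pointer-chasing on every single lookup, which is exactly the kind of per-frame cost that disappears in desktop profiling and dominates on a tiny mobile CPU cache.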
Had I ripped most of it out and just had local properties on each prefab that I assigned manually, I might've hit that market opportunity. Finding that this was the cause was a multi-month project because of how interwoven it was with everything else. Hell, it would've been fine had I not over-generalized it into a shared component on each prefab that the other components queried to get property values. But I did. And it took me vastly too long to identify it as the major problem it was.
Opportunity missed, and that was the final nail in the coffin for my fledgling game studio.
Technical debt destroyed the team.
As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.
So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?
One application was a web application built in C++ in the 90's. It didn't have the STL, it implemented everything from XML parsing to PDF rendering from scratch. It stored all data in XML files on the file system. It was a single-threaded CGI application. And it was the core product of the small business that created it.
There was no series-B/C/D/E that was going to appear so we could hire more developers and re-write everything or develop a new, superior product. This is where I learned how to maintain and extend legacy software. I spent hours poring over Michael Feathers' book. We did manage to extend and breathe new life into the system. We wrapped the old code in Python, wrote a tonne of integration and unit tests on every change, and wrote some code to sync data to a database alongside the XML file storage scheme it used. We even got to a place where we started replacing code paths from the Python API with functionally-equivalent (as far as our test suite was concerned) code written in nice, clean Python (and gained some features along the way thanks to Python's nice libraries!).
We kept the lights on without having to spend too much time hacking on undocumented, untested C++ code and without trying to just re-write everything. It was much more difficult to make progress than a typical greenfield project in a dynamic language but that would've cost more upfront without a clear payoff... so we did what we had to do.
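A minimal sketch of the characterization-test technique from Feathers' book (the functions here are invented for illustration; the real system wrapped undocumented C++, not Python): pin down whatever the legacy code currently does, then require any replacement to match it on every pinned case.

```python
# Hypothetical sketch: "characterization tests" record the legacy code's
# current observable behavior, so a clean rewrite can be swapped in only
# when it is provably equivalent on the recorded cases.

def legacy_normalize(name):           # stands in for a wrapped legacy routine
    return name.strip().upper().replace("  ", " ")

def clean_normalize(name):            # the new, readable implementation
    return " ".join(name.split()).upper()

# Step 1: record whatever the old code does today -- right or wrong.
cases = ["  acme corp ", "ACME  CORP", "a"]
expected = [legacy_normalize(c) for c in cases]

# Step 2: the replacement is acceptable only if it matches on every case.
assert [clean_normalize(c) for c in cases] == expected
print("replacement matches legacy behavior on all pinned cases")
```

The key mindset shift is that the tests assert what the system *does*, not what anyone thinks it *should* do; that safety net is what makes incremental replacement of old code paths tractable.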
Another company? Well, they decided to use a document-based data storage system as the source of truth in a hot new microservices architecture that was going to save everything... only there was no schema validation, and their use cases were killing performance in some scenarios. Random breakages caused by changes at a distance. It hasn't killed their business, but it has limited their options.
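Even when the store itself enforces no schema, services can validate documents at the boundary before writing them. A hypothetical sketch (field names and types invented):

```python
# Hypothetical sketch: a tiny schema check at the service boundary,
# catching missing fields and wrong types before a document is written
# to a schema-less store.

REQUIRED = {"id": int, "name": str, "price": float}

def validate(doc):
    """Return a list of problems; an empty list means the doc is valid."""
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors

assert validate({"id": 1, "name": "widget", "price": 9.99}) == []
assert validate({"id": "1", "name": "widget"}) == [
    "id: expected int",
    "missing field: price",
]
```

In practice you'd reach for something like JSON Schema rather than hand-rolling this, but the principle is the same: validation has to live *somewhere*, and if the database won't do it, the services must.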
"Working Effectively with Legacy Code"
A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.
We're slowly killing (i.e. no big new developments, but only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)
Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.
Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.
A booking engine (CRS/GDS) would be used by either airlines or a reservations system (Amadeus, Sabre, etc). That's the piece they didn't deliver on.
Edit: Reference to the announcement of abandoning the booking space: https://skift.com/2013/05/15/google-and-ita-software-abandon...
"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"
I don't think technical debt alone will kill you. But it may render you unable to cope with another problem, which will then kill you.