Ask HN: Have you ever worked on a product that was killed by technical debt?
306 points by nicostouch on Jan 17, 2017 | 324 comments



There are multiple cases of "killed by technical debt".

There's the case of mysterious and unsolvable breakage. The product simply stops working, and the team is unable to get it working again, period. This can happen with really ancient legacy products where the original team is gone, or young products that are written badly by inadequate teams.

There's the case of unpleasantness. A product is so difficult and slow to work on that the company simply loses interest in it, and shuts it down rather than suffering through more maintenance. This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.

There's real antiquation. The product is dependent on a product of an outside vendor that is no longer available/maintained. I've dealt with this on a mainframe replacement, and it was horrible. I've also dealt with this in Java, and it was plenty painful there too.

And finally, there's replacement. A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language, and by the gods, this time it's not going to be stupid and suck like that piece of crap the morons on the old team built! Most of these projects fail before they ever replace the old, working code, so I'm not sure this counts as technical debt failure.


> This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.

One thing often feeds on the other. Because the system is hard to change, it does not get necessary features. Because it does not have necessary features, it provides less business value. Because it provides less business value, there is less of a budget for improving it. And so on.


Bingo.


Some of the worst examples of that are when a project uses a custom build of a library whose source no longer exists, with no record of the changes either.


Reintegrating future changes in the upstream is also made nearly impossible as a result; our tech lead made a change to numpy a few years ago, didn't manage to get it accepted by the project, and we're stuck with this version until the sun burns out.

If there are changes in future numpy versions we want, it's up to us to backport them, which is nowhere near our core business.

There's a lot to be said for standardization and 'boring.'


Well you could estimate the impact of backing out of the changes on the application side, with the upside being continued savings in operational complexity. Or, you could address the operational complexity with processes or tooling - which would be easiest with something like a developer OS image.


Yep, seen that. You might be thinking of tweaked builds of open-source components, but from before package managers were so common, I've also seen internal projects with a `/lib` folder full of artefacts like `MiscDbUtils.dll` - internal "useful" utility functions that are widely used.

Now add in a script that updates this artefact to the latest version, throw in some breaking changes, and it all goes wrong: it's hard to find the correct previous version of the binary artefact to build your code any more. Especially if the build has been broken for a while because the project was on the back-burner, and it quietly dies when no-one is looking.


Builds that involve non-version-controlled files that exist only on a certain developer's machine, and because that developer is a control freak (or overworked), he refuses to automate or put those files in version control...


I believe this is called "Job security." I've seen it a few times, including a dev deleting the source code repo and substituting every copy of a set of scripts he was responsible for with compiled binaries. This was discovered due to a platform incompatibility between one of the hosts running the script and the binary wrapper. Data was then restored from backups and the dev was summarily let go.


I should add here that the "antiquation" case is the one that has caused the most observed grief in my career. The forces causing failure are coming from outside the code/business (dependence on an outside vendor), and sometimes collide with forward momentum of other parts of the code (i.e. that graph library will never, ever work with Java 7, to name an example). These become life-or-death situations, and the tendrils of the product dependency are often deeply integrated. It might be easier to rewrite than to fix.

Also, this case can impact not just products, but organizations. You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work. I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)


>You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work.

Sounds like Rational ClearCase.


Yeah, that. But even things like cvs, that were modern and hip in 1998, are still floating around 20 years later.

I actually thought Subversion would be the last version control system, when it came out. Of course, now it's git. Maybe someday we'll get something better, and git will look decrepit.


I have little doubt that git will be replaced eventually (or perhaps severely modified). It seems like a fad to me. It is indeed very powerful, but its UI is sheer insanity. At least SVN is very straightforward to use and understand. It just doesn't offer the distributed nature that git does, as it relies on a centralized server.


The cool thing about Git is that the underpinnings ultimately boil down to a key-value data store.

I'm not an expert on these inner workings, but in theory there's nothing stopping someone from creating a new UI that maintains most or all of the same strengths, except that everyone already uses, and is used to, the current way.

I suspect if you came out with "SuperVCS" that was ultimately just a new UI on Git you'd have more success than releasing the exact same project as some kind of Git enhancement.
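
For the curious, the "key-value" part is concrete enough to sketch in a few lines of Python. This is just an illustration (the `git_blob_id` helper is my name, not anything in git): a blob's key is the SHA-1 of a small header plus the raw bytes, and the various UIs are layered on top of that store.

    import hashlib

    def git_blob_id(content: bytes) -> str:
        """Key git assigns to `content` when stored as a blob object.

        Git's object database is content-addressed: the key is the SHA-1 of
        a short header ("blob ", the size in decimal, a NUL byte) followed
        by the raw bytes; the value is the compressed header+content.
        """
        header = f"blob {len(content)}\0".encode()
        return hashlib.sha1(header + content).hexdigest()

    # Compare with: echo 'hello world' | git hash-object --stdin
    print(git_blob_id(b"hello world\n"))

As far as I understand, trees and commits live in the same store with different header tags, which is why an alternative front-end is mostly a matter of new porcelain rather than new plumbing.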


Isn't that basically Gitless? (http://gitless.com/)


What UI? Using git on the command line is exactly the same as using SVN on the command line. At least for basic, every day things like status, add, commit etc.


> I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)

In the cases of this I've seen, it's always been because management and team priorities were unaligned.

Management in those places cared about a minimum level of productivity and minimizing risk.

Teams cared about maximizing productivity and their work days not sucking.

As long as teams kept managing to soldier through... rarely saw things change in those shops.


Yep.

mysterious and unsolvable breakage: Helping another startup work through one now. It's a case of reclaiming functionality from a mystery outsourced codebase (without source control) meets inexperienced developers who try their hand at sysadmin plus a 100% rotated bevy of actors (the whole team, PM and all, have jumped ship), no documentation and no technical oversight. Offshore outsourcing adds cultural fun.

unpleasantness: I would expand this to unpleasant or incomprehensible. I have seen projects be de-resourced because of lack of management comprehension when they literally paved the best and most rapid path to profit (later taken successfully by the now-dominant competition).

antiquation: The best example of this I've seen was a hardware product an employer was developing as a joint venture in Taiwan early in my career. Engineers had made the decision to use a sucky chipset from a struggling company to save money, but the supplier went under and the API froze (bugs, missing functionality and all) before our product development could complete. The target feature set was literally impossible to implement on the hardware and nobody wanted ownership. Many millions of USD, wasted.

replacement: It can work out, just infrequently. Generally when it works it's a smaller system with well defined interfaces.


Would these be cases where Robert Martin's 'The Clean Architecture' [1] would help, where the core enterprise logic is separated from third party dependencies, making the latter easy to swap out and replace?

I'd imagine a number of these cases are caused by a heavy reliance on third party technologies that are no longer supported, or very few people still understand.

[1] https://8thlight.com/blog/uncle-bob/2012/08/13/the-clean-arc...
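
For what it's worth, the core of that article is the dependency rule, which is easy to sketch. Here's a minimal illustration in Python (all of the names are mine, not Martin's): the business logic owns an interface, and the vendor-specific code is an adapter behind it that can be swapped without touching the core.

    from abc import ABC, abstractmethod

    class DocumentStore(ABC):
        """Port owned by the core application, not by any vendor."""

        @abstractmethod
        def save(self, doc_id: str, body: str) -> None: ...

        @abstractmethod
        def load(self, doc_id: str) -> str: ...

    class InMemoryStore(DocumentStore):
        """Trivial adapter; a vendor-backed adapter would implement the same port."""

        def __init__(self) -> None:
            self._docs = {}  # doc_id -> body

        def save(self, doc_id: str, body: str) -> None:
            self._docs[doc_id] = body

        def load(self, doc_id: str) -> str:
            return self._docs[doc_id]

    def archive_report(store: DocumentStore, report_id: str, text: str) -> None:
        """Core logic: it knows the port, never the vendor behind it."""
        store.save(report_id, text.strip())

    store = InMemoryStore()
    archive_report(store, "r-1", " quarterly numbers ")
    print(store.load("r-1"))  # quarterly numbers

Whether that indirection earns its keep is exactly what the replies below debate: the wrapper only pays off if the vendor behind it is genuinely likely to change.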


It's a great idea but in reality most third-party tools are going to work slightly differently and the abstractions will leak (unless they were developed against an existing interface, in which case you don't need to create the wrappers yourself anyway).


I think the optimal route is to not bother with (extra) abstractions and interfaces, but to try to avoid using things that are unreasonably tied to a vendor - unless you save a lot of time.

If the code base is not a pile of dung anyway, the cost of find/replacing and refactoring obsolete or replaced APIs once is so much smaller than the running cost of maintaining an extra layer of leaky abstractions for many years.

It is guaranteed that the abstraction will not work without a lot of changes anyway, and what typically takes the most time is the regression testing.



I've learned over time that it's always better to assume that those who came before me were smarter than me and knew more than me (I'm rarely proved wrong).


"They did it that way because they where stupid" is a ridiculously common assumption, when the correct answer is often "They did it that way because they knew stuff I don't know".


Well, I think one has to be alert to both possibilities.

This is perhaps a subtle and underappreciated reason that code quality is so important. If you're looking at an obviously well-written piece of code and you see something you don't understand, you can figure it's probably there for a good reason. If the code has visible sloppiness, it's much more difficult to tease apart the good parts from the bad.


There's obviously a lot of stuff I don't know then. Like the benefits of copy pasted code, or 300 column lines, or implementing the logic in 20 places when it has existed in the standard library for a decade. If only the ancient sage I inherited this code base from had left notes to guide me on this path of wisdom.


Copy-pasted code: Data redundancy

300 column lines: Support for management buying everyone nice, new, giant monitors.

Logic in 20 places: But what if we want subtle differences between each implementation?

On a more serious note, a product that I've worked on was started in about 1998 in C++. We support something like 15 different platforms, and we've got our own implementations of things like vectors because we needed a least common denominator codebase; the standard libraries of a lot of platforms didn't provide what we needed, or provided implementations that were incompatible with other platforms. By the time everything we needed to support was modern enough (in about 2010), the system had a few million lines of code, and replacing things with library functions/classes would've been a nightmare. New development is saner, but the legacy stuff is entrenched.


Libreoffice managed to get rid of their legacy containers and moved to the STL, so maybe you can do it too

https://people.gnome.org/~michael/data/2011-05-12-libre.odp


Development of that particular product moved to China and India last year, so I don't have a part in its development anymore, just build+release, because there are some legal benefits to releasing it from this country.

On the plus side, there are only a few platforms that they still have to support gcc 3.x on, and all the ones that ran on 2.x are out of support (until a customer holds a few million dollars in management's face, as happened a few weeks ago with AIX 5.1).


I've been there a few times, and the worst thing is when, every five or ten WTFs, there's something that seems as ill-thought-out as the previous couple of similar blocks, except this time it actually makes sense, as it implements (awkwardly, of course) some important corner case.


See, taking a reasonable generalization and interpreting it as an absolute statement is a classic case of "assuming they're stupid".


I don't assume people are stupid. I assume they don't know or care how to write code well. Big difference ;)


Well, the correct answer is often also "They did it that way because it was a reasonable choice then, and they didn't have the benefit of hindsight that I have now."


It's also possible that whatever constraint they were working around no longer exists. Either way, it's a case of not tearing down a fence before you know why it was put up.

G.K. Chesterton, 1929:

>In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.


In my experience Sturgeon's Law usually overrules Chesterton's fence.


"They did it that way because they knew stuff I don't know".

Or "They did what they did because they were the first one's to do it, and it only looks whack in hindsight"

Or "They did what they did because they were living within totally different constraints - like having to support old crap browsers like ie9, or a 'lowest common denominator' of slow end-user PCs etc., or some old chipset on the firmware code they wrote, or some old language paradigm, old libraries etc."


Often it's the opposite.

They did something in a horrible way, knowing it was horrible, because they had been asked to deliver a feature as soon as possible and at any cost.

This is a slippery slope. It lets a company move faster until it reaches the point where the software becomes an unmaintainable pile of hacks.


I worked for a company that grew considerably for 10 years and then lost its biggest client and folded quickly. We had spent a few years reworking our platform in a way that might have been successful enough to weather the storm of losing that client, but technical debt really slowed us down.

Technical debt may not have killed the company directly, but we have to wonder how we might have done if we could have spent more of our time on new development.


This is revenue diversification, not technical debt. Companies with a single customer funding the business should be actively pursuing a high-priority strategy to reduce this risk.


You misunderstand - our software itself was full of technical debt. We spent a lot of time dealing with the consequences of that debt, and I'm wondering if we'd have been able to hit vital targets sooner without it and possibly have survived.


It was both. The combination of two failures put them beyond the point of recovery.


Heh. I have a rule of thumb: any project that starts with the words "this time we're doing it properly" is screwed.


Yes and no.

The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with the combination of poor, undocumented code and a locked-down environment.

"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to vpn in to one system, then remote desktop over another vpn to another system. Building web apps, these dev systems were not allowed to talk to the internet at all, so things like pulling external dependencies (security libraries, templating libraries, etc) was impossible - pretty much everything was handrolled, largely due to this restriction.

The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs the seconds to minutes you'd normally expect. Many of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.

Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.


This is a great example of how technical debt 'kills'. It's not a murder, it's negligence and a slow demise.

I went through one of these projects. The tech debt was never as bad as you describe, but it was a small company operating on a short runway. It also taught me an unfortunate lesson about non-technical founders and the dangers of outsourced code.

The MVP for the company had been bought off the shelf. It worked fine, but the code was abstruse and utterly resistant to change. As the price (in time and dollars) of change requests grew, they sensibly in-housed development. Unfortunately, their clients had some idea what to expect in terms of features per day and dollar. Requests like "let us use our logo and custom color scheme" turned out to be serious challenges since every color and style decision was clumsily hardcoded, so we took far too long to achieve them.

Ultimately, we ended up a contract behind - bringing in business to fund delivering on the previous request. Most startups operate under the gun like that (with either fundraising or contracts), but they start there and labor to escape. We started solvent, and had no clear plan to break out of tech debt - a rebuild would have been too slow, 'working smarter' wasn't viable, and expanding the tech team would have come too late and too costly.

So, we died. Not because we couldn't do work, but because we couldn't do it at a competitive speed.


Seen this a lot. A lot of companies think they are "product" companies, but due to their unwillingness to push back on customers, they become custom engineering shops, bolting on little one-off mods to their project over and over to appease bad customers (or to appease POTENTIAL customers who haven't even bought the product yet).

Stop me when you recognize this one: "Hey your product is great, but we really want something that does [totally different thing]. If you just add that thing, we will pay for all the NRE and you can sell it to others as part of your product! Win win!" Advice to junior developers: If you hear such talk in the hallway, RUN!


We're an enterprise software shop, which necessarily means we do a lot of custom work, but we're careful to consider what we'll do. My mentor is an old hand who has been through multiple exits, and in every meeting we have, he hammers this point. You're either a product shop, or a professional services shop, and if you don't know which one you are (or you believe wrongly) you die. Simple as that. The deeper you get into the consequences of knowing (or even forcing) which you are, the more implications it has for everything from product design to business strategy, and it's extraordinary how such a simple-seeming thing affects such a vast amount of the company.


Companies that think they are a product shop but chase enterprise customers and do professional services often fail to charge appropriately for their services. Enterprise-level customers not only require more features, more guarantees, and more support, they require more attention. Are you including the sales time and expense of chasing them to get a contract, as well as support resources, in your CAC? Are you accounting for all the added expenses (and future expenses, including lost opportunities)? If not, you're probably losing your tail.

P.S. "Are you" is not directed to the OP but to the business owners/leaders that don't know what they are doing.


Yes, this. One of the big things I talk about with sales is the difference between changing a priority for a customer vs. adding distinct new things. As a recent example, a client wants better and faster feedback on the trial they're conducting (we're in the med-tech space), and we've already got a new dashboard designed and on our product roadmap. I'm more than happy to prioritize that over other product pieces if it'll get us the contract, because we're already going to do it, we're only changing the 'when'.

On the other hand, when they ask for something off the roadmap, we get into more complex issues (is this market-demand data, or custom work?) Particularly for grunt-level custom work (say, adding a support for tracking data on a niche wearable device that we don't currently support) there's a lot more questions that follow.

One of the most insidious of the latter, IMO, is that if it's just for one contract, then we're either hiring contractors/outsources (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer. At our small size and need for high-quality people, I consider this to be a real cost too.


We (I) feel these same things.

>when they ask for something off the roadmap

Then we also get side tracked and lose focus. Leadership and management expend too much energy trying to figure out what to do. Then they want estimates from the developers so they can figure out an estimated ROI. But they rarely seem to worry about the true income potential, focusing mostly on just the initial development cost.

Pursue it? Don't pursue it? If we do, how will we? Will we be >hiring contractors/outsources (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer.

Then is it really surprising that this lack of focus and discipline trickles down to those doing the work and the work itself? Technical debt in the making. It starts at the top.


Absolutely. A brief story on tech debt from the top:

One of the more frustrating things I've experienced is when I got push-back for implementing more project management process (we have a very light process, but when I took over it was sticky-notes-on-the-desk level). The complaint was "we can't slow down development to do more process". Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.

But of course, accurate estimates were needed, just, you know, without making measurements. I implemented some process anyway. We actually increased development speed from less churn and lower communication overhead (consult docs before breaking someone's flow), improved estimates, and we've been able to better contain our tech debt.


> Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.

I suspect you could go a long way with the heuristic "If engineering asks for more process, always give it to them."

It's not flawless, but it's like hearing Ron Paul call for a new regulation - when a request is that out of character, you should usually suspect that there's some good motivation.


This is a frighteningly accurate description of the company I'm currently at. They spent many years chasing after the enterprise-level customers at the cost of alienating their smaller team-level users and never had an answer when requests would crop up from the larger accounts asking for features ('just get it done'). Now they're trying to pivot back to the team-level customers and are having a supremely difficult time dealing with the tech debt built up by addressing the enterprise-level concerns. We tout ourselves as being a product shop when in reality we're trying to be both.


What industry are you in or where is the home office located?


Can you expand on what your mentor defined as a "product shop" vs "professional services shop"?


Sure, but the answer is pretty trivial: If you spend more than half your time on customization, you're a professional services shop.

He also added that if you're a product shop doing less than 70% off-the-shelf, you're probably screwed, while 90% off the shelf is really the ideal (again, enterprise software).

I think the more interesting question is "what counts as professional services?" This gets much trickier. For example, when you start building out APIs to make second- or third-party integrations easier, is that "product" or "professional services"? It certainly seems like product building, but if you're doing it for a customer's use, it gets real blurry real fast. If you're not using that API internally, you're almost certainly on the professional services side. If you do use it internally, is it rock solid enough that you can support and expose it without that support becoming professional services?

Drawing sharp lines aside, this all probably seems kind of trivial, but the first time I ran through our product design with him and we discussed this, I went back and radically re-thought a lot of our strategy, particularly at the customer interfaces.


Great explanation, thanks!


Ouch, that's almost word-for-word from the company that died of debt.

It was enterprise sales, so customization was unavoidable, but no one was differentiating between big and small changes, or big and small buyers. The product was desperately struggling to do ~3 things at once, and still being sold to potential buyers on the promise of a fourth thing it would do "soon".


Enterprise customers which require enterprise sales require enterprise pricing. If appropriate enterprise pricing is not in place then you risk an enterprise failure.


I think every one of my former employers who have failed, did so by doing those 'customs.'

The last one even spun off a dedicated team that built (hacked) prototype customs in order to secure sales, then threw away the prototype and, after collecting the commission, told the new customers that it would take several years to get what they had just seen into production, but in the meantime we could do our existing product with some mods.

I imagine the pressure to accept these deals is immense though. Why let an innocuous little feature request hold up such a great deal?


Sounds like poor sales to me


"Let us use our logo" doesn't seem like an unreasonable request.


And it wasn't!

That was part of the problem: the sales people couldn't push back on most requests because they were often quite reasonable. When they were more demanding, it was usually from a large prospective buyer so we had to bend over backwards.

The result was that we had huge tasks to do with no (current) revenue, and small tasks to do that took 10x as long as they should have. Since servicing existing revenue streams (even on reasonable requests) became so time-consuming, handling big enterprise demands became totally untenable.


Sounds like you needed better sales people.


We needed a lot of things. Better (or more technical) sales was one. More mid-level engineers was another. Mostly, though, we just needed more time or money.

Our target market was very reluctant to move from a paper system to a software system, so there was a lot of foot-dragging and feature requests. That delay had just never been budgeted into schedules or runway.


That's one line of code in some part of some HTML template if done in a non-idiotic fashion (speaking to the JavaScript overcomplicators)


Heh, yep.

And it was one line of code, after several hundred lines had been torn out and rearranged to ensure that different clients could insert their own pictures of different sizes without everything exploding. The whole team was desperately trying to force enough flexibility into the software that one-line changes could be made in <10 lines, instead of >100.


I remember my first day on the job too.


I'm working for a company like that, BUT they allowed me to completely rewrite 3 of the tools from scratch in a more modular fashion so that I could do these things without having to modify the old code bases. Now there are two other applications that I still have to support (and which were written by a consulting company we no longer contract through). It's night and day. So this isn't really the worst thing if you're given the authority and power to take full control of an application and rebuild it and take ownership of it. Of course, this doesn't really apply to junior devs.


The last company I worked for did that to great success. All our customers got a custom version of our product tailored to their needs and their project. At least half our customers were doing something that needed at least one new feature that we didn't currently have. If you build your business model around that, it is not necessarily a problematic model.


This part in Rich Hickey's Simple Made Easy talk [1] had a lasting impression on me. It really drove home the point on how a build up of complexity (one of the most common forms of tech debt, and one of the hardest to avoid) can eventually "kill" a project in exactly the way you described, slowly and painfully:

    "But I have all this speed. I'm agile. I'm fast. You know, this easy stuff is making my life good because I have a lot of speed."

    What kind of runner can run as fast as they possibly can from the very start of a race?

    [Audience reply: Sprinter]

    Right, only somebody who runs really short races, okay?

    But of course, we are programmers, and we are smarter than runners, apparently, because we know how to fix that problem, right? 
    
    We just fire the starting pistol every hundred yards and call it a new sprint.

    ...It's my contention, based on experience, that if you ignore complexity, you will slow down. 
    
    You will invariably slow down over the long haul.

    ...if you focus on ease, you will be able to go as fast as possible from the beginning of the race. 

    But no matter what technology you use, or sprints or firing pistols, or whatever, the complexity will eventually kill you. 

    It will kill you in a way that will make every sprint accomplish less. 
    
    Most sprints will be about completely redoing things you've already done. 
    
    And the net effect is you're not moving forward in any significant way.
[1] https://github.com/matthiasn/talk-transcripts/blob/master/Hi...


This. It's not like there is a sign saying "Technical Debt Required to Proceed"...but rather the slow death from a thousand cuts to productivity caused by having to analyze every potential system, process, template, stored procedure, etc, etc...to make any stable(ish) change. Even if things are loosely coupled and not dependent on each other...you still have to go in and make those changes. Telling this to a room full of non-understanding management is a whole different challenge...


Templates stored across a database is probably the worst thing I've seen repeatedly across projects. Just because a database can store everything doesn't mean it has to.

Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.


A template in one database table I can live with (pros and cons, multiple front-ends, etc). One template broken up into 12 tables, requiring a 100+ line SQL statement with concat()s and HTML interspersed, is insane. Had there been an API or utilities to manage it, it might have been manageable, but nope - just "write some queries".

Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)


(blown away by all the responses to my original question!!!)

Your story here makes me laugh if only because of a very painfully familiar memory. Luckily this wasn't a big production system but rather an internal tool (that I guess clients did also use but it wasn't part of 'production' per se) that was written entirely in perl_cgi filled with cryptic regular expressions written in complete spaghetti code and it would concatenate together entire webpages that had bits of them rendered by including the contents of files strewn all over the file system and of course the logic to concatenate all the html together was strewn across a fistful of files which were in disparate locations. In short I was once asked to make a simple change to some html and after 5 days of reading through perl_cgi and developing a pure hatred for Larry Wall, I decided to do a java re-write that took 3 days. I mean... crikey. Haha.


We have a similar application in PHP. By the time I've traced through all of the included files that are touched by a particular function, I've forgotten what I'm looking for. It's truly a nightmare.


Wait until you see a Turing complete DSL programming language stored line-by-line in rows in a database table and executed by pl/SQL using cursors, locking the entire execution to prevent concurrency.


This is why we can't develop nice things.


I'm dealing with one internal project where this happens because there's an artificial IT/build distinction between "emergency" code push and "casual" raw database change.

This means lots of business-rule crap gets softcoded into the database or ini files (increasing complexity and bug-risk) just to support a hypothetical future where somebody needs it changed without a full sprint cycle.


And you aren't kidding about "repeatedly". Personally I associate it with the late 1990s/early oughts and ColdFusion; I think one of the early CF frameworks really encouraged it, and it kind of just stuck from there, particularly in Government web work. But it's probably wider than that...


This has been my experience. Since technical debt is hard to measure, it's more a case of a series of unwise technical decisions leading to a lack of productivity. Due to tight schedules, short-cuts are taken which lead to more unwise technical decisions, and you have a death-spiral.


Isn't this precisely technical debt? Unless you want to split hairs and call this a technical massacre...


It is, but the question is what "killed" by technical debt means. It's uncommon but not unheard of for code to reach the point of "we can't do that". Mostly, though, the proximate cause of death is a funding shortage or management decision to shutdown. Technical debt is just driving the cost overruns or inefficiencies that kill the project.


I don't disagree that in most situations the root cause is not engineering, and that the debt is usually a symptom of something else, but attempting to change the meaning of the term itself is not a great approach for communicating that.


ACK - I FORGOT THE BEST BIT... (well, maybe not best, but...)

No one could install anything locally - everything had to be done on their locked down remote systems (some were Amazon remote desktops).

For the accessibility testing, the auditing company used JAWS. The company I was contracting to had one license (or so I was told) so I couldn't have one. We actually tried to install JAWS on an Amazon desktop, but it just crashed the entire virtual desktop, requiring re-imaging. That happened twice, so we gave up.

So, the proposed workflow was, I'd make a change, push code, email someone to move that code to a system that an internal tester could look at it. I'd get an email back, then email the internal tester that the code was ready to go look at. The internal tester would go to the screen(s) in question, using JAWS, then "tell me what JAWS said". That would often take several hours or a day.

I was then supposed to make changes based on that feedback, then repeat the cycle until things were 'fixed', then we'd ask the auditing company for another test, which they'd schedule for 2 weeks in the future. Then we'd wait.

During the first iteration of this part, sr mgrs kept asking me "when will this be done?". I kept trying to explain that we didn't even know what "done" was - the auditing company just had blind folks that would use the system with JAWS enabled and if they felt it was usable, they'd say so, otherwise, they'd report back "hey, this isn't usable", and we'd have to start digging in again.


This account kind of reinforces what I think about technical debt: most of the time, the problem is more a lack of documentation than anything else.

I don't see how a big project could be coded without containing anything specific to the project. And even then, the architecture by itself is unique and deserves documentation.


I've also got experience of this kind of situation (See my other comment) I think you can definitely call that technical debt.


I think this is classed as lethal technical debt


It happened to me twice. The first time was in a start-up at the beginning of the century, we were developing an electronic health record and we had outsourced the database abstraction layer to a company in Greece. In the beginning things went fine but after a while the development of the DAL went slower and slower and it became unstable as well. Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code. They had tried adding features but that had made things only worse and the DAL had started to crash randomly. We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already and eventually went almost bankrupt and were bought up by a competitor.

The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, JQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages and we just couldn't scale any further. Swapping JSF for a framework rendered on the front end would have been the solution, but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to drop its search product and change its business model to a more community-based website.


> We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already

I wonder, would the result be different if you had access to competent Eiffel developers? How large was the Eiffel codebase?

Eiffel is an interesting language, with a somewhat unique feature-set (I think only Ada is coming close). Design by contract and static typing as core language features - if used right - should greatly help with both stability and ease of refactoring.

How large the codebase was is an important question, also how bad it really was. I saw a similar story - external codebase getting worse and worse from some point on - with Clojure at the center. The code quality was quite ok for a couple of months, then it worsened. At that point and for a couple of following months the codebase was possible to save - a single competent Clojure programmer would make a difference, I think. The project was less than 10k LOC then. However, more than 1.5 years and 60k LOC later, doing anything became nearly impossible for anyone, including original authors.


You had a search engine and rendering the search results was the bottleneck? That's really weird. Don't know a lot about JSF but other templating languages are usually really not ever the bottleneck. Maybe if you have some giant table with thousands of cells each with its own complicated template directive (for loop with conditionals etc).


"Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code."

OMG no - run for the hills.

95% of software systems are not inherently sophisticated - they are 'complex' - yes - maybe there are many features, and moving parts - but there are no pieces of the system that should be hard to understand by anyone. With decent architecture + decent design and coding, an entire bank's system should read like a long, but well-articulated user manual.

Unless you're doing super low-level stuff, complex algorithms, heavy math stuff, or issues with massive scale or performance etc. ... the end result should almost be mundane in most cases.


This sounds less like technical debt, and more like liabilities of over engineering. Possibly feature creep.

That is, technical debt is not necessarily tangled over-engineered code. It is more compromises that were made to actually ship and operate in the world. You can see this in the world with devices.

Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices. It is not the reason that your coffee machine that also grinds and whatever, is likely to fail within the year.

Another example; Technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards are horribly non-responsive on modern cars.


> Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices.

Bad example. AC power has many desirable characteristics for the local transmission grid. If you were to do the grid over from scratch you'd still use AC. You're also too focused on household electronic usage, which is a very tiny percentage of the overall electricity used.


It's just an illustrative example. And I'm going to bet that most of us, the vast majority of us, really only have experience with household usage. So it would make no sense to get into other usages, which most people won't understand.


My understanding is that HVDC had advantages. That said, I was also intending that to include the distribution in your house.


HVDC does have advantages in certain scenarios (very long transmission lines, for example) but parent is still correct--the majority of the grid makes way more sense with AC.


I meant my follow-on to be a concession, but worded it poorly. I thought it had advantages, but yes, I was thinking small appliance mainly. In particular, in home. And not just computers, but lights and control panels. Seems many things all use the same power characteristics and are now becoming complicated by dealing with AC.

Which, amusingly, is fitting for the tech debt debate. Eradicating some choices from the project is likely to be missing the point. Just as eradicating AC from all power would be short sighted/wrong.


AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.

It is much cheaper to have a power supply on every electronic device turning 100-200 volts into 5 volts than to have one big power supply turning power-line voltage into 5 volts. Of course, a lot of computers need 3 volts or less, so the power supplies exist anyway. It is also more efficient: big power supplies running at low loads are inefficient, while the power supply on each device is sized to what the device needs and so is more likely to be operating in a high-efficiency range.


>AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.

That's orthogonal. What you really mean is that you want high(ish) voltage to distribute power in a home, in order to minimize losses due to wire resistance over distances of dozens of meters.

You don't need AC to do that. In fact, with modern power electronics, the switching converters we now use for supplying LVDC to our devices can work just as well with DC as with AC input power.

The primary advantage of AC over DC is that it can be converted between voltage levels easily with transformers. But today, we can do the same thing with DC using DC-to-DC converters. These didn't really exist in an economical way before a couple decades ago, maybe even more recently.

If for some odd reason, western society decided to re-engineer and replace the whole power grid, it's quite likely I think they would simply switch to DC for everything. With deployment at that scale, the cost issues with the equipment should go away, making it no more expensive to replace everything with DC converters than transformers. DC is more efficient than AC because it stays at its peak voltage, and because it has no skin effect. But the technology needed to make it inexpensive to use for power transmission has only been around for a somewhat short time (namely, modern power electronics). Up until recently, it was simply a no-brainer to use AC because of its simplicity in generation, transmission (with transformers for stepping up the voltage), and usage (with AC motors).
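
To put rough numbers on the wire-resistance point (a back-of-the-envelope sketch; the 15 m run, 1.5 mm^2 copper and 60 W load are my assumptions, purely for illustration):

    # I^2 * R loss for delivering 60 W over a 15 m run (30 m of conductor,
    # out and back) of 1.5 mm^2 copper, comparing a 5 V DC feed with 230 V.
    RHO_COPPER = 1.68e-8     # resistivity of copper, ohm*m
    LOOP_LENGTH = 30.0       # metres of conductor (out + back)
    CROSS_SECTION = 1.5e-6   # m^2, a typical lighting-circuit wire
    LOAD_WATTS = 60.0        # what the device actually needs

    r_wire = RHO_COPPER * LOOP_LENGTH / CROSS_SECTION   # about 0.34 ohm

    for volts in (5.0, 230.0):
        amps = LOAD_WATTS / volts
        wasted = amps ** 2 * r_wire                     # power heating the wire
        print(f"{volts:6.0f} V feed: {amps:5.2f} A, ~{wasted:5.2f} W lost in the wire")

At 5 V the wire wastes almost as much power as the load itself draws, which is the "massive wires or big losses" trade-off in a nutshell; at 230 V the same wire loses a few hundredths of a watt.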


> it's quite likely I think they would simply switch to DC for everything

I'm not sure. AC has some important safety considerations that would make it better even if the efficiency were significantly worse.

Switches, fuses and circuit breakers that work with DC are more expensive than AC. When a circuit opens there is a spark, and this spark can in some cases create a conductive plasma. With AC the wave goes to zero and the plasma disappears, while with DC it continues. There are cases where a DC fuse blew but the fuse continued to conduct. Of course this can be engineered around, but generally with larger and more expensive parts.

When someone touches power accidentally, AC is slightly safer. With DC your muscles will grab and never let go. AC gives you a chance to let go. This is a low probability thing, but is a factor.

The guy who wanted us to debate is wrong for one other reason though: I'm approaching the limits of what I know on the subject, while you seem to have a lot more knowledge.


The desire for debate was to increase our collective knowledge. Not to prove someone right. I am fully comfortable with the idea that I was wrong. You both have knowledge I find interesting.


I'd be interested in you two debating this more, since you both clearly know the topic better than I do. This post is reflecting what I thought I had heard. But, I am not in this field.


There's really nothing to debate; the guy I replied to was totally correct about everything except the bit about "AC is much better in the home", where I pointed out that he really meant that a high voltage roughly where our current AC systems are (120V-240V) is much better in the home than some kind of low-voltage DC system, and that with modern technology, it would probably actually be better to have a DC system. But realistically, that's not going to happen because the gains (probably very minimal) aren't worthwhile compared to the enormous cost of conversion, given how standardized our current AC system is and how all our infrastructure, point-of-use devices, etc. are all designed around that.

Basically, he was assuming practical real-world considerations, I'm going off on a tangent about ideal conditions. His argument is about whether it's better to stick with the current AC system that your house has, or if it's better to install a low-voltage DC system to supply 5V, 12V, etc. to all your devices from a single, central, whole-house power supply as many people who don't understand electricity will frequently suggest. He's completely correct: low-voltage DC is a terrible way to supply power over any distance more than a meter or two because of resistive losses, so it'd require massively large copper cables or busbars. And power supplies are generally very low-efficiency when operated at low load. So our current approach (separate little optimized power supplies for every device, plugged into a higher-voltage AC supply) is actually optimal.


I was never arguing that an individual should replace the AC in their house. My argument was, with current technology, the AC setup can be seen as tech debt.

Which seems compatible with what you are saying, but the parent was specifically claiming I was wrong.

That is, you seem to be echoing my point. But seem to be claiming it is different. What am I missing?


I wouldn't call it "tech debt". Present-day AC systems may not be completely optimal (given current electronics technology), but they do work well.

As I understand it, "tech debt" is something that has to be reckoned with at some point, or else you're going to have real problems in the future (just like refusing to pay off a money debt will generally cause you real problems at some point when the creditor sues you and gets a judgment). You can't just let it go on forever; eventually you need to "pay it down" (by cleaning up the codebase, migrating to newer technologies, etc.), or else catastrophe happens (the company is unable to compete and goes under). One common factor cited in these stories is that the code becomes too unmaintainable and unreliable: too many weird changes for customers pile up and introduce serious bugs which cause the product to not work properly.

This isn't like that at all. We can go on with our current household AC power systems indefinitely. Maybe we could get a 1% improvement by switching to DC systems (at an enormous cost because most of your appliances and devices won't work with it without adapters), I don't really know exactly how much better DC would be (not much really), but what we have now works fine. Furthermore, it's not like the whole electric grid system needs to be changed: it's entirely possible, for instance, to switch distribution systems to DC and leave household systems AC. Instead of distributing the power at 30-something kVAC in your neighborhood and using outdoor transformers to step it down to 240VAC for your house, it could be distributed in DC form, and those transformers replaced by modules which convert the 30-something kVDC to 240VAC. In the old days, this was hard and expensive to do, but with modern power electronics it's not. But even here, the question is: are the gains worth the expense? And the answer is very likely "no". (For reference, I'm not a power engineer, I just studied it in college as a small part of my EE curriculum.)

So this does not, to me, resemble "tech debt" at all. It's just a system that we use for legacy reasons and which is extremely reliable and works well, even though it might not be the absolute most efficient way to solve the problem. This is no different than many other engineered systems. Perhaps you have a decent and extremely reliable car. Could it be better? Sure: you could build the chassis out of carbon fiber, use forged aluminum wheels instead of cast, etc. all to save weight and improve fuel economy. Are you going to do that? Of course not, because the cost is astronomical. There's cars like that now, and they cost $1M+.

So for AC systems that we're talking about, the question is: what is wrong with them that we want to consider replacing them with something else, instead of just sticking with them even if they're not quite as efficient as they could be? Because the cost to upgrade them would be enormous, so you need to have a very good reason.


Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.

It is this second sense that I was latching onto. It --tech debt-- will drive decisions today. But it is not clearly bad. It's just a constraint on current decisions that was made in the past, often for decent or really good reasons.

Bit rot is another term for things that start to decline in how well they work. That is generally different, though. Usually it's a by-product of replacing implementations without keeping functionality, such that people relying on old behavior are left cold. (I can see how tech debt can easily turn into bit rot. But it is not required.)

Consider LaTeX: its being an old code base is often used as a reason to call it full of tech debt. People want to modernize it, not because it doesn't work, but because they think there are better ways now. And they do not consider all of the documents made with it as infrastructure.

Now, I concede that all of this is my wanting the terms to have unique and actionable meanings. Elsewhere I was told "tech debt" is a catch-all term now. That seems to rob it of usefulness.

Edit: I forgot to address the monetary aspect of the analogy. I like that, to an extent. But most debt is taken on in very specific terms financially, unlike colloquially termed debts between friends. That is, there is no notion of interest in this metaphor that works, nor is there a party you are borrowing from.


>Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.

I'm not so sure about this. To me, "debt" is something that has to be paid eventually. Otherwise, why use the term "debt" at all?

So if something works fine, why waste your time and energy replacing it with something newer?

Usually, the reason for this is the assumption that sticking with something deprecated will eventually bite you in the ass: something you're depending on won't be supported, will have security holes that won't get fixed, etc., and you're going to wish you had fixed it earlier. So this is a valid use of the term "tech debt" IMO.

But if something is just something someone doesn't like, that isn't "tech debt" at all. I don't like .NET, but it's invalid for me to call all software written in .NET "tech debt". I don't like Apple's ecosystem, but it would be pretty ridiculous for me to call all iOS software and apps "tech debt" when many millions of people use and enjoy that software every day.

So, for your LaTeX example, I don't consider that tech debt at all; instead, it's just like iOS and .NET software to me. If someone doesn't like it, that's their problem; the fact that it isn't brand new isn't a problem for me and all the people who still happily use it.

So personally, I think anyone using the term "tech debt" to just refer to things they don't like is using it incorrectly and in a totally invalid way.


I find this a compelling view. But, I urge you, just google technical debt. You will see the definition: "Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution."

So, in this case, AC/DC fits if we agree there is a chance the "best overall" solution is DC. (Which, I fully grant, is not a given.) There is also a bit of playing loose with "short run."

Then, skip back to the top of this thread, where you will find: "products that are written badly by inadequate teams" and "case of unpleasantness" and "A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language..."

All of this is the first, most highly voted, post. The next post is a highlight of poorly engineered solutions.

My point? Find a case study that has the usage you are referring to here.

Now, rhetorically it certainly has appeal. But I have never seen it used in a way that fits the metaphor. It is just used to tug on the emotional strings of "you must pay back your debt!", while usually claiming that the design, or the lack of some technology, is the debt.


I think we're going off on a tangent here, but even with that definition from Wikipedia, there's no such thing as "the best overall solution". Everyone is going to disagree about that; the best you'll get is a consensus. For instance, back to LaTeX: there are countless academics out there who use TeX/LaTeX/whateverTeX for writing academic papers, getting beautiful results by just typing in some simple formatting codes, without having to mess around with a WYSIWYG editor like MS Word. That's what *TeX was designed for, and it has worked well for ages. But I'm sure you'll find a few people who say this is bad because it's "old", that they should switch to the latest MS Word for everything and rewrite all their papers in it. If you look really hard, you might even find someone who thinks both are bad, and that all academics should rewrite everything in WordStar.

"The best overall solution" is up for debate. It's the same with programming languages; one team will say that C is the best overall solution for a certain problem, another team will say it's Python, another team will say it's one of the .NET languages. I'm sure you can find plenty of engineers who will claim that mission-critical real-time avionics systems or automotive ABS controllers should be redesigned to use x86 CPUs and run Windows and have the code written in C# instead of using C/C++ and running on a small RTOS on an embedded microcontroller.

The implication I see with your Wikipedia definition is that implementing something easy in the short run instead of something that really is the best overall solution will eventually lead to more work to fix the shortcomings of the quick-n-easy solution. So, like I said before, a "debt", because it has to be paid back eventually (with work). The problem I see is that not everyone agrees on what is the best overall solution, and unlike a money debt that's easily seen by looking at a dollar figure, the only way to really know how much "tech debt" you have is through experience, i.e. accumulating it and then finding out over time how much work you have to expend to fix things when your quick-n-easy solutions start having real, demonstrable problems. If your solution has no actual, demonstrable problem (e.g., you use LaTeX and it continues working great year after year for your use-case), then I don't consider that to be "tech debt" at all, even if some people don't like it.


I 100% agree regarding "best overall solution." Indeed, that is largely my point.

Alternatives may have advantages. However, often the advantages of where one is at are ignored in the debate.

My gripe in this debate is more from actual uses of the term. Not from any ideal use of it.


Including light bulbs?


Yes, even light bulbs. A typical household LED is very easy to run off AC. You just need a capacitor big enough to hold the charge between each cycle of AC (which is very little). More information here: http://www.ledsmagazine.com/articles/2006/05/running-leds-fr...

It'd be vastly more expensive to wire up an entire house for low voltage DC than it is to include the simple rectification components in every light bulb. In a house you're talking about many wire runs of many dozens of meters. This is not a good environment for low voltage DC at all.
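
As a rough back-of-envelope (Python just for the arithmetic; the 20 mA LED current, 60 Hz mains, and 1 V of acceptable ripple are all assumptions, not measurements):

    # Sizing the smoothing capacitor for an LED fed from rectified AC.
    I_led = 0.020                      # assumed LED current, in amps (~20 mA)
    f_mains = 60                       # assumed mains frequency, in Hz
    ripple_period = 1 / (2 * f_mains)  # full-wave rectification doubles the ripple frequency
    dV = 1.0                           # assumed tolerable voltage droop between peaks, in volts

    # Discharge approximation: C = I * dt / dV
    C = I_led * ripple_period / dV
    print(f"~{C * 1e6:.0f} uF")        # prints "~167 uF" -- a small, cheap part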


I recall seeing IEEE articles talking about the DC-wired home. I confess I stopped paying attention, as it will be a long time before this is actionable for me. I can't claim surprise to learn that I had some of it wrong.

Of course, the cynic (and, ironically optimist) in me still has this as evidence that "technical debt" is often used in BS circumstances by people that just don't fully understand the reasons for the things they are talking about. :)


Saying that technical debt is only deliberate is an old argument[1], but usage defines meaning and modern usage is that "technical debt" is a catch-all term. It just means bad code we know should be fixed.

[1] 2009 - https://martinfowler.com/bliki/TechnicalDebtQuadrant.html


Stretching the debt analogy, you can go bankrupt from payday loans (the "just push it out" tech debt) and from getting too big of a mortgage to build/fix up a house (over-engineered tech debt).


That seems to indicate it is a worthless term, then.


For the second one, you must have been receiving a lot of traffic for template rendering to be such a bottleneck. Why not upgrade the server?


They upgraded the server of course, to as much as they could afford. But it wasn't enough; the rendering load soon caught up. First of all because their number of visitors grew, but also because they wanted to add new features to their JSF pages, and every new feature required extra rendering power as well.


Could you not scale this horizontally? We do all template rendering server side, though it is JSP and not JSF.


That was considered, but it would of course take some refactoring on the back end, and it would still cost quite a lot in hardware. The thing with JSP and JSF is, they do OK as long as your content is relatively static, because then the rendered content can be cached. In the case of this company, their most visited page was the list of search results, which by its very nature was not very static at all.


Every problem is different, so I hate to judge, but what you're saying doesn't add up to any experience I've had.

It sounds like your company seriously screwed up the design if you can't scale your web tier code horizontally. I've also never had a view technology take up a significant chunk of cpu resources - it's always the Java code carrying out the functionality. E.g. I would expect the largest factor in CPU usage in the list of search results to be... generating the data for the search result. If the largest factor was rendering the result, then something was probably seriously wrong.


What was/is the product/company called, if I may ask?


The closest I've come was a Rails project I inherited from a star developer who had just left the company. It was a B2B project that involved importing large Excel spreadsheets of various different formats into a standardized database for itemized review.

The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.

We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.

People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.


I don't think I've ever seen an Excel / CSV import implementation that wasn't a huge mess.


Excel imports with Perl worked pretty well for me. I was careful about insisting on some rules for the sheet data and enforced them strictly, with decent debugging info for the users.

I still think the Robustness principle[1] is a crock and that strictly controlling inputs is one key to happiness. It also, frankly, helps your users in the long run by giving them exactly what they want, and it actually cuts down on the amount of thought they have to put into it. Chaos and disappointment do not make a good user experience.

1) https://en.wikipedia.org/wiki/Robustness_principle


Ruby has had good ETL libraries for a long time. In my opinion, our product team was too lenient concerning the format of the Excel files. Asking customers to fill out a template spreadsheet to submit to our system, rather than letting them submit any old XLS file they happen to have on their computer, would have gone a long way towards simplifying the problem space.


Do you mean Excel to CSV? Or a CSV importer?

I totally agree about Excel importing, but CSV is trivial, no? Here is an Erlang version I happened to write yesterday:

  lists:map(
    fun(Row) -> string:tokens(Row, [SepChar]) end,
    string:tokens(InputStr, "\n")
  ).
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.

EDIT2: Thanks for the interesting comments! Not so trivial after all!

Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four line version above works perfectly for reading the type of files I designed it for. If you want to work with human created, or more complex variants of CSV, all bets are off.


You need a lot more than that to handle CSV in the wild (quoting, Unicode, line termination, etc.) but the real killer I see is when it's edited by humans. The special cases for errors and inconsistencies will add up quickly; in some cases you may be able to reject invalid data but you may not have that option or an easy way to tell whether any particular value is wrong.

Excel takes that, adds some fun things like people using color and formatting to store data, and things like Excel auto-corrupting values which look like dates and may not have been noticed before you do something with the data.
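
To make the quoting point concrete, here is a tiny illustration (Python just for brevity; the sample line is made up) of where splitting on commas diverges from a real CSV parser:

    import csv
    import io

    # A quoted field containing both the delimiter and an embedded quote.
    line = 'id,comment\n1,"She said, ""Hello, world!"""\n'

    naive = [row.split(",") for row in line.strip().split("\n")]
    proper = list(csv.reader(io.StringIO(line)))

    print(naive[1])   # ['1', '"She said', ' ""Hello', ' world!"""']  -- mangled
    print(proper[1])  # ['1', 'She said, "Hello, world!"']            -- correct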


I know of at least one company whose entire business is handling this stuff. They find growing companies as they hit critical mass and need to move their Excel data into a real database. The product is just "Your data is hideous and was entered by hand without validation or formatting; it'll never convert and it'll be wrong when it does. We can help."

They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
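
Even the quote regularization is a nice example of how mundane the work is. A minimal sketch (assuming the only offenders are the usual word-processor smart quotes, which is rarely the whole story):

    # Normalize "smart" quotes into plain ASCII before loading the data.
    SMART_QUOTES = {
        "\u2018": "'",   # left single quotation mark
        "\u2019": "'",   # right single quotation mark, often used as an apostrophe
        "\u201c": '"',   # left double quotation mark
        "\u201d": '"',   # right double quotation mark
    }

    def regularize_quotes(value: str) -> str:
        return value.translate(str.maketrans(SMART_QUOTES))

    print(regularize_quotes("O\u2019Brien said \u201chi\u201d"))  # O'Brien said "hi"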


I used to use Google Refine (now OpenRefine [0]) for this. It lets you load up the data and then apply rules to see if they are mostly correct. It doesn't get you all the way, but it is better than going in blind and manually revising a huge Excel "database".

[0] http://openrefine.org/


What do you use now?


Could you share the company name?


I'll try to remember. I ran into them at a career fair a few years ago, so it's not leaping to mind, but it seemed like they had good software and a great market niche.


Let's not forget Japan Post's CSV for all the Japanese Address data that contains some lines that are line-wrapped, that is, one record spans two or more lines in the CSV file. A line-wrapped CSV... I just can't even.


That's why ASCII was designed with record and field separators. Unfortunately, it's not used (de facto) for delimited files.
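
For the curious, those separators still exist in every character set today. A quick sketch of what delimited data looks like with them (not advocating anyone actually ship this):

    # ASCII 0x1E (record separator) and 0x1F (unit separator) never appear in
    # normal text, so no quoting or escaping rules are needed.
    RS, US = "\x1e", "\x1f"

    rows = [["id", "comment"], ["1", 'She said, "Hello, world!"']]

    encoded = RS.join(US.join(fields) for fields in rows)
    decoded = [record.split(US) for record in encoded.split(RS)]

    assert decoded == rows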


That is very interesting, thanks! I hadn't thought about Unicode or tolerating human error. Although the times I have worked with it have been when it is a transport medium between two computer programs.


That's definitely a less-aggravating situation by far. I've had a lot of cases where a significant amount of specialist human time had gone into a spreadsheet, and it's really made me wish there were an Excel-for-data which acknowledges how many people are using it for semi-structured data like this.


Like Airtable?


I'm not talking about parsing. It's a mess in its own right of course (encoding, line terminators, etc. as others mentioned).

I'm talking about the actual conversion from tabular data to relational. Most of the applications I've worked on had this in one form or another.

So you end up with users downloading an export of their data in CSV, editing it in Excel in various ways, and then reimporting it in the application.

At every company I worked for, this kind of feature was always in the top 3 in terms of support load.


"Relational" means "tabular". (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)

A "relationship" in an ER diagram maps to a "reference" in relational theory. This is part of the type safety/domain system of RDBMSs.

If these concepts are muddled, SQL will never quite make sense :)


> "Relational" means "tabular".

Relational database can be expressed in tabular form, but tabular data is not necessarily relational.

> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)

A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.


Interesting definition. Do you have a source for it? It seems ambiguous.

From the Wikipedia article on relational databases, subsection relational model. "This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."


Ah right, yes, I can imagine that would be extremely messy!


Your edit is delightful.

"No, honest guys, I knew CSV was more complicated. I just didn't need to make my code safe."

Here's a csv parser in Erlang that actually attempts all that trivial stuff:

https://github.com/rcouch/ecsv/blob/master/src/ecsv_parser.e...

That's a lot more code than yours. And the notes even say it's not tolerant of badly formed CSVs.


I submitted that edit before the post had any replies.

Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.


> but CSV is trivial, no?

You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.


CSV is trivial? You may have missed this:

    http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/
:-)


This is interesting, and as the other commenters have pointed out, creating a parser for /all/ variations of CSV can be very tricky.


But your code doesn't even handle the trivial case.

    "She said, \"Hello, world!\""
You can drop the "meh, I know I didn't handle all the complicated cases" act.

We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)


Hi,

Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.

I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?

I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.

I totally appreciate that to write a CSV parser that works for all cases would be extremely longwinded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains true that /in some cases/, depending on the requirements of the program, the parser can be very short.

> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)

It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, I wanted it done before the next lecture). I did not expect I would be able to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be, beat my expectations by many fold, then people comment about my hubris online the next day. It is funny how life goes!

Thanks,

Sam


How about: All Of Them

Every failed product/project I've worked on in my professional career, which had full intent to ship from the start, was killed by technical debt. It's usually indirect, but it's always the root cause.

It takes many forms:

* Too buggy to ship, due to a creaky old code base being over-stretched to a product with too high reliability/experience expectations.

* Product form factor, efficiency, user experience not good enough to sell well, due to spaghetti code base which couldn't be whittled down to removable pieces. Result: large runtime, more expensive, less efficient hardware.

* Existing old codebase deemed too bad to ship a product, requiring a rewrite-from-scratch, but timescale too long to make any sense -> product killed.

It's difficult to elaborate more while maintaining some discretion about exact companies and projects. The general point is: technical debt isn't just some fuzzy intangible issue — it indirectly creates enormous costs in people and time, can affect the physical form products take on, and impact the user experience. Products always get started without taking this debt into account, but when it's finally realized, it can change basic features, and then it kills them.

Products are designed with faulty assumptions about what existing resources can be applied to them.


Interesting that you talk about projects that never shipped. When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.

I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?


> When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.

I've been mostly in consumer electronics related companies, where a product which ships and then becomes too hard to maintain usually doesn't "fail". It just gets phased out. In a way, this is another way technical debt has an indirect, but large impact on products: obsolescence becomes a necessity. Not so much planned — which implies malice — as simply realizing it's not possible to maintain indefinitely.

> I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?

Usually very quickly, or after far too long.

The better projects know ahead of time that there are Dragons lurking in the code base. But that's effectively saying there are projects which never even got past brainstorming because we knew the technical debt was too high.

On the other hand, there are projects where it only becomes apparent how much debt there is after a lot has already been invested. It's like you'd expect, e.g "There's a performance problem because of a basic primitive this library uses everywhere. And that was originally a workaround for a compiler performance bug. We could fix the compiler bug, but it turns out other libraries relied on it..." and so on. Extra time-to-market makes a product make less and less sense — fashions change, hardware improves, new tech arrives — and so it gets killed. Or worse, shipped.


Ironically you avoid technical debt by slowly killing and rebirthing parts of your product.

The class that is no longer appropriate for new requirements gets canned for a better abstraction etc.

In aggregate, over time, you may kill the product to avoid technical debt!


The last place I worked at will die because it will take them years to migrate from Oracle to postgres due to "technical debt" (the codebase is coupled with the database to a hilarious degree; business logic in triggers, huge plsql packages, plain sql queries in the java codebase, halfassed homerolled ORM). They're not getting as many new customers as they could because, for various reasons, the Oracle licensing terms are now unacceptable for the new customers they have been in contact with over the last two years.

That's the most concrete reason I can come up with for why the technical debt will kill them, but there are plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there they compensated somewhat by hiring new people. Word has gotten around though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.

My current workplace's main product uses the same tech, is the same size (LOC), and has the same functionality as the old company's, but serves a different market. They did the Oracle to Postgres migration in 2 months. 2 man-months, one guy.

New workplace: 15ish developers, serving the same number of customers, doing similar revenue, making stable releases every week

Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way, mostly stemming from the codebase and the architectural choices that had been made along the way.


Hey, sounds like we worked at the same place! That, or the "wedded to Oracle for life" is a common antipattern. I'd add "shared everything architecture" to the horrors.

Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.


Oracle is a form of technical and financial debt all its own.

I wasn't directly involved in but had a good view of our university's finance modernisation woes: http://news.bbc.co.uk/1/hi/education/1634558.stm https://www.admin.cam.ac.uk/reporter/2001-02/weekly/5861/1.h... - although in fairness the inflexibility and disorganisation were existing features of the institution, and Oracle merely exacerbated them.


Do you think the same thing could happen with cloud vendors like AWS?


Yes, and pretty easily if you buy into all the new features that no one else will ever have all of. If you stay clean with simple storage, compute, DB, email, then you should be okay.

Ideally you'd have some kind of plan though from the start, for which other cloud provider you would use and how the services would map, in case using AWS becomes untenable.


Cloud product life cycles should definitely be more interesting. Azure for example already has a "classic" model and the new ARM model. Either way, avoid tightly coupling code with some external vendors service.


I don't know if I've ever seen a successful database transition for a large project at a large firm. You basically have to build that from the start.

Doing it after the fact in a politics-heavy organization is confounded by not just the technical difficulty of the task, but the glad-handing and perception management that has to happen to keep your team from getting fired during the process.


I was CTO of a company that had a two-week outage due to technical debt. I didn't sleep much for any of it. We fixed it, and we'd lost about 30% of our subscriber base in that period. The company took on new funding to survive, invested that in a new set of products, and shuttered the old stuff just to stay afloat.

I am currently working in a business where there is a nearly 8-year old Rails app (600+ models, 250+ controllers, 400+ libraries, LOC around 60k), that sits at the heart of everything we do.

The company is struggling to grow and believes the cause is that engineering is slow. We have asked to refactor this code base multiple times, and point to the technical debt as the reason that features which should take a day to implement typically take 3-4 weeks.

It is only recently that the penny has finally dropped and they've realised if they don't invest in replacing this thing (there is too much technical debt to fix, we're calling bankruptcy and moving to a brand new architecture piecemeal), the business is likely to fail within 1-2 years.

That means my current employer is likely to go bust because of technical debt within 2 years max unless we become really good at fixing this.

We are optimistic.

We have to be, right?


IMO this is the price you pay for a dynamically typed language. 60K LOC is not much in a statically typed language; you can use tools to refactor it easily or to visualize the control flow. But with a dynamically typed language? It's a nightmare. You change one thing and cannot possibly know what else could have gone wrong.


The legacy app has > 80% test coverage. Refactoring is still slow because there are all sorts of business assumptions put into place that add functionality without ever questioning the need for it.

Basically, for a long time, the company never really re-evaluated what it had learned and spent time trimming things down, so as a result there is this ungodly mess. At the heart of what the business does, there is no real need for more than a dozen models. So why do we have so many more? Nobody ever refactored away stuff we didn't need any more, and so weird things happen.

There is also a coupling issue that is endemic to all monoliths. We're moving to a micro-service architecture with clean domain separation, and we'll probably go to 1/10th of the code base in LOC terms within 12 months, even if we move some of that functionality into Go, Java or Python services (all options).


Unit tests just make the mess a bit easier to solve, but are far from a perfect solution. Wanna move and rename a function? Do it, then spend hours fixing your 10 broken tests, writing new ones, and testing your app for hours because unit tests don't cover integration. A static language + an IDE does it automatically within seconds.


It depends.

I work on such type of codebase, but we have a fully covering testing suite, so applying changes is not a problem (interestingly, I've just realized that the line count of the testing code is 50%+ more than the base application code itself).

So ultimately I think company culture (that is, emphasis on automated testing, for dynamically typed languages) is the crucial factor.


I would say that 8 years to write 60k LOC is slow. I worked as the sole GUI software engineer for a hardware firm and wrote > 100K LOC in 3 years, not including the test projects preceding the actual real project. This was in C++, and included client/server stuff, an entirely custom resizable GUI, OpenGL 3D graphics, and modelling of 3D assets and textures too. And getting it running under OSX + Win32, fixing issues on both.

And that wasn't a stressful place to work with insane deadlines - it was fairly relaxed for the most part.


You can't compare C++ with Ruby code. Rails code especially can very easily become a hairball where everything happens "somewhere else". You don't have static type checking or other compile time hints to figure out what is going on. You can't see which functions are called from which call sites. Refactoring tools? Forget it. Ruby is a very compact and flexible language, but if you're not disciplined you'll pay the price for it.

A 60kloc C++ project is small and easily manageable, a 60kloc Ruby hairball can drive a person insane.


It's Rails code. You get a lot of bang for your buck. That, and the better gauge of the project's complexity is the stupidly high number of controllers and models.


LOC differ not only by the language or technical difficulty of the task, but by business requirement difficulty as well.

If, for example, they're in banking and finance, and those LOC deal with fine details of tax code... Oh boy.


As a former C++ dev, I'd say you can't compare Ruby LOC to C++ LOC. Just 4x the Ruby count and then it becomes more fair. C++ is just verbose. #include <algorithm> doesn't fix everything that Ruby blocks fix.


This is why LOC is a terrible metric. I was using it as a barometer as most ruby devs would see 60k LOC and go "Ooooookaaaay...."

If anything, we've gone too fast and not spent enough time going back and understanding what we really need to keep.


Can you expand on how technical debt caused a two-week outage?


If I had to guess, I'd say it's probably an issue with the build/deploy system. Perhaps someone deployed a broken build, then tried to revert/rollback, and realized that the previous version didn't build "cleanly" anymore.

This could happen if you have a lot of dependencies, switched compiler versions but left the binaries "in place" and deployed changes incrementally.


Yet another reason I swear by Heroku - you can rollback to the actual prior release, not rollback the code and try to rebuild.


Doesn't AWS do this too?


Which service? EC2 doesn't - at least not inherently, there might be a way to snapshot the machine's state; I'm not sure.


Depends on which of their 150 services you're talking about. You can't do it on CodeDeploy.


There are two problems with giving a full answer: firstly I'm still under NDA for some of that, and too much detail would breach that; secondly, my exact memory of it is limited.

In short, my predecessor had attempted a move to SOA without understanding dependencies, circuit breaking and failure modes. This would then cause scenarios where the entire front-end would fail to render on a single down-stream service taking a little longer than necessary.
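
For anyone unfamiliar with the pattern, the missing piece was something along these lines. This is a minimal sketch of a circuit breaker, not our actual code; the thresholds and names are made up:

    import time

    class CircuitBreaker:
        """Stop calling a failing downstream service for a while,
        instead of letting every page render block on it."""

        def __init__(self, max_failures=5, reset_after=30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None

        def call(self, fn, fallback):
            # While the breaker is open, skip the downstream call entirely.
            if self.opened_at is not None:
                if time.time() - self.opened_at < self.reset_after:
                    return fallback()
                self.opened_at = None   # half-open: allow one attempt through
                self.failures = 0
            try:
                result = fn()
                self.failures = 0
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.time()
                return fallback()

The real thing also needs timeouts on the call itself; the point is just that one slow or dead dependency degrades to a fallback instead of taking the whole front end down with it.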

When identifying how to stop that happening, I discovered a large number of comments tagged "TODO" with statements like "Refactor this when we have time" or "We need to find a way to do this better".

Further down on the downstream services there were rather esoteric SQL queries doing large joins that nobody had done a query plan on. It was hard to identify these because the ORM had been trusted to do magic, and it was happy to do so, but there was a point where it was not apparent _why_ these joins were happening, but when you found the code, there were more comments "This needs improving", "We should refactor this", etc.

We were able to get something back quite quickly with liberal application of indexes, and it took us a day or two to refactor the queries enough to mean response times came down, but the error rate was still > 20%, and it was random, so 1-in-5 page loads of the front end service would fail.

We refactored the code to circuit break and handle degraded services better, but that took a few days, and then we started working down to the back end service and figuring out the final steps.

It was a small team looking after legacy code that everybody knew was a bit messy.

A few weeks before this code was shuttered, I heard from a friend that some of our content did not render at all on certain Android devices. I identified the cause as a half-finished refactor (again, my predecessor), that had never been finished because he had been pushed to work on something else. This caused a dramatic decline within a key market segment that resulted in declining ad revenue, subscriptions and overall viability of the business.

Basically, when you start something, finish it. If you find yourself putting in comments like "We should refactor this" anywhere in your code base, and you're doing so because the business is pushing you to work on new features, you have a massive problem culturally that is going to cause a rise in technical debt that raises risk to revenue.

All technical debt ultimately will lead to problems that the business will see on balance sheets, but they will rarely successfully identify the cause as being technical debt because they can't see, understand or rationalise it. They think it's engineers being grumpy idealists.

People play too fast and loose with the concept of "MVP" for my tastes, and it's a problem I see over and over again. The risk of that is, long-term, it will cause business failure.


Solid lesson. Thank you for sharing.


I'm currently working on the reincarnation of a project that was killed by technical debt -- TWICE.

The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.

But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned. It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.

So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs. It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.

I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.

Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.


Huh.

Should I ever inherit an #ifdef mess again, I intend to replace #ifdefs with Strategy patterns.

#1 figure out all the known defs in actual use

#2 rerun the preprocessor with each variant (combo)

#3 capture the output(s)

#4 aggressively apply the Strategy pattern, refactor code

Last time, I removed dead code piecemeal manually. It sucked.
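
Roughly what I have in mind for #4, sketched in Python just to keep it short (the strategy classes and method names are invented for illustration; the real variants come out of step #1):

    from abc import ABC, abstractmethod

    class ExportStrategy(ABC):
        @abstractmethod
        def export(self, record: dict) -> str: ...

    class DefaultExport(ExportStrategy):
        def export(self, record: dict) -> str:
            return ",".join(str(v) for v in record.values())

    class CustomerAExport(ExportStrategy):
        # Was: #ifdef CUSTOMER_A ... #endif
        def export(self, record: dict) -> str:
            return ";".join(f"{k}={v}" for k, v in record.items())

    STRATEGIES = {"default": DefaultExport(), "customer_a": CustomerAExport()}

    def export(record: dict, variant: str = "default") -> str:
        # One runtime decision point instead of conditional compilation.
        return STRATEGIES[variant].export(record)

    print(export({"id": 1, "qty": 3}, "customer_a"))  # id=1;qty=3

The win is that a dead variant becomes a deletable class instead of an untraceable preprocessor branch.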


Sounds like the software managers were not on top of things, otherwise how would they have allowed this type of design to be implemented?

Not to say I haven't seen this effect myself many times...


Management, you say?

Management ordered the creation of new software. Shouldn't that be enough?

The project lead was responsible for this design, and above him there was nobody with any expertise in the matter.

From what I've heard he's an extremely good C++ programmer. He's just a terrible architect.


Knight Capital lost $465 million in 45 minutes caused (at least in part) by technical debt and poor development practices.

Summary: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lo...


I'd say it wasn't due to technical debt, more a start-up like development approach to a company that trades millions within seconds in full automation. It sounds like the deployment process wasn't that complicated for a company of that size, but it was deployed without a single check by a second person.

If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is because most of them lost a few millions at some point due to some bug.

Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.


That's not a startup approach. That's an enterprise approach. Believe me, I've fought tons of resistance in automating deployment operations in the enterprise. There's a perception that automation is dangerous, and you need human checkpoints. In practice, I've worked on projects much, much larger than Knight Capital, where the deployment process was driven by a huge spreadsheet, and orchestrated by non-technical overseers telling techs what commands to run based on the spreadsheet in front of them. It's incredibly vulnerable to human error like "Oops, forgot to deploy to one of the eight servers in the cluster".

In the enterprise, this is called "mature" and is a sign of great sophistication.


Yeah, the amount of resistance towards automation at some larger companies is a total mindfuck.

My first ops job back in 2008 was at a large exchange's NOC where we shut down and cleaned the application environment every day. Every Friday, we would have to take a backup of the ~20 or so production databases by hand, in an ancient CDE-based UI: right click -> menu -> submenu -> backup database. Very little room for error, and you weren't allowed to do it without somebody else watching you. Throughout the weekend, customers would then run tests against the production databases. Once testing was done, we'd restore the prod databases back to their original state to wipe out the test data.

At one point, I asked my boss if it was alright if I automated it, after showing him a POC, and was rejected because "We don't trust automation to do it accurately every single time." Mind-boggling. In mild fairness, in the 15 or so years they were doing that, I don't think anyone ever did it wrong, which is an enormous miracle in itself.

(That was a strange company. My boss was a JW who'd worked there for 30 years; he regularly tried to convert me and would spend four hours a day on spreadsheets for his church. We'd also manually kick off stock-split processing from a ~10" CRT monitor from the early nineties.)


Call it "process debt", or "management debt" (i.e. the lack of investment in proper management and the culture that goes along with it -- in favor of a "STFU and just add that feature now! I need it yesterday!" mentality). Either way, it's part of the same boat, basically.


Wow.

"The consequences of the failures were substantial. For the 212 incoming parent orders that were processed by the defective Power Peg code, SMARS sent millions of child orders, resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately 45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80 stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight realized a $460 million loss on these positions. "

https://www.sec.gov/litigation/admin/2013/34-70694.pdf


To me, the single line that stood out was:

"The new RLP code also repurposed a flag".

I've never seen a flag repurposed without catastrophic effects.


^^^ That's the part of the disaster that I see as Technical Debt


To be fair, the only time you hear about someone repurposing a flag is when it has catastrophic effects.


> During the deployment of the new code, however, one of Knight’s technicians did not copy the new code to one of the eight SMARS computer servers.

Was the issue technical debt or a sloppy deployment?


Which was probably due to technical debt. I can't think of another reason you'd manually copy code to 8 servers...


It is surprising there wasn't a circuit breaker here.


Projects rarely die because of technical debt. Instead, it becomes ridiculously expensive and difficult to add new features. But the software itself can remain in use for decades, gradually decaying and rarely adapting to changes in the business environment. Eventually either the software gets thrown out and replaced with something new, or the company is no longer able to compete.

I've seen this play out probably close to a dozen times now, at different employers and consulting clients.


This is only true if developing new features isn't part of how the company succeeds. That's probably true for some tools that are used internally. If you can't modernize your payroll, that might cost you some money, but it's not make or break.

For a company that makes software as a product, or to directly support or create their main product, not being able to add new features is a really bad place to be.


Have a small hardware one....

About 1986 I was tasked with moving a small block (a few KB) of data very quickly from cabinet A to B, with the racks full of custom electronics - no PCs, all original stuff on a flight sim with 386 Intel processors all over the place. The racks had Multibus backplanes.

I suggested a 'TAXI' fast optical link (oooh - optical..too radical) or a pair of Intel 589 (Ethernet) cards for an off-the-shelf solution. Nope, too expensive. Engineering Management suggested a twisted pair ribbon cable between the two adjacent racks - um, OK..

Long story short - me and the senior design engineer decided to use the Intel 8257 DMA controller chip to grab the bus and blast the data between the RAM on two cards.

After a short period of fails, we found that the engineers who designed our 386 cards did not bi-directional buffer the DMA request line onto the backplane as they never expected any other card except the master CPU ones to initiate a DMA, so the CPU cards could not see the line being toggled from elsewhere.

Engineers would not accept a change request for 'reasons'

Intel 589 cards is it then!

All because someone chose to omit one tristate buffer.


I have seen a product getting killed by trying to resolve technical debt. The refactor took nine months and in the end didn't work better.

I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.


I've had the opposite happen every time the team I've been on decided to refactor a large portion (or even the entire code base). Every time, what was a source of constant bugs (i.e., X bugs per week, every week, never lessening), became tractable and moved to stable post the rewrite (X bugs first week, .7x bugs second week, etc, until finally we're encountering the odd bug only once every few months, if at all).

I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.


I was talking about enterprise projects that had had years of development, constantly changing personnel, and complex and changing business rules to follow. These tend to be ugly and difficult to work with after a few years of development. In my view the only way to deal with these is to break them down into smaller components and then refactor. But that then turns into a political issue because the managers (and a lot of developers) don't see the need.


Yeah, I could see that. I've been in those environments too; I have no data points from that, because getting the okay to refactor was so hard it never happened while I was on the team (I'd been on projects that claimed to have refactored the code, but then it was mixed as to whether people claimed it was a success or a waste).


I always tell the younger guys not to try to get an explicit OK to refactor but just add 20% to all estimates and use that for continuous refactoring without asking. It's just a regular part of professional work like writing code, pull requests and testing. This also has the advantage that refactors are relatively small so you can rollback if it turns out that the idea for the refactor was wrong (yes, this happens:-) ).


Even single man-month long "refactors" are something I'm wary of. Smaller changes are easier to test, easier to review, easier to merge, easier to verify are actually improving the state of things and heading in the right direction, easier to pause when your priorities unexpectedly shift midway through cleanup without leaving a terrible mess...

I'm okay with the occasional week-long rewrite of a subsystem, but usually only after I've spent some time coming to grips with exactly why the old one is terrible and have a firm grip of exactly how the new one will be better.


Technical debt is not a thing that kills products. Shit-ass management kills products. Technical debt may or may not be a symptom of shit-ass management.


Agreed. I feel like technical debt is more of a locus of control issue among developers than a real business concern. The only thing we look at all day is code; therefore, if the project fails, it must be because of the code.


I've worked in the industry as a developer for 12 years and I can't remember any.

I do remember a competitor dying of not releasing their big refactored next version soon enough, and running out of cash.

Spolsky tells it better than I can:

https://www.joelonsoftware.com/2000/04/06/things-you-should-...


I'm a fan of continuous refactoring, making small improvements to code and environments constantly, rather than trying to do everything all at once. It might not be as satisfying, but it's less risky and a lot more realistic in most work environments.


The problem is when you need to change your "platform".

I worked on a 300k LOC business basic application at one point.

The big question everyone was asking is how do you move to something else? Everyone wanted something else, they started writing new services on top of the old system, they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.

And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in business basic.


Isn't that the usecase for the Strangler app? https://www.martinfowler.com/bliki/StranglerApplication.html


Not when in 30+ countries with different modifications made in each country


The Firefox rewrite was a success by all measures. Spolsky is wrong on this one, at least with the claims of generality.


Except for one measure: Netscape died as a company. The huge rewrite contributed to killing it. If you don't ship a product (for like 4-6 years?) you're gonna die. Mozilla originally chose the name Phoenix (then Firebird to avoid trademark problems, then finally Firefox) because it was a phoenix rising from Netscape's ashes. Its major innovation: it was 'blazing fast' compared to IE 5.5 / 6. Tabbed browsing was also pretty cool.

You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should / can work. Mozilla hasn't devoted 100% of resources to Servo, but instead is letting servo build all on its own, and someday unclearly defined in the future, the two could merge. (but might not!) It's a separate product, and nobody is pinning all their hopes and dreams on it.


I remember how long it took to release a stable version of Mozilla and Mozilla Phoenix. In the meantime, I had to manually recompile newer releases all the time. There was no alternative browser on Linux or *NIX for that matter (OK, macOS still had MSIE).

The successor of Netscape Communicator was Mozilla (IIRC it was just called that, later renamed Mozilla SeaMonkey), and the successor of Netscape Navigator was Mozilla Phoenix (later renamed Mozilla Firebird and eventually Mozilla Firefox). Firefox and Thunderbird were once again separate clients.

Mozilla was still considered bloated, but Phoenix was far less so, which was nice on lower-RAM machines, and it allowed the start of Web 2.0. It was also the return of doing one thing and doing it right: browsing the WWW. Netscape Communicator (unlike its predecessor, Netscape Navigator) came with a Usenet client and an e-mail client.

Later in development, addons became a thing, and you could add features which were previously part of Netscape Communicator such as calendar, HTML editor, etc. You can also add such features with addons to Mozilla Thunderbird.

Then Google Chrome happened, and people switched to that, but I'm not entirely sure why.


Firefox went through a long period of being slow to start and memory hungry.


I switched for the same reason I switched to Phoenix: it was just faster. Now I use it for the dev tools. It's still faster IMO.


Also, Servo is just the engine. And modern web rendering engines are themselves highly modular. I think the Gecko engine powering Firefox has had its JavaScript interpreter replaced 2-3 times.

So when it comes to it, the most likely outcome will be a kind of "my grandfather's axe" scenario where over time parts of Servo replace Gecko within Firefox until Servo has completely replaced Gecko.


Sorry, but I emphatically disagree. Servo entailed creating a new programming language, building a community around that language, and using the project as a playground for feature validation. This might work for a non-commercial entity, but it is not a good example of a rewrite.


It's more about the integration side than the particulars of it. It's a huge project with different goals than ff, so it should be (and is being) treated as such.

It's a huge, audacious, hairy project, which might happen if a startup said "OK let's rewrite everything from scratch!"


Then again, Firefox was itself a strip-down of a rewritten Netscape suite. Stripped down in that the suite included not just a browser but also an email client, an IRC client and an HTML editor, and the UI was done using JS and XUL markup.

What Firefox devs did was to take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).


Same thing, back in the first dotcom boom. I think the company would have died anyway, but they burned whatever runway they had by undertaking a complete rewrite of a working ASP/SQL web app (full stack Microsoft). The new version was to run on Linux and use a variety of custom code, sourceforge and/or freshmeat projects, and several different data storage tiers. An explosion of architectural complexity. As far as I could tell the main reason was that the CTO and his top architects were all Unix zealots and hated Microsoft.


Spolsky should be required reading in software development classes. I wish he had time to keep it up.


I can't remember which podcast episode it was, but I do remember him mentioning a professor in South Korea had once told him he was using his blog posts for his CS classes.


Dear deity, why does that read like the development history of everything Linux bar the kernel itself?!


I'm currently watching this happen to a product from the outside. The company I work for has an ERP system from the late 80s, written in COBOL for the HP 3000 series computers. At the time, it was probably an excellent system; however, over the years it's had modernizations tacked on with no regard to actually improving the core system. Some examples:

* In the early 2000s, they added support for Windows NT to the product. Unfortunately, they did this with an MPE compatibility layer that means the entire thing still thinks it's running on an HP 3000, so controlling it programatically means writing MPE job streams.

* It was originally written to store data in COBOL records. When they added support for SQL databases, they apparently just copy-pasted the schema verbatim from the COBOL copybook format. This means the database has no foreign keys, FLAGS columns all over the place (including tables where you have to JOIN ON SUBSTRING), and, most egregiously, a table with ITEMNO_001, ITEMNO_002, ITEMNO_003, PRICE_001, PRICE_002, PRICE_003 and so on, which has to be queried three times and UNIONed to get the data out.

* Printing packing lists requires not only a specific model of printer, but also an extra several-hundred-dollar chip to be installed in that printer. I'm told that this chip's sole function is to enable barcode printing.

I have no insight into what goes on inside the company that makes this thing, but it certainly looks to me like they have a severe case of technical debt. Any bug fixes generally take 4-6 weeks in the best case scenario, and frequently either don't fix the bug or introduce new ones instead. Their only customers are the ones that have been using the system for so long that they're stuck with the system, and can't switch--in fact, many of them are still running HP 3000 systems, which HP has been trying to end-of-life since at least 2006.

The end result of this is that the product is dying a slow, agonizing death of attrition. I think the only reason it still exists at all is because the company that makes it is stuck with support contracts that haven't expired yet.


Excellent anecdote, thanks for sharing.


I am working on a project now that bears some study.

I built this extranet app for a Fortune-class / NYSE company in 2001. They were a Lotus Domino shop so for that and various other reasons the extranet was deployed in Domino. The initial rollout was considered quite successful, but it was definitely "v1" code, and I'm being really generous with the code quality. Plus, Domino.

The application was considered a stopgap until the shop had become fully Microsoft-centric, at which point it was expected to be migrated to .NET. That was expected to be in ~5 years.

The result was that no investment was made in the app for over fifteen years. Every now and then an enhancement would be needed, and a contractor would be called up to bolt on a feature in a shockingly slipshod manner (this app is much too complex for the average Domino dev). But no technical debt was ever cleaned up, because "meh, we're going to replace that app by 2007."

2007 was 10 years ago. In the meantime two projects to replace the app were spun up and killed. The app is finally being retired this year. I was called up at the 11th hour to jump back in (15 years later) to help support the thing through the conversion, as the one existing Domino dev they had on staff finally (wisely) jumped ship.

I cannot even begin to describe the state of this app.... it's a case study in "how to not manage IT."

---

Another recent client was a content-creation shop (think glossy magazines). Their outgoing sr dev had deployed a CMS that nobody had (or has) ever heard of. This CMS was originally developed during the glory days of XML. Believe it or not, the app worked by loading all of the CMS content into a single in-memory XML document. This was probably OK for a brochure site, but this was a site with hundreds of thousands of pages of content. As a result the application required a server with 64GB of RAM just to launch. Also - launching the app took about ten minutes after the server OS was loaded. And there was no server farm, just the one server. If the app was ever stopped, it would stay down for at minimum 10 minutes.

I came in to fill in temporarily and to try to find someone to staff the position permanently. Even with a competitive salary, nobody qualified wanted the job.

Meanwhile, the same company also had a set of blogs that they managed in WordPress....


That's insane. Couldn't someone write a program to parse the XML into appropriate data structures for use by a third party web server?


Sure, any amount of work could have been done to replatform the app.

They were already using WordPress for blogging. A custom WordPress implementation would have easily solved their CMS problems and devs are trivial to find.

The point was that the thing had just been rolled out the prior year. There was no budget or appetite for throwing the thing away. It did work. So there it stands, aside some dozen WordPress sites...


I owned the project and decided to kill it. I guess you could call it technical debt, but more accurately it was incompetence, both on the developers' part and on my part as their "manager".

My product was http://www.teamkpi.com/

I hired 3 mid-level PHP and JSP developers in Thailand and had them build the website + reporting page.

Total nightmare. Don't hire developers and assume that they will rise to the occasion (learn new tricks). I gave them as much time as they needed to research and make sound engineering decisions, and I ended up with a spaghetti Frankenstein nightmare: server-side scripts mixed with client-side scripts mixed with server-side code that generates client-side scripts.

In Thailand at least, you always need a manager to force architecture and design decisions, and force devs to refactor poorly thought out solutions.

I was naive and thought that I could have a team of 3 figure out the web part while I write the desktop client and provide PM-level guidance.


I had a chance to ship the first Tower Defense game for iOS. The OS X game I was porting had some crippling performance problems that were incredibly hard to track down.

The problem was two-fold:

1. The relevant tools (Unity3D) were extremely immature and the problem was quite diffuse. No profiler, poor quality of generated code, tiny caches, etc.

2. A problem in string-handling code that was quite diffuse throughout the game. As near as I can tell, it was blowing out the tiny CPU cache hundreds of times per frame.

On desktop, this code was a complete non-issue. On the puny little ARM in the iPhone? It was the difference between having dozens of towers and 50 enemies in play vs half a dozen towers and less than a dozen enemies in play. The impact on game dynamics, and the need to re-balance everything, would by itself have added weeks to the shipping schedule.

There were plenty of other things that needed to be scaled WAY back of course: Switching from 3D to 2D to get vertex count and draw call count down. Completely rebuilding the entire UI. Revamping the pathfinding and suffix caching to not play havoc with the CPU cache. Moving from a 24x24 grid to a 12x12 grid. All of that combined helped a LOT, but not nearly enough.

The string manipulation was for a hierarchical property system that let me parameterize all sorts of attributes for enemies/spells/towers/projectiles in a set of text files. Ultimately, I had over-engineered on the assumption that I would be tweaking many more things -- with much greater frequency -- than I wound up actually tweaking.

Had I ripped most of it out and just had local properties on each prefab that I assigned manually, I might've hit that market opportunity. Finding that it was the cause was a multi-month project because of how interwoven it was with everything else. Hell, it would've been fine had I not over-generalized it into a shared component on each prefab that the other components queried to get property values. But I did. And it took me vastly too long to identify it as the major problem it was.
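
(Not the actual game code, obviously. The general cure for per-frame string parsing is to resolve the hierarchy once at load time into a flat lookup; a toy Python sketch of that idea, with the property names and inheritance scheme invented.)

    # Toy sketch: flatten a hierarchical property system once at load time,
    # so per-frame code does a plain dict lookup instead of string parsing.
    # Property names and the inheritance scheme here are invented.
    BASE = {
        "enemy":           {"speed": 1.0, "hp": 10},
        "enemy.fast":      {"speed": 2.5},   # overrides inherited values
        "enemy.fast.boss": {"hp": 200},
    }

    def resolve(kind):
        # Walk "enemy" -> "enemy.fast" -> "enemy.fast.boss"; most specific wins.
        props = {}
        parts = kind.split(".")
        for i in range(1, len(parts) + 1):
            props.update(BASE.get(".".join(parts[:i]), {}))
        return props

    # Done once when the level loads, not hundreds of times per frame:
    RESOLVED = {kind: resolve(kind) for kind in BASE}

    # Per-frame code then pays only for a dict lookup:
    speed = RESOLVED["enemy.fast.boss"]["speed"]   # 2.5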

Opportunity missed, and that was the final nail in the coffin for my fledgling game studio.


Wow. I'm actually seriously considering starting something on a smaller scale (tower defense for mobile as well) in the next few days as my next long-term(ish) side-project. I've done native iOS and Android for a few years now, but nothing with Unity as of yet. I know C# well enough. Other than what you've mentioned, do you have any immediate tips before I fall face first?


Technical debt won't necessarily kill a project, but over time it will reduce the speed at which you iterate and ship software. That loss of speed does kill companies, young and old.


It, and its effects, will definitely kill projects.


I've seen projects die not because technical debt existed, but because it was never addressed. The developers got sick of working around the debt to do trivial things, so they left. Customer support personnel left because of the constant manual "tweaking" the debt forced on them. Management focused on tacking on features instead of paying off any technical debt, which had ripple effects throughout the company.

Technical debt destroyed the team.


I worked for a startup that had basically the right idea but proper execution took so long we ran out of runway. The first iteration of the product was built in an extremely haphazard, cowboy way - and took months, if not years, to refactor into something stable, usable and crash-proof. By the time the product was operational, the company was bankrupt. We simply hemorrhaged money until we bled to death.

As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.


That may be a 'startup problem'. Do it cheap and cowboy, because runway. Assuming the money will come along later to do it all again. But that happens (lots of money later) only if you get bought out. Not if you have to make it on your own.

So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?


Even if the miracle occurs and you get bought out and get shittons of money thrown at you, you've already built a company with a "fake it till you make it" culture. Even if you get to hire great new engineers to build your product the right way, your existing team is a ragtag bunch of amateurs who don't know how to build things properly and block any attempts to improve the status quo. I've been there. You can't build castles on a foundation of mud. You'd have to throw it all out and start again - and that's a recipe for disaster.


I wouldn't say killed, but severely burdened? Limited by technical debt? Sure.

One application was a web application built in C++ in the 90's. It didn't use the STL; it implemented everything from XML parsing to PDF rendering from scratch. It stored all data in XML files on the file system. It was a single-threaded CGI application. And it was the core product of the small business that created it.

There was no series-B/C/D/E that was going to appear so we could hire more developers and re-write everything or develop a new, superior product, etc. This is where I learned how to maintain and extend legacy software. I spent hours poring over Michael Feathers' book. We did manage to extend and breathe new life into the system. We wrapped the old code in Python, wrote a tonne of integration and unit tests on every change, and wrote some code to sync data to a database alongside the XML file storage scheme it used. We even got to a place where we started replacing code paths from the Python API with functionally-equivalent (as far as our test suite was concerned) code written in nice, clean Python (and gained some features along the way thanks to Python's nice libraries!).
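
(A minimal sketch of that wrap-and-test pattern, assuming the legacy app is a CGI binary you can invoke directly; the binary name, query string, and expected output below are placeholders, not the real system.)

    # Sketch: drive the legacy CGI binary from Python and pin its behaviour
    # with characterization tests before changing anything. Paths and the
    # expected output are placeholders.
    import os
    import subprocess

    LEGACY_BIN = "./legacy_app.cgi"   # the untouched C++ CGI executable

    def run_legacy(query_string):
        # Invoke the CGI the same way the web server would.
        env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
        result = subprocess.run([LEGACY_BIN], env=env, capture_output=True,
                                text=True, check=True)
        return result.stdout

    def test_order_lookup_unchanged():
        # Characterization test: lock in today's behaviour, whatever it is,
        # so a clean Python replacement for this code path can be diffed
        # against the old binary's output.
        assert "Order #1234" in run_legacy("action=order&id=1234")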

We kept the lights on without having to spend too much time hacking on undocumented, untested C++ code and without trying to just re-write everything. It was much more difficult to make progress than a typical greenfield project in a dynamic language but that would've cost more upfront without a clear payoff... so we did what we had to do.

Another company? Well, they decided to use a document-based data storage system as the source of truth in a hot new microservices architecture that was going to save everything... only there was no schema validation, and their use cases were killing performance in some scenarios. Random breakages caused by changes at a distance. It hasn't killed their business, but it has limited their options.
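
(The missing schema validation is usually the cheapest part to retrofit, e.g. checking documents against a JSON Schema at the service boundary before anything is written. A rough sketch with an invented schema; it assumes the jsonschema package.)

    # Sketch: validate documents before they reach the document store, so
    # "changes at a distance" fail loudly at write time. The schema and the
    # store's insert() call are invented for illustration.
    from jsonschema import validate, ValidationError

    ORDER_SCHEMA = {
        "type": "object",
        "required": ["id", "customer", "total"],
        "properties": {
            "id": {"type": "string"},
            "customer": {"type": "string"},
            "total": {"type": "number", "minimum": 0},
        },
        "additionalProperties": False,
    }

    def save_order(store, doc):
        try:
            validate(instance=doc, schema=ORDER_SCHEMA)
        except ValidationError as err:
            raise ValueError("refusing to store malformed order: " + err.message)
        store.insert(doc)   # whatever the store's actual write call is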


Highly recommend Michael Feathers' book:

"Working Effectively with Legacy Code"


I've seen a whole company killed by technical debt. Because the software was written so badly, far more developers had to be hired to firefight than the company could afford. The technical support team was similarly bloated to deal with the endless problems the customers had. Sales were low due to the bad reputation.

A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.


Our product, a piece of large-scale enterprise software, is slowly getting killed. It's old and it's rather unusable (by the users). Plus, for "backward compatibility", it supports dozens of strange configurations. It's dragged down by so much technical debt (functions longer than 3000 lines with 60 parameters!) that every small change requires so much time.

We're slowly killing it (i.e. no big new development, only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)


Functions with 60 parameters. Jesus Christ....


ITA software is a good example of a company that succeeded due to the collective technical debt across their competitors.

Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.


Former ITA engineer here. Our airfare search product QPX was untouchable at the time due to design: it got results that were far better than those of the competitors because ITA was modeling the problem better (search through a graph). While competitor tech debt didn't hurt us, I don't think it was the pivotal factor in ITA's success. As you point out, our hopes of replacing a major carrier's reservation system never came to fruition, unfortunately. A res system is a complex beast.
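
(Nothing like the real QPX internals, of course, but the "search through a graph" framing is roughly: airports are nodes, priced flight segments are edges, and cheapest-itinerary search becomes a shortest-path problem. A toy Python sketch with made-up fares; real fare rules don't reduce to a single edge weight.)

    # Toy sketch of the graph framing: airports as nodes, priced segments
    # as edges, Dijkstra for the cheapest itinerary. Fares are made up.
    import heapq

    SEGMENTS = {
        "BOS": [("ORD", 120), ("JFK", 90)],
        "JFK": [("ORD", 110), ("SFO", 320)],
        "ORD": [("SFO", 180)],
        "SFO": [],
    }

    def cheapest(origin, dest):
        queue = [(0, origin, [origin])]
        seen = set()
        while queue:
            cost, airport, path = heapq.heappop(queue)
            if airport == dest:
                return cost, path
            if airport in seen:
                continue
            seen.add(airport)
            for nxt, fare in SEGMENTS.get(airport, []):
                heapq.heappush(queue, (cost + fare, nxt, path + [nxt]))
        return None

    print(cheapest("BOS", "SFO"))   # (300, ['BOS', 'ORD', 'SFO'])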


I do agree that QPX was untouchable, but I still think tech debt in competitors was a major factor. There were plenty of smart people at your competitors...I'm sure graph search occurred to them. I suspect efforts to greenfield that were squashed...nobody wanted to throw out the hairball they had because of the existing investment. Thus, they tried to "fix" what they already had...with obviously bad results.

Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as, QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.


I'm not quite sure that's correct. ITA was ultimately bought by Google, and their booking engine was used in Expedia and Travelocity, IIRC.


Their shopping/pricing engine was used in those two companies as well as others. It was wildly successful, but...you shop on ITA, and book elsewhere.

A booking engine (CRS/GDS) would be used by either airlines or a reservations system (Amadeus, Sabre, etc). That's the piece they didn't deliver on.

Edit: Reference to the announcement of abandoning the booking space: https://skift.com/2013/05/15/google-and-ita-software-abandon...

"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"


I've worked somewhere that died for a combination of reasons, one of which was an effect of technical debt and inappropriate outsourcing.

I don't think technical debt alone will kill you. But it may render you unable to cope with another problem, which will then kill you.

