The examples don't seem to illustrate software rot at all. Rather, I'd say they illustrate software ossification. Over time, each example became more rigid and hard to change because things that should have been modularized and isolated in one place instead became idioms pervading the entire codebase. (The "should have" is arguable, I know. There's a broader point about idioms vs. modules that deserves its own blog post.) Rot would be when a program works less and less over time because its external dependencies vanish or change and the program itself stops adapting.
Software that is merely rotten can often be saved, with varying levels of effort and skill. Software that is ossified often needs to be replaced.
> The examples don't seem to illustrate software rot at all. Rather, I'd say they illustrate software ossification.
I was about to come here to say that there's really no such thing as software rot. If all of the environmental variables remain the same, software doesn't really rot. It's actually the environmental variables which change.
> Over time, each example became more rigid and hard to change because things that should have been modularized and isolated in one place instead became idioms pervading the entire codebase.
I can deal with idioms. If 1) the idioms are very consistent and 2) you have a reliable syntax driven code rewrite tool, you can drive idioms into a library and/or otherwise translate them. I've been paid to do both.
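To make that concrete, here's a minimal sketch of the idea using Python's standard ast module (Python 3.9+ for ast.unparse). The idiom and the util.load_json helper are hypothetical, and a real job would use a tool that preserves comments and formatting, but the shape is the same: match the idiom syntactically, replace it with one library call.

    import ast

    class IdiomFolder(ast.NodeTransformer):
        """Fold the repeated idiom json.loads(open(path).read()) into util.load_json(path)."""
        def visit_Call(self, node):
            self.generic_visit(node)
            # Match the outer call: json.loads(<one argument>)
            if (isinstance(node.func, ast.Attribute) and node.func.attr == "loads"
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == "json" and len(node.args) == 1):
                inner = node.args[0]
                # Match the inner call: open(<path>).read()
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Attribute)
                        and inner.func.attr == "read"
                        and isinstance(inner.func.value, ast.Call)
                        and isinstance(inner.func.value.func, ast.Name)
                        and inner.func.value.func.id == "open"
                        and inner.func.value.args):
                    path_expr = inner.func.value.args[0]
                    replacement = ast.Call(
                        func=ast.Attribute(value=ast.Name(id="util", ctx=ast.Load()),
                                           attr="load_json", ctx=ast.Load()),
                        args=[path_expr], keywords=[])
                    return ast.copy_location(replacement, node)
            return node

    # The source is only parsed and unparsed, never executed.
    src = "cfg = json.loads(open('settings.json').read())"
    tree = IdiomFolder().visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    print(ast.unparse(tree))  # cfg = util.load_json('settings.json')

Multiply that by a few hundred idioms and you can see why consistency matters: every subtle variation is another matcher someone has to write.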
> Rot would be when a program works less and less over time because its external dependencies vanish or change and the program itself stops adapting.
But what stops a program from adapting? It becomes "ossified." Because the code organization isn't up to the challenge, it becomes too hard to add and change features without introducing bugs.
There is no software "rot." It's all different kinds of ossification.
> what stops a program from adapting? It becomes "ossified." Because the code organization isn't up to the challenge
Usually it has nothing whatsoever to do with the code organization. The most common cause is lack of active developers to make the adaptation - company dies, developers die or change jobs or lose interest. That's why "rot" is an apt metaphor, like a house slowly falling apart due to lack of maintenance.
> If all of the environmental variables remain the same
> I can deal with idioms. If 1) the idioms are very consistent and 2) you have a reliable syntax driven code rewrite tool,
Would you like a pony too? The environmental variables don't remain the same. The tools aren't smart enough to catch every subtle variation of an idiom repeated hundreds of times across a large codebase. Yes, you can automate away the easy cases, but that's not even half the battle. That's why people hire maintenance programmers like you.
> Usually it has nothing whatsoever to do with the code organization. The most common cause is lack of active developers to make the adaptation
It varies from shop to shop, but what I've seen is that code organization has a lot to do with it. Basically, if your code organization/factoring is really bad, then it's more likely that porting becomes untenable. Ironically, these are the projects that can have the longest lifespans, sometimes developing a history of multiple failed attempts to "port" them. (Really, re-implement while incurring huge opportunity costs.)
> If all of the environmental variables remain the same
> Would you like a pony too?
You're seriously missing the point. My point is that the environmental variables never remain the same. (I was making two points, and I think you missed where I switched.)
> 2) you have a reliable syntax driven code rewrite tool
> Would you like a pony too?
I repeat. I've been paid good money to use such a tool. I've used such a tool to enable the 1-step-removed porting of a project to a different language variant. Instead of doing the port myself, I was refining a tool which automated the port and enabled the maintainers to easily port all the changed code they were checking in. This lets you do a port without incurring the opportunity cost. I know of a consulting company that once made most of their income with this.
> Yes, you can automate away the easy cases, but that's not even half the battle.
You can't simply hit a switch with this, of course. Someone has to put real thought into how the translations are made, especially if you want idiomatic, readable code going out the other end. The good thing about consistent idioms is that they can be leveraged to multiply that work. The manual part can be re-leveraged through automation.
> That's why people hire maintenance programmers like you.
That could be read in a way that sounds less than friendly and maybe a bit elitist.
> what I've seen is that code organization has a lot to do with it
I suggest that your sample is non-representative. Companies don't hire specialists to refactor code that they're abandoning. Dead companies don't hire anyone at all. You don't work on rotten code because nobody does, but that doesn't mean it's not out there.
> That could be read in a way that sounds less than friendly and maybe a bit elitist.
It could, but that would be on the reader. I didn't say anything to denigrate maintenance programmers.
> Companies don't hire specialists to refactor code that they're abandoning.
Companies do hire specialists to port code that they're not abandoning.
> Dead companies don't hire anyone at all.
"Zombie" companies can still have big revenues and they can and do pay consultants to do things like port old applications. Again, there have been entire companies based on such activity. I've made money from such companies.
> You don't work on rotten code because nobody does
I was a consultant for a language vendor for 5 years, then continued working in that niche for another 5. My sample size in this context is in the many dozens at least. There's a lot of code I'd call "rotten" out there that people still work on.
> I didn't say anything to denigrate maintenance programmers.
Good. I'm sure I've met a number of them who are probably better coders than either of us when they're half asleep.
Is it? My reading of that definition leads to exactly the opposite conclusion.
Rot is when something breaks down while the environment remains the same. What happens with software is the opposite: it forever remains in pristine condition; it's the environment that changes.
(Compare also with the description of evolution. A species does not "rot". It dies off when its niche changes so much that it's no longer adapted to survive in it.)
While I agree this article isn't about rot, the definition of rot I'm used to is more about how knowledge of a product becomes lost over time without any active developers, so it becomes harder and harder to switch back to active development.
The one I'm used to is related to that one, but only barely: the software becoming less and less stable and/or harder to change as features are added. The cause can be loss of knowledge or ever-increasing technical debt.
The extreme case is programs for which the entire platform has been abandoned - e.g. DOS (including TSRs), old Windows, old MacOS (including desk accessories). There's a whole "abandonware" community devoted to preserving games that have rotted this way, usually relying on emulation to keep them from being forgotten entirely. There are various file compression/encryption/annotation utilities that have become unusable because they relied on deprecated Windows APIs. Various sound/video hacks on Linux have died as those subsystems remain in constant flux. In my own work as a Gluster maintainer, several pieces such as Java/Python or monitoring APIs have rotted away as those APIs changed. It usually takes a long time for a large program to rot away completely, but individual pieces can become non-functional long before then.
I think the point OP is making is that there is plenty of bad software out there that (as a result of a combination of incompetence, scope creep, bad decisions, and short timelines) just isn't resilient, testable, and single-purpose anymore. As long as the code is isolated enough, it's fairly easy from a technical standpoint to replace/refactor.
Ossified code is code that isn't isolated but tightly bound to several other components. This means you can't swap it out without making the exact same architectural mistakes, so you basically need to start over from scratch, which is a much harder and broader problem.
there is an alternate path for your ossified code, but it's really costly. idk if it has a cute name.
if there aren't system/integration tests - write system tests. you have to be pretty thorough.
have a long and involved discussion about what the new thing is going to be. go through all the frustrations with the current code base. convince yourself at the end that it's going to be worth it, because the cost is going to be high. ideally the new version will open up new capabilities that just weren't possible before.
find a cut in the dependency graph. you're going to rewrite the code on one side and leave the other side untouched. ideally that cut will be small-ish and contain some particularly broken stuff that you'd like to get rid of as soon as possible. unfortunately there may not be a small cut that makes sense :-(
build a shim between the old model and the new model across the cut. this shim is only going to last as long as the old code on the other side. this shim might be involved and a total waste of effort. suck it up or look for a different cut.
replace the code on the new side and test against your suite. if it's not really that exhaustive, expect a rash of bug reports. run through some kind of soft deployment. if it's a request/response kind of thing consider forking your production traffic and comparing the results against the old code base.
repeat until golden brown.
this overall process also can fail, often because of poor test coverage, and even more often because you haven't adequately communicated the scope of the undertaking and its absolute necessity to the rest of the engineering organization and the business as a whole.
however, if you make it through, you've avoided the giant speed bump that comes at the end of the rewrite, and you've been able to fold in new feature and bugfix work along the way.
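as a concrete version of the "fork your production traffic" step, here's a minimal sketch in Python. old_handler and new_handler are hypothetical callables standing in for the two sides of the cut; the old side stays the source of truth and the new side only gets compared and logged.

    import logging

    log = logging.getLogger("shadow")

    def handle(request, old_handler, new_handler):
        # The old path still serves the user; it remains the source of truth.
        old_result = old_handler(request)
        try:
            # Shadow call into the rewritten side; never user-visible.
            new_result = new_handler(request)
            if new_result != old_result:
                log.warning("mismatch for %r: old=%r new=%r",
                            request, old_result, new_result)
        except Exception:
            log.exception("new code path failed for %r", request)
        return old_result

anything that shows up in that log is either a bug in the new side or undocumented behavior in the old side that somebody probably depends on. both are worth knowing before the cutover.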
Sure, but it's not the rendering engine they're talking about; it's just that Chrome was architected from the start to isolate each tab/window in its own process (containing its own renderer).
That’s the context in that section, sure, but it’s not what I take away from that statement.
Point being, even Chrome would have needed to make WebKit compatible with that model. I don’t know if there was a lot of work to do that or not. But “from scratch” implies that there was no existing code to refactor, which would appear to be a false statement.
I think what's discussed in OP is not software rot, but software clutter. Software rot would be something inherent that happens to software naturally over time, without any external change. I can't think of a good example of this, so I think software rot is in general a misnomer.
In my opinion, software "rot" is most likely where software has been passed on through a number of lead contributors over time. The original reasons for a particular set of design decisions are forgotten.
Someone else comes along and doesn't quite understand a particular nuance, tries to add functionality, but it doesn't quite work, so they write a bit of code that ends up reimplementing an aspect of the original. Now you have two lots of code doing the same thing in different ways, perhaps with their own sets of variables.
Then someone else comes along and is faced with real confusion, and before you know it you find yourself with a system where, if you update data one way, the changes are sometimes picked up by another part of the system and sometimes not, and... it's rotten.
I'm not sure software clutter describes this either, but I understand what you are trying to say. I think of software rot more as code that hasn't been touched by anyone in years and now no longer works due to its environment changing. Things like API updates, general system updates, security updates or other factors eventually cause it to malfunction.
I had a thumbnailing micro service I made years ago that worked happily until 3 months ago. A combination of PIL updates and various Ubuntu security updates made it no longer work properly in various ways from keeping the alpha layer on PNG files to cropping from the wrong position. I consider this software rot since I hadn't updated anything but the surrounding system.
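A minimal sketch of the more defensive version, assuming Pillow (the function name and paths are made up, and I'm not claiming these were the exact defaults that changed in my case): being explicit about the mode, resampling filter and output format leaves less room for a library or OS upgrade to silently change the result.

    from PIL import Image

    def make_thumbnail(src_path, dst_path, size=(256, 256)):
        with Image.open(src_path) as img:
            img = img.convert("RGBA")            # keep the alpha channel explicitly
            img.thumbnail(size, Image.LANCZOS)   # pin the resampling filter
            img.save(dst_path, format="PNG")     # pin the output format too

It doesn't stop the rot, but it at least turns "the surrounding system changed" into fewer silent behavior changes.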
"""So it seems that besides GHC, only ever HBC was used to compiler GHC.
HBC is a Haskell compiler where we find the sources of one random version only thanks to archive.org.
Parts of it are written in C, so I looked into this:
Compile HBC, use it to compile GHC-0.29,
and then step for step build every (major) version of GHC until today.
....
[M]ost of [HBC] is written in LML, and the LML compiler is written in LML."""
Well there is bit-rot. One manifestation of this is when nobody works on code for a while and it becomes unusable or less usable because the world has moved on. Maybe newer versions of the compiler won't compile it. Maybe some of the dependencies have changed. Just try to build even a small program written for Windows 95 or 98 today.
This is one reason "do one thing and do it well" is useful. The interface is kept simple and should be easy enough to update to a new environment. At the same time, that means for bigger systems a lot of small things will need updating over time, but at least the changes should be straightforward.
In my experience, not really. The most common thing you will need to fix is assumptions about pointer sizes, including casts between integer and pointer, which compilers used to be more permissive about in the days before intptr_t. (A few years ago I was compiling code from the 90s that did these casts implicitly, which had me wondering why the compiler ever let them get away with it...)
C++ in particular has also changed a lot since the 90s, including how permissive the compilers used to be, even for stuff covered by C++98.
The code might very well work the same, but the dependencies were a nightmare: They were scattered across 14 different CDs and web sites. One provider of often used components had gone out of business, luckily those components were released on sourceforge.
Since there wasn't a pom.xml in Delphi projects (forgive me for mentioning a Java thing here, but this was long before package.json or ProjectName.csconf was invented), one also had to go through a bit of research and trial and error to figure out which versions were needed and the exact ones that were compatible with each other.
Luckily we had a seasoned, well paid Delphi expert at hand and we got it working in just under 3 days.
If anything I'd say this is a great example of (one form of) bit rot.
Now that I think of it, it might very well have been the same week that I left behind my last bit of prejudice against Java and started openly preferring it.
Yep. We have a huge Delphi app. It's composed of a few dozen DLLs; each DLL is a different project with its own forms. Each DLL uses its own components, and different DLLs use different, incompatible versions of the same components. Basically, to build this nightmare you would need around 8 virtual machines to install those components. At least they have all those components downloaded offline. This app is in desperate need of a huge refactoring, but nobody has money for that.
I think we can test this retroactively. sed has been around a long time. So we can go grab a copy from, say, Version 7 Unix and try to build it on a modern system. I grabbed v7.tar.gz from https://github.com/v7unix/v7unix, extracted it, and tried to build:
$ make
cc -n -O -c -o sed0.o sed0.c
In file included from sed0.c:2:
sed.h:116: warning: declaration does not declare anything
sed.h:128: warning: declaration does not declare anything
sed0.c: In function ‘main’:
sed0.c:32: error: ‘union reptr’ has no member named ‘ad1’
sed0.c:48: warning: incompatible implicit declaration of built-in function ‘exit’
<snip many more lines>
The problem here is that although sed was written in "C" and we have a "C compiler" on our system, the definition of what exactly constitutes "C" has changed. The modern compiler no longer accepts the same language as the V7 compiler.
We can fix the source to deal with this and produce a working binary. But then, what would we do with it? We can't really install it as /bin/sed. The world now expects /bin/sed to support switches and syntax that this sed does not. Trying to use this as /bin/sed would cause lots of programs to fail. The definition of what exactly constitutes a working "sed" program has also changed.
So it seems to me like sed already has rotted in some sense. Looking forward, the sed from a current system is likely to be similarly rotted when you try to use it on a machine 40 years from now.
> The problem here is that although sed was written in "C" and we have a "C compiler" on our system, the definition of what exactly constitutes "C" has changed. The modern compiler no longer accepts the same language as the V7 compiler.
Standards provide solid points of reference in this mess: 1989 ANSI/ISO C is not going to change. Specific compilers come and go, but the language defined by those documents (one ANSI, one ISO) is unchanging and, more importantly, well-understood such that C compiler implementers both feel the need to implement it correctly and understand how.
> Looking forward, the sed from a current system is likely to be similarly rotted when you try to use it on a machine 40 years from now.
I think it's likely the language won't rot the same way, due to the standardization I mentioned, but the OS interfaces beyond POSIX or similar might rot.
sed does simple operations on text. But it certainly had to change when the definition of “text” changed from US-ASCII to “Whatever the LOCALE settings are, but most often ISO-8859-1”, and then later to “Most often UTF-8”.
Think about freshness. When code was recently written, it's fresh in memory. People can remember what the code was written for, and why it was written.
As time passes, code gets stale. And finally, when people can't remember what the code does, at that point you could consider it rotten. At some point the tooling becomes unavailable and you won't be able to run it!
Ways to prevent code rot:
* Clean (refactor, remove tech debt and legacy code) often.
> Counterpoint: rewriting your code from scratch is the worst mistake you can make.
Joel Spolsky's essay is more about rewrites of "ugly" code. E.g. Fix the disorganized code by gradually refactoring into cleaner modules over time instead of a total blank slate rewrite.
In contrast, this Geoff Greer essay is about paradigm-shifting software architectural changes which by their nature, are very difficult to retrofit into an old existing code base. For these, it's often easier to start coding with a "blank slate" rewrite.
As examples of Geoff's category of rewrites (new architecture paradigm) vs. Joel's category (unhygienic code cleanup), we can look at Joel's ex-employer Microsoft:
- SQL Server: the database diverged from the original Sybase code and the engine was rewritten. One of the architecture changes was switching from page locks to row-level locks.
- C# compiler: completely rewritten from a C++ code base to a C# code base. One architecture change was "compiler exposed as a library service" instead of the closed "black box" that the C++ code assumed.
- operating system: MS Windows NT as a blank-slate operating system instead of gradually extending old 16-bit DOS code or 16-bit Windows 1.0 code. One motivating architecture change was the new 32-bit protected mode on the newer Intel 386 chips. Another factor was the switch from "cooperative multitasking" to "preemptive multitasking".
And it's worth noting that Spolsky left MS & wrote that essay well before MS had its "come to Jesus" moment when it came to security. A fanatical emphasis on backwards compatibility & avoiding rewrites was not helpful for avoiding that crisis.
It's not that simple, as most things in reality never are. For example, "Spoiler alert: They rewrote Basecamp from scratch and it turned out great. It took around a year, and new signups doubled immediately following the release of Basecamp 2."
The key to making this work was to keep the old product around, so the rewrite didn't have to duplicate everything the old one did. That both cuts down the amount of effort and insulates you from the undocumented behavior that Joel writes about. If something about the new product doesn't meet your expectations, just keep using the old one.
There's no black and white here. Yes, rewriting your software from scratch is the worst mistake you can make. Most of the time. Except for those cases where it is actually the right decision. Good luck figuring out up front which one is the situation you are in.
I had the misfortune of living through a rewrite like that. Even though we were able to adapt large portions of the existing code it still took twice as long as expected. The result was so buggy that we took the unprecedented step of mailing out replacement CDs unsolicited to all registered product owners once the bugs were fixed.
I've had my share of frustration with the GIL over the years, but I don't think it fits into the list.
Python's rising years have been the same years that we discovered that (a) generalized parallelism is really hard, and (b) many many many applications of multicore are "embarrassingly parallel" and benefit little from shared resources. Stasis around the GIL is a use-case-driven decision to make embarrassing parallelism easy enough (via multiprocessing) and delegate more involved parallelism to linkable libraries or separate services. It helps the language focus on what it's good at.
I think for most people, async io / coroutines gets them most of the benefit that they'd ever get out of multi-threading.
With async io, you can have a lot going on at once, interleaving nicely while different tasks wait for io responses.
Beyond that, normally you're just as well off to kick off a second process.
You have to get pretty fancy with algorithms to make multi-threaded computation a net-benefit. Most of the time you don't need that. If you do, you're probably not reaching for python.
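A rough sketch of that split, with hypothetical fetch/crunch stand-ins: asyncio interleaves the IO waits on one thread, and a process pool supplies real CPU parallelism despite the GIL.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    async def fetch(url):
        # Stand-in for a real network call: this task spends its time waiting, not computing.
        await asyncio.sleep(0.1)
        return len(url)

    def crunch(n):
        # CPU-bound and embarrassingly parallel: no shared state needed.
        return sum(i * i for i in range(n))

    async def main():
        # IO-bound work: one process, one thread, many interleaved waits.
        urls = [f"https://example.com/{i}" for i in range(10)]
        sizes = await asyncio.gather(*(fetch(u) for u in urls))
        # CPU-bound work: sidestep the GIL with separate processes.
        # (Blocking the loop here is fine for a one-shot script; a server
        # would hand this off via run_in_executor.)
        with ProcessPoolExecutor() as pool:
            totals = list(pool.map(crunch, [s * 100_000 for s in sizes]))
        print(totals[:3])

    if __name__ == "__main__":
        asyncio.run(main())

For most workloads one of those two halves is all you ever need, which is the GIL trade-off working as intended.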
I agree. Having worked on some long-running systems (> 20 years), I find their success often comes from simplicity: almost any choice different from the GIL, at the time of Python's beginning, would have massively hurt at least one of performance or usability.
I always wondered why there was specifically so much chatter about Python's GIL. If you consider PHP, Perl and Ruby to be Python's main "competitors", they have pretty much the same limitations (not much real world single process concurrency being used) but much less noise around it.
I don't agree with that quote being used to define software rot, and I don't think the examples illustrate rot either. I'm just fundamentally opposed to the rotting metaphor when it's applied to projects that receive active upkeep, even if the project drifts further from its original scope and vision. We need a name for that which alludes to scope creep and refactoring, but one that isn't suggestive of the decomposition that befalls things immobile and abandoned, at the mercy of their environment.
I've always understood software rot to refer only to cases where the dependencies, including libraries and platform APIs, are changed so that an already-delivered copy of the software can no longer function, and the maintainers have since abandoned it, or can't do a quick fix to get it working again.
A recent example is the removal of support for NPAPI plugins in Firefox [1] in 2017. Firefox is in the role of the 'platform API' here, and this change of theirs broke any extension that didn't update. And, the replacement API differs in features, so the effort of updating an extension approaches that of a rewrite.
Another example is the game 'Star Wars Episode I: Racer', the podracing game from 1999. Newer versions of Windows and DirectX have changed things so that the game's original executable doesn't run [2].
Even open source isn't a remedy. Random projects one can find on a source repo, last updated x years ago, are susceptible to this. Sometimes they're shipped without dependency management, pinned to old versions that themselves no longer work, or tracking the latest when they should be pinned instead. If more work is required to get it running than just following the instructions provided by the author (assuming those were sufficient at some point), then it has suffered software rot.
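As a tiny illustration of that last trade-off, a hypothetical pip requirements file (names and versions are just examples); both directions rot, only on different schedules:

    # Exact pins: reproducible today, stranded once these versions stop
    # installing cleanly or vanish from the index.
    Pillow==8.4.0
    requests==2.26.0

    # Floating: keeps up with the world until an upstream API change breaks
    # the caller, usually long after anyone remembers how it worked.
    beautifulsoup4>=4.0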
Firefox needed so long for a multi-process architecture because Mozilla focused on other things, like developing mobile operating systems. The main browser was kept in hibernation mode.
Also, as the poster said, Firefox had a rich add-on API that they wanted to keep supporting but eventually decided to drop. The new API isn't close to the versatility of the old one; sadly, some add-ons work worse now.
A better example, I think, would be iOS, parts of which can't adopt Swift and have to stay implemented in Objective-C because of binary compatibility concerns.
Imho, terms like "software rot" and "bit rot" are simply disingenuous attempts to blame the software for a malfunction that is caused by changes in the environment of the software.
Or caused by programmer neglect. If your software compiles today and then fails to compile or run in the future, you could be doing something wrong. Unless your code got hit by cosmic rays and a bunch of bits flipped, nothing rotted.
I have code I wrote 20 years ago that still compiles and runs. I could check again but I don’t think any of it rotted.
The languages and systems change. Some time ago, I had a job freshening up a large, ~30-year-old Common Lisp project, and it worked mostly fine, with the minimal changes needed attributable partly to the evolution of Linux and partly to some pieces of code being older than the ANSI Common Lisp standard. I don't believe it would be as easy with Python or JS.
That said, I don't buy "software rot" as a proper name. Software does not "rot". The concept of rot implies an internal change that makes something fail whether or not the environment around it has changed. This does not happen to digitized data. Programs only stop keeping up with the Red Queen's race of computing environments.
> Adapting mature software to new circumstances tends to take more time and effort than writing new software from scratch. Software people don’t like to admit this, but the evidence is clear.
Software people love rewriting things. In most cases where I have to maintain an old project I would much rather rewrite it, but it's usually not in the budget.
In my experience the rewrite always takes much longer than expected. Sure, you might be able to redo the core pieces to handle the new problem in X weeks, but then it's going to take another 4*X weeks to handle all the edge cases and make sure you're not breaking anything your clients/users have grown to depend on.
> The first 90% of the code takes the first 90% of the time. The remaining 10% takes the other 90% of the time.
And
> With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody
For sure. That's what I meant by "it's not in the budget". I would probably choose to rewrite most projects I work on, but I know it's completely unreasonable for most non-trivial projects. I just meant I naturally gravitate towards wanting to rewrite because clean slates are satisfying.
The example of Chrome being written in less time than Firefox took to add multi-process support is interesting. Yes, changing a deeply held assumption in a mature code-base is a nightmare. But Firefox continued to make releases while they were gradually making multi-process support happen. Starting again would have been suicide for them.
And they already did start over once, and it almost was suicide. Instead of gradually fixing Netscape Navigator 4.7, we waited years and years for Firefox 1.0.
In the end it went pretty well, and it's impossible to know what would have happened had they decided to continue on the Navigator track, but Netscape Navigator's market share had dropped to basically 0% by the time Firefox was released.
So the claim is that it would have been less expensive for Firefox to start over from scratch and make a new multi-process browser while at the same time maintaining a single-process browser that stayed current with web standards in the interim?
So in all of those examples: why did the people behind those projects opt for a gradual change instead of a "2.0" from-scratch project? Too afraid of losing existing customers or plugin developers? I mean, for Firefox they could have replaced the whole base package via the updater.
Microsoft ended up doing it with Edge (I believe), and it doesn't seem to have hurt them much.
And further, a lot of Microsoft's share of the browser market was de facto share (it's the default in the OS), not mindshare. When the default decides, the enterprises are going to line up behind it, and so Edge always has a set of people who follow Microsoft's lead regardless of quality.
Joel Spolsky says exactly the opposite: https://www.joelonsoftware.com/2000/04/06/things-you-should-... FWIW, I think Spolsky is wrong and this Greer fellow is right, but I don't have any particular evidence either way (and definitely not enough to convince the sort of person who rises to a position of authority in a software organization), other than that I've seen attempts to incrementally add features to massive software go horribly wrong more often than I've seen throw-it-out-and-start-over go horribly wrong.