
Software Rot (2017) - lelf
https://geoff.greer.fm/2017/02/28/software-rot/
======
notacoward
The examples don't seem to illustrate software _rot_ at all. Rather, I'd say
they illustrate software _ossification_. Over time, each example became more
rigid and hard to change because things that should have been modularized and
isolated in one place instead became idioms pervading the entire codebase.
(The "should have" is arguable, I know. There's a broader point about idioms
vs. modules that deserves its own blog post.) Rot would be when a program
works less and less over time because its external dependencies vanish or
change and the program itself stops adapting.

Software that is merely _rotten_ can often be saved, with varying levels of
effort and skill. Software that is _ossified_ often needs to be replaced.

~~~
meschi
Can you provide an example of rotten software?

~~~
Raidion
I think the point OP is making is that there is plenty of bad software out
there that (as a result of a combination of incompetence, scope creep, bad
decisions, and short timelines) just isn't resilient, testable, or
single-purpose anymore. As long as the code is isolated enough, it's fairly
easy from a technical standpoint to replace or refactor.

Ossified code is code that isn't isolated, but tightly bound to several other
components. You can't swap it out without making the exact same architectural
mistakes, so you basically need to start over from scratch, which is a much
harder and broader problem.

~~~
convolvatron
there is an alternate path for your ossified code, but it's really costly. idk
if it has a cute name.

if there aren't system/integration tests - write system tests. you have to be
pretty thorough.

have a long and involved discussion about what the new thing is going to be.
go through all the frustrations with the current code base. convince yourself
at the end that it's going to be worth it, because the cost is going to be
high. ideally the new version will open up new capabilities that just weren't
possible before.

find a cut in the dependency graph. you're going to rewrite the code on one
side and leave the other side untouched. ideally that cut will be small-ish
and contain some particularly broken stuff that you'd like to get rid of as
soon as possible. unfortunately there may not be a small cut that makes sense
:-(

build a shim between the old model and the new model across the cut. this shim
is only going to last as long as the old code on the other side. this shim
might be involved and a total waste of effort. suck it up or look for a
different cut.

replace the code on the new side and test against your suite. if it's not
really that exhaustive, expect a rash of bug reports. run through some kind of
soft deployment. if it's a request/response kind of thing, consider forking
your production traffic and comparing the results against the old code base.

repeat until golden brown.

this overall process also can fail. sometimes because of poor test coverage,
but more often because you haven't adequately communicated the scope of the
undertaking and its absolute necessity to the rest of the engineering
organization and the business as a whole.

however, if you make it through, you've avoided the giant speed bump that
comes at the end of the rewrite, and you've been able to fold in new feature
and bugfix work along the way.
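
The shim step above can be sketched in a few lines. This is a minimal
illustration, not a recipe: the names (`NewStore`, `LegacyStoreShim`,
`set_value`, etc.) are all hypothetical, standing in for whatever old and new
interfaces sit on either side of the cut.

```python
# Sketch of a shim across the cut: untouched old callers keep using the
# legacy stringly-typed API while the rewritten side takes over behind it.
# All names here are hypothetical, chosen only to show the shape.

class NewStore:
    """The rewritten side of the cut: a cleaner, typed interface."""
    def __init__(self):
        self._data = {}

    def put(self, key: str, value: int) -> None:
        self._data[key] = value

    def get(self, key: str) -> int:
        return self._data[key]


class LegacyStoreShim:
    """Presents the old API the untouched side still calls, delegating to
    the new implementation. It lives only as long as the old code on the
    other side of the cut, then gets deleted."""
    def __init__(self, new_store: NewStore):
        self._new = new_store

    def set_value(self, key, value):    # old API passed values as strings
        self._new.put(key, int(value))

    def get_value(self, key):           # old API returned strings
        return str(self._new.get(key))


shim = LegacyStoreShim(NewStore())
shim.set_value("retries", "3")          # old call site, unchanged
print(shim.get_value("retries"))        # prints "3"
```

the shim is deliberately dumb: all the translation lives in one place, so when the old side is finally rewritten you delete the shim and nothing else.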

~~~
rzzzt
Cute name is "strangler":
[https://www.martinfowler.com/bliki/StranglerApplication.html](https://www.martinfowler.com/bliki/StranglerApplication.html)

------
bluejekyll
"Chrome was written from scratch in far less time than it has taken Firefox to
change."

This doesn't line up with my understanding of the history of Chrome. I thought
Chrome was based on WebKit, therefore definitely not from scratch.

Edit: this post references lifting components from WebKit and Firefox in the
initial version:
[https://googleblog.blogspot.com/2008/09/fresh-take-on-browser.html](https://googleblog.blogspot.com/2008/09/fresh-take-on-browser.html)

~~~
Xophmeister
Sure, but it's not the rendering engine they're talking about, it's just that
Chrome was architected from the start to isolate tabs/windows to a process
(containing its own renderer) each.

~~~
bluejekyll
That’s the context in that section, sure, but it’s not what I take away from
that statement.

Point being, even Chrome would have needed to make WebKit compatible with that
model. I don’t know whether that took a lot of work or not. But “from scratch”
implies that there was no existing code to refactor, which would appear to be
a false statement.

------
euske
I think what's discussed in OP is not software rot, but software _clutter_.
Software rot would be something inherent that happens to software naturally
over time, without any external change. I can't think of a good example of
this, so I think software rot is in general a misnomer.

~~~
phkahler
Well there is bit-rot. One manifestation of this is when nobody works on code
for a while and it becomes unusable or less usable because the world has moved
on. Maybe newer versions of the compiler won't compile it. Maybe some of the
dependencies have changed. Just try to build even a small program written for
Windows 95 or 98 today.

This is one reason "do one thing and do it well" is useful. The interface is
kept simple and should be easy enough to update for a new environment. At the
same time, that means a lot of small things in bigger systems will need
updating over time, but at least the changes should be straightforward.

~~~
agumonkey
isn't this 'bitrot' a function of the interface size? I doubt sed will ever
rot, for instance

~~~
abbeyj
I think we can test this retroactively. sed has been around a long time. So we
can go grab a copy from, say, Version 7 Unix and try to build it on a modern
system. I grabbed v7.tar.gz from
[https://github.com/v7unix/v7unix](https://github.com/v7unix/v7unix),
extracted it, and tried to build:

    
    
        $ make
        cc -n -O   -c -o sed0.o sed0.c
        In file included from sed0.c:2:
        sed.h:116: warning: declaration does not declare anything
        sed.h:128: warning: declaration does not declare anything
        sed0.c: In function ‘main’:
        sed0.c:32: error: ‘union reptr’ has no member named ‘ad1’
        sed0.c:48: warning: incompatible implicit declaration of built-in function ‘exit’
        <snip many more lines>
    

The problem here is that although sed was written in "C" and we have a "C
compiler" on our system, the definition of what exactly constitutes "C" has
changed. The modern compiler no longer accepts the same language as the V7
compiler.

We can fix the source to deal with this and produce a working binary. But
then, what would we do with it? We can't really install it as /bin/sed. The
world now expects /bin/sed to support switches and syntax that this sed does
not. Trying to use this as /bin/sed would cause lots of programs to fail. The
definition of what exactly constitutes a working "sed" program has also
changed.

So it seems to me like sed already has rotted in some sense. Looking forward,
the sed from a current system is likely to be similarly rotted when you try to
use it on a machine 40 years from now.

~~~
msla
> The problem here is that although sed was written in "C" and we have a "C
> compiler" on our system, the definition of what exactly constitutes "C" has
> changed. The modern compiler no longer accepts the same language as the V7
> compiler.

Standards provide solid points of reference in this mess: 1989 ANSI/ISO C is
not going to change. Specific compilers come and go, but the language defined
by those documents (one ANSI, one ISO) is unchanging and, more importantly,
well-understood such that C compiler implementers both feel the need to
implement it correctly and understand how.

> Looking forward, the sed from a current system is likely to be similarly
> rotted when you try to use it on a machine 40 years from now.

I think it's likely the language won't rot the same way, due to the
standardization I mentioned, but the OS interfaces beyond POSIX or similar
might rot.

------
jakelazaroff
_> Adapting mature software to new circumstances tends to take more time and
effort than writing new software from scratch._

Counterpoint: rewriting your code from scratch is the worst mistake you can
make.
[https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

~~~
jasode
_> Counterpoint: rewriting your code from scratch is the worst mistake you can
make._

Joel Spolsky's essay is more about rewrites of "ugly" code, e.g. fixing
disorganized code by gradually refactoring it into cleaner modules over time
instead of doing a total blank-slate rewrite.

In contrast, this Geoff Greer essay is about _paradigm-shifting software
architectural changes_ which by their nature, are very difficult to retrofit
into an old existing code base. For these, it's often easier to start coding
with a "blank slate" rewrite.

As examples of Geoff's category of rewrites (new architecture paradigm) vs.
Joel's category (unhygienic code cleanup), we can look at Joel's ex-employer
Microsoft:

\- SQL Server: the database diverged from the original Sybase code and the
engine was rewritten. One of the architecture changes was switching from page
locks to row-level locks.

\- C# compiler: completely rewritten from a C++ code base to a C# code base.
One architecture change was the "compiler exposed as a library service"
instead of the closed "black box" that the C++ code assumed.

\- operating system: MS Windows NT as a blank-slate operating system instead
of gradually extending the old 16-bit DOS code or 16-bit Windows 1.0 code. One
motivating architecture change was the new 32-bit protected mode on the Intel
386 and newer chips. Another factor was the switch from "cooperative
multitasking" to "preemptive multitasking".

~~~
gwern
And it's worth noting that Spolsky left MS & wrote that essay well before MS
had its "come to Jesus" moment when it came to security. A fanatical emphasis
on backwards compatibility & avoiding rewrites was not helpful for avoiding
that crisis.

------
evrydayhustling
I've had my share of frustration with the GIL over the years, but I don't
think it fits into the list.

Python's rising years have been the same years in which we discovered that (a)
generalized parallelism is really hard, and (b) many, many applications of
multicore are "embarrassingly parallel" and benefit little from shared
resources. Stasis around the GIL is a use-case-driven decision to make
embarrassing parallelism easy enough (via multiprocessing) and delegate more
involved parallelism to linkable libraries or separate services. It helps the
language focus on what it's good at.
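
The multiprocessing route mentioned above can be sketched quickly: each worker
is a separate process with its own interpreter (and its own GIL), so
independent CPU-bound chunks actually run in parallel. A minimal sketch, with
`cpu_bound` as a made-up stand-in for real share-nothing work:

```python
# Embarrassingly parallel work around the GIL: each Pool worker is a
# separate process, so the CPU-bound chunks run concurrently.
from multiprocessing import Pool

def cpu_bound(n: int) -> int:
    # stand-in for independent, share-nothing computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [10_000, 20_000, 30_000, 40_000])
    print(results)
```

the trade-off is that arguments and results are pickled across process boundaries, which is exactly why this only pays off when the work shares little state.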

~~~
war1025
I think for most people, async io / coroutines gets them most of the benefit
that they'd ever get out of multi-threading.

With async io, you can have a lot going on at once, interleaving nicely while
different tasks wait for io responses.

Beyond that, normally you're just as well off to kick off a second process.

You have to get pretty fancy with algorithms to make multi-threaded
computation a net benefit. Most of the time you don't need that, and if you
do, you're probably not reaching for Python.
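
The interleaving point can be made concrete: with async I/O, several tasks
overlap while each one waits, all on a single thread. A tiny sketch, with the
network waits faked by `asyncio.sleep` and the task names made up:

```python
# Coroutines interleave while each "request" waits on I/O (faked here
# with asyncio.sleep), all on one thread and with no GIL contention.
import asyncio

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)      # stands in for a network round-trip
    return f"{name} done"

async def main() -> list:
    # the three waits overlap, so the total time is roughly the longest
    # single delay, not the sum of all three
    return await asyncio.gather(
        fetch("a", 0.03), fetch("b", 0.02), fetch("c", 0.01)
    )

print(asyncio.run(main()))          # prints ['a done', 'b done', 'c done']
```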

------
niftich
I don't agree with that quote being used to define software rot, and I don't
think the examples illustrate rot either. I'm just fundamentally opposed to
the rotting metaphor when it's applied to projects that receive active upkeep,
even if the project drifts further from its original scope and vision. For
that, we need a name that alludes to scope creep and refactoring, but one that
isn't suggestive of something immobile and abandoned decomposing at the mercy
of its environment.

I've always understood software rot to refer only to cases where the
dependencies, including libraries and platform APIs, are changed so that an
already-delivered copy of the software can no longer function, and the
maintainers have since abandoned it, or can't do a quick fix to get it working
again.

A recent example is the 2017 removal of support for NPAPI plugins in Firefox
[1]. Firefox is in the role of the 'platform API' here, and the change broke
any extension that didn't update. And the replacement API differs in features,
so the effort of updating an extension approaches that of a rewrite.

Another example is the game 'Star Wars Episode I: Racer', the podracing game
from 1999. Newer versions of Windows and DirectX have changed things so that
the game's original executable doesn't run [2].

Even open source isn't a remedy. Random projects one can find on a source
repo, last updated x years ago, are susceptible to this. Sometimes they're
shipped without dependency management, pinned to old versions that themselves
no longer work, or tracking the latest when they should be pinned instead. If
more work is required to get one running than just the instructions provided
by the author (assuming those were sufficient at some point), then it has
suffered software rot.

[1] [https://support.mozilla.org/en-US/kb/npapi-plugins](https://support.mozilla.org/en-US/kb/npapi-plugins)

[2] [https://www.play-old-pc-games.com/2013/12/02/star-wars-episode-i-racer/](https://www.play-old-pc-games.com/2013/12/02/star-wars-episode-i-racer/)

------
est31
Firefox needed so long for a multi-process architecture because Mozilla
focused on other things, like developing mobile operating systems. The main
browser was kept in hibernation mode.

Also, as the poster said, Firefox had a rich add-on API that they wanted to
keep supporting but eventually decided to drop. The new API isn't close to the
versatility of the old API, and sadly some add-ons work worse now.

A better example, I think, would be iOS, parts of which can't adopt Swift and
have to stay implemented in Objective-C because of binary compatibility
concerns.

------
amelius
Imho, terms like "software rot" and "bit rot" are simply disingenuous attempts
to blame the software for a malfunction that is caused by changes in the
environment of the software.

~~~
ryandrake
Or caused by programmer neglect. If your software compiles today and then
fails to compile or run in the future, you could be doing something wrong.
Unless your code got hit by cosmic rays and a bunch of bits flipped, nothing
rotted.

I have code I wrote 20 years ago that still compiles and runs. I could check
again but I don’t think any of it rotted.

~~~
TeMPOraL
The languages and systems change. Some time ago, I had a job freshening up a
large, ~30-year-old Common Lisp project, and it worked mostly fine, with
minimal changes attributable partly to the evolution of Linux, and partly to
some pieces of code being older than the ANSI Common Lisp standard. I don't
believe it would be as easy with Python or JS.
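
The Python point is easy to demonstrate in miniature: the same
"definition-of-the-language changed" failure mode that hit V7 sed's C applies
to Python 2 source on a Python 3 interpreter, where old code stops even
parsing. A small illustration, runnable under Python 3:

```python
# Python 2's print statement no longer parses under Python 3: the old
# source hasn't changed, but the definition of "Python" has.

def still_parses(source: str) -> bool:
    """Return True if this interpreter can at least compile the source."""
    try:
        compile(source, "<old-code>", "exec")
        return True
    except SyntaxError:
        return False

print(still_parses('print "hello"\n'))    # Python 2 spelling: False
print(still_parses('print("hello")\n'))   # Python 3 spelling: True
```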

That said, I don't buy "software rot" as a proper name. Software does not
"rot". The concept of rot implies an _internal_ change that makes something
fail whether or not the environment around it has changed. This does not
happen to digitized data. Programs only stop keeping up with the Red Queen's
race of computing environments.

------
Scooty
> Adapting mature software to new circumstances tends to take more time and
> effort than writing new software from scratch. Software people don’t like to
> admit this, but the evidence is clear.

Software people love rewriting things. In most cases where I have to maintain
an old project I would much rather rewrite it, but it's usually not in the
budget.

~~~
badfrog
In my experience the rewrite always takes much longer than expected. Sure, you
might be able to redo the core pieces to handle the new problem in X weeks,
but then it's going to take another 4*X weeks to handle all the edge cases and
make sure you're not breaking anything your clients/users have grown to depend
on.

From another HN thread today:
[https://news.ycombinator.com/item?id=19245485](https://news.ycombinator.com/item?id=19245485)

> The first 90% of the code takes the first 90% of the time. The remaining 10%
> takes the other 90% of the time.

And

> With a sufficient number of users of an API, it does not matter what you
> promise in the contract: all observable behaviors of your system will be
> depended on by somebody

~~~
Scooty
For sure. That's what I meant by "it's not in the budget". I would probably
choose to rewrite most projects I work on, but I know it's completely
unreasonable for most non-trivial projects. I just meant I naturally gravitate
towards wanting to rewrite because clean slates are satisfying.

------
abainbridge
The example of Chrome being written in less time than Firefox took to add
multi-process support is interesting. Yes, changing a deeply held assumption
in a mature code base is a nightmare. But Firefox continued to make releases
while gradually making multi-process support happen. Starting again would have
been suicide for them.

~~~
jacobush
And they already did start over once, and it almost was suicide. Instead of
gradually fixing Netscape Navigator 4.7, we waited years and years for Firefox
1.0.

In the end it went pretty well, and it's impossible to know what would have
happened if they had decided to continue on the Navigator track, but
Netscape's market share had dropped to basically 0% by the time Firefox was
released.

~~~
kalleboo
To be specific, it was 5 years between Netscape Communicator 4.7 (1997) and
Mozilla 1.0 (2002).

------
jancsika
> Multi-process Firefox

So the claim is that it would have been _less expensive_ for Firefox to start
over from scratch and make a new multi-process browser _while at the same
time_ maintaining a single-process browser that stayed current with web
standards in the interim?

------
Cthulhu_
So in all of those examples: why did the people behind those projects opt for
a gradual change instead of a "2.0" from-scratch project? Too afraid of losing
existing customers or plugin developers? I mean, for Firefox they could have
replaced the whole base package via the updater.

Microsoft ended up doing it with Edge (I believe), and it doesn't seem to have
hurt them much.

~~~
ekianjo
> Microsoft ended up doing it with Edge (I believe), doesn't seem to have hurt
> them much.

Because they had already lost a large chunk of their share.

~~~
hobs
And further, a lot of Microsoft's share of the browser market was de facto
share (it's the default in the OS), not mindshare. When you choose the de
facto option, the enterprises are going to line up behind that, and so Edge
always has a set of people who follow Microsoft's lead regardless of quality.

------
nix0n
Full rewrites aren't great either; Netscape[0] is a classic example.

[0] [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

------
commandlinefan
Joel Spolsky says exactly the opposite:
[https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

FWIW, I think Spolsky is wrong and this Greer fellow is right, but I don't
have any particular evidence either way (and definitely not enough to convince
the sort of person who rises to a position of authority in a software
organization), other than that I've seen attempts to incrementally add
features to massive software go horribly wrong more often than I've seen
throw-it-out-and-start-over go horribly wrong.

------
XerTheSquirrel
Software rot definitely hits projects of all sizes. You do get better at
programming over time, though, so there is always room for improvement.

------
bregma
It's just software. You can just change it.

------
jeletonskelly
I thought we called it "bit rot"

~~~
overshard
"Software rot, also known as code rot, bit rot, software erosion, software
decay or software entropy..."

[https://en.wikipedia.org/wiki/Software_rot](https://en.wikipedia.org/wiki/Software_rot)

