
Analysis of longevity of code across many popular projects - alexlikeits1999
https://erikbern.com/2016/12/05/the-half-life-of-code.html
======
cyphar
It looks like the exponential model isn't a good fit at all -- in all cases it
undershoots the decay at the start of the graph and overshoots at the tail
end. So while it might "look close", there is some systematic effect your
model doesn't account for. In particular, I don't agree that all code in a
codebase
has a constant risk of being replaced -- most projects have different
components that are developed at different rates. Some components are legacy
code that is likely to never change, while other parts are under rapid
development. In fact, I'd argue that's why the tail is so long -- legacy code
is called "legacy" for a reason. And the tip of the graph dives down so
quickly because code being rapidly developed has a higher chance of being
replaced.
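
A quick way to see this point: a mixture of decay rates cannot be matched by
any single exponential. In this toy sketch (the 50/50 split and the
1-year/10-year half-lives are invented for illustration, not numbers from the
article), the best single-rate fit is too high in the early years and too low
in the legacy tail:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy codebase: half the lines are "rapid development" code (half-life
# 1 year), half are "legacy" code (half-life 10 years). These numbers
# are made up purely to illustrate the shape of the curve.
t = np.linspace(0, 15, 200)
mixture = 0.5 * np.exp(-np.log(2) * t / 1.0) + 0.5 * np.exp(-np.log(2) * t / 10.0)

# Best-fitting single-exponential model for the same data.
(k,), _ = curve_fit(lambda t, k: np.exp(-k * t), t, mixture, p0=[0.3])
single = np.exp(-k * t)

# The single-rate fit is systematically off: it overshoots survival in
# the early years and undershoots the long legacy tail.
i2 = int(np.argmin(np.abs(t - 2.0)))
print(single[i2] - mixture[i2])   # positive: overshoots early survival
print(single[-1] - mixture[-1])   # negative: undershoots the tail
```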

~~~
BlackFly
Agreed; you can reject exponential decay a priori: a code base has some
minimal set of functionality that it specifies and there is some minimal set
of lines required to provide that functionality. If you believe this, then the
"decay" curves must approach an asymptote. The asymptote only gets reduced
by breaking backwards compatibility. Such an action would include projects
like Angular that go ahead and throw away a ton of core functionality in
moving from 1.x to 2.x.

The decay isn't actually decay at all, but represents the complement of lines
that define peripheral functionality. Lines defining peripheral functionality
typically require modification (refactoring) as additional functionality is
added. The asymptote, which all the curves illustrate but the fit cannot
capture, represents the proportion of irreducible core functionality.

Simply adding a constant to the fit might fix all the problems.
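
That constant is easy to try. A minimal sketch on synthetic data (the 30%
core fraction, 2-year half-life, and noise level below are all invented, not
taken from the article), fitting survival as an exponential decaying toward
an asymptote c:

```python
import numpy as np
from scipy.optimize import curve_fit

# Shifted-exponential survival: a fraction c of "core" lines never
# decays, and the remaining (1 - c) decays exponentially at rate k.
def survival(t, c, k):
    return c + (1.0 - c) * np.exp(-k * t)

# Synthetic data: 30% irreducible core, 2-year half-life for the rest
# (invented numbers), plus a little measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)
y = survival(t, 0.3, np.log(2) / 2.0) + rng.normal(0.0, 0.01, t.size)

# Fit both parameters; bounds keep c in [0, 1] and k positive.
(c_hat, k_hat), _ = curve_fit(survival, t, y, p0=[0.5, 0.5],
                              bounds=([0.0, 1e-6], [1.0, 5.0]))
print(f"core fraction ~ {c_hat:.2f}, peripheral half-life ~ {np.log(2)/k_hat:.1f} yr")
```

On a real repository, a `c_hat` near zero would mean the plain exponential
was adequate after all; a clearly positive `c_hat` would support the
irreducible-core reading.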

~~~
Nomentatus
This is a close fit to Fechner's Law (not Weber's) relating perceived
intensity to stimulus in animal vision. Also, thanks so much, author.

------
jsjohnst
I'd posit that the reason is a nuance of #2: more thought was put into the
design of older code, into how it should work, before making it work. Now we write
code so fast that we have to scrap it all and do it again a second time to fix
the mistakes of the first time [0]. I'm of the mind that upfront planning
would've likely taken less time, but that's simply my opinion and I don't have
anything to back it up besides anecdotal experience. The current practice of
"move fast and break things" very well could be a better approach.

[0] I'm only adding this footnote because the article picks on Angular
(fairly or unfairly); the point this footnote attaches to is potentially
relevant to them.

~~~
techiferous
> The current practice of "move fast and break things" very well could be a
> better approach.

It's all about context. What's the cost of "breaking things"? Does your not-
yet-monetized social network startup go down for a couple of hours? Or does
someone die?

Also, I have witnessed first-hand the slowdown in productivity that a "move
fast and break things" approach has when what you are breaking is your team's
ability to work quickly and confidently with the code base.

------
yoavfr
Results for WordPress:
[https://blog.yoavfarhi.com/2016/12/06/half-life-wordpress-code/](https://blog.yoavfarhi.com/2016/12/06/half-life-wordpress-code/)

~~~
jwdunne
It's interesting that better-engineered projects seem to retain more of their
initial lines of code. Almost no first-version code exists in WordPress, and
that project's codebase truly is a hodgepodge, whereas Git is considered a
better piece of engineering. In fact, Linus has been asked specifically about
the good design of his programs and how he achieves it.

------
aamederen
Just amazing.

I wonder if there are any research articles discussing the correlation between
code-change and other metrics like product quality, change frequency of team
members, estimation success, etc.

~~~
siscia
Isn't this a research article? What is the difference between this article
and one printed as a PDF, with more complex words and uglier images?

My question was sparked by your request for a RESEARCH article rather than a
"normal" blog post.

I would like to know why you want a "research" article, which I assumed means
an academic article, instead of a blog post.

~~~
aamederen
What I was interested in was not the "research article" part but the
comparison.

This article comes up with a tool that measures the half-life of code and
demonstrates it on some projects. What I requested as an addition was a
discussion of how this metric correlates with other metrics.

That said, a "paper" does make a difference compared to a "blog post" in some
cases. Sometimes, in order to convince your directors and project managers
about your proposed changes to the development process, you need to support
your idea with more serious work.

For example, in my previous company, I could have used such academic research
to ask for more time budget for "code cleanup" periods, where the team
focuses just on rewriting parts of the legacy code instead of bug fixes and
new features.

I am surprised that this small request offended someone.

~~~
siscia
No no, I was not offended; sorry if I gave that impression.

What left me wondering is the assumption that an academic paper is more
serious than a blog post; I frequent academia and that is just not true,
especially in our field.

Then, of course, upper management may not share this point of view.

------
georgeecollins
Look at how consistently the lines of code grow for these projects. I doubt
that is surprising but think about the implications. Linux is a pretty old
open source project and still on balance the lines of code just grow.

How many lines of code will it be in fifty years? Will we have to come up with
new systems to manage the fact that individuals only really understand smaller
and smaller pieces of it? Will it reach a mass where, like a black hole, it
collapses from some incomprehensible failure?

There have never been things like this that just grow in complexity forever.

~~~
rmchugh
I think one of the main reasons that something like Linux keeps on growing is
that it needs to support more drivers as they come onto the market. So unless
you want to remove support for older hardware, you're more or less going to
grow endlessly.

~~~
georgeecollins
Right, but nothing can grow endlessly. Yet these projects do grow
relentlessly, almost never getting any smaller, for as long as we have kept
track. It can't go on forever, so how can it end?

~~~
lepton
The industry could converge on standards which allow certain classes of
"repeated but different" code (drivers?) to collapse. By collapse, I mean they
could share a generic version of code, thus reducing the overall line count.

Or a new implementation (or OS) could come along, with compelling advances and
no baggage. The old code doesn't so much collapse as become obsolete.

------
aristus
My rule of thumb is that if a line of code survives its first five years it'll
live forever. The age of a piece of code is the single greatest predictor of
its future.

------
skeltoac
Code from year one may still be the same code, but when it gets moved or
reformatted, its cohort is updated. If a bad change is reverted, will the
cohort for those lines of code also be reverted? The effect of understated
longevity is not so obvious when it is gradual and organic. Sometimes an event
in a project's history makes the effect very obvious.

[https://blog.yoavfarhi.com/2016/12/06/half-life-wordpress-code/](https://blog.yoavfarhi.com/2016/12/06/half-life-wordpress-code/)

------
inputcoffee
I really like this. Suppose we were to accept the suggestion that perhaps
linearly decaying code is better built, and more robust, than exponentially
decaying code.

Would this give newbies a great new tool to answer their question of which
framework or language to learn?

Rails or Django? Django lasts longer. Angular or React? Is VueJS just a trend?

You could answer all these questions with this kind of analysis.

If someone wants to make a genuine contribution, a blog post contrasting the
various decays of Javascript frameworks would be a hit.

~~~
ElonsMosque
Ditto. That would be useful to know as a newbie; I'm hoping someone can give
an answer or at least point in the right direction.

------
puredanger
Clojure takes a remarkably stable and additive approach to maintenance and
growth. Graph for it is here:
[http://imgur.com/a/rH8DC](http://imgur.com/a/rH8DC)

------
Insanity
I'm not a dad and I actually appreciated the "Git of Theseus" pun. The ship is
an interesting thought experiment on identity; maybe you can modernize it to
introduce philosophy to computer science students ^^

But more on-topic, nice article, well done!

------
koja86
Good job and great tool. Thanks.

This might actually be an interesting metric in regards to project
architecture and/or project management.

Totally agree that the exponential model's explanatory power is great.

------
shivpat
Great stuff here.

Definitely some merit to reason #3 - people are more willing to work on
something that they can easily build on top of.

------
msluyter
A lot of projects I've worked on have utility libraries consisting of mostly
stateless, pure functions -- I have a theory that these constitute some of the
longest lived code. That, and database models, which tend to be easier to
expand than to contract. I'd be curious to see some analysis along these
lines.

~~~
jtigger
Nice hypothesis! Reminds me of the shearing layers from Foote & Yoder's "Big
Ball of Mud" —
[http://www.laputan.org/mud/mud.html#ShearingLayers](http://www.laputan.org/mud/mud.html#ShearingLayers)

------
alenmilk
Interesting, but it is not surprising that git and redis are more stable than
node and angular. Git and redis solve well-defined problems that won't change
that much. Angular is a framework and node is a platform; they should change
more. But then again... JavaScript fatigue is a thing, from what I've heard.

------
azag0
The simplest explanation is that the exponential model is not a good one (it
does not correspond to the underlying dynamics), and so the half-life value is
not an inherent (time-independent) property of the codebase, but depends on
its age. It seems to me that in most projects, the code evolves quickly in the
beginning and then stabilizes on a slower linear decay. This would explain the
observed dependence of the fitted half-life on age. It might be more
meaningful to fit a linear dependence at the beginning of the project and in
the asymptotic regime and also look at their ratio. This should be more
stable, would tell you how well the project was designed from the beginning,
and would also indicate whether the project has already stabilized or not.
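
A sketch of that two-regime fit, on invented survival fractions (the numbers
below are hypothetical, chosen only to show fast early churn settling into a
slow linear decay):

```python
import numpy as np

# Hypothetical fraction of original lines surviving, sampled yearly:
# rapid early churn, then a slow, roughly linear decline.
years = np.arange(11)
frac = np.array([1.00, 0.80, 0.66, 0.58, 0.54, 0.52,
                 0.50, 0.485, 0.47, 0.455, 0.44])

# Fit straight lines to the early regime and to the asymptotic tail.
early_slope, _ = np.polyfit(years[:4], frac[:4], 1)
late_slope, _ = np.polyfit(years[6:], frac[6:], 1)

# Ratio of slopes: large means heavy initial churn relative to the
# stabilized decay rate; near 1 would mean stable from day one.
ratio = early_slope / late_slope
print(f"early {early_slope:.3f}/yr, late {late_slope:.3f}/yr, ratio {ratio:.1f}")
```

Unlike a fitted half-life, this ratio does not drift with project age once
the tail regime is reached.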

~~~
jsjohnst
He addressed this partly in the article. Even looking just at the early days
of the more mature projects, their code churned much less than that of
projects which are less mature now.

------
qume
Would love to see this restricted to which lines are actually executed in a
typical use. I.e. ignore dead code which is still in the repo.

Probably wouldn't be too hard for the interpreted languages.

------
jtigger
I wish we could capture the "inventiveness" of a particular project — how well
the problem was understood when the project initiated.

There had been _many_ *nixes by 2006, so the territory had significant prior
art and, with it, deep collective understanding of the problems being solved.
Angular sprouted alongside a number of other SPA frameworks in an ecosystem
that was experiencing a "growth-spurt" (using that term loosely) — lots of
variables.

------
mmerickel
This is awesome. I ran it on pyramid for fun.
[http://imgur.com/a/KZ9KR](http://imgur.com/a/KZ9KR)

~~~
AstralStorm
A catastrophe-based model seems more appropriate. You see big refactorings
taking place. Only when summed in large numbers do those Poisson-like events
turn into exponentials.

~~~
mmerickel
2010 was a major transition for pyramid as it was re-branded from repoze.bfg
and merged with the pylons project.

------
jtigger
Another feature of the model could be stability of product vision. Is there a
correlation between the half-life of committer membership and that of the
code? How has the problem space of the product changed over time?

Perhaps we could talk about "intrinsic churn" vs. "accidental churn". The
former results from the codebase keeping up with the "drift" in the problem
space; the latter comes from having to learn.

------
malkia
There are still AAA games shipped with code written by id Software 20 or more
years ago. Tools too :)

------
esmi
Super interesting.

But it doesn't seem fair to directly compare the Linux of 2005 (a
14-year-old project on the stable 2.6 kernel) to projects whose initial
releases were basically foundational, and not mention it.

------
twelvechairs
Great work. Would be interesting to compare by language - see if particular
languages need more or less refactoring.

------
throwawygybj
So it's true... Angular is the code abyss. My colleagues said it was a legend,
but I have seen it with mine own eyes.

Thank you

------
beefman
Would be interesting to see the results as a function of repository size.

------
xiphias
Maybe Linux is not rewritten because it doesn't have tests

~~~
AstralStorm
It is being rewritten all the time, but new per-arch and per-driver code
dwarfs the common code.

~~~
xiphias
Still, without high, reliable test coverage to make sure the drivers don't
break when the core is changed, it's not worth refactoring just to make the
code cleaner. Even so, it's amazing how much extra functionality could be
added.

