
The Ball-of-Mud Transition, or how software gets complex - mtwestra
http://akvo.org/blog/the-ball-of-mud-transition/
======
ColinWright
Picking up on adrianN's comment[0], when you have a collection of nodes and
start connecting them at random, initially they are all disconnected
(obviously), and any two nodes you pick are unlikely to share an edge. Thus,
in the early stages, your graph is just isolated nodes and isolated edges.

After a while, by chance, you happen to join an existing edge to a node. That
component now has three vertices, and is 50% more likely to be chosen at
random than the isolated edges.

There comes a point where you join two non-trivial components, and before long
you reach a tipping point. Suddenly nearly every node you choose already
belongs to a component, and that component starts vacuuming up everything.

Thus we have the emergence of "The Giant Component". This transition is sharp
and well-studied. Whether you think of it as "obvious" depends on how much you
study these things. I seem to recall that there is a major result that says
that all first-order predicates have these threshold emergence properties, but
it's been too long (30 years) since I studied this, and I could be wrong. I
may be able to find some references if people really want me to.

[0]
[https://news.ycombinator.com/item?id=6546978](https://news.ycombinator.com/item?id=6546978)
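ColinWright's edge-at-a-time description is easy to check with a quick
simulation (my own illustration using union-find, not code from the thread):

```python
import random

def largest_component_growth(n, num_edges, seed=0):
    """Add random edges one at a time to n nodes and record the size of
    the largest connected component after each edge (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    biggest = 1
    history = []
    for _ in range(num_edges):
        a, b = rng.randrange(n), rng.randrange(n)
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:  # union by size
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            biggest = max(biggest, size[ra])
        history.append(biggest)
    return history

# The largest component stays tiny until roughly n/2 edges (average
# degree 1), then abruptly starts vacuuming up everything:
h = largest_component_growth(10000, 10000)
print(h[1999], h[9999])
```

Plotting `history` against the edge count makes the sharpness of the
transition visually obvious.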

~~~
gavinpc
This is very close to the "percolation problem." [0]

According to Robert Sedgewick, this particular problem has _no known
mathematical solution_, and the threshold (for a given N) is only obtained
through, e.g., Monte Carlo simulations where you randomly open sites until
the grid percolates (akin to the adding of threads). The whole thing is a good
application of the union-find algorithm.

The threshold for N > 2 is about 60%. Not sure how that applies to software
complexity, but it's interesting to think about.

Thanks, Coursera!

[0]
[http://en.wikipedia.org/wiki/Percolation_threshold](http://en.wikipedia.org/wiki/Percolation_threshold)
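A minimal Monte Carlo sketch of the site-percolation experiment Sedgewick
describes (the grid size, trial count, and binary search are my own choices;
~0.593 is the known square-lattice site-percolation threshold):

```python
import random

def percolates(n, open_sites):
    """Union-find check: does an n-by-n grid with these open sites
    connect the top row to the bottom row?"""
    TOP, BOTTOM = n * n, n * n + 1  # virtual source and sink
    parent = list(range(n * n + 2))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    open_set = set(open_sites)
    for r, c in open_set:
        i = r * n + c
        if r == 0:
            union(i, TOP)
        if r == n - 1:
            union(i, BOTTOM)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) in open_set:
                union(i, (r + dr) * n + (c + dc))
    return find(TOP) == find(BOTTOM)

def estimate_threshold(n, trials=30, seed=0):
    """Average fraction of sites that must be opened (in random order)
    before the grid percolates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sites = [(r, c) for r in range(n) for c in range(n)]
        rng.shuffle(sites)
        lo, hi = 0, n * n  # binary search: percolation is monotone in opened sites
        while lo < hi:
            mid = (lo + hi) // 2
            if percolates(n, sites[:mid]):
                hi = mid
            else:
                lo = mid + 1
        total += lo / (n * n)
    return total / trials

print(estimate_threshold(20))  # drifts toward ~0.593 as the grid grows
```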

~~~
joe_the_user
You mean one particular percolation problem, I assume.

Just to clarify because one might interpret your statement to mean that
percolation problems in general don't have exact formulas for their solution
but the "exact formula" section in your link would say otherwise.

------
curveship
This is a neat thought experiment, but it seems to me it's looking at a
different part of the curve than software complexity. The "phase transition"
the author discusses occurs between 0 and 1 threads/button. In a software
project, you wouldn't bring in a new component unless it served some use to
the existing pieces, so software projects _start at_ 1 "thread/button," and
unless you've got orphaned code, the "cluster size" is always 100%.

Software complexity strikes me as a graph-coverage problem: given a graph of N
vertices and M paths (i.e. software components and dependencies), how many
vertices and paths do we need to traverse (i.e. understand) in order to make a
change to component X? How does that parameter scale with different forms of
graph -- linear, n-ary tree, DAG, cyclic (yikes!)?

Or is there a homomorphism between the two problems?
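The coverage question above amounts to a reachability computation. A minimal
sketch (the toy graphs and names are invented for illustration): run it on the
dependency graph to count what you must understand, or on the reversed graph
to count what a change to X can break.

```python
from collections import deque

def closure(graph, start):
    """BFS transitive closure: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Toy dependency graphs: each key depends on the listed components.
linear = {"a": ["b"], "b": ["c"], "c": []}     # a -> b -> c
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}  # a -> b -> c -> a (yikes!)

print(sorted(closure(linear, "c")))  # ['c']: the chain's leaf is cheap to change
print(sorted(closure(cyclic, "c")))  # ['a', 'b', 'c']: the cycle drags everything in
```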

------
pbw
What he doesn't emphasize is the directionality of this phase transition.
It's really easy to add one more string, but, staring at the resulting
button-thread agglomeration, it's very difficult to know which string to cut.
This is why there is a never-ending stream of new projects started to solve
the same problems over and over. Their authors covet the opportunity to make
progress during the honeymoon period, before the Sisyphean battle against
software entropy sets in.

~~~
mtwestra
I agree completely - once you're on the wrong side of the transition, it is
hard to go back.

~~~
beat
More importantly, it's not worth going back. At a certain point, it becomes
actually less work to start over than to try to cut that Gordian Knot.

The problem then is when business logic is encapsulated in the mud.

------
adrianN
What he studied experimentally with the buttons and threads is known in graph
theory as the "giant component" threshold, and that threshold is known exactly.

[https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_...](https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model#Properties_of_G.28n.2C_p.29)

~~~
ColinWright
That's true, but I'm unaware of much work being done on the error bounds for
small N (for some concept of small). I started on this during my PhD, but
rapidly moved onto other problems that seemed more tractable, and never really
returned to it. The results of Bollobás, Erdős, Rényi, and others, are mostly
asymptotic. They do, however, seem remarkably good even on graphs of small
size (under 10^6 vertices).

~~~
JulianMorrison
Those names make me wonder if mathematicians are arranged in series - Erdos,
Erdós, Erdős, ...

------
6ren
_The following is a tangent._

When designing something, there are often many choices. If they interact, it
quickly becomes intractable. It's tempting to try to keep them in mind, and
work out the answer, but with exponentially increasing complexity, your limits
are quickly reached (no matter how smart you are). Enhancing your
intelligence, e.g. by offloading information onto paper, also has limits.

One solution is the scientific experiment: hold all variables constant, and
see the effect of changing just one design choice. Holding them constant means
you have made a design choice for that aspect that is almost certainly not
optimal.

Ideally, you can do what is suggested in the article - create modules that are
largely independent, and experiment within one module in isolation. Because
there are fewer variables per module, they are less complex, and it takes
fewer experiments to understand how each works.

The deep problem with this is if you don't _know_ what those modules would be
- i.e. you don't know which aspects are independent because that's the very
thing you're trying to find out! Of course, you can probably have a guess, and
certainly use your initial experiments to check those guesses, and maybe with
the information gained, improve your guesses.

 _EDIT_ a specification is a module, in that it separates out some design
choices.

------
LukeShu
This isn't about what he is saying, but how he is saying it:

This bit bothered me: " _Wikipedia does a great job of explaining it:_ "
followed by a quote from an actual source that happens to be block-quoted on
the Wikipedia page. If the part you quote was directly said by Brian Foote and
Joseph Yoder, attribute it to them.

~~~
mtwestra
You're right, it had escaped my attention. I have quoted them directly in the
blog now. Thanks!

------
smoyer
The attribution (through Wikipedia) describes the term as being coined in
1997, but the original authors of that paper expanded their ideas in 1999. I
reread this paper every couple of years just to make sure I'm not "guilty".

[http://laputan.org/mud/](http://laputan.org/mud/)

------
SideburnsOfDoom
There is a long but good read that goes into this topic here:
[http://blogs.msdn.com/b/karchworld_identity/archive/2011/04/...](http://blogs.msdn.com/b/karchworld_identity/archive/2011/04/01/lehman-s-laws-of-software-evolution-and-the-staged-model.aspx)

It's about the full lifecycle of a typical software product. Particularly
"Section 2.3.1: Loss of Architectural Integrity"

> code decay can be thought of as any implementation of functionality that
> makes the overall comprehension of the system more difficult to attain and
> increases the efforts required to implement future changes.

... and makes further decay more likely.

------
AndrewDucker
This sounds like "Broken Windows Syndrome".

Once a system gets to the point where more than some threshold (somewhere
between 30% and 50%) of it is connected together, developers stop caring about
keeping it modular, because it's clear to them that their efforts are a waste
of energy.

From that point onwards, the project becomes a cesspool of hackery.

------
WhaleBiologist
From my experience, code is either 'done properly' or 'shoehorned in'. But
really, 'done properly' means 'you have time to organize everything
satisfactorily at a high level', typically only when you are writing a new
module from scratch.

Everything else is trying to shoehorn something new into an existing
framework, and you don't have enough time to get it 'done properly' because
your product manager has a heart attack when you tell them how long it'll take
to do a proper refactoring job. This is where you will quite happily cut
corners, and the chance that you'll inadvertently break existing functionality
in the process increases exponentially. This is the mechanism that, in my
experience, causes balls-of-mud.

And of course, no matter how good you are at planning every required use-case
of your code over its lifetime at the 'done properly' stage, you can never
think of it all, so at some point or another you are forced to shoehorn stuff
in everywhere anyway.

------
nraynaud
I do my best to keep the high-level stuff in check, like asking developers to
try to delete a library if they add one, to try to "buy back" their added
lines of code in their changesets (i.e. try to refactor to delete as much as
they added), to take the cardboard boxes down to the trash, etc.

I think I have a knack for seeing and caring about that. I see complexity
arising (even if I can't always prevent it - I mean, we have to ship too).

------
greenyoda
I'm not sure that _randomly_ connecting nodes is a good model for how software
complexity arises, since connections between software components are _not_
made at random. When was the last time you threw some dice to decide whether
some piece of your user interface would be connected to business logic or
directly to your database server? There's usually a (non-random) reason for
why we add a piece of code, and the reason why complexity gets out of hand is
that we don't, for various reasons, refactor our architectures to eliminate
accumulated complexity before it's too late (and then nobody can understand it
anymore, and you end up with the ball of mud).

It's also interesting to note that Foote and Yoder's "Big Ball of Mud"
paper[1] portrayed the mechanisms of mud-ball formation as a set of anti-
patterns. It's an interesting read. They give some pretty thoughtful
explanations of how software transmogrifies into a mud-ball, none of which
involve random processes.

[1] [http://laputan.org/mud](http://laputan.org/mud) (1999)

------
nemoniac
It's worth noting that Lisp programmers prize that language's quality of
being a "big ball of mud", though for other reasons.

[https://en.wikipedia.org/wiki/Big_ball_of_mud#In_programming...](https://en.wikipedia.org/wiki/Big_ball_of_mud#In_programming_languages)

------
_stuart
The author is measuring largest cluster-size vs threads/button.

In any software, everything is going to be connected; otherwise there's
unreachable code. So the largest cluster is always 100%, and I don't see why
his argument about the sudden phase transition is relevant to software.

~~~
lesterbuck
I think the answer to your question is in the missing directions on the
threads. The goal of modular software is to have the arrowheads point in the
right directions.

Uncle Bob was talking about this in his keynote about how Rails is not your
application.

[http://confreaks.com/videos/759-rubymidwest2011-keynote-arch...](http://confreaks.com/videos/759-rubymidwest2011-keynote-architecture-the-lost-years)

[http://blog.8thlight.com/uncle-bob/2012/08/13/the-clean-arch...](http://blog.8thlight.com/uncle-bob/2012/08/13/the-clean-architecture.html)

[https://vimeo.com/21145583](https://vimeo.com/21145583)

~~~
aidos
I was wondering about this too. I don't quite get how this random assignment
of connections between components correlates to software complexity. Software
isn't randomly connected (I know, sometimes you see stuff which flies in the
face of this statement). And the directions of the connections are very
important. You can create something that has a single component at the top
connected to 100 other components: threads/buttons = 100/101 and the biggest
cluster = 100%. I'd wager it would probably be a simple program to reason
about. I guess I'm confused about the leap from cluster size to how hard the
system is to reason about.
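The star example above is easy to check numerically (a minimal sketch with
union-find; the 101-node star is the one just described):

```python
def largest_cluster_fraction(n, edges):
    """Fraction of nodes in the largest undirected cluster (union-find)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            size[rb] += size[ra]
    return max(size[find(i)] for i in range(n)) / n

# Hub 0 connected to 100 leaves: 100 threads, 101 buttons, one 100% cluster...
star = [(0, leaf) for leaf in range(1, 101)]
print(largest_cluster_fraction(101, star))  # 1.0

# ...yet, taking direction into account, no leaf depends on any other leaf,
# so changing one leaf means re-reading only that leaf and the hub.
```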

------
mathattack
So the story is software gets ugly when it gets less modular? This is a
truism, no?

~~~
PaulHoule
In real life I've seen many projects go wrong because people modularized them
in the wrong way -- often connected with a naïve faith in "encapsulation".
(Complex bugs, performance problems, and crackers don't respect
encapsulation.)

For instance, SOA has had a new lease on life lately, for good reasons. I
picked up a system that had four layers involved with doing a request; each of
these layers had different serialization/deserialization logic (two
submodules) and at least one submodule that would actually do the work.

Debugging a problem in the system often involved a wild goose chase across 12
submodules and often changing something simple (like adding a new data field)
would require all 12 modules to be changed.

Even if you have a good batting average and manage to make these changes
right 90% of the time, the odds of getting all 12 modules right are only
about 0.9^12 ≈ 28%, so it's close to certain that making a change to that
system would create a new bug.

The underlying social problem isn't that "people want to do things quickly",
it's that people don't see simplicity as a virtue and don't see complexity as
a problem. If they valued speed, they'd pursue simplicity because you can make
changes much more quickly in a simple system.

~~~
Shish2k
> you can make changes much more quickly in a simple system.

IME: doing N features hackishly takes O(n^2) time; doing them properly takes
O(2n). The problem is when management sees a project as a series of several
n=1 tasks, instead of one n=n project.
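A back-of-envelope model of that claim (the cost model here is my own
assumption, purely illustrative):

```python
def hackish_cost(n):
    # the i-th feature must be threaded through the i-1 features
    # already tangled with it, so its marginal cost grows with i
    return sum(i for i in range(1, n + 1))  # n(n+1)/2, i.e. O(n^2)

def proper_cost(n, k=2):
    # each feature costs a flat k units: implement it, plus keep the seams clean
    return k * n  # O(2n)

for n in (1, 5, 20):
    print(n, hackish_cost(n), proper_cost(n))
# viewed one n=1 task at a time, hackish looks cheaper (1 vs 2);
# viewed as one n=n project, the quadratic term dominates
```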

