Big Ball of Mud (1999) (laputan.org)
162 points by thesuperbigfrog 14 days ago | 133 comments

Sometimes a big ball of mud is exactly what’s needed.

Undertale has a single giant switch statement thousands of cases long for every line of dialog in the game. There are other stories of how infamously bad the code is. And none of that mattered one bit.

Banks are another example. The software is horrible. Everyone knows it. Yet all that’s required is minimally working software that fails rarely enough that the bank can maintain their lock in.

Believe it or not, being hired to work on a big ball of mud can be pretty chill. The key is for you to be in a stage in your life where you don’t really care about your work beyond meeting professional responsibilities. Big ball of mud codebases are the best for that, because everyone’s there for the same reasons, and the assignments are always easy drudgery. Neither Scottrade nor Thomson Reuters cared too much what I was up to as long as I was there at 9am.

It’s soul crushing for people who want to create nice things, or seek meaning in their work, which is partly why I left. But when you just need the money, there’s no substitute.

A nice aspect of big balls of mud is you can plow over small sections to make gardens. Once you accept that completely dumping the ball of mud is impossible, you can just "do a good turn daily". Clean up a little code here, there, everywhere. Most people working on balls of mud don't care that you do this (provided you aren't completely rewriting 90% of the system).

You'll still have to regularly get dirty, but the petunias you grew over the years form nice memories.

I had this field of mudballs where I kept getting bug reports related to this particularly muddy ball, and I kept slapping on more mud (always in a rush) and it never quite fixed it. Eventually I had read that code so much I almost knew it by heart, I had spent so much time there. And one night the new design appeared in my head. I suspected it would take only a few hours to implement. And, sure enough, five days and a few evenings later I had turned it into a beautiful garden. It passed all tests, it hummed like a well-oiled bee, it was awesome to behold.

And then, because it generated no bug reports, I never looked at it again; and I spent the rest of my days there slapping dirt and water on the other mudballs.

that's an awesome analogy, built very similarly to the system it describes :)

The problem is that all of this time "plowing fields" and "doing a good turn daily" could have been avoided, and when a big ball of mud gets really hairy this can easily turn into the bulk of your work day and dominate projects. It can cause a company to deliver slowly and miss opportunities.

> could have been avoided

Hard to know if that's true or not. Some code goes on for literally decades before being touched by another person. Oftentimes, the parts of the code that are high touch end up justifying more gardening to make future touches less painful/faster.

Knowing when code will be cumbersome is really difficult when writing fresh.

> Knowing when code will be cumbersome is really difficult when writing fresh.

I think a large part of what goes wrong is that code starts out too simple. When you look back at the history of the code, the original check-ins almost always look good. Good, but very “direct”. Here was the problem, and here we have solved it as bluntly as possible.

As the changes come, they often break concepts that were introduced in those first few check-ins. It’s understandable, the person coming after is just trying to add “that one thing” with as small a change as possible. These accumulate, and soon any semblance of a logical “model” has been lost. A bit of “build an example for others to follow” at the beginning, might have saved things down the track.

However, do too much “over-engineering”, and no one is checking in your code, and you might be wasting tons of effort.

Hence, experience is required to know the basic pitfalls that develop, and where you need to put in the extra for that initial check-in. Many code-bases, of course, are started by people without that experience, and balls of mud inevitably develop.

You can refactor them! You carve out chunks, partition them off, and keep the old and new running in parallel until the new is as stable as the old. Some banks have actually done this, though not all of them. The reality is that if you don’t, new features simply cost too much to be worth developing. You can eventually be completely paralysed by technical debt.

> could have been avoided

I completely agree with your sentiment. Too many of these commenters are unrealistic: in most cases you cannot avoid the BBoM. Usually it was written by someone before you... or above you... so you have little power to change it, but you can make meaningful changes step by step.

In all fairness, they didn't say "could've been avoided by you specifically" rather "could have been avoided". And this I'm inclined to agree with

One issue is that the hardest-to-maintain code I have seen was written by well-meaning people who had really put themselves into it. They just made a wrong decision somewhere, because they were not gods.

for every big ball of mud there's 10 premature optimizations

> Clean up a little code here, there, everywhere.

What does "clean up" mean? If you mean run a safe code formatter on some 20 year old PHP code that was written in vi, sure. But if you mean refactor, this sounds rather idealistic.

IMO, you are already outside of BBOM territory if your BBOM comes with a robust development environment where you can safely implement such changes without breaking prod in some insane and unforeseeable way.

The Boy Scout rule.

Technically a huge switch statement isn't a ball of mud. It is mutually exclusive and completely exhaustive.

It may have other issues, but is actually better than what this article is describing.

I happen to agree, but only in this special case. Normally a giant switch statement is a pain to work through if there's even a tiny amount of logic involved.

One hilarious case was at Scottrade. I was poking around their C++ codebase one day and to my amazed horror saw a file hundreds of lines long, and a big chunk of it was devoted to handling leap years. Like, every leap year, one at a time.

There are few enough leap years that it should in theory be difficult to write hundreds of lines of code to handle each of them, but some contractor managed to.

And so the ball of mud continued. What can you do but laugh and embrace the horror?

I think any time you're dealing with data in the form of an enum, a switch is usually the natural way to handle cases. Many compilers will warn you when you've missed a case, which is a nice check to have. Similarly, in languages that support ADTs, pattern matching tends to be the sensible thing to do.

But I agree that in the case you described, the programmer was being stupid. However, they could have written it as a giant if-elseif block and that would also have been stupid, or a loop over a giant list of years, and that also would have been stupid. I think the problem was the programmer not thinking about the problem carefully, not the control-flow construct they used to write the bad code.

> What can you do but laugh and embrace the horror?

laugh. embrace and share the fun with others: submit to https://thedailywtf.com

Maybe the author wanted to squeeze as many hours out of the task as they could? Plenty of non-top-quartile people love this sort of thing as a day at the beach.

For the diametrical opposite, here's one time on an embedded system where I really didn't want to bring in a library dependency, so I wrote an inconspicuous-looking good-enough version:

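   /* d: days since 1970-01-01; the (d >= 4784x) terms skip 2100's missing leap day (47847 = 2101-01-01) */
   y = (d - (d + 366 + (d >= 47847)) / 1461 + (d >= 47847)) / 365;
   d = d - (d + 365 + (d >= 47848)) / 1461 + (d >= 47848) - y * 365;
   y += 1970; /* y: calendar year; d: 0-based day of the year */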

    #include <iostream>

    int main() {
        long long d;
        std::cout << "Enter a Julian Day Number: ";
        std::cin >> d;

        if (d < 0) {
            std::cerr << "Error: Invalid Julian Day Number (must be non-negative)." << std::endl;
            return 1; // Indicate an error to the system
        }

        const int DAYS_PER_YEAR = 365;
        const int DAYS_PER_LEAP_YEAR = 366;
        const int DAYS_PER_LEAP_CYCLE = 1461; // 4 years
        const int JULIAN_TO_GREGORIAN_THRESHOLD = 2299161; // Oct 15, 1582

        // Adjust for Julian-to-Gregorian transition
        if (d >= JULIAN_TO_GREGORIAN_THRESHOLD) {
            d += 10; // Account for dropped days
        }

        int a = d + 32044;
        int b = (4 * a + 3) / DAYS_PER_LEAP_CYCLE;
        int c = (4 * a + 3) % DAYS_PER_LEAP_CYCLE;

        int y = (b / 1460) + 1970;
        d = (c / 4) - 365;

        if (d < 0) {
            d += DAYS_PER_YEAR + (y % 4 == 0);
        }

        std::cout << "Gregorian Year: " << y << std::endl;
        std::cout << "Day of Year: " << d + 1 << std::endl; // Add 1 as days are typically 1-indexed

        return 0;
    }

Not only that, but you can ask it to rewrite it in iambic pentameter and with pirate-speak identifiers.

Really begs the question of what the long-term outlook is for the non-top-quartile people. Maybe re-prompting LLMs is the new ball of mud?

> Maybe re-prompting LLMs is the new ball of mud?

Recently, my company acquired some engineers, and while I cannot evaluate their "percentile", I have been low-key flabbergasted at some of their suggestions for using LLMs in place of scripts. (For example, to modify or move nested data in a non-public JSON representation of a program.)

> As a rule of thumb, leap days come around every four years. But there are exceptions to this rule. For example, at the turn of every century we miss a leap year. Even though the year is divisible by four, we don't add a leap day in the years that end in 00. But there's an exception to this rule too. If the year is a multiple of 400 then we do add in an extra leap day again. At the turn of the millennium, despite being divisible by 100, the year 2000 did, in fact, have a 29 February because it was also divisible by 400.


That reminds me of this program, which calculates the month and day given the year and the day of the year [0]. There's a version that abuses goto, and one that stores state in variables and only uses structured statements. You probably won't see people similarly abuse goto today, though. I think it was translated from Fortran.

[0]: https://craftofcoding.wordpress.com/2020/02/12/the-world-of-...

Not for the task at hand. The initial d was divided down from an unsigned 32-bit timestamp in seconds from the Unix epoch (hence the 1970.) It accounts for 2000 (leap) and 2100 (not leap), after which the u32 overflows.

I am not talking about ease of programming; C being imperative and requiring explicit breaks are edge cases.

But in `case value:`, the value must be an integer constant expression (a char, int, or enum constant, for instance), and each value must be unique.

Really it is just sugar for if-else-if, but if you enforce explicit break in your code it is as close to a provable total function as you get in C.

Total functions are the ideal in any code.

As determining whether or not a function F is total is undecidable, switch is more reliable than if-else-if ladders.

Was it the case that there was special logic attached to special leap years, like all of the exceptions we hear about in other software products like Excel etc.? Or was it just "return true" and "return false"?

At least you can be sure that it worked, whereas if he'd actually written the succinct version there was a risk of a subtle bug.

You can only be sure if you verified every year and didn't miss a typo. Having a test that straightforwardly checks every year individually would be more useful, since then you know if you changed anything, and can double-check just the changes.

Although, I suppose it doesn't matter all that much which is which, as long as they're not both written the same way.

How do you ensure that you don't put a typo in the unit test?

Hopefully, it wouldn't match the algorithm.

It's implementing it two different ways and comparing the results. It's not a guarantee, since you could have implemented the wrong spec, but there's only so much you can do with internal consistency checks.

But it would be better to compare to a known good function. It might be easier to add a dependency to do that in a test than in production code.

If you used the list of years in the test and the mathematical expression in the impl (or vice versa), then you know you didn't make the same mistake in both.

Unless there's something about leap years that you don't understand and miss in both tests and implementation. Not an impossible situation.

Why would such a system assure someone that it worked? Conversely, why would only the succinct version lend itself to a subtle bug?

Suppose that someone tried to naively derive the leap year by dividing the current year by 4. There are some years that are divisible by 4 but that are nonetheless not leap years. If you fail to account for this in your succinct implementation then you might introduce a leap day that doesn't exist at some point. I know the table implementation seems really dumb, and it's not that hard to look up a correct algorithm for this, but it's a good example. If you build a table from some correct source then you can simply use that table instead of running a calculation. There are fewer than 30 leap years in the next 100 years, by which time something cataclysmic will happen that will completely erase the need for your code anyway. At least, you will be dead before you need to revise the code again. So it's reasonable to build a function with a table given the time-limited nature of the code and the desire to be visibly and explicitly correct. Sure you could add a unit test with the table and test the calculation, but that actually adds more code to maintain than just jamming the table directly into the function.

I don't think they were stating that a simple table based solution was used, if it were, it would probably meet the bar for "succinct" because you could just write that as something like `if X in Y`. Rather, I got the impression that a lookup based approach was intended, but it was enterprise-ified (like https://github.com/Hello-World-EE/Java-Hello-World-Enterpris...).

Oh, gross.

Doing modulo 4, 100, and 400 isn't really that complicated to implement. I know there are even simpler ways to calculate this but come on, this is stupidly easy stuff. Then again we live in a dimension where there are Node packages for IsEven() and IsOdd() with tons of downloads.

Sure, in Node.JS, and with a specific caliber of programmer in a specific business context, this is easy. Technical problems are easy to solve, but people and organizational problems are where the real challenges of engineering come. Sometimes you have to grit your teeth and implement something for the company you work for rather than for your own personal sense of correct.

Mine is as succinct as they come, but a little bit buggy

  int is_leap_year(int year) {
    return 0;
  }

I don't think that qualifies as a "subtle" bug. :)

Ooof, I have dealt with some fun balls of mud.

Had a customer's app break when they used zips that contained over 65k separate files. Sent it to dev and they said "Oh use this other API we have". And yes, the file 'worked', but the API didn't do what the customer needed. Got back with the developer and saw this:

We had two api's that did almost but not quite the same thing. Ok, it happens, maybe there was a reason for it.

We also had two different zip libraries built in. At some point a dev ran into issues with legacy zip support not working on large files and added a new library. They converted about 10% of the API, enough to cover what they were working on and stopped.

>> "Oh use this other API we have"

This is an interesting problem. If you insist on maximizing consistency on a large code base, then nothing can ever change. If you allow too many small deviations / good ideas that are only ever partially completed, it results in chaos. I think it is a little like entropy: some (not too much) is needed for things to progress.

So this was a few years back and they've wrangled a lot of the mess in. They went with API versioning, where you can set a flag in the request, something like ver=5, and you get a static version of the API that doesn't change. If you don't set an API version you get the latest version, which may break on updates.

I think for most users it's made feature observability much better.

The part with having multiple libraries that do the same thing is always an interesting engineering case. Usually it's the case that different libraries support different features, so trying to settle on only one library is folly. Same thing when converting all of the existing zip code to handle a new library, who knows what will break, and even though it may be required production/management rarely want to take the time to make the switch.

Case in point I've been at multiple places that have had multiple JSON and XML libraries in the same product at the same time.

I used to run supply chain tooling for a big manufacturing firm and, at one point, some dev got into a fight with a security-focused sysadmin somewhere who refused to allow the use of sftp for file transfer with trading partners (things like forecasts and order documents from suppliers)... so instead of either escalating or creating an alternative architecture, the dev unilaterally decided to leverage the fact that ports 20-21 were inadvertently open externally and use standard FTP for this. Nobody was fired and nobody really knew because it was an app supported by one person and as long as the business process didn't fail everything was "ok".

Oh, I have written plenty of blatantly bad code to evade security-focused sysadmins, and yes, many of those have made security worse. But they did solve problems.

Turing Machines have that property. There the Tarpit lies.

One of my first programming gigs was working on a 1970s-vintage, 100KLoC+ FORTRAN IV codebase.

All one file.

No subroutines.

No comments.

Three-letter variable names.


VT-100 300-Baud terminals.

Inspired me to never do that to anyone else[0].

[0] https://littlegreenviper.com/leaving-a-legacy/

>Three-letter variable names.

You're giving me flashbacks to all the SAS code I inherited. The tables had three-letter names like "AAA" and "BBB", and all the columns had single-letter names. Wasn't long before I completely rewrote the entire thing (a couple times, because I wrote junk, too).

When I hear stories of programmers who put a moat around their jobs with incomprehensible code, I usually assume it's exaggerated, but in your case I wonder if that's exactly what happened.

I have found that implementing a regression test suite for such systems is a really good way of holding a mirror up to the mud ball and showing it what it could look like if it smartened itself up a bit. If it’s too much to apply some discernment and modularity to the underlying codebase then you can instead do this with the test suite and reflect those discoveries back into the codebase with the benefit that you can now prove your changes don’t break the documented, expected behaviour.

It’s like rewriting the code but by passing the ball from production code to test code and back to figure out who the individual players are.

Games are a different story, especially single player games without a live service component. Most of us write software that needs to be maintained and iterated on. It has nothing to do with "finding meaning" in work and everything to do with delivering to a reasonable standard in a reasonable time in a sustainable way.

> Believe it or not, being hired to work on a big ball of mud can be pretty chill.

Probably depends on how big the ball is and how messy the mud. If you’re not careful, you might find yourself drowning in a whole swamp instead: https://news.ycombinator.com/item?id=18442941

Once you have to dig through dozens of files full of hacks and bad abstractions just to make basic changes and even then the insufficient test suite prevents you from being confident in any positive outcome (the example above at least had good tests, often you don’t even get that much), maybe things aren’t going great.

That said, you can definitely write good monoliths and even codebases with technical debt can be pleasant, as long as things are kept as simple as possible.

A big ball of mud is never what's needed. It's just what happens when too many people have cared too little for too long.

I suspect that the perspective of "It sucks but it works for us; I'm not idealistic anymore" is really a rationalization of despair, but that's just me.

I couldn't disagree more. Given time and budget constraints and the tendency for code/services/entire business models to deprecate in less than a decade, heavy architecture should be the last thing that anyone reaches for, and only after a need has been comprehensively proven for it. Over the decades I've lost count of how many project budgets have been blown to hell by well-intentioned, well-educated, experienced, and competent developers attempting to apply whatever architectural best practices du jour (the churn in this space is telling). End of the day it's still just code. There will still be bugs and it'll still be a pain in someone's ass to deal with. KISS uber alles.

Sounds like you're saying either:

1. Engineers don't care about the health of the codebase and it becomes/stays a ball of mud, OR...

2. They do care, and end up refactoring/rewriting it in a way that just creates even MORE complexity.

But I think this is a false dichotomy. It just happens to be very difficult, and as much of an art as a science to keep huge codebases moving in the right direction.

I haven't worked at Google, but from what I've heard they have huge codebases, but they're not typically falling apart at the seams.

1. I reject the notion that a ball of mud signifies lack of care. Nobody gets up in the morning planning on shooting themselves in the foot daily for the next N years, and culturally devs aren't great about embracing the healthy benefits of pragmatism. With few exceptions, architectural best practices are either largely esthetic or born from the particular constraints of a specific tech stack, and then leak due to concerted evangelical efforts by individuals polishing their street cred and organizations eager to bolster their in-house devs with unwitting volunteers (see also the near complete capture of open source by industry).

2. They absolutely care, but possibly about the wrong things, and this happens so often it's a cliche. Remember when a big chunk of the industry woke up one morning and decided RDBMSes just had to be bullshit because my god look how old they are, and collectively dove themselves, their projects, and their organizations off the cliff that was shoehorning MapReduce into a project NOT owned by Google?

Speaking of Google, I feel like there are few things less relevant to the industry writ large than whatever they happen to be up to at any given moment. From tech stack to hiring practices their needs are so totally orthogonal to what any other org requires they honestly may as well be operating in another dimension. Chasing after Google is like running around with a fork in your hand in search of a wall outlet.

Anyway, to finish up my thoughts on architecture as a concept, yeah, having some structure to a codebase is something of a no-brainer, but dogma is as dogma does. There are very few unimpeachable capital-T truths that come from the software architecture intelligentsia. We could talk at some length about the decades of churn in this space as roughly every 5 years another crop of grad students gets unleashed into the industry dead-ass certain that they have found The Way.

I honestly don't get why things like giant switch statements are even considered bad or smelly. You have to put the dialog somewhere. What's the material difference between putting it in a switch statement vs separate functions vs named text files? The dialog isn't going away. You still have tons of dialog.

Even for regular code organization. 100 small functions is ok but 100 switch cases is bad? It's the same amount of code, and switch cases aren't that "hard to read".

100 small functions that only get called in one place is also bad. It implies that most of them are slight permutations of each other and can be cut with the right parameterisation. Unless your logic genuinely has 100 qualitatively unique special cases, and in that case, you’re probably screwed, as there are no elegant solutions.

Switch statements are O(n), but reading dialog config is O(1). It matters when trying to get the game loop to run at 60fps

Edit: Evidently you can get O(1) from a Switch if the compiler converts it to jump statements. Just adding this for posterity, and to correct myself.

For one, changing a typo shouldn't require any code to be recompiled.

Having all the dialog logic in a single switch statement doesn't mean that all the text is right in there. It can still be referring to IDs which are then looked up in a table for the right translation.

That would still be a step above the Undertale case, where the dialog was in the code. The big problem there is that your localisers now need your source code and (potentially expensive commercial) build system to test that their script doesn’t break anything. Or you end up in an expensive back-and-forth with a contractor compiling it for them.

Just taking localisation strings out of the switch statement doesn’t fully fix this. You can swap out individual lines, but characters are still bound to say the same number of lines in the same order in every language, with player choice happening at the same point. This may work for simple conversations, but it will result in clunky localisations when languages differ in verbosity or sentence structure.

Why not? Because your compile times are long?

Changing a translation yaml triggers a webpack build in a lot of web setups but nobody seems to complain about that.

> Sometimes a big ball of mud is exactly what’s needed.

And if you wait a little while, what you need is new developers.

big ball of mud is a great pattern for games (non live-service ones, at least)

once you ship, you don't really need to maintain it the same way as other software. tech debt is free money if you ship before having to pay it back

and players often like complexity and interaction between systems, so there's less benefit to isolating them in code

Looking at some high-profile games that never shipped, that’s a very risky gamble if you don’t know when the bailiff will come to collect.

> Banks are another example. The software is horrible. Everyone knows it. Yet all that’s required is minimally working software that fails rarely enough that the bank can maintain their lock in.

Big balls of mud at banks are a liability these days. Banks are under increased competition from fintechs that have better codebases and processes, and can add features much more quickly.

Came here to express the same sentiment.

> Sometimes a big ball of mud is exactly what’s needed...Banks are another example...It’s soul crushing...which is partly why I left.

Not a very convincing argument. ;-)

Yeah this is right. It's like being a software oncologist.

Often there's nothing to do. The patient is somewhat ok, and intervening would mean risking death due to the tumour being near some critical blood vessel.

At most you have some routine maintenance to do on a daily basis. Cut some things here and there, reroute a few small things, tell the patient they need this or that medicine.

You might even find it interesting to learn how the body of code works, the journal of previous surgeons, the little nooks and crannies where interesting decisions are made.

But you are never asked to rebuild the entire body.

> Believe it or not, being hired to work on a big ball of mud can be pretty chill. The key is for you to be in a stage in your life where you don’t really care about your work beyond meeting professional responsibilities.

Yeah, that's it, isn't it? I could probably make a career off of tiny incremental improvements to some terrible business app... or I could ... care. (I do realize the drip of sarcasm in your post, don't worry)

Anyway, hail Mammon, I suppose, innit? Unless...

Care about what, though? Given how frequently paradigms shift in the meta of software development, investing in this space on an emotional level plants you (and your project) squarely on a treadmill. If it meets spec and the bugs are mostly corralled, what else is there really?

Stolen from dang [0] on the last submission with that submission added in:

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=35481309 - Apr 2023 (32 comments)

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=28915865 - Oct 2021 (23 comments)

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=22365496 - Feb 2020 (48 comments)

Big Ball of Mud - https://news.ycombinator.com/item?id=21650011 - Nov 2019 (1 comment)

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=21484045 - Nov 2019 (1 comment)

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=13716667 - Feb 2017 (6 comments)

Big Ball of Mud (1999) - https://news.ycombinator.com/item?id=9989424 - Aug 2015 (9 comments)

Big Ball of Mud - https://news.ycombinator.com/item?id=6745991 - Nov 2013 (21 comments)

Big Ball of Mud - https://news.ycombinator.com/item?id=911445 - Oct 2009 (2 comments)

The "Big Ball of Mud" Pattern - https://news.ycombinator.com/item?id=10259 - Apr 2007 (2 comments)

[0] https://news.ycombinator.com/item?id=35484495

I used to be really good at working with big balls of mud.

I actually even started enjoying it at some point - it felt like a mixture between a crossword puzzle and archaeology. There was also something nice about being good at something other people hated that was nonetheless very critical.

Somewhat ironically, though, the better I got at it, the better the class of job I got, and the better the quality of the code I ended up working with.

Now my mud wrestling skills have atrophied a fair bit.

Generally speaking, mud wrestling jobs exist in traditional IT departments, not tech product companies, and the jobs pay far less than those where the code is the product. For better or for worse, the existence of "big balls of mud" is largely a result of "you get what you pay for" mentalities in non-technical CIOs & CTOs... or industries where IT spend is below some low threshold (e.g. 1% of revenue).

The article calls these folks "swamp guides"

I really think that archaeology you speak of is great for building skills, and as a result, when given the opportunity, you start building much more elegant solutions.

My career path has followed a similar trajectory but I do enjoy a good "mud wrestle" as you call it every now and then :)

> a mixture between a crossword puzzle and archaeology

Cue cave escape scene chased by huge rolling boulder.

I worked at a firm once that had a cronjob that restarted 70+ instances of the same component.

The line was essentially a collection of:

  restart instance_N & restart instance_N+1 & ...

I always assumed that the way you got to 70+ was:

- write the first restart for instance N=1

- add another restart

- keep doing this till you get to about 5-8 restart commands

- AT THIS POINT someone should say "Hey, we should make this a script or something else" BUT CRUCIALLY that didn't happen

- you keep adding restarts till you get to 20+

- Now, doing anything to that giant line other than adding a new restart feels too dangerous so you decide to just keep adding restarts.

Moral of the story: the ball of mud usually doesn't just appear out of nowhere. It appears b/c there were multiple steps where someone didn't say "hey, maybe we should..." (or they said it and no one listened).

At my last job we had a few bash scripts like those. They could have used a bash loop, but one of the developers wasn’t good with Bash, so he took it upon himself to rewrite them using a handmade JavaScript framework.

It was quite complete and very smart, with lots of OOP patterns and wrappers abound. But it was not only impossible to understand, it was almost impossible to debug too, because of the amount of indirection.

Big balls of mud often also appear because people actually say “hey maybe we should…” but still manage to miss the mark.

As much as I would love to give some insightful comment here, I can’t. To paraphrase a post I saw here about Casey Muratori: sometimes the only advice you can give to someone is to “get good”. That means putting in the 10,000 hours and getting actual feedback, rather than pretending that the bullshit you wrote is good because the previous one was understandable by a non-techie.

>It was quite complete and very smart, with lots of OOP patterns and wrappers abound. But it was not only impossible to understand, it was almost impossible to debug too, because of the amount of indirection.

I've seen this SO MANY times too.

Nothing like having 8 files open to debug what turns out to be a one-line change due to all of the object inheritance.

Someone once had me code review a 2500 line bash script, written like a Cobol program, but with no error checking of anything. I'm not sure what my point is, but sometimes I am surprised civilization still works as well as it does.

Yeah, that's pretty much how I've observed this happening, though I've found this scenario to be so much more common in teams that are overloaded with competing priorities they have no control over because of (usually) poor and/or naive management. To the point where, even at 5-8 occurrences, you kick the proverbial can down the road to focus on the "important" work items.

Note this is not about the art form that is essentially polishing a big ball of mud into a shiny colorful ball (of mud)


There's gotta be some analogy to the software big ball of mud though

I can't speak for anybody else, but some days this is what dev work feels like to me.

Those balls look similar to the Google Chrome logo. Total coincidence, I’m sure!

No worries, we have since moved on, and people who can't write more than 2000 coherent lines of code have invented a new solution called microservices. Now you can have very simple services that don’t turn into a mess!

Well, that’s the idea anyway, but it always turns into a mess, and I have been using a new term to describe it: spaghetti over HTTP.

It's the same Big Ball of Mud architecture.

It's just implemented over YAML (what a great programming language!) and has crunchy microservices inside! (I'm not sure I want to know what the crunchy bits in a ball of mud are).

Well, as a BBoM rolls on down through a village, it tends to accumulate various trinkets and detritus from smashed buildings, along with the broken bones and teeth of the people it obliterates, plus their pets and livestock, and any nearby wild fauna.

I often forward this article to my fresh software engineers but they often get put off by the layout. So I tell them beforehand not to pay too much attention to it =)

P.S. I love it and still re-read it sometimes.

Feels like there is a similarity in a big ball of software mud and complaining about the layout of a website. Does it do the job? Yes? Cool, moving on.

This one and https://www.mindprod.com/jgloss/unmain.html should be required reading for all new programmers. Although sadly they can’t really appreciate them until actually subjected to them.

Take a ball of mud, apply pressure and time, and you sometimes get a septarian concretion. They can be quite beautiful, I have a pair of bookends cut from one on my shelf.


I've seen some big balls of software mud that formed very nicely under pressure too.

Who links images you have to login to view?

I can view without an account. It may be region-specific (or ad-blocker specific).

Is the guy in that first picture John Lennon?


After doing the obvious thing and googling "john lennon shovel waiter" it is indeed him from Magical Mystery Tour

I have been pondering this issue for a while. Maybe it is inevitable that successful systems eventually turn into big balls of mud once the "inflection" point has been reached and (slow) deterioration begins.

It is somewhat of a cliché, but I think that (interactive) documentation and tooling can make a difference; it is just very difficult to design the process and the tooling to be as frictionless as possible. Tudor Girba and his team at feenk have been doing a lot of interesting work in that area that's worth a look [1, 2].

The software in question might be an entangled mess, a large part of it might be even inherent due to its requirements or technical constraints, but if that messy web can be readily augmented with background information and sign-posts I think the situation could be significantly improved.

On a related note, there has been a project at Xerox PARC called PIE (Personal Information Environment) [3] which has put forward the idea of organizing software and its artifacts (source code, various kinds of documentation) as a network. Although that particular concept has never been adopted in any major programming system as far as I know, I think that it has huge potential, especially if the software, as a network, can be collaboratively navigated and extended with additional information, where and when needed -- online.

Now that all does not change the fact that we are still dealing with a ball (or a web) of mud, but at least it is accessible and we might have a better chance to understand its evolution and the reasons that made it complicated in the first place.

[1] https://feenk.com/

[2] https://gtoolkit.com/

[3] http://www.bitsavers.org/pdf/xerox/parc/techReports/CSL-81-3...

I can also recommend the whole book "AntiPatterns", with many more such Anti-patterns. http://antipatterns.com/thebook.htm

What's the correlation between code quality and business success?

This is bearishly hard to measure. For example (this is a real, recent scenario, left intentionally vague): a new government regulation gives you 12 months to expose certain data conforming to a provided API schema. The system under development is a big ball of mud. Re-shaping the data to conform to this schema turns out to be extremely difficult but failing to produce the APIs on time will result in company-endangering penalties from a government regulator with no sense of humour.

So you put a team on the APIs, and issues keep cropping up again and again, because every time one thing is changed it has knock-on effects causing other company-endangering incidents: you're making changes to the way billing and invoicing data is presented, or to how incoming metering data is stored. So you put some additional teams on the project and work everyone to the bone. Some senior developers burn out and leave the organisation. Likewise some project managers. Your best QA staff have their morale drop to rock bottom.

The project delivers on time but how do you measure these costs? How are you going to put a dollar figure on the burnout and lost staff? How much of the senior developers' decisions to leave can be attributed to the project and the big ball of mud? How do you measure the impact of your best tester losing his will to work and going dark?

Many of us have worked in organisations where things have gone down like this. We all know that impact is huge. But yeah you're right, I can't put together a business case that says it in clear numbers. On paper we had a one-year project, it probably went a bit over budget but it ultimately delivered so what's the big deal about the big ball of mud? Except that our competitors were able to produce the same APIs for a fraction of the cost without burning out half their staff.

I’ve seen companies rapidly collapse because they couldn’t adapt their code base to current platforms or new requirements. As long as nothing changes and you don’t have too many bugs, your code can look like crap under the surface without issues. It’s the day you can’t sell your product without a UI that works on a phone and you have to change every single method to make the change, or you fail to implement the new tax rules correctly even after 18 months of all hands on deck, it matters.

And it doesn’t have to be that bad to have business impact. What does your customer say when the new feature that was promised in one month takes eight months to deliver, or what does your CFO say when it will take 15 years of licensing fees from that customer to pay for the 24 man-months it took to get the features working?

Depends on the timescale and how captive the audience is. Oracle DB has a reportedly pretty bad codebase but a captive market that keeps them from needing to address the problems (as rapidly as their customers might like). If low quality lets you be first to market, that may be worth the later maintenance costs by gaining an outsized portion of the market and earlier revenue than your competitors.

On the other hand, if you've got an old system with poor quality code and can't make changes, your customer(s) are willing to drop you, and you have competitors who can get out the desired features, you're screwed.

Nobody knows, since there is no agreed-upon way to measure code quality.

LOC in Lisp multiplied by ARR

I think ARR/LOC would be a better metric.

I doubt you can correlate to a line. But it may track a down-facing arc.

My personal opinion about the Big Ball of Mud is that it lacks the visual component that makes problems immediately apparent: no one dares to enter a building with a big visible crack along one of its edges, but if such a thing existed in software, management would just tell the team to power through it.

The moment that someone manages to crack the software visualization problem for the layperson, would be the (moment + delay) that considerable effort would actually be invested in software architecture.

> This paper examines this most frequently deployed of software architectures: the BIG BALL OF MUD. A BIG BALL OF MUD is a casually, even haphazardly, structured system. Its organization, if one can call it that, is dictated more by expediency than design. Yet, its enduring popularity cannot merely be indicative of a general disregard for architecture.

> These patterns [BIG BALL OF MUD, THROWAWAY CODE, PIECEMEAL GROWTH, KEEP IT WORKING, SHEARING LAYERS, SWEEPING IT UNDER THE RUG, RECONSTRUCTION] explore the forces that encourage the emergence of a BIG BALL OF MUD, and the undeniable effectiveness of this approach to software architecture. What are the people who build them doing right? If more high-minded architectural approaches are to compete, we must understand what the forces that lead to a BIG BALL OF MUD are, and examine alternative ways to resolve them.

In the 80s I wrote a Transport Layer as a computed GOTO via a PL/1 language feature, basically it was a finite state machine.

I think a lot of problems which reduce down to this are done by function-table lookups (as in, look up in a table the index of the next state, and call the function) but for the language I had, that wasn't an option.
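A minimal sketch of the function-table approach, written in bash 4+ (for associative arrays) to match the shell snippet earlier in the thread. The states, events and handler names are hypothetical, loosely modeled on a connection lifecycle; they are not from the original PL/1 Transport Layer. One table maps (state, event) to the next state, and a second maps each state to the function to call on entry.

```bash
#!/usr/bin/env bash
# Table-driven finite state machine (requires bash 4+ for associative arrays).
# States, events and handlers below are hypothetical examples.

# Handlers: one function per state, called when the machine enters that state.
on_closed()  { echo "closed"; }
on_opening() { echo "opening"; }
on_open()    { echo "open"; }

# Transition table: "STATE:EVENT" -> next state.
declare -A transition=(
  [CLOSED:open]=OPENING
  [OPENING:ack]=OPEN
  [OPEN:data]=OPEN
  [OPEN:close]=CLOSED
)

# Handler table: state -> name of the function to call on entry.
declare -A handler=(
  [CLOSED]=on_closed
  [OPENING]=on_opening
  [OPEN]=on_open
)

state=CLOSED

# step EVENT: look up the next state, switch to it, and call its handler.
# Returns 1 (and leaves the state unchanged) if the event is not valid here.
step() {
  local key="${state}:$1"
  local next="${transition[$key]:-}"
  if [[ -z "$next" ]]; then
    echo "invalid event '$1' in state $state" >&2
    return 1
  fi
  state="$next"
  "${handler[$state]}"   # call the function named in the handler table
}

# Drive the machine through a short event sequence.
# Prints: opening, open, open, closed (one per line).
for ev in open ack data close; do
  step "$ev"
done
```

Because the dispatch is data rather than control flow, adding a state means adding table rows and a handler, and any (state, event) pair missing from the table is rejected instead of falling through.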

The codebase I work on is a Big Ball of Mud. It takes roughly 3x the effort to make any functional changes due to this (and thus we have 3x the devs we actually need). When it was framed like that, the business was eager to start a total rewrite project.

"Grand rewrite in the sky" is also a common pitfall. There are a lot of tradeoffs. If proper practices aren't in place ahead of time and/or an unrealistic deadline is pushed, you often end up writing a new system that's just as bad as the old one.

Hmmm... considering that your current codebase became a Big Ball of Mud, is there any reason to think the same won't happen to your next codebase?

Not that I'm saying don't rewrite. It will certainly be more fun and likely good for the resume.

There's no reason to believe that people, teams and organisations can't learn lessons. This is why we foster a growth mindset. It takes effort and discipline but it can be done.

If the team was top-notch and disciplined, they would continuously refactor and improve the code base.

There are teams that want to improve the code but get tired of fighting management. If you go in as a senior IC and encourage them, showing it as a heroic fight for the good, the true and the beautiful, they will respond with relief and eagerness, many ideas to improve things, and a willingness to find ways to marry the management requests with the proper long-term quality way.

Have you considered the possibility that having 3x as many programmers dedicated to the project all along is what made it grow into the big ball of mud that it is?

Team size correlates with system complexity regardless of the problem domain.

The story in this case is a little more complicated than that: we're a team of 3 that inherited a decade-old codebase built by lowest-bid contractors.

> ....and thus we have 3x the devs we actually need). When it was framed like that, the business was eager to start a total rewrite project.

What fraction of developers is on Team Rewrite?

I feel like this is typically not the right choice.

How did you measure the 3x factor?

Mostly by looking at how long simple changes (e.g. changing a fee + fee message at checkout) that should only take a day or less, actually take.

But how do you know how long a change “should” take? And how do you know if the rewritten code would be any easier to change?

I tend to be sceptical of “rewrite everything” projects, since they are often based on wishful thinking.

Alice has arthritis, so it takes her 3min longer to write any functional changes. She, however, has domain experience across the entire application. Bob can write code 2min quicker than Alice, but is relegated to a specific section of the codebase.

Aren’t we paid the big money precisely to handle Big Balls of Mud? If everything were pristine, easy to read, easy to extend, etc., then anyone could do it, for a cheap price.

If it were that easy, why do most systems converge to The Big Ball of Mud?

Related: Big mound of pebbles. Instead of a pile of a million lines of code with no structure, it's a pile of 10,000 classes. But hey, it's got some out-of-date UML diagrams in a 1990s CASE tool that "explain" the architecture!

I put a UML diagram generator in there somewhere, you just need to run it to get the docs up to date. It's been a while though... but I remember that the name was a really good pun. Good luck!

The ravioli architecture.

It's not spaghetti!

I'd have liked a more mathematical treatment. Can the ball of mud be modeled as some kind of attractor, or as the result of backpropagation-style optimization inside the organizational chaos and multitude of conflicting priorities, resource constraints, etc.?

Fun fact: when I told my skip-level manager that we have a big ball of mud, I think he was unfamiliar with the term.

A few months later I heard a complaint from my direct manager that I had called our code a “big ball of crap”. And it was a major software company.


Thank you. Updated the title.
