
In the Defense of Spaghetti Code (2013) - dvfjsdhgfv
http://250bpm.com/blog:36
======
iraldir
I understand the author origin point: To an extent, you cannot simplify an
already complex domain problem with code. However, this has nothing to do with
Spaghetti code. You could have one 1500 lines function that is not spaghetti
code. It's hard to imagine, but it's possible. Similarly you can definitely
have a bunch of functions, classes etc. that are spaghetti code, even if they
are 3 lines each. Spaghetti is about the order in which you make your calls,
the convoluteness of your loop etc.

I think the only point of the article is "Don't try to retro engineer the
logic behind a complex domain that you do not master".

~~~
mannykannot
Yes - spaghetti code adds a dimension of complexity that is distinct from mere
size, and in fact spaghetti code might be smaller than the same algorithm
written as structured code (by the way, large blocks of structured code can be
imagined by thinking of taking good code and inlining all the functions, with
variable renaming to avoid clashes.)

It is fortunate that it is possible to show, via Turing completeness, that
spaghetti code cannot compute anything that structured code cannot, for if
not, we would still have programmers insisting (as some did, in response to
'Goto Considered Harmful') that spaghetti code is sometimes necessary.

> I think the only point of the article is "Don't try to retro engineer the
> logic behind a complex domain that you do not master".

AKA Chesterton's Fence:

[https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence](https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence)

~~~
kjeetgill
I agree with the first of what you're saying but just wanted to add to others:
Structured Code was a software engineer paradigm that didn't just mean "well
organized" aka structured well; it was about specific constructs and
conventions. Just as Functional Programming isn't just about having functions
or being functional as in useful.

Also, I always think Turing Completeness is a red herring in conversations
about Software Engineering. Turing Completeness is such a low bar, that it
hardly factors into most conversations about the expression problem. It's most
notable when the goal is to actively restrict completeness.

It's kinda like starting a conversation about birds and first establishing
they have mass.

[https://en.m.wikipedia.org/wiki/Expression_problem](https://en.m.wikipedia.org/wiki/Expression_problem)

~~~
mannykannot
Structured programming, in the specific sense that we are both talking about,
was such an influential paradigm that most programmers today, other than those
using assembler, haven't seen anything else. Maybe that is why the author of
this article seems to think 'spaghetti code' simply means large functions.

And I'm just pointing out that in this particular case, Turing completeness
was very useful in decisively stopping what would have otherwise been an
unending and inconclusive argument, and, regardless of whether or not it can
justifiably be called a low bar, it was good enough in this case. Furthermore,
It does the same in many other cases: for example, in the question of whether
there are ISAs that are more computationally powerful than others. It seems to
be, or at least have been, very useful in computer science even if it does not
have much relevance to everyday coding, and it may not seem much of a big deal
because the questions that it has solved are no longer (or never became)
problems.

~~~
kjeetgill
Huh. I made the comment about Turing Completeness being "low-bar" from the
perspective of Turing tarpits [0] or NAND universality (which needs
memory/flip-flops). It seems to be easy enough that languages like Brainfuck
can be Turing complete with a handful of elements. So much so that C++
templates or extended regexs seem to _accidentally_ be found to be Turning
Complete.

[0]:
[https://en.wikipedia.org/wiki/Turing_tarpit](https://en.wikipedia.org/wiki/Turing_tarpit)

> In any Turing complete language, it is possible to write any computer
> program, so in a very rigorous sense nearly all programming languages are
> equally capable.

You seem to have more history on the matter than I do, so let me ask a
question instead trying to defend that idea. Why did anyone think Structured
Programming wouldn't be? They must have had reasons to suspect it. I also
don't know much of the history of Structured Programming apart from "Goto
Considered Harmful", so pointers would be great.

~~~
mannykannot
I don't think those programmers, who claimed that structured code was
insufficient, had thought about it in those terms. They were probably proud of
their skills and the programs they wrote, and tried to defend it as the right
way to do things, as it seemed obvious to them, from their experience, that it
was.

------
kd5bjo
The article’s argument is a _reductio ad absurdum_ that flows from the outcome
of the purportedly reasonable refactoring of this:

    
    
      int new_sequence_number (int old_sqn, int foosoft)
      {
          if (foosoft == 2) // version 5.34.2 of the protocol
              return old_sqn + 3;
          else if (foosoft)
              return old_sqn + 2;
          else
              return old_sqn + 1;
      }
    

into this:

    
    
      int new_sequence_number (int old_sqn, int step)
      {
          return old_sqn + 1 + step;
      }
    

The problem is that this _isn’t_ reasonable. Though the function signature
looks the same, _foosoft_ is an enumeration of some kind but _step_ is
actually an integer— This sort of change requires an audit of all call sites,
as the meaning of one of the arguments has changed.

One big red flag here is that the new function body isn’t what you’d expect
from its signature. Why is the sequence number incremented by _step_ +1
instead of _step_? The only reason I can think of is that they want to
maintain compatibility with an old callsite, but that doesn’t make sense with
the category change of the argument.

Ideally, the function signature itself should change so that the compiler will
point out all of the obsolete usages to you. If you can’t do that with the
type system, change the function name.

~~~
theon144
What you're attacking is IMHO just a part of a side-note of the actual
argument; sure, that's a bad way to refactor the function. But even with an
appropriate refactor, the codebase as a (fragmented) whole remains less
readable and maintainable than a long spaghetti function _potentially_ is.

Because the problems now cut across responsibilities of the components built
on certain assumptions made for the sake of removing the spaghetti. Because
now you need to account for "global" parameters (protocol version) in lots of
disparate "local" scopes - individual functions who are quickly losing their
degree of encapsulation.

We're discussing a hypothetical code-base, so focusing on the details isn't
probably going to be too helpful. In any case however, I in no way see how the
argument "flows from the outcome" of the refactor you're focusing on.

~~~
kd5bjo
By using a hypothetical to support his argument, the author is implicitly
asserting that his hypothetical is representative of the real-world situations
we’re trying to discuss. He had the freedom to pick the strongest example
possible, or several varied examples, and chose to use only this one.

Specifically, he’s using the failed outcome of this hypothetical refactoring
to argue that _new_sequence_number_ should never have been broken out into a
separate function in the first place because it separated the _foosoft==3_
case from the conditional, which made the bad refactor more likely to happen.

Certainly, in this hypothetical 1500-line function, it’s possible that they
would have appeared close to each other and it’s possible they wouldn’t have.
In this particular case, it’s entirely possible that the refactor was made
_easier_ to execute successfully because the programmer can leverage the
compiler to find all the locations that need to be updated.

In the end, he’s demonstrated a flawed hypothetical procedure and asserted
that it’s worse than an imagined alternative. As usual, the argument being
unsound doesn’t mean the conclusion is incorrect, only that it hasn’t been
demonstrated here.

------
wodenokoto
Lots of small functions and special classes being called left and right can be
much more difficult to follow than just the whole logic written out
consecutive.

~~~
reallydontask
It's all about balance really and this is part of the problem.

1500 lines of code for a function/method would strongly _suggest_ a problem.
In the same way as a 200 function/methods would make it hard to follow.

~~~
barking
If breaking up a procedure results in the creation of a bunch of smaller
procedures that are only called once then I think I'd prefer to leave it as it
was with well demarcated and commented sub-sections.

It saves you the job of inventing descriptive procedure names as well as
jumping between procedures.

I seem to recall also that in Code Complete, Steve McConnell claimed that
there was an inverse relation between the length of a procedure and the number
of errors per line of code.

~~~
CuriouslyC
How is an inline function different from a demarcated and commented
subsection? It seems to me that the semantic and enforced correctness benefits
of functions would make using them in this way a no brainer.

~~~
Gibbon1
This exactly when you break up a logically flat function into subfunctions,
the names of those functions are simply replacing comments.

------
TotempaaltJ
> PS. If you happen to be a developer of a static analysis tool: Consider
> leaving long functions alone, rather then reporting them as suspicious.
> There are circumstances where long functions are perfectly legal and
> desirable.

Sure, _there are_ circumstances. They're just extremely rare and their
occurrence is dependent on the team as well. So generally, I'd say function
length a pretty good proxy for complexity--as long as there are overrides.

~~~
dagss
I disagree with this. I think something should be a seperate function only if
it is (or likely will be) called from several callsites.

If you try to not always think about contrived cases while writing code, then
this rule can sometimes lead to quite long functions simply from executing
steps in sequence ("script").

One can split what is really a long function into several small ones for
readability, but if one does so I think one should declare that function
within the calling function (in languages that support this), and whether one
does that or not is really about as insignificant as tab vs spaces or where
one likes to have curly braces.

My point is, if a function only has one callsite, and is not likely to get
more (incl. test code), then whether to inline that function or write it
seperately is a style/formatting issue, not a maintainability issue.

~~~
RHSeeger
> I disagree with this. I think something should be a seperate function only
> if it is (or likely will be) called from several callsites.

I disagree. I split out anything that I can describe/test separately. If I
need to take a list and

1\. Filter it based on custom logic 2\. Sort it based on custom logic

Then I will generally break the filter and sort out into their own functions
even if they're not used anywhere else. Each function can have it's own
description and tests.

~~~
dagss
I understand that many prefer this. My point wasn't really so much mine or
your preference -- my point was that it is a question of stylistic preference,
not about spaghetti vs not.

As you simply describe your own stylistic preference I am not sure if you
agree with me about that or not?

If you actually write a seperate test for the function then you have two
callsites and of course it must be a seperate function.

------
kartan
Yes. Over-engineering is a thing. Also, there are problems that cannot be
simplified (e.g. Implementing a Calendar class
[http://yourcalendricalfallacyis.com/](http://yourcalendricalfallacyis.com/) )

I agree with other comments that Spaghetti Code is not the right term for
this. But, the original point is valid.

The other part of the point is also true. To hide implementation inside
functions makes sense if the function is part of the real world domain. To do
a cut/paste to extract lines of codes of a big function to make it look nicer
may increased complexity as now jumping around the code is needed to read it.

------
pmontra
I don't dare to call it spaghetti code because (among the other things) it's
split in a lot of functions, but to give an example that's probably equally
difficult to everybody, check this Erlang module that parses ASN1 [1]

A few hints about the language for the non-functional crowd, then my point

1) Functions can have multiple definitions and are picked top to bottom. The
one that runs is the first one matching the arguments in the call.

2) Variables are Capitalized (and can't be assigned again, big surprise.)

3) Atoms (:symbols in other languages) are lower_cased.

4) The arrow (->), the comma (,) and the dot (.) do (more or less) what Java/C
do with their { ; } characters (sorry, I have to keep it short)

You can start from the bottom and go through the get_token, get_line,
parse_ComponentTypeLists etc, find where they are called and see how they
neatly (IMHO) replace the switch statements in other languages.

Is it easier to understand? Hard to tell: if the problem is difficult the code
won't be easy. Is it less tangled? I think it is. My point: some languages are
messiers than others.

[1]
[https://github.com/erlang/otp/blob/master/lib/asn1/src/asn1c...](https://github.com/erlang/otp/blob/master/lib/asn1/src/asn1ct_parser2.erl)

~~~
_pmf_
To add to this (for those unfamiliar with Erlang/OTP): Erlang not only has
general pattern matching, but also specific support for pattern matching on
binary content (down to bitfields). It's better than C and C++ in that regard
in more than one way.

------
_bxg1
...spaghetti code isn't a monolithic function, it's a constellation of
tightly-coupled but separate entities, woven together like a big knot. I.e.,
exactly the thing the author advises against.

Terminology aside, he does make a good point: if the domain can't truly be
broken down into clean parts, encapsulation can do more harm than good as you
have to weave all these contingencies through every layer.

------
theon144
Key takeaway is this:

>The point I am trying to make in this article is that sometimes it's the
problem domain itself that's complex and convoluted. In short, it's a
spaghetti domain.

I definitely have encountered spaghetti domains and have bashed my head
against them, trying to find nice, clean and elegant abstractions that
describe the problem succinctly and clearly.

Sometimes there's just more exceptions than rules, and spaghetti code is not a
bad outcome. As with all guidelines, do not follow blindly where a less
"conventional" solution makes sense?

~~~
barking
I agree, spaghetti domain seems to be a useful concept. It allows you to just
get on with coding a solution.

------
contravariant
If using something that ought to be an enum as an integer to save a few lines
of code isn't spaghettification then I don't know what is. Rather predictably
it doesn't generalise.

It's called spaghetti because it's hard to see what information flows where,
not just because some information flows are long (at least, my mental image
for spaghetti-code has always been _cooked_ spaghetti).

------
jkingsbery
> Luckily though, 45 years after Dijkstra's paper, goto is not used
> extensively any more.

goto might not literally be used any more, but I see "break" used a lot, and
for long functions that have variables defined outside of the scope of the
"break," it behaves similarly.

~~~
analog31
It would be interesting to go back through old code containing GOTO's, and see
if the main use was similar to "break."

------
watwut
I don't think the article proves what it claims to prove. Original
modification was reasonable and no abdomination. The second so called
beautification, the one that expects the caller to guess step size, was
definitely not improvement.

More importantly, author seems to just give up around

> After few such iterations the code that was originally nice and manageable
> becomes a mess of short functions plagued by lots of behavior-modifying
> arguments, the clean boundaries between components and layers are slowly
> dissolved and nobody is sure anymore what effect any particular change may
> have.

Yes, that is how spaghetti code happen.

But, if all those conditions were embedded in a single 1500 lines long
function, the situation would be no better. You would have interconnected
monstrous system of conditions.

------
jstewartmobile
There is a fine line between DRY and human-Huffman-encoder.

Just knowing that the line exists makes a big difference!

~~~
galangalalgol
I'm stealing that analogy, it's perfect. Huffman encoded codebases give me
rage tantrums when debugging.

------
mr_crankypants
In this particular situation, I would be much quicker to go for copy/pasted
code than spaghetti code.

I worked at a place where we had to interact with financial exchanges, which
nominally talk standard protocols but actually all have their own quirks in
their implementations of the protocols, which we had to account for on our own
end. They ended up going with the copy/paste approach, and, while I was
initially horrified, I came to realize that they're right.

The admonishment against copy/paste is meant to deal with the risk of
accidentally forgetting to update one of the copies when there's a new change
that needs to be pushed across all of them. It turns out that, for the problem
domain we were working in, which sounds quite similar to the one TFI
describes, _that never happens_. Even when the standards body releases a new
version of the protocol, each vendor upgrades on their own schedule, and the
slowest-moving vendor might stick on the old version for 5 or even 10 years,
and when they finally do get around to it, their implementation of that
feature will have its own quirks, so just setting a flag saying, "Vendor X is
now on version Y" wasn't going to work, anyway. So then you (hypothetically)
create a set of flags saying, "Vendor X is using version Y, but with these
behaviors that differ from everyone else's version of Y", add a bunch of new
if-statements, and the actual logic that might execute at run-time gets even
more obfuscated behind a tangle of counterfactual configurations that won't
happen in real life but still exist in the code and are supported in the
application.

So, to sum up: nominally the same protocol, but everyone does it differently,
and upgrades on their own schedule. It really is a horrendous tangle. Dealing
with that mess by mirroring it in your code with an equally tangled pile of
flags and conditionals is also going to be a mess, no matter how you choose to
organize the code.

Don't do that. Alexander the Great was right: Sometimes the best way to deal
with these sorts of things is by cutting them into pieces. Then, when you're
dealing with one implementation's quirks, you only have to be thinking about
that one set of quirks, and not the combinatorial explosion of possible
interactions that you get when you try to mix them all together.

------
mattmanser
In this example the problem is the spaghetti code _in the specification_ ,
which you have no control over.

This then inevitably causes the resulting spaghetti code.

~~~
UK-Al05
You can however nicely isolate the code which has behaviour changes for each
vendor.

~~~
AstralStorm
Typically resulting in ravioli code instead. Or an explicit state machine,
even worse to follow.

[https://en.wikipedia.org/wiki/Spaghetti_code#Ravioli_code](https://en.wikipedia.org/wiki/Spaghetti_code#Ravioli_code)

(Wanted to link C2 wiki but it's a goner.)

You cannot fix essential complexity with any kind of design. If there are
options in design, your code will reflect them at some layer. You might want
to suggest, say, Component or entity-based or other design, or mixins, or
other object level patches for conditionals.

Spaghetti if otherwise well structured is quite understandable. The more
advanced versions? Not nearly. Spaghetti is explicit at least.

~~~
kd5bjo
> You cannot fix essential complexity with any kind of design. If there are
> options in design, your code will reflect them at some layer.

That’s certainly true; my favorite method at the moment for this sort of thing
is to make it data-driven: have a structure that contains all of the
parameters that change based on vendor (including enums/flags for optional
behaviors) and a single process implementation that refers to it when needed.

~~~
AstralStorm
You end up with a big if-then block. Essentially spaghetti but localized.

A joke I sometimes make is that spaghetti is best of a plate or in a box from
which it cannot escape. Otherwise it would spill.

Ravioli does not spill but the relationships become implicit.

~~~
kd5bjo
Localized is good. It means that the next time there’s a new thing, there’s
only one place you need to change to specify its unique set of properties.
Often, this can even be removed from the code entirely into a configuration
file, so that a recompilation isn’t necessary for every new situation.

------
nickpsecurity
If the requirements are complex, then you list the requirements, what you're
doing with each one, how they connect, and the code reflects that. Breaking it
into functions and modules for that often does enough. At the least, the
design documentation will facilitate understanding even if the code itself is
harder to understand.

Spaghetti code is almost always just bad design and coding.

------
hzhou321
A refactored code and the spaghetti version are two sides of the same code.
Just like any complex object requires multiple viewpoints to comprehend,
asserting one perspective is superior to the other is missing the picture. It
is a pity that modern practice is always one or the other. It is possible to
have both. Literate programming is one example.

------
bregma
I cut my coding teeth in assembly, followed by FORTRAN IV and a smattering of
Apple, Commodire, and TI BASIC (with their gosubs and peek/pokes). I've worked
with real spaghetti code.

What most people call spaghetti code these days is just code. Useful programs
are complex and have complex code. Modern software might be difficult to
follow, espcially if you're at an amateur or junior level (which is probably
80% of the people out there reading code, cf. Dunning-Kreuger). It's not
generally spaghetti.

------
SenHeng
(2013)

