
Redundancy vs. dependencies: which is worse? (2008) - ripitrust
http://yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html
======
d--b
This article is very important. I encourage everyone to read it, as it raises
a lot of good points.

In my opinion, the problem it describes comes from the vagueness of the concept
of elegance, and from how counterintuitive elegance is. It doesn't even have to
involve modules or anything like that. In its simplest form it boils down to:

Would you rather write:

    
    
        if (a) {
            doSomething();
            if (b) {
                doSomethingElse();
            }
        }
    

or

    
    
        if (a && b) {
           doSomething();
           doSomethingElse();
        }
        else if (a && !b) {
           doSomething();
        }
    

Of course, there is no true answer to that question, and it always depends on
the context. But many programmers will never ever consider option 2, and that
is for two good reasons: first, you perform one more test, and second, you
duplicate the call to doSomething(), so your program is larger. So
mathematically speaking, option 1 is more elegant: it's shorter, it's faster,
it's lighter; what's not to like?

Well, multiply the ifs and the elses, and you will soon find out that option 2
is much more readable and changeable, which is the more elegant solution to
anyone who's an engineer rather than a mathematician.
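
To make the scaling argument concrete, here is a hedged sketch (hypothetical
flags a, b, c and invented action names) of how each style grows once a third
condition is added. Actions are recorded in a list so the two styles can be
compared:

```python
def nested(a, b, c):
    # Option 1 scaled up: one `if` per condition, no duplicated calls.
    done = []
    if a:
        done.append("doSomething")
        if b:
            done.append("doSomethingElse")
            if c:
                done.append("doThirdThing")
    return done

def flattened(a, b, c):
    # Option 2 scaled up: one arm per *combination* of conditions.
    done = []
    if a and b and c:
        done += ["doSomething", "doSomethingElse", "doThirdThing"]
    elif a and b:
        done += ["doSomething", "doSomethingElse"]
    elif a:
        done += ["doSomething"]
    return done

# The two agree on every input, but the flattened form needs up to 2**n arms
# for n conditions, while the nested form grows linearly.
assert all(nested(a, b, c) == flattened(a, b, c)
           for a in (0, 1) for b in (0, 1) for c in (0, 1))
```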

The tension the article describes is that inside every programmer, a
mathematician and an engineer are endlessly in conflict. That conflict should
be a good guide for which style to use: is this piece of code mathematician's
code, or engineer's code? Once you can answer that question, you can decide
which style to write it in.

~~~
MrPatan
I completely agree.

It's _vital_ to know what a, b, doSomething and doSomethingElse actually stand
for, and what is the context. This is not a technical problem, it's a people
problem (like almost all problems).

Take this, for example:

    
    
        if (userLoggedIn) {
          setCookie();
          if (itsANiceDay) {
            tellUserToGoOutAndPlay();
          }
        }
    

And this:

    
    
        if (itsSunny && itsCold) {
          putOnSunglasses();
          putOnCoat();
        } else if (itsSunny && !itsCold) {
          putOnSunglasses();
        }
    

They both kind of make sense! Now let's see the other way around:

    
    
        if (userLoggedIn && itsANiceDay) {
          setCookie();
          tellUserToGoOutAndPlay();
        } else if (userLoggedIn && !itsANiceDay) {
          setCookie();
        }
    

This one is just silly. We are mixing logic about two completely different
things. Nobody would do that (many people would, I know; too many).

    
    
        if (itsSunny) {
          putOnSunglasses();
          if (itsCold) {
            putOnCoat();
          }
        }
    

This one seems OK. Until you wonder what happens if it rains and/or it's
windy. Then you'll refactor it into a "switch/case" for clarity. And it's all
about the domain-specific content! You cannot reason about the syntax tree in
isolation and come to a meaningful conclusion!

~~~
jessaustin
The problem with these examples is that _putOnCoat()_ should not depend in any
way on _itsSunny_ , so there should just be two separate _if_ blocks. We
should be discussing a scenario in which one condition is a subset of the
other.

~~~
JustSomeNobody
Why not? I've been on Rainier when it's cold and sunny.

~~~
denis1
I think you didn't understand what the GP said. You can certainly have a day
when it is both sunny and cold, but then the checks would look like this:

    
    
      if (isSunny) {
          putOnGlasses();
      }
      if (isCold) {
          putOnCoat();
      }

~~~
Retric
Depends on context. In a game you might only send someone outside if it's
sunny, so if it's !sunny you're staying inside and you don't need a coat.

------
rsp1984
The article makes some very important points and it's certainly worth a read
for every programmer.

What the article fails to address explicitly, however, is that the whole
redundancy vs. dependency conflict is caused by modularization. Without
modules there would be no conflict.

So the real questions to answer here are: When do you need modules, if you
need them at all? What should be modularized? And, most importantly, how do
you choose smart boundaries? Good answers to these questions will save a
project from a world of pain down the road.

The classic OOP / software engineering education these days lacks critical
debate about software modularization. Modularization is almost always
presented as a good thing. What nobody tells you, however, is that in
real-world engineering, on real-world teams, modularization can cause _a lot_
of trouble if not done the right way.

~~~
loup-vaillant
Code is basically a dependency graph. Each piece of code depends on a number
of other pieces of code. (Dead code is an isolated island in this dependency
graph.)

You want two things out of that graph: fewer nodes, because less code is
simpler, better, cheaper; and fewer _edges_ , because understanding,
modifying, or troubleshooting a piece of code requires knowing about its
dependencies (hopefully, only the direct ones).

When the unit of organisation is the function, you kinda state that each
function is a node, and the call graph forms part of the edges (in a purely
functional setting the call graph would cover everything, but side effects
produce implicit dependencies). Trouble is, in any significant system, you're
gonna have a _lot_ of nodes and edges. How do you make sense of that?

That's why we have modules. When you look at your dependency graph, you will
most certainly notice that parts of your graph are denser than others. Those
clusters are the natural modules. If you formalise that, and draw module
boundaries around those clusters, you get a two-level view: inside a module,
you have a _small_ dependency graph, with a few outbound edges. Outside, you
can visualise a coarser graph of module dependencies. Again, fewer nodes and
edges, because you have grouped them.

Now the _real_ benefit of modules is that, once you start drawing boundaries, you
have an incentive to make small interfaces, to minimise inter-module
dependencies. Additionally, visualising the module dependency graph directly
helps you spot spooky dependencies that probably shouldn't be there. You can
then cut some dependencies out, simplifying your graph in the process.

Without modules, I don't see how you would manage this kind of scale. Oh, and
by the way, some monstrosities are so big that they effectively require a
_third_ level. But I've never worked on such beasts.
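
As a rough illustration of this two-level view (every function and module
name below is invented), one can collapse a function-level dependency graph
into a coarser module-level one:

```python
# Toy function-level dependency graph: who calls whom.
calls = {
    "parse_args":   {"validate"},
    "validate":     {"report_error"},
    "run":          {"parse_args", "load_config"},
    "load_config":  {"report_error"},
    "report_error": set(),
}

# Draw module boundaries around the denser clusters.
module_of = {"parse_args": "cli", "validate": "cli",
             "run": "app", "load_config": "app",
             "report_error": "errors"}

# The coarser, module-level view: fewer nodes, fewer edges.
module_edges = {(module_of[f], module_of[g])
                for f, deps in calls.items() for g in deps
                if module_of[f] != module_of[g]}
print(sorted(module_edges))   # 5 functions, 5 calls -> 3 modules, 3 edges
```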

~~~
rsp1984
Yes, that's a very good picture of things. Now the dependency vs. redundancy
issue comes in when two clusters are made into modules.

If they are _mostly_ separated but still have some connections to each other,
the question is what to do with those connections. Cutting them off means
having to replicate functionality, so ultimately redundancy. Leaving them in
means dependency.

I guess the way modularization should be done is therefore as a min-cut
through the dependency graph.
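
That idea can be sketched by brute force on a toy graph (invented nodes; fine
for illustration, though real min-cut algorithms such as Stoer-Wagner scale
far better):

```python
from itertools import combinations

# Toy undirected dependency graph; the bipartition with the fewest crossing
# edges suggests the least-coupled place to draw a module boundary.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}

def crossing(part):
    # Number of edges with exactly one endpoint inside `part`.
    return sum(1 for u, v in edges if (u in part) != (v in part))

# Brute-force min-cut: try every non-trivial bipartition.
best = min((frozenset(p)
            for r in range(1, len(nodes))
            for p in combinations(nodes, r)),
           key=crossing)
print(sorted(best), crossing(best))   # -> ['d'] 1
```

Here "d" hangs off the cluster {a, b, c} by a single edge, so making it a
separate module costs only one inter-module dependency.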

------
tel
It's kind of funny how, rightfully, the author paints a picture where the
"horrible, enlightened external dependency" itself is antimodular to a T.
Given that all modules supposedly have stable interfaces, documentation,
tests, reasonable size, yada yada, one might expect that each of their
dependencies takes advantage of these properties to remain light and
wonderful itself.

Of course, this is a situation that's highly incompatible with C. Let's ditch
that.

In ML, modules are king. You probably make hundreds in any non-trivial
program, and the compiler will beat your ass if you muck up their interfaces.
Anywhere. Packages are just sets of 3 public modules wrapped up in twine and a
README file (coincidentally, this is where "ownership and lifecycle" are
managed, but, sorry, I'm going to ignore those for a moment).

This could be every bit as bad as I described before, but ML also realized
that modules which just form a big dependency tree are actually quite
annoying. The whole reason we define public APIs is so that there can be
multiple satisficing implementors, but this cannot happen in 99% of module
technologies today.

So ML has functors (not Haskell functors, certainly certainly certainly not
C++ functors) which are "parameterized modules that actually work". One could
distribute their command line parsing module with a pluggable serialization
and a pluggable help display. See MirageOS for a giant example of this kind of
system working out.
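
ML functors have no direct Python equivalent, but as a loose analogy (every
name below is invented), a command-line parsing "module" parameterized by a
serialization interface and a help-display interface might be sketched as:

```python
import json
import sys
from typing import Protocol

# The parameters of the "functor": interfaces, not implementations.
class Serializer(Protocol):
    def dumps(self, obj) -> str: ...

class HelpDisplay(Protocol):
    def show(self, text: str) -> None: ...

def make_cli_module(ser: Serializer, help_: HelpDisplay):
    # The "functor body": returns a module whose behavior is fixed by the
    # parameters it was applied to.
    class Cli:
        @staticmethod
        def parse(argv):
            opts = dict(arg.lstrip("-").split("=", 1)
                        for arg in argv if "=" in arg)
            return ser.dumps(opts)

        @staticmethod
        def usage():
            help_.show("usage: prog --key=value ...")
    return Cli

class JsonSerializer:
    def dumps(self, obj):
        return json.dumps(obj, sort_keys=True)

class StderrHelp:
    def show(self, text):
        print(text, file=sys.stderr)

# "Apply the functor" to concrete implementations.
Cli = make_cli_module(JsonSerializer(), StderrHelp())
print(Cli.parse(["--name=foo", "--level=2"]))
```

Swapping in an XML serializer or a GUI help display means re-applying
make_cli_module, not forking the parsing code.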

Does it really work?

Probably not. It's not in most maintainers' DNA to functor-ize everything.
It's even a significant challenge to do so, since you need to define
sufficient external _and_ internal public APIs, and it's a significant
community effort to standardize these sufficiently so that there is a real
chance of re-use.

But at least it's a way forward. Fight the heavy module trees. Let's use some
higher order reusability.

~~~
bunderbunder
I've had some success with the object-oriented equivalent of the pattern.
Perversely, I find it to be most effective as a political tool.

It's useful when someone doesn't like my minimalist solution to some problem,
and starts peppering me with feature requests that will complicate the module
and which I perceive to be of marginal utility. So I make that chunk of
functionality pluggable, keep my minimalist implementation as the default, and
publish some instructions for how to drop in a more complicated behavior. Then
all I have to do is sit back and watch the original requester realize that
they only think the stuff they were asking for is worth the effort if they can
get someone else to be the one putting out the effort.
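
A minimal sketch of that pattern (invented names, using a toy logging-style
module for illustration):

```python
class SimpleFormatter:
    """The minimalist implementation, kept as the default."""
    def format(self, record: dict) -> str:
        return str(record)

class Logger:
    def __init__(self, formatter=None):
        # The contested chunk of functionality is pluggable ...
        self.formatter = formatter or SimpleFormatter()

    def log(self, record: dict) -> str:
        return self.formatter.format(record)

# ... and "dropping in a more complicated behavior" is one class away,
# written and maintained by whoever actually wants it.
class FancyFormatter:
    def format(self, record: dict) -> str:
        return " | ".join(f"{k}={v}" for k, v in sorted(record.items()))

print(Logger().log({"msg": "hi"}))
print(Logger(FancyFormatter()).log({"msg": "hi", "level": 1}))
```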

~~~
megaman22
Love to see an example

------
sbov
I generally agree with this. However, sometimes using a module is not adding
a dependency; it is making an already existing implicit dependency explicit.

E.g., we have client and server code. Serialization configuration between the
two is implicitly interdependent: if the client expects dates in a different
format than the server, things don't work. To make that implicit dependency
explicit, we use a module, which also has the effect of making sure the two
don't get out of sync.
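
For instance (a hypothetical shared module; the format string is an
illustrative choice, not anything from the thread):

```python
from datetime import datetime

# wire_format.py: a shared module imported by BOTH client and server, so the
# date format exists in exactly one place.
WIRE_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def encode_date(dt: datetime) -> str:      # server side
    return dt.strftime(WIRE_DATE_FORMAT)

def decode_date(s: str) -> datetime:       # client side
    return datetime.strptime(s, WIRE_DATE_FORMAT)

# Change WIRE_DATE_FORMAT and both sides move together; they cannot drift.
print(encode_date(datetime(2008, 5, 1, 12, 30, 0)))
```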

------
tempodox
A very good article that aptly shows (some of) the hard & sticky questions we
are confronted with all the time. How you answer these questions will
determine the quality & stability of your code to a large extent.

I agree with the OP that commonly, “ _dependencies are worse_ ”. Redundancy
will increase the quantity of your code, but dependencies increase its
complexity. And quantity is always conquered more easily than complexity.

------
guard-of-terra
This depends greatly on your platform. Java projects accept dependencies much
more readily than C++ ones, because in Java it's much harder to cause trouble,
and coding styles don't differ radically between dependencies.

Perl & Ruby are even more eager, which is strange since they're actually less
safe.

~~~
Sirenos
Less safe? How so?

------
rwallace
Excellent article. Just one quibble: he claims a module shouldn't be over 30k
lines. Counterexamples: Linux, Postgres, Boost, LLVM, V8, all in the
million-line range. To be sure, each of these has an internal module
structure, but that's irrelevant from the perspective of someone deciding
whether to incur a dependency on one of them - the answer to which may very
well be yes, because they do enough to make it worthwhile.

If anything, larger modules like the ones I listed are more likely to be worth
depending on because they do more. It's no coincidence that the author chooses
command line parsing as a negative example - something trivial enough that the
overhead of tracking a dependency may well outweigh the effort of implementing
it yourself.

------
michaelfeathers
It's interesting to read this with micro-services in mind.

