
An Experiment on Code Structure - danielepolencic
https://pboyd.io/posts/code-structure-experiment/
======
userbinator
According to GitHub, the totals are:

backendA: 11 files, 1 directory, 799 lines (676 sloc), 23.56KB

backendB: 23 files, 5 directories, 1578 lines (1306 sloc), 42.26KB

It's approximately twice as big for the same functionality, and I had to spend
a lot more time "digging" through the second one to get an overall idea of how
everything works. Jumping around between lots of tiny files is a big waste of
time and overhead, and one of my pet peeves with how a lot of "modern"
software is organised. If you believe that the number of bugs is directly
proportional to the number of lines of code ("less code, fewer bugs"), then
backendA is far superior.

 _backendB required a bit more work_

I'm not surprised that it did. This experiment reminds me of the "enterprise
Hello World" parodies, and although backendB isn't quite as extreme, it has
some indications of going in that direction. The excessive bureaucracy of
Enterprise Java (and to a lesser extent, C#) leads to even simple changes
requiring lots of "threading the data" through many layers. I've worked with
codebases like that before, many years ago, and don't ever wish to do it
again.

I really don't get this fetish for lots of tiny files and nested directories,
which seems to be a recent trend; "maintainability" is often dogmatically
quoted as the reason, but when it comes time to actually do something to the
code, I much prefer a few larger files in a flat structure, where I can scroll
through and search, instead of jumping around lots of tiny files nested
several directories deep. It might look simpler at the _micro_ level if each
file is tiny, or the functions in them are also very short, but all that means
is the complexity of the system has increased at the _macro_ level and largely
become hidden in the interaction of the parts.

~~~
Chris_Newton
_I really don't get this fetish for lots of tiny files and nested
directories, which seems to be a recent trend;_

I suspect it is the same kind of thinking that says all functions should be
very small (without reference to whether each function provides a single
meaningful behaviour). Locally, this keeps things relatively simple, but it
ignores the global issue that now there are potentially many more connections
to follow around and everything becomes less cohesive. As far as I’m aware,
such research as we have available on this still tends to show worse results
(in particular, higher bug frequencies) in very short and very long functions,
but that doesn’t stop a lot of people from making an intuitive argument for
keeping individual elements very small.
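To make the trade-off concrete, here is a hypothetical sketch (the names and
validation rules are invented for illustration): the same behaviour, once
heavily decomposed into tiny functions and once kept as a single cohesive one.

```python
# Hypothetical example: identical behaviour, structured two ways.

# Heavily decomposed: each piece looks trivially simple in isolation,
# but a reader must chase three extra definitions to understand one
# meaningful behaviour.
def _strip(name):
    return name.strip()

def _check_nonempty(name):
    if not name:
        raise ValueError("empty name")
    return name

def _check_length(name):
    if len(name) > 64:
        raise ValueError("name too long")
    return name

def validate_name_decomposed(name):
    return _check_length(_check_nonempty(_strip(name)))

# Cohesive: one function providing one meaningful behaviour, readable
# top to bottom without jumping around.
def validate_name(name):
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    if len(name) > 64:
        raise ValueError("name too long")
    return name
```

Locally each tiny function is simpler; globally there are now four names and
three call edges where one would do.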

A similar issue comes up once again in designing APIs: do you go for minimal
but complete, or do you also provide extra help in common cases even if it is
technically redundant? The former is “cleaner”, but in practice the latter is
often easier to use for those writing a client for that API. Smaller isn’t
automatically better.
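A minimal sketch of that API trade-off, with invented names (assuming a
trivial key-value store): the convenience method is technically redundant,
but it saves every client from re-writing the same few lines.

```python
# Hypothetical key-value store: a minimal-but-complete core, plus one
# technically redundant helper.
class Store:
    def __init__(self):
        self._data = {}

    # Minimal, complete primitives.
    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    # Redundant convenience: expressible via get() + set(), but clients
    # reach for this common case constantly.
    def get_or_set(self, key, default):
        if key not in self._data:
            self._data[key] = default
        return self._data[key]
```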

~~~
MeteorMarc
The book "A Philosophy of Software Design" should interest you then:
[https://www.amazon.com/t/dp/1732102201](https://www.amazon.com/t/dp/1732102201)
It argues, among other things, that deep interfaces matter more than code
complexity inside a module.

~~~
Chris_Newton
That one was the first software book I read in a while where I got to the end
and felt like if I wrote a book myself then that is very close to what I would
want it to say. I highly recommend it to anyone who has built up a bit of
practical programming experience and wants to improve further.

------
nemo1618
Even though the results weren't terribly illuminating, I have to give the
author a lot of credit for even attempting to do a proper experiment like
this. So much of our programming dogma is based on gut feelings ("it looks
cleaner") rather than empirical data and peer-reviewed studies. We have very
vague notions of what works, and even vaguer notions of _why_ those things
work.

~~~
hinkley
I’ve come to the conclusion that half or more of the rules we have about
“clean” are about avoiding merge conflicts. Few things have been consistently
disappointing to me as the inability of coworkers and myself to reason about
merges correctly. There are three hard things in software and merges are #3.

If anyone ever figures out how to make merges Just Work, then I expect a _lot_
of pressure toward decomposition over locality would be reduced, and much of
the rest would be to facilitate testing.

~~~
hotcrossbunny
Can you elaborate a little please? It's unclear to me whether you are talking
about merging data, code changes, or something else.

~~~
Chris_Newton
I’m assuming it was a reference to merging in source control. A lot of “noise”
in diffs, and by extension in merges and the sometimes awkward job of
resolving merge conflicts, comes from little details like whitespace and
punctuation rather than substantial semantic changes in the code. Many a
coding standard, and even a language change from time to time, has been made
with this in mind, sometimes to the point of putting punctuation in odd places
or avoiding aligning items using extra whitespace just to minimise the number
and/or size of diffs to check.
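One common example of such a punctuation convention (a sketch; the list
contents here are invented): trailing commas, which keep the diff for an
appended item down to a single added line.

```python
# With a trailing comma on every item, appending "xml" adds one line to
# the diff. Without it, the previous last line would also change (to
# gain a comma), doubling the apparent size of the change and creating
# an extra chance for a merge conflict.
SUPPORTED_FORMATS = [
    "json",
    "yaml",
    "toml",
]
```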

------
clarry
I wish for a future where we can have more than one concurrent view of the
same code. Structure need not be derived from mere files, newlines, and a
handful of semantic organizational elements (function, class, module).

The current way of doing things forces us to make a compromise between
prioritizing the forest over the trees, or vice versa. Programming languages
are largely concerned with the trees' bark. But to make good software, you
need to see and understand both, so the compromise is always a problem.

The solution probably needs a large-scale re-imagining of how compilers,
languages, version control, and editors/IDEs work (which also requires
accepting that working with a flat-file text editor won't work -- a bitter
pill to swallow for someone like me who likes the simplicity of plain text
editors).

I have some (very vague) ideas, but gosh, how do I find the time to experiment
and refine or reject them...

~~~
marcus_holmes
I love this. Currently working on a file storage system that gets away from
folders, and that's hard because everyone has folders hard-wired into their
brains because history.

Functions shouldn't live in files, for a start. Files are an artefact of
storing code in a file-based storage system, and have nothing to do with code
architecture. Creating a code editor that stopped working with files and only
worked with functions would be interesting as a start on this, I think...

~~~
skohan
File systems aren't without their advantages. One huge advantage is that text
files are extremely un-opinionated about how they're used. If my project
exists as a tree of directories with text files inside, there are a ton of
tools which can operate on them without any knowledge of my program or even
programming language. I can open them in vim or my favorite IDE, dump them to
the console with cat, manage versions with git and so on. Basically text files
are one of the fundamental building blocks of *nix so having my project
represented as files means I can leverage decades of tooling.

It's not to say that it couldn't work to have a program represented as some
kind of a database or API, but that would imply much tighter binding between
tools and their storage representation.

~~~
marcus_holmes
interesting. But if you assume that functions don't intrinsically live in
files (they only do so because we have a file-based storage system), and that
functions actually live in, say, scopes, then what does that do to your
tools?

Can we have a Vim that understands (e.g) scopes natively rather than files?

~~~
skohan
> Can we have a Vim that understands (e.g) scopes natively rather than files?

Sure we can have a vim that does that. But as I say, it would require tighter
binding between the tooling and the code representation.

Right now vim only has to understand code as lines of text separated by
spaces, newlines, and tabs. The semantics of that code are the business of the
build system and the compiler. The same goes for git. As a result, tools like
git and vim can operate on code of _any_ language which is represented as
text. That could be a popular language like Java or Go, or some weird
experimental language you dream up yourself.

If, as you suggest, the storage representation of the language were tied to
the semantics of the language, rather than some external format, then all the
tools need to have a deeper understanding of the language itself in order to
operate on that storage.

You could try to make it general: i.e. design an organizational structure
based on "scopes" which should apply to all languages, but then what if a
language comes along which doesn't fit neatly into the "scopes" paradigm? Now
you put yourself into a position where you might be making language design
decisions which are based on what's possible with the tooling, rather than
what's the best possible choice for the language?

Decoupling the storage method from the semantics of the language obviates
these problems.

~~~
marcus_holmes
thanks for the answer, that's interesting.

We do have this to a certain extent now, though - file scope is a thing in
some languages.

I'll give up my plan to write a neovim plugin for scope management, though ;)

------
marcus_holmes
I've become a big fan of not worrying about architecture until the rewrite.
The first version is always an exploration of the problem domain, and treating
it as that has always made my projects go quicker.

This is going to trigger some people, so here's some caveats:

\- there's always a rewrite. Even with perfect architecture. Usually because
nobody understands the problem domain until there's been an exploration of it
with a first attempt (occasionally for other reasons). A few have two
rewrites. And that's not a bad thing. Starting again with better knowledge can
make the whole project go quicker, because there's less chance of ending up in
the situation TFA talks about ("we have to refactor because tech debt").

\- architecture needs to be shaped by the problem domain. There isn't a "best"
architecture, so picking one requires knowledge of what the code needs to do.
And that needs an understanding of the problem. No-one understands the problem
from a technical point of view until/unless they've tried writing a program to
solve it.

\- a lot of features of architecture (like choosing to DI the database
engine, instead of committing to an engine because it's clearly the right
choice) exist because the devs don't have enough knowledge to make an
architectural decision when they write the code. It's interesting to see how
many of these disappear on the rewrite. It's always more efficient (in both
performance and development time) to make these decisions concretely, but
making them is difficult without enough information about the problem.

\- never underestimate the power of a monolith with good file structure.
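The dependency-injection point above can be sketched like this (hypothetical
classes; a trivial in-memory store stands in for a real engine):

```python
# Hypothetical: injecting a storage backend keeps the engine decision
# open, at the cost of an extra layer of indirection.
class UserService:
    def __init__(self, storage):  # any object with get()/put()
        self._storage = storage

    def rename(self, user_id, name):
        user = self._storage.get(user_id)
        user["name"] = name
        self._storage.put(user_id, user)

# A stand-in backend; in the first version this keeps options open.
class DictStorage:
    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows[key]

    def put(self, key, row):
        self._rows[key] = row
```

On the rewrite, once one engine is clearly right, `UserService` can depend on
it directly and the indirection disappears.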

~~~
harimau777
The issue that I see with this is that, even if they say otherwise during the
first version, when it comes time for the rewrite the powers that be often
(usually?) aren't willing to support it.

~~~
marcus_holmes
The "powers that be" are non-tech-aware. They care about results, not nerds
pushing the nerd buttons (I paraphrase).

They literally have no clue about what they're asking for, and just have to
hope that the people doing the coding can deliver what they want. There's no
backup, no "plan B", no way of delivering this without relying on the devs to
deliver. So, who cares what they think?

You can literally say to them "we can continue like this, but because of tech
debt it'll take 6 months, or we can rewrite in 3 months". And who's to say
you're wrong? I've had more than one project do that.

The truth is that no-one knows how long any of this takes. Not the devs, not
the project manager, not the CEO. It's always a rough guesstimate, and the
estimates only get better with more information. Smart non-tech managers get
this, and deal with it. Stupid non-tech managers try to control it and create
deterministic outcomes from the non-deterministic process that is software
dev. That always fails.

So, yeah, the "powers that be" need to grok the nature of the thing they're
trying to do before saying "you can't do a rewrite even if you think that'll
be quicker"

------
harimau777
I have a pet theory that there are two different ways that people think about
and approach programs.

Group 1 likes highly decomposed programs, which they feel results in clearer
code since hiding the details makes it easier to focus on the behavior.

Group 2 likes to keep code together which they feel results in clearer code
since the details of the implementation are readily apparent.

I suspect that these groups may correspond to the Artist versus Hacker groups
in this article
[https://josephg.com/blog/3-tribes/](https://josephg.com/blog/3-tribes/). I.e.
do you view writing code as primarily about expressing intent or primarily
about controlling technology?

The conclusion that I draw from all of this is that these are likely
fundamental differences that may even result from how different people are
genetically wired to think. Therefore, I think that any solution should find a
way to satisfy both groups. On the other hand, problems arise when, for
example, people in group 2 dismiss the needs of people in group 1 by declaring
that organizing the code is premature optimization and YAGNI.

~~~
cjfd
I am not so sure that pitting two tendencies against each other is such a
good idea. The thing is that good programming is somewhere in the middle of
all of these things, because if any of these tendencies goes too far we run
into problems. I think we should all be able to belong to each of these three
tribes depending on the circumstances.

~~~
harimau777
I definitely agree! Going too far in one direction or the other is likely to
both result in poorer code and to antagonize whichever side isn't compatible
with that approach.

I think what I was trying to get at is that one of the reasons that teams
often don't find balance is because the differences are dismissed as being
just differences of opinion. I was trying to show that they are often much
more significant than that since they can make it difficult for one side or
the other to understand and work with the codebase.

------
Rapzid
Not everyone has a master craftsman in them. Some people will show up in a new
code base and need to do something; their first instinct will be to look
around and try to fit their change in with the established conventions.

They are the minority.

Most will show up and handjam their change in the only way they know how.
There will be no concern for the forest. Their job is processing trees after
all.

This is something that was on my mind in the Google PR review thread. Not
everyone is a "peer" in code reviews. There will be a certain cabal on equal
footing, but there will be many more people who are simply contributors.

This is where people like the author come in; Project leads.

~~~
hinkley
What gets me is that people are willing to write the same code dozens of
times. It’s just a tool in their toolbox. It never seems to occur to them that
our job is substantially about automating predictable things.

------
Antoninus
“I’d really like to get away from the opinions and be able to say with
confidence that one design is better than another. Or, at the very least,
understand the trade-offs being made.” As I’ve taken more leadership in
architectural decisions, this is one of the skills that's helped the most.
Having most of the data regarding tradeoffs before making a commitment has
steered projects from disaster.

------
cryptica
These days, when I start a new project, I think of my code as a tree. I start
at the trunk and write the branches.

Each kind of state change needs to flow through the code in a consistent
direction to avoid unexpected state mutations (like sap flows through a tree).

Another developer should be able to understand all the main parts of my
program just by looking at the main entry point/file (the trunk of the tree).

Also, no dependency injection should be used; all dependencies need to be
listed explicitly and be traceable to their source files. Dependencies need
to either be explicitly imported where they are used or passed down through
the branches explicitly via method or constructor arguments. Traceability is
very important.
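A minimal sketch of that style, with invented names: everything is
constructed at the entry point (the trunk) and handed down explicitly, so
every dependency can be traced from `main()` with no framework in between.

```python
# Hypothetical leaf dependency.
class Mailer:
    def send(self, to, body):
        return f"sent to {to}: {body}"

class SignupHandler:
    def __init__(self, mailer):  # dependency passed explicitly
        self._mailer = mailer

    def signup(self, email):
        return self._mailer.send(email, "welcome")

def main():
    mailer = Mailer()                # the trunk: all wiring in one place
    handler = SignupHandler(mailer)  # branches receive what they need
    return handler.signup("a@example.com")
```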

About classes/abstractions, they should be easy to explain to a non-technical
person. If you can't explain a class or module to a non-technical person, it
shouldn't exist because it is a poor abstraction.

~~~
ramchip
> Also, no dependency injection should be used; all dependencies need to be
> listed explicitly and be traceable to their source files. Dependencies need
> to either be explicitly imported where they are used or passed down through
> the branches explicitly via method or constructor arguments.

Isn't the latter precisely dependency injection?

[https://en.wikipedia.org/wiki/Dependency_injection#Construct...](https://en.wikipedia.org/wiki/Dependency_injection#Constructor_injection)

~~~
verdagon
Yep, I think GP meant DI frameworks.

------
29athrowaway
How does your code structure help you against these situations:

\- Version control conflicts: if developers are editing the same files all
the time, there will be more conflicts and therefore more work to resolve
them, such as merging, re-testing, fixing bugs caused by a bad merge, re-
attempting the merge, etc.

\- Code so complicated that it is easy to misunderstand and becomes a source
of an unusually large number of bugs.

\- Code so complicated that it cannot be reliably tested without spending an
unreasonable amount of time or relying on opaque testing methods.

\- Code so complicated that it increases dependency on specific team members,
usually the authors, so that the team cannot function optimally if they're
unavailable or unwilling to collaborate.

\- Code so complex that it is impossible for an engineer to determine whether
the system is in a healthy state, diagnose a problem, or obtain reproduction
steps from a bug report...

\- Code so poorly organized that developers fail to find implementations for a
particular problem, causing them to implement the same thing again.

\- Having multiple variations of the same code, so when a bug is found you may
have to refactor multiple versions of the same code to fix the problem, if you
manage to find them all.

And the list goes on and on. Solutions to these problems often come down to
how code is structured, and to conventions and good practices.

If I see a piece of code that needs to know about 40 classes and 50 methods to
produce a result, I know that it is likely going to be a pain to maintain.
It's not subjective.

If I see a function with 1000 lines of code and a cyclomatic complexity of
500, I know that it may take at least 500 test cases to test it and will be a
pain to maintain in a way that doesn't break. That is not subjective.
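As a rough sketch of where such a number comes from: cyclomatic complexity
can be approximated as one plus the count of branching constructs. The node
list below is a simplification of what real tools count, and the sample
function is invented for illustration.

```python
import ast

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of branch points."""
    tree = ast.parse(source)
    branches = (ast.If, ast.For, ast.While, ast.ExceptHandler)
    return 1 + sum(isinstance(node, branches) for node in ast.walk(tree))

SAMPLE = """
def classify(x):
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    return "positive"
"""

# Two `if` statements give a complexity of 3, so full branch coverage
# needs at least 3 test cases; at a complexity of 500 the same
# arithmetic demands at least 500.
```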

------
loopz
There is a huge amount of artisan streak to programming, vanity even.

Having bet incredible amounts of effort and time, we tend to double down for
as long as we can before considering alternatives.

There's also the tendency to choose our favourite hammer, it worked so well in
the past!

------
joatmon-snoo
One thing that I've noticed in my time working in large Java server codebases
is that there seem to be a number of broad categories of code (not mutually
exclusive, just one breakdown, and not exhaustive):

* framework - dictating how people should do things, like request handlers or how work is scheduled

* feature - making something new work

* wiring - the binary had this information in it in this codepath, but we also need it in this other place...

And I have found that DI tends to be the magical "will write code for you"
thing that mostly replaces the third one.
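The "wiring" category can be sketched like this (hypothetical functions): a
value known at the top of a call chain is needed at the bottom, and every
layer in between exists only to thread it through. A DI container largely
generates this threading for you.

```python
# Hypothetical: request_id exists at the top of the chain but is only
# used three layers down; the middle layers are pure wiring.
def handle_request(request_id, payload):
    return process(request_id, payload)   # wiring only

def process(request_id, payload):
    return store(request_id, payload)     # wiring only

def store(request_id, payload):
    return f"{request_id}:{payload}"      # actual use
```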

