
Happiness is a freshly organized codebase - felixrieseberg
https://slack.engineering/happiness-is-a-freshly-organized-codebase-7ffa6590a70d
======
ryanianian
The concept of linting repo structure is an excellent one.

I cannot even get started in projects without a sane repo layout. Source files
scattered everywhere unrelated to each other, utils junk-drawers with dozens
of files, multiple top-level source directories without explicit rationale,
tests and source totally disjoint, hacks to modify build and run paths,
implicit dependencies between directories, awful convoluted build-system
configuration to match all of these idiosyncrasies, and impossible or very
difficult editor/IDE integration as a result. Even Rails-esque apps (which come
with a reasonable structure) get really messy really quickly if somebody
doesn't have the diligence to stay on top of it.

I want there to be a {sane, polyglot, large-ish-scale} project layout
convention that plays well with {intuition/discoverability, build-systems,
editors, project size}, but it seems there's an inherent tension between
these. Maven tried. Kinda. I wonder if there's a fundamental problem we aren't
solving somewhere.
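
To make the "linting repo structure" idea concrete, here is a minimal sketch in Python. It is not the tool from the article: the rule set (`ALLOWED_TOP_LEVEL`, `JUNK_DRAWER_NAMES`, `MAX_FILES_PER_JUNK_DIR`) and the function name `lint_layout` are hypothetical and would have to be tuned per project.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical rules, chosen only for illustration.
ALLOWED_TOP_LEVEL = {"src", "tests", "docs", "tools"}
JUNK_DRAWER_NAMES = {"utils", "util", "helpers", "misc", "common"}
MAX_FILES_PER_JUNK_DIR = 5


def lint_layout(paths):
    """Return a list of human-readable layout violations for repo-relative paths."""
    problems = []
    junk_counts = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        parts = path.parts
        # Rule 1: files inside a directory must sit below an approved top-level one.
        if len(parts) > 1 and parts[0] not in ALLOWED_TOP_LEVEL:
            problems.append(f"{raw}: unexpected top-level directory '{parts[0]}'")
        # Rule 2: count files that sit directly in a junk-drawer directory.
        if len(parts) > 1 and parts[-2].lower() in JUNK_DRAWER_NAMES:
            junk_counts[str(path.parent)] += 1
    for directory, count in sorted(junk_counts.items()):
        if count > MAX_FILES_PER_JUNK_DIR:
            problems.append(f"{directory}: {count} files in a junk-drawer directory")
    return problems
```

Feeding it the output of a file listing (for example, the tracked paths of a repo) in CI or in a local hook would flag new top-level directories and overgrown junk drawers before they pile up.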

~~~
smcameron
On the flip side, reorganizing a codebase totally fucks the git history.

~~~
vimax
Just make sure to use 'git mv' for all your files so the history can be
tracked. 'git log myfile' will only show the history for that file, but 'git
blame' will follow the line changes across file moves.

That aside, there are so many productivity and tooling gains from organizing
your project properly that it is worth the history discontinuities.

Also, I've found that the worse the project organization is, the less likely
the team is to separate out different fileset changes into different commits
to keep a useful history, or to make use of the git history at all outside of
looking at a few recent commits or sometimes looking at the last release.
I've heard this as a very circular argument: can't re-org because we lose
history; can't maintain good history because the project is too disorganized.

~~~
nlawalker
This might not be the forum for it, but I have a question about this - I
thought git was supposed to be pretty smart about detecting renames, but in my
experience (on Windows) it _never_ does, and I always have to use `git mv`.
What's the deal with this?

~~~
ThePadawan
Even if rename detection is enabled, git will only try to identify renames if
it wouldn't take too much time (since IIRC it's an O(#files^2) operation).

Thus, there are config options "diff.renameLimit" and "merge.renameLimit" that
tell git to only detect renames if fewer than N files are affected.

I have personally been burned by this before, and there usually is a default
value set in most git clients.

See also some discussion at
[https://stackoverflow.com/a/7831027/1237375](https://stackoverflow.com/a/7831027/1237375)
.
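
To see why the cost is quadratic and why a limit helps, here is a toy sketch in Python. It is not git's actual algorithm: the similarity measure (`difflib.SequenceMatcher`), the way the limit is applied, and the values of `rename_limit` and `threshold` are all stand-ins chosen for illustration.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, rename_limit=400, threshold=0.5):
    """Pair deleted paths with added paths whose contents look alike.

    `deleted` and `added` map path -> file content. The limit and the
    threshold are illustrative values, not git's real behavior.
    """
    # Every deleted file is compared against every added file, so the work
    # grows with len(deleted) * len(added). Give up when that gets too big.
    if max(len(deleted), len(added)) > rename_limit:
        return None  # detection skipped: files show up as plain add/delete

    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in unclaimed.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unclaimed[best_path]
    return renames
```

A big reorganization pushes the number of candidate files past the limit, which is the case where detection silently stops and the moves look like unrelated deletions and additions.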

------
bob1029
I feel like a lot of the pain in codebase organization boils down to having a
technical project structure (i.e. layout on disk) that does not align well
with the business. Obviously, it's impossible to force something to directly
align with such disparate and abstract requirements, so you have to create
abstractions (layers) to enable a hypothetically-pure realm.

In our architecture, we've created roughly 2 different kinds of abstraction:
Platform and Business.

The Platform abstractions are only intended to support themselves and the
Business abstractions. These are not supposed to be easily-accessible by non-
technical folks. Their entire purpose is to make the Business abstractions as
elegant as possible. The idea is these should be slowly-changing and stable
across all use cases. Developers are only expected to visit this realm maybe
once every other week. We effectively treat ourselves as our own customer with
this layer.

The Business abstractions are built using only first-party primitives and our
Platform layer. Most of this is very clean because we explicitly force all
non-business concerns out into the Platform layer. Things like validation
logic are written using purely functional code in this realm, which makes it
very easy to reason about. Developers are expected to live in this area most of
the time. Organization of business abstractions typically falls along lines of
usage or logical business activity. We explicitly duplicate some code in order
to maintain very clean separation of differing business activities. I've
watched first hand as the subtle variances of a single combined Customer model
across hundreds of contexts eventually turned it into a boat anchor.
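
As a rough illustration of what "purely functional" validation in a business layer could look like, here is a sketch in Python (the stack described here sounds like .NET, so this is only an analogy). `OrderRequest`, `validate_order` and `MAX_QUANTITY_PER_ORDER` are hypothetical names, not taken from the actual codebase.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderRequest:  # hypothetical business input, invented for this example
    customer_id: str
    quantity: int
    unit_price_cents: int


# Business parameter that a non-developer could plausibly review and adjust.
MAX_QUANTITY_PER_ORDER = 100


def validate_order(request: OrderRequest) -> List[str]:
    """Pure function: same input, same output, no I/O and no shared state."""
    errors = []
    if not request.customer_id.strip():
        errors.append("customer_id is required")
    if not 1 <= request.quantity <= MAX_QUANTITY_PER_ORDER:
        errors.append(f"quantity must be between 1 and {MAX_QUANTITY_PER_ORDER}")
    if request.unit_price_cents < 0:
        errors.append("unit_price_cents must not be negative")
    return errors
```

Nothing here touches a database, a logger or an HTTP client; in the split described above, those concerns would live in the platform layer.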

As a consequence of all this, our project managers and other non-code-experts
are actually able to review large parts of the codebase and derive meaningful
insights without a developer babysitting them. We are at a point where project
managers can open PRs for adjusting basic validation and parameter items in
our customers' biz code. Developers also enjoy not having to do massive
altitude shifts on abstractions while reading through 1 source file. You
either spend a day working platform concerns, or you are in the business
layer. Context switching sucks and we built to avoid it as much as possible.
IMO, having a bad project structure is one direct side-effect of being forced
to context switch all the time.

~~~
ajsharma
Wow, I love this idea. I'm curious how you have that structured within the
code base. Is it literally a platform folder and a business folder?

~~~
bob1029
Pretty much. We have an issue in our backlog right now that will pull the
platform concern into a completely different project/DLL from the rest of the
biz code. We still have a little bit of coupling, but we are 99% of the way
there. Our long-term goal is to produce a company-common platform layer that
can be used to build a wide range of final products. Most business
applications that we would build share a lot of common concerns - namely how
to manage business state, client views and transactions with external business
systems. This is all implemented in various services within our platform layer
so we rarely think about it. When we integrate with a 3rd party vendor's API,
it goes into platform so anything can now use that integration.

The crazy thing I've come to realize is that the journey doesn't have to end
there either... You can build yet-higher-order abstractions on top of your
platform layer (i.e. a platform for platform). I don't know where this all
ends up, but 1 or 2 iterations of it have been extremely healthy for our
architecture and business use cases so far. We are now able to chain together
extremely complex business processes in ways that can be reasoned about in a
single POCO mapper. Without a separation of the "noise" of platform-related
and other lower-order code from the business code, it would become impossible
to see these opportunities.

~~~
thomk
Bob can you give more details about this or is there a way to contact you? I
am genuinely curious about this type of architecture and would love to learn
more. If you would rather respond here could you give two simple, concrete
examples of each case?

~~~
bob1029
Based on the responses here, it is apparent that I should spend some time
documenting this concept in more detail for the greater good. I do not have
the bandwidth for this right now, but perhaps in a few weeks I'll have time to
put together some realistic examples for a proper Show HN submission.

------
rudi-c
I've encountered similar difficulties around codebases with a lack of file
hierarchy structure. But one major difficulty in fixing the issue is that
moving a lot of files around tends to trash `git blame`, which is often more
valuable than knowing what folder to put a new file in. Is that something
you've encountered?

~~~
ryanianian
There are workarounds to get git to search harder or to commit things in a way
that's helpful for large-scale file-shuffles, but to be honest I've rarely
found this to be enough of a reason to not move things around. The end result
is a much more productive and purposeful place for code to live and grow. This
said, moving files and moving code within those files at the same time is a
recipe for confusing yourself and git. Move files first, then content.

~~~
nitrogen
A few options to look at specifically: --follow, -M, and -C. Less related to
refactoring, I also find --first-parent and --merges useful.
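
For reference, a small Python sketch of how those flags are typically combined. The helper names `history_commands` and `run` are made up for this example, and pairing -M/-C with `git blame` rather than `git log` is an assumption about what was meant.

```python
import subprocess


def history_commands(path):
    """Return git invocations that look through renames and copies for `path`."""
    return {
        # Keep listing commits past the point where the file was renamed.
        "log_follow": ["git", "log", "--follow", "--", path],
        # Detect lines moved within a file (-M) and lines moved or copied
        # from other files changed in the same commit (-C).
        "blame_moves": ["git", "blame", "-M", "-C", "--", path],
        # Follow only the first parent of each merge commit.
        "log_first_parent": ["git", "log", "--first-parent", "--oneline"],
        # Show only merge commits.
        "log_merges": ["git", "log", "--merges", "--oneline"],
    }


def run(command, repo_dir="."):
    """Run one of the commands inside a repository and return its output."""
    result = subprocess.run(
        command, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run(history_commands("src/app.py")["log_follow"])` would print the history of a hypothetical `src/app.py` across its renames, provided it is run inside a git repository.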

------
kccqzy
I sometimes wish Git properly recorded copy and move information. It turns out
Git's heuristics for detecting copies and moves work about 95% of the time,
and for the remaining 5% it's mildly annoying to read a git blame that
attributes every line to the move. You can blame further across that move, but
that's manual. If copy and move information were perfectly recorded, I'd have
more incentive to do these kinds of code reorganization.

~~~
ryanianian
This article mentioned a while ago had relevant info:
[https://news.ycombinator.com/item?id=22689301](https://news.ycombinator.com/item?id=22689301)

In my experience, ensure you don't change the code as you move it, don't
rename files and move content at the same time, and don't move "too much" code
in a single commit. Around 2k lines at a time seems to be a good number. Maybe
some languages/structures are easier for git to analyze when doing blames.
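
The "not too much per commit" rule of thumb can be sketched as a small planning helper in Python. The function name `batch_moves` and the input shape are hypothetical; the 2000-line default simply mirrors the number above.

```python
def batch_moves(moves, max_lines=2000):
    """Group (old_path, new_path, line_count) moves into commit-sized batches.

    Files are kept whole; a file larger than `max_lines` gets its own batch.
    """
    batches, current, current_lines = [], [], 0
    for old_path, new_path, line_count in moves:
        # Start a new batch when adding this file would exceed the budget.
        if current and current_lines + line_count > max_lines:
            batches.append(current)
            current, current_lines = [], 0
        current.append((old_path, new_path))
        current_lines += line_count
    if current:
        batches.append(current)
    return batches
```

Each batch would then become one move-only commit, with content edits such as import fixes left for follow-up commits.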

~~~
kccqzy
How do you not change the code as it's being moved, when you need to update
#include directives, imports, and other file paths?

~~~
derefr
It's easy if not every commit needs to compile. :)

------
dangwu
Interesting read!

> No more dumping ground folders like “Helper” or “Utility”

If you’re organizing by feature and you have one of these helper/utility
classes that support multiple features, where does it live? Would you consider
each utility to be its own “feature”?

~~~
_bxg1
I actually think it's vitally important to have "dumping ground" locations for
things that aren't fully figured-out yet. If I'm working on a new thing and I
have something that's relevant in multiple files but I'm not sure where it
should go, or even whether it will stick around, I don't want to have to come
to a screeching halt just to make a taxonomic decision about something that's
still a work in progress. The key, of course, is going back and categorizing
those things later once you _do_ have an idea of where they should go.

~~~
allenu
Totally agree. So much of programming and designing is figuring out what
pattern your code is falling into and what stuff "is" or "means".

I believe it's totally okay to name things utilities when you haven't yet
established a common pattern or understanding of it. Naming things is hard and
spending so much time on it and organizing things can often block you from
progressing to a place where you DO have more information and can
intelligently name things.

You'll always be juggling unknowns, so it's okay to have dumping grounds here
and there, so long as over time you clean them up as you gain more info.

~~~
_bxg1
I recently started working within a Python codebase, and one of the things I
really like about it (not sure whether this is standard Python practice) is
that most directories have a "common.py" file in them. So if you just want to
put something somewhere real quick, you can elevate it to exactly the
appropriate directory-level instead of going to a single, global "utils" file.
It's a neat pattern.

------
nickjj
For new projects (especially as a solo developer), there's also a related
topic of not taking advantage of tools like kanban boards, or having a place
outside of your source code to organize your thoughts and research.

It's very possible to wind up with massive comment dumps of things to
research, alternate implementations, notes to yourself and other things
littered in your code base where you haven't made any git commits yet.

This really leaves things in a messy state where you feel like the project is
never going to be finished.

An example of this and how I solved this problem with a kanban board can be
found here:
[https://youtu.be/HHOkcCqsipE?t=76](https://youtu.be/HHOkcCqsipE?t=76)

------
rukittenme
I've been wondering a lot recently if folders make projects better or worse.

For example, when I write a library it's usually very simple. There's a single
directory which contains all of the source files. When people use the library
they:

"""

import lib

lib.run()

"""

Dead simple, no complex module paths to remember, no hierarchical folder
structure forcing you to code based on a pattern rather than functionality.
Pure bliss.

But on the other hand, I have projects that contain 100k lines of source code.
I can't just leave it out in the cold. So poor baby gets a couple of folders.

But I do hate it. I hate writing the code. I hate reading it. I hate finding
it 6 months after the fact.

That's probably just the nature of the job. It is work at the end of the day.
Maybe it's just doomed to be hard.

~~~
osener
I'm not plugging the language, but I've come to appreciate OCaml's module
system with no imports and (mostly) globally unique module names. No circular
dependencies allowed either. You can have multiple modules within a file
(which is also a module named after the file name).

I structure larger projects as libraries with minimal dependencies that depend
on one another, and dump all my modules with descriptive file names under the
same directory within the library.

I vaguely remember reading something that hinted at Facebook doing something
similar with their React components.

------
ldd
There is this wonderful utility: dependency-cruiser[0] for javascript /
typescript projects.

It visualizes dependencies in a project. I found it so, so easy to refactor and
move files around after I started using it. I am not usually a
visually-oriented person, but for this use case, and to be happy,
`dep-cruiser` surely helps.

[0]: [https://github.com/sverweij/dependency-cruiser](https://github.com/sverweij/dependency-cruiser)
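
dependency-cruiser itself targets JavaScript and TypeScript. Purely to illustrate the underlying idea (know who imports what before moving a file), here is a much smaller stand-in for Python sources built on the standard `ast` module. The function names are hypothetical, and relative imports such as `from . import x` are ignored in this sketch.

```python
import ast


def import_graph(sources):
    """Map each module name to the sorted list of modules it imports.

    `sources` maps module name -> Python source text.
    """
    graph = {}
    for module, text in sources.items():
        imported = set()
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                imported.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module)
        graph[module] = sorted(imported)
    return graph


def dependents_of(graph, target):
    """List the modules that would need updating if `target` moved."""
    return sorted(m for m, deps in graph.items() if target in deps)
```

Calling `dependents_of(graph, "app.common")` before a move gives the list of files whose imports will need touching afterwards.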

------
battery_cowboy
What if we just got rid of files and put our code into a database?

You'd just have a "new code block" button, which creates the editor tab for
your code, usually a function or a class, and usually one item per block. When
you save it, it puts it into a database and you can version things easily. You
can call other functions from the block and your editor will show their code
when you mouseover or maybe some other method, just like today. Basically the
same as today, but you don't need to worry about where some code lives. You
back it with a great search feature to find stuff.

Hell, let's just eliminate pathed files, why do we care about file paths with
the level of search today? Just store everything in a key:value store directly
on the hard drive, no paths needed. For legacy, just add keys for '/etc/fstab'
or whatever.
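
A minimal sketch of the "code blocks in a database" idea, using Python's built-in `sqlite3`. The table layout and the helper names (`open_store`, `save_block`, `load_block`, `search_blocks`) are assumptions made up for this example, and plain `LIKE` matching stands in for the "great search feature".

```python
import sqlite3


def open_store(path=":memory:"):
    """Create a tiny versioned store for named code blocks."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        " name TEXT NOT NULL, version INTEGER NOT NULL, body TEXT NOT NULL,"
        " PRIMARY KEY (name, version))"
    )
    return db


def save_block(db, name, body):
    """Store a new version of a block and return its version number."""
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM blocks WHERE name = ?", (name,)
    ).fetchone()
    db.execute("INSERT INTO blocks VALUES (?, ?, ?)", (name, latest + 1, body))
    db.commit()
    return latest + 1


def load_block(db, name, version=None):
    """Return a block's body; the newest version unless one is requested."""
    if version is None:
        row = db.execute(
            "SELECT body FROM blocks WHERE name = ? ORDER BY version DESC LIMIT 1",
            (name,),
        ).fetchone()
    else:
        row = db.execute(
            "SELECT body FROM blocks WHERE name = ? AND version = ?", (name, version)
        ).fetchone()
    return row[0] if row else None


def search_blocks(db, text):
    """Find names of blocks where any stored version mentions `text`."""
    rows = db.execute(
        "SELECT DISTINCT name FROM blocks WHERE body LIKE ? ORDER BY name",
        (f"%{text}%",),
    )
    return [name for (name,) in rows]
```

Every save creates a new version instead of overwriting, so per-block history comes for free; what this sketch does not answer is how builds, diffs and code review would work without paths.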

------
peter_d_sherman
> _" The Slack iOS team lived in these conditions for a few too many years. We
> got here as a result of some attempts to organize source files (several
> times), a lack of architecture pattern in the codebase, and a high growth of
> developers over a couple years. To put things into context, we have roughly
> 13,000 files (and counting), about 27 top level directories, a mix of files
> in Objective-C and Swift, and around 40 iOS developers that work in one
> monorepo."_

An extensive, unrefactored codebase is no different than a jungle.

You might have a 10x'er programmer on your staff, and he might hold the
programming equivalent of a machete, but if the rate of his refactoring
(assuming your corporate rules let him) is slower than the rate of new code
being added by other employees, he is going to fail, no matter how good he is!

I need to write an essay someday about how a 10x'er is only as good as a
combination of factors: how well the codebase is refactored, how well they
know the codebase, how much corporate rules/policies permit refactoring (or
not), how much time they don't have to waste solving stupid one-time issues
from single customers, and how much help or pushback they get from the rest
of the team.

In other words, given the right set of conditions between codebase size and
obfuscation, limiting corporate policies (i.e., "you can't refactor", "you
can't make just one mistake in your code, because it's all mission critical,
and if you do, you will be fired, and by the way, there's no test
environment!"), distraction ("you have to help this customer with his cosmetic
problem before you are permitted to tackle the guts of the system"), and
pushback from the rest of the team, you can actually change 10x'ers (and
higher!) to 1x'ers and below...

The reverse is true too...

I'll make anyone a "My Fair Lady" / Eliza Doolittle style bet (or the
reverse!) that what I say is true!

That is, that 1x'ers can be taught to become 10x'ers, and conversely, 10x'ers
can be hampered by a variety of factors ("The Perfect Storm") resulting in
them being slowed to 1x'ers, or below...

------
stefan_
_Danger is a tool we integrated into our Continuous Integration system that
performs post-commit automated checks_

You mean like the _post-commit_ hook that Git offers out of the box? It's even
named the exact same! I feel like we don't focus all this time on fast build,
test, and deploy cycles only to then commit, navigate to some website to create
a "merge request", wait for some fairy to allocate computing resources for
trivial checks my computer could have done, only to get some pure noise "don't
put this file here" comment and repeat the cycle all again.

~~~
core-questions
When did you think I was going to get coffee and/or check Slack for more
annoying work to do? It's like you want me to get things _done_, which is not
fun at all.

------
soedirgo
In Go, there's a standard project layout [1]. It'd be nice to have a project
layout linter in Go Report Card [2].

[1]: [https://github.com/golang-standards/project-layout](https://github.com/golang-standards/project-layout)

[2]: [https://goreportcard.com](https://goreportcard.com)

~~~
dgellow
Just to clarify, that’s not an official standard at all (i.e., not supported by
the go team). It’s a community project to establish some common layout.

Surely valuable, but to say it is actually a Go standard is misleading IMHO.

~~~
soedirgo
Ah! I stand corrected. Mea culpa.

------
momokoko
Needing organized codebases is a personality type as there is zero academic
research I’m aware of that has ever shown strict organization has resulted in
fewer bugs or faster development.

These are all done by people that need to feel in control and that everything
has a place. So much money and time has been wasted on stuff like this with
zero proven or measured benefit.

~~~
BurningFrog
There is famously also no academic study showing that parachutes improve
outcomes when jumping out of a plane.

~~~
darekkay
There _is_ a study for that! "Parachute use to prevent death and major trauma
when jumping from aircraft: randomized controlled trial". [1]

[1]
[https://www.bmj.com/content/363/bmj.k5094](https://www.bmj.com/content/363/bmj.k5094)

