Code colocation is king (koenvangilst.nl)
169 points by vnglst on Feb 3, 2022 | 89 comments



Unfortunately, 'code that goes together' is rarely "all these 3 things go together", but more "a and b go together in one way, b and c in another, and a and c in yet another. Meanwhile, c is more naturally contextualised by d, which has nothing to do with a or b, so it should really be close to d and far away from a and b, but somehow still close", etc., ad nauseam.

So, really, the only way to keep functions linked is to not be lazy and to maintain proper documentation locally in each function, including a "see also" section.


I would put all those in the same file. I only turn functions into modules if they are imported/linked in other programs.


Maybe code in multiple dimensions? Metaverse!


Multi-process/threaded programming is multi-dimensional.


> keep the code that changes together close together.

Agreed. I do that, but never got around to factoring out a one-liner that expresses it.

I also tend to have fairly deep directory trees that reflect the code hierarchy/structure.

I will also factor out large chunks into standalone subprojects, and reinclude them as packages. This results in very high quality code, and also gives me dozens of libraries of tested, ship-quality, reusable code.

Another thing that I do, is have fairly large source files, that aggregate multiple classes/structs/enums/protocols that relate to each other. It drives me nuts to deal with the typical Java "Each class has its own file -no matter how small" thing.

I practice what might be called "eXtreme documentation". I rely on Jazzy (sort of like Doxygen) and, now, DocC, so every element in my code has a headerdoc comment. I make heavy use of the "// MARK:" annotation as well.

I write about that, here: https://littlegreenviper.com/miscellany/leaving-a-legacy/ (I need to add a DocC update).


> It drives me nuts to deal with the typical Java "Each class has its own file -no matter how small" thing.

Couple this with extensive use of inheritance and design patterns, and you have a recipe for awfulness. One of my previous teams had an implementation where rendering a piece of HTML would involve digging through about twenty different source files, each with maybe 3-4 lines of actual code other than the class definition boilerplate. One top-level line would send you down the inheritance hierarchy of ClassThatGetsData/ClassThatGetsDataPlusThisOneOtherThing/...PlusThisOtherOtherThing/... etc etc, then the same thing for ClassThatProcessesData/... and then ClassThatRendersData/...

Adding a single bit of data to rendered HTML (e.g. a tiny star for 'this product is well-reviewed') would involve altering every single source file in the multiple trees, and maybe adding some new specialisations to make the trees even deeper.

At the time, fresh out of Uni with a head full of design patterns, I thought this was just how enterprise code was meant to be structured!


> At the time, fresh out of Uni with a head full of design patterns, I thought this was just how enterprise code was meant to be structured!

Ugh... and that's the flaw in teaching design patterns.

Don't worry, that happened to me too, though I didn't go to a university.

Design patterns are an advanced tool. If you're learning to code, design patterns are mostly non-applicable to the kinds of problems that newbs are solving. Design patterns are meant for specific problems, but somehow we end up believing they're the be-all and end-all and that we must be using some design pattern in our work. If you have design patterns pounded into your head, and you believe the only design patterns that exist are the ones someone has already named, then not using a particular design pattern can seem like chaos even when it's not.

I wish we'd stop teaching that shit. If you've been coding long enough and faced complex enough problems, you'll either come to embrace some design patterns or you won't. Otherwise they'll likely just be misapplied.

OOP fits this view of mine as well. In general, I don't think most programmers need to be aware of OOP principles, because they will almost certainly misuse them. And to what end? Several single-purpose classes and "tiny functions" scattered across multiple files that others are now forced to jump between.


Most “design patterns” are actually just workarounds for lack of expressiveness in old Java versions.

In other languages, or in modern Java, you can replace half of them with simple constructs. The factory pattern, for example, can be replaced with inline callbacks or anonymous functions. Same with many others.
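
To make that concrete, here's a rough sketch in Python (the comment is about Java, but the idea carries over; Button and build_toolbar are invented names): any callable works as the "factory".

    # sketch: any callable works as the "factory" (names invented)
    class Button:
        def __init__(self, label):
            self.label = label

    class FancyButton(Button):
        pass

    def build_toolbar(make_button):
        # make_button is just a callable; no Factory class hierarchy needed
        return [make_button(label) for label in ("Open", "Save", "Close")]

    plain = build_toolbar(Button)  # the class itself acts as the factory
    fancy = build_toolbar(lambda label: FancyButton(label))  # anonymous function as factory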


It takes a long time for some people to understand that complexity is the enemy; and for patterns in particular it takes some people a long time to learn when to apply them.

Same thing for algorithms: Bloom filters are the classic example. You can tell when they've hit the front page of Hacker News recently, because suddenly everyone starts looking for problems they can solve (poorly).

Just don't get me started on blockchain... :-)


It feels like half the usage of inheritance is also just workarounds for mocking things in unit tests or being able to access things that are private/protected. Everything ends up with an interface and an impl just because of silly language decisions.


> Another thing that I do, is have fairly large source files, that aggregate multiple classes/structs/enums/protocols that relate to each other. It drives me nuts to deal with the typical Java "Each class has its own file -no matter how small" thing.

This can go both ways - I've seen plenty of projects where you have to dig through 3k lines of unrelated spaghetti code just to find the bit you need. That turns into a special piece of hell when you have multiple team members working on it.

The original motivation for having more small files was driven by these "god objects" and the limitations of version control. CVS, Visual SourceSafe, and to a lesser extent Subversion were much easier to use with smaller files.

If you're using modern version control (git) then that reason has gone.

My personal preference is to split horizontal concerns into their own modules and depend on them; they become the core / support libraries for my team/organisation. For services I try to keep the domain-specific parts close. Tests go in a different module which mirrors the structure. When units start to get big (or become cross-cutting) they get factored into smaller units or other modules.

In the end it's all a balance, and unless you're under outside constraints (cough SonarQube box-ticking busywork idiocy cough) you can generally keep it sane.


> It drives me nuts to deal with the typical Java "Each class has its own file -no matter how small" thing.

There is value in keeping some source together, but I think it's more judgment-based than you imply. It can also drive me nuts when I have to dig through a dozen 2k+ line files to find the thing I need to change, which also reduces my confidence that I can change that thing without introducing unwanted side effects (unit tests help increase that confidence regardless of code structure).


My tolerance for multiple classes in a file or larger classes is heavily driven by what tooling I have available.

If I've got a structural outline panel, like in IntelliJ-based editors, or the SuperCharger plugin in Visual Studio, then whatever, it's easy, because I have that table-of-contents to easily navigate with.

With less powerful tools, the "every type in its own named file" pattern is much more useful.


That makes a lot of sense.

I use Xcode’s editor. It makes zipping around big text files quite easy, and going through multiple text files a pain.


Fair point.

As someone that regularly digs through files of 200-1.5K lines (2K is a bit much, for me), I can report that good documentation makes a huge difference.


Good documentation is definitely helpful, but expensive and challenging to maintain (and keep correct). In my personal experience, documentation is out of date or invalid nearly the minute the ink dries... much better (again, in my experience) to have good, descriptive unit tests that fail when the code changes.


Unit tests are not always a good match with the application: https://littlegreenviper.com/miscellany/testing-harness-vs-u...

I tend to use a lot of test harnesses (which, IMNSHO, are much better application exemplars than most unit tests).

I also use a lot of headerdoc stuff. In Xcode, it allows your own code to show up in the QuickHelp navigator tab, and you can generate really good SDK docs.

It also stays fresh and is easy to maintain. Using "breakers" is a huge aspect of my code; it makes it much easier to scan, and liberal use of // MARK: is good.


Agreed, higher-order integration tests tend to offer many more serendipitous fault discoveries than unit tests. Still, unit tests add value, acting as the living "comment" on a method or module, while the higher-order integration tests act as the living "how-to-use instructions" for the application.


Yeah. Unit tests aren’t especially applicable to the type of software I tend to write (UI mobile apps), though. I use them a lot for the backend code, and a number of the packages that my apps include, but I don’t like to rely on automated UI testing, as these tests are invariably scripted “record/playback/pattern-match” tests. Only useful for testing a minuscule number of “low-hanging fruit” issues. Not at all useful for a “user goes where they want,” unbounded GUI. Maybe an AI-driven testing system might work.

In my experience, there’s no substitute for good, old-fashioned, disciplined “monkey-tests,” when it comes to UI software. Even test harnesses aren’t always relevant, and I end up running most of my tests on the production code, as the project matures.

Requires discipline. And, in my experience, a well-documented codebase (“Why,” as opposed to “what,” when documenting internals), pays off in spades. Headerdoc markup works wonders. It, quite literally, makes self-documenting code, and documenting the interfaces (as opposed to the internals) has a great shelf life.

Since I use Xcode, it "live-parses" my interface docs, and shows them in the QuickHelp panel, on the right side of the screen, so, when I select a function that I wrote, it displays the interface, just like it does for the Apple-authored stuff. Very cool. Since all my included packages (the ones I wrote, anyway -which is 99% of them), use it, I have great "live" documentation.

I used to argue with my Japanese peers about this stuff. They were not fans of automated testing. I was able to get them on board with unit-testing engine code, but they were absolutely correct about testing GUI code, and I had to cede the point to them.

They were very disciplined, when it came to “monkey testing,” code structure, and documentation. One reason, was because they tended to rotate engineers through projects all the time, and leaving a legacy was important. I write about what I have learned, here: https://littlegreenviper.com/miscellany/leaving-a-legacy/


I followed the same pattern when I had the time to. Instead of building the thing, build it as a separate package and include it as a dep. Packaging code up for open source consumption leads to better docs, tests, and tighter abstractions that are generalized.


I think that's the opposite of what the parent says: first build the thing and only then factor it out and reinclude it.


Not completely. The way that it works for me, is that I start work on a project, and, while building, I notice that some code that I'm working on is:

1) Pretty complex, and fairly insular; and/or

2) Possibly useful, elsewhere.

If that's the case, I will then stop work on the main project, and take some time to extract and "genericize" the subproject. I'll usually set it up as a standalone open-source project; complete with tests and documentation. As the commenter stated, I think that this results in some excellent code. I always clean the house before the guests arrive.

This may happen before I have completed the coding in the main project, or may happen as the result of a review, after the fact.

In some cases, I very clearly need to develop a subproject before starting on the main project, or before certain milestones within that project (for example, SDKs or drivers). In that case, the timelines are completely separate.

If you look at my GH repos, you'll see a whole bunch of these projects, including some rather strange ones, like an XML duration parser[0]. These are the types of projects that I extract.

In some cases, I end up not using the extracted project in my main project (happens to some of my UI widgets). In that case, even though I am not using it, I still have an excellent project for the future. Here's an example[1]. I have ended up not using the spinner in my own work, as it was too obtrusive a widget, but it's nice to have it available for future projects.

[0] https://github.com/RiftValleySoftware/RVS_ParseXMLDuration

[1] https://github.com/RiftValleySoftware/RVS_Spinner


This sounds a lot better than the Single Responsibility Principle that is often quoted and IMHO often misunderstood.


This makes sense to me.

And it also seems notable to me that Rails (which I generally like; I am not a Rails hater and this is not a Rails hate comment) often seems to do the opposite. Like separating a controller and its view code -- things which in the typical Rails app are pretty tightly coupled -- into the higher-level controllers and views folders, putting all controllers next to each other and all views next to each other, instead of a controller next to its coupled view (which I do find an inconvenience as a developer).

But the risingly popular view_component library for use with Rails makes the 'proximate' choice, putting the view template(s) next to the logic they're coupled to.


As a Rails dev, I agree with this: instead of app->{models/views/controllers}->resource, I would prefer app->resource->{models/views/controllers}.


Interesting, because I realize I would not personally want that done with models; in my apps, models and controllers frequently don't have a 1-1 relationship.

But controllers and view templates almost always do, because of the nature of the architecture.

So actually just using the `view_component` gem for all my views from now on (not just partials) probably satisfies me!


I don’t see what the benefit of view components is over simple views and partials, though?


This is kind of what Django does instead and I much prefer the Rails way. Though it might just be because I started MVC-dev with Rails.

I think I find it easier to find common abstractions at the {models/views/controllers} level than at the resource level, and that frustrates me with Django app-based dev.


There are many different criteria to decide where to put your code or how to group it. And each case benefits more from one idea or another (and each codebase contains many different cases). Humans don't organize knowledge in folders in their brains. The information network is much more complex. We can't find a good solution only with folders.

In fact, the idea of trying to model complex systems in a text format divided in files (most programming languages) doesn't quite hold... gracefully at least. For example, the frequent discussions about inheritance and generics are pretty revealing of the fact that we mix modelling and implementations in the same working space, when in fact in many cases it would be better to work on those at different layers.

So, in my opinion, to really make "code colocation" better you would kinda need to start modelling complex systems with richer toolsets that don't try to express them only with code files. You can't properly work with complex systems with a single view, no matter which one you pick.


Totally agree. Files & folders are almost omnipresent, which is why people get so used to them that they start to shape how we think about structuring things.

But if you take a step back, they are actually not the right abstraction for most things. In this case, a tagging system would serve the purpose much better.

Unfortunately, tooling that deals with files&folders and text is very mature and it will be hard to extend or even replace it.


I tend to rely heavily on the storyboard, in my Apple development. It's going to be going away, with SwiftUI, and I'm not convinced that what SwiftUI provides will actually be a replacement, but we'll see how things shake out.


I wonder if you could apply this principle by writing a tool that goes through your git history and identifies distant regions of code that are often changed at the same time, so you can take some action to link those regions. At a minimum, adding "see also" cross-references, if not refactoring to bring them closer.
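
As a rough sketch of what such a tool could look like (assuming it runs inside a git checkout; the commit-size cutoff and the top-25 limit are arbitrary choices):

    # count pairs of files that are frequently modified in the same commit
    import subprocess
    from collections import Counter
    from itertools import combinations

    # one marker line per commit, followed by the files it touched
    log = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:--commit--"],
        capture_output=True, text=True, check=True,
    ).stdout

    pairs = Counter()
    for commit in log.split("--commit--"):
        files = sorted({line for line in commit.splitlines() if line.strip()})
        if len(files) <= 20:  # skip huge commits (formatting sweeps, mass renames)
            pairs.update(combinations(files, 2))

    for (a, b), n in pairs.most_common(25):
        print(f"{n:4d}  {a}  <->  {b}")

Pairs that score high but live in distant directories would be the candidates for cross-references or refactoring.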


This is one of the key concepts in how Codescene analyses a code base by mining the Git repo.

It is quite useful for finding problem areas and tacit knowledge, such as "when you change this API, these two client-side adapters should probably also be updated".

See https://codescene.com/


I've often thought about the concept of an "IDE" where there are no files. You would instead have a "database" of functions/classes that you work with. Some modern IDEs come close, but you still have to worry about files. I guess getting rid of files completely might require some extra magic from the IDE to make it work with current programming languages. Does something like this exist?


This sounds very similar to how a Smalltalk development environment works. Instead of a file system you have a class browser that you use to navigate your code.


Take it a step further and instead of a database use content addressing. https://www.unisonweb.org/


In an abstract way it's not really different. Next you'll want namespaces and sub-namespaces, and we're back to files and directories, just built again on top of something that's not a file system.


I have an idea for an IDE that stores functions independently: instead of being contained in classes/modules defined by syntactical scope delineated by e.g. curly braces, they simply get tags that you can browse/search/filter in various ways. I imagine something like a UML class browser, with little panes containing code snippets and connections between them for call graphs etc.


Yep, I think this will be my next major project, taking inspiration from Rust, Go/Zig, Nix, Unison, old Smalltalk and Self videos, Eve, and probably SQLite/Postgres.

There is a lot of separate innovation that could be combined, I think.

While inspired by these various things, I doubt it would be just "another Smalltalk" or "just a nocode thing", which is what many modern versions of this idea end up looking like.


IBM VisualAge for Smalltalk and Java worked like this.


You could technically do this with Rust. rust-analyzer will allow you to jump to certain types, functions, etc, and Rust has first-class modules.


Things like rust-analyzer are exactly what led me to this line of thinking: what if code was designed to be imported into a "database", and then changes could be made there and serialized back? Not too dissimilar to how Unison lets you select functions to edit, but dropping the CLI and text-file requirements. Create an API/UI/etc. to edit the "code" in the "database".


Agreed, it helps reduce the cognitive load and context switching by a decent margin. If you can't achieve co-location, at least structure the directories the same way or name related files the same thing.

If you have a partial/catalog/product/buy.js file, have a partial/catalog/product/buy.scss file as well. Finding the CSS for the buy.js component inside add-to-cart.scss is very lame.


> One of the things I struggled with when I started out as a programmer was where to put my code. It was not something I could easily find in tutorials and for a long time I wondered why everyone was so focused on how to get framework X to do Y, when all I wanted to know was where to put the code that does Y.

So true. Small things like this are why I'm so glad I had a software engineering job at a real company for a few years, even though I don't necessarily want to do that for life. I learned a lot from looking at codebase conventions and quick questions to the elders. Unfortunately, no one can help you with the other hard problem: naming.


That is one reason to choose a framework instead of building from scratch: you don't have to make decisions like "where do I put Y?". The most important value a framework provides isn't in the way it functions, but that it is a collection of idioms, decisions and conventions that everyone has to use.

It also means I don't have to worry about where Dev X put thing Y, because I know where to look for it already.

That's what software patterns are good for as well, a shared set of idioms so you don't have to invent new approaches. I think that's important in a professional environment but less so in a personal one. For personal projects I just do whatever is fun.


But the whole problem is that frameworks always decide against code colocation, and they also don't tell you what to do with all the functions that don't fit their directory scheme. People who don't use a framework will naturally tend toward fewer subdirectories and code colocation, just because, e.g., there's no reason to elevate the importance of a models/views/controllers pattern.


> Unfortunately, no one can help you with the other hard problem: naming.

I have a guiding principle here: If, a few minutes after naming something, you want to use the thing and (without looking it up) your first attempt at writing the name is correct, it's a decent name.

If you name a function processFoos and then later in your code your first intuition for what it was called is createBars, then there is some disconnect (and possibly some missing conceptual clarity) in what this function is supposed to accomplish. Is it more about accepting foos or is it more about creating bars? I find it very productive to, in such a case, dive a bit deeper into these differences (which may often also reveal something about different perspectives between caller and callee).

So TL;DR: If you don't have to look up the name of what you wrote minutes ago, it's probably a good name.


Things like this were really hard to learn. Thankfully there are now many open-source projects, but it's sometimes still hard to find a suitable one.


Modern IDEs have made this less important than it used to be, but it still matters. My quick rule of thumb would be to follow the mental model of your code. Things that you need to read to understand a given aspect of the code should be close together.


That's just the small-scale version of it. In a large organization this gets a lot more important. If you have multiple applications, and each needs roles like database, developer, architect, ..., then there is a push to move all database management to a centralized DBA team, all devs to a dev team, and all architecture to the architects' team.

One thing that gets lost is the application-specific knowledge, which is the whole reason you are doing things. Yes, your database is now near-optimally managed according to $vendor, except it's a poor fit for the application, as it received a one size fits all configuration.

Second problem is velocity. If one team can adapt the database structure and modify the backend code and services, you'll go a lot faster than if you have to book a databaser for a week, a developer for two weeks, etc.

Next problem is implicit waterfall: the developer will hide data in the wrong columns, because otherwise the databaser has to be called back, which causes rescheduling and a rebudget (i.e. management now hates you). It's only temporary, everybody tells themselves, until the right person revisits the application again next month.

And god help you if the architect did not deliver perfect work the first time around, because then everybody is creating things that don't fit together. The architect being of course the guy/girl who drew some boxes and arrows between 2 very important meetings, at the point where business requirements were not yet delivered and nobody knew what the application actually needed to do. Architects are expensive, so they're long gone before the first character of the code is ever written.

Final problem: your application is now spread over tens of silos, and nobody knows what connects to what. Any maintenance done on it starts with someone walking around between silos, asking everybody what their piece of the puzzle is.

So my opinion is to do the reverse: let the dev team modify the database structure, even if the result is clearly suboptimal. They will feel the pain from their mistakes and have the ability to fix things. It might take a while, but a coherent team will figure things out. A loose bunch of specialists, available part-time? Not a chance.


You make some very good points (echoing Conway's Law: software will reflect the org chart). Smart companies are constantly seeking better ways to structure teams and processes to mitigate the downsides at either extreme (horizontal layers vs. vertical stacks).

IMHO, over-emphasis on efficiency is a common trap; efficiency and agility are at the poles, and it's the latter that often matters more.

So, how to strike a pragmatic balance? I've been part of successful experiments with a small, free-roaming, multidisciplinary "red team" or "green team" that crossed org boundaries to solve gnarly problems, free up log jams, and facilitate step-function improvements in standard teams' capabilities.

> "Architects are expensive, so they're long gone before the first character of the code is ever written."

As a hands-on architect, I don't consider my work complete until there's been meaningful collaboration with dev leads and in-depth review of working code. It's a dynamic and iterative process. Immutable, ivory tower boxes-and-arrows -- divorced from the realities of actual software development at any kind of scale -- are insufficient. A prescribed, linear, waterfall process of biz req -> architecture -> implementation is bound to fail. Success requires embracing the rich interplay between various forms of software design (requirements, IX/UX, architecture, and implementation) and stepping in and out of them where and when appropriate.


I feel this practice makes code review a lot easier, not just authoring.


In Python it's especially scary to have some resource management at the end of a block. I have introduced bugs so many times when, after refactoring, I screwed up the indentation and the resource management at the end no longer happened in the right way.

The pragmatic approach I've taken to deal with this is to make heavier use of custom context managers, so creation and cleanup code must stay together.
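
Something like this minimal sketch (names are illustrative):

    import os
    import shutil
    from contextlib import contextmanager

    @contextmanager
    def scratch_dir(path):
        # creation and cleanup are colocated; there is no trailing
        # cleanup block at the end of the caller's code to mis-indent
        os.makedirs(path)
        try:
            yield path
        finally:
            shutil.rmtree(path)

    with scratch_dir("/tmp/job-42") as workdir:
        ...  # the body can be refactored freely; cleanup still runs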

But I'm also getting tired of whitespace-sensitive languages.


It's funny; I've been a professional Python dev for 8 years or so, and I can't remember a single instance of whitespace having caused a bug.

The closest I can think of is:

  if something:
    ...handle something..
  elif something_else:
    ...handle something else..
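  # note: the next 'if' starts a second, independent chain; mistaking
  # it for an 'elif' of the chain above silently changes behaviour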
  if other_thing:
    ...handle other thing...
  elif yet_another_thing:
    ...handle yet another thing...
  else:
    ...for everything else...
If I had a frustration with Python, it would be the prevalence of "who needs types when we have dicts". This situation has been somewhat improved by dataclasses, but lots of our code predates them by many years.
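
For what it's worth, a minimal before/after sketch of that improvement (names invented):

    from dataclasses import dataclass

    # before: the shape of the data lives only in the programmer's head
    user = {"name": "Ada", "admin": True}
    # user["nmae"] fails only at runtime, with a KeyError

    # after: the shape is declared once, and tooling can check it
    @dataclass
    class User:
        name: str
        admin: bool = False

    typed_user = User(name="Ada", admin=True)
    # typed_user.nmae would be flagged by a linter/type checker before it runs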


Python whitespace bites me every once in a while. But this does not generalize to other languages, in particular, I prefer writing Haskell with whitespace sensitivity¹ and I don't remember it ever being a problem.

1 - Last time I asked "who doesn't" on the internet, I found somebody who doesn't. But it's by far the most popular option.


True. Languages with static type checking probably have this easier, because you'll get some warnings about differences in scopes or unused variables.


It's not that.

Whitespace works differently between those languages. Python's syntax has limitations that Haskell's doesn't.

But also, Haskell tends to place the kind of things the GP is talking about at the beginning of an expression (because of the "reverse" order of function composition), while Python tends to place them at the end of a block.


Similar benefit in Nim where types help catch stray indent issues. Plus proper scoping of automatic variables in ‘for’ loops helps too.


It's a bit out of date now, but here are some of my thoughts on this from 10 years ago:

http://williamtpayne.blogspot.com/2012/07/structure.html

http://williamtpayne.blogspot.com/2012/05/development-concer...

Looking back on this now, it all feels a bit unconventional and strange, so I wouldn't make the same recommendations today. I think that people want to operate in an environment with fewer, and less rigid constraints. I do think some of the driving concerns are still valid though, and still worth considering.


I've found that code colocation is great, especially combined with a directory structure that mimics the application's hierarchy/layers. At my place of employment we try to follow a principle of "keep related code contained to a single place". On its own, this ended up causing some problems with "reinventing the wheel" and a relatively inconsistent experience when crossing borders between team ownership.

The big thing that has helped a ton is having a regular meeting with nearly all employees working on the same part of the stack. We use the meeting as a space for talking about what we're working on, and even get into the details of "I'm solving problem X by building solution Y", at which point others can chime in with interest in using it for their own upcoming projects, describe how they already have a solution to problem X, or offer suggestions. It's not the only thing discussed in this meeting, but it has been important for improving the consistency and general quality of the product. The resulting code follows the same principle above, but reused code ends up bubbling up to a "common" space in the lowest common shared layer, whether that's a new directory or a new package used by multiple projects.

A couple new problems we're dealing with now:

1. Finding older common things and deprecating them. When something is in a common space, it feels like it's everyone's responsibility, which means nobody ends up working on it. Maybe narrower, clearer ownership would solve that problem.

2. Someone finds the common thing that almost fits their need, if only it had one more little feature. The problem is when this happens many times and you end up with a complicated beast of an abstraction. This is probably solved by finding ways to decompose the abstraction, and by following a principle of "do one thing well" or something about simplicity.


I think colocation is even more crucial than most coding purists realize.

CSS-in-JS and Tailwind greatly helped here to encapsulate things.


No: ./User/index.ts ./User/interfaces.ts

Yes: ./User.ts


It is "code that changes together" but also an expression of which code depends on what and in what kind of manner does it do so (e.g. is one subsumed by the other or is used by 3x other modules). Understanding where things live is also a statement on the composability of the code and how many assumptions about the outside world that the interfaces make.

It's also worth noting that there's likely not one single "correct" organization/taxonomy, and certainly not one that's static and guaranteed to last forever. It truly is one of the most complex parts of programming because it's subjective, subject to change and is in a lot of ways right through the center of what it means to abstract something in the first place. I don't think there are any simple answers because it's equal parts philosophical and mundane.

Software mixes things that model the real world, convenient fictions, abstract truths and guilty hacks and I believe trying to make sense of it is at least as much an art as a science.


I agree. Tests, for example, used to go in a separate folder mirroring the code folder structure. I have moved away from that, and now put the test right next to the file.

Random thought: could the IDE recommend the next file to work on – "people who worked with this file also worked on..."? The real problem is not where we keep it; it's how quickly we can get to it.


You see, this is a design that's taken hold of a lot of codebases, and I don't think it works. I see a lot of projects with big folders mixing top-level exports, implementation files, tests, test utils, fixtures, and the like.

I'd much rather see the 'mirrored' approach.

> Random thought: Can the IDE recommend next file to work on

I'm _sure_ I've seen something like this in IntelliJ.

EDIT: I did! It's in the changes tab > the eye icon > 'show files related to active changelist'. It then sometimes will make suggestions based on project history.


Tests that involve complicated setup or are otherwise bigger than "unit" tests probably don't make sense to keep in the source tree.

But it's a nice idea to keep unit tests specifically alongside the "units" of code that they test.


I think this is a pretty good rule of thumb. Does anyone know of any metrics to see how an individual codebase does in this regard by looking at PRs/commits?

It is largely for this reason that I disdain patterns which separate things out into top-level folders like /views, /reducers, /actions, /utils. Most code should be organized at the feature level, with separate modules for global utilities (which are always hard to organize).

I think this methodology works well because more files mean more mental overhead and context switching, particularly for code reviews. The best code review, given a reasonably sized change (not 20kloc), is a single file that contains little not relevant to the changed code. Not only is it easier to comprehend, it also avoids unnecessary merge conflicts.


Just stick everything in the same file until it actually hurts. This presentation[0] gets it right.

[0]: https://www.youtube.com/watch?v=XpDsk374LDE


I was with this post until the author mentioned a /utils or /helpers directory, which made me sad.

No piece of code is so bland that it can’t be placed somewhere with a name that describes what it does or what category it belongs to.


I second this. It's poor naming and should be avoided.


I'm currently dealing with a framework where I've got to define data models here, register them with dependency injection over there, use them in an API controller a third place, register the routes somewhere else, and touch three or four different UI components, every time I make a small change. It's not Java, but it feels very Spring-inspired.

The difference is stark when I have the lean, more-or-less functional core business logic library sitting beside this baroque stringly-typed reflected 7-bean salad of a web application shell.


The global "helpers" folder is a sign of bad code smell for me. Doing this essentially means "I just don't want to think about where to put it, so I'll just pile it up here".

Code can and should be aligned to the business (sub)domain(s). It is much harder to do on a frontend, but possible nonetheless.


uhhh tricky

if your codebase is 3 features each of which has 3 similar functions (listener, db model, middleware, let's say)

do you group by similar function (middlewares.py, models.py)? or do you group by feature (feature1.py)? depending on whether 'work' is a feature change or a refactor this week, the correlations will be different


A lot of programmers want to divide logic into Categories of Behavior and then package them up into behavior packages, so that you have to hopscotch all over the directory tree to reconstruct and navigate an actual process from beginning to end; then they want to chop it all up into microservices in an attempt to compensate for the resulting mess.

If you package logic according to the process, it's a lot easier to control the scope of logic that is exclusive to a given process (some things are of course shared). Languages that can enforce package privacy, file privacy, etc. help a lot here.

In general, unlimited scope & dependencies is what strangles a dev team to death, so any option that restricts scope to what is necessary is arguably best.


by process, you mean call stack or series of event handoffs?

I think you're right that that's one of the key ways that devs (esp noobs to the codebase) need to navigate, and must be supported. I think 'stack graphs' are supposed to do that[1], but have never tried them.

but there are other navigation modes we need to support as well -- like if you're deleting something, or upgrading something, 'find all references'. if you're inventorying use of a DB type, you might want to audit field types in all models.

I sometimes wonder if the future is tagging code as 'belonging to a feature', but they can live wherever -- because devs have different needs on different days.

[1] https://dcreager.net/talks/2021-strange-loop/


By process I typically mean "business process", as in: Bob wants to ship all the widgets to the South Warehouse, and this pile of logic right here is going to do that for him, end-to-end (or mostly end-to-end).

The best option I've found for "find everything that references this database table and see if it will be affected by this thing I need to change" is to give database tables really unique names (like "tbl_user" instead of "user") and just grep the hell out of everything.

Also if I'm writing large programs I'm going to use a compiler. I mean if a linter can tell you "Hey this python won't work because you removed this function" then good for python, otherwise bad for python. And if I'm going to include 100 open-source libraries then god help me if my build system can't tell that one of them went missing or that the author removed a function I'm using in version 1.2.3.4.b.

But all of that is about dependencies, and dependencies kill teams. That's why I want the scope of any logic as tightly constrained and enforced as possible.

I'm sure you could design a multi-dimensional behavioral categorizer/IDE/language system that blows everybody's minds and wouldn't that be cool, but I doubt anyone would use it, because behavioral categorization really doesn't solve problems. It just looks nice.

But at any rate, at the very minimum, back to this point: don't try to compensate for the failure of behavioral categorization with microservices when you could have just packaged things along the same lines of division.


100% with you on greppability as an underrated design skill

interested in 'library versions as types' and have been thinking about this problem on and off. version numbers are a proxy for the combination of: 1) call signature, and 2) internal semantics

linters / typesystems are good at (1). If we had a system that could do (2), version numbers would be less necessary and compatibility could be proof-based. (Well, to the extent that the semantic assertions are valid).


"Code colocation"?

>Turns out that "where to put code" is one of the hard things in software engineering and there are no silver bullets. That's part of the reason why there are so few easy tutorials on this subject.

This is called modularization and there are lots of papers and books that discuss it in decent detail.


> Turns out that "where to put code" is one of the hard things in software engineering

This is essentially "naming things" (from "There are only two hard things in Computer Science..."), applied to namespaces.


Counterpoint: for a long time, the convention was to store test files in a structure that 'mirrored' the source files. I never had a problem with this, and personally I don't know why it fell out of favour.


Makes perfect sense! This would help lead to low coupling and high cohesion.


Low coupling and high cohesion were the terms that I was thinking of, too.


I like this approach, while also building "building blocks" that each feature can utilize: standardized cache, Redis, DB repositories, URL generation, 3rd-party APIs, etc.


That's also true for code (and data) layout in computer memory, to get your program to use the cache and execute faster.


It's also good for caching/performance reasons.

I don't know too much about compilers, but if your improved source proximity results in better proximity in the bytecode, then you're less likely to have a cache miss. That's a big deal.


Follow composition.

If A calls B, then A's folder should contain B's folder.


And if A is recursive(ly calling itself)?

:-D


    cd a
    ln -s . a
Problem solved; perfect organization.


I tried this and disappointingly Linux rejects it before programming languages break.

gcc:

    a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/test.h:1:10: fatal error: a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/test.h: Too many levels of symbolic links
python:

    ModuleNotFoundError: No module named 'tmp.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a'



