It's probably time to stop recommending Clean Code (2020) (qntm.org)
733 points by avinassh on May 25, 2021 | 658 comments



There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.

Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows. Keeping methods short and hiding behavior behind well named functional boundaries is how you manage complexity in code.

The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
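
For illustration, a minimal C++ sketch (all names invented here) of a unit of work that explains itself through naming - the body reads as a summary, and you only descend into a step when you need its details:

    #include <iostream>

    struct Order { int id; };

    // Hypothetical steps; bodies stubbed out for illustration.
    void validatePayment(const Order&)  {}
    void reserveInventory(const Order&) {}
    void scheduleShipment(const Order&) {}
    void notifyCustomer(const Order& o) { std::cout << "order " << o.id << " fulfilled\n"; }

    // The top-level routine tells the story through its named steps.
    void fulfillOrder(const Order& order) {
        validatePayment(order);
        reserveInventory(order);
        scheduleShipment(order);
        notifyCustomer(order);
    }

    int main() { fulfillOrder(Order{42}); }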


> you have failed to sufficiently explain

This is the problem right here. I don't just read code I've written, and I don't only read perfectly abstracted code. When I'm reading code by someone who loves the book and tries their best to follow its conventions, I find it far more difficult, because I'm usually reading to fully understand the code myself (i.e. in a review) or to fix a bug. It's infuriating to jump through dozens of files just so everything looks nice on a slide. Names are great, and I fully appreciate good naming, but pretending that a ton of extra files just to improve naming slightly isn't a hindrance is wild.

I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head, but leaping all over the filesystem just to see 3- or 5-line classes that delegate to yet another class is too much.


Carmack once suggested that people in-line their functions more often, in part so they could “see clearly the full horror of what they have done” (paraphrased from memory) as code gets more complicated. Many helper functions can be replaced by comments and the code inlined. I tried this last year and it led to overall more readable code, imho.
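
A tiny sketch of what that can look like (invented example; uses C++20's std::erase_if) - the would-be helper functions become labeled, inlined steps:

    #include <iostream>
    #include <numeric>
    #include <vector>

    // Instead of tiny helpers (say, loadScores/dropOutliers/average),
    // the steps are inlined and labeled with comments, so the whole
    // computation is visible in one place.
    int main() {
        // -- gather input --
        std::vector<int> scores = {3, 9, 14, 9, 212};

        // -- drop obvious outliers --
        std::erase_if(scores, [](int s) { return s > 100; });

        // -- average what's left --
        double avg = std::accumulate(scores.begin(), scores.end(), 0.0)
                   / scores.size();
        std::cout << avg << "\n";  // 8.75
    }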


You're very close to his actual quote; he was referring to the horrors of mutating shared state: http://number-none.com/blow/john_carmack_on_inlined_code.htm...


Thanks for the link

> The real enemy addressed by inlining is unexpected dependency and mutation of state, which functional programming solves more directly and completely. However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing. When it gets to be too much to take, figure out how to factor blocks out into pure functions (and don't let them slide back into impurity!).


Carmack is such a great communicator of software development philosophy. He also wrote a classic article on "Functional programming in C++": https://www.gamasutra.com/view/news/169296/Indepth_Functiona...


Do you know of anything else he wrote like this?


The idea is that without proper boundaries, finding the line that needs to be changed may be a lot harder than clicking through files with an IDE. Smaller components also help with code reviews, since it's a lot easier to understand a line within the context of a component (or method name) without having to understand what the huge globs of code before it are doing. Also, like you said, a lot of the time a developer has to read code they didn't write, so there are other factors to consider, like how easy it is for someone from another team to make a change or whether a new employee could easily digest the code base.


The problem being solved here is just scope, not re-usability. Functions are a bad solution because they force non-locality. A better way to solve this would be local scope blocks that define their dependencies.

E.g. something like:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
    }
You could also specify which variables defined in the block get elevated, like return values:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
       int result_value = var_1 * var_2
    } (exports: result_value)

    return result_value * 5
This is also a more tailored solution to the problem than a function: it allows finer-grained control over scope restriction.

It's frustrating that most existing languages don't have this kind of feature. Regular scope blocks suck because they don't allow you to define the specific ways in which they are permeable, so they only restrict scope in one direction (things inside the scope block are restricted) - but the outer scope is what you really want to restrict.

You could also introduce this functionality to IDEs, without modifying existing languages. Highlight a few lines, and it could show you a pop-up explaining which variables that section reads, mutates and defines. I think that would make reading long pieces of code significantly easier.


This is one of the few comments in this entire thread that I think is interesting and born out of a lot of experience and not cargo culting.

In C++ you can make a macro function that takes any number of arguments but does nothing. I end up using that to label a scope because that scope block will then collapse in the IDE. I usually declare any variables that are going to be 'output' by that scope block just above it.

This creates the ability to break down isolated parts of a long function that don't need to be repeated. Variables being used also don't need to be declared as function inputs which also simplifies things significantly compared to a function.

This doesn't address making the compiler enforce much, though it does show that anything declared in the scope doesn't pollute the large function it is in.
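
A minimal sketch of the trick (SCOPE is an invented name; any no-op variadic macro works):

    #include <iostream>

    // A no-op variadic macro: its arguments serve purely as a label, and
    // the brace block that follows collapses as a unit in the IDE.
    #define SCOPE(...)

    int main() {
        int result = 0;  // the block's 'output', declared just above it

        SCOPE("compute result from confined locals")
        {
            int a = 2, b = 3;  // locals don't leak into the enclosing function
            result = a * b;
        }

        std::cout << result << "\n";  // prints 6
    }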


Thank you. Your macro idea is interesting, but I definitely want to be able to defer to the compiler on things like this. I want my scope restrictions to also be a form of embedded test. Similar to typing.

I wish more IDEs had the ability to chunk code like this on-the-fly. I think it's technically possible, and maybe even possible to insert artificial blocks automatically, showing you how your code layout chunks automatically... Hmm.

You know, once I'm less busy I might try implementing something like this.


C++ lambda captures work exactly like this. You need to state which variables should be part of the closure, and whether they are captured by reference (and thus mutable) or as copies.

    auto result_value = [var1, var2, &var3]() {
       var3 = var1 + var2;
       return var1 * var2;
    }();
    return result_value * 5;
Does anyone know if the compiler is smart enough to inline a self-executing lambda like the one above? Or will this be less performant than plain blocks?


Ada/SPARK actually has dependencies like that as part of function specs, including which outputs depend on which inputs.


> Clicking through files with an IDE

This is a big assumption. Many engineers prefer to grep through code without an IDE; the "clean code" style breaks grep/GitHub code search and forces someone to install an IDE with go-to-declaration/find-usages. On balance I prefer the clean code style and bought the JetBrains ultimate pack; however, I do understand that some folks are working with grep/vim/code search and would rather not download a project to figure out how it works.


I've done both on a "Clean Code", lots-of-tiny-functions C++ codebase. For various reasons[0], I spent a year using Emacs with no IDE features to work on that codebase, after which I managed to get a language server to work in our specific context, and continued to use Emacs with all the bells and whistles LSP provides.

My conclusion? Small functions are still annoying. Sure, with IDE features in a highly-productive environment like Emacs, I can jump around the codebase at the speed of thought. But it doesn't solve the critical problem: to understand a piece of code that does something useful, I have to keep all these tiny functions in my working memory. And it ain't big enough for that.

I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it. That is, I could mark a block of code, and my editor would replace all function calls[1] with their bodies, with the names of their parameters replaced by the names of the arguments passed[2].

This way, I could reap the benefits of both approaches - small functions that compose and have meaningful names, and long sequential blocks of code that don't tax my working memory.

--

[0] - C++ is notoriously hard to get reliable code intelligence (autocomplete, xref) to work. Even commercial IDEs get confused if the codebase is large enough, or built in an atypical fashion. Visual Studio in particular would happily crash for me every other day...

[1] - With some sane default filters, like "don't inline functions from the standard library and third-party libraries".

[2] - Or autogenerated ones when the argument is an expression. Think Lisp gensym. E.g. when I have `auto foo(F f);` and call it like `foo(2+2);`, the inlined code would start with `F f_1 = 2+2;`. Like with expanding Lisp macros, the goal of this exercise is that I should be able to replace my original code with generated expansion, and it should work.


You wrote: "I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it." That sounds like a great idea! It might be useful for both the writer and the reader. It might be possible to build something like this using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis, or part of a PhD.

You also wrote: "Visual Studio in particular would happily crash for me every other day..." Have you tried CLion by JetBrains? (Usually, they have a free 30-day trial.) I have not used it for enterprise-large projects, but I have used it for a few personal projects. It is excellent. The pace of progress (new features in "code sensing") is impressive. If you find bugs, you can report them and they usually fix them. (They have fixed about 50% of the bugs I have raised about their products over the last 5 years. An impressive clearance rate!)


> It might be possible to build something like this using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis, or part of a PhD.

Yeah, that's how I feel about it too. A useful MVP would probably take less effort, though, even if it sometimes couldn't do the inlining, or misidentified the called function. I mean, this is C++; I haven't seen any product with completely reliable type hints & autocompletion, and yet even buggy ones are still very useful.

> Have you tried CLion by JetBrains?

Not yet. Going by my experience with IntelliJ, I expect it to be a very good product. But right now, I'm sticking to Emacs.

In my experience, professional IDEs (particularly the JetBrains ones) are the best for working with a particular programming language, but they aren't so good for polyglot work and all the secondary tasks surrounding programming - version control, file management, log inspection, and even generalized text editing. My Emacs setup, on the other hand, delivers superior ergonomics for all these secondary tasks, and - as long as I can find an appropriate language server - is within an order of magnitude on programming itself. So it feels like a better deal overall.


I agree about the "professional IDEs" point. Are you aware that IntelliJ has language plug-ins that let you mix HTML/JavaScript/CSS/Java/Python in the same project? I guess CLion can at least mix C/C++/HTML/JavaScript/CSS/Python. This is great when you work with research scientists who like to use different languages in the same project due to external dependencies. I can vouch that for /certain/ polyglot projects it works fine in IntelliJ. (That said, you might have a very unusual polyglot project.)

As for tooling, I might read/write/compile/debug code in the IDE, but do all the secondary tasks in a Linux/Bash/Cygwin terminal. Don't feel guilty/ashamed of this style! "Use the right tool for the job." I am forced to work in Windows, but Cygwin helps me "Get that Linux feeling - on Windows". I cringe when I watch people using Git from a GUI (for the most part) instead of using the command line, which is normally superior. However, I also cringe when I watch people hunt and peck (badly!) in vim to resolve a merge conflict. (Have you seen the latest merge tool in IntelliJ? For 90% of users, it is a superior user experience.) To be fair, I have also watched real pros resolve merge conflicts in vim/emacs equally fast.

One thing you will find "disappointing" is that the CPU & memory footprint of any modern IDE requires 1990s-supercomputer resources. It is normal to see (multiple) enterprise-large projects take 1-10GB of RAM and 8-16 cores (for a few mins) to get fired up. (I am not resource constrained on my dev box, so I am willing to pay this tax.) However, after init, you can navigate the code quickly and get good real-time static analysis feedback ("code sensing").


Vim has weapons-grade go-to-definition today using the Language Server Protocol, so multiple files are a non-issue for users running LSP.


With Vim you can get decent results with a plugin that consumes the output of the ctags library.

It's not perfect though, and depending on how you have it set up, you may have to manually trigger tag regeneration, which can take a while depending on how deep into package files you set it to go.


>Coding at scale is about managing complexity.

I would extend this one level higher to say managing complexity is about managing risk. Risk is usually what we really care about.

From the article:

>any one person's opinions about another person's opinions about "clean code" are necessarily highly subjective.

At some point CS as a profession has to find the right balance of art and science. There's room for both. Codifying certain standards is the domain of professions (in the truest sense of the word) and not art.

Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.

This isn't to say a coder would have to follow rigid rules constantly or that the field needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.


A lot of "best practices" in engineering were established empirically, after root cause analysis of failures and successes. Software is more or less evolving along the same path (structured programming, OOP, higher-than-assembly languages, version control, documented ISAs).

Go back to earlier machines and each one had its own assembly language and instruction set. Nobody would ever go back to that era.

OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know that to be false, bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and define interfaces is neat. It forces you to think in terms of abstractions and helps readability.

> This isn't to say a coder would have to follow rigid rules constantly or that the field needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

As more and more years pass, I'm less and less against a regulatory body. It would help with getting rid of snake oil salesmen in the industry and limit offshoring to barely qualified coders. And it would simplify hiring too, by having a known certification that tells you someone at least meets a certain bar.


Software is to alchemy what software engineering is to chemistry. Software engineering hasn't been invented yet. You need a systematizing scientific revolution (Kuhn style) before you can or should create a regulatory body to enforce it. Otherwise you're just enforcing your particular brand of alchemy.


Well said. In the 1990s, the aerospace software domain was once referred to as being in an era of "cave drawings".


> OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer.

Not initially. Eventually, everything that reaches a certain minimal popularity level in software development gets pitched by snake-oil salesmen to enterprise management as a solution to that problem - including things developed specifically to deal with the problem of other solutions being cargo-culted and repackaged that way - whether it's a programming paradigm or a development methodology or metamethodology.


>having a known certification that tells you someone at least meets a certain bar.

This was tried a few years back by creating a Professional Engineer licensure for software, but it went away due to lack of demand. It could make sense to artificially create a demand by the government requiring it for, say, safety critical software, but I have a feeling companies wouldn't want this of their own accord, because that license gives the employee a bit more bargaining power. It also creates a large risk for SWEs due to the lack of codified standards and the inherent difficulty of software testing. It's not like a mechanical engineer, who can confidently claim a system is safe because it was built to ASME standards.


> It could make sense to artificially create a demand by the government requiring it for, say, safety critical software, but I have a feeling companies wouldn't want this of their own accord, because that license gives the employee a bit more bargaining power.

For any software purchase above a certain amount, the government should be forced to have someone with some kind of license sign off on the request. So many projects have doubled or tripled in price after it was discovered the initial spec didn't make any sense.


I think that at this point, for the software made/maintained for the government, they should just hire and train software devs themselves.

From what I've seen, with a few exceptions, government software development always ends up with a bunch of subcontractors delivering bad software on purpose, because that's the way they can ensure repeat business. E.g., the reason the Open Data movement didn't achieve much, and why most public systems are barely integrated with each other, is that every vendor does its best to prevent that from happening.

It's a scam, but like other government procurement scams, it obeys the letter of the law, so nobody goes to jail for this.


The development of mass transit (train lines) has a similar issue when comparing the United States to Western Europe, Korea, Japan, Taiwan, Singapore, or Hong Kong. In the US, as much as possible is sub-contracted. In the others, a bit less, and there is more engineering expertise on the gov't payroll. There is a transit blogger who writes about this extensively... but his name eludes me. (Does anyone know it?)

Regarding contractors vs in-house software engineering talent, I have seen (in the media) that the UK gov't (including the NHS) has hired more and more talent to develop software in-house. No idea if UK folks think they are doing a good job, but it is a worthy experiment (versus all contractors).


>should just hire and train software devs themselves

There are lots of people who advocate this, but it's hard to bring to fruition. One large hurdle is the legacy costs, particularly because it's so hard to fire underperforming government employees. Another issue is that government salaries tend to not be very competitive by software industry standards, so you'll only get the best candidates if they happen to be intrinsically motivated by the mission. Third, software is almost always an enabling function that is often competing for resources with core functions. For example, if you run a government hospital and you can hire one person, you're much more likely to prefer hiring a healthcare worker than a software developer. One last, and maybe unfair, point is that the security of government positions tends to breed complacency. This often creates a lack of incentive to improve systems, which results in a lot of legacy systems hobbling along past their usefulness.

I don't think subcontractors build bad systems on purpose, but rather they build systems to bad requirements. A lot of times you have non-software people acting as program managers who are completely fine with software being a black box. They don't particularly care about software as much as their domain of expertise, and are unlikely to spend much time creating good software requirements. What I do think occurs is that contractors will deliberately underbid on bad requirements, knowing they will make their profits on change orders. IMO, much of the cost overruns can be fixed by having well-written requirement specs.


Do you mean sign as in qualify that the software is "good"?

In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter. Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts


> Do you mean sign as in qualify that the software is "good"?

We're not there yet. Just someone to review the final spec and see if it makes any sense at all.

The canonical example is the Canadian Phoenix payroll system. The spec described payroll rules that didn't make any sense. The project tripled in cost because they had to rewrite it almost completely.

> In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter.

For other projects, they must have an engineer's signature or nothing gets built. So someone does a final sanity check on behalf of the project managers, contracting officers, and humanities-diploma bureaucrats. For software, none of that is required, despite the final bill often being as expensive as a bridge.

> Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts

Can't be worse than none at all.


Annoyingly, the government already sorta does this: many federal jobs, as well as the patent bar, require an ABET-accredited degree.

The catch is that many prominent CS programs don’t care about ABET: DeVry is certified, but CMU and Stanford are not, so it’s not clear to me that this really captures “top talent.”


I suspect this is because HR, and probably even the hiring managers, cannot distinguish between the quality of curriculums. One of the problems with CS is the wide variance in programs: some require calculus through differential equations and some don't require any calculus whatsoever. So it's easier to just require an ABET degree. Something similar occurs with Engineering Technology degrees, even when they are ABET accredited.

To your point, it unfortunately and ironically locks many CS majors out of computer science positions.


> I suspect this is because HR, and probably even the hiring managers, cannot distinguish between the quality of curriculums.

Part of the reason for that is they likely haven't even been exposed to graduates of good computer science curriculums.


In what sense do you think they haven't been exposed? As in, they've never seen their resumes? Or they've never worked with them?

I think it's a misalignment of incentives in most cases. HR seems to care very little once someone is past the hiring gate. So they would have to spend the time to understand the curriculum distinctions, probably change their grading processes, etc. It's just much easier for them to apply a lazy heuristic like "must have an ABET accredited degree" because they really don't have to deal much with the consequences months and years after the hire. In some cases, they even overrule the hiring manager's initial selection.


>the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.

It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.

When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.
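
As a minimal illustration of why that standard exists:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Binary floating point can't represent 0.10 or 0.20 exactly,
        // so rounding error creeps into money math:
        std::printf("%.17f\n", 0.10 + 0.20);  // 0.30000000000000004

        // The usual fix: store amounts as an integer count of the
        // smallest unit (cents), so the arithmetic stays exact.
        std::int64_t cents = 10 + 20;  // $0.10 + $0.20
        std::printf("%lld cents\n", (long long)cents);  // 30 cents
    }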

If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.

Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.


>kind of general problem solving

Could you elaborate on this distinction? At the superficial level, "general problem solving" is exactly how I describe engineering in general. The example of tightening screws is just a specific example of a fastening problem. In that context, codified standards are an industry consensus on how to solve a specific problem. Most people wrenching on their cars are not following ASME torque guidelines but somebody building a spacecraft should be. It helps define the distinction of a professional build for a specific system. Fastening is the "general problem"; fastening certain materials for certain components in certain environments is the specific problem that the standards uniquely address.

For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, the choice probably shouldn't be left up to the subjective eye of an individual programmer, just like a spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."

>I'm worried we will just codify subjective opinions about general problem solving.

Ironically, this is the same attitude in many circles of traditional engineering. People who don't want to adhere to industry standards have their own subjective ideas about how to solve the problem. Standards aren't always right, but they create a starting point to 1) identify a risk and 2) find an acceptable way to mitigate it.

>Instead we have to accept that our discipline is different from the others

I strongly disagree with this and I've seen this sentiment used (along with "it's just software") to justify all kinds of bad design choices.


>For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, the choice probably shouldn't be left up to the subjective eye of an individual programmer, just like a spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."

Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved. Instead, the language's built-in sort functionality will probably do here and increase readability, because you know what's meant.

Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.

Which again brings us back to the general vs specific issue. In general this won't matter, but if you're in a real-time embedded system, you will need algorithms that don't allocate and have known worst-case execution times. But here again, at least for the systems that matter, we have specific rules.


>Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.

I think this speaks to my point. If you are deciding which algorithms suffice, you are creating standards to be followed just as with other engineering disciplines.

>Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved

If you're claiming it didn't matter for the specific problem, then you're essentially saying the decision isn't risk-based. The problem here is you will tend to over-constrain design alternatives regardless of whether it decreases risk. My experience is people will strongly resist this strategy, as it gets interpreted as mindlessly draconian.

FWIW, examining specific use cases is exactly what's done in critical applications (software as well as other domains). Hazard analysis, fault-tree analysis, and failure-modes-and-effects analysis are all tools to examine specific use cases in a risk-specific context.

>But here again, at least for the systems that matter, we have specific rules.

I think we're making the same point. Standards do exactly this. That's why in other disciplines there are required standards in some use cases and not others (see my previous comment contrasting aerospace to less risky applications).


> At some point CS as a profession has to find the right balance of art and science.

That seems like such a hard problem. Why not tackle a simpler one?


I didn’t downvote but I’ll weigh in on why I disagree.

The glib answer is “because it’s worth it.” As software interfaces with more and more of our lives, managing the risks becomes increasingly important.

Imagine if I transported you back 150 years to when the industrial revolution and steam power were just starting to take hold. At that time there were no consensus standards about what makes a mechanical system “good”; it was much more art than science. The numbers of mishaps and the reliability reflected this. However, as our knowledge grew we not only learned about what latent risks were posed by, say, a boiler in your home but we also began to define what is an acceptable design risk. There’s still art involved, but the science we learned (and continue to learn) provides the guardrails. The Wild West of design practice is no longer acceptable due to the risk it incurs.


I imagine that's part of why different programming languages exist - i.e. you have slightly fewer footguns with Java than with C++.

The problem is, the nature of writing software intrinsically requires a balance of art and science no matter what language it is. That is because solving business problems is a blend of art and science.

It's a noble aim to try and avoid solving unnecessarily hard problems, but when it comes to the customer, a certain amount of the hardness is incompressible. So you can't avoid it.


Yes, coding at scale is about managing complexity. No, "Keeping methods short" is not a good way to manage complexity, because...

> then mentally model the entire graph of interactions at once

...partially applies even if you have well-named functional boundaries. You said it yourself:

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows.

Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.

Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.

Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.

And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.


This is where Occam's Razor applies - do not multiply entities unnecessarily.

Having hundreds or thousands of simple functions is the opposite of this advice.

You can also consider this in more scientific terms.

Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, as few connections between the parts as possible, each part as simple as possible, and both the parts and the connections between them as intuitively obvious as possible.

Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.

All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.

IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.


Lately I’ve found decoupling to be helpful in this regard.

This is an auth layer; its primary charge is to ensure that those receiving and modifying resources have the permissions to do so.

This is the data storage layer. It’s focused on clean, relatively generic data storage abstractions and models that are relatively unopinionated, and flexible.

This is the contract layer. It's more concerned with combining the APIs of the data and auth layers than with data transformation or business logic.

This is the business logic layer. It takes relatively abstract data from our API and performs transformations to massage it into shapes that fit the needs of our customers and the mental models we’ve created around those requirements.

Etc. Etc.

Of course this pragmatic decoupling is easier said than done, but the logical grouping of like concerns allows for discoverability, flexibility, and a generally clear demarcation of concerns.
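
A toy sketch of that layering (all names invented; a real system would be far richer) - each layer exposes a narrow interface and knows nothing about the layers above it:

    #include <iostream>
    #include <string>

    struct AuthLayer {      // auth: permissions only
        bool canRead(const std::string& user) const { return user == "alice"; }
    };

    struct DataLayer {      // storage: generic, unopinionated models
        std::string fetch(int id) const { return "record-" + std::to_string(id); }
    };

    struct ContractLayer {  // contract: combines the auth and data APIs
        const AuthLayer& auth;
        const DataLayer& data;
        std::string get(const std::string& user, int id) const {
            return auth.canRead(user) ? data.fetch(id) : "denied";
        }
    };

    int main() {
        // business logic would sit above, reshaping what the contract returns
        AuthLayer auth;
        DataLayer data;
        ContractLayer api{auth, data};
        std::cout << api.get("alice", 7) << "\n";  // record-7
    }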


I've also been gravitating towards this kind of component categorization, but then there's the ugly problem of "cross-cutting concerns". For instance:

- The auth layer may have an opinion on how half of the other modules should work. Security is notoriously hard to isolate into a module that can be composed with others.

- Diagnostics layer - logging, profiling, error reporting, debugging - wants to have free access to everything, and is constantly trying to pollute all the clean interfaces and beautiful abstractions you design in other layers.

- User interface - UI design is fundamentally about creating a completely separate mental model of the problem being solved. To make a full program, you have to map the UI conceptualization to the "backend" conceptualization. That process has a nasty tendency of screwing with every single module of the program.

I'm starting to think about software as a much higher-dimensional problem. In Liu Cixin's "The Three Body Problem" trilogy, there's a part[0] where a deadly device encased in impenetrable unobtanium[1] is neutered by an attack from a higher dimension. While the unobtanium shell completely protects the fragile internals in 3D space, in 4D space, both the shell and the internals lie bare, unwound, every point visible and accessible simultaneously[2].

This is how I feel about building software systems. Our abstractions are too flat. I'd like to have a couple more dimensions available, to compose them together. Couple more angles from which to view the source code. But our tooling is not there. Aspect-oriented programming moved in that direction a bit, but last I checked, it wasn't good enough.

--

[0] - IIRC it's in the second book, "The Dark Forest".

[1] - It makes more sense in the book, but I'm trying to spoiler-proof my description.

[2] - Or, going down a dimension, for flat people living on a piece of paper, a circle is an impenetrable barrier. But when we look at that piece of paper, we can see what's inside the circle.


Neat, that's some heady shit. I'll have to check Aspect oriented programming out.

It's a bit of work, but I've been thinking the concept of interchange logic is a neat idea for cross-layer concerns.

So for instance, I design my UI to exist in some fashion (I've been thinking contexts are actually a decent way to implement this model cause then you can swap them in and out in order to use them in different manners...)

So say, I've got some component which exists in the ForumContext, and it needs all the data to display for the forum.

So I build a ForumContext provider which is an interchange layer between my ForumApi and my ForumUI.

Then if it turns out I want to swap out the Api with another, all I have to do is create a new ForumContext provider which provides the same shape of data, and the User Interface doesn't need to change.

Alternatively if I need to shape the data in a new fashion, all I need to do is update my ForumContext provider to reshape the API data and I don't need to muss with the API at all (unless of course, I need new data in which case, yea of course).

It's not perfect, and React's docs seem to warn against overuse of contexts, but I think you could potentially make a decent architecture out of them. And they can be a lot less boilerplate than a similar Redux store, by using the state hooks React provides.

I still have to build out some sort of proof of concept of my idea; it's essentially connected component trees again. But when half the components in my library are connected to the API directly, you just end up with such a mess any time you need to either repurpose a component for another use or switch a section of your app over to a new data store or API.

At the end of the day, it seems like no matter how hard you try, it's really just about finding the best worst solution ;-).

And yea, security is a doozy in general. I've been working on decoupling our permissions logic a bit lately, since at the moment it's coupled between records, permissions, and other shit. Leaves a lot of room for holes.


>If you make all of your functions simple, then you simply need more functions to represent the same program

The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.

While I do think you can go too far with adding functions for their own sake, I don't see that they add to the cognitive load in the same way that possible interactions within a functional unit do. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.
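
A small sketch of that point (invented example): the private helpers below never enter the reader's mental model until the reader is already inside the class.

    #include <iostream>
    #include <string>

    class ReportBuilder {
    public:
        std::string build() const { return header() + body(); }

    private:
        // Invisible from the outside; no global-namespace pollution.
        std::string header() const { return "== report ==\n"; }
        std::string body() const   { return "...\n"; }
    };

    int main() {
        std::cout << ReportBuilder{}.build();
    }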


> No, "Keeping methods short" is not a good way to manage complexity

Agreed

> Allowing more complexity in your functions makes them individually harder to understand

I think that can mostly be avoided by sometimes creating local scopes {...} to avoid too much state inside a function, combined with whitespace and some section "header" comments (instead of what would have been sub-function names).

Can be quite readable, I think. And it's nice to not have to jump back and forth between myriads of files and functions. Something like the sketch below.
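
(Invented example - one function, with local scopes and section "header" comments standing in for sub-function names:)

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> data = {5, 1, 4};
        int sum = 0, max = 0;  // the scopes' outputs

        { // --- sum the input ---
            for (int v : data) sum += v;
        }

        { // --- find the maximum ---
            for (int v : data) if (v > max) max = v;
        }

        std::cout << sum << " " << max << "\n";  // 10 5
    }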


I have found this to be one of those A-or-B developer personas that is hard for someone to change, and it causes much disagreement. I personally agree 100%, but have known other people who couldn't disagree more; it is what it is.

I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.


I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

Have you ever seen a codebase with infrastructure and piping taking up about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is an endless chain of methods that mostly just delegate responsibility further and further. What could've been a 100-line loop of the "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc., another 45% to building trivial auxiliary structures and instantiating each other, and only 10% to the actual data processing - but good luck finding those 10%, because there is a never-ending chain of methods calling each other: A.do_work() calls B.process_item(), which calls A.on_item_processing(), which calls B.on_processed()... wait, shouldn't there have been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.

Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by making a recursive call to the top-most method, which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with feeling.
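
A compressed, runnable sketch of that shape (names invented) - the actual work hides in one inconspicuous method, and the "loop" only emerges from the mutual calls:

    #include <iostream>

    struct B;
    struct A {
        B* b = nullptr;
        int remaining = 3;
        void do_work();
        void on_item_processing();
    };
    struct B {
        A* a = nullptr;
        void process_item();
        void on_processed();
    };

    void A::do_work()            { if (remaining-- > 0) b->process_item(); }
    void B::process_item()       { a->on_item_processing(); }
    void A::on_item_processing() { std::cout << "the actual work\n"; b->on_processed(); }
    void B::on_processed()       { a->do_work(); }  // "looping" via recursion

    int main() {
        A a; B b;
        a.b = &b; b.a = &a;
        a.do_work();  // prints three times, the stack growing each pass
    }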


Your comment gives me emotional flashbacks. Years ago I took Java off my resume, because I don’t want to ever interact with this sort of thing again. (I’m sure it exists in other languages, but I’ve never seen it quite as bad as in Java.)

I think the best "clean code" programming advice is the advice writers have been giving for centuries. Find your voice. Be direct and be brief. But not too brief. Programming is a form of expression. Step 1 is to figure out what you're trying to say (e.g. the business logic). Then say it in its most natural form (switch statements? if-else chain? whatever). Then write the simplest scaffold around it you can so it gets called with the data it needs.

The 0th step is stepping away from your computer and naming what you want your program to express in the first place. I like to go for walks. Clear code is an expression of clear thoughts. You’ll usually know when you’ve found it because it will seem obvious. “Oh yeah, this code is just X. Now I just have to type it up.”


>I wish there was an "auto-flattener"/"auto-inliner" tool

I'm as big an advocate of "top-down" design as anyone, and I have also wished for such a tool. When you just want to know "what behavior comes next", all the abstractions do get in the way. The IDE should be able to "flatten" the execution path from current context and give you a linear view of the code. Sort of like a trace of a debug session, but generated on-the-fly. But still, I don't think this is the best way to write code.


Most editors have code folding. I've noticed this helps when there are comments or it's easy to figure out the branching or what not.

However, what you're asking for is a design style that's hard to implement I think without language tooling (for example identifying effectful methods).


GP is asking for the opposite. They're asking for code unfolding.

That is, given a "clean code like":

  auto DoTheThing(Stuff stuff) -> Result {
    const auto foo = ProcessSth(stuff);
    const auto bar = ValidateSthElse(stuff);

    return DoSth(foo, bar);
  }
The tool would inline all the function calls. That is, for each of ProcessSth(), ValidateSthElse() and DoSth(), it would automatically perform the task of "copy the function body, paste it at the call site, and massage the caller to make it work". It's sometimes called the "inline function" refactoring - the inverse of "extract function"/"extract method" refactoring.

I'd really, really want such a tool. Particularly one where the changes were transient - not modifying the source code, just overlaying it with a read-only replacement. Also interactive. My example session is:

- Take the "clean code" function that just calls a bunch of other functions. With one key combination, inline all these functions.

- In the read-only inlined overlay, mark some other function calls and inline them too.

- Rinse, repeat, until I can read the overlay top-to-bottom and understand what the code is actually doing.


Signed up just to say that I've also really, really wanted such a tool since forever. While, for example, the JetBrains IntelliJ family of editors has the automatic "inline function" refactoring, they do it by permanently modifying the source code, which is not quite what we want. Like you say, it should be transient!

So I recently made a quick&dirty interactive mock-up of how such an editor feature could look. The mockup is just a web page with javascript and html canvas, so it's easy to try here: https://emilprogviz.com/expand-calls/?inline,substitution (Not mobile friendly, best seen on desktop)

There are 2 different ways to show the inlining. You can choose between them if you click the cogwheel icon.

Then I learned that the Smalltalk editor Pharo already has a similar feature, demonstrated at https://youtu.be/baxtyeFVn3w?t=1803 I wish other editors would steal this idea. Microsoft, are you listening?

My mock-up also shows an idea to improve code folding. When folding / collapsing a large block of code, the editor could show a quick summary of the block. The summary could be similar to a function signature, with arguments and return values.


Thank you! I'm favoriting this comment. This is exactly what I was thinking about (+/- some polish)!

In particular, the SieveOfErastothenes() call, which I can inline, and inside the overlay, I can inline the call to MarkMultiples(), and the top-level variable name `limit` is threaded all the way down.

Please don't take that demo site down, or publish it somewhere persistent - I'd love to show it around to people as the demonstration of the tool I'm looking for.

> When folding / collapsing a large block of code, the editor could show a quick summary of the block.

I love how you did this! It hasn't even occurred to me, but now that I saw it, I want to have this too! I also like how you're trying to guess which branches in a conditional won't be taken, and diminish them visually.

EDIT: Also, welcome to HN! :).


> Please don't take that demo site down, or publish it somewhere persistent

Feel free to spread the URL around, I plan to keep it online for the rest of my life, or until the feature is available in popular editors - whichever comes first. And if someone wants to mirror the demo elsewhere, it should be easy to do so, since it's client-side only and MIT licensed.

> Also, welcome to HN! :)

Thanks! Been lurking here in read-only mode for years, but today I finally had something to contribute.


I just finished binge-watching all five of your videos on better programming tools, and I must say, it just blew my mind. Thank you for making them.

I've been maintaining my own notes on the kind of tools I'd like to have, with hopes to maybe implement them one day, and your videos covered more than half of my list, while also showing tons of brilliant ideas that never occurred to me. I'm very happy to see that the pain points I identified in my programming work aren't just my imagination.

Also, on a more abstract level, I love your approach to programming dilemmas, and it's the first time I saw it articulated explicitly: when there are two strong, conflicting views, you do a pros/cons analysis on both, and try to find a new approach that captures all the benefits, while addressing all the drawbacks.

I've sent you an e-mail a while ago, let me know if it got through :). I'll be happy to provide all kinds of feedback on the ideas you described in your videos, and I'd love to bounce the remaining part of my list off you, if you're interested :).

> today I finally had something to contribute

That's a first-class contribution. I think you should post the link to your site as a HN submission, using title "Show HN: Ideas for better programming tools" ("Show HN" being a marker that you're submitting your own work).


Wow, thanks, I'm really happy you liked my videos so much! I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others. I'm glad your ideas overlap with mine, because the more people have the same idea, the more likely it is to be a good one, I think.

> when there are two strong, conflicting views, you do a pros/cons analysis on both

Yeah, it's not easy... I've participated in endless, looping debates as much as anyone - I guess it's just human psychology. But with enough conscious effort, I find that it's sometimes possible to take a step back, take a fair look at both sides, and design a best-of-both-worlds solution. I'll apply this method again in future videos, and if I can inspire a few more people to use it, that's great. Making the programming world a tiny bit less "tribal" and a bit more constructive.

> I've sent you an e-mail a while ago

Yeah, let's continue our discussion over email. I replied to your email from my private address, let me know if it got through.

> That's a first-class contribution. I think you should post the link to your site as a HN submission

"First-class contribution" gave me tears of joy :) I'd like to "Show HN" in a few months. Once I post there, I might get a lot of comments, and I want to be available to answer the comments and make follow-up videos quickly, but currently my personal life is too busy.


> I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others.

From talking to others, as well as spending way too much time on HN, I think the answer is, "quite a lot". Perhaps not relative to the number of programmers, but in absolute terms, I'm pretty sure there's a hundred strong ideas to be found among just the people who comment here.

I do feel that our industry has an implicit bias against those ideas - I think it's a combination of two things: if you complain, you get labeled as whiny, and working on speculative tooling is considered time spent not providing business value.

> let me know if it got through.

Yeah, I got it, thanks! I'm desperately trying to trim down my draft reply, because I somehow managed to write a short article when describing two of my most recent ideas :).

> I'd like to "Show HN" in a few months.

Sure, take your time :). But I think people will love what you already have. It's not just the ideas you're presenting, but also a kind of "impression of quality" your videos give.


Great mock up, this is pretty interesting. Food for thought!


I'm curious to understand your use-case - would you be open to explaining more?

Do you actually want to overlay the code directly into the parent method, or would a tooltip (similar to hyperlink previews) work? I'm wondering how expanding the real estate would help with readability and how the user flow would work.

For example, code folding made a lot more sense because the window would have those little boxes to fold/unfold (which is basically similar to the act of inlining and un-inlining).


Yes, I want to overlay the code directly into the parent method, preferably with appropriate syntax highlighting and whatever other goodies the IDE/editor provides normally. It would be read-only to indicate that it's just a transient overlay, and not an actual code change.

So, if I have a code like:

  auto Foo(Bar b) {
    return b.knob();
  }

  auto Frob(Frobbable f) {
    auto q = Foo(f.thing());
    return q.quux(f.otherthing());
  }

  auto DoSth(Frobbable frobbie) {
    auto a = Frob(frobbie);
    return a.magic();
  }
Then I want to mark the last function, and automatically turn it into:

  auto DoSth(Frobbable frobbie) {
    auto foo_1 = frobbie.thing();
    auto q_1 = foo_1.knob();
    auto frob_1 = frobbie.otherthing();
    auto a = q_1.quux(frob_1);
    return a.magic(); 
  }
Or something equivalent, possibly with highlights/synthetic comments telling me which bits of code came from where. I want to be able to keep inlining function calls like this, until I hit a boundary layer like the standard library, or a third-party library. I might want to expand past that, but I don't think I'd do that much. I'd also like to be able to re-fold code I'm not interested in, to reduce noise.

What such tool would do is automating the process I'm currently doing manually - jumping around the tiny functions calling other tiny functions, in order to reassemble the actual sequence of lower-level operations.

I don't want this to be a tooltip, because I want to keep expanding past the first level, and have the overlay stay in place until I'm done with it.

EDIT: languages in the Lisp family - like Common Lisp or Emacs Lisp - feature a tool called "macroexpander". Emacs/SLIME wraps it into an interactive "macrostepper" that behaves pretty much exactly like the tool I described in this discussion thread.

EDIT2: See the excellent demo upthread by ' emilprogviz - https://news.ycombinator.com/item?id=27306118. That's the kind of tool I had in mind.


Yes, excellent mock-up - I see what you mean.

How would you deal with multiple levels of nesting? :) Let's say you're at level 5, which is pretty reasonable.

Oh and I also forgot about languages like Java that are heavy on interfaces and DI. That would be interesting to handle.


> I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

Learn to read assembly and knock yourself out.


That's not a very helpful response. Unless the code is compiled to native machine code and is all inlined, this won't help one bit.


On today's HN, alongside this thread, is "the hole in mathematics".

It is directly germane to what you are talking about.

In the process of formalizing axiomatic math, proving 1+1=2 took 700 pages of a book.

The point about assembly is more or less correct. The process of de-abstracting is going to be long and probably not that clear in the end.

I understand what you mean: the assembly commenter is correct; you'll need to actually execute the program and reduce it to the series of instructions it actually performed.

Which is either an actual assembly, or a pseudo-assembly instruction stream for the underlying turing machine: your computer.


I really need to introduce you to Jester, my toy functional programming language. It compiles down to pure lambda calculus (data structures are implemented with Scott-Mogensen encoding) and then down to C that uses nothing but function calls and assignments of pointers to struct fields. The logic and arithmetic are all implemented in the standard library: a Bool is a function that takes 2 continuations, a Byte is 8 Bools, an Int is 4 Bytes, addition uses the good old ripple-carry algorithm, etc.
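To give a flavour of the encoding in something more familiar, here's a minimal JavaScript sketch of the same idea (this is not Jester; the names are invented for illustration):

  // A Bool is a function that takes two continuations and calls one of them.
  const True  = (onTrue, onFalse) => onTrue();
  const False = (onTrue, onFalse) => onFalse();

  // Logic is then nothing but function calls:
  const not = (b) => (onTrue, onFalse) => b(onFalse, onTrue);
  const and = (a, b) => (onTrue, onFalse) => a(() => b(onTrue, onFalse), onFalse);

  and(True, not(False))(() => console.log("yes"), () => console.log("no")); // "yes"

Bytes, integers and ripple-carry addition stack on top of this in the same style.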

Reading the disassembly of the resulting program is pretty unhelpful: any function consists entirely of putting values from the fields of the passed-in structures into the fields of new structures and (tail)calling another function and passing it some mix of old/new structures.


Maybe not helpful, but it made me smile :-)


While I think you are onto something about top-down vs. bottom-up thinkers, one of the issues with a large codebase is that literally nobody can do the whole thing bottom-up. So you need some reasonable conventions and abstraction, or the whole thing falls apart under its own weight.


Yep, absolutely.

That's another aspect of my grand unifying theory of developers. Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service. How one perceives complexity, what causes one to complain about complexity, etc. all vary based on these things. It's easy to end up in situations where people are arguing past each other.

If you need to be able to keep all the details in your head, you're going to need smaller codebases. Similarly, if you're already keeping track of everything, things like static typing become less important to you. And the opposite is true.


> Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service.

Your theory needs to account for progression over time. For example, the first programming languages I learned were C++ and Java, so I believed in static typing. Then I worked a lot in PHP, Erlang and Lisp, and became a dynamic typing proponent. Later on, with much more experience behind me, I became a static typing fan again - to the point that my Common Lisp code is thoroughly typed (to the point of being non-idiomatic), and I wish C++'s type system were more expressive.

Curiously, at every point of this journey, I was really sure I had it all figured out, and that the kind of typing I liked was the best way to manage complexity.

--

EDIT: your hypothesis about correlated "frames of mind" reminds me of a discussion I had with 'adnzzzzZ here, who also claimed something similar, but broader: https://news.ycombinator.com/item?id=26076639. The topic started as, roughly, whether people designing addictive games using gambling mechanics are devil incarnate (my view) or good people servicing a different target audience than me (their view), but the overarching theory 'adnzzzzZ presented in https://github.com/a327ex/blog/issues/66 also touched on static/dynamic typing debate.


My programming path is similar to yours! Started with C++ then moved into Perl. Then realised that uber-dynamic-typing in Perl was a death-trap in enterprise software. Then oddly found satisfaction in Excel/VBA, because you can write more strictly-typed code in VBA (there is also a dynamic side) and even safely call the Win32 API directly. Finally, I came back to C++ and Java, which are "good enough" for expressing the static types that I need. The tooling and open-source ecosystem in Java makes it very hard to be more productive in other languages (except maybe C#, but they are in the same language family). I'm in a role now that also has some Python. While the syntactical sugar is like written prose, the weaker typing (than C++/Java) is brutal in larger projects. Unless people are fastidious about type annotations, I constantly struggle to reason about the code while (second-)guessing about types.

You wrote: <<I wish C++'s type system were more expressive.>> Can you share an idea? For example: Java 17 (due for release in the fall) will feature sealed classes. This looks very cool. For years, I (accidentally) simulated this behaviour using enums tied to instances or types.


Huh. There's something to this.

I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.

But it's true, I do generally feel like a codebase that's so complex or fractured that no one can understand any sizable chunk of it is just already going to be a disaster regardless of what kind of typing it uses. I don't hate microservices, they're often the right decision, but I feel they're almost always more complicated than a monolith would be. And I do regularly end up just reading implementation code, even in 3rd-party libraries that I use. In fact in some libraries, sometimes reading the source is quicker and more reliable than trying to find the relevant documentation.

I wouldn't extrapolate too much based on that, but it's interesting to hear someone make those connections.


I'll add my voice to your parent.

Statically typed languages and languages that force you to be explicit are awesome for going into a codebase you have never seen and understanding things. You can literally just let your IDE show you everything. All questions you have are just one Ctrl-click away and if proper abstraction (ala Clean Code) has been used you can ignore large swaths of code entirely and just look at what you need. Naming is awesome and my current and previous code bases were both really good in this (both were/are mixes of monolith and microservices). I never really care where a file is located. I know quite a few coders that will want to find things via the folder tree. I just use the keyboard shortcut to open by name and start guessing. Usually first or second guess finds what I need because things are named well and consistently.

Because we use proper abstractions, I can usually see at first glance what the overall logic is. If I need to know how a specific part works in detail, I can easily drill down via Ctrl-click. With a large inlined blob of code I would have a really hard time. Do I skip from line 1356 to 1781, or is that too far? Oh, this is JavaScript, and I don't even know if this variable here is a string or a number (or both, depending on where in the code we are), or maybe it's an object that's used as a map?

The whole thing is too big to keep in my head all the time, and I will probably not need to touch the same piece of code over and over; instead I will move from one corner to the next, and again to another corner, over the course of a few weeks to months.

That's why our frontend code is being converted to TypeScript, and why our naming (and other) conventions make even our JavaScript code bearable.


Is your backend Java or C#? Your IDE description feels like Java w/ Eclipse or IntelliJ, or C# w/ Visual Studio. I have had a similar experience. The "discoverability" of a large codebase is greatly increased by combining language with tooling. If you use Java with Maven-like dependency management (you can use Gradle these days if 'allergic' to Maven's pom.xml), the IDE will usually automatically download and "hook up" source code. It is ridiculous how fast you can move between layers of (a) project code, (b) in-house libraries, (c) open-source libraries, and (d) commercial closed-source libraries (decompiled on the fly in 2021!). (I assume all the same can be done for C# w/ Visual Studio.)

To be fair, when I started my career, I worked on a massive C project that was pretty easy to navigate because it was a mono-repo with everything in one place. CTags could index 99% of what you needed, and the macros weren't out of control. (Part of the project was also C++, but written in the style of career C programmers who only wanted namespaces and trivial generics like vector and map! Again, very simple to navigate huge codebase.)

I'm still surprised in 2021 when someone asks me to move a Java class to a different package during a code review. My inner monologue says: "Really... do they still use a file browser? Just use the IDE to find it!"


> I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.

That's precisely why people are attached to it; because it's rarely a source of bugs. :-)


Ha! Good catch. :)


[ separate answer for microservices ]

Yeah, monoliths are frequently easier to reason about, simply because you have fewer entities. The big win of microservices (IMHO) isn't "reason about", it is that they are a good way of getting more performance out of your total system IFF various parts of the system have different scaling characteristics.

If your monolith is composed of a bunch of things, where most parts require resources (CPU/RAM/time) that scale as O(n) (for n being the number of active requests), but one or a few parts are O(n log n) - or O(n), but with a higher constant...

Then, those "uses more resources" is the limit of scaling for each instance of the monolith, and you need to deploy more monoliths to cope with a larger load.

On the other hand, in a microservice architecture, you can deploy more instances of just the microservices that need it. This can, in total, lead to more things being done with fewer resources.

But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.

And that, in turn, may lead to better barriers between microservices, meaning that each microservice MAY be easier to understand in isolation.


> But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.

Sure, but that’s not particularly hard; it’s been basic system analysis since before “microservices” or even “service-oriented architecture” was a thing. Basic 70s-era Yourdon-style structured analysis (which, while it’s not the 1970s approach, can be applied incrementally in a story-by-story agile fashion to build up a system, as well as doing either big upfront design or working from the physical design to the logical requirements of an existing system) produces pretty much exactly what you need to determine service boundaries.

(It’s also a process that very heavily leverages locality of knowledge within processes and flows, so it’s quite straightforward to carry out without ever having to hold the whole system in your head.)


Yep, there's no real magic here. There's some understanding forced by a (successful) transition to microservices, but a transition to microservices is not a requirement for said gained insight.

And if all parts of your system scale identically, it may be better to scale it by replicating monoliths.

Another POSSIBLE win is if you start having multiple systems, sharing the same component (say, authentication and/or authorization), at which point there's something to be said for breaking at least that bit out of every monolith and putting them in a single place.


I don't really care about the static/dynamic typing spectrum, I care about the strong/weak typing spectrum.

At any point, will the code interpret a data item according to the type it was created with?

A prime example of "weakly typed" is when you can add "12" and 34 to get either "1234" or 46.
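JavaScript is one familiar example that sits at the weak end of the spectrum:

  console.log("12" + 34);         // "1234" - the number is coerced to a string
  console.log("12" - 34);         // -22    - the string is coerced to a number
  console.log(Number("12") + 34); // 46     - only explicit conversion gives 46

The same two operands produce a string or a number depending on the operator, so the data is no longer interpreted according to the type it was created with.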


This is an interesting distinction. I confess that I frequently interchange the pairs.


I mean, in some respects, "dynamic typing" is "type the data" and "static typing" is "type the variable".

In both cases, there's the possibility for doing type propagation. But, if you somehow manage to pass in two floats to an addition that a C compiler thinks is an integer addition, you WILL have a bad day. Whereas in Common Lisp, the actual passed-in values are typed (for floats, usually boxed, for integers, if they're fixnums, usually tagged and having a few bits less than you would expect).


I’m reminded of an earlier HN discussion about an article called The Wrong Abstraction, where I argued¹ that abstractions have both a benefit and a cost and that their ratio may change as a program evolves and which of those “nitty gritty details” are immediately relevant and which can helpfully be hidden behind abstractions changes.

¹ https://news.ycombinator.com/item?id=23742118


The point is that bottom-up code is a siren song. It never scales. It makes it a lot easier to get started, but given enough complexity it inevitably breaks down.

Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought out, watertight layers of abstraction. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.

Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.


But the reverse is also true. Top-down programming doesn't really work well for smaller programs, it definitely doesn't work well when you're dealing with small, highly performance-critical or complex tasks.

So sure, I'll grant that when your program reaches the 10,000 line mark, you need to have some serious abstractions. I'll even give you that you might need to start abstracting things when a file reaches 1,000 lines.

But when we start talking about the rule of 30 -- that's not managing complexity, that's alphabetizing a sock drawer and sewing little permanent labels on each sock. That approach also doesn't scale to large programs because it makes rewrites and refactors into hell, and it makes new features extremely cumbersome to quickly iterate on. Your 10,000 line program becomes 20,000 lines because you're throwing interfaces and boilerplate all over the place.

Note that this isn't theoretical, I have worked in programs that did everything from building an abstraction layer over the database in case we wanted to use Mongo and SQL at the same time (we didn't), to having a dependency management system in place that meant we had to edit 5 files every time we wanted to add a new class, to having a page lifecycle framework that was so complicated that half of our internal support requests were trying to figure out when it was safe to start adding customer data to the page.

The benefit of a good, long, single-purpose function that contains all of its logic in one place is that you know exactly what the dependencies are, you know exactly what the function is doing, you know that no one else is calling into the inlined logic that you're editing, and you can easily move that code around and change it without worrying about updating names or changing interfaces.

Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function. It's fine to have a function that's longer than a couple hundred lines. If you're building something like a rendering or update loop, in many cases I would say it's preferable.


It's funny how these things are literally what the Clean Code book advocates for. Sure, there is mention of a lot of stuff that's no longer needed and was a band-aid over the deficiencies of a particular language. But the ideas are timeless, and I used them before I even knew the book - and I used them in Perl.


> these things are literally what the Clean Code book advocates for

I'm not sure I understand what you're saying, I might be missing your point. The Clean Code book advocates that the ideal function is a single digit number of lines, double digits at the absolute most.

In my mind, the entire process of writing functions that short involves abstracting almost everything your code does. It involves passing data around all over the place and attaching state to objects that get constructed over multiple methods.

How do you create a low-abstraction, bottom-up codebase when every coroutine you need to write is getting turned into dozens of separate functions? I think this is showcased in the code examples that the article author critiques from Clean Code. They're littered with side effects and state mutations. This stuff looks like it would be a nightmare to maintain, because it's over-abstracted.

Martin is writing one-line functions whose entire purpose is to call exactly one other function passing in a boolean. I don't even know if I would call that top-down programming, it feels like critiquing that kind of code or calling it characteristic of their writing style is almost unfair to top-down programmers.


I'm not saying the entire book taken literally is how everything must be done. I was trying to say that the general ideas make sense such as keeping a function at the same level of abstraction and keeping them small.

I agree with you that having all functions be one-liners is not useful. Keeping all functions to within just a few lines, or double digits at most, makes sense however. Single digit could be 9 - that's a whole algorithm right there! For example, quicksort (quoted from the Wikipedia article):

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)
This totally fits the single digit of lines rule and it describes the algorithm on a high enough level of abstraction that you get the idea of the whole algorithm easily. Do you think that inlining the partition function would make this easier or harder to read?

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        pivot := A[hi]
        i := lo
        for j := lo to hi do
            if A[j] < pivot then
                swap A[i] with A[j]
                i := i + 1
        swap A[i] with A[hi]

        quicksort(A, lo, i - 1)
        quicksort(A, i + 1, hi)
(I hope I didn't mix up the indentation - on the phone here and it's hard to see lol)

Now some stuff might require 11 or 21 lines. But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.


> But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.

Well, but that's exactly what I'm pushing back against. I think the rule of 30 is often a mistake. I think if you're going out of your way to avoid long functions, then you are probably over-abstracting your code.

I don't necessarily know that I would inline a quicksort function, because that's genuinely something that I might want to use in multiple places. It's an already-existing, well-understood abstraction. But I would inline a dedicated custom sorting method that's only being used in one place. I would inline something like collision detection, nobody else should be calling that outside of a single update loop. In general, it's a code smell to me if I see a lot of helper functions that only exist to be called once. Those are prime candidates for inlining.

This is kind of a subtle argument. I would recommend http://number-none.com/blow/john_carmack_on_inlined_code.htm... as a starting point for why inlined code makes sense in some situations, although I no longer agree with literally everything in this article, and I think the underlying idea I'm getting at is a bit more general and foundational.

> Do you think that inlining the partition function would make this easier or harder to read?

Undoubtedly easier, although you should label that section with a comment and use a different variable name than `i`. Your secondary function is just a comment around inline logic, it's not doing anything else.[0]

But by separating it out, you've introduced the possibility for someone else in the same class or file to call that function without your knowledge. You've also introduced the possibility for that method to contain a bug that won't be visible unless you step through code. You've also created a function with an unlabeled side effect that's only visible by looking at the implementation, which I thought we were trying to avoid.

You've added a leaky abstraction to your code, a function that isn't just only called in one place, but should only be called in one place. It's a function that will produce unexpected results if anyone other than the `quickSort` method calls it, that lacks any error checking; it's not really a self-contained unit of code at all.

And for what benefit? Is the word `partition` really fully descriptive of what's going on in that method? Does it indicate that the method is going to manipulate part of the array? And is anyone ever going to need to debug or read a quicksort method without looking at the partition method? I think that's very unlikely.

----

Maybe you disagree with everything I'm saying above, but regardless, I don't think that Clean Code is actually advocating for the same ideas as I am:

> Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function.

I don't think that claim is one that Martin would agree with. Or if it is, I don't think it's a statement he's giving actionable advice about inside of his book.

----

[0]: In a language like Javascript (or anything that supports inline functions), we might still use a function or a new context as a descriptive boundary, particularly if we didn't want `j` and `pivot` to leak:

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    const pivotIndex = (function partition (data, lo, hi) {
      //etc...
    }(data, lowIndex, highIndex));

    quicksort(data, lowIndex, pivotIndex - 1);
    quicksort(data, pivotIndex + 1, highIndex);
  }
But for something this trivially small, I suspect that a simple comment would be easier to read.

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    /* Partition */
    let pivot = data[highIndex];
    //etc...

    quicksort(data, lowIndex, partitionIndex - 1);
    quicksort(data, partitionIndex + 1, highIndex);
  }
Remember that your variable and function names can go out of date at the same speed as any of your comments. But the real benefit of inlining this partition function (besides readability, which I'll admit is a bit subjective), is that we've eliminated a potential source of bugs and gotten rid of a leaky abstraction that other functions might be tempted to call into.


> Remember that your variable and function names can go out of date at the same speed as any of your comments.

A very good point, thank you for voicing it!

As luck would have it, two days ago I was writing comments about this at work during code review - there was a case where a bunch of functions taking a "connection" object had it replaced with a "context" object (which encapsulated the connection, and some other stuff), but the parameter naming wasn't updated. I was briefly confused by this when studying the code.


Ha :) This is something that's also been drilled into me mostly just because I've gotten bitten by it in jobs/projects. The most recent instance I ran into was a `findAllDependents` method turning into `findPartialDependentsList`, but the name never getting updated.

Led to a non-obvious bug because from a high level, the code all looked fine, and it was only digging into the dependents code that revealed that not everything was getting returned anymore.


Absolutely agree that all naming can go out of date. With the tools I use nowadays, it's even easier for comments to go out of date than it was previously, because of all the automatic folding away in the IDE.

But one of the best reminders that comments don't do sh+t came early in my career, when my co-worker asked me a question about a line of code (and it was literally just the two of us working on that code base). I probably had a very weird look on my face. I simply pointed to the line above the one he asked about. He read the comment and said "thank you".

I guess my point is that all you can do is incorporate the "extra information" as closely as possible to the actual code, so that it's less likely to be ignored or not seen. Incorporating it into the variable and function naming itself is the closest you will get, and as per your example (and my own experience as well) it can still go stale. Nothing but rigorous code review practices and co-workers who care will help with this.

But I think we can all agree (or I hope so at least) that it's better to have your function called `findAllDependents` and be slightly out of date than to have it called `function137` with a big comment on top that explains in 5 lines that it finds the list of all dependents.


Glad you admitted subjectivity. I will too and I am on the other side of that subjectivity. For the quicksort example, that was the pseudo code from the Wikipedia article.

I personally think that the algorithm is easier to grasp conceptually if I just need to know 'it partitions the data and then runs quicksort on both of those partitions. Divide and conquer. Awesome'.

I don't care at that level of abstraction _how_ the partitioning works. In fact, there are multiple different partition functions people have created, with various characteristics. The fact that this one mutates its parameters is generally bad if you ask me, but in this specific case of a general-purpose, high-performance sorting function it's totally acceptable for the sake of speed and memory considerations. In other 'real world' scenarios of 'simple business software' I would totally forsake that speed and memory efficiency for better abstractions. This is also why Carmack is basically not a good example: his world is that of high-performance graphics and game engine programming, where he's literally the one dude that has it all in his head. I can totally see why he would see things differently from someone like me, who has to go look at a different piece of code that I've never seen before multiple times every day.

You mention various problems with this code, such as the in-place nature and bad naming. Most of that is simply the copy from Wikipedia, and yes, I agree I would rename these in real code. I do not agree, however, with the part about 'someone else could call this now'. To stick with Clean Code's language of choice, the partition function would actually be a private method of the quicksort class. Thus nobody outside can call it but the algorithm itself, which, as a self-contained unit, is not just a blob of code.

Same with your inlining of collision detection and such. I don't think I would do that. I think it has value to know that the overall loop is something like

  do_X() 
  do_Y() 
  detect_collisions() 
  do_Z() 
Overall "game loop" easily visible straight away. The collision detection function might be a private method to that class you're in though. Will depend on real world scenario I would say.

You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends. In this example it's sort of easy to see. As the code we are talking about gets larger, it's not as easy anymore. So you have to make sure to put a new comment above every 'section'. The problem is that this can be forgotten. Then I need to actually read and fully understand the code to figure out these boundaries. I can no longer just tell my editor to jump over something. I can no longer have the compiler ensure that the boundaries are set (it will ensure proper function definitions and calls).


> The collision detection function might be a private method to that class you're in though.

Definitely, making things private helps a lot, although it's worth noting that classes often aren't maintained by only one person, and they often encapsulate multiple public methods and behaviors. It's still possible to clutter a class with private methods and to have other people working on that class call them incorrectly. This is especially true for methods that mutate private state (at least, in my experience), because those state mutations and state assumptions are often not obvious and are undocumented unless you read the implementation code (and private methods tend to be less documented than public methods in my experience).

Writing in a more functional style (even inside of a class) can help mitigate that problem quite a bit since you get rid of a lot of the problematic hidden state, but I don't want to give the impression that if you make a method private that's always safe and it'll never get misused.

> You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends.

In this example, I felt like it was overkill to include a closing comment, since the whole thing is like 20 lines of code. But you could definitely add a closing comment here. If you use an editor that supports regions, they're pretty handy for collapsing logic as well. That's a bit language dependent though. If you're using something like C# -- C# has fantastic region support in Visual Studio. Other languages may vary.

Of course, people who don't use an IDE can't collapse your regions, but in my experience people who don't use an IDE also often hate jumping between function definitions since they need to manually find the files or grep for the function name, so I'm somewhat doubtful they'll be too upset in either case.

> I can no longer have the compiler ensure that the boundaries are set

You may already know this, but heads up that if you're worried about scope leaking and boundaries, check if your language of choice supports block expressions or an equivalent. Languages like Rust and Go can allow you to scope arbitrary blocks of code, C (when compiled with gcc) supports statement expressions, and many other languages like Javascript support anonymous/inline functions. Even if you are separating a lot of your code into different functions, it's still nice to be able to occasionally take advantage of those features. I often like to avoid the extra indentation in my code if I can help it, but that's just my own visual preference.
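For what it's worth, JavaScript also supports bare blocks: let/const declarations are scoped to the nearest braces, so you can get the boundary without a wrapper function. A small sketch:

  const data = [3, 1, 2];
  let total;
  {
    // tmp and n are invisible past the closing brace
    let tmp = 0;
    const n = data.length;
    for (let i = 0; i < n; i++) tmp += data[i];
    total = tmp;
  }
  console.log(total);  // 6
  // console.log(tmp); // ReferenceError: tmp is not defined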


As mainly a bottom-up person, I completely agree with your analysis but I wonder if you might be using "top-down architecture" here in an overloaded way?

My personal style is bottom up, maximally direct code, aiming for monolithic modules under 10kloc, combined with module coupling over very narrow interfaces. Generally the narrow interfaces emerge from finding the "natural grain" of the module after writing it, not from some a priori top-down idea of how the communication pathways should be shaped.

Edit: an example of a narrow interface might be having a 10kloc quantitative trading strategy module that communicates with some larger system only by reading off a queue of things that might need to be traded, and writing to a queue of desired actions.


I never thought of things this way but it is a useful perspective.


> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

That's only 1 part of the complexity equation.

When you have 100 lines in 1 function, you know exactly the order in which each line will happen, and under which conditions, just by looking at it.

If you split it into 10 functions of 10 lines each, now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3,628,800 possible orderings is happening without reading through them.

Short functions are good when they fit the problem. Often they don't.


I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.


Well, if they were one function before, they probably share some state.

Clean Code recommends turning that function into a class and promoting the shared state from local variables into fields. After such a "refactoring" you get a nice puzzle trying to understand what exactly happens.
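A sketch of what that refactoring tends to look like in practice (names invented, JavaScript for brevity):

  class ReportGenerator {
    generate(records) {
      this.records = records;  // former locals, promoted to fields
      this.total = 0;
      this.computeTotal();
      this.buildLines();
      return this.lines.join("\n");
    }
    computeTotal() { for (const r of this.records) this.total += r.amount; }
    buildLines() { this.lines = ["REPORT", `total: ${this.total}`]; }
  }

Each method now reads and writes fields whose valid lifetimes are defined only by the call order inside generate() - that ordering used to be visible as plain top-to-bottom statements.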


I've seen threads on this before, but the "goto" (couldn't stop myself) reaching for object-orientedness to "solve" everything is really frustrating.

I've found the single greatest contributor to more readable and maintainable code is to limit state as much as possible.

Which was really hard for me to learn because it can be somewhat less efficient, and my game programmer upbringing hates it.


Sometimes eliminating state can also mean increasing complexity and lines of code tremendously.


A lot depends on what your language and its ecosystem can support. For instance, the kind of monadic stuff people do with Haskell and Scala can compress programs tremendously, but then I've worked in a codebase that tried the same things in C++ - and there, the line count expands, because the language just can't express some of the necessary concepts in a concise way.


> if they were one function before they probably share some state

and this is exactly why you refactor to pull the shared state out into parameters, so that each of the "subfunctions" has zero side effects.
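A minimal sketch of that refactoring (invented example):

  // Each step takes what it needs and returns what it produces.
  function subtotal(items) {
    return items.reduce((sum, item) => sum + item.price, 0);
  }
  function withTax(amount, rate) {
    return amount * (1 + rate);
  }
  function invoiceTotal(items, taxRate) {
    return withTax(subtotal(items), taxRate); // ordering is explicit at the call site
  }

The data flow now is the call graph, so the factorial-of-orderings objection mostly evaporates.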


In JavaScript I sometimes break up the behaviour of a large function by putting small internal functions inside it. Those internal functions often have side effects, mutating the state of the outer function which contains them.

I find this approach a decent balance between having lots of small functions and having one big function. The result is self contained (like a function). It has the API of a function, and it can be read top to bottom. But you still get many of the readability benefits of small functions - like each of the internal methods can be named, and they’re simple and each one captures a specific thought / action.
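Roughly this shape (illustrative only, not code from a real project):

  function buildReport(records) {
    const lines = [];

    function addLine(text) { lines.push(text); }  // mutates the outer state
    function addSection(title, items) {           // a named, reusable step
      addLine(title.toUpperCase());
      for (const item of items) addLine("  " + item);
    }

    addSection("failed", records.filter(r => r.error).map(r => r.name));
    addSection("passed", records.filter(r => !r.error).map(r => r.name));
    return lines.join("\n");
  }

The inner functions read like named paragraphs, but the whole unit still has a single function's API.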


If you're calling those functions once each in a particular order then I can't possibly figure out what that does for you that whitespace and a few comments wouldn't. How does turning 100 lines of code into 120 and shuffling it out of execution order possibly make it easier to read?


I coded this way for a while and found it makes the code easier to read and easier to reason about. Instead of your function being

  func foo() {
    // do A.1
    // do A.2
    // do B.1
    // do B.2
    // etc...
  }
It becomes

  func foo() {
    // do A
    // do B
    // etc...
    // func A()...
    // func B()...
  }
When the func is doing something fairly complicated, the savings can really add up. It also makes expressing some concurrency patterns easier (parallel, series, etc.); I used to do this a lot back in the async.js days. The main downside seems to be less elegant automated testing, because of all the internal state.


No; I wouldn't do it if I was just calling them once each in a particular order. And I don't often use this trick for simple functions. But sometimes long functions have repeated behaviour.

For example, in this case I needed to do two recursive tree walks inside this function, so each walk was expressed as an inner function which recursively called itself, and each is called once from the top level method:

https://github.com/ottypes/json1/blob/05ef789cc697888802e786...

I don't do it like this often though. The code in this file is easily the most complex code I've written in nearly 3 decades of programming. Here my ability to read and edit the code is easily the most important factor. I think this form makes the core algorithm clearer than any other way I could factor this code. I considered smearing this internal logic out over several top-level methods, but I doubt that would be any easier to read.
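Boiled down to a toy example, the shape is something like this (not the linked code):

  function countLeaves(tree) {
    let leaves = 0;

    function walk(node) {  // inner function that recurses on itself
      if (!node.children || node.children.length === 0) { leaves += 1; return; }
      for (const child of node.children) walk(child);
    }

    walk(tree);            // called once from the top level
    return leaves;
  }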



Aren't you creating new functions on each call to your parent function though? I imagine there must be a performance or memory penalty?


Now this is, in my opinion, usually not good advice (it is like reintroducing global variables), as unnecessary state certainly makes things more difficult to reason about.

I have read the book (not very recently) and I do not recall this but perhaps I am just immune to such advice.

I like his book about refactoring more than Clean Code but it introduced me to some good principles like SOLID (a good mnemonic), so I found it somewhat useful.


Yes and no.

What I find is that function boundaries have a bunch of hidden assumptions we don't think about.

Especially things like exceptions.

For all these utility functions, are you going to check input variables, which means doing it over and over again? Catching exceptions everywhere, etc.?

A function can be written for a 'narrow use case' - but when it's actually made available to other parts of the system, it needs to be somewhat more generalized.

This is the problem.

Is it possible that 'nested functions' could provide a solution? As in, you only call the function once, in the context of some other function, so why not physically put it there?

It can have its own stack and be tested separately if needed, but it remains exclusive to the context it is in from a readability perspective - and you don't risk having it used for 'other things'.

You could even have an editor 'collapse' the function into a single line of code, to make the longer algorithm more readable.


The problem is abstraction isn't free. Sometimes it frees up your brain from unnecessary details and sometimes the implementation matters or the abstraction leaks.

Take even something as simple as Substring, a method we use all the time and one that's far clearer than most helper functions I've seen in codebases.

Is it Substring(string, index, length) or Substring(string, indexStart, indexEnd)?

What happens when you pass in "abc".Substring(0,4)? Do you get an exception, or "abc"?

What does Substring(0,-1) do? Or Substring(-2,-3)?

What happens when you call it on null? Sometimes this matters, sometimes it doesn't.
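JavaScript alone answers these questions differently depending on which of its two built-in helpers you reach for:

  "abc".substring(0, 4);  // "abc" - out-of-range end is clamped, no exception
  "abc".substring(0, -1); // ""    - negative arguments are treated as 0
  "abc".slice(0, -1);     // "ab"  - negative arguments count from the end
  "abc".slice(-2, -3);    // ""    - empty once start >= end after conversion

And calling either one on null/undefined throws a TypeError. None of this is visible in the name.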


Also:

- Does it destructively modify the argument, or return a substring? Or both?

- If it returns a substring, is it a view over the original string, or a fresh substring that doesn't share memory with the original?

- If it returns a fresh substring, how does it do it? Is it smart or dumb about allocations? This almost never matters, except when it does.

- How does it handle multibyte characters? Do locales impact it in any way?

With the languages we have today, a big part of the function contract cannot be explicitly expressed in function signatures. And it only gets worse with more complicated tools of abstraction.


I posted this elsewhere in the thread, but local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
       int result_value = var_1 * var_2
    } (exports: result_value)

    return result_value * 5
There are a couple of newer languages experimenting with concepts like this, Jai being one: https://youtu.be/5Nc68IdNKdg?t=3493


This is a fascinating idea. In some languages like C or Java or C#, the IDE could probably do this "for free": generate the annotations, then the programmer can spot-check for surprises. Or the reverse: highlight a block of code and ask the IDE to tell you about reads/mutates/exports. In some sense, when you use automatic refactoring tools (like IntelliJ), extracting a few lines of code as a new method requires similar static analysis.

In the latest IntelliJ, the IDE will visually hint about mutable, primitive-typed local variables (including method parameters). A good example is a for-loop variable (i/j/k). The IDE makes it stand out. When I write Java, I try to use final everywhere for primitive-typed local variables. (I borrowed this idea from functional programming styles.) The IDE gives me a hint if I accidentally forget to mark something as final.


> local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:

this is basically a lambda you call instantly.

    [&x, y, z] () {
        x = y + z;
    }();


It's similar, but lambdas don't specify the behaviour as precisely, and they're not as readable since the use of a lambda implies a different intention, and the syntax that transforms them into a scope block is very subtle. They may also have performance overhead depending on the environment, which is (arguably) additional information the programmer has to consider on usage.


>If you split it into 10 functions of 10 lines each, now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3,628,800 possible orderings is happening without reading through them.

It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

You're likely missing one or more techniques that make this work well:

1. Depth-first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top-to-bottom readability reasonable (see the sketch after this list).

2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls; instead having 4-5 method calls doing the same thing).

4. Explicit pre/post conditions in each method are called out due to the passing in of parameters and the return values. This more effectively helps a reader understand the lifecycle of relevant variables etc.
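A tiny sketch of technique 1 (names invented): each function is defined immediately after its first caller, so reading top to bottom roughly matches execution order.

  function handleUpload(file) {
    store(parse(file));
  }
  // depth-first: parse() comes right after its caller...
  function parse(file) {
    return file.trim().split(",");
  }
  // ...then store(), the next call made by handleUpload()
  function store(fields) {
    console.log("storing", fields);
  }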

In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.


> It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example.

One of the codebases I'm currently working on is a big example of that. I obviously can't share parts of it, but I'll say that I agree with the GP. Lots of tiny functions kill readability.

> 1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

> 2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

That's table stakes. Unfortunately, quite often a properly descriptive name would be 100+ characters long, which obviously nobody does.

> 3. Similar levels of abstraction in each function

That's a given, but in a way, each "layer" of such functions introduces its own sublevel of abstraction, so this leads to abstraction proliferation. Sometimes those abstractions are necessary, but I found it easier when I can handle them through few "deep" (as Ousterhout calls it) functions than a lot of "shallow" ones.

> 4. Explicit pre/post conditions in each method

These introduce a lot of redundant code, just so that the function can ensure a consistent state for itself. It's such a big overhead that, in practice, people skip those checks, and rely on everyone remembering that these functions are "internal" and had their preconditions already checked. Meanwhile, a bigger, multi-step function can check those preconditions once.
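The trade-off in miniature (invented example): validate once at the public boundary and keep the internal steps check-free.

  function parseRecord(input) {  // public entry point
    if (typeof input !== "string" || input.length === 0) {
      throw new Error("input must be a non-empty string");
    }
    return splitFields(normalize(input));  // helpers assume a valid string
  }
  function normalize(s) { return s.trim().toLowerCase(); }
  function splitFields(s) { return s.split(","); }

Nothing enforces that assumption for the helpers, though - which is exactly the fragility being pointed out.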


> Lots of tiny functions kill readability.

I've heard this argument a lot, and I've generally found there's another problem causing the lack of readability, not the small functions themselves.

>Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

Here, though, you're used to reading code upwards anyway, so flip the depth-first ordering and make it depth-last (or take the hit on the forward declarations). If you've got more of these than you can handle, your classes are probably too complex regardless (i.e. doing input, parsing, transformation, and output in the same method).

> quite often a properly descriptive name would be 100+ characters long

Generally, if this is the case, then the containing class / module / block / ? is too big. It's not a problem of small methods; the problem is at a higher level.

> Explicit pre/post conditions in each method

I should have been more explicit here - what I meant is that you know that in the first method only the first 3 variables matter, and those variables/parameters are not modified by, or relevant to, the rest of the method. Even without specifically coding pre/post-cons, you get a better feel for the intended isolation of each block. You fall into a pattern of writing code that is simple to reason about. Paired with pure methods / immutable variables, this tends to (IMO) generate easily scannable code. Code that looks like it does what it does, rather than code that requires reading every line to understand.


> You're likely missing one or more techniques that make this work well:

I know how to do it, I just don't always think it's worth it.

> Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

Not 100 lines, just 34, but it's a good example of a function I wouldn't split even if it got to 300 lines.

    function getFullParameters() {
        const result = {
            "gridType": { defaultValue: 1, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "gridSize": { defaultValue: 32, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "gridOpacity": { defaultValue: 40, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "width": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
            "height": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
            "seed": { defaultValue: 1, randomFn: () => Math.round(Math.random() * 65536), redraw: allRedraws(), },
            "treeDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
            "stoneDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("stones"), },
            "twigsDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("twigs"), },
            "riverSize": { defaultValue: 3, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
            "roadSize": { defaultValue: 0, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
            "centerRandomness": { defaultValue: 20, randomFn: () => Math.round(30), redraw: onlyOneRedraw("trees"), },
            "leavedTreeProportion": { defaultValue: 95, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
            "treeSize": { defaultValue: 50, randomFn: () => Math.round(30) + Math.round(Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "treeColor": { defaultValue: 120, randomFn: () => Math.round(Math.random() * 65536), redraw: onlyOneRedraw("trees"), },
            "treeSeparation": { defaultValue: 40, randomFn: () => Math.round(80 + Math.random() * 20), redraw: onlyOneRedraw("trees"), },
            "serrationAmplitude": { defaultValue: 130, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "serrationFrequency": { defaultValue: 30, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "serrationRandomness": { defaultValue: 250, randomFn: () => Math.round(100), redraw: onlyOneRedraw("trees"), },
            "colorRandomness": { defaultValue: 30, randomFn: () => Math.round(20), redraw: onlyOneRedraw("trees"), },
            "clearings": { defaultValue: 9, randomFn: () => Math.round(3 + Math.random() * 10), redraw: onlyRedrawsAfter("clearings"), },
            "clearingSize": { defaultValue: 30, randomFn: () => Math.round(30 + Math.random() * 20), redraw: onlyRedrawsAfter("clearings"), },
            "treeSteps": { defaultValue: 2, randomFn: () => Math.round(3 + Math.random() * 2), redraw: onlyOneRedraw("trees"), },
            "backgroundNo": { defaultValue: 1, randomFn: null, redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "showColliders": { defaultValue: 0, randomFn: null, redraw: onlyOneRedraw("colliders"), },
            "grassLength": { defaultValue: 85, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "grassDensity": { defaultValue: 120, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "grassSpread": { defaultValue: 45, randomFn: () => Math.round(5 + Math.random() * 25), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
        };
        return result;
    }
There's a lot of value in having all of this in one place. Ordering isn't a problem here; there's just no need to refactor.


I have seen so much GUI code like this in my career! Real-world sophisticated GUIs can have tens or hundreds of attributes to set up. Especially ancient Xlib stuff - this was the norm. You have a few functions with maybe hundreds of lines doing pure GUI setup. No problem - easy to mentally compartmentalise.

Your deeper point (if I may theorise): Stop following hard-and-fast rules. Instead, do what makes sense and is easy to read and maintain.


> I know how to do it, I just don't always think it's worth it.

Agreed :)

Generally, no problem with this method, other than it's difficult to see at a glance what each item will be set to. Something like the following might be an easy first step:

    function getFullParameters() {
      // JS doesn't overload by arity, so one helper with an optional randomFn
      function param(defaultValue, randomFn, redraw) {
        if (redraw === undefined) { redraw = randomFn; randomFn = null; }
        return { defaultValue, randomFn, redraw };
      }
      const result = {
        "gridType": param(1, onlyOneRedraw("grid")),
        "gridSize": param(32, onlyOneRedraw("grid")),
        "gridOpacity": param(40, onlyOneRedraw("grid")),
        "width": param(1024, allRedraws()),
        "height": param(1024, allRedraws()),
        "seed": param(1, () => Math.round(Math.random() * 65536), allRedraws()),
        "treeDensity": param(40, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
        "stoneDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("stones")),
        "twigsDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("twigs")),
        "riverSize": param(3, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
        "roadSize": param(0, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
        "centerRandomness": param(20, () => Math.round(30), onlyOneRedraw("trees")),
        "leavedTreeProportion": param(95, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
        "treeSize": param(50, () => Math.round(30) + Math.round(Math.random() * 40), onlyOneRedraw("trees")),
        "treeColor": param(120, () => Math.round(Math.random() * 65536), onlyOneRedraw("trees")),
        "treeSeparation": param(40, () => Math.round(80 + Math.random() * 20), onlyOneRedraw("trees")),
        "serrationAmplitude": param(130, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
        "serrationFrequency": param(30, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
        "serrationRandomness": param(250, () => Math.round(100), onlyOneRedraw("trees")),
        "colorRandomness": param(30, () => Math.round(20), onlyOneRedraw("trees")),
        "clearings": param(9, () => Math.round(3 + Math.random() * 10), onlyRedrawsAfter("clearings")),
        "clearingSize": param(30, () => Math.round(30 + Math.random() * 20), onlyRedrawsAfter("clearings")),
        "treeSteps": param(2, () => Math.round(3 + Math.random() * 2), onlyOneRedraw("trees")),
        "backgroundNo": param(1, onlyTheseRedraws(["background", "backgroundCover"])),
        "showColliders": param(0, onlyOneRedraw("colliders")),
        "grassLength": param(85, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
        "grassDensity": param(120, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
        "grassSpread": param(45, () => Math.round(5 + Math.random() * 25), onlyTheseRedraws(["background", "backgroundCover"])),
        "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
      };
      return result;
    }
For someone looking at this for the first time, the rationale for each random function choice is opaque, so you might consider pulling out each type of random function into something descriptive like randomIntUpto(65536), randomDensity(20, 5), randomIntRange(30, 70).
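One possible shape for those helpers (using the hypothetical names above):

  const randomIntUpto = (max) => () => Math.round(Math.random() * max);
  const randomIntRange = (lo, hi) => () => lo + Math.round(Math.random() * (hi - lo));

  // e.g.
  //   "seed":     param(1, randomIntUpto(65536), allRedraws()),
  //   "treeSize": param(50, randomIntRange(30, 70), onlyOneRedraw("trees")),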

Does it add value? Maybe - ask a junior to review the two and see which they prefer maintaining. Regardless, this code mostly exists at a single level of abstraction, which tends to imply simple refactorings rather than complex.

My guess is if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map right...?


> My guess is if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map right...?

This was handling UI dependencies for https://ajuc.github.io/outdoorsBattlemapGenerator/

Basically I wanted to redraw as little as possible, so I built a dependency graph.

But then I wanted to add more parameters and to group them, so I can have many different kinds of trees without hardcoding their parameters. It was mostly a UI problem, not a refactoring problem. So I'm rewriting it like this:

https://ajuc.github.io/kartograf/

The graph editor keeps my dependencies for me, and the user can copy-paste 20 different kinds of trees and play with their parameters independently. And I don't need to write any code - a library handles it for me :)

Also, now I can add an interpolate node which takes 2 configurations and a number and interpolates the result between them. So I can have high grass go smoothly to low grass while trees go from one kind to another.


I am surprised that this is the top answer (Edit: at the moment, was)

How does splitting code into multiple functions suddenly change the order of the code?

I would expect that these functions would still be called in a very specific order.

And sometimes it does not even make sense to keep this order.

But here is a little example (in made-up pseudo code):

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++)
      total = total + values[i]
      max = values[i] > max ? values[i] : max
    return total - max
===>

  function positiveInt max(positiveInt[] values)
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++) 
      max = values[i] > max ? values[i] : max
    return max

  function positiveInt total(positiveInt[] values)
    positiveInt total = 0
    for (positiveInt i=0; i < values.length; i++) 
      total = total + values[i]
    return total

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    return total(values)-max(values)
Better? No?


> How does splitting code into multiple functions suddenly change the order of the code?

Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudocode is changed:

  01. function positiveInt max(positiveInt[] values)
  02.   positiveInt max = 0
  03.   for (positiveInt i=0; i < values.length; i++) 
  04.     max = values[i] > max ? values[i] : max
  05.   return max

  07. function positiveInt total(positiveInt[] values)
  08.   positiveInt total = 0
  09.   for (positiveInt i=0; i < values.length; i++) 
  10.     total = total + values[i]
  11.   return total

  12. function positiveInt calcMeaningOfLife(positiveInt[] values)
  13.   return total(values) - max(values)

Your modern compiler will take care of the order in which the code is executed, but humans need to trace the code line-by-line as [12, 13, 07, 08, 09, 10, 11, 01, 02, 03, 04, 05]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order.

  01. function positiveInt calcMeaningOfLife(positiveInt[] values)
  02.   positiveInt total = 0
  03.   positiveInt max = 0
  04.   for (positiveInt i=0; i < values.length; i++) 
  05.     total = total + values[i]
  06.     max = values[i] > max ? values[i] : max
  07.   return total - max
> Better? No?

In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function bodies.

However, depending on the size of the codebase, the complexity of the surrounding functions, and the location of the two helper functions, it's easy to see that this might not always be the case.

If you are trying to understand the code for the first time, or if you are trying to trace down some complex bug, there's a chance that having all the code inline would help you.

Further, splitting up a large inline function is easier than reassembling many small functions (hope you've got your unit tests!).

> And sometimes it does not even make sense to keep this order.

Agreed. But naming and abstractions are not trivial problems. Oftentimes it's the larger/more complex codebases where you see these practices applied most dogmatically.


Well, inlining by the compiler is to be expected, but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

Splitting the code into smaller functions does not automatically warrant a better design, it is just one heuristic.

A naive implementation of the principle could perhaps have found a less optimal solution

  function positiveInt maxOf(positiveInt value1, positiveInt value2)
    return value1 > value2 ? value1 : value2

  function positiveInt totalOf(positiveInt value1, positiveInt value2)
    return value1 + value2

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++)
      total = totalOf(total, values[i])
      max = maxOf(max, values[i])
    return total - max
Now, this is a trivial example, but we can imagine that instead of maxOf and totalOf we have some more complex calculations, or even calls to some external system (a database, an API, etc.).

When faced with a bug, I would certainly prefer the refactoring in the GP comment to the one here (or the initial implementation).

I think that when inlining feels strictly necessary, there has been a problem with boundary definition, but I agree that being able to view a single execution path inlined can help in understanding the implementation.

I completely agree that naming and abstractions are perhaps the two most complicated problems.


> but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

That's the thing, isn't it? Various arguments have been raised all across this thread, so I just want to put a spotlight on this principle, and say:

I myself, based on my prior experience, find code with a few larger functions much more readable than code with lots of small functions. In fact, I'd like a tool that could perform the inlining described by the GP for me whenever I'm working in a codebase that follows the "lots of tiny functions" pattern.

Perhaps this is how my brain is wired, but when I try to understand unfamiliar code, the first thing I want to know is what it actually does, step by step, at low level, and only then, how these actions are structured into helpful abstractions. I need to see the lower levels before I'm comfortable with the higher ones. That's probably why I sometimes use step-by-step debugging as an aid to understanding the code...


>the first thing I want to know is what it actually does, step by step, at low level

I feel like we might be touching on some core differences between the top-down guys and the bottom-up guys. When I read low level code, what I'm trying to do is figure out what this code accomplishes, distinct from "what it's doing". Once I figure it out and can sum up its purpose in a short slogan, I mentally paper over that section with the slogan. Essentially I am reconstructing the higher level narrative from the low level code.

And this is precisely why I advocate for more abstractions with names that describe its behavior; if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code. I wonder if how the bottom-up guys understand code is substantially different? Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code, or does your mental model mostly hold on to the low level detail even when reasoning about the high level?


> Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code,

It does!

> or does your mental model mostly hold on to the low level detail even when reasoning about the high level?

It does too!

What I mean is, I do what you described in your first paragraph - trying to see happens at the low level, and build up some abstractions/narrative to paper it over. However, I still keep the low-level details in the back of my mind, and they inform my reasoning when working at higher levels.

> if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code

I feel the same way. I'm really grateful for good abstractions, clean structure and proper naming. But I naturally tend to not take them at face value. That is, I'll provisionally accept the code is what it says it is, but I feel much more comfortable when I can look under the hood and confirm it. This practice of spot-checking implementation saved me plenty of times from bad naming/bad abstractions, so I feel it's necessary.

Beyond that, I generally feel uncomfortable about code if I can't translate it to low-level in my head. That's the inverse of your first paragraph. When I look at high-level code, my brain naturally tries to "anchor it to reality" - translate it into something at the level of sequential C, step through it, and see if it makes sense. So for example, when I see:

  foo = reduce(map(bar, fn), fn2)
My mind reads it as both:

- "Convert items in 'bar' via 'fn' and then aggregate via 'fn2'", and

- "Loop over 'bar', applying 'fn' to each element, then make an accumulator, initialize it to first element of result, loop over results, setting the accumulator to 'fn2(accumulator, element)', and return that - or equivalent but more optimized version".

To be able to construct the second implementation, I need to know how 'map' and 'reduce' actually work, at least on the "sequential C pseudocode" level. If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code. Like floating above the cloud cover, not knowing where I am. I can still work like this, I just feel very insecure.
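
For instance, that second, desugared reading corresponds to something like this sketch (assuming, as many reduce implementations without an initial value do, that the input is non-empty):

  // the "sequential C pseudocode" view of reduce(map(bar, fn), fn2)
  function mapReduce<A, B>(
    bar: A[],
    fn: (a: A) => B,
    fn2: (acc: B, x: B) => B
  ): B {
    const mapped: B[] = [];
    for (const item of bar) mapped.push(fn(item)); // map: apply fn to each element
    let acc = mapped[0]; // accumulator starts as the first mapped element
    for (let i = 1; i < mapped.length; i++) acc = fn2(acc, mapped[i]); // reduce via fn2
    return acc;
  }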

One particular example I remember: I was very uncomfortable with Prolog when I was learning it in university, until one day I read a chapter about implementing some of its core features in Lisp. When I saw how Prolog's magic works internally, it all immediately clicked, and I could suddenly reason about Prolog code quite comfortably, and express ideas at its level of abstraction.

One side benefit of having a simultaneous high and low-level view is, I have a good feel about the lower bound of performance of any code I write. Like in the map/reduce example above: I know how map and reduce are implemented, so I know that the base complexity will be at least O(n), how complexity of `fn` and `fn2` influence it, how the data access pattern will look like, how memory allocation will look like, etc.

Perhaps performance is where my way of looking at things comes from - I started programming because I wanted to write games, so I was performance-conscious from the start.


>If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code.

This is probably the biggest difference with myself. If I have a clear concept of how the abstractions operate in the context of the related abstractions and the big picture, I feel perfectly comfortable not knowing the details of how it gets done at a lower level. To me, the details just get in the way of comprehending the big picture.


A common problem with code written like that is checking the same preconditions repeatedly (or worse - never) and transforming data one way and back for no reason. I remember a bug I helped a fresh graduate who had joined our project fix. It crashed with an NPE when a list was empty, which was weird because an empty list should cause an IndexOutOfBounds if anything, and the poor guy was stumped.

I looked at the call stack: we got a list as input, then it was changed to null if it was empty, then it was checked for size, and in yet another function it was dereferenced and indexed.

The guy was trying to fix it by adding yet another if-then-else five levels down the call stack from the first place it was checked for size. No doubt another intern would then have added even more checks ;)

If you don't know what happens to your data in your program you're doing voodoo programming.


There's certainly some difference in priorities between massive 1000-programmer projects where complexity must be aggressively managed and, say, a 3-person team making a simple web app. Different projects will have a different sweet spot in terms of structural complexity versus function complexity. I've seen code that, IMO, misses the sweet spot in either direction.

Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.

Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.

I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.

Just for a tangible example: If I have to iterate over a 3D data structure with an X Y and Z dimension, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which is effort with no benefit.
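
For the sake of concreteness, a sketch of that kind of function (hypothetical; summing is just stand-in work):

  // three nested loops over a dense 3-D grid: arguably clearer as-is
  // than with per-axis helper functions extracted out
  function sumGrid(grid: number[][][]): number {
    let total = 0;
    for (let x = 0; x < grid.length; x++)
      for (let y = 0; y < grid[x].length; y++)
        for (let z = 0; z < grid[x][y].length; z++)
          total += grid[x][y][z];
    return total;
  }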


Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

Yeah, it's a lot of work, but working on recent projects has really taught me the value of good documentation. Naming a function send_records_to_database is fine, but the name can't tell you how the function determines which database to send the records to, or how it deals with failed records (if at all), or its various alternative use cases. All of that must come from documentation (or from reading the function's source).
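
To sketch what that level of documentation might look like (the behavior described here is invented for illustration, not a real API):

  /**
   * Sends records to the database named by the DB_URL environment
   * variable. Records that fail to insert are retried up to maxRetries
   * times, then logged and dropped. The input array is never modified.
   * (All of these details are hypothetical; the point is that none of
   * them fit in the function's name.)
   */
  function send_records_to_database(records: object[], maxRetries = 3): void {
    // ... implementation elided
  }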

Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> from <environment variable>", you have to spend some time considering whether future users will find that to be a sound decision.


> documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.


What's the alternative? Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

You can argue that there should be separate functions for `send_to_database` and `lock_database` and `format_data_for_database` and `handle_db_error`. But you're still going to have to document the same stuff. You're still going to have to remind people to lock the database in some situations. You're still going to have to worry about people forgetting to call one of those functions.

And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.


> Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

Invert the dependencies. After many years of programming I started deliberately asking myself "hmm, what if, instead of A calling B, B were to call A?" and now it's become part of my regular design and refactoring thinking. See also Resource Acquisition Is Initialization.
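
As a minimal sketch of that inversion (all names hypothetical): instead of an orchestrator A calling a reader B and then looping over whatever B returns, B takes a callback and calls back into A, so the iteration and its preconditions live in one place:

  // B owns iteration and its preconditions, and calls A's code per record
  function forEachRecord(lines: string[], onRecord: (line: string) => void): void {
    for (const line of lines) {
      if (line.trim() !== "") onRecord(line); // the empty-record check lives here, once
    }
  }

  // A supplies only the per-record behavior
  forEachRecord(["alpha", "", "beta"], (line) => console.log(line));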


> See also Resource Acquisition Is Initialization.

I'm not sure I follow. RAII removes the ability to accidentally forget to call destruction/initialization code and allows managing resource lifecycle. It doesn't remove the need to document how that code works, it just means you're now documenting it as part of the class/block. Freeing a resource during a destructor, locking the database during a constructor -- that stuff still has to be documented the same way it would have been documented if you put it into a single function instead of a single class.

Even with dependency inversion, you still end up eventually with the same problem I brought up:

> And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.

Maybe you call your functions in a different order or way, maybe you invert the dependency chain so your smaller functions are getting passed references to the bigger ones. You're still running the same amount of code, you haven't gotten rid of your documentation requirements.

Unless I'm misunderstanding what you mean by inversion of dependencies. Most of the dependency inversion systems I've seen in the wild increase the number of interfaces in code because they're trying to reduce coupling, which in turn increases the need to document those interfaces. But it's possible I've only seen a subset, or that you're doing something different.


> increase the number of interfaces in code because they're trying to reduce coupling

Yes, exactly! You want lots of interfaces. You want very small interfaces.

> which in turn increases the need to document those interfaces.

Not if the interfaces are small. For example, in the Go language standard library we find two interfaces: io.Reader and io.Writer. They each define a single method. In the case of io.Reader, that method is defined as Read(p []byte) (n int, err error) and correspondingly io.Writer has Write(p []byte) (n int, err error)

These interfaces are so small they barely need documentation.
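
A rough TypeScript analogue, purely to illustrate the shape (not any real library's API):

  // single-method interfaces: the whole contract fits on one line each
  interface Reader {
    read(buf: Uint8Array): number; // returns the count of bytes read
  }

  interface Writer {
    write(buf: Uint8Array): number; // returns the count of bytes written
  }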


> These interfaces are so small they barely need documentation.

Sort of.

On the other end of the dependency inversion chain, there is some code that implements those interfaces. That code comes with various caveats that need to be documented.

Then there's the glue code, the orchestration - the part that picks a concrete thing, makes it conform to a desired interface, and passes it to the component which needs it. In order to do its job correctly, this orchestrating code needs to know all the various caveats of the concrete implementation, and all the idiosyncratic demands of the desired interface. When writing this part you may suddenly discover that your glue code is buggy, because the "trivial" interface was thoroughly undocumented.


My style is similar when it comes to tiny interfaces: my usual style in Java is an interface with a single method and a nested POJO (struct) called Result. Then I have a single implementation in production and another implementation for testing (mocking from the 2010s onward). Some of my longer-lived projects might have 100s of these after a few years.

Please enjoy this silly, but illustrative example!

  public interface HerdingCatsService {

      /*public static*/ final class Result {
          ...
      }

      Result herdThemCats(ABunchOfCats soMuchFun)
          throws Exception;
  }


Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would have the use cases separated from the send_records_to_database so it would be obvious what the records were and how failure conditions are handled.


How else are you going to understand how a library works besides RTFM or RTFC? I guess the third option is copy-pasta from Stack Overflow and hoping your use case doesn't require any significant deviation?

You seriously never have to read documentation?

Must be nice, I've been balls-deep in GCP libraries and even simple things like pulling from a PubSub topic have footguns and undocumented features in certain library calls. Like subscriber.subscribe returns a future that triggers a callback function for each polled message, while subscriber.pull returns an array of messages.

That's a pretty damn obvious case where the functions should have been named "obviously" (pull_async, pull_sync), yet they weren't. And that's in a very widely used service from one of the biggest tech companies out there, written by a person who presumably passed one of the hardest interviews in the industry and gets paid in, like, the top 1% of developers.

Without documentation, I would have never figured those out.


"Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."

This, this, and... this.

Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.


Documentation is only useful if it is up to date and correct. I ignore documentation because I've never found the above to be true.

There are contract/proof systems that seem like they might help. At least the tool ensures the documentation is correct. However, I'm not sure if such systems are readable. (I've never used one in the real world.)


Oh, I agree, but a person who won't take the time to update documentation after a significant change certainly isn't going to refactor the code such that the method name matches the updated functionality. That's assuming they could even update the name if they wanted to.

After all, documentation is cheap. If you're going to write a commit message, why not also update the function docs with pretty much the same thing? "Filename parameter will now use S3 if an appropriate URI is passed (i.e., filename='s3://bucket/object/path.txt'). Note: doesn't work with path-style URLs."


Ignore, as in you don't write any?


> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects.

Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.

Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.


>Code doesn't tell stories, programmers do. Code can't explain why it exists;

Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.

Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.


Code is not primarily written for other programmers. It's written for the computer, the primary purpose is to tell the computer what to do. Readability is desirable, but inherently secondary to that concern, and abstraction often interferes with your ability to understand and express what is actually happening on the silicon - even if it improves your ability to communicate the abstract problem. Is that worth it? It's not straightforward.

An overemphasis on readability is how you get problems like "Twitter crashing not just the tab but people's entire browser for multiple years". Silicon is hard to understand, but hiding it behind abstractions also hides the fundamental territory you're operating in. By introducing abstractions, you may make high-level problems easier to tackle, but you make it much harder to tackle low-level problems that inevitably bubble up.

A good symptom of this is that the vast majority of JS developers don't even know what a cache miss is, or how expensive it is. They don't know that linearly traversing an array is thousands of times faster than linearly traversing a (fragmented) linked list. They operate in such an abstract land that they've never had to grapple with the actual nature of the hardware they're operating on. Performance issues that arise as a result of that are a great example of readability obscuring the fundamental problem.


>Code is not primarily written for other programmers.

I should have said code should be written primarily for other programmers. There are an infinite number of ways to express the same program, and the computer is indifferent to which one it is given. But only a select few are easily understood by another human. Code should be optimized for human readability barring overriding constraints. Granted, in some contexts efficiency is more important than readability down the line. But such contexts are few and far between. Most code does not need to consider the state of the CPU cache, for example.


Joel Spolsky opened my eyes to this issue: Code is read more than it is written. In theory, code is written once (then touched-up for bugs). For 99.9% of its life, it is read-only. That is a strong case for writing readable code. I try to write my code so that a junior hire can read and maintain it -- from a technical view. (They might be clueless about the business logic, but that is fine.) Granted, I am not always successful in this goal!


Code should be written for debugability, not readability. I don't care if it takes someone 20 minutes to understand my algorithm, if when they understand it bugs become immediately obvious.

Most simplification added to your code obscures the underlying operations on the silicon. It's like writing a novel so a 5-year-old can read it, versus writing a novel for a 20-year-old. You want to communicate the same ideas? The kid's version is going to be hundreds of times longer. It's going to take longer to write, longer to read, and you're much more likely to make mistakes related to non-local dependencies. In fact, you're going to turn a lot of local dependencies into non-local dependencies.

Someone who's competent can digest much more complex input, so you can communicate a lot more in one go. Training wheels may make it so anyone can ride your bike but they also limit your ability to compete in, say, the Tour de France.

Also, this is a side note, but "code is read by programmers" is a bit of a platitude IMO - it's wordplay. Your code is also read by the computer a lot more than it's read by other programmers. Keep your secondary audience in mind, but write for your primary audience.


My point was not just about performance - a lot of bugs come from the introduction of abstractions to increase readability, because the underlying algorithms are obscured. Humans are just not that good at reading algorithms. Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem. Every time you add an abstraction, you increase the degree of misrepresentation. You can argue that's worth it because code is read a lot, but it's still a tradeoff.

But another point worth considering is that a lot of things that make code easier to read make it much harder to rewrite, and they can also make it harder to debug.


>Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem.

Do you have an example, as this is entirely counter to my experience. Of course, you can misrepresent the behavior in words, but then you just used the wrong words to describe what's going on. That's not an indictment of abstraction generally. Abstractions necessarily leave something out, but what is left out is not an assertion of absence. This is not a misrepresentation.


Let me try explaining a few ways:

1. ---

You don't need to assert absence, the abstraction inherently ignores that which is left out, and the reader remains ignorant of it (that's the point, in fact). The abstraction asserts that the information it captures is the most useful information, and arguably it asserts that it is the only relevant information. This may be correct, but it may also be wrong. If it's wrong, any bugs that result will be hard to solve, because the information necessary to understand how A links to B is deliberately removed in the path from A to B.

2. ---

An abstraction is a conceptual reformulation of the problem. Each layer of abstraction reformulates the problem. It's lossy compression. Each layer of abstraction is a lossy compression of a lossy compression. You want to minimise the layers because running the problem through multiple compressors loses a lot information and obscures the constraints of the fundamental problem.

3. ---

You don't know a-priori if the information your abstraction leaves out is important.

I would go further and argue: leaving out the wrong information is usually a disaster, and very hard to reverse. One way to avoid this is to avoid abstractions (not that I'd recommend it, but it's part of the tradeoff).

4. ---

Abstractions misrepresent by simplifying. For example, the fundamental problem you're solving is moving electrons through wires. There are specific problems that occur at that level of specificity which you aren't worried about once you introduce the abstraction of the CPU's ISA. For example, bit instability.

Do those problems disappear at the level of the ISA? No, you've just introduced an abstraction which hides them, and hopefully they don't bubble up. The introduction of that abstraction also added overhead, partly in order to ensure the lower-level problems don't bubble up.

Ok, let's go up a few levels. You're now using a programming language. One of your fundamental problems here is cache locality. Does your code trigger cache misses? Well, it's not always clear, and it becomes less clear the more layers of abstraction you add.

"But cache locality rarely matters," ok, but sometimes it does, and if you have 10 layers of abstraction, good luck solving that. Can you properly manage cache locality in Clojure? Not a chance. It's too abstract. What happens when your Clojure code is eventually too slow? You're fucked. The abstraction not only makes the problem hard to identify, it makes it impossible to solve.


Abstractions are about carving up the problem space into conceptual units to aid comprehension. But these abstractions do not suggest lower level details don't exist. What they do is provide sign posts from which one can navigate to the low level concern of interest. If I need to edit the code that reads from a file, ideally how the problem space is carved up allows me to zero-in on the right code by allowing me to eliminate irrelevant code from my search. It's a semantic b-tree search. Without this tower of abstractions, you have to read the entire codebase linearly to find the necessary points to edit. There's no way you can tell me this is more efficient.

Of course, not all problems are suited to this kind of conceptual division. Cross-cutting concerns are inherently the sort that cannot be isolated in a codebase. Your example of cache locality is a case in point. You simply have to scan the entire codebase to find instances where your code is violating cache locality. Abstractions inherently can't help, and do hurt somewhat in the sense that there's more code to read. But the benefits overall are worth it in most contexts.


I feel like you didn't really engage with most of what I said. It sounds like you're repeating what you were taught as an undergraduate (I hope that doesn't come across as crass).

I understand the standard justifications for abstraction - I'm saying: I have found that those justifications do not take into account or accurately describe the problems that result, and they definitely underestimate the severity. Repeatedly changing the shape of a problem until it is unrecognisable results in a monster, and it's not as easy to tame as our CS professors make out.

To reiterate: Twitter, with a development budget of billions was crashing people's entire browsers for multiple years. That's not even server-side, where the real complexity is - that's the client. That kind of issue simply should not exist, and it wouldn't if it were running on a (much) shallower stack.

This is a side note, but you keep referencing the necessity of the tower. Bear in mind what happens when you increase the branching factor on a tree. You don't need a tower to segment the problem effectively. 100-item units allow segmenting one million items with three layers, and 10 billion items with five. Larger units mean much, much fewer layers.


>I feel like you didn't really engage with most of what I said.

I didn't engage point-by-point because I strongly disagree with how you characterize abstractions and going point-by-point seemed like overkill. They don't misrepresent--they carve up. If you take the carving at a given layer as all there is to know, the mistake is yours. And this isn't something I was taught in school, rather I converged to this style of programming independently. My CS program taught CS concepts, we were responsible for discovering how to construct programs on our own. Most of the students struggled to complete moderately large assignments. I found them trivial, and I attribute this to being able to find the right set of abstractions for the problem. Find the right abstractions, and the mental load of the problem is never bigger than one moderately sized functional unit. This style of development has served me very well in my career. You will be hard-pressed to talk me out of it.

>Repeatedly changing the shape of a problem until it is unrecognisable results in a monster

I can accept some truth to this in low-level/embedded contexts where the "shape" of the physical machine is a relevant factor and so hiding this shape behind a domain-specific abstraction can cause problems. But most software projects can ignore the physical machine and program to a generic Turing-machine.

>You don't need a tower to segment the problem effectively

Agreed. Finding the right size of the functional units is critical. 100 interacting units is usually way too much. The right size for a functional unit is one where you can easily inspect it for correctness and be confident there are no bugs. As the functional unit gets larger, your ability to even be confident (let alone correct) falls off a cliff. A good set of abstractions is one where (1) the state being manipulated is made obvious at all times, (2) each functional unit is sized such that it can easily be inspected for correctness, and (3) each layer provides a non-trivial increase in resolution of the solution. I am as much against useless abstractions and endless indirection as anyone.


I don't think we're going to agree on this, so I'll just say that I do grok the approach you're advocating, I used to think like you, and I've deliberately migrated away from it. I used to chunk everything into 5ish-line functions that were very clean and very carefully named, being careful to encapsulate with clean objects with clearly-defined boundaries, etc. I moved away from that consciously.

I don't work in low-level or embedded (although I descend when necessary). My current project is a desktop accessibility application.

Like, I can boil a lot of our disagreement down to this:

> 100 interacting units is usually way too much.

I don't think this is true. It's dogma.

First, they aren't all interacting. Lines in a function don't interact with every other line (although you do want to bear in mind the potential combinatorial complexity for the reader). But more specifically: 100-line functions are absolutely readable most of the time, provided they were written by someone talented. The idea that they aren't is... Wrong, in my opinion. And they give you way more implementation flexibility because they don't force you into a structure defined by clean barriers. They allow you to instead write the most natural operation given the underlying datastructure.

Granted, you often won't be able to unit-test that function as easily, but unit tests are not the panacea everyone makes out, in my opinion. Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.


> 100-line functions are absolutely readable most of the time, provided they were written by someone talented.

Readable, sure. Easily inspected for correctness, not in most cases. The 100 lines won't all interact, but you don't know this until you look. So much mental effort is spent navigating the 100 lines to match braces, find where variables are defined, where they are in scope, and whether they are mutated elsewhere within the function, comprehend how state changes as the lines progress, find where errors can occur and ensure they are handled within the right block and that control flow continues or exits appropriately, and so on. So little of this is actually about understanding the code's function; it's about comprehending the incidental complexity due to its linear representation. This is bad. All of this incidental complexity makes it harder to reason about the code's correctness. Most of these incidental concerns can be eliminated through the proper use of abstractions.

The fact is, code is not written linearly nor is it executed linearly. Why should it be read linearly? There is a strong conceptual mismatch between how code is represented as linear files and its intrinsic structure as a DAG. Well structured abstractions help us move the needle of representation towards the intrinsic DAG structure. This is a win for comprehension.

>Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.

We do agree on something!


Honestly, this characterisation doesn't ring true to me at all. I find long functions much easier to read, inspect and think about than dutifully decomposed lasagne that forces me to jump around the codebase. But also, like... Scanning for matching braces? Who is writing your code? Indentation makes that extremely clear. And your IDE should have a number of tools for quickly establishing uses of a name, and scope.

The older I get, the more I think the vast majority of issues along the lines of "long code is hard to reason about" are just incompetent programmers being let loose on the codebase. Comment rot is another one - who on earth edits code without checking and modifying the surrounding comments? That's not an inherent feature of programming to me, it's crazy. However, I absolutely see comment rot in lasagne code - because the comments aren't proximate to the algorithm.

With regards to the idea that abstractions inherently misrepresent, I'll defer to Joel Spolsky for another point:

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...


> Code doesn't tell stories, programmers do

It is like saying that books do not tell stories, writers do.


It is, but GP's point is pretty clear. Perhaps a better way to express it would be: unlike natural languages, programming languages are insufficiently expressive for the code to tell the full story. That's why books tell stories, and code is - at best - Cliff's Notes.


Is code just a byproduct of specs then? Any thoughts on literate programming?


Literate programming is for programs that are static and don't ever change much. It works great for those cases, though.

No, what works is the same as what worked 20 years ago. Nothing has truly changed. You still have layers upon layers that sometimes pass something along, other times not, and you sometimes wish they passed something, other times not.


Your argument falls apart once you need to actually debug one of these monstrosities, as often the bug itself also gets spread out over half a dozen classes and functions, and it's not obvious where to fix it.

More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.


If a function is only called once, it should just be inlined; the IDE can collapse it, and a descriptive comment can replace the function name. It can be a lambda with an immediate call and explicit captures if you need to prevent the issue of not knowing which local variables it interacts with as the function grows significantly; or, if the concern is others using leftover variables, its body can go into a plain scope. Making you jump to a different area of the code just to read it breaks up the linear flow for no gain, especially when you often have to read it anyway to make sure it doesn't have global side effects - might as well read it in the single place it is used.
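
A minimal sketch of that immediately-invoked lambda, with the inputs made explicit as parameters rather than ambient captures (illustrative only):

  const prices = [3, 5, 8];

  // the parameter list documents exactly which locals this inline block
  // can touch; nothing else leaks in or out
  const total = ((values: number[]): number => {
    let sum = 0;
    for (const v of values) sum += v;
    return sum;
  })(prices);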

If it is going to be used more than once (and actually is), then make it a function (unless it is so trivial that the explicit inline version is more readable). If you are designing a public API where it may need to be overridden, count that as more than once.

Some of the above is language dependent.


I don't get this. This is literally what the 'one level of abstraction' rule is for.

If you can find a good name for a piece of code I don't need to read in detail, why do you want to make me skip from line 1458 to line 2345 to skip over the details of how you do that thing? And why would you add a comment on it instead of making it a function that is appropriately named and I don't have to break my reading flow to skip over a horrendously huge piece of code?


> why do you want to make me skip from line 1458 to line 2345 to skip over the

You should be using an editor that can jump to the matching brace if it is all in its own scope or lambda. There are other tools like #pragma region depending on language. For a big function of multiple large steps and I only wanted to look at a part of it I'd fold it at the first indent level for an overview and unfold the parts I want to look at. But when I'm reading through the whole thing or stepping through in the debugger it is terrible to make you jump around and needs much more sophisticated tooling to jump to the right places consistently in complicated languages like C++.

If there is a big long linear sequence of steps that you need to read, you just want to read it, not jump around because someone wanted to put a descriptive label over the steps. Just comment it, that's the label, not the function name, since it's only ever used once.

You would rarely want it in something like a class overview since it is only called once, but if you could make a case for needing that, profiling tools are limited to it, etc., then those could be reasons.


My editor can fold/jump, no issues there (though to be fair, vi can easily do it for languages that use {} for blocks, which Python, for example, is not). But it breaks my flow nonetheless. Instead, if I had a function, my reading flow would not be broken. I can skip over the details of how "calculateX()" is achieved. All I need to know at the higher level of abstraction is that, for this (hypothetical) piece of code to do its thing, it needs to calculate X at that step, and I can move on and see what it does with X. It is not important how X is calculated. If calculateX() were, say, a UUIDv4 calculation, would you want to inline that, or just call "uuid.v4()" and move on to the next line that does something interesting with that UUID?

You mention debuggers too. Here I can't jump as easily. I can still jump in various ways depending on the tooling, but it is made harder. With proper levels of abstraction, I can either step over the function call because I don't care _how_ calculateX() is done, or I can step in and debug through it because I've narrowed the problem down to something being wrong in said function.

Maybe you've just never had a properly abstracted codebase (none are perfect, of course, but there are definitely good and bad ones). Code can either make good use of these different levels of abstraction as per Clean Code, or it can throw in functions willy-nilly, badly named, with global state manipulation all over the place for good measure, side effects from functions that don't look like they'd have any, etc. If those are the only codebases you've worked with, I would understand your frustration. Still, I'd rather move towards a properly structured and abstracted codebase than inline everything and land in code-duplication hell.


Visual debuggers also let you click the line you want to skip to, but I agree that stepping over with a hotkey is faster.

Letting the debugger skip over it can be done with the immediately-invoked lambda approach.


> The best code is code you don't have to read because of well named functional boundaries.

I don't know which is harder. Explaining this about code, or about tests.

The people with no sense of DevX see nothing wrong with writing tests that fail as:

    Expected undefined to be "foo"
If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).

Make the test red before you make it green, so you know what the errors look like.
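
For example, a Jest-style sketch (parseConfig is a hypothetical helper): running the test red first shows whether a failure reads as an opaque "expected undefined to be 'foo'" or actually points at the broken field:

  test("parseConfig reads the service name from config.yaml", () => {
    const config = parseConfig("config.yaml"); // hypothetical function under test
    // opaque when it fails: "expected undefined to be 'foo'"
    expect(config.serviceName).toBe("foo");
    // better: on failure the diff names the missing field on the whole object
    expect(config).toMatchObject({ serviceName: "foo" });
  });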


Oh god. Or just the tests that are walls of text, mixes of mocks and initializers and constructors and method calls.

Like good god, extract that boilerplate into a function. Use comments and whitespace to break it up and explain the workflow.


I have a couple of people who use a wall of boilerplate to do something that 3 lines of mocks could handle without coupling the tests to each other in the process.

Every time I have to add a feature I end up rewriting the tests. But you know, code coverage, so yay.


I see this in basically any JavaScript test. Yes, mocking any random import is really cool and powerful, but for fuck's sake, can we just use a DI container so that the tests don't look like Satan's invocation?


> Make the test red before you make it green, so you know what the errors look like.

Oh! I like this. I never considered this particular reason why making tests fail first might be a good idea.


“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” ― C. A. R. Hoare

this quote scales


This quote does not scale. Software contains essential complexity because it was built to fulfill a need. You can make all of the beautiful, feature-impoverished designs you want - they won't make it to production, and I won't use them, because they don't do the thing.

If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.


But not everybody codes “at scale”. If you have a small, stable team, there is a lot less to worry about.

Secondly, it is often better to start with fewer abstractions and boundaries and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added earlier.


Coding at scale is not dependent on the number of people, but on the essential complexity of the problem. One can fail at a one-man project, given a sufficiently complex problem, due to a lack of proper abstraction. Like, try to write a compiler.


> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

That's fine in theory, and I still sort-of believe it, but in practice I've come to believe most programming languages are insufficiently expressive for this vision to be true.

Take, as a random example, this bit of C++:

  //...
  const auto foo = Frobnicate(bar, Quuxify);
Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration.

  template<typename Stuffs, typename Fn>
  auto Frobnicate(const std::vector<Stuffs>&, Fn)
    -> std::vector<Stuffs>;
From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into definition of Frobnicate.

I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has less than 2 elements, and it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.

If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...

Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.


This is a good point and I agree. In fact, I think this really touches on why I always had a hard time understanding C++ code. I first learned to program with C/C++ so I have no problem writing C++, but understanding other people's code has always been much more difficult than other languages. Its facilities for abstraction were (historically) subpar, and even things like aliased variables where you have to jump to the function definition just to see if the parameter will be modified really get in the way of easy comprehension. And then the nested template definitions. You're right that how well relying on well named functional boundaries works depends on the language, and languages aren't at the point where it can be completely relied on.


This is true but having good function names will at least help you avoid going two levels deep. Or N levels. Having a vague understanding of a function call’s purpose from its name helps because you have to trim the search tree somewhere.

Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?


> having good function names will at least help you avoid going two levels deep. Or N levels.

I agree. You have to trim your search space, or you'll never be able to do anything. What I was trying to say is, I don't know of a language that would allow you to rely only on function names/signatures. None that I've worked with could do that in practice.

> if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?

That's the reason I hate the "Clean Code"-ish pattern of lots of very tiny functions. I worked in a codebase written in this style, and doing anything with it felt like it was 90% jumping around function definitions, desperately trying to keep them all in my working memory.


I think part of the problem is imitating having abstraction boundaries without actually doing the work to make a clean abstraction. If you’re reading the source code of a function, the abstraction is failing.

The function calls you write will often “know too much,” depending on implementation details in a way that make the implementation harder to change. It’s okay if you can fix all the usages when needed.

Real abstraction boundaries are expensive and tend only to be done properly out of necessity (browser APIs, the Linux kernel interface). If you're reading a browser implementation instead of standards docs to write code, then you're doing it wrong, since other browsers, or a new version of the same browser, may be different.

Having lots of fake abstraction boundaries adds obfuscation via indirection.


One more angle: a reliable, internalized abstraction vs. an unfamiliar one.

Java's String is an abstraction over bytes. I feel I understand it intimately even though I have not read the implementation.

When I try to understand code fully (searching for a root cause) and there is a String.format(..), I don't dig deeper into String - I am already confident that I understand what that line does.

Browser and Linux APIs would, I guess, fall into the same category (for others).

An unfamiliar abstraction, even with seemingly good naming and documentation, will not inspire the same level of confidence. (I trust an unfamiliar abstraction's naming & docs the same way I trust a weather forecast.)


I think it may be harder still: typically, when writing against a third-party API, I usually consult that API's documentation. The documentation thus becomes a part of the abstraction boundary, a part that isn't expressed in code.


Oh definitely. And then there are performance considerations, where there are few guarantees and nobody even knows how to create an implementation-independent abstraction boundary.


Function names are comments, and have similar failure modes.


Comments that are limited to only two or three dozen characters at most, so worse than comments, IME.


You can put your prose at the top of the function if you really need to explain it more. :)


But it's easier to notice they're outdated, because you don't see them only when looking at the implementation.


> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

Which is often unavoidable; many functions are insufficiently explained by those alone unless you want four-word camelcase monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there - simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines; the cognitive load from the subject domains is bad enough without making the code arbitrarily shaped.

[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.

[edit] oh no, here come the downvotes!


> ...many functions are insufficiently explained by [naming and set of arguments] alone unless you want four-word camelcase monstrosities for names.

Come now, is four words really all that "monstrously" much?

> The code of the function should be right-sized.

Feels like that should go for its name too.

> Size and complexity need to be balanced there: simpler and easier-to-follow is sometimes larger.

The longer the code, the longer the name?


There's quite a bit of sentiment around against long names. Personally I'm fine with them up to about 30-35 chars or so; beyond that they start to really intrude. Glad you're not put off by choosing function over form!


Stretch it to 36, and that is four not-all-that short words, at 4× 9 = 36 letters. Form and function! :-)

So it gets monstrous only from five words upwards or so... But still, I think I may by sheer coincidence have come up with a heuristic (that I'm somewhat proud of): the more convoluted the logic, the longer the code needed to express it, and so the longer a name it "deserves".


I think we need to recognize the limits of this concept. To reach for an analogy, both Dr. Seuss and Tolstoy wrote well but I'd much rather inherit source code that reads like 10 pages of the former over 10 pages of the latter. You could be a genuine code-naming artist but at the end of the day all I want to do is render the damn HTML.


> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.

I don't have a universal rule on this; I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.

Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same number of lines of code still gets run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.

If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.

If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.

However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.

In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.


>> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Both of these things are not quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.

The interface and implementation of a function are separate. The former should be clearly documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).

> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

This - this is what everyone who advocates for "small functions" doesn't understand.


> all abstractions are somewhat leaky, but you shouldn't be piercing them regularly

I think this gets back to the old problem of "documentation is code that doesn't run." I'm not saying get rid of documentation -- I comment my code to an almost excessive degree, because I need to be able to remember in the future why I made certain decisions, I need to know what the list of tradeoffs were that went into a decision, I need to know if there are any potential bugs or edge-cases that I haven't tested for yet.

But what I am saying is that it is uncommon for an interface to be perfectly documented -- not just in code I write, but especially in 3rd-party libraries. It's not super-rare for me to need to dip into library source code to figure out behaviors that they haven't documented, or interfaces that changed between versions and aren't described anywhere. People struggle with good documentation.

Sometimes that's performance: if a 3rd-party library is slow, sometimes it's because of how it's implemented. I've run into that with d3 addons in the past, where changing how my data is formatted results in large performance gains, and only the implementation logic revealed that to me. Is that a leaky abstraction? Sure, I suppose, but it doesn't seem to be uncommon. Is it fragile? Sure, a bit, but I can't release charts that drop frames whenever they zoom just because I refuse to pay attention to the implementation code.

So I get what you're saying, but to me "abstractions shouldn't be leaking" is a bit like saying "code shouldn't have bugs", or "minor semver increases should have no breaking changes." I completely agree, but... it does, and they do. Relying on undocumented behavior is a problem, but sometimes documented behavior diverges from implementation. Sometimes the abstractions are so leaky that you don't have a choice.

And that's not just a problem with 3rd-party code, because I'm also not a perfect programmer, and sometimes my own documentation on internal methods diverges from my implementation. I try very hard not to have that happen, but I also try hard to compensate for the fact that I'm a human being who makes mistakes. I try to build systems that are less work to maintain and less prone to having their documentation decay over time. I've found that in code that I'm personally writing, it can be useful to sidestep the entire problem and inline the entire abstraction. Then I don't have to worry about fragility at all.

If you're not introducing a 3rd-party library or a separate interface for every measly 50 lines of code, and instead you just embed your single-use chunk of logic into the function that would have called it, then you never have to worry about whether the abstraction is leaky. That can have a tangible effect on the maintainability of your program, because it reduces the number of opportunities you have to mess up an interface or its documentation.

For perfect abstractions, I agree with you. I'm not saying get rid of all abstractions. I just think that perfect abstractions are more difficult and rarer than people suppose, and sometimes for some kinds of logic, a perfect abstraction might not exist at all.


Finally! I'm glad to hear I'm not the only one. I've gone against 'Clean Code' zealots who end up writing painfully warped abstractions in the effort to adhere to what is in this book. It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse. I've had developers use the 'partial' feature in C# to meet Martin's length restrictions, to the point where I have to look through 10-15 files to see the full class. The examples in this post are excellent examples of the flaws in Martin's absolutism.


You were never alone Juggles. We've been here with you the whole time.

I have witnessed more people bend over backwards and do the most insane things in the name of avoiding "Uncle Bob's" baleful stare.

It turns out that following "Uncle Sassy's" rules will get you a lot further.

1. Understand your problem fully

2. Understand your constraints fully

3. Understand not just where you are but where you are headed

4. Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.

Quality issues are far more often planning, product-management, or strategic issues than something as easily remedied as the code itself.


"How do you develop good software? First, be a good software developer. Then develop some software."

The problem with all these lists is that they require a sense of judgement that can only be learnt from experience, never from checklists. That's why Uncle Bob's advice is simultaneously so correct, and yet so dangerous with the wrong fingers on the keyboard.


Agreed.

That's why my advice to junior programmers is, pay attention to how you feel while working on your project - especially, when you're getting irritated. In particular:

- When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.

- As you spend time figuring out what a piece of code does, try to make note of what made gaining understanding hard, and what could be done to make it easier. Similarly, when modifying/extending code, or writing tests, make note of what took most effort.

- When you fix a bug, spend some time thinking what caused it, and how could the code be rewritten to make similar bugs impossible to happen (or at least very hard to introduce accidentally).

Not everything that annoys you is a problem with the code (especially when one's unfamiliar with the codebase or the tooling, the annoyance tends to come from lack of understanding). Not everything should be fixed, even if it's obviously a code smell. But I found that, when I paid attention to those negative feelings, I eventually learned ways to avoid them up front - various heuristics that yield code which is easier to understand and has fewer bugs.

As for following advice from books, I think the best way is to test the advice given by just applying it (whether in a real or purpose-built toy project) and, again, observing whether sticking to it makes you more or less angry over time (and why). Code is an incredibly flexible medium of expression - but it's not infinitely flexible. It will push back on you when you're doing things the wrong way.


> When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.

Casey has a good blog post about this where he explains his compression-oriented programming, which is a progressive approach, instead of designing things up front.

https://caseymuratori.com/blog_0016
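
To make the idea concrete, here's a minimal sketch (my own illustration in TypeScript, not Casey's code; all names invented): write the straightforward version first, then "compress" the repetition into a helper only once it has actually appeared.

    // Pass 1: just write it out; the repetition is visible but harmless.
    const nameLabel = document.createElement("label");
    nameLabel.textContent = "Name";
    nameLabel.className = "field-label";

    const emailLabel = document.createElement("label");
    emailLabel.textContent = "Email";
    emailLabel.className = "field-label";

    // Pass 2: the pattern has proven itself, so "compress" it.
    function fieldLabel(text: string): HTMLLabelElement {
        const label = document.createElement("label");
        label.textContent = text;
        label.className = "field-label";
        return label;
    }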


I read that a while ago. It's a great article, I second the recommendation! I also love the term, "compression-oriented programming", it clicked in my mind pretty much the moment I saw it.


That's a great set of tips, thanks for sharing.

I like the idea of trying to over-apply a rule on a toy project, so you can get a sense of where it helps and where it doesn't. For example, "build Conway's Game of Life without any conditional branches" or "build FizzBuzz where each function can have only one line of code".
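
For fun, here's roughly what the one-line-function FizzBuzz might look like (TypeScript, purely illustrative) - a good way to feel exactly where the rule stops helping:

    const divisibleBy = (n: number, d: number): boolean => n % d === 0;
    const fizz = (n: number): string => (divisibleBy(n, 3) ? "Fizz" : "");
    const buzz = (n: number): string => (divisibleBy(n, 5) ? "Buzz" : "");
    const label = (n: number): string => fizz(n) + buzz(n) || String(n);
    const fizzBuzz = (max: number): string[] =>
        Array.from({ length: max }, (_, i) => label(i + 1));

    console.log(fizzBuzz(15).join("\n"));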


Yeah to some degree. I am in that 25 years of experience range. The software I write today looks much more like year 1 than year 12. The questions I ask in meetings I would have considered "silly questions" 10 years ago. Turns out there was a lot of common sense I was talked out of along the way.

Most people already know what makes sense. It's the absurdity of office culture, JR/SR titles, and perverse incentives that convinces them to walk in the exact opposite direction. Uncle Bob is the culmination of that absurdity. Codified instructions that are easily digested by the lemmings on their way to the cliff's edge.


The profession needs a stronger culture of apprenticeship.

In between learning the principles incorrectly from books, and learning them inefficiently at the school of hard knocks, there's a middle path of learning them from a decent mentor.


The problem is that there are a huge number of “senior” devs who only got that title by having been around (and useless) for a long time. It is best for all not to have them mentoring anyone.

But otherwise I agree, it’s just hard to recognize good programmers.


Also, good programmers don't necessarily make good mentors.

But I imagine these problems aren't unique to the software industry. It can't be the case that every blacksmith was both a good blacksmith and a good mentor, and yet the system of apprenticeship successfully passed down knowledge from generation to generation for a long time. Maybe the problem is our old friend social media, and how it turns all of us into snobs with imposter syndrome, so few of us feel willing and able to be a mentor.


I've seen juniors with better judgment than "seniors".

But they were not in the same comp bracket. And I don't think they gravitated in the same ecosystems so to speak.


The right advice to give new hires, especially junior ones, is to explain to them that in order to have a good first PR they should read this Wikipedia page first:

https://en.wikipedia.org/wiki/Code_smell

Also helpful are code guidelines like the one that Google made for Python:

https://google.github.io/styleguide/pyguide.html

Then when their first PR opens up, you can just point to the places where they didn't quite get it right and everyone learns faster. Mentorship helps too, but much of software is self-learning and an hour a week with a mentor doesn't change that.


I've also never agreed completely with Uncle Bob. I was an OOP zealot for maybe a decade, and now I'm a Rust convert. The biggest "feature" of Rust is that it probably brought semi-functional concepts to the "OOP masses." I found that, with Rust, I spent far more time solving the problem at hand...

Instead of solving how I am going to solve the problem at hand ("Clean Coding"). What a fucking waste of time, my brain power, and my lifetime keystrokes[1].

I'm starting to see that OOP is better suited to programming literal business logic. The best use for the tool is when you actually have "Person", "Customer", and "Employee" entities that have to follow some form of business rules.

In contradiction to your "Uncle Sassy's" rules, I'm starting to understand where "Uncle Beck" was coming from:

1. Make it work.

2. Make it right.

3. Make it fast.

The amount of understanding that you can garner from making something work leads very strongly into figuring out the best way to make it right. And you shouldn't be making anything fast unless you have a profiler and other measurements telling you to do so.

"Clean Coding" just perpetuates all the broken promises of OOP.

[1]: https://www.hanselman.com/blog/do-they-deserve-the-gift-of-y...


Simula, arguably the first OOP language (or at least one of the earliest), was written to simulate industrial processes, where each object was one machine (or station, or similar) in a manufacturing chain.

So, yes, it was very much designed for when you have entities interacting, each entity modeled by a class, and then having one (or more) object instantiations of that class interacting with each other.


> 1. Understand your problem fully

> 2. Understand your constraints fully

These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.

> 3. Understand not just where you are but where you are headed

And this is the part that breaks down so often. Because software is simultaneously so easy and so hard to change, people fall into traps both left and right, assuming some dimension of extensibility that never turns out to be important, or assuming something is totally constant when it is not.

I think the best advice here is YAGNI: don't add functionality for extension unless your requirements gathering suggests you are going to need it. If you have experience building the thing, your spider senses will perk up. If you don't have experience building the thing, can you get some people on your team who do? Or at least ask them? If that is not possible, you want to prototype and fail fast. Be prepared to junk some code along the way.

If you start out not knowing any of these things, and also never junking any code along the way, what are the actual odds you got it right?


>These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.

I wish more developers would actually gather requirements and check if the proposed solution actually solves whatever they are trying to do.

I think part of the problem is that often we don't use what we work on, so we focus too much on the technical details and forget what the user actually needs and what workflow would be better.

In my previous job, clients were always asking for changes or new features (they paid dev hours for them) and would come with a solution. But I always asked what the actual problem was, and many times there was a solution that would solve it in a better way.


>Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.

The problem is that people often need specifics to guide them when they're less experienced. Something that "feels wrong" is usually due to vast experience being incorporated into your subconscious aesthetic judgement. But you can't rely on your subconscious until you've had enough trials to hone your senses. Hard rules can be and often are overapplied, but it's usually better than the opposite case of someone without good judgement attempting to make unguided judgement calls.


You are right, but I also think the problem discussed in the article is that some of these hard rules are questionable. DRY, for example: as a hard rule it leads to overly complex and hard-to-maintain code because of bad and/or useless abstractions everywhere (as illustrated in TFA). It takes either good experience to sense whether they "feel good", like you say, or otherwise proven repetitions to reveal a relevant abstraction.


> and make sensible decisions

well there goes the entire tech industry


My last company was very into Clean Code, to the point where all new hires were expected to do a book club on it.

My personal takeaway was that there were a few good ideas, all horribly mangled. The most painful one I remember was his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even thoroughly explain what the law was trying to accomplish. (Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.) So most everyone who read the book came to earnestly believe that the Law of Demeter is about period-to-semicolon ratios, and proceeded to convert something like

  val frobnitz = Frobnitz.builder()
      .withPixieDust()
      .withMayonnaise()
      .withTarget(worstTweetEver)
      .build();
into

  var frobnitzBuilder = Frobnitz.builder();
  frobnitzBuilder = frobnitzBuilder.withPixieDust();
  frobnitzBuilder = frobnitzBuilder.withMayonnaise();
  frobnitzBuilder = frobnitzBuilder.withTarget(worstTweetEver);
  val frobnitz = frobnitzBuilder.build();

and somehow convince themselves that doing this was producing tangible business value, and congratulate themselves for substantially improving the long-term maintainability of the code.

Meanwhile, violations of the actual Law of Demeter ran rampant. They just had more semicolons.


On that note, I've never seen an explanation of Law of Demeter that made any kind of sense to me. Both the descriptions I read and the actual uses I've seen boiled down to the type of transformation you just described, which is very much pointless.

> Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.

I'd like to read more. Do you know of a source that covers this properly?


> Law of Demeter

"Don't go digging into objects" pretty much.

Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.

E.g. let's say you have a Shipment object that contains information about something that is to be shipped somewhere. If you want to change the delivery address, you should consider telling the shipment to do that rather than exposing an Address and letting clients muck around with it directly, because the latter means that if you later need extra logic when the delivery address changes, there's a chance the change leaks all over the place (e.g. you decide to automate your customs declarations, and they need to change if the destination country changes; or delivery costs need to be updated).

You'll of course, as always, find people who take this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special-purpose interface vs. exposing them to clients.
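
A minimal sketch of that Shipment example (TypeScript; all names invented for illustration):

    interface Address { street: string; country: string; }

    class Shipment {
        constructor(private destination: Address) {}

        // Tell, don't ask: callers state their intent once...
        changeDestination(newAddress: Address): void {
            this.destination = newAddress;
            // ...and every follow-on consequence has exactly one home.
            this.refreshCustomsDeclaration();
            this.recalculateDeliveryCost();
        }

        private refreshCustomsDeclaration(): void { /* ... */ }
        private recalculateDeliveryCost(): void { /* ... */ }
    }

The Demeter-violating alternative - something like shipment.getAddress().country = "NO" - would silently skip the customs and cost updates.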


As for why this is useful:

If objects are allowed to talk to friends of friends, that greatly increases the level of interdependency among objects, which, in turn, increases the number of ancillary changes you might need to make in order to ensure all the code talking to some object remains compatible with its interface.

More subtly, it's also a code smell that suggests that, regardless of the presence of objects and classes, the actual structure and behavior of the code is more procedural/imperative than object-oriented. Which may or may not be a big deal - the importance of adhering to a paradigm is a decision only you can make for yourself.


> Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.

IMHO, this is one of those ideas you have to consider on its merits for each project.

My own starting point is usually that I probably don’t want to drill into the internals of an entity that are implementation details at a lower level of abstraction than the entity’s own interface. That’s breaking through the abstraction and defeating the point of having a defined interface.

However, there can also be relationships between entities on the same level, for example if we’re dealing with domain concepts that have some sort of hierarchical relationship, and then each entity might expose a link to parent or child entities as part of its interface. In that case, I find it clearer to write things like

    if (entity.parent.interesting_property !== REQUIRED_VALUE) {
        abort_invalid_operation();
    }
instead of

    let parent_entity = entities.find(entity.parent_id);
    if (parent_entity.interesting_property !== REQUIRED_VALUE) {
        abort_invalid_operation();
    }
and this kind of pattern might arise often when we’re navigating the entity relationships, perhaps finding something that needs to be changed and then checking several different constraints on the overall system before allowing that change.

The “downside” of this is that we can no longer test the original entity’s interface in isolation with unit tests. However, if the logic requires navigating the relationships like this, the reality is that individual entities aren’t independent in that sense anyway, so have we really lost anything of value here?

I find that writing a test suite at the level of the overall set of entities and their relationships — which is evidently the smallest semantically meaningful data set if we need logic like the example above — works fine as an alternative to dogmatically trying to test the interface for a single entity entirely in isolation. The code for each test just sets up the store of entities and adds the specific instances and relationships I want for each test, which makes each test scenario nicely transparent. This style also ensures the tests only depend on real code, not stand-ins like mocks or stubs.


I don't think the two versions are relevant to the Law of Demeter. One example has pointers/references in a strong tree and another has indexed ones, but neither embraces LoD more or less than the other.

This would be a more relevant example:

parent_entity.children.remove(this)

vs

parent_entity.remove_child(this)

...Where remove_child() would handle removing the entity from `children` directly, and also perhaps busting a cache, or notifying the other children that the hierarchy has changed, etc.

Going back to your original case, you _could_ argue that LoD would advise you to create a method on entity which returns the parent, but I think that would fall under encapsulation. If you did that though, you could hide the implementation detail of whether `parent` is a reference or an ID on the actual object, which is what most ORMs will do for you.
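
To illustrate, a sketch of what that remove_child() might absorb (TypeScript; hypothetical names):

    class Entity {
        private children: Entity[] = [];
        private cachedDepth: number | null = null;

        removeChild(child: Entity): void {
            this.children = this.children.filter(c => c !== child);
            this.cachedDepth = null;         // bust any derived cache
            this.notifyHierarchyChanged();   // let observers react
        }

        private notifyHierarchyChanged(): void { /* ... */ }
    }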


Ah, but what if children is some kind of List or Collection which can be data-bound? By Liskov's substitution principle, you ought to be able to pass it to a Collection-modifying routine and have it function correctly. If the parent must be called, the children member should be private, or else the collection should implement eventing and the two methods should have the same effect (and ideally you'd remove one).


That takes us back up to viardh's concluding remark from earlier in the thread:

> You'll of course, as always, find people that takes this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special purpose interface vs. exposing them to clients.

I would say that if you're using a ViewModel object that will be data-bound, then you're sort of outside the realm of the Law of Demeter. It's really more meant to concern objects that implement business logic, not ones that are meant to be fairly dumb data containers.

On the other hand, if it is one that is allowed to implement business logic, then I'd say, yeah, making the children member public in the first place is violating the law. You want to keep that hidden and supply a remove_child() method instead, so that you can retain the ability to change the rules around when and how children are removed, without violating LSP in the process.


In the other branch I touched on this: iterating children is still a likely use case, after all, so you have the option of exposing the original or making a copy, which could have perf impacts.

But honestly it's best not to pre-optimize. I would probably do:

    private _children: Entity[];

    get children() { return this._children.slice(); }

And reconsider the potential mutation risk later if the profiler says it matters.


It can be, though there are some interesting philosophical issues there.

The example that I always keep coming back to is Smalltalk, which is the only language I know of that represents pure object-oriented programming. Similar to how, for the longest time, Haskell was more-or-less the only purely functional language. Anyway, in Smalltalk you generally would not do that. You'd tell the object to iterate its own children, and give it some block (Smalltalk equivalent of an anonymous function) that tells it what to do with them.

Looping over a data structure from the outside, if you want to get really zealous about it, is more of a procedural and/or functional way of doing things.
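
You can approximate that style in less pure languages by letting the object drive the iteration and accept a lambda, rather than handing out its internals - a sketch (TypeScript, names mine):

    class Node {
        private children: Node[] = [];

        // Internal iteration, Smalltalk-style: the caller supplies
        // behavior instead of reaching in and driving the loop itself.
        eachChild(action: (child: Node) => void): void {
            for (const child of this.children) action(child);
        }
    }

    const root = new Node();
    root.eachChild(child => console.log(child));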


Indeed! It's the visitor pattern, and since we're talking about a tree, that would probably be useful here.


FWIW, I was writing JavaScript in that example, so `entity.parent` might have been implemented internally as a function anyway:

    get parent() {
        return this.entities.find(this.parent_id);
    }
I don’t think whether we write `entity.parent` or `entity.parent()` really matters to the argument, though.

In any case, I see what you’re getting at. Perhaps a better way of expressing the distinction I was trying to make is whether the nested object that is being accessed in a chain can be freely used without violating any invariants of the immediate object. If not, as in your example where removing a child has additional consequences, it is probably unwise to expose it directly through the immediate object’s interface.


Yes, it's a great case for making the actual `children` collection private so that mutation must go through the interface methods instead. But still, iteration over the children is a likely use case, so you are left with either exposing the original object or returning a copy of the array (potentially slower, though this might not matter depending on the situation).


That problem could potentially be solved if the programming language supports returning some form of immutable reference/proxy/cursor that allows a caller to examine the container but without being able to change anything. Unfortunately, many popular languages don’t enforce transitive immutability in that situation, so even returning an “immutable” version of the container doesn’t prevent mutation of its contained values in those languages. Score a few more points for the languages with immutability-by-default or with robust ownership semantics and support for transitive immutability…
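
TypeScript is a handy illustration of the trap: a read-only view stops mutation of the container, but not of the things inside it (a sketch, names invented):

    interface Item { label: string; }

    class Container {
        private items: Item[] = [{ label: "a" }];

        // A read-only view: callers can't add or remove items...
        get view(): readonly Item[] { return this.items; }
    }

    const c = new Container();
    // c.view.push({ label: "b" });  // compile error: no 'push' on a readonly array
    c.view[0].label = "mutated";     // ...but immutability isn't transitive, so this is allowed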


Very true. JS has object freezing but that would affect the class's own mutations. On the other hand you could make a single copy upon mutation, freeze it, and then return the frozen one for iteration if you wanted to. Kind of going a bit far though imho.


If you really want to dig into it, perhaps a book on domain-driven design? That's where I pulled the term "bounded context" from.

My own personal oversimplification, probably grossly misleading, is that DDD is what you get when you take the Law of Demeter and run as far with it as you can.


Thanks.

I have Evans' book on my bookshelf. I understand it's the book on DDD, right? I tried to read it a while ago, I got through about one third of it before putting it away. Maybe I should revisit it.


Three things that go well together:

  - Law of Demeter
  - Tell, Don’t Ask
  - narrow interfaces


Agree that the transformation described is pointless.

A more interesting statement, but I am not sure it is exactly equivalent to the law of Demeter:

Distinguish first between immutable data structures (and I'd group lambdas with them) and objects. An Object is something more than just a mutable data structure: one wants to also fold in the idea that some of these objects exist in a global namespace, providing named mutable state to the entire rest of the program. And to the extent that one thinks about threads, one thinks about objects as providing a shared-state multithreading story that requires synchronization, and all of that.

Given that distinction, one has a model of an application as kind of a factory floor, there are widgets (data structures and side-effect-free functions) flowing between Machines (big-o Objects) which process them, translate them, and perform side-effecting I/O and such.

Quasi-law-of-Demeter: in computing you have the power to also send a Machine down a conveyor belt, and build other Machines which can respond to that.[1] This is a tremendous power and it comes with tremendous responsibility. Think a system which has something like "Hey, we're gonna have an Application store a StorageMechanism and then in the live application we can swap out, without rebooting, a MySQL StorageMechanism for a local SQLite Storage Mechanism, or a MeticulouslyLoggedMySQL Storage Mechanism which is exactly like the MySQL one except it also logs every little thing it does to stdout. So when our application is misbehaving we can turn on logging while it's misbehaving and if those logs aren't enough we can at least sever the connection with the live database and start up a new node while we debug the old one and it thinks it's still doing some useful work."

The signature of this is being identified by this quasi-Law-of-Demeter as this "myApplication.getStorageMechanism().saveRecord(myRecord)" chaining. The problem is not the chaining itself; the idea would be just as wrong with the more verbose "StorageMechanism s = myApp.getStorageMechanism(); s.saveRecord(myRecord)" type of flow. The problem is just that this superpower is really quite powerful and YAGNI principles apply here: you probably don't need the ability for an application to hot-swap storage mechanisms this way.[2]

Bounded contexts[3] are kind of a red herring here, they are extremely handy but I would not apply the concept in this context.

1. FWIW this idea is being shamelessly stolen from Haskell where the conveyor belt model is an "Arrow" approach to computing and the idea that a machine can flow down a conveyor belt requires some extra structure, "ArrowApply", which is precisely equivalent to a Monad. So the quasi-law-of-Demeter actually says "avoid monads when possible", hah.

2. And of course you may run into an exception to it and that's fine, if you are aware of what you are doing.

3. Put simply a bounded context is the programming idea of "namespace" -- a space in which the same terms/words/names have a different meaning than in some other space -- applied to business-level speech. Domain-driven design is basically saying "the words that users use to describe the application, should also be the words that programmers use to describe it." So like in original-Twitter the posts to twitter were not called tweets, but now that this is the common name for them, DDD says "you should absolutely create a migration from the StatusUpdate table to the Tweet table, this will save you incalculable time in the long-run because a programmer may start to think of StatusUpdates as having some attributes which users don't associate with Tweets while users might think of Tweets as having other properties like 'the tweet I am replying to' which programmers don't think the StatusUpdates should have... and if you're not talking the same language then every single interaction consists of friction between you both." The reason we need bounded contexts is that your larger userbase might consist both of Accountants for whom a "Project" means ABC, and Managers for whom a "Project" means DEF, and if you try to jam those both into the Project table because they both have the same name you're gonna get hurt real bad. In turn, DDD suggests that once you can identify where those namespace boundaries seem to exist in your domain, those make good module boundaries, since modules are the namespaces of our software world. And then if say you're doing microservices, instead of pursuing say the "strong entity" level of ultra-fine granularity, "every term used in my domain deserves its own microservice," try coarser-graining it by module boundaries and bounded contexts, create a "mini-lith" rather than these "nanoservices" that each manage one term of the domain... so the wisdom goes.


I love how this is clearly a contextual recommendation. I'm not a software developer but a data scientist. In pandas, writing your manipulations in this chained-methods fashion is highly encouraged, IMO. It's even called "pandorable" code.


The latter example (without the redundant assignments) is preferred by people who do a lot of line-by-line debugging. While most IDEs allow you to set a breakpoint in the middle of an expression, that's still more complicated and error prone than setting one for a line.

I've been on a team that outlawed method chaining specifically because it was more difficult to debug. Even though I'm more of a method-chainer myself, I have taken to writing unchained code when I am working on a larger team.

  var frobnitzBuilder = Frobnitz.builder();
  frobnitzBuilder.withPixieDust();
  frobnitzBuilder.withMayonnaise();
  frobnitzBuilder.withTarget(worstTweetEver);
  val frobnitz = frobnitzBuilder.build();
...is undeniably easier to step-through debug than the chained version.


Might depend on the debugger? The main ones I've used also let me go through the chained version one at a time, including seeing intermediate values.


TBH, the only context where I've seen people express a strong preference for the non-chained option is under the charismatic influence of Uncle Bob's baleful stare.

Otherwise, it seems opinions typically vary between a strong preference for chaining, and rather aggressive feelings of ¯\_(ツ)_/¯


Every time I see a builder pattern, I see a failure to adopt modern programming languages. Use named parameters, for f*ck's sake!

  const frobnitz = new Frobnitz({pixieDust: true, mayonnaise: true, target: worstTweetEver});
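
In TypeScript, the same idea even gets compiler support. A hedged sketch of how that options object might be typed (Tweet is a stand-in type; everything here is illustrative):

    // Tweet is a hypothetical stand-in type for this illustration.
    interface Tweet { text: string; }

    interface FrobnitzOptions {
        pixieDust?: boolean;
        mayonnaise?: boolean;
        target?: Tweet;
    }

    class Frobnitz {
        // One typed options object replaces the whole builder ceremony.
        constructor(private readonly options: FrobnitzOptions = {}) {}
    }

    const frobnitz = new Frobnitz({ pixieDust: true, mayonnaise: true });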


I think following some ideas in the book but ignoring others, like the ones behind the Law of Demeter, can be a recipe for a mess. The book is very opinionated, but if followed well I think it can produce pretty dead-simple code. But at the same time, just like with any coding, experience plays massively into how well code is written. Code can be written well when using his methods or when ignoring them, and it can be written badly when trying to follow some of his methods or when not using them at all.


>his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really even thoroughly explain what the law was trying to accomplish.

oof. I mean, yeah, at least explain what the main thing you’re talking about is about, right? This is a pet peeve.


wow this is a nightmare


> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.

I don't recall where I picked it up from, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is a pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes the things you find out about the abstraction only when you reach that third duplicate (variables or control-flow decisions that seemed constant in two places, for instance; or a higher-level idea of why the code is duplicated that isn't clear from two very-far-apart points but is clearer when you can "triangulate" what their center is).


I don't hate the rule of 3. But I think it's missing the point.

You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.

For example, I have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract it.

Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.

Taking this approach, I might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.
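
A sketch of that shape (TypeScript, invented names): the path logic and the matching format/parse pair live in one shared module, even though each function may have only one or two callers.

    // shared/data-files.ts - the writer and the reader both import from
    // here, so the two halves of the contract can't silently diverge.
    export const DATA_DIR = "/var/data/exports";

    export function dataFilePath(batchId: string): string {
        return `${DATA_DIR}/batch-${batchId}.csv`;
    }

    // Extracted even though each may have a single caller: keeping the
    // matching pair side by side documents that they must stay in sync.
    export function formatDate(d: Date): string {
        return d.toISOString().slice(0, 10); // YYYY-MM-DD
    }

    export function parseDate(s: string): Date {
        return new Date(`${s}T00:00:00Z`);
    }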


> You want to extract common code if it's the same now, and will always be the same in the future.

I suppose I take that as a presumption before the Rule of 3 applies. I generally assume/take for granted that all "exact duplicates" that "will always be the same in future" are going to be a single shared function anyway. The duplication I'm concerned about, where I think the Rule of 3 comes into play, is the duplicated-but-diverging kind. ("I need to do this thing like X does it, but…")

If it's a simple divergence, you can sometimes add a flag, but the Rule of 3 suggests that sometimes duplicating it and diverging it that second time "is just fine" (avoiding potential "flag soup") until you have a better handle on the pattern for why you are diverging it, and what abstraction you might be missing in this code.


The rule of three is a guideline or principle, not a strict rule. There's nothing about it that misses the point. If, from your experience and judgement, the code can be reused, reuse it. Don't duplicate it (copy/paste or write it a second time). If, from your experience and judgement, it oughtn't be reused, but you later see that you were wrong, refactor.

In your example, it's about modularity. The directory logic makes sense as its own module. If you wrote the code that way from the start, and had already decoupled it from the writer, then reuse is obvious. But if the code were tightly coupled (embedded in some fashion) within the writer, then rewriting it would be the obvious step, because reuse wouldn't be practical without refactoring. And unless you can see how to refactor it already, writing it the second time (or third) can help you discover the actual structure you want/need.

As people become more experienced programmers, the good ones at least already tend to use modular designs and keep things decoupled, which promotes reuse over copy/paste. In that case, the rule of three gets used less often by them because they have fewer occurrences of real duplication.


I think the point you and a lot of other commenters make is that applying hard and fast rules without referring to context is simply wrong. Surely if all we had to do was apply the rules, somebody would have long ago written a program to write programs. ;-)


You have a point for extracting exact duplicates that you know will remain the same.

But the point of the rule of 3 remains. Humans do a horrible job of abstracting from one or two examples, and the act of encountering an abstraction makes code harder to understand.


> “premature abstraction”

Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)


The Go proverb is "A little copying is better than a little dependency." Also, don't deduplicate text just because it's the same; deduplicate implementations only if they match both in mechanism (what they do) and in semantic usage (why they do it). Sometimes the same thing is done with different intents, which can naturally diverge, and then the premature deduplication is debt.
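
A sketch of the distinction (TypeScript, contrived names): these two are textually identical today, but they encode different policies that would diverge for different reasons, so merging them trades a little typing for an unwanted coupling.

    // Same text, different intent: usernames and URL slugs merely happen
    // to share a rule today. Merging them couples two unrelated policies.
    function isValidUsername(s: string): boolean {
        return /^[a-z0-9]{3,20}$/.test(s);
    }

    function isValidSlug(s: string): boolean {
        return /^[a-z0-9]{3,20}$/.test(s); // likely to grow hyphens later
    }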


I'm coming to think that the rule of three is important within a fairly constrained context, but that other principle is worthwhile when you're working across contexts.

For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.


An old senior colleague of mine used to insist that if I added a script to the project, I had to document it on the wiki. So I just didn't add my scripts to the project.


Do you think that's a feature or a bug?


I'd argue if the code was "fairly throw-away" it probably did not meet the "Rule of 3" by the time it was included in the shared library in the first place.


> I don't recall where I picked up from, but the best advice I've heard on this is a "Rule of 3"

I know this as AHA = Avoid Hasty Abstractions:

- https://kentcdodds.com/blog/aha-programming

- https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction


> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.

Something I’ve mentioned to my direct reports during code reviews: Sometimes, code is duplicated because it just so happens to do something similar.

However, these are independent widgets and changes to one should not affect the other; in other words, not suitable for abstraction.

This type of reasoning requires understanding the problem domain (i.e., use-case and the business functionality ).


I've gone through Java code where I needed to open 15 different files, each with a one-line piece of code, just to find out it was a "hello world" class.

I like abstraction as much as the next guy, but this is closer to obfuscation than abstraction.


At a previous company, there was a Clean Code OOP zealot. I heard him discussing with another colleague about the need to split up a function because it was too long (it was 10 lines). I said, from the sidelines, "yes, because nothing enhances readability like splitting a 10 line function into 10, 1-line functions". He didn't realize I was being sarcastic and nodded in agreement that it would be much better that way.


Spaghetti code is bad, but lasagna code is just as bad IMO.


There seems to be a lot of overlap between the Clean Coders and the Neo Coders [0]. I wish we could get rid of both.

[0] People who strive for "The One" architecture that will allow any change no matter what. Seriously, abstraction out the wazoo!

Honestly. If you're getting data from a bar code scanner and you think, "we should handle the case where we get data from a hot air balloon!" because ... what if?, you should retire.


I like to say "the machine that does everything, does nothing".


The problem is that `partial` in C# should never even have been considered a "solution" for writing small, maintainable classes. AFAIK partial was introduced for code-behind files, not to structure human-written code.

Anyway, you are not alone with that experience - a common mistake I see, no matter what language or framework, is people falling for the fallacy that "separation into files" is the same as "separation of concerns".


Seriously? That's an abuse of partial and just a way of following the rules without actually following them. That code must have been fun to navigate...


Many years ago I worked on a project that had a hard "no hard-coded values" rule, as requested by the customer. The team routinely wrote the equivalent of

    const char_a = "a";
And I couldn’t get my manager to understand why this was a problem.


Clearly it is still a hardcoded value! It fails the requirement. Instead there should be a factory that loads a file that reads in the "a" to the variable, nested down under 6 layers of abstraction spread across a dozen files.


That's too enterprisey; at a startup you would write it like this:

const charA = (false+"")[1];


That has three constants.


is this better?

const charA = (![]+[])[+!+[]];


I cannot even parse this. What is going on here?


[] is still a constant.


True, but we can both agree that this is a better constant than "a". Much better job security in that code... unless you get fired for writing it, that is.


Saw this just the other day. I was at a loss to know what to say. :(


This gets to intent.

What, in the code base, does char_a mean? Is it append_command? Is it used in a..z for all lowercase? Maybe an all_lowercase is needed instead.

I know that it's an "A". I don't know why that matters to the codebase. Now, there are cases where it is obvious even to beginners, and I'm fine with magic characters there, but I've seen cases where people were confused as to why 1000 was in a `for(int i = 0; i<1000; i++)`. Is 1000 arbitrary in that case, or is it based on a defunct business requirement from 2008? Will it break if we change it to 5000 because our computers are faster now?
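
The usual fix is just to promote the value to a name that carries the why - a tiny illustrative sketch (TypeScript, invented names):

    function processRecord(i: number): void { /* ... */ }

    // The name and comment carry the "why" that a bare 1000 cannot:
    // batch size imposed by a (hypothetical) 2008 import requirement;
    // revisit if the upstream limit ever changes.
    const MAX_RECORDS_PER_RUN = 1000;

    for (let i = 0; i < MAX_RECORDS_PER_RUN; i++) {
        processRecord(i);
    }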


Can I please forward your contact info to my developer? Maybe you can do a better job convincing him haha ;)


> where I have to look through 10-15 files to see the full class

The Magento 2 codebase is a good example of this. It's both well written and horrible at the same time. Everything is so spread out into constituent technical components that the code loses the "narrative" of what's going on.


I started OOP in '96 and I was never able to wrap my head around the code these "Clean Code" zealots produced.

Case in point: Bob Martin's "Video Store" example.

My best guess is that clean code, to them, meant as little code on the screen as possible - not necessarily "intention-revealing code" either; instead everything is abstracted until it looks like it does nothing.


I have had the experience of trying to understand how a feature in a C++ project worked (both Audacity and Aegisub, I think) only to find that I could not locate where anything was actually implemented, because everything was just glue calling another piece of glue.

I also sat in their IRC channel for months, and the lead developer was constantly discussing how he'd refactor it to be cleaner, but he never seemed to add code that did something.


SOLID code is a very misleading name for a technique that seems to shred the code into confetti.

I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it, but maybe it's just me.


> I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it

Had this problem at a previous job - main dev before I got there was extremely into the whole "small methods" cult and you'd regularly need to open 5-10 files just to track down what something did. Made life - and the code - a nightmare.


People dealt this fate to other people ...

What I find most surprising is that most developers are trying to obey the "rules": code containing even minuscule duplication must be DRYed, and everyone agrees that code must be clean and professional.

Yet it is never enough: bugs keep showing up, and the stuff that was written by others is always bad.

I'm starting to think that 'Uncle Bob' and 'Clean Code' zealotry are actually harmful, because they prevent people from taking two steps back and thinking about what they are doing - making microservices/components/classes/functions that end up never being reused, and making DRY the holy grail.

Personally, I put YAGNI > DRY, and a lot of the time you are not going to need small functions or magic abstractions.


I think the problem is not the book itself, but people thinking that all the rules apply to all the code, all the time. A length restriction is interesting because it makes you consider whether you should split your function into more than one, as you might be doing too much in one place. Now, if splitting will make things worse, then just don't.


In C# and .NET specifically, we find ourselves having a plethora of services when they are "human-readable" and short.

A service has 3 "helper" services it calls, which may, in turn, have helper services, or worse, depend on a shared repo project.

The only solution I have found is to move these helpers into their own project, and mark the helpers as internal. This achieves 2 things:

1. The "sub-services" are not confused as stand-alone, and only the "main/parent" service can be called.

2. The "module" can now be deployed independently if micro-services ever become a necessity.

I would like feedback on this approach. I honestly do think files over 100 lines long are unreadable trash, and we have achieved a lot by re-using modular services.

We are 1.5 years into a project and our code re-use is sky-rocketing, which allows us to keep errors low.

Of course, a lot of dependencies also make testing difficult, but allow easier mocks if there are no globals.


>I would like feedback on this approach. I honestly do think files over 100 lines long are unreadable trash

Dunno if this is the feedback you are after, but I would try not to be such an absolutist. There is no reason a great 100-line file becomes unreadable trash when you add one line.


I mean feedback on the way of abstracting away helper services. As for file length, I realize this is subjective, and the 100-line number is pulled out of thin air, but extremely long files are generally difficult to read, and context gets lost.


Putting shared context in another file makes it harder to read, though. Files should be big enough to represent some reasonable context: more complicated things that necessarily create a big shared context want bigger files, and simpler things smaller ones.

A thing that can be perfectly encapsulated in a 1000 line file with a small clean interface is much much better than splitting that up into 20 files 100 lines each calling each other.


Partial classes are an ugly hack to mix human- and machine-generated source code. IMHO they should be avoided.


> I've had developers use the 'partial' feature in C# to meet Martin's length restrictions

That is not the fault of this book or any book. The problem is people treating the guidelines as rituals instead of understanding their purpose.


What do you say to convince someone? It's tricky to review a large, carefully abstracted PR that introduces a bunch of new logic and config with something like: "just copy paste lol".


... and here I was thinking I was alone!


Sometimes you really just do need a 500 line function.


Yes, it's the first working implementation, written before good boundaries are known. After a while the code becomes familiar, and natural conceptual boundaries arise that lead to 'factoring' - it shouldn't require 'refactoring' just because you prematurely guessed the wrong boundaries.

I'm all for the 100-200 line working version--can't say I've had a 500. I did once have a single SQL query that was about 2 full pages pushing the limits of DB2 (needed multiple PTFs just to execute it)--the size was largely from heuristic scope reductions. In the end, it did something in about 3 minutes that had no previous solution.


Nah mate, you never do. Nor 500 1-liners.

