It's probably time to stop recommending Clean Code (2020) (qntm.org)
733 points by avinassh on May 25, 2021 | 658 comments



There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.

Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows. Keeping methods short and hiding behavior behind well named functional boundaries is how you manage complexity in code.

The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
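
For illustration, a minimal C++ sketch (all names invented here) of a unit of work that explains itself through naming - the body reads as a summary, and you only descend into a step when you need its details:

    #include <iostream>

    struct Order { int id; };

    // Hypothetical steps; bodies stubbed out for illustration.
    void validatePayment(const Order&)  {}
    void reserveInventory(const Order&) {}
    void scheduleShipment(const Order&) {}
    void notifyCustomer(const Order& o) { std::cout << "order " << o.id << " fulfilled\n"; }

    // The top-level routine tells the story through its named steps.
    void fulfillOrder(const Order& order) {
        validatePayment(order);
        reserveInventory(order);
        scheduleShipment(order);
        notifyCustomer(order);
    }

    int main() { fulfillOrder(Order{42}); }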


> you have failed to sufficiently explain

This is the problem right here. I don't just read code I've written, and I don't only read perfectly abstracted code. When I'm reading code by someone who loves the book and tries their best to follow its conventions, I find it far more difficult, because I'm usually reading to fully understand the code myself (i.e. in a review) or to fix a bug. It's infuriating to jump through dozens of files just so everything looks nice on a slide. Names are great, and I fully appreciate good naming, but pretending that a ton of extra files just to improve naming slightly isn't a hindrance is wild.

I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head, but leaping all over the filesystem just to see 3- or 5-line classes that delegate to yet another class is too much.


Carmack once suggested that people in-line their functions more often, in part so they could “see clearly the full horror of what they have done” (paraphrased from memory) as code gets more complicated. Many helper functions can be replaced by comments and the code inlined. I tried this last year and it led to overall more readable code, imho.
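
A tiny sketch of what that can look like (invented example; uses C++20's std::erase_if) - the would-be helper functions become labeled, inlined steps:

    #include <iostream>
    #include <numeric>
    #include <vector>

    // Instead of tiny helpers (say, loadScores/dropOutliers/average),
    // the steps are inlined and labeled with comments, so the whole
    // computation is visible in one place.
    int main() {
        // -- gather input --
        std::vector<int> scores = {3, 9, 14, 9, 212};

        // -- drop obvious outliers --
        std::erase_if(scores, [](int s) { return s > 100; });

        // -- average what's left --
        double avg = std::accumulate(scores.begin(), scores.end(), 0.0)
                   / scores.size();
        std::cout << avg << "\n";  // 8.75
    }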


You're very close to his actual quote; he was referring to the horrors of mutating shared state: http://number-none.com/blow/john_carmack_on_inlined_code.htm...


Thanks for the link

> The real enemy addressed by inlining is unexpected dependency and mutation of state, which functional programming solves more directly and completely. However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing. When it gets to be too much to take, figure out how to factor blocks out into pure functions (and don't let them slide back into impurity!).


Carmack is such a great communicator of software development philosophy. He also wrote a classic article on "Functional programming in C++": https://www.gamasutra.com/view/news/169296/Indepth_Functiona...


Do you know of anything else he wrote like this?


The idea is that without proper boundaries, finding the line that needs to be changed may be a lot harder than clicking through files with an IDE. Smaller components also help with code reviews, since it's a lot easier to understand a line within the context of a component (or method name) without having to understand what the huge globs of code before it are doing. Also, like you said, a lot of the time a developer has to read code they didn't write, so there are other factors to consider, like how easy it is for someone from another team to make a change or whether a new employee could easily digest the code base.


The problem being solved here is just scope, not re-usability. Functions are a bad solution because they force non-locality. A better way to solve this would be local scope blocks that define their dependencies.

E.g. something like:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
    }
You could also specify which variables defined in the block get elevated, like return values:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
       int result_value = var_1 * var_2
    } (exports: result_value)

    return result_value * 5
This is also a more tailored solution to the problem than a function: it allows finer-grained control over scope restriction.

It's frustrating that most existing languages don't have this kind of feature. Regular scope blocks suck because they don't allow you to define the specific ways in which they are permeable, so they only restrict scope in one direction (things inside the scope block are restricted) - but the outer scope is what you really want to restrict.

You could also introduce this functionality to IDEs, without modifying existing languages. Highlight a few lines, and it could show you a pop-up explaining which variables that section reads, mutates and defines. I think that would make reading long pieces of code significantly easier.


This is one of the few comments in this entire thread that I think is interesting and born out of a lot of experience and not cargo culting.

In C++ you can make a macro function that takes any number of arguments but does nothing. I end up using that to label a scope because that scope block will then collapse in the IDE. I usually declare any variables that are going to be 'output' by that scope block just above it.

This creates the ability to break down isolated parts of a long function that don't need to be repeated. Variables being used also don't need to be declared as function inputs which also simplifies things significantly compared to a function.

This doesn't address making the compiler enforce much, though it does show that anything declared in the scope doesn't pollute the large function it is in.
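
A minimal sketch of the trick (SCOPE is an invented name; any no-op variadic macro works):

    #include <iostream>

    // A no-op variadic macro: its arguments serve purely as a label, and
    // the brace block that follows collapses as a unit in the IDE.
    #define SCOPE(...)

    int main() {
        int result = 0;  // the block's 'output', declared just above it

        SCOPE("compute result from confined locals")
        {
            int a = 2, b = 3;  // locals don't leak into the enclosing function
            result = a * b;
        }

        std::cout << result << "\n";  // prints 6
    }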


Thank you. Your macro idea is interesting, but I definitely want to be able to defer to the compiler on things like this. I want my scope restrictions to also be a form of embedded test. Similar to typing.

I wish more IDEs had the ability to chunk code like this on-the-fly. I think it's technically possible, and maybe even possible to insert artificial blocks automatically, showing you how your code layout chunks automatically... Hmm.

You know, once I'm less busy I might try implementing something like this.


C++ lambda captures work exactly like this. You need to state which variables should be part of the closure, and whether they are captured by reference (and thus mutable) or as copies.

    auto result_value = [var1, var2, &var3]() {
       var3 = var1 + var2;
       return var1 * var2;
    }();
    return result_value * 5;
Does anyone know if the compiler is smart enough to inline a self-executing lambda like the one above? Or will this be less performant than plain blocks?


Ada/SPARK actually has dependencies like that as part of function specs, including which outputs depend on which inputs.


> Clicking through files with an IDE

This is a big assumption. Many engineers prefer to grep through code without an IDE; the "clean code" style breaks grep/GitHub code search and forces someone to install an IDE with go-to-declaration/find-usages. On balance I prefer the clean code style and bought the JetBrains ultimate pack; however, I do understand that some folks are working with grep/vim/code search and would rather not download a project to figure out how it works.


I've done both on a "Clean Code", lots-of-tiny-functions C++ codebase. For various reasons[0], I spent a year using Emacs with no IDE features to work on that codebase, after which I managed to get a language server to work in our specific context, and continued to use Emacs with all the bells and whistles LSP provides.

My conclusion? Small functions are still annoying. Sure, with IDE features in a highly-productive environment like Emacs, I can jump around the codebase at the speed of thought. But it doesn't solve the critical problem: to understand a piece of code that does something useful, I have to keep all these tiny functions in my working memory. And it ain't big enough for that.

I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it. That is, I could mark a block of code, and my editor would replace all function calls[1] with their bodies, with the names of their parameters replaced by the names of the arguments passed[2].

This way, I could reap the benefits of both approaches - small functions that compose and have meaningful names, and long sequential blocks of code that don't tax my working memory.

--

[0] - C++ is notoriously hard to get reliable code intelligence (autocomplete, xref) to work. Even commercial IDEs get confused if the codebase is large enough, or built in an atypical fashion. Visual Studio in particular would happily crash for me every other day...

[1] - With some sane default filters, like "don't inline functions from the standard library and third-party libraries".

[2] - Or autogenerated ones when the argument is an expression. Think Lisp gensym. E.g. when I have `auto foo(F f);` and call it like `foo(2+2);`, the inlined code would start with `F f_1 = 2+2;`. Like with expanding Lisp macros, the goal of this exercise is that I should be able to replace my original code with generated expansion, and it should work.


You wrote: "I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it." That sounds like a great idea! It might be useful for both the writer and the reader. It might be possible to build something like this using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis, or part of a PhD.

You also wrote: "Visual Studio in particular would happily crash for me every other day..." Have you tried CLion by JetBrains? (Usually, they have a free 30-day trial.) I have not used it for enterprise-large projects, but I have used it for a few personal projects. It is excellent. The pace of progress (new features in "code sensing") is impressive. If you find bugs, you can report them and they usually fix them. (They have fixed about 50% of the bugs I have raised about their products over the last 5 years. An impressive clearance rate!)


> It might be possible to build something like this using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis, or part of a PhD.

Yeah, that's how I feel about it too. A useful MVP would probably take less effort, though, even if it sometimes couldn't do the inlining, or misidentified the called function. I mean, this is C++; I haven't seen any product with completely reliable type hints & autocompletion, and yet even buggy ones are still very useful.

> Have you tried CLion by JetBrains?

Not yet. Going by my experience with IntelliJ, I expect it to be a very good product. But right now, I'm sticking to Emacs.

In my experience, professional IDEs (particularly the JetBrains ones) are the best for working with a particular programming language, but they aren't so good for polyglot work and all the secondary tasks surrounding programming - version control, file management, log inspection, and even generalized text editing. My Emacs setup, on the other hand, delivers superior ergonomics for all these secondary tasks, and - as long as I can find an appropriate language server - is within an order of magnitude on programming itself. So it feels like a better deal overall.


I agree about the "professional IDEs" point. Are you aware that IntelliJ has language plug-ins that let you mix HTML/JavaScript/CSS/Java/Python in the same project? I guess CLion can at least mix C/C++/HTML/JavaScript/CSS/Python. This is great when you work with research scientists who like to use different languages in the same project due to external dependencies. I can vouch that for /certain/ polyglot projects it works fine in IntelliJ. (That said, you might have a very unusual polyglot project.)

As for tooling, I might read/write/compile/debug code in the IDE, but do all the secondary tasks in a Linux/Bash/Cygwin terminal. Don't feel guilty/ashamed of this style! "Use the right tool for the job." I am forced to work in Windows, but Cygwin helps me "Get that Linux feeling - on Windows". I cringe when I watch people using Git from a GUI (for the most part) instead of using the command line, which is normally superior. However, I also cringe when I watch people hunt and peck (badly!) in vim to resolve a merge conflict. (Have you seen the latest merge tool in IntelliJ? For 90% of users, it is a superior user experience.) To be fair, I have also watched real pros resolve merge conflicts in vim/emacs equally fast.

One thing you will find "disappointing" is that the CPU & memory footprint of any modern IDE requires 1990s-supercomputer resources. It is normal to see (multiple) enterprise-large projects take 1-10GB of RAM and 8-16 cores (for a few mins) to get fired up. (I am not resource constrained on my dev box, so I am willing to pay this tax.) However, after init, you can navigate the code quickly and get good real-time static analysis feedback ("code sensing").


Vim has weapons-grade go-to-definition today using the Language Server Protocol, so multiple files are a non-issue for users running LSP.


With Vim you can get decent results with a plugin that consumes the output of the ctags library.

It's not perfect though, and depending on how you have it set up, you may have to manually trigger tag regeneration, which can take a while depending on how deep into package files you set it to go.


>Coding at scale is about managing complexity.

I would extend this one level higher to say managing complexity is about managing risk. Risk is usually what we really care about.

From the article:

>any one person's opinions about another person's opinions about "clean code" are necessarily highly subjective.

At some point CS as a profession has to find the right balance of art and science. There's room for both. Codifying certain standards is the domain of professions (in the truest sense of the word) and not art.

Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.

This isn't to say a coder would have to follow rigid rules constantly or that the field needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.


A lot of "best practices" in engineering were established empirically, after root cause analysis of failures and successes. Software is more or less evolving along the same path (structured programming, OOP, higher-than-assembly languages, version control, documented ISAs).

Go back to earlier machines and each one had its own assembly language and instruction set. Nobody would ever go back to that era.

OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know that to be false, bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and define interfaces is neat. It forces you to think in terms of abstractions and helps readability.

> This isn't to say a coder would have to follow rigid rules constantly or that the field needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

As more and more years pass, I'm less and less against a regulatory body. It would help with getting rid of snake oil salesmen in the industry and limit offshoring to barely qualified coders. And it would simplify hiring too, by having a known certification that tells you someone at least meets a certain bar.


Software is to alchemy what software engineering is to chemistry. Software engineering hasn't been invented yet. You need a systematizing scientific revolution (Kuhn style) before you can or should create a regulatory body to enforce it. Otherwise you're just enforcing your particular brand of alchemy.


Well said. In the 1990s, the aerospace software domain was once referred to as being in an era of "cave drawings".


> OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer.

Not initially. Eventually, everything that reaches a certain minimal popularity level in software development gets pitched by snake-oil salesmen to enterprise management as a solution to that problem - including things developed specifically to deal with the problem of other solutions being cargo-culted and repackaged that way - whether it's a programming paradigm or a development methodology or metamethodology.


>having a known certification that tells you someone at least meets a certain bar.

This was tried a few years back by creating a Professional Engineer licensure for software, but it went away due to lack of demand. It could make sense to artificially create a demand by the government requiring it for, say, safety critical software, but I have a feeling companies wouldn't want this of their own accord, because that license gives the employee a bit more bargaining power. It also creates a large risk for SWEs due to the lack of codified standards and the inherent difficulty of software testing. It's not like a mechanical engineer, who can confidently claim a system is safe because it was built to ASME standards.


> It could make sense to artificially create a demand by the government requiring it for, say, safety critical software, but I have a feeling companies wouldn't want this of their own accord, because that license gives the employee a bit more bargaining power.

For any software purchase above a certain amount, the government should be forced to have someone with some kind of license sign off on the request. So many projects have doubled or tripled in price after it was discovered the initial spec didn't make any sense.


I think that at this point, for the software made/maintained for the government, they should just hire and train software devs themselves.

From what I've seen, with a few exceptions, government software development always ends up with a bunch of subcontractors delivering bad software on purpose, because that's the way they can ensure repeat business. E.g., the reason the Open Data movement didn't achieve much, and why most public systems are barely integrated with each other, is that every vendor does its best to prevent that from happening.

It's a scam, but like other government procurement scams, it obeys the letter of the law, so nobody goes to jail for this.


The development of mass transit (train lines) has a similar issue when comparing the United States to Western Europe, Korea, Japan, Taiwan, Singapore, or Hong Kong. In the US, as much as possible is sub-contracted. In the others, a bit less, and there is more engineering expertise on the gov't payroll. There is a transit blogger who writes about this extensively... but his name eludes me. (Does anyone know it?)

Regarding contractors vs in-house software engineering talent, I have seen (in the media) that the UK gov't (including the NHS) has hired more and more talent to develop software in-house. No idea if UK folks think they are doing a good job, but it is a worthy experiment (versus all contractors).


>should just hire and train software devs themselves

There are lots of people who advocate this, but it's hard to bring to fruition. One large hurdle is the legacy costs, particularly because it's so hard to fire underperforming government employees. Another issue is that government salaries tend to not be very competitive by software industry standards, so you'll only get the best candidates if they happen to be intrinsically motivated by the mission. Third, software is almost always an enabling function that is often competing for resources with core functions. For example, if you run a government hospital and you can hire one person, you're much more likely to prefer hiring a healthcare worker than a software developer. One last, and maybe unfair, point is that the security of government positions tends to breed complacency. This often creates a lack of incentive to improve systems, which results in a lot of legacy systems hobbling along past their usefulness.

I don't think subcontractors build bad systems on purpose, but rather they build systems to bad requirements. A lot of times you have non-software people acting as program managers who are completely fine with software being a black box. They don't particularly care about software as much as their domain of expertise, and are unlikely to spend much time creating good software requirements. What I do think occurs is that contractors will deliberately underbid on bad requirements, knowing they will make their profits on change orders. IMO, much of the cost overruns can be fixed by having well-written requirement specs.


Do you mean sign as in qualify that the software is "good"?

In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter. Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts


> Do you mean sign as in qualify that the software is "good"?

We're not there yet. Just someone to review the final spec and see if it makes any sense at all.

The canonical example is the Canadian Phoenix payroll system. The spec described payroll rules that didn't make any sense. The project tripled in cost because they had to rewrite it almost completely.

> In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter.

For other projects, they must have an engineer's signature or nothing gets built. So someone does a final sanity check on behalf of the project managers, contracting officers, and humanities-diploma bureaucrats. For software, none of that is required, despite the final bill often being as expensive as a bridge.

> Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts

Can't be worse than none at all.


Annoyingly, the government already sorta does this: many federal jobs, as well as the patent bar, require an ABET-accredited degree.

The catch is that many prominent CS programs don’t care about ABET: DeVry is certified, but CMU and Stanford are not, so it’s not clear to me that this really captures “top talent.”


I suspect this is because HR, and probably even the hiring managers, cannot distinguish between the quality of curriculums. One of the problems with CS is the wide variance in programs: some require calculus through differential equations and some don't require any calculus whatsoever. So it's easier to just require an ABET degree. Something similar occurs with Engineering Technology degrees, even when they are ABET accredited.

To your point, it unfortunately and ironically locks many CS majors out of computer science positions.


> I suspect this is because HR, and probably even the hiring managers, cannot distinguish between the quality of curriculums.

Part of the reason for that is they likely haven't even been exposed to graduates of good computer science curriculums.


In what sense do you think they haven't been exposed? As in, they've never seen their resumes? Or they've never worked with them?

I think it's a misalignment of incentives in most cases. HR seems to care very little once someone is past the hiring gate. So they would have to spend the time to understand the curriculum distinctions, probably change their grading processes, etc. It's just much easier for them to apply a lazy heuristic like "must have an ABET accredited degree" because they really don't have to deal much with the consequences months and years after the hire. In some cases, they even overrule the hiring manager's initial selection.


>the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.

It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.

When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.
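
As a minimal illustration of why that standard exists:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Binary floating point can't represent 0.10 or 0.20 exactly,
        // so rounding error creeps into money math:
        std::printf("%.17f\n", 0.10 + 0.20);  // 0.30000000000000004

        // The usual fix: store amounts as an integer count of the
        // smallest unit (cents), so the arithmetic stays exact.
        std::int64_t cents = 10 + 20;  // $0.10 + $0.20
        std::printf("%lld cents\n", (long long)cents);  // 30 cents
    }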

If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.

Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.


>kind of general problem solving

Could you elaborate on this distinction? At the superficial level, "general problem solving" is exactly how I describe engineering in general. The example of tightening screws is just a specific example of a fastening problem. In that context, codified standards are an industry consensus on how to solve a specific problem. Most people wrenching on their cars are not following ASME torque guidelines but somebody building a spacecraft should be. It helps define the distinction of a professional build for a specific system. Fastening is the "general problem"; fastening certain materials for certain components in certain environments is the specific problem that the standards uniquely address.

For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, the choice probably shouldn't be left up to the subjective eye of an individual programmer, just like a spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."

>I'm worried we will just codify subjective opinions about general problem solving.

Ironically, this is the same attitude in many circles of traditional engineering. People who don't want to adhere to industry standards have their own subjective ideas about how to solve the problem. Standards aren't always right, but they create a starting point to 1) identify a risk and 2) find an acceptable way to mitigate it.

>Instead we have to accept that our discipline is different from the others

I strongly disagree with this and I've seen this sentiment used (along with "it's just software") to justify all kinds of bad design choices.


>For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, the choice probably shouldn't be left up to the subjective eye of an individual programmer, just like a spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."

Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved. Instead, the language's built-in sort functionality will probably do here and increase readability, because you know what's meant.

Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.

Which again brings us back to the general vs specific issue. In general this won't matter, but if you're in a real-time embedded system, you will need algorithms that don't allocate and have known worst-case execution times. But here again, at least for the systems that matter, we have specific rules.


>Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.

I think this speaks to my point. If you are deciding which algorithms suffice, you are creating standards to be followed just as with other engineering disciplines.

>Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved

If you're claiming it didn't matter for the specific problem, then you're essentially saying the decision isn't risk-based. The problem here is you will tend to over-constrain design alternatives regardless of whether it decreases risk. My experience is people will strongly resist this strategy, as it gets interpreted as mindlessly draconian.

FWIW, examining specific use cases is exactly what's done in critical applications (software as well as other domains). Hazard analysis, fault-tree analysis, and failure-modes-and-effects analysis are all tools to examine specific use cases in a risk-specific context.

>But here again, at least for the systems that matter, we have specific rules.

I think we're making the same point. Standards do exactly this. That's why in other disciplines there are required standards in some use cases and not others (see my previous comment contrasting aerospace to less risky applications).


> At some point CS as a profession has to find the right balance of art and science.

That seems like such a hard problem. Why not tackle a simpler one?


I didn’t downvote but I’ll weigh in on why I disagree.

The glib answer is “because it’s worth it.” As software interfaces with more and more of our lives, managing the risks becomes increasingly important.

Imagine if I transported you back 150 years to when the industrial revolution and steam power were just starting to take hold. At that time there were no consensus standards about what makes a mechanical system “good”; it was much more art than science. The numbers of mishaps and the reliability reflected this. However, as our knowledge grew we not only learned about what latent risks were posed by, say, a boiler in your home but we also began to define what is an acceptable design risk. There’s still art involved, but the science we learned (and continue to learn) provides the guardrails. The Wild West of design practice is no longer acceptable due to the risk it incurs.


I imagine that's part of why different programming languages exist - i.e. you have slightly fewer footguns with Java than with C++.

The problem is, the nature of writing software intrinsically requires a balance of art and science no matter what language it is. That is because solving business problems is a blend of art and science.

It's a noble aim to try and avoid solving unnecessarily hard problems, but when it comes to the customer, a certain amount of the hardness is incompressible. So you can't avoid it.


Yes, coding at scale is about managing complexity. No, "Keeping methods short" is not a good way to manage complexity, because...

> then mentally model the entire graph of interactions at once

...partially applies even if you have well-named functional boundaries. You said it yourself:

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows.

Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.

Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.

Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.

And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.


This is where Occam's Razor applies - do not multiply entities unnecessarily.

Having hundreds or thousands of simple functions is the opposite of this advice.

You can also consider this in more scientific terms.

Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, as few connections between the parts as possible, each part as simple as possible, and both the parts and the connections between them as intuitively obvious as possible.

Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.

All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.

IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.


Lately I’ve found decoupling to be helpful in this regard.

This is an auth layer; its primary charge is to ensure that those receiving and modifying resources have the permissions to do so.

This is the data storage layer. It’s focused on clean, relatively generic data storage abstractions and models that are relatively unopinionated, and flexible.

This is the contract layer. It's more concerned with combining the APIs of the data and auth layers than with data transformation or business logic.

This is the business logic layer. It takes relatively abstract data from our API and performs transformations to massage it into shapes that fit the needs of our customers and the mental models we’ve created around those requirements.

Etc. Etc.

Of course this pragmatic decoupling is easier said than done, but the logical grouping of like concerns allows for discoverability, flexibility, and a generally clear demarcation of concerns.
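
A toy sketch of that layering (all names invented; a real system would be far richer) - each layer exposes a narrow interface and knows nothing about the layers above it:

    #include <iostream>
    #include <string>

    struct AuthLayer {      // auth: permissions only
        bool canRead(const std::string& user) const { return user == "alice"; }
    };

    struct DataLayer {      // storage: generic, unopinionated models
        std::string fetch(int id) const { return "record-" + std::to_string(id); }
    };

    struct ContractLayer {  // contract: combines the auth and data APIs
        const AuthLayer& auth;
        const DataLayer& data;
        std::string get(const std::string& user, int id) const {
            return auth.canRead(user) ? data.fetch(id) : "denied";
        }
    };

    int main() {
        // business logic would sit above, reshaping what the contract returns
        AuthLayer auth;
        DataLayer data;
        ContractLayer api{auth, data};
        std::cout << api.get("alice", 7) << "\n";  // record-7
    }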


I've also been gravitating towards this kind of component categorization, but then there's the ugly problem of "cross-cutting concerns". For instance:

- The auth layer may have an opinion on how half of the other modules should work. Security is notoriously hard to isolate into a module that can be composed with others.

- Diagnostics layer - logging, profiling, error reporting, debugging - wants to have free access to everything, and is constantly trying to pollute all the clean interfaces and beautiful abstractions you design in other layers.

- User interface - UI design is fundamentally about creating a completely separate mental model of the problem being solved. To make a full program, you have to map the UI conceptualization to the "backend" conceptualization. That process has a nasty tendency of screwing with every single module of the program.

I'm starting to think about software as a much higher-dimensional problem. In Liu Cixin's "The Three Body Problem" trilogy, there's a part[0] where a deadly device encased in impenetrable unobtanium[1] is neutered by an attack from a higher dimension. While the unobtanium shell completely protects the fragile internals in 3D space, in 4D space, both the shell and the internals lie bare, unwound, every point visible and accessible simultaneously[2].

This is how I feel about building software systems. Our abstractions are too flat. I'd like to have a couple more dimensions available, to compose them together. Couple more angles from which to view the source code. But our tooling is not there. Aspect-oriented programming moved in that direction a bit, but last I checked, it wasn't good enough.

--

[0] - IIRC it's in the second book, "The Dark Forest".

[1] - It makes more sense in the book, but I'm trying to spoiler-proof my description.

[2] - Or, going down a dimension, for flat people living on a piece of paper, a circle is an impenetrable barrier. But when we look at that piece of paper, we can see what's inside the circle.


Neat, that's some heady shit. I'll have to check Aspect oriented programming out.

It's a bit of work, but I've been thinking the concept of interchange logic is a neat idea for cross-layer concerns.

So for instance, I design my UI to exist in some fashion (I've been thinking contexts are actually a decent way to implement this model cause then you can swap them in and out in order to use them in different manners...)

So say, I've got some component which exists in the ForumContext, and it needs all the data to display for the forum.

So I build a ForumContext provider which is an interchange layer between my ForumApi and my ForumUI.

Then if it turns out I want to swap out the Api with another, all I have to do is create a new ForumContext provider which provides the same shape of data, and the User Interface doesn't need to change.

Alternatively if I need to shape the data in a new fashion, all I need to do is update my ForumContext provider to reshape the API data and I don't need to muss with the API at all (unless of course, I need new data in which case, yea of course).

It's not perfect, and React's docs seem to warn against overuse of contexts, but I think you could potentially make a decent architecture out of them. And they can be a lot less boilerplate than a similar Redux store, by using the state hooks React provides.

I still have to build out some sort of proof of concept of my idea; it's essentially connected component trees again. But when half the components in my library are connected to the API directly, you just end up with such a mess any time you need to either repurpose a component for another use or switch a section of your app over to a new data store or API.

At the end of the day, it seems like no matter how hard you try, it's really just about finding the best worst solution ;-).

And yea, security is a doozy in general. I've been working on decoupling our permissions logic a bit lately, since at the moment it's coupled between records, permissions, and other shit. Leaves a lot of room for holes.


>If you make all of your functions simple, then you simply need more functions to represent the same program

The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.

While I do think you can go too far with adding functions for their own sake, I don't see that they add to the cognitive load in the same way that possible interactions within a functional unit do. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.
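
A small sketch of that point (invented example): the private helpers below never enter the reader's mental model until the reader is already inside the class.

    #include <iostream>
    #include <string>

    class ReportBuilder {
    public:
        std::string build() const { return header() + body(); }

    private:
        // Invisible from the outside; no global-namespace pollution.
        std::string header() const { return "== report ==\n"; }
        std::string body() const   { return "...\n"; }
    };

    int main() {
        std::cout << ReportBuilder{}.build();
    }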


> No, "Keeping methods short" is not a good way to manage complexity

Agreed

> Allowing more complexity in your functions makes them individually harder to understand

I think that can mostly be avoided by sometimes creating local scopes {...} to avoid too much state inside a function, combined with whitespace and some section "header" comments (instead of what would have been sub-function names).

Can be quite readable, I think. And it's nice to not have to jump back and forth between myriads of files and functions. Something like the sketch below.
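
(Invented example - one function, with local scopes and section "header" comments standing in for sub-function names:)

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> data = {5, 1, 4};
        int sum = 0, max = 0;  // the scopes' outputs

        { // --- sum the input ---
            for (int v : data) sum += v;
        }

        { // --- find the maximum ---
            for (int v : data) if (v > max) max = v;
        }

        std::cout << sum << " " << max << "\n";  // 10 5
    }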


I have found this to be one of those A-or-B developer personas that is hard for someone to change, and it causes much disagreement. I personally agree 100%, but have known other people who couldn't disagree more; it is what it is.

I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.


I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

Have you ever seen a codebase with infrastructure and piping taking up about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is an endless chain of methods that mostly just delegate responsibility further and further. What could've been a 100-line loop of the "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc., another 45% to building trivial auxiliary structures and instantiating each other, and only 10% to the actual data processing - but good luck finding those 10%, because there is a never-ending chain of methods calling each other: A.do_work() calls B.process_item(), which calls A.on_item_processing(), which calls B.on_processed()... wait, shouldn't there have been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.

Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by making a recursive call to the top-most method, which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with feeling.
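
A compressed, runnable sketch of that shape (names invented) - the actual work hides in one inconspicuous method, and the "loop" only emerges from the mutual calls:

    #include <iostream>

    struct B;
    struct A {
        B* b = nullptr;
        int remaining = 3;
        void do_work();
        void on_item_processing();
    };
    struct B {
        A* a = nullptr;
        void process_item();
        void on_processed();
    };

    void A::do_work()            { if (remaining-- > 0) b->process_item(); }
    void B::process_item()       { a->on_item_processing(); }
    void A::on_item_processing() { std::cout << "the actual work\n"; b->on_processed(); }
    void B::on_processed()       { a->do_work(); }  // "looping" via recursion

    int main() {
        A a; B b;
        a.b = &b; b.a = &a;
        a.do_work();  // prints three times, the stack growing each pass
    }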


Your comment gives me emotional flashbacks. Years ago I took Java off my resume, because I don’t want to ever interact with this sort of thing again. (I’m sure it exists in other languages, but I’ve never seen it quite as bad as in Java.)

I think the best "clean code" programming advice is the advice writers have been giving for centuries. Find your voice. Be direct and be brief. But not too brief. Programming is a form of expression. Step 1 is to figure out what you're trying to say (e.g. the business logic). Then say it in its most natural form (switch statements? if-else chain? whatever). Then write the simplest scaffold around it you can so it gets called with the data it needs.

The 0th step is stepping away from your computer and naming what you want your program to express in the first place. I like to go for walks. Clear code is an expression of clear thoughts. You’ll usually know when you’ve found it because it will seem obvious. “Oh yeah, this code is just X. Now I just have to type it up.”


>I wish there was an "auto-flattener"/"auto-inliner" tool

I'm as big an advocate of "top-down" design as anyone, and I have also wished for such a tool. When you just want to know "what behavior comes next", all the abstractions do get in the way. The IDE should be able to "flatten" the execution path from current context and give you a linear view of the code. Sort of like a trace of a debug session, but generated on-the-fly. But still, I don't think this is the best way to write code.


Most editors have code folding. I've noticed this helps when there are comments or it's easy to figure out the branching or what not.

However, what you're asking for is a design style that's hard to implement I think without language tooling (for example identifying effectful methods).


GP is asking for the opposite. They're asking for code unfolding.

That is, given a "clean code like":

  auto DoTheThing(Stuff stuff) -> Result {
    const auto foo = ProcessSth(stuff);
    const auto bar = ValidateSthElse(stuff);

    return DoSth(foo, bar);
  }
The tool would inline all the function calls. That is, for each of ProcessSth(), ValidateSthElse() and DoSth(), it would automatically perform the task of "copy the function body, paste it at the call site, and massage the caller to make it work". It's sometimes called the "inline function" refactoring - the inverse of "extract function"/"extract method" refactoring.

I'd really, really want such a tool. Particularly one where the changes were transient - not modifying the source code, just overlaying it with a read-only replacement. Also interactive. My example session is:

- Take the "clean code" function that just calls a bunch of other functions. With one key combination, inline all these functions.

- In the read-only inlined overlay, mark some other function calls and inline them too.

- Rinse, repeat, until I can read the overlay top-to-bottom and understand what the code is actually doing.


Signed up just to say that I've also really, really wanted such a tool since forever. While, for example, the JetBrains IntelliJ family of editors has the automatic "inline function" refactoring, they do it by permanently modifying the source code, which is not quite what we want. Like you say, it should be transient!

So I recently made a quick&dirty interactive mock-up of how such an editor feature could look. The mockup is just a web page with javascript and html canvas, so it's easy to try here: https://emilprogviz.com/expand-calls/?inline,substitution (Not mobile friendly, best seen on desktop)

There are 2 different ways to show the inlining. You can choose between them if you click the cogwheel icon.

Then I learned that the Smalltalk editor Pharo already has a similar feature, demonstrated at https://youtu.be/baxtyeFVn3w?t=1803 I wish other editors would steal this idea. Microsoft, are you listening?

My mock-up also shows an idea to improve code folding. When folding / collapsing a large block of code, the editor could show a quick summary of the block. The summary could be similar to a function signature, with arguments and return values.


Thank you! I'm favoriting this comment. This is exactly what I was thinking about (+/- some polish)!

In particular, the SieveOfErastothenes() call, which I can inline, and inside the overlay, I can inline the call to MarkMultiples(), and the top-level variable name `limit` is threaded all the way down.

Please don't take that demo site down, or publish it somewhere persistent - I'd love to show it around to people as the demonstration of the tool I'm looking for.

> When folding / collapsing a large block of code, the editor could show a quick summary of the block.

I love how you did this! It hasn't even occurred to me, but now that I saw it, I want to have this too! I also like how you're trying to guess which branches in a conditional won't be taken, and diminish them visually.

EDIT: Also, welcome to HN! :).


> Please don't take that demo site down, or publish it somewhere persistent

Feel free to spread the URL around, I plan to keep it online for the rest of my life, or until the feature is available in popular editors - whichever comes first. And if someone wants to mirror the demo elsewhere, it should be easy to do so, since it's client-side only and MIT licensed.

> Also, welcome to HN! :)

Thanks! Been lurking here in read-only mode for years, but today I finally had something to contribute.


I just finished binge-watching all five of your videos on better programming tools, and I must say, it just blew my mind. Thank you for making them.

I've been maintaining my own notes on the kind of tools I'd like to have, with hopes to maybe implement them one day, and your videos covered more than half of my list, while also showing tons of brilliant ideas that never occurred to me. I'm very happy to see that the pain points I identified in my programming work aren't just my imagination.

Also, on a more abstract level, I love your approach to programming dilemmas, and it's the first time I saw it articulated explicitly: when there are two strong, conflicting views, you do a pros/cons analysis on both, and try to find a new approach that captures all the benefits, while addressing all the drawbacks.

I've sent you an e-mail a while ago, let me know if it got through :). I'll be happy to provide all kinds of feedback on the ideas you described in your videos, and I'd love to bounce the remaining part of my list off you, if you're interested :).

> today I finally had something to contribute

That's a first-class contribution. I think you should post the link to your site as a HN submission, using title "Show HN: Ideas for better programming tools" ("Show HN" being a marker that you're submitting your own work).


Wow, thanks, I'm really happy you liked my videos so much! I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others. I'm glad your ideas overlap with mine, because the more people have the same idea, the more likely it is to be a good one, I think.

> when there are two strong, conflicting views, you do a pros/cons analysis on both

Yeah, it's not easy... I've participated in endless, looping debates as much as anyone - I guess it's just human psychology. But with enough conscious effort, I find that it's sometimes possible to take a step back, take a fair look at both sides, and design a best-of-both-worlds solution. I'll apply this method again in future videos, and if I can inspire a few more people to use it, that's great. Making the programming world a tiny bit less "tribal" and a bit more constructive.

> I've sent you an e-mail a while ago

Yeah, let's continue our discussion over email. I replied to your email from my private address, let me know if it got through.

> That's a first-class contribution. I think you should post the link to your site as a HN submission

"First-class contribution" gave me tears of joy :) I'd like to "Show HN" in a few months. Once I post there, I might get a lot of comments, and I want to be available to answer the comments and make follow-up videos quickly, but currently my personal life is too busy.


> I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others.

From talking to others, as well as spending way too much time on HN, I think the answer is, "quite a lot". Perhaps not relative to the number of programmers, but in absolute terms, I'm pretty sure there's a hundred strong ideas to be found among just the people who comment here.

I do feel that our industry has an implicit bias against those ideas - I think it's a combination of two things: if you complain, you get labeled as whiny, and working on speculative tooling is considered time spent not providing business value.

> let me know if it got through.

Yeah, I got it, thanks! I'm desperately trying to trim down my draft reply, because I somehow managed to write a short article when describing two of my most recent ideas :).

> I'd like to "Show HN" in a few months.

Sure, take your time :). But I think people will love what you already have. It's not just the ideas you're presenting, but also a kind of "impression of quality" your videos give.


Great mock up, this is pretty interesting. Food for thought!


I'm curious to understand your use-case - would you be open to explaining more?

Do you actually want to overlay the code directly into the parent method, or would a tooltip (similar to hyperlink previews) work? I'm wondering how expanding the real estate would help with readability and how the user flow would work.

For example, code folding made a lot more sense because the window would have those little boxes to fold/unfold (which is basically similar to the act of inlining and un-inlining).


Yes, I want to overlay the code directly into the parent method, preferably with appropriate syntax highlighting and whatever other goodies the IDE/editor provides normally. It would be read-only to indicate that it's just a transient overlay, and not an actual code change.

So, if I have a code like:

  auto Foo(Bar b) {
    return b.knob();
  }

  auto Frob(Frobbable f) {
    auto q = Foo(f.thing());
    return q.quux(f.otherthing());
  }

  auto DoSth(Frobbable frobbie) {
    auto a = Frob(frobbie);
    return a.magic();
  }
Then I want to mark the last function, and automatically turn it into:

  auto DoSth(Frobbable frobbie) {
    auto foo_1 = frobbie.thing();
    auto q_1 = foo_1.knob();
    auto frob_1 = frobbie.otherthing();
    auto a = q_1.quux(frob_1);
    return a.magic(); 
  }
Or something equivalent, possibly with highlights/synthetic comments telling me which bits of code came from where. I want to be able to keep inlining function calls like this, until I hit a boundary layer like the standard library, or a third-party library. I might want to expand past that, but I don't think I'd do that much. I'd also like to be able to re-fold code I'm not interested in, to reduce noise.

What such tool would do is automating the process I'm currently doing manually - jumping around the tiny functions calling other tiny functions, in order to reassemble the actual sequence of lower-level operations.

I don't want this to be a tooltip, because I want to keep expanding past the first level, and have the overlay stay in place until I'm done with it.

EDIT: languages in the Lisp family - like Common Lisp or Emacs Lisp - feature a tool called "macroexpander". Emacs/SLIME wraps it into an interactive "macrostepper" that behaves pretty much exactly like the tool I described in this discussion thread.

EDIT2: See the excellent demo upthread by ' emilprogviz - https://news.ycombinator.com/item?id=27306118. That's the kind of tool I had in mind.


Yes, excellent mock-up - I see what you mean.

How would you deal with multiple levels of nesting? :) Let's say you're at level 5, which is pretty reasonable.

Oh and I also forgot about languages like Java that are heavy on interfaces and DI. That would be interesting to handle.


> I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

Learn to read assembly and knock yourself out.


That's not a very helpful response. Unless the code is compiled to native machine code and is all inlined, this won't help one bit.


On today's HN, alongside this thread, is "the hole in mathematics".

It is directly germane to what you are talking about.

In the process of formalizing axiomatic math, proving 1+1=2 took 700 pages of a book.

The point about assembly is more or less correct. The process of de-abstracting is going to be long and probably not that clear in the end.

I understand what you mean: the assembly commenter is correct; you'll need to actually execute the program and reduce it to the series of instructions it actually performed.

Which is either an actual assembly, or a pseudo-assembly instruction stream for the underlying turing machine: your computer.


I really need to introduce you to Jester, my toy functional programming language. It compiles down to pure lambda calculus (data structures are implemented with Scott-Mogensen encoding) and then down to C that uses nothing but function calls and assignments of pointers to struct fields. The logic and arithmetic are all implemented in the standard library: a Bool is a function that takes 2 continuations, a Byte is 8 Bools, an Int is 4 Bytes, addition uses the good old ripple-carry algorithm, etc.
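To give a flavour of the encoding in something more familiar, here's a minimal JavaScript sketch of the same idea (this is not Jester; the names are invented for illustration):

  // A Bool is a function that takes two continuations and calls one of them.
  const True  = (onTrue, onFalse) => onTrue();
  const False = (onTrue, onFalse) => onFalse();

  // Logic is then nothing but function calls:
  const not = (b) => (onTrue, onFalse) => b(onFalse, onTrue);
  const and = (a, b) => (onTrue, onFalse) => a(() => b(onTrue, onFalse), onFalse);

  and(True, not(False))(() => console.log("yes"), () => console.log("no")); // "yes"

Bytes, integers and ripple-carry addition stack on top of this in the same style.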

Reading the disassembly of the resulting program is pretty unhelpful: any function consists entirely of putting values from the fields of the passed-in structures into the fields of new structures and (tail)calling another function and passing it some mix of old/new structures.


Maybe not helpful, but it made me smile :-)


While I think you are onto something about top-down vs. bottom-up thinkers, one of the issues with a large codebase is that literally nobody can do the whole thing bottom-up. So you need some reasonable conventions and abstraction, or the whole thing falls apart under its own weight.


Yep, absolutely.

That's another aspect of my grand unifying theory of developers. Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service. How one perceives complexity, what causes one to complain about complexity, etc. all vary based on these things. It's easy to end up in situations where people are arguing past each other.

If you need to be able to keep all the details in your head, you're going to need smaller codebases. Similarly, if you're already keeping track of everything, things like static typing become less important to you. And the opposite is true.


> Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service.

Your theory needs to account for progression over time. For example, the first programming languages I learned were C++ and Java, so I believed in static typing. Then I worked a lot in PHP, Erlang and Lisp, and became a dynamic typing proponent. Later on, with much more experience behind me, I became a static typing fan again - to the point that my Common Lisp code is thoroughly typed (to the point of being non-idiomatic), and I wish C++'s type system were more expressive.

Curiously, at every point of this journey, I was really sure I had it all figured out, and that the kind of typing I liked was the best way to manage complexity.

--

EDIT: your hypothesis about correlated "frames of mind" reminds me of a discussion I had with 'adnzzzzZ here, who also claimed something similar, but broader: https://news.ycombinator.com/item?id=26076639. The topic started as, roughly, whether people designing addictive games using gambling mechanics are devil incarnate (my view) or good people servicing a different target audience than me (their view), but the overarching theory 'adnzzzzZ presented in https://github.com/a327ex/blog/issues/66 also touched on static/dynamic typing debate.


My programming path is similar to yours! Started with C++ then moved into Perl. Then realised that uber-dynamic-typing in Perl was a death-trap in enterprise software. Then oddly found satisfaction in Excel/VBA, because you can write more strictly-typed code in VBA (there is also a dynamic side) and even safely call the Win32 API directly. Finally, I came back to C++ and Java, which are "good enough" for expressing the static types that I need. The tooling and open-source ecosystem in Java makes it very hard to be more productive in other languages (except maybe C#, but they are in the same language family). I'm in a role now that also has some Python. While the syntactical sugar is like written prose, the weaker typing (than C++/Java) is brutal in larger projects. Unless people are fastidious about type annotations, I constantly struggle to reason about the code while (second-)guessing about types.

You wrote: <<I wish C++'s type system were more expressive.>> Can you share an idea? For example: Java 17 (due for release in the fall) will feature sealed classes. This looks very cool. For years, I (accidentally) simulated this behaviour using enums tied to instances or types.


Huh. There's something to this.

I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.

But it's true, I do generally feel like a codebase that's so complex or fractured that no one can understand any sizable chunk of it is just already going to be a disaster regardless of what kind of typing it uses. I don't hate microservices, they're often the right decision, but I feel they're almost always more complicated than a monolith would be. And I do regularly end up just reading implementation code, even in 3rd-party libraries that I use. In fact in some libraries, sometimes reading the source is quicker and more reliable than trying to find the relevant documentation.

I wouldn't extrapolate too much based on that, but it's interesting to hear someone make those connections.


I'll add my voice to your parent.

Statically typed languages and languages that force you to be explicit are awesome for going into a codebase you have never seen and understanding things. You can literally just let your IDE show you everything. All questions you have are just one Ctrl-click away and if proper abstraction (ala Clean Code) has been used you can ignore large swaths of code entirely and just look at what you need. Naming is awesome and my current and previous code bases were both really good in this (both were/are mixes of monolith and microservices). I never really care where a file is located. I know quite a few coders that will want to find things via the folder tree. I just use the keyboard shortcut to open by name and start guessing. Usually first or second guess finds what I need because things are named well and consistently.

Because we use proper abstractions, I can usually see at first glance what the overall logic is. If I need to know how a specific part works in detail, I can easily drill down via Ctrl-click. With a large inlined blob of code I would have a really hard time. Do I skip from line 1356 to 1781, or is that too far? Oh, this is JavaScript, and I don't even know if this variable here is a string or a number (or both, depending on where in the code we are), or maybe it's an object that's used as a map?

The whole thing is too big to keep in my head all the time, and I will probably not need to touch the same piece of code over and over; instead I will move from one corner to the next, and again to another corner, over the course of a few weeks to months.

That's why our frontend code is being converted to TypeScript, and why our naming (and other) conventions make even our JavaScript code bearable.


Is your backend Java or C#? Your IDE description feels like Java w/ Eclipse or IntelliJ, or C# w/ Visual Studio. I have had a similar experience. The "discoverability" of a large codebase is greatly increased by combining language with tooling. If you use Java with Maven-like dependency management (you can use Gradle these days if 'allergic' to Maven's pom.xml), the IDE will usually automatically download and "hook up" source code. It is ridiculous how fast you can move between layers of (a) project code, (b) in-house libraries, (c) open-source libraries, and (d) commercial closed-source libraries (decompiled on the fly in 2021!). (I assume all the same can be done for C# w/ Visual Studio.)

To be fair, when I started my career, I worked on a massive C project that was pretty easy to navigate because it was a mono-repo with everything in one place. CTags could index 99% of what you needed, and the macros weren't out of control. (Part of the project was also C++, but written in the style of career C programmers who only wanted namespaces and trivial generics like vector and map! Again, very simple to navigate huge codebase.)

I'm still surprised in 2021 when someone asks me to move a Java class to a different package during a code review. My inner monologue says: "Really... do they still use a file browser? Just use the IDE to find it!"


> I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.

That's precisely why people are attached to it; because it's rarely a source of bugs. :-)


Ha! Good catch. :)


[ separate answer for microservices ]

Yeah, monoliths are frequently easier to reason about, simply because you have fewer entities. The big win of microservices (IMHO) isn't "reason about", it is that they are a good way of getting more performance out of your total system IFF various parts of the system have different scaling characteristics.

If your monolith is composed of a bunch of things, where most parts require resources (CPU/RAM/time) that scale as O(n) (for n being the number of active requests), but one or a few parts are O(n log n) - or O(n), but with a higher constant...

Then, those "uses more resources" is the limit of scaling for each instance of the monolith, and you need to deploy more monoliths to cope with a larger load.

On the other hand, in a microservice architecture, you can deploy more instances of just the microservices that need it. This can, in total, lead to more things being done with fewer resources.

But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.

And that, in turn, may lead to better barriers between microservices, meaning that each microservice MAY be easier to understand in isolation.


> But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.

Sure, but that’s not particularly hard; it’s been basic system analysis since before “microservices” or even “service-oriented architecture” was a thing. Basic 70s-era Yourdon-style structured analysis (which, while it’s not the 1970s approach, can be applied incrementally in a story-by-story agile fashion to build up a system, as well as doing either big upfront design or working from the physical design to the logical requirements of an existing system) produces pretty much exactly what you need to determine service boundaries.

(It’s also a process that very heavily leverages locality of knowledge within processes and flows, so it’s quite straightforward to carry out without ever having to hold the whole system in your head.)


Yep, there's no real magic here. There's some understanding forced by a (successful) transition to microservices, but a transition to microservices is not a requirement for said gained insight.

And if all parts of your system scale identically, it may be better to scale it by replicating monoliths.

Another POSSIBLE win is if you start having multiple systems, sharing the same component (say, authentication and/or authorization), at which point there's something to be said for breaking at least that bit out of every monolith and putting them in a single place.


I don't really care about the static/dynamic typing spectrum, I care about the strong/weak typing spectrum.

At any point, will the code interpret a data item according to the type it was created with?

A prime example of "weakly typed" is when you can add "12" and 34 to get either "1234" or 46.
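JavaScript is one familiar example that sits at the weak end of the spectrum:

  console.log("12" + 34);         // "1234" - the number is coerced to a string
  console.log("12" - 34);         // -22    - the string is coerced to a number
  console.log(Number("12") + 34); // 46     - only explicit conversion gives 46

The same two operands produce a string or a number depending on the operator, so the data is no longer interpreted according to the type it was created with.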


This is an interesting distinction. I confess that I frequently interchange the pairs.


I mean, in some respects, "dynamic typing" is "type the data" and "static typing" is "type the variable".

In both cases, there's the possibility for doing type propagation. But, if you somehow manage to pass in two floats to an addition that a C compiler thinks is an integer addition, you WILL have a bad day. Whereas in Common Lisp, the actual passed-in values are typed (for floats, usually boxed, for integers, if they're fixnums, usually tagged and having a few bits less than you would expect).


I’m reminded of an earlier HN discussion about an article called The Wrong Abstraction, where I argued¹ that abstractions have both a benefit and a cost and that their ratio may change as a program evolves and which of those “nitty gritty details” are immediately relevant and which can helpfully be hidden behind abstractions changes.

¹ https://news.ycombinator.com/item?id=23742118


The point is that bottom-up code is a siren song. It never scales. It makes it a lot easier to get started, but given enough complexity it inevitably breaks down.

Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought out, watertight layers of abstraction. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.

Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.


But the reverse is also true. Top-down programming doesn't really work well for smaller programs, it definitely doesn't work well when you're dealing with small, highly performance-critical or complex tasks.

So sure, I'll grant that when your program reaches the 10,000 line mark, you need to have some serious abstractions. I'll even give you that you might need to start abstracting things when a file reaches 1,000 lines.

But when we start talking about the rule of 30 -- that's not managing complexity, that's alphabetizing a sock drawer and sewing little permanent labels on each sock. That approach also doesn't scale to large programs because it makes rewrites and refactors into hell, and it makes new features extremely cumbersome to quickly iterate on. Your 10,000 line program becomes 20,000 lines because you're throwing interfaces and boilerplate all over the place.

Note that this isn't theoretical, I have worked in programs that did everything from building an abstraction layer over the database in case we wanted to use Mongo and SQL at the same time (we didn't), to having a dependency management system in place that meant we had to edit 5 files every time we wanted to add a new class, to having a page lifecycle framework that was so complicated that half of our internal support requests were trying to figure out when it was safe to start adding customer data to the page.

The benefit of a good, long, single-purpose function that contains all of its logic in one place is that you know exactly what the dependencies are, you know exactly what the function is doing, you know that no one else is calling into the inlined logic that you're editing, and you can easily move that code around and change it without worrying about updating names or changing interfaces.

Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function. It's fine to have a function that's longer than a couple hundred lines. If you're building something like a rendering or update loop, in many cases I would say it's preferable.


It's funny how these things are literally what the Clean Code book advocates for. Sure, there is mention of a lot of stuff that's no longer needed and was a band-aid over the deficiencies of a particular language. But the ideas are timeless, and I used them before I even knew the book - and I used them in Perl.


> these things are literally what the Clean Code book advocates for

I'm not sure I understand what you're saying, I might be missing your point. The Clean Code book advocates that the ideal function is a single digit number of lines, double digits at the absolute most.

In my mind, the entire process of writing functions that short involves abstracting almost everything your code does. It involves passing data around all over the place and attaching state to objects that get constructed over multiple methods.

How do you create a low-abstraction, bottom-up codebase when every coroutine you need to write is getting turned into dozens of separate functions? I think this is showcased in the code examples that the article author critiques from Clean Code. They're littered with side effects and state mutations. This stuff looks like it would be a nightmare to maintain, because it's over-abstracted.

Martin is writing one-line functions whose entire purpose is to call exactly one other function passing in a boolean. I don't even know if I would call that top-down programming, it feels like critiquing that kind of code or calling it characteristic of their writing style is almost unfair to top-down programmers.


I'm not saying the entire book taken literally is how everything must be done. I was trying to say that the general ideas make sense such as keeping a function at the same level of abstraction and keeping them small.

I agree with you that having all functions be one-liners is not useful. Keeping all functions to within just a few lines, or double digits at most, makes sense however. Single digit could be 9 - that's a whole algorithm right there! For example, quicksort (quoted from the Wikipedia article):

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)
This totally fits the single digit of lines rule and it describes the algorithm on a high enough level of abstraction that you get the idea of the whole algorithm easily. Do you think that inlining the partition function would make this easier or harder to read?

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        pivot := A[hi]
        i := lo
        for j := lo to hi do
            if A[j] < pivot then
                swap A[i] with A[j]
                i := i + 1
        swap A[i] with A[hi]

        quicksort(A, lo, i - 1)
        quicksort(A, i + 1, hi)
(I hope I didn't mix up the indentation - on the phone here and it's hard to see lol)

Now some stuff might require 11 or 21 lines. But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.


> But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.

Well, but that's exactly what I'm pushing back against. I think the rule of 30 is often a mistake. I think if you're going out of your way to avoid long functions, then you are probably over-abstracting your code.

I don't necessarily know that I would inline a quicksort function, because that's genuinely something that I might want to use in multiple places. It's an already-existing, well-understood abstraction. But I would inline a dedicated custom sorting method that's only being used in one place. I would inline something like collision detection, nobody else should be calling that outside of a single update loop. In general, it's a code smell to me if I see a lot of helper functions that only exist to be called once. Those are prime candidates for inlining.

This is kind of a subtle argument. I would recommend http://number-none.com/blow/john_carmack_on_inlined_code.htm... as a starting point for why inlined code makes sense in some situations, although I no longer agree with literally everything in this article, and I think the underlying idea I'm getting at is a bit more general and foundational.

> Do you think that inlining the partition function would make this easier or harder to read?

Undoubtedly easier, although you should label that section with a comment and use a different variable name than `i`. Your secondary function is just a comment around inline logic, it's not doing anything else.[0]

But by separating it out, you've introduced the possibility for someone else in the same class or file to call that function without your knowledge. You've also introduced the possibility for that method to contain a bug that won't be visible unless you step through code. You've also created a function with an unlabeled side effect that's only visible by looking at the implementation, which I thought we were trying to avoid.

You've added a leaky abstraction to your code, a function that isn't just only called in one place, but should only be called in one place. It's a function that will produce unexpected results if anyone other than the `quickSort` method calls it, that lacks any error checking; it's not really a self-contained unit of code at all.

And for what benefit? Is the word `partition` really fully descriptive of what's going on in that method? Does it indicate that the method is going to manipulate part of the array? And is anyone ever going to need to debug or read a quicksort method without looking at the partition method? I think that's very unlikely.

----

Maybe you disagree with everything I'm saying above, but regardless, I don't think that Clean Code is actually advocating for the same ideas as I am:

> Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function.

I don't think that claim is one that Martin would agree with. Or if it is, I don't think it's a statement he's giving actionable advice about inside of his book.

----

[0]: In a language like Javascript (or anything that supports inline functions), we might still use a function or a new context as a descriptive boundary, particularly if we didn't want `j` and `pivot` to leak:

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    const pivotIndex = (function partition (data, lo, hi) {
      //etc...
    }(data, lowIndex, highIndex));

    quicksort(data, lowIndex, pivotIndex - 1);
    quicksort(data, pivotIndex + 1, highIndex);
  }
But for something this trivially small, I suspect that a simple comment would be easier to read.

  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    /* Partition */
    let pivot = data[highIndex];
    //etc...

    quicksort(data, lowIndex, partitionIndex - 1);
    quicksort(data, partitionIndex + 1, highIndex);
  }
Remember that your variable and function names can go out of date at the same speed as any of your comments. But the real benefit of inlining this partition function (besides readability, which I'll admit is a bit subjective), is that we've eliminated a potential source of bugs and gotten rid of a leaky abstraction that other functions might be tempted to call into.


> Remember that your variable and function names can go out of date at the same speed as any of your comments.

A very good point, thank you for voicing it!

As luck would have it, two days ago I was writing comments about this at work during code review - there was a case where a bunch of functions taking a "connection" object had it replaced with a "context" object (which encapsulated the connection, and some other stuff), but the parameter naming wasn't updated. I was briefly confused by this when studying the code.


Ha :) This is something that's also been drilled into me mostly just because I've gotten bitten by it in jobs/projects. The most recent instance I ran into was a `findAllDependents` method turning into `findPartialDependentsList`, but the name never getting updated.

Led to a non-obvious bug because from a high level, the code all looked fine, and it was only digging into the dependents code that revealed that not everything was getting returned anymore.


Absolutely agree that all naming can go out of date. With the tools I use nowadays, it's even easier for comments to go out of date than it was previously, because of all the automatic folding away in the IDE.

But one of the best reminders that comments don't do sh+t came early in my career, when my co-worker asked me a question about a line of code (and it was literally just the two of us working on that code base). I probably had a very weird look on my face. I simply pointed to the line above the one he asked about. He read the comment and said "thank you".

I guess my point is that all you can do is incorporate the "extra information" as closely as possible to the actual code, so that it's less likely to be ignored or not seen. Incorporating it into the variable and function naming itself is the closest you will get, and as per your example (and my own experience as well) it can still go stale. Nothing but rigorous code review practices and co-workers who care will help with this.

But I think we can all agree (or I hope so at least) that it's better to have your function called `findAllDependents` and be slightly out of date than to have it called `function137` with a big comment on top that explains in 5 lines that it finds the list of all dependents.


Glad you admitted subjectivity. I will too and I am on the other side of that subjectivity. For the quicksort example, that was the pseudo code from the Wikipedia article.

I personally think that the algorithm is easier to grasp conceptually if I just need to know 'it partitions the data and then runs quicksort on both of those partitions. Divide and conquer. Awesome'.

I don't care at that level of abstraction _how_ the partitioning works. In fact, there are multiple different partition functions people have created, with various characteristics. The fact that this one mutates its parameters is generally bad if you ask me, but in this specific case of a general-purpose, high-performance sorting function it's totally acceptable for the sake of speed and memory considerations. In other 'real world' scenarios of 'simple business software' I would totally forsake that speed and memory efficiency for better abstractions. This is also why Carmack is basically not a good example: his world is that of high-performance graphics and game engine programming, where he's literally the one dude that has it all in his head. I can totally see why he would see things differently from someone like me, who has to go look at a different piece of code that I've never seen before multiple times every day.

You mention various problems with this code, such as the in-place nature and bad naming. Most of that is simply the copy from Wikipedia, and yes, I agree I would rename these in real code. I do not agree, however, with the part about 'someone else could call this now'. To stick with Clean Code's language of choice, the partition function would actually be a private method of the quicksort class. Thus nobody outside can call it but the algorithm itself, which, as a self-contained unit, is not just a blob of code.

Same with your inlining of collision detection and such. I don't think I would do that. I think it has value to know that the overall loop is something like

  do_X() 
  do_Y() 
  detect_collisions() 
  do_Z() 
Overall "game loop" easily visible straight away. The collision detection function might be a private method to that class you're in though. Will depend on real world scenario I would say.

You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends. In this example it's sort of easy to see. As the code we are talking about gets larger, it's not as easy anymore. So you have to make sure to put a new comment above every 'section'. The problem is that this can be forgotten. Then I need to actually read and fully understand the code to figure out these boundaries. I can no longer just tell my editor to jump over something. I can no longer have the compiler ensure that the boundaries are set (it will ensure proper function definitions and calls).


> The collision detection function might be a private method to that class you're in though.

Definitely, making things private helps a lot, although it's worth noting that classes often aren't maintained by only one person, and they often encapsulate multiple public methods and behaviors. It's still possible to clutter a class with private methods and to have other people working on that class call them incorrectly. This is especially true for methods that mutate private state (at least, in my experience), because those state mutations and state assumptions are often not obvious and are undocumented unless you read the implementation code (and private methods tend to be less documented than public methods in my experience).

Writing in a more functional style (even inside of a class) can help mitigate that problem quite a bit since you get rid of a lot of the problematic hidden state, but I don't want to give the impression that if you make a method private that's always safe and it'll never get misused.

> You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends.

In this example, I felt like it was overkill to include a closing comment, since the whole thing is like 20 lines of code. But you could definitely add a closing comment here. If you use an editor that supports regions, they're pretty handy for collapsing logic as well. That's a bit language dependent though. If you're using something like C# -- C# has fantastic region support in Visual Studio. Other languages may vary.

Of course, people who don't use an IDE can't collapse your regions, but in my experience people who don't use an IDE also often hate jumping between function definitions since they need to manually find the files or grep for the function name, so I'm somewhat doubtful they'll be too upset in either case.

> I can no longer have the compiler ensure that the boundaries are set

You may already know this, but heads up that if you're worried about scope leaking and boundaries, check if your language of choice supports block expressions or an equivalent. Languages like Rust and Go can allow you to scope arbitrary blocks of code, C (when compiled with gcc) supports statement expressions, and many other languages like Javascript support anonymous/inline functions. Even if you are separating a lot of your code into different functions, it's still nice to be able to occasionally take advantage of those features. I often like to avoid the extra indentation in my code if I can help it, but that's just my own visual preference.
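For what it's worth, JavaScript also supports bare blocks: let/const declarations are scoped to the nearest braces, so you can get the boundary without a wrapper function. A small sketch:

  const data = [3, 1, 2];
  let total;
  {
    // tmp and n are invisible past the closing brace
    let tmp = 0;
    const n = data.length;
    for (let i = 0; i < n; i++) tmp += data[i];
    total = tmp;
  }
  console.log(total);  // 6
  // console.log(tmp); // ReferenceError: tmp is not defined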


As mainly a bottom-up person, I completely agree with your analysis but I wonder if you might be using "top-down architecture" here in an overloaded way?

My personal style is bottom up, maximally direct code, aiming for monolithic modules under 10kloc, combined with module coupling over very narrow interfaces. Generally the narrow interfaces emerge from finding the "natural grain" of the module after writing it, not from some a priori top-down idea of how the communication pathways should be shaped.

Edit: an example of a narrow interface might be having a 10kloc quantitative trading strategy module that communicates with some larger system only by reading off a queue of things that might need to be traded, and writing to a queue of desired actions.


I never thought of things this way but it is a useful perspective.


> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

That's only 1 part of the complexity equation.

When you have 100 lines in 1 function, you know exactly the order in which each line will happen, and under which conditions, just by looking at it.

If you split it into 10 functions of 10 lines each, now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3,628,800 possible orderings is happening without reading through them.

Short functions are good when they fit the problem. Often they don't.


I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.


Well, if they were one function before, they probably share some state.

Clean Code recommends turning that function into a class and promoting the shared state from local variables into fields. After such a "refactoring" you get a nice puzzle trying to understand what exactly happens.
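A sketch of what that refactoring tends to look like in practice (names invented, JavaScript for brevity):

  class ReportGenerator {
    generate(records) {
      this.records = records;  // former locals, promoted to fields
      this.total = 0;
      this.computeTotal();
      this.buildLines();
      return this.lines.join("\n");
    }
    computeTotal() { for (const r of this.records) this.total += r.amount; }
    buildLines() { this.lines = ["REPORT", `total: ${this.total}`]; }
  }

Each method now reads and writes fields whose valid lifetimes are defined only by the call order inside generate() - that ordering used to be visible as plain top-to-bottom statements.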


I've seen threads on this before, but the "goto" (couldn't stop myself) reaching for object-orientedness to "solve" everything is really frustrating.

I've found the single greatest contributor to more readable and maintainable code is to limit state as much as possible.

Which was really hard for me to learn because it can be somewhat less efficient, and my game programmer upbringing hates it.


Sometimes eliminating state can also mean increasing complexity and lines of code tremendously.


A lot depends on what your language and its ecosystem can support. For instance, the kind of monadic stuff people do with Haskell and Scala can compress programs tremendously, but then I've worked in a codebase that tried the same things in C++ - and there, the line count expands, because the language just can't express some of the necessary concepts in a concise way.


> if they were one function before they probably share some state

and this is exactly why you refactor to pull the shared state out into parameters, so that each of the "subfunctions" has zero side effects.
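A minimal sketch of that refactoring (invented example):

  // Each step takes what it needs and returns what it produces.
  function subtotal(items) {
    return items.reduce((sum, item) => sum + item.price, 0);
  }
  function withTax(amount, rate) {
    return amount * (1 + rate);
  }
  function invoiceTotal(items, taxRate) {
    return withTax(subtotal(items), taxRate); // ordering is explicit at the call site
  }

The data flow now is the call graph, so the factorial-of-orderings objection mostly evaporates.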


In JavaScript I sometimes break up the behaviour of a large function by putting small internal functions inside it. Those internal functions often have side effects, mutating the state of the outer function which contains them.

I find this approach a decent balance between having lots of small functions and having one big function. The result is self contained (like a function). It has the API of a function, and it can be read top to bottom. But you still get many of the readability benefits of small functions - like each of the internal methods can be named, and they’re simple and each one captures a specific thought / action.
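Roughly this shape (illustrative only, not code from a real project):

  function buildReport(records) {
    const lines = [];

    function addLine(text) { lines.push(text); }  // mutates the outer state
    function addSection(title, items) {           // a named, reusable step
      addLine(title.toUpperCase());
      for (const item of items) addLine("  " + item);
    }

    addSection("failed", records.filter(r => r.error).map(r => r.name));
    addSection("passed", records.filter(r => !r.error).map(r => r.name));
    return lines.join("\n");
  }

The inner functions read like named paragraphs, but the whole unit still has a single function's API.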


If you're calling those functions once each in a particular order then I can't possibly figure out what that does for you that whitespace and a few comments wouldn't. How does turning 100 lines of code into 120 and shuffling it out of execution order possibly make it easier to read?


I coded this way for a while and found it makes the code easier to read and easier to reason about. Instead of your function being

  func foo() {
    // do A.1
    // do A.2
    // do B.1
    // do B.2
    // etc...
  }
It becomes

  func foo() {
    // do A
    // do B
    // etc...
    // func A()...
    // func B()...
  }
When the func is doing something fairly complicated, the savings can really add up. It also makes expressing some concurrency patterns easier (parallel, series, etc.); I used to do this a lot back in the async.js days. The main downside seems to be less elegant automated testing, because of all the internal state.


No; I wouldn't do it if I was just calling them once each in a particular order. And I don't often use this trick for simple functions. But sometimes long functions have repeated behaviour.

For example, in this case I needed to do two recursive tree walks inside this function, so each walk was expressed as an inner function which recursively called itself, and each is called once from the top level method:

https://github.com/ottypes/json1/blob/05ef789cc697888802e786...

I don't do it like this often though. The code in this file is easily the most complex code I've written in nearly 3 decades of programming. Here my ability to read and edit the code is easily the most important factor. I think this form makes the core algorithm clearer than any other way I could factor this code. I considered smearing this internal logic out over several top-level methods, but I doubt that would be any easier to read.
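Boiled down to a toy example, the shape is something like this (not the linked code):

  function countLeaves(tree) {
    let leaves = 0;

    function walk(node) {  // inner function that recurses on itself
      if (!node.children || node.children.length === 0) { leaves += 1; return; }
      for (const child of node.children) walk(child);
    }

    walk(tree);            // called once from the top level
    return leaves;
  }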



Aren't you creating new functions on each call to your parent function though? I imagine there must be a performance or memory penalty?


Now this is, in my opinion, usually not good advice (it is like reintroducing global variables), as unnecessary state certainly makes things more difficult to reason about.

I have read the book (not very recently) and I do not recall this but perhaps I am just immune to such advice.

I like his book about refactoring more than Clean Code but it introduced me to some good principles like SOLID (a good mnemonic), so I found it somewhat useful.


Yes and no.

What I find is that function boundaries have a bunch of hidden assumptions we don't think about.

Especially things like exceptions.

For all these utility functions, are you going to check input variables, which means doing it over and over again? Catching exceptions everywhere, etc.?

A function can be written for a 'narrow use case' - but when it's actually made available to other parts of the system, it needs to be somewhat more generalized.

This is the problem.

Is it possible that 'nested functions' could provide a solution? As in, you only call the function once, in the context of some other function, so why not physically put it there?

It can have its own stack and be tested separately if needed, but it remains exclusive to the context it is in from a readability perspective - and you don't risk having it used for 'other things'.

You could even have an editor 'collapse' the function into a single line of code, to make the longer algorithm more readable.


The problem is abstraction isn't free. Sometimes it frees up your brain from unnecessary details and sometimes the implementation matters or the abstraction leaks.

Take even something as simple as Substring, a method we use all the time and one that's far clearer than most helper functions I've seen in codebases.

Is it Substring(string, index, length) or Substring(string, indexStart, indexEnd)?

What happens when you pass in "abc".Substring(0,4)? Do you get an exception, or "abc"?

What does Substring(0,-1) do? Or Substring(-2,-3)?

What happens when you call it on null? Sometimes this matters, sometimes it doesn't.
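JavaScript alone answers these questions differently depending on which of its two built-in helpers you reach for:

  "abc".substring(0, 4);  // "abc" - out-of-range end is clamped, no exception
  "abc".substring(0, -1); // ""    - negative arguments are treated as 0
  "abc".slice(0, -1);     // "ab"  - negative arguments count from the end
  "abc".slice(-2, -3);    // ""    - empty once start >= end after conversion

And calling either one on null/undefined throws a TypeError. None of this is visible in the name.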


Also:

- Does it destructively modify the argument, or return a substring? Or both?

- If it returns a substring, is it a view over the original string, or a fresh substring that doesn't share memory with the original?

- If it returns a fresh substring, how does it do it? Is it smart or dumb about allocations? This almost never matters, except when it does.

- How does it handle multibyte characters? Do locales impact it in any way?

With the languages we have today, a big part of the function contract cannot be explicitly expressed in function signatures. And it only gets worse with more complicated tools of abstraction.


I posted this elsewhere in the thread, but local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:

    (reads: var_1, var_2; mutates: var_3) {
       var_3 = var_1 + var_2
       int result_value = var_1 * var_2
    } (exports: result_value)

    return result_value * 5
There are a couple of newer languages experimenting with concepts like this, Jai being one: https://youtu.be/5Nc68IdNKdg?t=3493


This is a fascinating idea. In some languages like C or Java or C#, the IDE could probably do this "for free": generate the annotations, then the programmer can spot-check for surprises. Or the reverse: highlight a block of code and ask the IDE to tell you about reads/mutates/exports. In some sense, when you use automatic refactoring tools (like IntelliJ), extracting a few lines of code as a new method requires similar static analysis.

In the latest IntelliJ, the IDE will visually hint about mutable, primitive-typed local variables (including method parameters). A good example is a for-loop variable (i/j/k). The IDE makes it stand out. When I write Java, I try to use final everywhere for primitive-typed local variables. (I borrowed this idea from functional programming styles.) The IDE gives me a hint if I accidentally forget to mark something as final.


> local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:

this is basically a lambda you call instantly.

    [&x, y, z] () {
        x = y + z;
    }();


It's similar, but lambdas don't specify the behaviour as precisely, and they're not as readable since the use of a lambda implies a different intention, and the syntax that transforms them into a scope block is very subtle. They may also have performance overhead depending on the environment, which is (arguably) additional information the programmer has to consider on usage.


>If you split it into 10 functions of 10 lines each, now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3,628,800 possible orderings is happening without reading through them.

It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

You're likely missing one or more techniques that make this work well:

1. Depth-first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top-to-bottom readability reasonable (see the sketch after this list).

2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls; instead having 4-5 method calls doing the same thing).

4. Explicit pre/post conditions in each method are called out due to the passing in of parameters and the return values. This more effectively helps a reader understand the lifecycle of relevant variables etc.
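A tiny sketch of technique 1 (names invented): each function is defined immediately after its first caller, so reading top to bottom roughly matches execution order.

  function handleUpload(file) {
    store(parse(file));
  }
  // depth-first: parse() comes right after its caller...
  function parse(file) {
    return file.trim().split(",");
  }
  // ...then store(), the next call made by handleUpload()
  function store(fields) {
    console.log("storing", fields);
  }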

In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.


> It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example.

One of the codebases I'm currently working on is a big example of that. I obviously can't share parts of it, but I'll say that I agree with the GP. Lots of tiny functions kill readability.

> 1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

> 2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

That's table stakes. Unfortunately, quite often a properly descriptive name would be 100+ characters long, which obviously nobody does.

> 3. Similar levels of abstraction in each function

That's a given, but in a way, each "layer" of such functions introduces its own sublevel of abstraction, so this leads to abstraction proliferation. Sometimes those abstractions are necessary, but I found it easier when I can handle them through few "deep" (as Ousterhout calls it) functions than a lot of "shallow" ones.

> 4. Explicit pre/post conditions in each method

These introduce a lot of redundant code, just so that the function can ensure a consistent state for itself. It's such a big overhead that, in practice, people skip those checks, and rely on everyone remembering that these functions are "internal" and had their preconditions already checked. Meanwhile, a bigger, multi-step function can check those preconditions once.
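The trade-off in miniature (invented example): validate once at the public boundary and keep the internal steps check-free.

  function parseRecord(input) {  // public entry point
    if (typeof input !== "string" || input.length === 0) {
      throw new Error("input must be a non-empty string");
    }
    return splitFields(normalize(input));  // helpers assume a valid string
  }
  function normalize(s) { return s.trim().toLowerCase(); }
  function splitFields(s) { return s.split(","); }

Nothing enforces that assumption for the helpers, though - which is exactly the fragility being pointed out.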


> Lots of tiny functions kill readability.

I've heard this argument a lot, and I've generally found there's another problem causing the lack of readability, not the small functions themselves.

>Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

Here, though, you're used to reading code upwards anyway, so flip the depth-first ordering and make it depth-last (or take the hit on the forward declarations). If you've got more of these than you can handle, your classes are probably too complex regardless (i.e. doing input, parsing, transformation, and output in the same method).

> quite often a properly descriptive name would be 100+ characters long

Generally, if this is the case, then the containing class / module / block / ? is too big. It's not a problem of small methods; the problem is at a higher level.

> Explicit pre/post conditions in each method

I should have been more explicit here - what I meant is that you know that in the first method only the first 3 variables matter, and those variables/parameters are not modified by, or relevant to, the rest of the method. Even without specifically coding pre/post-cons, you get a better feel for the intended isolation of each block. You fall into a pattern of writing code that is simple to reason about. Paired with pure methods / immutable variables, this tends to (IMO) generate easily scannable code. Code that looks like it does what it does, rather than code that requires reading every line to understand.


> You're likely missing one or more techniques that make this work well:

I know how to do it, I just don't always think it's worth it.

> Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

Not 100 lines, just 34, but it's a good example of a function I wouldn't split even if it got to 300 lines.

    function getFullParameters() {
        const result = {
            "gridType": { defaultValue: 1, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "gridSize": { defaultValue: 32, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "gridOpacity": { defaultValue: 40, randomFn: null, redraw: onlyOneRedraw("grid"), },
            "width": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
            "height": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
            "seed": { defaultValue: 1, randomFn: () => Math.round(Math.random() * 65536), redraw: allRedraws(), },
            "treeDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
            "stoneDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("stones"), },
            "twigsDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("twigs"), },
            "riverSize": { defaultValue: 3, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
            "roadSize": { defaultValue: 0, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
            "centerRandomness": { defaultValue: 20, randomFn: () => Math.round(30), redraw: onlyOneRedraw("trees"), },
            "leavedTreeProportion": { defaultValue: 95, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
            "treeSize": { defaultValue: 50, randomFn: () => Math.round(30) + Math.round(Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "treeColor": { defaultValue: 120, randomFn: () => Math.round(Math.random() * 65536), redraw: onlyOneRedraw("trees"), },
            "treeSeparation": { defaultValue: 40, randomFn: () => Math.round(80 + Math.random() * 20), redraw: onlyOneRedraw("trees"), },
            "serrationAmplitude": { defaultValue: 130, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "serrationFrequency": { defaultValue: 30, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
            "serrationRandomness": { defaultValue: 250, randomFn: () => Math.round(100), redraw: onlyOneRedraw("trees"), },
            "colorRandomness": { defaultValue: 30, randomFn: () => Math.round(20), redraw: onlyOneRedraw("trees"), },
            "clearings": { defaultValue: 9, randomFn: () => Math.round(3 + Math.random() * 10), redraw: onlyRedrawsAfter("clearings"), },
            "clearingSize": { defaultValue: 30, randomFn: () => Math.round(30 + Math.random() * 20), redraw: onlyRedrawsAfter("clearings"), },
            "treeSteps": { defaultValue: 2, randomFn: () => Math.round(3 + Math.random() * 2), redraw: onlyOneRedraw("trees"), },
            "backgroundNo": { defaultValue: 1, randomFn: null, redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "showColliders": { defaultValue: 0, randomFn: null, redraw: onlyOneRedraw("colliders"), },
            "grassLength": { defaultValue: 85, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "grassDensity": { defaultValue: 120, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "grassSpread": { defaultValue: 45, randomFn: () => Math.round(5 + Math.random() * 25), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
            "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
        };
        return result;
    }
There's a lot of value in having all of this in one place. Ordering isn't a problem here; there's just no need to refactor.


I have seen so much GUI code like this in my career! Real-world sophisticated GUIs can have tens or hundreds of attributes to set up. Especially ancient Xlib stuff - this was the norm. You have a few functions with maybe hundreds of lines doing pure GUI setup. No problem - easy to mentally compartmentalise.

Your deeper point (if I may theorise): Stop following hard-and-fast rules. Instead, do what makes sense and is easy to read and maintain.


> I know how to do it, I just don't always think it's worth it.

Agreed :)

Generally, no problem with this method, other than it's difficult to see at a glance what each item will be set to. Something like the following might be an easy first step:

    function getFullParameters() {
      // JS doesn't overload by arity, so one helper with an optional randomFn
      function param(defaultValue, randomFn, redraw) {
        if (redraw === undefined) { redraw = randomFn; randomFn = null; }
        return { defaultValue, randomFn, redraw };
      }
      const result = {
        "gridType": param(1, onlyOneRedraw("grid")),
        "gridSize": param(32, onlyOneRedraw("grid")),
        "gridOpacity": param(40, onlyOneRedraw("grid")),
        "width": param(1024, allRedraws()),
        "height": param(1024, allRedraws()),
        "seed": param(1, () => Math.round(Math.random() * 65536), allRedraws()),
        "treeDensity": param(40, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
        "stoneDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("stones")),
        "twigsDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("twigs")),
        "riverSize": param(3, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
        "roadSize": param(0, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
        "centerRandomness": param(20, () => Math.round(30), onlyOneRedraw("trees")),
        "leavedTreeProportion": param(95, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
        "treeSize": param(50, () => Math.round(30) + Math.round(Math.random() * 40), onlyOneRedraw("trees")),
        "treeColor": param(120, () => Math.round(Math.random() * 65536), onlyOneRedraw("trees")),
        "treeSeparation": param(40, () => Math.round(80 + Math.random() * 20), onlyOneRedraw("trees")),
        "serrationAmplitude": param(130, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
        "serrationFrequency": param(30, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
        "serrationRandomness": param(250, () => Math.round(100), onlyOneRedraw("trees")),
        "colorRandomness": param(30, () => Math.round(20), onlyOneRedraw("trees")),
        "clearings": param(9, () => Math.round(3 + Math.random() * 10), onlyRedrawsAfter("clearings")),
        "clearingSize": param(30, () => Math.round(30 + Math.random() * 20), onlyRedrawsAfter("clearings")),
        "treeSteps": param(2, () => Math.round(3 + Math.random() * 2), onlyOneRedraw("trees")),
        "backgroundNo": param(1, onlyTheseRedraws(["background", "backgroundCover"])),
        "showColliders": param(0, onlyOneRedraw("colliders")),
        "grassLength": param(85, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
        "grassDensity": param(120, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
        "grassSpread": param(45, () => Math.round(5 + Math.random() * 25), onlyTheseRedraws(["background", "backgroundCover"])),
        "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
      };
      return result;
    }
For someone looking at this for the first time, the rationale for each random function choice is opaque, so you might consider pulling out each type of random function into something descriptive like randomIntUpto(65536), randomDensity(20, 5), randomIntRange(30, 70).
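One possible shape for those helpers (using the hypothetical names above):

  const randomIntUpto = (max) => () => Math.round(Math.random() * max);
  const randomIntRange = (lo, hi) => () => lo + Math.round(Math.random() * (hi - lo));

  // e.g.
  //   "seed":     param(1, randomIntUpto(65536), allRedraws()),
  //   "treeSize": param(50, randomIntRange(30, 70), onlyOneRedraw("trees")),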

Does it add value? Maybe - ask a junior to review the two and see which they prefer maintaining. Regardless, this code mostly exists at a single level of abstraction, which tends to imply simple refactorings rather than complex.

My guess is if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map right...?


> My guess is if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map right...?

This was handling UI dependencies for https://ajuc.github.io/outdoorsBattlemapGenerator/

Basically I wanted to redraw as little as possible, so I built a dependency graph.

But then I wanted to add more parameters and to group them, so I can have many different kinds of trees without hardcoding their parameters. It was mostly a UI problem, not a refactoring problem. So I'm rewriting it like this:

https://ajuc.github.io/kartograf/

The graph editor keeps my dependencies for me, and the user can copy-paste 20 different kinds of trees and play with their parameters independently. And I don't need to write any code - a library handles it for me :)

Also, now I can add an interpolate node which takes 2 configurations and a number and interpolates the result between them. So I can have high grass go smoothly to low grass while trees go from one kind to another.


I am surprised that this is the top answer (Edit: at the moment, was)

How does splitting code into multiple functions suddenly change the order of the code?

I would expect that these functions would still be called in a very specific order.

And sometimes it does not even make sense to keep this order.

But here is a little example (in made-up pseudo code):

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++)
      total = total + values[i]
      max = values[i] > max ? values[i] : max
    return total - max
===>

  function positiveInt max(positiveInt[] values)
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++) 
      max = values[i] > max ? values[i] : max
    return max

  function positiveInt total(positiveInt[] values)
    positiveInt total = 0
    for (positiveInt i=0; i < values.length; i++) 
      total = total + values[i]
    return total

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    return total(values)-max(values)
Better? No?


> How does splitting code into multiple functions suddenly change the order of the code?

Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudocode is changed:

  01. function positiveInt max(positiveInt[] values)
  02.   positiveInt max = 0
  03.   for (positiveInt i=0; i < values.length; i++) 
  04.     max = values[i] > max ? values[i] : max
  05.   return max

  07. function positiveInt total(positiveInt[] values)
  08.   positiveInt total = 0
  09.   for (positiveInt i=0; i < values.length; i++) 
  10.     total = total + values[i]
  11.   return total

  12. function positiveInt calcMeaningOfLife(positiveInt[] values)
  13.   return total(values) - max(values)

Your modern compiler will take care of the order in which the code is executed, but humans need to trace the code line-by-line as [12, 13, 07, 08, 09, 10, 11, 01, 02, 03, 04, 05]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order.

  01. function positiveInt calcMeaningOfLife(positiveInt[] values)
  02.   positiveInt total = 0
  03.   positiveInt max = 0
  04.   for (positiveInt i=0; i < values.length; i++) 
  05.     total = total + values[i]
  06.     max = values[i] > max ? values[i] : max
  07.   return total - max
> Better? No?

In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function bodies.

However, depending on the size of the codebase, the complexity of the surrounding functions, and the location of the two helper functions, it's easy to see that this might not always be the case.

If you are trying to understand the code for the first time, or if you are trying to trace down some complex bug, there's a chance that having all the code inline would help you.

Further, splitting up a large inline function is easier than reassembling many small functions (hope you've got your unit tests!).

> And sometimes it does not even make sense to keep this order.

Agreed. But naming and abstractions are not trivial problems. Oftentimes it's the larger/more complex codebases where you see these practices applied most dogmatically.


Well, inlining by the compiler is to be expected, but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

Splitting the code into smaller functions does not automatically warrant a better design, it is just one heuristic.

A naive implementation of the principle could perhaps have found a less optimal solution

  function positiveInt maxOf(positiveInt value1, positiveInt value2)
    return value1 > value2 ? value1 : value2

  function positiveInt totalOf(positiveInt value1, positiveInt value2)
    return value1 + value2

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++)
      total = totalOf(total, values[i])
      max = maxOf(max, values[i])
    return total - max
Now, this is a trivial example, but we can imagine that instead of maxOf and totalOf we have some more complex calculations, or even calls to some external system (a database, an API, etc.).

When faced with a bug, I would certainly prefer the refactoring in the GP comment to the one here (or the initial implementation).

I think that when inlining feels strictly necessary, there has been a problem with boundary definition, but I agree that being able to view a single execution path inlined can help in understanding the implementation.

I completely agree that naming and abstractions are perhaps the two most complicated problems.


> but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

That's the thing, isn't it? Various arguments have been raised all across this thread, so I just want to put a spotlight on this principle, and say:

I myself, based on my prior experience, find code with a few larger functions much more readable than code with lots of small functions. In fact, I'd like a tool that could perform the inlining described by the GP for me whenever I'm working in a codebase that follows the "lots of tiny functions" pattern.

Perhaps this is how my brain is wired, but when I try to understand unfamiliar code, the first thing I want to know is what it actually does, step by step, at low level, and only then, how these actions are structured into helpful abstractions. I need to see the lower levels before I'm comfortable with the higher ones. That's probably why I sometimes use step-by-step debugging as an aid to understanding the code...


>the first thing I want to know is what it actually does, step by step, at low level

I feel like we might be touching on some core differences between the top-down guys and the bottom-up guys. When I read low level code, what I'm trying to do is figure out what this code accomplishes, distinct from "what it's doing". Once I figure it out and can sum up its purpose in a short slogan, I mentally paper over that section with the slogan. Essentially I am reconstructing the higher level narrative from the low level code.

And this is precisely why I advocate for more abstractions with names that describe its behavior; if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code. I wonder if how the bottom-up guys understand code is substantially different? Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code, or does your mental model mostly hold on to the low level detail even when reasoning about the high level?


> Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code,

It does!

> or does your mental model mostly hold on to the low level detail even when reasoning about the high level?

It does too!

What I mean is, I do what you described in your first paragraph - trying to see happens at the low level, and build up some abstractions/narrative to paper it over. However, I still keep the low-level details in the back of my mind, and they inform my reasoning when working at higher levels.

> if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code

I feel the same way. I'm really grateful for good abstractions, clean structure and proper naming. But I naturally tend to not take them at face value. That is, I'll provisionally accept the code is what it says it is, but I feel much more comfortable when I can look under the hood and confirm it. This practice of spot-checking implementation saved me plenty of times from bad naming/bad abstractions, so I feel it's necessary.

Beyond that, I generally feel uncomfortable about code if I can't translate it to low-level in my head. That's the inverse of your first paragraph. When I look at high-level code, my brain naturally tries to "anchor it to reality" - translate it into something at the level of sequential C, step through it, and see if it makes sense. So for example, when I see:

  foo = reduce(map(bar, fn), fn2)
My mind reads it as both:

- "Convert items in 'bar' via 'fn' and then aggregate via 'fn2'", and

- "Loop over 'bar', applying 'fn' to each element, then make an accumulator, initialize it to first element of result, loop over results, setting the accumulator to 'fn2(accumulator, element)', and return that - or equivalent but more optimized version".

To be able to construct the second implementation, I need to know how 'map' and 'reduce' actually work, at least on the "sequential C pseudocode" level. If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code. Like floating above the cloud cover, not knowing where I am. I can still work like this, I just feel very insecure.
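
For instance, that second, desugared reading corresponds to something like this sketch (assuming, as many reduce implementations without an initial value do, that the input is non-empty):

  // the "sequential C pseudocode" view of reduce(map(bar, fn), fn2)
  function mapReduce<A, B>(
    bar: A[],
    fn: (a: A) => B,
    fn2: (acc: B, x: B) => B
  ): B {
    const mapped: B[] = [];
    for (const item of bar) mapped.push(fn(item)); // map: apply fn to each element
    let acc = mapped[0]; // accumulator starts as the first mapped element
    for (let i = 1; i < mapped.length; i++) acc = fn2(acc, mapped[i]); // reduce via fn2
    return acc;
  }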

One particular example I remember: I was very uncomfortable with Prolog when I was learning it in university, until one day I read a chapter about implementing some of its core features in Lisp. When I saw how Prolog's magic works internally, it all immediately clicked, and I could suddenly reason about Prolog code quite comfortably, and express ideas at its level of abstraction.

One side benefit of having a simultaneous high and low-level view is, I have a good feel about the lower bound of performance of any code I write. Like in the map/reduce example above: I know how map and reduce are implemented, so I know that the base complexity will be at least O(n), how complexity of `fn` and `fn2` influence it, how the data access pattern will look like, how memory allocation will look like, etc.

Perhaps performance is where my way of looking at things comes from - I started programming because I wanted to write games, so I was performance-conscious from the start.


>If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code.

This is probably the biggest difference with myself. If I have a clear concept of how the abstractions operate in the context of the related abstractions and the big picture, I feel perfectly comfortable not knowing the details of how it gets done at a lower level. To me, the details just get in the way of comprehending the big picture.


A common problem with code written like that is checking the same preconditions repeatedly (or worse - never) and transforming data one way and back for no reason. I remember a bug I helped a fresh graduate who had joined our project fix. It crashed with an NPE when a list was empty, which was weird because an empty list should cause an IndexOutOfBounds if anything, and the poor guy was stumped.

I looked at the call stack: we got a list as input, then it was changed to null if it was empty, then it was checked for size, and in yet another function it was dereferenced and indexed.

The guy was trying to fix it by adding yet another if-then-else five levels down the call stack from the first place it was checked for size. No doubt another intern would then have added even more checks ;)

If you don't know what happens to your data in your program you're doing voodoo programming.


There's certainly some difference in priorities between massive 1000-programmer projects where complexity must be aggressively managed and, say, a 3-person team making a simple web app. Different projects will have a different sweet spot in terms of structural complexity versus function complexity. I've seen code that, IMO, misses the sweet spot in either direction.

Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.

Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.

I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.

Just for a tangible example: If I have to iterate over a 3D data structure with an X Y and Z dimension, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which is effort with no benefit.
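
For the sake of concreteness, a sketch of that kind of function (hypothetical; summing is just stand-in work):

  // three nested loops over a dense 3-D grid: arguably clearer as-is
  // than with per-axis helper functions extracted out
  function sumGrid(grid: number[][][]): number {
    let total = 0;
    for (let x = 0; x < grid.length; x++)
      for (let y = 0; y < grid[x].length; y++)
        for (let z = 0; z < grid[x][y].length; z++)
          total += grid[x][y][z];
    return total;
  }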


Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

Yeah, it's a lot of work, but working on recent projects has really taught me the value of good documentation. Naming a function send_records_to_database is fine, but the name can't tell you how the function determines which database to send the records to, or how it deals with failed records (if at all), or its various alternative use cases. All of that must come from documentation (or from reading the function's source).
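
To sketch what that level of documentation might look like (the behavior described here is invented for illustration, not a real API):

  /**
   * Sends records to the database named by the DB_URL environment
   * variable. Records that fail to insert are retried up to maxRetries
   * times, then logged and dropped. The input array is never modified.
   * (All of these details are hypothetical; the point is that none of
   * them fit in the function's name.)
   */
  function send_records_to_database(records: object[], maxRetries = 3): void {
    // ... implementation elided
  }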

Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> from <environment variable>", you have to spend some time considering whether future users will find that to be a sound decision.


> documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.


What's the alternative? Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

You can argue that there should be separate functions for `send_to_database` and `lock_database` and `format_data_for_database` and `handle_db_error`. But you're still going to have to document the same stuff. You're still going to have to remind people to lock the database in some situations. You're still going to have to worry about people forgetting to call one of those functions.

And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.


> Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

Invert the dependencies. After many years of programming I started deliberately asking myself "hmm, what if, instead of A calling B, B were to call A?" and now it's become part of my regular design and refactoring thinking. See also Resource Acquisition Is Initialization.
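
As a minimal sketch of that inversion (all names hypothetical): instead of an orchestrator A calling a reader B and then looping over whatever B returns, B takes a callback and calls back into A, so the iteration and its preconditions live in one place:

  // B owns iteration and its preconditions, and calls A's code per record
  function forEachRecord(lines: string[], onRecord: (line: string) => void): void {
    for (const line of lines) {
      if (line.trim() !== "") onRecord(line); // the empty-record check lives here, once
    }
  }

  // A supplies only the per-record behavior
  forEachRecord(["alpha", "", "beta"], (line) => console.log(line));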


> See also Resource Acquisition Is Initialization.

I'm not sure I follow. RAII removes the ability to accidentally forget to call destruction/initialization code and allows managing resource lifecycle. It doesn't remove the need to document how that code works, it just means you're now documenting it as part of the class/block. Freeing a resource during a destructor, locking the database during a constructor -- that stuff still has to be documented the same way it would have been documented if you put it into a single function instead of a single class.

Even with dependency inversion, you still end up eventually with the same problem I brought up:

> And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.

Maybe you call your functions in a different order or way, maybe you invert the dependency chain so your smaller functions are getting passed references to the bigger ones. You're still running the same amount of code, you haven't gotten rid of your documentation requirements.

Unless I'm misunderstanding what you mean by inversion of dependencies. Most of the dependency inversion systems I've seen in the wild increase the number of interfaces in code because they're trying to reduce coupling, which in turn increases the need to document those interfaces. But it's possible I've only seen a subset, or that you're doing something different.


> increase the number of interfaces in code because they're trying to reduce coupling

Yes, exactly! You want lots of interfaces. You want very small interfaces.

> which in turn increases the need to document those interfaces.

Not if the interfaces are small. For example, in the Go language standard library we find two interfaces: io.Reader and io.Writer. They each define a single method. In the case of io.Reader, that method is defined as Read(p []byte) (n int, err error) and correspondingly io.Writer has Write(p []byte) (n int, err error)

These interfaces are so small they barely need documentation.
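
A rough TypeScript analogue, purely to illustrate the shape (not any real library's API):

  // single-method interfaces: the whole contract fits on one line each
  interface Reader {
    read(buf: Uint8Array): number; // returns the count of bytes read
  }

  interface Writer {
    write(buf: Uint8Array): number; // returns the count of bytes written
  }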


> These interfaces are so small they barely need documentation.

Sort of.

On the other end of the dependency inversion chain, there is some code that implements those interfaces. That code comes with various caveats that need to be documented.

Then there's the glue code, the orchestration - the part that picks a concrete thing, makes it conform to a desired interface, and passes it to the component which needs it. In order to do its job correctly, this orchestrating code needs to know all the various caveats of the concrete implementation, and all the idiosyncratic demands of the desired interface. When writing this part you may suddenly discover that your glue code is buggy, because the "trivial" interface was thoroughly undocumented.


My style is similar when it comes to tiny interfaces: my usual style in Java is an interface with a single method and a nested POJO (struct) called Result. Then I have a single implementation in production and another implementation for testing (mocking from the 2010s onward). Some of my longer-lived projects might have 100s of these after a few years.

Please enjoy this silly, but illustrative example!

  public interface HerdingCatsService {

      /*public static*/ final class Result {
          ...
      }

      Result herdThemCats(ABunchOfCats soMuchFun)
          throws Exception;
  }


Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would have the use cases separated from the send_records_to_database so it would be obvious what the records were and how failure conditions are handled.


How else are you going to understand how a library works besides RTFM or RTFC? I guess the third option is copy-pasta from Stack Overflow and hoping your use case doesn't require any significant deviation?

You seriously never have to read documentation?

Must be nice, I've been balls-deep in GCP libraries and even simple things like pulling from a PubSub topic have footguns and undocumented features in certain library calls. Like subscriber.subscribe returns a future that triggers a callback function for each polled message, while subscriber.pull returns an array of messages.

That's a pretty damn obvious case where the functions should have been named "obviously" (pull_async, pull_sync), yet they weren't. And that's in a very widely used service from one of the biggest tech companies out there, written by a person who presumably passed one of the hardest interviews in the industry and gets paid in, like, the top 1% of developers.

Without documentation, I would have never figured those out.


"Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."

This, this, and... this.

Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.


Documentation is only useful if it is up to date and correct. I ignore documentation because I've never found the above to be true.

There are contract/proof systems that seem like they might help. At least the tool ensures the documentation is correct. However, I'm not sure if such systems are readable. (I've never used one in the real world.)


Oh, I agree, but a person who won't take the time to update documentation after a significant change certainly isn't going to refactor the code such that the method name matches the updated functionality. That's assuming they could even update the name if they wanted to.

After all, documentation is cheap. If you're going to write a commit message, why not also update the function docs with pretty much the same thing? "Filename parameter will now use S3 if an appropriate URI is passed (i.e., filename='s3://bucket/object/path.txt'). Note: doesn't work with path-style URLs."


Ignore, as in you don't write any?


> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects.

Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.

Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.


>Code doesn't tell stories, programmers do. Code can't explain why it exists;

Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.

Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.


Code is not primarily written for other programmers. It's written for the computer, the primary purpose is to tell the computer what to do. Readability is desirable, but inherently secondary to that concern, and abstraction often interferes with your ability to understand and express what is actually happening on the silicon - even if it improves your ability to communicate the abstract problem. Is that worth it? It's not straightforward.

An overemphasis on readability is how you get problems like "Twitter crashing not just the tab but people's entire browser for multiple years". Silicon is hard to understand, but hiding it behind abstractions also hides the fundamental territory you're operating in. By introducing abstractions, you may make high-level problems easier to tackle, but you make it much harder to tackle low-level problems that inevitably bubble up.

A good symptom of this is that the vast majority of JS developers don't even know what a cache miss is, or how expensive it is. They don't know that linearly traversing an array is thousands of times faster than linearly traversing a (fragmented) linked list. They operate in such an abstract land that they've never had to grapple with the actual nature of the hardware they're operating on. Performance issues that arise as a result of that are a great example of readability obscuring the fundamental problem.


>Code is not primarily written for other programmers.

I should have said code should be written primarily for other programmers. There are an infinite number of ways to express the same program, and the computer is indifferent to which one it is given. But only a select few are easily understood by another human. Code should be optimized for human readability barring overriding constraints. Granted, in some contexts efficiency is more important than readability down the line. But such contexts are few and far between. Most code does not need to consider the state of the CPU cache, for example.


Joel Spolsky opened my eyes to this issue: Code is read more than it is written. In theory, code is written once (then touched-up for bugs). For 99.9% of its life, it is read-only. That is a strong case for writing readable code. I try to write my code so that a junior hire can read and maintain it -- from a technical view. (They might be clueless about the business logic, but that is fine.) Granted, I am not always successful in this goal!


Code should be written for debugability, not readability. I don't care if it takes someone 20 minutes to understand my algorithm, if when they understand it bugs become immediately obvious.

Most simplification added to your code obscures the underlying operations on the silicon. It's like writing a novel so a 5-year-old can read it, versus writing a novel for a 20-year-old. You want to communicate the same ideas? The kid's version is going to be hundreds of times longer. It's going to take longer to write, longer to read, and you're much more likely to make mistakes related to non-local dependencies. In fact, you're going to turn a lot of local dependencies into non-local dependencies.

Someone who's competent can digest much more complex input, so you can communicate a lot more in one go. Training wheels may make it so anyone can ride your bike but they also limit your ability to compete in, say, the Tour de France.

Also, this is a side note, but "code is read by programmers" is a bit of a platitude IMO - it's wordplay. Your code is also read by the computer a lot more than it's read by other programmers. Keep your secondary audience in mind, but write for your primary audience.


My point was not just about performance - a lot of bugs come from the introduction of abstractions to increase readability, because the underlying algorithms are obscured. Humans are just not that good at reading algorithms. Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem. Every time you add an abstraction, you increase the degree of misrepresentation. You can argue that's worth it because code is read a lot, but it's still a tradeoff.

But another point worth considering is that a lot of things that make code easier to read make it much harder to rewrite, and they can also make it harder to debug.


>Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem.

Do you have an example, as this is entirely counter to my experience. Of course, you can misrepresent the behavior in words, but then you just used the wrong words to describe what's going on. That's not an indictment of abstraction generally. Abstractions necessarily leave something out, but what is left out is not an assertion of absence. This is not a misrepresentation.


Let me try explaining a few ways:

1. ---

You don't need to assert absence, the abstraction inherently ignores that which is left out, and the reader remains ignorant of it (that's the point, in fact). The abstraction asserts that the information it captures is the most useful information, and arguably it asserts that it is the only relevant information. This may be correct, but it may also be wrong. If it's wrong, any bugs that result will be hard to solve, because the information necessary to understand how A links to B is deliberately removed in the path from A to B.

2. ---

An abstraction is a conceptual reformulation of the problem. Each layer of abstraction reformulates the problem. It's lossy compression. Each layer of abstraction is a lossy compression of a lossy compression. You want to minimise the layers because running the problem through multiple compressors loses a lot information and obscures the constraints of the fundamental problem.

3. ---

You don't know a-priori if the information your abstraction leaves out is important.

I would go further and argue: leaving out the wrong information is usually a disaster, and very hard to reverse. One way to avoid this is to avoid abstractions (not that I'd recommend it, but it's part of the tradeoff).

4. ---

Abstractions misrepresent by simplifying. For example, the fundamental problem you're solving is moving electrons through wires. There are specific problems that occur at that level of specificity which you aren't worried about once you introduce the abstraction of the CPU's ISA. For example, bit instability.

Do those problems disappear at the level of the ISA? No, you've just introduced an abstraction which hides them, and hopefully they don't bubble up. The introduction of that abstraction also added overhead, partly in order to ensure the lower-level problems don't bubble up.

Ok, let's go up a few levels. You're now using a programming language. One of your fundamental problems here is cache locality. Does your code trigger cache misses? Well, it's not always clear, and it becomes less clear the more layers of abstraction you add.

"But cache locality rarely matters," ok, but sometimes it does, and if you have 10 layers of abstraction, good luck solving that. Can you properly manage cache locality in Clojure? Not a chance. It's too abstract. What happens when your Clojure code is eventually too slow? You're fucked. The abstraction not only makes the problem hard to identify, it makes it impossible to solve.


Abstractions are about carving up the problem space into conceptual units to aid comprehension. But these abstractions do not suggest lower level details don't exist. What they do is provide sign posts from which one can navigate to the low level concern of interest. If I need to edit the code that reads from a file, ideally how the problem space is carved up allows me to zero-in on the right code by allowing me to eliminate irrelevant code from my search. It's a semantic b-tree search. Without this tower of abstractions, you have to read the entire codebase linearly to find the necessary points to edit. There's no way you can tell me this is more efficient.

Of course, not all problems are suited to this kind of conceptual division. Cross-cutting concerns are inherently the sort that cannot be isolated in a codebase. Your example of cache locality is a case in point. You simply have to scan the entire codebase to find instances where your code is violating cache locality. Abstractions inherently can't help, and do hurt somewhat in the sense that there's more code to read. But the benefits overall are worth it in most contexts.


I feel like you didn't really engage with most of what I said. It sounds like you're repeating what you were taught as an undergraduate (I hope that doesn't come across as crass).

I understand the standard justifications for abstraction - I'm saying: I have found that those justifications do not take into account or accurately describe the problems that result, and they definitely underestimate the severity. Repeatedly changing the shape of a problem until it is unrecognisable results in a monster, and it's not as easy to tame as our CS professors make out.

To reiterate: Twitter, with a development budget of billions was crashing people's entire browsers for multiple years. That's not even server-side, where the real complexity is - that's the client. That kind of issue simply should not exist, and it wouldn't if it were running on a (much) shallower stack.

This is a side note, but you keep referencing the necessity of the tower. Bear in mind what happens when you increase the branching factor on a tree. You don't need a tower to segment the problem effectively. 100-item units allow segmenting one million items with three layers, and 10 billion items with five. Larger units mean much, much fewer layers.


>I feel like you didn't really engage with most of what I said.

I didn't engage point-by-point because I strongly disagree with how you characterize abstractions and going point-by-point seemed like overkill. They don't misrepresent--they carve up. If you take the carving at a given layer as all there is to know, the mistake is yours. And this isn't something I was taught in school, rather I converged to this style of programming independently. My CS program taught CS concepts, we were responsible for discovering how to construct programs on our own. Most of the students struggled to complete moderately large assignments. I found them trivial, and I attribute this to being able to find the right set of abstractions for the problem. Find the right abstractions, and the mental load of the problem is never bigger than one moderately sized functional unit. This style of development has served me very well in my career. You will be hard-pressed to talk me out of it.

>Repeatedly changing the shape of a problem until it is unrecognisable results in a monster

I can accept some truth to this in low-level/embedded contexts where the "shape" of the physical machine is a relevant factor and so hiding this shape behind a domain-specific abstraction can cause problems. But most software projects can ignore the physical machine and program to a generic Turing-machine.

>You don't need a tower to segment the problem effectively

Agreed. Finding the right size of the functional units is critical. 100 interacting units is usually way too much. The right size for a functional unit is one where you can easily inspect it for correctness and be confident there are no bugs. As the functional unit gets larger, your ability to even be confident (let alone correct) falls off a cliff. A good set of abstractions is one where (1) the state being manipulated is made obvious at all times, (2) each functional unit is sized such that it can easily be inspected for correctness, and (3) each layer provides a non-trivial increase in resolution of the solution. I am as much against useless abstractions and endless indirection as anyone.


I don't think we're going to agree on this, so I'll just say that I do grok the approach you're advocating, I used to think like you, and I've deliberately migrated away from it. I used to chunk everything into 5ish-line functions that were very clean and very carefully named, being careful to encapsulate with clean objects with clearly-defined boundaries, etc. I moved away from that consciously.

I don't work in low-level or embedded (although I descend when necessary). My current project is a desktop accessibility application.

Like, I can boil a lot of our disagreement down to this:

> 100 interacting units is usually way too much.

I don't think this is true. It's dogma.

First, they aren't all interacting. Lines in a function don't interact with every other line (although you do want to bear in mind the potential combinatorial complexity for the reader). But more specifically: 100-line functions are absolutely readable most of the time, provided they were written by someone talented. The idea that they aren't is... Wrong, in my opinion. And they give you way more implementation flexibility because they don't force you into a structure defined by clean barriers. They allow you to instead write the most natural operation given the underlying datastructure.

Granted, you often won't be able to unit-test that function as easily, but unit tests are not the panacea everyone makes out, in my opinion. Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.


> 100-line functions are absolutely readable most of the time, provided they were written by someone talented.

Readable, sure. Easily inspected for correctness, not in most cases. The 100 lines won't all interact, but you don't know this until you look. So much mental effort is spent navigating the 100 lines to match braces, find where variables are defined, where they are in scope, and whether they are mutated elsewhere within the function, comprehend how state changes as the lines progress, find where errors can occur and ensure they are handled within the right block and that control flow continues or exits appropriately, and so on. So little of this is actually about understanding the code's function; it's about comprehending the incidental complexity due to its linear representation. This is bad. All of this incidental complexity makes it harder to reason about the code's correctness. Most of these incidental concerns can be eliminated through the proper use of abstractions.

The fact is, code is not written linearly nor is it executed linearly. Why should it be read linearly? There is a strong conceptual mismatch between how code is represented as linear files and its intrinsic structure as a DAG. Well structured abstractions help us move the needle of representation towards the intrinsic DAG structure. This is a win for comprehension.

>Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.

We do agree on something!


Honestly, this characterisation doesn't ring true to me at all. I find long functions much easier to read, inspect and think about than dutifully decomposed lasagne that forces me to jump around the codebase. But also, like... Scanning for matching braces? Who is writing your code? Indentation makes that extremely clear. And your IDE should have a number of tools for quickly establishing uses of a name, and scope.

The older I get, the more I think the vast majority of issues along the lines of "long code is hard to reason about" are just incompetent programmers being let loose on the codebase. Comment rot is another one - who on earth edits code without checking and modifying the surrounding comments? That's not an inherent feature of programming to me, it's crazy. However, I absolutely see comment rot in lasagne code - because the comments aren't proximate to the algorithm.

With regards to the idea that abstractions inherently misrepresent, I'll defer to Joel Spolsky for another point:

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...


> Code doesn't tell stories, programmers do

It is like saying that books do not tell stories, writers do.


It is, but GP's point is pretty clear. Perhaps a better way to express it would be: unlike natural languages, programming languages are insufficiently expressive for the code to tell the full story. That's why books tell stories, and code is - at best - Cliff's Notes.


Is code just a byproduct of specs then? Any thoughts on literate programming?


Literate programming is for programs that are static and don't ever change much. It works great for those cases, though.

No, what works is the same as what worked 20 years ago. Nothing has truly changed. You still have layers upon layers that sometimes pass something along, other times not, and you sometimes wish they passed something, other times not.


Your argument falls apart once you need to actually debug one of these monstrosities, as often the bug itself also gets spread out over half a dozen classes and functions, and it's not obvious where to fix it.

More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.


If a function is only called once, it should just be inlined; the IDE can collapse it, and a descriptive comment can replace the function name. It can be a lambda with an immediate call and explicit captures if you need to prevent the issue of not knowing which local variables it interacts with as the function grows significantly; or, if the concern is others using leftover variables, its body can go into a plain scope. Making you jump to a different area of the code just to read it breaks up the linear flow for no gain, especially when you often have to read it anyway to make sure it doesn't have global side effects - might as well read it in the single place it is used.
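
A minimal sketch of that immediately-invoked lambda, with the inputs made explicit as parameters rather than ambient captures (illustrative only):

  const prices = [3, 5, 8];

  // the parameter list documents exactly which locals this inline block
  // can touch; nothing else leaks in or out
  const total = ((values: number[]): number => {
    let sum = 0;
    for (const v of values) sum += v;
    return sum;
  })(prices);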

If it is going to be used more than once (and actually is), then make it a function (unless it is so trivial that the explicit inline version is more readable). If you are designing a public API where it may need to be overridden, count that as more than once.

Some of the above is language dependent.


I don't get this. This is literally what the 'one level of abstraction' rule is for.

If you can find a good name for a piece of code I don't need to read in detail, why do you want to make me skip from line 1458 to line 2345 to skip over the details of how you do that thing? And why would you add a comment on it instead of making it a function that is appropriately named and I don't have to break my reading flow to skip over a horrendously huge piece of code?


> why do you want to make me skip from line 1458 to line 2345 to skip over the

You should be using an editor that can jump to the matching brace if it is all in its own scope or lambda. There are other tools like #pragma region depending on language. For a big function of multiple large steps and I only wanted to look at a part of it I'd fold it at the first indent level for an overview and unfold the parts I want to look at. But when I'm reading through the whole thing or stepping through in the debugger it is terrible to make you jump around and needs much more sophisticated tooling to jump to the right places consistently in complicated languages like C++.

If there is a big long linear sequence of steps that you need to read, you just want to read it, not jump around because someone wanted to put a descriptive label over the steps. Just comment it, that's the label, not the function name, since it's only ever used once.

You would rarely want it in something like a class overview since it is only called once, but if you could make a case for needing that, profiling tools are limited to it, etc., then those could be reasons.


My editor can fold/jump, no issues there (though to be fair, vi can easily do it for languages that use {} for blocks, which Python, for example, is not). But it breaks my flow nonetheless. Instead, if I had a function, my reading flow would not be broken. I can skip over the details of how "calculateX()" is achieved. All I need to know at the higher level of abstraction is that, for this (hypothetical) piece of code to do its thing, it needs to calculate X at that step, and I can move on and see what it does with X. It is not important how X is calculated. If calculateX() were, say, a UUIDv4 calculation, would you want to inline that, or just call "uuid.v4()" and move on to the next line that does something interesting with that UUID?

You mention debuggers too. Here I can't jump as easily. I can still jump in various ways depending on the tooling, but it is made harder. With proper levels of abstraction, I can either step over the function call because I don't care _how_ calculateX() is done, or I can step in and debug through it because I've narrowed the problem down to something being wrong in said function.

Maybe you've just never had a properly abstracted codebase (none are perfect, of course, but there are definitely good and bad ones). Code can either make good use of these different levels of abstraction as per Clean Code, or it can throw in functions willy-nilly, badly named, with global state manipulation all over the place for good measure, side effects from functions that don't look like they'd have any, etc. If those are the only codebases you've worked with, I would understand your frustration. Still, I'd rather move towards a properly structured and abstracted codebase than inline everything and land in code-duplication hell.


Visual debuggers also let you click the line you want to skip to, but I agree that stepping over with a hotkey is faster.

Letting the debugger skip over it can be done with the immediately-invoked lambda approach.


> The best code is code you don't have to read because of well named functional boundaries.

I don't know which is harder. Explaining this about code, or about tests.

The people with no sense of DevX see nothing wrong with writing tests that fail as:

    Expected undefined to be "foo"
If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).

Make the test red before you make it green, so you know what the errors look like.
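
For example, a Jest-style sketch (parseConfig is a hypothetical helper): running the test red first shows whether a failure reads as an opaque "expected undefined to be 'foo'" or actually points at the broken field:

  test("parseConfig reads the service name from config.yaml", () => {
    const config = parseConfig("config.yaml"); // hypothetical function under test
    // opaque when it fails: "expected undefined to be 'foo'"
    expect(config.serviceName).toBe("foo");
    // better: on failure the diff names the missing field on the whole object
    expect(config).toMatchObject({ serviceName: "foo" });
  });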


Oh god. Or just the tests that are walls of text, mixes of mocks and initializers and constructors and method calls.

Like good god, extract that boilerplate into a function. Use comments and whitespace to break it up and explain the workflow.


I have a couple of people who use a wall of boilerplate to do something that 3 lines of mocks could handle without coupling the tests to each other in the process.

Every time I have to add a feature I end up rewriting the tests. But you know, code coverage, so yay.


I see this in basically any JavaScript test. Yes, mocking any random import is really cool and powerful, but for fuck's sake, can we just use a DI container so that the tests don't look like Satan's invocation?


> Make the test red before you make it green, so you know what the errors look like.

Oh! I like this. I never considered this particular reason why making tests fail first might be a good idea.


“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” ― C. A. R. Hoare

this quote scales


This quote does not scale. Software contains essential complexity because it was built to fulfill a need. You can make all of the beautiful, feature-impoverished designs you want - they won't make it to production, and I won't use them, because they don't do the thing.

If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.


But not everybody codes “at scale”. If you have a small, stable team, there is a lot less to worry about.

Secondly, it is often better to start with fewer abstractions and boundaries and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added earlier.


Coding at scale is not dependent on the number of people, but on the essential complexity of the problem. One can fail at a one-man project, given a sufficiently complex problem, due to a lack of proper abstraction. Like, try to write a compiler.


> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

That's fine in theory, and I still sort-of believe it, but in practice I've come to believe most programming languages are insufficiently expressive for this vision to be true.

Take, as a random example, this bit of C++:

  //...
  const auto foo = Frobnicate(bar, Quuxify);
Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration.

  template<typename Stuffs, typename Fn>
  auto Frobnicate(const std::vector<Stuffs>&, Fn)
    -> std::vector<Stuffs>;
From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into definition of Frobnicate.

I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has less than 2 elements, and it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.

If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...

Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.


This is a good point and I agree. In fact, I think this really touches on why I always had a hard time understanding C++ code. I first learned to program with C/C++ so I have no problem writing C++, but understanding other people's code has always been much more difficult than other languages. Its facilities for abstraction were (historically) subpar, and even things like aliased variables where you have to jump to the function definition just to see if the parameter will be modified really get in the way of easy comprehension. And then the nested template definitions. You're right that how well relying on well named functional boundaries works depends on the language, and languages aren't at the point where it can be completely relied on.


This is true but having good function names will at least help you avoid going two levels deep. Or N levels. Having a vague understanding of a function call’s purpose from its name helps because you have to trim the search tree somewhere.

Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?


> having good function names will at least help you avoid going two levels deep. Or N levels.

I agree. You have to trim your search space, or you'll never be able to do anything. What I was trying to say is, I don't know of a language that would allow you to rely only on function names/signatures. None that I've worked with could do that in practice.

> if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?

That's the reason I hate the "Clean Code"-ish pattern of lots of very tiny functions. I worked in a codebase written in this style, and doing anything with it felt like it was 90% jumping around function definitions, desperately trying to keep them all in my working memory.


I think part of the problem is imitating having abstraction boundaries without actually doing the work to make a clean abstraction. If you’re reading the source code of a function, the abstraction is failing.

The function calls you write will often “know too much,” depending on implementation details in a way that make the implementation harder to change. It’s okay if you can fix all the usages when needed.

Real abstraction boundaries are expensive and tend only to be done properly out of necessity (browser APIs, the Linux kernel interface). If you're reading a browser implementation instead of standards docs to write code, then you're doing it wrong, since other browsers, or a new version of the same browser, may be different.

Having lots of fake abstraction boundaries adds obfuscation via indirection.


One more angle: a reliable, internalized abstraction vs. an unfamiliar one.

Java's String is an abstraction over bytes. I feel I understand it intimately even though I have not read the implementation.

When I try to understand code fully (searching for a root cause) and there is a String.format(..), I don't dig deeper into String - I am already confident that I understand what that line does.

Browser and Linux APIs would, I guess, fall into the same category (for others).

An unfamiliar abstraction, even with seemingly good naming and documentation, will not inspire the same level of confidence. (I trust an unfamiliar abstraction's naming & docs the same way I trust a weather forecast.)


I think it may be harder still: typically, when writing against a third-party API, I usually consult that API's documentation. The documentation thus becomes a part of the abstraction boundary, a part that isn't expressed in code.


Oh definitely. And then there are performance considerations, where there are few guarantees and nobody even knows how to create an implementation-independent abstraction boundary.


Function names are comments, and have similar failure modes.


Comments that are limited to only two or three dozen characters at most, so worse than comments, IME.


You can put your prose at the top of the function if you really need to explain it more. :)


But it's easier to notice they're outdated, because you don't see them only when looking at the implementation.


> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

Which is often unavoidable; many functions are insufficiently explained by those alone unless you want four-word camelcase monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there - simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines; the cognitive load from the subject domains is bad enough without making the code arbitrarily shaped.

[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.

[edit] oh no, here come the downvotes!


> ...many functions are insufficiently explained by [naming and set of arguments] alone unless you want four-word camelcase monstrosities for names.

Come now, is four words really all that "monstrously" much?

> The code of the function should be right-sized.

Feels like that should go for its name too.

> Size and complexity need to be balanced there: simpler and easier-to-follow is sometimes larger.

The longer the code, the longer the name?


There's quite a bit of sentiment around against long names. Personally I'm fine with them up to about 30-35 chars or so; beyond that they start to really intrude. Glad you're not put off by choosing function over form!


Stretch it to 36, and that is four not-all-that short words, at 4× 9 = 36 letters. Form and function! :-)

So it gets monstrous only from five words upwards or so... But still, I think I may by sheer coincidence have come up with a heuristic (that I'm somewhat proud of): the more convoluted the logic, the longer the code needed to express it, and so the longer a name it "deserves".


I think we need to recognize the limits of this concept. To reach for an analogy, both Dr. Seuss and Tolstoy wrote well but I'd much rather inherit source code that reads like 10 pages of the former over 10 pages of the latter. You could be a genuine code-naming artist but at the end of the day all I want to do is render the damn HTML.


> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.

I don't have a universal rule on this; I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.

Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same number of lines of code still gets run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.

If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.

If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.

However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.

In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.


>> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Both of these things are not quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.

The interface and implementation of a function are separate. The former should be clearly documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).

> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

This - this is what everyone who advocates for "small functions" doesn't understand.


> all abstractions are somewhat leaky, but you shouldn't be piercing them regularly

I think this gets back to the old problem of "documentation is code that doesn't run." I'm not saying get rid of documentation -- I comment my code to an almost excessive degree, because I need to be able to remember in the future why I made certain decisions, I need to know what the list of tradeoffs were that went into a decision, I need to know if there are any potential bugs or edge-cases that I haven't tested for yet.

But what I am saying is that it is uncommon for an interface to be perfectly documented -- not just in code I write, but especially in 3rd-party libraries. It's not super-rare for me to need to dip into library source code to figure out behaviors that they haven't documented, or interfaces that changed between versions and aren't described anywhere. People struggle with good documentation.

Sometimes that's performance: if a 3rd-party library is slow, sometimes it's because of how it's implemented. I've run into that with d3 addons in the past, where changing how my data is formatted results in large performance gains, and only the implementation logic revealed that to me. Is that a leaky abstraction? Sure, I suppose, but it doesn't seem to be uncommon. Is it fragile? Sure, a bit, but I can't release charts that drop frames whenever they zoom just because I refuse to pay attention to the implementation code.

So I get what you're saying, but to me "abstractions shouldn't be leaking" is a bit like saying "code shouldn't have bugs", or "minor semver increases should have no breaking changes." I completely agree, but... it does, and they do. Relying on undocumented behavior is a problem, but sometimes documented behavior diverges from implementation. Sometimes the abstractions are so leaky that you don't have a choice.

And that's not just a problem with 3rd-party code, because I'm also not a perfect programmer, and sometimes my own documentation on internal methods diverges from my implementation. I try very hard not to have that happen, but I also try hard to compensate for the fact that I'm a human being who makes mistakes. I try to build systems that are less work to maintain and less prone to having their documentation decay over time. I've found that in code that I'm personally writing, it can be useful to sidestep the entire problem and inline the entire abstraction. Then I don't have to worry about fragility at all.

If you're not introducing a 3rd-party library or a separate interface for every measly 50 lines of code, and instead you just embed your single-use chunk of logic into the function that would have called it, then you never have to worry about whether the abstraction is leaky. That can have a tangible effect on the maintainability of your program, because it reduces the number of opportunities you have to mess up an interface or its documentation.

For perfect abstractions, I agree with you. I'm not saying get rid of all abstractions. I just think that perfect abstractions are more difficult and rarer than people suppose, and sometimes for some kinds of logic, a perfect abstraction might not exist at all.


Finally! I'm glad to hear I'm not the only one. I've gone against 'Clean Code' zealots who end up writing painfully warped abstractions in the effort to adhere to what is in this book. It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse. I've had developers use the 'partial' feature in C# to meet Martin's length restrictions, to the point where I have to look through 10-15 files to see the full class. The examples in this post are excellent examples of the flaws in Martin's absolutism.


You were never alone Juggles. We've been here with you the whole time.

I have witnessed more people bend over backwards and do the most insane things in the name of avoiding "Uncle Bob's" baleful stare.

It turns out that following "Uncle Sassy's" rules will get you a lot further.

1. Understand your problem fully

2. Understand your constraints fully

3. Understand not just where you are but where you are headed

4. Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.

Quality issues are far more often planning, product-management, or strategic issues than something as easily remedied as the code itself.


"How do you develop good software? First, be a good software developer. Then develop some software."

The problem with all these lists is that they require a sense of judgement that can only be learnt from experience, never from checklists. That's why Uncle Bob's advice is simultaneously so correct, and yet so dangerous with the wrong fingers on the keyboard.


Agreed.

That's why my advice to junior programmers is, pay attention to how you feel while working on your project - especially, when you're getting irritated. In particular:

- When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.

- As you spend time figuring out what a piece of code does, try to make note of what made gaining understanding hard, and what could be done to make it easier. Similarly, when modifying/extending code, or writing tests, make note of what took most effort.

- When you fix a bug, spend some time thinking what caused it, and how could the code be rewritten to make similar bugs impossible to happen (or at least very hard to introduce accidentally).

Not everything that annoys you is a problem with the code (especially when one's unfamiliar with the codebase or the tooling, the annoyance tends to come from lack of understanding). Not everything should be fixed, even if it's obviously a code smell. But I found that, when I paid attention to those negative feelings, I eventually learned ways to avoid them up front - various heuristics that yield code which is easier to understand and has fewer bugs.

As for following advice from books, I think the best way is to test the advice given by just applying it (whether in a real or purpose-built toy project) and, again, observing whether sticking to it makes you more or less angry over time (and why). Code is an incredibly flexible medium of expression - but it's not infinitely flexible. It will push back on you when you're doing things the wrong way.


> When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.

Casey has a good blog post about this where he explains his compression-oriented programming, which is a progressive approach, instead of designing things up front.

https://caseymuratori.com/blog_0016
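
To make the idea concrete, here's a minimal sketch (my own illustration in TypeScript, not Casey's code; all names invented): write the straightforward version first, then "compress" the repetition into a helper only once it has actually appeared.

    // Pass 1: just write it out; the repetition is visible but harmless.
    const nameLabel = document.createElement("label");
    nameLabel.textContent = "Name";
    nameLabel.className = "field-label";

    const emailLabel = document.createElement("label");
    emailLabel.textContent = "Email";
    emailLabel.className = "field-label";

    // Pass 2: the pattern has proven itself, so "compress" it.
    function fieldLabel(text: string): HTMLLabelElement {
        const label = document.createElement("label");
        label.textContent = text;
        label.className = "field-label";
        return label;
    }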


I read that a while ago. It's a great article, I second the recommendation! I also love the term, "compression-oriented programming", it clicked in my mind pretty much the moment I saw it.


That's a great set of tips, thanks for sharing.

I like the idea of trying to over-apply a rule on a toy project, so you can get a sense of where it helps and where it doesn't. For example, "build Conway's Game of Life without any conditional branches" or "build FizzBuzz where each function can have only one line of code".
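
For fun, here's roughly what the one-line-function FizzBuzz might look like (TypeScript, purely illustrative) - a good way to feel exactly where the rule stops helping:

    const divisibleBy = (n: number, d: number): boolean => n % d === 0;
    const fizz = (n: number): string => (divisibleBy(n, 3) ? "Fizz" : "");
    const buzz = (n: number): string => (divisibleBy(n, 5) ? "Buzz" : "");
    const label = (n: number): string => fizz(n) + buzz(n) || String(n);
    const fizzBuzz = (max: number): string[] =>
        Array.from({ length: max }, (_, i) => label(i + 1));

    console.log(fizzBuzz(15).join("\n"));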


Yeah to some degree. I am in that 25 years of experience range. The software I write today looks much more like year 1 than year 12. The questions I ask in meetings I would have considered "silly questions" 10 years ago. Turns out there was a lot of common sense I was talked out of along the way.

Most people already know what makes sense. It's the absurdity of office culture, JR/SR titles, and perverse incentives that convinces them to walk in the exact opposite direction. Uncle Bob is the culmination of that absurdity. Codified instructions that are easily digested by the lemmings on their way to the cliff's edge.


The profession needs a stronger culture of apprenticeship.

In between learning the principles incorrectly from books, and learning them inefficiently at the school of hard knocks, there's a middle path of learning them from a decent mentor.


The problem is that there are a huge number of “senior” devs who only got that title by having been around (and useless) for a long time. It is best for all not to have them mentoring anyone.

But otherwise I agree, it’s just hard to recognize good programmers.


Also, good programmers don't necessarily make good mentors.

But I imagine these problems aren't unique to the software industry. It can't be the case that every blacksmith was both a good blacksmith and a good mentor, and yet the system of apprenticeship successfully passed down knowledge from generation to generation for a long time. Maybe the problem is our old friend social media, and how it turns all of us into snobs with imposter syndrome, so few of us feel willing and able to be a mentor.


I've seen juniors with better judgment than "seniors".

But they were not in the same comp bracket. And I don't think they gravitated in the same ecosystems so to speak.


The right advice to give new hires, especially junior ones, is to explain to them that in order to have a good first PR they should read this Wikipedia page first:

https://en.wikipedia.org/wiki/Code_smell

Also helpful are code guidelines like the one that Google made for Python:

https://google.github.io/styleguide/pyguide.html

Then when their first PR opens up, you can just point to the places where they didn't quite get it right and everyone learns faster. Mentorship helps too, but much of software is self-learning and an hour a week with a mentor doesn't change that.


I've also never agreed completely with Uncle Bob. I was an OOP zealot for maybe a decade, and now I'm a Rust convert. The biggest "feature" of Rust is that it probably brought semi-functional concepts to the "OOP masses." I found that, with Rust, I spent far more time solving the problem at hand...

Instead of solving how I am going to solve the problem at hand ("Clean Coding"). What a fucking waste of time, my brain power, and my lifetime keystrokes[1].

I'm starting to see that OOP is better suited to programming literal business logic. The best use for the tool is when you actually have "Person", "Customer", and "Employee" entities that have to follow some form of business rules.

In contradiction to your "Uncle Sassy's" rules, I'm starting to understand where "Uncle Beck" was coming from:

1. Make it work.

2. Make it right.

3. Make it fast.

The amount of understanding that you can garner from making something work leads very strongly into figuring out the best way to make it right. And you shouldn't be making anything fast unless you have a profiler and other measurements telling you to do so.

"Clean Coding" just perpetuates all the broken promises of OOP.

[1]: https://www.hanselman.com/blog/do-they-deserve-the-gift-of-y...


Simula, arguably the first OOP language (or at least one of the earliest), was written to simulate industrial processes, where each object was one machine (or station, or similar) in a manufacturing chain.

So, yes, it was very much designed for when you have entities interacting, each entity modeled by a class, and then having one (or more) object instantiations of that class interacting with each other.


> 1. Understand your problem fully

> 2. Understand your constraints fully

These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.

> 3. Understand not just where you are but where you are headed

And this is the part that breaks down so often. Because software is simultaneously so easy and so hard to change, people fall into traps both left and right, assuming some dimension of extensibility that never turns out to be important, or assuming something is totally constant when it is not.

I think the best advice here is YAGNI: don't add functionality for extension unless your requirements gathering suggests you are going to need it. If you have experience building the thing, your spider senses will perk up. If you don't have experience building the thing, can you get some people on your team who do? Or at least ask them? If that is not possible, you want to prototype and fail fast. Be prepared to junk some code along the way.

If you start out not knowing any of these things, and also never junking any code along the way, what are the actual odds you got it right?


>These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.

I wish more developers would actually gather requirements and check if the proposed solution actually solves whatever they are trying to do.

I think part of the problem is that often we don't use what we work on, so we focus too much on the technical details and forget what the user actually needs and what workflow would be better.

In my previous job, clients were always asking for changes or new features (they paid dev hours for them) and would come with a solution. But I always asked what the actual problem was, and many times there was a solution that would solve it in a better way.


>Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.

The problem is that people often need specifics to guide them when they're less experienced. Something that "feels wrong" is usually due to vast experience being incorporated into your subconscious aesthetic judgement. But you can't rely on your subconscious until you've had enough trials to hone your senses. Hard rules can be and often are overapplied, but it's usually better than the opposite case of someone without good judgement attempting to make unguided judgement calls.


You are right, but I also think the problem discussed in the article is that some of these hard rules are questionable. DRY, for example: as a hard rule it leads to overly complex and hard-to-maintain code because of bad and/or useless abstractions everywhere (as illustrated in TFA). It takes either good experience to sense whether they "feel good", like you say, or otherwise proven repetitions to reveal a relevant abstraction.


> and make sensible decisions

well there goes the entire tech industry


My last company was very into Clean Code, to the point where all new hires were expected to do a book club on it.

My personal takeaway was that there were a few good ideas, all horribly mangled. The most painful one I remember was his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even thoroughly explain what the law was trying to accomplish. (Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.) So most everyone who read the book came to earnestly believe that the Law of Demeter is about period-to-semicolon ratios, and proceeded to convert something like

  val frobnitz = Frobnitz.builder()
      .withPixieDust()
      .withMayonnaise()
      .withTarget(worstTweetEver)
      .build();
into

  var frobnitzBuilder = Frobnitz.builder();
  frobnitzBuilder = frobnitzBuilder.withPixieDust();
  frobnitzBuilder = frobnitzBuilder.withMayonnaise();
  frobnitzBuilder = frobnitzBuilder.withTarget(worstTweetEver);
  val frobnitz = frobnitzBuilder.build();

and somehow convince themselves that doing this was producing tangible business value, and congratulate themselves for substantially improving the long-term maintainability of the code.

Meanwhile, violations of the actual Law of Demeter ran rampant. They just had more semicolons.


On that note, I've never seen an explanation of Law of Demeter that made any kind of sense to me. Both the descriptions I read and the actual uses I've seen boiled down to the type of transformation you just described, which is very much pointless.

> Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.

I'd like to read more. Do you know of a source that covers this properly?


> Law of Demeter

"Don't go digging into objects" pretty much.

Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.

E.g. let's say you have a Shipment object that contains information about something that is to be shipped somewhere. If you want to change the delivery address, you should consider telling the shipment to do that rather than exposing an Address and letting clients muck around with it directly, because the latter means that if you later need extra logic when the delivery address changes, there's a chance the change leaks all over the place (e.g. you decide to automate your customs declarations, and they need to change if the destination country changes; or delivery costs need to be updated).

You'll of course, as always, find people who take this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special-purpose interface vs. exposing them to clients.
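
A minimal sketch of that Shipment example (TypeScript; all names invented for illustration):

    interface Address { street: string; country: string; }

    class Shipment {
        constructor(private destination: Address) {}

        // Tell, don't ask: callers state their intent once...
        changeDestination(newAddress: Address): void {
            this.destination = newAddress;
            // ...and every follow-on consequence has exactly one home.
            this.refreshCustomsDeclaration();
            this.recalculateDeliveryCost();
        }

        private refreshCustomsDeclaration(): void { /* ... */ }
        private recalculateDeliveryCost(): void { /* ... */ }
    }

The Demeter-violating alternative - something like shipment.getAddress().country = "NO" - would silently skip the customs and cost updates.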


As for why this is useful:

If objects are allowed to talk to friends of friends, that greatly increases the level of interdependency among objects, which, in turn, increases the number of ancillary changes you might need to make in order to ensure all the code talking to some object remains compatible with its interface.

More subtly, it's also a code smell that suggests that, regardless of the presence of objects and classes, the actual structure and behavior of the code is more procedural/imperative than object-oriented. Which may or may not be a big deal - the importance of adhering to a paradigm is a decision only you can make for yourself.


> Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.

IMHO, this is one of those ideas you have to consider on its merits for each project.

My own starting point is usually that I probably don’t want to drill into the internals of an entity that are implementation details at a lower level of abstraction than the entity’s own interface. That’s breaking through the abstraction and defeating the point of having a defined interface.

However, there can also be relationships between entities on the same level, for example if we’re dealing with domain concepts that have some sort of hierarchical relationship, and then each entity might expose a link to parent or child entities as part of its interface. In that case, I find it clearer to write things like

    if (entity.parent.interesting_property !== REQUIRED_VALUE) {
        abort_invalid_operation();
    }
instead of

    let parent_entity = entities.find(entity.parent_id);
    if (parent_entity.interesting_property !== REQUIRED_VALUE) {
        abort_invalid_operation();
    }
and this kind of pattern might arise often when we’re navigating the entity relationships, perhaps finding something that needs to be changed and then checking several different constraints on the overall system before allowing that change.

The “downside” of this is that we can no longer test the original entity’s interface in isolation with unit tests. However, if the logic requires navigating the relationships like this, the reality is that individual entities aren’t independent in that sense anyway, so have we really lost anything of value here?

I find that writing a test suite at the level of the overall set of entities and their relationships — which is evidently the smallest semantically meaningful data set if we need logic like the example above — works fine as an alternative to dogmatically trying to test the interface for a single entity entirely in isolation. The code for each test just sets up the store of entities and adds the specific instances and relationships I want for each test, which makes each test scenario nicely transparent. This style also ensures the tests only depend on real code, not stand-ins like mocks or stubs.


I don't think the two versions are relevant to the Law of Demeter. One example has pointers/references in a strong tree and another has indexed ones, but neither embraces LoD more or less than the other.

This would be a more relevant example:

parent_entity.children.remove(this)

vs

parent_entity.remove_child(this)

...Where remove_child() would handle removing the entity from `children` directly, and also perhaps busting a cache, or notifying the other children that the hierarchy has changed, etc.

Going back to your original case, you _could_ argue that LoD would advise you to create a method on entity which returns the parent, but I think that would fall under encapsulation. If you did that though, you could hide the implementation detail of whether `parent` is a reference or an ID on the actual object, which is what most ORMs will do for you.
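
To illustrate, a sketch of what that remove_child() might absorb (TypeScript; hypothetical names):

    class Entity {
        private children: Entity[] = [];
        private cachedDepth: number | null = null;

        removeChild(child: Entity): void {
            this.children = this.children.filter(c => c !== child);
            this.cachedDepth = null;         // bust any derived cache
            this.notifyHierarchyChanged();   // let observers react
        }

        private notifyHierarchyChanged(): void { /* ... */ }
    }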


Ah, but what if children is some kind of List or Collection which can be data-bound? By Liskov's substitution principle, you ought to be able to pass it to a Collection-modifying routine and have it function correctly. If the parent must be called, the children member should be private, or else the collection should implement eventing and the two methods should have the same effect (and ideally you'd remove one).


That takes us back up to viardh's concluding remark from earlier in the thread:

> You'll of course, as always, find people that takes this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special purpose interface vs. exposing them to clients.

I would say that if you're using a ViewModel object that will be data-bound, then you're sort of outside the realm of the Law of Demeter. It's really more meant to concern objects that implement business logic, not ones that are meant to be fairly dumb data containers.

On the other hand, if it is one that is allowed to implement business logic, then I'd say, yeah, making the children member public in the first place is violating the law. You want to keep that hidden and supply a remove_child() method instead, so that you can retain the ability to change the rules around when and how children are removed, without violating LSP in the process.


In the other branch I touched on this: iterating children is still a likely use case, after all, so you have the option of exposing the original or making a copy, which could have perf impacts.

But honestly it's best not to pre-optimize. I would probably do:

    private _children: Entity[];

    get children() { return this._children.slice(); }

And reconsider the potential mutation risk later if the profiler says it matters.


It can be, though there are some interesting philosophical issues there.

The example that I always keep coming back to is Smalltalk, which is the only language I know of that represents pure object-oriented programming. Similar to how, for the longest time, Haskell was more-or-less the only purely functional language. Anyway, in Smalltalk you generally would not do that. You'd tell the object to iterate its own children, and give it some block (Smalltalk equivalent of an anonymous function) that tells it what to do with them.

Looping over a data structure from the outside, if you want to get really zealous about it, is more of a procedural and/or functional way of doing things.
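
You can approximate that style in less pure languages by letting the object drive the iteration and accept a lambda, rather than handing out its internals - a sketch (TypeScript, names mine):

    class Node {
        private children: Node[] = [];

        // Internal iteration, Smalltalk-style: the caller supplies
        // behavior instead of reaching in and driving the loop itself.
        eachChild(action: (child: Node) => void): void {
            for (const child of this.children) action(child);
        }
    }

    const root = new Node();
    root.eachChild(child => console.log(child));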


Indeed! It's the visitor pattern, and since we're talking about a tree, that would probably be useful here.


FWIW, I was writing JavaScript in that example, so `entity.parent` might have been implemented internally as a function anyway:

    get parent() {
        return this.entities.find(this.parent_id);
    }
I don’t think whether we write `entity.parent` or `entity.parent()` really matters to the argument, though.

In any case, I see what you’re getting at. Perhaps a better way of expressing the distinction I was trying to make is whether the nested object that is being accessed in a chain can be freely used without violating any invariants of the immediate object. If not, as in your example where removing a child has additional consequences, it is probably unwise to expose it directly through the immediate object’s interface.


Yes, it's a great case for making the actual `children` collection private so that mutation must go through the interface methods instead. But still, iteration over the children is a likely use case, so you are left with either exposing the original object or returning a copy of the array (potentially slower, though this might not matter depending on the situation).


That problem could potentially be solved if the programming language supports returning some form of immutable reference/proxy/cursor that allows a caller to examine the container but without being able to change anything. Unfortunately, many popular languages don’t enforce transitive immutability in that situation, so even returning an “immutable” version of the container doesn’t prevent mutation of its contained values in those languages. Score a few more points for the languages with immutability-by-default or with robust ownership semantics and support for transitive immutability…
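
TypeScript is a handy illustration of the trap: a read-only view stops mutation of the container, but not of the things inside it (a sketch, names invented):

    interface Item { label: string; }

    class Container {
        private items: Item[] = [{ label: "a" }];

        // A read-only view: callers can't add or remove items...
        get view(): readonly Item[] { return this.items; }
    }

    const c = new Container();
    // c.view.push({ label: "b" });  // compile error: no 'push' on a readonly array
    c.view[0].label = "mutated";     // ...but immutability isn't transitive, so this is allowed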


Very true. JS has object freezing but that would affect the class's own mutations. On the other hand you could make a single copy upon mutation, freeze it, and then return the frozen one for iteration if you wanted to. Kind of going a bit far though imho.


If you really want to dig into it, perhaps a book on domain-driven design? That's where I pulled the term "bounded context" from.

My own personal oversimplification, probably grossly misleading, is that DDD is what you get when you take the Law of Demeter and run as far with it as you can.


Thanks.

I have Evans' book on my bookshelf. I understand it's the book on DDD, right? I tried to read it a while ago, I got through about one third of it before putting it away. Maybe I should revisit it.


Three things that go well together:

  - Law of Demeter
  - Tell, Don’t Ask
  - narrow interfaces


Agree that the transformation described is pointless.

A more interesting statement, but I am not sure it is exactly equivalent to the law of Demeter:

Distinguish first between immutable data structures (and I'd group lambdas with them) and objects. An Object is something more than just a mutable data structure: one wants to also fold in the idea that some of these objects exist in a global namespace, providing named mutable state to the entire rest of the program. And to the extent that one thinks about threads, one thinks about objects as providing a shared-state multithreading story that requires synchronization, and all of that.

Given that distinction, one has a model of an application as kind of a factory floor, there are widgets (data structures and side-effect-free functions) flowing between Machines (big-o Objects) which process them, translate them, and perform side-effecting I/O and such.

Quasi-law-of-Demeter: in computing you have the power to also send a Machine down a conveyor belt, and build other Machines which can respond to that.[1] This is a tremendous power and it comes with tremendous responsibility. Think a system which has something like "Hey, we're gonna have an Application store a StorageMechanism and then in the live application we can swap out, without rebooting, a MySQL StorageMechanism for a local SQLite Storage Mechanism, or a MeticulouslyLoggedMySQL Storage Mechanism which is exactly like the MySQL one except it also logs every little thing it does to stdout. So when our application is misbehaving we can turn on logging while it's misbehaving and if those logs aren't enough we can at least sever the connection with the live database and start up a new node while we debug the old one and it thinks it's still doing some useful work."

The signature of this is being identified by this quasi-Law-of-Demeter as this "myApplication.getStorageMechanism().saveRecord(myRecord)" chaining. The problem is not the chaining itself; the idea would be just as wrong with the more verbose "StorageMechanism s = myApp.getStorageMechanism(); s.saveRecord(myRecord)" type of flow. The problem is just that this superpower is really quite powerful and YAGNI principles apply here: you probably don't need the ability for an application to hot-swap storage mechanisms this way.[2]

Bounded contexts[3] are kind of a red herring here, they are extremely handy but I would not apply the concept in this context.

1. FWIW this idea is being shamelessly stolen from Haskell where the conveyor belt model is an "Arrow" approach to computing and the idea that a machine can flow down a conveyor belt requires some extra structure, "ArrowApply", which is precisely equivalent to a Monad. So the quasi-law-of-Demeter actually says "avoid monads when possible", hah.

2. And of course you may run into an exception to it and that's fine, if you are aware of what you are doing.

3. Put simply a bounded context is the programming idea of "namespace" -- a space in which the same terms/words/names have a different meaning than in some other space -- applied to business-level speech. Domain-driven design is basically saying "the words that users use to describe the application, should also be the words that programmers use to describe it." So like in original-Twitter the posts to twitter were not called tweets, but now that this is the common name for them, DDD says "you should absolutely create a migration from the StatusUpdate table to the Tweet table, this will save you incalculable time in the long-run because a programmer may start to think of StatusUpdates as having some attributes which users don't associate with Tweets while users might think of Tweets as having other properties like 'the tweet I am replying to' which programmers don't think the StatusUpdates should have... and if you're not talking the same language then every single interaction consists of friction between you both." The reason we need bounded contexts is that your larger userbase might consist both of Accountants for whom a "Project" means ABC, and Managers for whom a "Project" means DEF, and if you try to jam those both into the Project table because they both have the same name you're gonna get hurt real bad. In turn, DDD suggests that once you can identify where those namespace boundaries seem to exist in your domain, those make good module boundaries, since modules are the namespaces of our software world. And then if say you're doing microservices, instead of pursuing say the "strong entity" level of ultra-fine granularity, "every term used in my domain deserves its own microservice," try coarser-graining it by module boundaries and bounded contexts, create a "mini-lith" rather than these "nanoservices" that each manage one term of the domain... so the wisdom goes.


I love how this is clearly a contextual recommendation. I'm not a software developer but a data scientist. In pandas, writing your manipulations in this chained-methods fashion is highly encouraged, IMO. It's even called "pandorable" code.


The latter example (without the redundant assignments) is preferred by people who do a lot of line-by-line debugging. While most IDEs allow you to set a breakpoint in the middle of an expression, that's still more complicated and error prone than setting one for a line.

I've been on a team that outlawed method chaining specifically because it was more difficult to debug. Even though I'm more of a method-chainer myself, I have taken to writing unchained code when I am working on a larger team.

  var frobnitzBuilder = Frobnitz.builder();
  frobnitzBuilder.withPixieDust();
  frobnitzBuilder.withMayonnaise();
  frobnitzBuilder.withTarget(worstTweetEver);
  val frobnitz = frobnitzBuilder.build();
...is undeniably easier to step-through debug than the chained version.


Might depend on the debugger? The main ones I've used also let me go through the chained version one at a time, including seeing intermediate values.


TBH, the only context where I've seen people express a strong preference for the non-chained option is under the charismatic influence of Uncle Bob's baleful stare.

Otherwise, it seems opinions typically vary between a strong preference for chaining, and rather aggressive feelings of ¯\_(ツ)_/¯


Every time I see a builder pattern, I see a failure to adopt modern programming languages. Use named parameters, for f*ck's sake!

  const frobnitz = new Frobnitz({pixieDust: true, mayonnaise: true, target: worstTweetEver});
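
In TypeScript, the same idea even gets compiler support. A hedged sketch of how that options object might be typed (Tweet is a stand-in type; everything here is illustrative):

    // Tweet is a hypothetical stand-in type for this illustration.
    interface Tweet { text: string; }

    interface FrobnitzOptions {
        pixieDust?: boolean;
        mayonnaise?: boolean;
        target?: Tweet;
    }

    class Frobnitz {
        // One typed options object replaces the whole builder ceremony.
        constructor(private readonly options: FrobnitzOptions = {}) {}
    }

    const frobnitz = new Frobnitz({ pixieDust: true, mayonnaise: true });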


I think following some ideas in the book but ignoring others, like the ones behind the Law of Demeter, can be a recipe for a mess. The book is very opinionated, but if followed well I think it can produce pretty dead-simple code. But at the same time, just like with any coding, experience plays massively into how well code is written. Code can be written well when using his methods or when ignoring them, and it can be written badly when trying to follow some of his methods or when not using them at all.


>his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really even thoroughly explain what the law was trying to accomplish.

oof. I mean, yeah, at least explain what the main thing you’re talking about is about, right? This is a pet peeve.


wow this is a nightmare


> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.

I don't recall where I picked it up from, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is a pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes the things you find out about the abstraction only when you reach that third duplicate (variables or control-flow decisions that seemed constant in two places, for instance; or a higher-level idea of why the code is duplicated that isn't clear from two very-far-apart points but is clearer when you can "triangulate" what their center is).


I don't hate the rule of 3. But I think it's missing the point.

You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.

For example, I have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract it.

Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.

Taking this approach, I might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.
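
A sketch of that shape (TypeScript, invented names): the path logic and the matching format/parse pair live in one shared module, even though each function may have only one or two callers.

    // shared/data-files.ts - the writer and the reader both import from
    // here, so the two halves of the contract can't silently diverge.
    export const DATA_DIR = "/var/data/exports";

    export function dataFilePath(batchId: string): string {
        return `${DATA_DIR}/batch-${batchId}.csv`;
    }

    // Extracted even though each may have a single caller: keeping the
    // matching pair side by side documents that they must stay in sync.
    export function formatDate(d: Date): string {
        return d.toISOString().slice(0, 10); // YYYY-MM-DD
    }

    export function parseDate(s: string): Date {
        return new Date(`${s}T00:00:00Z`);
    }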


> You want to extract common code if it's the same now, and will always be the same in the future.

I suppose I take that as a presumption before the Rule of 3 applies. I generally assume/take for granted that all "exact duplicates" that "will always be the same in future" are going to be a single shared function anyway. The duplication I'm concerned about, where I think the Rule of 3 comes into play, is the duplicated-but-diverging kind. ("I need to do this thing like X does it, but…")

If it's a simple divergence, you can sometimes add a flag, but the Rule of 3 suggests that sometimes duplicating it and diverging it that second time "is just fine" (avoiding potential "flag soup") until you have a better handle on the pattern for why you are diverging it, and what abstraction you might be missing in this code.


The rule of three is a guideline or principle, not a strict rule. There's nothing about it that misses the point. If, from your experience and judgement, the code can be reused, reuse it. Don't duplicate it (copy/paste or write it a second time). If, from your experience and judgement, it oughtn't be reused, but you later see that you were wrong, refactor.

In your example, it's about modularity. The directory logic makes sense as its own module. If you wrote the code that way from the start, and had already decoupled it from the writer, then reuse is obvious. But if the code were tightly coupled (embedded in some fashion) within the writer, then rewriting it would be the obvious step, because reuse wouldn't be practical without refactoring. And unless you can see how to refactor it already, writing it the second time (or third) can help you discover the actual structure you want/need.

As people become more experienced programmers, the good ones at least already tend to use modular designs and keep things decoupled, which promotes reuse over copy/paste. In that case, the rule of three gets used less often by them because they have fewer occurrences of real duplication.


I think the point you and a lot of other commenters make is that applying hard and fast rules without referring to context is simply wrong. Surely if all we had to do was apply the rules, somebody would have long ago written a program to write programs. ;-)


You have a point for extracting exact duplicates that you know will remain the same.

But the point of the rule of 3 remains. Humans do a horrible job of abstracting from one or two examples, and the act of encountering an abstraction makes code harder to understand.


> “premature abstraction”

Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)


The Go proverb is "A little copying is better than a little dependency." Also, don't deduplicate text just because it's the same; deduplicate implementations only if they match both in mechanism (what they do) and in semantic usage (why they do it). Sometimes the same thing is done with different intents, which can naturally diverge, and then the premature deduplication is debt.
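
A sketch of the distinction (TypeScript, contrived names): these two are textually identical today, but they encode different policies that would diverge for different reasons, so merging them trades a little typing for an unwanted coupling.

    // Same text, different intent: usernames and URL slugs merely happen
    // to share a rule today. Merging them couples two unrelated policies.
    function isValidUsername(s: string): boolean {
        return /^[a-z0-9]{3,20}$/.test(s);
    }

    function isValidSlug(s: string): boolean {
        return /^[a-z0-9]{3,20}$/.test(s); // likely to grow hyphens later
    }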


I'm coming to think that the rule of three is important within a fairly constrained context, but that other principle is worthwhile when you're working across contexts.

For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.


An old senior colleague of mine used to insist that if I added a script to the project, I had to document it on the wiki. So I just didn't add my scripts to the project.


Do you think that's a feature or a bug?


I'd argue if the code was "fairly throw-away" it probably did not meet the "Rule of 3" by the time it was included in the shared library in the first place.


> I don't recall where I picked up from, but the best advice I've heard on this is a "Rule of 3"

I know this as AHA = Avoid Hasty Abstractions:

- https://kentcdodds.com/blog/aha-programming

- https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction


> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.

Something I’ve mentioned to my direct reports during code reviews: Sometimes, code is duplicated because it just so happens to do something similar.

However, these are independent widgets and changes to one should not affect the other; in other words, not suitable for abstraction.

This type of reasoning requires understanding the problem domain (i.e., use-case and the business functionality ).


I've gone through Java code where I needed to open 15 different files, each with a one-line piece of code, just to find out it was a "hello world" class.

I like abstraction as much as the next guy, but this is closer to obfuscation than abstraction.


At a previous company, there was a Clean Code OOP zealot. I heard him discussing with another colleague about the need to split up a function because it was too long (it was 10 lines). I said, from the sidelines, "yes, because nothing enhances readability like splitting a 10 line function into 10, 1-line functions". He didn't realize I was being sarcastic and nodded in agreement that it would be much better that way.


Spaghetti code is bad, but lasagna code is just as bad IMO.


There seems to be a lot of overlap between the Clean Coders and the Neo Coders [0]. I wish we could get rid of both.

[0] People who strive for "The One" architecture that will allow any change no matter what. Seriously, abstraction out the wazoo!

Honestly. If you're getting data from a bar code scanner and you think, "we should handle the case where we get data from a hot air balloon!" because ... what if?, you should retire.


I like to say "the machine that does everything, does nothing".


The problem is that `partial` in C# should never even have been considered a "solution" for writing small, maintainable classes. AFAIK partial was introduced for code-behind files, not to structure human-written code.

Anyway, you are not alone with that experience - a common mistake I see, no matter what language or framework, is people falling for the fallacy that "separation into files" is the same as "separation of concerns".


Seriously? That's an abuse of partial and just a way of following the rules without actually following them. That code must have been fun to navigate...


Many years ago I worked on a project that had a hard "no hard-coded values" rule, as requested by the customer. The team routinely wrote the equivalent of

    const char_a = "a";
And I couldn’t get my manager to understand why this was a problem.


Clearly it is still a hardcoded value! It fails the requirement. Instead there should be a factory that loads a file that reads in the "a" to the variable, nested down under 6 layers of abstraction spread across a dozen files.


That's too enterprisey; at a startup you would write it like this:

const charA = (false+"")[1];


That has three constants.


is this better?

const charA = (![]+[])[+!+[]];


I cannot even parse this. What is going on here?


[] is still a constant.


True, but we can both agree that this is a better constant than "a". Much better job security in that code... unless you get fired for writing it, that is.


Saw this just the other day. I was at a loss to know what to say. :(


This gets to intent.

What, in the code base, does char_a mean? Is it append_command? Is it used in a..z for all lowercase? Maybe an all_lowercase is needed instead.

I know that it's an "A". I don't know why that matters to the codebase. Now, there are cases where it is obvious even to beginners, and I'm fine with magic characters there, but I've seen cases where people were confused as to why 1000 was in a `for(int i = 0; i<1000; i++)`. Is 1000 arbitrary in that case, or is it based on a defunct business requirement from 2008? Will it break if we change it to 5000 because our computers are faster now?
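
The usual fix is just to promote the value to a name that carries the why - a tiny illustrative sketch (TypeScript, invented names):

    function processRecord(i: number): void { /* ... */ }

    // The name and comment carry the "why" that a bare 1000 cannot:
    // batch size imposed by a (hypothetical) 2008 import requirement;
    // revisit if the upstream limit ever changes.
    const MAX_RECORDS_PER_RUN = 1000;

    for (let i = 0; i < MAX_RECORDS_PER_RUN; i++) {
        processRecord(i);
    }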


Can I please forward your contact info to my developer? Maybe you can do a better job convincing him haha ;)


> where I have to look through 10-15 files to see the full class

The Magento 2 codebase is a good example of this. It's both well written and horrible at the same time. Everything is so spread out into constituent technical components that the code loses the "narrative" of what's going on.


I started OOP in '96 and I was never able to wrap my head around the code these "Clean Code" zealots produced.

Case in point: Bob Martin's "Video Store" example.

My best guess is that clean code, to them, meant as little code on the screen as possible - not necessarily "intention-revealing code" either; instead everything is abstracted until it looks like it does nothing.


I have had the experience of trying to understand how a feature in a C++ project worked (both Audacity and Aegisub, I think) only to find that I could not locate where anything was actually implemented, because everything was just glue calling another piece of glue.

I also sat in their IRC channel for months, and the lead developer was constantly discussing how he'd refactor it to be cleaner, but he never seemed to add code that did something.


SOLID code is a very misleading name for a technique that seems to shred the code into confetti.

I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it, but maybe it's just me.


> I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it

Had this problem at a previous job - main dev before I got there was extremely into the whole "small methods" cult and you'd regularly need to open 5-10 files just to track down what something did. Made life - and the code - a nightmare.


People dealt this fate to other people ...

What I find most surprising is that most developers are trying to obey the "rules": code containing even minuscule duplication must be DRYed, and everyone agrees that code must be clean and professional.

Yet it is never enough: bugs keep showing up, and the stuff that was written by others is always bad.

I'm starting to think that 'Uncle Bob' and 'Clean Code' zealotry are actually harmful, because they prevent people from taking two steps back and thinking about what they are doing - making microservices/components/classes/functions that end up never being reused, and making DRY the holy grail.

Personally, I put YAGNI > DRY, and a lot of the time you are not going to need small functions or magic abstractions.


I think the problem is not the book itself, but people thinking that all the rules apply to all the code, all the time. A length restriction is interesting because it makes you consider whether you should split your function into more than one, as you might be doing too much in one place. Now, if splitting will make things worse, then just don't.


In C# and .NET specifically, we find ourselves having a plethora of services when they are "human-readable" and short.

A service has 3 "helper" services it calls, which may, in turn, have helper services, or worse, depend on a shared repo project.

The only solution I have found is to move these helpers into their own project, and mark the helpers as internal. This achieves 2 things:

1. The "sub-services" are not confused as stand-alone, and only the "main/parent" service can be called.

2. The "module" can now be deployed independently if micro-services ever become a necessity.

I would like feedback on this approach. I honestly do think files over 100 lines long are unreadable trash, and we have achieved a lot by re-using modular services.

We are 1.5 years into a project and our code re-use is sky-rocketing, which allows us to keep errors low.

Of course, a lot of dependencies also make testing difficult, but allow easier mocks if there are no globals.


>I would like feedback on this approach. I honestly do think files over 100 lines long are unreadable trash

Dunno if this is the feedback you are after, but I would try not to be such an absolutist. There is no reason a great 100-line file becomes unreadable trash when you add one line.


I mean feedback on the way of abstracting away helper services. As for file length, I realize this is subjective, and the 100-line number is pulled out of thin air, but extremely long files are generally difficult to read, and context gets lost.


Putting shared context in another file makes it harder to read, though. Files should be big enough to represent some reasonable context: more complicated things that necessarily create a big shared context want bigger files, and simpler things smaller ones.

A thing that can be perfectly encapsulated in a 1000 line file with a small clean interface is much much better than splitting that up into 20 files 100 lines each calling each other.


Partial classes are an ugly hack to mix human- and machine-generated source code. IMHO they should be avoided.


> I've had developers use the 'partial' feature in C# to meet Martin's length restrictions

That is not the fault of this book or any book. The problem is people treating the guidelines as rituals instead of understanding their purpose.


What do you say to convince someone? It's tricky to review a large, carefully abstracted PR that introduces a bunch of new logic and config with something like: "just copy paste lol".


... and here I was thinking I was alone!


Sometimes you really just do need a 500 line function.


Yes, it's the first working implementation, written before good boundaries are known. After a while the code becomes familiar, and natural conceptual boundaries arise that lead to 'factoring' - it shouldn't require 'refactoring' just because you prematurely guessed the wrong boundaries.

I'm all for the 100-200 line working version--can't say I've had a 500. I did once have a single SQL query that was about 2 full pages pushing the limits of DB2 (needed multiple PTFs just to execute it)--the size was largely from heuristic scope reductions. In the end, it did something in about 3 minutes that had no previous solution.


Nah mate, you never do. Nor 500 1-liners.

