The Economics of Clean Code (frederickvanbrabant.com)
128 points by TheEdonian 16 days ago | 108 comments

Most people have never worked on a long-term project that applies SOLID principles and DDD, and therefore cannot fathom how efficient a team can be when they are applied.

Have you ever joined a company where there is a God class (or two or three) and the business people and developers don't speak the same language? The code does things, but you couldn't in a million years guess why or how it came about without a huge history lesson on the company and its codebase.

Now imagine how amazing it is to join a team where efforts are constantly made to name things meaningfully and where there aren't eight concerns mixed into a single God class. New devs join and immediately understand what's happening in the business, because the code maps directly onto reality. They don't have to learn two languages (business speak and dev speak).

Now imagine that dev having to add a grandfather clause for an old account. Where should it go: the `Account`, `Plan`, `User`, `Subscription` or `Business` class? When the distinctions between concepts are clear in the code, it's easy to determine where the code should go and where logic of different kinds should reside.

I worked for several years on a team that championed SOLID and DDD, and while it was better than the multi-process, shared-memory, stringly-typed horror that it replaced, efficient is not the word I'd use to describe it. The single responsibility and open-closed principles created a Cambrian explosion of tiny classes and the ratio of boilerplate to interesting code skyrocketed.

Maybe we were just interpreting things wrong, but it was very difficult to agree on what those principles meant in practice.

Liskov's substitution principle is also rather complex. I'm not sure I correctly understand it, but I think it is widely violated even by good code. If I have it right, the principle means the result of your program cannot change, no matter what subtype you substitute.

LSP just says that derived classes should behave like their base classes.

    class Base {
        int size_ = 0;
        virtual int size() { return size_; }
    };

    class Derived : public Base {
        virtual int size() { return rand(); }  // rand() from <cstdlib>
    };
Here we see an issue. Derived is overriding Base's size method, which has a straightforward implementation, with something wild. If a Derived is sent to a method expecting Base-like behavior, something bad is going to happen.

Derived should act like a Base with slightly different implementation details. This is all LSP is. It's a fancy phrase for a very simple concept.

Not just similar. All provable properties must be the same.

As I understand it, Java's Object.toString() is an example of an LSP violation because different subtypes of Object return different strings.

That just goes back to the question of what's the desired property. If the specification is that toString() must return a human-readable string, and eg. be a pure function/not throw exceptions, then most implementations would not violate the LSP.

No, it is very much LSP compliant. The purpose of toString() is to give a string representation of the object. Most overrides of it do exactly that.

An LSP violation of toString() would be something that returns a string that doesn't represent the object. If you were to return the current time as a string for an object that has nothing to do with time, that would be an LSP violation.
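In Python terms, a sketch of that distinction (the `Event` classes here are made up for illustration): overriding `__str__` with a different but still descriptive string keeps the informal contract, while returning something unrelated to the object, like the current time, breaks it.

```python
class Event:
    """Informal contract: str(event) describes the event object."""
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return f"Event({self.name})"


class CompliantEvent(Event):
    # Different string, same contract: it still describes this object.
    def __str__(self) -> str:
        return f"CompliantEvent(name={self.name!r})"


class ViolatingEvent(Event):
    # The kind of override the comment above calls a violation:
    # the returned string has nothing to do with the object's state.
    def __str__(self) -> str:
        import time
        return time.ctime()
```

Code that relies only on "the string describes the object" works fine with `CompliantEvent` but silently misbehaves with `ViolatingEvent`.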

I'm very confused. The definition of LSP is:

> Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.

Why is the exact value of the string returned not a provable property? And, to your example, if a provable property is that the returned string represents the object, how did you know it was a provable property? And how do you prove it?

The rule seems very formal, but your notions seem very informal, and I don't know how to reconcile them.

With the caveat that LSP is still violated in the given example, I think it's important to note that when talking about "all" instances of a given class you might not have much information about the exact value of a string.

Their notions are informal, but on some level it's just the experimentalist's interpretation of LSP -- much how some physicists (loosely) argue that if you can't make measurements to prove a theory wrong then the theory doesn't matter, it might similarly be reasonable to ignore LSP with respect to properties that you don't care about (where that definition is fuzzy and hard to pin down but should represent a definitive class of things within any given single program). E.g., most people don't rely on toString() doing much more than giving a bit of human intuition into the type and properties of an object, so any implementation which respects that behavior satisfies a realist's LSP.

The Java Language Specification does not state an exact value for what toString returns or anything else that would let you derive an exact value, hence it is not provable: https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html...

Think of LSP as being like semver for inheritance: A derived class can add new behavior, but it can't make any breaking changes.
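To sketch that analogy in Python (the `Store` classes are hypothetical): adding new behavior in a subclass is like a "minor" release, while strengthening a precondition is a "breaking" change that existing callers will feel.

```python
class Store:
    """Contract: get(key) returns the value previously set for any key."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class CachingStore(Store):
    """Additive change: new behavior (hit counting), same contract."""
    def __init__(self):
        super().__init__()
        self.hits = 0

    def get(self, key):
        self.hits += 1
        return super().get(key)


class StrictStore(Store):
    """Breaking change: strengthens the precondition. Callers that
    passed non-string keys to a Store now blow up."""
    def set(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        super().set(key, value)
```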

Can you recommend an introduction that shows the "grandfather rule" aspect? I have read DDD intros but don't remember anything that would help me here.

I think the problem is that it's difficult to articulate what clean code "looks" like. Unfortunately, this is precisely the medium ("looking") through which many developers learn best. Yes, there are books, and rules, and this and that explaining which properties clean code must exhibit, but for many these rules are simply too abstract when the rubber meets the road. So we start to emulate the "look" of clean code (examples are everywhere!) instead of the intent. And you know what happens... a mess.

For those of us who have finally made it to the other side of the above process, clean code is more of a "feeling" than a "look". Of course we can point to pain points in the code and explain why this or that might need to be refactored in order to be more "clean", but our spidey-sense is our guiding light _not_ some internal catalog of "unclean" code snippets to avoid.

For me the most important part of writing clean code is making sure it can be understood (which means it can be changed). And to this end I think Flow Of Control tends to be an important consideration because I have realized it's less about "looking" at the code and more about "seeing" the program.

Spot on. Too many people think of clean code as adherence to some arbitrary rules about method length or stylistic concerns. In reality it's all about reducing complexity to the absolute minimum and keeping the flow of control as flat as it can be.

There's a difference between a super-experienced, super-senior developer "keeping their code clean" and a junior one doing the same thing.

Juniors just thrash around from one design pattern to the next, never really achieving anything other than learning what doesn't work.

A much more experienced developer will have an understanding of what clean code looks like and where it's necessary. They will recognise when there is a huge amount of uncertainty, for example, and spend less time writing loads of fantastic code before the requirements have firmed up.

So it really depends on the nature of the project and who you have on your team as to whether you need to worry about 'clean code'.

The enlightenment moment is when you realize that messy code that models the business domain well is far, far easier to maintain than clean code that lives in its own clean code bubble. The former can be cleaned at any time. The latter will immediately revert to a state of pandemonium on the next requirements change.

This isn't the main thrust of Sandi Metz's famous "All The Little Things" talk, but it's front-and-center in the approach she takes to cleaning up the code. Half the talk - basically all the parts where she keeps mentioning, "You'll notice our complexity metrics are still getting worse even though the code is getting better" - largely consists of getting the way the software models the business domain whipped into shape.

Or, to go full analogy, no number of trips to The Container Store will leave your house feeling clean and organized if half your possessions are junk that you don't need and the furniture arrangement is awful.

I often wonder if I have a different definition of clean code than most people. To me clean code is code that's easy to change without breaking something, and that corresponds mostly to closely modeling the business domain. Of course other things matter as well but that's a whole book.

That's my view as well, but it's built 100% from the trenches of real-world problem solving (I self-studied all of it, no formal academia in computing beyond a few intro classes).

I think it's the consensus by now (at least 'round these parts) that we've gone too far with clean code and patterns and being too clever for your own sake (or that of your company), and even agile (once management took the concept over and changed it into its very opposite, kinda like democracy nowadays versus the Ancient Athenian spirit, if you think about it). It's all pop culture by now; see that Silicon Valley show.

The most tragic part, in a sense, is that the people who created those concepts, who formalized them (extreme programming, Uncle Bob, etc.), very clearly stated that these are not "hard rules" and cannot work as such, ever. We say "it's a 6" or "it's a 2" by instinct; there's no science there, no genius, just practical methods.

The minute it became some sort of KPI, the whole essence turned sour, possibly detrimental to the mission.

When you approach things this way, with this prior or meta-knowledge so to speak, you can very much apply "clean code" principles to an otherwise very domain-driven model and software architecture; in fact one could argue you're making a true mathematical model — which is really not about people and compile times but about exactitude, precision, an as-honest-as-can-be mapping of a real phenomenon. Had we built physics like we approach "great code" nowadays, we'd have much prettier formulas that approximate truth in abstraction but never quite reach it. A treat for theoreticians, a nightmare for people with real-world, industrial problems.

> The enlightenment moment is when you realize that messy code that models the business domain well is far, far easier to maintain than clean code that lives in its own clean code bubble.

Indeed. Keeping code "clean" is more about how it represents the concepts and relationships it is dealing with and less about whether it follows any specific style rules. In my experience, code that is easy to understand and maintain often has a relatively simple design that stays close to its problem domain as much as possible.

Of course, good style is still important, and being able to write tidy code when you have to move away from the real world concepts and get into the implementation details is also important. However, without some overall structure that reflects what the software really does and how it's used, it's easy to lose focus and concentrate too much on the supporting cast at the expense of the stars of the show.

> Keeping code "clean" is more about how it represents the concepts and relationships it is dealing with and less about whether it follows any specific style rules.

I really disagree. To take a prime example, short functions are almost always better than long functions. Regardless of the content, codebases that are refactored into smaller, more self-contained functions almost always improve in reliability, performance, readability, and extensibility.

I think the reason for this largely comes down to reducing the surface area of interactions. A 100-line function can introduce subtle interactions between the first and the 99th line. With a three-line function, by contrast, the graph of potential dependencies is much smaller.

A similar logic extends to smaller, more self-contained code blocks, classes, modules and even applications.

> A 100 line function can introduce subtle interactions between the first and 99th line. Whereas a three line function, the graph of potential dependencies is much smaller.

Interesting. And what happened to the other 97 lines of code? A fair comparison would be a 100-line function versus maybe 20 functions of 5 lines, plus maybe 20 additional lines for function signatures and maybe 20 lines coordinating the 20 functions. We have also introduced 20 function names.

> To take a prime example, short functions are almost always better than long functions.

This one is so very disputed, though.

My own sense is that it's not the length of the function, it's the length of the loops and the complexity of the branching. Short functions help when they prevent those from occurring. But, in cases where neither is present, I haven't seen a refactoring to short functions that I thought was an unambiguous improvement. Even that variable from line 1 influencing line 99 doesn't end up being particularly difficult to see or understand as long as the control flow is linear.

Even without branching or loops, it's still a problem. Every line in a code block shares a namespace and state-space with every other. If there's one very consistent lesson in the history of software engineering, it's that state is a necessary evil to be minimized. Stateful code is almost always less reliable and harder to reason about than pure code. Small functions keep the surface area of stateful interactions strictly delineated by the arguments and return values.

The more lines of code in a block, even with linear control flow, the higher the chance that one line inadvertently clobbers something that a subsequent line depends on. Consider a function like

    def foo(msg: str):
        head_index = find_header(msg)
        ...  # 50 lines of code
        log(msg, head_index)
        ...  # 50 lines of code
        return msg[head_index]
Now say somebody adds some chunk of logic in the middle

    def foo(msg: str):
        head_index = find_header(msg)
        ...  # 50 lines of code
        head_index += offset_digest(...)  # Needed for log formatting
        log(msg, head_index)
        ...  # 50 lines of code
        return msg[head_index]  # now reads the clobbered value
A seemingly innocuous change to the function has inadvertently clobbered the value of a variable we're relying on in a slightly different context. With 50 lines of code separating each reference to the variable, it's very easy for this bug to fly under the radar unnoticed.

An engineering culture that emphasizes small functions avoids this problem, because the change would instead move the logic to its own self-contained function. That avoids the problem of new code accidentally clobbering the old code.

    def foo(msg: str):
        head_index = find_header(msg)
        ...  # 50 lines of code
        log_offset(msg, head_index)
        ...  # 50 lines of code
        return msg[head_index]

    def log_offset(msg, head_index):
        # Mutates only its own local copy; the caller's head_index is untouched.
        head_index += offset_digest(...)
        log(msg, head_index)

The problem you describe is also fixed by any language that is explicit about stateful behaviour and effects. It's not an inherent property of having a long function, just of having insufficient control over mutable state (or poor discipline in using the available controls).

Even if you have strong mutability controls built into the type system, reasoning about correctly applying those constraints is far easier using small code blocks.

If a variable is declared at the beginning of a block, its use has to be audited throughout the entire block. It's a lot easier to confirm that a variable is never mutated across 4 lines than it is across 200.

When the size of a code block exceeds the capacity of working memory (about eight chunks), devs will skip the cognitive effort and fall back to the most permissive types by default. Not always, but bugs mostly come from the codebase at its worst, not its best.

Considering your point from the other direction, the most mutability-disciplined codebases tend to naturally bias towards short functions. In my experience, most long functions are long because the locally scoped variables are being used to haphazardly track state across the control flow. To contrast, Haskell's immutability and purity strongly restrict that pattern. Unsurprisingly, one of the most striking features of Haskell codebases is how short the functions tend to be compared to something like Java.

> If a variable is declared at the beginning of a block, its use has to be audited throughout the entire block.

Well, if the variable is declared at the beginning of the block and is scoped to the entire block, we have two possibilities.

One is that the variable really is relevant throughout the block, or close to it. In that case, full mutability makes it more difficult to understand the behaviour, but if the variable is immutable or can only be mutated using explicitly controlled tools then the audit you mention is trivial.

The other is that the variable is only relevant to some relatively small part of the block. In that case, it is possible that the function is trying to do more than one thing and that part of it could beneficially be separated into its own function, taking any variables that are only relevant to that part with it. It is also possible that the variable is being declared prematurely and starting its lifetime earlier than necessary, in which case it may be better to move the declaration later so its lifetime is shorter and the area where it is in scope is reduced.

The latter cases do tend to lead to shorter functions, or at least reduced scopes, and for good reasons.

However, what you can't do is magically take a mutable variable that is genuinely relevant throughout some algorithm that really is complicated enough to require 200 lines of code to describe it, say the current state of some fundamentally complicated state machine, and make that complexity go away just by factoring out smaller functions. You're still going to have to pass the variable or some equivalent around, and you're still going to need to mutate the state in lots of places. But now your logic for doing so is scattered across 50 4-line functions, and to establish the true behaviour you don't just have to audit one of them, you have to audit them all and all the code that connects them.

> It's a lot easier to confirm that a variable is never mutated across 4 lines than it is across 200.

It's even easier when the language you're using doesn't let you mutate or rebind in the first place. You mention Haskell codebases having short functions, but that isn't an automatic given. Haskell and other ML languages are quite friendly to long functions specifically because the language guarantees no mutation, among many other reasons.

In fact, the Elm project explicitly tells Elm developers to not worry about the length of functions because of this property. You can't accidentally mutate or change something and not notice, because you simply aren't allowed to do that.
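Python offers nothing like Elm's compile-time guarantee, but frozen dataclasses can approximate per-object immutability at runtime, as a sketch:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Header:
    index: int

h = Header(index=3)
# h.index = 4  would raise FrozenInstanceError: any mutation attempt
# on a frozen instance fails loudly instead of silently clobbering state.
```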

> To take a prime example, short functions are almost always better than long functions.

This is a popular claim in some circles, but I don't think it stands up to scrutiny.

As others have observed, while you may gain locally through having simpler individual functions, you may also lose globally because you now need to manage the complexity of how all those functions relate.

There seems to be very little evidence to support the argument that shorter functions are objectively better, and if anything the evidence we do have suggests that extremely short functions are objectively worse in terms of things like defect density.

I would rather argue that functions should do one thing well. This will tend to correlate with shorter functions rather than 100+ line behemoths, but the important thing is that a function represent a single, coherent idea. Maybe that idea can be expressed as some neat one-liner in a functional programming language. Maybe it's some 200+ line ordered sequence of conditions that systematically determines how to shift between states in a state machine that is modelling a fundamentally complicated situation.

Hot take: the "short functions are better" camp is really just accidentally arguing for pure functions and immutable data. If that 100 line function has 3 side effects and ends up broken into 10 smaller functions, you probably now have at least 7 pure functions.

But some other function will need to stitch all the small functions together, and you’re kind of back to square one, no?

Not really, since the dependencies between them are now explicit. Yes the stitching is its own complexity, but divide-and-conquer is a pretty time-honored approach to effective problem solving.
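A small Python sketch of that point (the helpers are hypothetical): after the split, the "stitching" function contains nothing but explicit data flow between pure pieces, so its complexity is easy to see at a glance.

```python
# Before: one long function that reads, transforms, and writes, with the
# transformations tangled into the side effects.
# After: the transformations are pure functions; only the caller that
# invokes process() ever needs to touch the outside world.

def normalize(line: str) -> str:
    """Pure: trims whitespace and lowercases."""
    return line.strip().lower()

def dedupe(lines: list[str]) -> list[str]:
    """Pure: drops duplicates while preserving order."""
    seen, out = set(), []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out

def process(raw: list[str]) -> list[str]:
    """The stitching function: explicit data flow, no hidden state."""
    return dedupe([normalize(line) for line in raw])
```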

> but divide-and-conquer is a pretty time-honored approach to effective problem solving.

Only if you can also divide and conquer the number of people who will be developing it, because where there was one name to learn and assign meaning to, there are now ten different names.

I am speaking as someone who used to have a hard limit on function length (20 lines, in Allman style :-) ) - this ended up being really harmful and creating way too complex code sometimes.

Of course it was harmful, you were focusing on the wrong thing. The point of having methods with no more than [insert number] lines isn't because having a method with more than [insert number] lines is inherently bad. It's because long methods are usually an indication that the method is breaking the single responsibility principle. I get the impression that a bunch of programmers read Robert Martin's work, put no thought into what he was saying about short methods, concluded that the line count was the problem, and began proliferating that idea.

To give you an example, when using Java's stream functionality, I tend to put every method call in the chain after stream() on a new line, for readability.

When I scan the method to see if it should be broken down, the fact that this code occupies 4 lines doesn't really contribute much to my decision to refactor the method into several methods, because those 4 lines are accomplishing one thing.

Function application is THE killer feature

Clean Code is easy to change by definition. What you are describing is not Clean Code, but something else, maybe consistent code? Or organized code? The boundaries are definitely wrong if it is hard to change.

I've had a few arguments with junior developers about clean code. They want to hyper-abstract everything because that's what they learned in school.

Then I point out that, if they just refactor to a single method that's 20 lines, they can read it from beginning to end and it makes sense.

Here's another tip for juniors: Avoid Golden Hammer syndrome.

Developers do deep work with a framework which ties them to a specific language and the accompanying ecosystem. So, what you see is the "not invented here" syndrome, and how already solved problems are tackled again in that specific ecosystem.

Many solstices ago, I was a PHP developer. If you'd asked me to split a 2.55 GB CSV file into several CSV files as a one-off request, I would have fiddled with a PHP script, because that was my bread and butter. Probably would have taken something like https://csv.thephpleague.com/ off the shelf to get the job done.

Well, no, you can do exactly that in a single line with tail, head, split and cat.

Since I moved away from PHP, I have come to find a new, deep respect for the already existing general purpose commands which are part of the Unix spec. And reflecting on my earlier work, I realize that I could have solved some challenges a lot quicker.

Now, this is not a stab at clean code or PHP. Both are valuable tools.

Rather, a big lesson for young developers is that the cleanest code out there... is the code you didn't write at all, as you leverage what's already there.

> Well, no, you can do exactly that in a single line with tail, head, split and cat.

You can't, because CSV records may span multiple lines, so dicing them with standard line-oriented tools may subtly break them. Unless you know your particular CSV data only has single-line records.

(I take your broader point though. It's just that CSV is a bad example, because you usually wind up needing specialized tools to deal with them. It's likely that your PHP script would actually be more correct!)
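For illustration, a record-aware split is about a dozen lines with Python's csv module, which understands quoted fields containing newlines (the chunk size and output file naming here are made up):

```python
import csv

def split_csv(src_path: str, rows_per_chunk: int = 100_000) -> list[str]:
    """Split a CSV by records, not raw lines, so quoted multi-line
    fields stay intact. Assumes a header row; repeats it in each chunk.
    Returns the names of the chunk files written."""
    written = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, idx = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                written.append(_write_chunk(src_path, idx, header, chunk))
                chunk, idx = [], idx + 1
        if chunk:
            written.append(_write_chunk(src_path, idx, header, chunk))
    return written

def _write_chunk(src_path, idx, header, rows):
    name = f"{src_path}.part{idx}.csv"
    with open(name, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
    return name
```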

Thanks for clarifying that overlooked detail.

I should have mentioned another lesson for junior developers: learn to understand the structure of your data and avoid making assumptions about formatting.

Thank you for xsv and other things, Andrew.

To GP: also - CSV headers. Still possible in one line of shell, but it is starting to get unwieldy.

I assume what you're arguing against is using small functions with descriptive names as documentation. If that's the case, I think the term "hyper-abstract" is a massive overstatement.

In any case, I think the core issue is really how much faith you have in the proficiency of your teammates. If you can't trust the developers in your team to actually only write functions that do what they say in the name (without nasty side-effects), then I can see why not having all the logic in front of you in one giant function can be annoying - you have to jump into all the definitions and manually double check to be sure what's happening.

When I read it, what I thought of as 'hyper-abstracted' is more the mistake of building a set of generic objects to then compose your functionality out of, rather than just composing your functionality out of direct function and method calls.

I've been thinking about this a lot lately actually. At some level, when you create a hierarchy of classes and/or a collection of objects, you're deciding that some logic is best implemented with that set of objects and the way they talk to each other as the fundamental building block. It's very much like designing a language; you're designing the lego bricks and then putting them together to make something. The hope is of course that you're going to be able to use those lego bricks in other situations by composing them in different ways.

Where I think we can go wrong is in getting stuck on the one abstraction type; particularly in this case it's falling into a mindset where the only way to compose software is by composing objects. Yet at some level, you're going to be composing your logic out of function calls (or what essentially look like them). You need to know at what level you're building and viewing the system. Sure, you can build a set of objects that you can then use to build some other logic that's implemented by their composition and collaboration, but sometimes maybe all you need is a method that implements this logic more directly, using function calls and their results. From far away, it's OOP with objects collaborating via messages; from up close, it's just procedural (or even functional) code with restrictions on the state it has access to.

This mistake I think is embodied in a sentiment I used to pick up on a lot at the tail end of my CS education (around the time Java was really starting to come into fashion). It was a sentiment I'd describe as something like "OOP has superseded procedural programming". To which I'd always want to ask the question "but what about inside your objects?"

>> rather than just composing your functionality out of direct function and method calls.

It's called imperative programming. Funny how rarely you see the word.

I think it's more about recognizing that structure is like cement in that once it's laid down, it holds things in place and can be a pain to pull up and change.

For this reason, it's better to prefer as little structure as you can reasonably get away with, not more. Now, this is a nuanced view: a senior person can absolutely insist on more structure up front and avoid issues, but they'll have a good reason for doing so that doesn't involve "it's clean code".

But in general, you want as little structure as you can get away with. It's a lot easier to change something that hasn't been abstracted to death with guesses about what the future is going to hold.

I think part of the issue is a perspective thing. If I spend a day throwing something together and it sits for 6 months, great. I got ROI from that code. If 6 months in new requirements come in and I decide I need to rewrite it, who cares. It worked day in and day out for 6 months off a days worth of effort.

Too many people view code as needing to be long lived and unchanging.

I totally agree. It's hard because management relentlessly pushes you to get things 'done', to move onto the next thing. That thing is done already, why do you want to touch it? Why is this thing going to take so long to do, didn't you do those other parts already? Sure, you can have some maintenance time later.... Maybe, I guess. No sorry this new feature is a higher priority. Also why aren't you quicker?

There's a lot of social pressure that pushes people to view code as a thing that's either done and permanent, or not done, unfinished and therefore unacceptable.

This sounds more like polymorphism vs switch statement

I also think that the whole code reuse thing is pushed too hard - leading to lot of over engineering as a result.

I don't think code reuse is itself a bad thing, but it can easily be overused and cause far more confusion than benefits.

Yes, I remember working on a product at a startup that had so much code reuse that simple flow become impossibly hard to understand, with dozens of conditions in each reusable block that you'd have a hard time making heads from tails. Code reuse can be abused badly.

Yep, wasn’t until I started working in large codebases that I realized that inheritance/don’t repeat yourself/etc. could easily become antipatterns if used too much

Inlined code also helps make performance howlers really obvious, e.g. repeated database fetches or code that touches more data than it needs to.

> they can read it from beginning to end and it makes sense.

I disagree. First, 20 lines in a single block means that any single line can interact with any other. The 19th line's behavior can be subtly influenced by what's on the first line. That means it's not just an issue of reading line by line; you have to keep every line in the entire code block in working memory simultaneously.

The limit of human working memory is about seven or eight chunks. When functions get longer than this, they take exponentially longer to understand. 100 lines of code broken into 25-line functions can take ten times longer to understand than 1000 lines of code broken into well-abstracted four-line functions.

> Juniors just thrash around from one design pattern to the next

I contend that one of the ways a junior developer becomes a senior developer is by learning what works and what doesn't through thrashing about in a codebase. They're being paid less than the senior folks, so don't worry about them thrashing about (and pair with them, using an open and inquisitive mindset, to help them think through the consequences of their changes).

You're right, and my comment probably sounded way more dismissive of juniors than I intended it to.

That's the problem -- the seniors should mentor them such that they don't have to thrash around. Unfortunately that type of skill is on the opposite end of the spectrum as wrangling computers.

Code reviews can help with that. If your experts are so lacking in people skills, you can establish ground rules for the reviews. This is actually a thing in some places.

Yes it can really help if structured right

Put another way, a more experienced developer will recognize which issues will actually make future changes slower, vs issues that are just engineering aesthetics. Ultimately the goal is to get to a great product quickly, and an experienced developer keeps sight of that.

For a junior to become a senior dev, one needs to gather experience. That means investing in an error culture and a review process, and maybe mentoring, pair programming, and scheduled time to learn, communicate, discuss, and play. These are deliberate long-term investments, based on trust, patience and growing relationships. Senior (and implicitly maintainable, extendable, clean) code doesn’t just happen out of nothing.

> Juniors just thrash around from one design pattern to the next, never really achieving anything other than learning what doesn't work.

I couldn't put this into words as well as you did. I know one of the turning points in realising I had become a better developer was learning from those who taught me how to identify design patterns and how to build up requirements.

Clean code is usually less important than a good structure. If you have bad developers or a lot of juniors, your processes should be in place to not let them go too far off course. This is what really improves development time and maintenance.

The messy code is usually localised to a small area, and if a refactor is required, it's much smaller.

Not to mention "clean code" suffers a lot from the "no true Scotsman" fallacy, and it usually means "code the code reviewer likes".

Not every situation requires the same level of verbosity or needs to be structured in the same way, for example.

From my experience, a good thing to do is to set up strong static code analysis (like eslint, TypeScript strict mode, IntelliJ inspections, Sonarqube, ...) when the project begins. This helps to avoid some endless discussions and some bugs. Another upside is that the rules are enforced automatically, which is better than reminding your colleagues each and every time they break the rules. With a set of rules written down as a config file for a code analysis tool, there is a definition of clean code (even if not a complete one).

Starting with overly strong rules and relaxing them later is better than starting with loose rules and tightening them later, because introducing a new rule requires much refactoring, whereas dropping a rule takes a few seconds.

Keeping code clean doesn't mean architectural perfection. It means architectural consistency. So, if your controllers talk directly to your database, great! Just make sure it's easy to do and doesn't involve too much boilerplate.

Clean code means architectural perfection where it matters. Should the controllers talk directly to a database or go through an abstraction? I can make either argument; which is right depends on your problem. If you plan to switch databases (for testing, because you don't trust your vendors, or because your customers demand different ones), then the abstraction is required. If everything else is locked into the 40-year-old database (which isn't normalized, because that concept was just an academic idea with no real-world use back then), then there is no way you will change anyway, so maybe you don't need it. Or maybe it is the other way around: good SQL "should be" portable to any database, while the non-normalized database needs the abstraction to ensure all the constraints are kept sane. I just put forth four different arguments for a simple choice, but before you decide which is right for you, you need to figure out whether there's some other consideration my arguments didn't cover.
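To make the abstraction side of that argument concrete, here's a minimal TypeScript sketch (all the names, like UserRepository, are invented for illustration, not from the article): the controller depends only on an interface, so switching databases means swapping one implementation class.

```typescript
interface User { id: number; name: string }

// The abstraction: controllers depend on this, never on a concrete database.
interface UserRepository {
  findById(id: number): User | undefined;
}

// One implementation; swapping databases means replacing only this class.
class InMemoryUserRepository implements UserRepository {
  private users = new Map<number, User>();
  save(user: User) { this.users.set(user.id, user); }
  findById(id: number) { return this.users.get(id); }
}

// The "controller" never knows which backend it is talking to.
function getUserName(repo: UserRepository, id: number): string {
  return repo.findById(id)?.name ?? "unknown";
}
```

The counterargument in the comment above still stands: if you'll never switch backends, this layer is pure overhead.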

There's also more than one way to carve up the architecture. I do think that we humans need slices in order to mentally compartmentalize and be able to work with manageable bits. But there's no reason I can think of to systematically prefer vertical slices or horizontal slices.

Horizontal slices give you something like the classic monolithic three-layer architecture, and vertical slices give you more of a component-based approach.

The big trick is to know which approach you're using, and stick to it. In a layered architecture, you absolutely do not skip layers, but it's generally fine to have one layer talk to pretty much any part of a neighboring layer. In a component-oriented architecture, it's really not a big deal IME to let controllers talk directly to the database, just so long as you never let one component directly access another component's schema. You get data out of other components by talking to their public interfaces.
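A tiny TypeScript sketch of that component rule (the names are invented): Billing never reaches into Accounts' private storage ("schema"), only through its public interface.

```typescript
class AccountsComponent {
  // This map is the component's private "schema"; nothing outside may touch it.
  private accounts = new Map<string, { email: string; active: boolean }>();

  register(id: string, email: string) {
    this.accounts.set(id, { email, active: true });
  }

  // The public interface other components are allowed to call.
  isActive(id: string): boolean {
    return this.accounts.get(id)?.active ?? false;
  }
}

class BillingComponent {
  constructor(private accounts: AccountsComponent) {}

  canCharge(accountId: string): boolean {
    // Goes through the public interface, not the other component's storage.
    return this.accounts.isActive(accountId);
  }
}
```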

As long as you color within those lines, things won't get too tangled. If you're really worried, I suppose you could try doing both vertical and horizontal splits, but my impression is that that always places the application at risk of collapsing under the weight of its own bureaucracy.

I generally think you have several levels of architecture. You might go vertical at the high level and then delegate each slice to more junior architects who do different things in their slice. Of course this depends on the size of the project too.

Yeah. Experience teaches you that "it depends" is the correct answer to a whole lot of questions. And it hopefully teaches you to have some wisdom for deciding which side of "it depends" is the best answer for your situation.

Exactly, a lot of people assume that any of those options aside from the strict layering is "unclean" code. Clean code is code that is easy to read, easy to modify and more or less DRY. The rest is dressing.

I used to think that clean code meant short, straightforward functions in an n-layer app with logical services (in general).

I can write decent code that is easily readable with that flow.

But ever since I integrated DDD and event-driven design for splitting up modules by their bounded context, the old code doesn't seem that clean anymore. And I love it. It caused a lot of refactoring and it is slower to develop, but aside from that, the code became way more flexible, future-proof and performant.

Even integrating with legacy codebases is clearer and more straightforward. I recommend every junior read "preserving domain integrity".

PS: You don't need four layers for every bounded context; just use them when appropriate (the smaller the logic, the less you split up), and if you need a quick intro then DDD Quickly is OK.

Can you recommend any open-source codebases that are designed around this style of architecture? I'd really like to see what a good example of this looks like.

Mmm, I would say eShopOnContainers (Microsoft, microservices) or Pacco (DevMentors).

Both are for microservices, but just replace microservice with module and it's the same thing, just with less devops overhead since it's one application.

Please take time to familiarize yourself with the concepts first.

Domain events (= broadcast important changes, e.g. PayPalPaymentConfirmed, CustomerEmailChanged). Tip: this is important for having a lovely coupled legacy integration (e.g. for an anti-corruption layer).

CQRS instead of inserts and edits in one IService/ILogic.

Handlers and MediatR (C# library).

Events for combining logic (something like NATS supports 5 million messages/second).

A correlationId and excellent logging should be your number one priority.

DDD is a better fit for a NoSQL backend than a SQL one. I prefer MartenDB, though, using it by default as NoSQL (BSON) with duplicated properties for performant searches, so I don't need a search service. NoSQL allows you to quickly store your bounded contexts.

Accepting duplicate data on different domains and handling it with replaying events.

Database per service (or at least per table); do not integrate the way you would before in SQL.

Clean architecture is a good fit ( domain.core, domain.infrastructure, domain.application and if you want a microservice, add domain.api )

Event Storming for discussing new projects with domain experts.

If you want to read about it, try DDD quickly, it's quick to read.
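For what it's worth, here's a minimal TypeScript sketch of the domain-event idea from the list above (the comment is C#-flavored; the tiny in-process dispatcher here is an invented stand-in for a real bus like NATS or MediatR, and the event carries the correlationId mentioned earlier):

```typescript
interface DomainEvent { readonly name: string }

// An event name taken from the comment above; the fields are invented.
class PayPalPaymentConfirmed implements DomainEvent {
  readonly name = "PayPalPaymentConfirmed";
  constructor(public orderId: string, public correlationId: string) {}
}

type Handler = (e: DomainEvent) => void;

// Toy dispatcher: registers handlers per event name and fans out publishes.
class Dispatcher {
  private handlers = new Map<string, Handler[]>();
  on(name: string, h: Handler) {
    this.handlers.set(name, [...(this.handlers.get(name) ?? []), h]);
  }
  publish(e: DomainEvent) {
    for (const h of this.handlers.get(e.name) ?? []) h(e);
  }
}
```

An anti-corruption layer would then be just another handler subscribed to the events it cares about, translating them into whatever the legacy system understands.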

There is one downside, and that is that there are a lot of patterns. But coming from a guy who hated the word (but applied the concepts): it makes it very quick to communicate with other teams and to discover their entire architecture within 10 minutes.

That's just my 2 cents though, since it's literally adapted to fit my needs.

Awesome, thanks for the recommendations.

You're welcome. Minor typo though:

lovely coupled legacy integration, had to be loosely coupled :)

This article sounds much more clever than it actually is. Admittedly it does take time to actually know how to write clean code, but if you can, I'd argue it takes less time to write clean code than anything else. Especially if you have a great senior engineer who can architect all of the inheritance and fan out the implementation of the adapters to more junior people.

The problem with the word "clean" is that it is a very ambiguous term.

It can refer to indentation, identifiers, coherent types, cyclomatic complexity, encapsulation, coupling, modularity, configurability...

In a sense these pertain differently to clean code only by degree. Poor indentation is some dust on the six windowsills in your home; small issue, easy to address. Poor coupling for a large, long lived project is guano all over the carpet.

The ideal clean code of course has proper indentation, self-explaining identifiers, coherent types, minimal cyclomatic complexity, perfect encapsulation, perfect coupling, perfect level of modularity and perfect level of configurability. To which degree we want to achieve the ideal and to which degree these individually factor into it varies from project to project.

It probably means all of the above. You should be able to read the code and understand how everything works.

True. On the flip side I think "clean code" is supposed to imply a particular type of coding popularized by people like Uncle Bob. When people say clean code, I don't think of tabs vs spaces etc. I think "Oh this guy codes to Bob Martin's spec".

I don't feel this article brings too much new information to the clean code discussion table. As always, the advice is to strike a balance according to your best judgement.

The summary says it well:

> don’t over-invest (financially and technically) but also don’t try and outrun the liabilities

I think the author mistakes messy code for technical debt. Even with clean code, you will have technical debt, because as the business evolves your code becomes stale. For instance, your UI might start out simple, so you start with just jQuery. The code is clean because you didn't write crazy long methods, etc. The business keeps adding features to the page, and after a while jQuery becomes unmanageable. Now you have technical debt.

> Even with clean code, you will have technical debt because as the business evolves, your code becomes stale.

Suppose the ideal clean code base. There is of course no stale code there. In the event of new business requirements, it has been reconsidered carefully and rewritten to cover the new requirements in a way that's architecturally consistent. If that was impossible with some previous iteration of the architecture, the overall architecture has also been reconsidered and changed.

For your jQuery example, with unlimited resources available to address code smell, you'd of course either replace it with some existing higher level library that suits your use case, or you'd write your own high-level generalizations. I have seen a lot of code that was terrible not because they had unwieldy business requirements, but frankly because the code had poorly considered abstractions and unclear boundaries. Features were added without consideration for the architecture. Quick and dirty fixes were considered preferable to overhauls. Short term workarounds ended up being long term don't-touch-this-we're-not-sure-what-it's-doings. What worked at some thousand LOC became the guiding principle for further additions.

The real problem is that rewriting or cleaning up can become unbearable for some combinations of organization and code base. It takes time and can become an organizational hurdle if people are simultaneously implementing new features or fixing bugs. Every time a developer says "this will take several weeks to implement" and the project owner successfully insists that it can't, the code base gets dirtier. In the real world some PPM of fecal matter is acceptable because getting the last few particles out is going to increase the cost by an order of magnitude.


I think people conflate clean code (readability) with code refactoring. For the sake of readability I don't think having clean code matters much... but refactoring I think can be important depending on the complexity you're dealing with. But even then, I'd only worry about refactoring when it comes time to worry about it. Usually I find that's when the scope of the software grows by some factor, or when you foresee it growing by some factor and you find it cumbersome to add simple features, yet a simple refactoring would make the addition of features much easier.

So IMO, the importance of "clean code" depends on the complexity you're dealing with. Always having clean code just for its own sake seems unnecessary...

Maybe you experience that conflation of concepts because other people have a different idea of what clean code means. IMO readability is just a side effect of clean code, and refactoring is part of the process of achieving it.

I think a lot of people forget the last part of "ship the first thing that works": it has to "work" for some reasonable expectation of fitness for purpose.

Our short iteration times in software are the envy of many engineering professions but it's the fitness for purpose part that nips some of us.

I don't necessarily believe Clean Code is the silver bullet for avoiding difficult-to-maintain code, but some kind of standard is important. There are principles and patterns of design that are known to be effective, and they can be useful in steering a project through its tactical phases away from cascading poor short-term choices that lead to long-term technical debt.

The economics should include many more factors. There's a time and a place where a correct-by-construction approach costs too much and the risk of introducing errors or incorrect behaviors is tolerable to the people funding the project. In such cases it's time to market we care about, and nobody is going to lose much sleep if you lose 0.0005% of your SLOs over a 3-month window.

> But I’m sure you would agree that before you can have a maintainable product, you should first strive to have an actual product.

Actually, no. The goal is to have a sufficiently profitable (or for non-profit use cases, worthwhile) project.

It is entirely possible, easy even, to ship a product and gain revenues, but in an unsustainable way that is a net negative to stakeholders, users, or other relevant parties.

In the case of "unclean" code, it might not be as bad as ignoring significant security considerations, but it could sink entire business plans, causing negative side effects for people you are notionally supposed to be looking out for: teammates, partners, collaborators, etc. In a for profit business, that often involves wasting investment in the process.

The problem that I observe at my work place is that the spaghetti has moved a level up. The "code" which sits inside individual microservices is mostly fine. However the topology of how those interact is a convoluted, intractable mess.

That's just taking the trash from under the sofa and hiding it under the carpet. That happens quite a bit

One of the things that has almost disappeared from these kinds of discussions, is pedantry and feuding over formatting and naming styles. The ubiquity of automatic linters has saved millions of hours of useless bikeshedding.

I’m not sure I buy the “clean code” dogma anymore. If the purpose is to keep the code clean for the sake of the language, I think you have missed the point. The code should be clean with regard to what it is trying to solve, not with regard to the language or style you have chosen. Code where it's obvious what business problem it solves, no matter how messed up that case is, is a win.

I think it has Conway’s law written all over it, and as soon as you try to fight that, you have lost the battle.

Keeping code clean for its own sake is wrong, but pretty.

The PHP community is a funny example here: basically, what happens when you bully coders into believing their language sucks. They went for full-blown corporate-style clean code rules (slow) in a language used in places where speed is crucial. The language is currently completely suboptimal for business if you apply clean code rules.

I've worked at 2 companies that used PHP, where the PHP programmers formed a "clean code religion". One company died, the other lost a lot of money. It was directly connected to the clean code rules: they went over the deadline, by a lot, and they created bloated code (but OK by the rules) that was a pain in the ass to work on. "Code is documentation" was repeated like a mantra, which is a pile of bullshit by the way and does not make anything easier (I ended up writing my own docs anyway). Clean code rules are cool if you don't know what you're doing and you want to hide it. They're a way to say "it's not my fault; look, I followed the clean code rules, my work is flawless".

And the coders in those companies didn't even notice that something was wrong. I noticed when it was too late. At the second company, on my last day (half the company was sacked because the client got pissed and cut the money), we were in a restaurant eating and talking about programming. The guys who weren't sacked were discussing a new web page they were working on. The main frontend guy said he had spent 2 weeks perfecting a dynamic menu because he had problems with loading time. He was rendering the menu in JS, on the frontend, record by record. I asked him, "if rendering the dynamic menu is such a problem, why don't you prerender it as text and just load that text when the page loads, so you don't need to render anything?" He replied that this would be against the rules and the code would have to use a hack or something. I don't know any fucking rule that wouldn't allow me to do such a thing. I recognize the eyes of a fanatic when I see them, so I shut up. They solved the menu problem by showing users a loading gif while the code waited for the menu to render.

After that day I decided I won't do any PHP gigs anymore, nor work with anything similar in syntax to PHP.

It's not just clean code; it's also things like microservice migrations or complete rewrites (preferably in a trendy framework or language) that make programmers think they finally have the solution to all their problems (not those of the business, mind you). So they spend a bunch of time infrastructuring themselves down a hole they'll never escape, and bring the company down with them, because they're "doing the right thing" or "fighting tech debt" or "breaking up the monolith".

Programmers should not have the responsibility to decide these things on their own; it should come from the company's risk assessments and serve the goal of the company, not the goal of the programmers (which is to put neat things on their CV).

This is spoken as a programmer, by the way.

I haven't had that problem, but I do see a tendency for devs entering the mid-level hump to get too big for their boots and want to clamp down on those below/around them doing things the "wrong way" (i.e. the way that isn't the way they like).

Normally you need a senior dev with a sense of pragmatism above them all to smack down any attempts to straitjacket the coding below/around them.

I use PHP a lot, simply because there's so much of it out there (lots of work), and one of the first things I often do when taking over a project from another vendor is rip out any "no space at the end of the line" linting garbage and the like. As always in life, it's not about swinging to one side (vomit code) or the other (anal style guides), but achieving balance.

As someone who came after the whole php craze, could you describe what some of these practices were?

Part of the problem is that nobody agrees what "clean code" looks like. The only objective definition is "code that somebody else can understand as easily as if they had written it themselves". Which never happens.

Follow good rules of thumb like keeping methods short, reuse code rather than cutting and pasting, and write unit tests (and, of course, use spaces and not tabs). Otherwise, all code is as clean as all other code.

> code that somebody else can understand as easily as if they had written it themselves

Nah, that's not it. You can write crappy code that is as hard for others to understand as it is for you - in fact, that's what happens more often than not. It's not that you understand it but nobody else does - it's that in a month, nobody understands it.

The objective definition that I like is "code with no incidental complexity", where "incidental complexity" is as defined by Rich Hickey (see "Simple Made Easy").

> Follow good rules of thumb

Those are not necessarily great rules of thumb, though. There can be too many unit tests, or bad unit tests. And it may be more appropriate to test something at the component level, or the integration level. The rule of thumb for a good test is that it should test the business logic as opposed to the current implementation. It should also be deterministic: e.g. don't test with random data unless you can save the inputs that caused the failure and reproduce it later; if you can't, and just add a random amount of delay in your tests, it's not a great test.
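A quick TypeScript sketch of the reproducible-random-test advice (the tiny LCG PRNG and all names are invented for illustration, not production-grade): derive every random input from a seed and report the seed on failure, so the failing case can be replayed exactly.

```typescript
// Seeded PRNG (a linear congruential generator) so random data is replayable.
function makeRng(seed: number) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// A randomized test that is still deterministic per seed.
function randomizedTest(seed: number): number {
  const rand = makeRng(seed);
  const input = Math.floor(rand() * 1000);
  const doubled = input * 2; // trivial stand-in for the system under test
  if (doubled !== input + input) {
    // Embedding the seed in the message makes the failure reproducible.
    throw new Error(`failed for seed=${seed}, input=${input}`);
  }
  return input;
}
```

Rerunning with the logged seed regenerates the exact same inputs, which is the whole point.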

> Follow good rules of thumb like keeping methods short, reuse code rather than cutting and pasting, and write unit tests (and, of course, use spaces and not tabs). Otherwise, all code is as clean as all other code.

You can find people who'll disagree with you on every point in this list. So yeah, nobody agrees what "clean code" looks like.

FWIW, at least according to Hillel Wayne's summaries of the state of our knowledge on this stuff (www.hillelwayne.com), all three of those have been studied, and the only one that's been shown to be clearly beneficial is unit tests.

(Not TDD, mind. It apparently doesn't matter much whether you write tests first or second, just that you write them at all.)

I agree with everything in the list. I have also seen clean code that violates every item in the list. I also have seen bad code that follows every item in the list.

I agree. I tend to find myself assuming everything not "clean" is technical debt; putting things in these three buckets has helped (YMMV).

1. If you don't have users, you don't have technical debt (you might have lots of #2, though). You could frag the entire app and the only thing of value (not potential value) lost would be your job. Debt implies value was quickly received in exchange for a flexible repayment schedule (of time in this case, not money); if the code is not actively being used to save/spend/gain some resource, then that value exchange has not happened yet.

2. Correct Code* that is unclean and could be improved in <= time that it took to write is just bad code. Fix this code now.

3. Correct Code that is unclean and would take significantly more time to improve than it took to write is technical debt (but only if you have users). This is where a balance needs to be struck between business needs and development needs. Manage this code carefully.

*code that correctly implements business logic, incorrect code is always bad - no matter how long it took to write.

I mean, code less. I'm fine landing in 3k lines of garbage that makes perfect use of some great libraries and platform features. Much better that vs taking ownership of 30k lines of great code that duplicates a lot of stuff that should be imported.

Of course, more likely case is that bad code + bad factoring comes together

The underlying problem is fundamentally a bargaining and expectations issue. Sorry for the plug, but you should check out my post about time estimates: http://kyleprifogle.com/dear-startup/

> But I’m sure you would agree that before you can have a maintainable product, you should first strive to have an actual product.

The assertion that proponents of clean code make is that the fastest way to the end goal (in the ~6 week window) is with high internal quality[1]. If your project is <6 weeks, then do whatever. If you think you'll be around longer than 6 weeks, then make it clean.

[1]: https://martinfowler.com/articles/is-quality-worth-cost.html --> Not Robert C. Martin, btw.

I have a Malcolm in the Middle take on it... (senior dev)

There is business reality - the parents (they expect miracles, but don't put enough effort into pushing Malcolm, because he is smart).

There is developers' reality - Reese (they think everyone else is stupid, but not them).

There is testers' reality - Dewey (they trust developers too much).

There is customers' reality - Francis (business had high hopes for them, but it turns out they were wrong).

How does this relate to clean code? In reality there is no clean code the way we would like there to be. There is only messy reality. Uncle Bob is a salesman selling dreams and promises.

"In reality there is no clean code as we would like there to be"

Sure there is. You want clean code, here you go:


Fancy title, with content everyone knows. Meh!

Simple trade-off: the more "clean" your code is, the more it will benefit when others/future-you reads it. If it's a quick one-off or if it's just for you, the value of "cleanliness" is far less than the value of speediness and responsiveness to client.

Has there been a post-2000 company that has succeeded with no one complaining about tech debt?

It's likely wrong to apply a template rule to every single project. For a small project, no, this would not necessarily apply. An experienced lead knows when to apply what, and doesn't just go by single principles.

Cool metaphor of “well balanced and nutritional dinner” vs “fast food.”

"We are in too much of a hurry to do a good job."
