Code doesn’t have to be a mess (danielsieger.com)
184 points by dsieger on July 25, 2022 | 182 comments

In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code.

People abstract before an abstraction is necessary.

I find single file dense leetcode style code easier to understand and follow the flow. Algorithmic code I can reason around. A large mature codebase is far harder to get to know.

One of the first things I do when I study a new codebase is find all the entry points and follow the flow of code from beginning to the thing I am interested in.

One person's beauty is another person's mess.

It's harder to change an existing codebase than to write a simple program that does the new thing but not in the context of the original program. A reference implementation of the various components is far easier to understand than one big ball of mud. Fitting problems together is hard. You need to understand the old thing before you can introduce the new thing and it ends up being forced or hacked in if the design doesn't support the new thing.

I tend to write reference implementations of everything, then combine them together as a separate project.

I find an empty file far more reassuring than a large codebase.

Maybe I'm weird but a lot of my refactoring actually concretizes overly abstract code. It's easier to think about adding functionality to a block of code when you acknowledge that at the moment it only does 2 things, rather than using obscure wishy-washy language that implies it could do a dozen things.

Where I'm definitely weird is that I have a higher verbal score than your typical developer, and I'm not afraid to use a thesaurus to find a better word for something. Too often we end up recycling jargon in situations where they are not quite doing the same thing but nobody could be arsed to open thesaurus.com and find a word that telegraphs, "B is like A but is not actually A."

I think I'd like working with you.

Overly generic code is often pre-emptive, and most times the day never comes that you need that flexibility. And often when you do, you discover you need flexibility along a different axis anyway.

And most of what we do is story telling: what were the requirements we understood? What is our model to solve it? What are some precise examples that show it working in different capacities? When people treat the code as simply "the thing the computer interprets", instead of "the thing the next person has to comprehend", you get this inevitable slide into incomprehensible code.

Unfortunately our profession is obsessed with outdated approaches to (premature) performance optimization, and an addiction to being "clever".

I think I'd much rather surround myself with driven folks that have empathy and a strong desire to be understood. That's a long way from the programmer stereotype I'm familiar with.

> It's easier to think about adding functionality to a block of code when you acknowledge that at the moment it only does 2 things, rather than using obscure wishy-washy language that implies it could do a dozen things.

This is why Sum Types are so great. They give you an option between "concrete type" and "any type that implements this interface": "one of these specially enumerated types".
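
As a rough illustration of that middle option, here is a minimal Python sketch. The `Card`, `BankTransfer`, and `describe` names are made up for the example and do not come from the thread. Languages with native sum types (Rust, Haskell, OCaml) check that every variant is handled at compile time; in plain Python that check is left to optional tooling, so treat this as an approximation of the idea.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Card:
    last_four: str


@dataclass(frozen=True)
class BankTransfer:
    iban: str


# The "sum type": a payment method is exactly one of these two variants,
# not "any object that happens to implement some interface".
PaymentMethod = Union[Card, BankTransfer]


def describe(method: PaymentMethod) -> str:
    # Each variant is handled explicitly, so the code admits it only does 2 things.
    if isinstance(method, Card):
        return f"card ending in {method.last_four}"
    if isinstance(method, BankTransfer):
        return f"bank transfer from {method.iban}"
    raise TypeError(f"unsupported payment method: {method!r}")
```

Adding a third variant means touching `PaymentMethod` and `describe` in plain sight, rather than hoping every implementer of an open interface behaves.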

Yep, I do similarly.

More tightly bound code is often easier to understand and mechanically modify later - there are fewer places where you lose "if it compiles, it works" guarantees.

I feel like a lot of people are blindly pulling coding habits from libraries, and applying them everywhere. Libraries and applications (i.e. "terminal" products not used as a library by someone else) have different needs and different goals - don't write your application like a library, it'll be a huge pain.

I had to look up "telegraph" to confirm I'd understood you correctly. I don't recall seeing it used in the context of describing language, so I doubted my interpretation. I've heard it used most in discussions of boxing: a boxer's posture or their sequence of muscle activations _telegraph_ their planned attack such that their opponent has time to block or counter.

I like the way you used it. I'll try to use that myself.

It means communicating something quite clearly but by indirect means (and possibly inadvertently).

I've had this argument in code reviews a couple of times:

Them (on the subject of 4 new methods): hey why did you make method 3 look "weird"? You should make it look like the other three.

Me: because one of these methods can set the building on fire, and the rest can't. I bet you can guess which one is the dangerous one. Works as expected.

> People abstract before an abstraction is necessary.

This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

> ... not all refactorings improve the code.

While true, I have a low tolerance for code that requires constant bug fixing, or is so overly complex that the thought of modifying it makes you want to cry. Some projects truly require that level of complexity. But in my experience many do not, and once you've gained a solid understanding of the problems it is trying to solve, incremental refactoring is a fantastic way to improve the code's stability and maintainability. This is especially true in C++.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

Mastery is writing code that does not impose unwarranted limitations from the start, while still keeping it readable and containing only the mandatory complexity.

Usually this can be achieved through deep understanding of the problem, mapping to simple concepts or finding or making that one concept that captures things well.

It can't always be done. A masterful solution that keeps complexity low can't always be found. However, it is definitely a mistake to draw a black and white picture of "if you want to make it work for the future, you must add complexity". Often people simply choose bad or wrong abstractions and only realize it when the future has become the present and the system they built cannot fulfill some requirement.

> Often people simply choose bad abstractions or wrong ones

When writing software, ideally I'd like to make all the right choices and use simple implementations of abstractions that do not impose unwarranted limitations.

I think that it's sometimes worth it, early on, to do things the quick way despite bad abstractions. This can get you to a place where it's easier to reason about good abstractions.

Sadly, I've been on teams where a bad abstraction was adopted because it was just assumed that that would be quicker. Instead of doing it the quick way, we just did it the bad way.

We write abstractions to tame complexity. But abstractions themselves are inherently a form of complexity. A good abstraction may be simple to use, understand, and extend, but it can also make it harder to understand or debug issues because underlying data/state has been obscured. I agree that making anything "black and white" is a mistake. I merely said what I said because in my experience, most developers tend to abstract things that don't benefit anyone, and add unnecessary complexity. Besides, if you're writing an abstraction for a future hypothetical problem, chances are that you don't have enough information to create a good abstraction, and you're going to have to redo it anyway.

> Often people simply choose bad abstractions or wrong ones...

This is me. They tend to lead me to good abstractions (after merciless refactoring), and I'd like to think that I'm sucking less at this over time. But my overall process is very slow (good thing I'm self employed). Understood that it'd be better to stop and think instead of diving into new-abstraction boilerplate work.

Worse though is to be under heavy pressure to ship and move on — with the bad abstractions getting hopelessly calcified / buried.

>> People abstract before an abstraction is necessary.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

I worked with a guy who did that. He had a plan for what the project would look like 5 years down the road, and he built abstractions to support that. He could get away with it because he could hold it all in his head and it all made sense to him. When version 1 was half finished he was called away to work on another project, and those of us who followed in his wake struggled to make any sense of what he left behind. A year later he was laid off. The project was a success, but nobody ever asked for version 2.

>> People abstract before an abstraction is necessary.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

No, the ecosystem the code exists in and my ability to reason about the codebase is worth way more than any gain that comes from blindly stacking "simplest solution for problem a, b, .. z" atop one another without regard for higher level understanding of a codebase.

I was not implying a lack of understanding of the ecosystem/codebase or to blindly stack "simplest solutions" on top of each other. I was stating that one should not add more complexity than is necessary for the problem at hand. Often people solve a simple problem with a complex solution due to some hypothetical future problem they've conceived in their mind. Most of the time these hypothetical problems never materialize, and they are now stuck with code that is much more complex than it needed to be.

> One person's beauty is another person's mess.

This is so true. Also I believe that when the original author wrote the code he had a (hopefully) clear vision of the solution. He wrote it as tidy and fitting to the problem as he saw it. Then sometime later someone else comes in and is supposed to alter the code in a way which does not fit the original author’s idea of the problem. This creates a mismatch. The new guy can’t and won’t change the code too much, as it is too risky or too much work, and therefore will only do as little as possible to make his change. Then someone else comes along and makes some more changes, which again aren’t enough, and so on. Et voilà, you have a ball of mud that screams for a rewrite.

Spot on. You've described Peter Naur's "Programming as Theory Building" paper exactly.

I try to encourage newcomers to refactor the code into a form they understand, fix the problem and then undo that refactoring as much as possible. If they actually come up with a better abstraction I'm up for it.

Refactoring will give them the chance to see what the actually moving parts of code are.

I like your use of the words moving parts. Eventually code ends up looping over memory locations, copies, moves, adds, subtracts, multiplies, divides, reads, writes data or memory locations.

All the files and code on the way to making this happen, such as classes, parameters, arguments, variables, functions, methods, closures and objects, are ideas of the language's compiler to abstract the instruction stream.

Command line arguments, class constructors, URL query parameters, marshalling, JSON field names, method parameters, function arguments, HTTP headers, cookies, request objects, events are just complicated variations of passing data in the right shape. They are not the above list of "moving parts" or computation that is easy. In other words modern coding is just configuration.

I feel the complexity of modern code is a problem we created. And I feel there's something missing. It's hard to update code.

When I find the loop that does the thing, I feel I can understand the codebase such as the magic +1, -1 or the relationship of objects linked together in a data structure or the assignment to a list or array or variable.

"How does that get to here"

> I find single file dense leetcode style code easier to understand and follow the flow. Algorithmic code I can reason around. A large mature codebase is far harder to get to know.

I genuinely can't tell if you're being serious or not. If you are, do you also like to read books written as one giant chapter? Or entire chapters as one giant paragraph?

I have trouble with you equating "leetcode style" with "one giant chapter" and also with you equating "enterprise code" with one chapter following another, because when I read enterprise code it's

1. read one line of first chapter,

2. then skip to the last sentence of the middle chapter,

3. then realize the first chapter was actually the penultimate,

4. then read the fourth sentence of the first paragraph of the second chapter, bearing in mind what you have learned,

5. then throw your hands up in the air in despair

Jumping around is the nature of code in general, whether it's in a 1000-line file or split up amongst multiple files.

For a maintenance programmer, they may already understand how everything works. They aren't following a particular code path, necessarily. Maybe they're working on a new feature and they need to re-familiarize themselves with previous chapters. In that case, it's nice to jump to a file that concerns itself with things grouped together.

>> things grouped nicely together

fixed it for ya :)

I'm a maintenance programmer. Even working layers of layers of layers above the actual shit does not make it not stink.

>> jumping around

...should be intuitive and joyful, not a disaster to your brain.

EDIT: I am a fan, though, of SOC. I guess enterprisey code tries to be that (but fails hard at it).

Okay, great. You are a maintenance programmer. You have a 1000-line program all in one file. You need to update the e-mail functionality of this program. You aren't following a stack trace or following a particular code path. There are 15 or 16 different e-mail related functions. How do you find the e-mail function you need to update? Do you memorize line numbers? Use regex search? Do you have vim marks setup?

>> There are 15 or 16 different e-mail related functions

Is this you showboating the greats of layered code?

Depth is not width and width is not depth, but surely you see the difference in the two?

If you have 15 or 16 different e-mail related functions spread across a whole bunch of files, how do you find the one you need to update?

At one point in time I used OpenGrok to try to understand large projects.

Without documentation I find large projects difficult to understand. There's literally too many global symbols and I cannot see the forest for the trees.

What's the model of this program? What are the core principles that the author is using? Do I really need to read every file to understand what is going on?

With Leetcode style programs there's one file with everything in it and I can usually find the entry point. The problem is well defined.

I can see the moving parts in a Leetcode style problem. The looping, the data structure creation and control flow, arrays and recursion.

Large mature codebases such as Java projects have thousands to millions of small files and packages, and it can be difficult to see how things fit together. Every file seems to be 10-20 lines long.

I like C projects as they have lots of code in one file. Everything I need to understand a module is in one file and I can use vim folding.

I’ve been programming for nearly 30 years and I feel the same way. 1000loc+ files are increasingly common in my code bases. I find it really hard to read code with lots of tiny files that individually don’t do anything. It’s like the programmer is embarrassed by their code so they’re making me search for the core logic.

Splitting code into multiple files really only makes sense to me when there’s a clear division of responsibilities. That can mean a lot of things - like client / server, utility methods / core algorithm or class A / class B. But plenty of complex data structures are much easier to read and understand all at once. For example, I have a rust rope library which implements a skip list of gap buffers. The skip list is one (big) file. The gap buffer is another file. Easy.

Same. Well, 1000 is an outlier but 300-600 feels right. I sometimes feel bad for doing it, because it's not what some other people might consider good code to look like. I occasionally have to do code review or help fix a bug in the other kind of codebase and even the devs often don't seem to understand how those thousand 15-line files all fit together.

> Without documentation I find large projects difficult to understand. There's literally too many global symbols and I cannot see the forest for the trees.

Good abstractions let you see the shape of the forest.

Most "good code" or "simple code" is an inedible potluck of "simplest solution for the problem at the time" with some documentation.

Actual good code has intention revealing abstractions that communicate the essence of the problem and problem domain.

back when i inherited a legacy C codebase i found https://www.jgrasp.org/ pretty useful to explore large files.

I could see this working with vim folding. That's an interesting approach. I never really used folding in IDEs.

The book analogy you use isn't very accurate. Even if you merge chapters and paragraphs like that, you still read it sequentially. Just in a less comfortable way.

Which is not at all like a modern codebase that is modular, abstracted, etc. If you're new to a codebase, and want to understand one particular feature, you'd likely need to jump back and forth across 10 files.

It's not far-fetched to say that makes it difficult to understand.

The thing is that if you just need to understand a specific part of something you will need to jump as well, even if everything you needed for that one thing is written sequentially in one file. You will want to skip over implementation details of certain things to get the general picture first on a more abstract level.

Let's say you have a simple endpoint that takes a list of comma separated inputs, parses them as numbers and spits back a sorted version of that.

I don't want to see a version of that, which a compiler might have inlined. Including the implementation of the sorting algorithm. I only want to see a high level of abstraction version of it. Basically just something like (pseudo code in a non existent language) :

    fun endpoint(input):
      inputs[] = split(input, ',')
      numbers[] = parseAsIntegers(inputs)
      return quicksort(numbers)

I can easily understand what this does and what the idea is behind this "algorithm" in 3 lines. If I had the "inlined" version of this I would have to manually identify each of these parts and potentially skip over tens to hundreds of lines.
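
For comparison, the pseudocode above maps almost one-to-one onto a real language. A minimal Python sketch, where `parse_as_integers` is a hypothetical helper and the built-in `sorted` stands in for `quicksort` (it is not actually a quicksort):

```python
from typing import List


def parse_as_integers(items: List[str]) -> List[int]:
    # Hypothetical helper: converts each comma-separated token to an int.
    # int() raises ValueError on anything that is not an integer literal.
    return [int(item.strip()) for item in items]


def endpoint(raw: str) -> List[int]:
    inputs = raw.split(",")
    numbers = parse_as_integers(inputs)
    # sorted() stands in for the quicksort call in the pseudocode.
    return sorted(numbers)
```

The high-level function still reads as three steps, and the details stay behind names that a reader can choose to trust or inspect.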

I think this is really a bit about trust. Do you trust that these named functions I am calling do what their names say? Does quicksort actually do a quicksort or has someone implemented bubble sort in there? Of course this is a minimal example and especially quicksort would probably just be a library but imagine all of these were large complex pieces of our code base.

Personally I am an advocate for using functions (methods or whatever your language calls them etc) and naming them properly and then trusting those names by default. I want to spend time making this nice and understandable and abstracted once when writing. Not every time someone reads it. As soon as something does not seem to behave in the way the name suggests I will then and only then go check the actual implementation and for example find out that parseAsIntegers actually also supports floats and quicksort is not actually quicksort but bubblesort and that is why this endpoint was slow etc.

This would be the “simple” code. The “abstract” code would be more like this:

  public class EndpointManager {
    private EndpointInputManager eim;
    private StringSplitter splitter;
    private NumberParser parser;
    private Sorter sorter;
    public EndpointManager(EndpointInput input) {
      eim = new EndpointInputManagerFactory().setInput(input).build();
      splitter = new StringSplitterFactory().setDelimiter(new Delimiter(",")).build();
      parser = new NumberParserFactory().setFormat(NumberParserFormat.INTEGER).setMode(NumberParserMode.LIST).build();
      sorter = new SorterFactory().setSortOrder(SortOrder.ASCENDING).setAlgorithm(SortingAlgorithm.QUICK_SORT).build();
    }
    public EndpointOutput endpoint() throws ParseException {
      return new EndpointOutputFactory().setOutput(sorter.getSortedList()).build();
    }
  }

My guess is this is Java or something close? We can make that much more readable. We may have to do away with bad libraries. A lot more could actually be magicked away, which can also be a problem sometimes. In a "real" application this resource's interface would probably not just take a comma separated string in a body but accept a proper JSON object or somesuch and not just be a "sort endpoint" but it's not going to be much different from this if written properly. I happen to like the few annotations you'll see me use here. I also think that something as simple as a line break can unclog things. Also the choice of having each of the stream operations in a separate line is deliberate for readability. There are linter/auto formatting rules to enforce this (we do this at my current place for example).

    public class SortResource {

        public List sort(@Body String input) {
            return Arrays.stream(input.split(","))

I do recognize the kind of code you pasted. Had to work in code bases like that for way too long. Never want to work in one of those again. There's probably lots of EJBs and other such nonsense around that?

I was going to make the predictable joke that you need a factory. Disappointing that you took care of it already.

You can easily understand what it does because you picked an example that is easy to understand :)

This discussion often ends up in the extremes: inline everything versus abstract everything. I don't think anybody reasonable would opt for either of those extremes, we should focus on the very large middle ground where there's a lot of subjectivity.

Trust me on this, I've been raised on the DRY dogma and all related architectural patterns in favor of abstraction. I've lived the life, for 2 decades. But I cannot ignore the outcomes. Most codebases are extremely difficult to understand and it's very painful to change things. As 90% of all software development is maintenance, that's a planetary-sized problem.

This doesn't mean you should inline everything, it means sane choices. As a simple example, say you're using a literal in your code:

if (orderAmount > 100000)

This code is incredibly easy to read. Common convention says to put this in a constant, at the top of the file. Old me agrees, new me does not. For as long as it's the only occurrence of the value, it doesn't need abstraction. The only thing it would do is make the code more difficult to understand. The very eager abstracter might even put that constant in a separate file.

The point of this example is to abstract based on real reuse, not imagined reuse. I'm not against abstraction only against unnecessary abstraction.

A second example. Say you have a reusable UI component. A change request comes in that's pretty large and specific for one niche need. It kind of goes against the spirit of the initial purpose of the component but is still related enough to consider it in scope of the component.

Old me might add a "toggle" to the component, after which it can render in two modes. Sometimes called a "god component". This approach sucks. It makes the component much more complicated and changing and testing it becomes a nightmare.

Even older me would break down the component into smaller components and then "compose" them based on its mode. This is even worse, now you have to jump around many places whilst the sub components are never actually reused (pointless abstraction).

New me says fuck it and splits the component in two. Allowing significant code duplication between both components. It's not as radical as it sounds, it's in fact incredibly comforting. Each component can easily be understood (less complexity) and making changes becomes far less stressful as your blast radius is tiny.

Developers spend the vast majority of their time not coding, instead figuring out how something works and how to make a change that doesn't break anything.

It may not have been evident from my simple example but I do agree with your "middle" approach. That's where I try to end up in our code base. Endless interfaces, methods that are only one line long and such are counter productive. But nobody can tell me that inlining quicksort will ever be useful outside of a place where your compiler can't do it for you and you need to favour execution speed over everything else. I don't believe such places really exist much if at all any longer.

What I do have to question is the strict non-use of constants. It can be very very useful to use constants for such things, e.g. if you are calling libraries that do not make it apparent what is what. Say you have something that takes a timeout value.

    send(data, 10)

What is this? I have to know what send is, what parameters it takes etc. I might have to look that up. I can easily work around that with a constant.

    const timeoutInMillis = 10
    send(data, timeoutInMillis)

The same principle can easily apply for other similar situations. I really like it for things like

    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, 64800000)

What is that and what does that value even mean in human-readable terms? Of course some of these values you will recognize if used enough but so far my domains have been sparse enough that I don't recognize all of them and have to compute. Much better (with shorter, real names anyway but ya know, we're dealing in simple examples here :) ):

    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS)

So far most people I've talked to find that it's much easier to recognize that this has a timeout of 18 hours but the method happens to want milliseconds.
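
One way to make the "18 hours, expressed in milliseconds" intent visible at the definition site, sketched in Python. The function below is a hypothetical stand-in for the long-named call above; only the constant's value (64800000) comes from the comment.

```python
from datetime import timedelta

# Derive the value from a readable duration instead of typing 64800000 by hand.
# Dividing one timedelta by another with // yields a plain int.
REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS = timedelta(hours=18) // timedelta(milliseconds=1)


def do_something_with_timeout(data: str, timeout_in_millis: int) -> str:
    # Hypothetical stand-in: a real implementation would do the work and
    # give up once the timeout has elapsed.
    return f"processing {data!r} with a timeout of {timeout_in_millis} ms"
```

The call site then reads `do_something_with_timeout(data, REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS)`, and nobody has to divide by 3,600,000 in their head.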

Oh and don't get me started on people that use the constants from the real code in their tests, completely defeating the testing. Especially if they then do math with the constants and simply copy the math - or worse, put the math into a method and call it from the tests too - to their tests. Test expectations have to be computed once, when writing the test and just hardcoded into them, otherwise they serve no purpose as changing the code itself will always result in green tests even if you've just made a major mistake by changing the values without thinking.
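
A small sketch of the difference, in Python. The constant, the function, and the values are invented for illustration; only the contrast between the two tests matters.

```python
FREE_SHIPPING_THRESHOLD_CENTS = 5000  # hypothetical production constant


def qualifies_for_free_shipping(order_total_cents: int) -> bool:
    return order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS


def test_mirrors_the_constant() -> None:
    # Weak: stays green even if someone changes the threshold to 50 by mistake,
    # because the expectation moves together with the code.
    assert qualifies_for_free_shipping(FREE_SHIPPING_THRESHOLD_CENTS)


def test_hardcodes_the_expectation() -> None:
    # Stronger: the boundary was worked out once, when the test was written,
    # so an accidental change to the constant turns this test red.
    assert qualifies_for_free_shipping(5000)
    assert not qualifies_for_free_shipping(4999)
```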

I think he is, can second that. Not having to jump around 10 files with multiple classes and remembering where goes what usually means I can understand the code faster.

Not sure about literature but for almost any technical matter I prefer learning from the smaller details instead of the big picture - it's often too vague and just doesn't stick in my memory.

Reading enterprise code is like reading a book where 99% of the pages don't contain any meaningful information.

>I find single file dense leetcode style code easier to understand

I find it really difficult to go through huge chunks of iterative code. I need abstractions otherwise I can't get my head around it. I often wind up refactoring into manageable chunks (even in pseudocode/diagrams) just so I can understand stuff.

For reference, my cognitive abilities are heavily skewed towards verbal/abstract reasoning - like several standard deviations above the norm - and my spatial/concrete reasoning is nearly the inverse of this, it's terrible.

I wonder if this has something to do with it!

> I find it really difficult to go through huge chunks of iterative code. I need abstractions otherwise I can't get my head around it. I often wind up refactoring into manageable chunks (even in pseudocode/diagrams) just so I can understand stuff.

Understanding lots of code at a module, function, or even more granular level with a magnifying glass feels more productive than struggling to understand the full picture.

It also rewards you with instant gratification. Reading and writing concrete code gives much more immediate gratification.

One of my recent coding experiences was teaching a friend in grad school (MA) to code sufficiently well to finish his Master's project.

Refactoring was absolutely necessary for this. He was writing a single simulation program that was single file in size. But once he'd created ten subroutines all changing a raft of the global variables, the slightest changes produced hair-raising bugs that he'd obsessively dive into debugging.

The intuitions of structured programming and object-oriented programming are more important than absolute fidelity. My points were: "If you can't have an object here, at least have a well defined, standard interface to values that need to be in a consistent state" and "decompose long action sequences into subroutines and if you can't do that, at least group similar actions with similar actions in that long action sequence".
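
The first piece of advice could look something like the following Python sketch. The friend's actual simulation is not shown in the thread, so the `Particle` class, its fields, and the update rule are purely hypothetical; the point is only that related values change in one place.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    position: float = 0.0
    velocity: float = 0.0
    elapsed: float = 0.0

    def step(self, acceleration: float, dt: float) -> None:
        # The only place where position, velocity and time change, so they
        # cannot drift out of sync the way loosely shared globals can.
        self.velocity += acceleration * dt
        self.position += self.velocity * dt
        self.elapsed += dt
```

With ten subroutines each poking at global variables, any of them can leave the state half-updated; here a caller can only move the simulation forward through `step`.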

Which is to say a given piece of code might not have the structure you want, but if it has a structure, that can be enough. But then again, that piece of code might not have structure at all, and then rewriting it really is necessary and often is easier than debugging it a few times.

And working with a large piece of "bad" corporate code, I've more than once found that you have something with one sensible if idiosyncratic structure that was refactored more than once by people who didn't understand the structure and imposed their own structure on just part of the code. But through an exercise in archeology, one can make the whole artifact work.

But that doesn't mean you can't have code that is a true mess when the writer has no experience and no concern with structure.

> People abstract before an abstraction is necessary.

Sometimes an abstraction cuts to the core of the reason why.

See for example https://algebradriven.design/

Good abstractions can communicate intent better than mounds of concrete code because they speak at a higher level.

However, mounds of okay concrete code are way easier to deal with than poorly thought out abstractions.

This means pragmatists get little practice in abstractions, where their pragmatism is needed most to uncover the useful abstraction and avoid the overly complex invented abstraction.

Abstract code also has the advantage of parametricity in strongly typed programming languages.

I started designing an algebraic language by writing code.


It's designed to be expressive and powerful and practical.

The core insight to a problem is rarely what we spend most of our programming time doing.

> The core insight to a problem is rarely what we spend most of our programming time doing.

I believe that is a mistake.

I'll have to check your language out though!

Are those code samples formatted correctly? (they are difficult to read as is)

> In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code.

That's a great point. I once read (here on HN I think) that the value behind a piece of software is not the code but the team whose members all have the same mental model of the problem and can successfully map it to the code. Lose the team and you lose that map.

> One of the first things I do when I study a new codebase is find all the entry points

I follow the same strategy but… Good luck with that when you're facing a Spring application. :)

> I tend to write reference implementations of everything, then combine them together as a separate project.

This is how I write software. My stackblitz is full of domain independent experiments https://stackblitz.com/@Pyrolistical. I then copypasta this into private projects once I figured out how the individual piece works.

Yeah I love this approach too. I’ve been learning CRDTs lately and I’ve gotten so much value from making tiny, inefficient reference implementations of things before diving in and optimizing. Toy implementations shake out all your misunderstandings of your design, and you can refactor like crazy. Going from simple code that works to complex code that works is much easier than creating the complex code correctly from scratch, in situ.
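To make "tiny, inefficient reference implementation" concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. This is not the commenter's code; the class name `GCounter` and its methods are illustrative assumptions, and a real implementation would care about serialization and efficiency in ways this toy deliberately ignores.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Toy grow-only counter: one slot per replica, merge takes the per-slot max."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever bumps its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot max makes merging commutative, associative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
print(a.value(), b.value())
```

Something this small is easy to refactor and easy to poke at until the merge rules actually make sense, which is the point of the toy.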

Also I've discovered (and reported) so many bugs when I realized my very simple toy example was broken

Is it deliberate that your comment itself is like an instance of the methodology you describe? Small separate stand-alone thoughts vs. a well-understood description (harder to write, to consume) in longer form commentary?

That's not a criticism at all by the way, I just found it striking.

It was an accident and not deliberate, but you may have revealed how I think. Thank you for this insight.

In hindsight, if you focus on three questions (Is it good? Is it right? Is it true?), you'll head in a good direction.

Synthesis of ideas is really important and the building blocks of understanding are fascinating. Programming and mathematics are taught as building blocks and then deliberate practice.

I really enjoy reading plain descriptions of things, especially of other people's code.

If you understand the core insight, difficult things can be easier to understand and apply for you.

I really want to understand how tracing compilers work and LuaJIT, JVM and V8 but I found the code a bit too hard to understand as I jumped into the wrong locations.

There have been two instances where Wikipedia was enough for me to understand and write an algorithm that implemented the description. Wikipedia doesn't have pseudocode for multiversion concurrency control but it does have an accurate if subtle description. I did the same for btrees but I did read some other people's implementations to get a feel. I of course wrote mine completely differently.

I want people to document their code enough so that the core principles or idea behind their code could be reimplemented by someone else just by reading the description of how it works.

Rpython and Pypy documentation is good but I still don't understand it enough to implement what it does. Which means I'm missing some detail or core insight.

What a great way to do it, too, because once you're done, your "reference implementations" can later serve as test harnesses if need be.
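A rough sketch of what that can look like: keep the slow, obviously-correct version around and check the "real" one against it on random inputs. The function names here are hypothetical, and `sorted` is only a stand-in for whatever optimized code is actually under test.

```python
import random


def sort_reference(items):
    """Deliberately naive insertion sort: slow, but easy to check by eye."""
    result = []
    for x in items:
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    return result


def sort_optimized(items):
    """Stand-in for the production implementation being tested."""
    return sorted(items)


def check_against_reference(trials=200, seed=0):
    """Compare both implementations on random inputs; return the trial count."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        # The failing input is attached to the assertion for easy debugging.
        assert sort_optimized(data) == sort_reference(data), data
    return trials


check_against_reference()
```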

> I tend to write reference implementations of everything, then combine them together as a separate project.

"In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code."

Same for rewrites. Often the rewrite will have the same number of problems, just different ones.

If you are out there and WANT to write terrible code, here is an amazing essay: https://cs.fit.edu/~kgallagher/Schtick/How%20To%20Write%20Un... I cried the first time I read this.

Generally most of my refactoring is cutting out code; I love an excuse to chunk down excessively verbose classes. Maybe I'm weird, but I generally prefer a lean, concise codebase that might share a few free functions to a perfect OOP pyramid.

No mention here is made of test coverage (hopefully because everyone already knows); to refactor without near-100% test coverage is insanity.

>>People abstract before an abstraction is necessary.


Personally I think the most important thing to minimizing code complexity is ensuring that it understandably maps to the business logic. The business logic is the essential complexity and everything else can be seen as waste.

The first step is getting the lexicon right. Frequently the business lexicon is ambiguous in such a pervasive way that the people immersed in the business aren't aware of the discrepancies. For example I remember from working in healthcare the words "claim" and "member" often have very different meanings in different contexts and I would see developers hacking code together to get the data model of one context conform to the data model of another when they should have been treated as different entities.
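As a small illustration of treating them as different entities, here is a sketch with two separate "member" types and an explicit translation between them. All of the names and fields are hypothetical, made up for this example rather than taken from any real healthcare system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EnrollmentMember:
    """'Member' as an enrollment context might see it (hypothetical fields)."""

    subscriber_id: str
    full_name: str
    coverage_start: date


@dataclass(frozen=True)
class BillingMember:
    """'Member' as a billing context might see it (hypothetical fields)."""

    account_id: str
    display_name: str


def to_billing_member(member: EnrollmentMember, account_id: str) -> BillingMember:
    """Explicit translation at the boundary between the two contexts."""
    return BillingMember(account_id=account_id, display_name=member.full_name)


enrolled = EnrollmentMember("S-1", "Pat Example", date(2022, 1, 1))
billed = to_billing_member(enrolled, account_id="A-9")
print(billed)
```

The translation function is the one place where the two meanings of the word meet, so a mismatch shows up there instead of being smeared across a shared data model.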

I agree. As a dev in a pretty large organization, I have seen the knowledge of business logic dissipate as the org grew, with some churn. To the point now where very few people actually know how the current system works, let alone how it is supposed to work. This means the only concrete definition of "this is what the system is supposed to do" is only in the code. The organization is disorganized, and the code is only as good as the level of organization outside the code - so very poor. All this got me thinking about what does it mean exactly to be organized? We call companies "organizations" because they are groups of people getting together and organizing. If the organization is not organized the code (or any other artifact it produces) will also be poorly organized. Organizing is sorting, classifying, grouping, communicating etc.

Oh god you just brought back a memory from when I was brought in to manage a team at a dysfunctional organization and I was trying to figure out how a complex service was supposed to work. I asked: "Do you have any documentation or requirements", I was told "The code is the requirements", to which I responded "Wonderful that means there can't ever be bugs because there will never be a discrepancy between the code and requirements".

Getting requirements in writing was an uphill battle and the lack of requirements always wound up screwing over the developers because there was no contract to prevent scope creep and the developers were the ones that were held accountable for misunderstood features and missed deadlines. As a result everything was constantly rushed and not well thought out. It took me a long time to convince my boss that the issue stemmed from unwritten requirements and a lack of planning.

I'm going to steal your quote. I love it!

To add, how could they do any QA when testing needs to map to those unwritten requirements?

This is a great point. Often I find situations when working with the subject matter experts where the code unveils edge cases in the business reasoning that the SMEs haven't considered. In many of those situations, the issue can be pedantic and the code can point to a catchall. Even so, it is funny how common these kinds of knowledge gaps appear when you are tasked with transforming assumptions into a functional working entity.

This is huge. Often the coder automating some manual process is the first person to sensibly create an unambiguous, correct taxonomy just to discuss it precisely.

It’s almost as though business and software development within a company should agree on a ubiquitous language to describe concepts, and when it becomes too difficult to build a single unified model to define individual concepts, then perhaps the core domain should be split into bounded contexts … (and so on)

> Say No

Getting junior devs to do this is like pulling teeth. Trying to get a feature stopped after they've built it is soul crushing for them. It's a problem.

At this point I've all but given up beyond minimizing the blast radius in code review.

Being married to your code/output is just another flaw typical for juniors. Put them on features that aren't critical or ensure they are made aware up front that their work may be rejected if it doesn't meet design expectations.

I'm not a junior dev and I still get peeved when my time is wasted. I guess I would only not care if I didn't care about what I was working on, but that's a different kind of existential torment haha.

Beyond opportunity cost, you can think of it as deleterious to your performance. If 10% of your work never gets merged because of shifting priorities, compared with someone else who has miraculously dodged these problems, that's pretty unfair.

10%? Holy shit, I'd say a solid 90% of my output never brought enough value to justify doing it in the first place. Much of it without ever seeing a real user. Software development feels like digging holes and filling them back up again, over and over, to me.

Feels like a recipe for burnout IMO; that would be tough for me to deal with.

If only 10% of your work is thrown away consider yourself very lucky, I would think that would be a minimum number in the industry...

Myself I not too long ago did a 6 month crunch with the rest of my team on a product that was cancelled right before launch...

Try to direct your personal energy away from getting your code into the product and towards making the best product possible. Often after building a feature it is obvious that it makes the product worse. I've had many such features cut.

My irritation comes from that decision/realization happening after we already devoted engineering time to something--the most expensive time to reverse yourself.

I've hit this many times over the last 20+ years, and have come to recognize/accept that most people can not really grok something until they can 'use' it in some capacity, often with 'real' data.

Clickable wireframes, design sessions, mockups, etc - they can all help explore ideas before code, and potentially save things. I've had numerous examples where I can identify "this is confusing" or "this doesn't solve the problem, just moves it around a bit" and I'm usually 'outvoted' by others, and do the work. It's usually only after it's in peoples' hands that they identify the rough edges (or more).

100% agree; I'm a huge fan of the design and product process. In particular I feel like every product team needs a designer--probably not dedicated, but like, dedicated designer hours. It's yet another case of "1 hour (of design) upfront saves 20 engineer hours further on". I've had the (mis?)fortune of working on teams with and without designers and the difference is super obvious, at least to me anyway.

Years ago we had an open session for how to display some unintuitive data on one of our websites, and while few outside the dev team participated, theirs combined with devs' ideas resulted in around 5-10 totally different designs. Mine was one of the first eliminated, most people thought it was weird and confusing. Four months later, after developing and putting the two top-voted ones in front of users and finding they still didn't understand it, our designer came up with basically what I originally suggested, with a few tweaks to make it look nicer. We've kept that version since then.

That was a frustrating period.

No code changes should be merged purely because the author got less code merged than someone else on their team. This sounds harsh maybe, but by your reasoning, your feelings could end up being the cause for bugs, poor quality code and/or bad design ending up in a release.

Of course, you and your team should work together to _avoid_ having to reject work! But it can and will happen, it's perfectly normal for mistakes to be made, it's how all humans learn.

Trying to deny that people sometimes fail is foolish. Punishing yourself for making a mistake is on you.

The only unfair thing here is taking others in the team hostage with the idea that you are entitled to getting your work merged regardless of its quality, purely because it would make you peeved, cranky, annoyed! That constitutes toxic behavior. If this is a pattern for you, people will avoid working with you.

Instead: embrace the opportunity to learn. Get feedback, reflect with the team, do better next time. Maybe pair up to refactor your work. Take the positive approach!

I feel like my little comment here became something of a Rorschach test. I'm definitely not saying we should merge bad code to keep merge rates even. All I'm saying is:

- Someone says "build this thing"

- I build "this thing"

- That someone says "just kidding, we're not gonna use it"

- I'm peeved

Someone else in this thread is arguing this is an entitled position, and here you're arguing that... well, I think you're arguing that I think all my code is always amazing and should always be merged.

I'm not! Like I wrote elsewhere I've written some pretty shit code, I've built the wrong thing, and I've built broken things. I'm sure this is true for most SWEs. This isn't the scenario I'm describing.

But I think this discussion has some merit in terms of how we navigate code review. For example, conversely, I've been on the other end of some pretty... bad feedback. The first example that comes to mind is that we had a portal where you could search by text or category, but once you selected a result we wouldn't save your search anywhere (query params, session storage, etc.). Consequently, when you clicked our "back" link, your search would be gone. We YAGNI'd it for a long time, but we accepted a very tight deadline project (COVID/government related) that required a category that needed to be sticky.

I built this using query params, like pretty much every search out there (for good reason). This ended up changing a lot of templates, a couple of front-end React components, and required extra logic in a couple Django controllers. It was a big-ish change, maybe (to my recollection) 300-400 lines across a few stacked PRs--meticulously, for ease of review. All previous tests passed, all the new tests I wrote (typically >= 50% of my PRs were new tests) passed, I even built a punch list of UI tests I ran through (this was going to be a big user-facing feature and I wanted it to be bulletproof). This took I think... 2 days of constant work, so something like ~30 hours.

This wasn't our typical process; we skipped our usual engineering meetings about implementation strategy and what-not. Our team was small--4 people including our CTO--but even so we had a wide diversity of opinion when it came to implementation, architecture, and style, so it kind of ended up being the case that if we wanted anything to get through PR we had to hash it out beforehand. But we literally had 7 days or something to do this, so we just didn't have time.

But, predictably, despite all my tests and punch list, my PRs were rejected as "too much code", and we missed our deadline. We launched without the feature. Our CTO reviewed the vast majority of our PRs, he reviewed these and he was pretty furious about the scope of the changes, blaming me for missing the deadline.

Afterwards, he tried reimplementing it using query params in less code, but failed. He then tried reimplementing it using local storage, which was less code, but had multiple problems: local storage works across tabs which is deeply weird, but even if he switched to session storage, it didn't work in lots of versions of mobile Safari if you're in private browsing mode. I rejected that PR for those reasons, which we disagreed vehemently about. Eventually, a couple months later, we paired on it, and basically reimplemented my work together.

There are obviously a lot of flags in this little story, but I don't think they're wildly out of the ordinary for a startup (if anything, it's way too much process for a 10 person company). My point is that, while I'm sure there are a lot of cases of "I'm God's gift to this company merge all my work never question me" out there, there are also a lot of cases of "no PR is fit to merge the first time" and "I'm a great programmer, you didn't do this the way I would, therefore this isn't good enough" as well.

> that's pretty unfair.

Hilarious. I wonder what a plumber or carpenter would say if you were to complain to them on how unfair your job is, because 10% of your output doesn't show in the finished product, yet you are still paid for that output. Imagining the reaction to that just made my day.

The enormous silliness of this argument doesn't seem to stop it from popping up all over the place. People's expectations are relative to their environment. The large majority of things you might think to complain about in your life would look hilarious to a caveman or a medieval peasant or even someone from 1980. Similarly, the vast majority of HN users are extremely high-percentile for global wealth and income: this would be a very boring place if people bought in to your paralyzing insistence that you can't ever discuss improvements to something because problems larger than it exist somewhere in the world.

It's a worldview that's so nonsensical that it's its own reductio ad absurdum: If a fast-food worker complained about wanting to be treated with dignity at work, would you similarly scoff at them because coal miners or sweatshop workers don't even get physical safety?

Having high standards is a _good_ thing. It's the hallmark of society's progress. It doesn't preclude being grateful for the privileges you do have, and it's nothing to be ashamed of unless your self-esteem is so low that you think you don't deserve to be treated well.

These two things taken together:

> People's expectations are relative to their environment. [...] Similarly, the vast majority of HN users are extremely high-percentile for global wealth and income

> Having high standards is a _good_ thing. It's the hallmark of society's progress.

seem to suggest you subscribe to the "trickle-down" ideology. I don't. No, having pockets of "extremely high-percentile" people who are entitled to complain about "unfairness of 10% of their work not being appreciated" is not a hallmark of society progress. It's closer to systemic exploitation. It's a pattern we should know very well from history lessons. No bread? Let them eat cake! Sure. Just brace for the impact when the bubble bursts - there's a sharp blade at the end of this road.

> it's nothing to be ashamed of unless your self-esteem is so low that you think you don't deserve to be treated well.

Because having 100% of someone's work accepted as useful when it's not - for whatever reason - is a basic human right that everyone deserves. That's called "being treated well". I didn't know; I thought not getting 100% sunny days in a year is called "just life", but now I know it's a violation of my rights. How could I be so wrong for so long?

I'm being sarcastic, but you have to accept this comment in its entirety and tell me how happy you are that I wrote it for you. I put work into writing it. I deserve being praised for it, no matter how much you like what I wrote. Right? Please, do treat me well.

How's that for a reductio ad absurdum?

> Because having 100% of someone's work accepted as useful when it's not - for whatever reason - is a basic human right that everyone deserves.

Again this wasn't what I was saying. My argument is that engineering time (and time in general) is valuable, and we should be careful how we spend it. In other words: if, over the course of a project, you're wasting a lot of time, that's bad if you care about the project (and you probably should). Maybe you disagree, but I don't think this falls under the "entitled millennial SWE" category, but rather the "we can do better" category.

I'll also, for the sake of discussion here, say I've done some pretty shitty work and have benefited tremendously from code review and general discussions with my colleagues. I'm definitely not someone who thinks they're an "extremely high-percentile" person, probably above average, but definitely not like a Brad Fitzgerald or something.

> I wonder what a plumber or carpenter would say if you were to complain to them on how unfair your job is

I feel like this is a pretty uncharitable caricature of my position. I'm not whining about all my work not getting in. I'm saying, "We're building houses for poor people, I care about this, you had me spend a week on this thing you said was going into one of the houses, you were wrong, I blew a week of work that could've gone to building the houses, and winter is coming". It's not about me personally, it's about me caring about efficiency.

> yet you are still paid for that output

I don't only work to be paid. It's a necessary but not sufficient component. I try to find fulfilling work that I think improves peoples' lives, and I'm fortunate enough to achieve that more often than not. I'm not saying I'm not selfish, just that I'm not entirely selfish, haha.

Plumbers and carpenters find purpose in what they do, and so do engineers. Getting paid is just one factor of several for job satisfaction.

Time you got paid for is not time wasted ;)

Hah, an older colleague taught me "it pays the same" as a kind of mantra for dealing with the capriciousness of management. Feels cynical, but I've gotten a lot of use out of it over the years.

It’s especially difficult when they came up with it themselves.

We’re amidst a rewrite and we have an off-shore team involved. The same team that built the original starting 4yrs ago.

One of their team members decides “we should validate the TLD for email addresses entered by the user.” Code is added, a TLD file is added, and in code review I reject the whole concept. Show me the ticket or feature docs, and I’ll argue with the author of those instead.

“We did this in the last app…” Maybe, but we didn’t spec that for this app.

He got his local project lead (non-tech) to write a Jira story for us to “discuss the technical implementation.” Dude, srsly.

Our app is web-first and is used on congested mobile networks (like, hundreds of people all using the same cellular site simultaneously.) A TLD file does not need to be delivered to each of them for validation that’s pointless.

The idea is off the table, code rejected, but the guy spent time doing something no one asked for and had his local team onboard with it.

> Trying to get a feature stopped after they've built it is soul crushing for them.

That's why I suggest first filing an issue, discuss a design, and only then actually implement the feature. Will save you many hours.

Why are people going directly to your junior devs for feature requests?

Because the senior dev always says no.

This sounds odd to me.

Is there no planning flow where you work/have worked? I understand giving developers freedom to build, but some sort of oversight by someone with a view of everything that's going on is also necessary.

Typically there are entire planning exercises that happen before stuff even hits a JIRA board.

These are usually conducted by Product Managers, Project Managers, analysts, and often in startups the CEO themselves.

The fact that people are off building features willy nilly sounds like it would contribute to messy code base.

Knowing what's next helps plan the work on the developer side as well which also minimizes the blast radius.

I believe many companies also incentivize this. Not many people get praise, raises and promos by saying no or simplifying things, much less juniors. Once you've built something and put it in your promo doc, no one wants to remove it or say it was the wrong thing to build.

As an example, I did a spike to explore a request from another team. I wrote a short document with my findings and recommended against it due to the cost/benefit analysis. My manager told me not to use this story in my performance review since "we don't want to display failures".

There ought to be a Zen lesson hidden here somewhere.

Perhaps tech companies could have kickoffs/workshops where the participants would create sand mandalas together?

Yes, this is more practical than anyone might imagine.

The guy who just poured the concrete for a foundation, does he care that much that it's torn up or not used? Probably not, even though he's likely skilled and professional.

We are far too precious.

I know for a fact that a lot of people who build real things take pride in seeing them in use and still around many years later. I think they'd absolutely be demoralized if the typical case saw their work torn up and discarded without ever being used.

Depends. Can I put the mandala in my promo packet or not?

Something that shocked me was working with junior programmers for the first time. For decades, I had either worked solo or with other experienced developers.

It was an eye-opening experience.

My style is influenced by Haskell and Rust, even when I program in, say, C# or PowerShell.

A simple example: I will extract the read-only logic into a pure function and minimise the size of the mutable procedure. This makes it trivial to test the logic in isolation without triggering any side effects. Similarly, the logic can have convoluted control flow but the imperative code can then wrap that with a single try-catch block, transaction, or retry loop.
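A minimal sketch of that split, in Python rather than C# or PowerShell. The discount rules, function names, and the `load_order`/`save_discount` callbacks are all invented for illustration; the only point is the shape: branching logic in a pure function, side effects in a thin shell with a single retry loop.

```python
import time


def plan_discount(order_total: float, loyalty_years: int) -> float:
    """Pure logic: no I/O and no globals, so it can be tested in isolation."""
    if order_total <= 0:
        return 0.0
    rate = 0.05 if loyalty_years < 2 else 0.10
    if order_total > 1000:
        rate += 0.05
    return round(order_total * rate, 2)


def apply_discount(order_id, load_order, save_discount, retries=3, delay=0.0):
    """Thin imperative shell: one retry loop wraps all of the side effects."""
    last_error = None
    for _ in range(retries):
        try:
            total, years = load_order(order_id)   # side effect: read
            discount = plan_discount(total, years)
            save_discount(order_id, discount)     # side effect: write
            return discount
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    # Every attempt failed; surface the last error (assumes retries >= 1).
    raise last_error
```

Testing `plan_discount` needs no mocks at all, and the shell can be exercised by passing in fake `load_order` and `save_discount` functions.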

For me this was such second nature that I didn’t even realise I was doing it until I saw the imperative spaghetti written by the juniors. I tried to explain with pair programming sessions what the benefits are of my approach.

Without fail, they would just “hack something” into the existing spaghetti, adding yet another mutable global variable to track some new state.

In every case they said they were in a hurry and that they would “fix it later”.

I replied: “there is no later.”

> In every case they said they were in a hurry and that they would “fix it later”.

This. We need to stop using time pressure as an excuse to do a bad job. Moving things out to a separate function might add 10 min but save 100x when everything comes crashing down.

Of course you shouldn’t overengineer but so much can be gained from spending a little time just thinking about how this should work.

This is my current struggle. I've enjoyed mentoring in the past, but right now I'm getting very exasperated feedback. It's hard because I don't have control of the environment to alleviate the deadline pressure, but I still want to help people learn. The end result seems to be a pile of tech debt for now. C'est la vie.

Something I observed very early on in my career is that bugs will have to be fixed no matter what. You can put them on the todo list and fix them later, or fix them right now. Either way, you're going to have to do the task.

It's like a conservation rule in physics, for every bug found, a bug fix must eventually be implemented. Bug in, fix out.

But... if you leave a bug lingering, then it can cause test failures for unrelated code development. It can trip up other developers. It can cause false positives until resolved.

So the only logical conclusion is that all bugs must be fixed ASAP, otherwise they have a "multiplier" factor dependent on how long they're allowed to persist. If left unchecked, this can blow out exponentially, until you're unable to efficiently fix bugs because you're tripping over thousands of other unfixed bugs while doing so.

You would think this kind of thing is logical, but no-one ever believes me. There's just slow blinking and then a slower repeat of the same old mantra: "We'll fix it... later?"

I agree. I think a zero defect mentality keeps things moving along smoothly. Unfortunately, people think I'm just being a perfectionist. I'm just trying to not trip on the treadmill. Every bug is another stumbling block.

I think this is terrific advice.

Over decades I have compiled my own list, which contains all these and a bunch of other behaviours that are needed for a successful project.

I would add one or two very important things missing from the list.

One, not explicitly mentioned but covered in other points is to plan for simplicity. Make simplicity an explicit goal of the project and set up process to remind of it at various important points in the process. For example, I have a checklist for adding a new technology which has a very long list of things you have to think about before adding new tech of any kind (like "is it possible to replicate it with couple pages of code"). My goal is to have tech stack so simple that newcomers can feel right at home and productive immediately.

Even if I (we, me and my team, whatever) screw up, then the future owner will tend to have much easier time fixing it if we tried to keep it simple. My hardest challenges were not difficult technical problems (most backend applications tend to be very simple problem from technical point of view) but rather past teams that were very smart and created a monster so complex they themselves ground to a halt after some key people left.

And connected to it (part of the checklist) is to be aware of when you are about to add things to the project for intellectual gratification rather than practical purpose -- and cut it mercilessly out.

Engineers tend to really dislike working with the same technology over and over again, but this is what is needed to become really proficient. New tech seems exciting, but every time you add or change something in the stack you need to learn that thing (and accept being less productive for some time), you accept risk of new problems (and risk is a cost) and, finally, you cause the same to every team member and any future hire.

And while it is easy to see the benefits of something, the costs and risks are usually much less understood before you have invested enough in it. And, additionally, frequently the benefits are much overvalued.

Any chance that your list is organized in a way that you wouldn't mind sharing it? I've been on a documenting spree the past few years at work (after ~14 years of not documenting much), and it would be great to see your decades-long process notes. I totally understand if that's your secret sauce and you'd prefer to keep it that way though.

Thanks for your comments, this is helpful.

I especially like your point about planning for simplicity and making it an explicit design goal. Communicating this clearly to the team seems super important.

As for adding things out of intellectual gratification: This is so true. I've seen this all too often, myself included. Many good engineers are curious by nature, and it can be tough to restrict this curiosity. Maybe it's a question of having different outlets for that sort of creativity, either at work or in private.

If your list is available in some form, I'd be curious to have a look!

The list is long but if I wanted anybody to take one thing from it is that complexity is the real killer.

The job of the most experienced person in the software development organisation should be to spot unnecessary complexities and find ways to eliminate them.

The issue is, most experienced people tend to be engaged in activities for political reasons like adding new technology -- which tends to be perceived as more valuable than removing it.

I might be biased (by selection). I am called upon to join and help projects that face significant problems (emergencies all the time, no time to breathe, way behind schedules, unable to deliver anything, etc.) But every time I join a project like that, the repairing process tends to start with removing stuff rather than adding. And if stuff needs to be added this is usually so that it makes possible to remove much, much more of complexity somewhere.

As an example, we have stabilised at least 3 projects by removing "microservices" and rolling everything to a single monolithic application. Not saying microservices is a wrong idea, but saying it might be wrong for a particular project without strong automation culture, tooling and without large enough problem to solve. Somehow this always starts with strong opposition and ends with happy people that can code and not spend significant portion of their time dealing with complex, flaky infrastructure.

My rule as I present it to the team is "I want at most one of anything unless we understand exactly why we need more than one." So one programming language (unless you need one for backend and one for frontend, then we need two), one application (unless you have multiple teams and then one per team might be better to make them independent), one repository, one cloud infrastructure provider (AWS tries to be on parity with GCP, why do you need something from GCP just for it being incrementally better?), one place to store all documentation and procedures, one database tech (do you really need half of the application use MongoDB and another half use Postgres?), etc.

The rule might sound childish, but it is simple and helps people make better decisions on their own which is essentially what you as a tech lead want.

> Make simplicity an explicit goal of the project and set up process to remind of it at various important points in the process.

Which altitude of simplicity is most valuable?

Ah the Unix philosophy. `man ssh' gives `ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface] [-b bind_address] [-c cipher_spec] [-D [bind_address:]port] [-E log_file] [-e escape_char] [-F configfile] [-I pkcs11] [-i identity_file] [-J destination] [-L address] [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port] [-Q query_option] [-R address] [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]] destination [command]`

Wasn't the "Unix philosophy" explicitly formulated by Rob Kernighan in 1983 in opposition to this kind of growth? I mean, there's a whole website of Unix purists named after it:

'UNIX Style, or cat -v Considered Harmful' http://harmful.cat-v.org/cat-v/

The Unix philosophy is to do one thing and do it well. The ssh command wraps the large and complicated ssh protocol - I would say it does one complicated job and does it well.

Also, however convenient or well-implemented it is, SSHv2 the protocol itself is very much an all-singing, all-dancing monolith that’s pretty much doomed to have an Implementation Of Unusual Size. The Plan 9 client[1] has fewer knobs but still quite a few, and it doesn’t even do forwarding as far as I can see.

[1] https://plan9.io/magic/man2html/1/ssh2

> Rob Kernighan

Is this a typo or a proposed Bourbakism?

I can top that: `man read` gives:

BASH_BUILTINS(1) General Commands Manual BASH_BUILTINS(1)

NAME bash, :, ., [, alias, bg, bind, break, builtin, caller, cd, command, compgen, complete, compopt, continue, declare, dirs, disown, echo, enable, eval, exec, exit, export, false, fc, fg, getopts, hash, help, history, jobs, kill, let, local, logout, mapfile, popd, printf, pushd, pwd, read, readonly, return, set, shift, shopt, source, suspend, test, times, trap, true, type, typeset, ulimit, umask, unalias, unset, wait - bash built-in commands, see bash(1)

Never mind ssh(1), ls(1) pretty much uses all of [A-Za-z] as its options plus -1, but still no -0 option. What I'm personally really looking forward to is a -2 option, whatever it would do.

Did you mean `ssh --help`?

`man ssh` gives me detailed descriptions of all flags.

Works quite well in conjunction with Googling "how do I do X in ssh stackoverflow".

You can do X. X Window forwarding ;)

I think this is broadly an incentives and mindset problem.

First, people generally don't hire me to set up a WordPress site (I should get into this though); they hire me to write something new and bespoke. So my skills are in exactly that: I build new stuff.

Second, I'm pretty bored by the idea of gluing dependencies together. It's neat to see how fast or neatly I can do it, but that's good for a month or two tops.

So if you want me to cook up new tech with a pretty good amount of code, I'm your guy. If you want me to carefully build something someone else has done 100x before while constantly having meetings about capitalization, line length, and coding-fad-of-the-week stuff, I can't handle it. My (totally rational, at least to me) response will be: customize a CMS for $10k, don't hire an engineering team for ~$500k.

If I'm stuck on this project, I'll subconsciously try to introduce joy into my life by doing bad stuff, like writing a lot of cool new code where I shouldn't, and so on. Our incentives are misaligned.


Or, you can think of it in terms of innovation tokens. Are you building a new database storage engine? Adopt the conventions of the database you're building it in; don't also try to innovate a new architecture/style. Are you building a new JS framework? All the innovation there is in developer experience, so all your innovation should go into abstractions and mental models; don't also include new, surprising algorithms.

By picking a thing you are doing, be aware that there are 10000 other things you are picking not to do.

IMO code should be messy to some degree. If it's not, then I'm moving too slowly. Probably over-refactoring and over-analyzing. And usually that's a symptom of something deeper, like too much ambiguity leading to procrastination, which I can fix by scoping in more detail, writing throwaway code (ex a proof-of-concept), etc.

Obviously we don't want a complete dumpster fire of a codebase, but some mess is inevitable and healthy. First see the mess, then refactor. Refactoring before the mess is how you end up with crappy abstractions.

In the past few years I've adopted the attitude that code cleanliness isn't really that big of a deal. There are some obvious guidelines to follow around readability, encapsulation, etc., but these days I care more about system architecture than I do the code itself. Localized code is easy to change/refactor/clean up, the system itself is not.

From "A Philosophy of Software Design" [1]:

> Ideally, when you have finished with each change, the system will have the structure it would have had if you had designed it from the start with that change in mind.

[1] https://web.stanford.edu/~ouster/cgi-bin/book.php

In my personal experience a lot of the mess stems from complex layer-to-layer interactions. Within my own modules I'm pretty good at keeping things clean but marshalling data from C# to C++ (or Python to C or this lib to that lib) is where I get sad. Or mapping return and error codes, or catching exceptions etc....

The cleanest code I write is for embedded systems without an OS, basically a sparkling gem of refactored goodness.

Drawing the line for separation of concerns is one of the hardest things to do well in CS.

Not only in CS; it is hard in organisation design as well. E.g. do you want an engineering team, or do you want a product team that includes engineers so as to eliminate silos? Should engineers care about hiring, or is that HR's concern? What about security? What about how well the product is performing? What about customer feedback? Should engineers care about all that? The organisation of code is a miniature version of the organisation of a company. If you nail it, that's your secret sauce.

DB schema to DAOs to business interfaces to binary persistence and to json.

It almost boils down to "have good taste and understand the problem well".

One thing I have noticed is that it's almost impossible to keep a codebase under control over time when business needs change and new people come in all the time.

If you try to apply the "write code that is easy to delete, not easy to extend" principle, your code will be less messy.

Reminds me of this (Greg Young - The art of destroying software):


I'm still trying to figure out how to apply this to my personal javascript codebases.

This is also an argument in favor of functional programming over OOP class-based programming.

The part about constraints is kind of muddled. Constraining the scope of your project is not the same thing as working within a set of externally imposed constraints, which is what people are usually referring to in stories about how being forced to do more with less led to an unexpected innovation of some kind. The former is really just defining the scope of the project, which is covered in the following section.

I’ve had a few bosses give the speech about no heroes.

If I’ve given a speech (well, there are several, but the one relevant here), it is that instead of trying to build the perfect product, we should build the best product we can build.

If you don’t follow that constraint you end up in Kernighan’s Law territory, and the wheels eventually come off.

Know your strengths. Build up or complement your weaknesses, stop trying to Fake It Til You Make It when you’ve made it most of the way to where you’re going to get.

There's a great section in The Practice of Programming where the book describes how you should structure your code to not just be structured nicely now, but to plan for the future; to structure it so that changes are easy, organized, and don't break anything.

It's not exhaustive but it's a powerful general idea and I always like introducing developers to it for the first time.

I must say here that I agree with this point, but one should also take care not to go overboard with abstractions while planning for the future. Abstractions for future planning should be lean and flexible enough to be extensible.

Sometimes called YAGNI, or You Ain't Gonna Need It.

I've found that the right level of abstraction is the one that saves time and effort and duplication now, for the features you're currently shipping. If you're thinking about hypothetical new features that aren't even on the roadmap then you've gone too far.

As an example, say your embedded program needs to load images, and your standard library only supports raw BMP files. BMP images are going to work great for the time being, since the art team can supply them that way. By all means, add an abstraction method called "load_image" around the library method so that you don't have to refactor a million places to replace that library. It'll give you a great place to add error handling, logging, etc.

Beyond that single method as a point of attack, don't go further: don't spend the time to add a JPEG and PNG library, and don't add support for high bit-depth images, grayscale images, or CMYK images. Don't add an abstraction that'll someday be able to load the Nth frame from a video file. Perhaps throw an exception for unusual input, but beyond that don't waste your time.
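To make the "single method" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `_library_load_bmp` is a hypothetical stand-in for whatever raw BMP loader your standard library provides, and `UnsupportedImageError` is a made-up name. The only format check is the two-byte "BM" signature that BMP files start with.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

BMP_MAGIC = b"BM"  # BMP files start with these two bytes


class UnsupportedImageError(ValueError):
    """Raised for input the wrapper deliberately does not handle (hypothetical name)."""


def _library_load_bmp(data: bytes) -> bytes:
    # Hypothetical stand-in for the real library call; here it just returns the bytes.
    return data


def load_image(path: str) -> bytes:
    """The one place the rest of the program goes through to load an image."""
    data = Path(path).read_bytes()
    if not data.startswith(BMP_MAGIC):
        # Unusual input: log it and fail loudly instead of growing new format support.
        logger.warning("rejected non-BMP image: %s", path)
        raise UnsupportedImageError(f"only BMP is supported for now: {path}")
    return _library_load_bmp(data)
```

If the library ever has to be swapped, or JPEG support really lands on the roadmap, `load_image` is the only function that needs to change.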

In my experience having simple, direct code makes it easier to refactor in the future when the need arises. Having too much abstraction or future proofing gets in the way because the future inevitably will bring different changes than what you were expecting.

Having not read the book OP mentioned, I would say it's important to emphasize that structure does not necessarily mean abstraction. Starting a new project is more like getting groceries and stocking the pantry than it is starting a stew and agonizing over which type of onions to toss in.

Good abstractions evolve naturally out of good structure.

Just curious, have you ever seen a project that is structured nicely according to these criteria that you can share?

I like the ideas, but how many of us have the political power at our jobs to say "No" to a feature on a project?

> If you’ve been developing software for a while, you know that code has this natural tendency to turn into a mess.

That doesn't apply to everyone and every project. When the people leading the project are experienced developers who understand clean architecture, the mess is just an oversight that is local and can easily be corrected.

Would be curious to know what strategies other people apply in order to keep complexity down over time!

I like a lot of your other replies. I also have a philosophy of doing net improvement every time I go in. If you put a little bit of elbow grease in every time, the net effect on your code over months is pretty nice.

But you also have to understand and internalize that it's OK to do a little bit of improvement each time. You don't have to go in, pick up a piece of code, sigh dramatically, and fix everything you can see about it. Just fix a bit. Turn some strings into enumerations or a custom type. Turn a recurring series of arguments into a single struct. Rename a deceptively named parameter or function variable into something correct and meaningful. Add a test case for what you just did, or add a test case for something even related to what you just did that was not previously covered. Even just one of those is a good thing. Don't give in to the temptation to throw a 15th parameter on to a function and add another crappy if statement in to the pile of the god function. Don't fix the god function all at once, just take a bit back out of it.

If every interaction on the code base is net positive, even just a bit, over time it does slowly get nicer, and if you greenfield something with this attitude, it tends to stay pretty nice. Not necessarily pristine. Not necessarily nice in every last corner. But pretty nice. And if you do need to take out some technical debt, you'll have the metaphorical capital with which to do it; a non-trivial part of the reason why technical debt has such a bad rap is that it is taken out on code bases already bereft of technical capital, which means you're on the really bad part of the compounding costs curve to start with.
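Two of the small improvements mentioned above (strings into enumerations, a recurring series of arguments into a single struct) might look like this in Python. The names `Status`, `Address`, and `shipping_label` are invented for the sketch and do not come from any real codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Replaces free-form strings such as "active" passed around the code."""
    ACTIVE = "active"
    SUSPENDED = "suspended"


@dataclass(frozen=True)
class Address:
    """Replaces street/city/postcode travelling together as three loose arguments."""
    street: str
    city: str
    postcode: str


def shipping_label(name: str, address: Address, status: Status) -> str:
    # A misspelled status is now rejected when the Status is built, not deep in here.
    if status is not Status.ACTIVE:
        raise ValueError(f"cannot ship to {status.value} account")
    return f"{name}\n{address.street}\n{address.postcode} {address.city}"
```

Neither change touches the rest of the god function; each one just takes a bit back out of it.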

I'm not a greybeard by any stretch, but I personally get a lot of mileage out of just stopping to ask: Does the extra layer of abstraction, or extraction of code to a method, or creation of a class - does it make the code *right now* easier to understand? If yes, do it, if not, don't.

The example I keep coming back to is when I was a junior, one of the other juniors refactored the database handling code in one of our apps to use a class hierarchy. "AbstractDatabaseConnection" "DatabaseConnection" etc. And mind you this was on top of the java.sql abstractions already present.

I don't necessarily know what his end goal was, since the code still seemed pretty tightly coupled to how java and postgres handle connections and do SQL. One might theoretically now be able to create a testing dummy connection that responds to sql calls and returns pre-baked data. But the functions we had were already refactored to be pure functions, and the IO was just IO with no business logic.

Anyway, all it ended up doing was making it so I never touched the database code in that app ever again. Integration testing was handled by just hooking it up to a test db via cli args and auto-clicking the UI. And eventually when people started side-stepping it, I took the opportunity (years later) to just go back in and replace both it and all the side-stepped code with plain ole java.sql stuff that literally anyone with two thumbs and 6 months of java experience could understand.

So now, unless I have some really strong plan (usually backed up with a prototype I used to plan out the abstraction) for an abstraction model, I just write code, extracting things where the small-scale abstractions improve current readability, and wait for bigger patterns (and business needs) to emerge before trying to clamp down on things with big prescriptive abstraction models.

I'm a big fan of the "IO Sandwich". This is where you keep complex computation as pure functions as much as possible. And push the IO to the edges of the system. So you might have read-compute-write. This keeps the computation functions testable and composable.
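A rough Python sketch of that read-compute-write shape, assuming a made-up task (summing order amounts per customer from a JSON file). The function names and the data layout are illustrative only.

```python
import json
from pathlib import Path


def read_orders(path: Path) -> list[dict]:
    """Read: the only function that touches the input file."""
    return json.loads(path.read_text())


def total_by_customer(orders: list[dict]) -> dict[str, float]:
    """Compute: a pure function, so it can be tested with plain literals."""
    totals: dict[str, float] = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0.0) + order["amount"]
    return totals


def write_report(path: Path, totals: dict[str, float]) -> None:
    """Write: the only function that touches the output file."""
    path.write_text(json.dumps(totals, indent=2, sort_keys=True))


def main(src: Path, dst: Path) -> None:
    # The sandwich: I/O on the outside, computation in the middle.
    write_report(dst, total_by_customer(read_orders(src)))
```

Testing the middle layer needs no files at all, e.g. `total_by_customer([{"customer": "a", "amount": 1.5}])`.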

In probably my favorite software-related talk[1] (certainly the one I most frequently share), this is referenced as “functional core, imperative shell”.

1: https://www.destroyallsoftware.com/talks/boundaries

Does anyone know of a transcript of this talk? There is a link on the YouTube copy of the video, but it seems to be dead.

Thank you for asking. I regret posting this without looking for a transcript first, especially since my capacity for consuming video/audio content has declined as rapidly as a lot of topics I’d be interested in have embraced video. I may well contribute to transcribing it if I find some free cycles.

Yes, this is the way. In addition, often the internal and external representations of information will be different, in which case I normally prefer to keep any conversion or validation logic as close to the corresponding I/O as possible. Then all the internal computation logic only has to work with a clean and well-defined internal data model.
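As a sketch of keeping conversion and validation next to the I/O: the external representation below is a loosely typed dict (as it might arrive from JSON), and the internal one is a frozen dataclass. `Booking`, `parse_booking`, and `check_out` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Booking:
    """Internal model: fields are already typed and validated."""
    guest: str
    check_in: date
    nights: int


def parse_booking(raw: dict) -> Booking:
    """Boundary: convert and validate the external form exactly once."""
    guest = str(raw["guest"]).strip()
    if not guest:
        raise ValueError("guest must not be empty")
    nights = int(raw["nights"])
    if nights < 1:
        raise ValueError("nights must be at least 1")
    return Booking(guest=guest, check_in=date.fromisoformat(raw["check_in"]), nights=nights)


def check_out(booking: Booking) -> date:
    """Internal logic: no string parsing, no defensive checks needed."""
    return booking.check_in + timedelta(days=booking.nights)
```

Everything past `parse_booking` can assume a clean `Booking`, which is the point of doing the conversion at the edge.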

For me the number one thing I try to focus on is _naming_. If something is hard to name, it's likely hard to understand or overly abstracted (misdirected). If something is easy to name, it likely follows [insert any software development "best practice" here].

What's a good name? I love the phrasing from _Elements of Clojure_ by Zachary Tellman [1]

> Names should be narrow and consistent. A *narrow* name clearly excludes things it cannot represent. A *consistent* name is easily understood by someone familiar with the surrounding code, the problem domain, and the broader [language] ecosystem.

1. https://leanpub.com/elementsofclojure/read_sample

Yeah, I find that if you can name something well then everything else falls in place much easier.

At work, for any large feature, we usually go over naming pretty extensively, and aim to be consistent in documentation, code, and discussions, so everyone knows exactly what everyone is talking about.

Trust your tooling, and your repository. It's safe to delete if you still have a record of the way the code was before. Too often I see code that doesn't need to exist because someone is afraid to remove it. Modern IDEs are excellent at showing dependent code, and Git and other source control tools are excellent at giving you freedom to remove things.

Oh, and have good testing in place to make sure you aren't breaking a required path that your IDE can't detect, obviously. No IDE in the world can detect "Oh, we still had one client on that old obsolete REST call and they are pissed"

That's what we call 'scream testing'

Unit Tests. If you can't write a unit test for it, it's too complicated and it's going to snowball quickly into a giant mess.

Unit tests, while good at promoting decoupling, can absolutely be a major driver of complexity, as it may break the code into far more units than what is reasonable.

Be careful with this. Unit tests don't tell you much about the correctness of a system overall, and they rarely survive a substantial refactoring. Optimizing for unit testability can make individual classes/functions "simple" but at the expense of creating a ton of them and pushing the complexity to the interfaces and integration between them.

I love unit tests, but admit I have absolutely seen unnecessary complexity including complete classes and namespaces solely to enable testability in many cases.

It's a justifiable trade off for me, but I don't pretend that unit testing reduces complexity.

I think it is of at least slight interest to some who missed it, to bring back this thread from 2018, about Oracle code (I too once worked on it so I immediately saved that comment link when it was posted):


I'm not sure if you're saying so, but those are not unit tests.

Yes it is about tests in general. I think it fits the discussion and many comments very well, this does not really seem to be about only unit tests specifically. Many comments are more general in tone.

The very comment at the top of this sub-thread does not seem to limit itself to the subject of unit tests.

My experience with automated testing was great until I had to test I/O functionality: files, databases. That's when the test suite itself became too complicated.

Absolutely! For me, comprehensive testing is key to keep things clean over time. Not sure why this didn't come to my mind when writing the article. I think I was somehow assuming that this is a necessary pre-condition anyway.

Lack of a single source of truth is probably the biggest offender I see.

The same conceptual state gets represented in multiple variables or derived variables, and these must stay in sync. Very brittle.
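One common way to avoid the sync problem, sketched in Python with an invented `Cart` class: store the state once and compute anything derived from it on demand, instead of keeping a separate `total` or `is_empty` field that every mutation has to remember to update.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    # The single source of truth: item prices in cents.
    prices_cents: list[int] = field(default_factory=list)

    def add(self, price_cents: int) -> None:
        self.prices_cents.append(price_cents)

    @property
    def total_cents(self) -> int:
        # Derived on every read, so it cannot drift out of sync with the items.
        return sum(self.prices_cents)

    @property
    def is_empty(self) -> bool:
        return not self.prices_cents
```

If recomputing ever becomes too slow, caching can be added behind the property later without changing any caller.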

I call this "copy-paste-copy-paste-refactor": don't factor or abstract out a routine before the third time it's implemented. Until then you don't know what the actual commonalities among the uses will be, or if the callers will have so many special cases that the routine isn't really that reusable.

All dependencies should be injected (and possibly wrapped with custom interfaces, if they're libraries).

All globals should be configurable (most codebases I've seen have a ton of hidden globals).

All side effects should be isolated.

"Break any of these rules sooner than say anything outright barbarous."

Prioritize functional testing over unit testing, which penalizes refactoring.
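For the injection and configurable-globals rules a few lines up, a minimal Python sketch: the clock (a side effect) and the cutoff hour (a would-be hidden global) are both passed in. `Settings`, `utc_now`, and `greeting` are names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Settings:
    """What would otherwise be a hidden module-level constant."""
    greeting_cutoff_hour: int = 12


def utc_now() -> datetime:
    """The isolated side effect: the only place that reads the real clock."""
    return datetime.now(timezone.utc)


def greeting(settings: Settings, now: Callable[[], datetime] = utc_now) -> str:
    """Decision logic only; both the clock and the configuration are injected."""
    if now().hour < settings.greeting_cutoff_hour:
        return "Good morning"
    return "Good afternoon"
```

A test can then pass a fixed clock, e.g. `greeting(Settings(), lambda: datetime(2022, 7, 25, 9, tzinfo=timezone.utc))`, without patching anything.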

To me the simplicity argument is the greater argument for microservices.

We should stop seeing microservices as a technical problem / solution. They are a way to divide a "business domain" up into its smallest constituent parts according to that business's view of the domain.

Easier said than done. When working with a big team on code that has been edited many times over a long period, no one will risk touching it.

In the real world you can't (and shouldn't) always say no to evolving and added functionality, nor can you always make unanticipated changes in the cleanest way given project deadlines.

The real solution is to recognize when new features are slower and messier to implement than they should be (because the evolving requirements have outgrown your original design), and periodically take the time to refactor to clean things up.

Also: If requirements change or new features are added, rather than only making the changes necessary to existing code, rewrite the code they touch. This forces you to keep all corner-cases in mind and will lead to more correct code.

In a similar vein, when starting a new project from scratch, first do a quick and dirty prototype and then throw everything away and start anew. This way you know up-front what the challenges are.

That doesn't work, and it comes from experience. The messiest bit of our code base is the business logic, which out of necessity, is intertwined with I/O logic (we have very tight timing requirements, and we have to query several sources of data over the network) and we only ever add new requirements, never retire old ones (or at least, that's been the case over the past 12 years). Rewriting the code the new requirements touches is pretty much out of the question.

About eight years ago I started a "proof-of-concept" that my manager asked for. I was using Lua for ease of development, and LPEG because it involved a ton of parsing. My intent was to get a handle on what was required and then do it in C or C++. I found out a few months after the fact that my "proof-of-concept" was, in fact, in production and running. So much for my quick and dirty prototype. (And in retrospect, it hasn't turned out that bad---the code is way easier to deal with than our business logic in C/C++ because Lua's coroutines make the event driven code look linear, and it's been fast enough).

Set up your room like you would write your code and look if you are happy with that.

Telling people to define clear goals doesn't help them define clear goals, unfortunately. It usually requires a lot of coaching before developers are self-aware enough to even understand what a clear goal looks like.

Good code is about responsibility. The less things know about other things the better it is. Unfortunately this only comes with experience and concentrated improvement. After a while it is indeed an art form.

A “mess” is subjective. For sure, code that can’t be maintained by the developers you have is a major issue. That’s your first problem to resolve.

Changing code that works, even if it’s a rock you ought to put down, is a risk.

As a journeyman programmer, I have found a few tools to help me reduce complexity.

Abstract interfacing techniques like base classes, abstract classes, and (my favorite) interfaces allow me to model interesting things.

Thinking about relationships between things in my systems versus categorizing things helps me avoid the "if you want to do something in OOP you must first define the universe" type problems.

DDD and conceptualizing how 'infrastructure' components interact with my main system is a nice guide for me.

Trying to write good tests is how I'm able to bounce around a few projects without having to read source code to reload context.

These are things that work for me. As I continue my practice I may find that I'm wrong or misinformed about some things. I should hope that I'll be able to incorporate a higher understanding as I gain more experience.

> Minimize Dependencies ... Consider doing it yourself.

This is terrible advice!

Maximize your dependencies. Adopt as much external code as possible. Build what you can with it. Then, as you reach the limits of those dependencies and you absolutely understand what needs to get done, replace them as you need to.

The vast majority of what people write will be trashed and/or changed radically. You should adopt whatever tools are required to get things working minimally and then make decisions like this.

This might start a fire here but I think dependencies are actually the problem, and see it happening in real time with all the latest gRPC offshoots for inter-service communication (Seems like there is a new one every day).

The libraries attempt to "dumb down" TCP, HTTP, etc and treat them as an abstraction that you don't need to know the details of. But it ends up biting people in the a$$ because networking isn't a perfect world where every request succeeds, terminates cleanly, or goes to the destination you expect. Whisking away all the complexity makes developers dumber as they eschew solving low-level problems with over-engineered high-level solutions that paper over the underlying issue, e.g. using mTLS to get around the fact that you're using DHCP to assign address space to nodes incorrectly, or making every API request a POST because the designer didn't understand HTTP caching, and so on. You get these endless problems that were solved decades ago because people keep trying to reinvent the wheel.

Agreed, it's not like you'll really forget how to write a Berkeley socket; couple of deques, mutexes and an <arpa/inet.h> header later just saved you forty gigs of BOOST BEAST.

It sounds like you work in prototyping, which is cool, but a lot of us work in engineering and need more control and surety in the fit, quality, durability, and predictability than we can expect to find in the work of some stranger with no accountability to or insight into our project.

When the dependency is deprecated, I have to stop what I'm doing and replace the dependency. If the dependency has a show-stopper bug, I either have to wait, vendor the dependency, or rewrite. That's what the original article advocates for: be careful what you import. leftpad, probably write it yourself. React, OK to use, but maybe vendor.

Yeah, dependencies makes you dependent :D

Honestly, I think you're both wrong. You shouldn't be trying to minimize or maximize your use of dependencies. You should add a dependency when it makes sense and write your own code when it makes sense.

If you're doing more complex time and date work, a good solid library for manipulating datetime variables will save your sanity. If you need to right-justify a string to a set length, using a leftpad package will leave you at the mercy of a random author on npm.
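For a sense of scale, here is what a hand-written left pad amounts to, sketched in Python rather than JavaScript. The `left_pad` name and the single-character restriction on `fill` are choices made for this example. Python's built-in `str.rjust` already does the same job.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Right-justify text to the given width by padding on the left."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    # Strings already at or beyond the target width are returned unchanged.
    return fill * max(0, width - len(text)) + text
```

For example, `left_pad("42", 5, "0")` gives `"00042"`, the same as `"42".rjust(5, "0")`.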

To me, the best case scenario is adding a dependency of medium size and complexity that you're confident you could write yourself. This means that if you run into problems, you can just shrug, and then ditch the library, but if it's ok then you save some time. What's terrifying is dependencies so large that you can't fathom the amount of effort required to make them. It also makes it much harder to tell if the library is actually any good. Luckily for your example of time libraries there's normally a "blessed" library for whatever ecosystem you're in. Tiny / super simple dependencies are a complete waste of time, if I can write it in < 1hr I would much prefer to do so.

IMO there's a lot to be said for writing your own version that does 60% of what some library does, but 100% of what you need it to do.
