Tips for beginning systems and software engineers (ilya-sher.org)
217 points by ilyash on May 19, 2016 | 111 comments

I am glad you put automated tests as one of the first ones. To this day I find that many people, often mid-level and sometimes senior developers, are very uncomfortable with automated tests. As a result they don't use this tool, end up being less productive, and often write code that is more complex than needed, and of course code with more bugs that could have been avoided.

I'm no TDD or automation advocate - however I do see that this tool is essential to modern software development, and the only way to learn to use it, is by practice.

I would encourage every engineer who's covered the basics to go full-in on TDD with 100% test coverage for a few days or weeks. Not because it will make you more productive or produce fewer bugs (you will be less productive), but because it will change how you think about writing code (similar to how learning Lisp will have a massive impact on how you think about structuring code). TDD + 100% coverage is an extreme that is worth experiencing for what you learn from it.

Later down the road, when you've gotten into the habit of developing with no, or barely any automation, it will be much more difficult to learn to fit this tool into your coding routine.

> however I do see that this tool is essential to modern software development

If we are talking about the later stages of the life of a software project then I completely agree with you.

But if you just started a new project and you are still in a constant state of change then a large test suite will slow you down because you have to adapt all your tests all the time. In the worst case you might even stop improving your software just because updating your tests is too much work.

> But if you just started a new project and you are still in a constant state of change then a large test suite will slow you down because you have to adapt all your tests all the time.

I have been in a new project where progress was slowed by all the tests. But the issue I have with "I will write tests later, after things have stabilized" is that, in my experience, that day never comes.

There is always a feature that would add value, or some fire to be put out.

Of course, you could arbitrarily pick a date or time to start testing, but as someone who was responsible for a smallish code base (30k lines, maybe 2-3 man years) with no tests, I can tell you that it is hard to start that. You have to either start testing when building a new piece of software (which is easy, but you really want to have assurances about the working of your existing code), or try to apply tests to previously untested code, in which case you are unraveling quite a bit of complexity and possibly untestable or difficult-to-test constructs.

I write high level automated tests immediately at the beginning of a project, which have several upsides:

They can be read almost like project requirements, so they double as validation for solving the problems in the reqs.

They only use the outermost APIs of your code, which will change much less than internal module APIs, and so won't inhibit changes and experimentation as much.

They enable you to automatically test your code in the "real" scenarios that your project reqs describe, instead of just module-level tests that may not catch bugs in API usage of those "real" scenarios.

Later on in the project, as larger internal modules crystallize, you isolate those modules with their own outer APIs and write automated tests for their APIs.

Yes, this has downsides. High level white box testing exclusively during early stages obviously won't catch all your internal error cases early. However, I have found them to have superb ROI, especially in projects where time constraints have required pragmatic approaches.
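
To make that concrete, here is a minimal sketch in Python of a test that only touches the outermost API. The function `top_words` and its helpers are hypothetical names invented for illustration, not taken from any real project:

```python
import re
from collections import Counter


# Internal helpers (hypothetical): free to change shape during early experimentation.
def _tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def _count(tokens):
    return Counter(tokens)


# Outermost API (hypothetical): the only thing the high-level test touches.
def top_words(text, n):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in _count(_tokenize(text)).most_common(n)]


def test_top_words_matches_requirement():
    # Reads almost like the requirement: "report the most frequent words".
    text = "the cat and the hat and the bat"
    assert top_words(text, 2) == ["the", "and"]


test_top_words_matches_requirement()
```

The helpers can be rewritten or merged without touching the test, which is the point: only a change to the behaviour of `top_words` itself forces a test update.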


Unfortunately, it's difficult to pinpoint the time to abandon that initial flexibility and create a test suite.

Moreover, creating that test suite after the project has grown is likely to be tedious, too... and often requiring serious refactoring to accommodate testing practices.

The XP folks were always all about changing requirements and flexible code, and unless they're all deluded, it's not true that testing necessarily hampers a new project.

When I start a new maven project, it comes with junit by default and with the test tree created. The first thing I do when starting a nontrivial class is to hit the shortcut for create new test. I should really make the build fail on empty classes in the test tree. Automating the testing burden helps especially when interfacing with a tricky API; sometimes it's mocking, sometimes it's a real object in isolation.

I find tests to be invaluable especially in times of high churn and lots of changes. First they force me to think of my design in terms of components I can compose together into higher level systems, and when I find one of those components or systems is no longer how I want it I can make changes with my test suite telling me what I've broken, and what I haven't broken.

"But if you just started a new project and you are still in a constant state of change then a large test suite will slow you down because you have to adapt all your tests all the time. In the worst case you might even stop improving your software just because updating your tests is too much work."

That's exactly when you need automated test suites. If you're pushing multiple versions a day, then automated tests are really required.

Depending on how fundamental the changes are, I agree. However, usually when someone says that to me, it's a huge smell that the unit tests are bloated and cover too much.

I'd argue the opposite - you should be very afraid of changing untested software because it is hard to keep it correct, and the more it changes the heavier the burden of manual testing (assuming you care to understand your code's behavior at all).

Then again, I skew towards integration rather than unit testing. So I am not locked into an internal design, but I know outward facing behavior is as I expect it to be.

> But if you just started a new project and you are still in a constant state of change ...

I don't understand what the rush is all the time. Sit down with a pad/pen or whiteboard and think about the design. For the last mobile project I started, I sat down and wireframed on paper. Tests came naturally and didn't fight the design.

I think it depends on what type of software you are doing. If it is "just" a mobile project then you can pre-plan most of it. But I come from a more scientific background and most of the stuff I did back at university was pretty much impossible to plan because you did not actually know how you would solve your current problem. Usually you would try several approaches until you find the one that works. In such a situation tests are hard to use because they reduce your flexibility.

> ... most of the stuff I did back at university ...

I would place that kind of programming in a different class. Small, "one-shot" problems probably don't need a lot of planning.

That's the experimental / exploratory class of problems where you're not interested in the correctness of the process but results. AKA risk reduction.

For the other classes where there is "The Process" of validation, testing and approvals, a little planning up front saves some churn on the back end.

I have the same concern about iterative development. Seems like a lot of teams jump right into a coding sprint/iteration/cycle w/o a plan.

How do you verify that the lack of understanding of the form of the solution was due to the problem domain (scientific) and not your lack of exactness of confirmation and ability to see what is in the problem due to not actually knowing?

You are completely allowed to throw out your tests if you are doing this kind of exploratory programming.

Then, tests are simply your hypotheses.

Yes, that's one thing people often forget. It's OK to delete tests that are obsolete.

I work in an enterprise environment and with all the stakeholders involved it's impossible to get a decent design upfront. But I still think you can test a lot of things even while change happens. Just avoid getting too detailed.

Designing tests that need to be changed often is a sign of high coupling between your tests and your solution, which in turn is often a sign of brittle design. You are testing an implementation and not a desired feature. Another potential problem is that your interfaces are changing too much, which means you are having trouble thinking them through.

Possible solutions to this are TDD, or trying to write your tests in such a way that they would fit any implementation of your interface. For interface design, outside-in TDD is an interesting tool.
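
A rough sketch of "tests that would fit any implementation of your interface", in Python. The two stack classes and `check_stack_contract` are hypothetical names made up for this example:

```python
from collections import deque


class ListStack:
    """Hypothetical implementation backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class DequeStack:
    """Hypothetical alternative implementation backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return len(self._items) == 0


def check_stack_contract(make_stack):
    # Only the interface (push, pop, is_empty) is used, never the internals.
    stack = make_stack()
    assert stack.is_empty()
    stack.push(1)
    stack.push(2)
    assert stack.pop() == 2
    assert stack.pop() == 1
    assert stack.is_empty()
    return True


results = [check_stack_contract(impl) for impl in (ListStack, DequeStack)]
```

Because the check never looks at `_items`, swapping the implementation does not require touching the test; only a change to the interface does.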

In any case, your problems with tests at early stages probably come from lack of practice. Writing fewer tests is certainly not going to help.

Automated testing is a learned skill. You need to build up a toolbox of ways to automate certain things. It's a lot of work to learn, but once you have a decent system and know how to design things for automation, it's super valuable. In my current job we do a lot of automated testing, and it took me several years to get my head around it and to design things in a way that can be automated.

Automated tests are great. Can't imagine not having them.

Automated test systems, in my experience, have sucked pretty hard. Often written by Q/A engineers frustrated at not being "real" engineers [yeah I know, toxic culture], and just not written well. Like the test systems I could never install on my workstation or even get running on my own, in order to reproduce problems and debug them. Or the systems that would assign "Priority 0 wake-up-in-the-middle-of-the-night" issues to failures in the test framework itself. Test frameworks are best kept as simple as possible; be wary of architecture astronautics.

I like the "TDD is nice, use it when it's useful" attitude. TDD as a tyranny works about as well as any other silver bullet we've seen in software development over the years (that is, not very well).

Off-topic, but I've always felt like "silver bullet" isn't a very good metaphor to use because we keep using it sarcastically, as above. Wasn't the silver bullet the only way to stop a werewolf? Or does the phrase come from somewhere else?

Anyway maybe something like magic bean or some other "looks useful but turned out not to be" metaphor would be better, not sure which one though.

"Silver bullet" was popularized (if not coined) by Fred Brooks in _The Mythical Man-Month_, and since that book is pretty widely read, the term is well understood. It's definitely sarcastic, but that doesn't mean it's not accurate.

[Depending on your mythology, you can kill a werewolf with a silver sword, and probably other silver objects. Poul Anderson has an interesting treatment of werewolf vulnerabilities in his collection _Operation Chaos_]

Automated tests are very important but source control is an order of magnitude more important IMHO.

I hope its omission is because OP thinks it goes without saying. It doesn't though, just the other day I read on HN that people have colleagues who don't use any source control.

Absolutely. Using source control is like using an editor.

TDD can be tedious at times, but if you do it, you quickly reap the benefits. It makes testing easier, it reassures you that you didn't break everything completely with a random change, and it keeps your API in check. And it makes more of the process reproducible.

TDD is a kind of semi-formal specific process with a whole legacy of consultant workshops and all of that stuff.

So I think it's great to mostly ignore that, since it's contentious, and focus on that core question: how do I know that the system works correctly?

This also makes it obvious why type systems are a glorious ally, in addition to test suites and proofs.

And it's a core question because if I don't have grounded confidence in the system's correctness, then that's likely to be anxiety-inducing, irresponsible, and sometimes dangerous.

Right, and I didn't learn TDD formally. Sorry for using the term loosely. I write tests in tandem with the implementation, rather than writing the tests first and only then the implementation.

When I knew exactly my code's requirements, I wrote the tests first. I was writing an interpreter. Tests-first was a smart move.

I find writing tests first really helps me sort out the requirements if I'm unclear about them.

>Later down the road, when you've gotten into the habit of developing with no, or barely any automation, it will be much more difficult to learn to fit this tool into your coding routine.

Don't agree. IMO a good automation service is abstract enough that it can be hooked into your cycle or at least appended to it. Automated tests and CI are a prime example.

I'm ready for the down votes, but the most important aspect of TDD is job interviews (or blog posts), telling others you do it even at home on your pet project is a big plus somehow. My guess it is the best way to communicate you care about quality, even if equating the two is obviously a half truth. Again Norvig vs. Jeffries tells me practices go a shorter way than actual practicing and ability.

Where does one begin with automated testing? I realize its importance after reading countless posts about it, but to me it mostly seems like a lot of work is required to design the system that will perform the automated testing, depending on the complexity of your project.

I can add another thing from my personal experience - if you've just started out and you land in a job where you're the sole warrior or you don't see a single person who's smarter than you, get out as fast as you can. In best case scenario you will not learn anything, in the worst case scenario, you will learn bad things that you'll eventually have to unlearn.

I am working on a software project where I am the only developer. I was an employee for 3 years, working in a team of 3 to 5. Then everybody left and so did I. I continued to do some contract work for the company. I dipped my toes into another company as a junior contractor (I started writing Java there; my professional employment had revolved around PHP webshops), and left because I didn't fit in that well.

I couldn't get them to guide me when I needed it. Instead, my PRs were rejected with comments like "Don't see any value" or "This feature is incomplete and undocumented". Also, when I did specify in my comments "Feedback appreciated", I felt that my work was nitpicked way more than their own, which I saw as a double standard. In the end, my self-esteem went down a lot, I couldn't keep working with them and left. I got back to my previous contract work where I experiment and continue working on the same project (PHP), working on an old legacy system that needs improvements. How do I keep learning and continue to work as a sole developer? One way is to learn from books, but I realize learning from smarter people has tremendous value.

At the same time I feel that maintaining a live legacy system can contribute to my experience improving the existing projects and provide value for the company (bugfixes, refactoring and improving stability).

I'd appreciate an opinion on that.

That's the catch - you don't improve as a sole developer. You can learn a lot, but actual improvement requires feedback.

I advise you to start contributing to Open Source projects, don't be shy. Everyone is welcome and I've personally found OSS devs to be a friendly bunch (exceptions always apply though). That will give you very valuable feedback from people with different levels of experience and in different positions. Leverage this feedback to improve yourself, but be ready to flush your pride down the drain.

> That's the catch - you don't improve as a sole developer. You can learn a lot, but actual improvement requires feedback.

Feedback can also come from "oh, that doesn't work as well as I'd expected".

Feedback from other people is usually faster and more efficient (so I agree with the recommendation), but isn't the only game in town.

Of course you can improve on your own.

If I come back to my own code a couple of months later and it isn't immediately obvious how it's working, then it's a good time to refactor it and think about what I would have expected, to make it more obvious. My code gets better as a result.

Just be cautious with:

"if you've just started out and ... you don't see a single person who's smarter than you"

Many people starting out think they're much 'smarter' than they really are - experience often counts for much more.

Of course, you could also have landed in a real dead-end company.

Let me elaborate on that. Usually contracts include a probation period. Use it to evaluate your potential long-term employer in the same way the employer evaluates you as an employee. It shouldn't take more than a month to understand the atmosphere.


Agree. Can you please add it in the comments on the blog or alternatively, can I quote you there?

Feel free to quote me and adjust my quote to be better sounding.

"Learn at least 3 very different programming languages. I’d suggest C, Lisp and Python. The chances are that you will not find a Lisp job but it surely will make you a better programmer."

(I understand these are suggestions and we are in the realm of personal opinions).

Even as a Systems/Embedded programmer it's hard for me not to consider HTML/Javascript a must-have, this day and age (not a nice-to-have).

Even in embedded/industrial systems nowadays it's perfectly normal to have a maintenance interface based on a lightweight web server.

Agreed. C, Lisp, and Python are all integration languages. There's a fair bit of PL history in there, but it's mostly one problem set.

A better mix would be:

Python, Javascript, SQL, and Regular Expressions

This mix would cover integration, ux, persistence, and madness.

I still think C/C++ needs to be in the mix there somewhere (or even better, assembly). Even if you don't write them day-to-day, knowing how pointers and memory work can be invaluable.

I'd suggest a statically typed language too. Java/C# etc

Chances are that you will need this but it's hard for me to recommend JavaScript as a starting point so it's actually a "must-have but a bit later" in my opinion.

Yeah. Don't worry about whether or not you should learn javascript. It will be forced upon you sooner or later.

Do I really _need_ to know 1) Big O notation, 2) common algorithms and their time and space complexity, 3) which opcodes exist for a CPU of your choice and 4) a kernel's main system calls?

I don't know any of these. I know they exist, I know of the concepts, but no specifics. I would argue that these are good to know, and a must if you are working in a domain where this matters. For design, architecture and implementation in a high level language, when the problem does not touch these areas, this is not a MUST have.

> Do I really _need_ to know 1) Big O notation

Big O's not too hard. I would imagine that most programmers have an intuitive feeling for it. For example:

If I need to use a loop, that's going to slow things down (formally, going from O(1) to O(n)).

If I keep putting loops inside loops, it's going to get really slow (formally, going from O(n^x) to O(n^(x+1))).

Each element requires all the previous to be processed again. That's not going to work (formally, reaching O(2^n)).

Maybe some fancy datastructure would speed this up (formally, going from O(n) to O(log n)).
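
Those four intuitions can be checked by simply counting steps. A small sketch in Python; the function names are made up for illustration, and each function returns how many units of work it did rather than doing anything useful:

```python
def linear_steps(n):
    """One loop over n items: O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def nested_steps(n):
    """A loop inside a loop: O(n^2)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


def reprocess_steps(n):
    """Each element re-processes every previous element from scratch: O(2^n)."""
    return 1 + sum(reprocess_steps(k) for k in range(n))


def halving_steps(n):
    """Discard half the candidates each time, as a binary search does: O(log n)."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for size in (4, 8, 16):
    print(size, linear_steps(size), nested_steps(size),
          reprocess_steps(size), halving_steps(size))
```

Doubling the input doubles the linear count, quadruples the nested count, squares the reprocessing count, and adds just one step to the halving count.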

> 2) common algorithms and their time and space complexity

I think this just boils down to appreciating what kind of work must be going on in some algorithm. For example, sorting a list is surely going to loop through it at least once (i.e. at least O(n)); for comparison sorts it's actually O(n * log n), but that's pretty close.

I would imagine that the most effective knowledge for a complete beginner would be bad usage patterns for a bunch of common datastructures. For example, a loop which appends to the end of a linked list is a bad idea, since finding the end of the list requires looping through all the nodes. This is a loop inside a loop (AKA O(n^2)) when we intuitively know that only a single loop is required (AKA O(n)).

> For example, a loop which appends to the end of a linked list is a bad idea, since finding the end of the list requires looping through all the nodes. This is a loop inside a loop (AKA O(n^2)) when we intuitively know that only a single loop is required (AKA O(n)).

That would be indicative of a bad/naive linked-list implementation, wouldn't it? It's not that much overhead to maintain a wrapper with front and back pointers.
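
A minimal sketch of both variants in Python, counting how many nodes each one has to walk past. The class names and the `hops` counter are invented for this illustration:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class NaiveList:
    """Singly linked list with only a head pointer: append walks the whole list."""

    def __init__(self):
        self.head = None
        self.hops = 0  # how many nodes we stepped past, for illustration only

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
            self.hops += 1
        current.next = node


class TailList:
    """Same list, plus a tail pointer: append never walks anything."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.hops = 0

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node


def to_python_list(linked):
    out, current = [], linked.head
    while current is not None:
        out.append(current.value)
        current = current.next
    return out


naive, tailed = NaiveList(), TailList()
for i in range(100):
    naive.append(i)
    tailed.append(i)
```

Both lists end up with the same contents, but the naive one walks a number of nodes that grows quadratically with the number of appends, while the tail-pointer version walks none.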

Well, there are some caveats. Firstly, that would technically be a double-ended queue ("deque") rather than a plain linked list. That's fine if you're using it as an abstract datatype, i.e. you're doing things like "myList.append(foo)", "myList.get(0)", etc. and hence you're able to swap out the implementation of those methods.

There are some widespread cases of these naive datastructures which don't fit that model though. For example, Lisp makes heavy use of "cons" to pair up two values. By nesting these pairs, we can get arbitrary tree structures, and this is how lists are implemented, e.g. "(list 1 2 3 4)" is equivalent to "(cons 1 (cons 2 (cons 3 (cons 4 nil))))". These structures are heavily used in Lisp (since it does so much "LISt Processing"), and since the language is so dynamic it's hard to compile it away.
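
For readers who don't know Lisp, the cons-cell idea can be imitated in Python with two-element tuples. This is only an analogy; `cons`, `make_list`, `length` and `NIL` are names chosen for this sketch:

```python
NIL = None  # stands in for Lisp's nil (the empty list)


def cons(head, tail):
    """Pair up two values, like Lisp's cons."""
    return (head, tail)


def make_list(*items):
    """Build a list by nesting pairs, like Lisp's (list ...)."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def length(cell):
    """Finding the end means following every pair: O(n)."""
    count = 0
    while cell is not NIL:
        count += 1
        cell = cell[1]
    return count


print(make_list(1, 2, 3, 4))
```

There is no wrapper object here that could hold a pointer to the last pair, which is why the "just keep a back pointer" fix does not apply directly.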

Haskell's default list type works in this way too. There, the problem is compounded by immutability: changing the start of a list is easy, since we can re-use the pointer (immutability allows a lot of sharing). For example, if "list1" is "[1, 2, 3]" and "list2" is "[9, 2, 3]", they can share the "[2, 3]" part:

    list1: cons * *----+    2           3
                       |    ^           ^
                       V    |           |
                       cons * *--->cons * *--->nil
    list2: cons * *----+

However, if "list3" is "[1, 2, 5]", it can't share any of the "list1" or "list2" structure: the last element is different from the other lists, which requires the second-to-last element to use a different pointer, which makes it different from the other lists (even though it uses the same element); this requires a different pointer in the third-to-last element, and so on:

    list3: cons * *--->cons * *--->cons * *----+
                |           |                  |
                V           |                  |
                1           |                  |
                ^           |                  |
                |           V                  |
    list1: cons * *----+    2           3      |
                       |    ^           ^      |
                       V    |           |      V
                       cons * *--->cons * *--->nil
    list2: cons * *----+

The problem's not as bad in Haskell as it is for Lisp: Haskell doesn't allow as much dynamic behaviour, so a lot more usage information is available during compile time, which allows optimisations like "list fusion". Also, Haskell makes it easy for programmers to define their own datatypes (like deques), and allows datatypes to be abstract (either using typeclasses or "smart constructors"). Being abstract makes pattern matching more difficult though (requiring "views").
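
The sharing described above can be imitated in Python with a small immutable-by-convention cell class. This is a sketch of the idea, not how Haskell is actually implemented; `Cell` and `with_last_replaced` are hypothetical names:

```python
class Cell:
    """One cons cell; treated as immutable by convention in this sketch."""

    __slots__ = ("head", "tail")

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail


def to_python_list(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


def with_last_replaced(cell, value):
    """Return a new list whose last element is `value`, never mutating `cell`.

    Every cell on the path to the end has to be rebuilt, so nothing is shared.
    """
    if cell.tail is None:
        return Cell(value, None)
    return Cell(cell.head, with_last_replaced(cell.tail, value))


shared_tail = Cell(2, Cell(3, None))
list1 = Cell(1, shared_tail)          # [1, 2, 3]
list2 = Cell(9, shared_tail)          # [9, 2, 3], reuses the [2, 3] cells
list3 = with_last_replaced(list1, 5)  # [1, 2, 5], a full copy of the spine

print(list1.tail is list2.tail)  # same cells
print(list3.tail is list1.tail)  # different cells
```

Changing the front costs one new cell; changing the back costs one new cell per element.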

Ah, I was thinking more in terms of C/C++/Java/C# style linked lists, which tend to look something like this:

  class List<T> {
    ListNode<T> head;
    ListNode<T> end;
    int size;
  }

  class ListNode<T> {
    T data;
    ListNode<T> next;
    // possibly, if it is doubly-linked
    ListNode<T> prev;
  }

3 and 4 are only needed in specific domains.

1 and 2 you won't need every day. You can probably even make a career out of software not knowing them. But every software related domain will have times where knowing them will help you, and where not knowing them risks an inefficient and slow implementation and no way for you, individually, to fix it in any sort of reasonable time; you'll either have to learn it then, or you'll be reliant on someone who does know it fixing it for you.

I've had to address performance issues where knowledge of common data structures and algorithms' time and space complexity (and/or being able to look at an algorithm and realize "...this is O(n^2). It only needs to be O(n). WTF", and fixing it) was the key in Javascript, Erlang, Python, and Java. Exactly what 'high level language' do you feel you won't eventually need this stuff in?
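
A typical instance of that "this is O(n^2), it only needs to be O(n)" moment, sketched in Python. Both function names are made up, and the linear version assumes the items are hashable:

```python
def has_duplicates_quadratic(items):
    # `item in seen` scans a list, so this is a loop inside a loop: O(n^2).
    seen = []
    for item in items:
        if item in seen:
            return True
        seen.append(item)
    return False


def has_duplicates_linear(items):
    # `item in seen` on a set is a hash lookup: O(n) overall on average.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(2000)) + [7]
print(has_duplicates_quadratic(data), has_duplicates_linear(data))
```

The two functions differ by one data structure, which is why recognising the pattern matters more than memorising a table of complexities.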

1) and 2) are a MUST across the stack, I believe. A lot of those crazy snippets from thedailywtf etc. come from the lack of this knowledge, and suddenly performance suffers a lot when someone tries some O(n!) approach for simple stuff. Also, not to put anyone down but... it is kinda embarrassing imho to not know how fast you can sort.

Probably a controversial topic, as this touches on holy grails. My personal take on this, given my own experience so far:

1) You don't strictly need to know Big O notation in order to avoid introducing performance bottlenecks, or to mitigate them when they arise; but you should be able to use common sense about resources when developing things in any problem area. In other words: you should conceptually understand how computer systems work (which often implies you'll sooner or later run into Big O notation). Knowing Big O notation can also help you communicate with your peers about problems, if they know it too.

2) For a large number of problem areas, this is knowledge that is near useless, I think. Know that these algorithms exist, and that they might help you someday. But for your average problem domain, you don't really need concrete knowledge there. It bothers me how much emphasis is put on this in so many job interviews, while it's hardly ever relevant in many of those cases.

3) I'm not sure how that knowledge will really help you.

4) I think this is actually the most important thing of all four. Knowledge of the kernel's main system calls will help you tremendously when your process (or any process in your stack) "just hangs" sometimes and you need to resort to tools such as strace, dtrace, sysdig and what not. That time will come, sooner or later.

> 4) I think this is actually the most important thing of all four.

I was surprised to see this so far down. I, on a somewhat regular basis, go back to system calls. Usually it's related to needing to strace a process, but it is also helpful when thinking about what's going on when I'm writing some high-level code and how that will be translated into calls to the kernel versus user-space execution.

Oh, but I really like watching the algorithm sorting gifs. They're neat.

When you find yourself asking this question, think: "Am I naive about the subject?" Naive = not knowing that you don't know. Ignorant = knowing that you don't know. Many years in IT (40) have taught me that a little base knowledge like this goes a LONG way.

Code written by people who don't know 1 and 2 tends to make the mistakes that you learn to avoid by learning 1 and 2. And these are mistakes that you can make anywhere in your code and that can be very costly in performance and correctness.

1 and 2 are critical for anything which you want to scale at all.

3 is mostly irrelevant unless you're doing embedded devices or fast maths. If you're writing Java, it might be more useful to know JVM bytecode opcodes; the underlying CPU opcodes are completely irrelevant to you.

4 I think you need to understand the existence of the system call mechanism; then learning the API you realise how much of it is just syscalls (e.g. open/read/write/close).
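
To see how thin the layer is, here is a sketch using Python's `os` module, whose `os.open`, `os.write`, `os.read` and `os.close` functions are low-level wrappers around the corresponding operating system calls. The helper name `roundtrip` and the file name are made up for the example:

```python
import os
import tempfile


def roundtrip(data: bytes) -> bytes:
    """Write bytes to a temporary file and read them back using raw file descriptors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")

        # open + write + close
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, data)  # a small write like this is assumed to complete in one call
        finally:
            os.close(fd)

        # open + read + close
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, len(data) + 1)
        finally:
            os.close(fd)


print(roundtrip(b"hello"))
```

Running something like this under strace on Linux should show the matching calls (the open typically appears as openat), which is a gentle way to start reading strace output.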

I can write Windows code quite happily without understanding its syscall mechanism.

1 and 2 were answered by others. In my opinion, 3 and 4 make you better understand the reality which makes you a better programmer. Chances are that you will not use this knowledge. On the other hand it could give you some useful ideas.

I think you definitely need an awareness of 1 and 2 but you don't need to know all details. Just be prepared that things that work great at your current scale may not work once you scale up 100x.

I've been told conflicting things about don't repeat yourself; sometimes you end up making something wildly complex to avoid repetition - since learning functional programming I'm starting to think breaking things down into small reusable pieces that can be repeated in different ways is better.

The problem with DRY is people will see "Oh, hey, I'm doing the same thing in multiple places. DRY says I should factor that out!" without actually thinking about whether it makes sense for there to be a dependency between those pieces of code. Moving duplicated code into a shared function/method/etc -creates an implicit dependency- between all callers of that function/method/etc. That's exactly what you want in some situations (and is in fact the benefit of DRY, i.e., because when you have to fix that code for one function, you have to fix it for the other(s)), but if you do it blindly you create a dependency between two unrelated bits of code, and that's where complexity comes from, as you start adding conditionals and things to try and make that shared code handle all cases.

I find considering whether the repeated code is innately related, or in the same domain, or not, to be the best metric for determining whether I should refactor it to not repeat. It's adhering to the principle of least astonishment. Sometimes it should all stay separate. Sometimes a few pieces should be refactored to share code, and others left to repeat. Sometimes one set of the repeated code should share a function, and then the other set should share a different function, -even if that different function has the same implementation-, simply because the second set is so unrelated to the first. Sometimes they all should share the same code, because they're all related.
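
A small sketch of that distinction in Python. All names and rules here are invented for illustration:

```python
# Unrelated domains that merely happen to share a rule today.
# Kept separate on purpose: changing the username rule must not touch product codes.
def is_valid_username(name):
    return 3 <= len(name) <= 20


def is_valid_product_code(code):
    return 3 <= len(code) <= 20


# Innately related callers: invoices and refunds must always agree on how
# net price is derived, so the shared dependency is exactly what we want.
def net_price(gross, tax_rate):
    return round(gross / (1 + tax_rate), 2)


def invoice_line(gross, tax_rate):
    return {"gross": gross, "net": net_price(gross, tax_rate)}


def refund_line(gross, tax_rate):
    return {"gross": -gross, "net": -net_price(gross, tax_rate)}


print(invoice_line(125, 0.25), refund_line(125, 0.25))
```

If the two validators were merged into one shared helper, the first time usernames needed a different limit the helper would grow a flag or a conditional, which is the complexity described above.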

Breaking things into small, reusable pieces helps avoid this because the dependencies you create are on small, easily understood, easily replaced snippets, which tends to have a very clear context/domain. It's easy to prove that the abstracted function is correct, since it's so simple, and it's clear what domain(s) it involves, since again, so simple, and thus the problem with any given bit of functionality is almost assured to be the composition of those functions, i.e., your specific use of it in this one instance. You can still run into issues if you reuse some of those composed functions, however.

In general, the greater the complexity and domain specificity of the code you're trying not to repeat, the greater the danger you're shooting yourself in the foot.

FP is especially beneficial, as such, because the higher order functions that tend to be common are both pretty simple in their function, and pretty universal in their domain.

There was a great talk at the Elixir Meetup London by Anthony Pinchbeck about organising functionality not around MVC but around the functional unit being built.

So for example if you have something that is dealing with registering your website, create a directory called register_website and all models, functions and controllers to do with that code go in there. The talk suggested deleting the controllers and models directories. He also said there was duplication but everything was more loosely coupled and you could always group things in deeper functional units later.

I'm not 100% brave enough to go for this structure quite yet but it certainly makes coming to a project pretty clear as you literally have a list of all the functions of the site where the code sits for each thing :-)

This comment helped me immensely. Thank you.

DRY has many caveats. For example there is the famous "Rule of 3" - On the third time you copy something, factor it out. But really DRY is simply a rule of thumb that is intended to get you to a better design. Because "good" with respect to design is a somewhat ephemeral concept, people often equate DRY with "good". This is incorrect as there are many DRY designs that are not good.

In fact, my experience has been that for good design much more important than DRY code is simply reducing global variables (including classes -- which implies that you will be doing a lot of dependency injection). But, again, you can take it too far.
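A small sketch of what replacing a global with an injected dependency can look like, in JavaScript. The `makeGreeter` function and its `clock` parameter are hypothetical names used only for this example:

```javascript
// Hypothetical example: the clock is passed in rather than read from a
// global, so callers (and tests) decide what "now" means.
function makeGreeter(clock) {
  return function greet(name) {
    const hour = clock().getHours();
    return (hour < 12 ? 'Good morning, ' : 'Hello, ') + name;
  };
}

// Production wiring would pass the real clock.
const greetNow = makeGreeter(() => new Date());
```

A test can pass a fixed clock instead, which is the part that a hidden global makes hard.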

I think my biggest piece of advice for beginners is to get a mentor and constantly ask for opinions about whether a design is good or not (and what about it is good/bad). It's not something you can learn from rules. It requires a considerable amount of experience.

It's more fundamental than just a rule of thumb, but it's not the only rule of good code.

Loose coupling (which implies getting rid of global state) is at least equally important, as is failing fast.

One problem is that junior and mid level developers have a lot of trouble figuring out good coupling because they are not experienced in the ways things can change.

I was reviewing some code yesterday and the developer had a bunch of interfaces for everything so she could swap out implementations, and she was very proud of how modular everything was. The problem was that all of her interfaces accepted a File object instead of a stream, so we could change what the behavior is but not what it acts on, which is really the opposite of what we needed.
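To illustrate the difference, here is a rough JavaScript sketch (not the code from that review; `countNonEmptyLines` and `linesOf` are invented names). The function depends only on "something that yields lines", so the source can vary freely:

```javascript
// Hypothetical sketch: accepts any iterable of strings, not a file.
function countNonEmptyLines(lines) {
  let count = 0;
  for (const line of lines) {
    if (line.trim() !== '') count += 1;
  }
  return count;
}

// Any iterable works: an array in a test, or a generator that wraps a
// file or a socket in production.
function* linesOf(text) {
  yield* text.split('\n');
}
```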

It's hard even if you're not a beginner. One of the hardest aspects of good design is choosing the appropriate level and location for coupling.

I find that most developers who think that they can predict what kind of coupling they will need for future requirements tend to be wrong, too, no matter how experienced. Only backwards-facing refactoring really works at producing a good result, coupling-wise.

I would caution against following the DRY dogma too early or without thought about why people do so. I'd also caution against the "go DRY after 3" rule because it's not moored to the actual software architecture - it causes some code to be less DRY than it should be, and other code to be more DRY than it should be.

DRY is generally meant to be an optimisation towards reducing the cost of change by reducing the amount of duplicate code (repetitive, busy-body work) but it often backfires and increases the cost of change.

For example, say you are tasked with creating 15 widgets. During your first few sprints, every widget is the same, so your colleague berates you for creating them separately. You refactor them so that they all share a common implementation and continue.

3 weeks pass and features are being added left, right and centre. A point of contention within your team is whether to differentiate between 3 different types of widget with different features, or whether to use dependency injection. Development is slow because only one person can work on the widget code at a time, and the widget code has become increasingly complex as the growing number of callers invoking it has given it increasingly diverse requirements (note how looking at a function generally does not tell you which other functions call it, how they call it or why they call it).

The question you should ask yourself in these situations is, how similar will these widgets be in 3 months time? If it's likely that they will diverge significantly then perhaps the best solution would be for them to be completely differentiated now: tested separately, easily updatable/extendable without regression bugs on other widgets, able to be worked on by multiple engineers without merge conflicts, etc. The unfortunate reality is that once you have made a unit of code DRY, undoing it is time consuming, difficult and politically untenable, so people typically avoid it.

When should you make your code DRY then? (1) You should make your interfaces DRY early on, as this is a safe win without the problems of DRY implementations. (2) You should make implementations DRY when you are very sure that the different units of code will not change in ways that create intellectual work of the following kind: "How do I avoid complecting these unrelated behaviours?"

Having said all of this, perhaps this is not beginner friendly. Perhaps beginners should just follow DRY always. At the very least they will get practice refactoring.

>I would caution against following DRY too early

DRY should only be avoided if the refactored code ends up being much more complex than the duplicated code and it doesn't reduce the duplication all that much.

Pretty much every time I've seen DRY used to eliminate repetitions of three or more times it's been a good idea and 4 or more times it's been a no brainer.

>The question you should ask yourself in these situations is, how similar will these widgets be in 3 months time?

This is an awful approach. You won't be able to predict how similar they will be and you will likely create expensive to maintain architecture for projected scenarios which don't exist and neglect the architecture for scenarios that will.

I've seen many codebases littered with architectural changes that were put in place for presumptive future features and then never used. It happens way, way more often than those architectural changes actually being used.

  > > The question you should ask yourself in these situations is,
  > > how similar will these widgets be in 3 months time?
  > This is an awful approach. You won't be able to predict how
  > similar they will be and you will likely create expensive to
  > maintain architecture for projected scenarios which don't
  > exist and neglect the architecture for scenarios that will.
I'm not talking about writing code for features that don't exist. I'm talking about not doing extra work to generalise your code too early. DRY is generalisation. It is in a sense 'writing code for a projected scenario in which a feature is the same for all of its callers'.

Keeping things specialised towards a single purpose even if it is redundant in the short-term makes code easier to grok, test, and update.

Making a change across 5 redundant functions is not time consuming work and is not easy to mess up. However, making a change to a single function which needs differentiated behaviour for one of its callers is error-prone as (1) it might force an interface change and (2) there is a non-zero probability of a regression error on the other four callers.

I'm not attempting to predict the future. I'm attempting to talk from first principles about real cognitive and communicative costs.

In agreement.

I started using a rule of "use the language's primitives first" to guide my use of DRY. There are many ways in which you can "reduce repetition" by introducing a generalization abstraction, something that in turn couples lots of code to a master definition: Generics, high arity functions, and inheritance are three such temptations. Doing that, in turn, accidentally models the whole problem before the spec is mature.

But if your discipline is "I'll use more built-in types and simple data structures, I'll keep more of the code inline, I'll let this class get bigger and I'll make some duplicated simple, fixed-function code paths," you don't get trapped by that. Sometimes I end up with the same simple math function duplicated across two different modules because I needed it for completely unrelated algorithms: That isn't pretty, but it's not a gross failure.

The opportunities to make a really useful abstraction for a novel problem often take time to ripen, and until that point you might as well assume that the code necessarily involves a "spaghetti" mindset. You don't want to pay the cost of reworking the abstractions unless you have to - it presents a huge hazard to maintenance and it means the attempt to generify failed.

But I think coders do get caught in a "clean code" mindset now where it needs to look both pretty and succinct, 100% of the time, and that leads to pushing the functionality out of the function body and into "ravioli" code.

I think Go is a good language, of the hot ones today, for learning this kind of engineering-centric focus because it actively discourages trying to whittle down the code to the succinct one-liner. Unfortunately by the time I encountered it I already had this figured out (perhaps years later than I should have).

>Making a change across 5 redundant functions is not time consuming work and is not easy to mess up.

You're ignoring the hardest part: Knowing that there are 5 redundant functions and finding all of them.

Store them all in the same place. Or if you wish to, make them use the same type signature or interface and search for this with your editor.


In the structure above, each of the widgets is tested separately, each is very easy to find, and each might have redundant code doing the same thing.

This is way simpler and easier to work with than if people have created an `uber-widget` that has a configuration object and some dependency injection to handle all of its features because some overzealous engineer once decided that he was certain that all of the widgets would be able to be represented by a fully generalised `Widget`.

I'm obviously not saying that all cases of DRY are like this, but I am saying that there are far too many people that end up doing this because they make decisions to generalise code too early.
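As a rough illustration of the trade-off (the widget names and markup below are invented for this sketch, not taken from any real project), compare two specialised functions with one generalised function driven by a configuration object:

```javascript
// Hypothetical widgets. Each one is small, separately testable, and can
// change without touching the other.
function renderClockWidget(date) {
  return '<div class="clock">' + date.toISOString().slice(11, 16) + '</div>';
}

function renderCounterWidget(count) {
  return '<div class="counter">' + count + '</div>';
}

// The generalised alternative: every new requirement adds an option and
// a branch, and every change risks a regression for every caller.
function renderWidget(options) {
  if (options.type === 'clock') {
    return '<div class="clock">' + options.date.toISOString().slice(11, 16) + '</div>';
  }
  if (options.type === 'counter') {
    return '<div class="counter">' + options.count + '</div>';
  }
  throw new Error('unknown widget type: ' + options.type);
}
```

With two widgets the generalised version is still readable; the cost shows up as the number of types and options grows.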

I use DRY to enforce consistency.

Bad repetition is when you or other engineers have to also remember to update the repeated parts.

"breaking things down" is a very good approach. It helps clarity and often prevents code complication.

I would add two things to this:

1. Automated tests are not the only way to write code with less bugs. The other two, currently less in fashion but IMHO very important, techniques are abstraction and assertion.

2. In my view, there are two kinds of automation - bad and good. Bad automation checks if humans did some process correctly. Good automation implements some process without human intervention. It's very easy to start working on bad automation rather than the good.

Automated tests aren't the only way to write code with fewer bugs, but they are the easiest way to write such code. Automated tests are also the easiest to have running constantly, giving you instant feedback when you've broken something. The only downside is that there are some tests (like testing with particular datasets or high-load performance testing) that can be onerous to automate.

My personal rule of thumb is that checks for program correctness should all be automatic. Non-automated tests ought to be reserved for testing non-functional requirements (e.g. making sure the service is performant under load, reliability measures like failovers, etc.).

I disagree, I actually think that in many cases asserts are easier than tests, and many people derive tests from assertions in their head (or use a tool like quickcheck which does it too). Unfortunately, there is so much focus on testing in today's software that tools for assertions are not on the same level as tools for testing.

One class of assertions is type checking, but you can also have dynamic assertions while the code is running.

OTOH, abstractions are harder, but it only takes one person to invent a good abstraction that covers a huge class of bugs, and others can copy it. Examples of good abstractions that prevent many bugs are arrays with bounds checking, garbage collection, or optional data type.

> Automated tests [...] are the easiest way to write [code with fewer bugs].

Actually, automated tests are best at ensuring the code doesn't have regressions. But writing bug-free code takes much more than just that; it needs thought-out architecture and interfaces, and with those you often can get decent code without automated tests, so no, it's not "the easiest way".

They also serve as a form of code documentation (defining expected behavior).

When doing TDD, they often speed up development owing to the ability to get much quicker feedback on the code you just wrote.

Code without tests has a tendency to congeal owing to the massive uncertainty surrounding changing any particular part.

> They also serve as a form of code documentation (defining expected behavior).

Oh, far from it. Tests are not documentation, documentation is documentation. Tests can be treated as a set of examples at best.

Good well explained tests serve well as documentation. I've used them before in lieu of specifications.

Bad tests not so much.

>Automated tests are not the only way to write code with less bugs.

It's absolutely a prerequisite.

Humans can model the behavior of small parts of a code base without the aid of automation, but once a code base grows beyond a certain size, you either need to have automation to have a handle on the code or let it congeal (which ends up accelerating the creation of technical debt).

There are bad tests just like there is bad code. All of the rules for good code apply to tests too.

If your code fails an assertion and dies, isn't that still a bug to the end user?

Have any examples to clarify point two?

Well, an example I have from work, where we have a certain very old and obscure process to build fixes for our application. It's a somewhat manual process. Somebody wrote a program to check if this manual process was done correctly. It prevents problems, but it would be better to replace this manual process with an automated one. That, however, would mean changing the process somewhat to accommodate the automation (this is sometimes necessary since humans can deal with weird exceptions much better than machines).

Another example could be using a linter or valgrind to prevent C-style bugs with pointers and allocation, instead of using a high-level enough language, such as Java, to avoid those in the first place.

Yeah it's an interesting perspective. Thanks. I'm looking at it from a product UX perspective. It's a good challenge. Thanks for the insight.

> assertion

What does it mean in this context?

An assertion is a check for a certain condition. It can be static (type checking is the simplest example), or dynamic, to be executed at runtime (the simplest version is a null pointer check).

Assertions are more general than tests: they test an expected state of the application, not whether a specific input gives a specific output. On the other hand, you can probably test more things than you can assert. So both have their place.
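A small JavaScript sketch of the distinction. The function `applyDiscount` and the test helper `testHalfPrice` are hypothetical names chosen for this example:

```javascript
// Hypothetical function with a dynamic assertion: it checks a condition
// that must hold for every call, whatever the input is.
function applyDiscount(price, percent) {
  if (!(percent >= 0 && percent <= 100)) {
    throw new Error('assertion failed: percent must be in 0..100, got ' + percent);
  }
  return price * (1 - percent / 100);
}

// A test, by contrast, pins one specific input to one specific output.
function testHalfPrice() {
  return applyDiscount(80, 50) === 40;
}
```

The assertion covers every caller that actually runs; the test covers one case, but it runs whether or not production code ever reaches that path.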

I'm not disagreeing with you; I really like both of your points however, I have two problems with assertions. Let's say the assertion is inside a function e.g.:

  #include <assert.h>
  #include <stdint.h>
  void adjust_fan_control(uint8_t fan_speed) {
    assert(0 != fan_speed);  /* callers must never pass a zero speed */
    /* ... some complicated code here ... */
  }
What if the adjust_fan_control function isn't necessarily called for a given run of the program? Then you don't know if you actually tested that assertion, that is for all callers of function X, you don't know if all those callers are calling X correctly if you only rely on the above. Of course I'm not trying to suggest that you're arguing for assertions to the exclusion of tests, just pointing this out.

The second problem I have with them is in embedded systems. I'm heavily in favor of putting assertions inside functions in embedded systems (they are then omitted in prod builds, or, instead of looping the LED blink routine, mapped to a soft reset on failure). However, if I have very limited I/O and no ability to log when or where an assert was hit, all I know is that some assert was hit, because the LED pattern goes to 'blink really fast like this indefinitely, which means an assert failed'. If I have a lot of asserts, that doesn't really tell me anything.

On the other hand, if I have unit tests, and my mocks assert, and I can run the unit tests on a simulator or heavily instrumented system, then I know which assert failed in what function and who called that function. The point is I can't do that (or I don't know how) without the tests.

Edit: formatting.

I must say that the more I read articles of this character, the more I feel that it would make sense to distinguish between different types of "engineers".

A lot of the proposed advice is applicable only in certain fields, and not so much (or not at all) in others, whereas every field can go very deep (so it's a waste of time gaining knowledge you'll never use there). While a number of these tips are fairly generic, others definitely are not. A front end (client side) web developer won't benefit from having deep knowledge of data structures or the kernel, but he will benefit from knowing about Big O notation. Additionally, he will benefit lots from knowing how browsers work, and having deep knowledge of HTTP. Someone working on ETL processes all day will not benefit much from knowing about interpreters and compilers, but will benefit from knowing a lot about UTF-8. But also, he will benefit lots from knowing a large number of layer 7 transport protocols. A systems programmer (defined as "someone who programs close to the hardware") won't typically benefit much from knowing javascript or how browsers work, but will benefit lots from Big O notation, data structures and algorithms, and C. And so on.

I'm seeing a future in which we will be able to more distinctly treat separate specialisms within the field that we now call "software engineering" (or countless variations on it).

As with any other advice, one should use common sense to see how and which parts of it apply.

> A front end (client side) web developer won't benefit from having deep knowledge of data structures

but basic knowledge is still a must

I consider the basic knowledge of the "must" sections really a must for everyone. The depth might vary.

  > Automated healing / rollback 
Jesus christ on a cracker, please use this sparingly. It's one thing to have buggy software. It's another to change versions of software right underneath your users. This is like polishing a dance floor while the DJ is still spinning. It doesn't matter if you're reverting to an old good version; unannounced, unplanned & unexpected changes lead to disaster. Known bugs are better than unknown changes.

  > Don’t duplicate code
Yes, this is a really bad idea, but forking is the cheapest possible way to make changes that break as few things as possible. Yes, it's a nightmare in the long run. Ask your boss if he really cares about the long run.

Everything else was spot on.

I don't really get the Ease vs Simple section, what point is he trying to make?

This is simple:

  i  = 0x5f3759df - ( i >> 1 );
Where this is just easy:


Good article though.

If you haven't already done so, listen to this talk by Rich Hickey (the creator of Clojure). This should clear it up for you. https://www.infoq.com/presentations/Simple-Made-Easy

Thanks! Added link from the post to the lecture.

This is simple:

    if (str.length < 6) {
      str = ' '.repeat(6-str.length) + str;
    }
This is easy:

    const leftPad = require('left-pad');
    str = leftPad(str, 6);
(note: my JavaScript is rusty; there may be some very valid reason that the simple code is wrong)
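For comparison, here is the "simple" approach wrapped in a self-contained function. The name `leftPadSimple` is made up for this sketch; note also that engines supporting ES2017 ship a built-in `String.prototype.padStart` that covers the same case.

```javascript
// Hypothetical helper: pads with spaces on the left up to `width`.
function leftPadSimple(str, width) {
  str = String(str);
  if (str.length >= width) return str; // already long enough
  return ' '.repeat(width - str.length) + str;
}
```

Everything it does is visible in four lines, which is the sense of "simple" being argued for here.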

That's one reason that it's tempting to use something as simple as left pad. I only spend a small percentage of my time doing JavaScript, and the language has enough gotchas. Using something like that would give me the peace of mind that I haven't missed some quirk of the language. If someone knows how to publish a package on npm, then they probably know more JavaScript than me.

The point is that in the second case some people have no clue what's really happening, others have a rough idea, and a very small group of people know exactly what happens. It might appear simple but it's not. It's easy.

Easy things can be vastly complicated. This means that such complicated easy things are hard to maintain.

Simple is always easy to maintain.

Remember, as software developers, complexity is the enemy. It is the beast we want to slay.

Nice post! I hope lots of people read and understand it, the world will be a better place for it :)

One thing that would be good to add is: If you implement a way to create/add something, implementing a way to delete/remove that something isn't optional.

As Lovecraft put it: 'Do not call up that which you cannot put down.'

This will result in a system that keeps proper track of stuff.
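A minimal sketch of the idea in JavaScript. The `SiteRegistry` class is hypothetical; the point is only that `add` ships together with `remove` and a way to list what exists:

```javascript
// Hypothetical registry: every way of adding something has a matching
// way of removing it, so the system can always say what it holds.
class SiteRegistry {
  constructor() {
    this.sites = new Map();
  }
  add(name, config) {
    this.sites.set(name, config);
  }
  remove(name) {
    // Returns true if the site existed and was removed.
    return this.sites.delete(name);
  }
  list() {
    return Array.from(this.sites.keys());
  }
}
```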

There are many good points in the comments here. Thank you! I have added a link from the blog post to this discussion.

It's usually managers who buy the hype. They're the ones who need to be told to be sceptical.

Nice post, by the way.

Good list.

I would add "Software that does not ship is essentially useless."

To clarify, I mean that you can polish and try to perfect as much as you want, but at some point you have to get it into the hands of users to be actually useful.

I'm not convinced that servers must be set to the UTC timezone. Maybe just write your server code to read/store UTC time, which is easily calculated from any timezone.

Time should be converted to UTC at the earliest opportunity, and kept that way as long as possible.

Also, always, always, always, serialized in ISO-8601 format if it is getting turned into text.
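In JavaScript this can be sketched as follows. The helper names `toStoredTimestamp` and `fromStoredTimestamp` are made up; `Date.prototype.toISOString` is the standard method that always serialises in UTC:

```javascript
// Hypothetical helpers. A Date holds an absolute instant; toISOString()
// writes that instant as UTC in ISO-8601 form (ending in "Z").
function toStoredTimestamp(date) {
  return date.toISOString();
}

// Parsing an ISO-8601 string with an offset yields the same instant,
// whatever timezone the server itself is set to.
function fromStoredTimestamp(text) {
  return new Date(text);
}
```

A timestamp written with a +02:00 offset and one written as UTC then round-trip to the same stored string.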

Can we somehow change UTF-8 to WTF-8?

Over the last 12 years or so I have constantly been bitten in the ass by UTF-8. It first started when I moved from the US to Europe and decided to help people make websites in their native language.

More recently I was developing a piece of python code to help me with a thing. Running into UTF-8 problems made me decide to do a deep dive into what it is and how it works. After a week of diving down a rabbit hole I am still none the wiser.

Blog post after blog post, video upon video, and it still doesn't click for me. Yes, I solved my problems, but in a black box type of way.

Can anyone recommend a good book or long source to help me with this?
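One thing that sometimes helps it click is looking at the actual bytes. A small JavaScript sketch, assuming an environment with the standard `TextEncoder` (current browsers and recent Node.js versions); the function name `utf8Hex` is invented here:

```javascript
// Returns the UTF-8 bytes of a string as two-digit hex values.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).padStart(2, '0'));
}
```

ASCII characters come out as one byte each, while other code points take two to four bytes, so the byte count and the character count of a string can differ.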

Missing advice: Learn to read specs.
