
Tips for beginning systems and software engineers - ilyash
https://ilya-sher.org/2016/05/19/tips-for-beginning-systems-and-software-engineers/
======
gregdoesit
I am glad you put automated tests as one of the first ones. To this date I
find that many people - often mid-level, and sometimes senior developers - are
very uncomfortable with automated tests. As a result they don't use this tool,
end up being less productive, and often write more complex code than needed -
and of course code with bugs that could have been avoided.

I'm no TDD or automation advocate - however I do see that this tool is
essential to modern software development, and the only way to learn to use it,
is by practice.

I would encourage every engineer who's covered the basics to go all-in on TDD
with 100% test coverage for a few days or weeks. Not because it will make you
more productive or produce fewer bugs (you will be less productive), but
because it will change how you think about writing code (similar to how
learning Lisp has a massive impact on how you think about structuring code).
TDD + 100% coverage is an extreme, which is worth experiencing to take
lessons from it.

Later down the road, once you've gotten into the habit of developing with no,
or barely any, automation, it will be much more difficult to learn to fit this
tool into your coding routine.
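As a concrete starting point, the red-green loop can be as small as this (using Python's built-in unittest; the slugify function and all names here are invented purely for illustration):

```python
import unittest

# Hypothetical function under test -- in a TDD flow, this body would be
# written only after the tests below existed and failed.
def slugify(title):
    """Turn a post title into a URL slug."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  Red   Green "), "red-green")
```

Run it with `python -m unittest`; the point is less the tool than the habit of stating the expected behaviour before writing the code.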

~~~
yoodenvranx
> however I do see that this tool is essential to modern software development

If we are talking about the later stages of the life of a software project
then I completely agree with you.

But if you just started a new project and you are still in a constant state of
change then a large test suite will slow you down because you have to adapt
all your tests all the time. In the worst case you might even stop improving
your software just because updating your tests is too much work.

~~~
jsymolon
> But if you just started a new project and you are still in a constant state
> of change ...

I don't understand what the rush is all the time. Sit down with a pad/pen or
a whiteboard and _think_ about the design. The last mobile project I started,
I sat down and wireframed on paper. Tests came naturally and didn't fight the
design.

~~~
yoodenvranx
I think it depends on what type of software you are building. If it is "just"
a mobile project then you can pre-plan most of it. But I come from a more
scientific background, and most of the stuff I did back at university was
pretty much impossible to plan, because you did not actually know how you
would solve your current problem. Usually you would try several approaches
until you found the one that works. In such a situation tests are hard to use
because they reduce your flexibility.

~~~
mbrock
You are completely allowed to throw out your tests if you are doing this kind
of exploratory programming.

Then, tests are simply your hypotheses.

~~~
maxxxxx
Yes, that's one thing people often forget. It's OK to delete tests that are
obsolete.

------
aavotins
I can add another thing from my personal experience - if you've just started
out and you land in a job where you're the sole warrior, or you don't see a
single person who's smarter than you, get out as fast as you can. In the
best-case scenario you will not learn anything; in the worst-case scenario,
you will learn bad things that you'll eventually have to unlearn.

~~~
mobiuscog
Just be cautious with:

"if you've just started out and ... you don't see a single person who's
smarter than you"

Many people starting out think they're much 'smarter' than they really are -
experience often counts for much more.

Of course, you could also have landed in a real dead-end company.

~~~
aavotins
Let me elaborate on that. Usually contracts include a probation period. Use it
to evaluate your potential long-term employer in the same way the employer
evaluates you as an employee. It shouldn't take more than a month to
understand the atmosphere.

~~~
mobiuscog
Absolutely.

------
Kurtz79
"Learn at least 3 very different programming languages. I’d suggest C, Lisp
and Python. The chances are that you will not find a Lisp job but it surely
will make you a better programmer."

(I understand these are suggestions and we are in the realm of personal
opinions).

Even as a Systems/Embedded programmer, it's hard for me not to consider
HTML/JavaScript a must-have in this day and age (not just a nice-to-have).

Even in embedded/industrial systems nowadays it's perfectly normal to have a
maintenance interface based on a lightweight web server.

~~~
patsplat
Agreed. C, Lisp, and Python are all integration languages. There's a fair bit
of PL history in there, but it's mostly one problem set.

A better mix would be:

Python, Javascript, SQL, and Regular Expressions

This mix would cover integration, ux, persistence, and madness.

~~~
vvanders
I still think C/C++ needs to be in the mix there somewhere(or even better,
assembly). Even if you don't write them day-to-day knowing how pointers and
memory works can be invaluable.

------
thecopy
Do i really _must_ know 1) Big O notation, 2) common algorithms and their time
and space complexity, 3) which opcodes exist for a CPU of your choice and 4) a
kernels main system calls?

I don't know any of these. I know they exist, I know of the concepts, but no
specifics. I would argue that these are good to know, and a must if you are
working in a domain where they matter. For design, architecture and
implementation in a high-level language, when the problem does not touch
these areas, they are not a MUST-have.

~~~
chriswarbo
> Do i really _must_ know 1) Big O notation

Big O's not too hard. I would imagine that most programmers have an intuitive
feeling for them. For example:

If I need to use a loop, that's going to slow things down (formally, going
from O(1) to O(n)).

If I keep putting loops inside loops, it's going to get really slow (formally,
going from O(n^x) to O(n^(x+1))).

Each element requires all the previous to be processed again. That's not going
to work (formally, reaching O(2^n)).

Maybe some fancy datastructure would speed this up (formally, going from O(n)
to O(log n)).
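Those intuitions map directly onto code; a small Python sketch (function names invented for illustration):

```python
import bisect

def contains(items, target):
    # One loop over the input: O(n).
    for x in items:
        if x == target:
            return True
    return False

def has_duplicates(items):
    # A loop inside a loop: O(n^2).
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def contains_sorted(sorted_items, target):
    # The "fancy datastructure" speed-up: binary search on sorted
    # data is O(log n) instead of O(n).
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target
```

Same question ("is this value in here?") at three different costs, depending on what you know about the data.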

> 2) common algorithms and their time and space complexity

I think this just boils down to appreciating what kind of work must be going
on in some algorithm. For example, sorting a list is surely going to loop
through it (i.e. O(n)); actually it's O(n * log n), but that's pretty close.

I would imagine that the most effective knowledge for a complete beginner
would be bad usage patterns for a bunch of common datastructures. For example,
a loop which appends to the end of a linked list is a bad idea, since finding
the end of the list requires looping through all the nodes. This is a loop
inside a loop (AKA O(n^2)) when we intuitively know that only a single loop is
required (AKA O(n)).
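That bad usage pattern is easy to reproduce; a minimal Python sketch of a singly linked list with the naive append (illustrative, not a production structure):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def append_naive(head, data):
    # O(n) per call: walks to the end of the list every time.
    node = Node(data)
    if head is None:
        return node
    cur = head
    while cur.next is not None:
        cur = cur.next
    cur.next = node
    return head

def build_naive(values):
    # n appends, each walking up to n nodes: the hidden O(n^2) loop.
    head = None
    for v in values:
        head = append_naive(head, v)
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```

The fix, as discussed below, is to keep a tail pointer so each append is O(1).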

~~~
douche
> For example, a loop which appends to the end of a linked list is a bad idea,
> since finding the end of the list requires looping through all the nodes.
> This is a loop inside a loop (AKA O(n^2)) when we intuitively know that only
> a single loop is required (AKA O(n)).

That would be indicative of a bad/naive linked-list implementation, wouldn't
it? It's not that much overhead to maintain a wrapper with front and back
pointers.

~~~
chriswarbo
Well, there are some caveats. Firstly, that would technically be a double-
ended queue ("deque") rather than a plain linked list. That's fine if you're
using
it as an abstract datatype, i.e. you're doing things like
"myList.append(foo)", "myList.get(0)", etc. and hence you're able to swap out
the implementation of those methods.

There are some widespread cases of these naive datastructures which don't fit
that model though. For example, Lisp makes heavy use of "cons" to pair up two
values. By nesting these pairs, we can get arbitrary tree structures, and this
is how lists are implemented, e.g. "(list 1 2 3 4)" is equivalent to "(cons 1
(cons 2 (cons 3 (cons 4 nil))))". These structures are heavily used in Lisp
(since it does so much "LISt Processing"), and since the language is so
dynamic it's hard to compile it away.

Haskell's default list type works in this way too. There, the problem is
compounded by immutability: changing the _start_ of a list is easy, since we
can re-use the pointer (immutability allows a lot of sharing). For example, if
"list1" is "[1, 2, 3]" and "list2" is [9, 2, 3]", they can share the "[2, 3]"
part:

    
    
                    1
                    ^
                    |
        list1: cons * *----+    2           3
                           |    ^           ^
                           V    |           |
                           cons * *--->cons * *--->nil
                           ^
                           |
        list2: cons * *----+
                    |
                    V
                    9
    

However, if "list3" is "[1, 2, 5]", it can't share any of the "list1" or
"list2" structure: the last element is different from the other lists, which
requires the second-to-last element to use a different pointer, which makes it
different from the other lists (even though it uses the same element); this
requires a different pointer in the third-to-last element, and so on:

    
    
                                            5
                                            ^
                                            |
        list3: cons * *--->cons * *--->cons * *----+
                    |           |                  |
                    V           |                  |
                    1           |                  |
                    ^           |                  |
                    |           V                  |
        list1: cons * *----+    2           3      |
                           |    ^           ^      |
                           V    |           |      V
                           cons * *--->cons * *--->nil
                           ^
                           |
        list2: cons * *----+
                    |
                    V
                    9
    

The problem's not as bad in Haskell as it is for Lisp: Haskell doesn't allow
as much dynamic behaviour, so a lot more usage information is available during
compile time, and allows optimisation like "list fusion". Also, Haskell makes
it easy for programmers to define their own datatypes (like deques), and
allows datatypes to be abstract (either using typeclasses or "smart
constructors"). Being abstract makes pattern matching more difficult though
(requiring "views").

~~~
douche
Ah, I was thinking more in terms of C/C++/Java/C# style linked lists, which
tend to look something like this:

    
    
      class List<T> {
        ListNode<T> head;
        ListNode<T> end;
        int size;
      }
      class ListNode<T> {
        T data;
        ListNode<T> next;
        // possibly, if it is doubly-linked
        ListNode<T> prev;
      }
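With that wrapper, append becomes O(1); a runnable Python version of the same head/tail/size layout (a sketch, not a full list implementation):

```python
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def append(self, data):
        # O(1): the tail pointer removes the walk to the end.
        node = ListNode(data)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.size += 1

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```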

------
andy_ppp
I've been told conflicting things about "don't repeat yourself"; sometimes
you end up making something wildly complex just to avoid repetition. Since
learning functional programming, I'm starting to think that breaking things
down into small reusable pieces that can be combined in different ways is
better.

~~~
lostcolony
The problem with DRY is people will see "Oh, hey, I'm doing the same thing in
multiple places. DRY says I should factor that out!" without actually thinking
about whether it makes sense for there to be a dependency between those pieces
of code. Moving duplicated code into a shared function/method/etc. -creates
an implicit dependency- between all callers of that function/method/etc.
That's exactly what you want in some situations (and is in fact the benefit
of DRY: when you fix that code for one caller, it's fixed for the others),
but if you do it blindly you create a dependency between two unrelated bits
of code, and that's where complexity comes from, as you start adding
conditionals and things to try to make that shared code handle all cases.

I find considering whether the repeated code is innately related, or in the
same domain, or not, to be the best metric for determining whether I should
refactor it to not repeat. It's adhering to the principle of least
astonishment. Sometimes it should all stay separate. Sometimes a few pieces
should be refactored to share code, and others left to repeat. Sometimes one
set of the repeated code should share a function, and then the other set
should share a different function, -even if that different function has the
same implementation-, simply because the second set is so unrelated to the
first. Sometimes they all should share the same code, because they're all
related.

Breaking things into small, reusable pieces helps avoid this, because the
dependencies you create are on small, easily understood, easily replaced
snippets, which tend to have a very clear context/domain. It's easy to prove
that the abstracted function is correct, since it's so simple, and it's clear
what domain(s) it involves, since again, so simple, and thus the problem with
any given bit of functionality is almost assured to be the composition of
those functions, i.e., your specific use of it in this one instance. You can
still run into issues if you reuse some of those composed functions, however.

In general, the greater the complexity and domain specificity of the code
you're trying not to repeat, the greater the danger you're shooting yourself
in the foot.

FP is especially beneficial here, because the higher-order functions that
tend to be common are both pretty simple in what they do and pretty universal
in their domain.

~~~
andy_ppp
There was a great talk at the Elixir Meetup London by Anthony Pinchbeck about
organising functionality not around MVC but around the functional unit being
built.

So, for example, if you have something that deals with registering a website,
create a directory called register_website, and all models, functions and
controllers to do with that feature go in there. The talk suggested deleting
the controllers and models directories. He also said there was duplication,
but everything was more loosely coupled, and you could always group things
into deeper functional units later.

I'm not 100% brave enough to go for this structure quite yet but it certainly
makes coming to a project pretty clear as you literally have a list of all the
functions of the site where the code sits for each thing :-)

------
js8
I would add two things to this:

1\. Automated tests are not the only way to write code with fewer bugs. The
other two techniques, currently less in fashion but IMHO very important, are
abstraction and assertions.

2\. In my view, there are two kinds of automation - bad and good. Bad
automation checks if humans did some process correctly. Good automation
implements some process without human intervention. It's very easy to start
working on bad automation rather than the good.

~~~
quanticle
Automated tests aren't the only way to write code with fewer bugs, but they
are the easiest way to write such code. Automated tests are also the easiest
to have running constantly, giving you instant feedback when you've broken
something. The only downside is that there are some tests (like testing with
particular datasets or high-load performance testing) that can be onerous to
automate.

My personal rule of thumb is that checks for program _correctness_ should all
be automatic. Non-automated tests ought to be reserved for testing non-
functional requirements (e.g. making sure the service is performant under
load, reliability measures like failovers, etc.).

~~~
dozzie
> Automated tests [...] are the easiest way to write [code with fewer bugs].

Actually, automated tests are best at ensuring the code doesn't have
regressions. But writing bug-free code takes much more than just that; it
needs thought-out architecture and interfaces, and with those you often can
get decent code without automated tests, so no, it's not "the easiest way".

~~~
crdoconnor
They also serve as a form of code documentation (defining expected behavior).

When doing TDD, they often speed up development, owing to the ability to get
much quicker feedback on the code you just wrote.

Code without tests has a tendency to congeal owing to the massive uncertainty
surrounding changing any particular part.

~~~
dozzie
> They also serve as a form of code documentation (defining expected
> behavior).

Oh, far from it. Tests are not documentation, documentation is documentation.
Tests can be treated as a set of examples at best.

~~~
crdoconnor
Good well explained tests serve well as documentation. I've used them before
in lieu of specifications.

Bad tests not so much.

------
objectified
I must say that the more I read articles like this, the more I feel that it
would make sense to distinguish between different types of "engineers".

A lot of the proposed advice is applicable only in certain fields, and not so
much (or not at all) in others, whereas every field can go very deep (so it's
a waste of time gaining knowledge you'll never use there). While some of this
advice is fairly generic, other parts definitely are not. A front end (client
side) web developer won't benefit from having deep knowledge of data
structures or the kernel, but he will benefit from knowing about Big O
notation. Additionally, he will benefit a lot from knowing how browsers work,
and from having deep knowledge of HTTP. Someone working on ETL processes all
day will not benefit much from knowing about interpreters and compilers, but
will benefit from knowing a lot about UTF-8. He will also benefit a lot from
knowing a large number of application-layer (layer 7) protocols. A systems
programmer (defined as "someone who programs close to the hardware") won't
typically benefit much from knowing JavaScript or how browsers work, but will
benefit a lot from Big O notation, data structures and algorithms, and C. And
so on.

I'm seeing a future in which we will be able to more distinctly treat separate
specialisms within the field that we now call "software engineering" (or
countless variations on it).

~~~
ilyash
As with any other advice, one should use common sense to see how and which
parts of it apply.

> A front end (client side) web developer won't benefit from having deep
> knowledge of data structures

but basic knowledge is still a must

I consider the basic knowledge of the "must" sections really a must for
everyone. The depth might vary.

------
peterwwillis

      > Automated healing / rollback 
    

Jesus christ on a cracker, please use this sparingly. It's one thing to have
buggy software. It's another to change versions of software _right underneath
your users_. This is like polishing a dance floor while the DJ is still
spinning. It doesn't matter if you're reverting to an old good version;
unannounced, unplanned & unexpected changes lead to disaster. Known bugs are
better than unknown changes.

    
    
      > Don’t duplicate code
    

Yes, this is a really bad idea, but forking is the cheapest possible way to
make changes that break as few things as possible. Yes, it's a nightmare in
the long run. Ask your boss if he really cares about the long run.

Everything else was spot on.

------
gremy0
I don't really get the Ease vs Simple section; what point is he trying to
make?

This is simple:

    
    
      i  = 0x5f3759df - ( i >> 1 );
    

Where this is just easy:

    
    
      famework.authService.login('user','password')
    
    

Good article though.

~~~
scriptdevil
If you haven't already done so, listen to this talk by Rich Hickey (the
creator of Clojure). This should clear it up for you.
[https://www.infoq.com/presentations/Simple-Made-
Easy](https://www.infoq.com/presentations/Simple-Made-Easy)

~~~
ilyash
Thanks! Added link from the post to the lecture.

------
BugsBunnySan
Nice post! I hope lots of people read and understand it; the world will be a
better place for it :)

One thing that would be good to add is: If you implement a way to create/add
something, implementing a way to delete/remove that something isn't optional.

As Lovecraft put it: 'Do not call up that which you cannot put down.'

This will result in a system that keeps proper track of stuff.
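One way to bake that symmetry in is to pair creation and removal in a single construct; a Python sketch using a context manager (names here are invented for illustration):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def scratch_file(data):
    # Anything this creates, it also knows how to remove.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        yield path
    finally:
        os.remove(path)   # the "put down" matching the "call up"
```

Inside `with scratch_file("hello") as path:` the file exists; afterwards it is guaranteed gone, even if the block raised an exception.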

------
ilyash
There are many good points in the comments here. Thank you! I have added a
link from the blog post to this discussion.

------
any1
It's usually managers who buy the hype. They're the ones who need to be told
to be sceptical.

Nice post, by the way.

------
karlgrz
Good list.

I would add "Software that does not ship is essentially useless."

To clarify, I mean that you can polish and try to perfect as much as you want,
but at some point you have to get it into the hands of users to be actually
useful.

------
cmcginty
I'm not convinced that servers must be set to the UTC timezone. Maybe just
write your server code to read/store UTC time, which is easily calculated
from any timezone.

~~~
douche
Time should be converted to UTC at the earliest opportunity, and kept that way
as long as possible.

Also, always, always, always, serialized in ISO-8601 format if it is getting
turned into text.
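In Python, for instance, that policy is only a few lines (an illustrative sketch with a made-up +02:00 input):

```python
from datetime import datetime, timedelta, timezone

# A timestamp that arrives with a local offset (+02:00 here)...
local = datetime(2016, 5, 19, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# ...is converted to UTC at the earliest opportunity...
utc = local.astimezone(timezone.utc)

# ...and serialized as ISO-8601 only when it has to become text.
text = utc.isoformat()   # '2016-05-19T12:30:00+00:00'
```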

------
nekopa
Can we somehow change UTF-8 to WTF-8?

Over the last 12 years or so I have constantly been bitten in the ass by
UTF-8. It first started when I moved from the US to Europe and decided to help
people make websites in their native language.

More recently I was developing a piece of python code to help me with a thing.
Running into UTF-8 problems made me decide to do a deep dive into what it is
and how it works. After a week of diving down a rabbit hole I am still none
the wiser.

Blog post after blog post, video upon video, and it still doesn't click for
me. Yes, I solved my problems, but in a black box type of way.

Can anyone recommend a good book or long source to help me with this?

------
eternalban
Missing advice: Learn to read specs.

