
How to be an Excellent Programmer for Many Years

(Excellent==Successful. Money & fame are more difficult to control.)

1. Choose a small subset of available technology, learn it intimately, and embrace it. Then evolve that subset.

2. Understand the pros and cons of various data structures, both in memory and on disk.

3. Understand the pros and cons of various algorithms.

4. Understand your domain. Get away from your computer and do what your users do.

5. Be ready, willing, & able to deep dive multiple levels at any time. You must know what's going on under the hood. There is a strong correlation between "number of levels of deepness understood" and "programming prowess".

6. Use your imagination. Always be asking, "Is there a better way?" Think outside the quadrilateral. The best solution may be one that's never been taken.

7. Good programmer: I optimize code. Better programmer: I structure data. Best programmer: What's the difference?

8. Structure your data properly. Any shortcomings there will cause endless technical debt in your code.

9. Name things properly. Use "Verb-Adjective-Noun" for routines and functions. Variables should be long enough, short enough, and meaningful. If another programmer cannot understand your code, you haven't made it clear enough. In most cases, coding for the next programmer is more important than coding for the environment.

10. Decouple analysis from programming. They are not the same thing, require different personal resources, and should be done at different times and places. If you do both at the same time, you do neither well. (I like to conduct analysis without technology at the end of the day and start the next morning programming.)

11. Never use early exits. Never deploy the same code twice. Never name a variable a subset of another variable. You may not understand these rules and you may even want to debate them. But once you start doing them, it will force you to properly structure your code. These things are all crutches whose use causes junior programmers to remain junior.

12. Learn how to benchmark. Amazing what else you'll learn.

13. Learn the difference between a detail (doesn't really make that much difference) and an issue (can end the world). Focus only on issues.

14. Engage your user/customer/managers. Help them identify their "what". Their "how" is not nearly as important.

15. Write a framework, whether you ever plan to use it or not. You'll learn things you'll never learn any other way.

16. Teach others what you know, either in person or in writing. You'll accidentally end up teaching yourself, too.

17. Always tell your customer/user "yes", even if you're not sure. 90% of the time, you'll find a way to do it. 10% of the time, you'll go back and apologize. Small price to pay for major personal growth.

18. Find someone else's code that does amazing things but is unintelligible. Refactor it. Then throw it away and promise yourself to never make the same mistakes they made. (You'll find plenty.)

19. Data always > theory or opinions. Learn the data by building stuff.

20. At some point, run your own business (service or product). You will learn things about programming that you'll never learn as an employee.

21. If you don't love your job, find another one.

"Never use early exits" is tantamount to saying "never study programming languages deeply enough to understand control-flow graphs". That's not simply a good rule of thumb, it's a terrifying omission. I would suggest that early exits are simply a matter of taste and never using them is the crutch that keeps junior developers junior.

Data are on my side here: a quick perusal of Quake 3, the Linux kernel, the Clojure runtime, and LevelDB shows that Carmack, Torvalds, Hickey, and Dean all use them where appropriate.

Specifically try to judge whether an "early exit" leads to simpler, more maintainable code in this particular case. Optimize for lower cost, not for just following a rule.

Agreed. A lot of the time, early exits just make sense. Instead of holding a variable called "succeeded" and doing a bunch of if elses, why not just continue on every failure? You're only testing for half a dozen side-cases and just exit out early when you encounter any one. No need to wrap all of your actual work in 5 levels of if elses.

That one caught me up too. Maybe if you have full control of the input data and performance is not a concern, then they should be avoided, but "never" is a pretty small set.

Although 'never' is a very strong word, I have found that in real life most code that uses early exits simply breaks SRP (come to think of it, that is the case 99% of the time).

I know people will say "but I will only write this little return statement here", and before you know it you will "add this little assignment before it returns"...

Never might be strong, but it makes a very good point.


> 17. Always tell your customer/user "yes", even if you're not sure.

I have found that you get higher marks for saying NO on the right occasions than for being known as the one who always says YES.

Learning to say no is the biggest challenge and virtue of the best developers out there. It's not a "no I can't do it" it's more of a "no I can't do it in that time" or "no I don't think it's the right way to go, here is how I think we should do it". Saying NO can make a difference between a successful project and a nightmare.

There's clearly a middle position to be had here. By saying NO if you're not completely confident you can do something, you miss out on the best learning opportunities. That said, if you suspect something cannot work, or isn't the right way of doing something for another possibly non-technical reason, being prepared to say NO is critical for delivering.

So possibly: If you always say NO, you never learn anything. If you always say YES, you never deliver anything.

It's important to also put either answer in the context of how it will affect the Time, Cost and Quality Triangle of program management. The customer loves to hear 'yes' but sometimes doesn't realize that 'yes' typically means more time and/or money.

That's a good way of putting it.

"Never use early exits" is exceedingly bad advice, in my experience. What has led up to your claim that early exits are bad? As I've gained experience, I find myself writing code like this more and more:

    if (UnexpectedOrUnhandledCondition(x)) {
      return false;
    }

    if (OtherError(y)) {
      return false;
    }
and so on. Serial code like that is trivially easy to understand when reading; more importantly, six months later, it's far more explicit and easier to comprehend than the comparable single-exit version.

The alternative to early exits is nesting, and nesting is the readability killer. My experience has apparently been just the opposite of yours: that avoiding early exits is a sign of a junior programmer, and returning early whenever possible is a sign of an experienced programmer who understands code maintenance.
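To make the comparison concrete, here is a minimal C sketch of the same validation written both ways. The function names and the rule set (non-NULL, non-empty, at most 8 characters) are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule set: a name is valid when it is non-NULL,
   non-empty, and at most 8 characters long. */

/* Single-exit version: every check adds one nesting level. */
bool is_valid_name_nested(const char *name)
{
    bool ok = false;
    if (name != NULL) {
        if (name[0] != '\0') {
            if (strlen(name) <= 8) {
                ok = true;
            }
        }
    }
    return ok;
}

/* Guard-clause version: every failed check returns at once,
   so the function body stays flat. */
bool is_valid_name_guarded(const char *name)
{
    if (name == NULL) {
        return false;
    }
    if (name[0] == '\0') {
        return false;
    }
    if (strlen(name) > 8) {
        return false;
    }
    return true;
}
```

Both functions return the same result for every input; only the shape of the control flow differs.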

Agreed. Same for using goto in C for error handling (e.g. http://www.xml.com/ldd/chapter/book/ch02.html#buierr). I found this used often in the Linux kernel and I absolutely love how clear it is.
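A minimal sketch of that goto-based error-handling idiom, not taken from the kernel itself. `struct buffers`, `buffers_init`, and `buffers_free` are hypothetical names; the point is that each failure jumps to a label that undoes only what has already been acquired.

```c
#include <stdlib.h>

/* Hypothetical pair of buffers that must be allocated together. */
struct buffers {
    char *input;
    char *output;
};

/* Returns 0 on success, -1 on failure.
   On failure nothing stays allocated. */
int buffers_init(struct buffers *b, size_t in_size, size_t out_size)
{
    b->input = NULL;
    b->output = NULL;

    if (in_size == 0 || out_size == 0)
        goto err;

    b->input = malloc(in_size);
    if (b->input == NULL)
        goto err;

    b->output = malloc(out_size);
    if (b->output == NULL)
        goto err_free_input;

    return 0;

err_free_input:
    /* Undo the first allocation because the second one failed. */
    free(b->input);
    b->input = NULL;
err:
    return -1;
}

/* Releases both buffers and resets the pointers. */
void buffers_free(struct buffers *b)
{
    free(b->output);
    free(b->input);
    b->output = NULL;
    b->input = NULL;
}
```

The labels are ordered so that later failures fall through the cleanup of earlier steps, which keeps the unwinding code in one place.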

I agree. The alternative, as you state, is nested conditionals. Yuck! An alternative construct I have seen and used is to wrap code in a do {} while(0) block and break to a single failure return or complete the block to a single success return.
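A small sketch of this construct, assuming the `do { ... } while (0)` loop is what the comment has in mind. `parse_percent` and its rules (one to three digits, value 0 to 100) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser: accepts 1 to 3 decimal digits with a value
   of 0..100 and stores the result in *out. */
bool parse_percent(const char *text, int *out)
{
    do {
        if (text == NULL || out == NULL || text[0] == '\0')
            break;

        int value = 0;
        size_t i = 0;
        while (i < 3 && text[i] >= '0' && text[i] <= '9') {
            value = value * 10 + (text[i] - '0');
            i++;
        }

        if (text[i] != '\0')
            break;              /* non-digit found, or more than 3 digits */
        if (value > 100)
            break;

        *out = value;
        return true;            /* the single success return */
    } while (0);

    return false;               /* the single failure return */
}
```

Note that `break` only leaves the innermost loop, so this pattern stops working as soon as a check sits inside a nested loop or a `switch`.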

Always ignore advice that begins with the word "always". The use of "never" is never a good sign either.

Wahh, I'm caught in a recursion. halp.

Case-distinction-man to the rescue

Just what I needed, a lexer that cares :)

I never say always

know when to break the rules

Only Sith deal in absolutes!

> 11. Never use early exits.

As a general piece of advice, this is flat out wrong. Or, to put it more correctly, this is only valid advice when using certain languages and only under certain conditions. Just because it is sound advice in languages with C/C++ derived syntax doesn't mean it should be presented as good advice in general.


Also, for example, when you have lots of "if" clauses, those languages force us to use mutable state (and a variable initialized as null) just to avoid early exits. But we can avoid both by treating the "if" as an expression, as in Ruby, Lisp, etc, or with the ternary conditional operator...
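As a rough illustration in C, where only the conditional operator is available as an expression: the two hypothetical functions below compute the same label, first with a variable initialized to NULL and assigned later, then as a single expression with no mutable state.

```c
#include <stddef.h>

/* Statement form: the result starts as NULL and is assigned in a branch. */
const char *size_label_mutable(int bytes)
{
    const char *label = NULL;
    if (bytes < 1024) {
        label = "small";
    } else if (bytes < 1024 * 1024) {
        label = "medium";
    } else {
        label = "large";
    }
    return label;
}

/* Expression form: the conditional operator yields the value directly,
   so no placeholder value and no reassignment are needed. */
const char *size_label_expression(int bytes)
{
    return bytes < 1024        ? "small"
         : bytes < 1024 * 1024 ? "medium"
                               : "large";
}
```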

Plus, IMO, there are some situations where early exits are a good idea even in languages derived from C, such as with guard clauses/preconditions.

There's a nice discussion about that here with some points I made: http://c2.com/cgi/wiki?SingleFunctionExitPoint

(EDIT: I edited a lot but still couldn't make it clear that I agree with you, and I ended up making points about completely different things. Sorry about that, my English skills are lacking sometimes.)

Why do you think it is bad for C? Take for example the Linux kernel where you find early exits basically everywhere. (See also my other response here: http://news.ycombinator.com/item?id=4626763)

Even less bad in C++, where destructors can automatically handle any cleanup (exactly that is common practice in C++: making the cleanup implicit). The same goes for many higher-level languages with GC and/or automatic reference counting.

In rare cases, it might be more complicated, though.

But you seem to have some languages in mind where it really doesn't matter? What languages?

Well, most early returns are avoided mostly because it makes it harder to understand the code. But when you use guard clauses, like on your Linux Kernel example, you actually improve the code, because you reduce cyclomatic complexity (everything is now a linear path: a -> b -> c), you can reduce nesting, you can avoid unnecessary mutable state... so it's a win. The problem with early returns is when you use them in non-obvious places.

I believe that the majority of programmers agrees that using early returns for guard clauses is an okay exception.

Yes, I think really "avoid early returns" is an overextended special case of "avoid the nonobvious".

I understand why not to use early exits. But what's wrong with early returns?

Huh? You _want_ early exits in C code; how else will the compiler optimize?

It will find a way...

Please elaborate.

> 13. Learn the difference between a detail (doesn't really make that much difference) and an issue (can end the world). Focus only on issues.

Good advice. So why do you give the same amount of attention to variable naming and early exits as to much higher level issues? I don't think the difference between Carmack and a random developer has much to do with adherence to coding standards.

Variable naming and early exits are not details. They are fundamental issues, two of the most common causes of shitty code. Most programmers don't understand how important they are and relegate them to the pile of "coding standards" or just want to debate their theoretical pros and cons. Just look at what's happened to this thread.

I think you should heed your own (quite excellent) advice. In this case the real issue is how to best structure code for readability, maintainability, and fewer errors. Things like naming conventions and module structure are the details by which we accomplish these higher goals. Don't get too hung up on doing things one way or the other -- focus on what you're trying to achieve.

Could you please elaborate on No. 11?

"Never use early exits". Could you please elaborate what you mean by that?

PS. I'm not trying to unintentionally start a flame war. Just trying to understand.

Excellent question that is difficult to answer. I'll give you a short response here and then write a blog post with example code when I have time. Put your email in your profile and I'll make sure to let you know when that's ready.

A little background: I have gotten many calls when legacy code has a bug or needs a critical enhancement and no one in-house is willing or able to figure it out. I'm no smarter than anyone else, but because I'm stupid and fearless, I often do something that few others attempt: I rewrite or refactor the code first, then work on the bug/enhancement. And the first thing I always do is look for early exits. This has always given me the most bang for my buck in making unintelligible code understandable. The only entry to any function or subroutine should be on the first line and the only exit should be on the last line.

This is never about what runs fastest or produces fewer lines of code. It's strictly about making life easier for the next programmer.

Early exits can make things really easy (I'll just get out now.) 20 lines of clean code. No problem. Until years later, when that function is 300 lines long, and the next programmer can't figure out what you're doing. Much of the maintenance had been done in emergency mode, each programmer just trying to get in and out as fast as they could.

Early exits make it much easier for bad things to evolve:

  - 200 line multiple nested if statements
  - 200 line multiple nested recursions
  - unidentified but critical "modes" (add/change) (found/notFound)...
Removing early exits forces you to understand what really should be happening and enables you to structure code into smaller, more intelligible pieces.

Short, hand-wavy response, but I hope that helps clarify some. Stay tuned for a better answer...

I'm not sure if I would agree on this. Take for example the Linux kernel. You have basically early exits just everywhere. Take any random file from here: http://git.kernel.org/?p=linux/kernel/git/stable/linux-stabl...

It often goes like this:

  check if we have already done this or so -> exit
  check if this is possible to do. if not -> exit
  do some first thing. on error -> exit
  do some other thing. on error -> exit

This coding style is also referred to as "guard clause", embraced by e.g. Martin Fowler (http://martinfowler.com/refactoring/catalog/replaceNestedCon...) and Jeff Atwood (http://www.codinghorror.com/blog/2006/01/flattening-arrow-co...) (and by me ;-))

He also said in his statement "This is never about what runs fastest...". Kernel code needs to run fast and take advantage of shortcuts.

IME, the jobs most programmers are doing don't need to try to accomplish maximum speed or need to wring a few bytes out of RAM. Certainly we don't want to be wasteful, but long-term maintainability is more important, again IME, than absolute speed or minimising memory footprint by a few (k) bytes.

Back in the bad old days when I was writing programs to run on mainframes, yeah, we did need to fight for every byte. A $5 million machine back then had less RAM and less raw CPU power than a tablet does today. We don't live in that world now.

> He also said in his statement "This is never about what runs fastest...".

This style has nothing to do with running fast, it has to do with lowering the cognitive load of the rest of the function. A branch means you have two states to keep in mind (the state where the branch is taken, and the state where the branch is not taken). Without early exit, you have to keep both states in mind until the end of the function just in case.

With guard clauses, you can discard one of the states (the one which matched the clause) entirely.

Computers can do that repeatedly and reliably with even (hundreds of) thousands of discarded states. People reading code and trying to understand what's actually in play at line umptyfratz of function iGottaGetThisFixedNow(), not so much.

In the end, I'm not disagreeing with early exits per se, just pointing out that over time they can make a function more difficult to understand, because assumptions about state have to adjust as a maintainer goes through the code. Those assumptions may have been crystal-clear to the original writer, but how often is the original writer the only maintainer?

> And the first thing I always do is look for early exits. This has always given me the most bang for my buck in making unintelligible code understandable.

Ouch. It will be kind of amusing if we ever work on the same code - I frequently start a bug fix by refactoring to introduce as many early exits as possible. I find guard clauses so much easier to understand than nested conditionals that sometimes refactoring it like this is the only way I can understand the code at all. I would love to see a blog post where you compare different styles.

OTOH, if you're talking about things other than guard clauses then I think we might have a much more similar viewpoint.

> when that function is 300 lines long

This is what I would focus on avoiding instead.

What if you fail on line 20? Carrying on may cause cascading errors, what alternatives are there to return an error code directly apart from storing a return value in a variable, use a goto to the end and return once? (assuming this is C)
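For reference, the status-variable alternative mentioned here might look like the sketch below. The task, the function name `average_score`, and the `SCORE_*` codes are hypothetical; every step after a failure is skipped by checking `rc`, and the function returns exactly once.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch. */
enum {
    SCORE_OK = 0,
    SCORE_ERR_NULL = 1,
    SCORE_ERR_EMPTY = 2,
    SCORE_ERR_RANGE = 3
};

/* Averages scores that must each lie in 0..100.
   Single exit: failures are recorded in rc instead of returning early. */
int average_score(const int *scores, size_t count, int *average)
{
    int rc = SCORE_OK;
    long sum = 0;

    if (scores == NULL || average == NULL) {
        rc = SCORE_ERR_NULL;
    }
    if (rc == SCORE_OK && count == 0) {
        rc = SCORE_ERR_EMPTY;
    }
    if (rc == SCORE_OK) {
        /* The loop condition also tests rc so that it stops on the first bad value. */
        for (size_t i = 0; i < count && rc == SCORE_OK; i++) {
            if (scores[i] < 0 || scores[i] > 100) {
                rc = SCORE_ERR_RANGE;
            } else {
                sum += scores[i];
            }
        }
    }
    if (rc == SCORE_OK) {
        *average = (int)(sum / (long)count);
    }
    return rc; /* the only exit */
}
```

The cost of the single exit is visible here: every later step has to re-test `rc`, which is the repeated state-tracking that the guard-clause style avoids.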

But, a computer scientist once argued, multiple returns are structured: they return to the same point. Knuth wrote an article on structured programming with GOTOs, and pointed out that this can save memory.

I disagree because I find that it makes my code more elegant. A lot of smart developers I respect have also embraced early exits. But then again I haven't yet worked with large legacy codebases.

I'm still learning so I'd be grateful for a heads up if you make that blog post. I've put my email in my hn profile.

Also, early exits are exactly the same thing as GOTOs. An early exit is equivalent to:

  goto end;
It should help people figure out why it's bad - in 99.99% of the cases.

I think you misunderstand why gotos are frowned upon. Gotos are bad because they allow you to jump to anywhere in the code, regardless of structure, which undermines all assumptions you can make about control flow.

Function calls and return are like goto that obeys structure, and therefore don't have the same problems.

> Gotos are bad because they allow you to jump to anywhere in the code, regardless of structure, which undermines all assumptions you can make about control flow.

Knowing where you are jumping doesn't help you to make assumptions about the control flow.

Gotos are bad because they allow you to jump. The jump in itself is the problem because it breaks the instruction flow arbitrarily, without explicitly expressing the boolean condition for it. Early exits are of the same kind: they don't explicitly express the boolean condition of the jump. We know where we are jumping, not why. Over time, the boolean equation that determines the instruction flow becomes unmaintainable. And then you end up not understanding where your flow is going, not because you don't know where a jump leads, but because you have lost track of why.

Most gotos, early returns, breaks and continues (C speaking) are considered to be bad habits for this reason.

The only goal of return is to return values to the function's caller, not to jump.

Function calls jump back to whatever called them, so in terms of instruction flow it's as if there has been no jump at all: you can basically continue reading the code, assuming some code has been executed behind the function name.

While loops are exactly the same thing as GOTOs. A while loop is equivalent to:

    if(!p) goto end;
    goto start;
It should help people figure out why it's bad - in 99.99% of the cases ;)
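Spelled out in full, the equivalence being joked about looks roughly like this. Both hypothetical functions sum the integers 1..n; the second replaces the `while` with a conditional jump to `end` and an unconditional jump back to `start`.

```c
/* Sum 1..n with an ordinary while loop. */
int sum_with_while(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop written as a conditional jump plus a back edge. */
int sum_with_goto(int n)
{
    int total = 0;
    int i = 1;
start:
    if (!(i <= n)) goto end;   /* leave the loop when the condition fails */
    total += i;
    i++;
    goto start;                /* jump back to re-test the condition */
end:
    return total;
}
```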

A loop is based on a conditional jump (a.k.a. a flow control statement in imperative languages), which makes all the difference. The problem with plain jumps (what is usually meant by "a jump" is a purely arbitrary one) is that they are arbitrary: http://news.ycombinator.com/item?id=4627730

How do you know what happens at an end-brace? Have to go look at the matching start-brace to see if it is a while or for or if. Madness! Can I assume the code above even executed once? No! What if I check the guard clause first? It has variables! Oh heavens, control flow depends on arbitrary RUNTIME data!?

I believe early exit is strictly better than `goto end`. There's no language enforcement requiring the "end" label to be at the end of the function. If there was, I wouldn't mind the use of `goto end`. I don't think your argument is persuasive to anyone who doesn't already agree with you.

Indeed. I have found "guard clauses" with return/error at top of function to be clearer and less error prone.

I believe he referred to the rule to use a single return statement at the end of any single function definition.

I'm glad you asked, I have the same question exactly.

I think the biggest problem is that early exits make you forget you have to clean things up before returning (a bigger problem in C and sometimes Java, than it is in modern dynamic languages). There are certainly other reasons, but I'll leave them to others that are more experienced!

> 19. Data always > theory or opinions. Learn the data by building stuff.

This is not true.

It depends on how much data you can get and how representative it is.

A big mistake is to assume that an unrepresentative sample of data is representative, and to draw conclusions based on that assumption.

If you do not have a representative set of data (and you can't always get one) then data can be less-than "theory or opinions".

If we are in a situation where we are unable to collect the data in the first place and you propose that theories or opinions are better, from what place did these putatively superior data or opinions come from? Because they clearly aren't coming from data.

Leveraging the experience of many? I want my website to be fast. I don't have a deep understanding of web optimization, and I don't yet have enough data or time to build my own solution. I search around the web and go to http://developer.yahoo.com/performance/rules.html. I trust them (Yahoo) because they are probably more competent and have MORE DATA than me. I implement some of the ideas and get some results (and, if I'm smart, get a clue about why they are good ideas and how they affect performance).

Acting on data, IMHO, only makes sense if you have a LOT of data. If not, acting on the expertise of others looks best to me.

> If we are in a situation where we are unable to collect the data in the first place

That's mischaracterising my point. I didn't say anything about having no data. I talked about when the data you're able to get is unrepresentative.

> from what place did these putatively superior data or opinions come from? Because they clearly aren't coming from data.

I think that's just playing word games with "data". We're talking about data concerning the problem at hand. Theories may come from data, but they don't come from data you obtain about the problem at hand. And knowledge can come from experience, which again is a form of data, but clearly not the sort the person was talking about. You can reason based on knowledge and principles.

Sure, it'd be better if you had good data, but when you can't have that sometimes the best you can do is to reason based on knowledge/experience/theoretical ideas.

And to add to that: one can easily misinterpret (or even just mismeasure) data, especially if it is data about a complex system.

Great list Ed.

About #17, one of my clients once told me he hired and kept me because I never outright, or in the beginning said no, and instead said 'let me think about it', or 'anything is possible relative to time and money'.

As guides to implementing technology, I think this is a fundamental skill.

> 18. Find someone else's code that does amazing things but is unintelligible. Refactor it. Then throw it away

Throw away the unintelligible code, or throw away the refactor?

Thoughtful, elaborate comments like this are why I read HN. Thanks, ed.

And never let someone tell you that you're 'junior' and they know better. Rules were made to be broken.

When you advise against naming a variable a subset of another variable, are you talking about confusing names, or practices like breaking off a chunk of array to process instead of finding a smarter way to step through it like mapping or in-place logic?

No, just variable naming.

As we speak, I'm maintaining some old code with the following variable names:

  - CommandRec
  - Command
  - Comm
  - Com
along with the fact that some of these are reused for different purposes and all are global.

I'm having a hell of a time finding all instances to rename them properly.

I understand that some programmer interfaces are better than others at handling this, but that belies my main point:

"Just name shit what it really is."

Awful names for sure. In Java with IntelliJ IDEA, for example, renaming variables is a breeze; with a text editor it is much harder.
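A small sketch of why a plain text search struggles with names that are subsets of each other. The helper `count_lines_matching` is hypothetical; it counts the lines containing a name as a raw substring, which is roughly what a naive editor search does.

```c
#include <stddef.h>
#include <string.h>

/* Counts the lines that contain `name` as a plain substring,
   the way a naive text search would. */
size_t count_lines_matching(const char *const *lines, size_t line_count,
                            const char *name)
{
    size_t hits = 0;
    for (size_t i = 0; i < line_count; i++) {
        if (strstr(lines[i], name) != NULL) {
            hits++;
        }
    }
    return hits;
}
```

Given one line of code per variable (one each for CommandRec, Command, Comm, and Com), a search for "Com" matches all four lines and a search for "Command" matches two, so every hit has to be checked by hand before renaming.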

If only the programmer had the sense to name them in a style-compliant way






(Couldn't resist that last one)

I bet you could turn that into a profitable book. HyperInk?

Great comment. But aren't 7 and 8 kinda at odds with each other? I understand that #7 is a bit of hyperbole, but still...

I think that 7 and 8 are actually saying the same thing. #7 is saying that structuring your data correctly is optimization, and #8 is directly telling you to structure your data well.

The best programmer in #8 already did #7. So he never had to write a lot of code to handle poorly structured data. That's why he says, "What's the difference?"

Oh, now I get it. When reading #7 I got the impression that you were dismissing so-called micro-optimization, something along the lines of "ship early, fix things later. Don't get stuck over-engineering and delaying writing code".

I like the last point..

well worth the read. thanks!
