Hacker News
What should every programmer know about programming? (stackexchange.com)
84 points by spolsky on Sept 15, 2010 | 84 comments

I don't have an account there, but a few things immediately spring to mind.

- Caching and cache invalidation are really hard - if you have to go there, make sure you can disable the cache at any time

- Using regular expressions for anything but processing lines of text means you're probably doing it wrong

- Few programmers can handle concurrency, so provide APIs with lots of run-time checks

- Have coding guidelines and enforce them in code review

- Boring code is good code

- Lack of automated tests makes every commit a crap-shoot

- Use consistent logging (with levels); if this is a major performance hit, scrub the code out of the production build

- If there is no expert on the team, become the expert. If there is, learn from the expert and become one as well

- Always take responsibility for your code

- Sometimes ugly hacks are necessary. Deploy them, but make it a priority to factor them out at the start of the next cycle (don't leave this crap around for 'someone else later')
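To make the first point concrete: a cache that can be switched off at any time is easy to build in from the start and painful to retrofit. A minimal Python sketch, assuming a simple in-process memo; `switchable_cache` and `lookup_price` are hypothetical names for illustration, not any library's API:

```python
import functools

def switchable_cache(func):
    """Memoize func, with a switch to bypass the cache and a hook to clear it."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if not wrapper.enabled:
            return func(*args)         # cache disabled: always recompute
        if args not in cache:
            cache[args] = func(*args)  # miss: compute and remember
        return cache[args]

    wrapper.enabled = True             # flip to False to turn caching off
    wrapper.cache_clear = cache.clear  # explicit invalidation hook
    return wrapper


calls = []

@switchable_cache
def lookup_price(item):
    calls.append(item)                 # record each real computation
    return len(item) * 10              # stand-in for an expensive query
```

Setting `lookup_price.enabled = False` makes every call hit the real computation, which is exactly what you want when you suspect the cache is serving stale data.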

Oh man, I could go on like this for hours.

You don't need an account to post. Pick one of those and write it up!

I wonder why he thought he needed one.

Because it's a reasonable default assumption, when you're looking at an unfamiliar discussion website?

Hmmm... maybe the opposite should be pointed out.

> Using regular expressions for anything but processing lines of text means you're probably doing it wrong

And since half the people noting this usually just handwave about the jwz quote*: regular expressions have very definite limitations, which is why complex parsing is usually done with a second layer on top of REs. Regular expressions are used for tokenizing (AKA "lexing": breaking a stream of characters into individual tagged tokens - this is an operator, that's a floating point number, etc.), and then a parser is built from a grammar over those tokens.

If you aren't aware of the limitations of REs, you can just keep adding layers and layers and eventually end up with madness like this RE to recognize RFC822-valid e-mail addresses (http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html).

* "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -jwz. Funny for people who already know, but not very enlightening otherwise.
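A minimal sketch of that two-layer approach in Python: a regex does the tokenizing, and a hand-written recursive-descent parser handles the grammar. The grammar (toy arithmetic expressions) is an assumption chosen for illustration. Arbitrarily deep nested parentheses are exactly the part a classic regular expression cannot match on its own, and the parser handles them by recursion.

```python
import re

# Layer 1: a regex lexer. Each named group tags one kind of token.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+(?:\.\d+)?)|(?P<OP>[-+*/()]))")

def tokenize(text):
    """Turn '2*(3+4)' into [('NUM', '2'), ('OP', '*'), ('OP', '('), ...]."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

# Layer 2: a recursive-descent parser for the grammar
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUM | '(' expr ')' | '-' factor
class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][1] if self.i < len(self.tokens) else None

    def take(self):
        if self.i >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return self.tokens[self.i - 1]

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, text = self.take()
        if kind == "NUM":
            return float(text)
        if text == "(":
            value = self.expr()          # recursion handles nesting
            if self.take()[1] != ")":
                raise SyntaxError("expected ')'")
            return value
        if text == "-":
            return -self.factor()
        raise SyntaxError(f"unexpected token {text!r}")

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

For example, `evaluate("(1 + 2) * 3")` returns `9.0`. A parser generator would produce the second layer from the grammar for you; the split of responsibilities stays the same.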

> Using regular expressions for anything but processing lines of text means you're probably doing it wrong

Depends. The theory of regular expressions works for all semirings (http://sebfisch.github.com/haskell-regexp/regexp-play.pdf).

On the other hand, if you are using backreferences or something crazy like that, you have left regular expressions.
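To make the semiring point concrete, here is a deliberately naive Python sketch of the idea (exponential time, and not the efficient matcher the linked paper builds toward): one recursive definition, where "or" and "and" are replaced by a semiring's addition and multiplication. Over the Boolean semiring it answers "does it match?"; over the counting semiring it answers "in how many ways does it parse?". The tuple encoding of the syntax tree is an assumption made for this sketch. Note that there is no node type for a backreference: that is not part of this algebra.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]  # combines alternatives
    mul: Callable[[Any, Any], Any]  # combines pieces in sequence

BOOL = Semiring(False, True, lambda x, y: x or y, lambda x, y: x and y)
COUNT = Semiring(0, 1, lambda x, y: x + y, lambda x, y: x * y)

# Regex syntax tree (assumed encoding): ("eps",), ("sym", c), ("alt", p, q),
# ("seq", p, q), ("rep", r).

def splits(u):
    """Every way to cut u into (prefix, suffix), empty pieces allowed."""
    return [(u[:i], u[i:]) for i in range(len(u) + 1)]

def parts(u):
    """Every way to cut u into a list of non-empty pieces ([[]] for u == '')."""
    if not u:
        return [[]]
    return [[u[:i]] + rest for i in range(1, len(u) + 1) for rest in parts(u[i:])]

def weight(s, r, u):
    """Semiring value of regex r on string u."""
    tag = r[0]
    if tag == "eps":
        return s.one if u == "" else s.zero
    if tag == "sym":
        return s.one if u == r[1] else s.zero
    if tag == "alt":
        return s.add(weight(s, r[1], u), weight(s, r[2], u))
    if tag == "seq":
        total = s.zero
        for u1, u2 in splits(u):
            total = s.add(total, s.mul(weight(s, r[1], u1), weight(s, r[2], u2)))
        return total
    if tag == "rep":
        total = s.zero
        for pieces in parts(u):
            prod = s.one
            for piece in pieces:
                prod = s.mul(prod, weight(s, r[1], piece))
            total = s.add(total, prod)
        return total
    raise ValueError(f"unknown regex node {tag!r}")
```

With `a = ("sym", "a")`, the regex `a*a*` on `"aa"` gives `True` over `BOOL` and `3` over `COUNT` (the string can be split between the two stars in three ways).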

> Lack of automated tests makes every commit a crap-shoot

The last thing I want to do is start another broad TDD flamefest, but I honestly believe that if a programmer hasn't even tried a test automation framework (xUnit or similar) then they are being professionally negligent. Would you want to work with a surgeon who was too set in his ways to sterilize instruments?

The last time I was doing developer candidate interviews, I would start the phone screen with "what test automation frameworks do you enjoy using and why". If they couldn't answer that, I did my best to make it a very short phone call.
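For anyone who hasn't tried one: Python's standard `unittest` module is an xUnit-style framework, and a first test is only a few lines. A minimal sketch; `slugify` is a hypothetical function under test, not from the thread:

```python
import unittest

def slugify(title):
    """Hypothetical function under test: lower-case words joined by dashes."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_joins_words_with_dashes(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Many   Spaces "), "many-spaces")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")
```

Saved as a file, it runs with `python -m unittest <file>.py`; wiring that command into every commit is what stops each commit from being a crap-shoot.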

You are misusing the term, and the conflation of different definitions of TDD makes up half of every 'TDD flamefest'.


TDD specifically calls not only for unit tests, but for writing them first, i.e., the test _drives_ the development. You write a test that fails, then write code to pass the test.

One can be anti-TDD but still believe in unit tests, full coverage, etc.

Indeed. I've never seen a TDD debate that has focused specifically on whether tests upfront are sufficient for designing good APIs, etc., though (ironically, the most important part of TDD!). Usually, one side is aghast that the other side doesn't believe in testing (when they usually do, just not to an eXtreme), and the other isn't convinced that letting tests do all the design work makes sense. They're usually talking (or firing cannonballs!) past each other, mistaking disagreement for mutually extremist positions. (And then the "us vs. them" feelings kick in...)

I'm in very strong agreement that automated testing (unit, regression, integration, etc.) pays for itself in most sufficiently large projects, but skeptical of TDD specifically - testing during exploratory programming often gives good feedback, but it's just one of many tools.

Writing test cases based on business cases (domain knowledge) before any code is a good practice. It is that simple.

I think it's better to have the spec be the code, as much as possible.

I just wanted to throw in "test automation, yes!" without starting any controversy.

Although I've been operating with a slightly different (and hopefully less controversial) TDD definition. Instead of getting all hung up on sequence and test-first, I ask "are you using automated tests as the primary way of executing your code as you're working on it". If the answer is yes, then at least that part of the code is test-driven. If the answer is no, then at least that part of the code is not test-driven.

You can write the tests first all you want, but if you're not running them, often, and doing all of the "does this work" through the UI or the debugger, how test-driven are you really?

Realizing that this post might just cause another flame out about unit testing, test-driven development (TDD) and policy-driven development I wanted to post what I think is a useful link about a platform that can assist you to "maintain" the human commitment to TDD:


Hope this helps.

Some people can write all the test cases in the world but not get their Sudoku Solver functionally working.

Should you really be looking for people who enjoy using testing frameworks? Testing is a necessary part of development, sure. Enjoyable? No.

You can probably enjoy testing in the same way you enjoy programming.

My old boss had a good saying:

If it isn't tested - it doesn't work.

A former cow-orker had this in her .sig: "Next time you release untested, why not release undeveloped too?"

I'm rather new to programming, but sometimes there's good advice to be had from people that are just starting out - they don't take things for granted. Anyway, here's my list:

- How to think. If you can't hold abstract thoughts in your head you should find something other than programming to do.

- How to learn. Programming is a dynamic and often changing discipline. You must know how to learn new stuff.

- How to find information. Programming is such a vast field that no matter what you do you'll often come across something you don't know. 99% of the time someone else has had the exact same problem, and the solution is out there if you know how to find it.

If you know how to think, learn and find information everything else will follow.

How to find information is a skill that is no longer just for librarians; it's rapidly becoming a 'must have' skill for just about every job.

If you don't know how to effectively search for information that is already available out there then you are probably wasting valuable time re-inventing various wheels.

If there is one thing I would do if I had a company with employees of any kind, it would be to spend a couple of hours teaching newcomers the finer points of using search engines and other online resources.

You're absolutely right - interesting point.

A friend of mine is a diesel truck mechanic, which to me seems like a job that doesn't have much to do with the Internet. Yet he spends lots of time researching and finding information. When he debugs engines that don't work the Internet is his first stop. Just like in programming there's always someone somewhere that's had the same problem and he's usually able to find it in some obscure place.

> How to find information is a skill that is no longer just for librarians; it's rapidly becoming a 'must have' skill for just about every job.

And don't forget about asking librarians, either. Some research librarians are so good at finding things, it's scary. They're like meta-meta-search engines, except they're usually much better at teasing the relevant details out of vague questions.

It would be quite nice if there were a service half-way between Ask Jeeves and Mechanical Turk, where any query you entered would be passed to a librarian, who would reply with an optimized search string (with the option of starting a chat with you to "tease the relevant details out.")

Meta-meta-search engine is such a great descriptor. I have worked in a library for 4 years now and that is the best explanation of a librarian I've heard.

For someone new to the field, you've proved to be astute. Anyone who can do those things earnestly will do just fine.

Historical note: there was a time when you could retain everything in the field, and it was around the time I was taking home more than ten trade rags a week that I decided a new tactic was required: knowing where to find information. That was well before the Internet, of course. Now that we have it, it is a fantastic resource for us. Geez! Complete lumps of working code. It's too easy :)

> - How to think. If you can't hold abstract thoughts in your head you should find something other than programming to do.

I agree that proper methods of thinking and holding abstract thoughts are necessary to program well; in fact, they are probably the most necessary criterion. But if you cannot do that, the answer is not to give up on programming (if you want to program), but to learn to think and reason abstractly.

And programming is good for practising abstract reasoning.

Could you explain a little bit more about #1? What would an example of holding an abstract thought in your head be?

There is a big difference between new (so-called must-have) skills and essential skills.

All the knowledge you really must have was developed long ago (look at things like SICP, The Mythical Man-Month, or XP Explained) and remains unshaken.

By contrast, this or that cool new tool for making dynamic web pages, or a brand-new pile of useless abstractions in the next revision of J2EE or .NET, is third-rate knowledge. A Google search is enough for that.

Take a look at Berkeley's CS 61A course (with that honorable old-school lecturer and a nostalgic 80x24 text-mode console on Solaris 2.7) - all the essential knowledge is here.

So, I encounter a lot of programmers that have read a little and consider themselves to really know what they're talking about. However, they can't explain how an index works - not even in simple terms. Like, if I'm looking for someone with the username 'johnnygood', why would indexing the username column make that lookup faster? Even a very simplistic answer like, "an index orders the information so that the database server can jump to that record binary-search style rather than looking through every row" would be wonderful.
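That "simplistic answer" can be shown in a few lines. A toy Python sketch, assuming a hypothetical in-memory users table; real database engines typically use B-trees rather than a sorted list, but the principle of keeping keys ordered so the lookup can jump instead of scanning is the same:

```python
import bisect

# Hypothetical users table: rows stored in insertion order.
rows = [(1, "alice"), (2, "zed"), (3, "johnnygood"), (4, "bob"), (5, "mallory")]

def full_scan(username):
    """No index: compare against every row, O(n)."""
    for row in rows:
        if row[1] == username:
            return row
    return None

# Toy index on the username column: (username, row position), sorted by username.
index = sorted((name, pos) for pos, (_, name) in enumerate(rows))
index_keys = [name for name, _ in index]

def indexed_lookup(username):
    """With the index: binary-search the sorted keys, O(log n), then jump to the row."""
    i = bisect.bisect_left(index_keys, username)
    if i < len(index_keys) and index_keys[i] == username:
        return rows[index[i][1]]
    return None
```

The flip side is visible here too: every insert must also update `index`, which is why indexes speed up reads at some cost to writes.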

In that light, I think that programmers should know the basics of computer science: what references are; how a linked list works; a little about tree data structures; stacks, queues; basic sorting and searching stuff; etc. I'm not saying that you need to be able to talk about everything off the top of your head. I just think a familiarity with the theoretical concepts is good.

On the other end, I see a lot of people who know that PHP interprets strings using single quotes faster than double quotes, but have no idea why NoSQL might scale better (and might actively think it's because of the SQL language).

I want someone to say "a lot of the NoSQL technologies work by having you precompute data, so that the information you want to display together is stored together in a structure that can be read quickly, like a hash." It isn't magic.
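A toy illustration of that precomputation, using a hypothetical blog schema in plain Python dicts (no particular NoSQL product's API is implied): the normalized version assembles the page at read time, while the precomputed version builds the page document once at write time and serves it with a single hash lookup.

```python
# Normalized, SQL-style data: showing a post's page needs a join at read time.
users = {1: {"name": "alice"}}
posts = {10: {"author_id": 1, "title": "Hello"}}
comments = [{"post_id": 10, "author_id": 1, "text": "First!"}]

def render_post_normalized(post_id):
    post = posts[post_id]
    return {
        "title": post["title"],
        "author": users[post["author_id"]]["name"],
        "comments": [c["text"] for c in comments if c["post_id"] == post_id],
    }

# Precomputed "document": the same view, built once when data is written and
# stored under one key, so a read is a single dictionary (hash) lookup.
post_pages = {}

def publish(post_id):
    post_pages[post_id] = render_post_normalized(post_id)

def render_post_precomputed(post_id):
    return post_pages[post_id]
```

The trade-off is that the stored document goes stale until something calls `publish` again after a write, so the work moves from reads to writes rather than disappearing.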

I somewhat disagree with this - it depends entirely on what you're doing. If you're writing a 3D gaming engine you're absolutely right, but if you're writing simple webapps, which most people are nowadays, you don't really need to know what a linked list is. Your programming language of choice will abstract it away.

These are certainly good things to know, but for most people not really essential or even important.

> if you're writing simple webapps, which most people are nowadays, you don't really need to know what a linked list is. Your programming language of choice will abstract it away.

You still need to understand it at least well enough to know that you really shouldn't try to access random list elements. Just like you need to understand arrays well enough to not try to insert things into the middle of them.
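Why random access on a linked list hurts: there is no way to jump to element i, so you have to follow i links. A minimal Python sketch with a hypothetical `Node` class that counts the hops:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def build(values):
    """Build a singly linked list from a Python list, front to back."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def get(head, index):
    """Return (value, hops) for a non-negative index: reaching it takes index hops."""
    node, hops = head, 0
    while node is not None and hops < index:
        node = node.next
        hops += 1
    if node is None:
        raise IndexError("list index out of range")
    return node.value, hops
```

`get(head, 999)` on a 1000-element list walks 999 links, so a loop that calls `get(head, i)` for every i does quadratic work. Arrays have the mirror-image cost: inserting at the front or middle shifts every later element.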

I think Yegge was right about compilers. Without understanding low-level languages and how high-level languages get transformed, a programmer can never really understand the language he's working with.

If there were only one thing you really had to know, it would be divide and conquer; everything else follows from there. It allows you to tackle a project of any size and to solve problems that look too complicated at first glance.

It allows you to debug really tough bugs and keep moving.

The guy who taught me that trick was a longtime COBOL programmer. He took pity on me after I wasn't able to crack a certain problem, and it was like someone handed me an electric drill after I'd puttered about for years with a pocket knife.

What would be an example of a successful application of this approach?

When faced with a problem too large to solve, divide it into two roughly equal halves.

Pick the first of the two halves and see if you can solve it.

If not, recurse on the first half; otherwise, solve it.

Now look at the other half and see if you can solve it.

If not, recurse on the second half; otherwise, solve it.

For debugging, it mostly comes down to chopping the program up into bits that are 'verified good' and 'unsure', until you are literally staring at the bug.

Some sort/search algorithms are an almost literal implementation of divide-and-conquer.
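The 'verified good' / 'unsure' halving above is the same loop as binary search, and it is what tools like `git bisect` automate over commit history. A minimal Python sketch, assuming versions ordered oldest to newest, a hypothetical `is_bad` check, and that once the bug appears it stays in every later version:

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad(version) is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1   # invariant: good is good, bad is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid                  # the bug was introduced at mid or earlier
        else:
            good = mid                 # the bug was introduced after mid
    return versions[bad]
```

With 100 versions and the bug introduced at version 42, this needs 7 checks instead of up to 100.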

edit: the more I think about this, the more I realize that it has pervaded just about everything I do. I'm currently rebuilding a car that was crashed, and I find I use divide and conquer to troubleshoot and to repair. Tomorrow I'll be pulling the engine to replace a cracked gearbox casing, and I'll surely use it again: to remove the engine, to remove and replace the gearbox, and to put the whole thing back together again, one small 'obvious' step at a time.

edit2: see also: http://news.ycombinator.com/item?id=1695794

> For debugging it mostly relates to chopping up the program into bits that are 'verified good' and 'unsure', until you are literally staring at the bug.

It should be noted there's a whole class of bugs where divide and conquer simply doesn't work, e.g. just about anything involving a race condition.

Good points otherwise.

True enough; interrupt-driven programs also tend to exhibit very nasty bugs where that strategy does not work.

It's not a 100% silver bullet but it will get you quite far.

Excellent suggestion. This open course lecture from MIT has more, for those who want an example or two:



With only four years under my belt, I spend a lot of my time feeling like a woefully ignorant developer. I was hoping for a little more meat, but the discussion so far seems mostly common sense.

What should a self-taught programmer know that schooling or tragedy would have otherwise taught me?

For example, I know there are design patterns for data structures. I've modeled plenty of data and nothing has ever blown up on me, but I suspect I could have done it even better with a solid grasp of best practices for data structures. What should I know about these things?

edit: Also, I owe StackOverflow et al. a huge chunk of my success, but damn, the "you're not logged in" slidey bar is obnoxious as hell.

I personally think a strong mathematics background is under-rated, and it is uncommon for self-taughts to have one (speaking as one with that background). Of course, even a lot of CS grads don't have the kind of mathematics background I would espouse.

It's not about the specific mathematical techniques, but the general way of thinking.

Personally I absolutely suck at maths, the only maths I use now is basic mental arithmetic. I'm also completely self taught, but agree with your concept that it's a "way of thinking", because I am a very logical person.

Caveat emptor, I'm just a web dev, not a crazy 3d modeller or anything.

I think a really important thing that many people realise is that Maths is not just numbers. Arguably most of Computer Science is a kind of Maths; really, Maths is just an abstract notation for describing systems.

For programmers in particular, I think mathematical functions[1] and set theory[2] are some important areas that you should look at.

[1] http://en.wikipedia.org/wiki/Function_(mathematics) [2] http://en.wikipedia.org/wiki/Set_theory

Data structures really clicked for me when I started learning OCaml. While it has its strengths and weaknesses as a language, it has an excellent notation for data structures. The combination of pattern matching and type inference really clarifies things.*

Whether you use the language in the long run or not, reading chapter two of _Developing Applications with Objective Caml_ (http://caml.inria.fr/pub/docs/oreilly-book/) and doing its exercises, in OCaml, (http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora020....), will probably help quite a bit, and shouldn't take too long.

* Haskell may help, too, but its emphasis on laziness will also mean many algorithms and data structures won't translate as easily. SML is probably as good as OCaml, but I don't have much experience with it.

And don't forget "Purely Functional Data Structures".

Yes! I originally mentioned that, but it didn't really fit and I cut it.

Okasaki's _Purely Functional Data Structures_ uses SML to describe a variety of "purely functional" data structures, most of which are immutable, persistent, and exploit laziness for amortized performance guarantees. (There's an appendix with Haskell versions.)
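For a taste of what "persistent" means, here is a Python sketch of the simple two-list queue of the kind the book uses as a starting point, built from immutable cons cells (the tuple encoding and function names are assumptions for this sketch). Old versions of the queue remain valid after a push or pop because nothing is ever mutated. Its O(1) amortized bound only holds when each version is used once; making it hold under persistent use is where the book's laziness comes in.

```python
# Immutable cons lists: () is empty, (head, tail) is a cell. Cells are never
# mutated, so old versions stay valid and share structure with new ones.
EMPTY = ()
EMPTY_QUEUE = (EMPTY, EMPTY)   # a queue is a (front, rear) pair of cons lists

def cons(x, lst):
    return (x, lst)

def reverse(lst):
    out = EMPTY
    while lst:
        x, lst = lst
        out = cons(x, out)
    return out

def check(front, rear):
    """Invariant: front is empty only if the whole queue is empty."""
    if not front:
        return (reverse(rear), EMPTY)
    return (front, rear)

def push(queue, x):
    front, rear = queue
    return check(front, cons(x, rear))   # new items go on the rear list

def pop(queue):
    front, rear = queue
    if not front:
        raise IndexError("pop from empty queue")
    x, rest = front
    return x, check(rest, rear)          # returns the item and a new queue
```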

'Disregarding a bunch of things I don't want to talk about, what do you think a programmer should know?'

Seriously? You're going to take huge swaths of things off the table to try to force the discussion the way you want it to go?

I'm actually more interested in the parts he chose to ignore, because all the replies are about things that every programmer learns in his first 2-3 years of work -- so nothing especially insightful, just common field knowledge.

It's plenty insightful when you don't have 2-3 years of experience.

I guess HN is not the best place for targeting inexperienced programmers.

I'm inexperienced in any language that I did not write at least a mid-sized program in, on every system that I haven't used yet and on every program whose code is not intimately known to me.

There is 'general' experience that applies equally across the board, and that experience you either have or you don't, but for the rest we are all only as experienced as we are familiar with the tools we use to do our work.

Try switching from your 'favorite language' to one that is your least favorite and making some headway, versus the 'experienced' (but relative newbie) on that platform. You won't stand a chance.

Programming computers and computer science have become too large for anyone to still be an 'all-rounder'; you can't do all of 'embedded systems', 'web apps', 'operating system kernels', 'algorithm research' and so on. And if you can, then you're either not human or exaggerating ;)

But, generally speaking, you are experienced. What you say is of course true, but that's what makes the linked article even less interesting. If I wanted to learn more about a new language, new system or new tool, I would rather ask "what should every foo language programmer know about foo". The question posed on Stack Exchange asks about nothing specific. It's like someone asked "what should everyone know about life?" and got replies like "always remember to brush your teeth and flush the toilet". Meh.

If HN were a mathematics forum, one would read and comment on news about research in number theory and algebraic topology, not high school math, which is not particularly insightful or entertaining.

I think it's a classic 80/20 rule. You'll learn 80% of what you need in the first 20% of your career; the rest is what makes you really, really good.

The thing is, the 20% isn't the technical stuff; it's everything else.

Learn to simplify. If your code is long, hairy and convoluted, there's probably a better way to do it.

If the problem itself is long, hairy and convoluted, then redefine the problem in a different way, i.e., come up with a different paradigm for achieving the same result.

If you find yourself writing a lot of special edge case handlers, then try to redefine the model to implicitly take care of those edge cases.

None of this is easy, but being aware of it goes a long way towards becoming better at it.
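One small, well-known instance of redefining the model so an edge case disappears: removing every node with a given value from a singly linked list. With a plain head pointer, the front of the list needs its own special-case loop; putting a dummy "sentinel" node in front makes the head just another `next` pointer. A Python sketch (the class and function names are illustrative):

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def from_list(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def remove_all_special_cased(head, target):
    # Edge case: matches at the front need their own loop.
    while head is not None and head.value == target:
        head = head.next
    node = head
    while node is not None and node.next is not None:
        if node.next.value == target:
            node.next = node.next.next
        else:
            node = node.next
    return head

def remove_all(head, target):
    # Sentinel: the real head is now just dummy.next, so one loop covers all.
    dummy = Node(None, head)
    node = dummy
    while node.next is not None:
        if node.next.value == target:
            node.next = node.next.next
        else:
            node = node.next
    return dummy.next
```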

I've had this experience. It took me a while before I realized that a decent chunk of hairy, tricky code could be avoided if I just started out with an appropriate data structure. Starting with a good data structure is almost like starting with the solution for a surprisingly large number of problems.

Now, whenever I find myself getting confused I try to think really hard about where I went wrong. Good code is like good prose. If you have to read it six times to understand what is happening, the writing sucks.

Have a look at the data structures in "How to roll your own Window manager" (http://donsbot.wordpress.com/2007/05/17/roll-your-own-window...) for a nice example.

Voted up, not because I thought the SO page was interesting or very useful, but because it had a link to What Every Computer Scientist Should Know About Floating-Point Arithmetic (http://docs.sun.com/source/806-3568/ncg_goldberg.html), which is well worth reading.
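The two classic surprises from that paper, reproduced in Python (whose `float` is an IEEE 754 double on essentially every platform): decimal fractions such as 0.1 have no exact binary representation, and the resulting rounding errors accumulate.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False: 0.1, 0.2 and 0.3 are not exact in binary
print(total)                      # 0.30000000000000004
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead

tenths = [0.1] * 10
print(sum(tenths))                # 0.9999999999999999: rounding errors accumulate
print(math.fsum(tenths))          # 1.0: fsum tracks the lost low-order bits
```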

Found nothing interesting. It's submitted by spolsky; is he looking for traffic to his website? ;) Nothing bad about it though :D

I think he's looking for traffic and for participation. Have a better suggestion? Post it, you don't even need an account.

I was just kidding. I guess that was not obvious from the couple of smileys. I have huge respect for the SO folks and the blogs they write. I have learned a great deal just by reading their blog. Thanks for downvoting, though!!

'twasn't me, I rarely downvote.

I think HN may be getting a little more reactionary -- both quicker to upvote and quicker to downvote. I've noticed that some of my comments get far more upvotes than they're worth and also some unexplainable downvotes.

This comment(http://news.ycombinator.com/item?id=1695010), for example, seems on-topic but was downvoted pretty quickly.

I've noticed something like that recently, although I hesitate to call it a trend from just a few observations. I suspect some people upvote/downvote based on personal feelings alone, or on whether they agree/disagree with a post. I prefer to be a little more disciplined with my voting. I like the way Stack Exchange et al. work in this regard, where negative votes have a cost to the voter and voting is rate limited (everyone has a maximum number of votes per day), which ensures that most voting is taken seriously.

Questions like this degrade the professionalism of the field. Better programmers than any of us have tried to cover the topic so generally, and they still came up with whole books that barely scratched the surface. If you want to know how to be a programmer, do some tutorials, then go to college for a few years, then work underneath an experienced professional for a few more years.

Can you imagine how silly it would sound applied to other fields?

"Hey guys, I decided to be a nuclear engineer/attorney/pilot/paramedic. What are the main things I need to know? I'll figure out the rest as I go along."

I was going to reply with a comment. But one of the most important things to realise as a programmer is that your program is not unique, and most likely someone smarter has spent longer crafting a much better solution. Here's a full post with a comprehensive answer.


An even more important requirement is realising when your solution is better and being able to explain exactly why.

You didn't get it right the first time. You aren't getting it right this time. You won't get it right the last time.

Learn when to fix it, when to move on to something else, and when to come back. This is the hardest part.

Likewise, nobody else ever gets it right. Know when to build, when to borrow and tweak, and when to steal.

That sounds like a good statement about life in general.

I keep looking forward to the time I have it all "figured out". As I get older, I've started to think this never happens.

What lexing and parsing are, just a vague overview is fine. Better yet, passing familiarity with at least one parser generator framework.

Many of the most horrid WTFs I've seen are people's custom parsing routines: horrible to code initially, worse to maintain.

Understand how computers handle different data types and what their min/max values are. Someone sent me the following code today (in JavaScript):


A) It's unnecessary and will get "trimmed" down to 2147483647 on 32-bit systems.
B) Clearly the programmer that wrote this has absolutely no idea that his number is going to require a 64-bit long int (on applicable systems) just to generate a random number to make their URL non-cached. What a waste.

</minor rant>

The result of this is a 64-bit double (there are no other numeric data types in JS, and this is the data type regardless of the underlying CPU architecture). That has just under 16 decimal digits of precision, so the above code will not lose much by multiplying by 10^16 and rounding.

Why would you think this calculation would get clamped to the range of a signed 32-bit integer?
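The reply's numbers can be checked from any language whose floating-point type is an IEEE 754 double. A small Python sketch, assuming CPython on a platform with IEEE 754 doubles (which is the norm); it shows where "just under 16 digits" comes from and that nothing clamps a double to the 32-bit integer range:

```python
import math
import sys

# Python's float uses the same IEEE 754 double format as every JavaScript number.
print(sys.float_info.mant_dig)          # 53: bits in the significand
print(round(53 * math.log10(2), 2))     # 15.95: "just under 16" decimal digits
print(sys.float_info.dig)               # 15: decimal digits guaranteed to round-trip

x = 0.123456789 * 10**16                # far beyond 2**31 - 1 ...
print(x > 2**31 - 1)                    # True: ... and nothing clamps it
print(2.0**53 + 1 == 2.0**53)           # True: from 2**53 up, not every integer fits
```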

I wrote this comment (http://programmers.stackexchange.com/questions/1785/what-sho...).

I have a lot more advice for systems developers (which can basically be summarized as "you're never as smart as you think you are -- plan for it") but I don't think they generalize as much to "all programmers."

Joel, didn't you answer this yourself a long time ago? Pointers and recursion. (Though I'd add concurrency if you intend to be programming much longer.)

"Code doesn't exist if it is not checked in Version Control System."

What if it's "checked in" to just another part of the same hard drive (e.g., git)?

Git is a version control system.

That point is about people completely unaware of VCSs.

Git is not yet ready for production use in commercial environments. It's got a long tough battle ahead if it wants to displace p4 and svn.

Use the right tool for the job.

The programmer is the important element, and the language and tools should be chosen based on the problem.

On writing concurrent programs: Why don't we make the interpreters/compilers do that automatically?

A simple answer is because it's hard. A slightly better answer, I think, is that the current crop of more expressive languages (i.e., not Fortran, and including functional languages) are harder to analyse and even harder (much more so) to parallelize in a sufficiently coarse-grained manner (otherwise you get code that is parallelized but less efficient due to overhead, say).

The key is a way of providing non-obtrusive (to the programmer) hints to the compiler - whether by annotation or the type system or whatever.

You are never done learning. (But that is probably true for life in general as well...)

I echo a commenter on the article: Code Complete 2. Read it all, use it all.

"Every single line of code you write will probably solve a problem and will introduce at least one bug"

I believe that sort of extreme skepticism is helpful in writing solid code and thinking about unintended consequences.

Thinking is more important than typing.

The difference between a programmer and a coder.

For example, programmers talk about ideas, algorithms and abstraction layers, while coders talk about tools: which one is the right tool (Java, of course), which one is the best for so-called web development (PHP, of course), and the cool features of C# if they're using that system.

The difference is the same as between, say, a GM engineer and a taxi driver.

There are also sysadmins, who know that everything sucks, but some things when fine-tuned suck less. ^_^
