Writes large correct programs (johndcook.com)
82 points by ColinWright on May 17, 2011 | hide | past | favorite | 32 comments

If you ask an amateur whether their program is correct, they are likely to be offended. They’ll tell you that of course it’s correct because they were careful when they wrote it. If you ask a professional the same question, they may tell you that their program probably has bugs, but then go on to tell you how they’ve tested it and what logging facilities are in place to help debug errors when they show up later.

I was at a local Hackerspace get-together and there were about 4 NASA guys there. My coworker and I were presenting a hardware project of ours, and the NASA guys started talking about one's solder joint error rate. I'd never even thought about having one! I just operated on the premise that if I tried hard enough, I could get it all right. Of course, it almost never happens that way. The way those NASA guys thought of it is right: you have some average error rate. Use the error rate to estimate the number of expected defects, then look for them.

I guess that's the difference between the amateur and professional. (I am not a professional hardware person.)
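The NASA-style estimate described above can be sketched in a few lines. The joint count and per-joint error rate below are made-up numbers for illustration, not anything from the thread:

```python
import math

# Hypothetical numbers: a board with 200 solder joints and a
# per-joint defect rate of 0.1% (both assumptions for the sketch).
joints = 200
error_rate = 0.001

# Expected number of defective joints on one board.
expected_defects = joints * error_rate  # 0.2

# Treating defects as independent, the chance a board has *no*
# defects is roughly the Poisson probability of zero events.
p_zero_defects = math.exp(-expected_defects)

print(f"expected defects per board: {expected_defects:.2f}")
print(f"probability of a defect-free board: {p_zero_defects:.1%}")
```

The point of the exercise is the mindset: even a crude rate tells you roughly how many defects to go hunting for, instead of assuming zero.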

  Good umpire:   "I call them as I see them."
  Better umpire: "I call them as they are."
  Best umpire:   "They aren't anything until I call them."
Same idea:

  Good programmer:   Smart and gets things done
  Better programmer: Writes large correct programs
  Best programmer:   What he/she writes becomes the definition of success.

> "If you ask an amateur whether their program is correct, they are likely to be offended."

But if you ask a computer scientist, they will know the right answer is "I have no way of knowing, but it works on some known cases".

And that's a very important part of looking for people who "Write large correct programs" - the fact that we simply can't know when a program is correct.

Nobody knows unknown unknowns, but good programmers will be able to enumerate the kinds of errors they're guarding against.

It's easy to keep escalating the pattern in this discussion.

    Poor programmers expect to be correct.
    Better programmers guard against a variety of error types.
    The best programmers are constantly thinking up new ways
    things could go wrong, not satisfied with just a set list.
And I'm sure we could add more.

> the fact that we simply can't know when a program is correct

What do you mean? Correctness of a program means it works according to the specification. Programs can be proven correct.

They cannot. It is simply impossible to prove that a given algorithm behaves well for all inputs (that is to say for all memory states). You can show that it works as desired for a subset of the problem and infer from there that it will most likely work for other inputs as well.

But you cannot prove it perfectly. You can sometimes strictly prove an algorithm is correct, but that still leaves your implementation, your compiler, your physical machine and a myriad other things in question.

> It is simply impossible to prove that a given algorithm behaves well for all inputs (that is to say for all memory states).

No, it's not. People write proofs of algorithms all the time (see the Cormen Rivest book for several examples).

This belief stems from a misinterpretation of Rice's theorem ( http://en.wikipedia.org/wiki/Rice%27s_theorem ), which says that you can not, in general, create a Turing machine that gives a yes or no answer to any particular property of an input algorithm. That does not, however, mean that you cannot prove particular properties of algorithms for particular algorithms.

I'm sorry. That's simply not true. For instance, consider a program that takes one bit of input and always outputs the bit 1. This can be trivially proven correct.
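A sketch of that trivial case: when the input space is finite and tiny, checking every input is itself a complete proof of correctness (of the function as written, setting aside compiler and hardware concerns raised elsewhere in the thread):

```python
def const_one(bit: int) -> int:
    """Takes one bit of input and always outputs 1."""
    return 1

# The input space has exactly two elements, so checking both of
# them *is* an exhaustive proof that the function is correct.
for bit in (0, 1):
    assert const_one(bit) == 1

print("proved correct by exhaustion")
```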

"Someone who could write large programs that have a high probability of being correct"

No one can do that.

I can write a number of small programs that are correct, however. This is easy, because I can check all of the inputs and outputs. When I combine them, I get another small program that is also correct.
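A minimal sketch of that strategy, with two hypothetical small functions whose entire input spaces can be checked, and whose composition inherits the guarantee:

```python
def clamp_to_byte(n: int) -> int:
    """Small program 1: clamp a value into 0..255.

    Assumption for the sketch: inputs range over 0..511."""
    return min(n, 255)

def to_hex(b: int) -> str:
    """Small program 2: render a byte as two hex digits."""
    return format(b, "02x")

# Exhaustively verify each small program on its whole input space.
assert all(0 <= clamp_to_byte(n) <= 255 for n in range(512))
assert all(len(to_hex(b)) == 2 for b in range(256))

# The composition is again a small program that is also correct:
# every input in 0..511 yields a two-character hex string.
assert all(len(to_hex(clamp_to_byte(n))) == 2 for n in range(512))
```

This only works because the domains stay finite and small; the replies below point out the catch for programs with effectively infinite inputs.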

P: "Doctor, it hurts when I do this"

D: "So don't do that"

This is the reason for FP, DSLs, etc.

Most programs I write have an infinite number of inputs.

Prove your program works for n=1. Induct on n.

You forgot n=0.

edit: and I forgot n=⊥

I'm not sure how this is relevant?

All programs have an infinite number of possible inputs. You check the ones that you expect.

Is that somehow non-obvious?

> You check the ones that you expect.

You expect buffer overflows? Out-of-bounds arguments? These are the things the GP is referring to, I figure.

Mathematics proves things about an infinite class of objects all the time, like, say, the input/output behavior of your program. One technique to do this is induction.
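For instance, here is the usual inductive argument attached to a recursive function. The proof in the comments covers all n >= 0; the loop at the end merely spot-checks it on a finite range:

```python
def sum_to(n: int) -> int:
    """Sum of 0..n, defined recursively."""
    return 0 if n == 0 else n + sum_to(n - 1)

# Claim: sum_to(n) == n * (n + 1) // 2 for every n >= 0.
#
# Base case: sum_to(0) == 0 == 0 * 1 // 2.
# Inductive step: assume sum_to(k) == k*(k+1)//2. Then
#   sum_to(k+1) == (k+1) + k*(k+1)//2 == (k+1)*(k+2)//2.
#
# The induction covers the infinite input class; the loop below
# is only a sanity check on a finite slice of it.
for n in range(100):
    assert sum_to(n) == n * (n + 1) // 2
```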

I meant to say programs that have a high probability of behaving correctly. That's not to say that the program is correct. There's almost zero chance that any large program is entirely bug-free, but some applications perform reliably because the most used paths through the software are correct.

Use a language that makes gluing easier, and you can scale this strategy further. Cf. Hughes's "Why Functional Programming Matters".

I'd like some clarification regarding the common sentiment of computer scientists not knowing how to program. Does anyone actually know an incompetent programmer/engineer with a CS degree that actually knows his/her CS?

I always get the impression that people think the computer science field is completely orthogonal to software development, but I disagree. I suspect that the incompetent CS graduates this author mentions are in fact not good at CS as well as being poor programmers.

My point of view on this is that "is smart and gets things done" is relevant for developers because there are a lot of smart people in CS who don't want to get things done -- not because they are incompetent, but because they're simply not interested.

For example, for people working in theory, getting things done (in software) looks very boring compared to writing a new theorem. I find it perfectly acceptable that such people don't want to write software for such reasons. It is just that they decided to channel their intelligence to other domains.

This article explains why the KISS principle is required for large projects. Since the only thing we can be sure of is that our 10,000 line program will contain bugs, it solves a lot of problems to develop a program methodically as separate modules which can be tested independently on the way to 10,000 lines.

Separate modules allow a developer to test core assumptions on the way to a massive program which ostensibly produces correct answers.
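A toy illustration of that modular approach, with hypothetical names: parsing is kept separate from aggregation, so each piece's core assumptions can be tested on its own before they are combined:

```python
def parse_line(line: str) -> int:
    """Small module 1: parse one input line into an integer."""
    return int(line.strip())

def total(lines: list[str]) -> int:
    """Small module 2: combine parsed values into a result."""
    return sum(parse_line(l) for l in lines)

# Independent tests of each module's core assumptions.
assert parse_line(" 42\n") == 42
assert total(["1", "2", "3"]) == 6
```

When a bug shows up in the 10,000-line version, the module boundaries tell you which small piece to suspect first.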

I can write fairly large correct programs but I still feel very much like a novice because I can't rapidly and correctly modify large programs that I just came across. Especially large incorrect programs.

And yet that's the reality with any large employer.

Nobody can "rapidly and correctly modify large programs."

Yes, that's what most programmers do for a living, and it's just plain hard. Nobody can sit down with a million-line program and whip out a quick change correctly, unless they're lucky. So you learn how to retrofit the code with unit tests (a la Michael Feathers's book Working Effectively with Legacy Code), get good at using analysis tools, etc.

Nobody can "rapidly and correctly modify large programs."

Really, it depends on the definitions of: rapidly, correct, large, and the particular modification.

I've seen enterprise BS happen, such that the business asks for a new column in a database, which is just attached to a dumb text field that does nothing but save data, which is trivial, and gets done. Then the business points out that the business implications are that the value of something else needs to be this or that in response to this data, and that the original modification is now retroactively determined to be "wrong."

Also, there are some tricks that you can use in OO systems, along the same lines as refactoring. I've done things like verify that a certain method can only be called in a certain precisely delimited set of conditions, then carefully verify my code in those conditions. This doesn't always work, but it often does.
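One way to sketch that trick: make the precondition explicit with an assertion, then audit the (small) set of call sites against it. The class and method names here are hypothetical:

```python
class Connection:
    """Toy class whose send() has a delimited precondition."""

    def __init__(self):
        self.open = False

    def connect(self):
        self.open = True

    def send(self, data: bytes) -> int:
        # Precondition made explicit: send() may only be called on
        # an open connection. Violations fail loudly under test,
        # which is how you verify the delimited set of conditions.
        assert self.open, "send() called before connect()"
        return len(data)

conn = Connection()
conn.connect()
print(conn.send(b"hello"))  # precondition holds here
```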

I tend to be slower than most at doing this. All the uncertainty and fragility paralyzes me more than most people, it seems.

I think I'm just spoiled because I have more experience with less dysfunctional setups.

A good programmer could write large correct programs, but prefers small correct programs.

A good programmer knows how to represent a large correct program as a number of small correct programs that interact, thereby eliminating complexity. Bad programmers get lost because they run out of brainspace to hold the complexity of a 10000(0)-line program. Good programmers don't need to.

Yes. If you replace "programs" with "systems," I basically agree with the article.

I'd consider a single program that is "large" to be a failure already, if we define "large" to mean "too big to easily fit in a single programmer's head."

The skill of a decent programmer is to decompose a large problem space into orthogonal components, so that each appears to be a small program that keeps you from being mentally entangled in the needs of the larger problem. Or even better, be sufficiently knowledgeable of the field to recognize when some of these orthogonal components already exist.

Once you've crossed the line where a component is "large", by the above definition, you've reached the point where future programmers will find it easier to insert new functionality ad-hoc wherever it seems to work, rather than attempt to understand the design and modify it appropriately. This design-by-accretion style can rapidly make a codebase unmaintainable.

"Doesn't write large programs."

"Only writes necessary programs."

Despite a careful attempt in the article to define 'large' (as 'necessarily large'), this is provoking the usual dickwaving from people who think that they are cleverer than those poor fools who write 10K- or 100K-line programs, when _obviously_ these 10K or 100K line programs are all _clearly_ just loads of tiny little elegant programs that fit together perfectly.

1 - he's already saying that: "One of the marks of a professional programmer is knowing how to organize software so that the complexity remains manageable as the size increases".

2 - the notion that you can just glue a bunch of small elegant programs together into one large, wonderful program is one of the great half-delivered promises of computer science. Spolsky wrote a nice article about 'leaky abstractions' a while back.

3 - in my experience, there are a number of problems that are simply disgusting and don't break up into small, elegant modules, and are better solved by enormous, disgusting 5 page functions. Really. If you haven't met one, be glad, but don't kid yourself that because you've only ever found nice elegant programs that everything out there is amenable to breaking into nice elegant components.

Disgusting code like this is particularly prevalent when you have to handle a lot of cases up against an interface that you don't control, especially when you have to write code that is high-performance and can't resort to some nice 'script-ish' or table-driven approach.

The guts of a real, high-performance compiler (not one of a toy language, but one that has to produce good code for a huge range of bizarre coding styles on the front end) at the point that it's dealing with actual hardware is a good example. On the front-end you've got input programs (many of the properties of which the compiler designers don't control) and on the back-end you've got a hardware ISA that the compiler designers don't control either. Some elegance usually happens here and there between those two points, but sometimes you have to build a large, complex system with a lot of interactions because you have a large, complex problem with a lot of interacting concerns. NOT because you're not using "Left-Handed Object Oriented Semi-Eager Inverness Haskell".

Often the difference between a successful product and a toy project is the willingness to deal with the nasty aspects of a problem that often force you to write these disgusting functions.

Various PL weenies will no doubt appear and tell us how this was all solved X years ago in language Y, but they were too busy proving theorems, writing a new compiler in Y for the even more awesome language Y', and reimplementing trivial projects (Web server, anyone?) in their chosen language to build anything substantial.

How are we defining a large program?

