Hacker News new | past | comments | ask | show | jobs | submit login
Go to Statement Considered Harmful (1968) [pdf] (cwi.nl)
55 points by tosh 29 days ago | hide | past | web | favorite | 49 comments



Dijkstra used the concept of program statements as "predicate transformers". The idea is that a program has a series of predicates (think "assert()" on steroids).

A predicate is a formula (usually in first order predicate logic) that characterizes the allowable states of the program at the point where the predicate appears in the program text. I.e., "N == A * k + B".

Unlike assert statements, a predicate may not be executable, but rather serves as an aid to reasoning about your code.

A statement (i.e., "if", "assign", "case" etc.) modifies the allowable states of the program. Hence the phrase "predicate transformer".

The initial state when the program starts has a predicate that characterizes any required assumptions about the inputs. The final state has a predicate that characterizes the required output of the program upon completion.

Dijkstra, Gries, Hoare, and others created formal rules of inference describing exactly how a given statement modifies the predicates.

This approach worked brilliantly for the normal "structured" statements ("if", "do", etc.)

(To me, the most helpful idea from their system was the idea of invariants for iterative statements.)

However, their approach was impossible to do for arbitrary "goto" statements.

I believe Dijkstra was of the opinion that localized, static reasoning about program correctness was (is) the best way to get code correct and error-free.

The fact that arbitrary "goto" statements destroyed the ability to reason about programs in this manner was why he forcefully advocated against them.

(An alternative framework, denotational semantics, includes the idea of continuations, which among other things provides a mathematically rigorous characterization of arbitrary "goto" statements.)

FWIW, as a programmer I have found over the years that the predicate/"invariant" approach of Hoare-Gries-Dijkstra gives me a useful handle on reasoning through my code and getting it right.


The very related concept of design by contract, pioneered in Eiffel by Bertrand Meyer, is something I've found absolutely essential for writing robust code. Every programmer should be taught to reason about preconditions, postconditions, and logical invariants, and properly document those when writing interfaces. Not bothering to think about edge cases is a major reason for software fragility.


Dijkstra’s A Disciple of Programming is a relatively gentle introduction to this programming style using examples. Predicate Calculus and Program Semantics is a much more rigorous formal treatment of the same subject, but arguably a harder read.


What you’re describing sounds like TLA+, among other modeling languages.


Lamport, Dijkstra, Scholten, Hoare, and Knuth, to name just a few, are giants in the field who routinely read each other’s work. I’m positive TLA+ is directly influenced by predicate transformer program semantics.


The classic. See also another viewpoint in that conversation:

Donald Knuth: Structured programming with go to statements (1974)

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103....

[edit] Of the 102 (!) entries in the bibliography of this paper, Knuth cites 9 of Dijkstra's papers (including the goto paper) and 3 personal communications.


Somebody needs to ask Tony Hoare about his axiomization of GOTO described in that...


It's difficult to examine his comments without historical context: I haven't dealt with any code from that era, but there is still plenty of Fortran 77 code from ~20 years later where the use of gotos makes it difficult to understand control flow, e.g.: https://github.com/JuliaMath/openspecfun/blob/master/amos/zb...

That said, gotos are still occasionally very useful: jumping out of nested loops and implementing finite-state machines are both made much easier and clearer via gotos than trying to force them into more standard control flow statements.

The linux kernel also makes judicious use of gotos for cleaning up error handling: https://koblents.com/Ches/Links/Month-Mar-2013/20-Using-Goto...


The use of goto to jump out of nested loops is available in many languages as the ability to break out of a labeled loop. Regular structured programming plus this single extension allows any algorithm that can be written with goto to be written without, with no loss of efficiency.

I have seen multiple ways to implement finite state machines. Autogenerated code with goto is indeed a clean option. However I strongly prefer it if the state machine has a relatively clean definition that does not involve goto.


One thing I've seen a lot in C code is a goto to a single exit point out of a function for common cleanup code


>The linux kernel also makes judicious use of gotos for cleaning up error handling.

Isn't that standard c style? I can't imagine programming c without goto end.


Try as I might, I just don't see why it's important to maintain an independent index into the process progress of a running program. So you can know the exact point in the process progress where n equals the number of people in the room minus 1... What is the usefulness of it? What problem does it help solve to have this kind of process coordinate in time?

I can understand that unbridled use of GOTO tends to lead to a mess, but Dijkstra's argument seems to be more about the purity of the process, for... some purpose I can't fathom.

Still, I imagine he'd have a field day trashing dependency injection and dynamic dispatch!


Ok, I'm not sure I understand your question, but I'll try...

Think of his "index" as a virtual program counter, while you are trying to write the code or understand what someone else has written. At some point, the process is going to be in a state where someone has entered the room (i.e. n = #people - 1), and at a later point you have incremented n (so that n = #people again).

One of the key features of Dijkstra's thinking about how to understand and write programs is that you should be thinking about the text of the program, not its dynamic behavior. You should not be "pretending to be the computer"---when you mentally "execute" the program, you have to keep a finger pointed at the current statement (and procedures, recursion, loops, etc., make that pretty difficult).

However, I don't know if he'd fully developed that idea in 1968; this article reads like he was still trying to understand programs by playing out their behavior. So, if you were counting people going into a room, at some point you will have seen someone go in, but not have incremented the counter while after the next statement you have seen someone and have incremented the counter.

Without a lot of discipline, GOTO has the power to completely throw away whatever understanding of the program you have; you could GOTO that point where you are going to increment the count but without seeing anyone go into the room.


(note: everything I'm about to say is just my take on early CS stuff, I don't know Dijkstra or how he thought, these are just conclusions I've personally drawn from reading about the history of computing)

I think it's hard to imagine because it comes from the mindset of those who pioneered programming, like Dijkstra. You're right that he seemed to be concerned about the 'purity' of the process. It's clear from reading the original article that he (and others) thought about programming in a much more 'mathematical' way than modern people. Dijkstra wanted a program to be clearly defined and computable all the way through, but when you're jumping around the code arbitrarily that's... difficult. All of this comes from the fact that computing was birthed from mathematics, and the people involved in it back then were more likely to have a base in math.


If you've ever had to deal with the previous mess in all its unbridled impurity, you'd know why Dijkstra cares about the purity of the process. The answer wasn't to use GOTO a little bit less, or a little more carefully. That wasn't going to work. The answer was to eliminate it, root and branch.


Ahh yes, the origin of this phrase. I still cringe every time I see it used.


Interestingly [1]:

> The original title of the letter, as submitted to CACM, was "A Case Against the Goto Statement", but CACM editor Niklaus Wirth changed the title to "Go To Statement Considered Harmful"

...

> Frank Rubin published a criticism of Dijkstra's letter in the March 1987 CACM where it appeared under the title "'GOTO Considered Harmful' Considered Harmful". The May 1987 CACM printed further replies, both for and against, under the title "'"GOTO Considered Harmful" Considered Harmful' Considered Harmful?". Dijkstra's own response to this controversy was titled "On a Somewhat Disappointing Correspondence."

[1] https://en.wikipedia.org/wiki/Considered_harmful


Interestingly, "A Case Against ..." is also a bit of hackneyed title pattern.


Why a case against is not my favorite pattern.


A Case Against "Why X is not my Favorite Y" Being Considered Harmful.


Coincidentally, I put up a new chapter in my in-progress book last night that contains a long aside with my thoughts on this paper:

http://craftinginterpreters.com/jumping-back-and-forth.html#...

Overall, I find it pretty disappointing. It lacks the rigor that Dijkstra himself argued so strongly for and may have stifled actual useful language research.


Thanks for sharing this. I had the same reaction. I have another point to add: many people think of “goto considered harmful” as saying something like: you can use all the other control structures that (say) C provides, but the “goto” statement should not appear in your program. At least that's what I thought, but in Knuth's encyclopedic response to the article, after giving some examples where eliminating goto (mechanically or otherwise) doesn't give great results, there's a section [1] where he discusses some proposed alternative control structures to goto — and among them is “break”! And on rereading Dijkstra's article and looking at the responses to it and some languages from the time, it dawned on me that what Dijkstra was advocating was that the only control structures be conditionals, loops, and subroutines — no “break” or “continue”, no early “return” in a function — and for example this (plus a discouraged goto) is in fact how things work in the original Pascal language (where you return from a function by assigning to a pseudo-variable and waiting for control flow to reach the end). Most programmers who echo “goto considered harmful” have not actually tried the discipline of giving up all of those and seeing how happy they are. Languages have realized that the barebones proposed by Dijkstra were inadequate and have added “break”, “continue” and “return” and even more control-flow structures, like exceptions.

The weak version of the thesis is that too much ad-hoc control flow can be confusing and is best avoided. But most programmers agreed with this anyway (there wouldn't have been much of a debate if this was all it was), and in this version there's no reason the limited set of control-flow structures cannot involve some use of the “goto” keyword (like the “goto end” / “goto cleanup” pattern in C code). The strong version of the thesis, as mentioned in the previous paragraph, is something no one really wants to live with (and I haven't seen the “coordinates” in practice either way). So what people seem to have settled on is an in-between version that abandons the actual content of Dijkstra's article, while retaining the word “goto” as a marker of sin.

[1]: http://www.kohala.com/start/papers.others/knuth.dec74.html#c...


> And on rereading Dijkstra's article and looking at the responses to it and some languages from the time, it dawned on me that what Dijkstra was advocating was that the only control structures be conditionals, loops, and subroutines

I'm actually not sure if that's what he's advocating. In fact, one of my main complaints with the letter is that he's really not very clear about what he's saying at all. Some of that may be just context that he could assume people had when he wrote it that has been lost since, but I think some of the blame rests on him.

He does say:

> One can regard and appreciate the clauses considered as bridling its use. I do not claim that the clauses mentioned are exhaustive in the sense that they will satisfy all needs, but whatever clauses are suggested (e.g. abortion clauses) they should satisfy the requirement that a programmer independent coordinate system can be maintained to describe the process in a helpful and manageable way.

Which I interpret to mean that some goto-ish statements are fine if they are restricted in a way that doesn't make them hard to reason about. I think he would probably consider break, continue, and early returns acceptable in that way since he specifically mentions "abortion clauses", which sounds similar to those to me.

But he never really defines what he considers an acceptable "briding", so we're still left in the dark.


That's fair. From just the original article, it's hard to guess what Dijkstra's intent was or if he even had a clear idea himself.

But if you look at the "goto debate" of the time -- the various letters published by various people pointing out "how would you do the following without goto?" as others had meanwhile started treating goto as sinful and avoiding goto with the moral zeal of new converts -- then it appears that regardless of Dijkstra's intention, at least the debate among most programmers at the time was about abandoning "goto" even in the absence of other compensating control structures. So it appears to me that the debate was settled, and eventually faded away into a catchphrase, not simply by everyone agreeing that goto is evil persuaded by Dijkstra's arguments, as the popular opinion holds, but by languages adding most of the control structures that people were using "goto" for, under different names, until ad-hoc "goto" became less and less necessary.

(And even Dijkstra's response to those responses, namely, his "On a somewhat disappointing correspondence" (original: http://www.cs.utexas.edu/users/EWD/ewd10xx/EWD1009.PDF / transcript: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...), which incidentally is a good example of why they used to say that arrogance is measured in nano-Dijkstras, reveals that for example he was against short-circuiting AND/OR ("conditional connectives"), that he wrote his loops with boolean variables checked in the loop condition rather than break, and that he wrote of "all sorts of “programming language features” that seem better ignored than exploited" -- so in the original when he writes that "whatever clauses are suggested (e.g. abortion clauses) they should satisfy the requirement..." it's not clear that break/continue would satisfy his requirement because it's never specified, as you say.)

And as you say, it's not clear that it was an improvement as far as language research goes, because it's far from obvious that the set we've settled on is the most natural or useful one: C has while, do-while, and for, each with break and continue, but alternatives like repeat-until (expressible as do {...} while(!(...));) were contenders, and especially Zahn's "event indicators" system looks quite nice (http://www.kohala.com/start/papers.others/knuth.dec74.html#e...). (I guess in Python it's not unusual to do something similar using exceptions, so they've returned in some form.)


> "On a somewhat disappointing correspondence" (original: http://www.cs.utexas.edu/users/EWD/ewd10xx/EWD1009.PDF / transcript: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...), which incidentally is a good example of why they used to say that arrogance is measured in nano-Dijkstras

Yeah, when I stumbled onto that gem, it took my breath away. It is some of the absolute pettiest writing I've ever seen in a professional setting.

"I thought that by now professional programmers had learned to be more demanding on themselves and not to belittle the virtue of accuracy. I shall stick to the capital letters."

Jesus H. Christ dude.

The thing I find really ironic is that this is the same guy who invented "Guarded Command Language": https://en.wikipedia.org/wiki/Guarded_Command_Language.

Its looping structure contains a series of predicate guarded clauses. Each turn through the loop, all of the predicates are evaluated to decide which clause to pick. If more than one evaluate to true, a clause is picked non-deterministically!

Now try to define a way to characterize that process in a simple way. It's a non-deterministic multi-way goto. :-O


Well, sometimes they are useful when a break statement is not enough to get you out of a nested loop.

    while (...) {
      while (...) {
        // ...
        if (...)
          goto stop_it;
        // ...
      }
      // ...
    }
    stop_it:


The "labelled goto" of the modern world is very far from the GOTO of pre-Dijkstra, which would allow you to jump into the middle of loops or functions.


Yeah. Structured programming was a huge improvement over the... let's call it "unstructured" programming that preceded it. Essentially, you take any flowchart you can possibly draw, and you can implement it with GOTOs. The result wasn't easy to understand, debug, or maintain, though.

One question, though: Of the current languages that have a GOTO, won't some of them still allow you to jump into the middle of a loop? That is, is it the languages that force us not to do that any more? Or is it just that we've learned not to do that, because it's a horrible idea?


If you have labelled break and continue, you can construct any reducible CFG, including other kinds of unstructured graphs [1]. Loops with multiple entry points are a necessary and sufficient condition for irreducible CFGs, and cannot be described with labelled break and continue. I don't know if you intend to consider generic labelled break/continue as having a goto, but if you do, they can't jump into the middle of a loop.

C and C++ are the most well-known languages with an unlimited goto, and they do allow you to jump into the middle of a loop. Worse, the switch statement also works as a hidden goto and can similarly be used to construct irreducible loops. There's even a name for how to do this... Duff's device.

[1] The definition of unstructured graph changes between authors on which constructs it includes, because it's a general term for "CFGs our algorithm cannot handle." A simple requirement is that every merge point must either postdominate its immediate dominator or be a loop header, but this definition means that even things such as multiple return statements, short-circuit and/or, and vanilla break statements cause unstructured graphs. Reducible CFG is a much more stringent definition, but graphs such as ladder graphs, lattice graphs, and complete graphs that bring out worst-case performance in algorithms are reducible by virtue of being acyclic.


Duff's device is sort-of an example of this, as cases in C are equivalent to goto labels.

It's an interesting point, though. I don't think I've ever seen anybody actually use gotos for this, and I don't remember ever doing it myself. (I will happily put a goto label anywhere that will compile - I'm not against goto in principle.)

I'd assume it's because 99% of the time there's always some other non-goto option that could be used, and C programmers are accustomed to thinking about their programs in terms of these constructs, rather than necessarily goto. goto is always there when all else fails - but when it comes to loops, all else fails rather rarely.

Some (non exhaustive) examples.

Just thinking about the loop mechanics: jumping into the middle of a C for loop probably isn't super-useful, as you're only skipping the one-time init, which you could skip anyway: just leave it empty and put that code elsewhere. Then maybe turn the loop into a while loop? - that's an option. Anyway, for and while are basically equivalent for the purposes of this discussion.

Jumping into the body of a do...while makes it a while, and jumping into the middle of a while makes it a do...while. So C programmers would just write the right sort of loop in the first place.

    goto X;       --->  do {
    while(cond) {           A;
    X:                  } while(cond);
        A;                    
    }

    goto X;       --->  while(cond) {
    do {                    A;
        A;              }
    X:
    } while(cond)
And regarding the body: jumping halfway into the body is equivalent to one of those annoying loops where the setup for the condition is rather involved. One option here is an infinite loop with a break in the middle, so C programmers might just write the loop that way in preference to the one with a goto.

    goto X;         --->  for(;;) {             
    while(cond) {             B;
        A;                    if(!cond) break;
    X:                        A;
        B;                }
    }
The do...while case is equivalent.

    goto X;         --->  for(;;) {
    do {                      B;
        A;                    if(!cond) break;
    X:                        A;
        B;                }
    } while(cond);

Or another option is to pop B in a function, that checks the condition.

    while(B()) {
        A;
    }


> Just thinking about the loop mechanics: jumping into the middle of a C for loop probably isn't super-useful, as you're only skipping the one-time init, which you could skip anyway: just leave it empty and put that code elsewhere.

Irreducible loops [loops that have multiple entry points] tend to cause compilers to bail out on optimization; there are some algorithms which just don't work on irreducible loops. In a modern optimizing compiler, introducing irreducible loops will not give you a performance benefit and will probably give you steep performance hits instead.

I will point out that literally none of your examples are irreducible loops, since you have unconditional gotos immediately before the loops and thus they have only one entry point, even if it isn't the normal lexical entry point.


This is the thing I keep pointing out. Goto statements of that era had the same semantics as an assembly language jmp instruction. You have to be of a certain age to have seen this shit in practice[1]. There were reasons for that though computers then had very little memory and reusing fragments of code was the only way to get the program to fit.

The scoped goto in modern languages bears no relation.

[1] Consider setting a global variable to X, then jmp to a code fragment which when done then tests the variable and jmp's somewhere if var == x or somewhere else if not. Edit: I think of this type of stuff as a type of 'compression'. Done right there isn't anything 'wrong' with the program. But it's brittle, hard to reason about, and small mistakes results in bizarre wormholes.


And some languages did even stranger things.

For example COBOL's ALTER statement allowed you to modify the target of a given GO TO. The result is that any given GO TO could theoretically transfer control anywhere in the program, based on a statement that could have appeared anywhere else in the program.

Yes, this was used in production. In code that handled real financial transactions.


Also: “everything in moderation”. If over half your control flow statements are GOTOs, things get hairy fairly soon.

For an example, look up ancient Fortran code.


If you want to see the poster child for this problem, consult the sources for procmail.

In the really-not-very-distant past, a clear majority of e-mail went through procmail.


Yes.

Lua doesn't have continue or break in loops. But it has goto. That's one of the uses of it.

It also has, IMO, a sane goto. You cannot jump to any label outside of the function you are running from.

https://www.lua.org/manual/5.3/manual.html#3.3.4


C goto can't jump to another function either (except for longjmp, but that's a whole other story).


Double-breaking is a good sign that you need to refactor your code and put any of the while statements inside its own function.


That generally makes it harder to follow and harder to optimise?


Quite the opposite IMO: it gives a name to the computation and simplifies the control flow by making the computation more functional.


Well then you'll need double-returning, which isn't generally available. One could use longjmp I suppose.


In a high level language, yes, you should probably refactor code that looks like that. In C, no--just write the nested loop that iterates the pointers over the array, and use break or goto.


this use case is specifically addressed by "labeled statements" (e.g., https://docs.oracle.com/javase/specs/jls/se12/html/jls-14.ht...)


I don’t know if it’s a semantic difference or not but some languages, like swift now have labeled loops so the break statement can reference the specific loop, allowing control flow to jump out of the nest.


Nothing new there. Labeled loop exits are a feature of Ada.


Edsger, that is.


Soon in hacker news: "Go 2 considered harmful"


A good (although often misundertood) paper.


"I don't know how many of you have ever met Dijkstra, but you probably know that arrogance in computer science is measured in nano-Dijkstras."

- Alan Kay

With Alan Kay putting it into context here: https://news.ycombinator.com/item?id=11796926




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: