That was quite an unfair criticism, and even Doug McIlroy knew it (as he admitted later). The background is this:

- Bentley, the author of the column, invited Knuth to demonstrate literate programming using a program of his choice.

- Knuth insisted that to be fair, Bentley ought to specify the program to be written; else someone might object that Knuth chose a program that would be good for literate programming.

- Bentley chose (what we'd now call) the term frequency problem (list the top k most frequent words in a text file), and accordingly Knuth wrote a complete, self-contained program for solving just this one particular task. (It did everything from opening the input file to formatting the output, etc.)

- Doug McIlroy was asked to “review” this program, the way works of literature are reviewed. He happens to be the inventor of Unix pipes. Towards the end of his review, along with many other points (e.g. that Knuth didn't include diagrams in cases where most of us would appreciate them, something I still struggle with when reading TeX and other programs), he used the opportunity to demonstrate his own invention: a shell pipeline using now-standard Unix tools (tr, sort, uniq, sed), reproduced just below.
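
For reference, here is the six-stage pipeline from McIlroy's review (quoted from memory, so the exact spelling may differ slightly; the indentation is only for display, and note the literal newline inside the quotes on the first line). It is parameterized by the number of words k, passed to the script as $1:

    tr -cs A-Za-z '
    ' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q

Saved as a script, you would run it as something like "sh wordfreq 20 < paper.txt" (the script and file names here are only for illustration).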

There are a few things wrong with this criticism:

- The main thing is that DEK wrote the program he was asked to write, so pointing out that he shouldn't have written that program is a criticism of the one who chose the program (Bentley mentioned this when printing McIlroy's review).

- At the time, Unix wasn't even widely available outside Bell Labs and a few places; it definitely wasn't available to Knuth or most of the column's readers.

- Knuth's program, fine-tuned for the task, is more efficient than the shell pipeline.

- Even if you use a shell pipeline, someone has to write the “standard” programs that go into it (the "tr", "sort", "uniq" and "sed" above), and literate programming can be used there. In fact, Knuth did exactly that a few years later, rewriting Unix's “wc” (IIRC) and comparing the resulting program with Sun Unix's wc; his LP version had, among other things, better error handling. (He explained this by saying that in conventional programming, if you have a small function and 90% of it is error-checking, it looks like the function is “about” error-checking, so there's a psychological resistance to doing too much of that, while with LP you move the error handling to a separate section entirely about error-checking and then you tend to do a better job. BTW, TeX's error handling is phenomenally good IMO; the opposite of the situation with LaTeX.)

All that said, there is some valid criticism that Knuth prefers to write monolithic programs, but that works for him. He seems not to consider it a problem that to change something you have to understand more-or-less the entire program; he seems to prefer doing that anyway (he reads other people's code a lot: in Peter Seibel's Coders at Work, he was the only person interviewed who read others' programs regularly).




re: At the time, Unix wasn't even widely available outside Bell Labs and a few places; it definitely wasn't available to Knuth or most of the column's readers.

At that time, 1986, Unix was widely available - there were more than 100k Unix installations around the world by 1984. AT&T Unix Sys V and UCB 4.3bsd were available. Knuth was friendly with McIlroy, who was head of the Bell Labs computer science research group that begat Unix. Sun Microsystems was formed in 1982 - their boxes ran Unix, and Sun was a startup spun off from Stanford.


Hmm interesting; I remember checking this for the time when TeX was written (1977, because of questions about building on top of troff) -- what I remember finding is that Unix wasn't widely available at colleges then. Perhaps things changed in the next 9 years. As far as I know, Knuth's access to computers at the time was still through Stanford's AI Lab (that's why the first version of TeX in 1977-1978 was written in SAIL; see also the comment for 14 Mar 1978 in http://texdoc.net/texmf-dist/doc/generic/knuth/errata/errorl...). Do you know if Unix was installed on Stanford lab computers by 1986? What was the distribution of these 100k Unix installations (academia/industry)?


>Sun Microsystems was formed in 1982 - their boxes ran Unix, and Sun was a startup spun off from Stanford.

Right. Even their name was derived from Stanford University Network:

https://en.wikipedia.org/wiki/Sun_Microsystems#History


OK, Unix was probably available to Knuth, but the task given to Knuth was not to showcase already-written programs! Had he done so, it would have been claimed that he failed to do what was asked of him.

Even today, if you were given exactly the same task, with the goal of producing the most efficient, self-contained solution while caring about the limitations of the hardware available to you (e.g. because your algorithm has to handle hundreds of billions of words of input), you'd probably still end up with something closer to what Knuth did than to what McIlroy did.

Which doesn't mean that McIlroy's pipeline isn't brilliant. But it's also not obvious, i.e. not something a "normal user" would just "know":

- even if you knew that "the tr command translates characters to characters", did you know that you could (and must) write

   tr -cs A-Za-z '
   '
to perform the first operation of the six? What -c does? What -s does? That you could, and in fact had to, make the command line contain a literal newline? I bet a lot of today's Unix users would still not know that one.

- did you know what the fifth stage, "sort -rn", was supposed to do? Would you know that you have to sort "numerically" (-n), in reverse order (-r), and that it would "work"?

- "sed ${1}q" how many people even today would know that one?

And after all that, the first of the two sorts has to sort a file that is as big as the original input! If you have hundreds of gigabytes of input, you need at least that much space again just to sort it. McIlroy's approach is a good one for a one-off program, or for input that isn't too big, and only if you already know that you can use these commands the way he used them. But it's still not "a program" in the same sense that Knuth's program is.

Knuth's algorithm would, unsurprisingly, handle huge inputs orders of magnitude more efficiently. McIlroy was aware of that and intentionally hand-waved it away in his "critique." Read the original text:

https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...

But the major point is still this: Knuth's task was not "use the existing programs" (or libraries) but "write a program" that does what was specified. A fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

And once that is done, McIlroy's code would still be less readable, less efficient, and worse overall.

Which, on the other hand, doesn't mean that for some purposes "worse" isn't "better": https://yosefk.com/blog/what-worse-is-better-vs-the-right-th... But for some purposes only "better" works and "worse" simply doesn't, e.g. when the scale of the problem is big enough. And Knuth teaches us how to solve such harder problems, and presents the complete solution rather than doing tricks ("just call that library/executable whose implementation and real limitations I'm not going to explain to you").

And given the misunderstanding of the difference between showing how something is implemented (most efficiently) and the "just use pre-written tool X" approach, I understand even better why Knuth uses assembly in his "The Art of Computer Programming" books.


> But it's also not obvious, i.e. not something a "normal user" would "know":

> - even if you knew that "tr command translates... "sed ${1}q" how many people even today would know that one?

Are you suggesting it's ever been more likely for people to understand how to manage a trie structure in Pascal than use Unix command line tools? Or look flags up in the manpages?

Personally speaking, I'm comfortable doing both, but can't imagine many scenarios where I'd rather have ten pages of a custom data structure than six lines of shell. (And they all involve either high volumes of data or situations where I can't easily get to the process-level tools.)

> The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

If you're including the code that provides the surface abstractions, where do you draw that line? If the code for sort, uniq, etc. is fair game, why not the code for the shell that provides the pipelining, the underlying OS, the file system, the firmware on the disk? After all, who's to say that the programs in the pipeline don't just run one after another with temporary files written to disk, rather than in parallel? (Which I've seen happen in reality.)

The same is true for the other side, of course. The 'fair comparison' could easily require Knuth's solution to include the source for weave/tangle, TeX/Metafont/CMR, the OS, etc.

> And once that is done, McIlroy's code would still be less readable, less efficient, and worse overall.

What definition of 'worse' are you using?

* I expect sort/uniq/tr/sed to be more well tested and understood than a bespoke program.

* If there are issues with the program, it'll be easier to find skills/time to maintain a shell pipeline than custom trie-balancing code written in Pascal. (Even when it sits alongside a prose description of the same.)

* The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

* It's possible that the custom solution runs faster, but not obvious without testing. (None of which is worthwhile outside of demonstrated need.)

Point being: it's very easy to find a definition of 'worse' that applies more to the custom solution than to the pipeline.


> The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

That argument points to the fact that your “view” of the whole topic changes the assumed definition of the problem that was given to Knuth to solve. Read the original text once again: he was supposed to illustrate how “literate programming” could be used while writing a program which solves a given problem. It was definitely not “write an example of calling the existing pre-written programs”.

And, of course, it was all in 1986, definitely not “to target a machine which can be found in a retail store in 2018.”

McIlroy behaved as if the goal had been different from what it was.


> McIlroy behaved as if the goal had been different from what it was.

How would you feel about McIlroy's solution if it was semantically exactly the same, but written in a literate approach? (Essentially a version of 'weave/tangle', but for shell scripts.)
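Something roughly like this, purely as a sketch (the section numbering and the function names are invented here for illustration; in a real weave the prose would be typeset rather than left in comments, and each chunk could be presented and discussed separately):

    #!/bin/sh
    # 1. The problem: print the $1 most common words of standard input.
    #    The solution is a pipeline of small, named steps.

    # 2. Tokenize: one word per line. -c complements the letter set and
    #    -s squeezes runs, so every maximal run of non-letters becomes a
    #    single newline (modern tr accepts the '\n' escape).
    tokenize() { tr -cs A-Za-z '\n'; }

    # 3. Normalize: fold case so "The" and "the" count as the same word.
    normalize() { tr A-Z a-z; }

    # 4. Count: sort brings equal words together, uniq -c counts each run.
    count() { sort | uniq -c; }

    # 5. Rank and report: order by count, descending, and keep $1 lines.
    rank() { sort -rn; }
    report() { sed "${1}q"; }

    tokenize | normalize | count | rank | report "$1"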


How would you feel if somebody presented “six clicks in Excel and SQL Server” that eventually produce the same result? The starting goal was simply not “show how to use and combine external programs,” even if you need some skill to combine them. It’s exactly the same kind of missed fulfillment of the given starting task.


The reason I asked about a literate programming version of the shell script is that it speaks directly to Knuth's original stated goal: "I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose"

In the context of that requirement, it's the use of literate programming that's more of a concern than the specific implementation. (Which is why I asked about a literate version of the shell pipeline.)

Earlier in the thread, you also mention this concern around data volumes:

> If you have hundreds of gigabytes of input, you need at least that much space again just to sort it. McIlroy's approach is a good one for a one-off program, or for input that isn't too big,

There, your concern is not justified by the stated requirements of the problem: "I did impose an efficiency constraint: a user should be able to find the 100 most frequent words in a twenty-page technical paper"

I do think McIlroy failed to solve the problem of demonstrating the value of literate programming, but I'm not sympathetic to arguments that he should've used more complex algorithms or relied on less library code. This is particularly the case when the additional complexity is only relevant in cases that don't fall into the given requirements.

(A literate program that uses SQL server or Excel might be an interesting read....)


> The reason I asked about a literate programming version of the shell script is that it speaks directly to Knuth's original stated goal: "I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose" In the context of that requirement, it's the use of literate programming that's more of a concern than the specific implementation.

And McIlroy's "solution" is provably not the "best possible solution" if you are interested in the algorithms, algorithmic complexity, the resources used, you know, all the topics studied by people doing computer science. All these topics are still relevant today.

That is the domain that was of interest to both Bentley and Knuth, and McIlroy "sabotaged" the whole thing by presenting effectively only a list of calls to stand-alone programs which he hadn't developed himself and whose source he didn't present. Even without looking at that source, just by analyzing the best possible implementations of such programs, every student of computer science can show that McIlroy's solution is worse.

If you carefully read the original text (and if you understand the topics of computer science), you can recognize that McIlroy was aware of the algorithmic superiority of Knuth's solution.

Knuth definitely and provably won.


While I have been watching this subthread with increasing dread, I feel I should point out it was not a competition or contest to be "won" -- Knuth wrote an interesting program, and McIlroy wrote an interesting review of it.


> it was not a competition or contest to be "won"

Sure, it wasn't a competition. The fact remains: McIlroy criticized Knuth's presentation of complete algorithms which effectively solved the specified problem, by presenting just a sequence of calls to programs that must implement provably worse algorithms. In an ACM column whose topic was... algorithms, edited by Jon Bentley.

So if you consider that two sides presented their arguments regarding the algorithms related to the specific problem, we can still say that Knuth "won" that "dispute."


> presenting just a sequence of calls ... that must implement provably worse algorithms.

You've never really established why this matters given that the goal of the challenge was to present the value of literate programming.

The goal wasn't an optimal algorithm or minimal resource consumption - the goal was to demonstrate the value of literate programming on a small data processing problem.

This is a very different problem than writing an optimal or highly scalable algorithm.


> The goal wasn't an optimal algorithm or minimal resource consumption - the goal was to demonstrate the value of literate programming on a small data processing problem.

No. You obviously still haven't read the column and the article that explained what the goal actually was. The actual goal was to demonstrate Knuth's WEB system, which in 1986 existed only for Pascal. That was what Bentley asked Knuth to do (quoting from the article "Programming pearls: literate programming", Communications of the ACM, Volume 29, Issue 5, May 1986, pages 364-369):

"for the first time, somebody was proud enough of a substantial piece of code to publish it for public viewing, in a way that is inviting to read. I was so fascinated that I wrote Knuth a letter, asking whether he had any spare programs handy that I might publish as a “Programming Pearl.” But that was too easy for Knuth. He responded, “Why should you let me choose the program? My claim is that programming is an artistic endeavor and that the WEB system gives me the best way to write beautiful programs. Therefore I should be able to meet a stiffer test: I should be able to write a superliterate program that will be noticeably better than an ordinary one, whatever the topic. So how about this: You tell me what sort of program you want me to write, Crin d I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose’--at least the best by current standards.”"

So, in the context of demonstrating Knuth's WEB on a substantial piece of code, the only modification of the request was that Knuth wasn't allowed to use a program he had already written; he had to write a new one! (So in effectively all its premises, the starting goal was exactly the opposite of what McIlroy then showed!)

So the goal was to write and present a wholly new program in Knuth's WEB which, under the standards of evaluating the quality of a solution widely accepted by computer scientists, would be "the best solution." Which is exactly about the optimality of the algorithms, resource use, etc.

If you still don't fully appreciate the context of Knuth's program, do search for all the other columns and computer science books written by both Bentley and Knuth -- the topics of both were never "how to use existing programs" but how to develop/use the best algorithms.


Thanks for the response.

> You obviously still haven't read the column and the article that explained what the goal actually was. ... > do search for all the other columns and computer science books written by both Bentley and Knuth

Since this is public, I'll conclude by noting here that I had indeed read both of the articles, and a bunch of other text by both Bentley and Knuth besides. (In fact, the Programming Pearls books are particularly high on my list of recommendations...)


Then all I can conclude is that you claim to have read the articles but ignored their content (if you did read them before writing here); otherwise you would not present here a false interpretation of what Knuth's specific task in these specific articles explicitly was.


Well put.

I think it's possible and desirable to simultaneously respect both approaches for their respective merits. Maybe ironically for this profession, the choice isn't either/or.


Absolutely agree - Unix was ubiquitous in academia by 1986. I myself learned C on a VAX running BSD 4.2 in 1986.


I never saw that as a criticism.

Just an interesting counterpoint.



