
Questions for Donald Knuth - robinhouston
http://www.informit.com/articles/article.aspx?p=2213858&WT.mc_id=Author_Knuth_20Questions
======
ghswa
"I did write a compiler manual in 1958, which by chance was actually used as
the textbook for one of my classes in 1959(!)"

How many people in the world could claim something similar?

~~~
reedlaw
When I first read that, I assumed he was teaching the class. But after
checking Wikipedia I found that he received his bachelor of science degree in
1960. Now I'm astonished.

~~~
noselasd
Yes, that makes it a bit more remarkable -- I can remember at least 3 classes
at uni where the professor had written the book we were using.

------
jbapple
This part confused me:

"My job is to go beyond correctness, to an analysis of such things as the
program's running time: I write down a recurrence, say, which is supposed to
represent the average number of comparisons made by that program on random
input data. I'm 100% sure that my recurrence correctly describes the program's
performance, and all of my colleagues agree with me that the recurrence is
"obviously" valid. Yet I have no formal tools by which I can prove that my
recurrence is right. I don't really understand my reasoning processes at all!
My student Lyle Ramshaw began to create suitable foundations in his thesis
(1979), but the problem seems inherently difficult."
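
For concreteness, the kind of recurrence Knuth has in mind is presumably
something like the classic one for quicksort's average comparison count on a
random permutation (my guess at an example; it isn't spelled out in the
interview):

    C(0) = 0, \qquad C(n) = n - 1 + \frac{2}{n} \sum_{k=0}^{n-1} C(k) \quad (n \ge 1),

which solves to C(n) = 2(n+1)H_n - 4n, about 1.39 n lg n. The step with no
formal footing is the claim that this recurrence faithfully models the
program's behavior on random input, not the (routine) mathematics of solving
it.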

There are formal tools to prove recurrences are correct.

First, any proof assistant like Coq or Isabelle could be used to verify
recurrences.

Papers like "A Machine-Checked Proof of the Average-Case Complexity of
Quicksort in Coq" show that this is not limited to just the recurrence, but
can also be applied to programs and random input data. The Certified
Complexity (CerCo) project does similar work.

Maybe I misunderstood what Knuth meant by "formal tools" or "recurrence is
right". Here is a link to the abstract of Ramshaw's thesis:

[http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=ht...](http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA086916)

~~~
pavpanchekha
I think there is a bit of a disconnect here between the old guard of
verification work and the avant-garde. Partly this is a difference of tools
and terminology—current work in verification has a type-theory feel, and
Floyd-style axiomatic semantics are out of vogue. It is also the role of old
professors to slowly lose currency with the cutting edge of a field. I
wouldn't think too much of Knuth's comments.

------
YAYERKA
What an amazing exchange of ideas!

Knuth's response to Tarjan's Q15 was particularly interesting, since he was
able to illustrate his insight with a concrete example:

> Thus I think the present state of research in algorithm design
> misunderstands the true nature of efficiency. The literature exhibits a
> dangerous trend in contemporary views of what deserves to be published.

> Another issue, when we come down to earth, is the efficiency of algorithms
> on real computers. As part of the Stanford GraphBase project I implemented
> four algorithms to compute minimum spanning trees of graphs, one of which
> was the very pretty method that you developed with Cheriton and Karp.
> Although I was expecting your method to be the winner, because it examines
> much of the data only half as often as the others, it actually came out two
> to three times worse than Kruskal's venerable method. Part of the reason was
> poor cache interaction, but the main cause was a large constant factor
> hidden by O notation.

~~~
jallmann
Reading that was rather vindicating. The same concern is raised by phk in his
B-heap essay [1], where he criticizes Knuth for the exact same reason -- that
algorithm analysis has become disconnected from the characteristics of the
underlying hardware.

While phk has a point, the dig at Knuth's work feels a bit unfair. In any
case, I'm curious to see whether future editions of TAOCP will note this
caveat in their analysis.

[1]
[http://queue.acm.org/detail.cfm?id=1814327](http://queue.acm.org/detail.cfm?id=1814327)

~~~
jamesmiller5
I believe tilde notation is meant to bridge the gap between a purely
mathematical behaviour analysis and realistic expectations when using an
algorithm. Analysing an algorithm and determining that, say, a 2x input size
means 8x resource usage gives information that would otherwise be hidden in
big-O notation.

[http://introcs.cs.princeton.edu/java/41analysis/](http://introcs.cs.princeton.edu/java/41analysis/)
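
For instance, here's a crude doubling test in the spirit of that page (my own
toy sketch, not code from the book site):

    import math
    import time

    def quadratic_work(n):
        # A deliberately Theta(n^2) task: count ordered pairs the slow way.
        total = 0
        for i in range(n):
            for j in range(n):
                total += 1
        return total

    def doubling_test(f, n0, rounds):
        # Time f at n, 2n, 4n, ...; log2(T(2n)/T(n)) estimates the exponent
        # b in a running time of the form ~ a * n^b.
        prev, n = None, n0
        for _ in range(rounds):
            start = time.perf_counter()
            f(n)
            t = time.perf_counter() - start
            if prev is not None:
                ratio = t / prev
                print(f"n={n:6d}  T(2n)/T(n)={ratio:5.2f}  est. b={math.log2(ratio):4.2f}")
            prev, n = t, n * 2

    doubling_test(quadratic_work, n0=200, rounds=5)  # ratios approach 4, so b is about 2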

~~~
Intermernet
Thank you, you partly answered my question elsewhere in this thread.

I'll give it a read.

------
redshirtrob
I've been to a couple of Donald Knuth's Christmas Tree Lectures. While he was
signing my copy of 'TAOCP Volume 1', I asked him if he thought anyone in the
world had read all of TAOCP, himself excluded. I didn't clarify, but he seemed
to assume (correctly) that by 'read' I meant 'read and comprehended.'

His reply was yes, but not many. He mentioned a guy in Germany who goes over
his books again and again looking for errors. This guy is the leading
recipient of Knuth's checks.

That was the question I'd wanted to ask Knuth for several years. It was a good
question for a thirty-second interaction and photo.

~~~
laichzeit0
> His reply was yes, but not many. He mentioned a guy in Germany who goes over
> his books again and again looking for errors.

I call these types of people "moles" because I imagine they basically never
see daylight. Thank God for these people. If I ever have the time and
willpower to study TAOCP, it will be with a very relaxed mindset, knowing
with high confidence that the book, exercises, and solutions are error-free.

------
natural219
"So who were the geeks of the early 19th century?"

This question made me pause and ask another question: Who are the geeks of the
early 21st century, specifically, 2014? I fear in my heart that software
development is becoming less and less of a "geek" hideout over time.

I did, however, go to Maker Faire last weekend. It was a geek wonderland.
There were so many cool projects on display and a huge emphasis on getting
_you_ started. It shocks me when I hear people say how "commercialized" and
"mainstream" it's gotten -- for me, it seems like the hardware market is just
starting to open up to some really cool innovation, and there's a huge
community of people doing it right in their backyard.

A more long-term question might be: what's going to be the next "geek" hideout
after hardware?

~~~
njonsson
DIY biotech. Bespoke pets for the risk-averse, and bespoke personal physiology
for the rest.

------
hatred
Awesome read.

Picking out a line:

"For simplicity, let me say that people like me are "geeks," and that geeks
comprise about 2% of the world's population."

~~~
meowface
"I know that my books still aren't terribly easy to fathom, even for geeks.
But I could have made them much, much harder."

------
huskyr
I love the fact that Donald Knuth, one of the most knowledgeable minds in
computer science, hasn't had an email address since 1990. He had one for
fifteen years (1975-1990), but it killed his productivity, so he no longer
uses one.

[http://www-cs-faculty.stanford.edu/~uno/email.html](http://www-cs-faculty.stanford.edu/~uno/email.html)

~~~
ForHackernews
I don't think that says much about email itself. He has a secretary to deal
with his paper mail; there's no reason a secretary-curated email inbox
wouldn't work similarly.

~~~
adrianonantua
From [http://www-cs-faculty.stanford.edu/~uno/email.html](http://www-cs-faculty.stanford.edu/~uno/email.html):

> My secretary prints out all messages addressed to taocp@cs.stanford.edu or
> knuth-bug@cs.stanford.edu, so that I can reply with written comments when I
> have a chance.

He just does not USE email himself.

------
mcguire
I'm picturing this as all of them sitting around a big table in a hotel bar,
chatting while the writer records their questions and answers.

Andrew Binstock, Dr. Dobb's suddenly asks, "At the ACM Turing Centennial in
2012, you stated that you were becoming convinced that P = NP. Would you be
kind enough to explain your current thinking on this question, how you came to
it, and whether this growing conviction came as a surprise to you?"

Don Knuth replies, "As you say, I've come to believe that P = NP, namely that
there does exist an integer M and an algorithm that will solve every n-bit
problem belonging to the class NP in n^M elementary steps...."

Scott Aaronson, MIT immediately has an aneurysm and, after all of the
convulsions and frothing are over, when the EMTs leave, they prop him back up
in his chair and he only has the wherewithal to ask, "Would you recommend to
other scientists to abandon the use of email, as you have done?"

------
akkartik
I was surprised to hear that the hardest program he had to write was a
simulator for MMIX. This seems to have the scoop:
[http://mmix.cs.hm.edu/doc/mmix-pipe.pdf](http://mmix.cs.hm.edu/doc/mmix-pipe.pdf)

~~~
pdw
I like that he explains it using car analogies.

> Instructions that have completed execution and have not yet been committed
> are analogous to cars that have gone through our hypothetical repair shop
> and are waiting for their owners to pick them up. However, all analogies
> break down, and the world of automobiles does not have a natural counterpart
> for the notion of speculative execution. That notion corresponds roughly to
> situations in which people are led to believe that their cars need a new
> piece of equipment, but they suddenly change their mind once they see the
> price tag, and they insist on having the equipment removed even after it has
> been partially or completely installed.

------
sytelus
Thanks for sharing. This is _just_ amazing!

A few favorite tidbits:

- Only 2% of the population seems to have an aptitude for computer science

- Knuth writes 2 programs per week even today

- He is leaning towards P=NP

- Most software projects fail because they are not entrusted to geeks

- 50 years to write the 3,000-page bible of CS. I'd suppose it takes about as
long to read _all_ of that material.

------
aeberbach
In the Q&A Knuth mentions the TAOCP ebook conversion several times -- I did a
quick search and found only millions of pirate sites. Does anyone know of a
legitimate source where the ebooks can be bought? InformIT seemed likely, but
there are only two digital volumes available there.

(edit: there is a note saying "as they become available". I guess TAOCP has
only been partially ported so far, more to come.)

(brag: I got a decimal dollar by check from Professor Knuth for finding a typo
in 2010. Nothing clever, just a cheap typo, but I was happy.)

~~~
Tomte
My experience with bug reports to Knuth was mixed, although number one
obviously was the most important to me.

I got $2.56 for an error in content in one of his books. Actually, the very
first word in the very first sentence on the very first page (well, Arabic
page 1) was wrong. At least nobody can claim I didn't get that far!

I got another $0.32 for a bug report that he actually demonstrated to be
invalid, though he counted it as some kind of input or improvement. Oh, and he
threw in a T-shirt with the MMIX instruction set!

The third bug report was turned down without explanation (I think he actually
replied "sorry, but no cigar").

The fourth bug report was also turned down, and I even felt a bit insulted:
his MMIX book says on the back cover that all programs are in the public
domain, even though they all carry copyright notices with restrictions.

Knuth accused me quite harshly (IMO) of being some kind of Stallman fan
talking nonsense about licenses, and did not acknowledge the bug.

~~~
ssprang
> Knuth accused me quite harshly (IMO) of being some kind of Stallman fan
> talking nonsense about licenses, and did not acknowledge the bug.

Strange. Knuth appears to be a significant GNU supporter:
[http://www.gnu.org/thankgnus/2014supporters.html](http://www.gnu.org/thankgnus/2014supporters.html)

~~~
Tomte
I guess he had a bad day or whatever. Or maybe I read too much into what he
wrote.

------
Intermernet
I found the reply to Robert Tarjan interesting, and I wonder how many people
who conduct programmer job interviews would have guessed that the performance
predicted by Big-O analysis and the actual performance of an algorithm could
diverge so much!

"Another issue, when we come down to earth, is the efficiency of algorithms on
real computers. As part of the Stanford GraphBase project I implemented four
algorithms to compute minimum spanning trees of graphs, one of which was the
very pretty method that you developed with Cheriton and Karp. Although I was
expecting your method to be the winner, because it examines much of the data
only half as often as the others, it actually came out two to three times
worse than Kruskal's venerable method. Part of the reason was poor cache
interaction, but the main cause was a large constant factor hidden by O
notation."

Question: in what ways can a large constant factor in an algorithm cause such
a dramatic performance difference on modern hardware? Can anyone give
examples? I can only assume that the "large" constant factor in this example
was large enough to be always overflowing, and needed some bignum functions
to work.

~~~
mcguire
Jon Bentley (I think) described one situation once, where the problem was
swapping two substrings in a string (again, I think). One of the algorithms
was the double reversal trick[1] and the other wasn't; both were O(n). He
implemented both and ran them on a large string on his (small) machine. One
program finished in seconds while the other he terminated after several
minutes.
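
For reference, the double-reversal trick in miniature -- a sketch of rotating
a block left by k positions (Bentley's exact setup may have differed):

    def rotate_left(a, k):
        # Reverse the first k items, reverse the rest, then reverse the
        # whole list: [A B] -> [A' B'] -> [B A]. O(n) time, O(1) extra
        # space, and memory is touched in long sequential runs.
        # Assumes a is non-empty.
        def reverse(lo, hi):  # reverse a[lo:hi] in place
            while lo < hi - 1:
                a[lo], a[hi - 1] = a[hi - 1], a[lo]
                lo, hi = lo + 1, hi - 1
        k %= len(a)
        reverse(0, k)
        reverse(k, len(a))
        reverse(0, len(a))
        return a

    print(rotate_left(list("abcdefg"), 3))  # ['d', 'e', 'f', 'g', 'a', 'b', 'c']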

The difference was that the string was large enough to force the machine to
swap memory out to disk, and the slow version was exactly pessimal with
respect to the machine's paging algorithm. It would touch a location on a
page, forcing that page to be brought back into memory, and then wouldn't
touch the page again until it had touched all of the other pages of the
string. The fast program did a lot of work on one page before going to the
next and finished relatively quickly; the slow program thrashed. The same
thing can be seen with modern machines' memory caching.
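
You can reproduce a milder version of this on any modern machine by sweeping
the same array in row order versus column order -- identical O(n^2) work,
very different locality. A quick sketch (exact timings will vary by machine):

    import time
    import numpy as np

    a = np.random.rand(5000, 5000)  # ~200 MB, stored row-major (C order)

    def timed(f):
        start = time.perf_counter()
        f()
        return time.perf_counter() - start

    # Same amount of arithmetic; only the memory access order differs.
    rows = timed(lambda: sum(float(a[i].sum()) for i in range(a.shape[0])))
    cols = timed(lambda: sum(float(a[:, j].sum()) for j in range(a.shape[1])))
    print(f"row-order sweep: {rows:.2f}s   column-order sweep: {cols:.2f}s")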

However, this kind of complexity analysis is intended to abstract over
machine-specific details like cache line sizes and paging algorithms, to let
algorithms be compared in the abstract. Usually, if you don't trip over weird
edge cases and if the problem is large enough (an assumption that is part of
the analysis), an algorithm with a better O bound will kinda, sorta, maybe
always be faster than one with a worse one.

(I think you have a misapprehension: the "large constant factor" is part of
the simplification process from the hairy formula that actually describes an
algorithm's performance (see Knuth's discussion of a "recurrence relation") to
the abstract summary of big-O: f = O(n), g = O(n^2), .... It is not any
constant appearing in the program itself. For example, f = O(n) is true both
when f(x) = x and when f(x) = 2x, but the second one will be slower.)
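
To make that concrete, here's a toy comparison of hypothetical operation
counts (invented constants, purely illustrative):

    import math

    # An asymptotically better O(n) algorithm with a large constant factor
    # versus an O(n log n) algorithm with a small one.
    def better_big_o(n):
        return 50 * n                # f(n) = 50n, so f = O(n)

    def worse_big_o(n):
        return 2 * n * math.log2(n)  # g(n) = 2n lg n, so g = O(n log n)

    for n in (10, 1_000, 1_000_000, 100_000_000):
        print(f"n={n:>11,}  O(n): {better_big_o(n):>13,.0f} ops"
              f"   O(n log n): {worse_big_o(n):>13,.0f} ops")

    # The O(n) algorithm only wins once 50 < 2 * lg(n), i.e. n > 2**25
    # (about 33 million); below that the "worse" big-O is actually faster.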

[1] [https://tecnocode.co.uk/2008/11/19/string-swap-algorithms/](https://tecnocode.co.uk/2008/11/19/string-swap-algorithms/)

------
thebear
Given that functional programming has been all the rage lately, and that its
advocates tend to emphasize its mathematical roots, I would find it
interesting to get Donald Knuth's take on the subject. A web search came up
empty for me. Does anybody know of any statements he's made on functional
programming?

~~~
laichzeit0
Some quotes of Knuth on functional programming:

"Out of the ~250 programs I wrote last year, 2-3 would have benefited from
being written in a functional style."

"With 1 or even 2 hands tied behind your back it’s hard to do anything
dangerous."

I remember looking this up once and couldn't find much. I guess he figures
that everything which can be written in a functional style can also be
written in an imperative style, so why bother.

------
justin66
The unpopularity and inherent goofiness of literate programming, and the way
Knuth always mentions it in interviews, feel like a sort of elephant in the
room whenever a new interview appears.

~~~
kcorbitt
The closest thing I've seen to literate programming "in the wild" is the
IPython notebook concept, which encourages you to intersperse your code with
explanatory text to give it context and structure (e.g.
[http://nbviewer.ipython.org/github/corbt/city-weather/blob/m...](http://nbviewer.ipython.org/github/corbt/city-weather/blob/master/time_change.ipynb)).
I don't think it's the right fit for all (or even most) programming, but I
like it for that class of programming because often a reader is more
interested in _why_ you chose to do what you're doing, rather than _what_ the
code actually does.

~~~
burntsushi
I always understood literate programming as a tool for _teaching_ and not as a
tool for writing software.

As I've written more and more about code I wrote, I've found myself inventing
weird ways of making sure the code samples are compilable, runnable and
correct. I've done it enough that now I'm considering using a literate
programming tool next time.

(To be clear, I'm not referring to traditional API documentation with
examples. There exist good tools for testing examples without going
whole-hog into literate programming. I am referring to more general prose
whose aim is less specific than documenting an API.)
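
One of the "weird ways" I mean, roughly: a toy harness (hypothetical file
name, not any particular tool) that extracts fenced Python blocks from a
prose file and executes each one, so stale examples fail loudly:

    import re
    import subprocess
    import sys
    import tempfile

    def check_snippets(path):
        # Pull ```python fenced blocks out of the prose and run each one in
        # a subprocess; a nonzero exit code marks a broken example.
        text = open(path, encoding="utf-8").read()
        snippets = re.findall(r"```python\n(.*?)```", text, flags=re.DOTALL)
        for i, snippet in enumerate(snippets, 1):
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(snippet)
            result = subprocess.run([sys.executable, f.name], capture_output=True)
            print(f"snippet {i}: {'ok' if result.returncode == 0 else 'FAILED'}")

    check_snippets("article.md")  # "article.md" stands in for your own prose source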

------
mcguire
The paragraph that jumped out at me:

" _My work on introduced me to applications where 'correctness' cannot be
defined. How do I know, for example, that my program for the letter A produces
a correct image? I never will; and I've learned to live with that uncertainty.
On the other hand, when I implemented the routines that interpret
specifications and draw the associated bitmaps, there was plenty of room for
rigor. The algorithms that go into font rendering are among the most
interesting I've ever seen._"

------
TKMDbass
[http://www.safariflow.com/search/?q=Donald%20E.%20Knuth](http://www.safariflow.com/search/?q=Donald%20E.%20Knuth)

------
inappropriate
"Most of the techniques that have turned out to be important were originally
introduced for the wrong reasons!"

How inappropriate.

