How many people in the world could claim something similar?
"My job is to go beyond correctness, to an analysis of such things as the program's running time: I write down a recurrence, say, which is supposed to represent the average number of comparisons made by that program on random input data. I'm 100% sure that my recurrence correctly describes the program's performance, and all of my colleagues agree with me that the recurrence is "obviously" valid. Yet I have no formal tools by which I can prove that my recurrence is right. I don't really understand my reasoning processes at all! My student Lyle Ramshaw began to create suitable foundations in his thesis (1979), but the problem seems inherently difficult."
There are formal tools to prove recurrences are correct.
First, any proof assistant like Coq or Isabelle could be used to verify recurrences.
Papers like "A Machine-Checked Proof of the Average-Case Complexity of Quicksort in Coq" show that this is not limited to just the recurrence, but can also be applied to programs and random input data. The Certified Complexity (CerCo) project does similar work.
Maybe I misunderstood what Knuth meant by "formal tools" or "recurrence is right". Here is a link to the abstract of Ramshaw's thesis:
Knuths response to Tarjans Q15 was particularly interesting since he was able to illustrate his insight with a concrete example;
>Thus I think the present state of research in algorithm design misunderstands the true nature of efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to be published.
> Another issue, when we come down to earth, is the efficiency of algorithms on real computers. As part of the Stanford GraphBase project I implemented four algorithms to compute minimum spanning trees of graphs, one of which was the very pretty method that you developed with Cheriton and Karp. Although I was expecting your method to be the winner, because it examines much of the data only half as often as the others, it actually came out two to three times worse than Kruskal's venerable method. Part of the reason was poor cache interaction, but the main cause was a large constant factor hidden by O notation.
While phk has a point, the dig at Knuth's work feels a bit unfair. In any case, I'm curious to see whether future editions of TAOCP note this caveat in its analysis.
I'll give it a read.
His reply was yes, but not many. He mentioned a guy in Germany who goes over his books again and again looking for errors. This guy is the leading recipient of Knuth's checks.
That was the question I'd wanted to ask Knuth for several years. It was a good question for a thirty second interaction and photo.
I call these types of people "moles" because I imagine they basically never see daylight. Thank god for these people. If I ever have the time and will power to study TAOCP it will be a very relaxed mindset, knowing that with a high probability I can be confident that the book, exercises and solutions are error free.
This question made me pause and ask another question: Who are the geeks of the early 21st century, specifically, 2014? I fear in my heart that software development is becoming less and less of a "geek" hideout over time.
I did, however, go to Maker Faire last weekend. It was a geek wonderland. There were so many cool projects on display and a huge emphasis on getting you started. It shocks me when I hear people say how "commercialized" and "mainstream" it's gotten -- for me, it seems like the hardware market is just starting to open up to some really cool innovation, and there's a huge community of people doing it right in their backyard.
A more long-term question might be: what's going to be the next "geek" hideout after hardware?
Picking out a line:
"For simplicity, let me say that people like me are "geeks," and that geeks comprise about 2% of the world's population."
> My secretary prints out all messages addressed to email@example.com or firstname.lastname@example.org, so that I can reply with written comments when I have a chance.
He just does not USE email himself.
Andrew Binstock, Dr. Dobb's suddenly asks, "At the ACM Turing Centennial in 2012, you stated that you were becoming convinced that P = N P. Would you be kind enough to explain your current thinking on this question, how you came to it, and whether this growing conviction came as a surprise to you?"
Don Knuth replies, "As you say, I've come to believe that P = N P, namely that there does exist an integer M and an algorithm that will solve every n-bit problem belonging to the class N P in nM elementary steps...."
Scott Aaronson, MIT immediately has an aneurysm and after all of the convulsions and frothing is over, when the EMT's leave, they prop him back up in his chair and he only has the wherewithal to ask, "Would you recommend to other scientists to abandon the use of email, as you have done?"
> Instructions that have completed execution and have not yet been committed are analogous to cars that have gone through our hypothetical repair shop and are waiting for their owners to pick them up. However, all analogies break down, and the world of automobiles does not have a natural counterpart for the notion of speculative execution. That notion corresponds roughly to situations in which people are led to believe that their cars need a new piece of equipment, but they suddenly change their mind once they see the price tag, and they insist on having the equipment removed even after it has been partially or completely installed.
Few favorite tidbits:
- Only 2% of population seem to have aptitude for computer science
- Knuth writes 2 programs per week even today
- He is leaning towards P=NP
- Most software projects fails because they are not entrusted to geeks
- 50 years in writing 3000 page bible of CS. I would suppose it takes probably same time to read all of that material.
(edit: there is a note saying "as they become available". I guess TAOCP is only partially portable so far, more to come.)
(brag: I got a decimal dollar by check from Professor Knuth for finding a typo in 2010. Nothing clever, just a cheap typo, but I was happy.)
I got $2.56 for an error in content in one of his books. Actually the very first word in the very first sentence on the very first page (well, Arabic 1) was wrong. At least nobody can claim that he didn't get that far!
I got another $0.32 for a bug report which he actually demonstrated to be invalid, though he counted it as some kind of input or improvement. Oh, and he threw in a T-Shirt with the MMIX instruction set!
The third bug report was turned down without explanation (I think he actually replied "sorry, but no cigar").
The fourth bug report was also turned down and I even felt a bit insulted: his MMIX book says on the back cover that all programs are in the public domain, even though they all carry copyright notices with restrictions.
Knuth accused me quite harshly (IMO) to be some kind of Stallman fan talking nonsense about licenses and did not acknowledge the bug.
Strange. Knuth appears to be a significant GNU supporter: http://www.gnu.org/thankgnus/2014supporters.html
I hope you framed that check!
The reason for the change: http://www-cs-faculty.stanford.edu/~uno/news08.html
"Another issue, when we come down to earth, is the efficiency of algorithms on real computers. As part of the Stanford GraphBase project I implemented four algorithms to compute minimum spanning trees of graphs, one of which was the very pretty method that you developed with Cheriton and Karp. Although I was expecting your method to be the winner, because it examines much of the data only half as often as the others, it actually came out two to three times worse than Kruskal's venerable method. Part of the reason was poor cache interaction, but the main cause was a large constant factor hidden by O notation."
Question: in what ways can having a large constant factor in an algorithm cause such a dramatic performance difference on modern hardware. Can anyone give examples? I can only assume that the "large" constant factor in this example was large enough to be always overflowing, and needed some bignum functions to work.
Algorithm A has complexity like AO(n) and algorithm B goes like BO(n^2) (just made-up examples). So one would say that A is better. But the constant A happens to be very large, so that, until n becomes very large, B is actually faster, because the constant B is so much smaller. So conventional analysis would say that A is better in general, but its actual poor performance is "hiding" in the huge constant A, and is worse than B in real situations, where n never gets huge enough for A's complexity advantage to become relevant.
The difference was that the string was large enough to force the machine to swap memory out to disk and the slow version was exactly pessimal regarding the machine's paging algorithm. It would touch a location on a page, forcing it to be brought back into memory, and then wouldn't touch the page again until it had touched all of the other pages of the string. The fast program did a lot of work on one page before going to the next and finished relatively quickly, the slow program thrashed. The same thing can be seen with modern machines' memory caching.
However, this kind of complexity analysis is intended to abstract over machine specific details like cache line sizes and paging algorithms, to let algorithms be compared in the abstract. Usually, if you don't trip over weird edge cases and if the problem is large enough (an assumption that is part of the analysis), an algorithm with a better O notation will kind, sorta, maybe always be faster than one with a worse.
(I think you have a misapprehension: the "large constant factor" is part of the simplification process from the hairy formula that actually describes an algorithms' performance (see Knuth's discussion of a "recurrence relation") to the abstract summary of Big-O: O(f) = n, O(g) = n^2,.... It is not any factor in the program itself. For example, O(f) = n is true if f(x) = x and if f(x) = 2x. But the second one will be slower.)
"Out of the ~250 programs I wrote last year, 2-3 would have benefited from being written in a functional style."
"With 1 or even 2 hands tied behind your back it’s hard to do anything dangerous."
I remember looking this up once and couldn't find much. I guess he figures everything that can be written in functional style can be written in imperative style so why bother.
The book can be compiled into the production-quality renderer http://pbrt.org/
I think of it as analogous in feeling and usage to how a musician practices a difficult passage slowly, with a metronome, perfecting it at gradually increasing speeds until they can play it perfectly at speed. It's tedious and time-consuming, so I tend only to use it when I feel the need, but when I do, I'm glad it's there.
As I've written more and more about code I wrote, I've found myself inventing weird ways of making sure the code samples are compilable, runnable and correct. I've done it enough that now I'm considering using a literate programming tool next time.
(To be clear, I'm not referring to traditional API documentation with examples. There exist good tools for testing examples without resorting full-hog to literate programming. I am referring to more general prose whose aim is less specific than documenting an API.)
I haven't really used either, but they seem to go a bit in that direction. Although they are strictly sequential, I guess, so maybe not.
I'd definitely say it's unpopular, but hardly goofy from a pedagogical point of view. C Interfaces and Implementations is such a beautiful book and the literate programming style really compliments it well.
(1) the primary emphasis is on the description and commentary rather than on the code;
(2) the order of presentation is determined by the description rather than the program.
The first is a useful and valuable tool. The second is an artefact of using a language like Pascal which has a strict inflexible order to its code and no suppor for modules. If you have a decent language then literate programming is just a different kind of comment syntax.
"My work on introduced me to applications where 'correctness' cannot be defined. How do I know, for example, that my program for the letter A produces a correct image? I never will; and I've learned to live with that uncertainty. On the other hand, when I implemented the routines that interpret specifications and draw the associated bitmaps, there was plenty of room for rigor. The algorithms that go into font rendering are among the most interesting I've ever seen."