Hacker News new | past | comments | ask | show | jobs | submit login
Knuth – Estimating the Efficiency of Backtrack Programs(1975) [pdf] (ams.org)
64 points by lamchob 33 days ago | hide | past | web | favorite | 6 comments

Note that this paper (Knuth's “P69”) was reprinted as Chapter 6 of Selected Papers on Analysis of Algorithms, and this material is covered (in compressed form) in Pre-Fascicle 5B of The Art of Computer Programming.[1] This will be part of Fascicle 5, due to be published later this year, and eventually the first one-third of Volume 4B (the middle third, namely Fascicle 6, has already been published).

Pre-Fascicle 5B is[2] entirely about (what Knuth considers an “introduction to”) backtrack programming, and Pre-Fascicle 5C follows it up with a special low-level technique for making backtracking more efficient, namely Dancing Links. And Fascicle 6 is about Satisfiability (aka SAT), the premier instance of a problem solved by “backtracking”. All of these are already available!

If you're interested in algorithms for combinatorial problems you should check out whatever Knuth has published since he started working on Volume 4; there's a lot of fresh material and he has a unique perspective of his own.

Edit: What I mean by unique perspective: For example, unlike mainstream Theory-CS, Knuth is not primarily interested in the asymptotic complexity of problems (despite being the “father” of analysis of algorithms and the popularizer of Big-O notation in computer science), but is interested rather in real programs that can be executed on real computers, and in the mathematical analysis thereof. Given a problem, what I can imagine a mainstream theoretical computer scientist doing is coming up with a suitable abstraction that is mathematically elegant, reducing the problem to/from other known problems, getting asymptotic upper and lower bounds on complexity, not minding if one ends up with a “galactic algorithm”, etc. What I can imagine Knuth doing is writing an assembly program (perhaps in MIX/MMIX, his best approximation to the wide variety of actual machine languages in existence), then doing a no-holds-barred mathematical analysis of its running time (not just Big-O but including the constants for the leading terms: here Big-O is used only to suppress less-useful information), then comparing this running time against empirical estimates derived from writing and running a C (or CWEB) program. See some talks on Sedgewick's site for examples of the difference (sorry can't find the exact reference right now).

[1]: https://cs.stanford.edu/~knuth/news19.html#targetText=5b (search for “5b” on the page)

[2]: https://en.wikipedia.org/w/index.php?title=The_Art_of_Comput...

To put it differently, Knuth's perspective is that of someone who's fundamentally a programmer at heart, but also happens to be a supremely competent/skilled mathematician. There are not many people like that (Bill Gosper comes to mind).

In this paper, which is worth reading even if you've never heard of backtracking before — Knuth introduces backtracking with a very simple example that he then deeply analyzes to great effect — you can get a sense of this from the very first page:

> Sometimes a backtrack program will run to completion in less than a second, while other applications seem to go on forever. The author once waited all night for the output from such a program, only to discover that the answers would not be forthcoming for about 10^6 centuries. A "slight increase" in one of the parameters of a backtrack routine might slow down the total running time by a factor of a thousand; conversely, a "minor improvement" to the algorithm might cause a hundredfold improvement in speed; and a sophisticated "major improvement" might actually make the program ten times slower. These great discrepancies in execution time are characteristic of backtrack programs, yet it is usually not obvious what will happen until the algorithm has been coded and run on a machine.

> Faced with these uncertainties, the author worked out a simple estimation procedure in 1962, designed to predict backtrack behavior in any given situation. […] during subsequent years, extensive computer experimentation has confirmed its utility.

He spends 4 pages (one-fourth of the paper) gently introducing backtracking, then a page making a mathematical formalism out of it, then the method falls out almost naturally. The method is elegant and almost too good to be true: instead of doing your backtracking (systematically trying every possible way to extend a partial solution), at each step just pick a random way to extend the partial extension. Continue till you get stuck or find a solution. (You're basically going down just one random path of the search tree.) The cost of this path, weighted by appropriate probabilities, is an estimate of the cost of the backtracking program.

Two corollaries show the worth of the formalism, then he refines the method to work better in practice, etc. The motivation is always better tools for the working programmer, but he's happy to, e.g. introduce generating functions and differentiate them, when needed.

> Furthermore, the game [hand calculation] is worthwhile, because it gives insight into the behavior of the algorithm, and such insight is of great use later when the algorithm is eventually programmed; good ideas about data structures, and about various improvements in the backtracking strategy, usually suggest themselves.

Another notable mathematician and exemplary programmer is D. J. Bernstein.

Knuth was Sedgewick's Ph.D. advisor and if I remember correctly Sedgewick's Ph.D. was on a very careful analysis of the actual running time of quicksort and it's variations.

Edit: yes it was, see [1]

[1] Sedgewick, R.: Quicksort. Stanford University, Stanford Computer Science Report STAN-CS-75-492, Ph. D. Thesis, May 1975

Knuth mentions this paper in Things a Computer Scientist Rarely Talks About. He said that the success of randomization for quantitative problems led him to use similar techniques for qualitative problems -- like studying the Bible by studying only a "random" sample of verses. He ended up choosing to focus on verse 3:16 of each book (actually, the 16th verse after the start of the 3rd chapter, provided such a verse exists). Questions about whether this was an appropirate sample and whether this was a good idea at all are discussed extensively in TACSRTA.

Perfect timing, I just realized this evening that I might need backtracking https://gist.github.com/cellularmitosis/aa46e1e88c3bcd2db83c...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact