Hacker News new | past | comments | ask | show | jobs | submit login
A Comparison of Four Algorithms Textbooks (2016) (porgionesanke.wordpress.com)
239 points by kerneldeveloper on Oct 7, 2017 | hide | past | favorite | 46 comments

The criticism of Knuth's use of MIX seems a bit off because:

+ Assembly language level instruction execution is a good way of talking about the running time of algorithms at a finer level of detail than Big O notation. Finer grain than Big O is helpful when analyzing and optimizing programs.

+ MIX is a good abstraction of the Von Neumann architecture and that's where the analysis of actual programs occurs.

+ MIX programs are still accessible to a general auidence 50 years after they were written. No actual programming language available in 1967 is still similarly accessible to such a degree.

+ MMIX is the successor of MIX...but in so far as learning MIX is an issue, it's more complex child probably is not an improvement.

The art of MIX is that it keeps the reader's focus on the fact that computer programs ultimately load registers, read memory, execute mathematical primatives, and branch no matter what object-oriented, functional, or type safe abstractions a high level language might offer.

Also, as I counted when this came up a few months ago[1], only a very tiny part of the TAOCP books is in MIX. For example, in Volume 4A, there are a grand total of 3 programs in MIX (and 2 of these in Answers to Exercises), over the 883 pages of the volume. Even in Volume 1 which has the most, there are only 36 small MIX programs over the 652 pages.

Knuth gives MIX programs only when it makes sense for a specific purpose (“…makes it possible to carry out meaningful studies of the effects of cache and RAM size and other hardware characteristics (memory speed, pipelining, multiple issue, lookaside buffers, the size of cache blocks, etc.) when comparing different schemes.”). He even gives C code sometimes, e.g. in Volume 2 when presenting a summary of the chapter on random numbers, there are five C snippets on pages 185–188.

Anyone who doesn't want to learn MIX can just ignore those few parts of TAOCP; there's enough value in those books without them. A good place to start may be the most recent fascicles[2].

[1]: https://news.ycombinator.com/item?id=14520230

[2]: http://www.cs.utsa.edu/~wagner/knuth/

This seems to be a point where reasonable people may differ. I generally go to Knuth when I'm really stuck--not just how to do something but sometimes how to frame the problem so it can be solved. In that sense MIX does not make much difference since it's the discussion and backing mathematics that are most interesting.

On the other hand outside of OS and embedded systems people very few people regularly look at assembler, so for many people it's no longer an effective lingua franca for sample implementations. It's a bit like reading Principia Mathematica in the original Latin--sure, some people can do it but it's not the best way to transfer information to a broad audience. Personally I would be happy with some variant of C--the syntax is very broadly understood and the language is close enough to the hardware that implementation concepts come across clearly.

I've got nothing against C. C was not around when Knuth published the first (1968), second (1969), and third (1973) volumes. [1] By inventing a programming language, Knuth was able to avoid syntactic changes between versions (e.g. K&R C looks odd for someone who started with one of the ANSI C's).

At the time, 'best practice' might have been Algol 60 or maybe Algol 68 because they were what it meant to be a standard language in those days even though implementations varied widely.

[1]: Whether C would have been a reasonable choice the time of the fourth (2011) volume is another matter.

Neither Algol 60 nor Algol 68 were very good choices, but it was basically them or FORTRAN which were used back then in published algorithms.

Algol 60 was the forbearer of most modern programming languages. However, implementations of Algol 60 were all a bit different and even after the the revised report came out a few years later there remained serious ambiguities in the meaning of common Algol 60 constructs, see for example [1]. Other aspects of Algol 60, for example the use of call-by-name argument passing semantics, made it not entirely suitable as a means of communicating algorithms.

Algol 68, had, for many years, essentially no conforming implementations. The language was remarkably orthogonal, every aspect seemed to be fully generalized and usable everywhere in the language, but it was too complicated. At least, it had a concept of I/O, which Algol 60 lacked. Some algorithms were published in Algol 68, but I didn't see many of them.

In this environment, Knuth's books were quite valuable. The algorithms, while not structured or written as we see them now were clear and unambiguous and were written for a time when compilers didn't perform advanced code-optimizations; Knuth's careful designs included optimizations that we would leave to compilers today when talking about algorithms.

Eventually, Pascal came along and it became possible to publish algorithms in an easy to read and unambiguous way. (It was still uncommon to see code formatted in any consistent way. See Wirth's own books!) Pascal wasn't perfect, but it was easy to understand and implementations were portable and widely available.

[1] Knuth, The remaining trouble spots in Algol 60*, https://atlas.cs.virginia.edu/~asb/teaching/cs415-fall05/doc...

> On the other hand outside of OS and embedded systems people very few people regularly look at assembler

Make that OS, embedded systems, and standard platform runtime libraries. The people implementing the algorithms in your programming-language-of-choice's stdlib certainly know what those algorithms are compiling down to. And they're the vast majority of the people who write out algorithm implementations at all, any more.

> I generally go to Knuth when I'm really stuck--not just how to do something but sometimes how to frame the problem so it can be solved.

If I may ask, what do you work in that requires reading heavy algorithms stuff? (sounds like an interesting job!)

I currently work on disaster recovery and used to do a lot of work on DBMS replication internals. Both fields have problems requiring compact, efficient solutions that work at scale. What I like about Knuth is that he treats algorithms in depth including trade-offs, which are often quite subtle.

For example, you might not think there's much to sequential search but after reading Knuth Vol 3, Section 6.1 it's clear that view is the mark of a greenhorn. ;)

Disaster recovery like recovering servers from crashes/recovering 'lost' databases?

I never read Knuth but after swimming in FP and above for a while now, one thing that hit me is that:

1) even the most abstract things are still a bunch of tiny operations; but with structure

2) I peeked at Sutherland Sketchpad Thesis, and it's mostly 1).. full of 60s machine assembly but, it's nicely structured, very regular. I had probably more pleasure reading this than all of the OOP book I know combined.

I’m surprised, but I like this (I typically don’t like or agree with textbook comparisons, but I think this tour is mostly right). The Sedgewick text, Algorithms should be in there too but the author apparently didn’t read that.

My own experience basically agrees: I’ve read and enjoy Skiena, it’s written in clear style and it’s the “cover to cover” text for a working developer or for interview practice. But I also have TAOCP and CLRS on my shelf, and I haven’t read either of them. I’ve certainly used both of them a lot, but I simply haven’t read all of them because I use them more as reference texts.

Personally I find it bothersome that these textbooks are written with idiosyncratic pseudocode. In my opinion, many of these authors lose their grasp of common implementation difficulties if they don’t provide students with working code that will compile and run. It’s easy to throw down the theory and some okay-ish pseudocode while in effect saying, “...and the rest is just a minor implementation detail, which I leave as an exercise to the reader...”

Oh, dear lord, the algorithm descriptions and pseudocode in research papers...

A while back, I played around with Earley parsing. The Earley parsing algorithm, the basic one, is pretty simple. In fact, it's almost recursive descent, if you didn't use recursion, but managed the current state of the parse manually.

Unfortunately, the original algorithm was described as dynamic programming, which was the hot new thing at the time. Then, there's a bug in it, it's hard to recover a parse tree from it, and it can be extended to be efficient on left-recursive (?) grammars.

Translating the algorithm out of the dynamic programming world isn't too hard. There's a reasonable description of fixing the bug. But recovering a parse tree (technically, a "Shared Packed Parse Forest", since Earley is context-free and to avoid exponential blow-up when producing multiple parse trees, the trees need to share structure) killed me. That paper's pseudocode was the worse spaghetti I've seen in a long time.

I fully agree, having been looking into the same papers. It also doesn't help that each author start out with their own version of the basic implementation which often won't match the implementation the other authors use.

I think I finally managed to get the SPPF to get the right structure now, but it is way harder to penetrate compared to the complexity of the underlying idea.

Now I only have to figure out how to actually iterate over the different parse trees..

It’s easy to throw down the theory and some okay-ish pseudocode while in effect saying, “...and the rest is just a minor implementation detail, which I leave as an exercise to the reader...”

I don’t feel that’s really the case with TAOCP. It’s machine language for a fictional machine, but there exist emulators for the machine. The code is runnable.

The Sedgewick text is quite good but even better are his classes on Coursera! Very accessible and built around practical exercises.

I found the assignments to his class to be really weird and challenging. The first one had you fill in a few functions of a half-complete percolation simulation, but without having written the other part of it, it felt like having to reverse engineer and guess at the solution. There was no real way to measure your progress. Either it runs or it doesn't, and there's no incremental progression towards a solution.

I'm curious if anyone else experienced this or could point out where I went wrong. I really wanted to do the class and I liked the lectures, but i started it after finishing Gardner's Stanford algorithms Coursera class which had some of my favorite exercises ever. They had you write an algorithm in it's entirety, gave you sample data sets to check against, and let you write it in any language. Compared with that, Sedgwick's assignments were like coding with one hand tied behind my back.

Also on Safari (both the books and the classes)

Segdewick is a breezy read (well as breezy as an algorithms text can be). It has somehow never been considered a canonical tome, but I thought it was a decent introductory text.

I was expecting Sedgewick to be in the list---I've never seen Dasgupta---since it's one of the most commonly seen, at least on the cubicle bookshelves I've browsed. It seems as canonical as any.

> I’ve certainly used both of them a lot, but I simply haven’t read all of them because I use them more as reference texts.

If I may ask, what is your line of work? If it requires reading heavy algorithms stuff, it is probably an interesting job!

Honestly, I'm blown away that someone has actually read cover to cover Knuth, CLRS, Dasgupta and Skiena. Is this a common thing? Has anyone here done something similar? For years I've had the textbooks of CLRS and Skiena at home (and a pdf of Dasgupta) but they are used only in the event I need to drill down to understand a particular algorithm to solve a particular problem. I feel that the most effective use of my time is to have a birds view of the landscape (i.e understand the categories of algorithms and the problems they solve) and dive down only when I actually need to know the nitty gritty to implement a solution for a real problem. I can understand the joy of learning them all just for fun, but who has the time? I wonder if he did all the exercises too... :-)

Yes, well, once upon a time in the dark ages, we didn't have HN or even the Internet to spend our time on. At home I didn't have a computer, no one did in the 60's or early 70's. I did have a few books so I read a lot including Knuth and the publications of the ACM that I subscribed to when they arrived in the mail. I wrote my code out on paper when I worked on it at home.

Where does it say that he has read all these books cover to cover?

Generally when someone writes a review this in-depth, this is assumed. Not always accurate, of course.

"Common" probably not. Knuth is large, CLRS too but less so. People needing or craving such amount of knowledge aren't a lot.

As other said, lots have CLRS as a reference bible to pick a solution or read about one subject or one idea in particular.

CLRS is the size of a reference manual. Algorithms courses tend to cover material from it, but never the entire book at once. I've always wondered who has the time to read and retain all of CLRS.

I haven't read these particular books but normally I have one "huge" CS book under progress at any time. It's usually a topic I want to know more about and get a basic fundamental understanding of, so mostly driven by the need for knowledge and the feeling that my knowledge is lacking in this area. That said, it usually takes me months to get through with one and often I already know some of the contents which I end up speed reading/skimming.

Many grad level algorithm courses still use Dexter C. Kozen's book The Design and Analysis of Algorithms, Springer-Verlag, 1992. It's a timeless book with clear analysis.

There's also this excellent free draft on analysis of common undergrad algorithms for parallelism http://www.parallel-algorithms-book.com/

TAOCP is more than an algorithms book, Knuth even have strategies for writing lengthy programs from scratch, how to build test libraries, how to optimize a program to make it cache memory friendly ect.

I became curious about quantum computing a few years ago. Googling around, I came across the QC chapter in Dasgupta. I found their description of the topic to be excellent, exactly what I was searching for. They provide a substantive presentation of QC, as contrasted with many "pop sci" hand-wavy impressionistic descriptions of QC. They tactfully avoid a few of the extremely difficult proofs, but they do give a solid mathematical underpinning for the subject. And, they convey in prose what the point of the whole thing is. Also, they forthrightly express the mysterious nature of what seems to be going on here. (If you have an N-qubit quantum computer, the universe seems to somehow maintain and manipulate a vector of 2^N complex elements as a quantum computation proceeds.)

They give a brilliant presentation of Shor's factorization algorithm, and how quantum computers offer an amazingly natural way to do FFT's (a key aspect of the Shor algorithm).

I would not contest the OP's questioning of the choice of a chapter on QC in an undergrad algorithms textbook. I would, however, offer the standard "your mileage may vary" caveat to the OP's very negative characterization. I had no idea what QC was about, and this chapter provided me with a great "on-ramp" to understanding what QC is all about.

It's odd to me that the author describes CLRS as graduate-level. It (back when it was CLR and in its first edition) was the text in my undergrad Introduction to Algorithms class (taught by L!).

I agree that Sedgewick belongs in this comparison, but I can't fault the author for not having read it. I think it's the book I would most easily recommend to others; it's quite clear and has lots of very nice visualizations. I do think that Skiena deserves a special mention for explicitly being about how to come up with your own fundamental algorithms and data structures rather than just plug an existing one into your program.

CLRS was also the undergraduate textbook at my school, RIT. I didn't think it did a very good job of explaining algorithms if you didn't already have a good grasp of them.

Skiena on the other hand does a nice job of both describing the algorithm in a straightforward manner and also getting you into the algorithm headspace.

Kleinberg/Tardos book that I am reading right now needs trimming for brevity. A good math text editor could, probably, easily lop off 1/3 of it without affecting its content. Otherwise I prefer it to most other algo books because its proofs are closest to how mathematicians approach their proofs. By contrast, I am confused as to why CLRS contains proofs at all. At the beginning of the book, the authors say something to the effect of actual proofs being too messy/hard so they simply wave their hands through them. But if most any intro discrete math books can do these proofs, why can't CLRS? To me it's a turn-off.

Which book/website has the best exercises for developing algorithmic problem-solving skill? I've started working through Skiena's exercises, but haven't really looked at much else.

If you're actually interested in drilling algorithm exercises - in which you map a problem statement to a particular algorithm and time/space complexity, then implement the solution - competitive programming websites are the best way to practice. There's a good number of them now, including TopCoder, LeetCode, CodeForces, SPOJ, HackerRank, etc. You can typically sort the problems by difficulty or solution acceptance rate. Working through these problems requires:

1. Breaking down a problem statement into a heuristic pattern,

2. Analyzing the best, average and worst case complexities of possible solutions,

3. Mapping the heuristic(s) to the data structures and algorithms with the most optimal complexity,

4. Implementing the code and having it successfully run, with runtime performance feedback.

I think the aforementioned competition websites are better than things like Project Euler because solving the problem requires running actual code instead of just giving a correct answer. That makes them much more interactive (and harder, in my opinion), because you might have to obey particular performance or complexity constraints, and you can receive feedback about how efficient your solution is.

I wouldn't say that practicing these problems will make you a better software developer, in the sense that you can develop maintainable software to solve business problems in a team setting with a large codebase, and I make no comment on whether these sorts of problems are optimal filters for tech interviews (that's a dead horse). But much like mathematics, programming (and specifically algorithm analysis) is not a spectator sport. You can't efficiently learn the material just by reading it, you have to do it, just as mathematical maturity comes about by solving many mathematical problems in different domains. To that particular end, I would say that competitive programming is about as perfect a formulation of practice as you can get for improving algorithmic problem-solving skills.

> I wouldn't say that practicing these problems will make you a better software developer

I know. Apart from it being fun, I'm mostly interested for the social-proof value in interviews. I don't want to beat the dead horse either, but it's definitely a socially useful skill.

Thanks for the suggestions.

I enjoyed reading Anany Levitin's Introduction to the Design and Analysis of Algorithms. Algorithms in this book are organized around problem-solving techniques - Decrease-and-Conquer, Divide-and-Conquer, Transform-and-Conquer etc - rather than application areas. I found this style very useful to develop an arsenal of techniques I can employ when I come across a new problem.

Also, for practicing programers, studying data structures is more important than algorithms. I often refer to Sally Goldman's book A Practical Guide to Data Structures and Algorithms Using Java for this.

I really enjoyed Udi Manber's /Introduction to Algorithms: A Creative Approach/ when I was starting out. It is not the place for sexy new stuff, but it's a great resource for learning the fundamentals and developing a taste for the subject.

+1 to this book. Developing algorithms and their correctness proofs together is a powerful technique and ought to be more widely used.

Skiena is very good - Algorithms are lucidly explained - Actual code is provided - Its relatively newer but highly recommended

Have read only parts of Knuth / CLRS - They are good, but for real problems analysis , would prefer Sienna

I'd actually add one thing: coding style.

Shame on me for admitting this! In fact, despite having personal CS favors like everyone, the painfully obvious subjectivity of the whole matter has always striked me as entirely futile to be taken into account for basically everything. I even worked for years of uninterrupted peace with people who would, for example, prototype pointer arguments of c/c++ functions gluing the wildcard to the type, then space then boom variable name. I've always been cool with this, even reading things like glibc's code.

This liberalism however has two exceptions: my own code obviously, a dictatorship where merciless enforcement of inflexible and rigorous coding style is accepted. But unfortunately, in algorithm books too! I know it is completely idiotic but I'd be in denial not to admit how much of a total turn off the coding style of code in algorithm books can be to me. Segdewick for example, such a wonderful book, the prose is excellent, the content really is outstanding (probably one of the most comprehensive I own), the editions I have even has a superb typeface and paper quality, unfortunately the C coding style of this book has this effect on me which makes me want to close the book immediately. I feel the same deep pain every time I have to look at it (which does happen a lot since it is a great book really!). I'm actually jealous of those more advanced human beings who are able to make an abstraction of that when reading a coding book!

I've used dasgupta in my ug algorithms class at Berkeley, which was actually taught by papaD one of the coauthors. I love the text because it is so concise. A great book to get started on algorithms. A good exercise is to implement them in the language of your choice.

As a self-taught developer, I prefer the Skiena book. The topics are ordered in a way that make sense to me, and the examples are practical.

However, the other books are probably a better match for a standard university algorithm course curriculum.

Can we get a comparison of Algorithms undergrad courses? :)

I have Knuth's series on my bookshelf. It looks great!... especially since I haven't broken the bindings by actually reading any of it...

This is an interesting read because Dasgupta is what the current OMSCS algorithms class is using.

I didn't realize there were that many textbooks that just covered four algorithms.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact