Programs to Read (stanford.edu)
133 points by danso on Sept 16, 2015 | 39 comments

This page has obviously been around for a while (or at least looks like it has :))...but one thing I didn't notice until recently is that Knuth is apparently still updating it with new programs. If you scroll to the bottom, there's at least one program labeled "August 2015".

I am afraid this is the wrong recommendation - too narrow, too esoteric, too academic.

I would suggest reading Norvig's old Common Lisp code from his masterpiece Paradigms of Artificial Intelligence Programming (and the whole book, of course).

AIMA and SICP are obviously good reads - the code is polished like fine poems.

Norvig's new Python courses on Udacity are also absolutely wonderful, but the classic books are still the classic books.

And then arc.arc, of course!

> I am afraid that this is a wrong recommendation - too narrow, too esoteric, too academic ... I would suggest to read ... Common Lisp code

That is sure mainstream.

Which courses would those be? The Udacity offerings are becoming numerous enough to be daunting to go through linearly, and searching for Norvig turns up only "Intro to Artificial Intelligence" and "Design of Computer Programs", both of which have been around for a few years now, if I recall correctly.

I'm not asking this to be snide, but as somebody generally unfamiliar with the literate programming concept - for practical purposes, what makes this different from having well-encapsulated functions with thorough commenting?

For an example of what I mean, see underscore.js [1] and the prettified version of the source [2].

[1]: https://github.com/jashkenas/underscore/blob/master/undersco...

[2]: http://underscorejs.org/docs/underscore.html

Most documentation is non-linear. Functions are documented without regard to their callers, and so on. The trouble is that this is hard to scale up to large sub-systems. If you are modifying some cross-cutting facet of a large system, it's often hard to find the right place to add an illuminating comment, given all the different places you are modifying simultaneously. Literate programming provides that place by introducing a linear narrative structure into the documentation. That can be quite valuable at large scales.
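For a concrete sense of what that narrative structure looks like, here is a minimal CWEB-style sketch (all names and code invented for illustration). Sections are presented in whatever order best serves the exposition; `ctangle` later stitches the named chunks back into the order the compiler needs.

```
@ We present the main loop first, deferring the messy
initialization to a later section where the reader is
ready for it.

@c
int table[128];
int main(void) {
  @<Initialize the tables@>@;
  return 0;
}

@ Initialization details, explained here rather than at
the top of the file.

@<Initialize the tables@>=
  for (int k = 0; k < 128; k++) table[k] = 0;
```

The prose between `@` markers is TeX, so the woven output can carry equations, figures, and cross-references alongside the code.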

That said, my critique: http://akkartik.name/post/literate-programming (I actually read many of the programs in OP to write that.)

You have an excellent point in there: if you aren't free to put your code in any order you like, then it probably won't fit very well into your prose.

Using LP, comments can be arbitrarily formatted, with TeX equation handling, beautiful figures, tables, cross references, indices, table of contents, etc.

Code can be presented in the way that best suits the reader (versus the writer or the compiler). Though Knuth's examples usually seem to dive right in, lacking the introduction that the corresponding material in TAOCP will eventually provide.

When I use it, I write it like a paper decorated with code, where the code can be extracted and run. I usually include the necessary make file and perhaps some test data. Entries in figures/tables are sometimes generated automatically by the program being described.

There are lots of old papers and critiques about LP, by Knuth and others. You should check 'em out.

As a fairly modern example, let me suggest "Physically Based Rendering" by Pharr and Humphreys [1]. It is a complete (and fairly large) book in the LP style. There are a couple of sample chapters available online that you can view [2][3]. To me, one of the real advantages is that you can present the code and commentary in a logical structure that follows an orderly progression of concepts for human understanding, rather than the more strictly physical structure that a compiler expects.

[1] http://www.pbrt.org/

[2] http://www.pbrt.org/chapters/pbrt_chapter7.pdf

[3] http://www.pbrt.org/chapters/pbrt-2ed-chap4.pdf

As Knuth himself said, literate programming facilitates thinking about a problem without having to use a formal language to construct that understanding bottom-up. Literate programming can therefore, for example, produce much less stack depth in a program while keeping it easy to understand and maintain. This is an interesting and often overlooked aspect with regard to the performance of the resulting code. He claims that if TeX had been written without it, it would not have a depth of 4-5 subroutines on the call stack, but around 50-100 [1]. He also states that he would not have been able to produce MMIX [2] without literate programming, because he would not have been able to reason about its design bottom-up [1].

So, a lot of complexity is added to a program just so that it can be understood and maintained in a formal language; but he also states that many modern comment styles go a long way towards literate programming.

Having worked on large codebases I concur that using literate programming to construct these seems far-fetched and too academic, yet I am enticed by the idea of optimizing code by literate programming since I have seen far too many abstractions for the purpose of maintaining easy to read code.

[1] http://www.codersatwork.com/

[2] http://mmix.cs.hm.edu/

One of the primary differences is that the document can be structured, i.e. ordered, to benefit readability rather than the needs of the software. When source code is needed for tools, it can be generated from the literate version in one step. Even multiple files in a hierarchical directory structure can be generated from a single literate software document.

I like to point out that when Literate Programming was first proposed, software was a great deal less malleable than it is now. LP vs. the rigid structures of the time is one thing, LP vs. the malleable structures of today is quite another. Cast back in time I might prefer LP in the past, but today, it's an awful lot of complexity and a huge layer to add on top of our systems for what is of significantly less relative utility. Very little stops you from having more narratively-coherent code today, in existing systems; the problem is more human than tool now.

To clarify: one could say literate code is ". . . ordered, to benefit readability and not for the needs of the _compiler_ or _interpreter_". Tangling and weaving are the processes by which compilable or runnable source code is created from the "literate" source, which will generally not be directly compilable. Here's one of many links to more info: http://www.ross.net/funnelweb/tutorial/intro_what.html

TIL about cweb. [1]

> CWEB is a version of WEB for documenting C, C++, and Java programs. WEB was adapted to C by Silvio Levy in 1987, and since then both Knuth and Levy have revised and enhanced the system in many ways, notably to support C++ and ANSI C. Thus CWEB combines TeX with today's most widely used professional programming languages.

> If you are in the software industry and do not use CWEB but your competitors do, your competitors will soon overtake you---and you'll miss out on a lot of fun besides.

I guess it never took off due to the "I'll do it later when I have time" mantra. Documenting and testing are always seen as low priority.

Trying to convert maxcliques.w but it rightfully warns about missing gb_types.w

Is there a zip file with all programs and dependencies?

The ctangle from texlive-binaries also seems to lose some formatting.

[1] http://www-cs-faculty.stanford.edu/~uno/cweb.html

Nobody's business was ever ruined because of bad documentation, sadly.

Literate programming is an interesting idea from academia, but in the engineering environment of most startups it's a waste of time. Our program sources are annoying byproducts of a business, not something of value unto themselves.

> Nobody's business was ever ruined because of bad documentation, sadly.

I've seen quite a few startups fail because they had a smart but completely undocumented code base. Then the smart programmers leave, whoever is left cannot handle the code, and new hires take too long to get into it. Then funding runs out, and the apparent lack of progress ensures that nobody wants to put more money into the company.

> Literate programming is an interesting idea from academia, but in the engineering environment of most startups it's a waste of time.

Amazingly, we're actually getting back to something that resembles literate programming - via TDD. Instead of asking for a proper spec, we now ask for test definitions that the code must pass. Those can even be written in something pretty close to natural language (cucumber etc.) so that management can write them. Then we spend time making the test framework understand all those pseudo-natural-language test definitions, and then we finally write the code. Is that really faster? Or are we just missing the tools that would allow us to tie a natural language specification to code?
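To make the comparison concrete, here is a minimal sketch of the idea in plain pytest-style Python (not actual cucumber; every name here is invented for illustration): the near-natural-language sentence lives in the test name, and ordinary code underneath does the binding work.

```python
def discount_for(order_total):
    """Implementation under test: 10% off orders of 100 or more."""
    return order_total * 0.10 if order_total >= 100 else 0.0

# The "spec": readable aloud by a non-programmer, runnable by the
# test framework, and kept honest by the assertions.
def test_an_order_of_100_dollars_or_more_earns_a_10_percent_discount():
    assert discount_for(150.0) == 15.0

def test_smaller_orders_earn_no_discount():
    assert discount_for(40.0) == 0.0
```

Cucumber-style tools go a step further: the English sentence itself is the artifact, and pattern-matched step definitions bind it to code - which is exactly where the translation effort described above comes in.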

> pretty close to natural language (cucumber etc) so that management can write them

This sounds so familiar ...

What a jokester! Management doing work?

Then the engineers get to program in a less-powerful, more cumbersome language.

Nobody's business was ever obviously, directly ruined because of bad documentation, and so nobody was ever blamed for it or fired over it.

> Nobody's business was ever ruined because of bad documentation, sadly.

Maybe not ruined, but certainly severely hampered. I worked for several months with an original author of a 1M LOC application, and he was let go because the company was too cheap to re-up his contract. The documentation at that time was barely passable, and it took many months of tweaking and breaking things just to get back to the point of being productive again. No one had a well-documented procedure for how to even compile the code until I decided to set my tasks aside (to my own schedule's detriment) so I could document all the steps and dependencies along the way. Luckily for them the company was well established and could afford the wait; others may not have that luxury.

Just because it doesn't make you money doesn't mean it has no value.

There is no engineering without dollar signs in the equation.

I didn't say it had no value in general, just in a particular context.

That assertion seems arbitrary and unprovable.

I was able to get CWEB working on my mac as follows:

    wget http://www-cs-faculty.stanford.edu/~uno/programs/sat0.w
    brew install cweb
    cweave sat0.w
    tex sat0.tex
    dvips sat0.dvi
"tex" and "dvips" were already on my system and I don't remember how I installed them.

Just got it to work also. I also noticed that some code no longer compiles, at least among the older programs.

On my GNU/Linux box:

> dependencies:

    wget ftp://ftp.cs.stanford.edu/pub/sgb/sgb.tar.gz
    tar -xzf sgb.tar.gz

> documentation:

    cweave back-pdi.w
    pdftex back-pdi.tex
    okular back-pdi.pdf &

> code:

    ctangle back-pdi.w
    gcc -o back-pdi back-pdi.c
    ./back-pdi 2 1

(I'll be damned if I'll ever get fluent in HN thread formatting)

My guess is that you will find the necessary dependencies in the Stanford GraphBase:


If I remember correctly, the GraphBase source files start with the gb_ prefix (sorry, I'm on my phone so can't check).

Note: the Stanford GraphBase for me is an example of how a full blown library can be beautifully written in a literate style. Worth buying the hardcopy and reading it.

For those who may be new to Literate Programming, here's information on how to process CWEB files: http://www.readytext.co.uk/?p=2475

CWEB: the second most effective way to keep people from reading your code, after simply writing it in Lisp.

Use scribble/lp2 [0] to do both!

[0] http://docs.racket-lang.org/scribble/lp.html

Did you use CWEAVE to convert the CWEB source to TeX? Then you can compile the TeX into a PDF that should be more readable than most source code out there.


A lecturer at Stanford CS (Keith Schwarz) also has a good collection of various implementations of interesting algorithms: http://www.keithschwarz.com/interesting/

I'm not sure I see the point...

You could make things much more readable by just using decent formatting and descriptive variable names. I know, it's decorated C code, and academic C code at that, but damn, some of that stuff is a mess to try to read.

Maybe I've been in C#/Java/Python/JS land for too long, but javadoc/xmldoc/docstring documentation is a lot more readable. If you really want to, you can dump in markup, pictures, non-ASCII characters, etc into the html-based documentation.

The more I look at it, the more this stuff seems like it would be a monstrosity to actually write. I can't see it being at all attractive unless you were doing well-defined, algorithmic code. Or you could write it in normal C and package it as a library that people might actually use.

The author is Donald Knuth, in case you didn't catch that. Since most of these are dated from the 90's, you should probably be comparing verbosity to COBOL or PASCAL and keeping in mind this was from before the ubiquity of Linux or Python, and long before Java, Javascript, or C#.

Most of them are dated from the 2000s.

It's hard to miss that it's on Knuth's website. Just because it's published by one of the gods of computer science doesn't mean that it is code in a style that anyone should be encouraged to emulate.

Hmm, why haven't the processed files been posted as well?

Knuth is posting source and you want the assembly?

No, parent is asking for the rendered document. Analogy: The source is like HTML; parent is asking for the rendered document as you would view it in a web browser.

"Change files" are refered to quite a lot. What does that mean?

The algorithms look good, but the code seems a bit arcane.
