
Show HN: Anansi – a NoWeb-inspired literate programming preprocessor - jmillikin
https://john-millikin.com/software/anansi
======
habitue
I came to similar conclusions as the author about the usefulness of literate
programming, even after "taking it seriously" and "doing it right" and all
those other no-true-Scotsman-type objections. (See HATEOAS/REST for more
examples of these.)

Initially, I was super excited about the idea. Imagine a world where you take
a book off the shelf and peruse a master programmer's in-depth explanation of
the details of some famous software. What they were thinking, the abstractions
they decided on, the algorithms they chose, the tradeoffs made. Wouldn't that
be an amazing world to live in?

In practice, very few pieces of software are written in a way that lends
itself to a cohesive narrative. Software is written by numerous developers,
layer by layer, often with hacks thrown in. New people come on, don't
understand the whole thing, change parts of it, and move on. The requirements
change, and the system contorts itself to serve multiple purposes that weren't
envisioned by the original coders.

Keeping normal auto-generated documentation up to date under those
circumstances is already a herculean task; literate programming asks for more:
maintaining a story, a flow, and an overarching rationale. It's just not
possible for most software.

There is a type of software that lends itself to literate programming: the
kind of software Donald Knuth writes. Written by a single author, with
encyclopedic knowledge of the problem domain, excellent writing style (Knuth
is very funny if you've read his work), and above all, the programs Knuth
writes tend to be "done" at some point. There are bugs to be fixed, but no new
features are added that might drastically change the narrative.

Most software isn't like that, but if yours is, then literate programming can
be fun.

~~~
svat
I agree that literate programming works well for Knuth because of the kind of
programs he writes: mainly, as you say, that the program was written for a
particular purpose, and when that's achieved it's "done".

I wonder, though, whether it's necessarily true that more software cannot be
like that. We _could_ in principle move a little more in the direction of
declaring programs done, and when new requirements come up, writing a new
program to cover them. (Knuth has no qualms about having similar code in
different programs; it's almost universally believed by other programmers
today that that's a terrible thing.) The old program would continue to work
well for its original purpose, and those who need the newer program would use
that one instead.

As with books: sometimes you need a new edition of a book, sometimes a reprint
with corrections, and sometimes a new book entirely. It's ok if multiple books
cover the same topics and even if they do so in slightly different ways; all
that matters is that each is internally consistent — we don't demand that
everything related to a certain domain and written by the same author(s) be in
a single book. Programs, by contrast, almost always just grow and expand,
with incremented version numbers.

~~~
bitwize
> I agree that literate programming works well for Knuth because of the kind
> of programs he writes: mainly, as you say, that the program was written for
> a particular purpose, and when that's achieved it's "done".

A piece of software in itself really isn't much of anything; the true value
lies in the support you get from the development team. "Done" software is
inherently unsupported, therefore close to worthless and probably won't be
used in a production setting.

> (Knuth has no qualms about having similar code in different programs; it's
> almost universally believed by other programmers today that that's a
> terrible thing.)

It's a waste of programmer effort. In the open source realm at least, it would
be a far more efficient use of programmer time and energy -- and easier on the
users -- for programmers to collaborate on a single definitive program for
each task, rather than reinvent wheels and confuse the marketplace with
competing implementations of the same abstract process.

~~~
svat
Let's talk specifics. Here are four programs written by Knuth, in roughly
chronological order:

1. The Algol-58 compiler he wrote for Burroughs (specifically, for their B205
machine). You can read about it in many places
([http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#7](http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#7),
[https://www.youtube.com/watch?v=QeiuVNDQg4k&list=PLVV0r6CmEsFzeNLngr1JqyQki3wdoGrCn&index=27](https://www.youtube.com/watch?v=QeiuVNDQg4k&list=PLVV0r6CmEsFzeNLngr1JqyQki3wdoGrCn&index=27),
or in great detail at
[http://datatron.blogspot.com/2015/11/knuths-algol-58-compiler.html](http://datatron.blogspot.com/2015/11/knuths-algol-58-compiler.html)).
This was written in the summer of 1960, debugged by
Christmas, and put on their computers. The machine didn't sell very well in
the US, but apparently it (and the compiler) was being used in Brazil over the
next decade, successfully. I wouldn't call this "worthless" by any means; it
did its job. (Of course one might argue that the prevailing model at the time
was for software to get "done", so it wasn't really an exception.)

2. TeX. This is the most famous example. After developing it for about 10
years, he declared it done
([https://www.tug.org/TUGboat/tb11-4/tb30knut.pdf](https://www.tug.org/TUGboat/tb11-4/tb30knut.pdf))
except of course for bugfixes. (He still looks at bug reports once every few
years
([https://cs.stanford.edu/~knuth/abcde.html#bugs](https://cs.stanford.edu/~knuth/abcde.html#bugs)),
but there was exactly one bug reported during 2007–2013, and it was a really
inconsequential one about the whitespace for how an "empty" macro would be
printed in the error logs.) TeX is stable, well-understood (at one point
there were hundreds of people who "knew" the entire program, which is
unprecedented for a program of that size), and very well-supported (see TUG,
tex.stackexchange.com, etc.) — and in any case most of the questions these
days are about LaTeX (a set of macros, with horrible error-handling, the
opposite of TeX) or other packages, not about TeX (the program) itself. At
Knuth's request, extensions are released as new programs (pdfTeX, XeTeX,
LuaTeX, etc.), and TeX stays the same (and even these programs have approached
stability). This is _definitely_ neither "inherently unsupported" nor "close
to worthless" nor "probably won't be used in a production setting" — at any
given point of time several publishers are using it in production, not to
mention various others who are not even making physical books.

3. The Stanford GraphBase. This is a suite of programs, also published in
book form (as literate programs). There are people still making use of these
books, and the programs can be used as building blocks for other programs,
e.g. for many of the ones that Knuth writes mainly for himself (see
[https://cs.stanford.edu/~knuth/programs.html](https://cs.stanford.edu/~knuth/programs.html)
or
[https://github.com/shreevatsa/knuth-literate-programs/tree/master/programs](https://github.com/shreevatsa/knuth-literate-programs/tree/master/programs)).
I don't think continuing to work on them, versus calling them done for now,
would change anything about them.

4. Any of the programs on that page, e.g. SETSET, which was written in
February 2001, and “Enumerates nonisomorphic unplayable hands in the game of
SET®”. Or the first two (SHAM, written December 1992, and OBDD, written May
1996) — both written mainly to find that there are “exactly 2,432,932 knight's
tours [that] are unchanged by 180-degree rotation of the chessboard”. Once the
job is done, just what is to be achieved by refusing to declare them “done”?
(Incidentally, note that Knuth did figure out an improvement to SETSET, but
wrote it as a new program. The original program is fine though.)

In fact, if you look at the blog post by Joel Spolsky from 2002 on “Five
Worlds”, in at least three of them (and maybe four), software can be quite
commonly declared “done” (always, of course, except for bug fixes: “done” just
means we've finally decided what the software is supposed to do; it's not the
same as abandoning it even when it's not doing what we decided it's supposed
to do). Throwaway code is often indeed thrown away, games get changes released
as sequels or separate expansion packs, embedded software often _cannot_ be
updated anyway, and internal software can also often be "done". It's only in
the first world, shrinkwrap software, that it usually is not.

------
erikpukinskis
> _Eventually the pain of working within a single massive source file became
> overwhelming, and I decided to write my own literate programming tool that
> could consume a filesystem hierarchy_

I wonder if that’s where they went wrong.

Perhaps when your code is too big to fit in a file, it is too big to describe
serially at all. And it's time to split it into two separate pieces, two
modules with a formal interface between them.

The idea that you could have a nonlinear narrative without any formalism
dictating the relationships between paths is perhaps asking too much of the
narrative form.

Perhaps that’s the whole point of a module system: creating formalisms so that
you have a chance at holding in your head the relationship between one series
of procedures and another.

(This is not entirely academic. I have about 100 richly connected single-file
modules on NPM.)

~~~
jmillikin
> _Perhaps when your code is too big to fit in a file, it is too big to
> describe serially at all. And it’s time to split it in two separate pieces,
> two modules with a formal interface between._

I'm not sure these two issues are related. The "woven" PDF output of
haskell-dbus 0.9 was split into chapters, each chapter being roughly a single
Haskell module (with a specified API). And books are by nature serial, yet
they can express enormously complex ideas -- think of a compilers textbook.

This was all six years ago so my memory's a bit fuzzy, but I remember feeling
lost in my own code when it was all in one file. I couldn't do things like
"split window and go to top" to find the imports/exports of the current
module, and searches for a symbol name returned far noisier results when
every name was implicitly project-global.

It might be possible to fix these with better tooling, like an editor that
could parse literate source and show the rendered output. But then you'd be
treating the "source" as a sort of opaque input and editing at the level of
"compiled" output, which is behavior I associate more with reverse-engineering
than typical software development.

~~~
saulrh
> And books are by nature serial, yet they can express enormously complex
> ideas -- think of a compilers textbook.

I'd argue that, in many cases, well-organized books have split their material
up into chunks with formal interfaces between them. A textbook's table of
contents and introduction create a high-level structure, cross-references
allow like material to be grouped with like material to make a coherent
picture of each subsystem or topic, and chapters can be read out of order or
even skipped entirely by some readers. Textbooks for mathematics or computer
science are the most obvious examples, but I find the same level of
organization and ability to jump around even in popular science books and
philosophical discourses. Even the simplistic "five paragraph essay" that is
taught in elementary school can be interpreted as an implementation of a
formal interface.

~~~
jmillikin
All of that is true of a single source file too -- editors can generate tables
of contents and cross-references. But the experiences of using a
well-organized paper encyclopedia and of using Wikipedia are fundamentally
different. The point of "serial" is that if I want to read a book, I have to
pick an origin point and then start scanning linearly. There's no search bar
in a textbook.

------
jmillikin
I saw
[https://news.ycombinator.com/item?id=17483242](https://news.ycombinator.com/item?id=17483242)
("Literate Programming: Empower Your Writing with Emacs Org-Mode") and figured
HN might be interested in a real-world attempt to use literate programming for
a larger project (Haskell implementation of D-Bus).

~~~
akkartik
Very candid. I've written before about how Literate Programming gets misused:
[http://akkartik.name/post/literate-programming](http://akkartik.name/post/literate-programming)

This feels like supporting evidence.

~~~
jmillikin
Not misused -- I think I took a pretty good shot at writing literate code as
Knuth originally intended it. It's just that the goal of literate programming,
the transformation of source code into a document that can be read like a
book, doesn't seem to be useful.

It's worth noting that Knuth wrote WEB in 1981, ten years before the web.
There's no way he could have known at the time that hyperlinks and search
would be a far more useful interaction model for reference documentation.

~~~
akkartik
I don't think this is right. In Knuth's own words at the top of his site:

_"The main idea is to treat a program as a piece of literature, addressed to
human beings rather than to a computer."_
([https://www-cs-faculty.stanford.edu/~knuth/lp.html](https://www-cs-faculty.stanford.edu/~knuth/lp.html))

This is the consistent message I've gotten from his writings: the goal is to
communicate to other people. "Transformation of source code into a document
that can be read like a book" feels far more low-level than that.

> Knuth wrote WEB in 1981, ten years before the web. There's no way he could
> have known at the time that hyperlinks and search would be a far more useful
> interaction model for reference documentation.

Knuth certainly knew about hyperlinks. On the same page, Knuth says:

_"The program is also viewed as a hypertext document, rather like the World
Wide Web. (Indeed, I used the word WEB for this purpose long before CERN
grabbed it!)"_

There's a pleasing non-linearity to Knuth's creations, both in the source
(with fragments being named and referring to each other) and in typeset form
(with all the attention to the index of fragments, to showing with each
fragment all the places that refer to it).
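
For readers unfamiliar with the mechanics behind those named fragments: a
WEB- or noweb-style source defines chunks with `<<name>>=` and references them
with `<<name>>`, and the "tangle" step expands the references into compilable
code. A minimal sketch of that expansion in Python (the chunk syntax loosely
follows noweb conventions; the code is illustrative, not any real tool):

```python
import re

# Minimal sketch of the "tangle" half of literate programming: fragments
# are defined with <<name>>= and referenced with <<name>>.

SOURCE = """\
<<main>>=
def greet():
    <<print a greeting>>
<<print a greeting>>=
print("hello")
"""

def parse_fragments(text):
    """Collect fragment bodies, keyed by fragment name."""
    fragments = {}
    name = None
    for line in text.splitlines():
        m = re.fullmatch(r"<<(.+)>>=", line)
        if m:  # a new fragment definition starts here
            name = m.group(1)
            fragments[name] = []
        elif name is not None:
            fragments[name].append(line)
    return fragments

def tangle(name, fragments, indent=""):
    """Expand a fragment, recursively substituting <<references>>."""
    out = []
    for line in fragments[name]:
        m = re.fullmatch(r"(\s*)<<(.+)>>", line)
        if m:  # a reference: splice in the named fragment, keeping indent
            out.append(tangle(m.group(2), fragments, indent + m.group(1)))
        else:
            out.append(indent + line)
    return "\n".join(out)

fragments = parse_fragments(SOURCE)
print(tangle("main", fragments))
# Produces:
#   def greet():
#       print("hello")
```

Note that fragments can be defined in any order and referenced before they
are defined, which is exactly what lets the prose follow the author's
narrative rather than the compiler's required ordering.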

---

In any case, we may be splitting hairs here. I'm not a scholar of Knuth's
work, and maybe your interpretation is correct. I agree that if you define
Literate Programming as "transformation of source code into a document that
can be read [linearly] like a book", then that goal is not useful. If you
define it as "better communicating programs to other people," then I think
that goal is still relevant. All programmers should keep this goal in mind,
while loosening their grips a tad on the precise details of how they happen to
aim for the goal at a specific point in time. There's still lots of room for
improvement.

~~~
samatman
Hi Kartik! Still at it I see...

me too ^_^

------
jsyedidia
See also
[http://literate.zbyedidia.webfactional.com/](http://literate.zbyedidia.webfactional.com/)
and
[https://github.com/zyedidia/Literate](https://github.com/zyedidia/Literate)
for information about "Literate", which is my favorite Literate programming
tool.

------
rixed
I've also tried literate programming several times. I actually like it for
small libraries that require good documentation but little maintenance.

What I like in this experience is how literate programming forces me to spell
out each of my assumptions and justify every choice, just because I then have
to program with the reader in mind. And I found that sometimes those
assumptions were not as sound as I thought. For that reason I would recommend
that anyone give it a try.

By the way, I also had to write my own tool, one that extracts the code from
the documentation instead of generating both code and documentation, which
allowed me to use both my favorite document format and my favorite
programming language.

