Ask HN: Why did literate programming not catch on?
101 points by dman on Aug 16, 2015 | 98 comments
Would love insights from people who used literate programming on non-trivial projects.

It makes it harder to make changes. The story you start telling is not what you end up with later, after you've completed all the non-trivial features and major assumptions have fallen through. Going back and fixing the story as you go along is expensive. Writing the story after it's done is too late - the business value is in the product's shipped functionality, not in the development artifacts.

We have an alternate method of understanding how software developed, which is to look at revision control commits. This method falls more in line with the techniques of development and the genuine evolution of the codebase. Recall that revision control methods were still being improved well after Knuth wrote about literate programming, and the available systems, where people used them (and a lot of shops didn't), weren't nearly as fine-grained back in the 1980s.

Personal experience: I tried using "Leo", an outlining and literate programming text editor, for a project. Although the documentation capability was nice for a while and gave me an additional means of structure and organization, the hooks messed with some formatting and refactoring operations, and most of the time the benefit wasn't clear. The time I spent on the documentation could have gone - from my current perspective - into making the code smaller and simpler. At the time, I didn't know how that would be possible, so I focused on defending against the complexity by adding more.

A lot of our preconceptions about what makes code both look good and behave well are temporary. That makes it hard to come up with a sensible system of organization, as we'll put in effort to enumerate and categorize only to discover that it falls apart.

Your mention of commit messages resonates strongly with my own professional experiences, and I'm sure with many others' as well.

Several months ago I started working with a commercial code base that has about 3 years of commits and over a dozen contributors, but very few inline comments. Navigating and refactoring it is usually a fairly reasonable process due to well-named symbols, module organization, and test cases.

When that's not enough though, I pull up the "git blame" history in my editor, and a rich story unfolds, telling me things like how old a block of code is, how recently that one line in the middle was changed, and why that seemingly benign extra bit of code is sitting there. Sometimes the commit messages aren't as detailed as one might prefer, but you can often still get a lot of useful insight just from the date stamps.

I've been accused of not writing enough comments in the code itself and then writing "novels" in the commit messages, and "Literate Revision Control" is probably the best name for that sort of style. It's very easy for comments in the source to grow stale, but the commit messages (our source control tools still aren't perfect with regard to moves/refactors) mostly tell the story over time and show only relevant commit information, with "stale" commit information falling away into history/legend/myth as its code gets rewritten and retouched.

I've been liking how Visual Studio's CodeLens (now available in cheaper SKUs in 2015) brings focus to commit history specific to logical units in the code (methods and classes).

Now I'm curious where you might be able to push things if you purpose built a "Literate Revision Control" tool and what sort of strong "epic poem" style of commit messages would best produce useful "novelizations" of a codebase...

Jetbrains IDEs also have an awesome interface to see the git history of some piece of code.

I'm fairly happy with Leo for "documenting" my Puppet nodes - I create a Leo node for each Puppet node or node regex with a link to the node definition, and a link to the "documentation" node of each class included for that node. Classes have "documentation" nodes which similarly link to their definition, files and templates and to documentation nodes of any classes included by the class. There's no more prose involved than you'd expect in-line. I find the structure extremely useful to re-discovering how a particular node is configured. If there are simplifications available, I find the structure makes them visible.

Perhaps the above would be less useful for more traditional code. Could you describe in more detail how you used it?

I have a long catalog of improvements I'd like to see made to Leo (e.g., its XML file format is version control hostile - try resolving merge conflicts on several thousand lines of deeply nested machine-targeted XML; or try sending a pull request slathered with "sentinels"). Building such tools being out of scope for my day job, my re-imagined version won't be available any time soon.

I used Leo for a solo game project - AS3 code - made over the course of about a year. I also took some notes with it during the same time. It was over five years ago now. I don't remember all the details of what I did, but:

First of all, there were some encoding conflicts that were introduced when mixing Leo with other editors. When I go back to the project now, it doesn't compile because of the encoding errors. (It's fixable, I'm sure.)

Second, I had more classes than I needed. The secret to writing compact game code is - basically - to write few real classes and rely on plain old data and a large main loop. As it was, my classes were calling up and down some hierarchy, splitting pieces of the main loop into different classes, running a custom scripting language to drive AI, etc. I had all sorts of ill-considered ideas at the time and no real guidance. It wasn't a _tremendous_ amount of code (running a simple count again, 31,145 LOC with whitespace/comments and 22,372 without), and Leo documented what it all did, but the tool couldn't suggest why it was fundamentally rotten; it just added process on top. By the end I wasn't really using Leo, because it wasn't solving my problems.

Looking back on it now, I have a style that can more naturally accommodate a literate programming approach because I'm more likely to write a straight-line solution first. But I would not rely on an external tool again, as I don't want the dependency.

I love the phrase "defending against complexity by adding more." That describes a lot of the tricks we use in everyday programming to defend against complexity.

I also love the juxtaposition of this comment and the other top comment.

> We have an alternate method of understanding how software developed, which is to look at revision control commits.

Yes! If your source code files, version control commits, code review comments on those commits, and bug discussion threads are all cross-referenced in a unified Web interface, many problems just go away. IMO it solves the same problems that literate programming was supposed to solve, but less intrusively and more reliably.

Also I agree that code organization is often a bit overrated. I really like linear "hack hack hack" code, even if it has a bit of copy paste, and dislike highly abstract OO soup. On the flip side, I happen to be fanatical about good naming, which is easier if the code is more concrete than abstract.

Thanks, this was useful.

It depends on your programming style. If it's typical convoluted, constantly refactored OO code - then, yes, literate programming won't help.

If your code consists of a lot of DSLs, clearly separated from each other, each implemented in a small, compact, readable module - then you won't have to change that much in the existing code. In my experience this style is a very good match for literate programming.

It does survive, in a certain sense, in scientific programming and data science. Both IPython notebooks and Rmarkdown are a sort of literate programming, although with the emphasis on the text more than the code. In that setting, the executable artifact is not really more important than the explanation of why the code does what it does, so the extra overhead is justifiable.

Rmarkdown example: http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html

IPython notebook example: http://nbviewer.ipython.org/github/empet/Math/blob/master/Do...

The "notebook" paradigm in Mathematica is another good example; arguably it's as much a part of the experience as the underlying Wolfram kernel.

The Pander R package renders objects into Pandoc's markdown. This allows you to generate a wide variety of output formats, including ConTeXt or LaTeX documents (and subsequently PDF).


I just wanted to note that this isn't some newfangled invention, Mathcad did interactive notebooks in 1986.

I've been using literate programming for 15 years on many projects. The largest and most visible one is Axiom (https://en.wikipedia.org/wiki/Axiom) which currently has many thousands of pages with embedded code.

I've talked to Knuth. He claims he could not have implemented MMIX without literate programming. Literate programming is really valuable but you only understand that once you really try it. I gave a talk on this subject at the WriteTheDocs conference: https://www.youtube.com/watch?v=Av0PQDVTP4A

You can write a literate program in any language, for instance, in HTML: http://axiom-developer.org/axiom-website/litprog.html

There are some "gold standard" literate programs: "Physically Based Rendering" by Pharr and Humphreys won an Academy Award. "Lisp in Small Pieces" contains a complete Lisp implementation, including the interpreter and compiler. The book "Implementing Elliptic Curve Cryptography" is another example.

Suppose your business depends on a program. Suppose your team leaves (they all do eventually). Suppose you need to change it... THAT's why you need literate programming. Nobody is going to be around to remember that the strange block of code is there to handle Palm Pilots.

Companies should hire language majors, make them Editor-in-Chief, and put them on every programming team. Nobody checks in code until there is at least a paragraph that explains WHY this code was written. Anybody can figure out WHAT it does but reverse-engineering WHY it does it can be hard.

Imagine a physics textbook that was "just the equations" without any surrounding text. That's the way we write code today. It is true that the equations are the essence but without the surrounding text they are quite opaque.

Imagine how easy it would be to hire someone. You give them the current version of the book, send them to Hawaii for two weeks, and when they return they can maintain and modify the system as well as the rest of the team.

Do yourself a favor, buy a copy of Physically Based Rendering. Consider it to be the standard of excellence that you expect from a professional programmer. Then decide to be a professional and hold yourself to that standard.

> Consider it to be the standard of excellence that you expect from a professional programmer. Then decide to be a professional and hold yourself to that standard.

Whilst I haven't read Physically Based Rendering, I had the same illuminating experience reading C Interfaces and Implementations: Techniques for Creating Reusable Software by Hanson, which was also written in the literate programming style.

Just correcting the Wikipedia link: https://en.wikipedia.org/wiki/Axiom_(computer_algebra_system...

Aside from that, I tend to agree with you, and I aspire to the same. Unfortunately, laziness and time constraints often get the better of me, though.

> Nobody is going to be around to remember that the strange block of code is there to handle Palm Pilots.

Unless someone puts a comment above it? Literate programming isn't the only method of documenting code.

You're essentially doing LP if you have enough high quality comments. It doesn't really matter what tool you use for this.

No, you really aren't. To the point that if you aren't going out of your way to represent the narrative flow of what you are writing, you are probably not going to see any benefit.

Yes, you are ;-)

The thing is, as I wrote in the other comment, that "modern" (like Lisp, as someone noted ;-)) languages already allow you to structure the code as you see fit. You don't have to define (or even declare) functions before their first use in the file; you can easily extract any part of the code into a function or method and place it anywhere you want. There are many mechanisms available for threading context through functions, which makes direct, textual inclusion simply not needed.

In short: for LP to work you need to organize your code the way you'd like to read it. Many languages are capable enough that they don't need third-party tools for this.

And I assumed that you do it anyway, because there's no real downside to this in many languages. If you don't do it in a language which supports it - shame on you. If you do, then "enough comments" is the only thing you need for your code to begin being literate.
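As a concrete sketch of that point: in a language like Python you can order definitions for the reader rather than the parser, since names are resolved when main() runs, not when the file is parsed. The names below are invented for illustration:

```python
# The file reads top-down like prose: main() first, helpers after,
# even though main() calls them. Python resolves the names at call
# time, so no forward declarations are needed.

def main():
    report = summarize(load_scores())
    print(report)

def load_scores():
    # stand-in for real data loading
    return [72, 95, 88]

def summarize(scores):
    return f"{len(scores)} scores, best {max(scores)}"

if __name__ == "__main__":
    main()
```

No third-party literate tool is involved; the ordering alone carries the narrative.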

I used to feel this way, so by and large I want to agree with you. However, the syntactic affordances of languages don't really do this justice, in the same way that writing an outline of a novel is not the same as writing a glossary to go with it.

Basically, if there are any parts of the program that you would describe differently in conversation, then it isn't the same thing.

Now, I fully grant I am getting close to a no-true-Scotsman argument. I also grant that this is not necessary to write bug-free software. However, I do see them as different things.

> Companies should hire language majors, make them Editor-in-Chief, and put them on every programming team. Nobody checks in code until there is at least a paragraph that explains WHY this code was written. Anybody can figure out WHAT it does but reverse-engineering WHY it does it can be hard.

And those companies will get their lunches eaten by folks who know that you can do well enough simply by throwing bodies at the problem, unfortunately.

Programming occurs in a wide variety of contexts, and different tools and workflows are optimal in different contexts. In the same way that I find interactive programming in Common Lisp optimal for exploratory work in a complicated or uncertain domain, I find literate programming to be optimal for work that requires rigor in a domain with little uncertainty and static requirements.

Literate programming hasn't "taken off" only in the sense that few people are performing the type of tranquil and rigorous work it was made for. Much of the (admittedly difficult) work being done by programmers today is in fact trivial. The difficulty comes from attempting to solve ill-specified or constantly changing requirements by glueing together a constantly changing set of frameworks and tools.

However, I would suggest that even in organizations whose primary responsibility is wrangling a messy soup of ill-defined requirements as fast as possible, there are often sub-problems amenable to literate programming, such as a library implementing a unique and non-trivial algorithm. In such cases, it can be worthwhile to carve out an island of tranquility, clear prose, and rigor, even if it means using slightly different tooling than the rest of the project.

Others have already covered the downsides of comments. One other reason: most developers are terrible, terrible writers.

It's one thing to imagine a profession of literate programming as practiced by Donald Knuth; it's another thing entirely to imagine it as practiced by the kind of people who are actually writing code.

I always thought that people who cannot adequately express their thoughts are fundamentally unfit for any kind of programming anyway.

Because software changes so rapidly.

Literate programming is based on the idea that you should explain what software does and how it does it in detail so that a reader can follow along and learn. That works great for TeX, which hasn't changed significantly since 1982. It works less great for say, Google/Alphabet, which wasn't even the same company last week.

The general problem with documentation is that it gets out of date as the software evolves to fit new requirements. Most real-world products face new requirements on a weekly, sometimes hourly basis; as a result, most fast-growing startups have oral cultures where the way to learn about the software is to ask the last person who worked on it.

"Most real-world products face new requirements on a weekly, sometimes hourly basis; as a result, most fast-growing startups have oral cultures where the way to learn about the software is to ask the last person who worked on it."

The weakness of this approach is that the last person who worked on some piece of code may be that guy who quit last week.

Also, once you have a big code base, there may be parts of the code that are critical (e.g., low-level APIs), but that nobody has worked on in a year.

Also, while startups may have smaller code bases and thus have less problems with these issues, successful startups eventually do become larger, established businesses with huge legacy code bases (think Google or Facebook). At some point, lack of accessible knowledge about how the code works can grow into a massive technical debt. (Been there, done that, regretted it.)

All of those are problems in practice. For some, there are mitigating effects, eg. many companies use code reviews or pair programming to ensure that there are multiple eyes on each section of code, or they rotate out tasks between programmers so that multiple people need to get familiar. For some (eg. the large codebase that got big but has no documentation, and all the original authors have cashed out their options and are sitting on a beach somewhere), it just sucks.

The thing is - being eclipsed by a faster, nimbler competitor who steals your market is a bigger problem. Bad code will make your programmers groan and occasionally threaten to quit, but it usually won't threaten the existence of an organization. A bad product will threaten the existence of an organization. So among surviving companies, code generally tends toward shitty, because the ones that spent a lot of time on code hygiene often went under during the growth & competition phase of the industry.

> Bad code will make your programmers groan and occasionally threaten to quit, but it usually won't threaten the existence of an organization.

You're joking, right?

I have seen companies be "eclipsed by a faster, nimbler competitor" because they have a crappy codebase.

The reason for the implosions was that, "during the growth & competition phase of the industry," they could not grow because new developers really could not ramp up on the crappy, woefully-undocumented codebases, whilst clients left for less-buggy products that were adding new features.

But isn't the point of literate programming exactly that? You keep your code and your docs (design, description, etc.) closely knit. So even if you need to change your software rapidly, you change your code and docs together.

PS: I'm trying to use literate programming in my full-time work, still trying to cross the hurdle.

> keep your code and your docs closely knit

It is hard. Most people I know try to replace comments with very descriptive variable and method names, which are easy to change in any IDE. On the other hand, comments (and other docs) sooner or later turn into misinformation/lies, unless you have very disciplined developers - all of them.

The only thing worse than a big system with no documentation is a big system with documentation that is full of lies and untruths.

Short of hiring people full-time to write documentation, and to ensure that all changes to the program are done in a heavyweight-enough process to ensure that documentation changes match the programming changes, those are your two choices.

A smaller system decomposed into well-defined parts with clear and concise documentation is best of all. Great documentation serves many purposes, including flattening the learning curve for training and enabling qualitative verification of behavior, without burdening readability by mindlessly duplicating boilerplate.

> most fast-growing startups have oral cultures where the way to learn about the software is to ask the last person who worked on it

"Oral culture" - I like that euphemism.

It's not disorganized, confused, poorly-planned, and glaring technical debt, no, it's "culture."

> Most real-world products face new requirements on a weekly, sometimes hourly basis

If your requirements are changing on an hourly basis, it's time to have a serious talk with your project manager.

> Google/Alphabet, which wasn't even the same company last week

Let's not pretend that some corporate moniker shuffling had overnight effects upon their codebases.

> Because software changes so rapidly.

So you don't have time to change your tests or other dependencies?

It's arguably not really a fair use of the term "oral culture" either. Most actual oral cultures placed a high premium on the ability to accurately retain core information (e.g. the whakapapa in Maori) with prestige for being able to convey it well.

You must have a nice cushy well defined corporate job if you think that constantly changing requirements are something unusual.

Changing requirements are quite normal, but changing by the hour is just poor project management, no matter what the environment.

> Because software changes so rapidly.

I imagine you’re right in practice, but one might also ask whether code that is changing faster than understanding is changing too fast. It’s not as if mechanically updating literate-style documentation to reflect a change should be disproportionately expensive compared to making the change itself, or to making corresponding changes to test suites, getting the code reviewed by other people, or getting the changes merged into source control.

Another thing to note, however, is that not only is the code changing too fast, but also that code doesn't always do what the programmer thinks it does.

That is a good point. The reason I was asking this question is my current workflow, which has the following stages:

Stage 1. Read a bunch of papers and blogs for the problem being solved.

Stage 2. Assimilate relevant ideas and come up with some kind of design. This stage usually involves hand written notes, doodles etc.

Stage 3. Write code based on the design.

I currently document stages 1 and 2 in a wiki, but since it lives far away from the code, those things don't get used as much as I would like. Being able to add written reference material in a file along with the code, and then having the ability to see a code-only view, seems appealing from where I am currently.

Since you are doing stages 1 and 2 on a computer system, you could go on and write the code of stage 3 in that very same wiki too!

Then write a spider to scan this wiki, extract the code blocks, and assemble a compilable program, and you're done, wiki literate programming!
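A toy version of that "tangle" spider could look like the sketch below, assuming a MoinMoin-style wiki where code lives in {{{ ... }}} blocks (the page content and delimiters here are invented for illustration):

```python
import re

def tangle(wiki_text: str) -> str:
    """Collect the {{{ ... }}} code blocks from a wiki page and join
    them, in order, into a single compilable source file."""
    blocks = re.findall(r"\{\{\{\n(.*?)\}\}\}", wiki_text, flags=re.DOTALL)
    return "\n".join(blocks)

page = """= Parser =
Prose explaining the design of the parser.

{{{
def parse(s):
    return s.split()
}}}

More prose, then the entry point.

{{{
print(parse("a b"))
}}}
"""

# The prose is dropped; only the code blocks survive, concatenated
# in the order they appear on the page.
print(tangle(page))
```

A real spider would also follow links between pages and fix up ordering, which is where most of the work would be.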

A wiki is a terrible instrument for writing software documentation. For example, there is no revision control to track changes as required by changes in the code, and there is much duplication. I could go on and on about why a wiki is not optimal for any documentation associated with a software project.

Better to use a system like DITA or DocBook.

Which wiki engine do you have in mind? There certainly is revision control in MediaWiki, Confluence, and others. Granted, I wouldn't use those in place of a source code revision control system such as git. There are generally no "annotate"/"blame" or "pickaxe" features. I would agree with writing something code-friendly in the first place, examples of which would include DocBook and reStructuredText.

There's at least one wiki that's built on top of Git:


There's also Gitit, which supports multiple VCSs including git:


Maybe to get the wiki closer to the code, you could put links to relevant wiki pages in your code comments. Some editors even recognize URLs in the code and allow you to click on them for instant access. And if your code repository is HTTP-accessible, you could put links to your code on your wiki pages.

I've been trying to write a JavaScript framework in a literate way for a year or so, here's what I've found:

* I end up deleting most of the prose once the code starts getting good. Good code is sort of literate already.

* As others have said, when you're doing some code churn, it's difficult to maintain a narrative structure that makes sense.

* Existing frameworks, at least for web programming, encourage you to draw boundaries around individual homogeneous chunks of one type of code (router, view, migration) rather than around human conceptual concerns (upload a file, browse my photos, etc.). In order to do a good job explaining what a view does, you need to know what the controller is providing. In order to understand that, you need to understand what the domain layer is doing. Frameworks just make it really hard to put, in one file, everything necessary to explain a concern.

I still believe in the idea, but I think for literate programming to work well it has to be done in an ecosystem where the APIs are all structured for literate programming, which doesn't really exist (yet).

Code itself does not tell the story; it does not give you any background or any non-formalised constraints. There is absolutely no way code can replace proper literate prose.

Programmers often say comments are for describing algorithms -- especially hairy ones, but it's a straw man: The code does what it does, and the comment doesn't enter into it. If anything the comment can confuse because programmers don't usually read code, but can find it easy to agree with the comment so they move on.

I think that the decision to bring code into this world is not one to be taken lightly, and a literate program is simply the story of why interwoven with the implementation of what: Where did -2 come from? When I ifdef around a FreeBSD bug, how will I know when the bug is fixed (and the code can be removed or changed)? And so on.

Literate programming isn't the only "solution" to this problem. Making programs really big by duplicating efforts, and working around workarounds, using Java seems to "work" (for strange values of work), so perhaps this is a blub thing? If your code is "self documenting" then you don't know what documentation is.

That is to say: Literate programming didn't catch on because it was dismissed by the larger number of programmers who could get "things done" who didn't understand literate programming.

The code already explains in full detail _what_ it does; the comment should just tell you the _why_.

TeX benefited from literate programming because it was written in a higher level assembly language.

(Also, it was written by someone who liked to write books, and if a book was about software, he wanted to integrate the writing of the book and the software.)

Better than literate programming is to write code that explains itself. Don't write two things, one of which executes and the other explains it; write an explanation which also executes.

In other words, don't desperately separate and literate; instead, lisperate!

Knuth wrote TeX in Pascal.

Are you talking about MMIX as "higher level assembly language"? MMIX is the (a?) programming language used in TAOCP. For Knuth it was a purely theoretic thing. Other people have implemented MMIX by now.

> Don't write two things, one of which executes and the other explains it; write an explanation which also executes.

I can see how your code can explain the `what' and `how'. How are you going to answer `why' and, even more important, `why not'?

By `why not', I mean a short description of why some reasonable alternative choices were not made - e.g., why we use algorithm A and not B or C.

If the choice is testable, you can write a test case for it. B and C don't work because they break the test case; A is required.

If the choice isn't testable in any way, it's not worth commenting on.

In general, in many programs we can find algorithms that could be replaced by better ones (such that no test cases break). There is no need to explain why that is the case; all the various possible reasons for that are obvious, and it doesn't matter which ones of them are true.
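The "write a test case for it" idea can be made concrete. A hedged sketch in Python, with an invented requirement: suppose items of equal priority must keep their submission order, which silently rules out any unstable sorting algorithm:

```python
# Instead of a comment saying "we use a stable sort on purpose",
# encode the reason as a test. Python's sorted() is guaranteed
# stable, and the test below breaks if someone swaps in an
# unstable algorithm.

def order_tasks(tasks):
    """Order (priority, name) pairs by priority, preserving
    submission order among equal priorities."""
    return sorted(tasks, key=lambda t: t[0])

def test_equal_priorities_keep_submission_order():
    tasks = [(1, "a"), (0, "b"), (1, "c"), (0, "d")]
    # "b" was submitted before "d", and "a" before "c";
    # that order must survive the sort.
    assert order_tasks(tasks) == [(0, "b"), (0, "d"), (1, "a"), (1, "c")]

test_equal_priorities_keep_submission_order()
```

The test documents the choice in a form that cannot silently go stale the way a comment can.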

> If the choice isn't testable in any way, it's not worth commenting on.

Choices about architecture are hard to test in a unit-test sense, but are worth commenting on.

No, code cannot explain itself. No code will ever tell you why this algorithm was chosen, what are the unobvious consequences of this choice, and which papers should be cited.

Interesting to see that people cite code churn as the main reason for _not_ adopting LP. I've argued in the past that LP is especially useful in highly volatile environments:


TL;DR: If the environment is so volatile that the code is permanently broken, reverse engineering doesn't work well. In such a case, the documentation may be your only recourse.

This question was also brought up on reddit a while ago: https://www.reddit.com/r/compsci/comments/1zrujz/literate_pr...

My pet theory is that programmers are driven by a constant feeling of obsolescence: most implementations are incredibly short-lived, so it seems futile to optimize the code for understanding (literate programming) when much less (programming experience, comments, and help from people who know the code base) is good enough. The last point in parentheses is crucial: asking someone who knows the code base is likely a more efficient superset of text, because asking can be selective, nuanced, and individual, while a text just stares back at you.

My previous employer (a subdivision of a global top ten defence company) used literate programming.

The project I worked on was a decade-long piece for a consortium of defence departments from various countries. We wrote in Objective-C, targeting Windows and Linux. All code was written in a noweb-style markup, such that the top level of a code section would look something like this:

    <<Initialise hardware>>
    <<Establish networking>>
and so on, and each of those in turn breaks out into smaller chunks

    <<Fetch next data packet>>
    <<Decode data packet>>
    <<Store information from data packet>>
    <<Create new message based on new information>>
The layout of the chunks often ended up matching functions in the source code and other such code constructs, but that wasn't by design; the intention of the chunks was to tell a sensible story of design for the human to understand. Some groups of chunks would get commentary, discussing at a high level the design that they were meeting.

Ultimately, the actual code of a bottom-level chunk would be written with accompanying text commentary. Commentary, though, not like the kind of comments you put inside the code. These were sections of proper prose going above each chunk (at the bottom level, chunks were pretty small and modular). They would be more a discussion of the purpose of this section of the code, with some design (and sometimes diagrams) bundled with it. When the text was munged, a beautiful pdf document containing all the code and all the commentary laid out in a sensible order was created for humans to read, and the source code was also created for the compiler to eat. The only time anyone looked directly at the source code was to check that the munging was working properly, and when debugging; there was no point working directly on a source code file, of course, because the next time you munged the literate text the source code would be newly written from that.
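For readers unfamiliar with noweb, the code-extraction half of that "munging" (the tangle step) can be sketched in a few lines. This is a toy illustration, not noweb itself; the chunk bodies are invented stand-ins based on the chunk names above:

```python
import re

# A literate source maps chunk names to bodies; bodies may
# reference other chunks with <<name>>. "*" is the root chunk.
chunks = {
    "Initialise hardware": "init_board()",
    "Establish networking": "open_socket()",
    "*": "<<Initialise hardware>>\n<<Establish networking>>",
}

def expand(name: str) -> str:
    """Recursively replace each <<chunk>> reference with its
    expanded body, yielding compilable source."""
    body = chunks[name]
    return re.sub(r"<<(.+?)>>", lambda m: expand(m.group(1)), body)

print(expand("*"))
# init_board()
# open_socket()
```

The real tools also track source positions for debugging and weave the prose into the typeset document, but the core transformation is just this expansion.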

It worked. It worked well. But it demanded discipline. Code reviews were essential (and mandatory), but every code review was thus as much a design review as a code review, and the text and diagrams were being reviewed as much as the design; it wasn't enough to just write good code - the text had to make it easy for someone fresh to it to understand the design and layout of the code.

The chunks helped a lot. If you had a chunk you'd called <<Initialise hardware>>, that's all you'd put in it. There was no sneaking not-quite-relevant code in. The top-level design was easy to see in how the chunks were laid out. If you found that you couldn't quite fit what was needed into something, the design needed revisiting.

It forced us to keep things clean, modular and simple. It meant doing everything took longer the first time, but at the point of actually writing the code, the coder had a really good picture of exactly what it had to do and exactly where it fitted in to the grander scheme. There was little revisiting or rewriting, and usually the first version written was the last version written. It also made debugging a lot easier.

Over the four years I was working there, we made a number of deliveries to the customers for testing and integration, and as I recall they never found a single bug (which is not to say it was bug free, but they never did anything with it that we hadn't planned for and tested). The testing was likewise very solid and very thorough (tests were rightly based on the requirements and the interfaces as designed), but I like to think that the literate programming style enforced a high quality of code (and it certainly meant that the code did meet the design, which did meet the requirements).

Of course, we did have the massive advantage that the requirements were set clearly, in advance, and if they changed it was slowly and with plenty of warning. If you've not worked with requirements like that, you might be surprised just how solid you can make the code when you know before touching the keyboard for the first time exactly what the finished product is meant to do.

Why don't I see it elsewhere? I suspect lots of people have simply never considered coding in a literate style - never knew it existed.

It forces a change to how a lot of people code: big design, up front. Many projects, especially small ones (by which I mean less than a year from initial idea to having something in the hands of customers), in which the final product simply isn't known in advance (and thus any design is expected to change, a lot, quickly), are probably not suited; the extra drag literate programming puts on iteration would lengthen each cycle.

It required a lot of discipline, at lots of levels. It goes against the still-popular narrative of some genius coder banging out something as fast as he can think it. Every change beyond the trivial has to be reviewed, and reviewed properly. All our reviews were done on the printed PDFs, marked up with pen, front sheets stapled to them listing comments which the coder either dealt with or, in discussion with the reviewer, agreed to withdraw.

A really good day's work might be a half-dozen code reviews for other coders, touching your own keyboard only to print out the PDFs. Programmers who gathered a reputation for doing really thorough reviews, with good comments and the ability to critique people's code without offending anyone's precious sensibilities (we've all met them; people who seem to lose their sense of objectivity completely when it comes to their own code), were in demand, and it was a valued and recognised skill. Being an ace at code reviews should be something we all want to put on our CVs, but I suspect a lot of employers basically never see it there.

I have definitely worked in places in which, if a coder isn't typing, they're seen as not working, so management would have to be properly on board. I don't think literate programming is incompatible with the original agile manifesto, but I think it wouldn't survive in what that seems to have turned into.

First, thank you for taking the time to write that all out; that is the best summary of literate programming I've read.

It sounds very much like an intellectual over an experiential pursuit. Your code was scripted (like a play) with stage blocking directions, sets, costumes, and musical accompaniment, whereas the way I have most often worked has felt more like improv: get the right people in the room and their talents and interactions will produce what we need as we move forward.

For both methods it seems like it still requires a HIGH level of skill to produce something great. My question is: does Literate programming and all its associated documentation give a process that prevents bad programming from happening?

In the 'improv' way of making it up as you go along, if you have low skill you output low value. What would result from a low-skill programmer thrust into a literate programming product?

Thanks so much for this detailed response. I played with literate programming for a bit and I had a similar experience: lots of extra overhead to get things properly documented. Unfortunately, (as nostrademons put it in an earlier comment) if we tried this at our web shop our competition would eat our lunch.

Thanks for the detailed response!

It did catch on, just in a different form. Our web shop is fairly typical of that. We do this with a combination of the Tomdoc (tomdoc.org) documentation specification plus Rubocop (github.com/bbatsov/rubocop) and peer review for enforcement. We use rubocop to ensure that methods are short and simple, and every such method that is public is accompanied by a tomdoc section. The result is that there are about twice the number of lines in our code file that are targeted at humans than computers. We generate separate documentation by extracting the tomdocs that has proven useful in explaining the code to the uninitiated.

However, to the initiated it's mostly in the way. With short, simple methods I find it easier to decode unfamiliar code by reading the methods than by reading the tomdocs. Like many of my cow-orkers I've configured my editor to hide the tomdocs in normal use. Writing tomdocs is a chore to be done immediately before delivering code for peer review.

That's probably the biggest weakness of literate programming: it isn't for the programmer responsible for that code, or even for that programmer a year later when she has forgotten much of it. It's more for the developer or designer who isn't fluent in the language but still has to interact with it. Since the value to the actual coder is distant and indirect, while the work of producing it is immediate, it tends to be an early omission under any kind of stress. In our case it would disappear if not enforced by peer review.

Literate programming tries to make every line of code traceable to English (or some other natural language). It's as hard as writing the same program twice in two languages. Perhaps harder: one is for programming a modified lump of sand; the other is for programming humans. The latter is a lot harder to do well than the former.

Then the question becomes: which one is correct? Maybe it's just easier and cheaper to write in the language the lump of sand understands and be done with it.

You don't get it, Nick - that lump of sand doesn't understand your code at all.

It only understands the machine code emitted by the run-time engine (assuming Java/Python etc here).

The language you write in has been developed, at enormous expense, to allow you to express your logic, in a way that you, a human, can understand.

If it were not necessary for you (or other developers) to understand the code, then high level languages would not be needed.

In short, you have got it exactly barse-ackwards: the programming language you use is for humans - and only for humans.

The only reason it is apparently "hard" has nothing to do with computers, and everything to do with the inescapable fact that correct, consistent and reproducible logic patterns are hard.

I respect Donald Knuth very highly, but in that literate programming thing he is wrong - it is the code itself that must be clear and readable. Accuracy comes second, efficiency is third and 'elegance' is dead last. IMHO.

I think you make a few too many assumptions about what I know. Why do you assume that because I said the programming language is actionable by a modified lump of sand, I believe it isn't also communicating to the programmer? Its first goal is the latter. But until English can be executed by a computer, you must use programming languages to get the lump of sand to do stuff.

> Maybe it's just easier and cheaper

Easier, yes. Cheaper, no - with LP learning a new codebase takes much less time than usual.

As someone else noted, programmers tend to be less than proficient writers (to say the least). That's unfortunate. Someone else mentioned that he sees value in LP in the context of education; that's because LP does make understanding the code both easier and faster. If it's good in education, why would it be bad in "normal" programming?

Instead of declaring LP as "too hard for normal people to use" we should try to teach people to be better writers. It's not necessarily one or the other, you can easily learn how to write prose (LP texts) along learning how to write code.

The only real problem with LP (besides lack of relevant skills in programmers) is ensuring that comments and code are in sync. But, how is it worse than the situation with documentation we have now? If anything LP makes it easier to keep docs and code in sync: they are in a single place and you can alter both simultaneously.

Of course, LP is a fuzzy concept. For example I call a certain style of writing comments LP, while others say that you need to have a tool like CWEB and the like to do LP. What's important is the push towards making the code more understandable by humans. There are different techniques to this end and LP is one of the more powerful among them.

> Easier, yes. Cheaper, no - with LP learning a new codebase takes much less time than usual.

Unless the program differs from the LP work. Then which is correct, the LP work or the program? It's been tried many times. It's been called the Rational Unified Process. It's been called Roundtrip Engineering. Not worth the extra cost.

There are two parts to literate programming. One is the style of commenting and structuring the code, which makes it easy for humans to follow. The other part is the tooling you use when you want to write literate programs, which includes what syntax you use for defining blocks and how do you tangle/weave your code.

There are many tools which support Literate Programming for many different languages. The usual reservation about additional tools applies: each member of a team needs to have it installed, build times get longer, time for a new programmer to start contributing gets longer and so on. It makes many people never even consider LP.

But, it's important to remember, that the tools you use are just an implementation detail. What's important is an idea that source code should be written to be read by humans, not only for computers to execute.

Sometimes there's a need to go over a piece of code with a colleague who doesn't know it. It happens a lot when a new programmer joins a project that already has some code written. This is because to understand the code you need to know the context it was written in, and programmers often don't document this context enough (or even at all). This means that reading the code is essentially reverse-engineering someone's thought patterns. It's much more efficient just to sit next to the person and assist him in reading the code by providing a live, context-aware commentary.

LP is "just" that commentary written directly in the code. What's important is that you don't need any special tools to use this style in your comments if your language is flexible enough. Most dynamic languages nowadays are capable of supporting the LP style. Don't take my word for it, see for yourself. You can just go and read some LP code. A couple of examples:

As you can see, it's plain JS (or CS). The pages were created with the "docco" tool, but you can read the source with most of the same benefits (excluding rendering of maths symbols, which docco doesn't support anyway).
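For those who haven't seen that style: prose comments sit above small runs of plain code and become the rendered narrative. A hypothetical fragment in the same spirit, using Python (docco has ports such as pycco; the functions below are invented for illustration):

```python
# **Parse one config line.**
# Each line is `key = value`; blank lines and `#` comments are skipped.
# In the docco style this comment block is the prose column, rendered
# next to the code, and the code itself stays ordinary.
def parse_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

# **Parse a whole config string.**
# The narrative above each tiny function is the "literate" part; no
# special tool is needed to write it, only to render it prettily.
def parse_config(text):
    pairs = (parse_line(l) for l in text.splitlines())
    return dict(p for p in pairs if p)

print(parse_config("host = example.org\n# comment\nport = 8080"))
# → {'host': 'example.org', 'port': '8080'}
```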

To sum up: LP is not dead, it just changed its form a little. Many people adopted (or invented independently) this style of code structuring and commenting. Many real-life codebases are written in a literate style because it makes a real difference on how long it will take for someone new to grok the code. Such codebases use docstrings and other mechanisms available in the language to do exactly the same thing that previous LP implementations did.

IMHO the examples you cite, such as the underscore example, have very little to do with any meaningful definition of "Literate Programming".

While modern languages like JavaScript (or lisp ;-) allow for a very free structure, and ordering of the code in accordance with how comments might fit, or the thought process that went into the design -- that's just proper programming style: proper comments, sane structure.

And I'm not convinced it does much to make sure the narrative keeps up with the code, as the code changes.

The meta-programming that (no)web enables for C/Pascal can be very helpful, as one might have code structured around loops (an algorithm that deals with an iterator, i, and an array, a) -- code that might see re-use in places where the functionality doesn't lend itself (in C/Pascal) to re-use as a function/procedure/module. But as in-lining has become more and more mainstream, and micro-optimizations have become less and less needed, this kind of meta-programming -- the introduction of named code blocks to languages that otherwise don't have them -- has become less interesting.

I think a better example of "evolved" or "modern" LP is Python doctests[1]. It's just a small example, but it is a way to tie commentary tightly to function, and allows for a "here's what this code does" comment that can actually be used for testing.
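To make that concrete, here is a minimal doctest (the function itself is invented for illustration): the examples in the docstring are both documentation and executable tests.

```python
def rle(s):
    """Run-length encode a string into (char, count) pairs.

    The examples below are documentation that doubles as tests;
    ``python -m doctest thisfile.py`` checks them automatically.

    >>> rle("aaabcc")
    [('a', 3), ('b', 1), ('c', 2)]
    >>> rle("")
    []
    """
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

# Verify the docstring examples in-process; silent when they all pass.
import doctest
doctest.testmod()
```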

But even doctests are a pale shadow of what LP offers. On the other hand -- some of the benefits in terms of structure/folding is also provided by powerful editors/IDEs.

I still think LP can have its place, especially when writing code meant to be read as a teaching tool, like:


LP often lends itself well to a top-down structure, regardless of any limitations in the underlying programming language (or common concessions to efficiency, manual in-lining etc). To a certain extent, one could/can/sometimes-should write C programs like:





But frequently it doesn't make sense to keep the actual sources quite like that -- if nothing else because there might be platform ifdefs, various libraries in use etc.

[1] https://docs.python.org/2/library/doctest.html

I always liked the idea, but it seemed too indirect to me. Software is hard enough as it is, without adding yet another hurdle to get from brain to .exe. IDEs are probably the best middle ground, as they "know" enough about your code to help you find the parts you want.

Besides, literate seems to go against the current view of overcommenting as an anti-pattern.

Software is hard enough as it is, without adding yet another hurdle to get from brain to .exe

In my experience, it ultimately made it easier (see my big comment somewhere else here). It wasn't an obstacle to getting it done; it forced me to do what I really always knew I should - think about the design, make sure it made sense.

Your comment also suggests that you might have missed a key point; it's not for you. It's for other people (and also yourself in the future - I have certainly come back to my own code a year later and wondered what the hell I was doing). It's to help them understand what you've done; a small amount of extra burden on you now to remove a large burden on other people in the future.

It made the design modular and easy to follow, without having to read the code to work out what each section was doing. When I wanted to understand someone's design, I read the design bits. When I wanted to dig into the actual code doing something, I followed the chunks down from the design level into the exact section of code meeting that part of the design.

> Your comment also suggests that you might have missed a key point; it's not for you. It's for other people (and also yourself in the future

Most of the code you write over the years will be thrown away, sometimes before it's even released. So who is that text or literate comment for? :)

Most of the code you write over the years will be thrown away

Ignoring the final truth that of course everything anyone ever makes will one day be discarded, that's not the case in some industries.

Additionally, we had inspections. Every so often, the customer (or rather, the alliance of customers) would send someone. He would pick a handful of requirements, and would then ask to see the complete chain from there; the breakdown of requirements from their level to the more functional level, the design that purported to meet those requirements, the code that implemented that design, the tests of that code, and then the tests of the original top-level requirements. We would literally get the relevant envelopes from QA, open them up and give him the paperwork. Everything signed, stamped, cover-sheeted and ready. In some industries, this kind of traceability is required and if you don't do it properly up front, trying to rebuild that trace afterwards is extraordinarily expensive.

See my other big comment here somewhere; in that case, the majority of the code was written once, delivered, and will be in use for a few decades. It was generally written correctly at the first iteration; the "literate" bit, which was the discussion of design etc., generally didn't change, so even when bugs had to be fixed, only the code changed to more closely meet that design (i.e. take out the bug) - the design was still the same, so the literate bit was still valid.

It is common in some styles of programming (and/or some industries) to effectively plan in advance that you'll write a dozen bad versions that don't work very well, and only then write it correctly; or you don't quite know what you're making (be it because the requirements just aren't available properly, or because you're not building to set requirements but just trying to make something that might sell), so you make something and keep bolting bits on as you think of new things. But there are other styles where the aim is to get it right the first time.

Besides, literate seems to go against the current view of overcommenting as an anti-pattern.

I don't have a dog in this fight one way or the other, but something about this statement irks me. I can't help but think "so what?". That is to say, Literate Programming either is a good idea, or it isn't. The answer to that question is orthogonal to "the current view" of whatever. Communities have had majority-held opinions which were wrong on plenty of occasions. If one is going to question whether or not to use LP, shouldn't they analyze it on its own, and not simply accept pre-existing biases?

The question was "why didn't?" This was my speculative guess.

It seems plausible to me (total guess again) that people who consciously minimize comments would not be inclined to literate programming, since if you're consciously minimizing comments, what's left to literate anyway? My guess.

And no, I don't have a dog in there either, not the least because I've never seen anyone use literate programming in the same building I was in.

The question was "why didn't?" This was my speculative guess.

Aaah, sorry, guess I thought you were saying something that was more of an actual value judgment.

And no, I don't have a dog in there either, not the least because I've never seen anyone use literate programming in the same building I was in.

Yeah, same here. I got as far as buying the "Literate Programming" book, and I have it in a pile of books around here somewhere, waiting to be read. But that's as far as I've ever gone with the whole deal.

Whenever I'm on a team and I get the opportunity to do code reviews, I strongly encourage it to reduce the Bus Factor

Just being pedantic here, but the Bus Factor is something that a benevolent manager would like to increase, right? :)

Haha :)

I'm slowly and intermittently reading rands's book Managing Humans.

The only large project using literate programming that I am aware of is Axiom, a symbolic math system written in Lisp. From their documentation it sounds like it uses literate programming and would be a good resource.

However, a modern version of literate programming is catching on in scientific fields based on Jupyter (IPython) Notebooks. It allows running simulations and embedding results. It's fantastic for exploratory work. The main downside is transitioning code from notebooks to longer term libraries or full applications can be somewhat tricky. Here's a good write up of notebooks and science: http://peterwittek.com/reproducible-research-literate-progra...

The best example I have seen out in the wild is the code for the renderer pbrt. In fact it has me contemplating whether I should attempt something similar for my code base.

I find that literate programming is something of a code smell. If a program is so complicated that it requires that much commenting, something went wrong during the design process. The program should be the most concise and clear description of what is going on. That is the point of writing a program, to describe the problem to other developers.

This is just me, but when I read through the literate programming book, and the hundred or so pages of literate code resulted in a program that could be replaced with a line of bash script, I decided that writing the program succinctly was more important than writing a novel to accompany it.

Uhm, I write literate code exactly because my code is very compact so I can afford to dilute it with an equal amount of prose. And, yes, I still prefer to print the whole thing and work with a paper rather than in any IDE.

> I write literate code exactly because my code is very compact so I can afford to dilute it with an equal amount of prose.

That I can get behind. My concern with the example in the book was, it didn't matter how good the prose was, the program would be nowhere near as understandable as the one line of bash, just due to there being 0.1% as much code to understand. And I realize that it's not a meaningful knock against literate programming as a whole, it was a bad example that soured the idea in my head. It made me more concerned about succinct, understandable code, rather than seeing the prose as sufficient.

I've seen a lot of Literate Haskell in tutorials and course materials, but not really ever in production code. It's too cumbersome to work with when comments are the default and every line of code has to start with '>'.

1) It ain't dead yet.

2) Tooling/syntax is a big part of it. I like literate programming, but most syntaxes for it turn me off. Some of it also seems very tied to particular languages. None of that helped for a concept that was always going to be a hard sell.

3) I wrote, and am refining, a version of literate programming: https://github.com/jostylr/literate-programming-lib

It uses markdown where headers are the section headers, code blocks in markdown are the code blocks to sew together, and _"header name" is the block syntax. It has a bunch of other features, but that's the core.
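Based on that description, a source file might look something like this (the section names and code are illustrative guesses at the syntax, not taken from the tool's documentation):

```markdown
# Main

    _"Load settings"
    _"Start server"

# Load settings

    settings = parse(read("app.conf"))

# Start server

    serve(settings.port)
```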

My hope is that this might eventually help this style to catch on in some quarters.

4) I am just a hobbyist programmer, but what I enjoy about it is the ability to organize my code in any fashion I like. In particular, I can keep chunks to a screen size. Brackets never go beyond a page. And I can stub out anything I like real easy.

Now in many programming languages, small chunks are easy to achieve in terms of functions or objects or modules or even just variables. That is, one of the most immediately useful parts of literate programming is already implemented in a good-enough version, particularly with good naming conventions and decent commenting. And good enough is what keeps many from using optimal practices. Or rather, and this is important, optimal from one perspective; e.g., literate programming can easily rub up against "no premature optimization".

On the other hand, I like not using functions for code management. I want functions to be used when a function is really needed. But that's just my preference. I also like being able to see the whole construct in the compiled code put in one place instead of having to trace it out through function calls. But I have never been big on debugging tools; if I was, this would probably be less of an issue.

5) Updating does tend to be a problem in that with the excitement of a new feature or a bug fix, it is real easy to leave bad documentation there. But that would be true of any documentation. Here at least one can quickly look at the code and see that it seems off.

6) One key feature that I like about my system is the management of multiple files -- really, a kind of literate project management. I do not know if the other systems did that. This is a game changer for me. When I open a foreign code base, I have a hard time knowing where to start. In any particular file, I can see what is going on, but the global perspective is what is missing. Literate project management can tell you what all these files do, handle pre- and post-compile steps (linting, concatenating, minimizing, testing, dev vs production branching, etc.), and allow a true global organization of where you want the code to be. You can have all the code, styling, and content for a widget in one place. Or not. There are no constraints on your organization, and that is awesome.

It is also a downside. Constraints are powerful tools and when you have a system that allows you to do anything, it can lead to severe issues, particularly (ironically), of communication. I could see teams benefiting greatly from this if they agree on organizational principles and if not, I can see this exacerbating the conflicts.

7) The hurdle to get over is "Is it going to make it quicker to get my code done?" And I don't think previous tools have done this. I am hoping my tool will do it for the web stack. That stack is a mess and we need organization for it. For other programming languages and tasks, I don't think this is as glaring a need. It often feels a lot like just reversing the comments, as literate CoffeeScript seems to be. In the hands of a master such as Knuth, a literate program is a gem of wonder. But for most, including me, it is a different beast and may not be that pretty.

8) Programmers may have an irrational fear of writing. As a mathematician, I see many adults, even programmers, fear math for no reason whatsoever. The same may be true of prose, at least sufficiently so to discourage the desire to use this. Note that I think they could do it, but that they think they cannot. But I am an optimist.

Not every developer speaks English. Especially on OS projects.

While reading English might be OK, writing is a different story.

There are marble towers and there are dirty trenches. Only a few things are efficient and matter in the trenches. Putting beautiful marble on the trenches' walls just isn't one of them.

First of all, what do we think is literate programming?

* Is it just interspersing documentation with the code?

* Or is it the idea of writing code in the style of communication?

There is a subtle difference. The former is writing code first, then writing documentation to explain the code (without changing the code significantly). In the latter, one may still write code, but he is not writing the code just to get the computer working; rather, he is writing code first and foremost to communicate (to humans).

In the first style, the program, stripping away documentation, is pretty much working code as is. Yes, in much so-called literate programming the documentation is ready to be compiled into pretty web pages or PDF, but it is just pretty documentation. In the second style, the code is, to a large extent, rearranged (to the computer, scrambled) due to the need of expressing ideas to humans. Often a complex parser program is needed to re-arrange the code into a computer-acceptable form -- such is the case with Knuth's WEB.

Maybe I am over guessing, but I think many readers are only thinking in the first style (documentation+code) when they are commenting on literate programming.

And maybe my interpretation does not even fit Knuth's original idea of literate programming, but in my interpretation, the ultimate literate programming is just code, in the style of written language, arranged like a book, readable as is by most literate readers (who possess the basic background -- knowledge of the problem domain and of how computers run, in addition to a basic vocabulary), and, with a compiler, translatable into a formal programming language or machine code and run directly by the computer. I find Knuth's example -- writing two versions of the program (the code and the documentation) -- a compromise due to the lack of compiler power, and impractical: who, after spending so much effort getting the code written and debugged and barely working, still has the energy to write an article explaining it -- especially with no apparent readers in sight?

EDIT: In a high-level view, there is just one code, but two groups of readers/listeners -- the human group and the machine group. In the first stage, computers are limited in power and parsers/compilers are dumb, so the code has to cater to the machines, and human readers have to stretch themselves to read the code (in so-called programming languages). In the next stage, everyday computers are powerful enough to take in more background knowledge (common vocabulary, idioms, styles, some common algorithms, scientific facts, international conventions, or individual/group/corporate conventions), and this stage will allow code to be written on some sort of middle ground. The final stage is of course AI that can understand most human communication. I think we are ready to enter the early second stage -- there are capabilities and early signs -- while most people's minds are still firmly stuck in the first stage.

> who, after spending so much effort get the code written and debugged and barely worked, still have the energy to write article to explain it

Professionals who care about long-term success of a project?

Most of what I see in this thread are just excuses, born out of either ignorance or laziness. Or both. The same arguments that were used against adopting higher level languages instead of asm.

The truth is, as sklogic says in this thread, the code won't ever tell you the whole story. You're not writing code in a vacuum, you have Jira tickets, requirements, mockups and so on to help you. Yet people throw these away and leave only the code, as if working with code alone was what they do and what should be done.

I wish programmers stopped being like that. They should just learn how to write ok - not great, just ok - prose and get it over with. It's so frustrating to see programmers reject good ideas because of laziness and dumb stubbornness. I'm starting to believe that it's just the human nature at play here. Makes me hate humans even more than I already do.

It did. It's called culture. You just didn't realize all the ways it programmed your biases and perceptions.


Because it was so successful.

Computer programming, by contrast, is so explicit and obvious. The subtlety is lost. And with it, our humanity ( and ability to be subtly manipulated by forces beyond our comprehension and below our level of conscious perception ! ).

Culture is the original literate programming language. Instruction pointer for the collective mind. Virtual machine of choice ? Books. And your reading of them.


The fewer comments the better. Code is constrained by the programming language, since it must be executable. But there are no such constraints on prose. While Knuth may adorn his code with extremely illuminating prose, unfortunately the same is not true for many other programmers.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact