
Literate programming: presenting code in human order - cab1729
http://www.johndcook.com/blog/2016/07/06/literate-programming-presenting-code-in-human-order/#.V32uxZvDTmQ.hackernews
======
jarmitage
Peter Norvig's comment hits the nail on the head for me:

" Peter Norvig 6 July 2016 at 11:47 I think the problem with Literate
Programming is that it assumes there is a single best order of presentation of
the explanation. I agree that the order imposed by the compiler is not always
best, but different readers have different purposes. You don’t read
documentation like a novel, cover to cover. You read the parts that you need
for the task(s) you want to do now. What would be ideal is a tool to help
construct such paths for each reader, just-in-time; not a tool that makes the
author choose a single path for all readers."

Has anyone attempted something like this?

I've heard you can do transclusion in org-mode which might be a starting point

Edit: Some initial ideas:

- Code can be deconstructed into blocks, and where the code is not
self-documenting, prose can be added. Or even visualisations etc if a block is
conceptually tricky to grok. You could even have MOOC-style validations to
verify reader understanding for each block.

- Some kind of topology of the blocks should be generated based on how they
interact and how they are conceptually related

- The author creates a few 'starting points' for different audiences, e.g.
'if you've used X before, start at Y'

- From there next blocks to read are auto-suggested to the user. A map or
network diagram of all blocks is also provided so the reader can chart their
progress and see where the 'big ideas' lie.
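
A minimal sketch of how the block topology and suggested reading paths might look, in Python. Everything here is hypothetical: the block names, the `PREREQS` table, and both functions are made up for illustration, and the sketch assumes the prerequisite graph has no cycles.

```python
# Hypothetical blocks of a small program. Each entry lists the blocks a
# reader should have seen first. All names are illustrative only.
PREREQS = {
    "overview": [],
    "data_model": ["overview"],
    "parser": ["data_model"],
    "renderer": ["data_model"],
    "cli": ["parser", "renderer"],
}


def reading_path(target, known=()):
    """Return an order in which to read blocks so that `target` makes sense.

    `known` lists blocks the reader already understands (their starting
    point); those blocks and their prerequisites are not revisited.
    Assumes PREREQS contains no cycles.
    """
    path = []
    seen = set(known)

    def visit(block):
        if block in seen:
            return
        seen.add(block)
        for dep in PREREQS[block]:
            visit(dep)
        path.append(block)  # emit a block only after its prerequisites

    visit(target)
    return path


def suggest_next(read):
    """Suggest unread blocks whose prerequisites have all been read."""
    read = set(read)
    return [block for block, deps in PREREQS.items()
            if block not in read and all(dep in read for dep in deps)]
```

A reader who already knows the data model would call `reading_path("cli", known=["data_model"])` and be sent straight to the parser and renderer, while a newcomer would get the full path starting at the overview.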

Edit 2: This also really reminds me of Bret Victor's 'Humane Representation of
Thought' lecture ([https://vimeo.com/115154289](https://vimeo.com/115154289),
45:40) where he says there is a conflict between code being an engineering
specification and an authored work meant to be read by humans.

~~~
EdwardCoffin
I don't understand why there not being a _single_ best order of presentation
for explanation is grounds to dismiss the whole endeavour. I bet that there
are many _pretty good_ orders for humans that are an improvement over the
order the compiler wants the stuff in. Besides, the compiler doesn't demand a
single best order, most programs could be re-ordered in a number of ways
acceptable to the compiler (my point being that this criticism seems to hold
literate programming to a standard that the incumbent of compiler-required
ordering is not held to.)

Edit: added the parenthetical comment

~~~
jarmitage
Yeah I agree. What I would imagine a tool like this would do is help to find
and navigate the pretty good order(s) whether they have been hand crafted at
first by the author or not.

A comparison would be how desire paths form and are often turned into real
paths.

I like your parenthetical observation too!

~~~
EdwardCoffin
There was an interview with Guy Steele in the book _Coders at Work_ , and in
one bit he had something to say about actually referring to the (literate)
source code for TeX [1]:

 _Sometimes I've got a specific goal because I'm trying to solve a problem.
There have been exactly two times, I think, that I was not able to fix a bug
in my TeX macros by reading_ The TeXbook _and it was necessary to go ahead and
read_ TeX: The Program _to find out exactly how a feature worked. In each case
I was able to find my answer in 15 minutes because_ TeX: The Program _is so
well documented and cross-referenced. That, in itself, is an eye-opener--the
fact that a program can be so organized and so documented, so indexed, that
you can find something quickly._

 _The other thing I learned from it is how a master programmer organizes data
structures, how he organizes the code so as to make it easier to read. Knuth
carefully laid out_ TeX: The Program _so you could almost read it as a novel
or something. You could read it in a linear pass. You'd probably want to do
some jumping back and forth as you encountered various things. Of course, it
was an enormous amount of work on his part, which is why very few programs
have been done that way._

[1] link to the page (337) in Google Books
[https://books.google.ca/books?id=2kMIqdfyT8kC&pg=PA337](https://books.google.ca/books?id=2kMIqdfyT8kC&pg=PA337)

~~~
jarmitage
That's a great anecdote and makes me want to read TeX: The Program.

I really think there's something in the 'jumping back and forth' of the reader
that could be leveraged, like if there was a way of recording, distributing,
aggregating, analysing it etc.

~~~
EdwardCoffin
It doesn't add that much to the above, but the now defunct Bookpool had a
column where they asked various authors for lists of their favourite 10
computer books, and they asked Guy Steele [1]. His first entry was:

 _Computers & Typesetting, Volumes A-E Boxed Set by Donald E. Knuth -- I'll
read anything that Don Knuth writes, and he has written quite a bit, including
Surreal Numbers and 3:16 Bible Texts Illuminated as well as the famous Art of
Computer Programming series. But this five-volume set is my favorite. The two
volumes titled (and sold separately) as The TeXbook and The METAFONT book are
well-known, but what I really recommend to you are TeX: The Program and
METAFONT: The Program, because these are simply the best-written, best-
documented, best-debugged programs of their size ever published. They reward
careful study._

I've tried to read the source for _TeX: The Program_ (I actually have the
boxed set he mentions), but was impeded by my lack of knowledge of TeX. I keep
meaning to get back to it.

[1] Guy's list in archive.org:
[http://web.archive.org/web/20080422183814/http://www.bookpoo...](http://web.archive.org/web/20080422183814/http://www.bookpool.com/ct/184)

------
midgetjones
Literate programming is a thing that's long piqued my interest, but it usually
seems more suited to academic or personal use.

Having said that, a beautiful example of LP in the wild is geom[0], a
Clojure/Clojurescript library written entirely in org-mode. Browsing through
the source, I have to keep reminding myself that I'm looking at the code and
not just the documentation.

[0] [https://github.com/thi-ng/geom](https://github.com/thi-ng/geom)

~~~
vog
I see that this is an honorable goal, but in this particular project the source
looks more like source than documentation. An extreme example is this part,
where the code even contains function docstrings, rather than having the
docstring as part of the surrounding document:

[https://github.com/thi-ng/geom/blob/master/geom-
core/src/mat...](https://github.com/thi-ng/geom/blob/master/geom-
core/src/matrix.org#view-matrix-generation)

This is an effect which I observed with my own literate programming attempts,
too: They quickly end up with large documents that contain lots of code
listings and not much explanation.

~~~
toxmeister
Author here: Agreed & most parts of this particular project aren't really the
best example. With other (newer) projects I embraced the LP style much more,
and in fact by now really often first write the prose parts to get my head
clear about a certain aspect and only then start writing code, e.g.

[https://github.com/thi-ng/fabric/blob/master/fabric-
facts/sr...](https://github.com/thi-ng/fabric/blob/master/fabric-
facts/src/dsl.org) [https://github.com/thi-
ng/geom/blob/develop/src/viz/core.org](https://github.com/thi-
ng/geom/blob/develop/src/viz/core.org)

As for the docstrings included in the core, this is largely to address
complaints from other users, who are used to having docs available in the REPL,
and I often don't have the resources/time to provide both org-mode and
docstrings...

------
matt4077
It would help if editors stopped treating comments as second-class content.
Most Atom themes, for example, render them as I would render unused code (if I
were to commit such atrocities) – grey on (light|dark) grey.

I wish there were a functioning plugin to render markdown inline. I bet people
would start caring about comments just because they looked better.

~~~
akkartik
One thing I've been doing for a few years now is minimizing syntax
highlighting for code, and using some of the colors I save to have different
kinds of comments with different colors:
[http://i.imgur.com/vU783Xo.png](http://i.imgur.com/vU783Xo.png)

Here's the initial idea that led to this:
[http://akkartik.name/post/2012-11-24-18-10-36-soc](http://akkartik.name/post/2012-11-24-18-10-36-soc).
As you can see, it was in turn spawned by.. a comment on HN :)

------
jkot
Natural language, or "human order", is a pretty cumbersome way to express
ideas. It is full of double meanings, assumptions and expectations.

Any attempt to express complicated and precise ideas in natural language
results in hard to understand gibberish. And you will need special training to
decipher it anyway. Law is a good example.

I think programming languages as we have them are good for instructing
computers.

~~~
bluejekyll
Programming languages were designed for people; compilers are designed to
interpret that language into instructions for the computer.

Natural language is horrible for a programming language, but don't forget that
the reason we write code in higher level languages is specifically to make it
easier for humans, not for computers.

------
jashmenn
My latest attempt at this problem is a tool I call `cq` [1]. It's a way to
query code with a selector, rather than line numbers. I use it for blog posts
to pull code chunks into markdown.

It lets you write a (static) document that mixes prose and code in a way that's
best for the reader. But 1. the code is runnable on disk and 2. you don't have
to copy and paste.

Generally I find that literate programming solutions don't have good support
for hiding portions of the code. This tends to limit the tool's applicability
to trivial code examples.

I use `cq` to extract key code blocks from complete projects and then provide
the whole project alongside the post.

[1]: [https://github.com/fullstackio/cq](https://github.com/fullstackio/cq)
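
To illustrate the general idea of pulling code into a document by name rather than by line number, here is a small Python sketch using the standard `ast` module. This is not `cq`'s selector syntax or implementation; the `extract` helper and the `SOURCE` sample are assumptions invented for the example.

```python
import ast

# Sample file contents; in practice this would be read from disk.
SOURCE = '''\
import math

def area(radius):
    """Area of a circle."""
    return math.pi * radius ** 2

def perimeter(radius):
    return 2 * math.pi * radius
'''


def extract(source, name):
    """Return the source text of the top-level function or class `name`.

    Hypothetical helper: selects by identifier instead of line numbers, so
    the excerpt stays correct when unrelated lines are added or removed.
    """
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.parse(source).body:
        if isinstance(node, wanted) and node.name == name:
            return ast.get_source_segment(source, node)
    raise KeyError(name)


snippet = extract(SOURCE, "perimeter")
```

Because the lookup is by name, the excerpt in the blog post keeps tracking the function even if the file around it is reorganised.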

------
js8
It seems to me that well-documented code should follow some sort of power law:

Each line should be documented (best if self-documented).

Each function (of around 10 lines) should be documented.

Each group of functions (of around 100 lines total) - approximately a class -
should be documented.

Each module (of around 1000 lines) should be documented.

.. etc. up to the final documentation about the whole program.

Each level of documentation should summarize the purpose, inputs, outputs,
assumptions and architecture of the thing being described. So a higher level
should be around 10x smaller part of the total documentation (of course, some
other factor could be used).

------
EdwardCoffin
There was a fantastic set of panel discussions at MIT on dynamic languages in
2001. In the Q&A period of the panel on runtime [1] Guy Steele asked the panel
what they thought about literate programming [2], and some good discussion
ensued. One point made that I particularly liked was that people generally
don't even make the effort to keep comments up-to-date, much less write them
well in the first place, never mind think about a good order for the human to
read them in.

[1] [https://www.youtube.com/watch?v=4LG-
RtcSYUQ](https://www.youtube.com/watch?v=4LG-RtcSYUQ)

[2] [https://www.youtube.com/watch?v=4LG-
RtcSYUQ#t=1h21m43s](https://www.youtube.com/watch?v=4LG-RtcSYUQ#t=1h21m43s)
(discussion lasts about five minutes, participants include David Moon, Scott
McKay, and Guy Steele)

------
kozikow
Org mode is very good, if not the best, way to do literate programming. My
blog post about it: [https://kozikow.com/2016/05/21/very-powerful-data-
analysis-e...](https://kozikow.com/2016/05/21/very-powerful-data-analysis-
environment-org-mode-with-ob-ipython/). I use it at work for data analysis and
"researchy" projects.

Many of my blog posts at kozikow.com are written in the literate programming
style in org mode, and you can see the org files at
[https://github.com/kozikow/kozikow-blog](https://github.com/kozikow/kozikow-
blog).

People have written whole books in it:
[https://github.com/jkitchin/pycse](https://github.com/jkitchin/pycse).

------
bbotond
The first comment, written by none other than Peter Norvig, is also worth
reading.

EDIT: link - [http://www.johndcook.com/blog/2016/07/06/literate-
programmin...](http://www.johndcook.com/blog/2016/07/06/literate-programming-
presenting-code-in-human-order/#comment-871292)

~~~
efaref
His comment sums up exactly my thoughts wrt literate programming. It sounds
like a nice idea, but in practice it falls far short.

The worst program I've ever had to maintain was written in funnelweb. It was a
large, complicated program, so the funnelweb-generated documentation was tens
of thousands of pages long. Nobody was ever going to read that. Trying to find
out what you needed to modify to fix a bug was impossible from that document.
Most developers I know looked at the tangled C code (after all, that's where the
line numbers in the stack trace pointed to), and worked from there.

------
kazinator
We have tools that let you examine code from various angles. There is no need
to encode some "human order" in the code itself. Code should be organized in
the way that harmonizes with the module structure and promotes
maintainability, without regard for some "human order" nonsense.

Who decides what is "human order"? Humans want things in different order for
different purposes. For example, business software generates all sorts of
reports of different kinds from the same data.

This is wrong:

> _Traditional source code, no matter how heavily commented, is presented in
> the order dictated by the compiler._

This is only true of one-pass, strict definition-before-use, single-module-
only programming languages, like toy versions of Pascal.

In any decent language, we can take exactly the same program, and permute the
order of its elements with a great deal of liberty, and present them in that
order to the compiler. We can decide which functions go into which modules,
and we can have those functions in different orders regardless of what calls
what.
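
As a small illustration of that freedom, here is a Python sketch (Python is chosen only for the example; the comment above does not name a language, and the function names are made up). The top-level function is presented first and the helper it calls comes later in the file.

```python
def main():
    # `helper` is defined further down the file. Python looks the name up
    # when main() is called, not when main() is defined, so the high-level
    # function can be presented first.
    return helper(3)


def helper(n):
    # Low-level detail, placed after the code that uses it.
    return n * 2


# The only ordering constraint: the call happens after both definitions.
result = main()
```

Swapping the two definitions gives the same program, which is the sense in which the order is chosen by the author rather than the compiler or interpreter.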

Ah, but some of the more crazy proponents of literate programming are not
satisfied with that granularity. It bothers them that the individual
statements of a program that are to be executed in sequence have to be
presented to the compiler in that sequence: S1 ; S2.

Knuth's outlandish version, in particular, stretches the meaning of "literate"
by turning code into a dog's breakfast in which functions are chopped into
blocks. For compilation, these blocks are re-assembled by the "literate"
processor into functions.

The result is difficult to understand. Yes, the nice explanations and
presentation order may present something which makes sense. But here is the
rub: _I don't just want to follow the presentation of a program, I want to
understand it for myself and convince myself that it is correct._ For that, I
need to ignore all the text, which, for all I know, only expresses the
author's wishful belief about the code.

------
freekh
I understood the concept of literate programming as a way to organize _source
code_ in such a way that it 'flows' better for a human, i.e. it starts with
the methods/functions/modules/blocks where you naturally would start if you
were to explain your code and then on to the next method/function/module/block
that you would explain. Perhaps some comments here and there to explain where
to start and what are special points of interest. I should have read the
Wikipedia article I suppose, where it seems to be actual prose... :) Now I am
not sure I like the idea - I have always found well-written source code to be
easier to maintain and understand than prose...

~~~
eru
Look at Knuth's Adventure example:
[http://www.literateprogramming.com/adventure.pdf](http://www.literateprogramming.com/adventure.pdf)

Plenty of code with your prose there.

~~~
efaref
I hero-worship Knuth as much as anyone, but that document is frankly terrible.

Take a look at section 6. There's a flowery sentence that says no more than
"Add word to vocabulary" (completely pointless if this were a method called
"add_word" on a class named "Vocabulary") and then a dense chunk of completely
unexplained code.

~~~
habitue
I think the issue is treating Knuth's literate programming examples as
received wisdom, or the pinnacle of the form, rather than as a pioneering
effort by someone very talented, but done without the benefit of a developed
culture around the practice. He makes mistakes we have memes now for (e.g.
"explain why not how", useless comments for imports). I think if literate
programming takes off, we'll see the form surpassing Knuth quickly

------
amelius
The problem with most programming languages is that programmers write programs
in it by incrementally adding to the code in random places.

I think we need a programming language that allows programmers to build
programs by merely adding things to the bottom of the source file.

~~~
krallja
That's a REPL which can save its state to a file :)

------
auggierose
The best way to document a program is to prove its correctness. Short of that,
it is usually no fun to deal with other people's (or your own old) programs
anyway.

As for presenting things in the right order: This order would largely be
imposed by a correctness proof. As there could be different proofs, there
could be different orders.

Obviously, not all programs can be proven correct (but I think a very large
part of all programs could be). Also, finding the right order is not as
difficult as finding the right level of abstraction. Once you have the right
level of abstraction for presenting something, everything else flows from
that.

------
EliRivers
I wrote this before, a previous time the subject came up. People seemed to
find it useful then, so maybe they will now as well:

A previous employer (a subdivision of a global top ten defence company) used
literate programming.

The project I worked on was a decade-long piece for a consortium of defence
departments from various countries. We wrote in Objective-C, targeting Windows
and Linux. All code was written in a noweb-style markup, such that a top level
of a code section would look something like this:

    
    
        <<Initialise hardware>>
        <<Establish networking>>
    

and so on, and each of those variously break out into smaller chunks

    
    
        <<Fetch next data packet>>
        <<Decode data packet>>
        <<Store information from data packet>>
        <<Create new message based on new information>>
    

The layout of the chunks often ended up matching functions in the source code
and other such code constructs, but that wasn't by design; the intention of
the chunks was to tell a sensible story of design for the human to understand.
Some groups of chunks would get commentary, discussing at a high level the
design that they were meeting.

Ultimately, the actual code of a bottom-level chunk would be written with
accompanying text commentary. Commentary, though, not like the kind of
comments you put inside the code. These were sections of proper prose going
above each chunk (at the bottom level, chunks were pretty small and modular).
They would be more a discussion of the purpose of this section of the code,
with some design (and sometimes diagrams) bundled with it. When the text was
munged, a beautiful pdf document containing all the code and all the
commentary laid out in a sensible order was created for humans to read, and
the source code was also created for the compiler to eat. The only time anyone
looked directly at the source code was to check that the munging was working
properly, and when debugging; there was no point working directly on a source
code file, of course, because the next time you munged the literate text the
source code would be newly written from that.

It worked. It worked well. But it demanded discipline. Code reviews were
essential (and mandatory), but every code review was thus as much a design
review as a code review, and the text and diagrams were being reviewed as much
as the design; it wasn't enough to just write good code - the text had to make
it easy for someone fresh to it to understand the design and layout of the
code.

The chunks helped a lot. If you had a chunk you'd called <<Initialise
hardware>>, that's all you'd put in it. There was no sneaking not-quite-
relevant code in. The top-level design was easy to see in how the chunks were
laid out. If you found that you couldn't quite fit what was needed into
something, the design needed revisiting.

It forced us to keep things clean, modular and simple. It meant doing
everything took longer the first time, but at the point of actually writing
the code, the coder had a really good picture of exactly what it had to do and
exactly where it fitted in to the grander scheme. There was little revisiting
or rewriting, and usually the first version written was the last version
written. It also made debugging a lot easier.

Over the four years I was working there, we made a number of deliveries to the
customers for testing and integration, and as I recall they never found a
single bug (which is not to say it was bug free, but they never did anything
with it that we hadn't planned for and tested). The testing was likewise very
solid and very thorough (tests were rightly based on the requirements and the
interfaces as designed), but I like to think that the literate programming
style enforced a high quality of code (and it certainly meant that the code
did meet the design, which did meet the requirements).

Of course, we did have the massive advantage that the requirements were set
clearly, in advance, and if they changed it was slowly and with plenty of
warning. If you've not worked with requirements like that, you might be
surprised just how solid you can make the code when you know before touching
the keyboard for the first time exactly what the finished product is meant to
do.

Why don't I see it elsewhere? I suspect lots of people have simply never
considered coding in a literate style - never knew it existed.

It forces a change to how a lot of people code. Big design, up front. Many
projects, especially small projects (by which I mean less than a year from
initial ideas to having something in the hands of customers) in which the
final product simply isn't known in advance (and thus any design is expected
to change, a lot, quickly) are probably not suited - the extra drag literate
programming would put on it would lengthen the time of iterative periods.

It required a lot of discipline, at lots of levels. It goes against the still
popular narrative of some genius coder banging out something as fast as he can
think it. Every change beyond the trivial has to be reviewed, and reviewed
properly. All our reviews were done on the printed PDFs, marked up with pen.
Front sheets stapled to them, listing code comments which the coder either
dealt with or, in discussion, they agreed with the reviewer that the comment
would be withdrawn. A really good day's work might be a half-dozen code
reviews for some other coders, and touching your own keyboard only to print
out the PDFs. Programmers who gathered a reputation for doing really good
thorough reviews with good comments and the ability to critique people's code
without offending anyone's precious sensibilities (we've all met them; people
who seem to lose their sense of objectivity completely when it comes to their
own code) were in demand, and it was a valued and recognised skill (being an
ace at code reviews should be something we all want to put on our CVs, but I
suspect a lot of employers basically never see it there) - I have definitely
worked in some places in which, if a coder isn't typing, they're seen as not
working, so management would have to be properly on board. I don't think
literate programming is incompatible with the original agile manifesto, but I
think it wouldn't survive in what that seems to have turned into.

~~~
stormbrew
This seems a lot like cucumber, which is basically a regex expander until it
gets down to real code eventually.

I have found this style very frustrating when doing anything complex that's
also reused a lot. Particularly because the flow of information (expressed in
normal code as variables and arguments) can become very subtle and tightly
coupled when it's buried at the very bottom of an expanded expression. Maybe
this is a solved problem in the system you're talking about, but I have a
really hard time seeing how.

~~~
EliRivers
"This seems a lot like cucumber, which is basically a regex expander until it
gets down to real code eventually."

It's not at all like that. I must have explained it badly. Cucumber is written
for a machine. This was written for humans. Not shown: design discussion,
diagrams, links to requirements, pieces of history, and everything else that
was helpful for humans to understand the software from requirements to design
to implementation.

~~~
stormbrew
Ah, re-reading I think I see where I went wrong. So code is written
separately, but cross linked with these document identifiers?

~~~
EliRivers
Broadly speaking, yes. The programmer might have written something like this:

    
    
        The principal components (message queue, network connector, message handler) are all necessary. They are initialised together, and if any one fails, the program should be aborted.
        
        <<Initialise message queue>>
        <<Initialise network connector>>
        <<Initialise message handler>>
    
        Each initialisation signals success or failure independently, but a global "Error state" object exists; any catastrophic error found will trigger an abort of program with logging.
    
        <<Check catastrophic error state>>
    
        The message queue feeds itself directly from the network. It needs only be made aware of the connector. This was not done during initialisation for reasons x, y, z.
    
        << etc etc >>
    
    

Somewhere else might be a piece that looks like this:

    
    
        The message queue initialisation is very simple; it's using a library standard queue. See meeting minutes 3452/r for further detail.
    
        <<Initialise message queue>>=
        // ACTUAL CODE
    
    
    

The document contains various markup explaining the order of the code, and the
order of the human readable sections; upon munging, the human gets a beautiful
PDF that intersperses code with design discussion in a good order for the
human to read and understand (and consequently review and test, so we've got
ourselves a virtuous cycle here), and the machine gets just the code. For the
machine, each << XXX >> piece gets replaced with other << YYY >> pieces,
recursively, until it's just code.
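
The recursive replacement step can be sketched in a few lines of Python. This is an assumption-laden toy, not noweb and not the tool used on that project: the `CHUNKS` table, the `tangle` function, and the chunk bodies are placeholders invented here, with chunk names borrowed from the examples above.

```python
import re

# Hypothetical literate source: chunk name -> chunk body.
# The bodies are placeholders, not the project's real code.
CHUNKS = {
    "Main": "<<Initialise message queue>>\n<<Check catastrophic error state>>",
    "Initialise message queue": "queue = []",
    "Check catastrophic error state": "if error_state:\n    <<Abort with logging>>",
    "Abort with logging": "raise SystemExit('catastrophic error')",
}

# A line consisting only of an (optionally indented) <<chunk reference>>.
REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>.+?)>>[ \t]*$")


def tangle(name, chunks, _stack=()):
    """Expand chunk `name` recursively until only code remains."""
    if name in _stack:
        raise ValueError("circular chunk reference: " + name)
    out = []
    for line in chunks[name].splitlines():
        match = REF.match(line)
        if match is None:
            out.append(line)  # ordinary code line, keep as-is
            continue
        expanded = tangle(match.group("name").strip(), chunks, _stack + (name,))
        # Re-indent the expansion to match the position of the reference.
        out.extend(match.group("indent") + sub for sub in expanded.splitlines())
    return "\n".join(out)


code = tangle("Main", CHUNKS)
```

The prose between chunks is simply ignored by this step; producing the readable PDF is a separate pass over the same source.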

Because it's munged, there can be a chapter discussing everything there is to
know about a single class, for example, and a chapter discussing System
Initialisation, or whatever else fits the needs at hand. If someone wanted
to know about initialising the system, the PDF had the broad top-level
discussion, and a break down of smaller sections. The reader could then dig
into each as deep as they liked, at each stage seeing the design
discussion/diagram, and as much of the code detail as they liked.

It made working on code I'd never seen before an absolute dream. By the time I
reached the actual code, I already knew what it was meant to do and how it
worked.

On the one hand, a lot of extra work, and it does constrain the coding style
you can use; it only makes sense with code that _can_ actually be broken into
smaller, meaningful pieces like that. On the other hand, zero bugs found by
our customer, the prime contractor. QA and testing and everything else was a
massive part, but it all helped. I know the prime contractor was definitely
doing some serious testing, because we watched them sue one of the other
contractors for general incompetence.

------
padator
Regarding the tangle barrier issue, there is a tool that helps:
[https://github.com/aryx/syncweb](https://github.com/aryx/syncweb) It lets
you modify both the original WEB document and the code, and keeps them in sync.

------
kenOfYugen
I am a fan of the literate programming style for personal projects.

For a very well executed and interactive example check out

[http://dave.kinkead.com.au/modelling-the-boundary-
problem/](http://dave.kinkead.com.au/modelling-the-boundary-problem/)

------
twblalock
My main problem with literate programming is that it is difficult enough to
keep short comments up to date vis-a-vis the code they describe. Longer
comments would be even worse.

I also suppose the modern equivalent of this, in many use cases, is a Jupyter
notebook.

~~~
Chris_Newton
_My main problem with literate programming is that it is difficult enough to
keep short comments up to date vis-a-vis the code they describe. Longer
comments would be even worse._

Perhaps counter-intuitively, I have found the opposite to be the case so far.
It’s all too easy to overlook a couple of small comments out of 20 when you
update a screenful of code. You can’t really miss a whole paragraph of text
that separates a few lines of code you’re working on from everything else.

I’ve also found that with literate documentation the emphasis is naturally on
explaining the big picture and why things are being done, and that kind of
information tends to remain relevant through minor code changes anyway.

You can still add short comments about particular points in the code as well,
and maintaining those is much the same whether you’re using literate
programming or not.

------
jackfoxy
The F# Formatting library lets you do things like re-order code and hide less
important code for documentation purposes. You can create clean documents
presenting code as a narrative.
[http://tpetricek.github.io/FSharp.Formatting/literate.html](http://tpetricek.github.io/FSharp.Formatting/literate.html)

------
Chris_Newton
I’ve been using Literate Haskell for a project recently. It’s a heavily
mathematical data-crunching algorithm, in the region of a few thousand lines
of executable code. This is the first time I’ve tried writing literate code in
a major professional project, and it’s been interesting to see how the reality
matched up with my initial expectations.

To me, the biggest practical difference is whether your documentation is
primarily written as a tutorial or for reference on demand. Presenting ideas
in a natural order for tutorial purposes is useful to someone coming to the
code for the first time, or if you’re coming back to look at a module a while
after you first wrote it when all those little details are no longer so
familiar.

In my case, the documentation is generated using LaTeX and so can also include
maths, diagrams, tables and other illustrative material right there next to
the associated code, as well as providing a natural place to put module- or
program-wide summary information to give an overview of how everything fits
together. Like a mathematical paper, it takes a little effort to present all
this extra documentation well. However, it’s hard to overstate how much better
it is if you’re trying to understand some intricate mathematical code that you
wrote three months ago and the actual maths is right there and is then
directly reflected in the shape of the code.

Others have mentioned that literate code might be harder to maintain in the
long run. I’m not sure how realistic that really is, based on my experience so
far. If you’re making code changes significant enough that you’d want to
reorder the whole presentation, you’re probably rewriting significant chunks
of that documentation anyway, and it’s not as if our editing tools can’t cope
with moving code and/or text around.

What does suffer, significantly in my experience, is the _scannability_ of the
code. Those few thousand lines of Haskell I mentioned produce well over 100
pages of typeset documentation at this point. That’s partly because of the
extensive textual notes and mathematics and diagrams and so on. It’s also
partly because typeset documentation is naturally more spaced out because of
things like headings and blank lines. But the fact remains, if I looked at
just the source code in my usual editor, I’d probably have 50–100 lines
visible in a single window, and I can open several of those windows at once on
a big screen. If I’m looking through the literate documentation (or the source
file from which it is generated) then I am probably only seeing one third to
one half of that at most, and crucially, that code only appears a few related
lines at a time, often a single function or a small family of related type
definitions. Since Haskell itself is rather uniform in appearance and tends to
be written by composing very short elements anyway, this makes finding and
understanding individual fragments of code noticeably harder than regular
coding when you want to refer back to something in isolation.

So far, I’m finding that a price worth paying, at least for this kind of
heavily mathematical work with me as the sole developer. In practice, I don’t
actually want to refer to a small code fragment on its own very often. I’m
more likely to come back to a whole module, skim the entire literate
documentation for it (probably just a few pages) to remind myself of how it
all fits together, and then not need to jump around understanding small
individual elements in isolation. Still, there is definitely a cost here, and
it definitely affects how I read and understand the code as I’m working with
it later. I suspect some of that cost would be incurred anyway by using
Haskell, or any other language and programming style that emphasize composing
many small elements, but using literate programming does exaggerate the
effect, and so far I’ve found that to be its biggest drawback over more
conventional styles.

~~~
eru
That's very interesting.

I wonder whether a `show me the code only' view would be useful?

I have written some short literate Haskell pieces, but nothing more than
executable blog posts.
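
A code-only view like that can be approximated outside the editor. Below is a minimal sketch in Python, not part of any Haskell tooling. It assumes the two usual literate Haskell conventions: bird-track lines starting with `>` and LaTeX-style `\begin{code}` ... `\end{code}` blocks. The name `code_only` is made up for illustration.

```python
def code_only(lhs_text: str) -> str:
    # Keep only the code lines of a literate Haskell source; drop all prose.
    out = []
    in_block = False  # True while inside \begin{code} ... \end{code}
    for line in lhs_text.splitlines():
        stripped = line.strip()
        if stripped == r"\begin{code}":
            in_block = True
        elif stripped == r"\end{code}":
            in_block = False
        elif in_block:
            out.append(line)
        elif line.startswith(">"):
            # Bird track: drop the '>' and the single space after it, if any.
            out.append(line[2:] if line.startswith("> ") else line[1:])
    return "\n".join(out) + "\n"
```

This only filters lines. It would not restore the navigation an IDE gives you, but it does put more code on the screen at once.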

~~~
Chris_Newton
_I wonder whether a `show me the code only' view would be useful?_

What I personally miss most is the ability to look at code from different
perspectives and navigate it in different ways.

I find writing in a literate style is much more like preparing an academic
paper or formal presentation than like programming as I usually would. There’s
a clear order of doing things, but there’s only _one_ order, a very static
form of presentation and reading. As I mentioned before, I find this can work
quite well when the work actually is heavily mathematical and relies on
careful and systematic construction of the final result, but like working with
math papers, it’s definitely an “acquired taste” and takes some getting used
to.

In contrast, when I’m programming in most other languages, I rapidly navigate
all over the code to follow relationships and definitions and so on. We have
lots of tools to help do that quickly and easily in any modern programmer’s
editor or IDE. We also have lots of ways to display different parts of the
code and visualisations of the relationships between them simultaneously. I
really miss that sort of dynamic, flexible working environment with the
Literate Haskell. Sadly, I’ve yet to discover any programming environment that
supports both the documentation side (essentially a good editor for working
with LaTeX and the related tools) and the programming side (more like an
interactive IDE).

To be fair, this feeling is probably due in part to my own relative
inexperience with Haskell projects on this scale. Although I’ve long been
interested in functional programming and used it for various bits and pieces
over the years, my large-scale, professional projects have generally used more
mainstream languages like C++, Python and JavaScript, and the tools and
environments that go with them. Functional programming has a rather different
feel anyway, and perhaps I just haven’t learned to combine that more
mathematical/functional mindset, literate programming style, and the available
tools as well as I could yet.

In any case, I definitely believe there’s a lot of potential for new tools in
this area, combining the kind of documentation and structured presentation I’m
seeing with the literate code with the kind of dynamic, on-demand exploration
of code that modern IDEs for many other languages offer.

~~~
eru
I've been using Haskell a lot (even professionally for 5 years). None of my
commercial Haskell coding was literate, though. That has been restricted to
smaller explanatory pieces.

We do jump around in the programming editor when doing Haskell. E.g.
jump-to-definition is just as useful as it is for other languages.

~~~
Chris_Newton
May I ask what you use for editing your Haskell code?

For the literate project I’m using my usual programmer’s editor. It handles
switching between the LaTeX and Haskell reasonably well in terms of syntax
highlighting, but it does lack most IDE-like features, even with any of the
extra packages I’ve tried installing so far.

A tool that provided reasonably reliable go-to-definition functionality would
certainly be a helpful addition, and the kinds of pop-up help you get in IDEs
to keep track of function parameters and their types seem particularly
relevant to a language like Haskell, but I’ve yet to find an environment that
both offers those features and handles the documentation aspects well.

~~~
eru
I've never tried the LaTeX literate Haskell in earnest, only the > kind.

I've been using Emacs and Vim to edit Haskell, and at Standard Chartered,
Visual Studio to edit their Haskell dialect.

Go-to-definition and display-type-of-expression-at-cursor can be done in Vim
and Emacs already.

------
rwmj
Is the Knuth book on literate programming any good?

Anyway I thought this article would have been better with a few examples. I'm
still not any closer to understanding how the tangling process works.

~~~
Mikhail_Edoshin
I've read Knuth's article on this and here are my thoughts. It's meant to be
used with a single language (Pascal), which is sometimes good (because the
tool really understands Pascal and can parse these code snippets to produce a
cross-reference), but it's also a curse, because many of the tool's constructs
are actually workarounds for Pascal shortcomings. Also, it only generates a
single file, but today's projects are both multi-file and multi-language. So
it's a bit dated.

I think the idea itself is so simple it doesn't really need any extensive
book. Basically you take any text preparation system and add three elements: a
code fragment, a reference to a code fragment, and a file, which is a fragment
that will be written to disk.
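
Those three elements can be sketched in a few lines of Python. This is a rough illustration, not the system mentioned in this comment: it borrows noweb's `<<name>>=` and `@` markers for fragments, and it assumes a made-up convention that a fragment whose name starts with `file:` is an output file. It returns the file contents in a dict instead of writing to disk.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")        # start of a code fragment
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")    # reference to a fragment

def parse(source):
    # Collect named fragments; prose outside fragments is ignored.
    chunks = {}
    current = None
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])  # same name twice: bodies append
        elif line.strip() == "@":
            current = None                  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks

def expand(name, chunks, indent="", seen=()):
    # Replace each reference with the fragment's body, keeping indentation.
    if name in seen:
        raise ValueError(f"cyclic reference to fragment {name!r}")
    lines = []
    for line in chunks[name]:
        m = CHUNK_REF.match(line)
        if m:
            lines.extend(
                expand(m.group(2), chunks, indent + m.group(1), seen + (name,))
            )
        else:
            lines.append(indent + line)
    return lines

def tangle(source):
    # Map each output file name to its fully expanded text.
    chunks = parse(source)
    return {
        name[len("file:"):]: "\n".join(expand(name, chunks)) + "\n"
        for name in chunks
        if name.startswith("file:")
    }
```

A document can then define `<<file:hello.py>>=` near the top, refer to `<<print greeting>>` inside it, and explain and define that fragment much later. Tangling puts the pieces back in the order the compiler or interpreter wants.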

(I myself use such a system based on reStructuredText. I want to post it on
GitHub when it's ready, but it's not there yet, so I cannot show exactly how
it works. But I use it to write a moderately complex personal project, mostly
in C and Python, along with all the tooling, e.g. Makefiles. I found it very
helpful; it does make the code much cleaner, or rather, it strongly urges you to.
And it's invaluable when you want to remember why you did things that way.)

~~~
crististm
Isn't Knuth's CWEB used for C?

~~~
Mikhail_Edoshin
CWEB is for C and the original WEB was for Pascal. (C is a good fit, by the
way, because you can use the #line directive to make the compiler report the
line numbers in your original literate files instead of the irrelevant line
numbers in generated files.) But as far as I remember it still produces a
single C file and a single TeX file.
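
To make the #line trick concrete, here is a small Python sketch of what a tangler could emit. It is not CWEB's actual implementation. The function name and the `(source_file, first_line, code)` tuple layout are assumptions for illustration; only the `#line N "file"` directive itself is standard C.

```python
def emit_with_line_directives(fragments):
    # fragments: iterable of (source_file, first_line, code) tuples, where
    # first_line is the line in the literate source where the code begins.
    out = []
    for source_file, first_line, code in fragments:
        # Tell the C compiler that the next line is line `first_line`
        # of `source_file`, so diagnostics point at the literate source.
        out.append(f'#line {first_line} "{source_file}"')
        out.extend(code.splitlines())
    return "\n".join(out) + "\n"
```

With a directive in front of every fragment, a compiler error inside a fragment is reported against the literate file and line you actually edit.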

~~~
e12e
When experimenting with NOWEB and Java (1.3 I believe) I discovered that one
of the things that those LP systems did was encourage something like macro
based code re-use, somewhat "lifting" the rather basic language up towards
something a little more high level. This was good and bad; bad like unhygienic
macros can be bad.

As far as I could figure out it was a poor match for OO/"inheritance oriented"
programming. It was a better match for procedural style programming ("writing
C in Java"), where the LP "macros" didn't confuse things. But when mixing in
OO-style Java, I found things seemed to become uncomfortably verbose.

NOWEB might in some ways be the worst of both worlds - it knows (and cares)
nothing about the programming language - but that also means blocks can't be
parameterized. This is annoying if you're using variables to hold iterators
and state (traversing a graph stored in arrays as edge/node lists etc). You
end up with dangerous variable name re-use or code duplication (and too big
code blocks).
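
For contrast, here is a hypothetical sketch of what a parameterized block could look like. NOWEB offers nothing like this. It uses Python's `string.Template` for the substitution, and the Java-ish loop text and the names `firstEdge`, `nextEdge`, `WALK_EDGES` and `instantiate` are all invented for illustration.

```python
from string import Template

# One reusable block; the iterator name, the node and the body are parameters.
WALK_EDGES = Template(
    "for (int $i = firstEdge[$node]; $i != -1; $i = nextEdge[$i]) {\n"
    "    $body\n"
    "}"
)

def instantiate(chunk, **params):
    # substitute() raises KeyError if a parameter is missing, so a
    # forgotten name fails loudly instead of leaking into the output.
    return chunk.substitute(**params)
```

Each use can pick its own iterator name, e.g. `instantiate(WALK_EDGES, i="e", node="u", body="visit(target[e]);")`, so nested traversals don't have to share one variable or duplicate the loop.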

------
krapht
Literate programming - the original IPython notebook

------
obj-g
"I think I understand better now why literate programming hasn’t gained much
of an audience. I used to think that it was because developers hate writing
prose. That’s part of it."

Right, and that's why it's _so_ difficult to find developer-written blogs and
articles and tutorials online....

~~~
Jtsummers
Paraphrasing Knuth, the intersection of good writers and good programmers is
small. That doesn't stop people like me from making attempts at writing blogs
and articles, but it doesn't make them good. It also doesn't mean that people
won't write one-off articles and blogs, but they may not be interested in doing
that essentially daily (which is what the literate style kind of requires).

~~~
obj-g
Your point is taken; I just take issue with the article perpetuating such a
stereotype. I'm a developer, I have a degree in literature, and I write a ton.
And the article says nothing about developers not writing _good_ prose, just
about them not liking to write prose, period. The intersection between _anything_ and
_good_ writers is pretty slim, to be honest. The author just supposes that
developers don't like writing prose. Which is silly. At least in my opinion.

