Literate Programming (and why Docco isn't enough) (judofyr.net)
33 points by steveklabnik on Jan 10, 2011 | 29 comments



I've messed around with my own literate programming system that was a more "true" system (where you have a kind of macro-like approach), and I have to agree: the Docco approach is missing the key bonus, which is organizing your code into chapters and paragraphs and then statements.

Think of it this way: if you're building an API, you should consider usage. So you start by writing examples. Not just one, but maybe 3-5 of them to really do it right.

Then, as you proceed, you implement and build tests, probably down towards the end of "the chapter".
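A minimal sketch of that examples-first ordering in Python, using the standard doctest module. The slugify function and its behavior are hypothetical, invented purely for illustration; the point is only that the usage examples lead the "chapter" and then double as tests once the implementation arrives.

```python
import doctest
import re


def slugify(title):
    """Turn a title into a URL-friendly slug (hypothetical example API).

    Usage examples come first, the way a reader would meet the API:

    >>> slugify("Literate Programming")
    'literate-programming'
    >>> slugify("  Why Docco isn't enough  ")
    'why-docco-isn-t-enough'
    >>> slugify("")
    ''
    """
    # The implementation follows the examples, at the end of the "chapter".
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def run_examples():
    """Run the docstring examples as tests; returns (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        slugify.__doc__, {"slugify": slugify}, "slugify", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

Calling run_examples() executes the three examples in the docstring, so the documentation cannot drift from the code without a test failing.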

It's a different way of thinking about how your code is consumed. The "documentation systems" of the past (e.g., Javadoc) are really reference documentation on the details of a method. But if you want to really grasp the system, you probably lead with examples. Maybe a picture. And you probably don't want to have to navigate to 7 different implementations of your interface Foo to figure out what the design is.

But having those 7 different implementation files is nice when you know the system later.

Where literate programming systems completely break down is the toolset. You typically can't work in "example down" mode and "single source file" mode at the same time - like, to debug, or make a small change. I've tried to build "diff and patch" between the source and the literate documentation, and when that's buggy, whew, that sucks.


One idea about design I take from pg is that it's difficult to "even talk about good or bad design except with reference to some intended user." [1]

Now, I'm relatively new to literate programming, and perhaps the answer to my question(s) will be obvious to someone more experienced with it. What I wonder though every time I read about it is, who is the intended audience of all this extensive documentation? More specifically, how much knowledge of the programming language and problem space do we presume that the reader has?

There is something elegant about presuming the audience has expertise in both areas. That way you only have to worry about documenting one thing, the program, rather than three things: the program, the programming language and the problem space. Readers lacking sufficient understanding of the lang or problem space can refer to the more robust documentation about either one individually to get themselves up to speed. But I'm not sure Docco users are taking this approach.

---

[1] http://www.paulgraham.com/desres.html


Personally, I try to write at the same level that I always would for code, with maybe a larger preamble at the beginning.

Then again, I'm still a student of the ways of the literate myself...


Although I enjoy jashkenas' JavaScript documentation a great deal and find it extremely helpful, I'm having a hard time getting excited by the resurrection of interest in literate programming. I enjoy and appreciate the notion that code should be designed for humans to read, but I'm reminded of (and agree with) Bob Martin's admonition in Clean Code that energy spent polishing one's code commentary is better spent improving the readability and flow of the code itself.

Also, comments that come across as elegant and attractive when rendered in HTML look teeth-gratingly gratuitous when viewed in the context of the original code. For example,

  // Attach all inheritable methods to the SomeObject prototype.
  _.extend(MyLib.SomeObject.prototype, MyLib.SomeMixin.prototype, { ... })

Yes, that's an accurate description of what it does. But if you can read JavaScript, you should know that; in code, it's visual clutter, and worse, it adds overhead to maintain it when you move or update your code.


Redundant comments are as annoying as repeating yourself in writing.

Redundant comments are as annoying as repeating yourself in writing (and I'm certainly as guilty of it as anyone). But the real point of the comments is to explain the "why", just as the code explains the "how". In your example above, to explain the reason that you've chosen to mix in the modules you're mixing in...
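To make the "why" versus "how" distinction concrete, here is a small Python sketch. The merge_options function and the reasons given in its comment are invented for illustration and are not taken from any library discussed here.

```python
def merge_options(defaults, overrides):
    # A "what" comment would just restate the code:
    #   "Copy defaults, then update with overrides."
    #
    # A "why" comment records what the code cannot say for itself:
    # overrides are applied last so that a caller's explicit setting
    # always wins over a library default, and we copy first so the
    # shared defaults dict is never mutated between calls.
    merged = dict(defaults)
    merged.update(overrides)
    return merged
```

The second comment survives a refactoring of the body, since it describes intent, while the first has to be rewritten whenever the code moves.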


I was trying to be subtle about it, but the example I used was drawn from a fairly well-known (albeit, I should add, well-written) JS library. My point is that this style of documentation encourages that behavior, perhaps paradoxically because it's so attractive and big hunks of whitespace tend to stand out.


Mr. Klabnik goes in depth into the argument that literate programming has to be tangled and weaved into a different order, in order to be considered "true" literate programming.

Respectfully, I have to disagree.

It may have been the case that a macro source-code rewriter was necessary before the advent of the subroutine, but in this day and age you can factor out functions as you see fit. Moving around source code with macros is an unnecessary obfuscation between the code you read, and the code as it runs. At any point where you're tempted to pull out a block of code and interpolate it later, just make it a function with a descriptive name, and hey presto.
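As a rough sketch of what that looks like in practice (Python here, with all function names made up for the example), each block that a literate tool would pull out as a named chunk becomes a named function instead:

```python
def parse_scores(lines):
    """Turn 'name score' lines into a list of integer scores."""
    return [int(line.split()[1]) for line in lines if line.strip()]


def average(scores):
    """Mean of the scores, or 0.0 when there are none."""
    return sum(scores) / len(scores) if scores else 0.0


def format_report(mean):
    """Render the final one-line report."""
    return f"average score: {mean:.1f}"


def report(lines):
    # Each step that a literate system would name as a chunk is a call here,
    # so the high-level outline and the running code are the same text.
    scores = parse_scores(lines)
    mean = average(scores)
    return format_report(mean)
```

For example, report(["ann 90", "bob 80"]) reads top-down as parse, average, format, with no separate tangle step between what you read and what runs.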

Edit: Steve wrote it.


> Mr. Klabnik goes in depth into the argument that literate programming has to be tangled and weaved into a different order, in order to be considered "true" literate programming.

> Respectfully, I have to disagree.

If you're not re-ordering the code, there's no difference between a "literate program" and simple comments.

And I find code-reordering useful even in modern languages; most impose some structure like:

  1. module-level documentation
  2. exports
  3. imports / includes
  4. implementation (functions, procedures, etc)
     4.1 return type (static languages only)
     4.2 function/procedure documentation
     4.3 code

With literate programming, you can add supporting text (like export entries, module imports, or documentation) wherever it makes the most sense to a human reader, not the compiler.

> At any point where you're tempted to pull out a block of code and interpolate it later, just make it a function with a descriptive name, and hey presto.

What if you've got a few dozen local variables? Do you want to pass them all as parameters, for every single logical block in your algorithm?

What sort of names do you give your extracted procedures? Literate programs can have block names like "Check for common post-traversal error conditions", which in a procedure name would likely be abbreviated to "traversed_errcheck()".
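For anyone who hasn't seen the re-ordering step, here is a minimal sketch of a "tangle" pass in Python. It assumes a noweb-like <<chunk name>> reference syntax and a plain dict of chunks; both are simplifications chosen for the example, not how any particular tool (WEB, noweb, etc.) is actually implemented.

```python
import re

# A line consisting only of "<<chunk name>>", optionally indented.
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(chunks, root):
    """Expand chunk `root` recursively and return the source as one string."""

    def expand(name, indent, seen):
        if name in seen:
            raise ValueError(f"circular chunk reference: {name}")
        if name not in chunks:
            raise KeyError(f"undefined chunk: {name}")
        out = []
        for line in chunks[name]:
            match = CHUNK_REF.match(line)
            if match:
                # Splice the referenced chunk in, keeping the indentation
                # of the reference so nested code stays aligned.
                out.extend(
                    expand(match.group(2), indent + match.group(1), seen | {name})
                )
            else:
                out.append(indent + line if line else line)
        return out

    return "\n".join(expand(root, "", frozenset()))


# Chunks are listed in the order a human would want them explained,
# which is not the order the compiler or interpreter needs.
CHUNKS = {
    "check for common post-traversal error conditions": [
        "if not visited:",
        "    raise ValueError('nothing was traversed')",
    ],
    "imports": ["import sys"],
    "program": [
        "<<imports>>",
        "def run(visited):",
        "    <<check for common post-traversal error conditions>>",
        "    return len(visited)",
    ],
}
```

tangle(CHUNKS, "program") emits the import first and the error check inside run(), even though the prose introduced them the other way around, and the chunk keeps its long descriptive name while sharing run()'s local variables.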


  What if you've got a few dozen local variables?
  Do you want to pass them all as parameters,
  for every single logical block in your algorithm?
I'm not disagreeing with you, but it's interesting to note that in a stack-oriented language like Forth, PostScript or Factor this isn't a problem.
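A toy illustration of why, sketched in Python rather than in a real Forth: this tiny evaluator and its word names are made up for the example. Because every word works on the same implicit stack, factoring a block out into a new word needs no parameter list at all.

```python
# Built-in words; each one manipulates the shared stack in place.
PRIMITIVES = {
    "dup": lambda s: s.append(s[-1]),
    "swap": lambda s: s.extend([s.pop(), s.pop()]),
    "+": lambda s: s.append(s.pop() + s.pop()),
    "*": lambda s: s.append(s.pop() * s.pop()),
}


def run(program, words, stack=None):
    """Evaluate a whitespace-separated program against a shared stack."""
    stack = [] if stack is None else stack
    for token in program.split():
        if token in words:
            # A user-defined word is just more program text run on the
            # same stack, so nothing has to be passed explicitly.
            run(words[token], words, stack)
        elif token in PRIMITIVES:
            PRIMITIVES[token](stack)
        else:
            stack.append(int(token))
    return stack


# Hypothetical user-defined words, factored out of a longer program.
WORDS = {
    "square": "dup *",
    "sum-of-squares": "square swap square +",
}
```

Here run("3 4 sum-of-squares", WORDS) leaves [25] on the stack, and "square" was extracted without naming a single argument.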


> What if you've got a few dozen local variables?

In that case, I would consider that I have a larger problem than merely one of documentation.


It seems that folding code to a natural language summary is what you really want.


Not at all; I want to be able to print stuff like this in the generated documentation:

----------------------------------------------

3.2 HTTPS driver

The HTTPS driver is currently implemented using either GNU TLS or OpenSSL:

== driver includes

  #include <gnutls.h>
  #include <openssl.h>
== driver list

  , {"HTTPS (GNU TLS)", driver_https_gnutls}
  , {"HTTPS (Open SSL)", driver_https_openssl}
== drivers.c

  static void
  driver_https_gnutls(DriverState *s) {
    ||common driver local storage||
    gnutls_init();
    gnutls_ctx_init(&ctx);
    check_extra_algos = "FOO:BAR";
    some_feature_flag = 1;
    // etc
  }

  static void
  driver_https_openssl(DriverState *s) {
    ||common driver local storage||
    openssl_init();
    check_extra_algos = "BAZ";
    some_feature_flag = 0;
    // etc
  }
----------------------------------------------


I meant instead of pulling out a block of code and interpolating it later. For example instead of:

    <set up>
    <do stuff>
    <tear down>
And then describing them in the rest of the document, a better solution to me seems to be able to click on e.g. the <do stuff> part to expand it. This allows the reader to see the high level structure and drill down to the stuff he's interested in instead of spreading the code all over the place with references.


That might be an 80% solution, but it still means that you're tying your presentation to the way that the computer needs to have things ordered. What if all of the parts need to be explained, but I really want to explain the <do stuff> before the <set up> and <tear down>?


Actually, this one is by me. Getting author tags is on the feature list for Timeless. ;)

You'll have to take that argument up with Knuth, and not with me, though. Also, while you may be thinking of some languages, many others don't make it easy to re-arrange code. C code, for example, pretty much has to be in order. Unless you write a CPP macro. Which is basically the same as a documentation macro.

(ps: oh, and this doesn't mean I don't <3 Docco. Part of writing this post was so that I could stop blathering about it to all of my friends.)


"You'll have to take that argument up with Knuth, and not with me, though."

And.... so what? We don't all run to Alan Kay for our definitions of object orientation, we don't all run to John McCarthy for a definition of a lambda calculus-based programming language, and we don't all go running to Dijkstra as the final word in structured programming techniques. No disrespect intended to Knuth any more than I intend disrespect to the three other shining luminaries I name, but if we're still arguing about Literate Programming's original Knuth definition 26 years later that is nothing more and nothing less than a testament to the utter failure of the term.

Note carefully how I said the term has failed. In point of fact I think what has happened is that the core ideas that led to the formation of literate programming have indeed essentially succeeded (if not exactly "won") and do indeed live on in languages with abstractions powerful enough that they don't need an additional mangling step layered on top of them and that already have powerful comment conventions, some even machine readable. We've found that we don't need to narrate the entire program, but just some bits, and that a "literate" approach isn't always the best. If we have not adopted every last trapping of Knuth's prescription, it is not because it hasn't been tried, it is because it wasn't perfect, any more than any of the three luminaries I mentioned above managed to give us the Last Word on any of those topics. I'm not all that interested in Knuth's precise definition in my day-to-day life any more than I sit here agonizing over the fact that my primary language's object model is in gross violation of a couple of Alan Kay's ideas about OO. We've refined our ideas about documentation a lot since 1984, in all kinds of ways.

And I think Knuth-style LP has some serious cost/benefit problems in modern times that didn't exist at the time of the original proposal; maybe in 1984 I'd use it, but the costs have at best stayed steady and maybe even crept up a bit (as you are now no longer augmenting code with good reordering capabilities but now supplementing and interacting with their capabilities and adding additional complexity thereby), while the relative benefits have plummeted unless you're still stuck in 1980s era languages with 1980s era design techniques.


Upvoted.

> We don't all run to

Maybe you don't. ;) I try to keep terms as sacred as possible, usually...

What I was trying to say is this: I don't feel that I have the right to redefine Knuth's term. Yes, it may not be the best anymore, and as MenTaLguY suggested a few minutes ago[1], maybe a rename or rebranding is in order, but I don't feel comfortable saying "Literate Programming is x", for all x that isn't what Knuth said.

I think we're in agreement about the questionable value of going 'fully literate,' which I tried to hint at in my last few paragraphs. I haven't ever written such a program, and so I can't really say if I think it's worth it or not. However, lately I find myself on a quest to compare and contrast and blend together English and code, so it's something of great interest to me, and I do happen to care about those that have come before, for both their successes and their failures.

1: http://twitter.com/#!/mentalguy/status/24667821577338880


Oh, yes, learn your history for sure. Also,

'but I don't feel comfortable saying "Literate Programming is x", for all x that isn't what Knuth said.'

hardly anyone else refrains from going on at length about any other term in computer science. You're putting yourself at a competitive disadvantage compared to other writers with an attitude like that. :)


Haha, quite true. I guess I'll have to get over it.


I agree entirely.

My point is just that in any modern language at a reasonable level of granularity, the order in which you write the code is 99% independent from the order in which you execute it.
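A quick Python sketch of both the 99% and the remaining 1% (function names are invented for the example): functions can be written in whatever order reads best, because names are looked up when the call runs, but a call executed at module load time still has to come after the definition it uses.

```python
def main():
    # Written first, but refers to helpers that appear further down.
    return greet(pick_name())


def pick_name():
    return "Knuth"


def greet(name):
    return f"Hello, {name}!"


# The remaining dependency on order: this module source calls helper()
# at load time, before the def statement has executed.
EARLY_CALL = "helper()\ndef helper():\n    return 1\n"


def early_call_fails():
    """Return True if running EARLY_CALL raises NameError."""
    try:
        exec(EARLY_CALL, {})
    except NameError:
        return True
    return False
```

main() works even though it is defined before its helpers, while early_call_fails() shows the one place where textual order still matters.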

To port tangle/weave for a modern language would be to cargo cult a feature that can't possibly help matters, in my opinion.

That said, I'd love to see an example (in a modern language) where a woven bit of literate code was more readable than the well-factored equivalent.


Scheme code that takes advantage of lexical scope with indefinite extent might qualify:

The functions f, g and h use private variables x and y.

  (define f #f)
  (define g #f)
  (define h #f)

  (let ((x #f) (y #f))
    (set! f <<Lambda term for f>>)
    (set! g <<Lambda term for g>>)
    (set! h <<Lambda term for h>>))
F does whatever.

  <<Lambda term for f>>=
  (lambda (p q)
    (21 lines of code))
G does something else.

  <<Lambda term for g>>=
  (20 lines of code)
...
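For readers who don't write Scheme, a rough Python analogue of the same shape: three functions sharing private variables that nothing else can reach. The function bodies below are invented stand-ins for the "21 lines of code", so treat this as a sketch of the structure only.

```python
def make_shared_functions():
    """Return f, g, h, which all close over the same private x and y."""
    x = 0   # private: visible only to the three closures below
    y = []  # private: likewise

    def f(p):
        nonlocal x
        x += p
        return x

    def g():
        y.append(x)
        return list(y)

    def h():
        return (x, len(y))

    return f, g, h


f, g, h = make_shared_functions()
```

In a language like this the three bodies must sit textually inside the enclosing scope, which is exactly the constraint the named chunks above were working around.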


> My point is just that in any modern language at a reasonable level of granularity, the order in which you write the code is 99% independent from the order in which you execute it.

I believe that in most dialects of lisp, macros must be defined before they can be called. [citation needed]

This is not to undermine your point, which I think is valid. Macros would just be an interesting exception because they're arguably the feature of lisp, and surely make up > 1% usage for that family of programming languages.


> Macros should be written so as to depend as little as possible on the execution environment to produce a correct expansion. To ensure consistent behavior, it is best to ensure that all macro definitions are available, whether to the interpreter or compiler, before any code containing calls to those macros is introduced.

- Common Lisp the Language, http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/html/c...


That was also the most relevant snippet I found while searching. :) Unfortunately, I couldn't quite figure out if it was a confirmation of my claim or a rebuttal. Would you mind clarifying?


I know about as much as you do. I did some googling, and maybe it's because I'm sitting at pitt.edu, but cmu.edu seems authoritative. ;)

To me, this paragraph says "using a macro before it's defined is undefined behavior," and seems to back up your assertion.


Yep. I certainly don't think that this is as large a problem now, and I'm unsure if it'd add much value. I think that Rocco's presentation over RDoc/YARD is a much, much larger step forward.


A bit more Twitter conversation here, for the curious:

http://bettween.com/jashkenas/mentalguy


Man, I really wish that this tool showed stuff in forwards order rather than backwards order. And when I tried to flip it around, it showed a bunch of your previous conversations first instead...


Org-mode in Emacs with 'org-babel-tangle' works well. But Docco looks great on the web.



