Literate Programming: Empower Your Writing with Emacs Org-Mode (offerzen.com)
175 points by jworthe 3 months ago | 65 comments



>On the writing side, the main issue is that literate programming tends to tie your writing into the tools that support your literate programming. This can make collaboration on a document difficult if the people you’re collaborating with are not as sold on the tools as you are. The moment you need to work with a business person who prefers to use Google Docs to share a document, or a university department that insists on receiving drafts as Microsoft Word documents, you start to face the pain of exporting to those proprietary formats. Like many issues in software development, this is really a social issue: for literate programming to work, all of the writers need to agree on the tools being used.

I've used org-mode at work and this was actually the strongest selling point. Using pandoc, and massaging the org-mode reader to map better onto the pandoc Markdown AST, I could export to PDF, HTML and DOCX (!!!). I had production code plus prose sent to the CTO as a Word attachment in an email and she loved it.

The tangle comments also meant I had a workflow for sharing the code as regular code with the rest of the team and using git to see which code sections they changed. There wasn't enough traffic to justify writing an un-tangler that put their changes back into the original document, but it seems like a very interesting problem.
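A minimal Python sketch of the first half of such an un-tangler: splitting a tangled file back into its source blocks using the comment markers, so each block's (possibly edited) code could then be copied back into the matching block of the .org file. The marker format is an assumption, modelled loosely on what org-babel writes with `:comments link`, so check what your org version actually emits. `split_tangled` and the sample are hypothetical, and writing back into the .org file is not shown.

```python
import re

# Assumed marker format, modelled loosely on org-babel's ":comments link"
# output for a Python file; adjust both patterns to what your tangle emits.
BEGIN = re.compile(r"^# \[\[(?P<link>[^\]]+)\]\[(?P<name>[^\]]+)\]\]$")
END = re.compile(r"^# (?P<name>.+) ends here$")


def split_tangled(text):
    """Map each source-block name to the code found between its markers."""
    blocks = {}
    current = None
    for line in text.splitlines():
        begin = BEGIN.match(line)
        end = END.match(line)
        if begin:
            current = begin.group("name")
            blocks[current] = []
        elif end and end.group("name") == current:
            current = None
        elif current is not None:
            blocks[current].append(line)
        # Lines outside any marked block (e.g. blank separators) are ignored.
    return {name: "\n".join(lines) for name, lines in blocks.items()}


sample = "\n".join([
    "# [[file:doc.org::*Setup][Setup:1]]",
    "import os",
    "# Setup:1 ends here",
    "",
    "# [[file:doc.org::*Main][Main:1]]",
    "print(os.getcwd())",
    "# Main:1 ends here",
])
print(split_tangled(sample))
```

Keying on the block name rather than on position is what would let a teammate's edit be matched back to the right block even if they moved code around in the tangled file.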


It’s strange how few people use pandoc for word processing. Even if your goal is just to produce a docx file in the end, it’s so much more pleasant to write in Markdown in a distraction-free editor you’re comfortable with than to deal with Word or Writer.


With the latest versions of pandoc and org-mode I find I write nearly exclusively org files, which are then converted to other file types without any pain. I don't think I've had to write a raw LaTeX document in the last year, other than using TikZ for graphics.

Emacs+evil+org+pandoc is the Swiss army chainsaw of text manipulation.


From my experience this is just unfortunate lack of awareness that Pandoc exists.


Along the same lines, I've used Pandoc to convert an org-mode file into a revealjs presentation.


I've been using revealjs too! I've been using this: https://github.com/yjwen/org-reveal


Org mode can natively export to all those formats, and more. The end result is always better than pandoc's, and it lends itself better to customising the process and hooking in your own functions to further modify it.


You can export to all these formats, but what if they edit the DOC file and send it back to you? How do you bring those changes back to org? Will pandoc preserve your converted org mode file structure?


Jupyter notebooks seem to be the modern form for literate programming.

People are writing entire books in the literate style:

http://nbviewer.jupyter.org/github/rlabbe/Kalman-and-Bayesia...

The style is similar to literate classics like "Structure and Interpretation of Computer Programs" and "Structure and Interpretation of Classical Mechanics".

The only downside is version control. Jupyter uses a custom file format. It is difficult to work with colleagues on the same notebook, since we cannot use the usual code-versioning tools (git/diff/etc.) to manually merge together concurrent changes.

Anybody knows a good solution for this?


You can use a filter in your gitconfig to strip away the outputs, and thus make it play nicer with git [0].

gitconfig:

  [filter "nbstrip_full"]
      clean = "jq --indent 1 \
              '(.cells[] | select(has(\"outputs\")) | .outputs) = []  \
              | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null  \
              | .metadata = {\"language_info\": {\"name\": \"python\", \"pygments_lexer\": \"ipython3\"}} \
              | .cells[].metadata = {} \
              '"
      smudge = cat
      required = true

gitattributes:

  *.ipynb filter=nbstrip_full

[0] http://timstaley.co.uk/posts/making-git-and-jupyter-notebook...
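If jq isn't available everywhere, the same clean step can be written with Python's standard library. A minimal sketch that mirrors the jq filter above (empty outputs, null execution counts, cleared cell metadata, fixed notebook metadata); `strip_notebook`, `run_filter` and the `--filter` switch are hypothetical names, not part of any existing tool.

```python
import json
import sys


def strip_notebook(nb):
    """Clear outputs, execution counts and metadata, like the jq filter."""
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
        cell["metadata"] = {}
    # Same notebook-level metadata the jq filter writes.
    nb["metadata"] = {
        "language_info": {"name": "python", "pygments_lexer": "ipython3"}
    }
    return nb


def run_filter(src, dst):
    """Read a notebook from src and write the cleaned JSON to dst."""
    notebook = json.load(src)
    json.dump(strip_notebook(notebook), dst, indent=1, ensure_ascii=False)
    dst.write("\n")


if __name__ == "__main__" and "--filter" in sys.argv:
    # git pipes the file to the clean filter's stdin and stores its stdout.
    run_filter(sys.stdin, sys.stdout)
```

Saved as, say, `nbstrip.py` (a hypothetical path), the gitconfig entry would become `clean = python3 nbstrip.py --filter`, with the gitattributes line unchanged.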


> Jupyter uses a custom file format

For those curious: .ipynb files are JSON files, and the code is stored in an array of cells, as string entries.
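Because it is plain JSON, the standard library is enough to get the code back out, for example to diff only the code or to dump a notebook into a .py file. A minimal sketch, assuming the nbformat 4 layout, in which a cell's `source` may be stored either as one string or as a list of line strings; `code_cells` and the sample notebook are hypothetical.

```python
import json


def code_cells(notebook_json):
    """Return the source text of every code cell in an .ipynb document."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows either a single string or a list of line strings.
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Kalman filters\n"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})
print(code_cells(sample))
```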


> Anybody knows a good solution for this?

It's right here where you're commenting: org-mode.



README files on higher level (modules, subsystems), comments on the level of individual files.


The issue with all the ideas to make programs easier to understand is that they require time and work.

If I had more time and energy there are plenty of things I could do: refactor, make comments, change variable names or try literate programming.

But given the limited time and energy which is best?

Put another way, how can you beat thoughtful variable or function names and judicious comments?


Literate programming might be my least favourite approach when writing normal code, because you're going to read the prose only once or twice and then find that it gets in the way of the implementation you're concerned about. Maintaining something done with literate programming must be an absolute nightmare if you've got any kind of refactoring, because you basically have to rewrite at least a few paragraphs so the prose continues to make sense. The code is more important than the prose.

But when it comes to writing a tutorial that you can also execute to get the final result, or interact with along the way, then there is nothing better because, really, you're more interested in the prose that lends explanation to the code you're showcasing.

That said, this article is about emacs, and I've taken the literate approach to my own config too. That's less for the sake of adding paragraphs of explanation to my config than because org mode offers a fantastic way to organise code in one file (which is more often than not what an emacs config tends to be).

In either case I don't think you can do better than being thoughtful when writing code, thinking about the bigger picture and not just the individual functions and variables in isolation. For others and also yourself. This is also true for emacs itself, considering the idiosyncrasies it presents when most of us learn elisp by copying someone else, rather than reading the language docs.


> The code is more important than the prose.

This is not the position of someone programming in a literate fashion. Their contention is that prose and code are complementary. Code explains what the program does, prose explains why it does it that way. They work together to paint a coherent view of your program. With just code, a newcomer reading your program and understanding it 100% is still left with many questions unanswered.


You mentioned writing a tutorial and being able to execute the result, so I thought I'd mention a project I'm working on to do exactly this.

It started when I got annoyed with how difficult it was to write a 'code' tutorial.

Essentially it allows you to commit a markdown file (or any other format) alongside your source code, and you can embed source code snippets and shell commands in this file, which then get rendered into the main 'output' (the tutorial or article).

https://github.com/chrissound/GitChapter


The main problem I have with literate programming for tutorials is that it forces your writing order to match the code order. You get stuff like this:

"Ignore these for now" ... import statements and setup

"Now let's talk about X" ... some code

"Remember those import statements? Let's talk about them, but scroll up to see the code because I can't repeat it." ...

"Oh yeah, and we could have done this on X, but again, can't show any code." ...


Literate programming tools generally support ordering blocks differently in the human version of the document, exactly so that what you describe is not necessary. That's one of the reasons special tools for it exist, instead of just putting lots of comments in a file.


That's not an issue with org-mode itself, though people may use it that way. That's one thing I love about it. All the boilerplate crap? Tucked away in an appendix or an unexported heading (so not displayed in the HTML or PDF output).


With the noweb syntax, I think you can just write:

<<imports>>

code you want to show

and then later define what `imports` should be.


Indeed. You can also define several `imports` blocks, and the contents of each block will be concatenated during expansion. So your literate source can have the imports located near where they're used, but in the tangled code they'll be up top where most people expect them to be.
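The expansion step behind this is small. A hedged Python sketch of the behaviour described above, assuming the chunks have already been collected into a dict from name to a list of bodies (that collection step and org-babel's actual header arguments are not modelled); `expand`, `tangle` and the sample chunks are hypothetical.

```python
import re

# A line consisting only of optional indentation plus a <<name>> reference.
REF = re.compile(r"^(?P<indent>\s*)<<(?P<name>[^>]+)>>\s*$")


def expand(chunks, name, indent=""):
    """Return the lines of chunk `name` with every reference spliced in.

    `chunks` maps a name to a list of bodies; bodies sharing a name are
    concatenated in definition order. Nested references keep the
    indentation of the line they appear on. No cycle detection here.
    """
    lines = []
    for body in chunks[name]:
        for line in body.splitlines():
            ref = REF.match(line)
            if ref:
                nested = indent + ref.group("indent")
                lines.extend(expand(chunks, ref.group("name"), nested))
            else:
                lines.append(indent + line if line else line)
    return lines


def tangle(chunks, root):
    """Expand the root chunk into the text a compiler would see."""
    return "\n".join(expand(chunks, root)) + "\n"


chunks = {
    "main": ["<<imports>>\n\ndef main():\n    <<body>>"],
    "body": ["print(os.getcwd(), file=sys.stderr)"],
    # Two separate "imports" blocks, e.g. each written next to its use.
    "imports": ["import os", "import sys"],
}
print(tangle(chunks, "main"), end="")
```

The prose can present `main` first and introduce each import where it is needed, while the tangled output still has both imports at the top.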


That doesn't sound right. You absolutely don't have to have the literate document follow the order of the code. Absolutely not. One writes the sections and the pieces in a sensible order in the literate document, and the processing of the document into pure code reorders appropriately.

Here's something I wrote before, based on my own experiences. A key phrase from what follows is "When the text was munged, a beautiful pdf document containing all the code and all the commentary laid out in a sensible order was created for humans to read, and the source code was also created for the compiler to eat."

People seemed to find it useful then, so maybe they will now as well:

A previous employer (a subdivision of a global top ten defence company) used literate programming.

The project I worked on was a decade-long piece for a consortium of defence departments from various countries. We wrote in Objective-C, targeting Windows and Linux. All code was written in a noweb-style markup, such that a top level of a code section would look something like this:

    <<Initialise hardware>>
    <<Establish networking>>

and so on, and each of those in turn breaks out into smaller chunks:

    <<Fetch next data packet>>
    <<Decode data packet>>
    <<Store information from data packet>>
    <<Create new message based on new information>>

The layout of the chunks often ended up matching functions in the source code and other such code constructs, but that wasn't by design; the intention of the chunks was to tell a sensible story of design for the human to understand. Some groups of chunks would get commentary, discussing at a high level the design that they were meeting.

Ultimately, the actual code of a bottom-level chunk would be written with accompanying text commentary. Commentary, though, not like the kind of comments you put inside the code. These were sections of proper prose going above each chunk (at the bottom level, chunks were pretty small and modular). They would be more a discussion of the purpose of this section of the code, with some design (and sometimes diagrams) bundled with it. When the text was munged, a beautiful pdf document containing all the code and all the commentary laid out in a sensible order was created for humans to read, and the source code was also created for the compiler to eat. The only time anyone looked directly at the source code was to check that the munging was working properly, and when debugging; there was no point working directly on a source code file, of course, because the next time you munged the literate text the source code would be newly written from that.

It worked. It worked well. But it demanded discipline. Code reviews were essential (and mandatory), but every code review was thus as much a design review as a code review, and the text and diagrams were being reviewed as much as the design; it wasn't enough to just write good code - the text had to make it easy for someone fresh to it to understand the design and layout of the code.

The chunks helped a lot. If you had a chunk you'd called <<Initialise hardware>>, that's all you'd put in it. There was no sneaking not-quite-relevant code in. The top-level design was easy to see in how the chunks were laid out. If you found that you couldn't quite fit what was needed into something, the design needed revisiting.

It forced us to keep things clean, modular and simple. It meant doing everything took longer the first time, but at the point of actually writing the code, the coder had a really good picture of exactly what it had to do and exactly where it fitted into the grander scheme. There was little revisiting or rewriting, and usually the first version written was the last version written. It also made debugging a lot easier.

Over the four years I was working there, we made a number of deliveries to the customers for testing and integration, and as I recall they never found a single bug (which is not to say it was bug free, but they never did anything with it that we hadn't planned for and tested). The testing was likewise very solid and very thorough (tests were rightly based on the requirements and the interfaces as designed), but I like to think that the literate programming style enforced a high quality of code (and it certainly meant that the code did meet the design, which did meet the requirements).

Of course, we did have the massive advantage that the requirements were set clearly, in advance, and if they changed it was slowly and with plenty of warning. If you've not worked with requirements like that, you might be surprised just how solid you can make the code when you know before touching the keyboard for the first time exactly what the finished product is meant to do.

Why don't I see it elsewhere? I suspect lots of people have simply never considered coding in a literate style - never knew it existed.

It forces a change to how a lot of people code: big design, up front. Many projects are probably not suited, especially small projects (by which I mean less than a year from initial ideas to having something in the hands of customers) in which the final product simply isn't known in advance (and thus any design is expected to change, a lot, quickly); the extra drag literate programming would put on them would lengthen each iteration.

It required a lot of discipline, at lots of levels. It goes against the still-popular narrative of some genius coder banging out something as fast as he can think it. Every change beyond the trivial has to be reviewed, and reviewed properly. All our reviews were done on the printed PDFs, marked up with pen, with front sheets stapled to them listing the review comments; each comment was either dealt with by the coder or, after discussion, withdrawn by agreement with the reviewer. A really good day's work might be a half-dozen code reviews for other coders, touching your own keyboard only to print out the PDFs.

Programmers who gained a reputation for doing really good, thorough reviews with good comments, and for being able to critique people's code without offending anyone's precious sensibilities (we've all met them; people who seem to lose their sense of objectivity completely when it comes to their own code), were in demand, and it was a valued and recognised skill. (Being an ace at code reviews should be something we all want to put on our CVs, but I suspect a lot of employers basically never see it there.) I have definitely worked in some places in which, if a coder isn't typing, they're seen as not working, so management would have to be properly on board. I don't think literate programming is incompatible with the original agile manifesto, but I think it wouldn't survive in what that seems to have turned into.


Do you know how the process was adopted at the employer?

What tools would you recommend for the actual tangle process? Noweb?


As I recall, we were using noweb. I typically wrote using XEmacs (these days I'm back on Emacs). I certainly remember typing noweb at the command line.

The document I wrote was turned into two documents: LaTeX, which included all the source code and the accompanying discussion and design, ready to be turned into a beautiful PDF; and Objective-C, ready for the compiler. Code review was done using the PDF document. Design review had already been done before we started writing the noweb (and thus before we started writing the code), and the code review included checking that the design implemented matched the design approved.
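For readers who haven't seen noweb: its input marks a code chunk with `<<name>>=` on a line of its own and returns to documentation with a line starting with `@`. The toy Python sketch below splits such a file into documentation and code, "weaves" a plain-text view, and groups code chunks by name ready for tangling. It only illustrates the one-source, two-outputs idea; the real `noweave` and `notangle` tools do much more (LaTeX or HTML output, cross-references), and `parse_noweb`, `weave` and `chunks_by_name` are hypothetical names.

```python
import re

CHUNK_START = re.compile(r"^<<(?P<name>.+)>>=\s*$")


def parse_noweb(text):
    """Split noweb-style text into ('doc', text) and ('code', name, text)."""
    parts = []
    kind, name, buf = "doc", None, []

    def flush():
        body = "\n".join(buf).strip("\n")
        if body:
            parts.append(("doc", body) if kind == "doc" else ("code", name, body))

    for line in text.splitlines():
        start = CHUNK_START.match(line)
        if start:  # "<<name>>=" opens a code chunk
            flush()
            kind, name, buf = "code", start.group("name"), []
        elif line == "@" or line.startswith("@ "):  # "@" returns to prose
            flush()
            kind, name = "doc", None
            buf = [line[2:]] if line.startswith("@ ") else []
        else:
            buf.append(line)
    flush()
    return parts


def weave(parts):
    """Toy weave: plain text with indented code (noweave emits LaTeX etc.)."""
    out = []
    for part in parts:
        if part[0] == "doc":
            out.append(part[1])
        else:
            _, name, body = part
            code = "\n".join("    " + line for line in body.splitlines())
            out.append(f"<<{name}>>=\n{code}")
    return "\n\n".join(out)


def chunks_by_name(parts):
    """Group code for tangling; chunks with the same name are concatenated."""
    chunks = {}
    for part in parts:
        if part[0] == "code":
            chunks.setdefault(part[1], []).append(part[2])
    return chunks


source = "\n".join([
    "The program first sets up the system.",
    "",
    "<<main>>=",
    "<<Initialise hardware>>",
    "<<Establish networking>>",
    "@ Hardware setup lives in its own chunk.",
    "",
    "<<Initialise hardware>>=",
    "power_on()",
    "@",
])
parts = parse_noweb(source)
print(weave(parts))
```

The woven text keeps the order the author chose for the reader, while `chunks_by_name` gives a tangler everything it needs to splice `<<Initialise hardware>>` into `main` for the compiler.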


GitChapter fixes this problem. Disclaimer: I wrote it.


>Put another way, how can you beat thoughtful variable or function names and judicious comments?

Writing why and how you did something.

The setup for org lets you write the equivalent of jupyter notebooks for every language, which means that you can show a toy implementation of what you're doing before the real code. This toy code is live, completely independent of the rest of the project and can be poked at by anyone opening the org file without screwing anything else up.

I have programs written in org-mode which tangle and weave not only the code but the devops. Chapter 1 is the setup for the system, chapter 2-N is code, chapter N+1 launches the app.

I have only really done it with Python and C so far. But I managed to get Scala set up today in less time than it would take to get an IDE working.

The tools are still immature; tangle especially needs much finer and better-documented control. But even in this state it's the only tool I can use to write programs that I can pick up three years later and grok in an afternoon. The only downside is that development of the tool is stuck in the 90s, with emails for bugs and patches.


I'd love to see your org files, do you have any online that you can share?


Sure: https://github.com/ant-t/LiterateHelloWorld

I've had interest in my running a workshop on org mode for literate programming, so expect that repository to be filled in over the next week or two.


Program design.


I think most professional cameras have a voice annotation button, for recording scene and subject notes. At least my D2X did, which was the last pro DSLR I could justify.

I keep wishing for the same button, mid-line, every time I'm working around some kind of "proprietary" serialisation* I find in in-house projects.

*I tried to coin the acronym OBSEC to mean security by obscurity fifteen years ago, but I have since figured that pointed derogatory comment serves no enduring purpose unless, like SNAFU and FUBAR, it at least releases the frustration felt by the observer.


Another selling point for me is the relative ease of writing custom exporters from org to other formats [1,2].

My latest use cases are:

- Extracting a list of issue descriptions (for Jira) from a Getting Started guide that included "todo" comments.
- Extracting a list of contacts from notes taken during a conference.
- Drafting Jekyll blog posts in org-mode.

For todo and contact information, org properties [3] are used to specify semantic fields, which can in turn be used by the exporter.
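The exporters themselves are Emacs Lisp built on org's export framework. Purely to illustrate the "properties as semantic fields" idea outside Emacs, here is a small Python sketch that pulls `:PROPERTIES:` drawers out of org text into one record per headline. It is not the org export API and ignores most of org's syntax; `headings_with_properties` and the sample notes are hypothetical.

```python
import re

HEADLINE = re.compile(r"^\*+\s+(?P<title>.*)$")
PROPERTY = re.compile(r"^\s*:(?P<key>[^:\s]+):\s*(?P<value>.*)$")


def headings_with_properties(org_text):
    """Return one dict per headline that carries a :PROPERTIES: drawer."""
    records, current, in_drawer = [], None, False
    for line in org_text.splitlines():
        stripped = line.strip()
        head = HEADLINE.match(line)
        if head:  # a new headline starts a new candidate record
            current = {"heading": head.group("title").strip()}
            in_drawer = False
        elif stripped == ":PROPERTIES:" and current is not None:
            in_drawer = True
        elif stripped == ":END:" and in_drawer:
            in_drawer = False
            records.append(current)
        elif in_drawer:
            prop = PROPERTY.match(line)
            if prop:  # ":KEY: value" becomes a field of the record
                current[prop.group("key")] = prop.group("value").strip()
    return records


notes = "\n".join([
    "* Conference notes",
    "** Jane Doe",
    "   :PROPERTIES:",
    "   :EMAIL:    jane@example.com",
    "   :COMPANY:  Example Corp",
    "   :END:",
    "   Talked about literate programming.",
    "** Session on org-babel",
])
print(headings_with_properties(notes))
```

Only headlines with a drawer become records, so free-form notes and session headings are skipped without any extra markup.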

To be fair, the work on these exporters is not complete. I am still getting my Lisp up to speed. But from what I can see, this is way easier than writing pandoc backends or Jupyter notebook exporters.

[1] Exporter documentation: https://orgmode.org/worg/dev/org-export-reference.html

[2] Example exporter (.md): http://repo.or.cz/org-mode.git/blob/HEAD:/lisp/ox-md.el

[3] https://orgmode.org/manual/Property-syntax.html#Property-syn...


Reminds me of college prof. doing requirements parsing to extract code out of desires. I'd love to write in commentscript.


Literate programming is fine for done things - e.g. the sort of things you might put in a Jupyter notebook - maths, algorithms etc.

For large normal programs though? I don't think you need that much prose. A program with 100k lines of code would become enormous.

Still, I do wish programming languages had better support for rich comments - why do no IDEs render comment blocks as markdown? Why can't I put diagrams in comments? I have to resort to shitty ASCII art like a commoner.


Check out DrRacket; I believe you can have images in the comments and even as part of code.

https://docs.racket-lang.org/quick/


Oddly, large programs that won't change heavily benefit from the literate style. Almost as a forcing function to keep you from changing them. Not surprising, given Knuth's liking of stability.


> large programs that won't change heavily benefit from the literate style.

This has been my experience. Returning to code I wrote as a literate document is a joy.


This is a fantastic article. I've been wanting to write an article like this and Justin Worthe did a better job at describing org-mode and Literate Programming than I could have.

For my part, I've written at least 3 "non trivial" literate programs using org-babel and describe them here: https://gist.github.com/jpf/d71453f535065a0d9281672152541386

I should mention that one of my "dirty secrets" of writing literate programs is that I always start with an "illiterate" program, tweaking, changing and updating as I go along. Then, once it's something I'm ready to "chisel in stone" I start converting the code into a literate document.

I do this by checking all of my code into Git, then re-create the code inside of org-mode. Every time I "tangle" from org-babel, I do a "git diff" and make sure that I haven't changed the code by documenting it.

I do this because, while it's easy to make changes to a literate program, it's harder to do a major refactor.
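The same "did documenting the code change it?" check can also be run without git, for example from a script that runs after each tangle. A minimal sketch using Python's standard `difflib`; `code_drift` is a hypothetical helper, and in practice `committed` would be read from the repository and `tangled` from the freshly tangled file.

```python
import difflib


def code_drift(committed, tangled, name="code"):
    """Unified diff between committed code and freshly tangled code.

    An empty string means that documenting the code left it unchanged.
    """
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        tangled.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


before = "def add(a, b):\n    return a + b\n"
after = "def add(a, b):\n    return b + a\n"
print(repr(code_drift(before, before)))  # identical inputs give an empty diff
print(code_drift(before, after, "add.py"))
```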


The only thing holding me back from experimenting with org-mode is its tight coupling with emacs.

It's a non-starter on a team where people aren't willing to switch over to it (at least to view and interact with org files).

It's difficult for anyone who relies heavily on IDE features; I've spent, literally, probably close to a hundred hours trying to get emacs + IDE-like integrations working for a variety of languages, with only mixed success.

Yes, I understand that going with emacs is a 'different way' of doing things, and perhaps you won't need your IDE if you just do things the emacs way, but the barrier to entry is very high, between building in new muscle memory for shortcuts, the highly non-standard UX compared to every other editor today, and the hours and hours required to configure everything by hand, even with a starter config like spacemacs.


> The only thing holding me back from experimenting with org-mode is its tight coupling with emacs.

As a non-emacs user, my successful route to orgmode was with Spacemacs and evil for Vim keybindings. After several false starts, what worked for me was:

1) realizing I didn't need to go full emacs: I stick with Vim for plain text editing, Visual Studio for coding at work and PyCharm/Visual Studio Code for coding at home.

2) consciously learning one new org-mode feature at a time, and adding their shortcuts to my own cheat sheet.


Literate programming fits in fine for very dense languages. I was writing J(APL) a while ago and it was nice to employ literate programming using Jupyter Notebook.

Most folks say J/APL is unreadable, but I find that I can always come back and understand the code as fast as I can read the notebook.


I've used org-mode with inline code a few times now, and it is terribly expressive and useful when you want to show reports with "live" results in them, etc.

That said when it came to formatting my Emacs init file, with descriptions, and justifications, I chose to use markdown:

https://github.com/skx/dotfiles/blob/master/.emacs.d/init.md

Markdown allows text, and code, to be mixed, and while there is no inline expansion support or other neat features it is minimal enough that it was painless to write and process.
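One way to get the Emacs Lisp back out of a file like that, sketched in Python under the assumption that the code lives in fenced blocks tagged `lisp`; this is not necessarily how the init.md above is processed. The extracted text could then be written to an .el file for Emacs to load. `fenced_blocks` and the sample are hypothetical.

```python
import re

TICKS = "`" * 3  # a Markdown code fence: three backticks
FENCE = re.compile(r"^`{3}\s*(?P<lang>[\w+-]*)\s*$")


def fenced_blocks(markdown, lang="lisp"):
    """Return the bodies of fenced code blocks whose info string is `lang`."""
    blocks, body, info = [], None, None
    for line in markdown.splitlines():
        fence = FENCE.match(line)
        if body is None:
            if fence:  # opening fence: start collecting
                info, body = fence.group("lang"), []
        elif fence and not fence.group("lang"):  # bare closing fence
            if info == lang:
                blocks.append("\n".join(body))
            body = None
        else:
            body.append(line)
    return blocks


sample = "\n".join([
    "# My init file",
    "Turn off the toolbar:",
    TICKS + "lisp",
    "(tool-bar-mode -1)",
    TICKS,
    "A shell snippet that should be skipped:",
    TICKS + "sh",
    "echo hi",
    TICKS,
])
print("\n\n".join(fenced_blocks(sample)))
```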


Curious why you choose markdown over org, especially for an emacs init file?


I spend a lot of my time writing markdown, much more than I do interacting with org-mode, so it felt more natural.

Org-mode I tend to only really use for dynamic things (i.e. "reports" or tutorials which contain embedded code and rendered output). Markdown seems easier to demonstrate to other people, and doesn't really require introducing org-mode (which is worth learning, and which does have excellent documentation of its own).


I've been using org and emacs for 5 years now and I never thought of using it to actually comment what my .emacs file was doing.

This is a brilliant idea I feel very stupid for not having thought of myself.


The love Emacs config files tend to get never ceases to amaze me :-)


I find they tend to look somewhat like what you'd expect to see from a file that's the combination of RC scripts for an OS and configuration files for all applications inside it...


first, i like emacs alright, and the article is awesome and full of good recommendations..

.. however, this article appears to be advocating a style of programming that i'd be more apt to call Test Driven Development than Literate Programming:

it demonstrates a way to use naming conventions and mocks/stubs to describe a program, not a way to use natural language, math notation, etc. to describe a program.

for JS (the lang in the article), proxyquire+simple-mock is a non-emacs-centric way to do this. toss in tape or some other testing library, and you've got some amount of natural language documentation as well.

this is more what i'd consider literate programming in JS land

https://marionettejs.com/annotated-src/backbone.marionette.h...

... by way of disclaimer, i ought to say that, for all i know, i'm grossly misinformed as to the meaning of TDD as compared to Literate Programming


The best thing with Org mode literate programming is that you can easily mix and match code in multiple languages and make these communicate, and then render the result as part of the document w/o any hassles.


It occurs to me that the emacs-averse can still use emacs as a backend (emacsclient/(server-start)) so they can edit org-mode text in their favorite editor and then process it with emacs magic.

Also, I bet this trick would work (maybe) for formatting comments or introducing ASCII wireframes with tools mentioned at https://news.ycombinator.com/item?id=2651745. (For the ASCII art, maybe an extra hop through emacs isn't necessary.)


Literate programming sounds nice... but then I read all those people outlining the horrors of too many comments. I have trouble reconciling these approaches.


The comments of a literate program are not just comments scattered throughout the program. Nor are they just an index of what is in a program. Rather, they build a narrative. It is literally explaining the program. In full.


> It is literally explaining the program. In full.

Precisely. Here are real world examples of this:

A program that implements on-the-fly encoding of images into SSTV audio files (3620 words): https://github.com/jpf/dial-a-cat#1-855-meow-jam-sending-cat...

An example SCIM server (7966 words): https://github.com/joelfranusic-okta/okta-scim-beta#welcome-...

An example OIDC "RP" implementation (7731 words): https://github.com/joelfranusic-okta/okta-scim-beta#welcome-...


I'm sorry, but I think I don't get it. All those examples consist of regular old sparsely commented code files with really thoroughly written READMEs, written in org. How is that special or any different from README-driven development?


The README and the code always stay in sync. An update in one place changes both the code and the documentation.


OfferZen? On MY HackerNews? It's more likely than you think!

I think they've made quite the impact on the South African dev market to get to the front page here.


There are plenty of saffers doing good work, just the vast majority of them get out because the country is, well, you know...

Source: Saffer who got out, knows others from UCT etc.


I started using org mode at work and home a few months ago. Love it.

I want to try both Haskell’s built in literate programming support and try the ideas in this article.


> I want to try both Haskell’s built in literate programming support

Haskell doesn't have literate programming support. The "lhs" format merely changes the syntax of block comments, it provides no additional power over {- text here -}.
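To make the point concrete: for the bird-track style, the Haskell 2010 Report defines the program text as just the lines beginning with `>`, with that `>` replaced by a space. A Python sketch of that recovery (`unlit_bird` is a hypothetical name) shows it is a line-by-line filter with no reordering and no chunk reuse. Blanking the other lines to keep line numbers aligned is a choice made here, and the Report's rule that program lines be separated from text by blank lines is not checked.

```python
def unlit_bird(lhs_text):
    """Recover Haskell source from bird-track literate Haskell.

    Lines starting with '>' are program text; the '>' becomes a space so
    columns (and therefore layout) are unchanged. Every other line becomes
    blank here so that line numbers still match the .lhs file.
    """
    out = []
    for line in lhs_text.splitlines():
        out.append(" " + line[1:] if line.startswith(">") else "")
    return "\n".join(out)


lhs = "\n".join([
    "This module just says hello.",
    "",
    "> main :: IO ()",
    '> main = putStrLn "hi"',
])
print(unlit_bird(lhs))
```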


I'd love to hear your thoughts after trying both!

When I got started with Literate Programming, what I thought I wanted was the built-in literate programming support that Haskell has (.lhs files) – what I've come to realize is that the ability to re-organize (and re-use!) blocks of code in an org-babel style literate document is extremely powerful and something I would miss using the "we inverted code and comments" approach that Haskell uses.


Instead of embedding code inside org-mode, it should be the opposite: embed org-mode inside comment sections. However, this is not easily possible because of ‘*’ gobbling up beyond the comment section (at least for C).


I'm not sure that makes as much sense. At that point, why not just use regular comments? Then add outshine-mode if you want a tree of headlines.


A picture tells a thousand words, so unfortunately, as much as I like emacs and markdown, I end up using Visio.


Examples of large enough (real world) literate programs?



