
From Word to Markdown to InDesign: Fully Automated Typesetting Using Pandoc - rhythmvs
http://rhythmus.be/md2indd/
======
Animats
Well, yes, if you dumb your documents down to the level of "markdown", it's not
hard to grind them out as plain text with some styling. You can write HTML in
that style, too. People did that 20 years ago.[1] That was the original vision
of HTML.

Some people wanted to stop there, and limit HTML to describing the semantics
of a document, not its visual appearance. They lost.[2]

Pandoc doesn't really use "markdown". It uses "enhanced markdown":

 _" Pandoc’s enhanced version of Markdown includes syntax for footnotes,
tables, flexible ordered lists, definition lists, fenced code blocks,
superscripts and subscripts, strikeout, metadata blocks, automatic tables of
contents, embedded LaTeX math, citations, and Markdown inside HTML block
elements."_

Pandoc's "markdown" now has roughly the feature set of HTML 3.

[1]
[http://www.animats.com/papers/articulated/articulated.html](http://www.animats.com/papers/articulated/articulated.html)
[2]
[http://www.w3.org/People/Raggett/book4/ch02.html](http://www.w3.org/People/Raggett/book4/ch02.html)

~~~
jolux
If you try to prove that those who thought HTML should be exclusively for
semantics and not visual appearance "lost" by linking to a history of HTML
that was last updated in 1998, it seems as if you missed HTML5, where great
pains have been taken to _remove_ visual styling information from HTML in
favor of semantic markers.

In fact most of the web developers I speak with regularly agree. The consensus
in recent years is that HTML _is_ supposed to be exclusively semantic and
markup should be styled with CSS.

[http://www.csszengarden.com/](http://www.csszengarden.com/) is this concept
taken to an extreme. Same HTML, same semantics, different styling.

~~~
Animats
The claim was made for HTML5 that it would do that. Some HTML5 page source
does look like that, but not most of it.

~~~
jolux
No, there is a specification for HTML5, and it does exactly what I said it
does. Granted, most "HTML5" is a mix of HTML 4 and HTML5 (sometimes even a
little XHTML), but those are not strictly validated sites.

Here, try running a few websites through
[https://validator.w3.org/](https://validator.w3.org/) and see what pops up.
You'd be surprised.

For example, [https://news.ycombinator.com](https://news.ycombinator.com) is
extremely bad HTML5, and if you look at the report on it,
[https://html5.validator.nu/?doc=https%3A%2F%2Fnews.ycombinat...](https://html5.validator.nu/?doc=https%3A%2F%2Fnews.ycombinator.com&showimagereport=yes&showsource=yes)
you can see a bunch of "Use CSS Instead" suggestions. This is the separation
of style from semantics that HTML5 was made to reinforce.

------
rubidium
So 30% of my current job is writing technical user guides for large, custom,
automated research systems (the rest is design and development of those
systems).

Each one is slightly different, and so needs its own guide. We currently are
stuck using Word b/c it's so fast. Yes it's ugly. Yes it's a pain to work with
sometimes. But I haven't found anything that's faster to produce these docs
with.

The reason we haven't switched to html, latex, or xml etc... is because I
haven't found a typesetting program that is as easy to drag and drop images
into and out of. Each guide has 30+ images on average. I _need_ that interface
to be drag and drop otherwise it's too tedious to write the guide. The
formatting of the text I'd love to do in some sort of markup language, but the
images have to be dead simple.

Short version: Anyone know of a typesetting solution with some markup language,
version control, print to PDF at the click of a button, and drag and drop
images?

~~~
rhythmvs
That’s exactly the sort of use case we are focussing on and building our
automated typesetting service for.¹ You’ll probably also benefit from file
inclusions so that you can keep a library of text snippets, with variables,
keeping them versioned separately, and which you can then re-use and assemble
to form a new, slightly different manual, with the variables automatically
populated at typesetting time.

Could you be more specific on the problem you’re facing with placing images?
Is it that there are like hundreds of them, for which you would want to have
created the references (and ![Caption](syntax)) automatically, because typing
out the file paths manually is too tedious? That is doable, although it would
be more of a drag-and-drop feature to be implemented by a dedicated Markdown
editor, or an ST plugin.
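
To make the "references created automatically" idea concrete, here is a minimal sketch, assuming the images already sit in a folder and that a caption derived from the file name is good enough as a starting point. The helper names (`captionFromPath`, `imageReference`) and the example paths are hypothetical, not part of any existing editor or plugin, and paths containing spaces would need extra escaping.

```javascript
// Hypothetical helper: derive a readable caption from an image file name,
// e.g. "img/install-step-1.png" becomes "Install step 1".
function captionFromPath(path) {
  const file = path.split("/").pop();               // drop directories
  const stem = file.replace(/\.[^.]+$/, "");        // drop the extension
  const words = stem.replace(/[-_]+/g, " ").trim(); // separators to spaces
  return words.charAt(0).toUpperCase() + words.slice(1);
}

// Hypothetical helper: build a Markdown image reference for one path.
function imageReference(path) {
  return `![${captionFromPath(path)}](${path})`;
}

// Example paths are made up for illustration.
const paths = ["img/install-step-1.png", "img/rear_panel.jpg"];
console.log(paths.map(imageReference).join("\n\n"));
```

An editor plugin could run something like this on a drop event and paste the result at the cursor, leaving the author to refine the captions.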

Or would you like to have more fine-grained control over floats, sizing and
placement of the images relative to their place in the text narrative? For
that, a WYSIWYG interface is indeed better suited, re: many document
authors’ gripe with TeX’s figure placement.

¹ [http://textus.io/](http://textus.io/)

~~~
rubidium
Glad to hear it.

Here are some bullet-point thoughts:

1) Workflow needs to be uber-fast and not require saving/naming the images (at
least at the user level; the program will, of course). Many images are
screenshots from SW installs. Workflow goes: Run SW in VM -> screenshot portion
of interest -> Copy+Paste into Word doc. I never even save the image in that
case. The rest of the images are photos of HW. I just want to drag and drop
them into the document, not rename them all and move them to the same folder
as the document. The document/CMS/program should take care of all that for me
(like Word does).

2) I do need the text and images to match better and be more WYSIWYG. Location
is more important than aesthetics for a guide. It cannot be the abomination
that LaTeX is. LaTeX figures are terrible to work with (and that's having
published multiple academic papers + PhD thesis in LaTeX). Sizing is less
important (that can be [width=x, height=y]).

3) Variables/snippets: Libraries of text snippets and variables are something
we use quite a bit, right now via a few kludged-together macros and custom Word
doc properties. Lots of docs go "Run the <SW install name> for <device>" with
<> items defined at document level. Sounds like you do this, and it's probably
better than how Word does :)
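
For illustration, the document-level `<>` substitution from point 3 could look roughly like this. It is only a sketch: `fillTemplate` is a hypothetical helper, and the product and device names are invented placeholders, not anything from an actual guide.

```javascript
// Hypothetical helper: replace <name> placeholders in a text snippet with
// values defined once at document level. Unknown names raise an error so a
// missing definition is caught at typesetting time instead of in print.
function fillTemplate(text, variables) {
  return text.replace(/<([^<>]+)>/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      throw new Error(`Undefined variable: ${name}`);
    }
    return variables[name];
  });
}

// Invented example values.
const variables = {
  "SW install name": "AcquireSuite installer",
  device: "the XR-200 stage",
};

console.log(fillTemplate("Run the <SW install name> for <device>.", variables));
```

Keeping the snippets and the variable definitions in separate plain-text files would let both be versioned independently.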

sorry for dumping on ya, but I'm excited to find people working in this space.
It's something I feel like there should be better tools for but just haven't
found yet.

~~~
brazzledazzle
You might be interested in Problem Steps Recorder[0]. Word should be able to
import the resulting mhtml file.

[0] [http://blogs.msdn.com/b/patricka/archive/2010/01/04/using-th...](http://blogs.msdn.com/b/patricka/archive/2010/01/04/using-the-secret-windows-7-problem-step-recorder-to-create-step-by-step-screenshot-documents.aspx)

------
jolux
Is Markdown really "relatively new?" According to Wikipedia, it's older than
OOXML, aka .docx .xlsx .pptx etc.

[https://en.wikipedia.org/wiki/Markdown](https://en.wikipedia.org/wiki/Markdown)
[https://en.wikipedia.org/wiki/Office_Open_XML](https://en.wikipedia.org/wiki/Office_Open_XML)

~~~
rhythmvs
Relative to widespread brand recognition, I guess, and relative to similar
initiatives. But indeed, Markdown (as in, the concept of _lightweight markup_)
predates the Web and is at least as old as Usenet. In footnote 9, I linked to
my GitHub repo with Markdown resources; there you’ll find the following
chronology:

- Setext (Ian Feldman, 1992)
- AFT (Todd Coram, 1999)
- Grutatxt (Ángel Ortega, 2000)
- atx (Aaron Swartz, 2002)
- AsciiDoc (Stuart Rackham, 2002)
- MediaWiki (Magnus Manske & Lee Daniel Crocker, 2002)
- reStructuredText (David Goodger, 2002)
- Org-mode (Carsten Dominik, 2003)
- Textile (Dean Allen, 2004)
- Markdown (John Gruber & Aaron Swartz, 2004)

~~~
jolux
Yes, I'm quite aware of the history, which is why I thought it was strange
that you referred to it as "relatively new." Otherwise, it's a great post.

~~~
rhythmvs
Thanks! And yeah, if only people knew their history… Relatively new, to
me, unfortunately means that lightweight markup is still not something the
broader public is familiar with, not even by its proxy brand name
“Markdown”, even with big outlets (e.g. Reddit) pushing it forward.

------
shortformblog
From the perspective of someone with experience working in the print industry:
This is impressive work, and you nail down the biggest issue with Word—if you
don't use pure style sheets, you get cruddy code. (Google Docs has this
problem, too.)

I think that lots of folks are trying to find different ways to solve this
problem in the print world, especially as stories have to be pulled into CMSes
as well.

Personally, I'd be curious whether it'd be possible to cull copy from an
InDesign file and convert it into Markdown, complete with links to the images
used in the document, so that it could then be edited.

------
katabasis
It's great to see others in the publishing world moving away from proprietary
software like Word and towards plain-text-based processes. As the author
points out, this has many benefits.

Here's something to consider: you've ditched MS Word, maybe you can ditch
InDesign too? In my experience once content goes into an Adobe program, it's
hard to get it out again in a clean way. It's pretty impressive what you can
accomplish using CSS3 for print layout[1].

I work at an academic publisher, and I've spent much of the last year
preaching the benefits of a similar workflow. Some of the editors are now
editing manuscripts in Markdown files directly. Currently we're building a
system where a single set of text files gets fed into a program like a static
website generator. This produces a web version, but also PDF, ePUB, etc.
automatically. We're getting pretty close[2]. I think this is the future for
many forms of publishing.

I think one of the big remaining pieces of the puzzle is creating a better
Markdown editor, something suited for the needs of scholars & academics with
support for things like footnotes, bibliographies, etc, while remaining a
plain-text format.

[1] [http://alistapart.com/article/building-books-with-css3](http://alistapart.com/article/building-books-with-css3)
[2] [http://egardner.github.io/posts/2015/building-books-with-mid...](http://egardner.github.io/posts/2015/building-books-with-middleman/)

------
zyxley
Unlinked footnotes in a webpage?

With all that unused margin space on a desktop, it seems like they would have
made more sense as Tufte-style marginal notes anyway.

~~~
rhythmvs
Though they are linked, see e.g.
[http://rhythmus.be/md2indd/#fn2](http://rhythmus.be/md2indd/#fn2) — it’s
Pandoc after all, which creates those ;-)

The wide margins are due to typography best practices, which dictate between
~50 and 70 characters on a line. One could of course blow up the font size, as
has been in fashion since Medium. But then there are too few lines above the
fold.

But you’re entirely correct as regards marginal notes. Unfortunately, they’re
not trivially implemented, since you’d need to swap end-notes to side-notes
relative to the viewport (or rather, the media query breakpoint), and because
that involves DOM manipulation, it cannot be done with CSS alone. Let alone
dealing with collision detection for stacking longer footnotes vertically…

~~~
munificent
I implemented this in my online book[1] and it wasn't too bad. Try resizing
the window to see.

The entirety of the JS (not including jQuery) is:

    
    
        $(document).ready(function() {
          $(window).on('resize', refreshAsides);

          // Since we may not have the height correct for the images, adjust the asides
          // too when an image is loaded.
          $('img').on('load', refreshAsides);

          // If the browser supports the CSS Font Loading API, refresh once the
          // fonts are ready.
          if (document.fonts && document.fonts.ready) {
            document.fonts.ready.then(refreshAsides);
          }

          // Lame. Just do another refresh after 200 ms when the font is *probably*
          // loaded to hack around the fact that the metrics changed a bit.
          window.setTimeout(refreshAsides, 200);

          refreshAsides();
        });

        function refreshAsides() {
          // Don't position them if they're inline.
          if ($(document).width() < 800) return;

          // Vertically position the asides next to the span they annotate.
          $("aside").each(function() {
            var aside = $(this);

            // Find the span the aside should be anchored next to.
            var name = aside.attr("name");
            var span = $("span[name='" + name + "']");
            // A jQuery selection is never null; an empty one means no match.
            if (span.length === 0) {
              window.console.log("Could not find span for '" + name + "'");
              return;
            }

            aside.offset({top: span.position().top - 3});
          });
        }
    

In my book, the asides are positioned very precisely next to certain lines. If
you don't need that level of precision, a pure CSS solution is possible, I
think.

Overlapping footnotes could be a problem, but to me that's a case where trying
to completely separate design from content is a bad idea. Design should
optimize for the actual prose you have and not all possible copy you might
write. Likewise, it's often worth tweaking copy a bit to make it look better
with your design.

In a couple of cases with my book, I tweaked or rearranged asides to avoid
them overlapping.
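
For anyone who would rather resolve overlaps in code than by editing the copy, here is a rough sketch of the stacking step. It is not what the book does: `stackAsides` is a hypothetical helper, and the 8px gap and the example numbers are made-up assumptions. It takes each aside's desired top and its height, and pushes an aside down only when it would collide with the one above it.

```javascript
// Hypothetical helper: given asides with a desired `top` and a `height`
// (both in pixels), return the top each aside should actually get so that
// none overlap. Results are in the same order as the input.
function stackAsides(asides, gap = 8) {
  const sorted = asides
    .map((aside, index) => ({ top: aside.top, height: aside.height, index }))
    .sort((a, b) => a.top - b.top);

  const tops = new Array(asides.length);
  let nextFree = -Infinity; // first vertical position not yet occupied
  for (const aside of sorted) {
    const top = Math.max(aside.top, nextFree);
    tops[aside.index] = top;
    nextFree = top + aside.height + gap;
  }
  return tops;
}

// The second aside wants to start at 120 but the first one ends at 160,
// so it is pushed down to 168 (160 plus the 8px gap).
console.log(stackAsides([
  { top: 100, height: 60 },
  { top: 120, height: 40 },
  { top: 400, height: 30 },
]));
```

The returned tops could then be fed to something like `aside.offset({top: ...})` in place of the anchor position. A long run of notes can still drift far from its anchors, which is where tweaking the copy remains the better fix.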

[1]:
[http://gameprogrammingpatterns.com/introduction.html](http://gameprogrammingpatterns.com/introduction.html)

~~~
Scarbutt
Nice book, what did you use to create the HTML from the Markdown? Or did you
do some CSS fiddling?

~~~
munificent
A very simple cobbled together Python script:

[https://github.com/munificent/game-programming-patterns/blob...](https://github.com/munificent/game-programming-patterns/blob/master/script/format.py)

------
akavel
@rhythmvs Have you considered using SILE [1][2] instead of LaTeX (and maybe
even InDesign)? (I'm not affiliated, but I believe it may become a worthy
successor to TeX in the future.)

[1]: [http://video.fosdem.org/2015/main_track-typesetting/introduc...](http://video.fosdem.org/2015/main_track-typesetting/introducing_sile__CAM_ONLY.mp4)

[2]:
[https://archive.fosdem.org/2015/schedule/event/introducing_s...](https://archive.fosdem.org/2015/schedule/event/introducing_sile/attachments/slides/772/export/events/attachments/introducing_sile/slides/772/sile.pdf)

------
cossatot
Will it work with equations? A major peeve of mine is trying to get TeX to
deal with figures in a better manner, especially in space-limited situations
(e.g. grant proposals), and I used to use InDesign for that before my work got
too mathy.

~~~
adiM
Try ConTeXt. The float placement is much more flexible than in LaTeX. Being
built on top of TeX, it supports all the math.

------
pjstew
I've been looking into the same problem for a while, and coincidentally came
to the exact same solution a few weeks ago. I haven't actually got round to
completing all the code for it, but have tested each section. I was delighted
when I spotted your article this morning, but was hoping you would have also
shared your code... No git repository? I'm sure I can make the whole system
myself, but I'm always happy to use others' work if it exists. If you do have a
working version of this process, please do share it!

------
pessimizer
I prefer asciidoc:
[http://powerman.name/doc/asciidoc](http://powerman.name/doc/asciidoc)
[http://asciidoctor.org/docs/what-is-asciidoc/](http://asciidoctor.org/docs/what-is-asciidoc/)

------
todd8
I'm really looking forward to the evolution of Markdown, but it will not be a
complete replacement for TeX (for many years). TeX is designed around a
powerful (macro based) programming system. This is an excerpt from a comment
that I posted on HN a while back that is apropos this discussion:

TeX's macro style of programming is too difficult. Nevertheless, people have
done amazing things with it.

TeX has somewhere around 325 primitives, and one of the most important is the
\def primitive used to define macros. These primitives are used to define
additional macros, hundreds of them, available in different so-called formats.
A basic format known as Plain TeX includes about 600 macros in addition to the
325 primitives. LaTeX is another format, the most widely used, but there are
others, like ConTeXt, that are also very capable. Each of these extends TeX's
primitives with its own macros, resulting in a different kind of markup
language. TeX's primitives are focused on the low-level aspects of typesetting
(font sizes, text positions, alignment, etc.). LaTeX provides a markup
language that is focused on the logical description of the document's
components: headings, chapters, itemized lists, and so forth. The result is a
system that does simple things easily while allowing very complex typesetting
to be performed when needed.

In addition to the TeX core primitives and the hundreds of commands
(implemented as macros) in a format like LaTeX, there are additional packages,
classes, and styles that are used to provide support for any conceivable
document. LaTeX has a rich ecosystem of packages. Typesetting chess? There's a
LaTeX package for that. Complex diagrams and graphics? There's a LaTeX package
for that. Writing a paper in the style of Tufte? Writing a book? Or a musical
score? Or building a barcode? There are packages for that. The _documentation_
for the TikZ & PGF graphics package is over 1100 pages long! The documentation
for the Memoir package is 570 pages.

The amazing thing is that all of this is built out of macros. Once you dive
into this (and it's inevitable once you need to customize the look of a
document), you find yourself in a maze of twisty little passages. Once upon a
time, while writing assembly language for large computers, I enjoyed writing
fancy assembler macros. I was fascinated with Calvin Mooers's TRAC programming
language based on macros and Christopher Strachey's General Purpose
Macrogenerator. These were early (mid-1960s) explorations into the viability
of macro processors as a means for expressing arbitrary computations. Readers
interested in trying out macros for programming can try the m4 programming
language (by Kernighan and Ritchie) found on Unix and Linux systems. m4 is
used in autoconf and sendmail config files.

Yet TeX macros are in a whole other dimension. All of these powerful macro
systems have one thing in common: parameterized macros can be expanded into
text that is then rescanned, looking for newly formed macro calls (or new
macro definitions) to expand, as many times as one wants. This isn't just an
occasional leaky abstraction; it is programming by way of leaky abstractions.
Working through TeX packages is some of the most difficult programming that
I've done. It's unbelievably impressive what people have come up with (e.g.
floating point implemented via macro expansion in about 600 lines of TeX), but
it's also unbelievably frustrating to program in such an environment.

The LaTeX3 project is an attempt to rewrite LaTeX (still running on top of the
TeX core). Started in the early 1990s, it is still not done. I think it's just
that they are mired in a swamp of macros. They do have a relatively stable set
of macros written, with the catchy name expl3, that are intended for use when
writing LaTeX3. Here's a sample:

    
    
         \cs_gset_eq:cc
        { \cf@encoding \token_to_str:N  #1 } { ? \token_to_str:N #1 }
    

This is described in the documentation as being a big improvement over the old
macros and "far more readable and more likely to be correct first time". I
can't wait.
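
For readers who haven't met the "expand, then rescan" model mentioned above, a toy illustration may help. This is emphatically not TeX: it is a hypothetical one-parameter macro expander (`expandMacros`, with invented macros `emph` and `important`) that only shows how the output of one expansion becomes the input of the next pass.

```javascript
// Toy macro expander (an illustration, not TeX). A call looks like
// \name{arg}; in the macro body, #1 stands for the argument. After each
// pass the resulting text is scanned again, so a macro may expand into
// calls to other macros.
function expandMacros(text, macros, maxPasses = 50) {
  const call = /\\([A-Za-z]+)\{([^{}]*)\}/g;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = text.replace(call, (whole, name, arg) =>
      Object.prototype.hasOwnProperty.call(macros, name)
        ? macros[name].split("#1").join(arg) // substitute the parameter
        : whole                               // unknown macro: leave as is
    );
    if (next === text) return text; // nothing left to expand
    text = next;                    // rescan the newly formed text
  }
  throw new Error("Expansion did not terminate");
}

// Invented macros: \important expands into a call to \emph.
const macros = {
  emph: "<em>#1</em>",
  important: "\\emph{#1!}",
};

console.log(expandMacros("This is \\important{hard}.", macros));
```

The first pass turns `\important{hard}` into `\emph{hard!}`, and only the second pass produces the final markup. Even in this toy, what a macro "means" depends on what the text looks like after every earlier expansion, which hints at why large macro packages are so hard to reason about.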

I think LaTeX is absolutely without peer, but I wish improving its
programming method wasn't so daunting. I keep toying with starting a project
to do just that, but so many others have tried and failed. It's disheartening.

Links:

[TRAC]
[https://en.wikipedia.org/wiki/TRAC_(programming_language)](https://en.wikipedia.org/wiki/TRAC_\(programming_language\))

[GPM]
[http://comjnl.oxfordjournals.org/content/8/3/225.full.pdf](http://comjnl.oxfordjournals.org/content/8/3/225.full.pdf)

[m4] info pages available on Unix and Linux

[Tikz & PGF]
[https://www.ctan.org/pkg/pgf?lang=en](https://www.ctan.org/pkg/pgf?lang=en)

[Memoir]
[https://www.ctan.org/pkg/memoir?lang=en](https://www.ctan.org/pkg/memoir?lang=en)

[expl3] [https://www.tug.org/TUGboat/tb30-1/tb94wright-latex3.pdf](https://www.tug.org/TUGboat/tb30-1/tb94wright-latex3.pdf)

~~~
akavel
Did you maybe have a look at SILE [1][2]? (I'm not affiliated, but it got me
interested very much and I believe it may become a worthy successor to TeX in
future.)

[1]: [http://video.fosdem.org/2015/main_track-typesetting/introduc...](http://video.fosdem.org/2015/main_track-typesetting/introducing_sile__CAM_ONLY.mp4)

[2]:
[https://archive.fosdem.org/2015/schedule/event/introducing_s...](https://archive.fosdem.org/2015/schedule/event/introducing_sile/attachments/slides/772/export/events/attachments/introducing_sile/slides/772/sile.pdf)

~~~
todd8
Thank you. I'm definitely going to look into it.

