
Pandoc - otp124
https://pandoc.org/
======
Schiphol
I do all of my academic writing in pandoc. As compared to LaTeX this means no
boilerplate (yet you can still use full LaTeX syntax for equations and the
like) and, if the publisher 'needs' a Word file, you are one click away from
providing it. All with plain text files that you can put under version
control, get meaningful diffs, etc. It's just great.

~~~
smohare
I’ve never understood the impetus for not using full LaTeX in an academic
context, given that the boilerplate is so minimal and presumably one has
built up a personal template over time.

For blog posts and notes I see the appeal, since the boilerplate can be a
hindrance to spontaneous writing.

~~~
CJefferson
Latex can't produce web output, which is increasingly a target I want.

Also, Latex can't produce any output which is accessible to blind people
(other than giving them the raw LaTeX). The PDFs latex produces are probably
the least accessible format available (much worse than a Word-produced PDF, or
some html). This matters to me, and should matter more to other people (in my
opinion).

~~~
voltagex_
>Also, Latex can't produce any output which is accessible to blind people

This sounds like it should definitely be a target of a grant. I guess most
government organisations around the world are using Word et al, which isn't
too bad these days accessibility wise (AFAIK).

Can you provide a small example of a LaTeX document that produces an
inaccessible PDF?

~~~
CJefferson
If you grab any academic paper (particularly two-column ones), there is a good
chance getting the text out will be hard, and any part of the paper with maths
or tables will be unusable. Sorry, I'm away from a computer right now, so I
can't make a smaller example.

~~~
masklinn
The paper "GADTs Meet Their Match" (first I had in my list) seems to work
fine, but I don't know what it was generated with.

~~~
CJefferson
I'll pick on one of my own random papers:

[https://www.cs.york.ac.uk/aig/projects/implied/docs/cp03.pdf](https://www.cs.york.ac.uk/aig/projects/implied/docs/cp03.pdf)

Try extracting "Theorem 2" on page 5, or any text really. I just get random
noise through either a PDF reader, or something like pdf2ascii / ps2ascii.

We just made this with standard latex.

~~~
Gorgor
That’s interesting. Did you \usepackage[T1]{fontenc}?

~~~
CJefferson
That paper is from 2003, so I'm not sure.

This is just an example. From experience, most PDFs at conferences and
journals, generated from LaTeX, are inaccessible to varying degrees.

------
mb2100
Occasional pandoc contributor here, AMA :-)

Just a few links:

- Where everything is documented:
[http://pandoc.org/MANUAL.html](http://pandoc.org/MANUAL.html)

- If you have questions or suggestions:
[https://groups.google.com/forum/#!forum/pandoc-discuss](https://groups.google.com/forum/#!forum/pandoc-discuss)

- Contributing to pandoc is also a great way to get your feet wet with
Haskell. In my experience, it's a very supportive community. See
[http://pandoc.org/CONTRIBUTING.html](http://pandoc.org/CONTRIBUTING.html) and,
for good first issues,
[https://github.com/jgm/pandoc/issues?q=is%3Aopen+is%3Aissue+...](https://github.com/jgm/pandoc/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)

Finally, a great feature that hasn't been mentioned here is pandoc filters.
Basically, pandoc provides a way for scripts (in any programming language) to
hook into the transformation pipeline and modify the document AST (similar to
the HTML DOM) in-between the reading and writing steps. See
[http://pandoc.org/filters.html](http://pandoc.org/filters.html)
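Here is a minimal sketch of such a filter, assuming pandoc's JSON filter protocol. When you pass `--filter`, pandoc writes the document AST as JSON to the script's stdin and passes the output format name as the first argument. It then reads the modified JSON back from stdout. The node shapes used here assume the pandoc 2.x JSON layout and may differ in other versions: a top-level `blocks` list holding nodes like `{"t": "Header", "c": [level, attr, inlines]}`. The file name `demote.py` and the header-demoting behavior are made up for illustration. In practice, helper libraries such as pandocfilters or panflute, or pandoc's built-in Lua filters, save you from walking the raw JSON by hand.

```python
#!/usr/bin/env python3
"""Hypothetical pandoc JSON filter (demote.py): demote every header one level.

Illustrative only; assumes the pandoc 2.x JSON AST layout.
"""
import json
import sys


def walk(node):
    """Recursively visit the JSON AST, demoting Header nodes in place."""
    if isinstance(node, dict):
        if node.get("t") == "Header":
            level, attr, inlines = node["c"]
            # Cap at 6, the deepest heading level HTML supports.
            node["c"] = [min(level + 1, 6), attr, inlines]
        for value in node.values():
            walk(value)
    elif isinstance(node, list):
        for item in node:
            walk(item)
    return node


def main():
    doc = json.load(sys.stdin)   # AST produced by pandoc's reader
    walk(doc["blocks"])          # only touch the document body, not metadata
    json.dump(doc, sys.stdout)   # hand the modified AST back to pandoc's writer


# pandoc invokes filters with the target format (e.g. "html") as argv[1];
# skip main() when the module is simply imported or run without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

You might run it with something like `pandoc in.md --filter ./demote.py -o out.html`. Because the filter only sees the AST, the same script works whichever input and output formats pandoc is converting between.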

~~~
paule89
Every time I see a project using Google Groups, I think it's already dead.
Luckily, yours seems to be used quite often. At least you can search it even
years later, unlike an IRC or Slack channel.

~~~
neves
IRC channels and mailing lists are excellent for informal questions about a
project. You can search for guidance, see if a feature would be well received,
and get a green light before starting to implement something.

The other day I thought about contributing to Yarn, the JavaScript package
manager, but the only way I found to communicate with the developers was
through GitHub issues. Since I didn't know if the feature I wanted would be well
received, I just quit.

~~~
dean177
That seems a little bit extreme, why not just open the issue?

~~~
y4mi
That issue becomes part of public history.

I'm not the parent, but that is the main reason I try to abstain from posting
on public forums unless it's under a pseudonym, which my GitHub account isn't.

I'm not trying to say that my anonymity is guaranteed with IRC; it's just
unlikely that future employers and the like will link it to me.

~~~
cerberusss
Have a look at Firefox Multi-Account Containers. You can open a tab that has a
different color, and it uses a different cookie database. Very useful, because
you can create an extra Github account and quickly switch between those
accounts.

[https://addons.mozilla.org/en-US/firefox/addon/multi-account...](https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/)

------
koolba
My favorite pandoc hack is using it to convert Word docs into Markdown, which
can then be diffed like source code. Works great for legal redlining.
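Here is a sketch of the redlining idea: convert both versions of a Word file to Markdown, then diff the text like any other source file. It makes a few assumptions. The pandoc CLI must be on `PATH`. `--wrap=none` keeps each paragraph on a single line, so reflowed text doesn't show up as spurious changes. Python's `difflib` stands in for whatever diff tool you prefer; `diff -u` or `git diff --no-index` work just as well on the resulting files. The function names are illustrative.

```python
import difflib
import subprocess


def docx_to_markdown(path):
    """Convert a .docx file to Markdown text using the pandoc CLI (assumed on PATH)."""
    # --wrap=none keeps each paragraph on one line, so a reworded sentence
    # shows up as one changed line instead of a reflowed block.
    result = subprocess.run(
        ["pandoc", "-f", "docx", "-t", "markdown", "--wrap=none", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def redline(old_md, new_md, old_name="old", new_name="new"):
    """Return a unified diff between two Markdown strings ("" if identical)."""
    diff = difflib.unified_diff(
        old_md.splitlines(keepends=True),
        new_md.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)
```

For example, with two hypothetical drafts, `print(redline(docx_to_markdown("contract_v1.docx"), docx_to_markdown("contract_v2.docx")))` prints the changed paragraphs with `-`/`+` markers.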

~~~
tomrod
> legal redlining

Is this underlining, and not redlining as defined in financial services?
(redlining: differential pricing based on demographic makeup of a zip code or
neighborhood)

~~~
URSpider94
Attorneys and business folks use it to mean “marked up” - a redlined contract
has additions in red, and removals in red with red lines through them.

------
err4nt
I use Pandoc to convert directories of Markdown files into static HTML
websites.

Here's the build command for responsive.style[1]:

    
    
        pandoc $file -f markdown -t html5 -H templates/header-prod.html -B templates/nav.html -A templates/footer-prod.html -o (echo "../$file" | sed '$s/\.md$/.html/') -s  --data-dir=./ --highlight-style breezedark --variable=file:(echo "$file" | sed '$s/\.md$/.html/')
    

Works beautifully!

1:
[https://github.com/tomhodgins/responsive.style/blob/master/s...](https://github.com/tomhodgins/responsive.style/blob/master/src/build-prod.sh)

~~~
ArlenBales
I know Blot.im as one static site generator that uses it:
[https://github.com/davidmerfield/Blot/blob/master/app/models...](https://github.com/davidmerfield/Blot/blob/master/app/models/entry/build/file/markdown/convert.js)

~~~
privong
Hakyll is another:
[https://jaspervdj.be/hakyll/](https://jaspervdj.be/hakyll/)

------
ggambetta
Another happy Pandoc user here :)

I built a pipeline to convert a Markdown file to publishing-ready files for
ebooks, Kindle and paperback for my novel; the whole thing is described here:
[http://www.gabrielgambetta.com/tgl_open_source.html](http://www.gabrielgambetta.com/tgl_open_source.html)

My website itself is static, generated from a bunch of Markdown files, some
HTML templates, and a bit of postprocessing. But most of the work is done by
Pandoc.

~~~
odiroot
Hey. Can you give us some more context on your novel writing in Markdown? I'd
be interested in your process.

~~~
ggambetta
Sure. The technical side of things is explained here:
[http://www.gabrielgambetta.com/tgl_open_source.html](http://www.gabrielgambetta.com/tgl_open_source.html)
(same link as above). If you're more interested in the creative aspect, I
wrote a bit here:
[http://www.gabrielgambetta.com/tgl_swiss_trains.html](http://www.gabrielgambetta.com/tgl_swiss_trains.html).
If you're interested in anything not covered there, feel free to ask, I'll be
happy to share :)

~~~
bravura
"Soon the structure underlying The Da Vinci Code and Angels and Demons and The
Lost Symbol was laying bare before my eyes. I could see why the stories
worked.

I had reverse-engineered Dan Brown."

Could you talk a little more in depth about what Dan Brown's pattern/structure
is?

~~~
ggambetta
I get that request a lot, so I'll have to write something about it :)

All I can offer right now is my raw notes, which are in Spanish. This is a
structural analysis of Angels and Demons, The Da Vinci Code, and The Lost
Symbol: [https://imgur.com/bX6ByJA](https://imgur.com/bX6ByJA)

This is a one-page treatment of the three books, with the "blanks" filled in
appropriately for each: [https://imgur.com/LlDVUKn](https://imgur.com/LlDVUKn)

I doubt any of this is groundbreaking. Story structure is a widely studied
topic (and one that I find fascinating). But it seems like Dan Brown uses a
very well defined, customised version of this, that makes for engaging, fast-
paced books.

I sort of proved (for myself, at least) that this works, by writing a novel
whose structure was originally based on this pattern (although it later
diverged a bit), and which causes the expected effect - a couple of readers
have read it in a single sitting :)

~~~
bambax
Very interesting! Could you maybe provide those links in text form, so that we
may run them through Google Translate? ;-)

~~~
ggambetta
English translation done!

------
myself248
The one thing it can't do is give HN posts descriptive titles.

~~~
wadkar
Thanks, this made me chuckle. It's comments like these that make HN a bit more
colorful :-)

------
jaggederest
Also, an interesting bit of trivia: the maintainer, John MacFarlane, is a
professor of philosophy at UC Berkeley whose work centers on the philosophy of logic.

------
tambourine_man
One nice trick that I use all the time is to convert html to md and back again
in order to _clean_ it.

Anyway, pandoc is great.

~~~
gitgud
Would that be a good way to sanitise user input? Like removing script tags
etc...

~~~
tambourine_man
It’s usually not a good idea to “get creative” when it comes to security

------
mwcampbell
It appears that Pandoc generates PDF documents via LaTeX. One problem with
this is that, as far as I can tell, LaTeX can't generate tagged PDFs. This is
an accessibility problem. Granted, for documents that are heavy on math and/or
graphics, the point is probably moot. But many technical documents that are
distributed as PDFs would benefit from being tagged.

Luckily, LibreOffice can produce tagged PDFs. And unoconv is a convenient
utility for doing this from the command line. So you can use pandoc to convert
to a format that LibreOffice can consume, then issue a command like this:

    
    
        unoconv -f pdf -e UseTaggedPDF=true mydoc.odt
    

I've tried it, and it works.

~~~
mkesper
Pandoc can convert into ConTeXt, which can produce PDF/A (tagging included)
easily. Why this can't be done in one command, as with xelatex, wkhtmltopdf
and the other supported engines, I don't know. Many programs can be used to
create PDFs, but the quality of the output isn't always the same.

~~~
mb2100
> Why this can't be done in one command

ConTeXt is supported as well: `pandoc input.md -t context -o output.pdf`

------
eevilspock
Pandoc's creator, John MacFarlane, is also the lead guy on CommonMark[1].

There are a small number of corner cases that need to be spec'd out before
CommonMark can declare a v1.0 release[2]. If you have the skills for this kind
of thing, please weigh in!

[1] [https://commonmark.org](https://commonmark.org)

[2] [https://talk.commonmark.org/t/issues-we-must-resolve-before-...](https://talk.commonmark.org/t/issues-we-must-resolve-before-1-0-release-8-remaining/1287?u=vas)

~~~
pmlnr
Please, please include definition lists. They are useful. They were present on
the first webpage[1].

[1]:
[http://info.cern.ch/hypertext/WWW/TheProject.html](http://info.cern.ch/hypertext/WWW/TheProject.html)

------
ashton314
I wrote a little utility that uses Pandoc to read Markdown files like `man`
pages in the terminal:

[https://github.com/ashton314/marked-man](https://github.com/ashton314/marked-man)

It's just a one-liner: `pandoc -s -t man "$1" | groff -T utf8 -man | $PAGER`

(That was basically stolen from an answer to one of my questions on Stack
Overflow—thanks to those who answered! :)

~~~
mixedmath
In a similar vein, I use pandoc to convert Markdown pages to man pages, both to
write new man pages and to add notes to existing ones. I think it's definitely
easier than writing groff files by hand.

~~~
beefhash
I find it easier to write man pages directly. Admittedly, I write mdoc (not
the ancient "man" macros), which has been around only since the 80s. It's
easier for me to remember the semantics ("Is this a flag/command/function?")
than the correct traditional markup ("Should this be bold/italic/nothing?").

------
flukus
Pandoc (or LaTeX) + make + inotifywait work really well together for
WYSIWYG-like editing too:

    
    
      watch: $(ALL)
        while true; do \
        clear; \
        make $(WATCH); \
          inotifywait -qr -e close_write .; \
        done
    

"make watch WATCH=build" will now compile documents on every save. Works well
for single documents, collections of documents or entire websites.

~~~
tetov
I've been using a JS script[1] to watch directories, but this seems neater.
Would you mind sharing the whole makefile?

[1]
[https://gist.github.com/timpulver/0d01285952b97deb70df6104cc...](https://gist.github.com/timpulver/0d01285952b97deb70df6104cc316a4b)

------
ryanianian
I sometimes use pandoc to clean up my markdown-formatted documents, especially
given its abilities to "wrap" text and add indentation-style whitespace that
makes plain-text documents look nearly suitable for publishing as-is (almost
kinda like RFC docs but without header/footer cruft).

There are a few things (in latest version, 2.2.3.2) that don't really survive
round-trip from markdown back to markdown:

- Reference-style links (e.g. `[foo][f]`) are converted to inline links,
e.g. `[foo](http://...)`.

- Setext vs. hashmark headers: `foo\n=====` will get converted to `# foo`.

- Markdown allows forced line breaks (`<br>`) to be added with two trailing
spaces at the end of a line. Pandoc instead writes these as a trailing `\`
at the end of the line.

These are only occasional nuisances, but overall the documents (at least in my
experience) are not butchered.

I also occasionally go from Markdown to docx for the purposes of uploading to
Google Docs and copy/pasting large sections into other docs. This is the only
Markdown-to-Google-Docs workflow I've found that preserves formatting. It's
never really butchered anything, except that a few times the syntax
highlighting for code blocks got confused and keywords got the wrong colors.

~~~
confounded
IIRC, there are CLI flags for your first two points. I think the latter is
something like `--atx-headers`.

You can choose whether reference links go at the end of the paragraph or the
document.

------
CodexArcanum
I "love" how many comments are one person praising pandoc for helping them in
some workflow, and then commenters ripping into them for not using some other
tool. I wonder if there's a corollary to some internet rule that the more
generally useful a tool is, the more detractors will push for other tools to
be used? It would help explain why programming language discussions get so
contentious.

Pandoc is seriously a great tool! I love the way it's designed and have found
it useful off and on over the years. Truly marvelous for making information
available in any needed format.

------
phalangion
I love pandoc. I've been using it intermittently for years to turn my Markdown
and org-mode documents into other formats. Just wish it would take Asciidoc as
an input format.

~~~
copperx
Asciidoctor and the other asciidoc tools do the job that I use pandoc for:
tables, custom numbering, all the other markdown extensions that one needs to
be able to create a highly structured document. With Asciidoc, you don't need
md extensions. It's all in there.

~~~
phalangion
Ya, I've been using Asciidoctor and Asciidoctor-pdf for a long time. Those are
some awesome tools, too.

------
jph
Pandoc is great software for converting among file formats, such as text,
markdown, HTML, PDF, etc.

Example:

    
    
        pandoc in.md -o out.html -V pagetitle="My Title" --to=html5 --template="my.html" --css "my.css"
    

The example converts a markdown file to HTML, using a given title, a template
file, and a stylesheet file.

The pipeline is also well implemented with Haskell, which is good for writing
your own fast functional transformations.

------
caconym_
I write fiction as a hobby. I do it in Markdown and use Pandoc to turn it into
EPUB files with custom CSS. It works great. Thanks, Pandoc!

~~~
clebio
Is the CSS derived from the markdown, or you supplement MD to HTML with custom
CSS? Definitely curious to know!

~~~
flocial
I used a similar workflow. The CSS is for the EPUB and maps to the supported
HTML elements. But if you get too fancy, cross-device support can get
hairy.

See:
[https://github.com/FriendsOfEpub/Blitz](https://github.com/FriendsOfEpub/Blitz)

------
patricklouys
I used pandoc to format my book [0]. Not everything worked perfectly, but I'm
pretty happy with how everything turned out (especially the print version).

It was a little work to set up the workflow with scripts etc, but being able
to write the book in markdown and still having full control over the design
was definitely worth it.

[0] sample here: [https://patricklouys.com/professional-php-sample.pdf](https://patricklouys.com/professional-php-sample.pdf)

------
davnn
You can use the Haskell-based static site generator Hakyll with Pandoc to
create the best blogging experience, IMHO.

An example of how easy this is and the styles I use for my personal blog:
[https://curious.observer](https://curious.observer)
[https://github.com/davnn/curiousobserver](https://github.com/davnn/curiousobserver)

------
basementcat
Maybe I used an older version, but my attempts to use pandoc usually resulted
in the document being butchered because the internal representation was not as
expressive as the source or target formats.

------
adzm
Pandoc is also a great educational Haskell project for those looking into how
it all works.

------
scentoni
If you don't want to install Haskell and other dependencies, several folks
have developed Docker images for using pandoc:

[https://users.soe.ucsc.edu/~ivo/_posts/2015-03-12-repeatable...](https://users.soe.ucsc.edu/~ivo/_posts/2015-03-12-repeatable-paper-generation-with-docker-and-pandoc.html)

[http://gbraad.nl/blog/document-generation-using-markdown-and...](http://gbraad.nl/blog/document-generation-using-markdown-and-pandoc.html)

[https://github.com/jagregory/pandoc-docker](https://github.com/jagregory/pandoc-docker)

~~~
uvtc
You'd only need to install Haskell if you wanted to _build_ Pandoc. Pandoc the
executable is a binary. I install it on Debian via: `apt install pandoc`.

~~~
mb2100
The version in the default repo is usually quite old, though. You can grab a
binary from
[https://github.com/jgm/pandoc/releases/latest](https://github.com/jgm/pandoc/releases/latest)

~~~
mkesper
Bonus hint: If you somehow aren't able to upgrade the binary (enterprise...),
using an up to date template helps a lot (at least for LaTeX).

------
subinsebastien
Yet another pandoc user here. I built a blog engine using Pandoc as the core.
Code available here:
[https://github.com/subinsebastien/kyll](https://github.com/subinsebastien/kyll)
And the website built using the blog engine is available here :
[http://xtel.in/](http://xtel.in/)

------
rotorblade
I tried to use pandoc a while ago to convert the LaTeX sources of arxiv.org
documents to EPUB, since those are often much more comfortable to read on
small devices than PDFs.

The problem I had was that the LaTeX math was turned into images, and changing
the font size in the reader did not change the size of the images, leaving the
text readable but the maths barely readable.

This is something I would love to see happen though.

~~~
fntlnz
Take a look at arxiv vanity [https://www.arxiv-vanity.com/](https://www.arxiv-vanity.com/)

------
disqard
I like pandoc. I've been using Typora [1] for all of my writing, and it's
decent, but a little slow.

What editor do HN folks use? I wonder if there's a leaner editor out there
with an equally nice distraction-free editing interface. Thanks in advance!

[1] [https://typora.io/](https://typora.io/)

~~~
heliostatic
I've been really enjoying the Caret beta --
[https://caret.io/](https://caret.io/)

Not free, but a real pleasure to use.

~~~
applecrazy
I tried Caret and loved it but had to uninstall because of the huge font size
on equation renders in a math-heavy document. Is there a way to fix that? I
tried to look but they don't have much documentation yet.

------
hatmatrix
Even though org-mode has its own exporters, Pandoc is great for the extra
bibtex integration.

------
voltagex_
The only problem I have with pandoc is I have to lug the entire GHC around
with it.

~~~
nh2
That is not the case.

[https://pandoc.org/installing.html](https://pandoc.org/installing.html)

> We provide a binary package for amd64 architecture on the download page.
> This provides both pandoc and pandoc-citeproc. The executables are
> statically linked and have no dynamic dependencies or dependencies on
> external data files.

~~~
loudmax
There's an unofficial Arch package for it:
[https://aur.archlinux.org/packages/pandoc-bin/](https://aur.archlinux.org/packages/pandoc-bin/)

I wish I'd known about this sooner. I don't spend much time with text
documents outside the web, but when I do, pandoc handles the disparate formats
admirably. The only inconvenience is when I update my system, there's
guaranteed to be a huge pile of Haskell libraries to download.

------
shakna
What don't I use it for?

+ Static websites from any input to HTML

+ Markdown & TeX & References to PDF for academia

+ Generating manpages for new tools

+ Generating ebooks

... Let's just say I get a bit lost when it isn't available.

~~~
geraldcombs
> + Generating manpages for new tools

Do any of your tools use long options (prefixed with a double dash)? If so,
make sure you disable the "smart" extension, otherwise you might end up with
en dashes.

~~~
copperx
OP said he doesn't use pandoc for such things. It's a list of things that have
better tooling.

~~~
shakna
I'm sorry if it came across that way. The list is what I use pandoc for. I use
it for a lot, so much that I think I use it for every project.

~~~
kaushalmodi
I found it confusing too. You were supposed to use interrobangs (‽) instead of
"?" :)

~~~
shakna
Would that not only be applicable if I were making an exclamation? I was
making a statement instead.

~~~
kaushalmodi
It sounded like a rhetorical question, and given that you use it for a lot of
things, I sensed a bit of excitement there too, hence the exclamation, like
"Isn't that awesome‽".

------
bovermyer
I love pandoc, but I'm very surprised that such an established tool has (at
time of writing) 865 points and is #1 on HN.

I guess it's not as well-known as I thought.

------
epynonymous
I have been using catdoc and pdftotext to convert .doc and PDF files,
respectively. Nice to see that there's an alternative that also includes a
library; I'll be checking this out.

I have a couple of questions. First, it seems old-school .doc files are not
supported, only .docx. Unfortunately, I still get a lot of documents in .doc
format, which seems to be Microsoft's proprietary format (.docx seems to be
more open).

My second question is whether there's a filter for golang. Most of my
development is in golang, so I either need to call your CLI as a forked
process or, better, have a native library. I have never worked with Haskell,
so I'm not sure if I can import a Haskell library from golang directly. I
imagine there'd need to be a golang wrapper around the CLI.

~~~
duckerude
You could use LibreOffice's command line interface to convert from .doc to a
more manageable format.

    
    
      lowriter --convert-to odt some-document.doc
    

odt is not the only supported target, but doc --libreoffice--> odt --pandoc-->
plain seems to give better results than e.g. doc --libreoffice--> txt or doc
--libreoffice--> docx --pandoc--> plain.

~~~
epynonymous
If that's the case, I'll stick with catdoc. My use case is building a full-text
search index of the content, and I'd rather not trade catdoc for the LibreOffice
CLI. Thanks, though.

------
GlenTheMachine
As a guy attempting to transition from macOS to Linux:

Pages to anything else, please.

~~~
jagger27
A quick Google suggests that the most straightforward way is to run an
Automator script to convert everything to PDF using Pages itself.

~~~
GlenTheMachine
Yeah. But then you can't edit it. Converting it to opendoc or something would
be more useful.

------
kccqzy
Pandoc is great! I use pandoc for all kinds of formal writing (conversion to
PDF via LaTeX). We also run pandoc in production to produce customer-facing
PDFs.

------
bkyan
Is there an equivalent of this for spreadsheets?

~~~
scentoni
The closest thing I'm aware of is the spreadsheet functionality in Emacs
org-mode:

[https://orgmode.org/worg/org-tutorials/org-spreadsheet-intro...](https://orgmode.org/worg/org-tutorials/org-spreadsheet-intro.html)

[https://orgmode.org/manual/The-spreadsheet.html](https://orgmode.org/manual/The-spreadsheet.html)

~~~
bkyan
Sorry, I'm a little confused... How would I use org mode to convert between
different spreadsheet formats?

------
rllin
Frustratingly slow for Word docs. antiword is better for those of you who want
to convert Word docs en masse.

------
nambit
I have used pandoc with uikit to autoconvert my markdown pages to html. Works
like a charm.

------
rydel
Really one of the best tools! Simple to use and gets things done.

------
fastier
Where is .djvu?

~~~
gwern
Do you need an option for that? You can convert to PDF and then `pdf2djvu` it.

------
boonasty69
updated and secure.

------
another-cuppa
I write any document that doesn't need extensive custom typesetting (which is
90% of stuff) in org-mode and then use pandoc to convert it to "normal people"
formats at the end. I have made a basic template for MS Word that looks pretty
good.

------
Numberwang
I wish they’d fix the md to adoc table conversion issues. Apart from that I
love it.

~~~
kevin_thibedeau
The core problem with Pandoc is that the internal document representation is
limited to its particular flavor of Markdown. Any feature PD-MD doesn't
support is ignored or loses semantics. You can see this in the poor reST
support (try converting captioned figures). It would be useful to rearchitect
it with DocBook-style semantics internally, since they are more comprehensive
than Markdown's.

------
euske
I know it's well intended and somewhat successful, but I can't help but
think of xkcd.com/927

Sorry, I couldn't resist.

~~~
Lio
Although it does offer some useful extensions for Markdown, Pandoc doesn't
attempt to establish new standards.

It's a conversion tool for existing formats.

