I do all of my academic writing in pandoc. As compared to LaTeX this means no boilerplate (yet you can still use full LaTeX syntax for equations and the like) and, if the publisher 'needs' a Word file, you are one click away from providing it. All with plain text files that you can put under version control, get meaningful diffs, etc. It's just great.
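The "one click" step can also be scripted. Here is a minimal sketch, assuming a source file called paper.md (a hypothetical name) and pandoc on the PATH. It relies on pandoc inferring the output format from the target file's extension; the helper names are made up for illustration.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, target: str) -> list[str]:
    """Build a pandoc call; the output format is inferred from the target extension."""
    return ["pandoc", "--standalone", source, "-o", target]


def build_all(source: str, extensions: tuple[str, ...] = ("docx", "html", "tex")) -> list[list[str]]:
    """One command per requested output, named after the source file."""
    return [
        pandoc_command(source, str(Path(source).with_suffix(f".{ext}")))
        for ext in extensions
    ]


if __name__ == "__main__":
    for cmd in build_all("paper.md"):
        print(" ".join(cmd))
        # Only run when pandoc is installed and the (hypothetical) source exists.
        if shutil.which("pandoc") and Path("paper.md").exists():
            subprocess.run(cmd, check=True)
```

PDF is left out of the default list on purpose, since that target also needs a LaTeX engine (or another PDF engine) installed.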

Sorry to be pedantic, but I didn't think that 'pandoc' was an actual document format; I thought it was purely a tool for converting between formats. Do you mean that you do your writing in a kind of 'pandoc flavoured' markdown? [0]

[0]: https://pandoc.org/MANUAL.html#pandocs-markdown

Well, between Pandoc's markdown flavor and how it has its own way of letting you insert latex code anywhere, you're not going to be able to process your document using anything that's not trying to be compatible with pandoc documents.

I’ve never understood the impetus for not using full LaTeX in an academic context, given that the boilerplate is so minimal and presumably one has built up a personal template over time.

For blog posts and notes I see the appeal, since the boilerplate can be a hindrance to spontaneous writing.

Latex can't produce web output, which is increasingly a target I want.

Also, Latex can't produce any output which is accessible to blind people (other than giving them the raw LaTeX). The PDFs Latex produces are probably the least accessible format available (much worse than a Word-produced PDF, or some HTML). This matters to me, and should matter more to other people (in my opinion).

BUT that is what makes Pandoc powerful. You can convert your LaTeX, or whatever else you write in, into many other formats. (Can we please add Racket's Scribble? It is by far the reason why Racket has the best documentation of any language. https://docs.racket-lang.org/scribble/)

It converts from Markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Vimwiki markup, OPML, Emacs Org-Mode, Emacs Muse, txt2tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to:

HTML formats

    XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides

Word processor formats

    Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML, Microsoft PowerPoint

Ebooks

    EPUB version 2 or 3, FictionBook2

Documentation formats

    DocBook version 4 or 5, TEI Simple, GNU TexInfo, Groff man, Groff ms, Haddock markup

Archival formats

    JATS

Page layout formats

    InDesign ICML

Outline formats

    OPML

TeX formats

    LaTeX, ConTeXt, LaTeX Beamer slides

PDF

    via pdflatex, xelatex, lualatex, pdfroff, wkhtmltopdf, prince, or weasyprint

Lightweight markup formats

    Markdown (including CommonMark and GitHub-flavored Markdown), reStructuredText, AsciiDoc, Emacs Org-Mode, Emacs Muse, Textile, txt2tags, MediaWiki markup, DokuWiki markup, TikiWiki markup, TWiki markup, Vimwiki markup, and ZimWiki markup

Custom formats

    custom writers can be written in Lua

Well, except LaTeX probably isn't the best base format to write in -- Pandoc's LaTeX parser isn't very good, it doesn't parse (from a quick check) any of the papers I've written. They've tried hard, but I think it's a losing battle, particularly once people start using a large range of packages.

That's not surprising -- it's basically impossible to "parse" LaTeX, as it's defined by execution.

iirc pandoc's markdown provides the set of functionality that one is capable of transforming back and forth. So as long as you stay within those formatting confines, you are set.

This works for everything except table notes a la ```threeparttable```

What about htlatex? It is quite powerful. In most of the cases, it produces nice HTML pages out of the box, with automatic rendering of figures and mathematical equations into PNG. It is part of most LaTeX distributions. On Linux, for example, just type

  $ htlatex mydoc.tex
instead of

  $ pdflatex mydoc.tex

For me, at least, htlatex never works quite right. There are a lot of edge cases where it's broken. If you want to keep the option of non-PDF output, starting in something like Pandoc Markdown is a better idea. And I say that as someone who does most of my documents in regular LaTeX.

>Also, Latex can't produce any output which is accessible to blind people

This sounds like it should definitely be a target of a grant. I guess most government organisations around the world are using Word et al, which isn't too bad these days accessibility wise (AFAIK).

Can you provide a small example of a LaTeX document that produces an inaccessible PDF?

If you grab any academic paper (particularly a two-column one), there is a good chance that getting the text out will be hard, and any part of the paper with maths or tables will be unusable. Sorry, I'm away from a computer right now, so I can't make a smaller example.

The paper "GADTs Meet Their Match" (first I had in my list) seems to work fine, but I don't know what it was generated with.

cairo 1.13.1 is listed as the generator.


The ACM template fails more! http://checkers.eiii.eu/en/pdfcheck/?url=https://www.acm.org..., and it's generated by pdfTex-1.40.15

I'll pick on one of my own random papers:


Try extracting "Theorem 2" on page 5, or any text really. I just get random noise through either a PDF reader, or something like pdf2ascii / ps2ascii.

We just made this with standard latex.

Any chance you could post the source code for this? It's using bitmaps for characters instead of proper fonts, which shouldn't happen nowadays. Maybe you should put "\usepackage{lmodern}" at the start? See for example https://tex.stackexchange.com/questions/1291/why-are-bitmap-...

I work with course materials made in Latex, and students sometimes need/want to copy and paste from them, so I try to avoid these kinds of problems.

That’s interesting. Did you \usepackage[T1]{fontenc}?

Thanks. The paper is from 2003, so I'm not sure.

This is just an example. From experience, most PDFs at conferences and journals, generated from LaTeX, are inaccessible to varying degrees.

Accessibility is a big current push from the TeX Users Group. The president, Boris Veytsman, has made moving it forward a big goal. I know that a lot of people are working on aspects of that, but the name I hear the most is Ross Moore, whom I have heard talk about making the output PDF/A-3a compliant. My understanding is that there is still a long way to go.

I hope so, because honestly, TeX-generated PDFs are the single biggest problem with being a blind researcher (I'm not blind, but I know a blind researcher).

>I’ve never understood the impetus for not using full LaTeX in an academic context, given that the boilerplate is so minimal and presumably one has built up a personal template over time.

I don't find the boilerplate minimal at all. Contrast the following:

     \begin{itemize}
     \item First
     \item Second
     \item Third
     \end{itemize}

with:

     - First
     - Second
     - Third

I won't even get into the hell that is tables.

I loved LaTeX until I discovered Org Mode. Pandoc also scratches the same itch.

I agree. If one is going to use LaTeX directly or indirectly via Pandoc, eventually one would have to build up a personal template to fine-tune the look and feel of the documents.

If one is going to write LaTeX code anyway, it seems easier and cleaner to use LaTeX all the way: move all the boilerplate, along with the personal template, to a file named, say, preamble.tex, and \input{preamble.tex} in the documents.

However, there are situations where Pandoc can be convenient. For example, I wanted a document[1] to be written primarily as README.md (CommonMark format), so that GitHub could render it as the project README. At the same time I wanted to render a PDF output from a customized form of the content. Pandoc is convenient for cases like this although it takes a bit of work to fine-tune the formatting and customize the content for each output format.

[1]: https://github.com/susam/gitpr

[2]: https://github.com/susam/gitpr/blob/master/Makefile

>If one is going to write LaTeX code anyway, it seems easier and cleaner to use LaTeX all the way, move all the boilerplate along with the personal template to say, a file named preamble.tex, and \input{preamble.tex} in the documents.

Not sure why you think it has to be that way. I author LaTeX documents using org mode. Org mode handles most of the boilerplate, and I can still put pretty much any custom LaTeX within the org document, wherever I want it (this includes \newcommand, etc). I lose nothing by going to org mode, and I gain much in terms of reduced boilerplate.

Yup. I’ve got a pandoc template for doing org-latex-pdf conversion, as well as some org templates for common documents that my clients need. Hack away on the document in org (which I’m probably going to be doing anyway, since the rest of my life is in there too), and then when it’s ready to hand off, turn it into a PDF using a shell script.

My absolute favourite moment with that flow was a client who wanted one as a docx instead of a PDF. Pandoc obliged and they commented that I must have spent a lot of time reformatting things for them :)

Why not use Org's built-in org->latex->pdf exporter? AFAIK Pandoc isn't compatible with many of the more interesting Org features, such as Babel.

That's a good question! The flow started out as markdown->latex->pdf via pandoc, and then when I got back into Org, it just slid right into that workflow to replace Markdown.

I'm curious now though... maybe I'm missing out!

It isn't clear to me whether you are saying that Pandoc is necessary or if you are saying that Pandoc is unnecessary and LaTeX alone is sufficient for all purposes.

I think your parent comment was saying that LaTeX alone is sufficient. You also seem to be saying that LaTeX alone is sufficient while using Org mode. Would you please clarify if I am interpreting your comment correctly or not?

>It isn't clear to me whether you are saying that Pandoc is necessary or if you are saying that Pandoc is unnecessary and LaTeX alone is sufficient for all purposes.

I'm not saying either. The parent said it's easier and cleaner to use LaTeX all the way. I was pointing out that it is easier to write in a format like Org mode and export to LaTeX (whether via Pandoc or Org mode's built-in exporter).

Of course LaTeX is "sufficient". It is also, IMO, painful.

Pretty sure they are saying pandoc is unnecessary.

I wrote my dissertation using Pandoc. It might seem that the LaTeX boilerplate is minimal, but Markdown is even more minimal, and it preempts the urge to fuss with your layout. Writing in Markdown means that you can wave your hand at the document and say, "It's a draft, I'll fix the formatting once I'm sure I even want this material." Afterwards, fixing the layout is really easy because you can drop raw LaTeX in wherever you need to, and you haven't wasted countless hours laying out a float you later end up cutting.

Having to use `\textbf{...}` is impetus enough for writing in Markdown instead.

LaTeX editors have simple keybindings for this, like Ctrl-B, or C-c C-f C-b in Emacs, which makes this kind of thing a non-issue for me...

Just reading C-c C-f C-b is an issue for me and I haven't even tried to remember and type it yet.

C-c C-f is a prefix key; you get bold with C-b, italic with C-i, and so on. At least that's how it was the last time I used AUCTeX :)

I agree. While pandoc is great, it's usually not 'one click' to any format, especially when you have HTML- or LaTeX-specific markup.

It's not for everyone, but Emacs + AUCTeX reduces the LaTeX boilerplate (at least the writing of it) so much that I don't really feel it's a hindrance.

I haven't used LaTeX for years. Is it still hell to make tables? And is it still very difficult to use templates to generate good-looking documents that don't look like an academic paper?

Yes! It's great to be able to put LaTeX-formatted equations directly into your pandoc-flavored markdown source file.

Incidentally, I really like the thoughtful syntax additions Pandoc makes over olde Markdown (eg., tables, definition lists, and span & div syntax as well). Such a great all-around doc tool.

What's your workflow for inserting and managing references?

Not OP, but I used `+citations` and `pandoc-citeproc` along with a bibtex file that I managed by hand for https://bernsteinbear.com/dat-paper/ (a small senior project paper). It worked pretty well for me.

Add bibliography: path/to/library.bib (and optionally specify a csl for bibliography formatting; I like econometrica) to the YAML front matter. Insert citations with @bibcitekey. Compile with --filter pandoc-citeproc (or --citeproc on pandoc 2.11 and later, where it is built in).
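To make that concrete, here is a rough sketch of what the source file and the command line might look like. The title, the CSL path and the file names paper.md / paper.pdf are placeholders I made up; the two flag variants reflect my understanding that pandoc 2.11 and later ship citeproc as --citeproc, while older releases used the external pandoc-citeproc filter.

```python
from __future__ import annotations

# Markdown source with YAML front matter; paths and title are placeholders.
SOURCE = """\
---
title: Example paper
bibliography: path/to/library.bib
csl: path/to/style.csl
---

Earlier work [@bibcitekey] established the baseline.

# References
"""


def citeproc_command(source: str, target: str, builtin: bool = True) -> list[str]:
    """Build the pandoc call that resolves citations against the bibliography."""
    # builtin=True assumes pandoc >= 2.11; otherwise fall back to the old filter.
    flag = ["--citeproc"] if builtin else ["--filter", "pandoc-citeproc"]
    return ["pandoc", source, *flag, "-o", target]


if __name__ == "__main__":
    print(SOURCE)
    print(" ".join(citeproc_command("paper.md", "paper.pdf")))
    print(" ".join(citeproc_command("paper.md", "paper.pdf", builtin=False)))
```

The formatted bibliography is appended at the end of the document, which is why the source ends with an empty References heading.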

It was a couple of years ago that I wrote my dissertation using Pandoc, so things may have changed. At the time, I started out using pandoc-citeproc with my BibTeX database, but eventually I needed more control over formatting and I switched to writing \cite everywhere. Even with hundreds of references, it only took an afternoon, so I'm happy I did it the way I did. My approach with Pandoc is to use it until you have to invest LaTeX-level effort into making it do what you want. At that point, swapping in LaTeX is rarely painful. Often you can get away with editing Pandoc's generated LaTeX and pasting it back in to your source.

You can control the formatting pandoc-citeproc (which is now built in to pandoc) produces with a CSL file. That's great if your institution provides one, otherwise... you'll have to learn CSL ;-/

I use pandoc-crossref.

> if the publisher 'needs' a Word file, you are one click away from providing it

Once the work has moved into a Word file, isn't that where it stays? Editors and publishers often make heavy use of features like track changes and notes. Doesn't pandoc lose that information?

It does. I think the assumption here is that the author is the only contributor to the document. Exporting into a Word doc would serve the same function as exporting to a .pdf, others could read it and even mark it up, but the author would have to make the noted changes in their original plain text document themselves.

pandoc has a --track-changes option, so you can convert a docx file with its proposed changes back to, say, markdown.
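For anyone who wants to try it, a small sketch of how that round trip might be wired up. The file names are hypothetical. As far as I know the option takes accept, reject or all, and with all the insertions, deletions and comments come through as spans in the Markdown output rather than being silently applied.

```python
from __future__ import annotations

TRACK_CHANGES_MODES = ("accept", "reject", "all")


def docx_to_markdown_command(docx_path: str, markdown_path: str, mode: str = "all") -> list[str]:
    """Build a pandoc call that reads a docx file and keeps or resolves tracked changes."""
    if mode not in TRACK_CHANGES_MODES:
        raise ValueError(f"mode must be one of {TRACK_CHANGES_MODES}, got {mode!r}")
    return [
        "pandoc",
        f"--track-changes={mode}",
        docx_path,
        "-t", "markdown",
        "-o", markdown_path,
    ]


if __name__ == "__main__":
    # reviewed.docx and reviewed.md are placeholder names.
    print(" ".join(docx_to_markdown_command("reviewed.docx", "reviewed.md")))
```

Whether the result is usable will depend on how heavily the editor marked up the file, as the replies below suggest.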

I tried and it didn't work for me. Pandoc's conversion functionality is good but unfortunately also fails very often, at least in my experience. I suppose with custom templates and a lot of trickery I could get it working for the kind of papers I write, but I've found it easier to convert LaTeX to Word manually when needed - which is a pain in the ass, too, of course.

In my experience it works so long as you keep to very vanilla LaTeX code. Pandoc's support for LaTeX packages tends to be very patchy.

I have to put in a word for Racket's Scribble. Programmatically creating documents is powerful, and this system makes it simple. You can also basically use it as a "markup-less" system.

Scribble Code Example:

  #lang scribble/base

  @title{On the Cookie-Eating Habits of Mice}

  If you give a mouse a cookie, he's going to ask for a glass of milk.

  @section{The Consequences of Milk}

  That ``squeak'' was the mouse asking for milk. Let's suppose that you give him some in a big glass.

  He's a small mouse. The glass is too big---way too big. So, he'll probably ask you for a straw. You might as well give it to him.

  @section{Not the Last Straw}

  For now, to handle the milk moustache, it's enough to give him a napkin. But it doesn't end there... oh, no.

Scribble -

Scribble is a collection of tools for creating prose documents—papers, books, library documentation, etc.—in HTML or PDF (via Latex) form. More generally, Scribble helps you write programs that are rich in textual content, whether the content is prose to be typeset or any other form of text to be generated programmatically. - https://docs.racket-lang.org/scribble/

Some languages based on Scribble

Skribilo -

Skribilo is a free document production tool that takes a structured document representation as its input and renders that document in a variety of output formats: HTML and Info for on-line browsing, and Lout and LaTeX for high-quality hard copies.

The input document can use Skribilo's markup language to provide information about the document's structure, which is similar to HTML or LaTeX and does not require expertise. Alternatively, it can use a simpler, “markup-less” format that borrows from Emacs' outline mode and from other conventions used in emails, Usenet and text. https://www.nongnu.org/skribilo/

Pollen -

Pollen is a publishing system built on top of Scribble and Racket. So far, I’ve optimized Pollen for web-based books, because that’s mainly what I use it for. But it can be used for small projects too, and non-webby things like PDF.

As a publishing system, Pollen includes:

    A programming language. The Pollen language is a variant of Scribble, with specific dialects tailored to different kinds of source files. You don’t need to use the programming features to do useful work, but they’re available when you need them.

    A set of tools & libraries. Pollen can produce output in any format, but it’s especially useful for markup-style formats like XML and HTML.

    A development environment. Pollen works with the DrRacket IDE. It also includes a project web server so you can dynamically preview and revise your publication. http://docs.racket-lang.org/pollen/Backstory.html

They are domain-specific languages that excel at outputting awesome HTML and PDF. They aren't really markup; they are a macro system built on top of a full Lisp (Racket). It is easier and much more powerful than anything I have seen with Pandoc and LaTeX (I still use LaTeX for specific targets, but not for general papers anymore).

Racket has the best documentation, period, and it is because the documentation is written in Scribble.
