Pandoc (pandoc.org)
824 points by swatson741 9 months ago | 292 comments



This is one of the most useful programs that I use.

I use it for turning .md files into .html or .pdf.

I use it for creating slides.

I even use it for fixing the hard-wrapped text I write in vim before sending emails. When I write in vim, I prefer the text to be hard-wrapped, but for emails, I like it better when the text is not wrapped. I recommend arp242's essay explaining the problem with hard-wrapping [1], but basically the way I work around this problem is with a local script which uses pandoc at some point [2].

Overall, pandoc is really good.

[1]: https://www.arp242.net/email-wrapping.html

[2]: https://github.com/kugurerdem/dotfiles/blob/2d68357273e1bc30...
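The unwrapping step described above can be sketched in a few lines. This is not the linked script (its contents are not shown here) and it does not call pandoc; it is a plain-Python stand-in that assumes paragraphs are separated by blank lines and ignores lists, quotes, and code blocks. The function name `unwrap_paragraphs` is made up for illustration. Pandoc itself has a `--wrap=none` option that turns off its own line wrapping on output.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one long line.

    Assumes paragraphs are separated by one or more blank lines. Lists,
    block quotes, and code blocks are not treated specially.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = [
        " ".join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    ]
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    draft = "When I write in vim,\nI hard-wrap my text.\n\nFor email I prefer\nlong lines.\n"
    print(unwrap_paragraphs(draft))
```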


I'll probably lose some nerd cred: I do most of my writing in MS word, I find it easier to cut and paste and add footnotes and section headers. And then I use Pandoc to convert the docx into the format I want, usually HTML*. I used to do markdown in vim but I found that for most of what I do I prefer word. I do write code in vim...

* I use this css file when converting: https://gist.github.com/killercup/5917178
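A docx-to-HTML conversion like the one described might be scripted roughly as follows. This is a sketch under assumptions: the file names are placeholders, the helper names `docx_to_html_command` and `convert` are invented here, and the exact options the commenter uses are not stated. The pandoc options shown (`--standalone`, `--css`, `--output`) are real ones.

```python
import shutil
import subprocess
from pathlib import Path


def docx_to_html_command(source: Path, target: Path, stylesheet: str) -> list[str]:
    """Build the pandoc argument list for a standalone, styled HTML page."""
    return [
        "pandoc",
        str(source),
        "--standalone",              # emit a complete HTML document with a <head>
        f"--css={stylesheet}",       # link the stylesheet from that <head>
        f"--output={target}",        # output format is inferred from the extension
    ]


def convert(source: Path, target: Path, stylesheet: str) -> None:
    """Run the conversion; raises if pandoc is missing or exits non-zero."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(docx_to_html_command(source, target, stylesheet), check=True)
```

Calling `convert(Path("draft.docx"), Path("draft.html"), "style.css")` would write `draft.html` next to the source, assuming pandoc is installed.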


I use Word for all my writing too, so wanted to defend you. People (tech people at least) have lost sight of the fact that writing should happen in "word processing software", and Word is the best-in-class. To people who say "Word sux", I say "That just means you don't know how to use Word properly". Writing any markup or markdown syntax in an IDE is a disaster for the creative process. Jack Kerouac used to type using rolls of paper instead of sheets so he didn't have to stop in his process. He "got it".

As for pandoc, yes it's amazing, and I have been using it to convert my word documents to markdown so I can publish a technical textbook I'm working on using Quarto. I tried writing directly in Quarto for a while, but as per my point above, it really slowed me down and distracted me from actually writing, so I figured out the pandoc pipeline. My most favoritest feature so far is that it converts tables AND equations to markdown and latex perfectly. It's so seamless that I'd actually recommend Word->pandoc as the best way to write a complicated markdown table.


If you want to use Word, then that's great, but when you start saying what other people 'should' be doing then they're going to speak up.

> writing should happen in "word processing software"

Writing should happen where you are comfortable editing text. I am comfortable in the same editor where I write code.

> To people who say "Word sux", I say "That just means you don't know how to use Word properly". Writing any markup or markdown syntax in an IDE is a disaster for the creative process.

To all the people who say "Writing markup in an IDE sux", I say "That just means you don't know how to use it properly". I can write in a flow and apply/change formatting easily. I can jump around and rearrange documents with ease. And it is in a format that can be opened and read by native software on almost any computer.

If you think that Jack Kerouac would prefer MS Word over a much simpler plain text editor, I don't agree.


> Writing should happen where you are comfortable editing text.

For that particular context of text.

I write my code, notes, text and emails in vim. Some in markdown. But to this day, I still miss the incredible usability of LyX 1.x when writing pure long-form text.[ß] I could force all writing to occur within central 60% of the editable screen height [doable with vim but not as cleanly]. No whitespace or formatting issues, ever - set the document defaults according to my liking and it would feel "just right".

Proper rendering and visually correct editing of math formulas as part of text. Oh my. Fond memories of being able to type '<raw latex hotkey>\frac' and continue fitting in the values...

If it had had vim's search-powered navigation, it would have been nearly perfect. LyX 2.x was a step up in visual appeal and two steps down in raw usability. I've since picked up writing raw latex where I need good formatting, just because I could not make LyX 2.x bend to my taste anymore.

ß: Back in the early 2000s, I wrote a book in LyX, as well as all my university course papers, including the master's thesis.


> but when you start saying what other people 'should' be doing then they're going to speak up.

That's not a prerequisite for the Latex folks, in my experience :D

(I'm also in the "wrote my masters thesis in word, including the fair number of equations" group)


Of course you can choose what you use for your personal pipeline.

But Word for collaborative work is a nightmare. Sometimes the fastest way to work with a science text or documentation in a Word document is to hire a freelancer to retype all the text in LaTeX or something else.

It's a kind of Trolley problem: sometimes Word fanboys, unfortunately, _should_ suffer to let all the team get the job done.


> But Word for collaborative work is a nightmare

Why? It has comments, tracking etc. Concurrent editing is impossible, though (even when MS says it is possible). For that Google Docs is great (or some self-hosted systems).


> Why? It has comments, tracking etc. Concurrent editing is impossible, though (even when MS says it is possible). For that Google Docs is great (or some self-hosted systems).

It doesn't scale. At all.

I used to work at a university lab group where all 30 of us would need to concurrently write, edit and review 150+-page, heavily technical reports with lots of diagrams and tables spanning pages. To be clear, most of the time all of us were working on the exact same huge document.

Word's version tracking stood no chance. Formatting was regularly off, tables were breaking apart, diagrams misplaced. Syncing was extremely bad, often with entire paragraphs in changes going missing, other times deleted portions were reappearing, all that jazz.

LaTeX on an online collaborative environment (well-known, not naming it - this post isn't an ad), on the other hand, despite its archaic way of working, never showed any of those problems. If a table was placed somewhere, we could be sure it would never get moved to random places, and changes/rewrites would always be synced correctly (as LaTeX source is plain text, merging algorithms/CRDTs have a much easier time).
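The point about plain text being easy to merge can be illustrated with a line-based diff. This is only a sketch: it uses Python's standard `difflib`, not whatever the collaborative editor mentioned above does internally, and the two LaTeX snippets are invented for the example.

```python
import difflib

# Two made-up revisions of the same LaTeX fragment.
BEFORE = "\\section{Results}\nThe error was 5 percent.\nSee Table 2.\n"
AFTER = "\\section{Results}\nThe error was 4 percent.\nSee Table 2.\n"


def line_diff(old: str, new: str) -> list[str]:
    """Return a unified diff of two plain-text revisions, one entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="report.tex (before)",
            tofile="report.tex (after)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(line_diff(BEFORE, AFTER)))
```

Only the edited sentence shows up as a removed and an added line; the surrounding lines are untouched context, which is what lets two people's edits to different lines be combined mechanically.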


> all 30 of us would need to concurrently write, edit and review 150+-page

As I wrote, it does not work at all for concurrent access - I mentioned Google Docs & Co for this.

> LaTeX on an online collaborative environment (well-known, not naming it - this post isn't an ad)

I wrote my MSc and PhD thesis in LaTeX (physics) so I know how fantastic it is. You write content without caring for the container - and since changing anything is black magic you give up and do not try (which is a VERY good thing - it just works).

I never used Overleaf though (I guess that this is the product you refer to). I guess that having a concurrent system (such as etertab or something - or Overleaf if it supports truly concurrent editing) is the grail.

The drawback is that you need to know the language to cooperate. In a university setting this is not complicated, in a company - not so much.


> As I wrote, it does not work at all for concurrent access - I mentioned Google Docs & Co for this.

Indeed Google Docs is much better - we also used that - but it's still a WYSIWYG editor, which IMHO translates to 'extremely hard to enforce style'.

> I never used Overleaf though (I guess that this is the product you refer to). I guess that having a concurrent system (such as etertab or something - or Overleaf if it supports truly concurrent editing) is the grail.

Yep, Overleaf was what we used. Its paid version was very much like Google Docs but on a plaintext editor wrt. concurrent access. It could even do change tracking, comments, all that jazz, even Git syncing (which we used for backups and CI).

> The drawback is that you need to know the language to cooperate. In a university setting this is not complicated, in a company - not so much.

I'm curious as to why. If the company is new and built on LaTeX from the very beginning why not? When I joined, I didn't know the language at all, but that wasn't a problem: one would learn on the job.


> If the company is new and built on LaTeX from the very beginning why not?

It really depends on the company. I worked (and work) in large high-tech companies and whenever I tried to introduce something like Markdown I quickly hit the wall of non-technical people who did not want to try a new system. They knew Word, were suffering with Word but did not have the mindset to give something different a try.

For the ones on Google Docs it was even more difficult because, arguably, Google Docs is a really neat product for collaboration.

My teams use Markdown for all text (either Obsidian or internal wikis) but this is because they are good at what they do and because they fear their management line :) :) (just kidding)


Google Docs is incompatible with complex formatting (LibreOffice and even Office365 are too). Moreover, even Word itself is kinda incompatible with complex formatting: sooner or later one of your collaborators will copy/paste some text with formatting you will be just unable to change/clear.

Anyway, Word's «collaborative» features are so much worse than a git repository and pull requests! And even if you need to collaborate with some extremely non-technical folks, in the case of LaTeX they can still add comments to the PDF file, and this workflow is still way more productive than editing the same Word document.

Other reasons not to use Word in collaborative pipelines are already mentioned in neighbor threads.


See? I didn't even tell anyone what they should do, and here's a Latex guy spouting off :)


Most tech people who use Word know how to use it and that is exactly the problem. We don't have the benefit of ignorance. We send off drafts to someone for commenting or editing and get an inconsistently formatted mess back because others don't even know that styles exist and use manual / direct formatting instead.

We are the ones that have to suffer because people we are forced to collaborate with do not take 5 minutes out of their day to learn the basics of a tool they use professionally.

It is the equivalent of seeing somebody use right-click to copy-paste, except it tangibly makes my day worse.


Good point, sigh...this is so true. I have created a 45 minute tutorial of 'how to use word properly' to address this exact pain point. I send a link to my students if their first draft commits any 'sins'.

BTW, when I said tech people, I was thinking mostly about the computer savvy academics who use Latex for everything.


People use things like LaTeX exactly because of this problem. Word processing software brought the problem of inconsistent formatting and layout within the grasp of everyone and boy! ...did they grab it with both hands. Systems based on plain text allow the author to concentrate on content only, without the need to format.

Personally I find LaTeX misses the mark. Too much markup is needed and it detracts. I'm a fan of asciidoc though. I just wish the templating were a little better.


TeX documents have a distinctive look because of their typography. Back when printed resumes were a thing, whenever I saw one done in TeX, I would recognize the CM font right away. I'd then look at the resume early, since I knew that it came from a nerd. Mine was of course also done that way.


Unless you’re looking for Haskell nerds, in which case you should prioritize Comic Sans.


Simon PJ glares :)

For those not in the know, Simon Peyton Jones is one of the originators of Haskell, and sort-of-but-not-quite BDFL of Haskell... and he uses Comic Sans for all his presentations because it filters out people who will complain about font choice for a presentation.


I write code in Comic Sans, so I'm not one to judge :)


There is also Coq's documentation system, which has its own distinctive look:

https://softwarefoundations.cis.upenn.edu/lf-current/Basics....


It's ideal for hiring - you immediately know which resumes to read and which one to dump into the trash.


I wouldn't dump a non-TeX one in the trash. It's just that a TeX one tends to draw my interest right away.


So CVs from folks who know LaTeX well enough to change Computer Modern to something else will be dumped into the trash…


My personal sweet-spot for writing technical documents these days is Asciidoctor with semantic line breaks [1]. There are some warts in the asciidoc syntax but it covers a lot more of the features I need for the types of documents I'm writing compared to Markdown.

[1] https://rhodesmill.org/brandon/2012/one-sentence-per-line/
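For anyone curious what "semantic line breaks" look like in practice, here is a deliberately naive sketch that reflows a paragraph to one sentence per line. The function name is invented, and the sentence splitter is an assumption: it breaks after any `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g." will be split incorrectly.

```python
import re

# Split points: whitespace that follows sentence-ending punctuation.
_SENTENCE_GAP = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow a paragraph so that each sentence sits on its own line."""
    joined = " ".join(paragraph.split())  # collapse existing wrapping
    return "\n".join(_SENTENCE_GAP.split(joined))


if __name__ == "__main__":
    print(one_sentence_per_line("First point. Second point!\nThird one?"))
```

Because each sentence is its own line, a diff of two revisions shows exactly which sentences changed rather than a re-wrapped paragraph.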


> I was thinking mostly about the computer savvy academics who use Latex for everything.

To be fair, for a significant number of things LaTeX is the better environment, even with its own issues.


Would you mind sharing that link? I completely understand if you’d rather not for privacy reasons though! If not could you recommend a different tutorial?

I come from a trades background and am now in the academic field (teaching and admin for my trade) and I would really like to do some professional development in this area. Also excel! I know I’m lacking and it would help my long suffering director haha.


Is there a way to lock documents in some way, so that direct formatting is disallowed and only styles work? Not that we could truly lock a document, but at least having some sort of header that lists allowed features, such that one would get a warning whenever they veered off course?


Don't think such fine grained control exists. There is read only mode[1], which (I assume) limits the reader to only comment.

[1]: https://support.microsoft.com/en-us/office/make-a-document-r...


Sometimes direct formatting is the correct choice, such as when applying bold or italics to individual words or sentences within a paragraph. Making a separate style for those is lunacy!


Yes. Protect document and then select limit formatting to a selection of styles. Can’t remember the exact name of the checkbox.


So what you're saying is that the problem with Word is that it makes it easy to write unpolished and unprofessional documents? In your opinion, is that the fault of the word processor or the user? [0]

[0]: Not a rhetorical question that implies one answer to be the only "right" one.


I find that question uninteresting, because no matter where the fault lies (it's probably somewhere in the middle) the end consequence is that there is no use case where it makes sense for me to use Word (unless I'm literally held at gunpoint).

I'm either working on a document for myself or with collaborators.

In the first case I'll use markdown for simple things or LaTeX for bigger things since I can work in a familiar environment where I work most efficiently (VSCode).

In the second case, I'll work with collaborators so I will never be able to trust that a document I have sent off for reading is still consistently formatted when I receive it back. This means that any benefits of the collaboration tools (eg. review history or suggesting changes) are wiped off the table. I will have to integrate any suggested changes into my own authoritative version of the document by hand anyways. At that point I may as well work where I feel comfortable and use markdown / latex, and send off Pandoc converted word files for comments by others when it is relevant.

This is of course for serious pieces of writing, not throwaway stuff like eg. meeting notes, but for those Google docs is plenty.


IMO entirely user.

Word (and similar text processors) is a toolset. All it gives you are tools. Word lets you define formatting rulesets, lets you apply formatting rules and create rulesets from applied rules.

Some evangelists may say that "safe" text processor would only allow application of rulesets, because direct application of rules leads to "spaghetti formatting". However that is one of the powers of WYSIWYG text processors: you apply the rules and extract those to rulesets once you are satisfied with results, in an explorative way. Direct application of rules is a feature that makes Word what it is.

Now, if a user takes a document with predefined rulesets and still applies their own rules inconsistently that's simply misuse of the tool.


But what of the potential lack of a compatible toolset on the other end of this pipeline? Even embedding styles in a Word document offers no assurance the document will appear as you intend when they receive it, much less if they offer edits and comments and send it back.


I don't get your point. Since IIRC Office 2007, docx is the default format which is designed for compatibility. IIRC it allows embedding of fonts and other non-text content, making compatibility concerns an edge case. If all parties use relatively recent version of MS Office with overlapping featureset, there should be no compatibility concerns, unless somewhere in the collaborative pipeline you are departing Office ecosystem altogether.

I don't think discussion around fault is meaningful in such scenario altogether.


> It is the equivalent of seeing somebody use right-click to copy-paste

What would you prefer, instead? CTRL-C, CTRL-V? SHIFT-INS, CTRL-INS? some vim incantation?

As a select-to-copy/middle-click-to-paste guy, seeing people use these inferior alternatives looks extremely annoying to me.


I think that depends if your workflow is oriented around the keyboard or the mouse. For me it is keyboard oriented and I frequently use (Ctrl)+Shift+Arrows for finegrained selection anyways so Ctrl+C/V is most convenient. I also use Vimium in my browser. If my workflow was mouse oriented I'd instead use Gesturefy.

I think they're equivalent and certainly both are better than using the right-click menu.


I don't always use the mouse, but when I do, I only use the mouse, no keyboard needed at all.

Using CTRL+C/V requires an unholy synchronization of mouse and keyboard. But unix-style middle-click paste is entirely mouse-controlled and very elegant. Of course, if you are inside a text file, you can use vim keyboard tricks that are even faster because you don't need to select the text.


> Using CTRL+C/V requires an unholy synchronization of mouse and keyboard.

You can select text with the keyboard by holding down the shift-key, and use the cursor-keys or use ctrl to efficiently jump by word-boundaries.

No mouse required.


There are other reasons to not like Word other than 'it sux'. One is availability. If you're on Linux all day, your options don't really contain Word - it's not free to download and install due to needing a license. You have to go and find some other computer that runs Windows, and start it up - just for that one document. Then you have a single use computer, which obviously isn't a great experience. Closest thing that I use is Google docs, sometimes. I (and many others like me) know how to use Word (and Excel) pretty well due to some life in that world. I still remember many of the hotkey combinations to step through the menus. The reason I use vim for everything now is - I use vim for everything. I live in vim/tmux/ssh/tui, so most problems go there first if they can. There are obvious benefits to the vim/git/Typst setup, if plots need to go in a document for example, but it's also strengthened by Word not being easily available.


There is a free web version of Word that is pretty good.


Do you mean Word from Office 365?

In my experience, Word from Office 365 breaks the complex formatting of a desktop Word document even worse than LibreOffice/Google Docs do.


You lost me at “web”. Local-first is so much nicer for my workflow—and I am sure this holds for most other people.


The person I responded to uses Google Docs.


Key word, sometimes:

Closest thing that I use is Google docs, sometimes.... The reason I use vim for everything now is - I use vim for everything.

<https://news.ycombinator.com/item?id=39230525>


You mean OpenOffice? Because it's not quite the same thing. It can cause some differences in appearance for docx (though that doesn't mean much, since M$FT Word also doesn't display their own doc format consistently either).


No, OpenOffice is the Apache project that shares ancestry with LibreOffice.

I’m talking about the Microsoft product that is now sold under the Microsoft 365 Online brand.

It isn’t a perfect replacement but for the times I’ve used it, it’s been pretty good.


If you're going to bring up Kerouac with his rolls of paper, you're better off talking about WordStar than Word. Word divides the document into discrete pages, while WordStar documents were long uninterrupted ribbons of text, just like Kerouac's rolls. Perhaps Kerouac "got it," but so do George R.R. Martin and Robert Sawyer, writers who continue to use WordStar decades after its demise (Sawyer even talks about the benefits of this undivided waterfall of text on his website [0]). Text editors are similarly long ribbons of text, and are just as conducive to putting words down as any bloated word processor that is optimized to produce two page corporate memos or colorful party posters to be posted in the lunch room.

I've used Word professionally since the mid-1990s. I do know how to use it properly, and it still sucks.

"Writing" isn't meant to be done in a word processor, which was developed as a business tool, not a creative tool. Writing should be done in whatever tool one wants to write in.

Word is, by design, both a desktop publishing app and a secretarial tool. For book-length writing, it works poorly with long files, and the file format is subject to corruption. The docx format is also proprietary and subject to Microsoft's whim; any conversion scheme is a hack (though Pandoc and many others do work adequately). Unless you learn the ins and outs of Word's style scheme (and sometimes even if you do) and follow it slavishly, formatting is often inconsistent, and there's no certainty that the styles you apply to make your document look a certain way will ensure it looks that way on someone else's machine.

There's no doubt, though, that a Word-compatible word processor needs to be in every writers' toolkit, since it is the standard in the publishing world.

[0] https://www.sfwriter.com/wordstar.htm, scroll down or search for THE LONG-HAND PAGE METAPHOR.


You can just go to view >> web layout in MS Word to get a pageless view. And view >> print layout to go back to pages FYI


I'm not sure I'd use GRRM to support anything regarding writing productivity


I was under the assumption docx was simply an XML wrapper in a well known format. MS can't come in and break it without creating a new extension, invalidating deprecation concerns. That said - I do everything in markdown regardless so I can use a code editor more comfortably.


> I was under the assumption docx was simply an XML wrapper in a well known format. MS can't come in and break it without creating a new extension, invalidating deprecation concerns.

Sort of.

It's a ZIP containing a collection of XML files. The actual content is a single file, but you need separate ancillary XML files for things like styles, links, headers/footers, numbering schemes and so on. Each with complicated namespacing and nesting rules and each file referencing items in the others.
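The "ZIP of XML files" structure is easy to see with the standard `zipfile` module. The sketch below builds a tiny in-memory archive using part names that commonly appear in a .docx package and then lists them. The XML bodies are placeholders, so this archive is not a valid Word document; it only shows the container layout. The helper names are invented for the example.

```python
import io
import zipfile

# Part names commonly found in a .docx package. The bodies are placeholders,
# so the resulting archive is NOT a document Word could open.
SAMPLE_PARTS = {
    "[Content_Types].xml": "<Types/>",
    "_rels/.rels": "<Relationships/>",
    "word/document.xml": "<document/>",    # the main body text
    "word/styles.xml": "<styles/>",        # paragraph and character styles
    "word/numbering.xml": "<numbering/>",  # list numbering schemes
}


def build_sample_package() -> bytes:
    """Create an in-memory ZIP laid out like a .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in SAMPLE_PARTS.items():
            archive.writestr(name, body)
    return buffer.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the sorted part names stored in a package."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    for part in list_parts(build_sample_package()):
        print(part)
```

For a real file on disk, `zipfile.ZipFile("report.docx").namelist()` gives the same kind of listing (the file name here is a placeholder).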

The area where it is most at risk (for me personally in practice) is the lax handling of the file by Word itself. As a specific example there are nested XML elements where sibling child elements define properties on the main element, such as properties that define paragraph styles.

Being siblings they should be supported in any order, but in practice generating DOCX files and importing them into Word will fail for no obvious reason, until you reorder them in the raw XML (even though they are still at the same level in the hierarchy). Then they work.

In other words, it's less the 'spec' and more the MS implementation that makes it fragile. And different versions of Word can have different behaviour in that regard.
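The reordering workaround described above could be automated along these lines. This is a sketch, not a statement of what Word requires: `ASSUMED_ORDER` is a short illustrative list of paragraph-property names, and the order any given Word version accepts should be checked against the schema and real files. The helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative order for a few children of <w:pPr>. Treat this list as an
# assumption to verify, not as a complete or authoritative sequence.
ASSUMED_ORDER = ["pStyle", "keepNext", "spacing", "ind", "jc"]


def local_name(element: ET.Element) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def reorder_children(parent: ET.Element, order: list[str]) -> None:
    """Sort a parent's children in place to follow `order`.

    Children not named in `order` keep their relative order and go last
    (Python's sort is stable).
    """
    rank = {name: index for index, name in enumerate(order)}
    children = sorted(parent, key=lambda child: rank.get(local_name(child), len(order)))
    parent[:] = children


if __name__ == "__main__":
    ppr = ET.fromstring(
        f'<w:pPr xmlns:w="{W}"><w:jc w:val="center"/><w:pStyle w:val="Heading1"/></w:pPr>'
    )
    reorder_children(ppr, ASSUMED_ORDER)
    print([local_name(child) for child in ppr])
```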


You're kinda contradicting yourself there.

Kerouac used a typewriter, which is about as minimal as it gets before you drop down to pen and paper. Something like Word is full of distractions: fonts, section headings, various formatting options, etc.

If you really want to get into the flow of writing, do it like Kerouac: plain text editor that wraps words at whatever width is reasonable to you.

After you're done, then copy it into a word processor and apply your formatting rules. Or just stick with the plain text editor and use anything from markdown to tex.

Ultimately, though, use what you're most comfortable with! That's going to be different, sometimes, for different people. The idea that it's a "fact" that everyone should be writing in any particular way using any particular software is just nonsense.


> People (tech people at least) have lost sight of the fact that writing should happen in "word processing software"

The fact? Should? In "word processing software"? Why shouldn't it happen in "text editing software"? Writing produces text, after all.


This. Text editors automatically give you that exact Jack Kerouac "rolls of paper" experience. It's just text. Formatting comes later, word or tex or some other system you like.


I agree with your premise of "don't be distracted" when writing, but for me Word often is the distraction. I use a live-preview markdown editor (e.g., Typora, MarkText) to let me get my thoughts onto paper (screen) with low friction. It's easier to just hit "#" rather than drag the mouse to the style bar and select heading. Or more importantly for me, it's so much easier to hit "$" and seamlessly go into LaTeX for math than it is to open the equation editor and start selecting all the template objects.

When it's time to collaborate, I use Pandoc to turn it into docx and then I send it around and the final formatting happens in Word because that's the easiest for everyone to work with, but the "get the ideas down" phase works best for me in a more "minimal" editor with little formatting.

I love the idea of Quarto, and if I had that when I was in grad school, it would have made my life so much easier. The workflow I see for Quarto is that you can write your paper while you're doing the experimentation because the code is embedded with your thoughts. But in that case, you're mostly slowed down by the research process so it can be a little more clunky to get the writing done because you have time and you're iterating over ideas more than words in that phase. I'd use it now for work in the R&D phase, but I know I won't have a critical mass of collaborators to make it worth while.


Quarto is great.

There's also Typst which is the new kid on the block but seemingly very good.


> slowed me down and distracted me from actually writing

It's a bit of a matter of perspective.

The distinction between looking at a document in markdown versus Word is a bit analogous to the distinction between looking at a movie in its textual form as a screenplay versus looking at a movie as a piece of video: Text is capable of abstraction in a way that video is not.

In the screenplay, it might say "table", but when the director translates it to video, the director will have to decide: What kind of table? What design? What period? What texture? Is there anything on the table?

None of these decisions matter to the construction of the story, so, for a screenwriter, it would be very distracting if they had to make all of those decisions just to be able to get "table" committed to the medium.

In Markdown you worry about text and nothing but text. But Word shoves a particular font in your face as soon as you're laying down the first letter, so, if you don't like Word's choice of fonts, you can either let it annoy you throughout the project, or you can start worrying about fonts right then and there, which will be a distraction. If you write Markdown in a code editor, then, presumably, you've already set up the code editor in a way that doesn't annoy you. And then your future self (or someone else entirely) can worry about the font.


It's worth noting here (even though I also use vim/markdown/typst, and Word is a thing of the past) that markdown does say something more than "just the text".

It puts the header notation and style (like italic, etc) in-line. So does Typst or LaTeX, and I can't think of any typical stand-off examples for headers and such, but it does muddy the text in-line in that sense. It typically doesn't really slow down the writing, but if you're writing # as comments for code and don't have those wrapped in ```, you can get some problems.


We collaboratively write screenplays in markdown (actually fountain but its markdown plus some screenwriting stuff) and save them into Dropbox.

pandoc and some other tools turn those scenes into a full screenplay.


I know you're just using font as an example, but doesn't your markdown editor also shove a font in your face? You can adjust default behavior in Word just as well as in another editor.

Anyway, to work within your analogy, I would say that Word lets the writer do a bit of a 'mockup' of the set with nearly 0 effort. Like "I want a table here", so 2-3 clicks and you have it. Then you can "let the director" take your mockup and flesh it out properly later. As a writer, it helps me to see the mockup of the product as I go, but I want that mockup to be effortless. And as I said above, I do some work up front to make sure that Word's mockup looks good (or good enough).


...and that's the problem. You can easily get distracted by the look of the words on the page. Is the heading big enough, centered, got enough white space around it. By the time you've faffed about with that, I'm onto my second or third paragraph of content.


Obviously everything that puts text on a screen puts a font in the user's face. But Word also presents you with styles, which make what you see changeable from the whitespace on up. Not only can one paragraph look different than the next, one word in a paragraph can look different from the word following it.

It's true that many document elements, such as tables, are easy to create in Word. This puts it over the edge into desktop publishing territory. In itself, that's not especially a bad thing, especially if your target is a printed file or a PDF. That still makes it a publishing tool, not especially a writing tool.


Not a flamewar, honest question from someone that couldn't stand Word last time I used it (far over a decade ago) but actually likes Open Office (well, LibreOffice, but I still call it Open Office). I'll grant that the Excel equivalent is nowhere near feature competitive with Excel, but the Word equivalent is, in my opinion, better.

Have you tried Libre Office? I'd love to hear your opinion.


A thing I like is that LO writer has a vim editing plugin somewhere on github


For me personally it's the reverse. After 15 years of writing almost exclusively with vim (or vim-like) input scheme using anything else would really break my flow. Can you even do something like ct, (delete everything between current cursor and next comma, then start insert mode) in word? Even if you could, it would be some 5-key monstrosity which makes you move from home row.
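For readers who don't use vim, the effect of `ct,` on a single line can be modelled in a few lines of Python. This is only a model of the text change, with an invented function name; in vim the editor would also switch to insert mode at the cursor afterwards.

```python
def change_till(line: str, cursor: int, target: str) -> str:
    """Model the text effect of vim's `ct<target>` on one line.

    Deletes from the cursor position up to, but not including, the next
    `target` character after the cursor. If there is no such character the
    line is returned unchanged (vim would abort the command).
    """
    stop = line.find(target, cursor + 1)
    if stop == -1:
        return line
    return line[:cursor] + line[stop:]


if __name__ == "__main__":
    # Cursor on the "b" of "bar" (index 4).
    print(change_till("foo bar, baz", 4, ","))
```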

Now granted, if you're just hammering out words without any editing this doesn't matter, but I think almost every piece of good writing has had 3+ revisions.

Of course this is a personal preference, but my friends who do a lot of writing (and never used vim at all) still seem to prefer a distraction-free editor with a lot less features and a much less noisy UI for the actual writing.


He doesn’t understand ct, nor can he grok the beauty of vim and all the muscle memory that doesn’t interrupt the (writing) flow. Since he learned to scroll with the mouse and click at a position to change something every problem is a nail.


Word is the worst very popular program for the inverse reason of why Excel is the best: the extent to which a regular user can easily determine and/or modify "why something they see on the screen appears the way it does."

In Excel, you click on the cell and see either e.g. the number or formula used to get the result you see.

In Word, well, it's just difficult to figure out exactly why a thing looks the way it does.


Excel's conditional formatting is pretty damn difficult to inspect, IMO.


You have to find it and turn it on, but there is a tool called "Style Inspector" that shows the applied styles and direct formatting of the selected text (or the location of the cursor).

Can also use control+space to strip off any direct formatting.


1. Write prose in plaintext using preferred text editor

2. Add formatting using preferred markup language

3. ???

4. Profit!

Seriously, though, writing prose in a simple text editor and worrying about formatting later is far less distracting than writing prose in a WYSIWYG word processor. Also, adding formatting using a markup language ends up looking far nicer far faster than using a WYSIWYG word processor.


I tend to agree that most users don't know how to use Word well. I think this is true of IDEs as well. Most users, even technically savvy ones, use a small subset of features and disregard the lesser-used aspects. And then are later surprised when they see someone use a feature that they didn't know existed because it was outside the ambit of their immediate knowledge.

However, I'm not sure whether developers in general would come to like it, even with greater knowledge of Word's features.


> To people who say "Word sux", I say "That just means you don't know how to use Word properly".

Or maybe those people just have eyes which can spot the differences in spacing between documents created with different Word versions, the mediocre kerning, and a multitude of other typographical annoyances.

Even a website is easier to replicate exactly in another browser than a Word document to be replicated from scratch in a new version of Word. And if by some miracle you manage that, the end result will always look meh.


> To people who say "Word sux", I say "That just means you don't know how to use Word properly".

I used to teach how to write serial letters with Word to secretaries when I was in high school. I used to write VBA macros that call in and out SAP systems when I was a junior software engineer - because "I was young and I needed the money."™

Now I either write in Emacs or Sublime on a Linux box with 2 TB RAM, or in Overleaf (LaTeX collab Web application), and I say: "Word sucks", I shall be suffering no more.

Word does not exist in my operating system (except for QEMU), and nobody notices.

LaTeX creates beautifully typeset publications, and most day to day writing requires nothing more than plain text, which is the most durable format.


Yes, anyone who's ever learned LaTeX and why it was invented immediately sees how absolutely awful Word is at its rendering. Funny that a trillion-dollar company can't seem to figure out a better algorithm than Donald Knuth mocked up in Pascal decades ago.

I think of this as another version of enshittification -- the acceptance of poor performance as the "standard".

And don't get me started on math equations in Word ...


Seriously, Word (and other word processing software) and TeX come from two separate lines of evolution. Knuth is a computer science guru, and invented TeX to effectively and precisely output technical and scientific documents. Word and other word processors (before and after) came from the creative and business worlds. The two didn't have any influence on each other for many years -- you can write a novel in LaTeX, and you can write a scientific paper in Word, but those aren't what the program was written for, and it shows in the awful editing and rendering of equations in Word. (I can't make a similar assertion for LaTeX...being based on plain, rather than formatted, text, it's easier to separate content in a LaTeX document from the end format, while in Word the content and formatting are inextricably bound together.)


To be fair, if you use Word properly you can separate content and formatting. The key is to use styles. The problem is nobody learns this and by default the bold and colour buttons etc are right there and oh so tempting. Everyone thinks they can use Word because you just type and do a few obvious things like change font size etc. It's just not considered something that needs to be learnt.


> To people who say "Word sux", I say "That just means you don't know how to use Word properly".

"Word sux" because it takes forever to start up and I forget what I went there to type. It's like opening a jetbrains ide to take a quick note. The only time that's worth it is when I'm sitting down to write for several minutes at a time, which happens (maybe) once a quarter, vs 17 times a day for quick notes.

Now, you could argue that I just don't want what word offers, but that's what everyone means when they say something sucks. It doesn't do the thing they want (usually because the implementers made trade-offs for other things).


> Writing any markup or markdown syntax in an IDE is a disaster for the creative process

MS Word or markup in an IDE are not the only options.

I used to use Lyx quite a lot. The best of both worlds in many ways.

There are text editors which give you a simpler UI than an IDE without the distractions and complexity of a word processor. I find Kate quite pleasant to write in.

There are distraction-free writers' editors (which will often save in word processor formats) that exist only because a lot of people find word processors are a "disaster for the creative process".


Rant ahead:

So, as it must be me not knowing, can you tell me why Word won't let me change table column size ~50% of the time (for the same table from the same source). Half the time autofit works (yay!), half the time it won't let me resize columns, neither by typing the number, nor dragging the invisible off-page divider (Microsoft should just make the transparency of off-page content 50%??), nor changing to draft view and dragging the actual dividers ... wtf is going on there? If I paste it into OneNote first then it fits ... sometimes, if I drag a column wider first (which it already rendered 5x the width OneNote did) then it will let me narrow it afterwards, what's that feature called AutoNoNarrowColumnRenderedIncorrectlyFiveTimesExpectedWidth?? I'll just search the settings to turn that off ... oh wait!

These are the times I long for markdown, or 'reveal codes' ... MS Word has a lot of problems they could probably have fixed if they hadn't put so much effort into preventing interoperability. These sorts of issues were around 20 years ago when I stopped using Word, and 5 years ago when I restarted. Same asinine poorly implemented numbering and styles that are unintuitive, opaque, and ungainly ... and don't get me started on search! Multi-highlights? Sorry best I can do is "find next" with no find previous, no regex, ... you can do find in AutoText though, right, right? ... and all the AutoText and AutoCorrect gets saved in a single sensible format that's easily modified? ... Word changes the format of all windows when you open a new one too, just in case you thought the suck was restricted to within the window chrome ... and doesn't have always-on-top, and doesn't open windows in their last position, and ...

Doesn't suck ...???!

Whilst you're here, any ideas why OneNote eliminates footnotes so you can't cut-paste between Word and OneNote? I'm sure it's not flawed and I'm just holding it wrong, right ...?


It's not you. Things like WordPerfect's "reveal codes" feature were a major reason law practices were so reluctant to give it up. It's far easier to fulfill a court's rules on how to format briefs when you have that kind of control over your document's formatting!


Part of the reason I stopped using Word was that I absolutely abhorred 2007-and-beyond’s equation editor. I was using previous versions of Word with MathType for my math homework, but I found the new equation editor really hard to use. Around the same time I had coincidentally switched to Linux, and OpenOffice has an even worse equation editor.

Pandoc was a game changer for me. I picked up LaTeX equation editor pretty quick, and being able to write markdown was so much more pleasant in my mind.

It’s not perfect; tables are a pain still, but I have no desire to go back to Word.


You are probably the only person I've heard of who liked the old MathType editor :-) The new editor is terrific because you can just type the equation in LaTeX format (e.g. A_c = \pi R^2 ). Then you hit the space bar and it converts to WYSIWYG style. Most of my equations are on the simpler side I guess.


I didn't know about being able to type the LaTeX stuff; that's pretty neat.

I think part of it was that I had basically memorized all the keystrokes for the MathType editor, and most of them didn't work in the MS Equation editor, which annoyed me. Also, I had issues with parentheses formatting correctly but I suspect that's been fixed in the last 15 years.

Still, I really do prefer to work with Markdown in general. The Markdown -> Pandoc -> LaTeX rendering just ends up looking prettier in my opinion, and at this point I'm pretty useless in any editor that doesn't have Vim keystrokes. Pandoc irons out the parts of LaTeX that I really hate (the `` vs " being the thing that's given me the most headaches), while letting me drop into raw LaTeX when I need it; not even getting into the fact that there's just math stuff that (as far as I know) doesn't work in Word or MathType's equation editor (e.g. bussproof trees).

I do get pretty annoyed when people try and tell me that regular LaTeX is "just as easy" as Word, because even as someone who has a reasonably good handle on LaTeX I can say that is just not true. TeX is arcane and weird and annoying and inconsistent, and I don't blame people for using Word compared to it.


> People (tech people at least) have lost sight of the fact that writing should happen in "word processing software"

Uh, that's just like, your opinion, man.


> My most favoritest feature so far is that it converts tables AND equations to markdown and latex perfectly.

You might be interested in TeXmacs [1]. It has a WYSIWYG interface, and it handles tables and mathematical equations nicely. You can also export documents to LaTeX format.

[1]: https://www.texmacs.org


> writing should happen in "word processing software", and Word is the best-in-class.

Or Scrivener. No Zotero plugin, though.


> People (tech people at least) have lost sight of the fact that writing should happen in "word processing software", and Word is the best-in-class.

It depends on the person. The best way for me to write is on paper, on a desk with lots of empty space, with paper versions of all reference material.


> Writing any markup or markdown syntax in an IDE is a disaster for the creative process.

I would like to quote this on a MonsterWriter landing page


We do the opposite of this - we write in markdown but sometimes need to get feedback in Google Docs. Pandoc doesn't convert to google docs so well, but it does to docx, so our pipeline is

markdown -> pandoc -> docx -> upload to google doc -> share
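
A rough sketch of the first half of that pipeline, in case it helps. The function name `md_to_docx` and the default output name are illustrative assumptions, not anything from the parent's setup; the upload to Google Docs is still a manual step.

```shell
# Sketch: convert a Markdown file to .docx so it can be uploaded to Google Docs.
# md_to_docx is a hypothetical helper name; pandoc must be on PATH.
md_to_docx() {
    src="$1"
    # Default output: same name with .docx instead of .md (an assumption).
    out="${2:-${src%.md}.docx}"
    pandoc --from markdown --to docx --output "$out" "$src"
}
```

Usage would be something like `md_to_docx design-notes.md`, then upload the resulting `design-notes.docx`.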


My kingdom for proper markdown support in google docs. Just let me toggle between wysiwyg vs markdown. Collaboration at a tech company using google docs is comically painful at times.


+1 strong agree. I like Google Docs’ realtime collaborative editing, but the wysiwyg formatting (even if one has memorized the keyboard shortcuts) makes it hard to be fast the way I am in a Markdown-aware editor or vim. Plus I miss my vim motions!


I'd settle for a Markdown import. You could do your editing and writing of raw text in whatever you feel happy with, and then have gdocs transform it to its supposedly native format on upload.

Gdocs would then really only need to support the same semantics with underscores, asterisks, octothorpe heading levels and title sizes.


Oh boy, do I have a treat for you. Try saving the following as `input.md`:

    # Hello *world*
    
    ## Math
    
    $$x = 2 \cdot y^3$$
    
    - foo
    - bar
    - baz
    
    1. Yes
    2. No
    3. Maybe
    
    ---
    
      Right     Left     Center     Default
    -------     ------ ----------   -------
         12     12        12            12
        123     123       123          123
          1     1          1             1
    
    Table:  Demonstration of simple table syntax.
Then run

    pandoc -t html input.md | xclip -selection clipboard -t 'text/html'
And paste into a Google doc.


Whoah, that is a seriously neat trick. Thank you.


You know what, that's a sufficiently cursed workflow that it wraps back around to adding nerd cred


Like using a hex editor to build up a Word doc?


You might find it interesting that I never read any docx sent to me and instead run it through pandoc to convert it to markdown first.


I actually found more control in writing complex formulas and scientific notation with MD and then using pandoc to convert than with Word directly. Also, it's almost magical how pandoc does the conversion: fast, accurate, and without fuss.


For documents that require a longer process, I prefer formats that I can use comments and TODO in, with a text editor that feels familiar.

Other than that, I strongly agree. Word is also de facto standard when you want others to open and edit your document.


Word has comments. They work well as todo markers if they don't need to be retained (just make a comment to mark something and then delete it when it is resolved).


Out of curiosity, why use Pandoc? Can't Word natively save as HTML, PDF, etc?


Without an add-in (it may or may not exist anymore), Word cannot natively read from (and format) or write to (with markup) Markdown. Ask any one of several novelists who work in Markdown; once you convert it to Word (presumably via Pandoc) and send it to your editor/publisher, it stays in Word throughout the rest of the editorial process.


You managed to find a lean way to go docx to pdf using pandoc? Last time I tried, it required a whole LaTeX stack behind it.
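
One possibly leaner route, offered as a sketch rather than a tested recommendation: pandoc's `--pdf-engine` option accepts HTML-based engines such as `weasyprint` or `wkhtmltopdf`, which sidesteps the LaTeX toolchain as long as one of them is installed. The function name `docx_to_pdf` is made up for illustration, and the result will not look like the LaTeX rendering.

```shell
# Sketch: docx -> pdf without a LaTeX install, assuming an HTML-based
# PDF engine (weasyprint by default here) is available on PATH.
# docx_to_pdf is a hypothetical helper name.
docx_to_pdf() {
    src="$1"
    engine="${2:-weasyprint}"
    pandoc "$src" --pdf-engine="$engine" --output "${src%.docx}.pdf"
}
```

Calling `docx_to_pdf report.docx wkhtmltopdf` would try the other engine instead.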


Process over Tools.


While pandoc is certainly a very useful tool, don't you think it's overkill to call a 143MB (on my Linux system) pandoc executable to do word (un)wrapping?

Emacs has `fill-paragraph` built in and `unfill-paragraph` is a short function definition [1]. Both work across multiple paragraphs.

[1] https://www.emacswiki.org/emacs/UnfillParagraph


Nice point! I agree that it's a bloated method, however, this was the best solution I could come up with at that point of time. I am certain that there are better solutions, it's just me who could not find it. I went with this solution anyways since it was still an improvement compared to my previous method in vim which involved setting the text width to a large number like 9999, highlighting the text I wanted to unwrap and then typing 'gqq' for formatting, and then yanking it. And of course, I had to re-wrap the text if I wanted to maintain the original format of the text file.

You are absolutely right and I understand what you mean though. I am open to trying other alternatives and I should try to come up with a better method to workaround this problem.

Never tried Emacs, just went with Vim so far. Did not know that Emacs had already an elegant solution for this problem. Nice! :)


My solution would be to go to the start of each paragraph and hold J (shift + j) until the entire paragraph was joined onto one line, and then go down to the next one. I guess it depends on how long your emails are but this is pretty quick.


My problem with this method is that it's not convenient, especially when there are lots of paragraphs. When you highlight all the paragraphs and press shift J, it unwraps everything into a single line. This results in different paragraphs ending up on the same line. You also need to undo the changes you made to avoid disturbing the original file content.
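
For what it's worth, here is a minimal sketch of a filter that joins the lines within each paragraph but keeps paragraphs apart, without touching the original file. It relies on awk's paragraph mode (`RS = ""`); the name `unwrap_paragraphs` is just illustrative, and this is not the pandoc-based script mentioned upthread.

```shell
# Sketch: unwrap hard-wrapped text, one output line per paragraph.
# Reads the named files, or stdin if none are given; writes to stdout.
unwrap_paragraphs() {
    awk 'BEGIN { RS = "" }                   # paragraph mode: blank lines separate records
         {
             gsub(/[ \t]*\n[ \t]*/, " ")     # join the lines inside one paragraph
             if (NR > 1) print ""            # keep one blank line between paragraphs
             print
         }' "$@"
}
```

Something like `unwrap_paragraphs draft.txt | xclip -selection clipboard` would leave `draft.txt` untouched. It joins everything between blank lines, so lists, tables and code blocks get flattened too.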


"vapJ" might do the trick for one paragraph. So maybe you can make a mapping that does vapJ and moving to the next paragraph ("]]"?) in your vimrc will let you do this with a count.


qfvapJ]]q

to record a macro that joins a single paragraph together

<N>@f

to then replay that macro <N> times


I haven't needed it myself, but it might be easier to just keep the document in line-break-per-paragraph form and just turn on visual-line-mode...


I have almost never thought about the file size of a program (even for big games) on a daily use machine.

I am certainly not gonna be learning and switching to Emacs just to perform this task for unwrapping a .docx file


It is one of the most useful programs I use and the only useful program I ever used that was written in Haskell.


> only useful program I ever used that was written in Haskell

Never used Shellcheck?


Whoa, I use Shellcheck a lot and did not even know it was written in Haskell.


xmonad team checking in.


Wasp team reporting to duty: https://github.com/wasp-lang/wasp


Don't know if you would call this a "program" but PostgREST is written in Haskell too.

https://github.com/PostgREST/postgrest


Quite a coincidence: I also stopped hard-wrapping my emails after reading that same essay, and I also use Pandoc to prepare my emails (or a program, invoked by a shortcut in vim, that sends the selected text to Pandoc).


A very interesting coincidence indeed. Nice to see that others have also thought about the same problem and have found solutions similar to mine. :)


I probably use it every day, without noticing, to view markdown docs in a terminal, via a `.lessfilter` invoking [my fork of] https://github.com/Orange-OpenSource/pandoc-terminal-writer

I also very much like Pandoc-markdown's ‘simple table’ syntax, because it's actually human-readable and human-writable without confusion and pain.


less filters are another highly underappreciated tool.

Back in the day there were a couple of utilities (I think they were "catdoc" and "wordview") which could take MS Word input and generate text-only output. Good for reading MS Word attachments in mutt or from a terminal / console / SSH session.


For those curious / on a nostalgia trip, "mswordview" was the name at the time. The project lives as the wvWare library, which reads and parses MS Word 2000, 97, 95, and 6 file formats:

<https://wvware.sourceforge.net/>

A primer on lessfilter:

<https://www.miskatonic.org/2020/06/24/lessfilter/>

And lesspipe:

<https://www-zeuthen.desy.de/~friebel/unix/lesspipe.html>


I use it for writing my thesis. Obviously it should be done in LaTeX, but I ended up using markdown + a YAML config file for some templates.


Interesting. And how do you send and read emails? Mutt?


I used to use Mutt, but now I don't because my email has its own special domain, and the email hosting service I use doesn't let me export IMAP and POP3 details unless I pay them.

Currently, I just open vim in the terminal, write what I need, then copy the text. I open dmenu using a shortcut, type something like "unwr", which is sufficient for selecting the "unwrap-clipboard" script of mine, press enter, which unwraps the text on the clipboard. Finally, I paste it in the email client.

I know it might seem a bit tricky, but it's better than what I did before. I used to set the textwidth to a really large number like 9999, highlight the text I wanted to unwrap, and then type 'gqq' for formatting. And don't forget, you also have to wrap the text back if you don't want to change the original format of the text file you wrote.

You can use soft-wrapping in vim, but I don't prefer it. Soft-wrapping is not as convenient to me as hard-wrapped text when using vim shortcuts. For instance, if you are using soft-wrapping in vim and you press 'o,' the insertion mode will start at the end of the paragraph because vim will consider the entire paragraph as a single line. However, there are many situations where I only want to insert text in the middle of a paragraph. You will also most likely set j to act like gj and k to act like gk, to make the cursor move between lines that are visually separated but actually form a single line. I don't like this either.


I had a script that piped to curl to send, Mail.app for downloading via POP and Vim to read/write, but I gave up and ended up copy/pasting as well.

Someday I may try that again, if email is still relevant by then. Vim and email would be my perfect set up.


You might find vim-anywhere what you need. Roughly speaking, in almost any text-entry field you can hit a hotkey, get a temporary macvim buffer, and when you wq the contents are pasted where you came from.

https://github.com/cknadler/vim-anywhere


> I had a script that piped to curl to send, Mail.app for downloading via POP and Vim to read/write

Wow, this is very interesting, and I might even try it at some point. It might have been a bit challenging to sync and read the emails though, but the sending part seems nice.

Was the reason you gave up related to syncing emails?


I liked that Mail.app stored every message as a single separate file, instead of the mbox that Thunderbird and others did. But at some version they started saving a hash with a multiple folder structure and that broke the setup. I had a few issues with MIME encoding as well, which were probably my fault. It was too much work and I eventually gave up, but if I find the will to do it again, I'd choose a simple POP downloader + some filter (if not in address book or previous recipients, move message to “Unknown”).

qlmanage worked great for HTML messages and attachments, though.


FYI, the maildir format (usable with mutt and numerous other email clients) also saves each email message as an individual file, and avoids mbox's notorious quoting issue.

<https://en.wikipedia.org/wiki/Maildir>


You might find the "edit in vim" extension useful.

I've got it installed on MacOS and Linux, occasionally use it for longer HN comments.

The extension automatically invokes a vim session with the contents of your current browser edit window, and reads back in the output of your vim session when you're done with it.

<https://addons.mozilla.org/en-US/firefox/addon/edit-with-vim...>


  :set tw=0 linebreak nolist
You'll get soft-wrapped text in vim, with no line limits.


I thought the solution to hardwrapping in email was to use the text/plain with format=flowed.

https://datatracker.ietf.org/doc/html/rfc3676


It _should_ be, but when I've tried to use format=flowed text still ends up displayed hard wrapped, especially on major mobile clients.


I thought writing code in vim was debatable, but emails... you went too far for me.


Why? I don't use (Neo)Vim the editor specifically for anything except code, but I do use Vim bindings pretty much everywhere where I have to write text, because it's much more convenient. If I wanted to write an email and my email client didn't support vim keys, it would make sense to just write the email in vim and copy it to the mail client.


In Vim, J on a selected paragraph does this job.


The usual challenge (and what OP said ... somewhere in this thread IIRC) is that that can be tedious to invoke on long (many paragraphs) documents.

There are recipes which will reflow paragraphs to the end of a document (or a given mark), but here one of the issues is text which shouldn't be reflowed, say, Markdown syntax for tables, lists, possibly blockquotes, code blocks, and the like.

My preferred solution is to write either single-sentence-per-line (rarely) or flowed text in vim using:

  :set tw=0 linebreak nolist wrap
Latter lets me just type paragraphs without breaking lines. Vim will flow those in the edit buffer. Other tools which can't handle hard linebreaks are much happier.


Pandoc is the FFmpeg of document conversion!

While I am very fond of both tools, Pandoc's command-line interface seamlessly integrates with my understanding and intuition, unlike FFmpeg's byzantine command line options and concepts. With FFmpeg, I frequently find myself documenting specific incantations and recipes in my notes, lest I should forget how to solve certain conversion problems. Pandoc, on the other hand, has spared me from such cognitive overload.

This should not be taken as criticism of FFmpeg though. FFmpeg is solving very complex problems too. Both Pandoc and FFmpeg are excellent tools. Both tools have saved me hundreds of hours of research, experimentation, trial and error, etc.


Funny thing, while FFmpeg's CLI options aren't super intuitive, I do find it much easier to use than any GUI video editing or conversion program I've ever found. Those things can really write the book on byzantine user interfaces. At least Google or ChatGPT can usually give me fairly clear sets of options for FFmpeg that will do what I want.


We got a new video system at work that is a bit beyond my knowledge, but the files it puts out are very big (read the Apple ProRes whitepaper to figure out which option on this system would give us the smallest video files). I've owned Apple's Compressor on my own Mac for years now and never used it, and even it's a bit less intuitive than I'd like.


I wrote pretty much all my university work in Markdown with inline Latex, using Pandoc to generate a pdf. I'm sure there are things you can do in "pure" latex that you can't do this way, but for most normal cases, it's so much easier. You just use the simple Markdown syntax for basic text formatting, and then can use the power of Latex when you need to display mathematics, graphs, tables or similar.
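
A small sketch of what that workflow can look like, assuming pandoc plus a LaTeX engine are installed. The file contents and the helper names `write_demo_notes` and `notes_to_pdf` are invented for illustration.

```shell
# Sketch: Markdown with inline and raw LaTeX, rendered to PDF by pandoc.
# Both helper names are hypothetical.
write_demo_notes() {
    cat > "$1" <<'EOF'
# Problem 1

The area of a circle is $A = \pi r^2$.

\begin{equation}
\int_0^1 x^2 \, dx = \frac{1}{3}
\end{equation}
EOF
}

notes_to_pdf() {
    # pandoc infers PDF output from the extension and calls a LaTeX engine.
    pandoc "$1" --output "${1%.md}.pdf"
}
```

Running `write_demo_notes hw1.md && notes_to_pdf hw1.md` should produce `hw1.pdf`; the dollar-sign math and the raw `equation` environment are both passed through to LaTeX.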


I do the same thing, except using org mode instead of markdown. I've really come to love org mode for writing.


Org mode is amazing. The only problem with org mode is that more people/tools aren't using org mode.


I do something similar for my notes, but instead converting to HTML! (e.g. https://cswartout.com/notes/cse422.html) I keep my notes as markdown in a repo, then "build" them onto a web server after pushing. I've found it easy to write and always accessible, which is nice.


OT, you're the second sciolist I've found online.

See: <https://www.etymonline.com/columns/post/bio>

(Bottom of page.)


I'm doing that too... did you render against a template file in the end?

I also went ahead and set up my root "notes" folder to be served with mkdocs so I can easily browse them, and just render a PDF when I must submit some file to a third party


I used a vim template plugin (I believe called vim-templates, believe it or not) to get some YAML and formatting boilerplate, and then I think I rendered against the default templates. I do remember having to adjust the defaults for some reports, though.


Me too :-) It was a steep learning curve but worth it. I used it 'within' VSC.


damn, I'll try that


Pandoc is a great CLI tool in terms of its UI and code quality. An interesting fact is that its creator is John MacFarlane, a philosopher [0].

[0]: https://en.m.wikipedia.org/wiki/John_MacFarlane_(philosopher...


And a damn fine bluegrass fiddler [0]!

[0]: https://www.whiskeybrothers.net/wb_bios.html


[flagged]


> It has 995 open issues in its Github repository.

This is not a sensible metric for code quality. For one thing, only about 20% of the currently open issues are tagged as bugs - more than that are suggested improvements.

> Haskell program was supposed to work right if it compiles, wasn't it?

No. Especially for tasks like string manipulation and format munging, you cannot capture the complexity of the domain into types.


Pandoc is one of my favourite all time tools. As the website says:

> If you need to convert files from one markup format into another, pandoc is your swiss-army knife

It also sits at the heart of Quarto[0], which adds Jupyter-like code execution (in R, Python and others) into document production. Combined with RStudio[1] as an IDE, it's my new favourite way to write anything - from static documents to full on code notebooks.

No affiliation with Posit, the company behind Quarto & RStudio. Just a happy user.

--

[0]: https://quarto.org/

[1]: https://posit.co/products/open-source/rstudio/


Codebraid[0] is another option for integrating code execution with Pandoc. I find myself using Quarto for building sites and Codebraid more for single documents. Both great tools building on Pandoc.

--

[0]: https://codebraid.org/


Why do all these tools pretend that knitr / R Markdown never existed, and that they have invented some novel concept? I looked through the docs of Quarto, Jupytext, and now Codebraid, and none of them mention prior art.

There's a long legacy here, it does nobody any good to disregard it. Maybe Knuth is well-acknowledged for his invention, but I think for instance Yihui Xie is a little under-recognized.


Is there some way we could advertise this better? The quarto homepage already says “Quarto is a multi-language, next generation version of R Markdown from Posit, with many new features and capabilities.”

When talking about Quarto within the R community we usually frame it this way, but obviously it’s not a very useful description if you’ve never heard of RMarkdown.


If that's how you frame it, then I stand corrected and I apologize for my incorrect criticism.


From what I can tell, Quarto is essentially an installer for Knitr/R that also comes with a bunch of goodies. For example, when working in VSC (or RStudio) it auto-suggests cross-references to content in the document, like figures/chapters/equations/etc. It also has a GitHub action that builds and deploys the site in like one line. Just removing the friction and lowering the bar is very helpful sometimes.


That's true, but quarto also has full support for Python and Jupyter notebooks, not to mention julia, and observable. It's really built from the ground up to be multi-language so that everyone can benefit from all the goodies in RMarkdown.


And knitr is built on the prior art foundations of Sweave....


And sweave is built on noweb :)


...which was based on CWEB/WEB and so on... ;-)

Though I guess we can stop there in this particular case, unless someone knows of an example of literate programming that precedes Knuth. I'd be interested if there are such examples...


Quarto is amazing for creating reports and programmatic slide decks!


A question to experienced Pandoc users:

I want to write a small book that I want to generate in 3 formats: HTML pages, EPUB and PDF. What is the best input format (source format) for the book? Pandoc Markdown? CommonMark? GFM?

I'm a little hesitant to committing myself to Pandoc Markdown or any Markdown because they all have tiny differences with each other. Each is like its own standard.

I considered Org-mode for some time but there are so many edge cases in which Pandoc does not parse Org-mode properly. I mean sometimes simple things like internal links are not rendered properly by Pandoc in the generated output.

So what's the best format to write the input in? Any ideas? Opinions?


FYI, I've done this myself several times, though usually working with an extant book, either from a text or OCR dump, or hand-typing it myself (don't ask).

For works which consist principally of standard sections (e.g., Book / Part / Chapter / Section / Subsection / ...), fairly standard font styles (normal/roman, italic, bold, code blocks / pre / poetry), footnotes/endnotes, and perhaps a few tables, illustrations or images, Pandoc-flavour Markdown is far more than sufficient. Writing or formatting is virtually seamless.

If you're writing something with more complex internal formatting, then I'd lean more strongly into LaTeX. You can do most of your initial authoring in Markdown and generate LaTeX from that, for further finishing work, or simply start with LaTeX. The key discriminator here would be either mathematical formulae or complex image placement. Note that creating the output you want in HTML or ePub (itself effectively a specialised HTML format) might still be challenging.

The next step up would be a specific layout tool (Krita or Adobe Illustrator, say).

But start with Markdown + Pandoc and see if you like the results. It should be Good Enough, and if not offers a smooth path to more powerful tools.


If you're familiar with markdown and your book is basically text with some images, I'd strongly recommend Pandoc Markdown.

Pandoc Markdown-as-input is probably the best-supported input format for Pandoc, as far as "reasonable defaults for outputs in other formats" is concerned, and it's broadly compatible with the norms of other markdown styles.

You can always drop into latex, include custom CSS headers, etc. It's also a format where the "formatting" won't generally get in the way of actually-writing, unlike HTML or LaTeX (speaking on behalf of mere-mortals, here).


Nice! Good luck.

I'm writing a small book. I shared my experiences with Pandoc and Asciidoctor in case it helps you or anyone:

https://adammonsen.com/post/2122/

Your use case may differ from mine (I didn't see you mention printing), but my anecdote above might help suss out tooling differences between Pandoc and Asciidoctor.

Here's an example printable book generator using Asciidoctor PDF:

https://github.com/meonkeys/print-this/


What snet0 suggested might be your best bet. You can use pandoc markdown but then slip into LaTeX if you need to do something more complicated.


If anyone else was looking for what snet0 suggested and where, here's their comment: https://news.ycombinator.com/item?id=39227851


I'd use leanpub markdown. They generate those formats plus you get a page for the book, can charge for it, and advertise it. (Some of those tasks cost money).

Won't work if you want to keep everything local, though.


I came here to say the same thing. Laying content out for a book comes with way more issues than "just" converting between document formats. Leanpub does it well out of the box.

If you want to look down the path of implementing book layout yourself, here are two breadcrumbs from my bookmarks:

https://journal.stuffwithstuff.com/2014/11/03/bringing-my-we...

https://iangmcdowell.com/blog/posts/laying-out-a-book-with-c...


For leanpub, I started using it for the layout. And kept using it for the marketing.

That's one thing I learned, having written a couple of books. Even though writing is tough, it is easier in many ways than the marketing.


Thanks! Didn't know Leanpub has its own markdown too. Yes I do want to keep everything local.


Ah, then probably not a fit.


There is no good answer. Markdown is a poor format, and there are a number of almost-compatible variations. Markdown in every variation is also somewhat limited, so on a complex project there will be things you want to do that you cannot.

However, markdown, if you stick with the subset that everyone supports, is the most widely supported alternative. If you go with a specific markdown you lose support for something else you might want. If you go for non-markdown you will lose support for most of the world.

I personally selected reStructuredText, which is really powerful for the complex documentation I'm trying to create. However, I keep running into "nothing else supports it" problems (I can extract doxygen from C++, but only with tools that don't support the latest. I haven't figured out what to do about Rust documentation).


Is there any reason for why you’re not considering AsciiDoc?


Seconding this. Asciidoc was created to be a simple mapping to DocBook - which was specifically designed for writing books. It should be able to handle everything you need, but if there is some esoteric requirement, you can write your own processor.

Then again, content is king. Write it on napkins if you must. When complete, you can spend two days to transcribe to whatever format the publisher requires.


That sounds like what I need. I have been using Latex (which I find distracting) and restructured text (which I am less familiar with at the moment) with Sphinx. How does Asciidoc compare to restructured text? Both were intended for writing documentation so are similar in capabilities?


I've written a book in asciidoc using proprietary tooling. It was a fine experience.

However, the open source tooling doesn't support what I need for a physical book, so I chose to use pandoc and markdown instead, mostly due to the market size of markdown.

(I previously wrote my own tool chain for rst to latex and epub, so I'm well aware of what features are needed to make digital and physical books. I was sick of using a format that had limited tooling while the world had moved on to markdown.)


No such reason. I'm willing to try out AsciiDoc.

I mean I did not try AsciiDoc until now because there are so many choices of input formats and the ones I've tried so far have been disappointing one way or the other.

I talked about Org-mode rendering broken in edge cases. Same with Latex too. I see Pandoc has first-class support for its own Pandoc Markdown format. But the support for all other input formats seems patchy.

If you think Pandoc has good support for AsciiDoc without any edge case issues, I'll be most certainly trying it out.


Pandoc has no support for asciidoctor as an input format — you're expected to just use asciidoctor itself to convert adoc files (and there's no reason not to). Asciidoctor can do HTML and PDF, not sure about EPUB though.


My opinions:

* write in Pandoc Markdown.

* give each sentence its own line. It helps with composing and reordering, and gives much cleaner diffs if you keep this in a git repo.

* personally, I used GitHub for html, Pandoc to make the epub, then Calibre to turn the epub into a pdf.

The "internal links" thing is a pain, admittedly. I have an idea for a workaround:

* sprinkle hidden, unique <a id="ch1.2"></a> around

* on GitHub, use links like chapter1#ch1.2

* for Pandoc, preprocess to remove the filename before the #

I'm working with a big enough book that it's an undertaking, so I haven't done this yet.
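A minimal sketch of the "remove the filename before the #" preprocessing step, in Python. It assumes inline Markdown links with relative targets and no titles; the function name and the regular expression are illustrative, not part of any existing tool.

```python
import re

# Matches the target of an inline Markdown link such as "](chapter1#ch1.2)".
# Assumption: targets contain no spaces or parentheses and carry no title.
_LINK_TARGET = re.compile(r"\]\(([^)\s#]*)#([^)\s]+)\)")


def strip_link_filenames(markdown: str) -> str:
    """Rewrite [text](file#anchor) to [text](#anchor) for single-file output."""

    def _rewrite(match: re.Match) -> str:
        path, anchor = match.group(1), match.group(2)
        if "://" in path:  # leave links to other sites untouched
            return match.group(0)
        return f"](#{anchor})"

    return _LINK_TARGET.sub(_rewrite, markdown)
```

Run over each chapter before handing the text to Pandoc, this leaves the GitHub-style links in the source files and only changes the copy that gets converted.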


Interesting idea re:internal links. For sufficiently complex issues of this nature, pandoc filters[0] are a powerful tool for this kind of mid-conversion processing. I've made some cool projects with the Python package panflute[1]

[0] https://pandoc.org/filters.html

[1] https://github.com/sergiocorreia/panflute
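For anyone who wants to see the shape of a filter without installing panflute, here is a rough standard-library sketch that does the link rewrite on Pandoc's JSON AST instead of on the Markdown text. The function names are made up for illustration; the only thing taken from Pandoc is the JSON layout, where a Link element's content is `[attr, inlines, [url, title]]`.

```python
import json
import sys


def walk(node, action):
    """Apply action to every {"t": ..., "c": ...} element in a pandoc JSON AST."""
    if isinstance(node, list):
        for item in node:
            walk(item, action)
    elif isinstance(node, dict):
        if "t" in node:
            action(node)
        for value in node.values():
            walk(value, action)


def make_link_local(elem):
    """Turn a Link target like 'chapter1#ch1.2' into '#ch1.2' (in place)."""
    if elem.get("t") == "Link":
        # Link content is [attr, inlines, [url, title]].
        target = elem["c"][2]
        url = target[0]
        if "#" in url and "://" not in url:
            target[0] = "#" + url.split("#", 1)[1]


def main():
    """Filter entry point: read the AST on stdin, write the new AST to stdout."""
    doc = json.load(sys.stdin)
    walk(doc, make_link_local)
    json.dump(doc, sys.stdout)
```

Saved as an executable script that calls `main()` at the bottom (say, a hypothetical `local_links.py`), it would be run with `pandoc --filter ./local_links.py`. panflute wraps the same AST in proper classes, which is nicer once a filter grows beyond a few lines.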


If you are committed to using Pandoc to generate the three formats then I don't see why you can't commit to Pandoc's flavor of markdown.


I'd just use Pandoc Markdown.

For simple things, you can easily write Markdown that is compatible with all dialects. Mainly, remember to indent your lists 4 spaces if you soft wrap.

But Pandoc Markdown has the most extensive support for other extensions, like footnotes, figures, etc. That's useful because it minimizes how much HTML or Latex you need to write, which in turn makes your documents more portable.

There are other formats that support more features, but in my experience the communities are smaller and the syntax is not as pretty. Ultimately you're betting on that format continuing to exist longer than Pandoc, which I think is not a great bet in most cases. The only format which I think might have better long-term support and compatibility is CommonMark, but it comes at the tradeoff of substantially fewer features. Which again means sacrificing portability because everything you can't do in the base language you need to do in HTML or Latex.


Like others above, I write long form in Word/docx and convert with pandoc. Word supports styles that help experiment document-wide with look, it has an outline mode that helps with re-organizing material, and it handles inline media gracefully. Any of that is painful in markdown. There's no way I would write a 20-500 page document without Word.

I use markdown for the 90% of my writing that is blurbs (and parse it into something like a graph/knowledge kb, and often render via pandoc to pdf, word, Anki, and html).

In both cases, I restrict myself to using features that can be parsed.


I have used markdown with custom lua filters for things like chapter delimiters, non breaking spaces and notes for inserting images. But my setup was kinda wonky as I exported from markdown to ODT and then used LibreOffice to convert it to PDF while adding images to the layout manually as this is impossible to mechanize with LibreOffice.


Depends on what features you want in your book. Out of most of the free/open source tooling, pandoc is probably the best.

I've written multiple books with it and have also used proprietary tooling of publishers. I have a few plugins I use to customize my books, but have yet to find a tool that I wouldn't need to customize.


I've translated HTML markup to Markdown, and from Markdown to LaTeX before fine-tuning the LaTeX, to produce PDFs for printing hard-cover books.

Does Markdown have any way to specify eg "begin a chapter on a new page" ? I don't think this is really a thing in Markdown or HTML but I'm admittedly a casual Pandoc user.


If you are writing a book, presumably you would have each chapter in a separate markdown file.

So, then you convert each chapter file to pdf, and then join the pdfs.
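As a possible shortcut, Pandoc also accepts several input files in one call and concatenates them, so the join step can be skipped. The sketch below only builds the command; the chapter file names and the `book.pdf` output are hypothetical, and the natural-sort helper is there so that `ch10.md` does not sort before `ch2.md`.

```python
import re


def natural_key(name: str):
    """Sort key so that 'ch2.md' comes before 'ch10.md'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def book_command(chapters, output="book.pdf"):
    """Build one pandoc call that reads every chapter in order."""
    ordered = sorted(chapters, key=natural_key)
    return ["pandoc", *ordered, "--toc", "-o", output]
```

The returned list can be passed straight to `subprocess.run(..., check=True)`.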


You can inline LaTeX chapter/section break commands even when processing Markdown, and in several ways (dropping \newpage directly in the content before chapters, using header templates, as YAML metadata in the Markdown file, even on the command line).

Google has many examples; one's here using a header file: https://medium.com/@sydasif78/book-creation-with-pandoc-and-...

More are here, including an example using the header-includes YAML metadata param: https://github.com/Wandmalfarbe/pandoc-latex-template/issues...
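To make the first option concrete, here is a small sketch that drops a raw `\newpage` before every level-1 heading except the first. It assumes chapters are written as ATX headings (`# Title`) and that code fences use three backticks; the function name is illustrative. Pandoc passes raw LaTeX like this through to LaTeX/PDF output and ignores it for other formats.

```python
def add_page_breaks(markdown: str) -> str:
    """Insert a raw LaTeX \\newpage before each level-1 heading but the first."""
    out = []
    in_fence = False
    seen_heading = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # skip "# ..." lines inside code fences
        if not in_fence and line.startswith("# "):
            if seen_heading:
                out.extend(["\\newpage", ""])
            seen_heading = True
        out.append(line)
    return "\n".join(out)
```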


I use LaTeX mostly but if it's not a very complex document I'd say markdown wins for simplicity.


I have written several books in pandoc and I love it. I make an easy script to output drafts in PDF, docx, odt, and epub with one command. I am very happy with pandoc Markdown for this.
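Such a script can be very small. A hedged sketch: the `drafts` directory and the function name are assumptions, and it relies only on Pandoc choosing the output format from the extension of the `-o` file.

```python
from pathlib import Path

# Output extensions taken from the comment above.
DRAFT_FORMATS = ("pdf", "docx", "odt", "epub")


def draft_commands(source: str, outdir: str = "drafts"):
    """Return one pandoc command (as an argument list) per draft format."""
    stem = Path(source).stem
    return [
        ["pandoc", source, "-o", str(Path(outdir) / f"{stem}.{ext}")]
        for ext in DRAFT_FORMATS
    ]
```

Looping over the result with `subprocess.run(cmd, check=True)` gives the one-command behaviour.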


Might depend on what kind of book it is - do you have a lot of images, tables, cross-references, ... or is it mostly plain text?


Mostly plain text but some images, tables and cross-references too.


Personally, whatever helps with the specific writing part of it all the most is what's best. If you find writing in a given dialect of Markdown or LaTeX or Org-mode is easiest, do that. For me, that's Markdown with embedded LaTeX, for others it's Org-mode, or RST, and so on.

Pandoc handles these fairly seamlessly, and with many options for PDF engines, though I'd say it has a preference for LaTeX and HTML in the backend and Markdown in the frontend, based on my experiences with the edge cases (sometimes entirely solvable with a little Haskell or Lua).

Since LaTeX is the default for PDFs, it pays to keep that in mind and help LaTeX help you (you can use it inline with Markdown or included as preamble in configuration), but sometimes I've just had better luck converting via HTML to PDF ("-t html -o output.pdf" or directly chaining on from output.html) for what I'm writing in the moment, though other times I'm not stressing LaTeX as much and can just go straight from Markdown to PDF (for example, just writing up something with inline maths). I prefer to avoid LaTeX or HTML's escaped character encoding and often need far more than a single Latin font can provide, so I've ended up dealing with LaTeX's limitations here (even in lualatex and xelatex) more than what I'd suspect is typical. Meanwhile, the standard HTML to PDF backend uses Qt, and I've found it works for everything else I've needed when LaTeX isn't the right backend (and it does come up). On one occasion, I did have to switch that to weasyprint, and that was everything sorted. Alternative backends are an unsung power that few have, while pandoc not only has many built-in (or it is at least internally aware of) but will also integrate with any CLI needed.

Output to all three with HTML, EPUB, and PDF can just need a bit of fiddling before it comes out right, depending on how much you're willing to mess with specific metadata for each versus accepting the limits of what Pandoc can handle universally in its AST. Invariably, some compromise is required, but the core semantics of Markdown (including extensions) almost always translate without an issue. The dialect problem of Markdown is really just in the confluence of said semantics with things that have not been separately included, such as the lack of an actual header in Markdown (Pandoc here allows YAML for some, or you just fall back to HTML).

So, tldr; there's no "best" input format, except the one that you find most comfortable to just write the book in, but I find Pandoc is usually best approached from Markdown with the LaTeX or HTML backends. It's powerful and oh so very handy, but it's not going to do all the thinking for you, just a lot of the grunt work, same as any other tool. When in doubt, the user manual is quite readable, and I've found it answered almost every question I had. When it doesn't, other people do, and when they don't, it means I'm either going about it the wrong way or I get to solve an actual problem (but usually the former). But, as always, the most important thing is actually writing it, distribution comes later, so focus your efforts on that and the tools you need to do that effectively.


> If you find writing in a given dialect of Markdown or LaTeX or Org-mode is easiest, do that.

I find Org-mode the easiest but like I said in my comment, the conversion quality is not great. Pandoc breaks a lot of stuff in Org-mode in edge cases. One example I shared in my comment was Pandoc breaking internal links.

So by selecting something I find the easiest I have burned many hours of troubleshooting figuring out why the output does not look right.

That's why I want to draw upon the wisdom of the community here to find out which input format works best and by best I mean flawlessly. No edge case issues. No rendering flaws. If I get the specific recommendations, I'll try them out for some time and then commit myself to it instead of burning more time trialling all of the different input formats.


Unfortunately, the perfect is very much the enemy of the good here. Aside from HTML, I'm afraid that PDF and EPUB are very much driven by purpose-built tools designed to show interactively what it will look like as output. This means that they've both delved into a depth of subtle semantic differences that makes flawless output an extremely difficult task. Of course, practically, pandoc can resolve the vast majority of what people actually use, but everything will still be hit by edge cases from time to time, leading to subtle issues or incompatibilities between EPUB, PDF, and HTML. Each edge case can, of course, be solved in isolation, so finding something that's solved the ones you are encountering already is the ideal, providing a seamless experience for your work. Sadly, each of those is built to solve someone else's specific work, and so sometimes we just have to accept that we either need to compromise on something, we need to paper over the gaps by combining the right tools, or we have to write something ourselves. Fortunately, it isn't the 80s anymore, so many of the tools we have are the "right" ones, and pandoc is very good at combining them.

Again, I find that Markdown (with inline LaTeX or HTML) seems to be Pandoc's preferred starting point, and that the HTML backends are quite useful (particularly when not needing full LaTeX), so perhaps there's some luck to be had there, since HTML may preserve Org's linking and such a bit better, though I don't use Org myself so can't attest to it. And if there's really a problem, then perhaps Pandoc needs some help sorting Org-mode out!


Riffing on crafting pipelines by combining tools...

Org mode can also export html and markdown, so that's three potential pandoc inputs, with potentially different properties. All of which might be massaged before input. And in extremity, an org-mode parser permits emitting customized input. Then pandoc's parsing and filters permit altering the pandoc ast in flight. And the ast isn't hard (assuming comfort with ASTs), so if some other tool has templates and output one likes, one might skip the pandoc backend and emit it oneself from pandoc ast json. Rather than hoping to persuade that other tool to both accept and generate what's needed.

So for instance, last year I had a project written in a project-specific markdown dialect, kludged to pandoc-flavored markdown, parsed with `pandoc -t json`, and html emitted custom from the pandoc ast. With embedded directives from dialect to emitter. And html templates copied from non-pandoc tools. In a language with nice pattern matching (julia's Match), the emitter was a short page of code.

"Avoid reinventing wheels, but sometimes it's easier to assemble a satisficing custom vehicle, than to find and adapt a previously-built one."
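To give a feel for how short such an emitter can be, here is a sketch in Python rather than Julia. It takes the `blocks` list from `pandoc -t json` output and handles only a handful of element types; everything else raises, which is deliberate in a sketch like this. The function names are illustrative and the HTML shape is an assumption, not what Pandoc's own writer produces.

```python
import html


def inlines_to_html(inlines):
    """Render a list of pandoc inline elements to an HTML string."""
    parts = []
    for el in inlines:
        t, c = el["t"], el.get("c")
        if t == "Str":
            parts.append(html.escape(c))
        elif t in ("Space", "SoftBreak"):
            parts.append(" ")
        elif t == "Emph":
            parts.append(f"<em>{inlines_to_html(c)}</em>")
        elif t == "Strong":
            parts.append(f"<strong>{inlines_to_html(c)}</strong>")
        elif t == "Code":
            # Code content is [attr, text].
            parts.append(f"<code>{html.escape(c[1])}</code>")
        else:
            raise ValueError(f"unhandled inline: {t}")
    return "".join(parts)


def blocks_to_html(blocks):
    """Render a list of pandoc block elements to an HTML string."""
    out = []
    for el in blocks:
        t, c = el["t"], el.get("c")
        if t == "Para":
            out.append(f"<p>{inlines_to_html(c)}</p>")
        elif t == "Header":
            # Header content is [level, attr, inlines]; attr[0] is the id.
            level, attr, inlines = c
            out.append(f'<h{level} id="{html.escape(attr[0])}">'
                       f"{inlines_to_html(inlines)}</h{level}>")
        else:
            raise ValueError(f"unhandled block: {t}")
    return "\n".join(out)
```

Feeding it `json.load(...)["blocks"]` from a `pandoc -t json` run is enough to try it out; custom directives would become extra branches.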


Great comment! Thanks for engaging in this discussion and offering some good perspective about my Pandoc issues. Really appreciate it!


The author is great - smart and responsive.

He's inspired a small number of long-time, serious contributors.

Together they maintain a super-high-traffic utility that simplifies the very arbitrary complexity of document formats for untold millions of users.

It's really a stunning example of social good.


Besides that, he's a tremendous philosophy professor. I took a class with him years ago. Besides being brilliant, I found him funny, approachable, and never condescending, which is rare for philosophy professors of his renown.


Pandoc is probably the best-maintained open-source software that I've ever worked with. We use it a lot internally at Column, and every time we've opened an issue, it has been comprehensively resolved in <24 hours. I'm a GitHub Sponsor and it's up there with coffee as the best money I spend on a monthly basis. JGM, you are a gift to everyone who spends too much time parsing Word documents.


Pandoc is amazing. I'm an architect so I use a lot of InDesign. But of course it's not good for intensively editing the text itself, it's a presentation tool. Everyone I know just uses Word and copies and pastes. The poor souls.

I learned LaTeX late in life for a few reasons:

- I publish in journals that all have different layout requirements, so reformatting to submit to more than one is a big pain.

- Using a non-WYSIWYG editor enforces clarity. It's a lot easier to see each sentence as a whole. If a sentence is longer than a single line in VSCode it should probably be more than one sentence.

- The features of an IDE allow you to see each line (sentence) in one place, and move things around more flexibly with the IDE shortcuts. You can easily rearrange the flow of a text. And of course you can make inline comments without having to clean them up before sending the doc somewhere.

- I don't have to care about things like image placement and anchoring in the text until output. This is an area where WYSIWYG editors like Word are particularly painful.

There are other reasons like equations and notations, but that's enough for now. All that said here's my workflow:

I write LaTeX in VSCode (soon switching to vim). Then, I can use Pandoc to convert to Word if I need to (it's still where most of the templates come from in the discipline). This is also helpful in working with collaborators in my area since they typically won't know LaTeX.

Here's where it gets fun - I write LaTeX, then use Pandoc to output directly to .icml (InDesign Markup). These link directly in my InDesign document, and so I can edit text where it's better and more clear to edit (IDE), then seamlessly get it into the environment where I have maximum layout control. I haven't gotten to do this so many times that I need to write a script to automate conversion as part of my tex build, but I probably will soon for fun.

Pandoc just works so well, allowing me to concentrate on making good content and not having to sweat all the annoying file conversion details. Thank you to the developers and maintainers.


While looking for Pandoc + Make for Website templates, I stumbled on the Website of Jilles van Gurp[1] (hi @jillesvangurp[2]) and I have seen it evolve over time. It is beautiful[3].

1. https://www.jillesvangurp.com

2. https://news.ycombinator.com/user?id=jillesvangurp

3. https://github.com/jillesvangurp/www.jillesvangurp.com


Worth noting that the author has also created a markup language, djot.

https://github.com/jgm/djot


Prof. MacFarlane is also one of the commonmark maintainers.


Pandoc saved my ass so many times when I worked in research. I would write a beautiful typeset paper in latex and then have to send a colleague a word doc.

You can turn any file into anything. PDF to rtf, latex to .doc, etc. It does a great job. Written in Haskell, too!


I worked in a place where I was expected to produce documents in docx format. Having spent the last few years using TeX or just writing plain markdown stuff this was quite unappealing. So I opened Word once, created a template, and from then on used markdown to write documents (or perhaps it was org-mode). I never had to open Word again apart from making sure they looked OK. My documents looked better than everyone else's.

In a way, I hope I never have to use Pandoc again, but I'd hate for it to not exist.


You are my hero.


I tried to use it to make an invoice system: wanted to convert plain-text CSV (description,amount,cost) --> to Markdown tables --> to PDF.

But I was unable to align the following 2nd table with taxes: cells are all over the place and it does whatever it wants. And there is no information online to be found about it.

(I eventually gave up long time ago and still to this day manually do them in LibreOffice Writer adding taxes with a calculator)

Except this, it's a really neat piece of FOSS software!
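One way this could be approached, sketched under a few assumptions: the CSV columns are `description,amount,cost` as above, `amount` is a quantity and `cost` a unit price, and the 20% tax rate is a placeholder. Instead of a second table, the tax lines go into the same pipe table so they share its columns and alignment.

```python
import csv
import io
from decimal import Decimal


def invoice_table(csv_text: str, tax_rate: str = "0.20") -> str:
    """Render CSV rows (description,amount,cost) as a pandoc pipe table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lines = [
        "| Description | Amount | Cost | Total |",
        "|:------------|-------:|-----:|------:|",
    ]
    subtotal = Decimal("0")
    for row in rows:
        line_total = Decimal(row["amount"]) * Decimal(row["cost"])
        subtotal += line_total
        lines.append(
            f"| {row['description']} | {row['amount']} | {row['cost']} | {line_total:.2f} |"
        )
    tax = subtotal * Decimal(tax_rate)
    # Keeping the tax lines in the same table means they share its columns.
    lines.append(f"| Subtotal | | | {subtotal:.2f} |")
    lines.append(f"| Tax | | | {tax:.2f} |")
    lines.append(f"| **Total** | | | **{subtotal + tax:.2f}** |")
    return "\n".join(lines)
```

The colons in the second header line set left or right alignment per column, and the resulting Markdown can then go through Pandoc to PDF as usual.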


Have you looked at hledger, which generates everything from plain-text accounts files?

https://hledger.org/

https://hledger.org/invoicing.html

https://plaintextaccounting.org/

One previous discussion: https://news.ycombinator.com/item?id=20012499


Maybe you could use html as an intermediate point instead of markdown. Might give you more control over the layout.

Might have to use a headless chromium wrapper (maybe pandoc has this anyway) to then get to pdf but that may not be too bad


I was thinking the same thing: I would use HTML as an intermediate, targeting PDF through weasyprint.

In fact I quite often go .md -> .html with pandoc, but write the .md in such a way that, when translated, it is the kind of html that weasyprint will be able to turn into the PDF that I want.


I tried something like that but ended up going with markdown -> html -> puppeteer to generate an A4 pdf -> ghostscript to compress it.

It’s an ugly script that’s been working quite well for more than a decade, but I wouldn’t recommend it to anyone other than myself.


I have a very similar homegrown mess. I wonder how many of us there are doing the same thing for this use case.


By looking at this thread, quite a few. The problem is creating a solution that would fit all of our idiosyncrasies.

For example, in my code, if a table has the class “total” it sums all <td>s which contains a dollar sign, and so on.
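For illustration only, here is roughly what that rule looks like with Python's standard-library HTML parser (this is a sketch, not the script described above). It assumes amounts are written like `$1,200.50` and that tables are not nested.

```python
from decimal import Decimal
from html.parser import HTMLParser


class TotalSummer(HTMLParser):
    """Sum every <td> containing a dollar sign inside <table class="total">."""

    def __init__(self):
        super().__init__()
        self.in_total_table = False
        self.in_cell = False
        self.cell_text = ""
        self.total = Decimal("0")

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            self.in_total_table = "total" in classes
        elif tag == "td" and self.in_total_table:
            self.in_cell = True
            self.cell_text = ""

    def handle_data(self, data):
        if self.in_cell:
            self.cell_text += data

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            self.in_cell = False
            if "$" in self.cell_text:
                amount = self.cell_text.replace("$", "").replace(",", "").strip()
                self.total += Decimal(amount)
        elif tag == "table":
            self.in_total_table = False


def sum_total_cells(html_text: str) -> Decimal:
    """Return the sum of all dollar cells in tables marked with class 'total'."""
    parser = TotalSummer()
    parser.feed(html_text)
    parser.close()
    return parser.total
```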


I use pandoc quite often - but I wish the intermediate, internal pandoc format was a little more expressive, exactly for things like this. I also tried making an invoice.


I just recently put together something for invoices that wound up being Jinja2 + data -> HTML-> weasyprint -> pdf. Was quite straight forward, all in all.


Every couple of years I need pandoc for some project. And each time I relearn the same idiosyncrasies. Some odd defaults, the sometimes annoying depths you have to go to to customize HTML templates, the weird filter infrastructure. What a neat strange program it is.


I have this same issue, and the same with `jq` and `GNU Parallel`.

When you need them you need them, and nothing else quite works, but I have to re-learn them every time.


I outsource most of my esoteric `jq` syntax questions to ChatGPT. It does really well with them, usually turning up solutions that I'd struggle to munge together from several different google search results.

I wonder how ChatGPT would do with focussed pandoc requests.


I'm not allowed to use chatgippity at work, sadly. But I've heard a lot of people mention this so I might try it at home and see what's what.


Same here! Haha.

Can you think of any command-line tool you might not use for 6 or 12mo but when you crack it open after a long time it is intuitive how to do what you need to?

I hypothesize muscle memory is required for efficiency at the command line.


> Can you think of any command-line tool you might not use for 6 or 12mo but when you crack it open after a long time it is intuitive how to do what you need to?

Most of them that I do use; sed, awk, even perl. I don't bust them out often, but when I do most of what I need is pretty "front of mind".

I'm not asserting it's necessarily a tool issue vs a "me" issue, but jq, parallel, and pandoc (although pandoc the least of these) just don't "click" with me, and even if I use them multiple times a month I have to go back to previous commands, --help, or man pages routinely.


Set a huge number for your shell history and dedup. You’ll effectively save every command you ever typed in chronological order. You can even append comments to the end of the command for your future self.

Then, fzf your history.
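Shells have their own settings for this (bash's `HISTCONTROL=erasedups`, for instance), but the dedup idea itself is simple enough to sketch: keep only the most recent occurrence of each command while preserving chronological order. The function below is illustrative and works on a plain list of history lines.

```python
def dedup_history(lines):
    """Keep only the most recent occurrence of each command, oldest first."""
    seen = set()
    kept = []
    for line in reversed(lines):
        command = line.strip()
        if command and command not in seen:
            seen.add(command)
            kept.append(command)
    kept.reverse()
    return kept
```

Trailing comments count as part of the command here, so the same command with a different note is kept as a separate entry.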


I do, and use fzf, and that helps, but with most tools I'm a bit OCD about understanding what I'm doing.

I also use Anki SRS flashcards and put a lot of tool usage exemplars in there; it not only helps me remember a bit of what I need to do, but if nothing else I remember THAT I put it there so I can use that to go look it up again. And this is coming from someone who grew up with and is comfortable with `--help` screens and man pages.


Which is actually quite a strong point for guis, amidst all their problems


Agreed, certainly when there's one button in a GUI that will just do the thing you need (say, spitting out a PDF version of the document you're editing) where you'd otherwise need to recall multiple command-line programs, options, and/or arguments.

However, often I'll open up a GUI I haven't used in a while and feel like I'm just as lost as I am with a command-line tool I haven't used in a while. I rely on notes I've taken and try to stick with stable software.


Absolutely love Pandoc. I used it through my undergrad to take notes for all of my courses. Markdown with inlined LaTeX just made sense.

It made university more accessible, as I get frequent hand cramps while writing notes. So I started to take them with Pandoc and added custom macros.

The best feeling was when professors would ask for a copy of my notes at the end of the term because they were formatted so well!


I love Pandoc!

I recently learned you can use Lua to write custom plugins and change some of the converting behavior. I'm using it for example to create slides similar to the "sent" program.

It helps me bootstrap new presentations and talks very quickly: https://github.com/KarimJedda/justslides


The latest website I was asked to build is just Pandoc + Make. It works super well, it's very fast and decently flexible. I even do blog-like processing by just calling Pandoc several times.
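For readers who have not used Make this way, the core of it is a staleness rule: rebuild a page only when its HTML is missing or older than its Markdown source. A sketch of that rule in Python, with a hypothetical `public` output directory and flat page names:

```python
import os


def is_stale(source: str, target: str) -> bool:
    """Make's rule: rebuild when the target is missing or older than the source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def build_plan(sources, outdir="public"):
    """Return a pandoc command for each Markdown page whose HTML is out of date."""
    plan = []
    for src in sources:
        name = os.path.splitext(os.path.basename(src))[0] + ".html"
        dst = os.path.join(outdir, name)
        if is_stale(src, dst):
            plan.append(["pandoc", "--standalone", src, "-o", dst])
    return plan
```

Make expresses the same thing as a pattern rule, which is a large part of why the combination stays so small and fast.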


I used to build my blog with Pandoc, make, and a little bit of Python/Jinja2 glue to generate indexes/tags/RSS. The amount of glue tends to get out of hand as the project's needs grow though.


Pandoc is amazing and immensely useful. Just in case you need something simpler, let me suggest Hastyscribe.

Statically compiled cross-platform program for converting markdown to self-contained portable HTML with a nice styling embedded by default. Simple, small, fast, hackable, written in Nim.

https://h3rald.com/hastyscribe/

https://github.com/h3rald/hastyscribe


I've been (slowly) writing a book for about a year and a half now probably, and Pandoc has made it so easy. I write everything in Markdown and use a seven-ish line Makefile and that's it. It generates the PDF and EPUB both extremely well and the customizability of everything in the generation process is fantastic.

I end up using it for random doc generation too in my day to day. It's just a damn fine piece of software that always works and has every feature I could dream up for my use cases.


I'm surprised to see no one has pointed out RMarkdown + RStudio[1] as one simple, low-effort way to interface with Pandoc using an IDE you may already use.

I used to write papers and slides in LaTeX (using vim, because who needs render previews), then eventually switched to Pandoc (also vim). I eventually discovered RMarkdown+RStudio. I was looking for a nice way to format a simple table and discovered that rmarkdown had nice extensions of basic markdown (this was many years ago so maybe that is incorporated into vanilla markdown/pandoc).

The RMarkdown page claims:

> R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.

...which I think is largely due to using pandoc as the core generator.

RStudio shows you the pandoc command it runs to generate your document, which I've used to figure out the pandoc command I want to run when I've switched to using pandoc directly.

This is a bit of a "lazy" way to interact with pandoc. Maybe the "laziest" aspect: when I get a new computer, I can install the entire stack by installing Rstudio, then opening a new rmarkdown document. Rstudio asks whether I'd like to install all the necessary libraries -- click "yes" and that's it. Maybe that sounds silly but it used to be a lot of work to manage your LaTeX install. These days I greatly favor things that save me time, which seems to get more precious every year.

[1] https://rmarkdown.rstudio.com


I recently switched to Pandoc to parse my Markdown documents for my blog. Up until then I used the original Perl script. It’s not slower and I get syntax highlighting for my code snippets. My only regret is not switching sooner.

Maybe one day I’ll write a blazing fast Markdown parser that does exactly what I want, full control and maximum simplicity and all that. In all probability though, I won’t. I really like what little I have seen from Pandoc.


The only issue I have with pandoc is the dependency hell it requires. It requires a lot of haskell dependencies and just eats up your storage space.


The pandoc binary I have is certainly large at 206MiB (more than I expected!), but it doesn't have any weird dependencies I can see. Just GMP, ncurses, and such. All the Haskell parts are statically linked, which is probably the reason it is so large.


Arch Linux is linking dynamically, IIRC, and there it is 'only' 64 MiB: https://archlinux.org/packages/extra/x86_64/haskell-pandoc/


It also pulls in about a hundred separate Haskell libraries along with it. Not really complaining, but it's funny that pandoc accounts for about half the programs on my laptop.


As is generally the case, there are official docker images readily available and it's a fantastically light, low-coupling way of adding conversion to a stack.

Recently rewrote a content stack to use Markdown (among other formats) for the source, the file system as the database, generating outputs (including HTML with embedded MathJax LaTeX) via pandoc, and it works absolutely brilliantly. Fully recommend.


Why do you use the docker image instead of just the normal executable?


A better question is why wouldn't I use the docker image? The docker image is an official work output of the project, handles all dependencies without messing up my target machines (I use the pandoc/extra image, which includes pretty much every ancillary need such as LaTeX), and is trivial to keep up to date and current holistically. It is by default isolated and controlled to a degree, and allows me to trivially tape things together as necessary.

The other comment nailed it pretty well, though they hedged it by citing habit (presumably to counter the weird anti-docker trend that has arisen). Dockerizing (or simply containerizing) most vendored products is a choice that is often beneficial, and the marginal overhead is a rounding error.
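
To make that concrete, here is a rough sketch of wrapping pandoc arguments in a `docker run` call, written in Python. It assumes the official image uses pandoc as its entrypoint and `/data` as its working directory; the function name and the example paths are hypothetical.

```python
import os


def docker_pandoc_command(workdir, pandoc_args, image="pandoc/extra"):
    """Wrap pandoc arguments in a `docker run` call against the official image."""
    command = ["docker", "run", "--rm", "--volume", f"{workdir}:/data"]
    if hasattr(os, "getuid"):
        # On Unix, run as the calling user so output files are not owned by root.
        command += ["--user", f"{os.getuid()}:{os.getgid()}"]
    # Everything after the image name is passed straight to pandoc.
    return command + [image] + list(pandoc_args)
```

Passing the result of `docker_pandoc_command("/home/me/docs", ["notes.md", "-o", "notes.pdf"])` to `subprocess.run` would convert a file in that directory without installing pandoc or LaTeX on the host.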


I'd reckon it's partly habit, but it is a very nice way of "installing" something easily and being able to also very easily _uninstall_ it without leaving any cruft behind, as well as having completely disparate multiple versions of things, if you need that.

For single-file binaries this is less of an issue, but even those sometimes require a dependency that something else on your system needs in a different version, causing a conflict.


What distro doesn't have pandoc in its package manager?


I was speaking more generally, not pandoc in particular.

But even with packages in a manager, there have been times when different packages conflicted over a shared dependency, a problem that a well-written docker image (which not all are) would have obviated.

As a developer I see this more with programming languages than other things, and this very thing is what led to language "package management systems" like rbenv, sdkman, asdf, and the like.


This is what I used to typeset my novel. My editor and publisher tried multiple methods and they kept being impressed by how clean and "just right" the versions I sent them were... In the end, we ended up using the PDFs generated by Pandoc instead of InDesign or whatever proprietary stuff they used.


I use it to generate PDF/EPUB versions from GitHub style markdown for my ebooks. The default output was good enough, but I wanted to customize a few things [0]. I didn't know LaTeX, but I was able to use solutions from stackoverflow sites. Later I found that some users had created templates I could've borrowed.

I use mdBook [1] for web versions though. I found the default setup much easier to use. And it came with themes (light/dark/etc) that readers can choose.

[0] https://learnbyexample.github.io/customizing-pandoc/

[1] https://github.com/rust-lang/mdBook


A few weeks ago I wrote a MediaWiki extension which uses it to convert documents to wiki articles

https://m.mediawiki.org/wiki/Extension:PandocUltimateConvert...


Shameless plug: I built a static site generator with Pandoc and shell (with RSS support!).

https://github.com/alxmrs/pandoc-website-template


I have used it to kickstart a blogging project that I wish to come back to soon. The Lua inter-op for custom readers, writers and filters is great but I wish there was more editor integration and even perhaps an official IDE/editor with built-in debugging features (probably something already do-able with Emacs but I haven't checked). The only blocker for my project is no support for "ChunkedDoc" for Lua filters [1] which forces me to write more code and a complicated Makefile.

[1]: https://github.com/jgm/pandoc/issues/9061


Here is the obligatory "Chat with Pandoc" - GPT

https://chat.openai.com/g/g-YX1CmSAA9-franz-enzenhofer-pando...

It installs pandoc and then you can interact with it via a chat interface.

Sad that PDF conversion does not work, as it would need pdflatex and I can't find a simple downloadable version (amd64 Linux) anywhere on the internet. If you have one on hand, please link it. I will upload it too, and then PDF conversion will work.


I really like using pandoc as a build system [1] for my personal website to convert .md to .html. I can use templates, automatically generate a table of contents and run some Lua scripts to get the desired result, such as clickable headers.

[1]: https://github.com/furiousteabag/asmirnov.xyz/blob/master/bu...
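
A setup like this boils down to three pandoc options: `--template`, `--toc` and `--lua-filter`. The sketch below is not the linked build script, just an illustration in Python; the function name and the file names (`page.html`, `anchors.lua`) are invented.

```python
def site_page_command(source, target, template, lua_filters=()):
    """Return the pandoc argv for one page of a Markdown-based site."""
    command = [
        "pandoc", source,
        "--from", "markdown",
        "--to", "html",
        "--standalone",
        "--toc",                  # generate a table of contents
        "--template", template,   # custom HTML template
    ]
    for lua_filter in lua_filters:
        # Filters run in the order given, e.g. one that makes headers clickable.
        command += ["--lua-filter", lua_filter]
    return command + ["--output", target]
```

For example, `site_page_command("index.md", "index.html", "page.html", ["anchors.lua"])` gives an argv you can hand to `subprocess.run`.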


Has anyone had success with running pandoc in a browser/compiling it to webassembly?

I haven't had time to look into it a lot but I think that would be amazing.


Well there's this: https://github.com/y-taka-23/wasm-pandoc I tried it some time ago and it worked quite well


I have developed a commercial data wrangling tool (Easy Data Transform) that outputs data tables to CSV, Excel, markdown, XML, JSON etc. It would be neat to support output to some of the other formats supported by Pandoc, such as DOCX, PPTX, PDF, LaTeX etc.

Has anyone tried integrating the Pandoc command line tool into a desktop product on Windows or Mac?

How big is it?

What are the licensing implications of shipping it with my (closed source) software?


(not a lawyer, not legal advice) With Pandoc being GPLv3 you can't just link it with your software. Distributing pandoc standalone (different, unmodified binary) and calling it from your software should be okay as they're technically different programs. Probably requires further investigation and maybe a lawyer though.
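
On the technical side, "calling it from your software" can be as simple as spawning the unmodified executable and piping text to it. A rough Python sketch, assuming pandoc is installed separately and found on PATH; the function names are made up, and this says nothing about the licensing question itself.

```python
import shutil
import subprocess


def pandoc_export_command(executable, target, source_format="markdown"):
    """Return the argv; pandoc infers the output format from the target's extension."""
    return [executable, "--from", source_format, "--output", target]


def export_document(text, target):
    """Send text to a separate pandoc process via stdin and let it write the file."""
    executable = shutil.which("pandoc")
    if executable is None:
        raise FileNotFoundError("pandoc not found on PATH")
    subprocess.run(
        pandoc_export_command(executable, target),
        input=text.encode("utf-8"),  # no input file given, so pandoc reads stdin
        check=True,
    )
```

With this, `export_document("# Report\n\nHello.", "report.docx")` would produce a DOCX without any pandoc code being linked into the calling program.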


I had a one-off requirement for a slide deck for something, and I really don't think in that form - so I ended up writing a detailed outline in markdown, using pandoc to turn that into pptx, and then checking that in libreoffice. Worked surprisingly well; I didn't have a style to conform to (so I don't know how well libreoffice handled those) but the defaults worked out.


Such a beautifully effective tool.

The static site generator for my personal site is just it with a thin bash script wrapper: https://git.sr.ht/~kb/open-notes/tree/main/item/.build.sh.

So much simpler than pulling a universe of node modules.


Love this, been using it for years to write markdown as a base and then transform to html or pdf using latex for maths.


Atlassian's Jira wiki format and text editing tools drive me up the wall. There's something about how Atlassian does UX that rarely agrees with me. If I write markdown in a sensible editor (Obsidian), and convert it to Jira wiki, that's a big win for me. Love projects like this.
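
In case it helps someone, the conversion is a pipe through pandoc with `jira` as the output format. A small Python sketch, assuming a pandoc version that ships the `jira` writer; the function names are hypothetical, and Obsidian-specific syntax such as wiki links is not handled.

```python
import subprocess


def jira_command():
    """Return the pandoc argv for Markdown in, Jira wiki markup out."""
    return ["pandoc", "--from", "markdown", "--to", "jira"]


def markdown_to_jira(text):
    """Pipe Markdown through pandoc and return the Jira markup it prints."""
    result = subprocess.run(
        jira_command(),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The returned string can be pasted straight into a Jira description or comment field.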


I've been using it to generate DOCX/PDF from my Markdown. I've just started to explore ways to customize the output (with a TeX template). A recently updated Obsidian plugin requires explicitly configuring pandoc with LaTeX, which started the rabbit hole of customization experiments :)


Huge fan of Pandoc. I don't use it for my personal website anymore, but I created a very crude "site generator" that piggybacks off Pandoc called pblog[0].

[0]: https://pblog.btxx.org/


Pandora is awesome. As a self-published author, this is a key tool in my toolkit to have a single source of truth and (relatively) easily create beautiful PDFs and EPUBs.

I previously used reStructuredText and had to write custom tooling, but now I can write Markdown in Jupyter.


And by "Pandora" .... I mean pandoc, and that I hate typing on my phone


I mainly use Typora for markdown editing and saving as PDF

I just checked the pandoc installation and getting-started guides. They are overwhelmingly detailed, as if I had never used a command line before. Maybe they assume many pandoc users are writers or information developers without much technical background.


I used Pandoc recently to convert a large Word document into markdown. It took a lot of babysitting and manual tweaking to get to the final result. Overall it probably still saved me time compared to doing the conversion manually, but only just. YMMV.
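
For what it's worth, a couple of options can cut down on the manual cleanup. The Python sketch below only builds the command and assumes GitHub-flavoured Markdown as the target; the function name and the `media` directory are invented. `--extract-media` saves embedded images to disk and `--wrap=none` stops pandoc from hard-wrapping paragraphs.

```python
def docx_to_markdown_command(source, target, media_dir="media"):
    """Return the pandoc argv for converting a Word document to Markdown."""
    return [
        "pandoc", source,
        "--from", "docx",
        "--to", "gfm",
        "--wrap=none",                     # keep each paragraph on one line
        f"--extract-media={media_dir}",    # write embedded images to this directory
        "--output", target,
    ]
```

Complex tables and heavily styled text will probably still need attention by hand.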


Recently I learned pandoc can act as a web server, and that the type system of Haskell itself provides a strong security guarantee that nothing will ever be written to disk, because nobody understands the IO Monad. Security through obscurity!


I use pandoc with some Make scripts to generate the epub of my novels and short stories. Having a reproducible way to iterate from source to final docs during edits and correction passes is amazing. I can't imagine doing it any other way.
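
The core of such a setup is small. Here is a sketch of the same idea in Python rather than Make, with invented function and file names: rebuild the EPUB only when a chapter is newer than the output, and pass the chapters to pandoc in order.

```python
import os


def needs_rebuild(target, sources):
    """Mimic make: rebuild when the target is missing or older than any source."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


def epub_command(chapters, target, title):
    """Return the pandoc argv that joins the chapters into one EPUB."""
    return [
        "pandoc", *chapters,   # input files are concatenated in the order given
        "--to", "epub",
        "--toc",
        "--metadata", f"title={title}",
        "--output", target,
    ]
```

A build script would call `needs_rebuild("book.epub", chapters)` and, if it returns true, run the `epub_command(...)` argv with `subprocess.run`.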


I do all my scientific writing in Markdown (SublimeText + JabRef for bibliography). In ST I have a macro that runs Pandoc to convert .md files to .odt/.odp, including images and formatted references. Wonderful program to work with.
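
A macro like that presumably wraps a command along these lines. This is a guess at the shape, not the actual macro: a Python sketch that assumes a pandoc version with the built-in `--citeproc` option and a BibTeX file exported from JabRef; all names are hypothetical.

```python
def paper_command(source, bibliography, target):
    """Return the pandoc argv for a Markdown paper with formatted references."""
    return [
        "pandoc", source,
        "--citeproc",                     # resolve [@key] citations
        "--bibliography", bibliography,   # e.g. a .bib file maintained in JabRef
        "--output", target,               # .odt extension selects the ODT writer
    ]
```

For example, `paper_command("paper.md", "refs.bib", "paper.odt")`.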


I have many books underway and I'd like to use pandoc in my automated artifact gen workflow, especially for making epub files. But I have not got it working the way I want yet.

In theory I like its design, as a CLI guy.


Quite honestly the most valuable tool I'd learned in the 2010s (having been a significant Unix / Linux user since the 1980s).

In the late 1990s / early aughts, I'd written a toolchain to generate multiple document formats from a source based on HTML fragments. With Pandoc (and usually Markdown, occasionally LaTeX), I've discarded all of that, and have greater utility (more outputs, and FWIW, more input formats as well if I choose those).


One of the best pieces of software you’ll ever use. And if you do find a bug, the maintainer will fix it within hours of your GitHub report.


In a world where most READMEs and the like are in Markdown format I like to use Asciidoc and I like to keep the two in sync using pandoc


It would be funny if one of the arrows in the visual on the side were missing. “Sorry, you can’t convert that.”


You can use Markdown with pandoc to make a PDF akin to the LaTeX ones. For simple stuff, it was nice.


I see a Haskell project, I upvote.


I love Pandoc. I wish the python library were a little more accessible.


Pandoc is amazing. I think of it kind of like ffmpeg for text files...


great analogy


the svg on the side of their page is kind of hilarious

though I've used this program too in the past to seamlessly convert html to text to easily import into a database...


Has anyone created a small, lite version of pandoc yet?


Is there a better UI for it than the usual one?


This is sick!


yes


Yes, I know.



