Diving deeper into custom PDF and ePub generation (nibblestew.blogspot.com)
50 points by ingve on Sept 20, 2022 | 10 comments



I think there may be a market for a typesetting language that targets PostScript directly, instead of LaTeX + dozens of packages -> PDF or HTML + CSS + forced pagination -> PDF. It would basically be a high-level language that compiles to PostScript and perhaps Word, DVI and even HTML. I say there is a market for it because I imagine many companies use a collection of hacked-together scripts, perhaps involving Word mail merge, to make their invoices and letters. Unless there is an industry standard I'm not aware of. LaTeX I feel is "just enough" for whitepapers but breaks down for almost anything past that. Document generation, for example, is impossible without a templating engine. There's nothing wrong with that specifically, but it's limited in many ways that add up to an annoying experience. I've also used HTML/CSS -> PDF, but there is no pagination by default. We need a real typesetting language.


great developers want to re-make tools + tools never pay well in the market => zealots and dreamers get glued to tedious toolchain managers for a short time, then retreat.. repeat

markets seek to reward initiative, but do they reward great initiative greatly? no, so the cycle will always repeat

source: I have a 3-ring binder from Adobe Systems that I picked up from their Embarcadero building, pre-Red Book.


>source: I have a 3-ring binder from Adobe Systems that I picked up from their Embarcadero building, pre-Red Book.

could you expound on what this means and how it functions as a source?


I can really recommend weasyprint for (markdown→)html→pdf generation: https://github.com/Kozea/WeasyPrint

Things like columns, footers/headers, hyphenation are all easily done with modern CSS.
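To make that concrete, here is a rough sketch of what the CSS side can look like. The stylesheet values (A4, 2cm margins, two columns, the header text) and the helper names `build_html` and `render_pdf` are just my illustration, not anything from the article; the only WeasyPrint call used is `HTML(string=...).write_pdf(...)`.

```python
from html import escape

# Illustrative print stylesheet; page size, margins and header text are arbitrary.
PRINT_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @top-center { content: "Sample document"; }
  @bottom-center { content: counter(page) " / " counter(pages); }
}
article {
  column-count: 2;
  column-gap: 1cm;
  hyphens: auto;        /* needs a lang attribute on <html> to pick a dictionary */
  text-align: justify;
}
"""


def build_html(title: str, paragraphs: list[str], lang: str = "en") -> str:
    """Wrap plain-text paragraphs in a minimal HTML document carrying the print CSS."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        f'<!DOCTYPE html>\n<html lang="{escape(lang)}">\n<head>\n'
        f'<meta charset="utf-8">\n<title>{escape(title)}</title>\n'
        f"<style>{PRINT_CSS}</style>\n</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n<article>\n{body}\n</article>\n"
        f"</body>\n</html>\n"
    )


def render_pdf(html: str, target: str) -> None:
    """Render an HTML string to a PDF file with WeasyPrint."""
    # Imported lazily so build_html() works without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=html).write_pdf(target)
```

Usage would be something like `render_pdf(build_html("Invoice", ["First paragraph."]), "out.pdf")`. If you start from Markdown, convert it to HTML with whatever Markdown library you prefer and drop the result into the body instead of the escaped paragraphs.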


+1 for weasyprint, only library I’ve found that respects all modern CSS and importing fonts


I used a tool called AsciiDoctor (http://www.asciidoctor.org), which uses a markup called AsciiDoc. It's not Markdown, but similar. It works quite well, but it was still missing some features, like forced page breaks, when I last used it about a year ago.

Edit: corrected tld


The correct TLD for AsciiDoctor is .org, not .com: https://asciidoctor.org/


Not related to this article per se, but I wonder if there is a tool (or a market) for "validating" PDFs. Specifically, I'm thinking of a tool that checks that particular fields in a PDF form have been filled in, sort of like "required" fields that must be completed for back-office processing. I'm thinking of a recent experience where I had to send a PDF back and forth via email a couple of times because some fields were missed each time. Perhaps an automated check would've caught this and saved everyone some time.

Might even make for a nice weekend project
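For the weekend-project version, a minimal sketch might look like the following. It assumes the form is an AcroForm whose fields carry the standard `/Ff` flags (bit position 2 means "Required") and `/V` values, and that pypdf's `PdfReader.get_fields()` hands back dict-like objects with those keys. The function names are made up for illustration, and it does not handle unchecked checkboxes (whose value is `/Off`) or flags inherited from a parent field.

```python
REQUIRED_FLAG = 1 << 1  # bit position 2 of the /Ff field flags marks a field as Required


def missing_required(fields: dict) -> list[str]:
    """Return the names of fields flagged Required whose value is absent or blank."""
    missing = []
    for name, field in fields.items():
        flags = int(field.get("/Ff", 0) or 0)
        value = field.get("/V")
        is_empty = value is None or str(value).strip() == ""
        if flags & REQUIRED_FLAG and is_empty:
            missing.append(name)
    return sorted(missing)


def load_fields(path: str) -> dict:
    """Read form fields from a PDF with pypdf; returns {} if the file has no form."""
    # Imported lazily so missing_required() can be used and tested without pypdf.
    from pypdf import PdfReader

    return PdfReader(path).get_fields() or {}
```

Something like `missing_required(load_fields("form.pdf"))` would then give you the list of fields to bounce back to the sender before a human ever opens the file.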


One quote from the article that I disagree with:

"This is "Markdown-like" but specifically not Markdown because novel typesetting has requirements that can't easily be retrofit in Markdown."

Any HTML can be included with Markdown. Why not use HTML for specific typesetting use cases?


This is a lot like feeding JSON to a YAML parser. Barring the known edges, it works, but if that's normal, why the YAML parser?

Inlining a bit of HTML into Markdown is a good way to inject some forms Markdown doesn't support, but there's some tipping point where the Markdown is just in the way.



