Hacker News
Show HN: ReLaXed – High-quality PDFs using web technologies (github.com)
344 points by zulko 4 months ago | 79 comments



The examples lack hyphenation, which partly explains the too-variable interword spacing. Is this because Chrome still fails to support hyphenation, unlike, for example, Firefox?

There are other subtle defects, which make these PDFs pretty good, but not high quality.

Here is a brief discussion of some of the shortcomings of web typography, and why we still need to use TeX if we want the most beautiful and easiest to read results:

https://lwn.net/Articles/662053/

All that aside, this is impressive and should be useful to many people.


I would say similarly to hyphenation is TeX's ability to place page breaks optimally. I don't believe any web technology can solve this problem at the moment.

Just printing the <p> tag, with its constraints of text layout on all layers (word, line, paragraph, page) already has a lot of details you need to get right to get naturally readable text flow before adding on all the other complexities of html. For instance, if you have a single line creep onto the next page, but you could also just move the entire paragraph to the next page and subtly adjust spacing on the first page, then that is preferable so that each paragraph resides entirely on one page. This is obviously not always possible or desirable, so it turns into a search problem with many variables that can be dynamically altered in the middle of text flowing.
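To make the single-line case concrete, here is a minimal Python sketch of the greedy, local rule that CSS `widows` and `orphans` express. The function name `split_paragraph` and its parameters are hypothetical. This is not TeX's page builder, which scores candidate breaks with penalties and can stretch or shrink vertical spacing; the sketch only decides one paragraph at a time.

```python
def split_paragraph(lines, room, orphans=2, widows=2):
    """Decide how many lines of a paragraph stay on the current page.

    lines   -- number of lines in the paragraph
    room    -- number of lines still free on the current page
    orphans -- minimum lines that must stay at the bottom of this page
    widows  -- minimum lines that must be carried to the next page
    Returns (lines_on_this_page, lines_on_next_page).
    """
    if lines <= room:
        return lines, 0                 # the whole paragraph fits
    keep = min(room, lines - widows)    # leave at least `widows` lines to carry over
    if keep < orphans:
        keep = 0                        # too few lines would stay: move the paragraph
    return keep, lines - keep
```

With the defaults, a five-line paragraph and four free lines gives `(3, 2)` instead of leaving one line alone on the next page, and a three-line paragraph with two free lines is moved whole.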

My understanding of modern CSS engines is both that a) CSS itself lacks the natural primitives to even express constraints you'd find in TeX, and also b) the concerns necessary to solve page layout to this degree fall into the type of search problem that browsers tend to try to avoid when rendering.

Of course, there's an argument to be made that if people don't realize it's missing, maybe it wasn't terribly valuable to begin with. I'd imagine for most home uses it's not very useful, but the fact that you can typeset decades-old documents at a de facto professional level, for free, OR with heavily modified engines allowing more modern practices, is really quite amazing. I hope the effort that went into formalizing "readable text" doesn't get lost as people move on from TeX--it'd be great to get some of this capacity in a browser with competing implementations; TeX is a lot to learn for most people, and it's also Turing complete, which is IMHO mostly a bad sign for accessibility.

There are also projects which attempt to render HTML to TeX, but they were frankly mostly terrible the last time I looked. I honestly wonder if it's easier these days for javascript to attempt to render the DOM to TeX and just leverage the browser as much as possible, but I'm not familiar enough with the DOM to speculate on how much this is likely to work on unaltered pages. My guess is you only get so much for free before you have to specifically consider that output scenario, just like other types of responsive layout.


> I would say similarly to hyphenation is TeX's ability to place page breaks optimally. I don't believe any web technology can solve this problem at the moment.

I wonder how hard it would be to just compile TeX to WebAssembly or something.... TeX is _ridiculously_ fast on a modern PC, hundreds of pages per second. All the libraries/macros might be a problem, but I bet you could prepare a useful-but-minimal set that would fit in a MB or two gzipped.


I've always been bemused by the major focus that we need a brand new implementation of the algorithms so that they can take advantage of modern computers. Ignoring just how bloody well it already takes advantage of faster processing power.

(I get that there are some good arguments for a more developer friendly extension language.)


I think Latexbase does something like this (they claim it works offline too, using service workers). They claim to be using Emscripten/LLVM, but it doesn't seem like they use wasm yet (probably because they didn't consider it a viable option back in 2016: https://latexbase.com/p/b0a174a5-09b3-4598-9686-3a73be2dc8e5). And while it's not open source, I still think it's a really cool proof of concept.


Hyphenation can be tuned via CSS but I have never been happy with it:

https://www.w3schools.com/cssref/css3_pr_word-break.asp

From what I remember LaTeX has better algorithms, both in how to distribute words between lines, and in knowing where in a word it is ok to cut.


https://github.com/bramstein/typeset This is a JavaScript implementation of the line-breaking algorithm used in TeX.
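The difference between filling lines greedily and choosing breaks for the paragraph as a whole can be shown with a small dynamic program. This is a minimal sketch, not the Knuth-Plass algorithm that TeX or the typeset library implement: it has no hyphenation, glue, or penalties, and simply minimizes the sum of squared trailing space over all lines except the last. The name `break_lines` is hypothetical.

```python
def break_lines(words, width):
    """Break `words` into lines of at most `width` characters,
    minimizing the sum of squared trailing space (last line is free)."""
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost of laying out words[i:]
    split = [n] * (n + 1)             # split[i]: index where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                   # offsets the space counted before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                split[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:split[i]]))
        i = split[i]
    return lines
```

For the words `aaa bb cc ddddd` and a width of 6, a greedy filler produces `aaa bb` / `cc` / `ddddd`, leaving a very short second line, whereas the sketch returns `aaa` / `bb cc` / `ddddd`.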


For anyone who wants a good read, try the TeXbook's section on hyphenation. Tons of fun that goes over some of the difficulties.

As a fun bit of trivia, think about where to hyphenate the word "record", in all its forms.



As recently as a year ago, Chrome support of automatic hyphenation only worked on some platforms. If the support is better now, I'm glad to hear it.


It is not. They have not made progress.


Maybe something like:

https://github.com/ytiurin/hyphen Franklin M. Liang's hyphenation algorithm, implemented in JavaScript.

could be integrated.
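For readers unfamiliar with how Liang-style pattern hyphenation works, here is a minimal Python sketch. Each pattern is a letter string with digits between letters; at every gap in the word the highest digit from any matching pattern wins, and an odd result allows a break. The names `parse_pattern`, `hyphenate`, and `PATTERNS` are hypothetical, and the nine patterns are a tiny illustrative subset chosen for one word, not a usable pattern file (real ones contain thousands of entries).

```python
import re

# Illustrative subset only; "." would mark a word boundary in real pattern files.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

def parse_pattern(pattern):
    """Turn 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]): one level per gap."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)     # digit sits in the gap before letters[pos]
        else:
            pos += 1
    return letters, levels

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Split `word` at gaps whose highest matching pattern level is odd."""
    parsed = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)  # scores[i]: level of the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = parsed.get(padded[start:end])
            if levels is not None:
                for k, level in enumerate(levels):
                    scores[start + k] = max(scores[start + k], level)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min letters after a break.
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:    # gap before word[i] is scores[i + 1]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces
```

With this subset, `hyphenate("hyphenation", PATTERNS)` yields the pieces `hy`, `phen`, `ation`. Words like "record", whose break depends on the part of speech, cannot be resolved by patterns alone.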


When I saw "... using web technologies" I was curious if it uses Puppeteer. package.json confirms that is indeed the case.

https://github.com/GoogleChrome/puppeteer

(I work for the Chrome DevTools team, creators of Puppeteer)


I was wondering the same as it’s a common use-case for the project I run (browserless.io). Seems to be a big demand for sane PDF rendering and generation.

Been pretty interesting seeing webtech handle these kinds of problems


Happy serendipity!


How would that compare to, say, an HTML template + wkhtmltopdf?

Also I feel like the biggest gripe with generating (long) PDFs from HTML are things such as page numbering, orphans and widows, semantically correct word-wrapping, page margins, etc...

Chrome does a decent job but is nowhere close to what LaTeX can do.


I have gathered some comparisons in this wiki page:

https://github.com/RelaxedJS/ReLaXed/wiki/ReLaXed-vs-other-s...

It is open to contributions, so any thoughts welcome. In a nutshell, all your points are valid. Chrome is one of the best browsers, but still behind LaTeX in some aspects. But which will evolve faster in the future ?


> But which will evolve faster in the future ?

Are you comparing Chrome and LaTeX? Chrome is certainly evolving faster overall, but features related to PDF printing have not changed much, if at all.


I'd suggest there's quite a lot more inertia to overcome with Chrome (or any HTML/CSS-based tech) to get improvements in print features than with a non-web tech... But hey, no reason not to try :)


CSS paged media supports page numbering, widow and orphan control and page margins.

https://developer.mozilla.org/en-US/docs/Web/CSS/Paged_Media


> Also I feel like the biggest gripe with generating (long) PDFs from HTML are things such as page numbering, orphans and widows, semantically correct word-wrapping, page margins, etc...

a blog about this issue: http://www.pagedmedia.org


Oh, how much I would love to have a good way to generate print-quality PDFs. The real problem is not hyphenation but how lines are composed. If you want even lines in justified (block-set) type, there are probably only Adobe InDesign and LaTeX; everything else uses a "single-line composer". I don't know the algorithm, but LaTeX and InDesign are the only ones that take multiple lines into consideration. LaTeX is sort of okay, but the algorithm in InDesign is still highly superior. I suspect that is some Adobe secret sauce. Pity, because you can't run InDesign on a server; you have to have it open and use "ExtendScript", their version of old ECMAScript 3 :(


Adobe's secret sauce is largely implemented in the microtype package in LaTeX world (character protrusion for optical margin alignment and font expansion for more even interword spacing and less hyphenation). Also the technology didn't originate at Adobe; Adobe purchased the technology from URW who developed the hz-program that was the real pioneer for those micro-typographic adjustments.


The microtype package is pretty great. I want LaTeX to work for me. Maybe I am missing something, but it has always bothered me that I have to mix content with styling. I guess it is great when you are writing something of your own and you work on styling and content at the same time. Most of my use cases are unfortunately feeding in some text input and spitting out a PDF.


Have you looked at Prince [1]? It's commercial, but highly regarded.

The coolest project I've seen with it is OMA (Rem Koolhaas' architecture firm), which uses it to print internal, very professional-looking booklets automatically generated from data, text and photos stored in Sanity [2]. (The Sanity team also built the system to make the booklets.)

[1] https://www.princexml.com

[2] https://www.sanity.io/docs/introduction/what-the-headless


I know about it, but it is pretty expensive, so I never gave it any thought. I am not sure if I can test it somehow.

It is probably a no-brainer if you are generating PDFs all the time, but I would have to use it on multiple projects to make it financially viable. Funnily enough, right now I am working on an archive for an architecture company. But that's like 100 PDFs.


It's not cheap, but if you run it on a single server, you can use it for as many projects as you like.

Apparently someone made a hosted SaaS version, though, which might be affordable if you don't produce a lot of pages: https://docraptor.com/signup.


Are you sure that it is Prince powering this?


Yes. It's in the documentation.


This thesis has a good discussion of the issue. Pages 15-16 discuss Adobe’s secret sauce (though it is secret). https://www.tug.org/TUGboat/tb21-4/tb69thanh.pdf


Seems kind of neat. But for my purposes I will still use Markdown to PDF using pandoc etc.

What really upsets me... the typography still looks shit compared to LaTeX... MS Word / LibreOffice can do better. Would rather stick with plaintext again.


FOP is the only TeX alternative that can get close to it on basic typography in a FOSS implementation. I had a toolchain that ran ReST -> Docbook -> XSL-FOP -> PDF but the hard drive it was on bit the dust and I haven't gotten around to recreating it. Still much more pleasant than wrestling with LaTeX's rigid predetermined layouts. The result was nice and didn't have the crusty PDF LaTeX appearance.


Well, there are a bunch of other engines out there, but it just so happens they all eat TeX or something very close to it (ConTeXt).

Five or six years or so ago I used reportlab (in Python) to generate some PDF reports (using the flowables API); it does kinda work, but layout is more complicated than in TeX and output quality is several notches down.

As far as appearance goes, you can make TeX look like almost anything, even with fairly low effort.


How does this improve upon Pandoc?

https://i.imgur.com/tMkMjNV.png

In the image, ConTeXt generates PDFs. The EA box represents HTML documentation exported from Enterprise Architect, but could be any structured document that pandoc can parse. The source repository contains various themes for the final PDF.

Using ConTeXt offers several compelling features, such as citations, cross-references, and the ability to produce EPUBs.


Pandoc ultimately either has to move the HTML through another markup format such as LaTeX, or use a plugin that attempts to convert HTML4 to PDF code.

This uses a full browser rendering engine that supports modern html5/css3/js by ultimately running a headless browser.

I suspect pandoc is still a great approach for a lot of cases. Running a headless browser isn't cheap, especially at scale. If your output is a simple book or an invoice, pandoc is probably the way to go. If you want to pdf websites or dump an html file with charts into a pdf, use this.


I'm not sure I understand the image, but to my knowledge you can't just do any html -> pdf with pandoc.


I have no idea how well it works, but pandoc supports html as an input format (if I understand right, it supports most/all input->output pairs).


I believe they generate LaTeX from HTML and PDF from LaTeX.


Would you happen to know the origin of this diagram? I like the font and overall style.


This is neat, but perhaps switching the final typesetting engine from Chromium's PDF printer to LaTeX (via Pandoc maybe) would make it more useful. You'd get more control over things like page numbering and TOCs, plus good justification/microtypography, which is important to most publishers.


Related, why doesn't anyone ever mention [Apache FOP](https://xmlgraphics.apache.org/fop/) for this kind of thing? I've had great success with it.


Waiting to see an example with footnotes and auto references ;)


That will probably be difficult because Chrome just "prints" a PDF. Therefore headers, footers, footnotes, and page numbering are difficult issues to solve.


FYI, Puppeteer does indeed support header and footer templates when printing to PDF:

https://github.com/GoogleChrome/puppeteer/blob/v1.3.0/docs/a...


I am thinking about it and there may be a way to do it using Pug mixins (like LaTeX macros).

Also, ReLaXed supports Markdown-it, which in turn has plug-ins for footnotes and citations, for instance. Not sure what you mean by auto-reference, but that should be possible, like in any other HTML page, shouldn't it?


Not OP, but one of the most useful things about wkhtmltopdf is its ability to create an index page (edit: TOC) with references to all heading tags (<h1>,...). Is this something that can be done with ReLaXed?


Your PDF will automatically reference the h1, h2, etc., as sections and subsections (i.e. you should see them in the sidebar's document tree in PDF viewers that have such a feature). You can also certainly generate a TOC using some JavaScript framework. From a quick Google search I found this:

https://css-tricks.com/automatic-table-of-contents/
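The same idea can also be applied at build time rather than in the browser. Here is a minimal sketch using only Python's standard library, which collects `h1` to `h3` headings from an HTML string and returns an indented outline. `HeadingCollector` and `build_toc` are hypothetical names and not part of ReLaXed or wkhtmltopdf; page numbers are out of scope, since they are only known after layout.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1, h2 and h3 elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None    # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)   # also picks up text of nested inline tags

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

def build_toc(html):
    """Return a plain-text outline, indented two spaces per heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    return "\n".join("  " * (level - 1) + text
                     for level, text in collector.headings)
```

The outline could then be rendered back into the document as a nested list with links to heading anchors before the page is printed to PDF.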


This looks nice - as a regular latex user, I'd say it (latex) sits roughly between excruciating agony and the actual worst thing in the world.

So the beginnings of an alternative look great!


How do we pronounce this? Re-LACKED?


Probably. People are abusing the letter X (do they know that it is not pronounced anymore as "ki" in modern greek?).


> Do they know that it is not pronounced anymore as "ki" in modern greek?

They may know that, but it is irrelevant. English lacks the phoneme /x/ and the usual substitution in assimilating foreign words (or the letter-play that Donald Knuth started with TeX) is the closest unvoiced velar that English has: /k/. See how most English speakers pronounce the name of J. S. Bach as [bak], with only a small number of pedants saying [bax]. Or, outside of Scotland, Loch Ness is usually [lɔk], not [lɔx].


All I want is a system that gets the basics right and is version-controllable in git (plain text source code). LaTeX is just ridiculously complex and inconsistent. Even after years of using it, I have to google how to do most things every time. I would prefer a simple PDF generator that uses Pug/HTML (which I know by heart) any day.


https://github.com/bramstein/typeset

This is a JavaScript implementation of the line-breaking algorithm used in TeX. It would be a nice addition for getting better typographic results with justified text.


Looks like the perfect solution to my resume. The latest iteration is in HTML/CSS, because it allowed me to easily get the exact layout I wanted (so painful in LaTeX...), but getting a consistent PDF was a challenge.


I produce all my PDFs with pandoc's markdown and inline HTML: letters, slides and papers with citations. Depending on whether I need MathJax I use wkhtmltopdf or Chromium (JS-based hyphens with Hyphenopoly) or just http://weasyprint.org/ if no JS is involved.

This pug language seems to be a good alternative to intermixed markdown+html.


I find Markdown most natural for writing because I do not have to worry about formatting or syntax.

Currently I deliver ~2 PDF reports per week using Ulysses or MacDown for content creation (distraction-free writing), and then typesetting everything into InDesign.

Thank you for creating this tool, I will try it next week.

The ability to render Markdown to Pug as an "Import Markdown" feature would be key for many people to adopt this.


Inline markdown and external markdown files are both supported. Have a look at the "Book" example. Every chapter is in its own Markdown file. Most of the other examples have parts where I simply switch to markdown.

I am also a big markdown user and I have found that for writing reports all day long markdown clearly wins over Pug, in particular with tools like

https://atom.io/packages/markdown-preview-enhanced

But the day where you need to produce a super-nice report with a bit of custom layout, Pug/SCSS is awesome.


Completely missed that. Thanks, will try.


Really beautiful stuff!

I'm in the process of launching BreezyPDF.com, which can generate equally wonderful PDFs from the HTML/JS/CSS you're already using.

Here's a demo of turning a complex dashboard into a PDF: https://ruby.demo.breezypdf.com


How does this compare with Prince?


Prince is a $3800 piece of software. Prince seems to encourage XML/HTML/CSS for writing documents, and I didn't like this. With ReLaXed I am trying to show that Pug/SCSS makes document writing much more natural.

Where Prince wins is in its support for CSS @page extensions (having pages with different margins etc.); it looks much better suited to professional publishing. There are certainly many more advantages related to typography, but I don't know them.

Link to Prince:

https://www.princexml.com


Looks like it uses headless Chrome for HTML to PDF conversion, so it's not going to support advanced print CSS like Prince.

The real issue is that Prince is the only browser that supports full print CSS; none of the major browsers seem to care about better print output anymore.


I used to use Prince in a few print-heavy applications. Eventually I moved to a LaTeX based workflow instead (we needed some features that Prince couldn't be expected to deliver). But Prince was a real pleasure to work with, and the output quality was very good. If LaTeX weren't a viable option in my project then I would use it again.

Interesting random factoid: Prince is (was?) written in a language called Mercury [1], which is kind of a statically typed Prolog. Research into Prince turned me on to state-of-the-art logic programming, so I'm thankful for that as well. :)

[1] http://www.mercurylang.org/


For one, everything here appears to be free and open source. Prince is pretty costly to run as a small endeavor and last I knew their pricing model wasn't very kind to horizontal scaling.


DocRaptor is PrinceXML as a service with reasonable pricing.

https://docraptor.com


DocRaptor is great. We used it for generating invoices for several years until our volume justified buying Prince and running it locally.


Check out https://breezypdf.com, which is reasonably priced with horizontal scaling.


Not sure if this is related to the format of the PDF somehow, but my computer completely froze when trying to open the Alice pdf in the GitHub viewer. This is on Safari, Chrome was fine.


Upon further inspection, the GitHub renderer works fine on much larger PDFs [1], and the native Safari PDF viewer opens these PDFs fine. I suspect there is something the GitHub renderer, your PDF generator, and Safari's JS engine disagree on.

[1]: https://github.com/mynane/PDF/blob/master/Docker%20——%20从入门到...


Interesting project! Can this be used to potentially include JavaScript in resulting PDF files and use them to animate images in the PDF?

This would be useful for some presentations I think.


You can include Javascript in the pages, but it won't be animated in the end.

Maybe you would be interested by this project to make slideshows with Pug/SCSS/Vue.JS. There you can make plenty of animations:

https://zulko.github.io/eaglejs-demo/#/


Thank you! That is a brilliant suggestion indeed.


Given that some of the use cases are to print books, and that this internally uses HTML and CSS, would there be an officially supported way to publish EPUBs?


ReLaXed really focuses on PDFs right now (to keep the initial focus small), but it produces an HTML file as a byproduct. From what I understand, it is not far from an HTML file to an EPUB.

That being said, the primary goal of this library is to make it possible to create documents with complex or fancy layouts. EPUBs generally have a simple structure (chapter/section/paragraph) and can be written using, for instance, Markdown:

https://pandoc.org/epub.html


Very cool. Could standard LaTeX math mode be used here?


MathJax notation is supported:

https://www.mathjax.org/

See some basic use in ReLaXed in the "paper" or "slideshow" examples, or here for basic documentation:

https://github.com/RelaxedJS/ReLaXed/wiki/Features#equations


Looks awesome. Can it use CMYK colors? That is a killer feature for print centers.


Does it support things like mathjax? This is a really important question to me.


According to this example in their repo, it does indeed support mathjax: https://github.com/RelaxedJS/ReLaXed-examples/blob/master/ex...



