I’d love it if you gave it a spin; please let me know if you find anything nasty!
chrome --headless --disable-gpu --print-to-pdf https://www.google.com/
As soon as you need to control anything, you have to use [Puppeteer](https://developers.google.com/web/tools/puppeteer/).
Other than that, it runs the article through Readability to extract just the main content and applies customizable CSS / HTML to it.
We support 'captured' HTML pages. We fetch the full HTML of the content and store it in a PHZ file (Polar HTML archive), which is just a zip file with JSON metadata, and then save that to disk.
The Polar app is an Electron app so it has full access to render HTML.
We then inject ourselves into the network layer using protocol interceptors, and if you're loading the URL you just captured, we load the content from the PHZ instead of the network.
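A rough Python sketch of that capture-and-replay idea, for anyone who wants to see the shape of it. This is not Polar's actual code: Polar is an Electron app and hooks in at the protocol layer, whereas this only models the decision "serve from the archive if the URL matches, otherwise go to the network". The entry names `metadata.json` and `content.html`, the metadata layout, and all function names are assumptions made up for illustration; the real PHZ layout isn't described here.

```python
import io
import json
import zipfile

# Hypothetical entry names; the real PHZ layout is not described in this thread.
METADATA_NAME = "metadata.json"
CONTENT_NAME = "content.html"


def write_capture(url, html):
    """Pack captured HTML plus JSON metadata into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(METADATA_NAME, json.dumps({"url": url}))
        zf.writestr(CONTENT_NAME, html)
    return buf.getvalue()


def read_capture(archive):
    """Return (url, html) stored in an archive produced by write_capture."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        metadata = json.loads(zf.read(METADATA_NAME))
        html = zf.read(CONTENT_NAME).decode("utf-8")
    return metadata["url"], html


def load(url, archive, fetch_from_network):
    """Serve the captured copy when the URL matches, else fall back to the network."""
    captured_url, html = read_capture(archive)
    if url == captured_url:
        return html
    return fetch_from_network(url)
```

The network fetch is passed in as a plain callable, so the fallback path can be swapped for a real HTTP client or stubbed out when trying this offline.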
You can then annotate the content, take notes on it, tag it, and keep it forever without risk of it vanishing.
I use it for important documents that I can't afford to ever lose. For example, the Ethereum whitepapers are in HTML, not PDF. They're also living documents, so I can just capture them anytime I want.
HTML files don't often print properly so this way I can keep them the way they were meant to be seen.
In any case, many people have tried creating such a tool. I used to believe that such functionality should be part of the browser itself. But there's always been a disconnect between local files and the browser. Now, in the mobile world, it is even more difficult to stay in sync.
percollate pdf --output p.pdf https://github.com/danburzo/percollate
The font is gigantic and the page tiny. Barely get to the second headline on the first page.
And there is no way to tune this on the command line (yet).
It's probably a good idea not to introduce flags like `--font-size 12pt` or `--page A4`, since it leads down a rabbit hole. (Where do you stop?) Directly passing down CSS seems appropriate here.
> percollate html Not implemented yet