Firefox (Mac) is not even able to print this article to a PDF. It gives you only one page, and the page ends mid-sentence with the letters cropped in the middle. One third of the text is missing.
I can't count the number of times I had to take screenshots to save crucial information from web pages.
A tip if you're using iOS/iPadOS: when you take a screenshot in Safari, you now get a "Full Page" option, which you can save as a PDF. You get this option when you tap the tiny screenshot preview.
I had no idea. I’ve been using an app called Tailor to stitch together multiple screenshots of web pages. Until today, that is, thanks to your tip. Thank you!
>Saving pages as HTML is not ideal because a) you get an HTML file plus a folder, not very practical if you want to retrieve them later, and b) you never know how that page is going to render in future versions of your browser.
Yes, but for my use case, which is finding better means of scientific communication, PDF is not enough.
Consider, for example, slides for a presentation. The typical mathematician makes them in TeX, which outputs a PDF. Then the PDF is (sometimes) made available online. I realized that PDF slides are far inferior to HTML slides (where you can add demos and whatnot, shameless example [0]). Just put everything in a GitHub repository and anybody can take them home.
I gave up on the idea of reliably saving web pages as PDFs.
I now use "SingleFile", a Firefox and Chrome extension that saves a complete page (with CSS, images, fonts, frames, etc.) as a single HTML file.
Great job! I suggest that you go deeper into how the annotation feature works in the add-on description, as that could be useful. My main issue with this solution is whether this file format is going to be reliable in the future. As with MHTML, which was mentioned in another comment, formats that are not widely used may one day not be supported the way they are today. I'm not sure how you've achieved this, as HTML clearly cannot hold media files by itself (unless there's something I am not aware of), so it must do something at the file system level. So I wonder: is it still going to work in future versions of the operating system?
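On the question of how one HTML file can carry its own media: HTML does allow resources to be embedded inline through data: URIs, so nothing has to happen at the file system level. Below is a minimal sketch of that idea in Python. The helper names `to_data_uri` and `inline_image` are made up for illustration, the image bytes are a placeholder, and this is not necessarily how SingleFile itself is implemented.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data: URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_image(html: str, src: str, data: bytes) -> str:
    """Replace one src="..." reference with the embedded data: URI (hypothetical helper)."""
    return html.replace(f'src="{src}"', f'src="{to_data_uri(data, src)}"')


page = '<p>Receipt</p><img src="logo.png">'
fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder: only the PNG signature, not a real image
single_file = inline_image(page, "logo.png", fake_png)
print(single_file)
```

The resulting string is plain HTML that any browser understanding data: URIs can open on its own, with no companion folder. The trade-off is size: base64 makes embedded resources roughly a third larger.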
Thanks so much for your comment. I've read the original blog post (https://blog.webrecorder.io/2019/08/29/desktop-app.html) and it sounds like a very effective means of capturing web content. I was not aware there's a standard for web archiving purposes; that sounds like something that would keep being supported in the future. And from what I read, it's better than PDF as it captures interactive elements as well. I will give it a try.
I had a similar problem and wrote a browser add-on: https://2read.net/ It converts websites to a "readable" form, and if you have IPFS running, it will also "pin" the content locally. In most cases it works better than just printing an article. Here is an example with the mentioned article: https://ipfs.io/ipfs/QmYPkcXgKLBye3L8M1VJWsGAb2mJXkJSEncqcSC...
This is brilliant! I like the fact that it also cleans up the page. I don't know much about MFS, but I see there's a video linked from the add-on description, so I will take some time to listen. My main concern is which format is going to remain readable in the long term, so it should be either something widely accepted and distributed like PDF, or something that is going to be backward-compatible and run smoothly on future machines. Thanks for your answer.
This has been a problem for years, which is too bad. No one really uses MHTML or any alternative. Hopefully Web Bundles* becomes a commonly supported spec.
The problem is that technology that is not widely used, distributed, and supported is not reliable for archiving purposes. The idea is good in principle. I've just learned that there's an open format called WARC that was explicitly created for web archiving purposes. Is there any relationship with what you are describing?
Reader view is (to my understanding) a heuristic that gives you the biggest wall of text of the article. If it works, it is nice. But in these cases you can usually also copy and paste.
However, it does not work if what you want to save is not a wall of text: a table, a receipt, etc. It does not work for the Google search box, for Facebook, or for Amazon, just to give a few examples.
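To make the "biggest wall of text" idea concrete, here is a crude sketch in Python. It is only an illustration of the heuristic as described above, not the algorithm any browser's reader view actually uses; the class name `BiggestBlock` and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class BiggestBlock(HTMLParser):
    """Keep the text of the block element that directly holds the most text."""

    BLOCKS = {"p", "div", "article", "section", "td"}

    def __init__(self):
        super().__init__()
        self.stack = []  # one text buffer per currently open block element
        self.best = ""   # longest block text seen so far

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self.stack.append([])

    def handle_data(self, data):
        # Text is credited to the innermost open block only.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCKS and self.stack:
            text = " ".join("".join(self.stack.pop()).split())
            if len(text) > len(self.best):
                self.best = text


sample = (
    '<div><input name="q"><p>Search</p></div>'
    "<article><p>A long paragraph of article text that goes on for a while.</p></article>"
    "<table><tr><td>Total</td><td>$12</td></tr></table>"
)
parser = BiggestBlock()
parser.feed(sample)
print(parser.best)
```

On this sample the sketch keeps the article paragraph and silently drops both the search box and the receipt-like table, which is exactly the failure mode for pages that are not a wall of text.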