It would be great if the browser makers adopted something similar to SingleFile as a standard to replace MIME HTML. Though I think SingleFile should be updated a bit to take advantage of new APIs to make the output page cleaner.
With a small bit of JS at the top of the page (or provided by the browser), all the assets could be included at the bottom of the page in an organized JSON blob containing the base64-encoded files, and the rest of the page could remain untouched. The JS simply intercepts the elements as they are parsed and swaps out the src/hrefs for the appropriate data URLs below. By adding a super restrictive CSP to the top of the page, it's guaranteed to be safe and private to open. Here's a prototype:
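Something along these lines, as a rough sketch (the blob format and the "page-assets" id are made up, and it swaps everything at DOMContentLoaded rather than during parsing, for simplicity):

    <script>
    // Sketch only: once parsing finishes, rewrite src/href attributes to data:
    // URLs taken from a JSON blob at the bottom of the page, held in a
    // <script type="application/json" id="page-assets"> element that maps
    // original URLs to { "type": "image/png", "data": "<base64>" } entries.
    // A restrictive CSP in <head> keeps the original URLs from being fetched.
    document.addEventListener("DOMContentLoaded", () => {
      const blob = document.getElementById("page-assets");
      const assets = blob ? JSON.parse(blob.textContent) : {};
      for (const el of document.querySelectorAll("[src], [href]")) {
        for (const attr of ["src", "href"]) {
          const value = el.getAttribute(attr);
          const entry = value && assets[value];
          if (entry) el.setAttribute(attr, `data:${entry.type};base64,${entry.data}`);
        }
      }
    });
    </script>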
I agree that browsers should offer an API in order to get all the resources easily. It would make things much easier. However, for security reasons, this API would still be restricted to environments like Web Extensions or automation tools like Playwright. FYI, I also took another approach which might interest you, see https://github.com/gildas-lormeau/SingleFileZ. The main drawback is that the HTML produced by SingleFileZ is not valid (but tolerated by the HTML5 spec.).
It's interesting you mention Zotero; I use Zotero all the time, especially for university work. It has similar functionality in that it saves webpages. Is there any overlap with SingleFile? Or might they integrate?
That's excellent, I do use SingleFile as well, but Zotero offers the database/citation functionality on top. Brilliant extension though; after uBlock it's probably the number 2 install for me.
.webarchive is just a binary plist, only works on Apple OSes and has security issues. I posted about the various ways to encapsulate HTML last week, including SingleFile, for those who are interested.
I thought the same, until I added automatic crash reporting to my app. It was a revelation: the app that I thought was very stable actually crashed all the time! But nobody told me about it!!! Customers apparently just restarted the app and ignored the crash, despite me offering free email support!
So, I really think some telemetry is needed to make reliable software. It should be anonymous, of course, and limited to things you need to know (e.g. crashes or assertion failures), and it should not include things that don't help the customer (e.g. usage statistics for market research are not OK in my opinion).
But if you collect no data at all, you'll never know about your customers' issues.
It’s fine if it’s opt-in. Most developers simply assume consent, however, which has all of the normal problems that come along with assuming consent in any context (i.e. your assumption being wrong in a significant number of cases, and the completely avoidable consent violations that result).
Apple has had anonymous crash reports for years. Unfortunately, they are close to useless, because you get reports only for a tiny fraction of crashes. I assume it's either because nobody opts in to share analytics with 3rd party developers, or because Apple is just incompetent and their tech doesn't work.
In any case, I received only a handful of crash reports from Apple. I thought everything was okay, except that customers occasionally reported an issue, but I never got a crash report from Apple that could explain the situation. Until I built my own reporting solution, which just sends a stack trace to my server in a signal handler. I started receiving dozens of crash reports per week.
So while on the one hand I applaud Apple for trying to do the right thing, as a developer I can only say that the crash reports they share with developers are so few they are close to useless.
Thank you for the link, this is interesting. It's nice Apple is apparently building support for custom crash reporting into the OS.
From a privacy perspective it sounds like it logs exactly the same info that I currently log with PLCrashReporter, except that it only works on macOS 12.
Is there anything like this, except a CLI tool? Something like yt-dlp. I know of curl and wget, but only the basics. I tend to use qutebrowser, which isn't supported, and I don't really expect it to be supported either. Something more neutral run from the shell would be great.
A PDF doesn’t represent a responsive/resizable version of the page so it will look awkward on most screen sizes even if the original would have handled it.
A PDF doesn’t capture scrolling behavior, so a nested scrolling element will lose most of its content, and a page with a chat prompt or cookie notice might have part of its content covered.
A PDF won’t capture even simple interactive elements like image carousels, lightboxes, and collapsible sections, so content may be lost (“oops, I saved it on the second slide of the image carousel, but I really wanted the first one”).
As far as I know, a PDF won’t include embedded audio/video.
Many PDF exporters chunk the document into paper-sized pages (but, to be fair, some don’t).
Not sure if this tool nails all of those cases, but those are reasons why I’ve saved local copies of pages in the past.
The reason is the same in 2022 as it's always been and will remain - PDF is a lossy conversion - you throw away big piles of structure and metadata, often actual text, code, image resolution. Fidelity often matters a lot to digital packrats.
I'd say the difference is similar to source code vs. compiled-binary-for-a-single-target. The webpage is the source code and can be rendered for various targets if it's responsive. The PDF is just how it looks on a single target, and does not include how it behaves (animations, video, hover effects, etc). So both have benefits.
SingleFile only "compiles" the resources (stylesheets, images, fonts etc.) into an HTML file. You can always transform this HTML file into a PDF afterwards. You will get the same or even better result thanks to the prior transformation.
Could it be made possible to save images externally instead of as base64? I need a quick way to replace images. It contradicts the name, but scrolling past the base64 blobs and missing image previews in my editor sucks. Unless someone knows of pro tools that handle this.
That is the default behavior in Chromium browsers. Right-click the page > Save as > (Webpage, Complete). It gives you one HTML file and a folder containing all the images.
For a bookmarking project I'm working on, I'd like to be able to click a button "capture screen" and have the entire rendered single page sent to an s3 bucket.
The Web Extension API is required if you want to download resources blocked via the "Same-origin policy", see https://developer.mozilla.org/en-US/docs/Web/Security/Same-o.... This is the main problem and it is quite common. This could be circumvented by using a proxy, but this would introduce security issues.
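As a rough illustration (Manifest V3, hypothetical names), a background script granted broad host permissions can read a cross-origin resource that page scripts cannot:

    // background.js, sketch only. With "host_permissions": ["<all_urls>"]
    // declared in manifest.json, fetches made from the extension are not
    // blocked by the page's same-origin policy.
    async function fetchResourceAsBase64(url) {
      const response = await fetch(url);
      const bytes = new Uint8Array(await response.arrayBuffer());
      let binary = "";
      for (const b of bytes) binary += String.fromCharCode(b); // fine for small files
      return {
        type: response.headers.get("Content-Type"),
        data: btoa(binary) // base64, ready to embed as a data: URL
      };
    }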
I think this is the case in terms of longevity and convenience. Pages are saved in HTML. As long as you have software that can read this type of file, you'll be able to view your saved pages.
Just started using SingleFile — love it. Are there any other utilities/projects that you think would have such personal and broad utility that they'd have a decade+ runway?
I've been using SingleFile for Firefox for a long time. Works great, and is way better than what I used to do when traveling in 2007, which was to go to an internet cafe, and save .html files to a USB stick for offline reading when I got back home.
Feel the power of the Manifest v3 - https://news.ycombinator.com/item?id=33063619 - Oct 2022 (273 comments)
SingleFile: Save a complete web page into a single HTML file - https://news.ycombinator.com/item?id=30527999 - March 2022 (240 comments)
Show HN: SingleFile Lite, new version of SingleFile compatible with Manifest V3 - https://news.ycombinator.com/item?id=29331038 - Nov 2021 (2 comments)