It would be great if the browser makers adopted something similar to SingleFile as a standard to replace MIME HTML. Though I think SingleFile should be updated a bit to take advantage of new APIs to make the output page cleaner.
With a small bit of JS at the top of the page (or provided by the browser), all the assets could be included at the bottom of the page in an organized JSON blob containing the base64-encoded files, and the rest of the page could remain untouched. The JS simply intercepts the elements as they are parsed and swaps out the src/hrefs for the appropriate data URLs below. By adding a super restrictive CSP to the top of the page, it's guaranteed to be safe and private to open. Here's a prototype:
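Something along these lines, as a rough sketch (the blob format and the "page-assets" id are made up, and it swaps everything at DOMContentLoaded rather than during parsing, for simplicity):

    <script>
    // Sketch only: once parsing finishes, rewrite src/href attributes to data:
    // URLs taken from a JSON blob at the bottom of the page, held in a
    // <script type="application/json" id="page-assets"> element that maps
    // original URLs to { "type": "image/png", "data": "<base64>" } entries.
    // A restrictive CSP in <head> keeps the original URLs from being fetched.
    document.addEventListener("DOMContentLoaded", () => {
      const blob = document.getElementById("page-assets");
      const assets = blob ? JSON.parse(blob.textContent) : {};
      for (const el of document.querySelectorAll("[src], [href]")) {
        for (const attr of ["src", "href"]) {
          const value = el.getAttribute(attr);
          const entry = value && assets[value];
          if (entry) el.setAttribute(attr, `data:${entry.type};base64,${entry.data}`);
        }
      }
    });
    </script>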
I agree that browsers should offer an API in order to get all the resources easily. It would make things much easier. However, for security reasons, this API would still be restricted to environments like Web Extensions or automation tools like Playwright. FYI, I also took another approach which might interest you, see https://github.com/gildas-lormeau/SingleFileZ. The main drawback is that the HTML produced by SingleFileZ is not valid (but tolerated by the HTML5 spec.).
It's interesting you mention Zotero; I use Zotero all the time, especially for university work. It has similar functionality in that it saves webpages. Is there any overlap with SingleFile? Or might they integrate?
That's excellent, I do use SingleFile as well, but Zotero offers the database/citation functionality on top. Brilliant extension though; after uBlock it's probably the number 2 install for me.
.webarchive is just a binary plist, only works on Apple OSes and has security issues. I posted about the various ways to encapsulate HTML last week, including SingleFile, for those who are interested.
I thought the same, until I added automatic crash reporting to my app. It was a revelation: the app that I thought was very stable actually crashed all the time! But nobody told me about it!!! Customers apparently just restarted the app and ignored the crash, despite me offering free email support!
So, I really think some telemetry is needed to make reliable software. It should be anonymous, of course, and limited to things you need to know (e.g. crashes or assertion failures), and it should not include things that don't help the customer (e.g. usage statistics for market research are not OK in my opinion).
But if you collect no data at all, you'll never know about your customers' issues.
It’s fine if it’s opt-in. Most developers simply assume consent, however, which has all of the normal problems that come along with assuming consent in any context (i.e. your assumption being wrong in a significant number of cases, and the completely avoidable consent violations that result).
Apple has had anonymous crash reports for years. Unfortunately, they are close to useless, because you get reports only for a tiny fraction of crashes. I assume it's either because nobody opts in to share analytics with 3rd party developers, or because Apple is just incompetent and their tech doesn't work.
In any case, I received only a handful of crash reports from Apple. I thought everything was okay, except that customers occasionally reported an issue, but I never got a crash report from Apple that could explain the situation. Until I built my own reporting solution, which just sends a stack trace to my server in a signal handler. I started receiving dozens of crash reports per week.
So while on the one hand I applaud Apple for trying to do the right thing, as a developer I can only say that the crash reports they share with developers are so few they are close to useless.
Thank you for the link, this is interesting. It's nice Apple is apparently building support for custom crash reporting into the OS.
From a privacy perspective it sounds like it logs exactly the same info that I currently log with PLCrashReporter, except that it only works on macOS 12.
Is there anything like this, except a CLI tool? Something like yt-dlp. I know of curl and wget, but only the basics. I tend to use qutebrowser, which isn't supported, and I don't really expect it to be supported either. Something more neutral run from the shell would be great.
A PDF doesn’t represent a responsive/resizable version of the page so it will look awkward on most screen sizes even if the original would have handled it.
A PDF doesn’t capture scrolling behavior, so a nested scrolling element will lose most of its content, and a page with a chat prompt or cookie notice might have part of its content covered.
A PDF won’t capture even simple interactive elements like image carousels, lightboxes, and collapsible sections, so content may be lost (“oops, I saved it on the second slide of the image carousel, but I really wanted the first one”).
As far as I know, a PDF won’t include embedded audio/video.
Many PDF exporters chunk the document into paper-sized pages (but, to be fair, some don’t).
Not sure if this tool nails all of those cases, but those are reasons why I’ve saved local copies of pages in the past.
The reason is the same in 2022 as it's always been and will remain - PDF is a lossy conversion - you throw away big piles of structure and metadata, often actual text, code, image resolution. Fidelity often matters a lot to digital packrats.
I'd say the difference is similar to source code vs. compiled-binary-for-a-single-target. The webpage is the source code and can be rendered for various targets if it's responsive. The PDF is just how it looks on a single target, and does not include how it behaves (animations, video, hover effects, etc). So both have benefits.
SingleFile only "compiles" the resources (stylesheets, images, fonts etc.) into an HTML file. You can always transform this HTML file into a PDF afterwards. You will get the same or even better result thanks to the prior transformation.
Could it be made possible to save images externally instead of as base64? I need a quick way to replace images. It contradicts the name, but scrolling past the base64 blobs and missing image previews in my editor sucks. Unless someone knows of pro tools that handle this.
That is the default behavior in Chromium browsers. Right-click the page > Save as > (Webpage, Complete). It gives you one HTML file and a folder containing all the images.
For a bookmarking project I'm working on, I'd like to be able to click a button "capture screen" and have the entire rendered single page sent to an s3 bucket.
The Web Extension API is required if you want to download resources blocked via the "Same-origin policy", see https://developer.mozilla.org/en-US/docs/Web/Security/Same-o.... This is the main problem and it is quite common. This could be circumvented by using a proxy, but this would introduce security issues.
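As a rough illustration (Manifest V3, hypothetical names), a background script granted broad host permissions can read a cross-origin resource that page scripts cannot:

    // background.js, sketch only. With "host_permissions": ["<all_urls>"]
    // declared in manifest.json, fetches made from the extension are not
    // blocked by the page's same-origin policy.
    async function fetchResourceAsBase64(url) {
      const response = await fetch(url);
      const bytes = new Uint8Array(await response.arrayBuffer());
      let binary = "";
      for (const b of bytes) binary += String.fromCharCode(b); // fine for small files
      return {
        type: response.headers.get("Content-Type"),
        data: btoa(binary) // base64, ready to embed as a data: URL
      };
    }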
I think this is the case in terms of longevity and convenience. Pages are saved in HTML. As long as you have software that can read this type of file, you'll be able to view your saved pages.
Just started using SingleFile — love it. Are there any other utilities/projects that you think would have such personal and broad utility that they'd have a decade+ runway?
I've been using SingleFile for Firefox for a long time. Works great, and is way better than what I used to do when traveling in 2007, which was to go to an internet cafe, and save .html files to a USB stick for offline reading when I got back home.
Feel the power of the Manifest v3 - https://news.ycombinator.com/item?id=33063619 - Oct 2022 (273 comments)
SingleFile: Save a complete web page into a single HTML file - https://news.ycombinator.com/item?id=30527999 - March 2022 (240 comments)
Show HN: SingleFile Lite, new version of SingleFile compatible with Manifest V3 - https://news.ycombinator.com/item?id=29331038 - Nov 2021 (2 comments)