Wow, it's really rare these days to see a tool that supports WARC.
Despite being an ISO standard [1] and the default archive format of the Internet Archive, and despite a handful of lovingly crafted tools (such as webrecorder [2], warcprox [3], etc.), it never seems to have caught on in a broader context.
Really a shame - I'm deeply convinced that the ability to archive and replay requests is a technique for defending and strengthening user rights.
I have taken a shine to ArchiveBox[0] - a self-hosted application that will store dumps in a variety of formats (PDF, HTML, extracted text, WARC, etc., and optionally ping the Internet Archive to take a snapshot) as well as keep a SQLite metadata archive. I have taken to using it for all blog articles I read - given how quickly the internet forgets, it is the only way I can rely upon retaining the content.
Oh, I see they recommend writing WARC files with wget using `--no-warc-digests`. You really should not do that. One, it's just a SHA-1, which is costly in neither CPU nor storage. Two, the digest is used to create revisit records for de-duplication. If you disable it, you or someone else might end up with lots of duplicate resources on re-crawling.
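To illustrate why the digest matters for de-duplication, here is a rough Python sketch. It is not how wget or any particular crawler implements it; the function names `payload_digest` and `classify` are made up for this example, and the `sha1:<base32>` label is just the form commonly seen in WARC digest headers. The idea is only that a payload whose digest was already seen can be recorded as a small revisit instead of being stored again.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Return a 'sha1:<base32>' label for a payload (hypothetical helper)."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def classify(records, seen=None):
    """Label each (url, payload) pair as 'response' or 'revisit'.

    `seen` maps digests to the first URL that produced them. Passing the
    same dict across crawls is what lets a re-crawl skip unchanged payloads.
    """
    seen = {} if seen is None else seen
    labelled = []
    for url, payload in records:
        digest = payload_digest(payload)
        if digest in seen:
            # Same bytes as an earlier capture: a revisit record would do.
            labelled.append((url, "revisit", digest))
        else:
            seen[digest] = url
            labelled.append((url, "response", digest))
    return labelled


if __name__ == "__main__":
    crawl = [
        ("https://example.org/a", b"<html>same</html>"),
        ("https://example.org/b", b"<html>same</html>"),
        ("https://example.org/c", b"<html>other</html>"),
    ]
    for url, kind, digest in classify(crawl):
        print(kind, url, digest)
```

Without digests there is nothing to key `seen` on, so every re-crawl has to store the full payload again.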
Absolutely. You can save a HAR file from devtools at least.
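For what it's worth, a HAR file is plain JSON, so it is easy to inspect after the fact. A small sketch, assuming the usual HAR 1.2 layout (`log.entries[]` with `request` and `response` objects); `summarize_har` and the file name `session.har` are made-up names for this example:

```python
import json


def summarize_har(har: dict):
    """Return (method, url, status, mime_type) for each entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        content = response.get("content", {})
        rows.append((
            request.get("method"),
            request.get("url"),
            response.get("status"),
            content.get("mimeType"),
        ))
    return rows


if __name__ == "__main__":
    # "session.har" is a placeholder for a file exported from devtools.
    with open("session.har", encoding="utf-8") as fh:
        for row in summarize_har(json.load(fh)):
            print(*row)
```

Note that a HAR export may or may not include response bodies depending on how it was saved, so it is not a drop-in replacement for a WARC.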
If you want to generate WARC from browsers, warcprox is relatively easy and fast - but setting up the proxy settings etc. is cumbersome if all you want is a single archive.
By the way, there are some great tools that use WARC under the hood such as perma [1]. They provide reliable snapshots of single documents with a stable URL.
Links:
[1] https://www.iso.org/standard/44717.html
[2] https://github.com/webrecorder/webrecorder-desktop
[3] https://github.com/internetarchive/warcprox