
Ask HN: How to archive online content? - abjKT26nO8
Online content has a habit of disappearing. Sometimes I have a link to an article from a few years back, I go to it, and there is nothing there: a 404, or someone migrated the platform to a new engine but did not bother to preserve the old stuff. Sometimes people get emotional about the content they created and take it all down. A lot of the time it is actually pretty high-quality. Sometimes YouTube takes something down for various reasons.

I would like to be able to archive things I find especially worthwhile, but the web doesn't seem to be built to make that easy. Downloading a website with the browser's built-in "Save page" function breaks it (relative links, you don't really download the whole page, etc.), which doesn't happen with Office documents, PDFs, or just about anything else that is meant to be moved around. Websites seem to be heavily glued to just one place.

To fix these problems I used to convert a bunch of articles to LaTeX, generate a PDF, and be done with it. However, frequent LaTeX editing is tiresome. Pocket seems to be an option, but sometimes it fails to archive an article and just redirects you to the original version. What's more, it is a service just like YouTube, so I really have no guarantee that the things I save there will be preserved.

Do you have any good semi-automatic methods for archiving online content?
======
galihrahayu
I'm using _Save Page WE_ for full-page saves. So far it has been able to save
a whole page to a single HTML file. It is available for Chrome and Firefox.

https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof

https://addons.mozilla.org/en-US/firefox/addon/save-page-we/
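
To give a rough idea of what "a whole page in a single HTML file" involves, here is a minimal Python sketch that rewrites each image reference as an embedded base64 data: URI. This is only an illustration and not how Save Page WE is implemented. The names `inline_images` and `fetch_over_http` are hypothetical, and a real tool would also have to handle CSS, fonts, scripts, and content loaded by JavaScript.

```python
import base64
import mimetypes
import re
import urllib.request
from typing import Callable
from urllib.parse import urljoin

# Matches src="..." or src='...' inside <img> tags. This is a simplification:
# a real tool would use an HTML parser rather than a regular expression.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def fetch_over_http(url: str) -> bytes:
    """Hypothetical helper: download the raw bytes behind a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def inline_images(html: str, base_url: str,
                  fetch: Callable[[str], bytes]) -> str:
    """Return html with every <img src> replaced by a base64 data: URI."""

    def replace(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        if src.startswith("data:"):
            return match.group(0)  # already embedded, leave untouched
        absolute = urljoin(base_url, src)  # resolve relative links
        mime = mimetypes.guess_type(absolute)[0] or "application/octet-stream"
        payload = base64.b64encode(fetch(absolute)).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(replace, html)
```

Calling `inline_images(page_html, page_url, fetch_over_http)` and writing the result to disk would give a file whose images no longer depend on the original server. Passing the fetch function in as a parameter makes it easy to swap in a cache or a fake for testing.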

------
obelix_
If the site is MediaWiki-based, check out Kiwix. They have tools to download
the whole site along with a search index, plus tools to search, reopen, and
render pages from the dump.

I suppose the Internet Archive folks have created similar tools.

But you are right, it should be much easier to archive stuff in 2018, and it
isn't, especially thanks to all the JavaScript and XHR happening.

Edit: I just took a look at Kiwix (it's been a while). They now also seem to
archive Stack Exchange sites, not just MediaWiki, so it looks like they have
different archiving tools for different sites.

