
ArchiveBox: The open-source self-hosted web archive - yarapavan
https://archivebox.io/
======
yarapavan
ArchiveBox takes a list of website URLs you want to archive, and creates a
local, static, browsable HTML clone of the content from those websites (it
saves HTML, JS, media files, PDFs, images and more).

You can use it to preserve access to websites you care about by storing them
locally offline. ArchiveBox imports lists of URLs, renders the pages in a
headless, authenticated, user-scriptable browser, and then archives the content
in multiple redundant common formats (HTML, PDF, PNG, WARC) that will last
long after the originals disappear off the internet. It automatically extracts
assets and media from pages and saves them in easily-accessible folders, with
out-of-the-box support for extracting git repositories, audio, video,
subtitles, images, PDFs, and more.

How does it work?

    echo 'http://example.com' | ./archive

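For more than one URL, the same pipe can probably be fed from a file. What follows is only a rough sketch: `clean_urls` and the temporary URL list are made-up names for illustration, and it assumes `./archive` reads one URL per line on stdin, which the one-liner above suggests but does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of ArchiveBox): print the lines of a URL
# list, dropping blank lines and lines that start with '#'.
clean_urls() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Build a small sample list in a temporary file, purely for illustration.
urls_file=$(mktemp)
printf '%s\n' 'http://example.com' '# a comment' '' 'https://example.org' > "$urls_file"

# Pipe the cleaned list to ./archive on stdin, as in the one-liner.
# Assumption: ./archive accepts several URLs, one per line.
# The guard makes this a no-op when ./archive is not in the current directory.
if [ -x ./archive ]; then
    clean_urls "$urls_file" | ./archive
fi
```

Filtering first keeps comments and blank lines in a hand-edited list from reaching the archiver.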
Documentation:
[https://github.com/pirate/ArchiveBox/wiki](https://github.com/pirate/ArchiveBox/wiki)

GitHub:
[https://github.com/pirate/ArchiveBox](https://github.com/pirate/ArchiveBox)

Demo: [https://archive.sweeting.me/](https://archive.sweeting.me/)

~~~
nikisweeting
Thanks for posting it ;)

It's been on HN a few times recently, which is probably why it didn't get
tons of attention this time. You can see the other discussions here:

[https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Comm...](https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community#archivebox-discussions-in-news--social-media)

