Hacker News

Why not just use WARC and a program that can read them? Do archives need to be human-readable?


The thing about archives is you either parse them now or parse them later. With how much JS and other crap is served in modern social media frontends, I'm not sure WARC is the best format for archiving from them.


But that is the point of WARC: otherwise, your archival method needs some sort of general intelligence (AI or a human behind the scenes) to decide exactly what you need to store.

With WARC (and good WARC tooling like browsertrix-crawler) you store everything the site sent over HTTP, and decide how to parse it later.
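To make "store everything, parse later" concrete: a WARC file is a sequence of records, each a `WARC/1.0` version line, a block of named header fields, a blank line, then exactly `Content-Length` bytes of payload (for a `response` record, the raw HTTP response, headers and all). The sketch below reads such records with only the standard library. It is a minimal illustration under stated assumptions: the input is uncompressed (real crawls are usually gzipped per record), header continuation lines are ignored, and the sample record is made up. For real archives, a maintained library such as warcio handles these cases.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record in an UNCOMPRESSED WARC stream.

    Simplified sketch: header names are lowercased (WARC field names are
    case-insensitive), and folded/continuation header lines are not supported.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line.strip() == b"":
            continue  # skip the blank lines that separate records
        version = line.strip().decode("ascii")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")

        # Read "Name: value" header lines until the blank line ending the header block.
        headers: Dict[str, str] = {}
        while True:
            hline = stream.readline()
            if hline in (b"\r\n", b"\n", b""):
                break
            name, _, value = hline.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()

        # The payload is exactly Content-Length bytes: here, the raw HTTP response.
        length = int(headers["content-length"])
        yield headers, stream.read(length)


# Hypothetical sample: one "response" record wrapping a tiny HTTP response.
http_resp = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://example.com/\r\n"
    b"Content-Type: application/http; msgtype=response\r\n"
    + f"Content-Length: {len(http_resp)}\r\n".encode("ascii")
    + b"\r\n"
    + http_resp
    + b"\r\n\r\n"
)

records = list(iter_warc_records(io.BytesIO(sample)))
for hdrs, payload in records:
    print(hdrs["warc-type"], hdrs["warc-target-uri"], len(payload), "bytes")
```

Nothing about the page is interpreted at capture time. The HTML, JS and headers sit in the payload untouched, so a later parser (or a replay tool) can decide what matters.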



