
Archive.org is a bit of a special case: you need to call them repeatedly to archive a website. They do have a rate limit there, and it's pretty aggressive*, to the point that you could trip it just by using the site manually. They must have forgotten to rate-limit the OCR file downloads.

* If they had a better API (a simple asynchronous API would be enough; one where we could send a list of URLs would be even better), one could make a lot fewer calls.
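Without a batch or asynchronous API, a client archiving many URLs has to make one call per URL and pace itself to stay under the limit. A minimal sketch of that pacing, assuming the public Save Page Now prefix `https://web.archive.org/save/` and a hypothetical 5-second gap between calls (the actual limit isn't stated in this thread). `archive_all`, `default_fetch` and the `min_interval` value are illustrative names and numbers, not part of any archive.org API. The fetch, sleep and clock functions are injectable so the pacing logic can be exercised without network access.

```python
import time
from typing import Callable, Iterable, List
from urllib.error import HTTPError
from urllib.request import urlopen

# Save Page Now prefix; assumed usage, check archive.org's docs before relying on it.
SAVE_PREFIX = "https://web.archive.org/save/"


def default_fetch(url: str) -> int:
    """Issue one GET and return the HTTP status code."""
    try:
        with urlopen(url, timeout=60) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # e.g. 429 when the rate limit trips


def archive_all(
    urls: Iterable[str],
    min_interval: float = 5.0,  # hypothetical spacing, not a documented limit
    fetch: Callable[[str], int] = default_fetch,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Submit URLs one at a time, at least `min_interval` seconds apart."""
    statuses: List[int] = []
    last_call = None
    for url in urls:
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        statuses.append(fetch(SAVE_PREFIX + url))
    return statuses
```

A batch endpoint accepting a list of URLs would collapse this whole loop into a single request, which is the point of the footnote.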




Last time I wanted to bulk-archive a bunch of URLs, I asked about it and sent a .txt file full of URLs to someone, and they put it in the archival queue.


They have a Google Sheets "API" which I've used and works reasonably well:

https://archive.org/services/wayback-gsheets/


This has been broken for the past month (jobs just sit at "waiting for workers" for several days). Did they fix it?


I believe you can upload WARC files to IA and ask them to index the content. That saves them the work of doing the archiving, and you won't be rate limited on their end.
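For context on what such a file contains: WARC (ISO 28500) is a plain container format where each record is a block of named headers followed by a payload. A minimal sketch that builds one WARC/1.0 `response` record in memory, assuming a hand-written HTTP response as the payload. It emits only the core headers and is no substitute for a proper WARC library (e.g. warcio); whether IA accepts a given file for indexing is a separate question this does not address. `warc_response_record` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Build one WARC/1.0 'response' record with the core headers only."""
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        # WARC-Date is a UTC timestamp in W3C/ISO 8601 form.
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        # Content-Length counts the payload bytes only, not the headers.
        f"Content-Length: {len(http_payload)}",
    ]
    # Header block ends with an empty line (CRLF CRLF).
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # Each record is terminated by two CRLFs after the content block.
    return head + http_payload + b"\r\n\r\n"
```

Records are simply concatenated to form a `.warc` file; archives are commonly stored as `.warc.gz`, with each record gzipped separately.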


So they just trust you that your archives are not manipulated?



