Internet Archaeology: Scraping time series data from Archive.org
sangaline.com
39 points by foob, 6 hours ago
deferredposts, 3 hours ago
So what is the policy of the Internet Archive on this level of scraping? Do they have a rate limit in place?
foob, 2 hours ago
Yes, they start sending 429 (Too Many Requests) responses if you don't use appropriate delays. They also provide a public API [0] which I believe is intended for automated requests of this type (as opposed to crawling the Wayback Machine website directly).
[0] https://archive.org/help/wayback_api.php
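A minimal sketch of backing off on 429s, assuming Python's standard library and the availability endpoint `https://archive.org/wayback/available` from the help page above. When it gets a 429, it waits and retries with exponentially growing delays. If the response carries a numeric `Retry-After` header it uses that instead, though I'm not sure archive.org sends one. The function names (`build_query`, `backoff_delay`, `fetch_json`), the 1-second base delay, and the 60-second cap are hypothetical choices, not archive.org's documented limits.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback availability API described on the help page.
AVAILABILITY_URL = "https://archive.org/wayback/available"


def build_query(url, timestamp):
    """Build an availability-API request for `url` closest to `timestamp` (YYYYMMDD...)."""
    return AVAILABILITY_URL + "?" + urllib.parse.urlencode({"url": url, "timestamp": timestamp})


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses a numeric Retry-After value if the server sent one; otherwise
    doubles the delay on each attempt. Either way the result is capped at `cap`.
    """
    if retry_after is not None and retry_after.isdigit():
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


def fetch_json(request_url, max_retries=5, opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `request_url` and decode the JSON body, backing off on HTTP 429.

    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    `opener` and `sleep` are parameters so the retry logic can be tested offline.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(request_url) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            headers = err.headers or {}
            sleep(backoff_delay(attempt, retry_after=headers.get("Retry-After")))
```

For example, `fetch_json(build_query("example.com", "20060101"))` returns the API's JSON response as a dict. For a long scrape you would still want a fixed pause between successful requests, because the backoff only kicks in after the server has already started refusing you.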
