Internet Archaeology: Scraping time series data from Archive.org (sangaline.com)
39 points by foob 6 hours ago | 2 comments

So what is the policy of The Internet Archive on this level of scraping? Do they have a rate limit in place?

Yes, they start sending 429 (Too Many Requests) responses if you don't use appropriate delays. They also provide a public API [0], which I believe is intended for automated requests of this type (as opposed to crawling the Wayback Machine website directly).

[0] - https://archive.org/help/wayback_api.php
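
For anyone who wants to try this, here is a rough sketch of querying the availability endpoint described at [0] while backing off on 429 responses. The function names, the retry count, and the delay values are my own placeholders, not anything the Internet Archive recommends, and I am only assuming that a `Retry-After` header might be present; the code falls back to exponential delays when it is not.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

# Availability endpoint documented at https://archive.org/help/wayback_api.php
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query_url(page_url, timestamp=None):
    """Build the availability query for a page and an optional YYYYMMDD... timestamp."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                       max_attempts=5, base_delay=1.0):
    """Fetch a URL, waiting and retrying whenever the server answers 429.

    max_attempts and base_delay are illustrative values (assumptions),
    not limits published by the Internet Archive.
    """
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or if we are out of attempts.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Assumption: honor a numeric Retry-After header if one is sent.
            retry_after = None
            if err.headers is not None:
                retry_after = err.headers.get("Retry-After")
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay)


def closest_snapshot(page_url, timestamp=None, **kwargs):
    """Return the 'closest' snapshot record for a page, or None if there is none."""
    body = fetch_with_backoff(build_query_url(page_url, timestamp), **kwargs)
    return json.loads(body).get("archived_snapshots", {}).get("closest")
```

Calling `closest_snapshot("example.com", "20060101")` would hit the live API, so for a time series you would loop over timestamps and still add a fixed pause between successful requests rather than relying on the 429 handling alone.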
