
How much load does this place on the Internet Archive? It'd be a shame if this thing's access patterns caused them trouble.


If I read the code correctly, it's one-at-a-time? Which minimizes the stress; if we're slow, it'll slow down.

It'd be nice if it had identification in the UserAgent, so that we could complain to the right people if it was a problem.


Hi, Greg! Library author here. I'd be happy to add a configurable UserAgent. Perhaps the default would be a generic "waybackpack" but could be configurable to add contact info for the user. Does that sound about right? Prefer a different approach?

And, yep, the library is intentionally designed only to request one snapshot at a time.


waybackpack would be a great default; encouraging the actual user to add contact info would be better for you because we could complain to them instead of you :-)
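For illustration, here is a minimal sketch of that idea in Python, using only the standard library. It is not the code that ended up in waybackpack; the names `build_user_agent` and `make_request` and the "name (contact)" format are assumptions made up for this example.

```python
import urllib.request

# Default name suggested in the thread; everything else here is hypothetical.
DEFAULT_USER_AGENT = "waybackpack"


def build_user_agent(contact=None):
    """Return the default agent string, optionally followed by contact info."""
    if contact:
        return f"{DEFAULT_USER_AGENT} ({contact})"
    return DEFAULT_USER_AGENT


def make_request(url, contact=None):
    """Build a request carrying the User-Agent header.

    Nothing is sent over the network until the request is passed to urlopen().
    """
    headers = {"User-Agent": build_user_agent(contact)}
    return urllib.request.Request(url, headers=headers)
```

A user who passes something like `contact="me@example.com"` gives the server operators someone to reach directly if the traffic becomes a problem.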


Updated, merged, and pushed to PyPI as part of v0.1.0: https://github.com/jsvine/waybackpack/pull/5

Thanks again for the feedback. Really appreciate it — and the existence of the Internet Archive and Wayback Machine.


IA is very supportive of automated access; they even have a post about how to use wget to batch download lots of items: https://blog.archive.org/2012/04/26/downloading-in-bulk-usin...

Granted, that is not the Wayback Machine but I am sure they love people using them.


Agreed. From a quick look at the code, it seems it just fires off each fetch request immediately after the previous one completes.

Hopefully it gets patched to have a built-in rate limit (X requests per minute/hour).
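A rough sketch of what such a client-side limit could look like, assuming a simple "at most X requests per minute" rule enforced by spacing requests evenly. The `RateLimiter` class and its parameters are hypothetical, not part of waybackpack.

```python
import time


class RateLimiter:
    """Space out calls so that at most max_per_minute happen per minute.

    The clock and sleep functions are injectable so the class can be
    exercised without actually waiting.
    """

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A serial fetch loop would call `limiter.wait()` before each request, so the first request goes out immediately and later ones are held back only when they would exceed the limit.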


If it's already serial, and only works with one backend (rather than being an arbitrary mirroring tool like wget), then the Wayback server can easily "express its preferences" for rate-limiting by adding artificial delay to request-responses that pass the rate-limiting threshold. Backpressure shouldn't be the client's responsibility.

(It only is traditionally, because so many sites do nothing to protect themselves from "being too nice", so arbitrary-backend mirroring-client devs allow their users the option to ask for less than they want. This isn't a sensible protocol design, on either side; it doesn't optimize for, well, anything.)
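To make the server-side version concrete, here is a toy sketch of delay-based backpressure. The function names, the linear delay step, and the cap are all assumptions for illustration; nothing here describes how the Wayback Machine actually behaves.

```python
def artificial_delay(requests_in_window, threshold, step_seconds=0.5, max_delay=10.0):
    """Extra seconds the server holds a response before sending it.

    Requests at or under the threshold are not delayed. Each request past
    the threshold adds step_seconds, capped at max_delay.
    """
    excess = requests_in_window - threshold
    if excess <= 0:
        return 0.0
    return min(excess * step_seconds, max_delay)


def serial_client_duration(n_requests, threshold, base_latency=0.2):
    """Total time for a one-at-a-time client to finish n_requests.

    Assumes, for simplicity, that all requests fall in one rate-limiting
    window, so the i-th request sees i requests in the window.
    """
    total = 0.0
    for i in range(1, n_requests + 1):
        total += base_latency + artificial_delay(i, threshold)
    return total
```

Because the client waits for each response before sending the next request, any delay the server adds directly lowers the client's request rate, with no client-side configuration needed.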



