
Internet Archive Never Forgets - prostoalex
https://theringer.com/internet-archive-wayback-machine-brewster-kahle-undeniables-6c1aaf0cc486
======
J_Darnley
Except for when someone gives them a robots.txt instruction. Then they forget
whole websites.

~~~
CM30
Yeah, which is the Internet Archive's biggest issue. The site using a domain
name at one time isn't necessarily the site using that same domain name at a
later date.

If the archive had more common sense, they'd note when sites and robots.txt
files change, and only block the pages that belong to the site owner with the
offending robots.txt file. So if, say, Google went under and a new company
started up on the same domain, a robots.txt file from the latter would only
block pages added since the site switched over.

That would solve 99.99% of these issues.
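
A rough sketch of that scoping rule, in Python. This is not how the Internet
Archive works; the `OwnershipEpoch` record, the `is_snapshot_visible` helper
and the dates are all made up for illustration, and it assumes the archive can
already tell when a domain changed hands, which is the hard part.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OwnershipEpoch:
    """Hypothetical record: a period during which one owner ran the domain."""
    start: date                  # first day this owner controlled the domain
    end: Optional[date]          # None means "still the current owner"
    robots_blocks_archive: bool  # does this owner's robots.txt exclude the archiver?

    def contains(self, day: date) -> bool:
        # Half-open interval: start is included, end is excluded.
        return self.start <= day and (self.end is None or day < self.end)


def is_snapshot_visible(captured: date, epochs: List[OwnershipEpoch]) -> bool:
    """Hide a snapshot only if the owner at capture time blocked archiving."""
    for epoch in epochs:
        if epoch.contains(captured):
            return not epoch.robots_blocks_archive
    # No known owner for that date, so no robots.txt applies to it.
    return True


# Made-up history: the first owner allowed archiving until the domain changed
# hands on 2016-01-01; the new owner's robots.txt blocks the archiver.
history = [
    OwnershipEpoch(date(2005, 1, 1), date(2016, 1, 1), robots_blocks_archive=False),
    OwnershipEpoch(date(2016, 1, 1), None, robots_blocks_archive=True),
]
```

With this history, a capture from 2010 stays visible and a capture from 2017
is hidden, instead of the new owner's robots.txt wiping out both.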

------
meshr
Except for when a government asks them to remove content. For example, this
link
[http://web.archive.org/web/20150618110931/http://ozpp.ru/pam...](http://web.archive.org/web/20150618110931/http://ozpp.ru/pamyatka-potrebitelyam-pri-poseschenii-okkupirovannyh-territoriy/)
was removed from the Internet Archive at the Russian government's request. It
worked this way: Russia blocked IA for Russians => IA removed the content for
everyone => Russia unblocked IA for Russians.

