Hacker News
hughw on Aug 11, 2015 | on: “Stop reverse engineering our code”
Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.
mikeash on Aug 11, 2015
Apparently yes, it would:
https://archive.org/about/exclude.php
syncsynchalt on Aug 11, 2015
My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.
X-Istence on Aug 11, 2015
Yes, it simply hides the content. It is still kept in their database, so if the robots.txt disappears, it pops back from their archive.
New pages won't be archived though.
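A minimal sketch of the behavior described above, assuming a toy in-memory archive rather than anything archive.org actually runs: snapshots are kept permanently, and the site's current robots.txt is consulted both when crawling and when serving. `ToyArchive`, `is_allowed`, and the `toy_archiver` user agent are hypothetical names; only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class ToyArchive:
    """Hypothetical archive: stored snapshots are never deleted, only hidden."""

    def __init__(self, user_agent: str = "toy_archiver"):
        self.user_agent = user_agent  # assumed crawler name, for illustration
        self.snapshots = {}           # url -> list of stored page bodies

    def crawl(self, url: str, body: str, current_robots_txt: str) -> bool:
        # New pages are not archived while robots.txt excludes the crawler.
        if not is_allowed(current_robots_txt, self.user_agent, url):
            return False
        self.snapshots.setdefault(url, []).append(body)
        return True

    def view(self, url: str, current_robots_txt: str) -> list:
        # Visibility is decided at read time, so an exclusion applies
        # retroactively to snapshots that were stored earlier.
        if not is_allowed(current_robots_txt, self.user_agent, url):
            return []
        return list(self.snapshots.get(url, []))


archive = ToyArchive()
url = "http://example.com/page"
open_robots = ""                            # no rules: everything allowed
blocking = "User-agent: *\nDisallow: /"     # excludes every crawler

archive.crawl(url, "v1", open_robots)       # stored
hidden = archive.view(url, blocking)        # hidden while excluded
archive.crawl(url, "v2", blocking)          # refused, nothing stored
restored = archive.view(url, open_robots)   # old snapshot is visible again
```

In this model, adding a blocking robots.txt hides earlier snapshots and stops new crawls, and removing it makes the stored snapshots visible again, which matches the description in the comment.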