
Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.


Apparently yes, it would: https://archive.org/about/exclude.php


My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.


Yes, it simply hides the content. The content is still kept in their database, so if the robots.txt disappears, it reappears in the archive.

New pages won't be archived though.
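
For anyone wondering what "hides but keeps" might look like in practice, here is a rough sketch of the behavior described above. It is not archive.org's actual code: the `ia_archiver` agent name and the `is_snapshot_visible` helper are assumptions made up for illustration. The idea is that the stored snapshot is never deleted, and its visibility is recomputed from whatever robots.txt the site serves right now.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler name, used here for illustration only.
ARCHIVE_AGENT = "ia_archiver"


def is_snapshot_visible(current_robots_txt, url):
    """Return True if a stored snapshot of `url` would be shown.

    Hypothetical helper: the snapshot stays in storage either way.
    `current_robots_txt` is the text of the site's live robots.txt,
    or None if the site currently serves no robots.txt.
    """
    if current_robots_txt is None:
        return True  # no robots.txt, so nothing is excluded
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(ARCHIVE_AGENT, url)


url = "https://example.com/page.html"
blocking = "User-agent: ia_archiver\nDisallow: /\n"

print(is_snapshot_visible(None, url))      # before any robots.txt exists
print(is_snapshot_visible(blocking, url))  # while the exclusion rule is live
print(is_snapshot_visible(None, url))      # after the robots.txt is removed
```

The same check would also gate new crawls, which matches the point that new pages are not archived while the exclusion is in place.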



