
Has the IA ever discussed why they retroactively apply robots.txt? I can see the rationale (though I don't necessarily think it's the best idea given the IA's goals) for respecting it at crawl time, but applying it retroactively has always felt unnecessary to me.
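For context, here is a minimal sketch of what the crawl-time check looks like, using Python's standard urllib.robotparser; the bot name and URLs are placeholders, and this is not the IA's actual crawler logic.

    import urllib.robotparser

    # Fetch and parse the site's robots.txt once, before crawling.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # Check each candidate URL against the rules at crawl time;
    # disallowed pages are simply never fetched or stored.
    if rp.can_fetch("ExampleArchiveBot", "https://example.com/some/page"):
        pass  # fetch and archive the page

    # Retroactive application is different: pages that were already
    # archived are later hidden from playback whenever the site's
    # *current* robots.txt disallows them.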


It seems pretty obvious: copyright restricts distribution, so they hide pages that the apparent copyright holder doesn't want distributed.



