Hacker News

To crawl Google URLs of the form google.com?q=x would be to disregard http://www.google.com/robots.txt, which seems like bad netiquette to me.
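For reference, this is what respecting robots.txt looks like mechanically. A minimal sketch using Python's stdlib parser, with a hypothetical excerpt of rules (Google's live robots.txt is much longer and changes over time):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of a robots.txt like Google's, which disallows
# crawling of search-results pages for all user agents.
rules = """\
User-agent: *
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A search-results URL falls under Disallow: /search, so a polite
# crawler must not fetch it; an unrelated path is still allowed.
print(rp.can_fetch("*", "http://www.google.com/search?q=x"))  # False
print(rp.can_fetch("*", "http://www.google.com/maps"))        # True
```

A crawler that skips this check before fetching is exactly the "bad netiquette" case; the replies below argue that toolbars reporting clickstream data aren't crawlers in this sense at all.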


They aren't crawling; they're just observing which pages clients who visit google.com?q=xxx go to next.

If anybody's search toolbar checks a site's robots.txt before sending clickstream data, I would be very surprised.

A client-side robots.txt rule would also make anti-phishing features trivial to bypass: just put a restrictive robots.txt on your phishing site.



