
  It is completely voluntary and unenforceable, but works very well.
Ha! It works very well for Google and Yahoo. They use informative user-agents and respect robots.txt directives. They have to; they're large corporations with shareholders.

Everybody else ignores it. Why would they listen?
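To make "voluntary" concrete: honoring robots.txt happens entirely on the crawler's side. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `allowed` helper and the example.com URLs are hypothetical illustrations; the author says he doesn't serve a robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the post's author says he doesn't actually serve one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/archives/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Ask the parsed robots.txt whether user_agent may fetch url.

    The answer is purely advisory: the crawler runs this check on its own
    side, and nothing stops a bot from skipping it.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "http://example.com/blog/"))  # True
    print(allowed("ExampleBot", "http://example.com/blog/archives/2010/01/12/x/"))  # False
```

The server never sees this check. A bot that doesn't run it is never told no.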

Live example:

  216.55.185.45 - - [11/Sep/2011:06:50:56 -0400] "GET / HTTP/1.0" 200 4835 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
  216.55.185.45 - - [11/Sep/2011:06:50:57 -0400] "GET / HTTP/1.0" 200 4835 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
  216.55.185.45 - - [11/Sep/2011:06:50:57 -0400] "GET /blog/ HTTP/1.0" 200 114535 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
  216.55.185.45 - - [11/Sep/2011:06:51:01 -0400] "GET /blog/archives/2010/01/12/creating_an_account_on_the_scp_wiki_is_like_pissing_glass/ HTTP/1.0" 200 10490 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
This is all the traffic from this IP, with no lines skipped. Notice that it never requests the CSS, any of the linked images, or favicon.ico. Nor does it send a Referer header, which it certainly would if it were actually Firefox and a human were actually clicking these links. It doesn't even bother requesting robots.txt, which I don't have anyway.[1] Run a whois check on the IP, and we get:

  Codero CODERO1999A (NET-216-55-176-0-1) 216.55.176.0 - 216.55.187.255
Codero is a dedicated-server host. This is a spambot looking for email addresses. Visit the IP in a browser, and you get a site selling fake Tiffany jewelry.
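The tells described above can be checked mechanically: a browser user-agent, but no CSS, images or favicon.ico, and never a Referer. Below is a rough sketch in Python. It assumes Apache's "combined" log format, as in the excerpt. The asset-extension list, the crawler-token whitelist and the `suspicious_ips` name are illustrative assumptions, not a tested detector.

```python
import re
from collections import defaultdict

# Apache "combined" format: ip ident user [time] "METHOD path proto" status bytes "referer" "ua"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Assumption: files a real browser fetches after rendering an HTML page.
ASSET_EXTS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")
# Assumption: tokens that self-identified crawlers (e.g. Googlebot, Yahoo! Slurp) put in their UA.
CRAWLER_TOKENS = ("bot", "slurp", "spider", "crawler")


def suspicious_ips(lines):
    """Return IPs that send a browser-looking UA but never fetch assets or send a Referer."""
    stats = defaultdict(lambda: {"browser_ua": False, "assets": 0, "referers": 0})
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # not a combined-format line
        s = stats[m["ip"]]
        ua = m["ua"].lower()
        if ua.startswith("mozilla/") and not any(t in ua for t in CRAWLER_TOKENS):
            s["browser_ua"] = True
        path = m["path"].split("?", 1)[0].lower()  # drop any query string
        if path.endswith(ASSET_EXTS):
            s["assets"] += 1
        if m["referer"] not in ("", "-"):
            s["referers"] += 1
    return sorted(
        ip for ip, s in stats.items()
        if s["browser_ua"] and s["assets"] == 0 and s["referers"] == 0
    )
```

Googlebot's own UA starts with `Mozilla/5.0 (compatible; Googlebot/2.1; ...)`, hence the crude token whitelist. A bot that bothers to fetch a stylesheet and fake a Referer would slip through, so treat this as a triage filter rather than proof.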

1: http://www.archiveteam.org/index.php?title=Robots.txt




Maybe, but most tracking services are run by large corporations; most websites wouldn't embed analytics scripts from spammers-r-us.com.



