It is completely voluntary and unenforceable, but works very well.
Ha! It works very well for Google and Yahoo. They use informative user-agents and respect robots.txt directives. They have to; they're large corporations with shareholders.
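For context, a well-behaved crawler consults robots.txt before fetching anything else. A minimal sketch using Python's standard `urllib.robotparser` (the policy and URLs here are hypothetical, just for illustration):

```python
from urllib.robotparser import RobotFileParser

# Parse a hypothetical robots.txt body directly:
# everyone is disallowed from /private/, allowed everywhere else.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A compliant crawler checks before each fetch.
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))  # allowed
print(rp.can_fetch("Googlebot", "http://example.com/private/x"))  # disallowed
```

This is the whole contract: the crawler asks permission and the server trusts it to honor the answer. Nothing stops a client from skipping the check entirely, which is the point being made here.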
Everybody else ignores it. Why would they listen?
Live example:
This is all the traffic from this IP, with no lines skipped. You might notice that it doesn't request CSS, any of the linked images, or favicon.ico. Nor does it populate the referrer header, which it certainly would if it were actually Firefox with a human clicking on these links. It doesn't even bother requesting robots.txt, which I don't have anyway.[1]

Do a whois check on the IP, and we get: Codero's a dedicated server host. This is a spambot, harvesting email addresses. Visit the IP in a browser and you see a site selling fake Tiffany jewelry.

1: http://www.archiveteam.org/index.php?title=Robots.txt
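The tells described above (page fetches with no asset requests, an empty referrer, never touching robots.txt) can be scored mechanically. A rough sketch, assuming Apache/nginx "combined" format access log lines; the field names and thresholds are my own, not anything from the log shown here:

```python
import re

# Combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referrer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:\S+) (?P<path>\S+) [^"]*" \d+ \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Requests for page assets a real browser would fetch alongside HTML.
ASSET_RE = re.compile(r'\.(css|js|png|jpe?g|gif|ico)(\?|$)')

def per_ip_stats(lines):
    """Tally, per IP: page hits, asset hits, referred hits, robots.txt hits."""
    stats = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        s = stats.setdefault(m['ip'],
                             {'pages': 0, 'assets': 0, 'referred': 0, 'robots': 0})
        if m['path'] == '/robots.txt':
            s['robots'] += 1
        elif ASSET_RE.search(m['path']):
            s['assets'] += 1
        else:
            s['pages'] += 1
        if m['referrer'] not in ('', '-'):
            s['referred'] += 1
    return stats

# An IP with many pages, zero assets, zero referrers, and zero robots.txt
# requests matches the pattern of the spambot described above.
```

None of these signals is conclusive on its own (a text browser also skips images), but together they separate browsers from scrapers quite reliably.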