
A Curious Case of Disregarded Robots.txt - mikelabatt
https://mike.pub/20170425-disregarded-robots-txt
======
sitkack
Robots.txt doesn't confer copyright.

What about domains that have been sharked? Does controlling robots.txt now
give me the right to suppress all content ever originating from that domain,
for as long as I control robots.txt?

The Internet Archive is right to spider the site, but defer showing it.
Collection != dissemination.

The IA isn't synthesizing, selling, cross-referencing or, afaict, doing
anything nefarious with the data.

You are literally picking on the last org on the internet that needs to get
picked on.

~~~
cJ0th
> You are literally picking on the last org on the internet that needs to get
> picked on.

True, but at the same time I do understand those who generally want others to
abide by their robots.txt. IANAL, but ideally I would love to see the IA have
the right to ignore robots.txt (though anyone who wants to opt out of the IA
should be given the option), while others shouldn't be allowed to ignore it.

------
ryandvm
Meh. I'm pretty ambivalent about voluntary restrictions like robots.txt. As
far as I'm concerned, it's mostly useful as a way for site operators to
document endless dynamic content or requests that are prohibitively expensive
(but not expensive enough to justify actually restricting access).
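For what it's worth, a robots.txt used that way might look something like this
(the paths here are purely hypothetical, just to illustrate the idea of
flagging endless or expensive URL spaces):

```
# Keep crawlers out of infinite calendar pages and costly search queries
User-agent: *
Disallow: /calendar/
Disallow: /search
Crawl-delay: 10
```

Note that Crawl-delay is a de facto extension honored by some crawlers, not
part of the original robots.txt convention.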

I figure if it's on the web and a human can read it, my computer ought to be
able to read it too.

~~~
mikelabatt
Yes, but should your computer also be allowed to disseminate that content
without the original author's permission?

~~~
true_religion
Well, yes. The whole point of robots.txt is that it's impolite not to follow
it, and getting caught ignoring it might get you banned from the site.
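For anyone writing a polite crawler, Python ships a robots.txt parser in the
standard library. A minimal sketch (the rules and URLs below are made up for
illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would fetch
# https://example.com/robots.txt and feed its lines in instead.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# Check whether our (hypothetical) user agent may fetch each URL.
print(rp.can_fetch("mybot", "https://example.com/private/page"))  # False
print(rp.can_fetch("mybot", "https://example.com/index.html"))    # True
```

A crawler that checks `can_fetch()` before each request is "following
robots.txt" in the sense described above.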

~~~
SyneRyder
That's a really good point. I'm not a fan of Internet Archive ignoring
robots.txt, but if I'm really unhappy about it I can block their robot in
Apache using .htaccess rules (as long as they continue using archive.org_bot
as their user agent).
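A minimal .htaccess sketch of the kind of rule described above, assuming
mod_rewrite is enabled and that the bot keeps identifying itself as
archive.org_bot:

```
RewriteEngine On
# Return 403 Forbidden to any request whose User-Agent mentions archive.org_bot
RewriteCond %{HTTP_USER_AGENT} archive\.org_bot [NC]
RewriteRule .* - [F,L]
```

The obvious caveat, as noted, is that this only works as long as the crawler
advertises that user agent string.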

