

Please Stop Blocking Everything but Googlebot in Robots.txt - oskarth
http://danluu.com/googlebot-monopoly/

======
wodow
Given this trend, the de facto crawler behaviour will be to do as Apple does
and follow Googlebot's rules as if they were written for your own crawler [1].

(That is assuming a given crawler follows robots.txt at all!)
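
A minimal sketch of that fallback, assuming a hypothetical crawler called "ExampleBot" and using Python's standard `urllib.robotparser`. The helper names (`named_agents`, `can_fetch_with_fallback`) are made up for illustration, and the exact-name match on `User-agent` lines is a simplification, not a description of how Applebot is actually implemented:

```python
from urllib.robotparser import RobotFileParser


def named_agents(robots_txt):
    """Return the lowercased User-agent values that appear in robots.txt."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def can_fetch_with_fallback(robots_txt, agent, url, fallback="Googlebot"):
    """Check `url` for `agent`; if the file never names `agent` but does
    name `fallback`, apply the fallback's rules instead."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    agents = named_agents(robots_txt)
    effective = agent
    if agent.lower() not in agents and fallback.lower() in agents:
        effective = fallback
    return parser.can_fetch(effective, url)


# A robots.txt of the kind the article complains about:
# everything is blocked except for Googlebot.
ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print(can_fetch_with_fallback(ROBOTS, "ExampleBot", "https://example.com/page"))
```

With the plain parser, "ExampleBot" would fall under the `*` group and be blocked. With the fallback it inherits Googlebot's group instead, unless the file names it explicitly.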

[1] [https://support.apple.com/en-us/HT204683](https://support.apple.com/en-us/HT204683)
(HN discussion was at
[https://news.ycombinator.com/item?id=9497264](https://news.ycombinator.com/item?id=9497264))

~~~
TheLoneWolfling
Reminds me of user agents.

------
didibc
Apropos archive.org: I recently found out they interpret a missing robots.txt
as access denied. I wonder if that's a mistake or on purpose.

~~~
abstractbeliefs
Generally, media without a license is considered "All rights reserved" by
default. Archive.org may be choosing to interpret no robots.txt similarly out
of courtesy (better not archive something sensitive by accident) or to
minimise legal liability.

