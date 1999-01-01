One important use case for excluding sections of your website is to avoid wasting your crawl budget, that is, the daily crawl volume Google allocates to your site. If you let every page be crawled, your more important pages get crawled less. Example: in the past, you created a content category that didn't turn out to be successful. Removing this category outright, with all the links pointing into it, would result in crawl errors, so it would be smarter to exclude it in robots.txt first and focus on your core categories.
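To make the category example concrete, here's a rough sketch using Python's standard `urllib.robotparser`. The `/old-category/` and `/core-category/` paths, the `example.com` host, and the `is_crawlable` helper are all made up for illustration; this only shows how a well-behaved crawler would read such a rule, not how Google actually budgets its crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the retired category is assumed to live under /old-category/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-category/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative URLs only: one in the retired category, one in a core category.
    for url in (
        "https://example.com/old-category/post-1",
        "https://example.com/core-category/post-2",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

The retired category is reported as off limits while the core category stays crawlable, which is the whole point: the links keep resolving, but compliant crawlers stop spending requests on them.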
I think the crux of the matter is found here:
> If you don't want people to have your data, don't put it online.
As much as I agree in principle with this, because of the way web requests work, I don't want to be associated with this group.
You cannot ignore copyright, and robots.txt is exactly what I would use if I didn't want something archived by an organisation I have nothing to do with.
AFAICT this page is a reaction to an archive.org policy of respecting robots.txt retroactively. For example: oldwebsite.com runs from 1999 to 2009, the domain expires in 2010, it gets bought in 2011, and the new owners add a robots.txt disallowing IA. The ten years of archive.org copies are now inaccessible.
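For anyone wondering what such a file looks like, here's a minimal sketch. It assumes the Internet Archive's crawler is matched by the `ia_archiver` user-agent token, which is the name historically used for it; the `/1999/index.html` path and the `may_fetch` helper are hypothetical. It only shows how a parser interprets the rule, not how archive.org applies it to already-stored copies.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt added by the new domain owners: blocks only ia_archiver.
IA_BLOCK = """\
User-agent: ia_archiver
Disallow: /
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://oldwebsite.com/1999/index.html"  # hypothetical archived page
    print("ia_archiver:", may_fetch(IA_BLOCK, "ia_archiver", url))
    print("Googlebot:  ", may_fetch(IA_BLOCK, "Googlebot", url))
```

Note that the rule targets one crawler by name and leaves everyone else alone, so search engines keep indexing the new site while the archive is shut out.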
One group has respect for authorship, and one does not.
It may not be the most palatable solution, but there is hardly a need for a tantrum, or for an intent to ignore well-established rights.
User-agent: *
Disallow: /secret/
SEO is where robots.txt shines right now. It's not that people are trying to hide something; it's that we don't want it to conflict with the content we actually want to promote.
