The other huge problem here is that Google's FeedFetcher doesn't respect robots.txt. (Their reasoning is that it is acting at the direct request of a human to retrieve a specific resource, so it doesn't count as a bot.) Because of this, there is no easy way to stop it from hitting your site.

You can block the user agent; I believe "Feedfetcher-google" should work.

True, but (while possible) it's not straightforward to block access to specific files only. The same user agent is also used for Google Custom Search if you're using that. And it's still going to be hammering your firewall (although admittedly that's less catastrophic than trying to download a 10MB file repeatedly).
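
For what it's worth, here is a rough sketch of what per-file blocking could look like at the application level, written as a small WSGI middleware in Python. The path prefix `/downloads/` and the names `should_block` and `BlockFeedFetcher` are made up for illustration, and the substring match on the user agent is an assumption about how the crawler identifies itself. Treat it as a starting point, not a tested recipe.

```python
import re

# Assumption: the crawler's User-Agent contains this token (matched case-insensitively).
BLOCKED_AGENT = re.compile(r"feedfetcher-google", re.IGNORECASE)

# Hypothetical path prefixes to protect, e.g. a directory of large files.
BLOCKED_PREFIXES = ("/downloads/",)


def should_block(user_agent, path):
    """Return True only when the agent matches AND the path is protected."""
    agent_matches = bool(BLOCKED_AGENT.search(user_agent or ""))
    return agent_matches and path.startswith(BLOCKED_PREFIXES)


class BlockFeedFetcher:
    """WSGI middleware that answers 403 for matching agent/path pairs."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        if should_block(user_agent, path):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Note that every blocked request still reaches the server and costs a (small) response, and anything else that shares the user agent, such as Google Custom Search, is blocked for those paths too.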
