
Tell HN: A Facebook crawler was making 7M requests per day to my stupid website - napolux
I own a small website that I use for some SEO experiments. Of course there's some content and a Facebook sharing button for every post. The website is so small that it runs on a "single controller" PHP app + a 400 KB SQLite db, but it can generate thousands of different pages.

Everything is hosted (together with a bunch of other websites) on a cheap DigitalOcean machine + the free Cloudflare plan for some caching. One of those websites has some alerting, and it started alerting me that it was down.

After some investigation I found the problem... the Facebook crawler.

https://developers.facebook.com/docs/sharing/webmasters/crawler/

That crawler was making more than 7M requests per day (with a peak of 300 req/second) to that website.

Their docs were no help on how to block the bot.

og:ttl -> ignored
robots.txt -> ignored
HTTP 429 -> ignored

I had to block the user agent using Cloudflare rules.

If someone working on that crawler is here on HN, please stop ignoring basic Internet netiquette for crawlers. Next time you could hit someone on AWS. And then they'll probably ask you to pay the bill ;)
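
For anyone without Cloudflare in front of their site, a rough application-level version of the same user-agent block could look like the Python sketch below. It is not the rule I actually deployed. The `facebookexternalhit` pattern is an assumption based on the user agent named in Facebook's crawler docs, and the function names are made up for illustration. The second helper just puts the 7M/day figure in per-second terms.

```python
import re

# Assumption: the crawler identifies itself with a User-Agent containing
# "facebookexternalhit", the name given in Facebook's crawler documentation.
BLOCKED_UA = re.compile(r"facebookexternalhit", re.IGNORECASE)


def should_block(user_agent):
    """Return True when the User-Agent header matches the blocked pattern."""
    # A missing or empty header is not blocked by this rule.
    return bool(user_agent) and BLOCKED_UA.search(user_agent) is not None


def average_rate(requests_per_day):
    """Average requests per second over a 24-hour day (86,400 seconds)."""
    return requests_per_day / 86_400
```

With 7M requests per day, `average_rate(7_000_000)` comes out at roughly 81 requests per second on average, so the 300 req/second peak was several times the daily mean.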
======
memexy
Would be worthwhile to write up your investigation in a longer format. That
way other people would be able to find it through Google.

