

Ask HN: This bot has been pounding my server all night.. anyone else? - andrewljohnson

Here's a typical gibberish log entry. It's hitting my defunct Facebook app pages:

::ffff:143.248.246.87 www.trailbehind.com - [03/Jul/2009:14:55:58 +0000] "GET /fbtb/\/\/static.ak.fbcdn.net\/rsrc.php\/z9NIS\/lpkg\/1u1t8ygq\/en_US\/141\/161545\/css\/\/css\/text/javascript?fb_sig_in_iframe=1&fb_sig_locale=en_US&fb_sig_in_new_facebook=1&fb_sig_time=1246593830.0314&fb_sig_logged_out_facebook=1&fb_sig_added=0&fb_sig_api_key=f3e4e70471f55dc5c230d9b776c4e598&fb_sig_app_id=29960428979&fb_sig=e06c8de3574a5d0ccdcaadb052d64a82 HTTP/1.0" 200 218 "-" "cancho/Nutch-1.0 (crawl test; http://asdf.net/; asdf@asdf.net)"
======
apinstein
Nutch is evil. It's a bot-making tool. I set up my sites to block all Nutch
user agents via mod_rewrite.

------
jacquesm
iptables?

Just get rid of it. I have a script that routinely tails my logs to detect
bots that do not respect the robots.txt file, and it walls them off. On a bad
day there are a few hundred of them. Periodically I flush the tables to get
rid of old entries so the load caused by the filter does not become too high.

Your 'friend' seems to be in Korea...
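
For readers who want to try the approach described above, here is a minimal Python sketch of the idea: scan log lines, flag clients that request paths disallowed by robots.txt, and build an iptables rule for each. It is not the script mentioned in this thread. The function names (`load_rules`, `find_violators`, `block_command`) are hypothetical, the regex assumes the log layout shown in the original post, and the sketch only builds the iptables command rather than running it.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumes the log layout from the original post:
# client-ip vhost ident [timestamp] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def load_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def find_violators(lines, rules):
    """Return the set of client IPs that requested a disallowed path."""
    violators = set()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        if not rules.can_fetch(match.group("agent"), match.group("path")):
            ip = match.group("ip")
            # Strip the IPv4-mapped IPv6 prefix seen in the example log line.
            if ip.startswith("::ffff:"):
                ip = ip[len("::ffff:"):]
            violators.add(ip)
    return violators


def block_command(ip):
    """Build (but do not run) an iptables rule that drops traffic from ip."""
    return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
```

To apply a rule, the list returned by `block_command` could be passed to `subprocess.run` as root. For the periodic cleanup, keeping these rules in a dedicated chain would let that chain be flushed without touching other firewall rules; that is a design suggestion, not something stated in the thread.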

~~~
thorax
Mind sharing your script? That sounds handy!

~~~
jacquesm
OK, I posted a link to the script:
<http://news.ycombinator.com/item?id=690608>

enjoy...

------
eli
mod_security for the win. I think its "Core" ruleset blocks Nutch by default.

