

Why Search Crawlers Sometimes Ask for URLs That Never Were Part of Your Site - cskau
http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-10.html

======
cskau
I'm setting up a new site and noticed that within seconds of starting the
server I was getting hits in the log like:

    
    
      67.195.112.231 - - [2011-06-17 18:49:37] "GET /SlurpConfirm404/starsong/pro-road.htm HTTP/1.0" 404 18 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
      67.195.115.174 - - [2011-06-17 16:53:30] "GET /SlurpConfirm404/drugstore.htm HTTP/1.0" 404 18 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    

My initial thought was that I was being crawled by a spam bot masquerading
as the Yahoo crawler, but after a bit of googling I found a couple of blogs
speculating about the nature of these strange requests.

My best guess is that they use these requests to check whether your server
happily answers even the most obscure request with a real page instead of a
404, which would make it look like a spider trap or content farm.
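
For what it's worth, here is a rough Python sketch of how a check like that might work. It is only my guess at the logic, not Yahoo's actual implementation: `make_probe_path` and `classify_response` are names I made up, and the random hex file name is an assumption (the requests in my log use word-like names instead).

```python
import secrets


def make_probe_path(prefix="/SlurpConfirm404"):
    """Build a path that almost certainly does not exist on the target site.

    The prefix mirrors the one seen in the log above; the random hex
    file name is an assumption made for this sketch.
    """
    return f"{prefix}/{secrets.token_hex(8)}.htm"


def classify_response(status_code):
    """Interpret the server's answer to a request for a nonexistent page."""
    if status_code in (404, 410):
        # The server correctly says the page does not exist.
        return "ok"
    if 200 <= status_code < 300:
        # A success status for a made-up URL suggests a "soft 404":
        # the server appears to answer anything it is asked.
        return "soft-404"
    # Redirects, server errors, etc. do not settle the question here.
    return "inconclusive"
```

A crawler doing this would request the probe path and pass the returned status code to `classify_response`. My server answered 404 in both log lines above, so under this sketch it would come out as "ok".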

