

Microsoft's Spam Bot has been Blocked - fallentimes
http://getclicky.com/blog/159/microsofts-spam-bot-has-been-blocked

======
jonknee
Any idea what they are doing? If it's just looking for cloaking, you'd think
it would be better to not have the search.live.com referrer. That makes it
easy to game if you were somehow trying to cloak for the Live bot, which I
don't think is a real problem anyway.

~~~
Harkins
They want to see if the site will attempt to customize a page full of spammy
keywords and ads in response to the perceived interests of the visitor.

It's a smart check to make, but Microsoft Live does it so heavily that it bugs
the hell out of webmasters.

------
fallentimes
I saw this issue come up on a few HN threads. Hopefully other analytics
programs follow Clicky's lead.

------
CalmQuiet
They say, "Anytime we see a referrer from search.live.com, we check if the IP
is in the range of known IP addresses for their crawler, and if so, we just
ignore it."

I filter some stats according to user agent name. Anybody know if these bots
use regular (honest) agent names? Or do I need to start searching for the
"range of known IP addresses for their crawler"? (Anyone care to share such a
range?)

~~~
jonknee
It masquerades as IE 6 last I checked. It's not marked in any way as a bot, so
IP addresses are the way to go.

~~~
schammy
Correct, the user agent in no way indicates it's a bot. Just a standard IE6
UA.
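
For anyone filtering their own stats, the check quoted above (referrer from
search.live.com plus an IP inside the crawler's ranges) could look roughly
like the sketch below. The function name is made up for illustration, and the
networks listed are placeholder documentation ranges, not the real crawler
ranges, which nobody has posted in this thread. You would have to substitute
the ranges you actually observe in your logs.

```python
import ipaddress
from urllib.parse import urlparse

# ASSUMPTION: placeholder networks (RFC 5737 documentation blocks).
# These are NOT the real crawler ranges; replace them with your own list.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_fake_live_referral(referrer: str, remote_ip: str) -> bool:
    """Return True when a hit claims a search.live.com referrer AND
    comes from an IP inside one of the listed crawler networks."""
    host = urlparse(referrer).hostname or ""
    if host != "search.live.com":
        return False  # not a Live referral at all, so leave it alone
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # unparseable IP: do not drop the hit
    return any(addr in net for net in CRAWLER_NETWORKS)


# Example: drop the hit from stats only when both conditions match.
hit = {"referrer": "http://search.live.com/results.aspx?q=test",
       "ip": "192.0.2.17"}
if is_fake_live_referral(hit["referrer"], hit["ip"]):
    pass  # skip recording this hit
```

Since the user agent is a plain IE6 string, it is deliberately not part of the
check. Requiring both the referrer and the IP match keeps real visitors who
arrive from Live search in the stats.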

------
csomar
Webmasters also had this problem with Cuil.

~~~
aristus
Cuil did (does?) a lot of deep searching but that's not the problem here.
MSNbot is pretending to be an IE6-using human, including a _fake_ referer
header from their search engine.

Microsoft have been inflating their search traffic for years via many tricks.
I think they still count every misspelling in IE that kicks to their MSN page
as a "search".

~~~
briansmith
Like Google Chrome?

~~~
aristus
Never used it. Does it really kick you to a Google search page, with paid ads
and everything?

~~~
csomar
Chrome is fast and simple, and it does kick you to Google search; that's a
good point. I vote + for Chrome.

------
ssn
robots.txt won't work?

~~~
Jem
If I remember correctly, it doesn't. The crawler ignores it.

The problem is, some people still want to be indexed by the Live crawler, but
without the spam in the referrals. At least by blocking the crawler purely in
the stats, they still have a chance at ranking with Live without the pain of
sorting through hundreds of fake referrals.

