Hacker News

One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads.

That's not quite what I've been reading. The more common claim is that Google has a disincentive to algorithmically weed out the kind of drivel that exists for no reason other than to make its publisher money via AdSense. It's about aggregate effects, not a failure to clamp down on individual sites. Put another way, it's not that Google goes easier on certain sites because they serve Google ads; it's that this kind of content is usually associated with AdSense.

AdSense is definitely a problem for search quality. It gives the content farm the same imperative Google Search has: get the user to click off the page as soon as possible. And the easiest way to do that is to create high-ranking but unsatisfying content with lots of ad links mixed in.

I agree. It's also interesting that Google defines webspam as "pages that cheat" or "violate search engine quality guidelines." By this definition, scraper sites are not spam at all. Nor are the spammy sites in my field that super-optimize for keywords in ways that make it difficult for legitimate content to rise to visibility.

If Google did not operate AdSense, it seems hard to believe the company would not have penalized this sort of behavior ages ago. A love for AdSense is probably the single largest thing spam sites have in common worldwide.

"By this definition, scraper sites are not spam at all."

Disagree. Our quality guidelines at http://www.google.com/support/webmasters/bin/answer.py?hl=en... say "Don't create multiple pages, subdomains, or domains with substantially duplicate content." Duplicate content can be content copied within the site itself or copied from other sites.

Stack Overflow is a bit of a weird case, by the way, because their content license allowed anyone to copy their content. If they didn't have that license, we could consider the clones of SO to be scraper sites that clearly violate our guidelines.

Smaller competitors can't eat Google's lunch in web search, because all the content that used to be scattered across the web is now on Wikipedia, YouTube, or Google Maps. Personally, I search those directly from the address bar. For the past four years I've had only two use cases for web search: 1. as a spell checker for proper nouns (and before Wolfram Alpha, as a calculator); 2. to circumvent paywalls on scholarly papers by searching filetype:pdf plus the title (works better than Scholar most of the time).
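The filetype trick above can be sketched as a tiny query builder; the helper name and the hand-built search URL are just illustrative, assuming the standard Google search endpoint:

```python
from urllib.parse import quote_plus

def pdf_search_url(title: str) -> str:
    """Build a web-search URL that limits results to PDFs
    matching the paper title as an exact phrase."""
    query = f'filetype:pdf "{title}"'
    return "https://www.google.com/search?q=" + quote_plus(query)

print(pdf_search_url("Attention Is All You Need"))
```

Quoting the title forces an exact-phrase match, and the filetype: operator filters out abstract pages and landing pages in favor of the paper itself.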
