

An effective way for GOOG to punish scrapers? - jonprins

As was suggested elsewhere, there's blacklisting if you're logged in. But that's ripe for abuse.

It would take more horsepower, but Google has plenty of that. I'm sure someone at Google has already thought of this, is thinking about implementing it, or has dismissed it as impossible or ineffective, but I wanted to throw it out there to see what HN thinks.

Determine the canonical source: in this case, a Stack Overflow post. Each site that scrapes the content from that Stack Overflow post increases the search rank of the Stack Overflow post itself.

On one hand, it would boost the rank of any original content that gets recycled and spread across the 'net: reblogged Tumblr posts, retweeted tweets.

On the other hand, it puts another tool in the hands of black hats.

Thoughts?
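Edit: to make the mechanism concrete, here's a rough sketch of what I have in mind. All the names are made up, and the "earliest crawl wins" rule is only my assumption about how a canonical source might be picked; this is just to illustrate the boost-per-copy idea, not how Google actually works.

    from dataclasses import dataclass

    @dataclass
    class Page:
        url: str
        content_hash: str      # fingerprint of the page's main text
        first_crawled: float   # unix timestamp of the first crawl
        rank: float = 1.0

    def boost_canonical_sources(pages, boost_per_copy=0.1):
        """Group pages by content fingerprint, treat the earliest-crawled
        page in each group as the canonical source, and raise its rank
        a little for every duplicate (scraped copy) found."""
        groups = {}
        for page in pages:
            groups.setdefault(page.content_hash, []).append(page)
        for copies in groups.values():
            if len(copies) < 2:
                continue  # no scraped copies of this content
            canonical = min(copies, key=lambda p: p.first_crawled)
            canonical.rank += boost_per_copy * (len(copies) - 1)

So the more autoscraper sites copy the Stack Overflow post, the higher the original climbs; the copies themselves gain nothing.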
======
mooism2
_Determine the canonical source._

Well, that's the hard bit, isn't it? What are the consequences of getting it
wrong? If Google bans my site from their index because it thinks I stole my
content from a scraper, that's going to be hard to take.

~~~
eof
Not really. Sure, if they auto-ban you it sucks, but how hard is it to post
something, inform Google of it, and wait x hours for it to show up on some
autoscraper site? (Of course, this requires Google's cooperation.)

Edit: it seems like a good solution from Google's POV would be to _inform_
people of their impending ban and give them the chance to defend
themselves by posting original content that then gets scraped elsewhere.

~~~
mooism2
Surely it's not that hard for a scraper to post something original, inform
Google, wait X hours, then plant it on StackOverflow/wherever?

~~~
meric
If the parasite kills its host...

~~~
mooism2
A parasite that kills its only host didn't have enough hosts.

------
sambeau
Not all scraping is bad: some scrapers provide extra services, such as search.

Google itself is a scraper.

