
Attack of the Splogs--One Of Our Posts Copied 152 Times Without Attribution - davidw
http://www.techcrunch.com/2007/11/09/attack-of-the-splogs%e2%80%94one-of-our-posts-copied-152-times-without-attribution/
======
davidw
I have a suspicion that Google makes a shitload of money off of spam sites. I
have a friend who makes upwards of 1000 dollars a month off of scraped
content.

If I were 'economically rational', I ought to be doing the same thing instead
of screwing around with things like langpop and squeezed books.

~~~
nickb
I've been reading a certain blog that deals with 'adsensing' and making money
on the web through joint ventures, arbitrage, etc. There are people out there
who make $100K a month (and some that make over $1M a month) using all kinds
of schemes. Yes, Google makes a ton of money off these guys as well. I've
never tried any of these things myself...

------
mynameishere
How many largish bloggers are there? 10,000 maybe? Certainly, once the
PageRank hits a certain point, keeping a human-verified whitelist of sites is
doable.

That is, if something is getting a high enough position on Google, a Google
rep could look for a phone number and verify that a human, not a bot, is
behind the content.

Of course, this will hurt short-term ad revenue...

~~~
jakewolf
All it would take is a Google toolbar button: "kill the splog".

~~~
cellis
You forget something: people who download said killsplog feature are not
going to be running into splogs often (I don't). Other than that, I think it's
just fine... good for the economy. Eventually (as Eric S. said), advertisers
will wise up and start paying less. If they don't, good for Google.

~~~
jakewolf
Or angry blog owners would hunt down the sploggers for revenge.

------
jakewolf
Spammers are also using Amazon Mechanical Turk to reword posts and articles in
dozens of different ways. This helps them get around services such as
Attributor.

SLEAZY!

~~~
nickb
I've seen software (I think they charge over $4K per install) that uses hidden
Markov chains, etc. to create content that looks like it was written by humans.
You give it a few keywords that you'd like to optimize your site/pages for and
this thing does the rest. Google does ban a few of them, but they're getting
more sophisticated with every new version, so Google has a _really_ hard time
keeping up.
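
For anyone curious what the Markov part looks like, here is a minimal sketch
of a plain word-level Markov chain text generator in Python. It is not the
commercial software mentioned above (its internals aren't known here), and it
is an ordinary Markov chain rather than a hidden one. The names `build_chain`
and `generate`, the `order` parameter, and the sample corpus are all made up
for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain, producing at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:length])


if __name__ == "__main__":
    # Hypothetical toy corpus; a real generator would be fed far more text.
    corpus = (
        "spam sites copy blog posts and spam sites reword blog posts "
        "and spam sites sell ads next to copied blog posts"
    )
    print(generate(build_chain(corpus), length=15, seed=7))
```

With a corpus this small the output is mostly recombined fragments of the
input; it only starts to read as plausible prose when trained on a lot more
text.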

~~~
dpapathanasiou
I've read about that software, too, although from a research perspective:
<http://www.eblong.com/zarf/markov/>

I would have never thought to use it for spam generation, though... I have to
admit (grudgingly) that some of those spammers are clever.

------
henning
Mm, so what I understand is that Google has duplicate content detection to
punish people who do this kind of thing.

However! If you have a high-ranking site and you scrape from a low-ranking
site, Google might index you first and think the originating site was stealing
"your" content!

And then there's the fact that it wouldn't be hard to replace words with
synonyms, chop up multiple articles, etc., so that Google probably wouldn't
detect it. This is what spammers do.
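
To make the synonym point concrete, here is a rough sketch of one textbook
approach to near-duplicate detection: break each text into overlapping word
"shingles" and compare the sets with Jaccard similarity. This is only an
assumption about how such a check might work, not a description of what
Google actually does; `shingles`, `similarity`, and the sample sentences are
invented for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = ("Google has duplicate content detection to punish "
                "people who do this kind of thing")
    # Same sentence with three words swapped for synonyms.
    reworded = ("Google has duplicate content detection to penalize "
                "folks who do this sort of thing")
    print("verbatim copy:", similarity(original, original))
    print("synonym swap: ", similarity(original, reworded))
```

Swapping only three words out of fourteen breaks most of the three-word
shingles, so the score for the reworded copy falls well below that of a
verbatim copy. A naive check like this is easy to slip past, which is the
point made above.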

