
How Organized Spam is Taking Control of Google's Search Results - JoelSutherland
http://www.seomoz.org/blog/how-organized-crime-is-taking-control-of-googles-search-results
======
PaulHoule
Part of the problem with "spammy" content coming out on top is often the
competition from real content is pretty thin.

To take the example of Pandora jewelry, Pandora is a company that controls
its marketing channels with an iron fist. They're very careful to partner
with better-than-average jewelers and each retailer has an exclusive
territory. So far as I know, there's no legitimate channel for new Pandora
products online (everybody who claims to sell them on eBay seems to have fewer
than 10 feedbacks.)

Thus, other than Pandora's official website there's no legitimate e-commerce
presence for Pandora online, so there's nothing to compete with the junk.
Somebody might randomly write about them, but there's nobody (legitimate)
who's got a feedback loop going where revenue supports content creation and
marketing efforts -- which will inevitably come out on top against amateur
competition.

Demand Media, ExpertsExchange and quite a few junk sites similarly thrive on
the lack of good content. I was having trouble changing the ribbon on an old
typewriter a few weeks ago, and web searches asking about this particular
model turned up junk pages with advice like:

(1) Buy a new typewriter ribbon, (2) Take the old typewriter ribbon out, (3)
Put the new typewriter ribbon in

Now, these pages were keyword stuffed with the name of the typewriter, but
they didn't even bother to have an affiliate (or other) link to a place where
I could buy the goddamn typewriter ribbon, which according to them is 33% of
the work!

Once more, the feedback loop doesn't exist to nourish a good answer here, so
of course the blight is going to move in.

~~~
barrkel
To reinforce the feedback angle, I think it's important to point out that the
weeds starve out the opportunity for genuine content to grow into a valuable
audience. Even when there's real content that wants to compete, the risk is
that the upfront investment required to peek out above all the crap exceeds the
profit from serving that niche.

~~~
gojomo
[negative allelopathy]

And, in those cases where the content mills do have a few morsels of useful
information, they've usually just pushed other less professionally/cynically
optimized sources for the same info down off the first page of results.

------
jasonkester
I run a blog host, so I get to see these spammers at work. Every day, they
sign up for several hundred new accounts and post informative articles on how
to find NFL Jerseys, Ugg Boots and Tiffany Jewelry, all with plenty of links
back to sites like the one in the article.

The scary thing is that it's not automated. There are real people pasting in
content and checking to see that it's correct. Fortunately for me, it's all
going straight into my Bayesian filter's spam corpus and making it easier to
detect, but even for my one site it must be costing somebody a lot of money to
post it all.

If Google had an API to report this stuff, I'd be happy to forward it along to
them on the fly. Seems that there are plenty of user-generated content sites
like mine with a ton of valuable spam data if anybody figured out a way to use
it.

If anybody's interested, here's what we're doing to keep the site spam free:

[http://expatsoftware.com/articles/2010/03/care-and-feeding-o...](http://expatsoftware.com/articles/2010/03/care-and-feeding-of-happy-spammer.html)
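For anyone curious what "feeding the Bayesian filter's spam corpus" amounts to, here is a minimal sketch of a multinomial naive Bayes classifier in Python. It is not the filter this blog host actually runs (its internals aren't described in the comment); the class `NaiveBayesSpamFilter`, the tokenizer, and the training strings are all hypothetical. Each newly caught spam post goes through `train(..., "spam")`, which is why every batch the spammers post makes the next batch easier to catch.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple: lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        # label must be "spam" or "ham"; every caught spam post grows the corpus.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior for the class...
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # ...plus the smoothed log likelihood of every token in the post.
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))), clamped to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap nfl jerseys ugg boots tiffany jewelry discount", "spam")
spam_filter.train("buy cheap ugg boots online discount nfl jerseys", "spam")
spam_filter.train("how I fixed the ribbon on my old typewriter", "ham")
spam_filter.train("notes from our python meetup on search ranking", "ham")
print(spam_filter.spam_probability("discount ugg boots and nfl jerseys"))
print(spam_filter.spam_probability("typewriter ribbon notes"))
```

A production filter would likely look at more than bare words (links, account age, posting rate), but the train-on-every-catch loop is the part that makes volume work against the spammers.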

~~~
Osiris
That gives me an idea. What if Google were to provide free anti-spam tools
like Akismet that integrate with forums, blogs, and wikis? They could detect
the spam patterns and essentially blacklist those spam sites from the search
engine. With enough sites using their tools, Google could build a significant
dataset of what these sites are trying to do to generate spam links.

Wouldn't detecting who's trying to generate link spam be a fairly effective way
of removing them from search engine results?

------
davidmathers
I ran across this a few months ago myself while searching for bicycle info. Note
that this is completely different than the spam content (SO, eFreedom, etc.)
issue.

I wanted to learn more about the Bianchi Infinito and what I found was a
number of web stores selling bicycles at impossible prices. I mean osCommerce
or Zen Cart (or whatever) instances of legitimate-looking web stores. Then when
you look closer they're mostly in Indonesia and in order to complete the
purchase you have to bank wire the money.

I think it's spilling over from alibaba and similar sites where 99% of the
vendors are scams. Now those vendors are creating whole ecommerce web
presences to make their scam sales.

The stores I saw didn't usually make the first page of results. Usually third
or fourth. Sometimes second. Anyway, I was a bit shocked at how many fake
stores there were and how they ranked as highly as many legitimate bike shops.

~~~
VMG
I think I would be interested in having the option to filter out _all_ search
results for shopping sites. There already is Google Product Search for that.

------
res0nat0r
Funny this is being posted the same day as Matt Cutts' HN post which is
currently at #1.

Also per exhibit one of the article: the first hit for "nfl jerseys" I get,
even with pws=0, is nflshop.com, the website that nfl.com links you to when
you click on the "shop" link.

More bandwagon jumping about google spam being out of control? I like to think
so.

~~~
PaulHoule
Note that Cutts is talking about a different problem: sites that
'syndicate' content and end up ranking better than the original site. People
have been complaining about this for years (usually third-tier bloggers who
don't have much ranking power) but people perceived that this became a crisis
in the last few months.

The morals are different too. Some people might not like eFreedom, but
the fact is that StackOverflow is CC-BY-SA. Anybody who wants to repackage
StackOverflow content in a different way is free to do that. I do think that
StackOverflow should generally outrank eFreedom, but a site like eFreedom can
potentially add a lot of value.

On the other hand, other spam sites are generating original crap content with
their own crap content generation system... And if they aren't, they can
switch to some other content generation method to get around duplicate content
filtering.

(And speaking of which, duplicate content filtering of some kind is
absolutely essential for a workable web search engine... It's not even a
matter of spam. Building a search engine for one of the largest units of a large
university, we found that there were many documents that were duplicated all over the
place for all sorts of reasons, and that since the on-page factors are the
same, these tend to form 'plugs' of search results that displace other
results.)
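To make the "plugs" point concrete, here is a minimal Python sketch of one common near-duplicate technique, w-shingling with Jaccard similarity. It is not necessarily what the university engine used (the comment doesn't say); the 4-word shingle size, the 0.8 threshold, the `collapse_duplicates` helper, and the example.edu URLs are illustrative assumptions. It keeps the highest-ranked copy of each document and drops later near-copies, so duplicates can't crowd out other results.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    # Every run of k consecutive words; short texts become a single shingle.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty documents count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_duplicates(results, threshold=0.8, k=4):
    """results: (url, text) pairs in rank order. Keep the first of each near-duplicate group."""
    kept, kept_shingles = [], []
    for url, text in results:
        s = shingles(text, k)
        if any(jaccard(s, prev) >= threshold for prev in kept_shingles):
            continue  # a higher-ranked copy is already in the list
        kept.append(url)
        kept_shingles.append(s)
    return kept


results = [
    ("https://example.edu/policy",
     "Students must register for courses before the add drop deadline each semester"),
    ("https://example.edu/mirror/policy",
     "Students must register for courses before the add drop deadline each semester"),
    ("https://example.edu/housing",
     "Residence hall applications open in March for the following academic year"),
]
print(collapse_duplicates(results))
# ['https://example.edu/policy', 'https://example.edu/housing']
```

Comparing every pair like this doesn't scale to a web index; large engines approximate the comparison with fingerprinting schemes such as MinHash or SimHash.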

~~~
logjam
"...but a site like eFreedom can potentially add a lot of value."

Genuine question here...are you talking specifically about eFreedom, and if so
exactly what value does it add? When I've inadvertently stumbled in there, the
questions and answers are an exact ripoff of SO, and I (and I suspect everyone
else) just immediately click on the "from StackOverflow" link so all the
responses in the original can be read.

~~~
PaulHoule
For one thing, eFreedom.com actually answers your question. This is different
from ExpertsExchange (which promises you might get an answer if you fork over
$, yeah right) or eHow which only sometimes answers your question, and if it
does, does the worst possible job that could possibly be done.

Community sites, at least in their early phases, need to focus on getting
people to put content in more than they need to focus on making it easy for
people to get it out. Delicious is the classic example: it's a roach motel
which makes it very easy to put your bookmarks in, but doesn't provide a
useful browsing interface for your own and other people's bookmarks (other than
having a list of recently hot bookmarks for various tags.)

Particularly in the semantic age I think there's a lot of room for remixing CC
content to improve browsing and discoverability.

~~~
rhizome
I don't understand this. Are you saying that eFreedom adds original content to
that which they scrape? I avoid them like the plague, but my exposure has
taught me that they are merely reprinting SO content with crappy formatting.
Not much of a value-add in my eyes.

~~~
PaulHoule
No, I'm not really defending eFreedom. However, I think that sites that are
_like_ eFreedom in some ways can be useful. For instance, large scale text
mining could create things that are more than the sum of their parts.

For example, I think within 10-20 years at the most we'll have systems that
can decompose text into facts and then reassemble it into 'original' text.

~~~
rhizome
Actually there are link farms that are doing exactly that in order to appear
to robots to be original text. However, it's just chunks of text "mined" into
a mass of subject-focused sentence fragments. That's the thing, the race to
the bottom is: original content is scraped without improvement in order to pay
someone else via ads, and original content is generated without regard to
coherence in order to pay someone via ads.

Two ways to do this: you have good content either left intact (no value-add)
or rearranged or otherwise structurally corrupted in order to appear to be a
different/better answer (value-minus), or you have advertisers being led to
believe their ads are showing on relevant content, when it's really just a
jumble of random words loosely oriented around a concept. "The dog was dog
walking. Dog food always is in the grocery store. RALEY's. It dogged him for
years..." so on and so forth.

On one hand users are being defrauded and on the other, the
advertisers/affiliates. There is no defense for eFreedom, nabble, mail-
archive, and their ilk. They are bad people, bad for business and bad for the
internet. I sincerely believe this.

------
jonknee
I see NFLShop.com as #1, but some of the rest appear to be spam (in some cases
it's hard to tell if they are legitimate or not). I looked for legitimate
sellers of NFL jerseys and by and large they have terrible SEO. I don't know
how companies haven't gotten wise, but check out FinishLine.com's NFL jerseys
landing page:

[http://www.finishline.com/store/shop/nfl/nfl-jerseys/_/N-2z7...](http://www.finishline.com/store/shop/nfl/nfl-jerseys/_/N-2z7k7?categoryId=cat10015)

They have "nfl-jerseys" in the URL which is about the only redeeming thing.
The page title, which is what would show in a SERP, is unrelated. I clicked on
the top result, a women's Ben Roethlisberger jersey and the page has nearly
zero information and the images 404 (!).

[http://www.finishline.com/store/product/reebok-womens-pittsb...](http://www.finishline.com/store/product/reebok-womens-pittsburgh-steelers-ben-roethlisberger-replica-jersey/_/A-22814?categoryId=cat10015&productId=prod602846)

Compare that to one of the results that comes up in Google and you can see
why. Great titles and URLs, and the filters don't require forms.

------
kqueue
<http://duckduckgo.com/?q=nfl+jerseys>

I believe Google's results are less spammy than duckduckgo for this particular
query.

~~~
luigi
Bing has similarly poor results. This isn't a Google problem -- it's endemic
to all search engines.

~~~
kqueue
Agreed. The article's title deliberately said Google, as if the other engines
are doing a better job.

~~~
reinhardt
No, the article's title said Google as if the other engines do not exist / are
irrelevant.

------
shawndrost
"Google needs to greatly lower the value of keyword-rich anchor texts."

Won't this have a lot of adverse effects? And if keywords in anchor text
become less valuable, can't spammers compensate by ramping up their existing
efforts?

"I would not be surprised to see Google shift even more ranking signal power
from anchor-text heavy links to relevant social media “chatter”."

Why would this be harder to game than links?

Spam happens because search is hard. There are probably solutions, but they're
not as easy to come by as the ones suggested in tfa. Still, it's good to see
this sort of community feedback on search results, especially given how
responsive the search team is to this sort of thing. Keep up the good work,
guys.

~~~
VMG
There might be an AI solution to discriminating real social media chatter from
fake content.

------
pixcavator
Organized _crime_? [http://www.seomoz.org/blog/how-organized-crime-is-taking-con...](http://www.seomoz.org/blog/how-organized-crime-is-taking-control-of-googles-search-results)

~~~
yaix
Yes, they sell copied stuff. Here in China there are huge organizations
handling these things. No surprise that in some of the examples the wording
sounds very Chinese and even some Chinese characters appear. I am pretty
certain where those guys are located.

~~~
wingo
Contraband and counterfeits have long been a source of funding for gangs --
both sides of the Northern Irish conflict, ETA in the Basque country has a
commercial wing, and I'm sure the more traditional mafias are into this sort
of thing too.

------
moultano
There's a change slowly rolling out that improves the [nfl jerseys] query
substantially. I'll check on the others. Thanks for the examples.

Some really dramatic changes to how we use links are on the way. (Sorry I
can't say anything more specific. This is a really sensitive area.)

------
DanielBMarkham
Warning: contrarian rant ahead.

Something has been bugging me for a while, and it took a few hours after I
read this article to figure out what it was.

I love the coining of a new term: "organized spam", and I love calling out
things that are wrong, but I wonder if we're not taking this crime metaphor a
bit too far.

Look guys, it's a search engine. You type in a search term, it gives you
results. There's nothing magic or special about it -- anybody with a smidgen
of database training can make one (although nowhere near as good as Google's, granted)

Although some of these examples involve people ripping other people off, I get
the feeling that somehow Google has become such a part of our lives that we
feel as if somehow these folks trading links and trying to get attention are
acting criminally. That anything that gets in the way of my getting instant
information is a crime against humanity. That really bugs me.

It's not. Get over yourself. Sure, large parts of this may be well-funded, but
there's nothing necessarily criminal going on. For instance lots of poor
people in lots of third-world countries are making money dropping by my blog
each day and telling me how awesome I am. It's not expected, but I'm happy
they're making a few dollars. I can live with the inconvenience or try to fix
it on my end. I don't need to blame them.

I don't like the state of Google search right now either, although I'm still a
loyal customer. But what I see in the marketplace is humans reacting logically
to their best interests. If you're going to monetize Google search so that
billions of dollars flow through it, there are going to be some ancillary
effects that nobody predicted. Instead of blaming the people, understand that
the people are just regular, intelligent folks doing the best they can. Hell,
my wife is in a social group with a lady who made several thousand dollars
adding advertiser text to her blogs -- until Google delisted her. She saw
nothing wrong with it, and still is pretty pissed at Google. From her
standpoint Google crapped all over her party.

And yes, Google has every right to delist sites and such. More power to them.
I hope they continue to delist and evolve their search engine. I hope they get
a handle on this. But I think we should all separate our well-wishes for
Google's success from our opinions of our fellow man. I've heard linkspammers
and spammers called "subhuman" and all sorts of nasty things. While there are
criminals who are trying to rip you off, there's no evidence that there are
more criminals on the web than anywhere else. Most of these people are trying
to make a living. The fact they might inconvenience you on your way to get an
answer to a technical question or find the latest mp3 you have to have is
really not that high on their list of priorities -- nor should it be.

Google needs to do a better job. Period. There seems to be this "conversation
machine" right now where people post articles showing how bad search is, then
folks come out and rant, then Google makes an announcement. Repeat and rinse.
It's as if we went down to the local newsstand and asked the grocer for a
magazine on trucks. He gives us a bunch of magazines on boats, so -- we blame
the magazine publishers! It's simply not logical. A little perspective,
please. Google is the provider here and those of us who like them should try
to help out. But we shouldn't cross the line into thinking that anybody that
annoys Google or searchers is somehow evil or criminal. That's crazy. Much
better to understand people as rational actors than to demonize anybody who
tricks some random American internet company.

</rant>

~~~
coderdude
I'm not going to comment on your entire post, nor am I against what you're
saying, but I did want to comment on this bit:

>It's as if we went down to the local newsstand and asked the grocer for a
magazine on trucks. He gives us a bunch of magazines on boats, so -- we blame
the magazine publishers!

This analogy would be more true to the situation at hand if you say that the
magazine publishers are using methods that they know will increase their
chances of getting boat magazines in front of your eyes when you're seeking
truck magazines. Do you think my assertion is off-base?

~~~
DanielBMarkham
Yes I do, and here's why.

The magazine publishers are free to configure their magazines and the world
around them in any way they wish. The newsstand operator is responsible for
what goes on inside his stand. If he's serving up junk, do we go blaming the
rest of the world for the quality of his service?

Somehow we've taken Google out of the picture as an independent agent. It's as
if whatever program they are running is somehow golden, and if outsiders
change the inputs Google uses so that it doesn't work correctly, then
somehow the outsiders are at fault. But outsiders don't set the inputs --
Google does. Outsiders don't write the ranking algorithm -- Google does.
Outsiders don't make money from having ads alongside search results and
tracking individuals' search behavior -- Google does. Outsiders are free to do
whatever they want -- that's the entire reason for picking one search provider
over another, the fact that one engine can take the world as it is and do a
better job of organizing it than another one can.

If we don't expect Google to be responsible for how they process data -- if we
somehow take Google's poor results and put the blame on the world at large,
then exactly what of value is Google providing here in our relationship?

Like I said, I'm a fan. I want them to do well. I'm happy to help if I can.
But hell if I'm going to let Google off the hook for providing good search
results simply because the nature of the internet has changed. Things change.
That's what they're supposed to do.

This is like writing a web app that is open to SQL injection attacks and then
getting pissed at everybody else when they crash your system. Except there's
one big difference: with an SQL injection attack there is an outsider directly
interacting with your system, perhaps malevolently. With Google, _outsiders
don't even enter data in, Google goes and gets it_. We've got the shoe on the
wrong foot, as my mother used to say.

~~~
j_baker
I understand what you're getting at, but you're _way_ off the mark here:

> The magazine publishers are free to configure their magazines and the world
> around them in any way they wish. The newsstand operator is responsible for
> what goes on inside his stand. If he's serving up junk, do we go blaming the
> rest of the world for the quality of his service?

There's a difference between selling junk and selling something that's
obviously criminal. Suppose you walked into a store where every magazine read
"VIAGRA - 50% OFF. MAIL US YOUR MONEY". Do you honestly mean to tell me that
the magazines in question were perfectly ok and it was the magazine vendor who
did something wrong?

It's good that you realize that it's humans that are committing crimes, and
not subhuman beings. But that doesn't excuse them, nor should you.

Yes, Google has some level of responsibility here and they should be held
accountable. But they're not the ones actually committing the crime.

------
mythobit
I know this isn't a perfect solution. But I made this site that uses Google's
Custom Search to allow you to maintain your own blacklist so that you can
filter out sites you don't want displayed. Here's the link:
<http://blacklist-search.appspot.com/>
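At its core a personal blacklist is just a filter over result URLs. The sketch below is a hypothetical client-side version in Python, not how blacklist-search.appspot.com is implemented (its internals aren't described here); `is_blacklisted`, `filter_results`, and the sample spam URL are made up for illustration. It drops a blacklisted domain and any of its subdomains.

```python
from urllib.parse import urlsplit


def is_blacklisted(url: str, blacklist: set[str]) -> bool:
    """True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blacklist)


def filter_results(urls: list[str], blacklist: set[str]) -> list[str]:
    # Preserve the engine's ranking order; just skip blacklisted hosts.
    return [u for u in urls if not is_blacklisted(u, blacklist)]


urls = [
    "http://www.nflshop.com/category/index.jsp?categoryId=2237409",
    "http://cheap-jerseys.example.com/nfl",
    "http://www.footballfanatics.com/NFL_Jerseys",
]
print(filter_results(urls, {"example.com"}))
```

Matching on the `.`-prefixed suffix rather than a plain `endswith(d)` keeps a rule for `example.com` from accidentally hiding an unrelated host like `notexample.com`.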

------
underdown
How is "greatly reducing the value of anchor text" going to improve search?
Didn't we all start using google because anchor text was a great ranking
signal? It seems the appropriate course of action is to devalue links from
sites that either ignorantly or willfully pollute the link-o-sphere.

------
nvictor
I agree with people who are complaining. This morning I was looking for a
docking station for my Cowon MP3 player. That spammy-ass website called
techframe kept showing up 3 times in the first few results. It was annoying.

------
EGreg
I guess Google shows different things to different people. My search for NFL
Jerseys for example, seems just fine:

<http://grab.by/8DYK>

What do you think?

~~~
spullara
Except for the first one, those are the spam jersey sites.

~~~
magicalist
Well, the results can't all be the first one repeated over and over.

if you search for 'nfl jerseys' you're probably looking to buy a jersey, and
at least a few of those (eg football fanatics) do in fact look like legitimate
stores.

~~~
tesseract
What if I'm trying to find a sports geek's blog post about the history of NFL
jerseys, or something like that? My personal problem with all the search
engines is that legitimately interesting/useful amateur content, or even
things like mainstream news articles, gets lost in a sea of sites that are
trying to sell me stuff when my query could be construed as even remotely
commercial. Unfortunately this is a trend I don't see changing, because (a)
the hawkers have more expertise and resources than the bloggers when it comes
to SEO, and (b) the search engine itself benefits financially by assuming I
want to buy things and showing results accordingly (especially if it's Google
due to AdSense).

------
gregable
Looking at the specific examples:

[nfl jerseys]

#1) <http://www.nflshop.com/category/index.jsp?categoryId=2237409>

Visit nfl.com, click "shop", then choose the "jerseys" tab, this is the page
you are on. Seems _perfectly_ relevant. The domain does not contain "jerseys"
in it, and while the title does, it's the Jerseys category page for the
NFL's shopping website, so that makes sense. Hardly spam.

#2) <http://www.footballfanatics.com/NFL_Jerseys>

Visit www.clc.com, the collegiate licensing company, click
retailers->collegiate retail outlets, Football Fanatics is one of 13 licensed
collegiate retailers. Most major college universities sell their football
merchandise through them. It's been around (run whois) since 1997, 14 years!
Perhaps it's not ideal for NFL (non-college), but it's definitely Not Spam.

Unfortunately, below this some of the results do start getting ugly - there
aren't too many online retailers that can legally sell NFL merchandise. Even
Amazon is just a storefront for the NFL Shop (see
<http://www.amazon.com/NFL-Football-Fans/b?node=374273011>). That might make it a good result, but it's
essentially duplicated content given the NFL Shop result.

[pandora jewelry]

#1/#2) Pandora.net, totally not spam. This is the "type in [amazon], get
amazon.com" kind of result.

#2.5) Below the second result I see a shopping results box which has only
pandora jewelry from authorized retailers.

#3) <http://www.pandoramoa.com/>: the Pandora Mall of America stores.
Authorized Pandora retailer. The domain has been around since 2007 (4 years).

Below this, the rest is getting ugly. Similar to [nfl jerseys], there aren't
many online retailers legally able to sell pandora jewelry, so once Google has
listed the only 3 good results available, what do you want them to do? Try
[jewelry] or [necklaces] - queries where there are lots of legit destinations
and the top 10 results are all non-spammy.

[thomas sabo]

#1/#2) ThomasSabo.com, just like [pandora jewelry], this is exactly what 99%
of the people with this query want.

Same story for the nonexistent good results below.

These 3 queries are a very specific type of query where there are only one or
two relevant results, but there are lots of sites that "match" the query. I'm
not saying the rankings after the first few relevant results are good, but
what would you propose to show after those relevant results as an alternative?

Writing an article about a specific class of queries is fine, although the
author doesn't really propose a better set of results. The implication made is
that this issue applies to a broad set of queries, which it doesn't seem to.
Ironically, the author's signature line is a link to
<http://www.tomsgutscheine.de/>, whose title translated to English appears to
be: "Coupons, Coupon Codes & Coupons (January 2011) - Tom's Coupons".

