

Google News gets gamed by a crappy content farm - lotusleaf1987
http://www.salon.com/news/feature/2010/08/20/associated_content_google_news_open2010/index.html

======
raganwald
The last time I whined about this kind of thing, someone on HN patiently
explained to me that this is the "invisible hand" at work. A certain number of
people click a spam link, ignore the content, and try another link. Some click
through on an ad, and that's a win for Google. Some get disgusted with stuff
like this, Mahalo, and so forth, and those users defect to another engine.

As long as the increase in revenue outweighs the loss of users, Google has no
incentive to change. In fact, they have an incentive to make their search
"worse."

There is a stereotype of an MBA who runs a business with a spreadsheet,
firing people who embody the company's institutional knowledge and
outsourcing their "function" to body shops in another time zone. The
stereotypical numbers-driven MBA carefully manipulates prices and product
supply to maximize revenues, balancing price discrimination against the loss
of customers.

The end result is a company that is hostile to its customers. Cell phone
companies and airlines come to mind. I would compare this to Google, except
it's even worse in a sense, because users are not Google's customers.
Advertisers are Google's customers.

There must be voices within Google arguing passionately for improving the
quality of its results from the user's perspective. I imagine that the
response from management can be taken straight out of the movie "Blade":

        For fuck's sake, these people are our food, not our allies.

~~~
CWuestefeld
The problem with your hypothetical MBA is that he thinks he understands value
because he's got numbers in a spreadsheet -- but unfortunately that data is
incomplete.

But Google are the kings of data mining. I would expect that Google would (a)
understand the information missing from their "facts"; and (b) have the brains
and data available to uncover that missing information -- if they wanted to.

------
_delirium
The DuckDuckGo approach of just blacklisting a few of the larger and more
egregious content farms seems like a decent band-aid. It's not a real fix, but
it cuts out a large number of the bad cases, since at least for me, a handful
of large content farms account for most of the times when I've accidentally
clicked on a Google result and then realized "oh ugh, it's one of these
sites". Spammy blogs are the other big problem, but there's no easy band-aid
for that one.

~~~
w00pla
> just blacklisting a few of the larger and more egregious content farms seems
> like a decent band-aid.

The real band-aid would be to enable users to blacklist sites that they don't
want to see. I often search for things and come across the same spam sites
that I do not want, over and over.

After that, a distributed trust model should be built so that users can share
blacklists of spam sites.
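A minimal sketch of how a personal blacklist plus trust-weighted shared blacklists might combine, assuming each user assigns a numeric trust weight to other users and a shared entry only takes effect once enough trusted users agree. The function names, the user ids, the example domains, the weights, and the threshold of 1.0 are all hypothetical; nothing here is an existing search-engine feature.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lowercased host name of a URL, with any leading 'www.' stripped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def blocked_domains(personal, shared, trust, threshold=1.0):
    """Combine a personal blacklist with blacklists shared by other users.

    personal:  set of domains this user has blocked.
    shared:    dict of user id -> set of domains that user has blocked.
    trust:     dict of user id -> how much this user trusts them (hypothetical
               weights; unknown users count as 0).
    A shared domain is blocked once the summed trust of the users who list it
    reaches `threshold`.
    """
    votes = {}
    for user, domains in shared.items():
        for domain in domains:
            votes[domain] = votes.get(domain, 0.0) + trust.get(user, 0.0)
    return set(personal) | {d for d, weight in votes.items() if weight >= threshold}


def filter_results(urls, blocked):
    """Drop search results whose domain is on the combined blacklist."""
    return [url for url in urls if domain_of(url) not in blocked]


blocked = blocked_domains(
    personal={"eudict.com"},
    shared={"alice": {"contentfarm.example"},
            "bob": {"contentfarm.example", "news.example"}},
    trust={"alice": 0.6, "bob": 0.5},
)
results = ["http://www.eudict.com/search?q=foo",
           "https://contentfarm.example/article",
           "https://news.example/story"]
print(filter_results(results, blocked))  # ['https://news.example/story']
```

Requiring several trusted users to agree before a shared entry applies is one way to keep a single bad list from hiding legitimate sites; a user's own blacklist always applies.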

~~~
BonesLF
Click settings on Google News and choose the sites you want less content from.

Blacklists are awesome... but no one but "us" knows how to use them.

~~~
w00pla
It's not Google News but sites that aggregate search terms. A good example
is www.eudict.com, which repeatedly puts up other people's search terms as real
words.

(If you are searching for some words, you are flooded with EUDicks search
spam.)

------
patio11
Playing devil's advocate: tell me what, specifically, you get from the Kansas
City Tribune's warmed over regurgitation of a wire service report that you
don't get from Associated Content here?

Darn it guys, paying marginally literate people who know almost nothing to
scrape together a few words and throw ads against them was _our_ business
model. Go get your own!

~~~
alecco
Perhaps it's time for a content mill focused on quality. But then it
wouldn't benefit from the trick of making people click on ads because the
information is incomplete.

------
jogle
I find that most often the "spam" that comes up in searches is on the
user-submit-enabled sites -- vimeo, myplick, authorstream, slideshare, scribd, etc.
-- and it's their lack of moderation that is the issue. (Do a search right now
for "watch colts vs packers online" and see what I mean.)

Blacklisting those entire sites would seem to be a real disservice to the
legitimate content that is uploaded, but then again, if the site
administrators can't keep their content in order, why are they getting crawled
by the G so often anyway?

In many cases, the real deal is this: the spam created on those sites at
certain times is foreseeable and follows a particular pattern -- it's built
around news or events, such as sports, movies, sex tape outings, etc. -- so it
would seem easy to get rid of the spam before it hits the search engines,
right? One option is simply to expect an influx during those periods and
increase moderation then. Another option is to prevent new users from
publishing content until at least one of their submissions has been approved
by a moderator as non-spam.
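The two moderation options above could be combined into a single routing rule for uploads, sketched here under stated assumptions: "increase moderation" is read as "hold every upload for review" during a predefined high-risk window, and "new user" means someone with no moderator-approved uploads yet. The `User` class, `route_upload`, the window dates, and the threshold are all hypothetical illustrations, not any site's actual policy.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class User:
    name: str
    approved_uploads: int  # uploads a moderator has already approved as non-spam


# Hypothetical window around a big televised game, when a spam influx is expected.
HIGH_RISK_WINDOWS = [
    (datetime(2010, 9, 26, 12, 0), datetime(2010, 9, 27, 2, 0)),
]


def in_high_risk_window(when, windows=HIGH_RISK_WINDOWS):
    """True if `when` falls inside any (start, end) window, inclusive."""
    return any(start <= when <= end for start, end in windows)


def route_upload(user, when, min_approved=1):
    """Decide whether a new upload goes live now or waits for a moderator."""
    if user.approved_uploads < min_approved:
        return "moderate"  # new user: nothing approved yet, so hold it
    if in_high_risk_window(when):
        return "moderate"  # expected spam influx: review everything
    return "publish"


newcomer = User("newcomer", approved_uploads=0)
regular = User("regular", approved_uploads=12)
quiet_day = datetime(2010, 9, 20, 15, 0)
game_day = datetime(2010, 9, 26, 20, 0)
print(route_upload(newcomer, quiet_day))  # moderate
print(route_upload(regular, quiet_day))   # publish
print(route_upload(regular, game_day))    # moderate
```

A gentler variant would only hold uploads during the window whose titles match the event's keywords, which would spare unrelated legitimate content.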

But that would mean those sites don't get the impressions... and
impressions-->ads-->$$$... so where's their incentive?

------
dminor
"jab hot search phrases into their prose until it becomes a bloody pulp"

I love this quote - it so accurately describes "white hat" SEO at its worst.

~~~
patio11
A black hat SEO once described the field as "finding out what search engines
want and giving it to them until they bleed."

------
benologist
They must have applied their revenue algorithm to News ... the one that always
gives me shitty content farms and scrapers monetized with Google ads when I
search.

------
yvoschaap
My experience as a Google News publisher and user has shown that the Google
News ranking algorithm is heavily biased towards an article's time of
publication. This leads to a CNN story about topicX published one day ago
being ranked lower than an AC article on topicX published one hour ago.

In this case, she searches for an "old" topic, so the algorithm ranks the
most recently published article as most relevant, despite its low quality
compared to the older ones.
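A toy illustration of how a strong recency bias can flip the order described above, assuming a score of quality times an exponential time decay with a configurable half-life. This is a hypothetical model with made-up quality numbers, not Google News's actual ranking function; it only shows that with a short enough half-life, an hour-old weak article beats a day-old strong one.

```python
def news_score(quality: float, age_hours: float, half_life_hours: float) -> float:
    """Quality discounted by age: the score halves every half_life_hours."""
    return quality * 0.5 ** (age_hours / half_life_hours)


def rank(articles, half_life_hours):
    """Sort (name, quality, age_hours) tuples by descending score."""
    return sorted(articles,
                  key=lambda a: news_score(a[1], a[2], half_life_hours),
                  reverse=True)


articles = [
    ("CNN story", 0.9, 24.0),  # better article, published a day ago
    ("AC article", 0.3, 1.0),  # weaker article, published an hour ago
]
print([name for name, _, _ in rank(articles, half_life_hours=3.0)])
# ['AC article', 'CNN story']
print([name for name, _, _ in rank(articles, half_life_hours=48.0)])
# ['CNN story', 'AC article']
```

With a 3-hour half-life the CNN story's score drops to 0.9 / 256, roughly 0.0035, while the AC article keeps about 0.24; with a 48-hour half-life the CNN story keeps about 0.64 and wins.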

------
gojomo
Sadly, up-and-comer Blekko has other problems with this query, with the top six
results for [dr laura n-word /news] being the exact same syndicated opinion
piece at different newspapers' nearly identically themed sites.

