The last time I whined about this kind of thing, someone on HN patiently explained to me that this is the "invisible hand" at work. A certain number of people click a spam link, ignore the content, and try another link. Some click through on an ad, which is a win for Google. Some get disgusted with stuff like this, Mahalo, and so forth, and those users defect to another engine.
As long as the increase in revenue outweighs the loss of users, Google has no incentive to change. In fact, they have an incentive to make their search "worse."
There is a stereotype of an MBA that runs a business with a spreadsheet, firing people that embody the company's institutional knowledge and outsourcing their "function" to body shops in another time zone. The stereotypical numbers-driven MBA carefully manipulates prices and product supply to maximize revenues, balancing price discrimination against the loss of customers.
The end result is a company that is hostile to its customers. Cell phone companies and airlines come to mind. I would compare this to Google, except it's even worse in a sense, because users are not Google's customers. Advertisers are Google's customers.
There must be voices within Google arguing passionately for improving the quality of its results from the user's perspective. I imagine that the response from management can be taken straight out of the movie "Blade:"
For fuck's sake, these people are our food, not our allies.
The problem with your hypothetical MBA is that he thinks he understands value because he's got numbers in a spreadsheet -- but unfortunately that data is incomplete.
But Google are the kings of data mining. I would expect that Google would (a) understand the information missing from their "facts"; and (b) have the brains and data available to uncover that missing information -- if they wanted to.
The DuckDuckGo approach of just blacklisting a few of the larger and more egregious content farms seems like a decent band-aid. It's not a real fix, but it cuts out a large number of the bad cases, since at least for me, a handful of large content farms account for most of the times when I've accidentally clicked on a Google result and then realized "oh ugh, it's one of these sites". Spammy blogs are the other big problem, but there's no easy band-aid for that one.
> The DuckDuckGo approach of just blacklisting a few of the larger and more egregious content farms seems like a decent band-aid.
I really like that they take a proactive approach to issues. It was one of the main reasons I switched.
> Spammy blogs are the other big problem, but there's no easy band-aid for that one.
A lot of "spammy blogs" are owned and operated by the same people. The whois data is usually enough to tip you off. I do not see why spiders do not run a whois query and blacklist the owner or company, at least for a set period of time, with an appeals process.
> just blacklisting a few of the larger and more egregious content farms seems like a decent band-aid.
The real band-aid would be to let users blacklist sites that they don't want to see. I often search for things and come across the same unwanted spam sites over and over.
After that, a distributed trust model should be built so that users can share blacklists of spam sites.
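As a rough sketch of what a personal blacklist plus shared, trust-weighted blacklists could look like: the function names, the per-peer trust weights, and the `threshold` knob below are all assumptions made for illustration, not anything an existing search engine is known to offer.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def blocked_domains(own_blacklist, shared_blacklists, trust, threshold=1.0):
    """Combine a user's own blacklist with lists shared by other users.

    own_blacklist: domains the user blocked personally (always applied).
    shared_blacklists: dict mapping a peer's name to that peer's set of domains.
    trust: dict mapping a peer's name to a weight chosen by the user.
    A shared domain is blocked once the summed trust of the peers listing
    it reaches `threshold` (a hypothetical tuning knob).
    """
    scores = {}
    for peer, domains in shared_blacklists.items():
        weight = trust.get(peer, 0.0)  # unknown peers carry no weight
        for domain in domains:
            scores[domain] = scores.get(domain, 0.0) + weight
    shared = {d for d, score in scores.items() if score >= threshold}
    return set(own_blacklist) | shared


def filter_results(urls, blocked):
    """Drop result URLs whose domain is blocked, keeping the original order."""
    return [u for u in urls if domain_of(u) not in blocked]
```

With this shape, one peer's mistaken entry cannot hide a site on its own unless the user trusts that peer fully; several moderately trusted peers have to agree first.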
It is not Google News but sites that aggregate search terms. A good example is www.eudict.com, which repeatedly puts up other people's search terms as real words.
(If you are searching for some words, you are flooded with EUDicks search spam)
Playing devil's advocate: tell me what, specifically, you get from the Kansas City Tribune's warmed-over regurgitation of a wire service report that you don't get from Associated Content here?
Darn it guys, paying marginally literate people who know almost nothing to scrape together a few words and throw ads against them was our business model. Go get your own!
Perhaps [it's] time for a content mill focused on quality. But then it wouldn't benefit from the trick of making people click on ads because the information is incomplete.
I find that most often the "spam" that comes up on searches is on the user-submit-enabled sites -- vimeo, myplick, authorstream, slideshare, scribd, etc. -- and it's their lack of moderation that is the issue. (Do a search right now for "watch colts vs packers online" and see what I mean.)
Blacklisting those entire sites would seem to be a real disservice to the legitimate content that is uploaded, but then again, if the site administrators can't keep their content in order, why are they getting crawled by the G so often anyway?
The real deal is this, in many cases: the spam that's created on those sites during certain times is foreseeable and follows a particular pattern -- it's created based around news or events, such as sports, movies, sex tape outings, etc. -- so it would seem easy to get rid of the spam before it hits the search engines, right? One option is just to expect an influx during certain time periods and increase moderation during that time. Another option is to prevent new users from publishing content until they've submitted moderator-approved non-spam content first.
But that would mean those sites don't get the impressions... and impressions-->ads-->$$$... so where's their incentive?
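For what it's worth, the second option above (hold a new user's uploads until a moderator has approved something from them) is not much code. This is a minimal sketch with made-up class names and a hypothetical `required_approvals` threshold; it does not describe how any of the sites mentioned actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Uploader:
    name: str
    approved_count: int = 0  # uploads a moderator has approved so far


@dataclass
class ModerationGate:
    required_approvals: int = 1  # hypothetical threshold
    pending: list = field(default_factory=list)
    published: list = field(default_factory=list)

    def submit(self, user, item):
        """Publish directly for proven users; otherwise hold for review."""
        if user.approved_count >= self.required_approvals:
            self.published.append((user.name, item))
            return "published"
        self.pending.append((user, item))
        return "pending"

    def review(self, index, approve):
        """A moderator approves or rejects one pending item."""
        user, item = self.pending.pop(index)
        if approve:
            user.approved_count += 1
            self.published.append((user.name, item))
```

Rejected items are simply dropped here, and a throwaway spam account never reaches the index because its first upload has to pass a human.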
They must have applied their revenue-algorithm to News ... the one that always gives me shitty content farms and scrapers monetized with Google ads when I search.
My experience as a Google News publisher and user has shown that the Google News ranking algorithm is heavily biased towards an article's time of publication. This leads to a CNN story about topicX published one day ago being ranked lower than an AC article on topicX published one hour ago.
In this case she searches for an "old" topic, hence the algorithm ranks the most recently published article as most relevant, despite its low quality compared to the older ones.
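To make the effect concrete, here is a toy scoring function, not Google's actual algorithm (which is not public): a quality score multiplied by an exponential freshness decay. The quality values and half-lives are invented purely for illustration.

```python
def recency_weighted_score(quality, age_hours, half_life_hours):
    """Quality in [0, 1], multiplied by an exponential freshness decay."""
    return quality * 0.5 ** (age_hours / half_life_hours)


def rank(articles, half_life_hours):
    """Sort (name, quality, age_hours) tuples by descending score."""
    return sorted(
        articles,
        key=lambda a: recency_weighted_score(a[1], a[2], half_life_hours),
        reverse=True,
    )


# Hypothetical numbers: a strong story from a day ago vs. a weak one from an hour ago.
articles = [("CNN", 0.9, 24), ("AC", 0.3, 1)]

short_half_life = [name for name, _, _ in rank(articles, half_life_hours=6)]
long_half_life = [name for name, _, _ in rank(articles, half_life_hours=72)]
```

With a six-hour half-life the day-old story has lost most of its score and the fresh low-quality article ranks first; with a 72-hour half-life the order flips. If the real ranking behaves anything like the short half-life case, that would explain what she saw.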
Sadly, up-and-comer Blekko has other problems with this query, with the top 6 results for [dr laura n-word /news] being the exact same syndicated opinion piece at different newspapers' nearly identically-themed sites.