
List of content farms to block with Google Personal Blocklist - jonknee
http://www.jongales.com/blog/2011/02/14/list-of-content-farms/
======
chc
I don't entirely understand the outrage at Demand Media. They're not a bastion
of quality, but for a lot of searches, there aren't better results. For
example, the other day I was making chocolate cream of wheat and I wanted a
quick refresher on the proper chocolate:everything else ratio. I opened a
bunch of the top links, and the best one for quickly getting the info was
eHow. For searches where there are better results, I'm not compelled to click
on the one or two from Demand Media.

------
timmorgan
Experts-exchange is indeed an annoying site, but I'm not sure I'd classify it
as a content farm. There is actually a lot of information there not found on
other sites.

~~~
jonknee
Then don't add it to your block list... That's the point of customizable block
lists. I agree there's a lot of content there not found on other sites, but
there's also a lot of content on eHow not found on other sites. The content's
junk though, which is the point and is why I blocked both.

~~~
timmorgan
I didn't mean to sound ungrateful. I appreciate the list.

I just meant to inform that, by definition, I don't believe e-e is a true
content farm. For anyone adding these sites blindly, you may miss out on that
obscure answer you've been looking for.

[edit] I don't know how I find myself in the position of defending e-e, but
there are lots of SQL Server related answers on e-e that haven't yet been
duplicated on Server Fault or other more friendly sites.

~~~
jonknee
No worries. That's why I updated the sub-heading to "and general spammy user
generated content sites" because sites like EE aren't classical content farms
(they're even smarter by getting people to give them content for free and then
selling it to members!).

------
udp
Hmm. Although this seems like a good idea at first, sometimes I'm searching
for something obscure and I actually _do_ find the answer on one of those
annoying websites.

I can't help thinking it's probably more useful to see all the results even if
I have to wade through the junk, just in case the answer is hidden in there
somewhere.

(PS. did you know you can see the answers on experts-exchange.com if you
scroll down far enough?)

~~~
jonknee
On SERPs with blocked results there is a note that some results have been
blocked and you can click it to reveal the blocked results. So if you don't
see what you're looking for you can turn back on the spam and check.

------
ladon86
I assume that when you block a site, Google are sent the search term as well
as the site you blocked.

Assuming that this is the case, it isn't very helpful to search for
'mahalo.com' and then block mahalo.com. What do you think should rank higher
than mahalo.com for the term mahalo.com?

A better approach would be to install the extension and continue as normal,
and then next time you search for "convert int to string" [0] and eFreedom
appears above stackoverflow, you block that. That way you can enjoy the
benefits and create valuable data, instead of just doing something you might
as well have used a Greasemonkey script for.

[0] Only joking!

~~~
jonknee
No, Google is only sent that you're blocking the domain.

<http://news.ycombinator.com/item?id=2218528>

~~~
timdorr
But it's only clickable from a search page on Google. Can't they correlate the
data pretty easily?

~~~
jonknee
Sure, but considering that's the only way to add to the block list I cannot
imagine they didn't foresee exactly what we're doing. I'd wager "Let me go
find all the sites I hate the most and block them!" is the natural reaction to
being given a block list feature.

------
shubber
Am I the only one who finds w3schools more annoying than expertsexchange? Maybe
it's just the searches I perform, but I usually want to see w3.org results,
not the often misleading webbed CF results.

------
jonknee
Hopefully Google will add import/export capabilities so it's easier to share
lists like these.
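
A rough sketch of what sharing could look like, assuming a plain-text format with one domain per line and `#` comments. The format and the function names are hypothetical, not anything the extension actually provides. The import step reports which entries are new so they can be reviewed before being trusted.

```python
def normalize(domain: str) -> str:
    """Lower-case a domain and drop a leading 'www.' so duplicates collapse."""
    d = domain.strip().lower()
    if d.startswith("www."):
        d = d[4:]
    return d


def export_blocklist(domains) -> str:
    """Serialize a blocklist as sorted, de-duplicated text, one domain per line."""
    unique = {normalize(d) for d in domains if d.strip()}
    return "\n".join(sorted(unique)) + "\n"


def import_blocklist(text: str, existing=()):
    """Merge shared text into an existing list.

    Returns (merged, added): the full sorted list, plus only the entries
    that were not already blocked, so they can be checked one by one.
    """
    current = {normalize(d) for d in existing if d.strip()}
    added = set()
    for line in text.splitlines():
        d = normalize(line.split("#", 1)[0])  # ignore trailing comments
        if d and d not in current:
            added.add(d)
    return sorted(current | added), sorted(added)
```

Returning the new entries separately keeps the "apply everything blindly" step optional rather than the default.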

------
elsewhen
i think there are some unintended consequences with blacklist list-sharing:

1) if google ever starts using this data as a signal in their search
algorithm, do you really think demand media is going to give up, or do you
think they are going to start moving their content onto the 500k+ domains they
own, effectively making it impossible to deal with the problem?

2) list-sharing seems to be starting an arms race where spammy sites start
pushing their own lists that include all of their competitors' sites but not
their own. (i am not accusing the OP of doing this, but i think it's best for
each user to choose individually.)

if these sites really have such egregiously bad content, then users should
very quickly be able to decide for themselves when a site should be
blacklisted.

~~~
chc
Point 2 only applies if you blindly apply any blacklist you come across. If
people who blindly obey everything they're told are so numerous, this "arms
race" should exist just as much with word-of-mouth and blacklists aren't
introducing a new problem.

~~~
elsewhen
i am sure that hacker-news users are savvy enough to check each site
one-by-one, but blacklist-sharing in general can be detrimental if the lists
get distributed to mainstream users.

imagine if these lists get to be 100-domains long... some users are simply
going to take the shortcut and apply-all. the due-diligence required to cull a
good list takes time. the very fact that this post is getting upvoted is
because even hacker-news users are looking for shortcuts.

blacklists introduce a new problem: SEOs distributing such lists that include
lots of weak sites (to make the list look credible) along with their
competitors' domains snuck in.

~~~
chc
So, again, it's a matter of trusting your source. Just like with word of
mouth.

~~~
elsewhen
word of mouth is typically a one-by-one recommendation, where doing your own
research is straightforward and quick. ie, if a friend recommends a movie for
you to watch, you might check it out, read the reviews, and then add it to
your netflix queue.

with a long list, doing the due diligence becomes orders-of-magnitude more
labor intensive, so users are more incented to just copy/paste an entire list.
in the case of netflix queues, perhaps there isn't too much at stake... but
with a domain blacklist, SEOs and spammers have their entire livelihoods at
stake... if this extension takes off, and if google starts using this data, i
would expect well-funded SEOs to start heavily promoting their lists (with
their sites omitted, and their competitors' sites included).

of course hacker-news users may be immune from such tricks, but we represent
just a minority of the data that google will get from this tool... if it goes
mainstream, i am sad to admit that my parents are likely to fall for "follow
these simple steps to dramatically improve your google results"

------
golgo13
What is wrong with Livestrong.com? Isn't that for bike races and raising money
for cancer?

~~~
jonknee
You'd think, but it's actually a Demand Media content farm. Livestrong.org is
what you're probably thinking of (Lance's foundation website).

<http://www.demandstudios.com/health-writing-jobs.html>

~~~
pasbesoin
I recall reading that Armstrong's an investor in the content farm in question,
to boot.

(Gives a bit of a new twist to all those yellow wrist bands, doesn't it?)

------
cpeterso
Does Google "beta test" their search results on their employees? I imagine
Google could allow their employees to flag bad sites and search results to
improve their search algorithms. Google can't trust anonymous internet users,
but they can presumably trust (to some degree) their 24,000 employees'
real-world feedback to improve their search algorithms.

~~~
retube
Their search engine is being constantly improved every day by people using it.
Every search result you click on is recorded, along with no doubt the time
elapsed before you clicked another result and so on until you found what you
were looking for. Google is a gigantic statistics engine with a growing data
set.

And in terms of Google employees testing their search results - 1) their
employees aren't exactly the average Google user demographic 2) their employees
are far too small a group to draw any conclusions about how useful/accurate
their search results are. They can only figure that out by trialing new algos
on a fraction of users. If successful, they push to all.
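
As an illustration of "trialing on a fraction of users", here is a minimal sketch of deterministic bucketing by hashing a user id. It is a generic technique and not a description of how Google actually does it; the experiment name, the ranker labels, and the function names are made up for the example.

```python
import hashlib

BUCKETS = 10_000  # resolution of the split: 1 bucket = 0.01% of users


def in_trial(user_id: str, experiment: str, fraction: float) -> bool:
    """Return True if this user falls into the trial group.

    The same user always gets the same answer for a given experiment,
    and different experiments split the user base independently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % BUCKETS
    return bucket < int(fraction * BUCKETS)


def choose_ranker(user_id: str, fraction: float = 0.01) -> str:
    """Pick which (hypothetical) ranking algorithm serves this user."""
    return "new-algo" if in_trial(user_id, "ranker-trial", fraction) else "current-algo"
```

If the trial group's results look better, the fraction can be raised step by step until it reaches 1.0, which is the "push to all" case.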

------
robertoaloi
I've created a repository to manage the list:
<https://github.com/prof3ta/google-chrome-blacklist-suggestions>
I'm including some of your entries in the list. Contributions are more than
welcome.

------
duck
I find it ironic that this list includes adsense ads with three things totally
irrelevant to the content.

~~~
mahmud
there is no such thing as a "right" or "wrong" answer in contextual targeting,
at least not anymore. Other targeting strategies kick in; even if the text of
the page is accurately categorized, you could still be hit with ads targeted
at you by profile, or someone is doing category runs (GroupOn ads supersede
all others for me) or the site is under review or its content unanalyzed
(charity hell), or you're in an experiment, etc.

------
Tichy
Couldn't Google just block those directly?

------
kqueue
it is missing the following:

- osdir.com
- markmail.org
- www.dennyweb.com

~~~
basugasubaku
As I wrote on the other thread, markmail is the best site for searching
through Apache mailing lists that I've seen. Many of the Apache project
websites link to markmail for their archives. Why are people wanting to block
it?

~~~
kqueue
It is no different than mail-archive at all. You bashed mail-archive in your
comments because it doesn't provide an added value. markmail doesn't provide
an added value either.

~~~
basugasubaku
? I didn't say anything about mail-archive. I mentioned the official Apache
archives (<http://mail-archives.apache.org/mod_mbox/>)

The value of markmail, for me, is you can search. I'm not bashing the Apache
archives, but they are not searchable as far as I know.

~~~
kqueue
Yes you did in your comments.

<http://www.jongales.com/blog/2011/02/14/list-of-content-farms/>

user> mail-archive? are you crazy? that site has been around forever & I’ve
found more answers there than I can count.

you> What mailing lists are you finding useful on mail-archive that aren’t
available elsewhere? Do you ever visit mail-archive directly? I’ve only seen
mail-archive clutter up SERP pages, often with the question I have going
unanswered. I see no point in duplicating content on Google Groups, only with
a significantly worse UI.

The point is, they are all doing the same thing. mail-archive, osdir, and
markmail all archive mail and place ads next to it. Some provide more features
on top, but the main idea is to archive public content and make money out of it.

You can always use gmane to search apache archives, at least it is ad free.

~~~
basugasubaku
lol, what? I am not jongales.

I have not noticed ads on markmail. I was mostly surprised people are
advocating blocking markmail because I've seen it linked to often by the
Apache project sites:

<http://couchdb.apache.org/community/lists.html>
<http://shindig.apache.org/mail-lists.html>

I've also noticed when people refer to a posting on the mailing list, they
often use the markmail URL.

So it seemed like it has the blessing of the community.

~~~
kqueue
oh haha, I thought you were the author of the article. nm. :)

