

Spam vs Mahalo: Matt Cutts Explains the Difference - melvinram
http://www.seobook.com/matt-cutts-eats-mahalo-spam

======
melvinram
Matt Cutts surely can't ignore this debate for much longer. The least he could
do is chime in with Google's official perspective on the situation whether
it's for or against the Mahalo approach.

~~~
shawndrost
It looks like he's unable to comment at the moment:
<http://news.ycombinator.com/item?id=1143911>

~~~
keltex
That's not the real Matt Cutts

~~~
boundlessdreamz
The real Matt Cutts has said he is aware of the Mahalo problem. See this
<http://twitter.com/mattcutts/statuses/9513414423> and the thread.

------
NZ_Matt
There is a large incentive for Google to send traffic to Mahalo. The low
quality of the pages causes the user to want to exit the page. What do they
click on when leaving? The AdSense ads! I'm not saying Google are going out of
their way to support this but it is easy to see why they may be hesitant to
penalize Mahalo.

------
pierrefar
This is doing the rounds among SEOs on Twitter:
[http://www.mahalo.com/search?q=mattcutts%20is%20gay%20and%20...](http://www.mahalo.com/search?q=mattcutts%20is%20gay%20and%20wont%20ban%20us)

~~~
simonw
The "is gay" thing is lame.

~~~
mvandemar
Yeah, lamer though that Mahalo turned it into a permanent page:
<http://www.mahalo.com/mattcutts-is-gay-and-wont-ban-us>

------
bhp
All these Calacanis / SEO Book cat fights are getting annoying.

~~~
byrneseyeview
They're also important. If Mahalo, Demand Media, etc. all continue to do what
they're doing, then they'll have built a publisher-paid paywall around most of
the content online.

That is bad news for anyone who tries to get long-tail organic search traffic.
It's good news for sites with great brand names, but terrible news for anyone
else.

~~~
qeorge
It's not fair to lump Demand in with Mahalo. Demand's business model is filling
in holes with their own original content, not scraping (or "aggregating")
other people's content.

AFAIK, Google views their relationship as symbiotic, not parasitic.

~~~
byrneseyeview
Demand Media does not produce high quality content. They produce content that
is good enough to rank (i.e. it's written in English) and unique. But they
have a strong incentive to have bad content! If your article on "how to make
pancakes" tells someone how to make pancakes, they close their tab and make
pancakes; if it's 300 words of "original content" that makes no sense, you'll
end up clicking through to another site (that has to pay for the privilege).

When you think of how many struggling freelancers use those long-tail guides
to build their business ("How to shoot a commercial for a gym," or "How to
write brochure copy for life insurance,"), you can see the magnitude of this
problem. People who could trade their time for traffic now have to trade their
_money_ for traffic. When they're just getting started, money is harder to
come by than time. The result: fewer people creating this kind of content,
more of them joining organizations that pay for the traffic instead.

~~~
greyman
But doesn't Demand occupy only one slot in a search engine's results page for a
given keyword search? Even if Demand created billions of articles, one for
each conceivable search term, the other nine spots in the first 10 search
results are still available for others, aren't they?

All in all, I still think this is primarily a Google problem, not Demand's. They
can publish anything they want, it's their right protected under free speech.
It's Google that should be concerned about the quality of their pages.

~~~
byrneseyeview
The #1 search result gets about 40% of all organic clicks. And Demand Media
has more than one property (eHow, Wikihow, Cracked, livestrong.com). So I
wouldn't be surprised if there were some searches for which Demand Media got
more than half of all traffic.

You are correct about this being Google's problem. These guys exist to exploit
an arbitrage opportunity: Google's search algorithm picks them, and the
average searcher's which-engine-do-I-choose algorithm picks Google. In the
long run, one of these things will stop being true.

~~~
qeorge
Cracked is not owned by Demand Media. It was a second-rate competitor to MAD
as a print magazine, but has found tremendous success online.

Their model is similar to Demand's in that it is UGC + payment, but that's
about it. The topics are not generated by an algorithm, for instance.

~~~
byrneseyeview
<http://www.demandmedia.com/brands/>

I was surprised, too.

~~~
skinnymuch
Wow. Can't believe they own Cracked. Learning this made me die a little
inside.

------
ashu
How can Google NOT look at that sample page Aaron has displayed? That is truly
terrible :( Also, Matt Cutts is generally very honest and public, but in this
case, uncharacteristically quiet.

~~~
prawn
If you visit a page on Mahalo that you think doesn't cut it, be sure to hit
the 'Ads by Google' link and send feedback to Google about it.

------
jasonlbaptiste
Some of these discussions make a lot of sense for HN, as many entrepreneurs
here are building businesses around content (i.e. "what is spam?"). This one
does not.

------
andrewpbrett
"I talked to him, and so I said what software do you use to power your search
engine? And he said we use TWiki or MediaWiki. You know, wiki software, not
C++ not Perl not Python. And at that point it really does move more into a
content play. And so it is closer to an About.com than to a Powerset or a
Microsoft or Yahoo! Search."

~~~
maurycy
Depressing.

------
moultano
If the title of an article is a question, the answer is no.

------
btipling
How much original content is there in Google Reader? Not defending Mahalo,
especially if it is stealing copyrighted content from other sites, but not having
original content doesn't mean you don't provide value. Google Reader has value
and no original content.

~~~
byrneseyeview
If Google started giving people public Google Reader pages, would you like it
if Scoble's Google Reader outranked your blog when people searched for your
blog's name?

There is value in duplicate content. But there's no value in a search query
taking you to a search page, where the site you wanted to land on has to _pay
Mahalo_ for adding an extra click between you and what you wanted.

------
benatlas
The internet is only moving in the direction where Google points, and Google
favors quantity over quality. All Internet problems flow from this unstated
Google rule.

Google would rather make a cent each on a million stolen pages of crap than
$10 on one piece of original content, plus the spam "author" is not going to
ask anything for the "content".

------
gcb
Mahalo and SEO are the reason HN needs a friend-foe system like Slashdot.

Every time you post anything about it, you prove you don't deserve to exist.
Or you are a spam bot.

------
melito
FLAME BAIT IS FLAMING

~~~
melito
I can understand the downvoting based on my knee-jerk immaturity towards this,
but is any of this legitimate?

Will anyone benefit from having read this article?

Are there really that many people interested in this?

Are you all really up voting this stuff because you're legitimately
interested?

~~~
codexon
Many startups here rely on cheap SEO for marketing.

If Mahalo (or anyone else with a high PR domain) can outrank everyone else
simply by spamming, it's not fair to a nascent startup.

------
jasonmcalacanis
This whole thing is insane.... we have "stub" pages just like Wikipedia.

These are topic pages that people are working on and THEY DON'T RANK in search
engines until we get the word count to around 300-500 words.

We are in the process of NOINDEXING the pages that are below 300 words just to
make Aaron happy... we actually had these noindexed before our last version
and that got lost in the shuffle of the new launch (really, it did... when you
do new code you might leave something out of the old code).

i'm also getting a list of every page under 300 words and having the page
managers build them out in 30 days or deleting them.
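
A minimal sketch of the rule described here, assuming each page's original text is available as a plain string. The function names, the `pages` mapping, and the exact meta tag values are illustrative assumptions, not Mahalo's actual code:

```python
from typing import Dict, List

MIN_WORDS = 300  # threshold taken from the comment above


def word_count(text: str) -> int:
    """Count whitespace-separated words in a page's own text."""
    return len(text.split())


def robots_meta(text: str, min_words: int = MIN_WORDS) -> str:
    """Return the robots meta tag a page would get under the 300-word rule."""
    if word_count(text) < min_words:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'


def short_pages(pages: Dict[str, str], min_words: int = MIN_WORDS) -> List[str]:
    """List URLs of pages below the threshold, to be built out or deleted.

    `pages` is a hypothetical mapping of URL to page text.
    """
    return sorted(url for url, text in pages.items()
                  if word_count(text) < min_words)
```

The same `short_pages` list could also drive which URLs are left out of an XML sitemap, so that the sitemap and the noindex tags do not contradict each other.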

Anyway, i thank Aaron for busting our chops and making us better!

The claims that we are "scraping" are absurd... we're using google, bing,
twitter, etc. apis to do a comprehensive search page.

i dont know everything about SEO, but i don't understand this claim by Aaron.
i think he is trying to start trouble for us... and maybe it will work. Thanks
pal!

~~~
aaronwall
A couple clarifications if you don't mind then ;)

\- If you don't want those pages indexed in Google then why are you submitting
them in an XML sitemap?

\- I have already shown examples of the 0 original content pages ranking, so
how can you claim that they do not rank?

\- You are not scraping directly, you are pulling from 3rd party sites and
using it as content on your own site. Which is worse, because there is no way
to opt out of it.

\- My problem is not just with what you call stub pages, but with most of your
pages. When you give people embed code to embed your content in their site you
give them an iframe AND a direct link back to you. If you want me to stop
highlighting the absurdity of it then perhaps you should hold yourself to the
same standards as what you offer others. But you do just the opposite when you
embed 3rd party content in your site. You slap a nofollow on the links _and_
embed the content directly into the page (rather than in an iframe).

\- Worth noting that every time I mention the above point you end up talking
about stub pages or experiments or some other strategy to try to redirect
attention. But in reality, what I am talking about is what you do on almost
every page of your website.
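
The asymmetry in the fourth point (nofollowed links plus inline text on one side, an iframe plus a followed link on the other) can be checked mechanically. Below is a rough sketch using Python's standard `html.parser`; the `EmbedAudit` class and `audit` helper are hypothetical names made up for illustration:

```python
from html.parser import HTMLParser


class EmbedAudit(HTMLParser):
    """Collect links, split by rel="nofollow", and count iframes."""

    def __init__(self):
        super().__init__()
        self.followed = []    # hrefs that pass link credit to the source
        self.nofollowed = []  # hrefs marked rel="nofollow"
        self.iframes = 0      # content embedded by reference, not inline

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            bucket = self.nofollowed if "nofollow" in rel else self.followed
            bucket.append(attrs["href"])
        elif tag == "iframe":
            self.iframes += 1


def audit(html):
    """Parse an HTML string and return the populated EmbedAudit."""
    parser = EmbedAudit()
    parser.feed(html)
    return parser
```

A page that quotes third-party text inline while every source link lands in `nofollowed` matches the pattern being criticized; a page whose third-party content sits in an iframe, or whose source links land in `followed`, does not.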

~~~
jasonmcalacanis
1\. everything in site is in the sitemap... it's not selective. it will be
shortly.

2\. they don't get traffic is my point... we look at any page that gets over
100 page views in a month and we build those pages out. so, even if you find a
page that ranks it will not have traffic. if it has traffic it gets built out.

3\. we are not scraping, we are using search APIs

4\. i dont understand this issue of our widgets (which don't get used to be
honest.. it's a failed program)

5\. this is simply false... our traffic comes from how to articles,
walkthroughs and Q&A. if you want to know what the top 10 pages are they are
things like how to play guitar and call of duty walkthrough pages. those
things are 3-5k words!

just lay off dude... go troll someone else.

~~~
aaronwall
\- 1. everything in site is in the sitemap... it's not selective. it will be
shortly.

Ah, so now you admit it was intentional. But good on you for (eventually?
hopefully?) fixing it.

\- 2. they don't get traffic is my point... we look at any page that gets over
100 page views in a month and we build those pages out. so, even if you find a
page that ranks it will not have traffic. if it has traffic it gets built out.

If a person has a quarter million pages that are getting 5 visits each that is
still a lot of traffic. Especially when the page has 0 editorial costs.
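
As a back-of-the-envelope check, using only the figures in this exchange (a quarter million pages, 5 visits each, and the 100-page-view build-out threshold); the function name is just for illustration, and the period for the 5 visits is not stated:

```python
def aggregate_visits(pages, visits_per_page):
    """Total visits across many low-traffic pages."""
    return pages * visits_per_page


PAGES = 250_000            # "a quarter million pages"
VISITS_PER_PAGE = 5        # visits each, per the comment
BUILD_OUT_THRESHOLD = 100  # page views per month before a page is built out

total = aggregate_visits(PAGES, VISITS_PER_PAGE)
print(total)                                  # 1250000
print(VISITS_PER_PAGE < BUILD_OUT_THRESHOLD)  # True: never flagged for build-out
```

So every one of those pages stays under the build-out threshold while the set as a whole accounts for 1.25 million visits.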

\- 3. we are not scraping, we are using search APIs

The end result is what people would typically call a "scraper site". It is
irrelevant how it is created (if you scrape directly or syndicate from
somewhere else that is scraping). The issue is a lack of editorial control
(see your page about 13 year old rape) and a lack of citing sources with
links.

\- 4. i dont understand this issue of our widgets (which don't get used to be
honest.. it's a failed program)

Search engines have duplicate content filters. If the content is within the
page as HTML (as you do on Mahalo) then you can often outrank the original
source for their own content. You bypass this issue and me mentioning it if
you only use an iframe to embed the content in your pages. But if you embed it
directly into the HTML (as you are doing right now) then of course it is
bogus.

\- 5. this is simply false... our traffic comes from how to articles,
walkthroughs and Q&A. if you want to know what the top 10 pages are they are
things like how to play guitar and call of duty walkthrough pages. those
things are 3-5k words!

I am not talking about your top 10 pages. I am talking about the bottom
300,000 pages, which in aggregate get far more traffic than the top 10 pages
do. :D

\- just lay off dude... go troll someone else.

Not trolling at all. Just trying to give you valuable feedback, as you have
claimed it to be publicly multiple times (unless you were lying when you
stated that) :D

~~~
jasonmcalacanis
Anyway, we're deleting any short pages right now and noindexing any short
pages.

this will all be done in the next 72 hours and then there will be nothing to
complain or write about after that Aaron!

Thanks for making us better.

~~~
aaronwall
Good first step!

Does that mean that (for the remaining pages on the site)...

a.) the other scraped content which exists on the remaining pages will be put
in an iframe (rather than as text on the page)

\- OR -

b.) that you will be removing nofollow from the pages you are scraping content
from?

Either you trust the content enough that you should link to it directly, or
you should put it in an iframe such that search engines don't see it. Either
route would likely be more akin to fair use than what you are currently doing
(automatically scraping 3rd party content into your pages and using it to rank
against the content creators, without permission, and without a way of opting
out).

~~~
btilly
After this many lies, do you actually believe he is planning on actually
taking that step? In my books he's already used up his credibility. I won't
believe anything he says until he says what he has done and someone else
publicly verifies it. (He has lied enough that I don't think it worth my time
to bother verifying anything he says. The bozo bit is well and truly flipped.)

~~~
aaronwall
Fair point. ;)

------
zaidf
Funny. By those standards applied strictly, Google's own search result pages
would be considered spam. Jason can argue that there is value in the SELECTION
of content that is not explicitly visible but nonetheless significant.

Similarly, Hacker News homepage might be considered spam because it's mostly
titles of articles. Until you take into account the value of the votes--which
is value that these Google guidelines don't address clearly.

~~~
houseabsolute
And you'll notice that Google does not index its own search result pages, nor
encourage anyone else to do so. Perhaps "spam" is too pejorative a word and
"things that should not be in our search engine" would be a better
description.

~~~
newman314
For what it's worth, Topsy is doing the same for tweets. I've seen Google show
a Topsy search page more than once while searching for things recently.

For example, links such as <http://topsy.com/s/toyota+grand+jury+subpoena> are
submitted to Google to be indexed.

~~~
newman314
Backtype does the same too as far as I can tell so there are definitely quite
a few companies out there with such activities.

This is a bummer as this just serves to increase the noise to the detriment of
getting good results.

