

Google Probably Knows What Crap Results Are - jdminhbg
http://blog.obiefernandez.com/content/2011/01/google-probably-knows-what-crap-results-are.html

======
cobralibre
This line of thinking doesn't make much sense with respect to a company as
large as Google. The fact that the Google Alerts team has identified and
solved a particular problem doesn't really allow us to draw reliable
conclusions about what's going on in other parts of the organization.

 _So the question is: Are they not taking greater action against spammers
because it would hurt their bottom line? And, if so, is that evil?_

In particular, leading questions like this strike me as premature.

~~~
jdminhbg
I'm not at all a Google expert, but it seems unlikely to me that the Alerts
team has figured out a way to filter crap results and the Search team has not.

~~~
cobralibre
I'm not a Google expert either -- whatever that would mean -- but I've worked
in large software organizations. My point was that the linked article naively
conceptualizes Google as a unitary entity that either "knows" something or
doesn't.

As to your point, a software engineering organization releases products that
are the results of myriad goals and constraints (requirements, priorities,
schedules, bugs, etc.). Products aren't merely reflections of what the
engineers can or cannot do. So say it's likely that the search team can filter
crap results. Is that really all that we need to know before we decide that
Google search is intentionally designed to profit from spam?

~~~
jarin
I have a Ph.D. in Google and I can confirm definitively that Google's
development processes are similar to those of any other large software company.

------
jarin
Google does seem to be getting clogged up with crap results, but I can
reliably find what I'm looking for within a few searches. Unfortunately,
that's more than I can say for most of the alternative search engines right
now. I had major frustration yesterday trying to find Ruby/Rails topics after
switching to Blekko as my default search engine (even with the slashtags and
everything). I gotta say that the /seo slashtag is awesome though.

~~~
greglindahl
Did you try out /ruby ?

~~~
jarin
I think so, I know I tried /rails though. I was looking for some ActiveSupport
methods.

~~~
greglindahl
During our beta we have only a limited crawl/index, 3 billion pages. Very
specific technical queries and queries with a lot of words seem to suffer
most.

Would you like to be an editor for the /ruby and /rails slashtags?

~~~
jarin
Totally! I applied for /rails last night, I'll apply for /ruby too :)

~~~
greglindahl
You're approved for both. And if you think they should be a single slashtag
give a shout, I'm not that familiar with the Ruby community.

------
samd
If promoting spam sites filled with ads was so profitable why wouldn't Google
just put more ads on its search results page and cut out the middle man?

~~~
raganwald
Plausible deniability: _We_ don't cover our pages with AdSense. Do we direct
you to pages with AdSense? Yes, but only when our algorithm determines that
the content on those pages is relevant.

~~~
samd
Perhaps it is important for them to maintain an aura of neutrality and cold
calculation. Though if we are supposing that Google is intentionally crippling
its algorithm, would it be so much to suppose that they would just
clandestinely run the spam sites themselves to eliminate the middle-man?
Though perhaps the cost of secretly doing it themselves is higher than the
AdSense payments made to the sites.

This all assumes that Google's reputation remains untarnished by merely
linking to spam rather than filling their own site with spam, but I don't
think that's quite the case. Google's reputation is intricately tied up with
the quality of the results it gives.

~~~
raganwald
I am not suggesting that Google intentionally cripples its algorithms, just
that it doesn't rush to "fix" something you or I might consider a problem.

 _Google's reputation is intricately tied up with the quality of the results
it gives._

I think the problem here is the question of whether an adsense-adorned page
with scraped content is judged "low quality" by the people doing the search.

I'm no expert in other domains, but judging by the "I CAN HAZ CODES"
programmers out there, if they type a programming question into Google and
they get a page with scraped content from StackOverflow, will they care?

My guess is that if the scraped content has the answer, they're happy. They
aren't interested in doing more research, looking up the author's SO reputation,
or anything else. They get their answer, and maybe they click an AdSense link
if it catches their eye.

JM2C!

~~~
samd
You're right, it does depend on people's perception of quality.

I can only recall being frustrated with the scraper sites when they don't have
the answer I need, because when I go back to the Google results and try
another page I find that they have the exact same scraped content. Out of the
first ten or twenty results there are only a few unique pieces of content.

Perhaps if Google eliminated redundancy from their results I'd never even
notice whether the content was scraped or not.
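
To make "eliminate redundancy" concrete, here is a minimal sketch that
collapses results whose page text is identical after normalizing whitespace
and case. It is not a description of anything Google does; the function names
and the (url, text) input shape are assumptions made up for the example, and
it only catches exact copies, not lightly edited ones.

```python
import hashlib
import re

def fingerprint(text):
    """Hash the page text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def dedupe_results(results):
    """Keep the first result for each distinct fingerprint.

    `results` is a list of (url, text) pairs in ranked order (hypothetical
    shape, chosen for this sketch).
    """
    seen = set()
    unique = []
    for url, text in results:
        fp = fingerprint(text)
        if fp in seen:
            continue  # same content already shown higher up
        seen.add(fp)
        unique.append((url, text))
    return unique
```

Note that this keeps whichever copy ranks first, which may well be the
scraper rather than the original; deciding which copy is the source is a
separate problem.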

------
Ryan_IRL
The only benefit I can see to Google is that a lot of those spam sites rely on
AdSense to make their money. If it makes sense for a developer to register 100
domains and game some keywords, then they must be making a _little_ money.

~~~
smokinn
That "only benefit" is a pretty massive one considering AdSense accounts for
nearly all of Google's profit.

~~~
rkalla
Agreed.

Google shared with us early last year that the revenue split on Adsense is 68%
(<http://adsense.blogspot.com/2010/05/adsense-revenue-share.html>), meaning you
keep 68% of what they are paid to run the ad and they keep 32%.
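
To put rough numbers on that split, here is a small sketch of the arithmetic.
The 68% is the figure quoted above; the function name and the choice to work
in integer cents are assumptions for illustration only.

```python
# Illustrative only: 68% is the publisher share quoted above; everything
# else (names, integer cents) is an assumption made for this sketch.
PUBLISHER_SHARE_PERCENT = 68

def split_ad_revenue(advertiser_spend_cents):
    """Split what an advertiser pays (in cents) into (publisher, Google) cuts."""
    publisher_cents = advertiser_spend_cents * PUBLISHER_SHARE_PERCENT // 100
    google_cents = advertiser_spend_cents - publisher_cents
    return publisher_cents, google_cents

publisher, google = split_ad_revenue(10_000)  # advertiser pays $100.00
print(f"publisher keeps ${publisher / 100:.2f}, Google keeps ${google / 100:.2f}")
```

So for every $100 an advertiser spends on an ad shown on a publisher's page,
$32 stays with Google, whether the page is a real site or a scraper.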

You can imagine some very respectable proportion of their income every year is
the 1/3 coming from spam/scraped/etc. adsense placements.

Another interesting article on HN just now suggests that Google's algorithms
already _seem_ to know what crap content is and what good content is:
<http://blog.obiefernandez.com/content/2011/01/google-probably-knows-what-crap-results-are.html>

and my guess is that Google will only roll out improved search results when
the cost to the company is great enough to justify the loss of income.

It is a publicly traded company, I don't think Google can just cut out
millions of dollars of revenue from their bottom line because they want to not
be evil - the shareholders would probably ask for people's heads on platters.

Of course only Google knows the extent of its income from spam, but I imagine
it is significant; otherwise they would have solved that problem already as
interest in Duck Duck Go, Blekko and Bing/Yahoo continues to rise, or at least
to be discussed more. (I don't know if the ACTUAL usage suggests that people
are doing more than just talking, i.e. moving over to different services full
time.)

Google's compute power is other-worldly. This WP article doesn't do it
justice: <http://en.wikipedia.org/wiki/Google_platform#Data_centers>

and I have had a hard time trying to find an article that was written 2 years
ago about the data centers around the globe that Google has built. The scale
of each installation is unbelievable and there are something like 30 around
the globe right now:
<http://royal.pingdom.com/2008/04/11/map-of-all-google-data-center-locations/>

More than nefarious under-dealings, I think this situation literally snuck up
on Google and by the time the publicly-traded company had algorithms to
determine the extent of the shenanigans, they realized it would have a
noticeable effect on their bottom line if they simply culled all those results
out in one day.

They are either going to roll out changes in stages and slowly increase the
quality while keeping an eye on what that does to Adsense income and really
publicize each change so they rebuild trust with all of us, or they will
respond heavy-handedly in a year or so with a "new algorithm change" that
"online publishers are up in arms about!" again.

My guess is on the slow-and-gradual approach with a big publicity boost so we
are shown they care and are working on it Matt Cutts-style :)

While I've noticed the lagging quality in their search, I still use the
Big-G... it's fast for me and gives me accurate results. Then again I mostly
search for tech; if I was searching for weight loss, health, sex, appliances
or any other topic that is DOMINATED by ads, I would have given up and gone
back to using a damn phone book a while ago.

~~~
alain94040
_Google's compute power is other-worldly_

Actually it's not. I could cite hundreds of examples, but here's one: when I
wrote a blog post about getting a huge traffic boost for my startup following
on some Mark Cuban in PR Newswire, I titled my post "take the elevator, not
the stairs", and Google AdSense served ads for Thyssen elevators. That's how
smart it is.

~~~
btilly
Sorry, but you're wrong.

The fact that they don't always have it doing the smart thing you'd want them
to do in no way lessens the fact that Google has a lot of compute power that
they are throwing at a lot of problems.

------
rorrr
They definitely know about the Mahalo spam site, but they kept it in their index
anyway, even though it's against their TOS.

Google search is turning into crap.

~~~
jarin
I think I've heard that they prefer to remove things like that
algorithmically, rather than writing hard filters for individual sites. Makes
sense, because if you can come up with an algorithm to get rid of known crap,
it will probably also get rid of some unknown crap too.

~~~
javanix
Yes, hard filters at a place like Google would quickly get out of hand.

Not to mention probably calling down the wrath of the FTC for anti-competition
violations.

~~~
rorrr
Just create a list of banned domain names. And let the users do that as well.
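
Something like the sketch below is what I have in mind: a global list of
banned domains plus a per-user list, both applied to the result URLs. The
function names and the example domains are made up for illustration, and this
says nothing about how a real search engine stores or applies such lists.

```python
from urllib.parse import urlparse

def host_of(url):
    """Lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()

def is_blocked(host, blocked_domains):
    """True if host is a blocked domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

def filter_results(urls, global_blocklist, user_blocklist=frozenset()):
    """Drop every URL whose host is on the global or the user's own list."""
    blocked = set(global_blocklist) | set(user_blocklist)
    return [u for u in urls if not is_blocked(host_of(u), blocked)]
```

Matching on the hostname (rather than a substring of the URL) keeps a ban on
`spam.example` from also hiding an unrelated `notspam.example`.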

