

Google rolls out algorithm change in the US - po
http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html

======
A1kmm
> sites that copy others’ content

It would be interesting to know how Google determines, in an automated
way, which site is the original and which is the copy - especially
when copying could go both ways. For example, I could write an article,
license it under the GFDL, and someone could copy it to Wikipedia. I might
then copy the Wikipedia improvements back to my site. Technically, I had the
content first - so would Wikipedia be penalised?

If there is a bias against smaller sites, this might make smaller sites
reluctant to license their content under licenses that let bigger sites copy
them.
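
Google has never said how it picks the "original" among duplicates, but the
standard building block for finding the duplicates at all is shingling. A
purely speculative sketch - the tie-break at the end is exactly where a
high-authority copier could beat the true original:

    # Speculative sketch: group near-duplicate pages by shingle overlap,
    # then keep one "canonical" copy per group. Google's real method is
    # unpublished; the tie-break (earliest crawl, then authority) is an
    # assumption for illustration only.

    def shingles(text, k=5):
        """Set of overlapping k-word sequences in a document."""
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    def canonicals(docs, threshold=0.9):
        """docs: list of (url, text, first_crawled, authority) tuples."""
        docs = sorted(docs, key=lambda d: (d[2], -d[3]))
        kept = []
        for url, text, _, _ in docs:
            s = shingles(text)
            if all(jaccard(s, shingles(t)) < threshold for _, t in kept):
                kept.append((url, text))
        return [url for url, _ in kept]

Under that rule, "original" means "crawled first", which is not the same as
"published first" - exactly the worry above.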

~~~
DarkShikari
This is a huge problem with Wikia. Wikia has a relatively high pagerank.
Sometimes, because of Wikia's often-abusive policies, communities will decide
to move their wiki off Wikia and host it separately.

But Wikia will refuse to remove the original wiki even if all the contributors
want to move it, in order to maximize advertising revenue. This means that
it's nearly impossible to get the new wiki to rank highly on Google, even if
all links across the internet are changed to point to the new wiki, because
Wikia's pagerank is so high that Google deems the new one to be a "copy".
_This is why Wikia moved all their wikis to subdomains: in order to piggyback
on the pagerank of the main site._

As a result there are a ton of long-dead wikis on Wikia that still get more
search traffic than the active equivalent. Obviously this hurts users, since
they get long-outdated information as a result.

In short, once a wiki is placed on Wikia, it's basically impossible to ever
move it anywhere else because of Google's anti-duplicate biasing.

See <http://en.wikipedia.org/wiki/Wikia#Controversy> for more info.

~~~
jongraehl
I assume you're saying that the Wikia admins will prevent the community from
deleting or overwriting content with a pointer redirecting to the new place?
This does seem extremely likely if there's significant search engine traffic
and ad revenue.

~~~
DarkShikari
Yes. Now imagine if you had a blog on Blogspot and you wanted to host it on
your own site instead -- and Blogspot prevented you from deleting your posts
because they brought Blogspot good ad revenue?

------
nikcub
When PageRank was created, it worked really well to find the most relevant
sites. The old search engines such as AltaVista and Excite were drowning in
spam since the SEO experts had figured them out.

Does anybody else feel that Google has been figured out now, and that they are
now just applying patches to a system that is fundamentally broken (in the
same way Excite and AV did)?

There might be an opportunity here for a new search engine - one built on a
new core ranking method, as PageRank once was, that filters out duplicate
content, content farms, malware sites, etc.

"Relevance = links" today looks as tired as "relevance = keyword density" did
10 years ago, but I have no idea what the new "relevance = " is
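
For reference, the "relevance = links" core is a small fixed-point
computation. A toy sketch of PageRank's power iteration (not what Google runs
today, which layers many more signals on top):

    # Minimal PageRank sketch: rank mass flows along links each round
    # until it stabilizes.

    def pagerank(links, damping=0.85, iters=50):
        """links: dict mapping each page to the list of pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - damping) / n for p in pages}
            for p, outs in links.items():
                targets = outs or pages  # dangling page: spread evenly
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            rank = new
        return rank

    print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))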

~~~
alanthonyc
The "fundamentally new" system for defining relevance is the social graph.
It's why Facebook and Twitter are so hot.

~~~
CWuestefeld
That's partly true, but given the way that the social graph is used today, I
don't think it can satisfy the entire need.

The social graph is essentially a personal thing, not professional. Assuming
that we see a need to partition our professional and personal lives (and that
should be true, in order to protect both sides), then the professional side
isn't going to be reflected well in the personal graph.

So long as our networking tools don't recognize this split, they are going to
be (relatively) starved for content that is strongly oriented towards the
professional world. For example, I wouldn't expect my wife to be able to find
deep information about Medicare reimbursement if we relied on mining social
sites.

~~~
johnzabroski
For a lot of professional needs, the most valuable information is hidden in
the deep web. For example, many MIS/DSS/ERP/CRM/TLA systems don't provide a
lot of public information about how their systems work, so if you are trying
to evaluate the marketplace, it would probably take you 3-6 months of
research.

For example, I work at a healthcare services start-up, and we have one person
full-time researching our competitors and figuring out how to differentiate us
in the marketplace, as well as figuring out what keywords someone might search
for to find this kind of product. We're mainly mathematicians and smart
programmers who found a way to break into the market initially, not marketers,
so our terminology doesn't match that of domain experts well.

Anecdotally, I think most users are VERY bad at ranking how valuable
information is. For example, we'll get paid $100,000/year for something that
takes 1 week and provides zero insight into their business, and then they will
refuse to pay us at all for something that makes them $30,000,000 a year! This
is so common that I am wondering if it's more a symptom of human nature than
our customers.

~~~
schwabacher
I am really curious: what makes them $30,000,000 a year? Are they getting it
for free, or are they losing out on the income?

~~~
johnzabroski
Fixing their billing procedures and optimizing their price list for services.
This is pretty tricky stuff and easy to get wrong, because there are many
factors that influence how hospitals get paid, such as stop-loss rules,
carve-outs, worker's compensation groupers, various mother-baby rules, etc.

Oftentimes, hospitals receive $0 for something they should be getting $10
million per year for, just because nobody ever contests the NOPAY response.
Finding that sort of error isn't easy.

------
izendejas
Brilliant use of crowd-sourcing to gather a test set via the Chrome plug-in.
Makes absolute sense; otherwise you can overfit. Great turnaround time, too.

~~~
abraham
From the article:

> It’s worth noting that this update does not rely on the feedback we’ve
> received from the Personal Blocklist Chrome extension

~~~
joshzayin
The _implementation_ of the algorithm didn't rely on it, but they did _test_
it using that data.

> However, we did compare the Blocklist data we gathered with the sites
> identified by our algorithm, and we were very pleased that the preferences
> our users expressed by using the extension are well represented. If you take
> the top several dozen or so most-blocked domains from the Chrome extension,
> then this algorithmic change addresses 84% of them, which is strong
> independent confirmation of the user benefits.

------
zone411
I know this is not a popular opinion here, but I'm not sure I like Google
reacting to what the tech-blogger echo chamber talks about. Are we sure this
is what's good for regular users and for the creation of quality content on
the Internet? This group is not representative at all.

There seems to be enough "normal" spam in the search results that Google
should be focusing on first. Just yesterday I searched for "viking dishwasher
clog" and the #6 result is a .info page that is nothing but obvious spam.

I previously mentioned the issue of Wikipedia poorly regurgitating content
from original creators (sometimes with attribution, sometimes without) and
outranking the original page, even when the original was an approachable,
illustrated, in-depth article and Wikipedia's content was much weaker. This is
not something that tech bloggers will notice - they are much more likely to
read Wikipedia's programming, science, math, or tech articles, which are of
much higher quality than those on other topics, so you won't hear them
complaining when they see Wikipedia at #1 everywhere.

Another bias I've started seeing lately is the high ranking of new Q&A sites
(another focus of tech bloggers), when excellent topical forums with very good
content, often exactly with the answers I needed, are ranking poorly. I’m
speculating here, but I think Google’s focus on links might hurt its ability
to bring up deeply hidden content from forums. These forums won’t get many
links from tech bloggers and others looking for the next big thing, but they
have very valuable content for many long-tail searches.

------
ladon86
Is the change live yet?

I remember an example provided here was "nstoolbar bottom bar". Let's take a
look: <http://www.google.com/search?q=nstoolbar+bottom+bar>

Granted, it was probably linked from HN somewhere, which will have made matters
worse.

~~~
rksprst
Yes, the change is live (rolling out slowly through the day). A ton of large
sites have already been affected. It's a pretty big change.

------
bryanh
Not to nitpick, but...

> If you take the top several dozen or so most-blocked domains from the Chrome
> extension, then this algorithmic change addresses 84% of them, which is
> strong independent confirmation of the user benefits.

Well, what if they are mistakenly penalizing Common Jack's blog because it
gets re-posted across the interwebs? I bet no one is manually blocking Common
Jack's blog, so that factor is lost.

Example: let's say Google just randomly penalizes 84% of ALL domains. Chances
are that they intersect with, you guessed it, 84% of the Chrome extension's
data. Independent confirmation!
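
That base rate is easy to check with a quick simulation (the numbers are made
up; only the 84% comes from the post):

    # If you penalize a random 84% of all domains, you "address" ~84% of
    # any blocklist by pure chance.
    import random

    domains = range(100_000)
    blocklist = random.sample(domains, 50)           # most-blocked domains
    penalized = set(random.sample(domains, 84_000))  # random 84% of all

    hit_rate = sum(d in penalized for d in blocklist) / len(blocklist)
    print(f"blocklist 'addressed': {hit_rate:.0%}")  # ~84% on average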

~~~
mtkd
The problem for Google is that any non-algorithmic approach (using a feedback
mechanism) will just be abused by the SEO industry - they'll crank up
Mechanical Turk to start reporting anyone ranking above their clients.

~~~
bryanh
I'm not advocating that, I'm just saying that the "independent confirmation"
tells us little about the innocent sites that get negatively affected. They
are, by definition, excluded.

------
jbk
I really hope it will change for the better, but I don't trust Google anymore:

Small example: when you search for "VLC" on Google, you find scam websites
shipping VLC with a toolbar/crapware/malware, whether or not they buy
AdWords...

Google plainly refuses to remove those websites, because they give Google
some money. So Google makes money from scammers, knows it clearly, and does
not want to do anything about it...

And as we are a very small team of volunteers, it is impossible for us to go
after those websites...

NB: this happens with other software too.

~~~
gimpf
Strangely enough, your primary example always directed me to videolan in the
past, and continues to do so.

I very much prefer DDG these days, but those statements are just not true. And
I do not believe that Google would purposefully provide _much_ _worse_ results
for US customers, relative to EU ones.

~~~
jbk
The English results are the only fine ones, because the websites are in
English...

Try .fr, .it, .es and you'll see...

And on .us, just deactivate your adblock.

------
JacobIrwin
Google works hard (or maybe not so hard) to keep their front page UI ultra
clean. I do think users could benefit a lot from a small drop-down arrow under
the search field that opens another field for the words searchers do not want
included in their results. Most of us in the Hacker News community know this
can be more easily accomplished by including a "-" (minus sign) before such
terms; we know how to use "-" and "site:..." to narrow query results, as in
the examples below. However, I'd say 50% of Google users don't know these
tricks and waste time navigating to the advanced search link.
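
For the unfamiliar, the two tricks look like this (the example queries are
just illustrations):

    viking dishwasher clog -forum         exclude pages containing "forum"
    nstoolbar site:developer.apple.com    only results from that one site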

~~~
scottmp10
There is an advanced search link next to the Google search bar that provides
the functionality you described for non-power users.

~~~
JacobIrwin
I mentioned that in the last line. I just feel it's not efficient for those
who want to leave out certain terms... given that "-" is the most common query
alteration (or maybe second, after "+").

------
darksaga
For me, it's too little too late. I know a lot of people who have already
moved to different search engines. Google lost a ton of money in the process.
They didn't help themselves any when it was revealed that J.C. Penney had
successfully gamed Google for almost an entire year - and Google had no idea
until the NY Times broke the story.

~~~
weego
You may well know a lot, but I bet 100% of them are serious tech-interest
people, who account for a tiny, tiny % of the internet. You could argue that
tech people drive the trends that spread out to the masses, but in this case I
think it's just bashing the big guy cos he's an easy target.

The idea that most people really care about the purity of their search
results, let alone would understand what all this means even if the BBC or NYT
ran a proper article, is just unrealistic.

------
teoruiz
Is there any reason to change the algorithm, initially, only in the US?

------
JamesDB
How will this impact sites like Rotten Tomatoes?

They have little original content, but are still a useful source.

------
codefisher
It looks like I have got higher up the ranks for some of my keywords, so the
change must be good ;)

------
bkaid
It'll be just as interesting to see how this affects Bing rankings.

~~~
joshhart
It should affect them automatically. We know Bing uses Google search-result
click data coming from the Bing toolbar. If a site's ranking is reduced by
Google, the same should happen (maybe to a much smaller extent) in Bing, since
the value of that feature will now be smaller.

~~~
trezor
It doesn't have to, and I believe you are mistaken/oversimplifying.

As far as I understood it, the whole Bing ordeal was that users with the Bing
toolbar reported not the links and rankings shown, but which result the user
chose as the "correct" hit for that search.

In that regard, this doesn't really have to alter Bing's results in any way at
all.
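
If that's right, the signal is just aggregated (query, chosen-result) pairs.
A hedged sketch of such a click feature - everything here is an assumption
about Bing's internals, not documented behavior:

    # Hypothetical click-signal sketch: toolbar users report which result
    # they picked for a query; the ranker sees each url's share of picks.
    from collections import Counter

    picks = Counter()  # (query, url) -> number of toolbar-reported picks

    def report_pick(query, url):
        picks[(query, url)] += 1

    def pick_share(query, url):
        """Fraction of reported picks for this query that chose this url."""
        total = sum(n for (q, _), n in picks.items() if q == query)
        return picks[(query, url)] / total if total else 0.0

    report_pick("nstoolbar bottom bar", "http://example.com/answer")
    print(pick_share("nstoolbar bottom bar", "http://example.com/answer"))

On that model the toolbar records what users chose, not where Google ranked
it, so a Google demotion feeds through only indirectly, if at all.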

